discourse-legacysite-perl/site/slowtwitch.com/cgi-bin/articles/GT/SQL/Tree.pm

# ==================================================================
# Gossamer Threads Module Library - http://gossamer-threads.com/
#
#   GT::SQL::Table
#   Author: Jason Rhinelander
#   CVS Info : 087,071,086,086,085
#   $Id: Tree.pm,v 1.30 2008/06/11 06:55:26 brewt Exp $
#
# Copyright (c) 2004 Gossamer Threads Inc.  All Rights Reserved.
# ==================================================================
#
# Description:
#   Class used to manage a tree structure in a table.
#
#
# The comments through this document reference "record hash refs" -
# a record hash ref consists of 5 keys:
#    - tree_id_fk     => the ID
#    - tree_anc_id_fk => the ancestor ID
#    - tree_dist      => The 'distance' between the id and the ancestor. If the
#                      ancestor is the father, this is 1; for the grandfather, 2
#
# Most things have a common return, which looks like this:
#     { id => [{ record }, { record2 }, { record3 }], id2 => [], ... }
# Where id, id2, ... are the ID's you pass in, and record, record2, record3, ...
# are the record hash refs mentioned above with the relationship requested (parents,
# children, siblings, etc.)
#
package GT::SQL::Tree;
# ===============================================================
use strict;
use GT::SQL::Condition;
use GT::SQL::Base;
use GT::SQL::Table;
use GT::AutoLoader;
use vars qw/$DEBUG $VERSION $ERROR_MESSAGE @ISA $AUTOLOAD/;

use constants
    TREE_COLS_ROOT   => 0,
    TREE_COLS_FATHER => 1,
    TREE_COLS_DEPTH  => 2;

@ISA           = qw/GT::SQL::Base/;
$DEBUG         = 0;
$VERSION       = sprintf "%d.%03d", q$Revision: 1.30 $ =~ /(\d+)\.(\d+)/;
$ERROR_MESSAGE = 'GT::SQL';

sub new {
    my $this = shift;
    my $opts = $this->common_param(@_) or return $this->error(BADARGS => FATAL => '$obj->new({ table => $table_obj })');

    my $self = bless {}, $this;

    $self->{table} = $opts->{table} or return $self->error(BADARGS => FATAL => '$obj->new({ table => $table_obj })');

    $self->{connect} = $self->{table}->{connect};

    $self->{_debug} = $opts->{debug} || $DEBUG || 0;

    my $tree_table = $self->{table}->name . "_tree"; # ->name returns the table _prefixed_
    my $name = $self->{connect}->{def_path} . '/' . $tree_table . '.def';
    -e $name or return $self->error(FILENOEXISTS => FATAL => $name);

    $tree_table = $self->new_table($tree_table);

    $self->{tree} = $tree_table;

    return $self;
}

sub DESTROY {}

$COMPILE{create} = __LINE__ . <<'END_OF_SUB';
sub create {
# -----------------------------------------------------------
#   GT::SQL::Tree->create(...)
#   Create a new table, $tablename . "_tree".
#   The arguments are as follows:
#       table => $table_obj, # This is the table object the tree is to be built upon.
#       father => 'father_id_fk', # The column in the table that contains the father ID. It must already exist.
#       root => 'root_id_fk', # The column in the table that contains the root ID. It must already exist.
#       depth => 'rec_depth', # The column in the table that keeps track of the depth (below the root) of the record.
#
#       Optional arguments:
#           force => 'force', # Specifies to argument to GT::SQL::Creator->create. Typically, 'force' or 'check'.
#           debug => $debug_level, # Specifies to debug level for the GT::SQL::Tree object.
#           rebuild => $rebuild, # A GT::SQL::Tree::Rebuild object
#   You'll get back a GT::SQL::Tree object, just as if you had called new() for
#   a tree that already existed.
#
#   The new table created will have the following keys:
#       tree_id_fk     : A foreign key to the primary key of the table passed in
#       tree_anc_id_fk : Also a foreign key to the primary key, this one stores an ancestor of id_fk
#       tree_dist      : This stores the distance (levels) between the ID and the ancestor.
#
#   To give an example of how this will all look, let's say we have a structure like this:
#       a
#       - b
#       - c
#         - d
#           - e
#   Where b and c are children of a, d is a child of c, and e is a child of d.
#   There will be the normal records, one per element. So, the main table looks
#   like this:
#
#       +-------+------+--------------+------------+-----------+
#       | pk_id | name | father_id_fk | root_id_fk | rec_depth |
#       +-------+------+--------------+------------+-----------+
#       |     1 | a    |            0 |          0 |         0 |
#       |     2 | b    |            1 |          1 |         1 |
#       |     3 | c    |            1 |          1 |         1 |
#       |     4 | d    |            3 |          1 |         2 |
#       |     5 | e    |            4 |          1 |         3 |
#       +-------+------+--------------+------------+-----------+
#
#   For this example, the associated tree table will look like this:
#
#       +------------+----------------+-----------+
#       | tree_id_fk | tree_anc_id_fk | tree_dist |
#       +------------+----------------+-----------+
#       |          2 |              1 |         1 |
#       |          3 |              1 |         1 |
#       |          4 |              3 |         1 |
#       |          4 |              1 |         2 |
#       |          5 |              4 |         1 |
#       |          5 |              3 |         2 |
#       |          5 |              1 |         3 |
#       +------------+----------------+-----------+
#
#   This format allows GT::SQL::Tree to easily (one simply query) select all
#   descendants or ancestors given an ID.
#
#   Calling ->create() on a table with data may take quite some time as it will
#   create a tree for that table.  You can, however, use this to recreate the
#   tree for a particular table.
#
    my $class = shift;
    my $input = $class->common_param(@_) or return $class->error(BADARGS => FATAL => 'GT::SQL::Tree->create(HASH or HASH REF)');

    my $self = {};

    bless $self, ref $class || $class;
    $self->{_debug} = $input->{debug} if $input->{debug};

    my $table = $input->{table};
    $table and $table->name or return $self->error(BADARGS => FATAL => 'GT::SQL::Tree->create(..., table => $table_obj, ...)');
    $input->{father}        or return $self->error(BADARGS => FATAL => 'GT::SQL::Tree->create(..., father => \'father_col\', ...)');
    $input->{root}          or return $self->error(BADARGS => FATAL => 'GT::SQL::Tree->create(..., root => \'root_col\', ...)');
    $input->{depth}         or return $self->error(BADARGS => FATAL => 'GT::SQL::Tree->create(..., depth => \'depth_col\', ...)');

    $self->{connect} = $table->{connect};

    $table->pk and @{$table->pk} == 1 or return $self->error(TREEBADPK => FATAL => $table->name);

    # If a rebuild object was passed in, let it do its stuff.
    if ($input->{rebuild}) {
        $input->{rebuild}->_rebuild($table->pk->[0], @$input{qw/root father depth/});
    }

    my $tree = $table->name . "_tree";

    my $c = $self->creator($tree);

    $c->cols([
        tree_id_fk     => { type => 'INT', unsigned => 1, not_null => 1, form_display => 'ID' },
        tree_anc_id_fk => { type => 'INT', unsigned => 1, not_null => 1, form_display => 'Ancestor' },
        tree_dist      => { type => 'INT', unsigned => 1, not_null => 1, form_display => 'Distance' }
    ]);

    my $table_name = $table->name();
    $table_name =~ s/^\Q$self->{connect}->{PREFIX}\E//;
    my $pk = $table->pk()->[0];
    $c->fk({
        $table_name => { tree_id_fk => $pk, tree_anc_id_fk => $pk }
    });

    $c->subclass({
        relation => { "${table_name}\0${table_name}_tree" => 'GT::SQL::Tree::Relation' }
    });

    my $tree_i_prefix = lc substr($table_name, 0, 4);

    $c->index({
        "${tree_i_prefix}_tri" => ['tree_id_fk'],
        "${tree_i_prefix}_tra" => ['tree_anc_id_fk', 'tree_dist']
    });

    $c->{table}->{schema}->{tree_cols}->[TREE_COLS_ROOT]   = $input->{root};
    $c->{table}->{schema}->{tree_cols}->[TREE_COLS_FATHER] = $input->{father};
    $c->{table}->{schema}->{tree_cols}->[TREE_COLS_DEPTH]  = $input->{depth};

    $self->debug("Creating tree table '$tree'") if $self->{_debug};
    my $ok = $c->create($input->{force} || 'force');

    if (!$ok) {
        if ($GT::SQL::errcode eq 'TBLEXISTS') {
            $c->set_defaults();
            $c->save_schema();
        }
        else {
            return;
        }
    }

    $table->fk($table_name => { $input->{father} => $pk });

    $table->{schema}->{tree} = 1;
    $self->debug("Saving tree existance in parent schema") if $self->{_debug};
    $table->save_state();
    $self->{table} = $table;
    $self->{tree} = $self->new_table($tree);

    return $self unless $ok and $table->count(); # $ok will be false if we were instructed NOT to overwrite the table

    # Uh oh, this is fun... it means we have to create the tree from the existing table.
    $self->debug("$table_name already has rows; building new tree table data") if $self->{_debug};
    $self->{tree}->delete_all();

    my ($root_col, $depth_col, $father_col) = ($self->root_id_col, $self->depth_col, $self->father_id_col);

    my $top = $table->select("MAX($pk)")->fetchrow;
    my $count = $table->count();
    my $roots = $table->count($root_col => 0);
    $self->debug("Building ancestor tree ...") if $self->{_debug};
    my ($j, %parents, %depth); # %parent = ( id => [parents], id => [parents], ... ), %depth = ( $id => $depth, $id => $depth, ... )

    for (my $i = 0; $i < $top; $i += 500) { # Get 500 threads at a time
        $table->select_options("ORDER BY $root_col, $depth_col");
        my $cond = GT::SQL::Condition->new($root_col => '>' => $i, $root_col => '<=' => $i + 500);

        my $sth = $table->select($pk, $root_col, $father_col, $depth_col => $cond);

        my $last_root = 0;
        %parents = ();
        while (my ($id, $root, $parent, $depth) = $sth->fetchrow) {
            if ($parent == $root) {
                $parents{$id} = [$parent];
            }
            else {
                $parents{$id} = [@{$parents{$parent} || []}, $parent];
            }
            $depth{$id} = $depth;
            $self->debug("Processed $j records...") if $self->{_debug} and (++$j % 5000) == 0;
        }
        my @inserts;
        if (keys %parents) {
            for my $id (keys %parents) {
                for my $anc (@{$parents{$id}}) {
                    push @inserts, [$id, $anc, $depth{$id} - ($depth{$anc} || 0)];
                }
            }
        }

        $self->{tree}->insert_multiple(['tree_id_fk', 'tree_anc_id_fk', 'tree_dist'], @inserts) if @inserts;
    }

    $self->debug("$j non-root nodes found.") if $self->{_debug};

    return $self;
}
END_OF_SUB

$COMPILE{destroy} = __LINE__ . <<'END_OF_SUB';
sub destroy {
# -----------------------------------------------------------
#   $obj->destroy
#   Drops the tree for the table of the current object.

    my $self = shift;
    my $c = $self->creator($self->{table}->name . "_tree");

    $c->drop_table;

    delete $self->{table}->{schema}->{tree};
    $self->{table}->save_state();

    return 1;
}
END_OF_SUB

sub root_id_col {
# -----------------------------------------------------------
#   $tree->father_id_col
#   Returns the father_id column. Takes no arguments.
    shift->{tree}->{schema}->{tree_cols}->[TREE_COLS_ROOT];
}

sub father_id_col {
# -----------------------------------------------------------
#   $tree->father_id_col
#   Returns the father_id column. Takes no arguments.
    shift->{tree}->{schema}->{tree_cols}->[TREE_COLS_FATHER];
}

sub depth_col {
# -----------------------------------------------------------
#   $tree->father_id_col
#   Returns the father_id column. Takes no arguments.
    shift->{tree}->{schema}->{tree_cols}->[TREE_COLS_DEPTH];
}

$COMPILE{insert} = __LINE__ . <<'END_OF_SUB';
sub insert {
# -----------------------------------------------------------
#   $tree->insert(insert_id => $inserted_id, data => $insert_hash);
#   This will insert the approriate record into the tree table.
#   $inserted_id should be the insert_id of the new record and
#   $insert_hash should contain at least the father, root, and
#   depth columns.
#   The number of rows inserted into the tree table is returned
#   on success. Note that 0 is returned as 0e0 for a root.
    my $self = shift;
    my $input = $self->common_param(@_) or return $self->error(BADARGS => FATAL => '$tree->insert(HASH or HASH_REF)');

    my $table = $self->{tree} or return $self->error(NOTREEOBJ => FATAL => '$tree->insert()');

    my $insert_id = $input->{insert_id};
    my $data = $input->{data};

    my $f = $self->father_id_col;

    return "0e0" unless my $fid = $data->{$f}; # If there is no father, it's a root, so we don't do anything.

    my $parents = $self->parents(id => $fid);

    push @$parents, { tree_id_fk => $fid, tree_anc_id_fk => $fid, tree_dist => 0 }; # tree_id_fk isn't used, and dist will have one added to it to get the node-father row

    my @insertions;
    for (@$parents) {
        my ($anc, $depth) = @$_{'tree_anc_id_fk', 'tree_dist'};

        push @insertions, [$insert_id, $anc, $depth + 1];
    }
    $table->insert_multiple(['tree_id_fk', 'tree_anc_id_fk', 'tree_dist'], @insertions);

    return scalar @insertions;
}
END_OF_SUB

$COMPILE{pre_update} = __LINE__ . <<'END_OF_SUB';
sub pre_update {
# -----------------------------------------------------------------------------
#   $tree->update(where => $condition, data => $update_hash);
#   $update_hash should contain the father_id column. This should only be
#   called (by GT::SQL::Table) when an update occurs that changes the
#   father_id. $update_hash must be the hash reference that will be used for
#   the update because it is going to be changed for the root and depth fields.
#   You're going to get back some sort of data structure from this (subject to
#   change). Pass the data structure into "update" after the update occurs
#   successfully.
    my $self = shift;
    my $input = $self->common_param(@_) or return $self->error(BADARGS => FATAL => '$tree->update(HASH or HASH REF)');

    my $update_hash = $input->{data} or return $self->error(BADARGS => FATAL => '$tree->update(... data => $update_hash ...)');

    my $where = $input->{where} or return $self->error(BADARGS => FATAL => '$tree->update(... where => $condition ...)');

    my ($pk, $r, $f, $d) = ($self->{table}->pk()->[0], $self->root_id_col, $self->father_id_col, $self->depth_col);

    my $new_father = $input->{data}->{$f};
    my ($table, $tree) = ($self->{table}, $self->{tree});
    my %ids = $self->{table}->select($pk, $d => $where)->fetchall_list;
    if ($new_father and exists $ids{$new_father}) {
        # Cannot update a row to be a child of itself
        return $self->error(TREEFATHER => 'WARN');
    }
    # keys %ids are the ID's of the records being moved. The values are the depth BEFORE moving.
    my $old_parents = $self->parent_ids(id => [keys %ids]);
    my $children = $self->child_ids(id => [keys %ids], include_dist => 1);

    my $delete_cond;
    for my $parent (keys %ids) {
        my @p = @{$old_parents->{$parent}};
        my @c = keys %{$children->{$parent}};
        for (@c) {
            if ($_ == $new_father) {
                # We can't update a row to be a child of it's children
                return $self->error(TREEFATHER => 'WARN');
            }
        }

        next unless @p; # If there aren't any old parents, this record already is a root and isn't changing.

        $delete_cond ||= GT::SQL::Condition->new('OR');

        $delete_cond->add(
            GT::SQL::Condition->new(
                tree_anc_id_fk => IN => \@p,
                tree_id_fk => IN => [$parent, keys %{$children->{$parent}}]
            )
        );
    }

    my ($new_depth, $new_root_id, $update, @insert) = (0, 0);
    if ($new_father) {
        my %new_parents = ($new_father => 0, %{$self->parent_ids(id => $new_father, include_dist => 1)});
        my %insert_seen;
        for my $new (sort { $ids{$b} <=> $ids{$a} } keys %ids) {
            for my $new_child ($new, keys %{$children->{$new}}) {
                next if $insert_seen{$new_child}++; # If it's already seen, it means it's already been handled. This can occur when moving both a child and parent to be children of a new node - the child will be a sibling of its old parent
                for my $new_anc (keys %new_parents) {
                    my $child_dist = $new_child == $new ? 0 : $children->{$new}->{$new_child};
                    push @insert, [$new_anc, $new_child, $new_parents{$new_anc} + 1 + $child_dist] unless $insert_seen{"$new_anc\0$new_child"}++;
                }
            }
        }

        ($new_depth, $new_root_id) = $self->{table}->select($d, $r => { $pk => $new_father })->fetchrow;
        $new_root_id ||= $new_father;
        $new_depth++;

        my %seen;
        push @$update, { set => { $r => $new_root_id }, where => { $pk => [grep !$seen{$_}++, keys %ids, map { keys %{$children->{$_}} } keys %$children] } };
    }
    else {
        $update_hash->{$r} = 0;
        my %seen;
        for (sort { $ids{$b} <=> $ids{$a} } keys %ids) {
            push @$update, { set => { $r => $_ }, where => { $pk => [grep !$seen{$_}++, keys %{$children->{$_}}] } };
        }
    }

    my ($delta, %updates, %seen);
    for my $parent (sort { $ids{$b} <=> $ids{$a} } keys %ids) {
        $delta = $new_depth - $ids{$parent};
        next if !$delta or $seen{$parent}++;
        push @{$updates{$delta}}, $parent;
        for (keys %{$children->{$parent}}) {
            unless ($seen{$_}++) {
                $self->debug("Adjusting depth of $_ by $delta") if $self->{_debug};
                push @{$updates{$delta}}, $_;
            }
        }
    }

    for my $delta (keys %updates) {
        push @$update, { set => { $d => \"$d + $delta" }, where => { $pk => $updates{$delta} } };
    }

    return { delete => $delete_cond, insert_multiple => [[qw/tree_anc_id_fk tree_id_fk tree_dist/], @insert], update => $update };
}
END_OF_SUB

$COMPILE{update} = __LINE__ . <<'END_OF_SUB';
sub update {
# ---------------------------------------------------------
# This basically executes whatever is decided above. pre_update
# is where everything important is decided.
    my $self = shift;
    my $input = shift; # This should be whatever pre_update returned.
    if ($input->{delete}) {
        $self->debug("Deleting now-invalid tree records") if $self->{_debug} >= 1;
        $self->{tree}->delete($input->{delete});
    }
    if ($input->{insert_multiple} and @{$input->{insert_multiple}} >= 2) {
        $self->debug("Inserting new tree records required") if $self->{_debug} >= 1;
        $self->{tree}->insert_multiple(@{$input->{insert_multiple}});
    }
    if ($input->{update}) {
        $self->debug("Updating tree depths required after an update") if $self->{_debug} >= 1;
        for (@{$input->{update}}) {
            $self->{table}->update($_->{set}, $_->{where});
        }
    }
}
END_OF_SUB

sub children {
# -----------------------------------------------------------
#   $tree->children(id => [$pkval1, $pkval2, ...], max_depth => $max_depth)
#
    my $self = shift;
    my $input = $self->common_param(@_) or return $self->error(BADARGS => FATAL => '$tree->children(HASH or HASH_REF)');

    my $ids = $input->{id};
    my $ref = ref $ids;
    $ids = [$ids] if defined $ids and not ref $ids;
    $ids and @$ids or return $self->error(TREENOIDS => FATAL => '$tree->children()');
    for (@$ids) {
        $ids = 0 if not $_;
    }

    my $parent = $self->{table}->name();
    my $tree = $self->{tree}->name();
    my $roots_only = $input->{roots_only};
    $roots_only = 1 if not $ids;
    my ($select_from, $left_join);
    if ($roots_only and ref $input->{select_from}) {
        $select_from = $input->{select_from};
        $left_join = $input->{left_join};
    }
    elsif ($ids and !$roots_only) {
        $select_from = $self->{table}->new_relation($parent, $tree);
    }
    else {
        $select_from = $self->{table};
    }

    my $max_depth = $input->{max_depth};
    my $root_col = $self->root_id_col;
    my $depth_col = $self->depth_col;
    my $father_col = $self->father_id_col;
    my $pk = $self->{table}->pk()->[0];
    my $cond;

    my $sort_col   = $input->{sort_col}   || [];
    my $sort_order = $input->{sort_order} || [];
    $sort_col   = [$sort_col]   if $sort_col   and not ref $sort_col;
    $sort_order = [$sort_order] if $sort_order and not ref $sort_order;
    my $sort_col_saved = [@$sort_col];
    my $order_by;
    if ($sort_col) {
        if (@$sort_order) {
            for (0 .. $#$sort_col) {
                last if $_ > $#$sort_order;
                $sort_col->[$_] .= " $sort_order->[$_]" if $sort_order->[$_];
            }
        }
        $order_by = "ORDER BY " . join ", ", @$sort_col if @$sort_col;
    }

    if ($input->{condition} and UNIVERSAL::isa($input->{condition}, 'GT::SQL::Condition')) {
        $cond = new GT::SQL::Condition;
        $cond->add($input->{condition});
    }
    my %roots_order; # We might need this, if using the roots_order_by option.
    if ($ids) {
        $cond ||= new GT::SQL::Condition;
        if ($roots_only) {
            $cond->add("$parent.$root_col" => IN => $ids);
            $cond->add("$parent.$depth_col" => '<=' => $max_depth) if $max_depth;
        }
        else {
            $cond->add("$tree.tree_anc_id_fk" => IN => $ids);
            $cond->add("$tree.tree_dist" => '<=' => $max_depth) if $max_depth;
        }
    }
    else {
        if ($roots_only and $input->{limit}) {
            # The following only applies when a limit is being used - otherwise, everything will be returned.
            my $c = new GT::SQL::Condition;
            $c->add($cond) if $cond;
            $c->add($root_col => '=' => 0);

            if ($input->{roots_order_by}) {
                $self->{table}->select_options('ORDER BY ' . $input->{roots_order_by});
            }
            else {
                $self->{table}->select_options($order_by);
            }
            $self->{table}->select_options("LIMIT $input->{limit}");

            my @roots = $self->{table}->select($pk => $c)->fetchall_list;
            if ($input->{roots_order_by}) {
                my $r;
                %roots_order = map { ($_ => $r++) } @roots;
            }
            my @children = $self->{table}->select($pk => { $root_col => \@roots })->fetchall_list;
            $cond ||= new GT::SQL::Condition;
            $cond->add("$parent.$pk" => IN => [@roots, @children]);
        }
        $cond ||= new GT::SQL::Condition;
        $cond->add("$parent.$depth_col" => '<=' => $max_depth) if $max_depth;
    }

    my $get_cols = $input->{cols};
    $get_cols = [$get_cols] if $get_cols and not ref $get_cols;
    if ($get_cols) {
        my ($found_root, $found_father, $found_depth, $found_anc);
        for (@$get_cols) {
            last if $found_root and $found_father and $found_depth;
            $found_anc++ if not $found_anc and $_ eq 'tree_anc_id_fk';
            $found_root++ if not $found_root and $_ eq $root_col;
            $found_depth++ if not $found_depth and $_ eq $depth_col;
            $found_father++ if not $found_father and $_ eq $father_col;
        }
        push @$get_cols, $root_col if not $found_root;
        push @$get_cols, $depth_col if not $found_depth;
        push @$get_cols, $father_col if not $found_father;
        push @$get_cols, 'tree_anc_id_fk' unless $found_anc or $roots_only;
        push @$get_cols, 'tree_dist' unless $roots_only;
    }

    $select_from->select_options($order_by) if $order_by;
    my $sth = $select_from->select($left_join ? ('left_join') : (), $get_cols || (), $cond || ());

    my $return = $self->_sort($sth, !$ids, $roots_only, (keys %roots_order ? \%roots_order : ()));

    if ($ids) {
        for (@$ids) {
            $return->{$_} ||= [];
        }
    }
    return $ref ? $return : $return->{$ids ? $ids->[0] : 0};
}

sub _sort {
# -----------------------------------------------------------
#   Used internally. Sorts an array ref of hash refs into the
#   proper order for a tree.
    my ($self, $sth, $from_root, $roots_only, $rp) = @_;
    my $pk = $self->{table}->pk()->[0];
    my $root_col = $self->root_id_col;
    my $depth_col = $self->depth_col;
    my $father_col = $self->father_id_col;
    my (@recs, %children, %root_pos, $r);
# When we're done this first part, @recs and %children will look like:
#
# @recs = (
#     [$thread1_immediate_child1, $thread1_immediate_child2, ...],
#     [$thread2_immediate_child1, $thread2_immediate_child2, ...],
#     ...
# );
# %children = (
#     $ancestor_id => {
#         $child_level_1_rec_1_id => [$child_level_2_rec_1, $child_level_2_rec_2, ...],
#         $child_level_1_rec_2_id => [$child_level_2_rec_1, $child_level_2_rec_2, ...],
#         ...
#     },
#     $ancestor_id => { ... },
#     ...
# )
#
# Each element in @recs contains the immediate children of a requested base row
# (often a root, but not necessarily).  Root positions are stored in %root_pos,
# so that all appropriate rows of a tree are grouped together.
#
# The $ancestor_id in %children is the requested ID.  If requesting just roots,
# this is the root ID, otherwise it is the ancestor ID.
#
# To determine the final list, each element will have its children placed
# immediately after itself in a recursive-like way, though not implemented here
# with recursion.
#
# Also note that duplicates are possible, when a requested "root" is really a
# child/descendant of another requested root.

# $anc_col is how a thread relates; typically this is the root_id, but isn't
# required to be when not using roots_only.
    my $anc_col = $roots_only ? $root_col : 'tree_anc_id_fk';

    while (my $rec = $sth->fetchrow_hashref) {
        if (not exists $root_pos{$rec->{$anc_col} || $rec->{$pk}}) { # We haven't encountered this root yet.
            $root_pos{$rec->{$anc_col} || $rec->{$pk}} = $from_root ? 0 : $r++;
        }
        if ($roots_only) {
            push @{$children{$rec->{$anc_col}}->{$rec->{$father_col}}}, $rec
                if $rec->{$anc_col};

            push @{$recs[$root_pos{$rec->{$anc_col} || $rec->{$pk}}]}, $rec
                if $rec->{$depth_col} == ($from_root ? 0 : 1);
        }
        else {
            if ($rec->{tree_dist} > 1) {
                push @{$children{$rec->{$anc_col}}->{$rec->{$father_col}}}, $rec;
            }
            else {
                push @{$recs[$root_pos{$rec->{$anc_col}}]}, $rec;
            }
        }
    }

    my @sorted;
# The goal here is to make @sorted look like this:
# @sorted = (
#     [$reply1, $reply2, ...],
#     [$reply1, $reply2, ...],
#     ...
# );
# Each array ref corresponds to one tree. Note that $reply1 could be a root, not a reply :)

# The mess below properly sorts out a thread, paying attention to both the
# parent and, if specified, sort_col and sort_order.

    # Go through all threads in @recs - each element is a thread
    for my $thread (@recs) {
        while (@$thread) {
            my $this = shift @$thread;
            if (my $children = $children{$this->{$anc_col} || $this->{$pk}}->{$this->{$pk}}) {
                unshift @$thread, @$children;
            }
            my $sort_i = $root_pos{$this->{$anc_col} || $this->{$pk}};
            push @{$sorted[$sort_i]}, $this;
        }
    }

    if ($from_root and $rp) { # If $rp was passed in, order the array refs according to $rp->{$root_id}
# $sort[0] is sorted for all the elements. What we have to do now is group them into threads.
        my $i;
        my %cur_pos = map { ("$_" => $i++) } @{$sorted[0]};
        $sorted[0] = [
            sort {
                ( # This bit sorts by root ID
                    $rp->{$a->{$anc_col} || $a->{$pk}}
                        <=>
                    $rp->{$b->{$anc_col} || $b->{$pk}}
                )
                ||
                ($cur_pos{$a} <=> $cur_pos{$b}) # Keep the order for elements with the same root id
            }
            @{$sorted[0]}
        ];
    }

    my $return = {};
    for my $tree (@sorted) {
        my $root = $from_root ? 0 : $tree->[0]->{$anc_col};
        push @{$return->{$root}}, @$tree;
    }

    $return;
}

$COMPILE{parents} = __LINE__ . <<'END_OF_SUB';
sub parents {
# -----------------------------------------------------------
#   $tree->parents(id => [$pkval1, $pkval2, ...])
#
    my $self = shift;
    my $input = $self->common_param(@_) or return $self->error(BADARGS => FATAL => '$tree->parents(HASH or HASH_REF)');

    $self->{tree} and $self->{table} or return $self->error(NOTREEOBJ => FATAL => '$tree->parents()');

    my $parent = $self->{table}->name();
    $parent =~ s/^\Q$self->{connect}->{PREFIX}\E//;
    my $tree = $self->{tree}->name();
    $tree =~ s/^\Q$self->{connect}->{PREFIX}\E//;

    my $rel = $self->{table}->new_relation($parent, $tree);

    my $get = $input->{cols};
    $get = [] unless ref $get eq 'ARRAY';
    my $depth = $self->depth_col;
    if (@$get) { # If $get is empty, everything will be returned.
        my ($found_t, $found_d);
        for (@$get) {
            $found_t++ if $_ eq 'tree_id_fk';
            $found_d++ if $_ eq $depth;
            last if $found_t and $found_d;
        }
        push @$get, 'tree_id_fk' if not $found_t;
        push @$get, $depth if not $found_d;
    }

    my $ids = $input->{id};
    my $ref = ref $ids;
    $ids = [$ids] if $ids and not $ref;
    $ids and @$ids or return $self->error(TREENOIDS => FATAL => '$tree->parents()');

    $GT::SQL::Tree::Relation::Anc_Join = 1;
    my $sth = $rel->select(@$get => { tree_id_fk => $ids });
    $GT::SQL::Tree::Relation::Anc_Join = 0;

    my $return = { map { ($_ => []) } @$ids };

    while (my $rec = $sth->fetchrow_hashref) {
        push @{$return->{$rec->{tree_id_fk}}}, $rec;
    }

    for (@$ids) {
        @{$return->{$_}} = sort { $a->{$depth} <=> $b->{$depth} } @{$return->{$_}};
    }

    return $ref ? $return : $return->{$ids->[0]};
}
END_OF_SUB

$COMPILE{child_ids} = __LINE__ . <<'END_OF_SUB';
sub child_ids {
# -----------------------------------------------------------
#   $tree->child_ids(id => [$pkval1, $pkval2, ...], include_dist => 1)
#       IN : A hash or hash ref containing at least an 'id' key.
#            The value of the 'id' key is an array reference of ancestor ID's whose
#            descendants (children, children's children, etc.) you are looking for.
#            max_depth can be specified to limit a maximum child depth to return.
#       OUT: Depends on include_dist.
#            Without include_dist: hash ref of array ref. There will be one key for
#              each ID you pass in. If there are no children, the array ref value will
#              contain no elements. Each array element is a child ID.
#            With include_dist: hash ref of hash refs. One key for each ID you pass
#              in. The inner hash refs have keys of the ID's and values of the
#              distance between what you passed in and the element. Essentially,
#              keys() of an include_dist hash is the same as the array ref without
#              include depth.
#
    my $self = shift;
    my $input = $self->common_param(@_) or return $self->error(BADARGS => FATAL => '$tree->child_ids(HASH or HASH_REF)');

    my $ids = $input->{id};
    my $ref = ref $ids;
    $ids = [$ids] if $ids and not ref $ids;
    $ids and @$ids or return $self->error(TREENOIDS => FATAL => '$tree->child_ids()');

    my @get = qw/tree_anc_id_fk tree_id_fk/;
    push @get, 'tree_dist' if $input->{include_dist};
    my $sth = $self->{tree}->select(@get => { tree_anc_id_fk => $ids });

    my $return = { map { ($_ => $input->{include_dist} ? {} : []) } @$ids };

    while (my ($anc, $id, $dist) = $sth->fetchrow) {
        if ($input->{include_dist}) {
            $return->{$anc}->{$id} = $dist;
        }
        else {
            push @{$return->{$anc}}, $id;
        }
    }

    return $ref ? $return : $return->{$ids->[0]};
}
END_OF_SUB

$COMPILE{parent_ids} = __LINE__ . <<'END_OF_SUB';
sub parent_ids {
# -----------------------------------------------------------
#   $tree->parent_ids(id => [$pkval1, $pkval2, ...], include_dist => 1)
#       IN : A hash or hash ref containing an 'id' key.
#            The value of the 'id' key is an array reference of children ID's whose
#            ancestors (parents, parents' parents, etc.) you are looking for.
#       OUT: hash ref of array refs. There will be one key for each ID you pass in.
#            Each array ref contains the ID's of the parents.
#            Liks child_ids, the return is different if you pass in "include_dist".
#            See child_ids for a description.
#
    my $self = shift;
    my $input = $self->common_param(@_) or return $self->error(BADARGS => FATAL => '$tree->parent_ids(HASH or HASH_REF)');

    my $ids = $input->{id};
    my $ref = ref $ids;
    $ids = [$ids] if $ids and not ref $ids;
    $ids and @$ids or return $self->error(TREENOIDS => FATAL => '$tree->parent_ids()');

    my @get = qw/tree_id_fk tree_anc_id_fk/;
    push @get, 'tree_dist' if $input->{include_dist};
    my $sth = $self->{tree}->select(@get => { tree_id_fk => $ids });

    my $return = { map { ($_ => $input->{include_dist} ? {} : []) } @$ids };

    while (my ($id, $anc, $dist) = $sth->fetchrow) {
        if ($input->{include_dist}) {
            $return->{$id}->{$anc} = $dist;
        }
        else {
            push @{$return->{$id}}, $anc;
        }
    }

    return $ref ? $return : $return->{$ids->[0]};
}
END_OF_SUB

$COMPILE{num_children} = __LINE__ . <<'END_OF_SUB';
sub num_children {
# -----------------------------------------------------------------------------
#   $tree->num_children([$pkval1, $pkval2, ...])
#       IN : A list or array reference of of parents ID's whose child counts
#            you are looking for.
#       OUT: Hash reference of ID => NUM_CHILDREN pairs.  Note that this is the
#            number of children (i.e. depth = 1), not descendants.
#
    my $self = shift;

    my @ids = map {
        ref eq 'ARRAY'
            ? @$_
            : ref()
                ? $self->error(BADARGS => FATAL => '$tree->num_children(ARRAY or ARRAYREF)')
                : $_
    } @_;

    @ids or return $self->error(TREENOIDS => FATAL => '$tree->num_children()');

    $self->{tree}->select_options('GROUP BY tree_anc_id_fk');
    my %return = $self->{tree}->select(tree_anc_id_fk => 'COUNT(*)', { tree_anc_id_fk => \@ids, tree_dist => 1 })->fetchall_list;

    for (@ids) { $return{$_} ||= 0 }

    return \%return;
}
END_OF_SUB


package GT::SQL::Tree::Relation;
# This is here to subclass the table->tree relation so that selects work properly

use GT::SQL::Relation;
use vars qw/@ISA $ERROR_MESSAGE $Anc_Join/; # $Anc_Join is set by the tree module when the join should be on tree_anc_id_fk rather than tree_id_fk
@ISA = $ERROR_MESSAGE = 'GT::SQL::Relation';

sub _join_query {
# -------------------------------------------------------------------
# Figures out the join clause between tables.
#
    my $self = shift;
    my $relations = shift;
    if (@$relations != 2) {
        return $self->error(TREEBADJOIN => FATAL => "@$relations");
    }
    my ($table, $tree) = @$relations;
    ($table, $tree) = ($tree, $table) if !$relations->[0]->{schema}->{tree};

    return "$tree->{name}." . ($Anc_Join ? 'tree_anc_id_fk' : 'tree_id_fk') . " = $table->{name}." . $table->pk()->[0];
}

1;

__END__

=head1 NAME

GT::SQL::Tree - Helps create and manage a tree in an SQL database.

=head1 SYNOPSIS

    use GT::SQL::Tree;

    my $tree = $table->tree;
    my $children = $tree->children(id => [1,2,3], max_depth => 2);

    my $parents = $tree->parents(id => [4,5,6]);

=head1 DESCRIPTION

GT::SQL::Tree is designed to implement a tree structure with a SQL table. Most
of the work on managing the table is performed automatically behind the scenes,
however there are a couple of front end methods to retrieving the tree nodes
from a GT::SQL::Tree object.

=head1 METHODS

=head2 new, tree

Typically, the way to get a tree object is to call ->tree on a table object. The
table object then calls GT::SQL::Tree->new for you and returns the results,
which is a GT::SQL::Tree object. Typically you should not call ->new directly,
but instead let $table->tree call it with the proper arguments.

=head2 create, add_tree

To use GT::SQL::Tree, you need to first call create(). You shouldn't call it
directly, but instead call ->add_tree() on an editor object. The arguments to
add_tree are passed through to create, so that they are essentially the same
(there is one exception - add_tree passed in C<table =E<gt> $table_object>).

create() will create a tree table, with the name passed on the name of the table
passed in. For example, if you wish to build a tree on 'MyTable', the tree table
that is created by create() will be named MyTable_tree. The tree table provides
easy one-query access to all of a nodes parents or children, and also keeps
track of the number of hops between a node and its descendant, allowing you to
limit how far you descend into the tree.

The following arguments are required:

=over 4

=item table

This contains the table object for the table the tree is to be built upon. Note
that when calling add_tree you B<should not> specify this - add_tree passes it
along on its own.

=item father

This must specify the name of the father ID column. The father ID column
controls the relationship between father/child.

For example, if your primary key is "my_id" and your father id column is
"my_father_id", you would pass in "my_father_id" as the value to C<father>.

=item root

This is used to specify the name of the root column. For example, if your
primary key is "my_id" and your root id column is "my_root_id", you would pass
in "my_root_id" as the value to C<root>.

=item depth

This is used to specify the name of the depth column for the table. For example,
if you are using a column named "my_depth" to keep track of the depth of a node,
you would pass in "my_depth" as the value to C<depth>.

=back

The following are optional arguments to create/add_tree:

=over 4

=item force

Takes a value such as 'force' or 'check'.  This value is passed on to the
GT::SQL table creation subroutine.

=item rebuild

You can pass in a GT::SQL::Tree::Rebuild object if you have an incomplete or
invalid table structure. See L<GT::SQL::Tree::Rebuild> for more details.

=item debug

Sets the debug level of the tree object. add_tree() automatically passes in the
debug value for the table object, so it normally is not necessary to set this.

=back

=head2 destroy, drop_tree

You can call C<$tree-E<gt>destroy> to destroy a tree. This involves dropping the
tree table and deleting the tree reference from the table the tree was on. This
can be called by calling C<$tree-E<gt>destroy()> on a GT::SQL::Tree object,
however this is typically invoked by calling C<$editor-E<gt>drop_tree()> on a
table editor object.

Neither C<$tree-E<gt>destroy()> nor C<$editor-E<gt>drop_tree()> take any
arguments.

=head2 root_id_col, father_id_co, depth_col

These three tree object methods return the name of the associated column in the
main table. Usually you will already know them, and these methods are primarily
used internally.

=head2 children

This is where the usefulness of the tree module comes into play.
C<$tree-E<gt>children> is used to access all of the children of a particular
node. It takes a wide variety of arguments to control the return.

Usually, the return will be either a hash reference of array references each
containing hash references, or else an array reference of hash references. Which
reference you get depends on what you request via the C<id> parameter, described
below. Each inner hash reference is a row from the database, typically a joined
row from the table the tree is on with the tree table, however the
C<roots_only>, C<cols>, and C<select_from> parameters all change this behaviour.

The arguments to C<children()> are as follows:

=over 4

=item id

The value of the id key is either a scalar value, or an array reference. The
value/values to id should be the id whose descendants you are looking for. For
example, if you are looking for the children of ID 3 and ID 4, you would pass in
C<id =E<gt> [3, 4]>. The return value of children will be a hash reference
containing two keys: 3 and 4.

If you are looking for the children of a single ID and pass the id as a scalar
value, you will get back an array reference as described above.

So, basically, if the value to id is an array reference, you will get back a
hash reference of array references of hash references; if it is a scalar value,
you will get back an array reference of hash references.
    $tree->children(id => [1])->{1};
and
    $tree->children(id => 1);
will result in the same thing.

To get all the trees in a single query, you pass in 0 as the value. This is as
if you are requesting the children of the imaginary root to which all roots
belong.

C<id> is the only required parameter.

=item max_depth

You can specify a max_depth value to specify that the records returned should
not be more a certain distance from the node. For example, supposing you have
this tree:
 a
  b
   c
    d
Selecting the children of a with a max_depth of 1 would return just b, not c or
d. A max_depth of 2 would return b and c.

Not specifying max_depth means that you do not want to limit the maximum
distance from the parent of the returned values.

=item cols

You can specify an array reference as the value to C<cols> to alter the values
returned. Instead of doing "SELECT * FROM ...", the query will be "SELECT <what
you specify> FROM ...". Note, however, that the father, root, and depth columns
are required and will be present in the rows returned whether or not you specify
them.

=item sort_col, sort_order

Where the C<sort> option sorts the results based on tree levels, C<sort_col> and
C<sort_order> control the sorting for nodes with the same father ID. For
example, with this tree:
 a
  b
  c
C<sort_col> and C<sort_order> affect whether or not b comes before or after c.
The value of each can either be a scalar value or an array reference. There is
essentially no difference, the scalar value is just a little easier when you are
only sorting on a single column. The values of C<sort_col> should be column
names, and the values of C<sort_order> 'ASC' or 'DESC', per sort column
respectively. For example:
 sort_col => ['a','b'], sort_order => ['ASC', 'DESC']
will sort first in ascending order based on the value of a, then descending
order based on the value of column b. This correlates directly to SQL - it
becomes "ORDER BY a ASC, b DESC".

You can specify a different sort order for roots by using the C<roots_order_by>
option, when using C<id =E<gt> 0>. See below.

=item condition

If you want to limit the results, you can pass a GT::SQL::Condition object into
C<children()> via the condition key. The condition will apply to the select
performed. For example, if you want to select rows with a column "a" having a
value less than 20, you could do:
 my $cond = GT::SQL::Condition->new(a => '<' => 20)
 my $children = $tree->children(..., condition => $cond);

=item limit

Like condition, you can specify any valid LIMIT _____ value here, for example
"50, 25". This option is only used when using C<id =E<gt> 0> - it will limit the
number of roots returned, taking into account the sort_col and sort_order.

=item roots_only

If you specify this option, it will assume that what you passed in via C<id>
consists only of root_ids. Doing so makes a join with the tree table
unneccessary and allows you to use the C<select_from> option. This option can be
used (and generally this is a good idea) when specifying C<id =E<gt> 0>.

=item roots_order_by

This option controlls the order of root posts, when selecting roots using
C<id =E<gt> 0> and a limit. C<sort_order> above will affect the order of
children of the roots, but the order of the roots themselves will be controlled
by whatever C<ORDER BY> value you specify here.

Again, this option requires that C<id =E<gt> 0>, C<roots_only>, and C<limit> are
also being used.

If this option is omitted, the C<ORDER BY> will be generated from the values of
the C<sort_col> and C<sort_order> options.

=item select_from

If you are using roots_only, you can also specify the C<select_from> option.
This option allows you to perform the selects from a GT::SQL::Relation object
instead of just the table associated with the tree. Note that the table
associated with the tree must be part of the relation, however you can have as
many other tables as you like.

=item left_join

If the select_from relation should be a left join, pass C<left_join =E<gt> 1>.
This simply passes the C<left_join> option to ->select. This option is only
applicable when select_from is used.

=back

=head2 parents

This is effectively the opposite of children. Instead of getting back all of the
children nodes, it gives the parents, all the way up to the root for any given
node. The return value is the same as that of C<children>, so see that section.

Each array returned by C<children> is sorted by depth from root to parent.

=over 4

=item id

C<id> is the only required parameter for C<parents()>. It should be either a
scalar value or an array reference. You specify the ID's of children whose
parents you are looking for. The type of argument (scalar or array ref) affects
the return in the same way as C<children()>.

=item cols

C<cols> works in a similar way to the C<cols> parameter to C<children>. You
specify the columns you want in the return as an array ref. What you get back
will have these columns in it. If C<cols> is not specified, you'll get back all
columns.

Note that 'tree_id_fk' and the depth column for the table are required fields
and will be added if not specified.

=back

=head2 child_ids

If you are looking for just the ID's of the children of a particular node, you
should use this. The return value is one of the following, depending on what you
pass in:

hash reference of array references:
    { ID => [ID, ID, ...], ... }
with one ID in the hash reference for each id you specify. The array reference
contains the child ID's of the key ID.

hash reference of hash references:
    { ID => { ID => dist, ID => dist, ... }, ... }
with one ID in the other hash reference for each id you specify. The inner hash
reference is made of child_id => child_distance key-value pairs.

array reference or hash reference:
    [ID, ID, ...]
hash reference:
    { ID => dist, ID => dist }

The first two apply when passing in an array reference for C<id>, the latter two
when passing a scalar value for C<id>. The first and third are without
C<include_dist> specified, the second and fourth occur when you specify
C<include_dist>.

=over 4

=item id

Like all other accessors, child_ids takes a scalar value or array reference as
the C<id> value. Return as noted above.

=item include_dist

This changes the return as noted above - instead of just getting an array
reference of child ID's, you get the child ID's as the keys of a hash reference,
and the distances of the child from the parent you requested as the values.

=back

=head2 parent_ids

Exactly the same as child_ids, except that this works I<up> the tree instead of
I<down>. Takes the same arguments, gives the same possible returns.

=head1 INDICES

A tree requires a few indices to get optimal performance out of it. If the table
is never expected to be more than just a few rows, you won't notice a
substantial difference, however, as with any table, as the table grows the
performance proper indexing provides becomes more appreciable.

Two indices are created automatically on the tree table, one on tree_id_fk, and
the other on tree_anc_id_fk,tree_dist, so you don't need to worry about that
table.

Obviously, the usage of the tree affects how many indices you want, this section
is simply to provide some general guidelines for the indices required.

Because the roots_only option is based solely on the main table and not the
tree, if you are using roots_only (calling children with id => 0 automatically
turns on the roots_only option), you want to make sure you have an index on the
root column. If you also use the max_depth depth option, add the depth column to
this index.

Keep in mind that you may need to mix other columns in here if you are using a
condition with children(). This also applies when using the C<sort_col> and
C<sort_order> parameters - basically you need to figure out what your indices
are, and then add in the root column and, if using max_depth, the depth column.

=head1 COPYRIGHT

Copyright (c) 2004 Gossamer Threads Inc.  All Rights Reserved.
http://www.gossamer-threads.com/

=head1 VERSION

Revision: $Id: Tree.pm,v 1.30 2008/06/11 06:55:26 brewt Exp $

=cut