GT::SQL::Search - internal driver for searching
This implements the query string based searching scheme for GT::SQL. Driver based, it is designed to take advantage of the different indexing schemes available on different database engines.
Instead of describing how Search.pm is interfaced*, this document describes how a driver should be structured and how a new driver can be implemented.
* Search.pm is never accessed directly by the programmer; it is designed to be called through the functions GT::SQL::Table::query and GT::SQL::Table::query_sth.
A driver has two parts, the Indexer package and the Search package. For every driver there must also exist a directory with the name of the driver in ALL CAPS, for example MYSQL for MySQL or POSTGRES for Postgres. Within each driver directory, the Indexer and Search portions of the driver contain all the information required for initializing the database table and searching the database.
The Indexer package of the driver handles all the data that is manipulated in the database and also initializes the database for indexing.
The Search package handles the queries and retrieves results for the eventual consumption by the calling program.
Drivers are simply subclasses of the base driver module, GT::SQL::Search::Base and operate by overriding certain key functions.
The next few sections will cover how to create a search driver, and assumes a fair bit of familiarity with GT::SQL.
The following is a bare skeleton driver, called ``CUSTOM'', that does nothing. This is the search package; it would be called Search.pm and be found in the GT/SQL/Search/CUSTOM library directory.
    package GT::SQL::Search::CUSTOM::Search;
    #------------------------------------------
    use strict;
    use vars qw/ @ISA /;
    use GT::SQL::Search::Base::Search;
    @ISA = qw( GT::SQL::Search::Base::Search );

    sub load {
        my $package_name = shift;
        return GT::SQL::Search::CUSTOM::Search->new(@_);
    }

    # overrides would go here

    1;
For the indexer, another file, Indexer.pm, would be found in the GT/SQL/Search/CUSTOM directory.
    package GT::SQL::Search::CUSTOM::Indexer;
    #------------------------------------------
    use strict;
    use vars qw/ @ISA /;
    use GT::SQL::Search::Base::Indexer;    # load the superclass named in @ISA
    @ISA = qw/ GT::SQL::Search::Base::Indexer /;

    sub load {
        my $package_name = shift;
        return GT::SQL::Search::CUSTOM::Indexer->new(@_);
    }

    # overrides would go here

    1;
The almost empty subs that immediately return with a value are functions that can be overridden to do special tasks. More will be detailed later.
The Driver has been split into two packages. The original package name, GT::SQL::Search::Nothing, houses the Search package. GT::SQL::Search::Nothing::Indexer is the Indexing portion of the search system. ``::Indexer'' must be appended to the original search name for the indexer.
Each of the override functions is triggered at a point just before or just after a major event occurs in GT::SQL. Depending on the type of actions you require, you pick and choose which events you'd like your driver to attach to.
The Indexer is responsible for creating all the indexes, maintaining them and when the table is dropped, removing all the associated indexes.
The following header must be defined for the Indexer. GT::SQL::Search::Base::Indexer is the superclass that our driver inherits from.
    package GT::SQL::Search::CUSTOM::Indexer;
    #------------------------------------------
    use strict;
    use vars qw/ @ISA /;
    use GT::Base;
    use GT::SQL::Search::Base::Indexer;
    @ISA = qw/ GT::SQL::Search::Base::Indexer /;
In addition to the header, the following function must be defined. GT::SQL::Search::Driver::Indexer::load creates the new object and allows for special preinitialization that must occur. You can also create another driver silently (such as defaulting to INTERNAL after a version check fails).
    sub load {
        my $package_name = shift;
        return GT::SQL::Search::CUSTOM::Indexer->new(@_);
    }
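As a rough sketch of the "create another driver silently" idea, the load function below hands back a different driver when a version check fails. The Sketch:: packages, the server_version argument and the $MIN_VERSION threshold are all hypothetical stand-ins so the example runs without GT::SQL; they are not part of the real API.

```perl
use strict;
use warnings;

package Sketch::INTERNAL::Indexer;   # stand-in for a default driver
sub new { my ($class, %args) = @_; return bless { %args }, $class }

package Sketch::CUSTOM::Indexer;     # hypothetical driver with a version requirement
our $MIN_VERSION = 4;                # assumed threshold, for illustration only
sub new { my ($class, %args) = @_; return bless { %args }, $class }

sub load {
    my ($package_name, %args) = @_;
    my $version = $args{server_version} || 0;

    # Version check failed: silently hand back the fallback driver instead.
    return Sketch::INTERNAL::Indexer->new(%args) if $version < $MIN_VERSION;
    return Sketch::CUSTOM::Indexer->new(%args);
}

package main;

my $new_server = Sketch::CUSTOM::Indexer->load(table => 'Links', server_version => 5);
my $old_server = Sketch::CUSTOM::Indexer->load(table => 'Links', server_version => 3);
```

The caller only ever asks for the CUSTOM driver; which class it actually gets back is decided inside load.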
Finally, there are the overrides. None of the override functions need be defined in your driver. Any calls made to undefined methods will silently fall back to the superclass driver's methods. When a method has been overridden, the function must return a true value when it is successful, otherwise the action will fail and an error will be generated.
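The fallback and return-value rules can be illustrated with a small self-contained sketch. Sketch::Base::Indexer is only a stand-in for GT::SQL::Search::Base::Indexer, and its default hooks are assumed to simply succeed; the hook names follow the ones used in this document.

```perl
use strict;
use warnings;

# Stand-in for GT::SQL::Search::Base::Indexer so the sketch runs without GT::SQL.
package Sketch::Base::Indexer;
sub new {
    my ($class, %args) = @_;
    return bless { table => $args{table} }, $class;
}
sub pre_add_column  { return 1 }   # assumed default: succeed and do nothing
sub post_add_column { return 1 }

package Sketch::CUSTOM::Indexer;
our @ISA = ('Sketch::Base::Indexer');

sub load {
    my $package_name = shift;
    return Sketch::CUSTOM::Indexer->new(@_);
}

# Overridden hook: a true value means success, a false value fails the action.
sub pre_add_column {
    my ($self, $name, $col) = @_;
    return unless defined $name and length $name;
    return { column => $name, weight => $col->{weight} };
}

package main;

my $indexer = Sketch::CUSTOM::Indexer->load(table => 'Links');
my $pre     = $indexer->pre_add_column('Title', { weight => 3 });

# post_add_column is not defined in the driver, so the base class method runs.
my $post = $indexer->post_add_column('Title', { weight => 3 }, $pre);
```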
Whenever an object is created it will receive one property, $self->{table}, which is the table that is being worked upon. This property is available in all the method calls and is required for methods such as _create_table and _drop_search_driver.
When a table is first created or when a table is destroyed the following two functions are called. They are not passed any special values; however, these are all class methods and $self->{table} will be a reference to the current table in use.
This set of overrides is used by GT::SQL::Creator when the ::create method is called. They are called just prior to and then after the create table SQL query has been executed.
This next set of functions take place in GT::SQL::Editor.
pre_add_column accepts $name (of column), $col (hashref of column attributes). The method will only be called if the column has a weight associated with it. The function must return a non-zero value if successful. Note that the returned value will be passed into the post_add_column so temporary values can be passed through if required.
post_add_column accepts $name (of column), $col (hashref of column attributes), $results (of pre_add_column). This method is called just after the column has been inserted into the database.
pre_delete_column accepts $name (of column), $col (hashref of column attributes). The method will only be called if the column has a weight associated with it. The function must return a non-zero value if successful. Note that the returned value will be passed into the post_delete_column so temporary values can be passed through if required.
post_delete_column accepts $name (of column), $col (hashref of column attributes), $results (of pre_delete_column). This method is called just after the column has been dropped from the database.
pre_drop_table receives no arguments. It can find a copy of the current table and columns associated in $self->{table}.
post_drop_table receives one argument, which is the result of the pre_drop_table.
The following set of functions take place in GT::SQL::Table.
pre_add_record will receive one argument, $rec, a hashref, which is the record that will be inserted into the database. Table information can be found by accessing $self->{table}. Much like the other functions, on success the result will be cached and fed into the post_add_record function.
post_add_record receives $rec, a hashref describing the new record, the $sth of the insert query, and the result of the pre_add_record method. If there is an auto-increment field, the result from $sth->insert_id will be the new unique primary key.
pre_update_record receives two parameters, $set_cond and $where_cond. $set_cond is a hashref containing the new values that must be set, and $where_cond is a GT::SQL::Condition object selecting the records to update. The result is once again cached; if it is undef, it is considered an error.
post_update_record takes the same parameters as pre_update_record, plus one extra parameter, the result of pre_update_record.
pre_delete_record has only one parameter, $where, a GT::SQL::Condition object telling which records to delete. The results of this method are passed to post_delete_record.
post_delete_record takes the same parameter as pre_delete_record plus one additional parameter which, as with most post_ methods, is the result of the pre_delete_record method.
Neither pre_delete_all_records nor post_delete_all_records is passed any special data, except that post_delete_all_records receives the results of the pre_delete_all_records method.
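To make the pre_/post_ pairing concrete, here is a minimal sketch of a driver that keeps a word index in memory. Everything in the Sketch:: namespace is hypothetical, including the stand-in statement handle and the Title column; the main section only imitates the order in which the hooks are described as being called.

```perl
use strict;
use warnings;

package Sketch::Sth;                 # stand-in for the insert statement handle
sub new       { my ($class, $id) = @_; return bless { id => $id }, $class }
sub insert_id { return $_[0]{id} }

package Sketch::MEMORY::Indexer;     # hypothetical driver with an in-memory index
sub new {
    my ($class, %args) = @_;
    return bless { table => $args{table}, index => {} }, $class;
}

# Before the insert: split the text into lowercase words and hand them on.
sub pre_add_record {
    my ($self, $rec) = @_;
    my $text  = defined $rec->{Title} ? $rec->{Title} : '';
    my @words = map { lc($_) } grep { length } split /\W+/, $text;
    return \@words;                  # an array reference is true even when empty
}

# After the insert: file each word under the new primary key.
sub post_add_record {
    my ($self, $rec, $sth, $words) = @_;
    my $id = $sth->insert_id;
    $self->{index}{$_}{$id}++ for @$words;
    return 1;
}

package main;

my $indexer = Sketch::MEMORY::Indexer->new(table => 'Links');
my $rec     = { Title => 'Perl search drivers' };

my $pre = $indexer->pre_add_record($rec) or die "pre_add_record failed";
my $sth = Sketch::Sth->new(42);      # pretend the INSERT produced ID 42
$indexer->post_add_record($rec, $sth, $pre) or die "post_add_record failed";
```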
The Searcher is responsible for only one thing: returning results from a query search. You can override the parser; however, if you override only the following methods you keep full parsing for things such as +/-, string parsing and substring matching.
The structures passed into the methods get a little complicated so beware!
ALL the following functions receive two parameters: the first is a set of search parameters detailing the words/phrases to search for, and the second is the current result set of IDs => scores.
There are two types of search parameters, one for words and the other for phrases. The structure is a little messy so I'll detail them here.
For words, the structure is like the following:
    $word_search = {
        'word' => {
            substring => '1',   # set to 1 if this is substring match
            phrase    => 0,     # not a phrase
            keyword   => 1,     # is a keyword
            mode      => '',    # can also be must, cannot to mean +/-
        },
        'word2' => ...
    }
For phrases the structure will become:
    $phrase_search = {
        'phrase' => {
            substring => undef,     # never required
            phrase    => [ 'word1', 'word2', 'word3', ... ],    # for searching by indiv word if required
            keyword   => 0,         # not a keyword
            mode      => ''         # can also be must, cannot
        },
        'phrase2' => ...
    }
Based on these structures, hopefully it will be easy enough to build whatever is required to grab the appropriate records.
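As one illustration of walking these structures, the sketch below groups terms by their mode value. The group_by_mode helper and the sample terms are made up for this example; only the keys (substring, phrase, keyword, mode) and the mode values come from the structures described above.

```perl
use strict;
use warnings;

# Group search terms by mode: 'must' (+), 'cannot' (-), or '' (plain term).
sub group_by_mode {
    my ($search) = @_;
    my %groups = ( must => [], cannot => [], optional => [] );
    for my $term (sort keys %$search) {
        my $mode = $search->{$term}{mode} || 'optional';
        push @{ $groups{$mode} }, $term;
    }
    return \%groups;
}

my $word_search = {
    perl   => { substring => 0, phrase => 0, keyword => 1, mode => 'must'   },
    python => { substring => 0, phrase => 0, keyword => 1, mode => 'cannot' },
    driv   => { substring => 1, phrase => 0, keyword => 1, mode => ''       },
};

my $groups = group_by_mode($word_search);
```

A driver could then build one condition per group, for example requiring every term in must and rejecting any record matching a term in cannot.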
Finally, the second item passed in will be a hash filled with ID => score values of search results. They look something like this:
    $results = {
        1 => 56,
        2 => 31,
        4 => 6
    }
It is important for all the methods to take the results and return the results, as the result set will be daisychained down like a set to be operated on by various searching schemes.
At the end of the query, the results in this set will be sorted and returned to the user as an sth.
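The daisy-chaining can be pictured as plain set operations on the ID => score hash. The helper names below are hypothetical, and summing scores when an ID matches more than once is an assumption made for this sketch rather than the documented scoring rule.

```perl
use strict;
use warnings;

# Add new matches to the running set; scores of IDs already present are summed.
sub union_results {
    my ($results, $matches) = @_;
    my %out = %$results;
    $out{$_} += $matches->{$_} for keys %$matches;
    return \%out;
}

# Keep only the IDs found in both sets.
sub intersect_results {
    my ($results, $matches) = @_;
    my %out;
    for my $id (keys %$results) {
        next unless exists $matches->{$id};
        $out{$id} = $results->{$id} + $matches->{$id};
    }
    return \%out;
}

# Drop every ID that matched an excluded term.
sub exclude_results {
    my ($results, $matches) = @_;
    my %out = %$results;
    delete @out{ keys %$matches };
    return \%out;
}

# Each step takes the current set and returns the next one.
my $results = { 1 => 56, 2 => 31, 4 => 6 };
$results = union_results($results,     { 2 => 10, 7 => 5 });          # 1=>56, 2=>41, 4=>6, 7=>5
$results = intersect_results($results, { 1 => 4, 2 => 1, 7 => 1 });   # 1=>60, 2=>42, 7=>6
$results = exclude_results($results,   { 7 => 1 });                   # 1=>60, 2=>42
```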
Operations on this set are performed by the following five methods.
Two parameters are passed in, ( $input, $buckets ). $input is a hash that contains all the form/cgi parameters passed to the $tbl->query function and $buckets is the structure that is created after the query string is parsed. You may also call $self->SUPER::_query( $input, $buckets ) to pass the request along normally.
You must return undef or an STH from this function.
This method must also implement substring searching.
This method must also implement substring searching.
This method must also implement substring searching.
This method accepts a $CGI or a $HASH object and performs the following
Options:
    - paging
        mh : max hits
        nh : number hit (or page of hits)
        sb : column to sort by (default is by score)
    - searching
        ww        : whole word
        ma        : 1 => OR match, 0 => AND match, undefined => QUERY
        substring : search for substrings of words
        bool      : 'and' => and search, 'or' => or search, '' => regular query
        query     : the string of things to ask for
    - filtering
        field_name    : value       # Find all rows with field_name = value
        field_name    : ">value"    # Find all rows with field_name > value.
        field_name    : "<value"    # Find all rows with field_name < value.
        field_name-gt : value       # Find all rows with field_name > value.
        field_name-lt : value       # Find all rows with field_name < value.
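A custom query method would need to pick the filtering options out of the input. The following is a minimal sketch of one way to do that; parse_filters and the column names are hypothetical, and the list of valid columns is passed in by hand here rather than read from $self->{table}.

```perl
use strict;
use warnings;

# Turn filter-style input into [ column, operator, value ] triples.
# $columns is a hashref of valid column names (hypothetical input for this sketch).
sub parse_filters {
    my ($input, $columns) = @_;
    my @filters;
    for my $key (sort keys %$input) {
        my $value = $input->{$key};
        if ($key =~ /^(.+)-(gt|lt)$/ and $columns->{$1}) {
            # field_name-gt / field_name-lt form
            push @filters, [ $1, ($2 eq 'gt' ? '>' : '<'), $value ];
        }
        elsif ($columns->{$key}) {
            if ($value =~ /^([<>])(.*)$/s) {
                # ">value" / "<value" form
                push @filters, [ $key, $1, $2 ];
            }
            else {
                push @filters, [ $key, '=', $value ];
            }
        }
        # Anything else (query, mh, nh, ...) is not a filter and is skipped.
    }
    return \@filters;
}

my $filters = parse_filters(
    { query => 'perl', mh => 10, Hits => '>100', 'Rating-lt' => 5, Category => 'Books' },
    { Hits => 1, Rating => 1, Category => 1 },
);
```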
The function must return an STH object. You may find the GT::SQL::Search::STH object useful, as it automatically handles mh, nh, and alternative sorting requests. All you have to do is:
    sub query {
        ... your code ...
        return $self->sth( $results );
    }
Here $results is a hashref containing primarykeyvalue => scorevalue pairs.
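For a feel of what handling mh and nh over such a hashref involves, here is a small sketch that sorts by score and slices out one page. It is not the GT::SQL::Search::STH implementation; page_of_hits is a hypothetical helper, and both the 1-based reading of nh and the default of 25 hits are assumptions.

```perl
use strict;
use warnings;

# Return the IDs for page $nh (assumed 1-based) with $mh hits per page,
# highest score first; ties are broken by ID so the order is stable.
sub page_of_hits {
    my ($results, $mh, $nh) = @_;
    $mh ||= 25;   # assumed default, not taken from GT::SQL
    $nh ||= 1;

    my @ids = sort { $results->{$b} <=> $results->{$a} or $a <=> $b } keys %$results;

    my $start = ($nh - 1) * $mh;
    return [] if $start > $#ids;
    my $end = $start + $mh - 1;
    $end = $#ids if $end > $#ids;
    return [ @ids[ $start .. $end ] ];
}

my $results = { 1 => 56, 2 => 31, 4 => 6 };
my $page1 = page_of_hits($results, 2, 1);   # IDs 1 and 2
my $page2 = page_of_hits($results, 2, 2);   # ID 4
my $page3 = page_of_hits($results, 2, 3);   # past the end: empty list
```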
This method accepts two parameters, ( $drivername, $input ), where $drivername is the name of the driver you'd like to use and $input is the set of parameters passed to the method. It returns an $sth value (undef if an error has occurred). This method was used in the INTERNAL driver to shunt to NONINDEXED if it found the search would take too long.
Copyright (c) 2004 Gossamer Threads Inc. All Rights Reserved. http://www.gossamer-threads.com/
Revision: $Id: Search.pm,v 1.60 2004/08/28 03:53:43 jagerman Exp $