GT::URI::HTTPS - HTTPS request broker. Can be used stand-alone or through GT::URI
use GT::URI::HTTPS;
print GT::URI::HTTPS->get( "http://www.gossamer-threads.com" );
GT::URI::HTTPS makes requests to HTTP servers and retrieves resources from them (not limited to text).
Socket Handling
    sub pending()        Returns true if data is waiting
    sub EOF()            Returns open/closed status of socket
    sub gulp_read()      Alias to do_iteration
    sub do_iteration()   Basic looping function that downloads resources in the background
Acquisition
    sub fetch()              Tell the object which URL to acquire
    sub method()             The method of acquisition
    sub load_parameter()     Add an item to the CGI parameters
    sub delete_parameter()   Delete a CGI parameter
    sub resource_attrib()    Headers related to resource and server
    sub get()                Simple resource acquisition function
Support Methods (must be imported)
    sub parse_url()          Decomposes a URL into constituent parts
    sub deparse_url()        Takes those parts and builds a URL
    sub build_path()         Takes a list of directories and builds a path
    sub build_parameters()   Takes a hash of parameter->values and builds a CGI request string
Just want a single item? Call GT::URI::HTTPS->get and all the magic will be done for you.
    use GT::URI::HTTPS;
    my $buf = GT::URI::HTTPS->get( "http://www.gossamer-threads.com/" );

GET-based requests are permissible as well:

    use GT::URI::HTTPS;
    my $buf = GT::URI::HTTPS->get( "http://search.yahoo.com/bin/search?p=gossamer+threads" );
If extra options need to be set, simply append them to the parameter list, as follows.

    use GT::URI::HTTPS;
    my $buf = GT::URI::HTTPS->get(
        "http://search.yahoo.com/bin/search?p=gossamer+threads",
        { request_method => 'POST' }
    );
If a new GT::URI::HTTPS object is instantiated, much more control is available, including facilities for non-blocking downloading of pages.
To create a GT::URI::HTTPS object, call new with all the options required:
    use GT::URI::HTTPS;

    my $http = new GT::URI::HTTPS(

        # URL to acquire (optional)
        'URL' => '',

        # Can also be set to POST/GET/HEAD (optional)
        'request_method' => 'GET',

        # a hash of keys pointing to an arrayref of values to be sent to the server
        # {
        #     'key' => [ 'value1', 'value2'... ],
        # }
        # (optional)
        'parameters' => {},

        # Name portion of the User-Agent: string the server acquires (optional)
        'agent_name' => 'Mozilla/4.73 [en]',

        # Host-from portion of the User-Agent: string the server acquires (optional)
        'agent_host' => 'X11; I; Linux 2.2.15-4mdk i586',

        # To prevent downloading of 80Tb files, but if you still wanted to, set this to 0 (optional)
        'max_down' => 200000
    );
If URL has been specified in the options and the request is to a CGI, extra parameters can be set with $http->load_parameter(). Finally, loop on $http->do_iteration() until the returned value is defined. To replicate the "simple get" example:
    use GT::URI::HTTPS;
    $|++;

    my $http = new GT::URI::HTTPS(
        URL => 'http://search.yahoo.com/bin/search',

        # can also use the following:
        parameters => {
            'p' => [ 'gossamer threads' ]
        }
    );

    my $doc;
    while ( not defined( $doc = $http->do_iteration() ) ) {
        # do something here while waiting for the resource to arrive
        print ".";
    }

    print $doc, "\n\n";
Beyond the resource, the http server often supplies extra information in a header. To access this information, use $http->resource_attrib().
Appending this code to the previous example, a list of all the associated server headers can be seen:
    my $attribs = $http->resource_attrib();
    foreach my $key ( sort keys %{$attribs} ) {
        print "$key => $attribs->{$key}\n";
    }
In addition to the basic fetching abilities, since the module must parse HTTPS URLs, the methods used to do so have been made public.
These methods decompose URLs into data structures so that a URL can be easily studied or modified and then reconstructed.
However, these routines have not been polished for usability, so beware! The following is a very basic example:
    use GT::URI::HTTPS qw/ parse_url deparse_url build_path build_parameters /;

    # fragment the URL
    my ( $host, $port, $dirs, $file, $params ) = parse_url(
        'http://www.gossamer-threads.com/perl/forum/showflat.pl?Cat=&Board=GosDisc&Number=113355&page=0&view='
    );

    print "Parsed Data:\n\n";
    print "Host: $host\n";
    print "Port: $port\n";
    print "Dirs:\n";
    foreach my $dir ( @{$dirs} ) {
        print "  $dir/\n";
    }
    print "Resource Filename: $file\n";
    print "Params:\n";
    foreach my $key ( sort keys %{$params} ) {
        print "  $key: ";

        # fall back to an empty arrayref, since the values are dereferenced as an array
        my $values = ( $params->{$key} || [] );
        foreach my $value ( sort @{$values} ) {
            print "'", quotemeta($value), "' ";
        }
        print "\n";
    }

    # put the data back together again
    my $url = deparse_url( $host, $port, $dirs, $file, $params );
    print "\nDeparsed Data:\n\n";
    print "URL: http://$url\n";
build_path() takes an array ref of directory names and an optional filename and returns a file path.
    use GT::URI::HTTPS qw/ build_path /;
    print build_path( [ 'topdir', 'middir', 'bottomdir' ], 'file.html' );
This function must be imported.
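To make the shape of the arguments concrete, here is a minimal stand-alone sketch of the same idea. The name sketch_build_path is hypothetical, and the leading slash and the lack of any escaping are assumptions; the module's own build_path may format its result differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; this is not the module's code.
# Assumes the path starts with '/' and that names need no escaping.
sub sketch_build_path {
    my ( $dirs, $file ) = @_;

    # prefix every directory name with a slash, then close the directory part
    my $path = join( '', map { "/$_" } @{ $dirs || [] } ) . '/';

    # the filename is optional
    $path .= $file if defined $file;
    return $path;
}

print sketch_build_path( [ 'topdir', 'middir', 'bottomdir' ], 'file.html' ), "\n";
```

With these assumptions the call prints /topdir/middir/bottomdir/file.html, and an empty directory list with no filename yields a bare "/".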
build_parameters() builds a CGI request string from a hash of keys and values. A key may carry more than one value: simply use an arrayref with multiple values.
    use GT::URI::HTTPS qw/ build_parameters /;
    my $params = {
        'simplekey'  => 'value',
        'onekey'     => [ 'one value' ],
        'anotherkey' => [ 'another value', 'and yet anotherone!' ],
    };
    print build_parameters($params);
This function must be imported.
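For readers who want to see what such a request string looks like, the following is a rough stand-alone sketch. The names sketch_escape and sketch_build_parameters are hypothetical, and the percent-encoding of spaces as %20, the sorted key order, and the byte-string input are all assumptions; the module's build_parameters may order or encode its output differently.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every byte outside the unreserved set.
# Assumes byte strings (no characters above 0xFF).
sub sketch_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf( '%%%02X', ord($1) )/ge;
    return $text;
}

# Hypothetical stand-in for illustration only; this is not the module's code.
sub sketch_build_parameters {
    my ($params) = @_;
    my @pairs;
    foreach my $key ( sort keys %{$params} ) {

        # accept either a plain scalar or an arrayref of values
        my $values = $params->{$key};
        $values = [$values] unless ref($values) eq 'ARRAY';

        # one key=value pair per value, so repeated keys are preserved
        push @pairs, sketch_escape($key) . '=' . sketch_escape($_) for @{$values};
    }
    return join '&', @pairs;
}

print sketch_build_parameters( {
    'simplekey'  => 'value',
    'onekey'     => [ 'one value' ],
    'anotherkey' => [ 'another value', 'and yet anotherone!' ],
} ), "\n";
```

A key with several values simply appears several times in the result, which is how CGI scripts conventionally receive multi-valued fields.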
delete_parameter() removes a key and its entire set of values from the CGI parameters that have been loaded into the object before a request.
deparse_url() builds an entire URL from its basic parts.
For an example of this function, see the example in "Support Methods".
This function must be imported.
do_iteration() is the basic iteration function. It returns undef until the resource has been received, at which point it returns the resource data.
The function can return an empty string, so it is important to check definedness rather than truth. If the return value is an empty string, check ERROR_CODE in resource_attrib to find out whether the script simply could not connect to the host or the resource is empty.
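The difference between "not yet received" and "received but empty" can be illustrated without a network connection. In the sketch below, mock_do_iteration is a hypothetical stand-in for $http->do_iteration() that returns undef twice and then an empty string; it only demonstrates the polling pattern and is not the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $http->do_iteration():
# two "still waiting" replies, then an empty (but defined) resource.
my @replies = ( undef, undef, '' );
sub mock_do_iteration { return shift @replies }

my $doc;
my $polls = 0;

# Loop on definedness: a plain truth test would spin forever on ''.
while ( not defined( $doc = mock_do_iteration() ) ) {
    $polls++;
}

# An empty string means the request finished with nothing to show,
# which is the point at which ERROR_CODE should be consulted.
my $status = length($doc) ? 'resource received' : 'empty: check ERROR_CODE';
print "polls=$polls status=$status\n";
```

Against a real object, the empty branch is where resource_attrib('ERROR_CODE') would be checked.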
EOF() returns '1' or '0' depending on whether the object has stopped receiving data from or sending data to the remote server.
fetch() tells the object which URL to retrieve. If CGI parameters are required, pass in a hash of keys and values.
get() is the simplest resource acquisition method. Give it the URL and any options, and the function returns once the resource has been downloaded.
gulp_read() is just an alias to the function do_iteration. This method is used by GT::URI in its mass resource acquisition runs.
Unless you feel like being different, you shouldn't need to use this.
load_parameter() takes a list of keys and values and loads the values into the list of CGI parameters to be sent to the remote server.
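Since the 'parameters' option is a hash of keys pointing to arrayrefs of values, loading a list of keys and values plausibly amounts to appending each value to its key's list. The sketch below shows that idea on a plain hash; sketch_load_parameter is a hypothetical name, and the assumption that values accumulate rather than overwrite is not confirmed by the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; this is not the module's code.
# Assumes repeated keys accumulate values instead of replacing them.
sub sketch_load_parameter {
    my ( $store, @pairs ) = @_;
    while (@pairs) {
        my ( $key, $value ) = splice @pairs, 0, 2;

        # accept a single value or an arrayref of values for each key
        push @{ $store->{$key} }, ref($value) eq 'ARRAY' ? @{$value} : $value;
    }
    return $store;
}

my $parameters = {};
sketch_load_parameter( $parameters, 'p' => 'gossamer', 'p' => 'threads', 'n' => [ 10, 20 ] );

foreach my $key ( sort keys %{$parameters} ) {
    print "$key: @{ $parameters->{$key} }\n";
}
```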
method() sets the acquisition method for the resource. Currently, GET, POST and HEAD are supported.
If no parameters are supplied the function simply returns the current acquisition method.
parse_url() takes a URL and decomposes it into easily manipulated data structures. The output can be fed back into deparse_url to reconstruct a URL.
This function must be imported.
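As a rough picture of the five values involved (host, port, directory list, filename, parameter hash), here is a stand-alone sketch that splits a URL with a regular expression. The name sketch_parse_url is hypothetical; the default ports, the handling of a trailing slash, and the absence of percent-decoding are assumptions, so the module's parse_url may return different details for edge cases.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; this is not the module's code.
sub sketch_parse_url {
    my ($url) = @_;
    my ( $scheme, $host, $port, $path, $query ) =
        $url =~ m{^(https?)://([^/:?]+)(?::(\d+))?([^?]*)(?:\?(.*))?\z}
        or return;

    # assumption: fall back to the scheme's well-known port
    $port = ( $scheme eq 'https' ? 443 : 80 ) unless defined $port;

    # directories are the path segments; the last one is the filename
    # unless the path ends in a slash
    my @dirs = grep { length $_ } split m{/}, $path;
    my $file = ( $path =~ m{/\z} || !@dirs ) ? '' : pop @dirs;

    # each key maps to an arrayref so repeated keys keep all their values
    my %params;
    foreach my $pair ( split /&/, defined($query) ? $query : '' ) {
        next unless length $pair;
        my ( $key, $value ) = split /=/, $pair, 2;
        push @{ $params{$key} }, defined($value) ? $value : '';
    }

    return ( $host, $port, \@dirs, $file, \%params );
}

my ( $host, $port, $dirs, $file, $params ) = sketch_parse_url(
    'http://www.gossamer-threads.com/perl/forum/showflat.pl?Cat=&Board=GosDisc&Number=113355&page=0&view='
);
print "Host: $host  Port: $port  Dirs: @{$dirs}  File: $file\n";
print "Board: $params->{Board}[0]\n";
```

Keeping every parameter value in an arrayref mirrors the 'parameters' structure used elsewhere in this document, so empty values such as Cat= survive as empty strings.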
pending() returns '1' if there is data available to be downloaded, otherwise '0'. This is another function used by GT::URI in its mass downloads and is unlikely to be of use to anyone using this module directly. It exists because it is lighter than do_iteration, which could become quite a load if, say, 100 queued downloads were all being polled every tenth of a second.
resource_attrib(): if a key is requested, the function returns the value associated with that resource attribute. If not, the function returns a hashref mapping each server parameter to its corresponding value.
All the server keys are converted to lower case. This prevents conflict with two very important keys, ERROR_CODE and ERROR_MESSAGE, which carry the HTTPS error code and message associated with the acquisition of this page.
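The reason lower-casing avoids a clash can be shown with an ordinary hash. In this sketch the header lines and the status values are made up for illustration, and the parsing is a simplification rather than the module's own code: even a server header literally named ERROR_CODE ends up under the key error_code and cannot overwrite the upper-case entry.

```perl
use strict;
use warnings;

# Hypothetical values standing in for what the object would record itself.
my %attribs = ( ERROR_CODE => 200, ERROR_MESSAGE => 'OK' );

# Made-up server headers, including one that tries to collide.
my @raw_headers = (
    'Content-Type: text/html',
    'ERROR_CODE: spoofed',
    'Server: Example/1.0',
);

foreach my $line (@raw_headers) {
    my ( $name, $value ) = $line =~ /^([^:]+):\s*(.*)$/ or next;

    # server-supplied names are stored in lower case only
    $attribs{ lc $name } = $value;
}

foreach my $key ( sort keys %attribs ) {
    print "$key => $attribs{$key}\n";
}
```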
    #!/usr/bin/perl

    use strict;
    use GT::URI::HTTPS;

    if ( not @ARGV ) {
        print qq!
    SYNOPSIS

        $0 url [-f/-h] [ cgi_parameter1=value1 cgi_parameter2=value2 ... ]

        basic HTTPS requestor

    OPTIONS

        -f : full information; headers and resource. Usually only
             a dump of the resource is provided.
        -h : just the headers\n\n!;
        exit;
    }

    # parse out the command line
    # first argument, URL
    my $url = shift @ARGV;

    # next arguments, parameters
    my ( $mode, $parameters ) = ( '', {} );
    foreach my $item ( @ARGV ) {

        # ... check for special requests (-f or -h)
        if ( $item =~ /^-([fh])$/ ) {
            $mode = $1;
            next;
        }

        # ... is not a special request, but probably a parameter
        my ( $key, $value ) = ( $item =~ /([^=]+)=(.*)/ );
        $key ||= $item;
        push @{$parameters->{$key}}, $value;
    }

    # setup and send the request
    my $http = new GT::URI::HTTPS(

        # if we're only looking to use the head
        request_method => ( $mode eq 'h' ? 'HEAD' : 'GET' )
    );
    $http->fetch( $url, $parameters );

    # get the resource
    my $doc;
    while ( not defined ( $doc = $http->do_iteration(-1) ) ) {}

    # and print out the headers if wanted
    if ( $mode ) {
        my $headers = $http->resource_attrib();
        foreach my $key ( sort keys %{$headers || {}} ) {
            print "$key: $headers->{$key}\n";
        }
        print "\n";
    }

    # and output the resource...
    print $doc;
Copyright (c) 2000 Gossamer Threads Inc. All Rights Reserved. http://www.gossamer-threads.com/
Revision: $Id: HTTPS.pm,v 1.10 2004/08/23 20:07:44 jagerman Exp $