NAME

GT::WWW - Multi-protocol retrieving and posting, related in functionality to LWP.


DESCRIPTION

GT::WWW is designed to provide a common interface for multiple protocols (as of this writing, only HTTP and HTTPS, although others are planned) and handles HEAD, GET, and POST requests. For non-HTTP-based protocols, exactly what a ``HEAD'', ``GET'', or ``POST'' request means depends on the module in question. For example, with FTP ``GET'' might download a file, while ``POST'' might upload one to the server, and ``HEAD'' might return just the size of the file.

The modules under GT::WWW should not be used directly; this module should be used instead. The documentation here describes the options common to all protocols - however you should check the POD of the protocol subclasses (GT::WWW::http, GT::WWW::https, etc.) to see any extra options or methods that those modules provide.


SYNOPSIS

Quick way:

    use GT::WWW;
    my $www = GT::WWW->get("http://server.com/page");
    ...     = GT::WWW->post("http://server.com/page");
    ...     = GT::WWW->head("http://server.com/page");
    ...     = GT::WWW->...("http://user:pass@server.com/page");
    ...     = GT::WWW->...("https://server.com/page");
    # This query string will be parsed and passed as POST input:
    ...     = GT::WWW->post("http://server.com/page?foo=bar;bar=foo");

Longer, but more capable way:

    use GT::WWW;
    my $request = GT::WWW->new();
    $request->protocol("http");
    $request->host("server.com");
    $request->port(8080);
    $request->path("/path/foo.cgi");
    $request->username("user");
    $request->password("pass");
    $request->parameters(foo => "bar", bar => "foo");

Equivalent to the above, using ->url():

    $request->url("http://user:pass@server.com:8080/path/foo.cgi?foo=bar;bar=foo");

Now call $request->get(), $request->post(), or $request->head().

Very quick way to print a page:

    perl -MGT::WWW=get -e 'print get("http://server.com/page?foo=bar&bar=foo")'


METHODS

Note that all methods that set values (such as host(), port(), etc.) also return the value when called without any argument.

new

Call new() to get a new GT::WWW object. You can call it without arguments to get a generic GT::WWW object, or use arguments as described below.

URL
You can call new with a single scalar argument - a URL to be parsed. The URL is of the same format as taken by the url() method.

HASH
You can alternatively call new() with a hash (or hash reference) of options. Each of the methods described below can be passed in to new in the form of key => value pairs - the methods will be called with the values specified automatically.

head

get

post

These are the methods used to tell the module to actually connect to the server and download the requested page.

When used as GT::WWW class methods or function calls (but NOT as methods on GT::WWW objects or sub-objects), they take a single URL as an argument. This call creates an internal GT::WWW object, turns on fatal_errors(1), passes the URL to url(), then calls the appropriate get(), head(), or post() method of the resulting protocol-specific object.

Note, however, that once you have specified a protocol (either via protocol(), or as part of a url passed to url()) your object ceases to be a GT::WWW object and becomes a protocol-specific GT::WWW subclass. All subclasses provide their own versions of these methods.

The subclassed methods are not described here because they may not be supported for all protocols, and their return value(s) may differ from one protocol to the next. For more details, see the modules listed in the SEE ALSO section.

Generally, get() and post() return an overloaded object that can be used as a string to get the content (i.e. for printing), but see the notes in the CAVEATS section of the GT::WWW::http::Response manpage for anything more complicated than concatenation or printing.

url

Takes a URL as argument. The URL is parsed into several fields: protocol, username, password, host, port, path, and query_string, then each of those properties is set for the current object. Also note that calling url() on an existing object resets the host, port, username, password, and all parameters.

Internally, this method calls parse_url().

parse_url

Takes a URI, and returns the following 7 element list:

    #    0          1          2        3      4      5          6
    ($protocol, $username, $password, $host, $port, $path, $query_string) =
        GT::WWW->parse_url($url);

URLs require, at a minimum, protocol and host, in URI form:

    PROTOCOL://HOST

The URL can extend up to:

    PROTOCOL://USERNAME:PASSWORD@HOST:PORT/PATH?QUERY_STRING

Only protocols known to GT::WWW are acceptable. To check if a URL is valid, check $protocol.
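
As a rough illustration of how a URL of the form above breaks down into the seven fields, the following sketch uses a plain regular expression. It is not GT::WWW's actual parser: the helper name sketch_parse_url is hypothetical, and it performs no protocol validation, unescaping, or strict-mode checking.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only; not part of GT::WWW.
# Splits PROTOCOL://USERNAME:PASSWORD@HOST:PORT/PATH?QUERY_STRING into the
# same seven-element shape that parse_url() is documented to return.
sub sketch_parse_url {
    my ($url) = @_;
    my ($protocol, $userinfo, $host, $port, $path, $query) = $url =~ m{
        ^ ([A-Za-z][A-Za-z0-9+.-]*) ://    # protocol
        (?: ([^/\@]*) \@ )?                # optional username and password
        ([^/:?\#]+)                        # host
        (?: : (\d+) )?                     # optional port
        ( / [^?\#]* )?                     # optional path
        (?: \? ([^\#]*) )?                 # optional query string
    }x or return;

    my ($username, $password);
    ($username, $password) = split /:/, $userinfo, 2 if defined $userinfo;
    return ($protocol, $username, $password, $host, $port, $path, $query);
}

my @parts = sketch_parse_url(
    "http://user:pass@server.com:8080/path/foo.cgi?foo=bar;bar=foo");
print join(", ", map { defined $_ ? $_ : "(undef)" } @parts), "\n";
```

Fields that are absent from the URL come back as undef in this sketch; whether parse_url() itself returns undef or a default for a missing field is not stated here, so check the returned values before relying on them.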

This method can be called as a class or object method, but not as a function. If called as an object method, the strict option as currently set for the object will be used; as a class method, an optional second parameter can be passed in - if true, strict query string parsing mode will be enabled.

protocol

Takes a protocol, such as 'http', 'https', 'ftp', etc. Note that when you call protocol, your object ceases being a GT::WWW object, by becoming a GT::WWW subclass (such as GT::WWW::http, GT::WWW::https, etc.). Before trying an unknown protocol, you should generally call the protocol_supported method - calling protocol(...) with an unsupported protocol will result in a fatal error.

protocol_supported

This method takes a protocol, such as 'http', 'https', 'ftp', etc. In order to make sure the protocol is supported, this checks to see that it is an internally supported protocol, and also tries to load the module to make sure that the module can be loaded.
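
One way to avoid the fatal error from protocol() is to probe candidates with protocol_supported() first. The sketch below assumes protocol_supported() can be called on a freshly created GT::WWW object and returns a true or false value; the helper first_supported_protocol is hypothetical, and the GT::WWW calls only run where the module is installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GT::WWW: returns the first candidate
# protocol that the object reports as supported, or nothing if none is.
sub first_supported_protocol {
    my ($www, @candidates) = @_;
    for my $protocol (@candidates) {
        return $protocol if $www->protocol_supported($protocol);
    }
    return;
}

# Skipped silently where GT::WWW is not available.
if (eval { require GT::WWW; 1 }) {
    my $request  = GT::WWW->new();
    my $protocol = first_supported_protocol($request, "https", "http");
    if (defined $protocol) {
        $request->protocol($protocol);   # safe: already known to be supported
    }
    else {
        warn "No usable protocol found\n";
    }
}
```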

valid_host

Returns true in scalar context if the host appears valid, or the host and port in list context if the host is valid. Note that no check is performed to see whether or not the host resolves or is reachable - this simply verifies that the host is at least valid enough to warrant a lookup.

host

Sets the host, and optionally the port (assuming the argument is of the form: 'hostname:port'). Raises a fatal error if the host is not valid. Note that setting the host will reset the port to the protocol's default value, so this method must be called before port().

port

Sets the port for the connection. This can be a name, such as ``smtp'', or a numeric value. Note that the port value will be reset when the host() method is called, so setting a port must happen after setting the host.

reset_port

Resets the port so that the next request will use the default port.

username

Sets or retrieves the login username.

reset_username

Removes the login username.

password

Sets the login password.

reset_password

Removes the login password.

connection_timeout

Specifies a timeout for connections, in seconds. By default, a value of 10 is used. If you specify a false value here, the connection timeout will be system dependent; typically this is from one to several minutes. Note, however, that the timeout is not supported on Windows systems and so should not be depended on in code that runs on Windows systems.

path

Sets the path for the request. Any HTTP escapes (e.g. %20) are automatically converted to the actual value (e.g. `` ''). If required, the path will be automatically re-escaped before being sent to the server.

parameters

Takes a list (not a hash, since duplicate keys are permitted) of key => value pairs. Optionally takes an extra argument - if true, the parameters are added, not replaced - if omitted (or false), any existing parameters are deleted.

To specify a parameter without a value, such as b in this example query string:

    a=1&b&c=3

pass undef as b's value. Passing ``'' as the value will instead result in:

    a=1&b=&c=3

For example, setting the two query strings above would require the following two sets of arguments, respectively:

    $www->parameters(a => 1, b => undef, c => 3);
    $www->parameters(a => 1, b => "", c => 3);

To then add a ``d=4'' parameter to either one, you would call:

    $www->parameters(d => 4, 1);

Omitting the extra ``1'' would cause you to erase the previously set parameters.

Values specified should not be URL encoded.

If called without arguments, the list of key/value pairs is returned.
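
The difference between an undef value and an empty string can be sketched in plain Perl. The helpers below (sketch_query_string and escape) are hypothetical and are not GT::WWW's own code; they only illustrate why a flat list, rather than a hash, is the natural input, and how unencoded values could be escaped on the way out.

```perl
use strict;
use warnings;

# Percent-encodes everything outside a conservative set of safe characters.
# Illustrative only; assumes byte-oriented input.
sub escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Hypothetical helper, not part of GT::WWW: joins a flat list of
# key => value pairs into a query string. An undef value yields a bare key,
# an empty string yields "key=", and duplicate keys are kept in order.
sub sketch_query_string {
    my @pairs = @_;
    my @out;
    while (@pairs) {
        my ($key, $value) = splice @pairs, 0, 2;
        push @out, defined $value
            ? escape($key) . "=" . escape($value)
            : escape($key);
    }
    return join "&", @out;
}

print sketch_query_string(a => 1, b => undef, c => 3), "\n";   # a=1&b&c=3
print sketch_query_string(a => 1, b => "",    c => 3), "\n";   # a=1&b=&c=3
```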

reset_parameters

Resets the parameters. You want to make sure you do this between each request on the same object, unless using url(), which calls this for you.

query_string

This function serves the same purpose as parameters(), except that it takes a query string as input instead of a list. Like parameters(), the default behaviour is to replace any existing parameters unless a second, true argument is provided.

Note that if you already have your parameters in some sort of list, it is preferable to pass them to parameters() than to join them into a query string and pass them into this function, because this function just splits them back up into a list again.

You can also provide a query string (along with a host, path, and possibly other data) using the url() method.

If called without arguments, the current parameters will be joined into a valid query string and returned.

strict

This function is used to tell the GT::WWW object to allow/disallow standards-violating responses. This has a global effect of allowing query strings to contain _any_ characters except for ``\r'', ``\n'', and ``#'' - normally, characters such as /, ?, and various extended characters must be escaped into %XX format. The strict option may have other protocol-specific effects, which will be indicated in each protocol's documentation.

The option defaults to non-strict.

post_data

This function allows you to pass in raw data to be posted. The data will not be encoded. If you pass in a code reference, the data will be posted in chunks.

agent

Used to set or retrieve the User-Agent string that will be sent to the server. If the agent string you pass starts or ends with whitespace or a comma, the default agent will be added at the beginning or end of the User-Agent string, respectively. This value is only meaningful to protocols supporting something similar to the HTTP User-Agent.

default_agent

Returns the default user agent string. This will be automatically used if no agent has been set, or if an agent ending with whitespace is specified. This value is dependent on the protocol being used, but is typically something like ``GT::WWW::http/1.23''. This method is read-only.

chunk

chunk_size

chunk and chunk_size are used to perform a large download in chunks. The chunk() method takes a code reference that will be called when a chunk of data has been retrieved from the server, or a value of undef to clear any currently set chunk code. chunk_size() takes an integer containing the number of bytes that you wish to retrieve at a time from the server; the chunk code reference will be called with a scalar reference containing up to chunk_size bytes.

Note that when using chunked downloading, the data will not be available using the normal content retrieval interface.

Also note that, as of 1.024, the chunk code reference only applies to the next get() or post() request - after each get() or post() request, the chunk_code is cleared (in order to avoid self-references and possible memory leaks).
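
A minimal sketch of a chunk handler follows. The handler receives a scalar reference, as described above, and here it simply counts bytes. The URL, the chunk size of 8192, and the variable names are placeholders chosen for illustration, and the GT::WWW calls only run where the module is installed.

```perl
use strict;
use warnings;

my $bytes_seen = 0;

# Chunk handler: called with a scalar reference holding up to chunk_size
# bytes. Dereference it to reach the data; here we only track the total.
my $on_chunk = sub {
    my ($data_ref) = @_;
    $bytes_seen += length $$data_ref;
    print STDERR "received $bytes_seen bytes so far\n";
};

# Skipped silently where GT::WWW is not available.
if (eval { require GT::WWW; 1 }) {
    my $request = GT::WWW->new("http://server.com/page");
    $request->chunk_size(8192);    # placeholder size
    $request->chunk($on_chunk);    # applies to the next request only
    $request->get();
    # The handler must be set again with chunk() before any later request.
}
```

Because the chunk code is cleared after each get() or post(), a loop that downloads several URLs would need to call chunk() inside the loop body.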

cancel

cancelled

The cancel method can be used in conjunction with the chunk option to abort a download in progress. The chunk code will not be called again, and the server connection will be closed. This should be used sparingly and with care. cancelled simply returns a true/false value indicating whether the operation has been cancelled. This value is reset at the beginning of each operation.

Note that cancelling an operation is never performed automatically, and only happens - if ever - in the chunk code reference, so checking the cancellation status is rarely needed.

debug_level

This is used to set or retrieve the debug level:

    0 = no debugging
    1 = debugging related to current operation
    2 = adds operation details to debugging level 1
    3 = adds data debugging (very large!) to debugging level 2

When passed as part of a hash to new(), the key for this option can be specified as debug instead of debug_level.

error

This method will return a string containing an error that has occurred. Note that an error may be generated even for calls that _seem_ to have succeeded - for example, if a server unexpectedly closes the connection before properly finishing the transfer, a successful return will result since the transfer was partially successful, but an error message will still be set.

fatal_errors

This method will alter the current object's error handling behaviour such that any errors that occur will be raised as fatal errors. It is enabled automatically when using the quick (i.e. objectless) forms of get(), head(), and post(), which have no associated object on which ->error can be called.
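
The two error-handling styles can be compared side by side. This sketch assumes that error() returns undef or an empty string when nothing went wrong, which is not stated above; the helper describe_error and the URL are hypothetical, and the GT::WWW calls only run where the module is installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GT::WWW: formats whatever error() reports.
# Assumes error() returns undef or an empty string when there was no error.
sub describe_error {
    my ($www) = @_;
    my $error = $www->error;
    return unless defined $error && length $error;
    return "request problem: $error";
}

# Skipped silently where GT::WWW is not available.
if (eval { require GT::WWW; 1 }) {
    # Style 1: inspect error() after the call, even if a response came back.
    my $request  = GT::WWW->new("http://server.com/page");
    my $response = $request->get();
    if (defined(my $message = describe_error($request))) {
        warn "$message\n";
    }

    # Style 2: turn errors into fatal errors and trap them with eval.
    $request->fatal_errors(1);
    my $ok = eval { $request->get(); 1 };
    warn "request died: $@" unless $ok;
}
```

Checking error() even after an apparently successful call covers the partial-transfer case described under error.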

file

This method is used to create a parameter for uploading a file. It takes either one or two arguments:

2 argument form: First argument is a remote filename, second argument is either a local filename, or a GLOB reference to an open filehandle.

1 argument form: Argument is a filename to read.

Example usage:

    my $file = $www->file("foo.txt");
    $www->parameters(foobar => $file, 1);
    my $response = $www->post();

This will upload the file from disk named ``foo.txt'', using a form parameter named ``foobar''. This is similar to uploading a file named ``foo.txt'' via the following HTML element:

    <input type="file" name="foobar">

The two argument form with two filenames is used to lie to the server about the actual name of the file. Using a filehandle as the second argument is for use when a filename is not available - such as an opened socket, or a file that has been opened elsewhere in the code.

Examples:

    my $file = $www->file("foo.txt", "bar.txt");
    my $file2 = $www->file("foo2.txt", \*FH);
    $www->parameters(foobar => $file, foobar2 => $file2, 1);
    my $response = $www->post();

This will upload two files - a file named foo.txt (which is actually read from the bar.txt file) specified as form parameter foobar, and a second file, specified as parameter foobar2, whose content is read from the filehandle FH.


SEE ALSO

the GT::WWW::http manpage, the GT::WWW::https manpage


MAINTAINER

Jason Rhinelander


COPYRIGHT

Copyright (c) 2004 Gossamer Threads Inc. All Rights Reserved. http://www.gossamer-threads.com/


VERSION

Revision: $Id: WWW.pm,v 1.25 2005/04/08 19:25:31 jagerman Exp $