GT::WWW - Multi-protocol retrieving and posting, related in functionality to LWP.
GT::WWW is designed to provide a common interface for multiple protocols (as of this writing only HTTP and HTTPS, though others are planned) and handles HEAD, GET, and POST requests. For non-HTTP-based protocols, exactly what a ``HEAD'', ``GET'', or ``POST'' request means depends on the module in question. For example, with FTP ``GET'' might download a file, ``POST'' might upload one to the server, and ``HEAD'' might return just the size of the file.
The modules under GT::WWW should not be used directly; this module should be used instead. The documentation here describes the options common to all protocols - however you should check the POD of the protocol subclasses (GT::WWW::http, GT::WWW::https, etc.) to see any extra options or methods that those modules provide.
Quick way:
    use GT::WWW;
    my $www = GT::WWW->get("http://server.com/page");
    ... = GT::WWW->post("http://server.com/page");
    ... = GT::WWW->head("http://server.com/page");
    # Single quotes here, so that Perl does not try to interpolate "@server":
    ... = GT::WWW->...('http://user:pass@server.com/page');
    ... = GT::WWW->...("https://server.com/page");

    # This query string will be parsed and passed as POST input:
    ... = GT::WWW->post("http://server.com/page?foo=bar;bar=foo");
Longer, but more capable way:
    use GT::WWW;
    my $request = GT::WWW->new();

    $request->protocol("http");
    $request->host("server.com");
    $request->port(8080);
    $request->path("/path/foo.cgi");
    $request->username("user");
    $request->password("pass");
    $request->parameters(foo => "bar", bar => "foo");

    # Equivalent to the above, using ->url() (single-quoted so that
    # "@server" is not interpolated):
    $request->url('http://user:pass@server.com:8080/path/foo.cgi?foo=bar;bar=foo');
Now call $request->get(), $request->post(), or $request->head().
Very quick way to print a page:
perl -MGT::WWW=get -e 'print get("http://server.com/page?foo=bar&bar=foo")'
Note that all methods that set values (such as host(), port(), etc.) also return the value when called without any argument.
Call new() to get a new GT::WWW object. You can call it without arguments to get a generic GT::WWW object, or use arguments as described below.
url()
method.
new() with a hash (or hash reference) of options. Each of the methods described below can be passed in to new() in the form of key => value pairs - the methods will be called automatically with the values specified.
These are the methods used to tell the module to actually connect to the server and download the requested page.
When used as GT::WWW class methods or function calls (but NOT as methods on GT::WWW objects or sub-objects), they take a single URL as an argument. This call creates an internal GT::WWW object, turns on fatal_errors(1), passes the URL to url(), then calls the appropriate get(), head(), or post() method of the resulting protocol-specific object.
Note, however, that once you have specified a protocol (either via protocol(), or as part of a URL passed to url()) your object ceases to be a GT::WWW object and becomes a protocol-specific GT::WWW subclass. All subclasses provide their own versions of these methods. The subclassed methods are not described here because they may not be supported for all protocols, and their return value(s) may differ from one protocol to the next. For more details, see the modules listed in the SEE ALSO section.
Generally, get() and post() return an overloaded object that can be used as a string to get the content (i.e. for printing), but see the notes in the CAVEATS section of the GT::WWW::http::Response manpage for anything more complicated than concatenation or printing.
Takes a URL as argument. The URL is parsed into several fields: protocol, username, password, host, port, path, and query_string, then each of those properties is set for the current object. Also note that calling url() on an existing object resets the host, port, username, password, and all parameters.
Internally, this method calls parse_url().
Takes a URI, and returns the following 7 element list:
    #    0          1          2        3      4      5          6
    ($protocol, $username, $password, $host, $port, $path, $query_string) = GT::WWW->parse_url($url);
URLs require, at a minimum, a protocol and a host, in URI form:
PROTOCOL://HOST
The URL can extend up to:
PROTOCOL://USERNAME:PASSWORD@HOST:PORT/PATH?QUERY_STRING
Only protocols known to GT::WWW are acceptable. To check if a URL is valid, check $protocol.
This method can be called as a class or object method, but not as a function. If called as an object method, the strict option currently set for the object is used; if called as a class method, an optional second parameter can be passed in, which enables strict query string parsing mode when true.
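As a minimal sketch of the validity check described above, the following splits a URL and then tests $protocol. It assumes, based on the description here rather than on the module source, that $protocol comes back false when the URL cannot be parsed or uses an unknown protocol.

```perl
use strict;
use warnings;
use GT::WWW;

# Example URL taken from the synopsis. Single quotes keep Perl from
# interpolating "@server".
my $url = 'http://user:pass@server.com:8080/path/foo.cgi?foo=bar;bar=foo';

# Class-method call; the documented return value is a 7-element list.
my ($protocol, $username, $password, $host, $port, $path, $query_string)
    = GT::WWW->parse_url($url);

# Assumption: a false $protocol means the URL was not acceptable.
if ($protocol) {
    print "protocol: $protocol\n";
    print "host:     $host\n";
    print "path:     ", (defined $path ? $path : ''), "\n";
}
else {
    print "Not a URL that GT::WWW can handle: $url\n";
}
```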
Takes a protocol, such as 'http', 'https', 'ftp', etc. Note that when you call protocol, your object ceases being a GT::WWW object, by becoming a GT::WWW subclass (such as GT::WWW::http, GT::WWW::https, etc.). Before trying an unknown protocol, you should generally call the protocol_supported method - calling protocol(...) with an unsupported protocol will result in a fatal error.
This method takes a protocol, such as 'http', 'https', 'ftp', etc. In order to make sure the protocol is supported, this checks to see that it is an internally supported protocol, and also tries to load the module to make sure that the module can be loaded.
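The check-before-set pattern recommended above might look like the following sketch. It assumes that protocol_supported() returns a true value for a usable protocol and that it can be called on a generic GT::WWW object; neither detail is spelled out here, so treat both as assumptions.

```perl
use strict;
use warnings;
use GT::WWW;

my $www = GT::WWW->new();

# Hypothetical choice of protocol, e.g. taken from user input.
my $wanted = 'https';

# Assumption: protocol_supported() returns true when the protocol is
# known and its module can be loaded.
if ($www->protocol_supported($wanted)) {
    $www->protocol($wanted);    # $www is now a protocol-specific object
    $www->host('server.com');
    $www->path('/page');
}
else {
    warn "Protocol '$wanted' is not available on this installation\n";
}
```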
Returns true in scalar context if the host appears valid, or the host and port in list context if the host is valid. Note that no check is performed to see whether or not the host resolves or is reachable - this simply verifies that the host is at least valid enough to warrant a lookup.
Sets the host, and optionally the port (if the argument has the form 'hostname:port'). An invalid host results in a fatal error. Note that setting the host resets the port to the protocol's default value, so this method must be called before port().
Sets the port for the connection. This can be a name, such as ``smtp'', or a numeric value. Note that the port value will be reset when the host() method is called, so setting a port must happen after setting the host.
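The ordering constraint between host() and port() can be sketched as follows. The host name and port number are placeholders.

```perl
use strict;
use warnings;
use GT::WWW;

my $www = GT::WWW->new();
$www->protocol('http');

# host() resets the port to the protocol default ...
$www->host('server.com');
# ... so a non-default port has to be set afterwards.
$www->port(8080);

# Alternatively, pass host and port together in 'hostname:port' form.
$www->host('server.com:8080');

# Setters return the current value when called without arguments.
print "Will connect to ", $www->host(), " on port ", $www->port(), "\n";
```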
Resets the port so that the next request will use the default port.
Sets or retrieves the login username.
Removes the login username.
Sets the login password.
Removes the login password.
Specifies a timeout for connections, in seconds. By default, a value of 10 is used. If you specify a false value here, the connection timeout will be system dependent; typically this is from one to several minutes. Note, however, that the timeout is not supported on Windows systems, so code that runs on Windows should not depend on it.
Sets the path for the request. Any HTTP escapes (e.g. %20) are automatically converted to the actual value (e.g. `` ''). If required, the path will be automatically re-escaped before being sent to the server.
Takes a list (not a hash, since duplicate keys are permitted) of key => value pairs. Optionally takes an extra argument - if true, the parameters are added, not replaced - if omitted (or false), any existing parameters are deleted.
To specify a parameter without a value, such as b in this example query string:
    a=1&b&c=3
pass undef as b's value. Passing ``'' as the value will result in:
    a=1&b=&c=3
For example, setting the two query strings above would require the following two sets of arguments, respectively:
$www->parameters(a => 1, b => undef, c => 3);
$www->parameters(a => 1, b => "", c => 3);
To then add a ``d=4'' parameter to either one, you would call:
$www->parameters(d => 4, 1);
Omitting the extra ``1'' would cause you to erase the previously set parameters.
Values specified should not be URL encoded.
If called without arguments, the list of key/value pairs is returned.
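To make the difference between undef and ``'' concrete, here is a small pure-Perl sketch that joins a key/value list into a query string. It is an illustration only, not GT::WWW's implementation: build_query() and escape() are hypothetical helpers, the ``&'' separator is chosen to match the examples above, and the escaping is deliberately simplified.

```perl
use strict;
use warnings;

# Hypothetical helper: join a flat key/value list (duplicate keys are
# allowed, which is why this is a list and not a hash) into a query string.
sub build_query {
    my @pairs = @_;
    my @parts;
    while (@pairs) {
        my ($key, $value) = splice(@pairs, 0, 2);
        # undef means "valueless": emit the bare key with no "=".
        push @parts, defined $value
            ? escape($key) . '=' . escape($value)
            : escape($key);
    }
    return join '&', @parts;
}

# Simplified percent-encoding of everything except unreserved characters.
sub escape {
    my $text = shift;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

print build_query(a => 1, b => undef, c => 3), "\n";   # a=1&b&c=3
print build_query(a => 1, b => "",    c => 3), "\n";   # a=1&b=&c=3
print build_query(a => 1, a => 2), "\n";               # a=1&a=2
```

Because the values are escaped at the point of joining, callers pass them unencoded, which mirrors the rule above that values should not be URL encoded.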
Resets the parameters. Make sure you do this between each request on the same object, unless using url(), which calls this for you.
This function serves the same purpose as parameters(), except that it takes a query string as input instead of a list. Like parameters(), the default behaviour is to replace any existing parameters unless a second, true argument is provided.
Note that if you already have your parameters in some sort of list, it is preferable to pass them to parameters() than to join them into a query string and pass them into this function, because this function just splits them back up into a list again.
You can also provide a query string (along with a host, path, and possibly other data) using the url() method.
If called without arguments, the current parameters will be joined into a valid query string and returned.
This function is used to tell the GT::WWW object to allow/disallow standard-violating responses. This has a global effect of allowing query strings to contain _any_ characters except for ``\r'', ``\n'', and ``#'' - normally, characters such as /, ?, and various extended characters must be escaped into %XX format. The strict option may have other protocol-specific effects, which will be indicated in each protocol's documentation.
The option defaults to non-strict.
This function allows you to pass in raw data to be posted. The data will not be encoded. If you pass in a code reference, the data will be posted in chunks.
Used to set or retrieve the User-Agent string that will be sent to the server. If the agent string you pass starts or ends with whitespace or a comma, the default agent will be added at the beginning or end of the User-Agent string, respectively. This value is only meaningful to protocols supporting something similar to the HTTP User-Agent.
Returns the default user agent string. This will be automatically used if no agent has been set, or if an agent ending with whitespace is specified. This value is dependent on the protocol being used, but is typically something like ``GT::WWW::http/1.23''. This method is read-only.
chunk and chunk_size are used to perform a large download in chunks. The chunk() method takes a code reference that will be called when a chunk of data has been retrieved from the server, or a value of undef to clear any currently set chunk code. chunk_size() takes an integer containing the number of bytes that you wish to retrieve at a time from the server; the chunk code reference will be called with a scalar reference containing up to chunk_size bytes.
Note that when using chunked downloading, the data will not be available using the normal content retrieval interface.
Also note that, as of 1.024, the chunk code reference only applies to the next get() or post() request - after each get() or post() request, the chunk code is cleared (in order to avoid self-references and possible memory leaks).
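A chunked download that streams straight to disk could be sketched like this. The URL, the output file name, and the 8192-byte chunk size are all placeholders, and the sketch assumes that chunk() and chunk_size() remain available on the protocol-specific object that url() produces.

```perl
use strict;
use warnings;
use GT::WWW;

# Hypothetical URL and output file name.
my $www = GT::WWW->new();
$www->url("http://server.com/large-file.bin");

open my $out, '>', 'large-file.bin'
    or die "Cannot open output file: $!";
binmode $out;

my $received = 0;
$www->chunk_size(8192);             # up to 8192 bytes per callback
$www->chunk(sub {
    my $data_ref = shift;           # scalar reference to the chunk
    $received += length $$data_ref;
    print {$out} $$data_ref;
});

# The content is delivered through the callback rather than through the
# normal response interface; the chunk code is cleared after this call.
$www->get();

close $out or die "Cannot close output file: $!";
print "Saved $received bytes\n";
```

The callback closes over the filehandle and a counter but not over $www itself, which sidesteps the self-reference issue mentioned above.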
The cancel method can be used in conjunction with the chunk option to abort a download in progress. The chunk code will not be called again, and the server connection will be closed. This should be used sparingly and with care. cancelled simply returns a true/false value indicating whether the operation has been cancelled. This value is reset at the beginning of each operation.
Note that cancelling an operation is never performed automatically, and only happens - if ever - in the chunk code reference, so checking the cancellation status is rarely needed.
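As a rough sketch of aborting from inside the chunk code, the following stops a download once a size cap is exceeded. The 64 KB cap and the URL are placeholders, and calling cancel() on the same object that issued the request is an assumption drawn from the description above.

```perl
use strict;
use warnings;
use GT::WWW;

my $limit = 64 * 1024;              # hypothetical cap of 64 KB
my $seen  = 0;

my $www = GT::WWW->new();
$www->url("http://server.com/page");

$www->chunk(sub {
    my $data_ref = shift;
    $seen += length $$data_ref;
    # Assumption: cancel() is called on the object running the request.
    # After this, the chunk code is not invoked again.
    $www->cancel() if $seen > $limit;
});

$www->get();

print "Download stopped early after $seen bytes\n" if $www->cancelled();
```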
This is used to set or retrieve the debug level:
    0 = no debugging
    1 = debugging related to current operation
    2 = adds operation details to debugging level 1
    3 = adds data debugging (very large!) to debugging level 2
When passed as part of a hash to new(), the key for this option can be specified as debug instead of debug_level.
This method will return a string containing an error that has occurred. Note that an error may be generated even for methods that _seem_ to have succeeded - for example, if a server unexpectedly closes the connection before properly finishing the transfer, a successful return will result since the transfer was partially successful, but an error message will still be set.
This method will alter the current object's error handling behaviour such that any errors that occur will be propagated to fatal errors. It is enabled automatically when using the quick (i.e. objectless) forms of the get(), head(), and post() methods, which have no associated object on which ->error can be called.
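The two error-handling styles can be contrasted in a short sketch. It assumes that fatal errors are raised with die (and so can be trapped with eval), and that a fatal_errors setting made before url() carries over to the protocol-specific object; both are assumptions rather than documented behaviour.

```perl
use strict;
use warnings;
use GT::WWW;

# Non-fatal style: make the request, then inspect error().
my $www = GT::WWW->new();
$www->url("http://server.com/page");
my $response = $www->get();
if (my $error = $www->error()) {
    warn "Request reported an error: $error\n";
}

# Fatal style: errors abort the call, so wrap it in eval.
my $strict_www = GT::WWW->new();
$strict_www->fatal_errors(1);
$strict_www->url("http://server.com/page");

my $page;
my $ok = eval { $page = $strict_www->get(); 1 };
warn "Request failed: $@" unless $ok;
```

Checking error() even after an apparently successful get() matters because, as noted above, a partially completed transfer can return successfully while still setting an error message.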
This method is used to create a parameter for uploading a file. It takes either one or two arguments:
2 argument form: First argument is a remote filename, second argument is either a local filename, or a GLOB reference to an open filehandle.
1 argument form: Argument is a filename to read.
Example usage:
    my $file = $www->file("foo.txt");
    $www->parameters(foobar => $file, 1);
    my $response = $www->post();
This will upload the file from disk named ``foo.txt'', using a form parameter named ``foobar''. This is similar to uploading a file named ``foo.txt'' via the following HTML element:
<input type="file" name="foobar">
The two argument form with two filenames is used to lie to the server about the actual name of the file. Using a filehandle as the second argument is for use when a filename is not available - such as an opened socket, or a file that has been opened elsewhere in the code.
Examples:
    my $file = $www->file("foo.txt", "bar.txt");
    my $file2 = $www->file("foo2.txt", \*FH);
    $www->parameters(foobar => $file, foobar2 => $file2, 1);
    my $response = $www->post();
This will upload two files - a file named foo.txt (which is actually read from the bar.txt file) specified as form parameter foobar, and a second file, specified as parameter foobar2, whose content is read from the filehandle FH.
the GT::WWW::http manpage, the GT::WWW::https manpage
Jason Rhinelander
Copyright (c) 2004 Gossamer Threads Inc. All Rights Reserved. http://www.gossamer-threads.com/
Revision: $Id: WWW.pm,v 1.25 2005/04/08 19:25:31 jagerman Exp $