Brettb.Com

Using the HTTP protocol with PerlScript and ASP

One topic often discussed by ASP programmers is how to access content from other servers using protocols such as HTTP. Such techniques have many uses: for example, checking that a URL entered into a web form is valid, or pulling stock quotes from one site and publishing them on another.

There are several approaches to obtaining content from other servers, and in particular using the HTTP protocol to programmatically access one web page from within another. ASP developers using VBScript or JScript might like to take a look at this article, which describes using an ActiveX object to achieve this. Alternatively the AspHTTP™ component from ServerObjects Inc. is popular with developers.

An alternative approach is to use the PerlScript ActiveX scripting engine. This allows developers to write ASP documents in Perl, rather than the traditional VBScript or JScript. Like VBScript and JScript, Perl is an interpreted language, and is relatively easy to learn. It has long been the language of choice for many web developers and, given Perl's long association with the Internet, it is unsurprising that it offers excellent support for developing Internet applications. Perl's strong text handling also makes it a good choice when writing a script to extract and parse content from other servers.

Using PerlScript

If you want to write an ASP document in PerlScript, add the following as the first line of your document:

<%@ LANGUAGE="PerlScript" %>

All the code added to this page between the <% %> tags will then be interpreted as PerlScript instead of the server’s default scripting language (which is usually VBScript).

Although you can, in theory, mix VBScript, JScript and PerlScript within the same document, this will lead to decreased server performance when compared to using a single scripting engine. More importantly, you run the risk of your ASP document outputting content from the various scripting engines in a different order to that which you might have intended. 

One further warning is that there will likely be all kinds of security risks from letting your web pages take input from other web pages. You should, therefore, use this sample code with care, or perhaps restrict its use to an Intranet environment rather than on a publicly accessible Internet site. Don’t forget as well that extracting content from third party web services could bring you into legal difficulties unless you have explicit permission to do so!
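One way to reduce the risk described above is to check a submitted URL before any request is made. The following is a minimal sketch, not part of the original samples, written as a standalone Perl script rather than an ASP page. It assumes the URI module (installed alongside libwww) is available, and the helper name IsFetchableURL is hypothetical. It accepts only absolute http or https URLs that name a host.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: accept only absolute http/https URLs that name a host
sub IsFetchableURL {
    my ($candidate) = @_;
    return 0 unless defined $candidate && length $candidate;

    my $uri    = URI->new($candidate);
    my $scheme = $uri->scheme;

    # Reject relative URLs and schemes such as file:, ftp: or javascript:
    return 0 unless defined $scheme && ($scheme eq 'http' || $scheme eq 'https');

    # Reject URLs such as "http:" that carry no host name
    my $host = $uri->host;
    return (defined $host && length $host) ? 1 : 0;
}

# Try a few sample inputs
for my $url ('http://www.brettb.com/', 'file:///etc/passwd', 'not a url') {
    printf "%-25s %s\n", $url, IsFetchableURL($url) ? 'allowed' : 'rejected';
}
```

A check like this only filters the form of the URL. It does not stop a user from pointing the script at internal hosts, so the advice about restricting use to an Intranet still applies.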

Anyway, onto the code samples. The first is a function called CheckURL that will determine whether a specified URL exists. The script uses the libwww Perl library, a collection of modules that can be used to programmatically access the web.

<%
sub CheckURL {
    # Subroutine to check that a URL exists
    # Use the first argument of the function as the URL to check
    my $url_to_check = $_[0];

    # Use the libwww Perl library (this also loads HTTP::Request)
    use LWP::UserAgent;

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Initialise the HTTP request
    my $request = HTTP::Request->new(GET => $url_to_check);

    # Tell the server that we would like to receive HTML
    $request->header('Accept' => 'text/html');

    # Send the HTTP request
    my $result = $ua->request($request);

    # Check the outcome of the HTTP request
    my $url_status;
    if ($result->is_success) {
        $url_status = "$url_to_check was detected";
    } else {
        $url_status = "$url_to_check was not detected";
    }

    # Return a string with the status of the request
    return $url_status;

}
%>

This function can then be called using the following PerlScript (changing the required URL as appropriate):

<%
$Response->Write(CheckURL("http://www.brettb.com/"));
%>

Extending the script

PerlScript offers a wealth of ways for extending the basic script shown above. For example, using the following as the last line of the CheckURL function will cause the script to return the actual HTML from the HTTP request:

return $result->content;

This is useful if you want to parse the HTML in order to extract portions of it.
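As an illustration of extracting a portion of the returned HTML, here is a small sketch that pulls out the page title with a regular expression. It is a standalone Perl script working on a made-up HTML string, and the helper name ExtractTitle is hypothetical. A regular expression is adequate for a quick check like this, but a real HTML parser is more robust against unusual markup.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the first <title> element, or undef
sub ExtractTitle {
    my ($html) = @_;

    # /i ignores the case of the tag, /s lets . match newlines inside the title
    if ($html =~ m{<title[^>]*>(.*?)</title>}is) {
        my $title = $1;
        $title =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
        $title =~ s/\s+/ /g;        # collapse internal runs of whitespace
        return $title;
    }
    return;
}

# Sample HTML, standing in for the value of $result->content
my $sample_html = "<html><head><TITLE>\n  Using the HTTP protocol\n</TITLE></head><body></body></html>";

print ExtractTitle($sample_html), "\n";
```

In an ASP page, the string returned by the modified CheckURL function could be passed to a helper like this in place of $sample_html.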

Alternatively, if you are interested in the precise error message returned from a server, then the following code will be useful:

return $result->error_as_HTML;

If a URL is not found, then the function will return the following:

An Error Occurred
404 Object Not Found
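The response object offers several other ways to inspect a failure. The sketch below is a standalone Perl script that builds a failed HTTP::Response by hand, so it needs no network access. The status code and message are chosen to mirror the output above and are not taken from a real server.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build a failed response by hand; a real one would come from $ua->request
my $result = HTTP::Response->new(404, 'Object Not Found');

# is_success is true only for 2xx status codes
print $result->is_success ? "success\n" : "failure\n";

# The numeric status code on its own
print $result->code, "\n";

# The status code and message as a single line of plain text
print $result->status_line, "\n";

# The same information wrapped in a small HTML page
print $result->error_as_HTML;
```

Using status_line instead of error_as_HTML is a reasonable choice when the message is going into a log file or into a page that already has its own HTML structure.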

Writing a link extractor

The following code demonstrates how PerlScript can be used to extract all of the hyperlinks from a document requested using HTTP. There are two functions: ExtractLinks and LinkCollector. ExtractLinks is the main function. LinkCollector is called from ExtractLinks, and is used to gather the requested document’s hyperlinks into a list. The two functions are shown below:

<%
sub ExtractLinks {

    # Subroutine to extract the hyperlinks from a document
    # Use the first argument of the function as the URL to extract links from
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Use the link extracting HTML parser
    use HTML::LinkExtor;

    # The URL module is used here to expand URLs by including their base reference
    use URI::URL;

    # Empty the list that will contain details of the links within the document
    # (a package variable, because LinkCollector needs to see it as well)
    @LinksList = ();

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Create an instance of the link extracting HTML parser
    my $parser = HTML::LinkExtor->new(\&LinkCollector);

    # Send the HTTP request, parsing the document as each chunk arrives
    my $result = $ua->request(HTTP::Request->new(GET => $url_to_check),
        sub { $parser->parse($_[0]) });

    # Tell the parser that the document is complete
    $parser->eof;

    # Expand URLs to include the base reference
    my $base = $result->base;
    @LinksList = map { url($_, $base)->abs } @LinksList;

    # Check the outcome of the HTTP request
    # If successful, then return a list of links in the requested document
    # otherwise, return an error message
    if ($result->is_success) {

        # Start from an empty string so that repeated calls do not accumulate
        my $links_html = "";
        for (@LinksList) {
            $links_html .= "$_<br>";
        }

        return $links_html;

    } else {
        return "$url_to_check was not detected";
    }

}

# A short subroutine to collect the links into a list
sub LinkCollector {

    my ($tag, %attr) = @_;
    push(@LinksList, values %attr);

}
%>

The ExtractLinks subroutine can then be called using something like:

<%
$Response->Write(ExtractLinks("http://www.brettb.com/"));
%>
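LinkCollector gathers every link attribute that the parser reports, which includes image sources as well as hyperlinks. The following sketch shows one possible variation that keeps only the href attribute of anchor tags. It is a standalone Perl script that parses a made-up HTML string, so it needs no network access. The sample document, the base URL and the name AnchorCollector are all hypothetical.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Sample document and base URL, made up for illustration
my $html = <<'HTML';
<a href="/articles/">Articles</a>
<img src="logo.gif">
<a href="http://www.perl.org/">Perl</a>
HTML
my $base = 'http://www.example.com/docs/index.html';

# List that the callback fills in
my @links;

# Callback: keep only the href attribute of <a> tags
sub AnchorCollector {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    push @links, "$attr{href}";   # stringify the URI object
}

# Passing a base URL makes the parser report absolute URLs
my $parser = HTML::LinkExtor->new(\&AnchorCollector, $base);
$parser->parse($html);
$parser->eof;

print "$_\n" for @links;
```

Supplying the base URL to the parser is an alternative to expanding the links afterwards, and is only practical when the base is known before parsing starts.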

Further reading

If you want to install ActivePerl on your web server, then download it (free of charge) from the ActiveState website. The installation routine creates an extensive library of documentation, including reference guides to the Perl modules and functions described in this article.

There are plenty of online resources for learning Perl, with http://www.perl.com and http://www.perl.org  being two of the best starting points.

You might also like to invest in one of these featured books:

Learning Perl (2nd Edition)
Effective Perl Programming: Writing Better Programs With Perl


All content is 1995 - 2012