Using the HTTP protocol with PerlScript and ASP
One topic often discussed by ASP programmers is how to access content from other servers using protocols such as HTTP. Such procedures have many uses, for example checking that a user filling in a web form has entered a valid URL, or pulling stock quotes from one site and publishing them via another.
There are several approaches to obtaining content from other servers, and in particular using the HTTP protocol to programmatically access one web page from within another. ASP developers using VBScript or JScript might like to take a look at this article, which describes using an ActiveX object to achieve this. Alternatively the AspHTTP component from ServerObjects Inc. is popular with developers.
An alternative approach is to use the PerlScript ActiveX scripting engine. This allows developers to write ASP documents in Perl, rather than the traditional VBScript or JScript. Like VBScript and JScript, Perl is an interpreted language, and is relatively easy to learn. It has long been the language of choice for many web developers and, given its long association with the Internet, it is unsurprising that it offers excellent support for developing Internet applications. Perl is also a good choice when writing a script to extract and parse content from other servers, thanks to its strong text-handling capabilities.
If you want to write an ASP document in PerlScript, add the following as the first line of your document:
<%@ LANGUAGE="PerlScript" %>
All the code added to this page between the <% %> tags will then be interpreted as PerlScript instead of the server's default scripting language (which is usually VBScript).
Although you can, in theory, mix VBScript, JScript and PerlScript within the same document, this will lead to decreased server performance when compared to using a single scripting engine. More importantly, you run the risk of your ASP document outputting content from the various scripting engines in a different order to that which you might have intended.
One further warning: letting your web pages take input from other web pages carries security risks. You should, therefore, use this sample code with care, or perhaps restrict its use to an intranet environment rather than a publicly accessible Internet site. Don't forget as well that extracting content from third-party web services could bring you into legal difficulties unless you have explicit permission to do so!
Anyway, onto the code samples. The first is a function called CheckURL that will determine whether a specified URL exists. The script uses the libwww Perl library, a collection of modules that can be used to programmatically access the web.
This function can then be called using the following PerlScript (changing the required URL as appropriate):
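The original listings are not reproduced here, so what follows is only a minimal sketch of how such a function and its call might look. It assumes the LWP::UserAgent module from libwww-perl, a plain GET request and a 10-second timeout; the URL is a placeholder, and none of these details are confirmed by the original code.

```perl
use strict;
use warnings;
use LWP::UserAgent;   # part of the libwww-perl distribution

# Returns 1 if the URL answers with a successful (2xx) status, 0 otherwise.
sub CheckURL {
    my ($url) = @_;

    # The timeout value is an assumption; tune it for your server.
    my $ua = LWP::UserAgent->new(timeout => 10);

    # Fetch the document; redirects are followed by default.
    my $response = $ua->get($url);

    return $response->is_success ? 1 : 0;
}

# Example call; the URL below is a placeholder.
my $url     = 'http://www.example.com/';
my $message = CheckURL($url) ? "$url exists" : "$url could not be retrieved";
print "$message\n";
```

Inside an ASP page the same code would sit between the <% %> tags, with $Response->Write($message) taking the place of print.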
Extending the script
PerlScript offers a wealth of ways for extending the basic script shown above. For example, using the following as the last line of the CheckURL function will cause the script to return the actual HTML from the HTTP request:
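Assuming the function keeps the result of its request in a variable (called $response here, a hypothetical name) obtained from LWP::UserAgent, the replacement last line could be a call to the response's content method. A sketch of the whole function with that change, under the same assumptions as before:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Variant of CheckURL that returns the body of the response
# instead of a true/false flag.
sub CheckURL {
    my ($url) = @_;
    my $ua       = LWP::UserAgent->new(timeout => 10);   # timeout is an assumption
    my $response = $ua->get($url);

    # Raw body as sent by the server. For a failed request this is
    # whatever error page (if any) the server returned.
    return $response->content;
}
```

Where the page may be compressed or use a non-default character set, the decoded_content method of the same response object is usually the better choice.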
This is useful if you want to parse the HTML in order to extract portions of it.
Alternatively, if you are interested in the precise error message returned from a server, then the following code will be useful:
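The sample output shown further down reads "An Error Occurred", which matches the title produced by the error_as_HTML method of HTTP::Response, so this sketch assumes that method was used. Returning an empty string for a successful request is an assumption made purely for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Variant of CheckURL that reports why a request failed.
# Returns an empty string on success (an illustrative choice),
# or a small HTML error page describing the failure.
sub CheckURL {
    my ($url) = @_;
    my $ua       = LWP::UserAgent->new(timeout => 10);   # timeout is an assumption
    my $response = $ua->get($url);

    return '' if $response->is_success;

    # error_as_HTML wraps the status line (for example "404 Not Found")
    # in a minimal HTML document.
    return $response->error_as_HTML;
}
```

If only the short message is needed, $response->status_line returns the status code and reason phrase as plain text.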
If a URL is not found, then the function will return the following:
An Error Occurred
Writing a link extractor
The following code demonstrates how PerlScript can be used to extract all of the hyperlinks from a document requested using HTTP. There are two functions: ExtractLinks and LinkCollector. ExtractLinks is the main function. LinkCollector is called from ExtractLinks, and is used to gather the requested document's hyperlinks into a list. The two functions are shown below:
The ExtractLinks subroutine can then be called using something like:
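Neither listing is reproduced here, so the following is a sketch of one way the two functions and the call could fit together. It assumes HTML::LinkExtor (from the HTML-Parser distribution) as the parser, with LinkCollector as its callback and a file-level @links array as the list; these choices, the timeout and the placeholder URL are assumptions, not the original code.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;   # from the HTML-Parser distribution

my @links;   # filled by LinkCollector during a parse

# Callback invoked by HTML::LinkExtor for every tag that carries a link.
# Only the href attribute of <a> tags is kept.
sub LinkCollector {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    push @links, $attr{href};
}

# Fetches the document at $url and returns the list of hyperlinks found,
# or an empty list if the request fails.
sub ExtractLinks {
    my ($url) = @_;
    @links = ();   # start with an empty list on every call

    my $ua       = LWP::UserAgent->new(timeout => 10);   # timeout is an assumption
    my $response = $ua->get($url);
    return () unless $response->is_success;

    my $html = $response->decoded_content;
    return () unless defined $html;

    my $parser = HTML::LinkExtor->new(\&LinkCollector);
    $parser->parse($html);
    $parser->eof;

    return @links;
}

# Example call; the URL below is a placeholder.
my @found = ExtractLinks('http://www.example.com/');
print "$_\n" for @found;   # inside an ASP page: $Response->Write("$_<br>");
```

The links are returned exactly as written in the page. Passing $response->base as a second argument to HTML::LinkExtor->new would make relative links absolute.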
If you want to install ActivePerl on your web server, then download it (free of charge) from the ActiveState website. The installation routine creates an extensive library of documentation, including reference guides to the Perl modules and functions described in this article.