Chapter 24

Scripting for the Unknown: The Control of Chaos



In this chapter, I introduce techniques for creating applications where the potential response to the client is unknown to the developer or unpredictable. I explore three areas:

Applications that maintain state over more than one HTTP connection.  For building queries where the query is formed by the client, either on a single screen or a simple linear series of predictable screens, maintaining state is a convenience for the user but is not strictly necessary. Recall how state is a convenience to the user in Chapter 21's discussion of the Stock Ticker-SEC EDGAR filing application; the input and output are tightly controlled, and state is used as a mechanism to save preferences. In this case, state saves keystrokes and eliminates redundant choices, but the query domain is well defined: the universe of SEC corporate and fund filings.

Suppose that the query screens that a client selects are not a predictable series, however. You might have a large catalog application with many different categories that don't fit into a linear hierarchy, for example. In this case, maintaining state becomes necessary. A few basic techniques for maintaining state are presented, followed by a more comprehensive example combining these techniques.

I also present an example using Lincoln Stein's Perl 5 CGI.pm module with Netscape Cookies to show Netscape's vision of simplified state maintenance.

Applications that create graphic images on-the-fly.  It is possible to create new images in response to a client's request. The developer doesn't need to anticipate every possible client request. A simple graph of the "hourly summary of bytes transmitted" could be generated on the host server machine by a regularly scheduled job. Of course, the developer doesn't want or need to waste system resources by continually generating images that might never be seen. Instead, graphics can be generated only in response to a client request.

Examples of several graphics packages are given, including gnuplot, a charting program; netpbm, a collection of image-manipulation programs; and gd1.1.1, a C library written by Thomas Boutell. Each of these packages can accept input via the command line or stdin, and they work well in the CGI environment. And, in a grand finale of this section, I present an application that uses both CGI.pm and GD.pm Perl 5 modules to allow a user to vote on movies, have those votes preserved across sessions, and request a dynamically generated graph of the top five movies.

Applications that retrieve data from another server.  Developers can write applications that retrieve data from a separate server that is not under their control. Because they don't know what data the client will request, there often is no practical way for them to retrieve the data on their own machines and store it locally. Or, the remote server might be in a constant state of updating its own data. It therefore is necessary to build applications that retrieve the requested data only when a client request is received.

Two techniques are presented. The first uses Expect, an extension to Tcl, to open a Telnet connection to another server to send and retrieve data. The second uses urlget.pl, a Perl library, to open an HTTP connection to another server to send and retrieve data.

Bridging the Unknown by Maintaining State

Recall that HTTP is a stateless protocol; the client issues a request, the server responds, and the connection closes. At this point, the server, and any gateway programs to which it talks, have presumably "forgotten" about the client and its original request. The clever developer can overcome the statelessness of HTTP by including data in the gateway program response that the client then can use to issue a new request. Three common methods of maintaining state follow:

Via URL data, either in the QUERY_STRING or the PATH_INFO environment variables.

Via HTML form variables with the value set at request time. The variables can be visible to the client, and the user then can alter and resubmit them; or, the variables can be hidden ones that the browser does not display and that the form offers no field to alter.

Via Netscape Cookies. Think of the Cookie as a small data token with a peculiar name; the server can set its name and value and pass it to the client, where it is written on the client's local file system. The next time the client accesses the same location, it compares the Cookie domain and path and, if there is a match, it sends the value back to the server. In this way, the server can save state. Furthermore, the server can attach an expiration timestamp to the Cookie. Cookies are preserved even if the client logs off and restarts the Web browser at a later time. In mid-1996, Microsoft's Web browser, Internet Explorer, also started to support Cookies, but its implementation differs in a few significant ways. If a server sets a Cookie's value to null, for example, it indeed has no value in Netscape, but in MSIE, it retains its old value. See http://www.illuminatus.com/cookie_pages/tidbits.html for more details.
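The three methods above can be sketched side by side. This is a minimal illustration, not code from any listing in this chapter; the script path, the variable name state, and the three helper subroutines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical script location and state value, for illustration only.
my $script = "/cgi-bin/book/demo.pl";
my $state  = "kb";

# 1. URL data: append the state as QUERY_STRING or as extra path (PATH_INFO).
sub state_in_url {
    my ($script, $state) = @_;
    return ("$script?state=$state", "$script/$state");
}

# 2. Form variable: a hidden field that comes back when the form is submitted.
sub state_in_form {
    my ($state) = @_;
    return "<INPUT TYPE=hidden NAME=state VALUE=\"$state\">";
}

# 3. Cookie: an HTTP response header the browser stores and sends back later.
sub state_in_cookie {
    my ($state) = @_;
    return "Set-Cookie: state=$state; path=/cgi-bin";
}

print join("\n", state_in_url($script, $state),
                 state_in_form($state),
                 state_in_cookie($state)), "\n";
```

In each case the server hands the value to the client, and the client hands it back with its next request.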

Using the QUERY_STRING Variable

I start with a simple script to modify the QUERY_STRING environment variable. Listing 24.1 presents an input box that will set the QUERY_STRING variable and then redisplay the same screen with the text just typed in the input box. The user then can change the value of QUERY_STRING.


Listing 24.1. Modifying the QUERY_STRING environment variable.
#!/usr/local/bin/perl
# modify_query.pl
# modify QUERY_STRING env variable

$thisfile = "modify_query.pl";
$cgipath = "/cgi-bin/book";
print "Content-type:  text/html\n\n";

if($ENV{QUERY_STRING} eq "")
  { $new_query = ""; }
else {
  ($junk, $new_query) = split(/=/, $ENV{QUERY_STRING}, 2);  # keep any '=' inside the value
  $new_query =~ tr/+/ /;                                    # '+' encodes a space
  $new_query =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;   # decode %XX escapes
  }

print "<B>Modify QUERY_STRING Sample</B><P>";
print "<FORM METHOD=GET ACTION=\"$cgipath/$thisfile\">";
print "Add to query: <INPUT NAME=QUERY VALUE=\"$new_query\"><P>";
print "<INPUT TYPE=SUBMIT>";
print "</FORM>";
exit;

The if test at the start checks for no value for QUERY_STRING, initializes the variable $new_query, and then displays the screen. The next time around, if the user inputs a value, the else statement takes over, decoding the QUERY_STRING variable and updating the value of $new_query. At this point, the developer can take some action, such as searching a database, while retaining the value of QUERY_STRING, which the user then can modify to submit another request.
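Because the decoded value is written straight back into the page, a quotation mark or angle bracket typed by the user can break the form markup. A minimal sketch of decoding and then escaping a value before echoing it follows; the helper names url_decode and html_escape and the sample query string are assumptions for illustration, not part of Listing 24.1.

```perl
use strict;
use warnings;

# Decode one URL-encoded value: '+' becomes a space, %XX becomes a byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Escape characters that are special in HTML before echoing user input.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical QUERY_STRING, as the form in Listing 24.1 might send it.
my $query_string = 'QUERY=red+%22hot%22+%3Cchips%3E';
my ($name, $value) = split(/=/, $query_string, 2);
print html_escape(url_decode($value)), "\n";
```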

Using PATH_INFO

In a similar fashion, the PATH_INFO variable also can be used to maintain state. In Listing 24.2, there are four "fields" stored in the extra path information: the first field contains the name of a subroutine to execute, and the other three contain data obtained based on the user's selection of a URL.


Listing 24.2. Using the PATH_INFO variable helps maintain state.
#!/usr/local/bin/perl
# menu.pl
# builds a 'dinner order' using the PATH_INFO environment variable

$thisfile = "menu.pl";
$cgipath  = "/cgi-bin/book";

@entrees = (" ",
            "Surf 'n Turf   /19.95",
            "Pot Roast      /12.95",
            "Fried Chicken  /9.95",
            "Pork Chops     /10.95",
            "Steamed Shrimp /14.00");
@drinks  = (" ", "Beer           /0.95", "Martini        /2.50",
                 "Coffee         /0.75", "Soda           /0.95");
@desserts= (" ", "Cheesecake     /2.95", "Ice Cream      /1.75",
                 "Fresh Fruit    /3.50");

print "Content-type: text/html\n\n";

# The path_info variable split on / and placed into $ variables.
# Note that since path_info contains a lead '/', a throw-away variable,
# $pl, is included in the split statement
($pl, $submenu, $en, $dr, $dt)  = split(/\//, $ENV{PATH_INFO});

# The next statement tests for one of two conditions:
# If this is the first time the script is executed, $submenu is empty,
# or, if the user is coming from a submenu, the first value in path_info
# will be set to 'm'.  In both cases the default "main menu" is displayed.

if( ($submenu eq "") || ($submenu eq "m") ) {
  print "<CENTER><B>Tonight's Menu:</B></CENTER>\n\n";

  print <<ENDOFMAIN;
<CENTER>
<B><A HREF=$cgipath/$thisfile/&en/$en/$dr/$dt>Entrees</A></B><BR>
<B><A HREF=$cgipath/$thisfile/&dr/$en/$dr/$dt>Drinks</A></B><BR>
<B><A HREF=$cgipath/$thisfile/&dt/$en/$dr/$dt>Desserts</A></B><BR>
</CENTER><BR>
Select a link to view tonight's choices.<HR>
ENDOFMAIN

# The decode subroutine reads whatever is in the $en, $dr and $dt
# variables and prints the values at the bottom of the screen.
  &decode; }

# If the value of $submenu is not "" or "m", execute whatever
# subroutine $submenu has the value of...
else {
  eval $submenu; }

# NOTE: the use of eval is always a potential security hole.
# See Chapter 25 under the section "Security Pitfalls of CGI Programming."

exit;

sub en {
print "<B>Select an Entree</B><BR><BR>\n";
$num = 1;
foreach $it (@entrees) {
   ($item, $price) = split(/\//, $it); if($price == 0) {next;}
   print "<B><A HREF=$cgipath/$thisfile/m/$num/$dr/$dt>$item</A></B> ($price)<BR>";
   $num++;
   }
}

sub dr {
print "<B>Select a Drink</B><BR><BR>\n";
$num = 1;
foreach $it (@drinks) {
   ($item, $price) = split(/\//, $it); if($price == 0) {next;}
   print "<B><A HREF=$cgipath/$thisfile/m/$en/$num/$dt>$item</A></B> ($price)<BR>";
   $num++;
   }
}

sub dt {
print "<B>Select a Dessert</B><BR><BR>\n";
$num = 1;
foreach $it (@desserts) {
   ($item, $price) = split(/\//, $it); if($price == 0) {next;}
   print "<B><A HREF=$cgipath/$thisfile/m/$en/$dr/$num>$item</A></B> ($price)<BR>";
   $num++;
   }
}

sub decode {

print "Current Order:<BR>\n";
print "<PRE>\n";
$total = 0;
@order=($entrees[$en], $drinks[$dr], $desserts[$dt]);
   foreach $a (@order) {
   ($item, $price) = split(/\//, $a);
   if($price != 0)
   { printf("%s\t %5.2f \n", $item, $price);
     $total = $total + $price; }

   }

printf("Total cost:\t\$%5.2f\n", $total);
print "</PRE>\n";
}

Listing 24.2 shows a simple method of maintaining the state of a few variables while enabling the user to navigate between various pages. In the "real world," a developer should avoid hardcoding variable data into a Perl script. Instead, the script can be written to read this data from a separate file, as you will see in the next example.
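Listing 24.2 hands the first PATH_INFO field directly to eval, so anyone who edits the URL can ask the server to run arbitrary Perl. One common alternative is a dispatch table that maps the permitted names to subroutines. The following is a sketch only: the stand-in subroutines, run_submenu, and clean_index are hypothetical names, not part of menu.pl.

```perl
use strict;
use warnings;

# Stand-ins for the en, dr, and dt subroutines of Listing 24.2.
sub en { return "entrees" }
sub dr { return "drinks" }
sub dt { return "desserts" }

# Only names listed here can ever be called.
my %submenu_for = (en => \&en, dr => \&dr, dt => \&dt);

# Return the handler's result, or undef for an unknown name.
sub run_submenu {
    my ($name) = @_;
    $name = '' unless defined $name;
    $name =~ s/^&//;                 # Listing 24.2 passes names as "&en"
    my $handler = $submenu_for{$name};
    return defined $handler ? $handler->() : undef;
}

# Accept a menu index only if it is a small non-negative integer.
sub clean_index {
    my ($value, $max) = @_;
    return 0 unless defined $value && $value =~ /^\d{1,3}\z/;
    return $value <= $max ? $value : 0;
}

print run_submenu('&dr'), "\n";
```

An unrecognized name simply yields undef, which the caller can treat as a request for the main menu.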

Form Variables

Form variables also can be used to maintain state, and there is a special class available to the developer: hidden variables. Hidden variables are not displayed when the browser renders the HTML response, although they are visible through the browser's View Source menu option, and the form gives the user no field in which to change them. These variables still are "active," though; if the user resubmits the form, the server can use the data in these variables. Because a determined user can save the page and edit the values before resubmitting it, the server should validate hidden data as it would any other input.

Chapter 21, "Gateway Programming I: Programming Libraries and Databases," showed how hidden variables are a possible choice to save user preferences in an ad-hoc database query; Listing 24.3 shows a dynamic order form example in which hidden variables get a lot more exercise. Here, hidden variables are used to store the values of the various fields used. These values also are displayed on-screen, except for the part number, which the client doesn't need to know. When a user places an order, I want to be able to log all the field values to a file on the server (orders_log) and send an e-mail receipt to the user.

The variable data is stored in a separate file, in this case bolts.dat. This is a colon-delimited file (sub setup splits each line on colons), with each line containing data on one product. The fields in this file are type, item, unit (for example, number of nails per box), price (per unit), and code.


Listing 24.3. Using hidden variables to maintain state.
#!/usr/local/bin/perl
# calc.pl
# Example of maintaining state using form variables.

$thisfile = "calc.pl";
$cgipath  = "/cgi-bin/book";
$input_file = "./bolts.dat";

print "Content-type: text/html\n\n";
print "<TITLE>Nuts and Bolts Order Form</TITLE>\n";

  if($ENV{'QUERY_STRING'} eq "exit")  {&exit;}
  if($ENV{'QUERY_STRING'} eq "order") {&order;}
  if($ENV{'CONTENT_LENGTH'} == 0)     { &setup }

else
{ read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
$buffer =~ tr/+/ /; $buffer =~ s/%(..)/pack("c",hex($1))/ge;

@line=split(/&/,$buffer);

print "<PRE><FORM METHOD=POST ACTION=\"$cgipath/$thisfile\">\n";
print "Item             Unit Type/Price    Quantity        Total\n";
print "----             ---------------    --------        -----\n";

$counter=0; $grand_total=0; $prevtype=""; $order ="";

while($line[$counter] ne "") {

  ($junk, $type) = split(/=/, $line[$counter]);
  print "<INPUT TYPE=hidden NAME=type VALUE=\"$type\">";
  if($type ne $prevtype) {
    print "\n<H3>$type</H3>";
    $prevtype = $type; }

  $counter++;
  ($junk, $item) = split(/=/, $line[$counter]);
  print "$item         ";
  print "<INPUT TYPE=hidden NAME=item VALUE=\"$item\">";
  $order=$order." ".$item;

  $counter++;
  ($junk, $unit) = split(/=/, $line[$counter]);
  print "$unit    ";
  print "<INPUT TYPE=hidden NAME=unit VALUE=\"$unit\">";
  $order = $order." ".$unit;

  $counter++;
  ($junk, $price) = split(/=/, $line[$counter]);
  print "$price    ";
  print "<INPUT TYPE=hidden NAME=price VALUE=\"$price\">";
  $order = $order." ".$price;

  $counter++;
  ($junk, $code) = split(/=/, $line[$counter]);
  print "<INPUT TYPE=HIDDEN NAME=code VALUE=\"$code\">";
  $order = $order." ".$code;

  $counter++;
  ($junk, $quantity) = split(/=/, $line[$counter]);
  print "<INPUT NAME=quantity VALUE=\"$quantity\" SIZE=6>";
  $order = $order." ".$quantity;

  $total = $price * $quantity;
  $grand_total = $grand_total + $total;
  $out = sprintf("\t %9.2f", $total);
  print "$out\n";
  $order = $order." ".$out."\n";
  $counter = $counter + 1;
}

$line = sprintf("\t\t\t\t\t\t  =======");
$grand_out = sprintf("\t\t\t\t\t\t %8.2f", $grand_total);
print "$line\n";
print "$grand_out<BR>\n";
print "<INPUT TYPE=submit VALUE=\"Calculate current order\">";
print "</FORM>";

print "<FORM METHOD=POST ACTION=\"$cgipath/$thisfile?order\"><INPUT \
TYPE=hidden NAME=order VALUE=\"$order $type $unit $price $total\"><INPUT \
TYPE=SUBMIT VALUE=\"Place Order\"></FORM>\n";

print "<FORM METHOD=POST ACTION=\"$cgipath/$thisfile\"><INPUT TYPE=SUBMIT \
VALUE=\"Erase form and start over\"></FORM>";

print "<FORM METHOD=POST ACTION=\"$cgipath/$thisfile?exit\">
<INPUT TYPE=SUBMIT VALUE=\"Cancel and Exit\"></FORM><BR>\n";
print "</PRE>";
}
exit;

######
sub setup
{
print "<PRE><FORM METHOD=POST ACTION=\"$cgipath/$thisfile\">\n";
print "Item             Unit Type/Price    Quantity     Total\n";
print "----             ---------------    --------     -----\n";

open(INPUT, $input_file) || die "cannot open $input_file in sub setup\n\n";
$prevtype = "";
while(<INPUT>)
{ $total=0.00; chop;
($type, $item, $unit, $price, $code) = split(/\:/);
print "<INPUT TYPE=hidden NAME=type  VALUE=\"$type\">";
print "<INPUT TYPE=hidden NAME=item  VALUE=\"$item\">";
print "<INPUT TYPE=hidden NAME=unit  VALUE=\"$unit\">";
print "<INPUT TYPE=hidden NAME=price VALUE=\"$price\">";
print "<INPUT TYPE=hidden NAME=code  VALUE=\"$code\">";

if($type ne $prevtype)
  { print "\n<H3>$type</H3>";
    $prevtype = $type; }

#print "$item        $price per $unit   ";
print "$item         $unit    $price   ";

print " <INPUT NAME=\"quantity\" VALUE=0 SIZE=6>              \n";
} #end while
close(INPUT);

print "<INPUT TYPE=submit VALUE=\"Calculate current order\"></FORM>";

print "<FORM METHOD=POST ACTION=\"$cgipath/$thisfile?exit\">
<INPUT TYPE=SUBMIT VALUE=\"Cancel and Exit\"></FORM></PRE><BR>\n";

} #end of sub setup

sub exit {
print "Thanks for looking... please come back and spend money\n";
# insert logging code here...
exit; }

sub order {
print "Your order will be delivered promptly... thanks!\n";
# insert logging and e-mail code here...
exit; }

The first time the URL is requested, all the quantity fields are set to zero. A sample screen, after the user inputs an order, is shown in Figure 24.1. Note that all of the hidden field values are displayed on-screen, except the part number, but that only the quantity field is offered to the client as an editable input.

Figure 24.1 : The Nuts and Bolts order form.

Note that in this case, because the form is short, the GET method also could be used. Remember, though, that the amount of data a query_string can hold is limited, and the user can try out any value by opening the URL.
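Since hidden fields travel through the client, a price that comes back in the form cannot be trusted. One option is to look the price up again on the server by product code and accept only the quantity from the client. The sketch below assumes the type:item:unit:price:code layout that sub setup reads; the sample catalog lines and the names price_table and line_total are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical catalog lines in the type:item:unit:price:code layout.
my @catalog = (
    "Nails:Finishing nail:box of 100:2.50:N100",
    "Bolts:Hex bolt:each:0.25:B200",
);

# Build a code => price table from the server's own data.
sub price_table {
    my %price;
    foreach my $line (@_) {
        my ($type, $item, $unit, $price, $code) = split(/:/, $line);
        $price{$code} = $price;
    }
    return %price;
}

# Total one order line using the server-side price, ignoring any
# price the client sent back. Returns undef for bad input.
sub line_total {
    my ($prices, $code, $quantity) = @_;
    return undef unless exists $prices->{$code};
    return undef unless $quantity =~ /^\d+\z/;
    return $prices->{$code} * $quantity;
}

my %prices = price_table(@catalog);
printf("%.2f\n", line_total(\%prices, "N100", 4));
```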

Combining Methods of Maintaining State

In the next example of maintaining state, both the PATH_INFO environment variable and form variables are used to pass data between HTTP connections. This is a simple catalog/shopping cart application; the user can view products by category and add them to a shopping list by clicking on the image of the product. At any time, the user can switch to another category or go to an order page displaying the products selected so far. On the order page, the user can change the quantity to order or submit the order for processing.

First, the user is presented with a list of product categories, as shown in Figure 24.2.

Figure 24.2 : The Chips 'n Things selection screen.

Listing 24.4 shows the .html file for this screen.


Listing 24.4. An order-entry application.
<TITLE>Chips Catalog</TITLE>
<H1><CENTER>Chips 'n Things</CENTER></H1>

Select a category:
<IMG align=right SRC=/icons/chipx.gif>
<BR>
<BR>
<A HREF=/cgi-bin/book/chips/chips.pl//kb>Keyboards</A><BR>
<A HREF=/cgi-bin/book/chips/chips.pl//dd>Disk Drives</A><BR>
<A HREF=/cgi-bin/book/chips/chips.pl//cr>Cards</A><BR>
<A HREF=/cgi-bin/book/chips/chips.pl//cs>Computer Systems</A><BR>
<A HREF=/cgi-bin/book/chips/chips.pl//pr>Printers</A><BR>
<A HREF=/cgi-bin/book/chips/chips.pl//pe>Peripherals</A><BR>
<BR>

The PATH_INFO data for each URL on this screen includes the product code that will be used to find all the products matching the category selected. (The first path_info field is blank; this will be used shortly.) The product data is stored in a flat file containing fields delimited by colons, as shown in this code:

cs001:AT BLOWOUT!:Speedy 12Mhz Chip! Priced to Move! Must sell! 3 Serial/2 Parallel Ports Status Lights and More :327.69
cs002:386 SPECIAL:Ultra-Fast 17.5Mhz Chip! Priced to Move! Must sell! Includes Z-80 Emulation! Status Lights and More :164.88

The fields in this file are product code, product name, text description, and unit price.
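A record in this layout can be split into its four fields and selected by category with a prefix test on the product code. This is a sketch under assumptions: the helper names parse_product and products_in_category are hypothetical, the first two sample records are abridged from the data above, and the kb001 record is invented for illustration.

```perl
use strict;
use warnings;

# Split one catalog line into its four colon-delimited fields.
sub parse_product {
    my ($line) = @_;
    chomp $line;
    my ($code, $name, $text, $price) = split(/:/, $line, 4);
    return { code => $code, name => $name, text => $text, price => $price };
}

# Return the products whose code begins with the two-letter category.
sub products_in_category {
    my ($category, @lines) = @_;
    return grep { index($_->{code}, $category) == 0 }
           map  { parse_product($_) } @lines;
}

# Sample records; kb001 is a made-up entry.
my @lines = (
    "cs001:AT BLOWOUT!:Speedy 12Mhz Chip!:327.69\n",
    "cs002:386 SPECIAL:Ultra-Fast 17.5Mhz Chip!:164.88\n",
    "kb001:Basic Keyboard:101 keys:19.95\n",
);
foreach my $p (products_in_category("cs", @lines)) {
    print "$p->{code} $p->{name} \$$p->{price}\n";
}
```

Testing for a prefix, rather than matching the category anywhere in the code, keeps a category such as "cs" from picking up an unrelated code that merely contains those letters.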

After the user selects a category, a random session ID number is generated and placed in the first PATH_INFO field on subsequent screens.
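The session ID in Listing 24.5 comes from int(rand(10000)), so two shoppers can easily draw the same number and share an order file. A minimal sketch of a less collision-prone ID follows. The names new_session_id and valid_session_id are hypothetical and not part of chips.pl, and the result is still guessable, so it should not be treated as a security token.

```perl
use strict;
use warnings;

# Build a session ID from the time, the process ID, and a random number.
# This collides far less often than int(rand(10000)), but it is NOT
# cryptographically strong.
sub new_session_id {
    return sprintf("%08x%04x%04x", time(), $$ & 0xffff, int(rand(0x10000)));
}

# Accept only IDs in the format produced above.
sub valid_session_id {
    my ($id) = @_;
    return defined $id && $id =~ /^[0-9a-f]{16}\z/ ? 1 : 0;
}

my $id = new_session_id();
print "$id\n" if valid_session_id($id);
```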

The script then opens the product data file and finds and displays all text and images matching the category selected. If no image is available for a product, a URL and the text still are provided.

Figure 24.3 shows a sample product display screen.

Figure 24.3 : A product display screen with an image.

After choosing some products, the user can display a list of products selected, as shown in Figure 24.4.

Figure 24.4 : A sample list of products selected.

At this point, the user can change the quantity of products and update the page or submit the order for processing. Listing 24.5 shows the "shopping cart" catalog script.


Listing 24.5. The shopping cart script, chips.pl.
#!/usr/local/bin/perl
# chips.pl
# Simple "shopping cart" catalog

$product_data = "product.data";
$image_dir = "/web/clients/icons/chips/";

print "Content-type: text/html\n\n";
($path1, $session_id, $current, $code) = split(/\//, $ENV{PATH_INFO});

if($session_id eq "") {&set_session_id;}
$order_file = "./orders/$session_id.tmp";

read(STDIN, $post_query, $ENV{'CONTENT_LENGTH'});
%post_query = &decode_url(split(/[&=]/, $post_query));
$action = $post_query{"order"};

if($code) {$amount = 1; &add_product;}
if($action =~ m/recalc/) { &recalc; }
if($action =~ m/order/) { &show_order; }
if($action =~ m/place/) { &place; }

&show_products;
exit;

sub show_products {
print "<B>Click on the image</B> to add a product to your shopping cart.
<HR>\n";
open(INPUT, "$product_data") || die "cannot open $product_data\n";
while(<INPUT>) {
  ($code, $name, $text, $price) = split(/:/);
  $image_name = $code.".gif";

     if($code =~ m/^\Q$current\E/) {   # code must begin with the category
     print "<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/$current/$code>";
     if(-e "$image_dir$image_name")
     {print "<IMG SRC=/icons/chips/$image_name>";}
     else {print "(No Image Available)<BR><BR>\n";}
     print "</A><BR>\n";
   
     print "<B>$name</B><BR>\n";
     print "Product code: $code<BR>\n";
     print "Price: \$$price<BR><BR>\n";
     print "$text<HR>\n";
     } #endif

} #end while

&print_links;
print "<FORM METHOD=POST ACTION=/cgi-bin/book/chips/chips.pl/$session_id/$current><BR>\n";
print "<INPUT TYPE=HIDDEN NAME=\"order\" VALUE=\"&show_order\">";
print "<CENTER>";
print "<INPUT TYPE=SUBMIT VALUE=\"View Current Order\">\n";
print "</CENTER></FORM>\n";

} #end show_products

sub show_order {
print "<CENTER><B>Current Order</B></CENTER>\n";
print "<FORM METHOD=POST ACTION=/cgi-bin/book/chips/chips.pl/$session_id/$current><BR>\n";

print "<PRE>\n";
print "Code  Product                Price   Quantity        Total\n";
print "----  -------                -----   --------        -----\n";
open(INPUT, "$order_file") || die "cant open $order_file\n";
open(LOOKUP, "$product_data") || die "cant open $product_data\n";

$total = 0; $grandtotal = 0;
while(<INPUT>) {
chop;
($current_code, $amount) = split(/:/);
  open(LOOKUP, "$product_data") || die "cant open $product_data\n";
  while(<LOOKUP>) {
  chop;
  ($code, $name, $text, $price) = split(/:/);
  if($current_code eq $code) {
    $total = $price * $amount; $grandtotal = $grandtotal + $total;
    $padln = 20 - length($name); $pad = " " x $padln;
    print "$code $name $pad $price      <INPUT TYPE=TEXT SIZE=4 NAME=$code
 VALUE=$amount>";
    $out = sprintf("%9.2f", $total); print "    $out\n";
    close(LOOKUP);
    last; }
  }

}
close(INPUT);
$out = sprintf("\t\t\t\t\t\t  ========\n\t\t\t\t\t\t %9.2f", $grandtotal);
print "$out";
print "</PRE>\n";

print "<INPUT TYPE=HIDDEN NAME=\"order\" VALUE=\"recalc\">";
print "Change the quantity of items and :  ";
print "<INPUT TYPE=SUBMIT VALUE=\"Update Page\"><BR><BR></CENTER>\n";

print "</FORM><BR>\n";
print "Go back to product category:<BR>\n";
&print_links;

print "<FORM METHOD=POST ACTION=/cgi-bin/book/chips/chips.pl/$session_id/$current><BR>\n";
print "To place this order, input your e-mail address:<BR>
<INPUT TYPE=TEXT NAME=EMAIL>  \n";
print "<INPUT TYPE=HIDDEN NAME=\"order\" VALUE=\"place\">";
print "<INPUT TYPE=SUBMIT VALUE=\"Place Order\"><BR>\n";
print "</FORM>\n";
exit;
}

sub place {
$email = $post_query{"EMAIL"};
$email =~ s/[^\w.\@-]//g;   # keep only characters that are safe in a file name
open(OUTPUT, ">>$email.order") || die "Cant open $email order file...\n";
open(INPUT, "$order_file") || die "Cant open $order_file in sub place\n";
while(<INPUT>) {
  print(OUTPUT);
}
close(INPUT); close(OUTPUT);
print "<B>Thank you</B> for the order... a package will be
 arriving shortly<BR>\n";

exit;
}

sub recalc {
open(OUTPUT, ">$order_file") || die "cant open $order_file in sub recalc\n";
  while (($subscript, $value) = each(%post_query)) {
     if($subscript =~ m/order/) {next;}
  print(OUTPUT "$subscript:$value\n");
  }
close(OUTPUT);
&show_order;
}


sub set_session_id {
srand();
$session_id = int(rand(10000));
}

sub add_product {
open(OUTPUT, ">>$order_file") || die "cant open $order_file\n";
print(OUTPUT "$code:$amount\n");
close(OUTPUT);
}

sub decode_url {
  foreach (@_) {
  tr/+/ /;
  s/%(..)/pack("c",hex($1))/ge; }
  @_; }

sub print_links {
print "<PRE>";
print<<ENDOFLINKS;
<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/kb>Keyboards</A>            \
<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/dd>Disk Drives</A>     \
<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/cr>Cards</A>
<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/cs>Computer Systems</A>     \
<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/pr>Printers</A>        \
<A HREF=/cgi-bin/book/chips/chips.pl/$session_id/pe>Peripherals</A>
ENDOFLINKS
print "</PRE>";

}

sub debug {
print "post_query= $post_query<BR>\n";
print "action = $action<BR>\n";
print "path1 = $path1<BR>\n";
print "session_id = $session_id<BR>\n";
print "current category = $current<BR>\n";
print "code = $code<BR>\n";
exit;
}

The purpose of this example is to demonstrate a few possibilities for combining methods to maintain state that are available to the developer.

There are other ways to accomplish the same functionality, and the developer should consider the following questions when approaching such a task:

How much error-checking will be necessary when using the path_info and query_string variables?  Listing 24.5 does no checking of the path_info and query_string values. If the data being passed through these variables is difficult to validate, the developer could run into problems passing data this way.

How will the screen look to the client?  To pass data via a method=post form, it is necessary to include a type=submit button or a type=image tag. In addition, to offer different values for the same variable, it becomes necessary to have multiple <form> and </form> tags on the same screen for each set of values. The image type might not work with all browsers, and a screen can become aesthetically undesirable with multiple Submit buttons spread out over the screen.

The developer must balance the security and ease of use offered by post method forms against the compatibility and aesthetics of using or including other methods. These issues are beyond the scope of this chapter. By studying sites on-line and understanding the methods those sites use, however, the developer can choose which methods are suitable for a given task.
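As one illustration of the error checking raised above, the PATH_INFO fields that chips.pl receives could be tested before they are used to build file names or patterns. This is a sketch under assumptions: checked_path_info is a hypothetical name, the category list is taken from Listing 24.4, and the product code pattern (two letters and three digits) is inferred from the sample data rather than stated by the source.

```perl
use strict;
use warnings;

# Category codes used by the links in Listing 24.4.
my %known_category = map { $_ => 1 } qw(kb dd cr cs pr pe);

# Split PATH_INFO as chips.pl does, then keep only fields that look right.
# Anything unexpected is returned as an empty string.
sub checked_path_info {
    my ($path_info) = @_;
    $path_info = '' unless defined $path_info;
    my (undef, $session_id, $current, $code) = split(/\//, $path_info);
    foreach ($session_id, $current, $code) { $_ = '' unless defined $_; }
    $session_id = '' unless $session_id =~ /^\d{1,10}\z/;
    $current    = '' unless $known_category{$current};
    $code       = '' unless $code =~ /^[a-z]{2}\d{3}\z/;   # assumed code format
    return ($session_id, $current, $code);
}

my @fields = checked_path_info("/4711/cs/cs001");
print join(",", @fields), "\n";
```

A field that fails its test comes back empty, so the script can fall back to its starting state instead of opening a file whose name the client chose.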

Handling Netscape Cookies with CGI.pm

The Netscape Cookie achieves a de facto persistent connection between client and server by enabling a bi-directional data token, the Cookie, to be passed between them. The server can set one or more Cookies, each with a name, an arbitrary value, a path, a domain, and an expiration timestamp. The client writes the Cookie information to the local file system (in UNIX, typically in the ~/.netscape directory) and, on subsequent connections to the same server, does a comparison match. If the domain and path match, the client sends all matching Cookies back to the server. Netscape mentions that there are limits on the number of cookies the client can store simultaneously and specifies a minimum capacity of 300 total Cookies, 4KB per Cookie, and 20 Cookies per server or domain.

In practice, CGI scripts are altered slightly to add the Cookie information to HTTP headers. Netscape gives the general syntax of the Cookie data format as the following:

Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure

Explanations of this syntax follow:

NAME  The Cookie name chosen by the developer.

VALUE  The arbitrary data with which the developer fills this NAME.

expires=DATE  An optional parameter. DATE is a value that must be formatted as follows: day of the week, DD-Mon-YY HH:MM:SS GMT. No other time zones are permitted. For example:

expires=Friday, 05-Jul-96 22:14:48 GMT

domain=DOMAIN_NAME  An optional tag. If the domain is specified, only hosts in that domain are allowed to set a Cookie.

path=PATH  An optional tag; if none is specified, Netscape's specification takes the PATH to be the path of the document that set the Cookie.

secure  Also is optional; if it is specified, the Cookie is transmitted only on a secure channel (Netscape's Secure Sockets Layer (SSL); see Chapter 25).

The Cookie format is chosen purposely to resemble standard CGI name/value variable pairs.
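Because of that resemblance, the Cookies a client sends back can be parsed much like form data. CGI servers pass the incoming Cookie header to the script in the HTTP_COOKIE environment variable. The sketch below parses such a value into a hash; parse_cookies is a hypothetical helper and the sample header is invented for illustration.

```perl
use strict;
use warnings;

# Parse a Cookie request header value ("name=value; name2=value2")
# into a hash.
sub parse_cookies {
    my ($header) = @_;
    my %cookie;
    return %cookie unless defined $header;
    foreach my $pair (split(/;\s*/, $header)) {
        my ($name, $value) = split(/=/, $pair, 2);
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo %XX escapes
        $cookie{$name} = $value;
    }
    return %cookie;
}

# Hypothetical header, as a browser might send it back.
my %c = parse_cookies("grandmasters=Geller%20Lein; theme=dark");
print "$c{grandmasters}\n";
```

In a running script, the argument would be $ENV{HTTP_COOKIE} rather than a literal string.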

The general scheme is straightforward: The server sets the Cookie, and the client writes the Cookie on the client file system. On subsequent connections, the client first looks at the domain starting at the right and proceeding toward the left. Domains ending with COM, EDU, NET, ORG, GOV, MIL, or INT require only two periods in the domain name (for example, www.sec.gov); others require three. If the domain matches, the client then examines the path (if specified in the Cookie). If both the domain and path match, this is a signal for the client to send back the Cookie information to the server, where a CGI process can perform logical checks. If multiple Cookies match, they are sent back to the server with the more specific path matches sent first and the more general sent last. A Cookie with a path match of / (the document root) is sent after a Cookie with a path match of /cgi-bin. If the server wants to explicitly delete a client Cookie altogether, it just needs to send a Cookie with an expiration tag that is prior to the current date and time.
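The header a script must emit to set, and later delete, a Cookie can be sketched without any module. The helper names cookie_expires and set_cookie_header are hypothetical; the date layout follows the expires example given earlier, with the weekday spelled out.

```perl
use strict;
use warnings;

my @DAYS = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);
my @MONS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time the way the expires example is written.
sub cookie_expires {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf("%s, %02d-%s-%02d %02d:%02d:%02d GMT",
                   $DAYS[$wday], $mday, $MONS[$mon], $year % 100,
                   $hour, $min, $sec);
}

# Build a Set-Cookie header line; $path and $expires_epoch are optional.
sub set_cookie_header {
    my ($name, $value, $path, $expires_epoch) = @_;
    my $header = "Set-Cookie: $name=$value";
    $header .= "; expires=" . cookie_expires($expires_epoch)
        if defined $expires_epoch;
    $header .= "; path=$path" if defined $path;
    return $header;
}

# Keep a cookie for two hours, then ask the client to delete it by
# sending an expiration time in the past.
print set_cookie_header("grandmasters", "Geller", "/cgi-bin", time() + 7200), "\n";
print set_cookie_header("grandmasters", "", "/cgi-bin", time() - 86400), "\n";
```

These lines belong among the HTTP headers, before the blank line that ends the header block.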

The ideas in this discussion will become clearer with an example. Fortunately, Perl developers can use a Perl 5 module, CGI.pm, to facilitate Netscape Cookie handling.

Cookie Application Using CGI.pm

The sample Cookie application is a simple guessing game. The client is supposed to guess which chess Grandmaster comes from Odessa. If the guess is incorrect, the bad guesses pile up on the right. The correct guess takes the user to a different page.

The use of Cookies comes into play to preserve the bad guesses for a certain amount of time (the expires tag), even if the client logs off altogether and comes back to the game later. Figure 24.5 shows the game board after a few incorrect guesses.

Figure 24.5 : The user hasn't guessed the right Grandmaster yet, and the bad guesses are preserved between sessions for two hours.

Notice the Cookie, which is echoed to the screen for education purposes in Figure 24.5. It follows the formatting guidelines discussed in the prior section. As users pile up bad guesses, the Cookie value continues to grow.

The users keep guessing until they pick the right entry: Lev Alburt (I didn't say this was an easy game!). In that case, the winning screen appears, as shown in Figure 24.6.

Figure 24.6 : The script uses the Location header to redirect the client to the winning page after a lucky guess.

Now look at the Cookie.pl code, shown in Listing 24.6, which is written in Perl 5. Using Perl 5, I can take advantage of the flexible CGI.pm module to call handy functions to set and get Cookies.


Listing 24.6. The Cookie.pl code.
#!/usr/local/bin/perl
#
#  Use Lincoln Stein's CGI.pm Version 2.21
#  Example of Netscape Cookies using CGI.pm
#####################################################

use CGI qw(:standard);

@GMS=('Alburt', 'Kupreichik', 'Geller', 'Chernin', 'Lein',
      'Fedorowicz', 'Kamsky');


@old_guesses = cookie('grandmasters');  # this retrieves cookie info from client

# Get the new Grandmaster guess from the form
$new_guess = param('new_gm');
#
# If the action is 'Guess', then check the guess to see if it's a winner.
# If it's not, check it to see if it has been guessed already.  If it's a new
# guess, push it onto the old_guesses pile.
#
# If the user instead clicks on Clear Guesses, wipe out the old_guesses pile.
#
if (param('action') eq 'Guess') {
    if ($new_guess eq "Alburt") {
       print "Location: /winner.html\n\n";
       exit 0; }  # winner is redirected to a winner page and game ends.
    $msg = "Thank you for guess $new_guess but unfortunately this gentleman does not come from Odessa...";

    $gmlist = join(' ',@old_guesses);  # could also do this in a subroutine.

    if ($gmlist =~ /\b\Q$new_guess\E\b/) {
        $msg = "You have already guessed $new_guess"; }
    else {  
        push(@old_guesses,$new_guess);  }  # push guess onto list
}  

elsif (param('action') eq 'Clear Guesses') {
   @old_guesses=();  }  # wipe out old guesses

$old_guesses = join(' ',@old_guesses);

# Add new grandmaster guess to the list of old ones, and put them in a cookie
$the_cookie = cookie(-name=>'grandmasters',
                     -value=>$old_guesses,  # store a string of guesses
                     -path=>'/cgi-bin',
                     -expires=>'+2h');  # shorthand for "2 hours from now"

# Print the header, incorporating the cookie and the expiration date...
print header(-cookie=>$the_cookie);

# Now we're ready to create our HTML page.
print start_html('Guess the Grandmaster');

$the_cookie =~ s/%(..)/pack("c",hex($1))/ge;  # unescape encodings

print "Cookie ===> <b> $the_cookie </b> <hr>";  # show cookie for demo
print "$msg ";                                  # show guess status

#
# Now show main game board
#
print <<EOF;
<h1>Guess the Grandmaster</h1>
Guess which Grandmaster comes from Odessa, and click 'Guess' or
click 'Clear Guesses' to wipe out the old guesses.  The
original Grandmaster guesses will be kept for 2 hours.


<p>
<em>You must be running Netscape browser for this to work. </em>
<p>
<center>
<table border>
<tr><th>Guess<th>Old Guesses
EOF
    ;

print "<tr><td>",start_form;
print scrolling_list(-name=>'new_gm',
             -values=>[@GMS],
             -size=>8),"<br>";

print submit(-name=>'action',-value=>'Guess'),
      submit(-name=>'action',-value=>'Clear Guesses');

print end_form;

print "<td>";
if (@old_guesses) {
    foreach $i (0 .. $#old_guesses) {print "$old_guesses[$i] <br>";
    }
} else {
    print "<strong>no gms guessed yet</strong>\n";
}
print "</table></center>";
print end_html;
exit 0;

Code Discussion: Cookie.pl

The object-oriented look of this program is quite a change from the Perl 4.036 programs you saw in Chapters 19 through 22. The Cookie-handling facilities presented here, however, are actually quite simple once you get comfortable reading the code. By the way, Lincoln Stein makes available a full discussion of the CGI.pm syntax at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/.

You will notice logic in this code to avoid stacking up duplicate guesses (which is important when state persists, because the client can easily lose track of prior guesses across sessions). The main output logic simply uses a two-column Netscape table: the left column for the universe of possible guesses and the right to keep track of the incorrect guesses.

The following table expands on the important Cookie-handling statements in Cookie.pl.

Cookie Code    What It Does

@old_guesses = cookie('grandmasters');
This line retrieves Cookie information from the client (if any) and assigns the Cookie value to the array @old_guesses. Note that I could just as well have assigned the Cookie value to a scalar or an associative array, which is quite flexible.

$the_cookie = cookie(-name=>'grandmasters',
       -value=>$old_guesses,
       -path=>'/cgi-bin',
       -expires=>'+2h');
I assign the new Cookie to a scalar variable. By this time, the variable $old_guesses already incorporates the client's new guess. This prepares me to send the updated Cookie to the client. The shorthand +2h writes a Cookie expiration timestamp of 2 hours from now in the peculiar GMT format style required by Netscape (as discussed previously).

print header(-cookie=>$the_cookie);
This statement writes the HTTP header extension that sends the Cookie to the client, which stores it on its file system. The updated guess list is passed to the client, and the next time the client establishes a connection to this server and this path (/cgi-bin), this Cookie (and all other matching Cookies) is sent to the server. If too much time elapses before the reconnection, though, the expiration time is reached and the Cookie expires.
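To make the +2h shorthand concrete, the following sketch computes an expiration timestamp by hand. It is not CGI.pm's own code: the subroutine name cookie_expires and its optional second argument are made up for illustration, and the layout (Wdy, DD-Mon-YYYY HH:MM:SS GMT) is my reading of the Netscape Cookie format.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CGI.pm: build a Cookie expiration
# timestamp $offset seconds after $now (defaults to the current time).
sub cookie_expires {
    my ($offset, $now) = @_;
    $now = time unless defined $now;

    my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    # gmtime returns the broken-down time in GMT, which is what Cookies use
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($now + $offset);

    return sprintf("%s, %02d-%s-%04d %02d:%02d:%02d GMT",
                   $days[$wday], $mday, $months[$mon], $year + 1900,
                   $hour, $min, $sec);
}

# Two hours from now, the equivalent of -expires=>'+2h'
print cookie_expires(2 * 60 * 60), "\n";
```

Passing an explicit start time as the second argument makes the result repeatable, which is handy when checking the format.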

An interesting final note: Netscape Cookies do not permit white space, commas, or semicolons in the Cookie value. CGI.pm handles this, however, by automatically encoding impermissible characters within the Cookie value. This is another handy feature of an excellent module. Netscape does not specify a particular encoding mechanism; the usual choice is the standard URL-encoding scheme that you saw in Chapters 19 and 20.
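As a rough illustration of that URL-encoding scheme, here is a minimal sketch of encoding and decoding a Cookie value. The subroutine names encode_cookie_value and decode_cookie_value are hypothetical, and the set of characters left unencoded is my own choice; CGI.pm's internal routine may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: URL-encode a Cookie value, replacing every
# character except letters, digits, and a few safe marks with %XX.
sub encode_cookie_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9_.\-])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Hypothetical helper: turn each %XX back into the original character.
sub decode_cookie_value {
    my ($value) = @_;
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

my $raw     = "Geller Lein; Kamsky, Chernin";
my $encoded = encode_cookie_value($raw);
print "$encoded\n";                         # no spaces, commas, or semicolons
print decode_cookie_value($encoded), "\n";  # the original string again
```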

Generating Graphics at Runtime

The Common Gateway Interface makes it possible for the developer to write scripts that create new graphic images on-the-fly, at the time a client makes a request. Dynamic graphic manipulation is one of the most eye-catching classes of Web programming applications and is a testament to the flexible and extensible nature of the base HTTP protocol, properties stressed in Chapter 19, "Principles of Gateway Programming."

Most readers probably are familiar with page access counters and graphs of Web site statistics. Several Plug-and-Play types of packages are available to perform these functions. I illustrate two techniques to aid the developer in creating similar applications from scratch. (See note)

In addition to the fixed type of images, it is possible to create just about any type of image-either from preexisting files or wholly from scratch. I demonstrate the use of two well-known, freely available packages, NetPBM and gd1.1.1, and follow up with GD.pm, a Perl 5 module that allows you to execute Perl methods against the native gd libraries.

Before embarking on code samples, it's important to review the most important graphics file formats. Knowledge of the basic properties of these formats can come in very handy for web developers; even those who think they are stuck in a text-only Web site sooner or later are likely to be involved in a design effort involving graphics.

Image File Formats

A large number of graphics file formats is available to computer users. (See note) The GIF format is the standard format that graphical browsers accept for inline images. The JPEG format also is commonly recognized for inline images, and the other two formats described here might be of interest to the developer:

GIF  All graphical Web browsers support the Graphics Interchange Format for inline images. (See note) This format was developed by CompuServe and uses the LZW compression algorithm. GIF images support only 8-bit color; they are limited to 256 colors. The 1989 version of the format introduced multimedia extensions, which have largely been ignored, with two exceptions: transparency and animation.

Note
"How do I make my images transparent?" This same question seems to be posted to every Usenet newsgroup in the comp.infosystems.www.* hierarchy on a daily basis. The GIF89 specification allows the image to have one color defined as transparent, meaning that the color will appear as the same color as the background on which the image is displayed. Of course, if the image is composed of many different colors, there might be no suitable color to relegate to background status, and transparency then would be ineffective. A number of tools are available for most platforms to convert a plain GIF to a transparent GIF. (See note)

Unisys holds a patent on the LZW compression algorithm and recently decided to assert it. (See note) Any commercial software created or modified after January 1, 1995 is subject to this patent. You can find more information on this at http://www.unisys.com/.

JPEG  Newer releases of popular browsers such as Netscape now support inline display of the Joint Photographic Experts Group's JPEG format. (See note) JPEGs support up to 24-bit color (16.8 million colors) and are compressed. The amount of compression can be varied to produce files with smaller size and poorer image quality, or vice versa. Unlike GIF, the JPEG format has no extension that allows for transparency.

Two utilities that a developer will find handy, which are not included in the NetPBM package, are cjpeg and djpeg; these convert images to and from the JPEG format. (See note) These utilities function much like the NetPBM utilities. For example,

djpeg -colors 255 fish.jpg > fish.pnm

dumps the JPEG file to PNM format, reducing the number of colors to 255 in the process.

PNM (Portable Anymap), PPM (Portable Pix Map), PBM (Portable Bit Map), and so on  The developer will encounter these formats when using the NetPBM package. (See note) For the most part, these formats are interchangeable when using NetPBM utilities, with the exception of the monochrome PBM format. PBM files can't necessarily be mixed with the other formats, because PBM files are only monochrome, whereas the others are not. In this chapter's NetPBM section, I give examples of using these formats.

PNG  The Portable Network Graphics format is a newly proposed format currently under development as a replacement for GIF, partly in response to the Unisys patent claims and partly to overcome some of the limitations of GIFs. The specification is at release 10, considered stable, and code already is appearing to display and manipulate PNG images. (See note) After popular browsers such as Netscape and Mosaic support inline PNG format images, expect that a NetPBM utility program will appear. As for gd, I asked Tom Boutell whether he plans to make a PNG implementation, and he responded with the following:

Hello!

Yes, I do plan to write a version of gd (or something gd-like) that supports PNG as well as GIF. It'll take some doing, because gd is centered around the notion of palette-based images, and PNG supports both palette and truecolor images, but it'll happen... -T

Access Counters

The astute Web surfer might have noticed that pages with access counters embedded within the page are plain HTML; that is, the URL is not a program that generates HTML at the time the client makes the request. So, how does the new image get created? The following script illustrates this simple "trick": using the tag <IMG SRC=[executable]>. In this example, I have an HTML page with the URL http://some.machine/today_in_chess.html:

<HTML><TITLE>Today in Chess</TITLE>
<CENTER><H1>Today in Chess</H1></CENTER>
This Day in Chess...
<img src = /cgi-bin/random.pl><BR>
</HTML>

The <IMG SRC=/cgi-bin/random.pl> tag executes the script shown in Listing 24.7 when the client retrieves the URL.

In this script, I have a series of recently created images of chess positions residing in a file directory, and the user sees a different, randomly selected image each time the page is loaded. (This technique even works with the enhanced Netscape body background tag-for example, <body background="/cgi-bin/random.pl">, which can lead to amusing displays.)


Listing 24.7. Displaying a random image with random.pl.
#!/usr/local/bin/perl
# random.pl
# display a random image from /web/clients/icons/icc/temp

$date = `date`;
chop($date);                 # remove the trailing newline
$image_dir = "/icons/icc/temp";
$doc_root = "/web";
@files = `ls $doc_root$image_dir`;
srand();

if(@files == 0) {
  print "Content-type: text/html\n\n";
  print "<B>Error - no files found</B><P>";
  exit; }

$size = @files;
$file_number = int(rand($size));
$printname = $files[$file_number];
chop($printname);            # remove the newline left by ls

print(STDOUT "Date: $date\n");
print(STDOUT "Last-Modified: $date\n");

print(STDOUT "Content-type: image/gif\n\n");
$data = `cat $doc_root$image_dir/$printname`;
print("$data");
exit;

Note
You can use the last few lines of Listing 24.7 to send a different type of media back to the user. For example,
print(STDOUT "Content-type: audio/wav\n\n");
$data = `cat $image_dir/$printname`;
print("$data");
will work if the script pointed to a library of .wav files and the client is configured to play .wav files. The Perl script, however, cannot be embedded within regular HTML; <img src=/[executable]> won't work, and there is no equivalent <audio src> or other MIME-type tag.
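Listing 24.7 leans on the external ls and cat commands. The same three jobs (list the files, pick one at random, read its bytes) can also be done inside Perl 5 itself. The following is only a sketch under that assumption; the subroutine names list_gifs, pick_random, and slurp_binary are hypothetical and do not appear in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the GIF files in $dir,
# using opendir/readdir instead of running ls in a subshell.
sub list_gifs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return ();
    my @gifs = sort grep { /\.gif$/i && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return @gifs;
}

# Hypothetical helper: pick one element of a list at random.
sub pick_random {
    my (@items) = @_;
    return undef unless @items;
    return $items[int(rand(@items))];
}

# Hypothetical helper: read a whole file as raw bytes.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "can't open $path: $!";
    binmode($fh);
    local $/;    # undefine the record separator to read everything at once
    my $data = <$fh>;
    close($fh);
    return $data;
}
```

A CGI script built on these helpers would print the Content-type header and then the value returned by slurp_binary for the file chosen by pick_random(list_gifs($dir)).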

Now I'll use this technique to create an access-counter application. This script reads a file, .access_count, which contains the current number of hits for the page referencing the script. The HTML page references the script by including a tag such as <img src = /cgi-bin/access_count.pl>, in the same way as the random.pl tag shown previously. In the directory in which the script executes are separate image files (in PNM format) for each digit, which are used to create the completed image.

Upon execution, the following steps are performed by the script:

  1. The current count is read and increased by 1.
  2. The new number is split into an array of digits.
  3. A loop is used to create command-line input from the array consisting of the file names of the appropriate digits.
  4. The new image is constructed and sent back to the client.

Note that this script uses several utilities in the NetPBM package, which is described later in this chapter. Listing 24.8 shows the access-counter application.


Listing 24.8. Using the NetPBM utilities.
#!/usr/local/bin/perl
# access_count.pl

# NEED to CLEAN UP PATHNAMES

$counter_file = ".access_count";
$pnm_file = "access_count.pnm";
$gif_file1 = "temp1.gif";
$gif_file2 = "temp2.gif";

$date = `date`;
chop($date);

# read the current count and write back the incremented value
$total = `cat $counter_file`;
$total++;
open(OUTPUT, ">$counter_file") || die "can't open $counter_file\n";
print(OUTPUT "$total");
close(OUTPUT);

# build the pnmcat command line: one digit image per character
@chars=split(//, $total);
$number = @chars;
$counter = 0;
while($counter < $number)
{ $cat = $cat." $chars[$counter].pnm";
  $counter++; }
$cat = "pnmcat -white -lr ".$cat;

system("rm -f $pnm_file $gif_file1 $gif_file2");
system("$cat | pnmcrop | ppmtogif > $gif_file1");
system("interlace $gif_file1 $gif_file2");
system("cp $gif_file2 /web/clients/icons/ebt/");

print(STDOUT "Date: $date\n");
print(STDOUT "Last-Modified: $date\n");
print(STDOUT "Content-type: image/gif\n\n");
$data = `cat /web/clients/icons/ebt/$gif_file2`;

print("$data");
exit;

By creating his own access counter, the developer gains flexibility in how the count is presented.
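One detail worth considering in a home-grown counter is what happens when two clients request the page at the same moment: both copies of the script could read the same count. The following sketch shows one way step 1 (read the count and increase it by 1) might be protected with Perl 5's flock. The subroutine name increment_counter is hypothetical, and this is an alternative to the cat-based code in Listing 24.8, not a description of it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hypothetical helper: add 1 to the number stored in $file and return
# the new total, holding an exclusive lock for the whole read-modify-write.
sub increment_counter {
    my ($file) = @_;

    # open for reading and appending, creating the file if it is missing
    open(my $fh, '+>>', $file) or die "can't open $file: $!";
    flock($fh, LOCK_EX)        or die "can't lock $file: $!";

    # read the old count from the start of the file (a missing count is 0)
    seek($fh, 0, SEEK_SET);
    my $line  = <$fh>;
    my $total = (defined $line && $line =~ /^(\d+)/) ? $1 + 1 : 1;

    # empty the file and write the new total
    seek($fh, 0, SEEK_SET);
    truncate($fh, 0);
    print $fh $total;

    close($fh);    # closing the handle also releases the lock
    return $total;
}
```

The returned total can then be split into digits exactly as in Listing 24.8.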

Gnuplot and Server Stats

Access graphs are another well-known type of on-the-fly graphic with which most readers are familiar. Gnuplot(See note) is a popular package for creating graphs and is available for a variety of platforms. This well-documented program can accept instructions from a file supplied on the command line and can output images in PPM format. The ppmtogif utility then is used to convert the file to GIF format for display to the client.

The script shown in Listing 24.9 reads the server's access log and produces a graph of bytes transmitted by hour for the current date.


Listing 24.9. Graphing the access log using gnuplot and the NetPBM utilities.
#!/usr/local/bin/perl
# chart.pl
# Produce a chart of current day's access in bytes
# from NCSA http access_log

$log_file = "/web/httpd/logs/access_log";
$pid = $$;
$today_log = "today.$pid.log";
$plot_data = "today.$pid.plot.data";
$gnu_file = "today.$pid.plot";
$ppm_file = "today.$pid.ppm";
$gif_file = "today.$pid.gif";


($dowk, $month, $day) = split(/\s+/, `date`);

system("grep \"$day/$month\" $log_file > $today_log");

open(INPUT, "$today_log") || die "can't open $today_log";
open(OUTPUT, ">$plot_data") || die "can't open $plot_data";
$hour_bytes = 0;
$current_hour = 0;
while(<INPUT>)
{
chop;

$test_byte_size = substr($_, -1);
if($test_byte_size eq " ") {next;}

($rhost, $ruser, $userid, $dtstamp, $junk1,
$action, $filename, $version, $result, $bytes) = split(/\s/, $_);

@dfields = split(/\:/, $dtstamp);
$hour = int($dfields[1]);
$hour_bytes = $hour_bytes + $bytes;

if ($hour != $current_hour)
{ $hour_bytes = $hour_bytes - $bytes;
  print(OUTPUT "$current_hour $hour_bytes\n");
  $hour_bytes = $bytes;
  $current_hour = $current_hour + 1;
}

}
print(OUTPUT "$current_hour $hour_bytes\n");
close(INPUT);
close(OUTPUT);
open(OUTPUT, ">$gnu_file") || die "couldn't open $gnu_file";

#NOTE: gnuplot expects "pbm", even though it actually writes out a PPM file
print(OUTPUT "set term pbm small color\n");

# the default size 1, 1 produces a 640x480 size chart...
print(OUTPUT "set size 0.72, 0.54\n");
print(OUTPUT "set output \"$ppm_file\" \n");
print(OUTPUT "set title \"Hourly Bytes Transmitted for $month $day\" \n");
print(OUTPUT "set grid\n");
print(OUTPUT "plot \"today.$pid.plot.data\" using 2 with boxes\n");
close(OUTPUT);

system("rm -f today.$pid.ppm");
system("gnuplot today.$pid.plot");
system("rm -f /web/clients/icons/hydra/today.$pid.gif");
system("ppmtogif today.$pid.ppm > /web/clients/icons/hydra/today.$pid.gif");


print "Content-type: text/html\n\n";
print "<TITLE>Today's Byte Count</TITLE>";
print "<img src=http://www.hydra.com/icons/hydra/today.$pid.gif>";

exit;

Run this script, and a graph of the type shown in Figure 24.7 is sent to the client.

Figure 24.7 : A sample graph created by chart.pl.

Listing 24.9 easily could be customized to accept user queries-for example, "Give me all data for a particular domain," "all data for a certain file directory," and so on. Gnuplot provides the Web master with a powerful and flexible method of quickly producing runtime charts.
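As a sketch of the first of those queries, the subroutine below totals bytes per hour for a single domain. Everything about it is illustrative: the name hourly_bytes_for_domain is hypothetical, the log lines are assumed to follow the NCSA common log format read by Listing 24.9, and the domain would in practice come from a form field.

```perl
use strict;
use warnings;

# Hypothetical helper: total the bytes sent per hour for the log lines
# whose remote host ends with $domain. Returns a 24-element list,
# one total per hour of the day.
sub hourly_bytes_for_domain {
    my ($domain, @lines) = @_;
    my @bytes_by_hour = (0) x 24;

    foreach my $line (@lines) {
        # host ident user [dd/Mon/yyyy:hh:mm:ss zone] "request" status bytes
        next unless $line =~
            m{^(\S+) \S+ \S+ \[[^:\]]+:(\d\d):[^\]]*\] "[^"]*" \d+ (\d+)\s*$};
        my ($host, $hour, $bytes) = ($1, $2, $3);

        # keep only hosts in the requested domain
        next unless $host =~ /\Q$domain\E$/i;
        $bytes_by_hour[$hour] += $bytes;
    }
    return @bytes_by_hour;
}
```

Lines whose byte field is a dash (no body sent) do not match the pattern and are skipped. The resulting list could be written out as "hour bytes" pairs for gnuplot in the same way Listing 24.9 writes its plot data.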

NetPBM

The NetPBM package has become a standard tool for web developers. Originally available as PBMPlus and then subsequently enhanced by the Usenet community, NetPBM contains a huge collection of utility programs for converting and manipulating images. Most of the utilities read from stdin and write to stdout. In addition to one-step tasks, as in Listing 24.9, these utilities are well suited for tasks that require several steps.

An exhaustive study of each utility is not necessary; the package includes a comprehensive collection of man pages. In general, you need to perform two to three steps:

  1. Convert the image to portable format (PNM, PPM, PBM, and so on).
  2. Manipulate the image if necessary or desired.
  3. Convert the image back to a format suitable for Web display (usually GIF).

Converting a BMP formatted file to GIF, for example, can be accomplished with the following:

bmptoppm letter_a.bmp | ppmtogif > a_1.gif

Now I'll do a few manipulations with the image before outputting to GIF:

bmptoppm letter_a.bmp | pnminvert | ppmtogif > a_2.gif
bmptoppm letter_a.bmp | pnmrotate 45 | ppmtogif > a_3.gif
bmptoppm letter_a.bmp | pnmscale -xsize 30 -ysize 25 | ppmtogif >\
a_4.gif
bmptoppm letter_a.bmp | pnmenlarge 2 | ppmtogif >a_5.gif
bmptoppm letter_a.bmp | pnmcrop|pnmenlarge  2|pnmsmooth|pnmsmooth|\
pnmsmooth|ppmtogif>a_6.gif

This series of commands, performed on a BMP image of the letter A, produces the output shown in Figure 24.8.

Figure 24.8 : Output produced by the series of netpbm programs.
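When such pipelines are built inside a CGI script, the convert, manipulate, convert-back pattern can be captured in a small helper. The sketch below only assembles the command string; the subroutine name netpbm_pipeline is hypothetical, and a real script would still need to validate any user-supplied values before handing the string to the shell.

```perl
use strict;
use warnings;

# Hypothetical helper: assemble a NetPBM shell pipeline from its stages.
# $to_portable and $to_web are complete commands; $filters is a reference
# to a list of zero or more manipulation commands that run in between.
sub netpbm_pipeline {
    my ($to_portable, $filters, $to_web, $out_file) = @_;
    my @stages = ($to_portable, @$filters, $to_web);
    return join(' | ', @stages) . " > $out_file";
}

# Reproduces the rotate example shown above
print netpbm_pipeline('bmptoppm letter_a.bmp', ['pnmrotate 45'],
                      'ppmtogif', 'a_3.gif'), "\n";
```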

Tip
If at first you can't find the proper utility program, keep looking. After you know what you want to do with an image file, there's probably a way to do it with some combination of NetPBM programs. And, unlike most UNIX programs, the NetPBM utilities all have file names that actually indicate what function the program performs.

The majority of NetPBM utilities convert images to and from a NetPBM format. The remaining utilities, which manipulate images, are what make NetPBM a standard tool for web developers. To help the developer find the right utility, Table 24.1 shows a rough categorization of those utilities according to function.

Table 24.1. NetPBM utilities.

Operation        Utility Programs
Size             pbmpscale, pbmreduce, pnmenlarge, pnmscale
Orientation      pnmflip, pnmrotate
Cut and Paste    pbmmask, pnmarith, pnmcat, pnmcomp, pnmcrop, pnmcut, pnmmargin, pnmnlfilt, pnmpad, pnmshear, pnmtile, ppmmix, ppmshift, ppmspread
Color            pnmalias, pnmconvol, pnmdepth, pnmgamma, pnminvert, pnmsmooth, ppmbrighten, ppmchange, ppmdim, ppmdist, ppmdither, ppmflash, ppmnorm, ppmquant, ppmquantall, ppmqvga
Information      pnmfile, pnmhistmap, pnmindex, ppmhist
File Creation    pbmmake, pbmtext, pbmupc, ppmmake, ppmntsc, ppmpat
Miscellaneous    pbmclean, pbmlife, pnmnoraw, ppm3d, ppmforge, ppmrelief

An HTML Form to Make Buttons

To further demonstrate the use of NetPBM utilities, Listing 24.10 shows an HTML form and Perl script that allow the user to create customized buttons. First, a METHOD=POST form is displayed.


Listing 24.10. Creating customized form buttons with the NetPBM utilities.
<HTML>
<TITLE>Make Buttons</TITLE>
<FORM METHOD=POST ACTION=make_button.pl>
1. Select a button <I>type</I>:<BR>
<PRE><CENTER><INPUT NAME=TYPE TYPE=RADIO VALUE="arrow" CHECKED><IMG SRC=/icons/buttons/arrow.gif>   <INPUT TYPE=RADIO NAME=TYPE VALUE="circle"><IMG SRC=/icons/buttons/circle.gif>   <INPUT TYPE=RADIO NAME=TYPE VALUE="rectang"><IMG SRC=/icons/buttons/rectang.gif>   <INPUT TYPE=RADIO NAME=TYPE VALUE="sq_in"><IMG SRC=/icons/buttons/sq_in.gif>   <INPUT TYPE=RADIO NAME=TYPE VALUE="sq_out"><IMG SRC=/icons/buttons/sq_out.gif>
</CENTER></PRE>

2. <I>Rotation</I> (clockwise):<PRE><center><INPUT NAME=ORIENT VALUE="0" TYPE=RADIO CHECKED>As Is
<INPUT NAME=ORIENT TYPE=RADIO VALUE="90">Left       <INPUT NAME=ORIENT TYPE=RADIO VALUE="270">Right
<INPUT NAME=ORIENT TYPE=RADIO VALUE="180">Upside Down
</CENTER></PRE>

3. <I>Text</I>:<CENTER><INPUT NAME=TEXT TYPE=TEXT SIZE=10 MAXLENGTH=10><BR>
<INPUT TYPE=submit VALUE="Make Button!">
</CENTER>
</FORM></HTML>

This HTML form displays the screen shown in Figure 24.9.

Figure 24.9 : The selection screen for make_button.html.

After the user enters a selection, the associated Perl script runs, as shown in Listing 24.11.


Listing 24.11. Creating the custom buttons.
#!/usr/local/bin/perl
# Make a button from make_button.html form input

$in_path="/web/icons/buttons/";
$out_path = "/web/icons/buttons/new/";
$pid = $$;

print "Content-type: text/html\n\n";
read(STDIN, $input, $ENV{'CONTENT_LENGTH'});

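($field1, $field2, $field3)  = split(/\&/, $input);
($junk, $filename) = split(/=/, $field1);
($junk, $rotate) = split(/=/, $field2);
($junk, $text) = split(/=/, $field3);

# these values end up on a shell command line, so strip anything unexpected
$filename =~ s/[^a-z_]//g;
$rotate =~ s/[^0-9]//g;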

$text =~ tr/+/ /;
$text =~ s/%(..)/pack("c",hex($1))/ge;

$in_file = $in_path.$filename.".gif";
$button_file = $pid.".$filename".".pnm";

if($rotate != 0)
  { system("giftopnm $in_file | pnmflip -r$rotate > $button_file"); }
else
  { system("giftopnm $in_file > $button_file"); }

$text_file = $pid."text".".pnm";
$out_file = $pid.".gif";
$write_name = $out_path.$out_file;
$text_pbm = $pid.".pbm";

%sizes = ("arrow", "48x57", "circle", "58x58", "rectang", "30x60",
 "sq_in", "58x58", "sq_out", "58x58");
if(($rotate == 90) || ($rotate == 270))
  { ($ys, $xs) = split(/x/, $sizes{$filename}); }
else
  { ($xs, $ys) = split(/x/, $sizes{$filename}); }


$text =~ s/[^a-zA-Z0-9 ]//g;   # keep only letters, digits, and spaces
if($text ne "")
{system("pbmtext \"$text\" | pnmcrop -white | pnmpad -white -t3 -b3 -l3 -r3 | pnminvert > $text_pbm");
 system("anytopnm $text_pbm | pnmscale -xsize $xs -ysize $ys > $text_file");
 system("pnmarith -a $text_file $button_file | ppmtogif > $write_name");
}
else
{ system("ppmtogif $button_file > $write_name"); }

print "<CENTER>Here's your new Button:<BR><BR>\n";
print "<IMG SRC=/icons/buttons/new/$out_file></CENTER>\n";
exit;
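Listing 24.11 relies on the three form fields arriving in a fixed order. A slightly more defensive approach is to decode the POST data into an associative array keyed by field name. The following is a minimal sketch of that idea; the subroutine name parse_form is hypothetical, and the field names TYPE, ORIENT, and TEXT are the ones used in Listing 24.10.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a URL-encoded form body into a hash keyed
# by field name, so the script no longer depends on field order.
sub parse_form {
    my ($input) = @_;
    my %form;
    foreach my $pair (split(/&/, $input)) {
        next unless length $pair;
        my ($name, $value) = split(/=/, $pair, 2);
        $value = '' unless defined $value;
        foreach ($name, $value) {
            tr/+/ /;                               # plus signs are spaces
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX is a hex-encoded byte
        }
        $form{$name} = $value;
    }
    return %form;
}

my %form = parse_form('TYPE=arrow&ORIENT=90&TEXT=Go+Now%21');
print "$form{TYPE} $form{ORIENT} $form{TEXT}\n";
```

The decoded values still need the same character filtering as before if they are passed to NetPBM commands through the shell.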

The NetPBM package is a fairly comprehensive set of tools with which any web developer using on-the-fly graphics should become familiar. It is particularly useful when working with preexisting graphics files. For more complex operations, turn your attention to Thomas Boutell's gd library of C functions.

gd1.1.1

The gd library of C functions, developed by Thomas Boutell, picks up where NetPBM leaves off, giving the developer much finer control over graphics output.(See note) This package was designed specifically for creating on-the-fly GIFs. In addition to providing effects that are unavailable or difficult to achieve with NetPBM utilities, a single gd program executes faster than a long series of NetPBM utilities piping data to each other for complicated operations.

Although the developer needs to understand a bit of C, the documentation examples are easy to follow, and you can refer to any basic C book to fill in the blanks.(See note)

I will start off with a simple application, Fishpaper, which draws a fish tank filled with randomly placed fish. This would be a simple series of pasting operations except that I want to overlay irregularly shaped objects on top of each other without erasing or blocking out any of the underlying image. Although this might be possible with NetPBM tools, it wouldn't occur to me to even attempt it because this is a simple job with gd.

First, a Perl script is used to generate command-line arguments for the gd program and to execute the gd program, as shown in Listing 24.12.


Listing 24.12. fishpaper.pl constructs fish images using the gd package.
#!/usr/local/bin/perl
# fishpaper.pl
# randomly constructs fish image from a directory of transparent gifs

$iconpath = "/icons/fish/temp";

@files = ("seahorse.gif", "squid.gif", "anchovy.gif", "fishcor.gif",
 "bluefin.gif", "octopus.gif", "perch.gif", "sailfish.gif");
srand();
$pid = $$;
$out_file = "$pid.gif";
$command_line ="$out_file ";

foreach $filename(@files)
{  
   #$filename =~ chop($filename);
   $number_of_fish = rand(3);
   while($number_of_fish > 0)
   { $x = rand(550);  $x = $x + 50;
     $y = rand(190);  $y = $y + 50;
     $parameter = sprintf("%03d%03d%s", $x, $y, $filename);
     $command_line = $command_line." ".$parameter;
     $number_of_fish--;
   }
}

system("./fish $command_line");

print"Content-type:  text/html\n\n";
print"<TITLE>FishPaper!</TITLE>\n";
print"<IMG SRC=$iconpath/$out_file>\n";
print"<BR>\n\n";
print"<FORM ACTION=/cgi-bin/book/fishpaper.pl>\n";
print"<CENTER>";
print"<INPUT TYPE=submit VALUE=\"Make the fish move\">";
print"</CENTER></FORM><BR>\n\n";
exit;

After executing the Perl script, an image such as Figure 24.10 is sent back to the client.

Figure 24.10 : An image generated by fishpaper.pl.

The C source uses gd's brush facility to draw the fish in the tank. Listing 24.13 shows the implementation.


Listing 24.13. The fish.c code.
/* fish.c */
#include "gd.h"
#include <stdio.h>
#include <string.h>

gdImagePtr tank;
gdImagePtr fish;
int x, y, white;

char outfile[15];
char fishstring[32];   /* holds "xxxyyy" plus a file name */

char *return_code;
char current_fish[50];
char new_fish[13];     /* 12 characters plus the terminating null */
char back[] = {"underwat.gif"};

char path[] = {"/web/icons/fish/"};
char outpath[] = {"/web/icons/fish/temp/"};
FILE *in;
FILE *out;

main(argc, argv)
int argc;
char *argv[];
{

int fish_counter;
int fish_number;

if (argc < 3)
  { printf("Wrong number of arguments!\n");
    printf("argc=%d\n", argc);
    return(1);
  }

return_code = strcpy(outfile, argv[1]);
fish_counter = argc - 2;


in = fopen(back, "rb");
tank = gdImageCreateFromGif(in);
fclose(in);
fish_number = 2;
while (fish_counter > 0)
  {
   return_code = strcpy(fishstring, argv[fish_number]);
   sscanf(fishstring, "%3d%3d%12s", &x, &y, new_fish);

   fish_number++;
   fish_counter--;
   return_code = strcpy(current_fish, path);
   return_code = strcat(current_fish, new_fish);
   putfish();
}

white = gdImageColorExact(tank, 255, 255, 255);
  if (white != (-1)) { gdImageColorTransparent(tank, white);  }
return_code = strcat(outpath, outfile);
out=fopen(outpath, "wb");
gdImageGif(tank, out);
fclose(out);

gdImageDestroy(tank);
}

putfish()
{
in = fopen(current_fish, "rb");
fish = gdImageCreateFromGif(in);
fclose(in);
white = gdImageColorExact(fish, 255, 255, 255);
  if (white != (-1)) {
  gdImageColorTransparent(fish, white);  }

gdImageSetBrush(tank, fish);
gdImageLine(tank, x, y, x, y, gdBrushed);
}

The use of gd's brush facility (gdImageSetBrush together with the special color gdBrushed) is what makes this entertaining application click. Replacing the lines

gdImageSetBrush(tank, fish);
gdImageLine(tank, x, y, x, y, gdBrushed);

with the straightforward paste function gdImageCopy would paste the source image as a rectangle, painting over whatever is underneath it.

Using Expect to Interact with Other Servers

Expect is an extension to the Tcl language (see Chapter 26, "Gateway Programming Language Options and a Server Modification Case Study") that can be used to interact with other programs-in particular, programs that require or expect input from the user via the keyboard.(See note) Expect can be used to automate such tasks as retrieving files via FTP; interacting with a password program, such as NCSA's htpasswd (see Chapter 25, "Transaction Security and Security Administration"); and, as I will show here, communicating with another server via Telnet.

The two samples shown here use the Telnet service to connect to The Internet Chess Club's server at telnet://chess.lm.com:5000.(See note) The first pair of scripts logs onto the server, retrieves a list of the games currently taking place on the server, and returns that list to the Web user as a set of hypertext links. Selecting one of those links causes the second set of scripts to retrieve the current state of that game and feed the data into a gd-based program to create an image of the chessboard.

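In each of the following examples, the Expect script follows the same basic procedure: it spawns a Telnet session to the chess server, waits for the login prompt, logs in, sends a server command, captures the output returned before the next prompt, and logs out.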

This first Expect script, shown in Listing 24.14, issues the games command to generate a list of the ongoing games on the server.


Listing 24.14. Using Expect to see the current games on the chess server.
#!/usr/local/bin/expect
# iccgames.ex

# turn off writing everything to stdout (the screen)...
log_user 0
# if the process 'hangs' for 60 seconds, exit
set timeout 60
match_max -d 20000

# execute the Telnet command...
spawn telnet chess.lm.com 5000

expect {
       timeout {puts "Connection to server timed out..."; exit }
       "login:"
}

# now send ICC specific commands to the ICC server.
send "g\r\r"
expect "aics%"
send "games\r"

# look at what's returned and do something:
expect -re "(\[1-9].* || \ \[1-9].*)(aics%)"
if { $expect_out(buffer) != "" } {
   puts $expect_out(buffer)
   } else { puts "NO_DATA" }

# logout
send "quit\r"
exit

The Expect script is run by a Perl script, which parses the output and sends the formatted HTML data back to the user, as shown in Listing 24.15.


Listing 24.15. Running Expect from within Perl: the iccgames.pl code.
#!/usr/local/bin/perl
# iccgames.pl

$machine = "www.hydra.com";
$cgipath = "cgi-bin/book/chess";

print "Content-type: text/html\n\n";
print "<TITLE>ICC Gateway: Current Games</TITLE>\n";
print "<H1><CENTER>Current Games on ICC</CENTER></H1>\n";
$date = `date`;
print "$date\n";
print "<HR>\n";
print "<H2>Click on a game to view the current position*.</H2>\n";

print "<PRE>\n";
@list =  `./iccgames.ex`;

$counter=1;
while($list[$counter] ne "")
{
  if($list[$counter] =~ m/aics/)
  { print "\n";  last;  }
  if($list[$counter] =~ m/games displayed/)
  {print "</PRE><BR><CENTER><B>$list[$counter]</B></CENTER>"; last; }

$game_no = substr($list[$counter], 0, 3);
$game_no =~ tr/ //d;
$players = substr($list[$counter], 4, 40);

chop $list[$counter];
print "<A HREF=http://$machine/$cgipath/iccobs.pl?$game_no>";
print "$list[$counter]";
print "</A>";

if($ENV{HTTP_USER_AGENT} =~ /Mosaic|Lynx/i) {print "\n";}
$counter++;
}

print <<ENDOFLINKS;
</PRE><BR>
*<B>Note</B>:  this application retrieves data from the ICC server in realtime.
Due to your Internet connection, the game you wish to view may be over by the time
your request is received by the ICC server.

<CENTER><A HREF=http://www.hydra.com/icc/icc_news.html>ICC News</A> |
<A HREF=http://www.hydra.com/icc/iccwho.2.pl>Player Info</A> |
View Games |
<A HREF=http://www.hydra.com/icc/help/icchelp.local.html>Help Files</A>
</CENTER>
<HR>
Developed at <A HREF=http://www.hydra.com/><I>Hydra Information Technologies
</I></A>
(c) 1995
</HTML>
ENDOFLINKS
exit;

After the Expect script returns its output and control to the Perl script, the data is parsed to include a clickable link with the game number, as shown in Figure 24.11.

Figure 24.11 : Output produced by the iccgames.pl and iccgames.ex scripts. Each game is an href to the iccobs.pl script with the game number as a parameter.

The next pair of scripts combines another chess server command, observe [game number], with gd to create an image of an ongoing game. The Perl script passes the game number as a command-line argument to the Expect script iccobs.ex, as shown in Listing 24.16.


Listing 24.16. The Expect script iccobs.ex.
#!/usr/local/bin/expect
# iccobs.ex

log_user 0
set timeout 60
match_max -d 20000

# the game number is passed on the command line by iccobs.pl
set game [lindex $argv 0]

spawn telnet chess.lm.com 5000

expect {
       timeout {puts "Connection to server timed out..."; exit }
       "login:"
}

send "g\r\r"
expect "aics%"

# ask for Style 12 output, then observe the requested game
send "style 12\r"
expect "aics%"
send "observe $game\r"

expect -re "(<12>.*)(aics%)"
if { $expect_out(buffer) != "" } {
   puts $expect_out(buffer)
   } else { puts "NO_DATA" }

send "quit\r"
exit

Before discussing the Perl script, let's examine the output from the Expect program:

<12> ---r--nr ---bk-b- nq--pp-p pppp--p- -------- PPPPPPPP --Q-R-B- RNB---NK B -1 0 0 0  0 143 28 patt Mbb 0 5 0 39 39 237 -154 97 R/d2-e2 (0:02) Re2 0

The style 12 server command outputs a single string of data in space-delimited fields. Listing 24.17 summarizes the ICC Style 12 Help File.


Listing 24.17. The chessboard position format.

The Perl code is a straightforward parsing job: it reads the data returned, splits it on the spaces, and generates command-line arguments for the gd-based C program. Each piece on the board is represented by a string of the form [column][row][piece], such as 30br for a black rook; the two digits give the column and row of the square on which the gd program pastes the piece.
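The parsing step can be sketched in a few lines of Perl. This is a minimal illustration rather than the chapter's script: the subroutine name board_args is hypothetical, and the sample line is the Expect output shown earlier, written on one line with its eight board rows separated by spaces.

```perl
#!/usr/local/bin/perl
# Sketch: build the [column][row][piece] arguments from a Style 12 line.
# board_args is a hypothetical helper name used only for this example.
use strict;
use warnings;

sub board_args {
    my ($line) = @_;
    my @fields = split ' ', $line;        # $fields[0] is the "<12>" tag
    my @args;
    for my $row (0 .. 7) {                # fields 1..8 are the board rows
        my @squares = split //, $fields[$row + 1];
        for my $col (0 .. 7) {
            my $sym = $squares[$col];
            next if $sym eq '-';          # empty square
            my $color = ($sym eq lc($sym)) ? 'b' : 'w';   # lowercase means black
            push @args, $col . $row . $color . lc($sym);
        }
    }
    return @args;
}

my $sample = '<12> ---r--nr ---bk-b- nq--pp-p pppp--p- -------- PPPPPPPP '
           . '--Q-R-B- RNB---NK B -1 0 0 0 0 143 28 patt Mbb 0 5 0 39 39 '
           . '237 -154 97 R/d2-e2 (0:02) Re2 0';
my @args = board_args($sample);
print scalar(@args), " pieces: @args\n";
```

Each element of the returned list is one command-line argument for the image program, so the list can be joined with spaces to form the command line.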

For Lynx users, a separate Expect script is used, replacing the send "style 12\r" command with send "style 1\r". Style 1 prints an ASCII version of the chess position, which is returned to the user unparsed and enclosed in <PRE></PRE> tags.

Listing 24.18 shows the Style 12 Perl script.


Listing 24.18. The Style 12 Perl script.
#!/usr/local/bin/perl
# iccobs.pl

$machine = "www.hydra.com";
$cgipath = "cgi-bin/book/chess";
$iconpath = "icons/icc/temp";
$http_doc_root = "web/";
$this_pid = $$;
$gif_file_out= "$this_pid.gif";

$query_string = $ENV{QUERY_STRING};
$query_string =~ s/[^0-9]//g;
if($query_string eq "") {$query_string = 0;}

print "Content-type: text/html\n\n";
print "<TITLE>ICC Gateway:  Game $query_string</TITLE>\n";

if($ENV{HTTP_USER_AGENT} =~ /Lynx/i) {&lynx_client;}

@list =  `./iccobs.ex $query_string`;
$counter = 0;
while($list[$counter] ne "")
  {  
     if($list[$counter] =~ m/<12>/)
     { $game_data = $list[$counter]; }
     $counter++;
   }

&check_game_data;

@parts = split(/ /,$game_data);
$pcount=1;
while($pcount < 9)
  {
  $row = $pcount - 1;
  $colcount = 0;
    while($colcount < 8)
    {
    $symbol = substr($parts[$pcount], $colcount, 1);
    if($symbol eq "-") {$colcount++; next;}

    if($symbol =~ m/[prnbqk]/) {$symbol =~ s/[prnbqk]/"b".$symbol/e;}
    elsif($symbol =~ m/[PRNBQK]/)
     {$symbol =~ s/[PRNBQK]/"w".$symbol/e;
      $symbol =~ tr/[A-Z]/[a-z]/;}

    $column = $colcount ;
    $command_arg = "$column"."$row"."$symbol";
    $command_line = "$command_line"." "."$command_arg";
    $colcount = $colcount + 1;
    }
$pcount = $pcount+1;
}


`rm -f /$http_doc_root/$iconpath/$gif_file_out`;
$command_line = "$gif_file_out"."$command_line";
`./iccgif $command_line`;
$image_file = "/$http_doc_root/$iconpath/$gif_file_out";
  if(-e $image_file)
  { print "<img ALIGN=RIGHT src=http://$machine/$iconpath/$gif_file_out>"; }
  else
  { &error; }

($style, $row0, $row1, $row2, $row3, $row4, $row5, $row6, $row7,
 $colorturn, $pawnpush, $wcs, $wcl, $bcs, $bcl, $irr, $game_no, $wname, $bname,
 $relation, $initial_time, $increment, $wstrength, $bstrength,
 $wtime, $btime, $move_number, $previous_move, $previous_time,
 $notation, $flip) = split(/ /, $game_data);

print "<FORM METHOD=POST ACTION=http://$machine/$cgipath/iccobs.pl?$query_string>";
print "<PRE>\n";

print "<B>$wname <I>vs.</I> $bname</B>\n";
$wminutes = $wtime / 60;
$wseconds = $wtime % 60;
$bminutes = $btime / 60;
$bseconds = $btime % 60;
printf("%d:%02d - %d:%02d\n", $wminutes, $wseconds, $bminutes, $bseconds);
print "(Time remaining)\n\n";

if($colorturn eq "B")
     {$lastcolor = ""; }
else {$lastcolor = "...      ";
      $move_number--;}
print "          White    Black\n";
print "          -----    -----\n";
if($move_number <10)
{$padone = " ";}
else {$padone = "";}
print "Move $padone$move_number:  <B>$lastcolor";
print "$notation</B>\n";
print "$previous_time used\n\n\n";

printf("Time Control: %d %d\n", $initial_time, $increment);
print "\n\n\n\n";
print '<INPUT TYPE="submit" VALUE="Refresh position">';
print "\n\n\n";
print "<A HREF=http://$machine/$cgipath/iccgames.pl>Back to list of games</A>";
print "\n\n\n";
print "<BR>";
print "</PRE>\n";
print "</FORM>\n";
print "<HR>";

&print_tail;
exit;

sub lynx_client
{
print "<PRE>";

@list =  `./iccobs.lynx.ex $query_string`;
&check_expect_data;

$counter=1;
while($list[$counter] ne "")
{
  if($list[$counter] =~ m/aics/)
  { print "\n"; last; }
  if($list[$counter] =~ m/You are now observing/)
  { $counter++; next; }
print "$list[$counter]";
$counter++;
}

print "</PRE><BR>";
print "<B>Lynx Mode: Use Control-R to refresh position</B><BR>\n";
&print_tail;
exit; }


sub nogame {
  print "<PRE>\n";
  print "There is no game number $query_string\n";
  print "\n\n\n\n\n\n";
  print "<A HREF=http://$machine/$cgipath/iccgames.pl>Back to list of games</A>";
  print "</PRE>\n";
  exit; }

sub error {
  print "<PRE>\n";
  print "Error - either the game is over\n";
  print "        or there was a problem connecting to the chess server\n";
  print "\n\n\n\n\n\n";
  print "<A HREF=http://$machine/$cgipath/iccgames.pl>Back to list of games</A>";
  print "</PRE>\n";
  exit; }

sub debug {
  print "<PRE>\n";
  print "Error:\n\n";

  $c = 0;
  while($list[$c] ne "")
  {print "list $c = $list[$c]\n"; $c++; }

  print "</PRE>\n";
  print "command line = $command_line\n";
  exit;
}

sub check_game_data
{ if($game_data eq "") {  &nogame; }
}

sub check_expect_data {
 if( ($list[0] eq "NO_DATA") || ($list[0] =~ /timed out/) )
 { &error; }
 if($list[2] =~ /no such game/)
 { &nogame; }
 }

sub print_tail {

print <<ENDOFLINKS;
<CENTER><A HREF=http://www.hydra.com/icc/icc_news.html>ICC News</A> |
<A HREF=http://www.hydra.com/icc/iccwho.pl>Player Info</A> |
<A HREF=http://www.hydra.com/cgi-bin/book/chess/iccgames.pl>View Games</A> |
<A HREF=http://www.hydra.com/icc/help/icchelp.local.html>Help Files</A>
</CENTER>
<HR>
Developed at <A HREF=http://www.hydra.com/><I>Hydra Information
 Technologies</I></A>
(c) 1995
</HTML>
ENDOFLINKS
}

Figure 24.12 shows output from a sample game.

Figure 24.12 : Output produced by the iccobs.pl and icc.ex scripts. Note the clever attack mounted by Gemini to checkmate Aries on move 2.

The C source, iccgif.c, which is shown in Listing 24.19, is similar to the fishpaper code; after creating a blank chessboard image in the gd format, the command-line arguments are looped through, calling the putpiece function to calculate the position that the piece will be copied to on the board.


Listing 24.19. The iccgif.c code to place chess pieces on the board.
/* iccgif.c */
/* Remember to check pathnames if you attempt to compile this 'as is'
   on your machine */

#include "gd.h"
#include <stdio.h>
#include <string.h>

gdImagePtr board;
gdImagePtr piece;
int square = 38, column, row;   /* each board square is 38 pixels wide */
int x, y, offset = 0;

char outfile[32];
char piecestring[8];            /* "30br" plus the terminating NUL */

char *return_code;
char current_piece[64];
char new_piece[3];              /* two characters plus the terminating NUL */
char WhiteSq[] = {"0.gif"};
char BlackSq[] = {"9.gif"};

char path[] = {"/web/icons/icc/ch"};
char outpath[64] = {"/web/icons/icc/temp/"};   /* room for the appended file name */
FILE *in;
FILE *out;

main(argc, argv)
int argc;
char *argv[];
{
int piece_counter;
int piece_number;

if (argc < 3)
  { printf("Wrong number of arguments!\n");
    printf("argc=%d\n", argc);
    return(1);
  }

return_code = strcpy(outfile, argv[1]);
piece_counter = argc - 2;

in = fopen("/web/icons/icc/chboard.gif", "rb");
board = gdImageCreateFromGif(in);
fclose(in);

piece_number = 2;
while (piece_counter > 0)
  {
   return_code = strcpy(piecestring, argv[piece_number]);
   sscanf(piecestring, "%1d%1d%2s", &column, &row, new_piece);

   piece_number++;
   piece_counter--;
   return_code = strcpy(current_piece, path);
   return_code = strcat(current_piece, new_piece);

   putpiece();
  }

return_code = strcat(outpath, outfile);
out=fopen(outpath, "wb");
gdImageGif(board, out);
fclose(out);
gdImageDestroy(board);
}

putpiece()
{
int nrow, ncolumn, divresult, sum;

char *catcode;

nrow=row; nrow++;
ncolumn=column; ncolumn++;
sum = nrow + ncolumn;
divresult = sum % 2;

if (divresult == 0)
  {catcode = strcat(current_piece, WhiteSq);}
else
  {catcode = strcat(current_piece, BlackSq);}

x = offset + (square * column);
y = offset + (square * row);
in = fopen(current_piece, "rb");
piece = gdImageCreateFromGif(in);
fclose(in);
gdImageCopy(board, piece, x, y, 0, 0, 38, 38);
gdImageDestroy(piece);   /* free the piece image once it has been copied */
}

The gd program used in this application performs a simple series of Paste operations that also could have been accomplished with NetPBM programs. The gd approach is noticeably superior because a single C program executes faster than a series of NetPBM commands.
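The placement arithmetic in putpiece can be checked in isolation. The following Perl sketch mirrors that calculation under stated assumptions: piece_placement is a hypothetical name, the 38-pixel square size and the 0.gif and 9.gif suffixes are taken from iccgif.c, and the sketch only computes the icon file name and pixel offsets without touching gd.

```perl
#!/usr/local/bin/perl
# Sketch of the arithmetic in putpiece(): square shade and pixel offset.
# piece_placement is a hypothetical name; 38 is the square size from iccgif.c.
use strict;
use warnings;

sub piece_placement {
    my ($arg) = @_;                               # for example "30br"
    my ($col, $row, $piece) = $arg =~ /^(\d)(\d)(\w\w)$/
        or return;                                # malformed argument
    my $square = 38;
    # iccgif.c adds 1 to row and column; an even sum selects the "0.gif" icon
    my $shade = (($col + 1) + ($row + 1)) % 2 == 0 ? '0.gif' : '9.gif';
    return ("ch$piece$shade", $square * $col, $square * $row);
}

my ($icon, $x, $y) = piece_placement('30br');
print "$icon at ($x, $y)\n";
```

Keeping one icon per piece and square shade means the C program never has to deal with transparency; it simply copies a ready-made 38-by-38 tile onto the board.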

GD.pm: A Perl 5 Graphics Module

It is common to have quantitative data that changes dynamically from one client session to the next. Wouldn't it be nice if you could keep track of numeric data, change it, and keep the changes persistent across sessions? And wouldn't it also be good if you could produce a dynamic graph of the data on demand? Both are possible with another of Lincoln Stein's nifty creations: the GD.pm Perl 5 graphics module. This module is a set of Perl 5 methods layered on Boutell's C-based gd graphics library, so gd must be installed (a simple job) before GD.pm is installed. The module and its documentation are available at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/. This section looks at a practical example of how you can use CGI.pm to keep state between sessions and GD.pm to create a GIF image on demand.

Figure 24.13 shows a form you present to the user, asking for a vote to pick the best movie of the five shown here.

Figure 24.13 : Movie votes are recorded and remembered across client sessions for two hours.

Note the Graph button. Clearly, the script must be keeping state in some way (the users' votes are kept current across sessions) and the Graph button implies that some process will be kicked off to create a graph of the up-to-date state of affairs. Sure enough, in Figure 24.14, you see the result of the graph request.

Figure 24.14 : The user can see how the five movies rate, relatively speaking, by issuing a graph request from Figure 24.13.

A single script accomplished the twin goals of state (via Netscape Cookies) and dynamic graphing (via GD.pm). Listing 24.20 shows the code for the movieg.pl script.


Listing 24.20. The movieg.pl code.
#!/usr/local/bin/perl
#  movieg.pl
#    Use Lincoln Stein's CGI.pm Version 2.21
#    Example of Netscape Cookies using CGI.pm
#    Example of dynamic GIF drawing using Lincoln's GD.pm
#################################################################
#    part of the program (GD.pm movie votes) originally by
#    Lance Ball,lball@stern.nyu.edu,
#    CGI intertwinings by Mark Ginsburg.
#################################################################
use CGI qw(:standard);
use GD;

@movie_choices=('Reservoir Dogs', 'Chinatown',
                'Full Metal Jacket', 'The Usual Suspects', 'Toy Story');

%movie = cookie('movie');  # recover assoc array from cookie

# Get the new movie vote from the form
# If the user instead clicks on Graph, show the graph

$action = param('action');

if (param('action') eq 'Vote') {
   $new_vote = param('new_vote');
   if (length($new_vote) > 3) {
       $movie{$new_vote}++;  # bump up the vote by one
       $msg = "Thank you for vote $new_vote"; }
   else {
      $msg = "Null vote $new_vote not recorded.";  }
# Add new votes to old, and put them in a cookie
   $the_cookie = cookie(-name=>'movie',
                        -value=>\%movie,  # assoc array is cookie value
                        -path=>'/cgi-bin',
                        -expires=>'+2h');  # 2-hour expiration time on cookie
   print header(-cookie=>$the_cookie);  # new cookie value
}

elsif (param('action') eq 'Graph') {  # they want to see the vote graph
   &show_graph;  
   exit 0;  }

else {
   print header;  # first visit, no vote yet: send a plain header
}


# Now we're ready to create our HTML page.
print start_html(-title=>'Vote for a Movie',
                 -author=>'lball or mginsbur @stern.nyu.edu',
                 -BGCOLOR=>'white');

#
# Now show main Movie Vote form
#
print <<EOHTML;
<h1>Vote on a Movie</h1>
Make a vote and click on 'Vote'.  Votes will be retained for 2 hours.

<center>
<table border>
<tr><th>Vote<th>Old Votes
EOHTML

print "<tr><td>",start_form;
print scrolling_list(-name=>'new_vote',
                     -values=>[@movie_choices],
                     -size=>5),"<br>";
print submit(-name=>'action',-value=>'Vote'),
      submit(-name=>'action',-value=>'Graph');

print end_form;

print "<td>";

foreach (sort keys %movie) {
    print "$_ : $movie{$_} <br>";  # display current standings
}

print "</table></center>";

print "<hr>$msg ";  # show vote status for demo purposes
print end_html;
exit 0;

#
# show_graph:  uses GD.pm to draw the graph
#
sub show_graph {

    print header(-type=>'image/gif');  # no need for cookie send

$no1 = $no2 = $no3 = $no4 = $no5 = 0;

$im = new GD::Image(300,300);  # instantiate a new image region
# and set up handy colors
$white = $im->colorAllocate(255,255,255);
$black = $im->colorAllocate(0,0,0);
$red = $im->colorAllocate(255,0,0);
$green = $im->colorAllocate(0,255,0);
$blue = $im->colorAllocate(0,0,255);
$rose = $im->colorAllocate(255,100,80);
$peach = $im->colorAllocate(255,80,20);

$im->transparent($white);
$im->interlaced('true');

$im->rectangle(0,0,299,299,$black);
#
# for each key in the array, see how the votes are relative to the
#  other movies, and set up some graphing parameters for each.
#
foreach (keys %movie) {
      $dataline = $_.":".$movie{$_};  # to emulate external data sets
                                      # which are in Movie : ### format
      @line=split(/:/,$dataline);
      if ($line[1] >= $no1) {  # now just figure out how good it is
        $no5 = $no4;           # in terms of its votes and which curr.
        $name5 = $name4;       # item it should displace.
        $no4 = $no3;
        $name4 = $name3;
        $no3 = $no2;
        $name3 = $name2;
        $no2 = $no1;
        $name2 = $name1;
        $no1 = $line[1];
        $name1 = $line[0];
    }
    elsif ($line[1] >= $no2) {
        $no5 = $no4;
        $name5 = $name4;
        $no4 = $no3;
        $name4 = $name3;
        $no3 = $no2;
        $name3 = $name2;
        $no2 = $line[1];
        $name2 = $line[0];
    }
    elsif ($line[1] >= $no3) {
        $no5 = $no4;
        $name5 = $name4;
        $no4 = $no3;
        $name4 = $name3;
        $no3 = $line[1];
        $name3 = $line[0]
    }
    elsif ($line[1] >= $no4) {
        $no5 = $no4;
        $name5 = $name4;
        $no4 = $line[1];
        $name4 = $line[0];
    }
    elsif ($line[1] >= $no5) {
        $no5 = $line[1];
        $name5 = $line[0];
    }
    else {}
}  
#  now some grunt work to get graph into shape
$scale = $no1 || 1;   # no votes yet: avoid dividing by zero
$coord1 = 250-((200*$no2)/$scale);
$coord2 = 250-((200*$no3)/$scale);
$coord3 = 250-((200*$no4)/$scale);
$coord4 = 250-((200*$no5)/$scale);

$brush = new GD::Image(5,5);
$brush_black = $brush->colorAllocate(0,0,0);  # color indexes belong to the image that allocates them
$brush->colorAllocate(255,255,255);
$brush->filledRectangle(0,0,4,4,$brush_black);
$im->setBrush($brush);

$im->string(gdLargeFont,75,25,"Top Five Movies",$black);

$im->line(50,250,250,250,gdBrushed);
$im->line(50,250,50,50,gdBrushed);
#  now ready to draw the Vote Bars
$im->filledRectangle(55,50,70,250,$red);
$im->filledRectangle(95,$coord1,110,250,$blue);
$im->filledRectangle(135,$coord2,150,250,$green);
$im->filledRectangle(175,$coord3,190,250,$rose);
$im->filledRectangle(215,$coord4,230,250,$peach);
#  now attach the Movie Title labels
$im->stringUp(gdSmallFont,45,250,"${name1}",$red);
$im->stringUp(gdSmallFont,85,250,"${name2}",$blue);
$im->stringUp(gdSmallFont,125,250,"${name3}",$green);
$im->stringUp(gdSmallFont,165,250,"${name4}",$rose);
$im->stringUp(gdSmallFont,205,250,"${name5}",$peach);

$im->string(gdSmallFont,60,255,"${no1}",$red);
$im->string(gdSmallFont,100,255,"${no2}",$blue);
$im->string(gdSmallFont,140,255,"${no3}",$green);
$im->string(gdSmallFont,180,255,"${no4}",$rose);
$im->string(gdSmallFont,220,255,"${no5}",$peach);

print $im->gif;   # write the graph to the screen, that's it.

}  # end of subroutine

Code Discussion: movieg.pl

The program is divided into two logical parts. The first part, which uses CGI.pm to implement Netscape Cookies, should be rather familiar because it has much in common with the Cookie.pl example. The twist here is that I use an associative array as the Cookie value instead of the regular array you saw previously. This structure makes it easy to increment vote totals as new votes come in. The movie votes are kept current on a client-by-client basis for up to two hours.

The second part, the dynamic graphing capability, is an interesting example of how GD.pm makes your life easy. All you need to do is use the current Cookie value, parse it in some sensible manner, and feed it to the simple GD.pm graphics methods, such as filledRectangle. The available methods are discussed in depth in the documentation and take an intuitive set of parameters. The filledRectangle method, for example, takes five parameters: the x and y coordinates of two opposite corners, and a color. Granted, the graph I drew in Figure 24.14 is not on the order of the Sistine Chapel, but the reader should get a sense of the cost (time to do the code) versus the return (an efficient means to provide graphics to the user).
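The ranking step in show_graph is written as a cascade of elsif blocks. As an alternative sketch, not the method the listing uses, the same top-five ordering and bar heights can be derived with sort. The name top_bars is hypothetical, the 250 and 200 constants are taken from the coordinate arithmetic in show_graph, and the vote counts in the usage lines are made up for illustration.

```perl
#!/usr/local/bin/perl
# Sketch: rank the votes with sort and scale each bar against the leader.
# top_bars is a hypothetical helper; the 250 and 200 constants come from
# the coordinate arithmetic in show_graph.
use strict;
use warnings;

sub top_bars {
    my (%votes) = @_;
    # highest vote count first; ties broken alphabetically by title
    my @ranked = sort { $votes{$b} <=> $votes{$a} || $a cmp $b } keys %votes;
    splice(@ranked, 5) if @ranked > 5;            # keep the top five only
    my $leader = @ranked ? $votes{ $ranked[0] } : 0;
    return map {
        my $top = $leader ? 250 - int(200 * $votes{$_} / $leader) : 250;
        [ $_, $votes{$_}, $top ];                 # title, votes, y of bar top
    } @ranked;
}

# Illustrative vote counts only
my %movie = ('Chinatown' => 4, 'Toy Story' => 2, 'Reservoir Dogs' => 1);
for my $bar (top_bars(%movie)) {
    printf "%-20s %2d votes, bar top at y=%d\n", @$bar;
}
```

Each returned triple supplies what one filledRectangle call and its two labels need, so the drawing code could loop over the list instead of naming five sets of variables.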

Retrieving Web Data from Other Servers

Chapter 19 featured a discussion of TCP/IP as the fundamental building block on which the HyperText Transfer Protocol stands. By exploiting this concept, developers can create their own client programs that perform automated or semi-automated HTTP requests. Such programs are commonly known as robots, spiders, crawlers, and so on. (See note)

Robots operate by opening a connection to the target server's port (traditionally, 80 for HTTP requests), sending a proper request, and waiting for a response. To understand how this works, Listing 24.21 shows opening a regular Telnet connection to a server's port 80 and making a simple GET request (recall the discussion of the HEAD method in Chapter 20).


Listing 24.21. A Telnet session to the HTTP port 80.
/users/ebt 47 : telnet edgar.stern.nyu.edu 80
Trying 128.122.197.196 ...
Connected to edgar.stern.nyu.edu.
Escape character is '^]'.
GET /
<TITLE> NYU EDGAR Development Site </TITLE>

<A HREF="http://edgar.stern.nyu.edu/team.html">
<img src="http://edgar.stern.nyu.edu/icons/nyu_edgar.trans.gif">
</a>

<h3><A HREF="http://edgar.stern.nyu.edu/tools.shtml">
Get Corporate SEC Filings using NYU </a> or
<A HREF="http://www.town.hall.org/edgar/edgar.html"> IMS </a> Interface
</A></h3>

<h3><A HREF="http://edgar.stern.nyu.edu/mgbin/ticker.pl">
<! img src="http://edgar.stern.nyu.edu/icons/ticker.gif">
What's New - Filing Retrieval by Ticker Symbol!
</A></h3>

<h3><A HREF="http://edgar.stern.nyu.edu/profiles.html">
Search and View Corporate Profiles
</A></h3>

...

Connection closed by foreign host.
/users/ebt 48 :

Note that Martin Koster has developed a set of robot policies, which are not official standards but allow a server to request that certain types of robots not visit certain areas on the server. His proposal is available at http://info.webcrawler.com/mak/projects/robots/robots.html. It is considered good Web netiquette to follow Koster's guidelines; a poorly behaved robot can generate vociferous complaints from sites that are affected adversely.

Assuming that the requested file exists, the data is sent back, after which the connection closes. Note that it is unformatted data; formatting is the job of the client software and, in this case, there is none.

This is amusing but hardly automated. Although most programming languages include networking functions that the developer could use to build automated tools, the developer does not need to start from scratch. A number of URL retrieval libraries are readily available for Perl. (See note)
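For comparison, the from-scratch version of the Telnet exchange is short. The sketch below is an illustration under stated assumptions: http_request and fetch are hypothetical helper names, the request uses an HTTP/1.0 request line with a Host header rather than the bare GET / typed in Listing 24.21, and the reply therefore begins with a status line and headers before the document itself.

```perl
#!/usr/local/bin/perl
# Sketch: the Telnet exchange from Listing 24.21, scripted with a socket.
# http_request and fetch are hypothetical helper names.
use strict;
use warnings;
use IO::Socket::INET;

# Build the text that would otherwise be typed into the Telnet session.
sub http_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# Open port 80, send the request, and read until the server hangs up.
sub fetch {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(PeerAddr => $host, PeerPort => 80,
                                     Proto    => 'tcp', Timeout  => 30)
        or die "Cannot connect to $host: $!\n";
    $sock->print(http_request($host, $path));
    local $/;                      # slurp mode: read the whole reply at once
    my $reply = <$sock>;
    close $sock;
    return $reply;                 # status line, headers, blank line, body
}

# Usage: perl fetch.pl host [path]
print fetch($ARGV[0], $ARGV[1] || '/') if @ARGV;
```

Run without arguments, the script does nothing; given a host name, it prints the raw reply exactly as the server sent it.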

Listing 24.22 uses the familiar http_get (See note) program again. The purpose of this Perl script using http_get is to do the following:

Retrieve a URL requested by the user (the root page)

Parse the data returned and attempt to identify all <A HREF=HTTP:> links within the root page

Retrieve each of the HTTP links found in the root page that have an .html extension or no extension, parse those pages, and display the links found.

It is interesting to compare this program with the related robot.cgi presented in Chapter 22, "Gateway Programming II: Text Search and Retrieval Tools."

When run against http://www.hydra.com/, the output shown in Figure 24.15 was returned. The <HR> tag is used to separate each of the links found on the root page, with each of the second level links indented.

Figure 24.15 : Output produced by executing LinkTree with the URL http://www.hydra.com/.


Listing 24.22. The http_get Perl script.
#!/usr/local/bin/perl
#linktree.pl v.1

require "cgi-lib.pl";
print "Content-type:  text/html\n\n";

&parse_request;
$URL = $query{URL};
$URL =~ s/[^\w.:\/~%?&=+-]//g;   # keep shell metacharacters out of the command

@urlparts = split(/\//, $URL);
$home = $urlparts[2];

$html = `http_get '$URL'`;
print "<B>Here is $URL</B><HR>\n";
&upcase_link;
$_ = $html;
&parse_links;
@toplinks = @links;
&repeat;
exit;

sub print_top_links {
  foreach $top (@toplinks) {
  print "$top<BR>\n";
  }
print "<HR>\n";
}

sub repeat {
  foreach $top (@toplinks) {
  print "$top<BR>\n";
  $link = $top;
  &real_url;
  next if ($real_url !~ /$home/i );
  $real_url =~ s/[^\w.:\/~%?&=+-]//g;   # keep shell metacharacters out of the command
  $html = `http_get '$real_url'`;
  &upcase_link;
  $_ = $html;
  &parse_links;

  foreach $new (@links) {
   print "---->$new<BR>\n";
  } # END foreach

  print "<HR>\n";
  } # end outer foreach
} # END sub repeat

sub parse_links {
undef (@links);
$link_counter = 0;
$offset = 0;
$anchor_start = 0;

while($anchor_start != -1) {

  $anchor_start = index($_, "<A ", $offset);
  last if $anchor_start == -1;   # no more anchors on this page
  $anchor_end   = index($_, "</A>", $anchor_start);
  $url_end      = index($_, ">", $anchor_start) -1;

  $length = ($anchor_end + 4) - $anchor_start;
  $link   = substr($_, $anchor_start, $length);

  $offset = $anchor_end;
  $link =~ s/\"//g;

  if($link !~ m/=http/)
  { @temp = split(/=/, $link);
    $link = $temp[0]."=http://$home/".$temp[1];
    if($link !~ m/<A\s+HREF/i) { next; }
  }

  $links[$link_counter] = $link;
  $link_counter++;
} #end while

} #END SUB parse_links

sub real_url {
$real_url = $link;
$real_url =~ s/<A HREF=//;
$real_url =~ s/>.*//;
}

sub upcase_link {
$html =~ s/<\!.*\n/ /g;
$html =~ s/\n+/ /g;
$html =~ s/<a/<A/ig;
$html =~ s/a>/A>/ig;
# try to get rid of the annoying <A name= tag...
$html =~ s/<A\s+n/ /ig;
$html =~ s/href/HREF/ig;
$html =~ s/http/http/ig;
$html =~ s|<[hH][0-9]>||g;
$html =~ s|</[hH][0-9]>||g;
$html = $html."</A>";
}
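The index and substr scanning in parse_links can also be expressed as one global pattern match. The sketch below is an alternative illustration, not the listing's method: extract_links is a hypothetical name, links without a scheme are resolved against the server root only (as the listing does), and anything that is not an HTTP link is skipped.

```perl
#!/usr/local/bin/perl
# Sketch: collect HREF targets with one global pattern match.
# extract_links is a hypothetical helper name.
use strict;
use warnings;

sub extract_links {
    my ($html, $home) = @_;
    my @links;
    # match <A ... HREF=target>, with or without quotes around the target
    while ($html =~ /<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)/gi) {
        my $url = $1;
        if ($url !~ /^[a-z][a-z0-9+.-]*:/i) {     # no scheme: a local link
            $url =~ s{^/}{};
            $url = "http://$home/$url";           # resolved against the root only
        }
        push @links, $url if $url =~ m{^http://}i;   # skip mailto:, ftp:, ...
    }
    return @links;
}

my $page = '<A HREF="http://www.hydra.com/icc/icc_news.html">ICC News</A> '
         . '<a href=/icc/help/icchelp.local.html>Help Files</a>';
print "$_\n" for extract_links($page, 'www.hydra.com');
```

Because the pattern requires HREF inside the opening tag, named anchors such as <A NAME=top> are ignored without a separate cleanup pass.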

This code does not attempt to follow the robots convention; it does not check for the file robots.txt in any of the directories explored. This minimal robot is intended to be somewhat benign, however; it tries to avoid executing any programs on the target machine. Recently, two machines I work on were visited by a somewhat malignant robot; it did not look for a robots.txt, but it did insist on exploring every link on a page, including method=post e-mail forms. After sending me various pieces of blank e-mail, it proceeded to the chess pages and started executing each of those scripts (the ones that open a Telnet connection to chess.lm.com). Fortunately, that robot got bored after several retrievals and moved on to another directory. (A less likely explanation is that a human operator at the other end realized what was going on and interrupted the beast.)
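A robot that does want to honor the convention needs only a small amount of code. The following is a simplified sketch, not a complete implementation of Koster's proposal: disallowed_prefixes and may_fetch are hypothetical names, the robot name linktree is an assumption, and records for * and for a matching robot name are simply combined.

```perl
#!/usr/local/bin/perl
# Sketch: honor the Disallow lines of a robots.txt file.
# disallowed_prefixes and may_fetch are hypothetical helper names.
use strict;
use warnings;

# Return the Disallow prefixes from every record that applies to $agent.
sub disallowed_prefixes {
    my ($robots_txt, $agent) = @_;
    my @prefixes;
    my ($applies, $in_agents) = (0, 0);
    for my $line (split /\r?\n/, $robots_txt) {
        $line =~ s/#.*//;                              # strip comments
        if ($line =~ /^\s*User-agent:\s*(\S+)/i) {
            my $name = lc $1;
            $applies = 0 unless $in_agents;            # a new record begins
            $in_agents = 1;
            $applies = 1 if $name eq '*' || index(lc($agent), $name) >= 0;
        } else {
            $in_agents = 0;
            push @prefixes, $1
                if $applies && $line =~ /^\s*Disallow:\s*(\S+)/i;
        }
    }
    return @prefixes;
}

# A path may be fetched unless it starts with a disallowed prefix.
sub may_fetch {
    my ($path, @prefixes) = @_;
    for my $prefix (@prefixes) {
        return 0 if index($path, $prefix) == 0;
    }
    return 1;
}

my $robots = "User-agent: *\nDisallow: /cgi-bin/\n";
my @deny   = disallowed_prefixes($robots, 'linktree');
print may_fetch('/cgi-bin/book/chess/iccgames.pl', @deny) ? "fetch\n" : "skip\n";
```

A robot would retrieve /robots.txt once per host, keep the prefix list, and call may_fetch before every request to that host.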

The robot outlined previously is also benign because it only explores one depth of links local to the root page and then quits. By making the code recursive or even just exploring two or three levels, quite a tree could result when aimed at a suitable target with many links on the root page.
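The effect of a depth limit is easy to see in a small recursive sketch. It is an illustration only: crawl is a hypothetical name, the link source is passed in as a code reference so that no network access is needed, and the tiny site map below stands in for real pages.

```perl
#!/usr/local/bin/perl
# Sketch: depth-limited link exploration with a visited list.
# crawl is a hypothetical helper; $get_links is any code reference that
# returns the links found on a page.
use strict;
use warnings;

sub crawl {
    my ($url, $depth, $get_links, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$url}++;          # never visit the same page twice
    my @found = ($url);
    return @found if $depth <= 0;          # depth limit reached: do not descend
    for my $next ($get_links->($url)) {
        push @found, crawl($next, $depth - 1, $get_links, $seen);
    }
    return @found;
}

# A stand-in for real page fetches: a tiny site map held in a hash.
my %site = (
    '/'       => ['/a.html', '/b.html'],
    '/a.html' => ['/', '/c.html'],
    '/b.html' => [],
    '/c.html' => ['/d.html'],
);
my $links_of = sub { @{ $site{ $_[0] } || [] } };
print join(' ', crawl('/', 2, $links_of)), "\n";
```

The visited list matters as much as the depth limit: without it, the link from /a.html back to the root page would be followed again on every pass.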

If you are interested in experimenting with a Web robot, my advice is to make it friendly and first test it only on sites that are agreeable to such experimentation.

Scripting for the Unknown Check

The purpose of this chapter is to give the developer a taste of what is possible with the Common Gateway Interface. It is not meant to be a comprehensive survey; as of this writing, there is a plethora of tools available to accomplish any of the tasks described here. The tools used in this chapter exploit only the fundamental nature of HTTP: from maintaining state to on-the-fly graphics creation to automated document-retrieval tools, the robustness of the protocol gives the developer a huge playground of possibilities to enhance and augment a Web site.


Footnotes

Netscape's Cookie specifications can be found at http://search.netscape.com/newsref/std/cookie_spec.html.
The sample is based on Lincoln Stein's animal crackers demo at http://www-genome.wi.mit.edu/ftp/pub/software/WWW/examples/cookie.cgi.
One of the most aesthetically pleasing access counters is at http://www.semcor.com/~muquit/Count.html. This method requires the ImageMagick X Window program. You can find other methods at http://www.yahoo.com/Computers/World_Wide_Web/Programming/Access_Counts/. Gwstat, which requires ImageMagick and Ghostscript, is at http://dis.cs.umass.edu/stats/gwstat.html.
There is a wealth of on-line information available on graphics formats. A good starting point is the Graphics File Formats FAQ, located at http://www.cis.ohio-state.edu/hypertext/faq/usenet/graphics/fileformats-faq/top.html.
The complete GIF specification is available from CompuServe.
A useful starting point for learning about transparent GIFs is http://www.mit.edu:8001/people/nocturne/transparent.html.
A good explanation of this matter is included at http://www.cis.ohio-state.edu/hypertext/faq/usenet/graphics/fileformats-faq/part1/faq-doc-41.html.
This site contains just about every FAQ known to humankind: http://www.cis.ohio-state.edu/hypertext/faq/usenet/jpeg-faq/faq.html.
ftp://ftp.uu.net/graphics/jpeg/ contains the source for the djpeg and cjpeg utilities, in addition to other JPEG-related source code and documents.
Source code and complete documentation can be found at ftp://ftp.wustl.edu/graphics/graphics/packages/NetPBM/netpbm-1mar1994.tar.gz.
http://www.boutell.com//png/ contains the PNG specification.
You can find the latest version of gnuplot at ftp://prep.ai.mit.edu/pub/gnu/gnuplot-3.5.tar.gz.
The gd1.1.1 package is at http://www.boutell.com/gd/.
Teach Yourself C in 21 Days, Sams Publishing.
The latest version of Expect always can be found at ftp://ftp.cme.nist.gov/pub/expect/index.html.
The Internet Chess Club can be reached via telnet://chess.lm.com 5000 or via e-mail at icc@chess.lm.com. Users behind firewalls beware: Your access to this nonstandard port may be blocked.
Good starting points for exploring this subject are http://web.nexor.co.uk/mak/doc/robots/robots.html and http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Robots_Spiders_etc_Documentation/.
http://www.ics.uci.edu/pub/websoft/libwww-perl/ and http://uts.cc.utexas.edu/~zippy/url_get.html
Posted to comp.unix.sources, this package was "originally based on a simple version by Al Globus (globus@nas.nasa.gov). Debugged and prettified by Jef Poskanzer (jef@acme.com)."