Chapter 1

The World Wide Web as a Communications System


CONTENTS


This chapter gives you an overview of the key issues and concepts that define the World Wide Web's current state, structure, and characteristics as a communications system. I first present an overview of the important new ways the Web has expanded to fill niches as a communications system, with emphasis on technologies emerging since the first edition of this book. Next, I trace the origin of the Web from ideas about information system design and theories of nonlinear thinking. I then present a technical definition of the Web that identifies the key software and network communications components involved in its operation, and I place this definition in the larger context of on-line communications, or cyberspace, showing how the Web plays a role as an information integrator. I describe the types of communications enabled by the Web and link these types to human communication contexts. Finally, I present a primer in Web information literacy-how to navigate the Web using on-line resources for subject, keyword, and space-oriented searching.

Note
This chapter's initial sections include references to uniform resource locators (URLs), which are explained in detail later in this chapter. These URLs (for example, http://www.sgmlopen.org) refer to information available on the World Wide Web. Chapter 3, "Options for Web Connections," gives you an overview of software tools and how to access the Web.

The State of the World Wide Web

It's the late 1990s, and the World Wide Web is a more complex system for communication than when it was introduced almost a decade ago. Although technically still based on the system of hypertext that Tim Berners-Lee and others developed at the European Laboratory for Particle Physics in Geneva, Switzerland in the late 1980s, the Web today is more diverse technologically and more diffused within society and culture.

The Technical Expansion of the Web

The range of technologies a Web developer can choose from is now more varied than ever. Besides an array of techniques and tools for shaping meaning with HyperText Markup Language (HTML), developers can use many technologies to add new kinds of multimedia and interactive content to on-line services. New kinds of software for viewing Web content are being developed, and competition to be the leading provider of Internet software has become a top priority in the personal computer industry.

Whereas the Web of 1989 was a text-based browser deployed on an internal network, today the Web is a global medium that encompasses many software and communications systems across many networks. Within just the years 1995-1996, new kinds of systems emerged that enabled new forms of communications over the Web. The Java language (http://www.javasoft.com/), specifically designed for network communications, rose to prominence in 1995 as a new way of communicating on the Web. Following Java's success, a new company, Lucent Technologies, spun off from AT&T and presented its Inferno system (http://www.lucent.com/inferno/) to the world to address the need for a network operating system as well as a language for network-distributed content. Both Java and Inferno represent major new ways of thinking about on-line communications and supporting it technically.

Java brought a new way of expressing interactivity on-line and a shift of attention toward write once, run anywhere, network-distributed software. Java didn't replace the systems of the Web already in place by 1995, but instead supplemented the Web with new components. No longer do Web pages have to be static like the pages of a book; now developers can create embedded executable software-applets-that users can interact with in information or communications applications.
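
As a concrete illustration, here is a minimal sketch of this kind of embedded executable software, written against the java.applet interface of the period; the class name and message are invented for this example. A page would reference the compiled class so that a Java-enabled browser could download and run it.

import java.applet.Applet;
import java.awt.Graphics;

// A hypothetical applet: executable content a browser downloads and
// runs inside a Web page.
public class HelloWeb extends Applet {
    // The browser's Java runtime calls paint() whenever the applet's
    // area of the page needs to be drawn.
    public void paint(Graphics g) {
        g.drawString("Hello from an applet on the Web!", 20, 20);
    }
}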

Similarly, the kinds of media a Web page can carry expanded. Although sound, video, and other multimedia effects were possible before 1995-1996, systems emerged during this period that provided high-quality solutions to some of the problems of distributing multimedia on global networks. Notably, RealAudio (http://www.realaudio.com/) emerged as an outstanding solution for providing audio on demand over the Internet. Instead of a click-and-wait cycle of sound retrieval, RealAudio provides a streaming solution; users can listen to an audio file as it downloads instead of having to wait for the whole file first. This system was a boon to the sound industry; ABC Radio News, CBC Radio, National Public Radio in the United States, and dozens of radio stations worldwide suddenly could use the Internet as a supplement to the airwaves as a broadcast medium.
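
The streaming idea itself is simple to sketch in Java (RealAudio's actual formats and protocols are proprietary; the URL and the playChunk routine here are hypothetical stand-ins). The point is that playback can begin as soon as the first chunk of data arrives, rather than after the entire file downloads:

import java.io.InputStream;
import java.net.URL;

public class StreamSketch {
    public static void main(String[] args) throws Exception {
        URL source = new URL("http://example.com/newscast.ra"); // hypothetical audio URL
        InputStream in = source.openStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            playChunk(chunk, n); // hand each chunk to the player as it arrives
        }
        in.close();
    }

    // Placeholder for a real audio decoder and playback routine.
    static void playChunk(byte[] data, int length) { }
}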

Multimedia developers also gained new possibilities for networked communication with Macromedia's Shockwave product (http://www.macromedia.com/). Shockwave provides a set of plug-ins for Macromedia's existing authoring software, enabling developers to deliver their multimedia content over the Web. Users can view these innovative multimedia presentations by using the Shockwave viewers given away for free on the Macromedia site.

This method of distributing content-such as RealAudio or Shockwave presentations-to users who obtain freely available viewer software remains a commonplace model for delivering new media on the Web. Virtual Reality Modeling Language (VRML) (http://www.vrml.org/) is yet another major technology that emerged during 1995-1996 as a component of a Web developer's choices for expression (see Chapter 28, "Virtual Reality Modeling Language"). VRML opens up the possibility of three-dimensional worlds-the creation of a cyberspace more in line with the visions of science fiction writers of the past. Combined with Java, the 3-D worlds of VRML can have behavior-the shapes and structures can respond to a user's presence and input.

A raft of other technical innovations-some minor tweaks, some major breakthroughs-came to light on the Web in 1995-1996. Microsoft's (http://www.microsoft.com/) strategic focus on the Internet as a key part of personal computing certainly shifted the whole playing field of Web technology development. Netscape Communications' Web software (http://www.netscape.com/) continued to dominate, but Microsoft's Internet Explorer 3.0 and other browsers (http://www.browserwatch.com/) emerged as serious alternatives for browsing the Web.

The hypertext language of the Web itself-the HyperText Markup Language-also changed. The pressure on the HTML specifications to accommodate new ideas was so great that the HTML 3.0 specification became overloaded; as a result, it was never finalized. Instead, the HTML 3.2 specification was finalized in 1996. HTML 3.2 is a scaled-back version of the original 3.0 draft but is more manageable to implement. The HTML 3.2 specification is the basis for the HTML implementation discussion in this book (see Chapters 11 through 17).

The Social Expansion of the Web

The Web's expansion and change isn't determined by its technical makeup alone, however. Indeed, technical innovations throughout history often have failed because of a lack of social acceptance and use; witness the picture phone and other technically "good ideas" that people simply never wanted and never used.

The Web doesn't yet seem doomed to the scrap heap, however. It appears to have caught on-at least among the technologically rich people and countries of the world-as part of the communications environment. From Wall Street (http://www.dowjones.com/) to Wal-Mart (http://www.wal-mart.com/), the Web has become part of the communications culture. In the consumer world, the Web routinely is used to promote everything from Coca-Cola (http://www.cocacola.com/) to movies (http://www.ID4.com/).

Many businesses now see the Web as a key component of their work. The banking industry, viewing on-line service as integral to its future business, gradually is taking steps toward serving its customers on the Web. Systems of virtual cash (http://www.w3.org/pub/WWW/Payments/) are in development that may create not only a widespread promotional market on the Web, but an actual market for trade. With financial payments, the Web is poised to become an even more important part of the global communications and trade systems.

According to most demographic gauges, use of the Internet and Web is fairly extensive. Researchers at Vanderbilt University (http://www2000.ogsm.vanderbilt.edu/) estimate that 28.8 million people in the United States who are 16 and older have potential or actual access to the Internet; 11.5 million of these people use the Web; and 1.51 million have used the Web for a purchase. Although these numbers don't reveal the actual involvement or commitment people have regarding Web communications, they do represent a mass audience that commercial enterprises don't want to ignore.

The Challenge for Web Developers

As the technology of the Web, its social acceptance, and audience expand, the most crucial challenge is to create meaning on-line. Although the Web's widespread use means that it is no longer a novelty to have a Web site, it remains a novelty to develop a Web site well. Web development is extremely easy to start; a novice can put together a Web home page very quickly and announce its existence to the world. What is hard is creating a successful Web site. The challenge for Web developers remains-as it has for almost a decade-to create meaning and value in the on-line medium. This essential skill-shaping meaning on-line with an appreciation for the qualities of the Web as a communications medium-is the central theme of this book.

Web developers no doubt have a difficult task. The period 1995-1996 saw the emergence of a Web hangover-a period of disappointment with the truly hard work it takes to develop a Web site well. The massive hype of the mid-1990s about Web communications gave rise to very difficult questions: professionally developed Web sites involve major investments of talent and resources, and the people funding this work naturally began to ask what kind of return they were getting on their investments. Sites providing a focused service, such as Ticketmaster (http://www.ticketmaster.com/), succeeded, while other on-line ventures that did not involve direct sales foundered. Web publishing remained a serious challenge. Many print publishers developed Web sites; subsidized by revenue from print sales, sites such as The Wall Street Journal Interactive Edition (http://www.wsj.com/) flourished. Web-based publications scrambled to find advertising money, however, and many ventures, initially supported with enthusiasm, foundered during this period. O'Reilly and Associates, for example, failed a second time in producing a Web-based magazine; its Web Review did not connect with an audience or with advertisers willing to pay the bills for the very expensive work put into it. O'Reilly's experience highlights the very high cost of a venture when Web development is not done well.

The future of the Web is sure to see more technical innovation and varying levels of social acceptance. The agenda of the World Wide Web Consortium, an industry consortium helping to realize the full potential of the Web, holds a long list of activities (http://www.w3.org/pub/WWW/Consortium/Prospectus/ActivityList). The Consortium has planned new developments in the user interface to the Web, in technological and social practices, and in the technical makeup of the Web's systems.

The rest of this chapter describes in more detail the makeup of the Web so that you can begin to understand the extent of the Web and how its components fit together.

Updates on the State of the Web
Talking about the Web's current state is almost like talking about the weather-it changes so rapidly and so often. For an update on this discussion of the Web's state, with links to the relevant Web sites, connect to http://www.december.com/web/develop/state.html. This resource is part of the Web Development web, a key part of the on-line support offered to readers of this book (http://www.december.com/works/hcu.html).

An Overview of the World Wide Web

The World Wide Web (WWW) emerged from ideas about nonlinear information organization and was developed to meet the information needs of researchers in the high-energy physics community. Today, the WWW offers a system for distributing hypermedia information locally or globally.

Origins of the Web

Certainly, the idea of presenting information in a nonlinear fashion did not originate in the twentieth century. The Talmud, an important document in the Jewish faith, includes commentaries and opinions on the first five books of the Bible. The Talmud's pages are organized as commentaries, and commentaries on commentaries, that extend from central paragraphs in the middle of the page. Footnotes, which are used in traditional paper texts, also have a relational, nonsequential quality that is similar in spirit to hypertext.

Hypertext, as implemented on the Web, however, has its origins in the start of the electronic computer age, when ideas about associative linking could be married with the possibilities of automated storage-and-retrieval systems.

Vannevar Bush described a system for associatively linking information in his July 1945 article in The Atlantic Monthly, "As We May Think" (this article is available on the Web at http://www.isg.sfu.ca/~duchier/misc/vbush/). Bush called his system a memex (memory extension) and proposed it as a tool to help the human mind cope with information. Having observed that previous inventions had expanded human abilities for dealing with the physical world, Bush wanted his memex to expand human knowledge in a way that took advantage of the associative nature of human thought. Bush's design for the memex involved technologies for recording information on film and mechanical systems for manipulation. Although the memex was never built, Bush's article defined, in detail, many concepts of associative linking and described an information system to capture them in a design.

Ideas about information systems design as well as working computer systems emerged in the decades after Bush's article. In 1962, Douglas Engelbart began a project called Augment at the Stanford Research Institute. Augment's goal was to unite and cross-reference the written material of many researchers into a shared document. One portion of the project, the oN-Line System (NLS), included several hypertext features.

In 1965, Ted Nelson coined the term hypertext to describe text that is not constrained to be sequential. Hypertext, as described by Nelson, links documents to form a web of relationships that draws on the possibilities for extending and augmenting the meaning of a "flat" piece of text with links to other texts. Hypertext therefore is more than just footnotes that serve as commentary or further information in a text. Instead, hypertext extends the structure of ideas by making "chunks" of ideas available for inclusion in many parts of multiple texts.

Nelson also coined the term hypermedia: hypertext not constrained to be text. Hypermedia can include pictures, graphics, sound, and movies. In 1967, Nelson proposed a global hypermedia system, Xanadu, which would link all world literature with provisions for automatically paying royalties to authors. Although Xanadu has never been completed, a Xanadu group did convene in 1979, and the project was bought by Autodesk, Inc. in 1988 and developed until its cancellation in 1992. Afterward, Nelson re-obtained the Xanadu trademark and, as of 1994, was working to develop the project further (see http://xanadu.net/the.project).

Also in 1967, a working hypertext system called the Hypertext Editing System was operational at Brown University. Andries van Dam led the team that developed the system, which later was used for documentation during the Apollo space missions at the Houston Manned Spacecraft Center. By 1985, another hypertext system came out of Brown University, called Intermedia, which included bidirectional links and the possibility of different views of hypertext, including a single-node overview and a view of an entire hypertext structure called a Web view.

Also in 1985, Xerox Palo Alto Research Center (PARC) (http://www.parc.xerox.com/) introduced a LISP-based system called NoteCards. Each node in NoteCards could contain any amount of information, and the system offered about 50 types of specialized cards for particular data structures.

Hypertext's stature as an important approach to information organization in industry and academia was marked in 1987, when the Association for Computing Machinery (http://www.acm.org/) held its first conference on hypertext at the University of North Carolina. This was the same year that Apple Computer (http://www.apple.com/) introduced its HyperCard system. Bundled free with each Macintosh computer sold, HyperCard quickly became popular. Users organized information as cards gathered into stacks and took advantage of the possibilities for ordering the cards within a stack in various ways.

Vannevar Bush's, Ted Nelson's, and others' ideas about information systems showed up in another project in the late 1980s. In March 1989, Tim Berners-Lee, a researcher at the Conseil Europeen pour la Recherche Nucleaire (CERN) European Laboratory for Particle Physics in Geneva, Switzerland, proposed a hypertext system to enable efficient information sharing for members of the high-energy physics community. Berners-Lee had a background in text processing, real-time software, and communications and had previously developed a hypertext system that he called Enquire in 1980 (at that time, he had been unaware of Nelson's term, hypertext). Berners-Lee's 1989 proposal, called HyperText and CERN, circulated for comment. The important components of the proposal follow:

A user interface that would be consistent across all platforms and that would allow users to access information from many different computers

A scheme for this interface to access a variety of document types and information protocols

A provision for universal access, which would allow any user on the network to access any information

By late 1990, an operating prototype of the WWW ran on a NeXT computer, and a line-mode user interface (called www) was completed. The essential pieces of the Web were in place, although not widely available for network use.

In March 1991, the www interface was used on a network, and by May of that year, it was made available on central CERN machines. The CERN team spread the word about its system throughout the rest of 1991, announcing the availability of the files in the Usenet newsgroup alt.hypertext on August 19, 1991 and to the high-energy physics community through its newsletter in December 1991. In October of 1991, a gateway from the Web to Wide-Area Information Server (WAIS) software was completed.

During 1992, the Web continued to develop, and interest in it grew. On January 15th, the www interface became publicly available from CERN, and the CERN team demonstrated the Web to researchers internationally throughout the rest of the year. By the start of 1993, there were 50 known Web servers, and the first graphical interfaces (called clients or browsers) for the X Window System and the Macintosh became available in January.

Until 1993, most of the development of Web technologies came out of CERN in Switzerland. In early 1993, however, a young undergraduate at the University of Illinois at Urbana-Champaign named Marc Andreessen shifted attention to the United States. Working on a project for the National Center for Supercomputing Applications (NCSA), Andreessen led a team that developed an X Window System browser for the Web called Mosaic. Mosaic was released in alpha version in February 1993 and was among the first crop of graphical interfaces to the Web.

Mosaic-with its fresh look and graphical interface presenting the Web using a point-and-click design-fueled great interest in the Web. Berners-Lee continued promoting the Web itself, presenting a seminar at CERN in February 1993 outlining the Web's components and architecture.

Communication using the Web continued to increase throughout 1993. Data communication traffic from Web servers grew from 0.1 percent of the U.S. National Science Foundation Network (NSFNet) backbone traffic in March to 1.0 percent of the backbone traffic in September. Although not a complete measure of Web traffic throughout the world, the NSFNet backbone measurements give a sample of Web use. In September, NCSA released the first (1.0) operational versions of Mosaic for the X Window System, Macintosh, and Microsoft Windows platforms. By October, there were 500 known Web servers (versus 50 at the year's start). During Mecklermedia's Internet World in New York City in 1993, John Markoff, writing on the front page of the business section of The New York Times, hailed Mosaic as the "killer app [application]" of the Internet. The Web ended 1993 with 2.2 percent of the NSFNet backbone traffic for the month of December.

In 1994, more commercial players got into the Web game. Companies such as Spry, Inc. announced commercial versions of Web browser software. Marc Andreessen and colleagues left NCSA in March to form, with Jim Clark (former chairman of Silicon Graphics), a company that later became known as Netscape Communications Corporation (http://home.netscape.com/). By May 1994, interest in the Web was so intense that the first international conference on the WWW, held in Geneva, overflowed with attendees. By June 1994, there were 1,500 known (public) Web servers.

By mid-1994, it was clear to the original developers at CERN that the stable development of the Web should fall under the guidance of an international organization. In July, the Massachusetts Institute of Technology (MIT) and CERN announced the World Wide Web Organization (which later became known as the World Wide Web Consortium, or W3C). Today, the W3C (http://www.w3.org/hypertext/WWW/Consortium/) guides the technical development and standards for the evolution of the Web. The W3C is a consortium of universities and private industries, run by the Laboratory for Computer Science (LCS) at MIT collaborating with CERN (http://www.cern.ch/) and Institut National de Recherche en Informatique et en Automatique (INRIA), a French research institute in computer science (http://www.inria.fr/). The Web ended 1994 with 16 percent of the NSFNet backbone traffic for the month of December, beating out Telnet and Gopher traffic in terms of bytes transferred.

In 1995, the Web's development was marked by rapid commercialization and technical change. Netscape Communications' browser, called Netscape Navigator (nicknamed Mozilla), continued to include more extensions to HTML, and issues of security for commercial cash transactions garnered much attention. By May 1995, there were more than 15,000 known public Web servers-a tenfold increase over the number from a year before. Many companies had joined the W3C by 1995, including AT&T, Digital Equipment Corporation, Enterprise Integration Technologies, FTP Software, Hummingbird Communication, IBM, MCI, NCSA, Netscape Communications, Novell, Open Market, O'Reilly & Associates, Spyglass, and Sun Microsystems.

In May 1995, Sun Microsystems introduced its Java language (http://www.javasoft.com/) in its initial form to the Internet community. The language transformed the on-line world for the rest of the year, challenging the long-held strategies of many companies. By December 1995, even Microsoft (http://www.microsoft.com/) recognized the Internet as central to personal and business communications and announced its intent to license the Java language.

Along with growing interest in the Web, the number of Web resources offered through servers exploded. Yahoo! (http://www.yahoo.com/), a subject tree of Web resources, grew rapidly, rising from approximately 100 links in late March 1994 to more than 39,000 entries (and a new commercial home) by May 1995. Paper documentation about the Web also grew. By May 1995, the number of books about the Web exceeded two dozen, and several new paper periodicals devoted to the Web (WebWeek and WebWatch, for example) had been launched. Web traffic on the NSFNet backbone had exceeded all other services (in terms of bytes transferred through service ports), and interest in the Web among Internet users and users of commercial services was intense. Earlier in the year, Prodigy had announced full access to the Web for its customers, and CompuServe and America Online had interfaces ready or in the works shortly afterward.

Best estimates put the number of Web servers at 23,000 by April 1995; the Lycos spider (http://www.lycos.com/) successfully downloaded at least one file from 23,550 unique HTTP servers between November 21, 1994 and April 4, 1995. The content on those servers also exploded. By May 1996, the AltaVista spider (http://www.altavista.digital.com/) had indexed more than 30 million pages on the Web. By 1995 and 1996, the Web had entered popular culture, with URLs reproduced on T-shirts and hats and routinely appearing in movie advertisements-even comic strips. Every major television broadcast network had a Web site (http://www.abc.com, http://www.cbs.com/, http://www.nbc.com/, http://www.fox.com/, and http://www.pbs.org/), and most cable networks did as well, such as Cable News Network, which promoted the URL of its site at nearly every commercial break (http://www.cnn.com/).

This rapid growth in the number of servers, resources, and interest in the Web set the stage for a user base large enough that further growth may be self-sustaining. Members of such a mass audience find that they can reach many people and sites of interest through the medium, and this nearly universal access brings a benefit that attracts still other users. Some say that Internet e-mail among scientists reached this stage long ago: for many scientific (and other) disciplines, an Internet e-mail address is extremely helpful for scholarly communication. When large numbers of scholars participate in electronic mail, it becomes beneficial for other scholars to adopt the technology to gain the benefit of being in touch with so many of their colleagues.

It's difficult to predict when (or if) the Web will ever reach such a mass audience. Users may begin to expect to find communications from organizations on the Web, however. When a consumer television audience using the Web is pleasantly surprised at finding a Web site ("Hey, it's great that CBS television has a Web server" [http://www.cbs.com/]), it may come to expect such Web communication ("Where is the XYZ television network on the Web?"). For organizations serving such user groups, there is a cost of not having a Web presence: competitor companies with a Web presence may use it aggressively for customer service and advertising to gain a competitive advantage.

A Definition of the World Wide Web

Despite its rapid growth and technical developments, the Web in the late 1990s retains the essential functional components it had in 1990. Its popularity as a view of the Internet, however, has muddied popular understanding of it, because the Web sometimes is viewed as being equivalent to the Internet. The Web is a very distinct system from the Internet. First, the Web is not a network, but an application system (a set of software programs). Second, the WWW can be deployed and used on many different kinds of networks, or even on no network at all. The rest of this section develops a definition of the Web's components.

The WWW is a hypertext information and communications system popularly used on the Internet computer network with data communications operating according to a client/server model. Web clients (browsers) can access multiprotocol and hypermedia information (possibly by using helper applications with the browser) by using an addressing scheme.

Figure 1.1 summarizes the technical organization of the Web based on this definition.

Figure 1.1 : The technical organization of the Web.

The following paragraphs provide a point-by-point definition of the WWW.

The WWW is made of hypertext. Information presented on the Web need not be constrained to a linear structure. In mathematical terms, the Web is a directed graph in which nodes (the Web's hypertext pages) are connected by edges (the Web's hypertext links). Areas on Web pages, called anchors, are hotspots that the user can select to retrieve another document for display in the Web interface (or browser).

Figure 1.2 summarizes the basic organization of hypertext.

Figure 1.2 : The organization of hypertext.

Links among pages, shown as directed arrows, connect an anchor on one page of hypertext to another hypertext page or a specific location on that page. These anchors are displayed as hotspots in a Web browser and often are highlighted or underlined (or both). The user often can select these by using a point-and-click interface.

Figure 1.2 shows a system of information that may be traversed in a nonlinear fashion. The user can select a link on a page and begin reading other pages. Alternatively, the user can skip or choose different links on a subsequent reading of the same information.
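
In programming terms, you can model such a hypertext as a directed graph. The following sketch (the page names are invented for this example) records which pages link to which; a reader may follow the edges in any order the links allow:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HypertextGraph {
    public static void main(String[] args) {
        // Each page (node) maps to the pages its anchors link to (edges).
        Map<String, List<String>> links = new HashMap<>();
        links.put("home.html", Arrays.asList("overview.html", "contact.html"));
        links.put("overview.html", Arrays.asList("home.html", "details.html"));
        links.put("details.html", Arrays.asList("home.html"));
        links.put("contact.html", Collections.emptyList());

        // No single linear path through the pages is imposed; a reader
        // may traverse the edges in any order they permit.
        for (Map.Entry<String, List<String>> page : links.entrySet()) {
            System.out.println(page.getKey() + " -> " + page.getValue());
        }
    }
}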

Text that is not constrained to be linear-Ted Nelson's characterization-accurately describes Web hypertext. Another important characteristic of Web hypertext involves the notion of boundedness. In its networked form, information on the Web, because it can be linked to other information written by other authors, is not bound to a single server or work. Thus, Web hypertext often is not bound or contained within a single work written by a single author. Instead, Web hypertext links to and augments its meaning from many other pages of text from all over the network. (Again, this refers to the Web in its global deployment, noting that it is possible to deploy Web software on non-networked or locally networked-intranet-systems.)

Similarly, any Web-based work is potentially a destination for an anchor on another page somewhere on the Web. This interlinking fosters highly enmeshed systems of thought and expression very different from static, stand-alone systems of hypertext encoded in some CD-ROMs or on a single computer host controlled by a single author.

Figure 1.3 illustrates how networked hypertext includes links that may cross the work boundary (the demarcation of which pages the creator(s) of a hypertext declare as constituting their "work"). In contrast, hypertext developed by a single author or team and deployed on stand-alone or static systems (for example, a CD-ROM) usually does not include links to other "works" outside its work boundary (note, however, that some CDs contain databases of Web references and must be used on a networked computer in order to retrieve those resources). Although developers of nonnetworked hypertext systems could include several hypertext works with links to other works on a single CD-ROM, no links ever could go "outside" the boundary of the CD-ROM itself. Therefore, it is considered to be a closed system. In contrast, networked hypertext forms an open, dynamic system in which links may extend far outside author control and arbitrary links from remote hypertext works may connect to a work.

Figure 1.3 : Networked hypertext versus stand-alone hypertext.

The Web's hypertext is written using HTML, an application of the Standard Generalized Markup Language (SGML). SGML is an international standard (ISO 8879) for text information processing (see http://www.sgmlopen.org/). SGML enables information to be structured so that publishing systems or other applications easily can share it. HTML is defined by SGML and is intended as a semantic markup language, demarcating the structure of a document rather than its appearance (Part II, "Web-Development Processes," goes into HTML in detail).

Remember that the definition of the Web earlier in this section pointed out that the WWW is an information and communications system. The Web allows both information dissemination and information collection (through HTML's Forms capability). The Web therefore isn't merely a one-way system for disseminating information; it also includes the potential for interactive communication. By using Forms with gateway programming (which is explained in Part IV, "Gateway Programming"), web developers can create systems that respond to user input or change a hypertext structure. As an information-dissemination system, the Web can reach audiences of an arbitrary size: just the creator (for hypertext deployed only on a personal file system), a group (for hypertext deployed on a file system allowing group access), or a mass audience (for hypertext made publicly available on Web servers).
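
As a rough sketch of the gateway idea (Part IV covers the real techniques), the following hypothetical CGI-style program reads the query string that a Form submission supplies through the server's environment and echoes it back as HTML:

public class FormEcho {
    public static void main(String[] args) {
        // A Web server invoking a gateway program passes the form data
        // in the QUERY_STRING environment variable (for a GET request).
        String query = System.getenv("QUERY_STRING"); // e.g. "name=Ada&topic=HTML"

        // A gateway program replies with a content type, a blank line,
        // and then the document itself.
        System.out.println("Content-type: text/html");
        System.out.println();
        System.out.println("<html><body>You submitted: "
                + (query == null ? "(nothing)" : query) + "</body></html>");
    }
}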

The WWW is used on the Internet computer network. Web software does not have to be deployed on the Internet; it can be used on an intranet as well. The WWW also does not need to be deployed on a network at all or use the Internet's protocols for data transmission. Web software can be used on a local-area network or an organization's campus-area network and made accessible only to those with access to these local file systems. In its most popular form, the Web is used on the Internet computer network with publicly available Web servers, giving worldwide access to information.

The Internet is not a single network-it is a patchwork of networks run by cooperating organizations. Based on a set of protocols known as the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite, the Internet uses a system of packet switching for data transfer. Growing from research originally funded by the U.S. Advanced Research Projects Agency (http://www.arpa.mil/) in the late 1960s and early 1970s, the Internet was designed to be highly robust in case one section of the network (or a computer host on the network) became inoperable. Packets simply could be transmitted over another route through the network because no one network path was essential (unless, of course, it was the sole link to a given computer host). As Figure 1.4 illustrates, a set of data can be sent over the Internet broken into discrete packets. Each of these packets can be sent (or re-sent, in the case of data corruption or loss) over different routes on the network and assembled (based on information encoded into the packets) in their proper order after arriving at the destination.

Figure 1.4 : Basic operation of the Internet's TCP/IP packet-switching protocols.
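
A small simulation makes the packet concept concrete. In the following sketch (the message and packet size are arbitrary choices for illustration), a message is split into sequence-numbered packets, shuffled to mimic out-of-order arrival over different routes, and reassembled in proper order at the destination:

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class PacketSketch {
    public static void main(String[] args) {
        String message = "Data sent over the Internet is split into packets.";
        int packetSize = 8;

        // Split the message into sequence-numbered packets.
        List<Map.Entry<Integer, String>> packets = new ArrayList<>();
        for (int i = 0, seq = 0; i < message.length(); i += packetSize, seq++) {
            String payload = message.substring(i, Math.min(i + packetSize, message.length()));
            packets.add(new AbstractMap.SimpleEntry<>(seq, payload));
        }

        // Packets may travel different routes and arrive out of order.
        Collections.shuffle(packets);

        // The destination reassembles them by sequence number.
        packets.sort((a, b) -> a.getKey() - b.getKey());
        StringBuilder reassembled = new StringBuilder();
        for (Map.Entry<Integer, String> packet : packets) {
            reassembled.append(packet.getValue());
        }
        System.out.println(reassembled); // the original message, intact
    }
}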

Although the TCP/IP protocol suite has served the Internet well for many years, developers are working on a new protocol for the Internet referred to as Internet Protocol Next Generation (IPNG). IPNG's formal name is Internet Protocol version 6 (IPv6); the current (1996) version of the Internet protocol is IPv4. IPNG is expected to support the Internet far into the future by making up for the problems with the current IPv4. One problem with IPv4 is that the address space for naming Internet hosts is filling up rapidly. IPNG will be interoperable with IPv4 when deployed, and it will allow for more host addresses. IPNG will work well with high-performance networks-particularly, Asynchronous Transfer Mode (ATM) networks. It also will work well with low-bandwidth (wireless) networks. IPv6 was formally made a proposed standard on September 18, 1995. (For more information on IPNG, see http://www.ietf.cnri.reston.va.us/html.charters/ipngwg-charter.html.)

The WWW uses data communications according to a client/server model. A client/server model for networked computer systems involves three components: the client, the server, and the network. A client is a software application that most often runs on the end user's computer host. A server is a software application that most often runs on the information provider's computer host. Client software can be customized to the user's hardware system and acts as an interface from that system to information provided on the server. The user can initiate a request for information or action through the client software. This request travels over the network to the server. The server interprets the request and takes some desired action. This action might include a database lookup or a change in recorded database information. The results of the requested transaction (if any) are sent back to the client for display to the user. All client/server communication follows a set of rules, or protocols, defined for the client/server system. Figure 1.5 summarizes these relationships, showing the flow of a request from a client to a server and the transmission of information from a server to a client. A client might access many servers employing the protocol(s) that both the server and client understand.

Figure 1.5 : A client/server model for data communication.

The distribution of request-and-serve activities in the client/server model allows for many efficiencies. Because the client software interacts with the server according to a predefined protocol, the client software can be customized for the user's particular computer host. (The server doesn't have to "worry" about the hardware particularities of the client software.) A Web client (a browser) developed for Macintosh computers can access any Web server, for example, and that same Web server might be accessed by a Web browser written for a UNIX workstation running the X Window System. This arrangement makes developing information easier because there is a clear demarcation of duties between the client and the server. Separate versions of the information do not need to be developed for any particular hardware platform, because the necessary customizations are written into the client software for each platform.
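
The following minimal sketch shows this request-and-serve cycle with TCP sockets. The port number, request, and reply are hypothetical, and a real Web server would speak HTTP rather than this toy protocol:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class ClientServerSketch {
    public static void main(String[] args) throws Exception {
        // Server: waits for a request, interprets it, and returns a result.
        Thread server = new Thread(() -> {
            try (ServerSocket listener = new ServerSocket(8080);
                 Socket conn = listener.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                 PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                String request = in.readLine();        // the client's request arrives
                out.println("Result for: " + request); // the requested transaction's result
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        server.start();
        Thread.sleep(500); // give the server a moment to begin listening

        // Client: initiates a request over the network and displays the reply.
        try (Socket conn = new Socket("localhost", 8080);
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            out.println("GET document");
            System.out.println(in.readLine());
        }
    }
}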

An analogy to the client/server model is the television broadcast system. A customer can buy any kind of television set (client) to view broadcasts from any over-the-air broadcast tower (server). Whether the user has a wristband TV or a projection screen TV, the set receives information from the broadcast station in a standard format and displays it as appropriate for the user's TV set. Separate TV programming for each kind of set is not necessary, such as for color or black-and-white sets or different-sized sets. New television stations will be able to send signals to all the television sets currently in use.

Web clients (browsers) can communicate through multiple protocols. Web browsers have a multiprotocol capability, meaning that they can access a variety of servers that provide information according to different sets of rules (protocols) for communication. Web browsers and links within Web documents can reference servers by using the following protocols (this list contains only the most popular protocols):

HyperText Transfer Protocol (HTTP)  This is the "native" protocol of the Web, designed specifically to transmit hypertext over networks (a raw HTTP exchange appears in a sketch just after this list).

File Transfer Protocol (FTP)  This protocol allows a user to transfer text or binary files among computer hosts across networks.

Gopher  This protocol allows users to share information using a system of menus, documents, or connections to Telnet sessions.

Network News Transfer Protocol (NNTP)  This is the protocol for Usenet news distribution. Usenet is a system for asynchronous text discussion in topic subdivisions called newsgroups.

Telnet  This protocol is used for (possibly remote) logon to a computer host.

A Web browser serves as a Gopher client when it accesses a Gopher server and as a News client when accessing a Usenet news server, for example. Figure 1.6 shows the variety of client/server relationships possible on the Internet. Although many of the clients are specialized (a Gopher client can be used to access only a Gopher server, for example), Web clients (Netscape and Lynx, two popular Web browsers) can access many kinds of servers.

Figure 1.6 : Client/server relationships possible on the Internet.
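
Here is a sketch of the Web's native protocol in action, as mentioned in the protocol list above: a raw HTTP GET request written directly to a socket, much as a browser would issue it. Any public HTTP server would do for the host; www.w3.org is used because it appears elsewhere in this chapter.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;

public class HttpGetSketch {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("www.w3.org", 80);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
            // An HTTP request: method, path, and version, then headers,
            // then a blank line to end the request.
            out.print("GET / HTTP/1.0\r\n");
            out.print("Host: www.w3.org\r\n");
            out.print("\r\n");
            out.flush();

            // The server replies with a status line and headers,
            // followed by the document itself.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}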

The address referring to a document or resource available through the Web (or the Internet in general) is called a uniform resource locator (URL). A URL is formed in a particular syntax to express how a resource can be retrieved, including (possibly) information about the name of the host computer and the path name to the resource, as well as other information. For illustration, here are three URLs and their explanations:

http://www.w3.org/hypertext/WWW/TheProject.html  This URL refers to a Web server (the http at its start indicates the use of HTTP). The Web server www.w3.org contains a file called TheProject.html in the directory hypertext/WWW/. If it conforms to file name extension conventions, the file consists of HTML (because of the .html extension).

ftp://ftp.w3.org/pub/  This URL refers to a host (ftp.w3.org) that can be accessed using FTP. The URL refers to the pub/ directory on that computer host, so this reference is to a directory listing of files, directories, or possibly an empty directory.

news:comp.infosystems.www.misc  This URL refers to a Usenet newsgroup. After the user selects this URL, the Web browser retrieves the current set of article titles in the Usenet newsgroup comp.infosystems.www.misc, a group dedicated to the discussion of miscellaneous (misc) topics about the World Wide Web (www) computer (comp) information system (infosystems). Unlike the previous two URLs, this one does not refer to a particular host; it refers to the Usenet news server host defined by the user when the browser was installed. This Usenet news server generally is defined to be the news server on the user's local host or local network.
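
Java's standard library can decompose a URL into the parts just described. This short sketch parses the first example URL above:

import java.net.URL;

public class UrlParts {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.w3.org/hypertext/WWW/TheProject.html");
        System.out.println("Protocol: " + url.getProtocol()); // http
        System.out.println("Host:     " + url.getHost());     // www.w3.org
        System.out.println("Path:     " + url.getPath());     // /hypertext/WWW/TheProject.html
    }
}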

The naming of computer hosts on the Internet follows a hierarchical scheme. All hosts on the Internet are assigned a specific Internet Protocol (IP) numeric address based on a numbering hierarchy. Using the Internet Domain Name System (DNS), a correspondence is established between numeric IP addresses (for example, 128.113.1.5) and host names (for example, ftp.rpi.edu); the alphanumeric host names are easier for humans to use and interpret. A host name is segmented by periods, and each string between the periods is an alphanumeric string. The right-most string is the top-level domain name. When domain names first were developed, most of them referred to U.S.-based hosts. Table 1.1 lists the type of organization indicated by each top-level domain.

Table 1.1. Selected high-level domain names.

Domain Name   Type of Host
com           A commercial organization
edu           An educational institution (generally, a university)
gov           A government (generally, U.S. government) organization
mil           A U.S. military organization
net           A network access provider
org           Usually, a not-for-profit organization

To the left of this top-level domain identifier, an organization prepends its own organization name. Based on this name, the organization can create other names, often following organizational hierarchies. The host name miller.cs.uwm.edu, for example, refers to the educational institution University of Wisconsin-Milwaukee (uwm), its Computer Science (cs) department, and the computer named miller (UW-Milwaukee's machines traditionally are named after beers).
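
The DNS correspondence between a host name and its numeric IP address is easy to observe from a program. This sketch resolves a name (here, the CERN Web host mentioned later in this section) to its address:

import java.net.InetAddress;

public class DnsLookup {
    public static void main(String[] args) throws Exception {
        // DNS maps the alphanumeric host name to its numeric IP address.
        InetAddress host = InetAddress.getByName("www.cern.ch");
        System.out.println(host.getHostName() + " -> " + host.getHostAddress());
    }
}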

Because of the proliferation of Internet hosts throughout the world, another scheme for identifying Internet hosts using a two-letter country code as the top-level domain identifier was developed to increase the name space. The country code for the United States is us, and some U.S. sites now use the two-letter code rather than (or in addition to) the three-letter codes shown previously, as these examples show:

well.sf.ca.us  Refers to the Whole Earth 'Lectronic Link (well), an Internet service based in Sausalito, California. The subdivisions ca and sf to the left of the us domain further specify California and the San Francisco area.

www.birdville.k12.tx.us  Refers to the Web server of the Birdville Elementary (k12) school in Texas, in the United States.

www.cern.ch  Refers to the Web server of CERN in Switzerland (ch).

For a current listing of country codes, see

ftp://rtfm.mit.edu/pub/usenet/news.answers/mail/country-codes

Tip
You can find out who owns a particular domain name by using the page http://rs.internic.net/cgi-bin/whois or the UNIX whois command. Entering whois well.sf.ca.us at the UNIX prompt ($), for example, produces this:
whois well.sf.ca.us
Whole Earth Lectronic Link (WELL)
1750 Bridgeway, Suite A200
Sausalito, CA 94965-1900
Hostname: WELL.COM
Nicknames: WELL.SF.CA.US
Address: 206.15.64.10
System: SUN SPARCENTER 1000 running SOLARIS 2.3
Host Administrator:
Chen, Hua-Pei (HC24) hpc@WELL.COM
415-289-7551 (FAX) (415) 332-4927
Domain Server
Record last updated on 26-Sep-95.

Tip
A user sometimes can use the whois command to find out the domain name(s) that correspond to an organization's name. For example, entering
$ whois "McDonald's Corporation"
produces this output:
McDonald's Corporation (BIGMAC-HST) BIGMAC1.MCD.COM 152.140.28.201
McDonald's Corporation (NETBLK-MCDNET) MCDNET
192.65.204.0 - 192.65.210.0
McDonald's Corporation (NETBLK-MCDONALDS-BNETS) MCDONALDS-BNETS
152.140.0.0 - 152.142.0.0
McDonald's Corporation (COMPANY-JOE-DOM) COMPANY-JOE.COM
McDonald's Corporation (COSMC-DOM) COSMC.COM
McDonald's Corporation (FRYGIRLS-DOM) FRYGIRLS.COM
McDonald's Corporation (FRYGUYS-DOM) FRYGUYS.COM
McDonald's Corporation (FRYKIDS-DOM) FRYKIDS.COM
McDonald's Corporation (GOLDEN-ARCHES-DOM) GOLDEN-ARCHES.COM
McDonald's Corporation (GRIMACE-DOM) GRIMACE.COM
McDonald's Corporation (HAMBURGLER-DOM) HAMBURGLER.COM
McDonald's Corporation (HAPPYMEAL-DOM) HAPPYMEAL.COM
McDonald's Corporation (HEARTH-EXPRESS-DOM) HEARTH-EXPRESS.COM
McDonald's Corporation (LORICO-DOM) LORICO.COM
McDonald's Corporation (MAYOR-MCCHEESE-DOM) MAYOR-MCCHEESE.COM
McDonald's Corporation (MCBABY-DOM) MCBABY.COM
McDonald's Corporation (MCBUDDY-DOM) MCBUDDY.COM
McDonald's Corporation (MCCHICKEN-DOM) MCCHICKEN.COM
McDonald's Corporation (MCDONALDLAND-DOM) MCDONALDLAND.COM
McDonald's Corporation (MCDONALDS-DOM) MCDONALDS.COM
McDonald's Corporation (MCEXPRESS-DOM) MCEXPRESS.COM
McDonald's Corporation (MCFOLKS-DOM) MCFOLKS.COM
McDonald's Corporation (MCFOOD-DOM) MCFOOD.COM
McDonald's Corporation (MCHAPPY-DOM) MCHAPPY.COM
McDonald's Corporation (MCKID-DOM) MCKID.COM
McDonald's Corporation (MCKIDS-DOM) MCKIDS.COM
McDonald's Corporation (MCMENU-DOM) MCMENU.COM
McDonald's Corporation (MCNUGGETS-DOM) MCNUGGETS.COM
McDonald's Corporation (MCSTOCK-DOM) MCSTOCK.COM
McDonald's Corporation (MCSTOP-DOM) MCSTOP.COM
McDonald's Corporation (MCTOY-DOM) MCTOY.COM
McDonald's Corporation (MICKEYD-DOM) MICKEYD.COM
McDonald's Corporation (MICKEYDS-DOM) MICKEYDS.COM
McDonald's Corporation (ARCHDELUXE-DOM) ARCHDELUXE.COM
McDonald's Corporation (QUARTERPOUNDER-DOM) QUARTERPOUNDER.COM
McDonald's Corporation (RMHC-DOM) RMHC.COM
McDonald's Corporation (RONALD-HOUSE-DOM) RONALD-HOUSE.COM
McDonald's Corporation (RONALD-MCDONALD-HOUSE-DOM) RONALD-MCDONALD-HOUSE.COM
McDonald's Corporation (RONALD-MCDONALD2-DOM) RONALD-MCDONALD.COM
McDonald's Corporation (SPEEDEE-DOM) SPEEDEE.COM
McDonald's Corporation (MCD-DOM) MCD.COM
Note that the whois service is limited (mostly) to U.S. domain names and nonmilitary domain names.

Web clients (browsers) also can access hypermedia. Just as Ted Nelson characterized hypertext as text that is not constrained to be linear, he characterized hypermedia as hypertext that is not constrained to be text. Hypermedia can include graphics, pictures, movies, and sounds (multimedia). Because Web hypertext includes multiprotocol links and networked communications, the Web (in its global, networked sense) is networked hypermedia-hypermedia that is not constrained to a single information server. Figure 1.7 summarizes the relationships in networked hypermedia, showing possible links from a hypertext page to hosts running servers of various protocols, as well as links to documents in various media such as text, sound, graphics, and movies.

Figure 1.7 : The Web's organization as networked hypermedia.

Hypermedia access is facilitated by helper applications-software that the Web browser invokes to display multimedia information to the user. For the user to view movies, for example, the Web browser must have movie-display software installed and available. To display graphical images in an HTML document, the Web browser must be graphical-that is, it must employ a system such as the X Window System, the Macintosh operating system, or Microsoft Windows as a graphical user interface. Some Web browsers are text-based (for example, the original www browser from CERN); however, most modern browsers are graphical and are widely available for a variety of platforms.
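
Conceptually, a browser's helper-application configuration is a mapping from media types to external programs. The following sketch illustrates the idea; the media types are standard MIME types, but the helper program names are hypothetical:

import java.util.HashMap;
import java.util.Map;

public class HelperDispatch {
    public static void main(String[] args) {
        // Map media (MIME) types to helper applications; the helper
        // program names here are hypothetical examples.
        Map<String, String> helpers = new HashMap<>();
        helpers.put("audio/x-pn-realaudio", "raplayer");
        helpers.put("video/mpeg", "mpeg_play");
        helpers.put("application/postscript", "ghostview");

        String contentType = "video/mpeg"; // as reported by the server
        String helper = helpers.getOrDefault(contentType, "(display inline)");
        System.out.println("Launch helper: " + helper);
    }
}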

Also grouped in the category of helper applications are special language interpreters. The Java programming language, for example, is used to create interactive content that can be downloaded and viewed using browsers that are Java-enabled. Essentially, this enabling requires that a version of the browser that includes a Java interpreter has been created. See http://www.javasoft.com/ for more information on Java and the latest ports.

The key element of this definition of the Web is that, as it is used for global information distribution, the Web = Hypertext + Multimedia + Network:

Hypertext is the basis for associative linking.

Multimedia presents data and information in multiple formats and senses (sight and sound).

The network is the essence of global reach.

The Role of the Web within Cyberspace

As an application that uses the Internet, the Web has a role within the larger context of all on-line communications. Because much data communication using the Web relies on Internet protocols, the best way to take advantage of all the Web's information and qualities is by having direct Internet access. Chapter 3 discusses options for Internet access as well as Web access and delivery options in detail. This section presents the role of the Web in cyberspace as a way to help the developer understand the Web as a network communications system, noting the communication gateways that allow transfer of data among networks and the information spaces defined by protocols.

The Topology of Cyberspace

Cyberspace refers to the mental picture a person generates from experiencing computer communication and information retrieval. The science fiction author William Gibson coined this term to describe the visual environments in his novels. In Gibson's worlds, computer users navigate a highly imaginative landscape of global network information resources and services. The term cyberspace is used today to refer to the collection of computer-mediated experiences for visualization, communication, interaction, and information retrieval. Cyberspace can be considered to be the largest context for any activity performed on-line or through computers. Examples of activities in cyberspace include a doctor using a virtual reality helmet for visualizing a surgical operation, a student reading a newspaper on-line, and a teacher presenting class materials through the Web.

The infrastructure for cyberspace consists of a wide variety of global networks as well as nonnetworked systems for communication and interaction. In the broad definition of cyberspace given here, for example, people using CD-ROM applications on their computers can be considered to be interacting in cyberspace, although their computers might not be connected to a global (or even a local) communications network. These off-line activities are one portion of cyberspace that is unreachable from the networked region of cyberspace (such as the Internet and other global networks). Because, by definition, the off-line region involves no network communication (wired or wireless), there is a "wall" in cyberspace that separates activities in the networked region from activities in the nonnetworked region.

Because using the Web in its global form requires on-line communication, this discussion now focuses on the topology of the on-line region of cyberspace. In this on-line region, thousands of networks and systems exist worldwide that enable users to communicate and exchange information. These systems and networks might use different protocols for exchanging information and different conduits for transmitting messages (everything from copper wire to fiber-optic cable, satellites, and other wireless communications systems). These networks also might vary in size from room-sized personal area networks (PANs) involving networked personal communications devices, such as hand-held digital assistants or personal-identification medallions, to world-sized global area networks (GANs), such as the Internet. Between these two size extremes, rooms and buildings may be connected in local area networks (LANs), cities in metropolitan area networks (MANs), and large organizations or regions in wide area networks (WANs) or region area networks (RANs). As technologies evolve, new possibilities open up for creating still more kinds of networks in on-line cyberspace.

The Internet and the Web within Cyberspace

Within the large context of global, on-line cyberspace, many computer networks enable people to exchange information and communicate. The Internet, as discussed previously, refers to one system for global communications and information dissemination. Internet information applications also are the basis for much information retrieval on the Web. In this way, the Web can be considered as located within (or on top of) the Internet. The Web is not a computer network like the Internet; it is an application that uses Internet tools as its means of communication and information transport.

Because the Internet is so central to the Web's operation, a Web navigator (someone who uses the Web for information retrieval or communication) needs to know something about the Internet's place in on-line cyberspace. One key to navigating on-line cyberspace is to understand how communication takes place among the networks. Because each on-line network may use a different set of protocols, communication among networks is not necessarily automatic, simple, or even possible.

The Internet is a very popular network in on-line cyberspace because of its many resources and large base of users. The Internet therefore often acts as a common ground for communications and activity, and many on-line networks have some way (through gateways or other connections) for their users to reach the Internet.

The Internet's role as a common ground in on-line cyberspace draws other networks to make connections to it. Commercial on-line services such as Prodigy, America Online, CompuServe, and Delphi provide users with access to global information systems. These commercial services might use different protocols for communication, however, so their users might not be able to directly access all the Internet protocols (or services on the other systems). These services also might provide graphical interfaces to their on-line services, and these graphical interfaces are not necessarily views of the Internet or Web resources. Many commercial networks do, however, offer a range of connections to the Internet. Most commercial services provide electronic mail gateways to the Internet (and hence to each other through the Internet). Also, commercial services are providing gateways to the Web. Prodigy was the first commercial service to provide direct access for its users to the Web. Other commercial services are expected to follow.

Just like users of some commercial on-line services, users of global networks other than the Internet sometimes can't easily access the Internet or Web. FidoNet (named after its creator Tom Jennings' computer, a mongrel assembled from a variety of computer parts) is a worldwide network of personal computers that exchanges information over modems and phone lines. BITNET (Because It's Time Network) and UUCP (UNIX-to-UNIX Copy Protocol) are other networks used for exchanging information among users. Users of these networks can't directly access the Web (except in limited ways-for example, through electronic mail interfaces).

Many of these global networks provide electronic mail gateways to the Internet, however. Figure 1.8 depicts the topology of on-line cyberspace, showing some major networks and gateways to the Internet. All the gateways shown are for electronic mail or Usenet news feeds, with the exception of the gateways from commercial services. Gateways from commercial services now include Telnet, Usenet, FTP, and full Web access (contact the individual service provider to verify its services available to access the Internet). Other services are merging with the Internet. The French TeleTel system (popularly known as Minitel), for example, now has a connection to the Internet.

Figure 1.8 : A topology of on-line cyberspace.

Gateways Among Networks

In many cases, there is no way to exchange information directly among the networks of cyberspace. The worldwide system for exchanging banking transactions is not accessible from the Internet (for obvious security reasons). In other cases, there is some level of connection among these large networks. BITNET and Internet users, for example, can exchange electronic mail through gateways built for that purpose. Similarly, many commercial services provide e-mail gateways from their services to the Internet. Figure 1.8 shows some of the electronic mail gateways that exist among the networks of cyberspace. Note how many global networks provide some connectivity to the Internet; this connectivity makes the Internet the common ground of cyberspace.

For Web navigators, the key to remember is that the Web can't easily be experienced except through direct Internet connectivity. Because not all networks have gateways to the Internet for all the protocols the Web uses, it often is very difficult for a non-Internet user to use the Web. Users of networks without the gateways for all Web protocols must rely on electronic mail or Telnet access to the Web (see Chapter 3 for these options).

Terminology
When reading about cyberspace, you may find the following brief definitions of its regions helpful:
The Matrix  The set of all networks that can exchange electronic mail directly or through gateways. This includes the Internet, BITNET, FidoNet, UUCP, and commercial services such as America Online, CompuServe, Delphi, Prodigy, and other networks. This term was coined by John S. Quarterman in his book, The Matrix (Digital Press, 1990).
The Net  An informal term for the Internet or, depending on context, a subset (or superset) of the Matrix. A computerized conference via e-mail, for example, may take place on a BITNET host that has an Internet gateway, making the conference available to anyone on either network. In this case, the developer might say, "Our conference will be available on the Net." One might even consider discussion forums on commercial on-line services to be on the Net, although these are not accessible from the Internet.
The Web  Used in its strictest sense, the Web refers to all the documents on all Web servers worldwide. In a broader sense, it refers to all documents (on FTP and even Gopher servers) accessible through a Web browser. This broader meaning includes FTP space and Gopher space. It would be misleading, however, for information developers to say, "We put the documents on the Web," when they have placed them only on an FTP server (as opposed to a Web server). Although FTP documents are accessible by Web browsers, the audience for the preceding statement might be misled into believing that the documents are on a Web server and perhaps in hypertext. A single Web server with its associated files can be called a web (with a lowercase w). You might say, "We're going to have to make a web to describe the new system," for example (web refers to a single, local web). By contrast, in the statement, "We'll put the documents on the Web," Web refers to the global collection of publicly accessible webs and indicates the speaker's intention to make the local web widely known and publicly available.
The Internet  The Internet is the cooperatively run, globally distributed collection of computer networks that exchange information via the TCP/IP protocol suite. The Internet consists of many internetworked networks. A single network that uses the TCP/IP protocol suite is called an intranet (with a lowercase i), and some intranets are not connected to the global Internet.
FTP space  The set of all resources accessible through the File Transfer Protocol. These resources include directories of files and individual files, which may be text files or binary files (executables, graphics, sound, and video).
Gopher space  The set of all resources accessible through the Internet Gopher protocol. Gopher is a system for organizing information into menus; menu items can be links to other documents or information services.
Usenet  This is not a network at all, but a system for disseminating asynchronous (time-delayed) text discussion among cooperating computer hosts. Usenet is not limited to the Internet. Its origins are in UUCP (UNIX-to-UNIX Copy Protocol) systems, but Usenet is disseminated widely throughout the Internet, the Matrix, and beyond.

You can keep Figure 1.8 in mind as a basic operational chart, remembering the following:

Cyberspace consists of an off-line region and an on-line region. The on-line region consists of many different local and global networks.

The Internet is a collection of networks in on-line cyberspace. Because the Web links Internet resources, the Web can be considered as "located" within (or on top of) the Internet.

Users of networks can exchange electronic mail or other information through gateways.

Because most implemented gateways among networks are for electronic mail only, it is easiest to use the Web from the Internet. Some commercial on-line services provide full Web access.

A user of the Web might encounter many references to non-Internet activities and other networks in cyberspace. Remember that these activities might not be directly accessible from the Internet. Eventually, gateways might be built from these other networks to support the protocols necessary for full Internet connectivity.

The Web within the Internet

Now that you've examined the role of the Internet and the Web as one part of the on-line region of cyberspace, you can examine the Web's role within the Internet itself. The power of the Web is that it links Internet resources through a system of hypertext.

From a user's point of view, the Web consists of resources on the Internet that are accessible through a Web browser. The Web connects these resources through hypertext written using HTML. Files containing text marked up with HTML are located on a Web server and are available for Web browsers (clients) to access. An HTML file can contain links to other Internet resources. Figure 1.9 illustrates the links from an HTML document to other Internet resources and sample relationships among the Web browser, information servers, and files located on the servers.

Figure 1.9 : The Web within the Internet.

The resources shown in Figure 1.9 include a remote logon to a host through the Telnet protocol, a link to a text file on an FTP server, a link to a menu on a Gopher server, and a link to another HTML document on another Web server. Thus, the Web links disparate resources scattered across the Internet.

Information Spaces in the Web

The Web's linking relationship with Internet resources is one of its chief characteristics. The Web's scheme for referring to these Internet resources creates a structure of information spaces.

Uniform Resource Locators

The basis for referring to resources on the Web is the uniform resource locator (URL). A URL consists of a string of characters that uniquely identifies a resource. A URL is like a catalog number for a resource. When a Web browser opens a particular URL, the user gains access to the resource referred to by that URL.

The basic format for many (but not all) URLs follows:

scheme://host:port/path

Explanations of the syntax follow:

scheme  The protocol used to retrieve or send the information, such as HTTP, FTP, NNTP, Gopher, and Telnet.

host  The computer host on which the resource resides.

port  A number that identifies the service requested from the server. This number is needed only when the service runs on a port different from the standard one for that scheme.

path  An identification of the location of a resource on a particular computer host.
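
If you write software that handles URLs, you don't have to take this syntax apart by hand. The following minimal sketch, written in Python purely for illustration (the URL is a hypothetical example, and any language with a URL-parsing library would do), breaks a URL into the components just described:

from urllib.parse import urlsplit

# A hypothetical URL that uses all four basic components
parts = urlsplit("http://www.example.com:8001/works/wwwu/contents.html")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8001 (None when the URL omits the port)
print(parts.path)      # /works/wwwu/contents.html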

A Web navigator will encounter other variations in format. The URL

news:comp.infosystems.www.misc

for example, refers to a Usenet newsgroup.

The URL

telnet://locis.loc.gov

refers to a Telnet connection to the U.S. Library of Congress's on-line catalogs and databases. When a Web browser opens this URL, a Telnet session begins (a session in which the user can log onto a remote computer host).

The URL

http://www.december.com/works/wwwu/contents.html#part3

refers to a particular section of a hypertext page. The page resides on the host www.december.com and has the path name /works/wwwu/contents.html. The #part3 at the end of the path indicates that the Web browser should go to the specific place within the file marked with the anchor named part3. (Part II of this book explains how to construct and name these anchors.)
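
Note that the anchor name is used by the browser, not the server: the browser retrieves the document and then moves to the named anchor itself. As a minimal sketch (again in Python, purely for illustration), a standard library routine can split the anchor off a URL:

from urllib.parse import urldefrag

# Separate the document's URL from the anchor name;
# only the document URL is requested from the server.
document, anchor = urldefrag(
    "http://www.december.com/works/wwwu/contents.html#part3")

print(document)  # http://www.december.com/works/wwwu/contents.html
print(anchor)    # part3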

The URL

http://www.ncsa.uiuc.edu/SDG/Experimental/demoweb/marc-global-hyp.au

is an audio file (.au extension) located on a server demonstrating the Mosaic browser's capabilities. When accessed by a browser, this sound file produces a voice message (provided that the user has the appropriate audio player software and hardware installed).

The URL

http://uu-gna.mit.edu:8001/uu-gna/index.html

refers to the home page of the Globewide Network Academy-an organization dedicated to creating a fully accredited on-line university. Note that this URL includes a port number (8001) specified by the developers of this page. The standard port number for HTTP access is 80; therefore, if a port other than 80 is set for HTTP access, the user must include it in the URL. If the user leaves off the port number, the following error message is generated:

Requested document (URL http://uu-gna.mit.edu/uu-gna/index.html) could not be accessed. The information server either is not accessible or is refusing to serve the document to you.
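
Software that fetches a resource must follow the same rule and name the nonstandard port explicitly. The following minimal sketch uses Python's standard http.client module purely for illustration; the host and port are those of the example above, which may no longer be in service:

from http.client import HTTPConnection

# The port must be stated explicitly; HTTPConnection would
# otherwise assume the standard HTTP port, 80.
conn = HTTPConnection("uu-gna.mit.edu", 8001)
conn.request("GET", "/uu-gna/index.html")
response = conn.getresponse()
print(response.status, response.reason)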

Despite these variations in format, all URLs share the same purpose. When used in a Web document, a URL identifies the resource to which a hypertext anchor links. When opened by a user in a Web browser, a URL causes the resource to which it refers to be retrieved across the network and displayed in the browser. In the future, other forms of addressing will play a role on the Web (see http://www.w3.org/pub/WWW/Addressing/Addressing.html).

Web Hypertext Terminology
Although the concept of hypertext and its actual use in computer systems have been around a long time, terminology for Web-related hypertext elements is still evolving, both in formal definitions and informal usage. These terms often are used when talking about Web-based hypertext:
Page  Refers to a single sheet of hypertext (a single file of HTML).
Home page  Refers to a designated entry point for access to a local web. Also refers to a page that a person defines as his or her principal page, often containing personal or professional information.
Hotspot  The region of displayed hypertext that, when selected, links the user to another point in the hypertext or another resource.
web (lowercase w)  A set of hypertext pages considered a single work, often located on a single server. In popular usage, it is synonymous with home page.
Web (uppercase W)  The set of hypertext on Web servers worldwide; in a broader sense, all information available through a Web browser interface.

Key Resources
To learn more about URLs, see "Uniform Resource Locators."
http://www.w3.org/hypertext/WWW/Addressing/URL/Overview.html
Theise, Eric S. (1994 January 7). "Curling Up to Universal Resource Locators." gopher://gopher.well.sf.ca.us/00/matrix/internet/curling.up.02

Information Spaces

URLs point to information spaces on the Web based on the information protocol used. All FTP URLs can be considered to exist in FTP space, for example-the set of all servers publicly available for anonymous FTP. This space is just one region of the Internet's resources, but it represents a vast repository of knowledge to which the Web can connect. A URL identifies not only the protocol used for the information, but often also the type of media the resource represents. For example, the URL shown previously refers to an audio file:

http://www.ncsa.uiuc.edu/SDG/Experimental/demoweb/marc-global-hyp.au

Similarly, there are file name extensions for movies (MPEG) as well as many kinds of graphics (such as GIF, JPEG, and XBM) and text files (such as TXT, PS, and TEX). (Multimedia issues are covered in detail in Chapter 15, "Multimedia.") In this way, a URL can identify the sensory experience a resource may offer. Information spaces on the Internet thus can be considered multimedia spaces. A good source of information about multimedia information on the Web is Simon Gibbs' Index to Multimedia Information Sources at http://viswiz.gmd.de/MultimediaInfo/.
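
A program can make the same inference from a file name extension that a browser does. The following minimal sketch uses Python's standard mimetypes module purely for illustration (the second and third URLs are hypothetical):

import mimetypes

urls = [
    "http://www.ncsa.uiuc.edu/SDG/Experimental/demoweb/marc-global-hyp.au",
    "http://www.example.com/movie.mpeg",
    "http://www.example.com/picture.gif",
]

# guess_type maps a file name extension to a media type
for url in urls:
    media_type, _ = mimetypes.guess_type(url)
    print(url, "->", media_type)

# Prints audio/basic, video/mpeg, and image/gif, respectively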

Techniques for using URLs and writing HTML are covered in more detail in Parts II and III. A Web user should remember that the URL is the basis for some tasks in Web navigation. A URL is used to call up a specific resource in a browser, and URLs are used within HTML documents to create links to Internet resources.

Communication Contexts on the Web

Communications on the Web can take many forms and occur in many contexts. Genres, or conventional ways of communicating within a medium, have evolved on the Web. These genres correspond, in many ways, to off-line human communication contexts:

Interpersonal  The Web provides a way for users to create a home page, which typically conveys personal or professional information. The practice of creating a home page emerged from the technical necessity of defining the default page a Web server returns when a browser requests only a host name or a host and directory name. Home pages traditionally are the top-level page for a server, organization, or individual. When created by individuals, home pages often reveal detailed personal information about their authors and are listed in directories of home pages. Also, individuals often follow the tradition of linking to colleagues' or friends' pages, creating electronic tribes (mathematically, these electronic tribes are defined by the cliques of home pages in the directed graph describing the Web; see the sketch following this list). When used interpersonally, personal home pages offer one-to-one communication, although the technical operation of all pages on the Web is one-to-many.

Group  As described previously, cliques of personal pages can define a particular Web tribe or group. Similarly, people can form associations on the Web that are independent of geography and focused on a common topic of interest. Subject-tree breakdowns of information on the Web (see the following section's discussion about locating subject-based information on the Web) often evolve from collaborative linking and the development of resource lists and original material describing a subject. Groups of people also associate on the Web based on common interests in communication-a professional association, for example, that has a Web server to announce conferences or calls for participation in its publications. Web groups also can form around social or professional discourse or symbolic exchange (perhaps nontextual) intended to define and indicate relationships in "play" systems such as Web interfaces to Multiple User Dialogue/Object Oriented/Simulations (MU*s) or Web-based chat or conferencing systems.

Organizational  Many of the initial servers appearing on the Web belonged to organizations rather than individuals, so the home page for a server often identifies the institution or organization that owns it. In this way, the genre of the Campus-Wide Information System (CWIS) evolved on Web servers of educational institutions. Commercial, governmental, and nongovernmental organizations have largely followed the pattern established by CWISs.

Mass  Just as other media have been used for one-to-many dissemination of information (newspapers, radio, television), the Web also is used for mass communication. Many commercial and noncommercial magazines and other publications are distributed through the Web. Moreover, as noted previously, all publicly available Web pages are potentially readable to anyone using the Web, and thus are potentially one-to-many communications.
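
To make the idea of an electronic tribe concrete, the following minimal sketch (Python, with hypothetical page names) models home pages as a directed graph of links and reports any group of pages whose members all link to one another-a clique:

from itertools import combinations

# Hypothetical home pages and the pages each one links to
links = {
    "alice.html": {"bob.html", "carol.html"},
    "bob.html":   {"alice.html", "carol.html"},
    "carol.html": {"alice.html", "bob.html"},
    "dave.html":  {"alice.html"},  # links out, but no one links back
}

def is_tribe(pages):
    # Every pair of pages must link to each other
    return all(second in links[first] and first in links[second]
               for first, second in combinations(pages, 2))

for group in combinations(links, 3):
    if is_tribe(group):
        print("tribe:", group)  # alice, bob, and carol form a clique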

The key concept to understand is that the Web as a communications system can be flexibly used to express a variety of communications. The classification of the communication (in the categories listed previously) depends on who is taking part in the communication. The exact classification of any expression on the Web can be blurred by the potentially global reach of any Web page. Thus, a personal home page may be used interpersonally, but it may be accessed far more times on the Web than a publication created and intended for mass consumption. Chapter 2, "A Developer's Tour of the Web," explores these communication contexts in detail and gives examples of each.

Web Navigation Summary

Along with a basic familiarity with the Web's components and its role in cyberspace, a Web developer should know how to navigate (find information on) the Web. Web navigation involves a variety of techniques and applications. This section presents a summary of these techniques.

Searching the Web by Subject

Users might want to learn about a subject without having a precise idea of the specific topics to study. Keyword-searching techniques aren't ideal here, because such users might not yet have a specific set of keywords or concepts with which to search. Instead, the goal might be to find resource collections that present broad categories of information organized according to subjects, topics, and subtopics. In this way, users can find general descriptive information about a subject and then refine the search to more specific topics.

There is no single source for subject-oriented information on the Web, although there are some very complete collections. A few key places on the Web provide excellent jumping-off points. Here are the best of them:

Yahoo!  (http://www.yahoo.com/)  Yahoo! is a very large collection of Web links arranged into a hierarchical database. It is the most complete subject breakdown of Web information. Although not perfectly organized, it is an edited compilation of references and resources on many topics. Figure 1.10 shows the opening page of Yahoo!.

Figure 1.10 : The Yahoo! Web site (Courtesy of Yahoo!, Inc.).

The WWW Virtual Library  (http://www.w3.org/vl/)  This is the oldest subject tree on the Web; it contains hundreds of topics maintained collaboratively. An early outgrowth of the initial Web development at CERN, it is an excellent source of subject-oriented information. Individual pages of the WWW Virtual Library are maintained by many different people-often experts in their fields. Therefore, the WWW Virtual Library is rich in both content and coverage.

Usenet FAQ Archives  (ftp://rtfm.mit.edu/pub/usenet/)  The global, asynchronous, text-conferencing system known as Usenet has grown very quickly since its inception in 1979 as a project of two graduate students at Duke University-Jim Ellis and Tom Truscott. Today, Usenet newsgroups number in the thousands, covering just about every human pursuit or subject imaginable. Participants in Usenet newsgroups contribute articles to ongoing discussions. These articles propagate through the Matrix (not just the Internet) so that others can read and respond to them. This process of discussion is ongoing-some newsgroups receive hundreds of new articles per day. Because articles eventually expire (they are deleted from the local systems on which they are stored), information within individual articles eventually can be lost, and long-time participants in a newsgroup often face the same questions from new users over and over again. It is from this need to transmit accumulated knowledge that the tradition of Frequently Asked Questions (FAQ) lists arose. The archives on the machine at rtfm.mit.edu provide a rich view of Usenet information space, broken down into a subject hierarchy corresponding to the Usenet newsgroup names.

The Clearinghouse for Subject-Oriented Internet Guides  (http://www.clearinghouse.net/)  Another subject-oriented collection is at the University of Michigan. Developed by Louis Rosenfeld, the Clearinghouse provides a collection of guides in many areas outside of newsgroup subject divisions. Like the Usenet FAQs, the Michigan collection is arranged by subject; however, the Michigan collection's intent is to gather guides that help people discover Internet resources about a subject. Thus, the Clearinghouse guides are very useful for locating information about a subject all over the Internet.

Galaxy  (http://www.einet.net/galaxy.html)  Although the WWW Virtual Library is essentially a noncommercial, cooperative venture, EINet's Galaxy is offered to the Web for free, courtesy of and supported by a commercial network services company, Tradewave. Galaxy enhances Tradewave's reputation as a provider of network information and communications products and services while contributing a valuable public service to the Web community. Galaxy, like the WWW Virtual Library, is a hierarchical organization of subjects, arranged in broad subject categories listed alphabetically, with links from the front page to other pages containing further information. Unlike the Virtual Library, however, Galaxy provides a search mechanism for finding entries in the entire Galaxy Web as well as direct access to other keyword search mechanisms.

Others  Many other subject breakdowns of the World Wide Web exist (see http://www.december.com/cmc/info/internet-searching-subjects.html). In fact, subject-oriented searching trees grew tremendously during 1995-1996. Many of the newer ones (for example, Magellan at http://www.mckinley.com/ or Point at http://www.pointcom.com/) also include their own rating system for Web resources. Whether these ratings represent meaningful values or just a clever way of getting the "winners" of high ratings to link back to the "rater" site has not been settled. Many users in the Web community don't take the ratings these sites offer very seriously.

Searching the Web by Keyword

If your goal is to find a specific piece of information but not necessarily the contextual or related information that might be available through a subject-oriented search, a good strategy is to use keyword searching techniques.

A general term for keyword searching tools on the Web is spider. Spiders constitute a class of software programs that wander through the Web and collect information about what is found there. (Other terms used for these tools are robots and wanderers.) Some spiders crawl the Web and record URLs, creating a large list that can be searched. Other spiders look through HTML documents for URLs and keywords in title fields or other parts of the document.
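
The core of a spider is surprisingly small. The following minimal sketch, in Python purely for illustration, fetches a single page, records its URL, and extracts the URLs of the pages it links to. The starting URL is hypothetical, and a real spider would add the politeness rules described under Lycos later in this list:

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    # Collects the target URLs of a page's anchor tags
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL
                    self.found.append(urljoin(self.base_url, value))

start = "http://www.example.com/index.html"  # hypothetical starting point
page = urlopen(start).read().decode("latin-1", errors="replace")

collector = LinkCollector(start)
collector.feed(page)

print("visited:", start)
for url in collector.found:
    print("to visit:", url)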

Here are the Web's major spiders and keyword indexes:

AltaVista  (http://www.altavista.digital.com)  This site has a very large database and includes an advanced query form for very effective searches. You can search the Web or Usenet news articles for keyword matches with advanced searching options, including Boolean operators (and, or, not) as well as field selectors (such as url:, host:, and title:). Output is based on relevance and can be set to detailed, compact, or count format. The documentation is helpful and includes good examples. Figure 1.11 shows the AltaVista home page.

Figure 1.11 : The AltaVista site (Courtesy of Dave Price, Digital Equipment Corporation).

Excite  (http://www.excite.com)  This site has a large and diverse database covering the Web, Usenet, classified ads, and Web site reviews. You can search by concepts or keywords. This site is easy to use and powerful. The search language is "plain English," however, and doesn't allow for sophisticated, advanced searches or much filtering. Output is ranked according to relevance and can be sorted by the search engine's relevance score or grouped by site. The Web site also includes a subject breakdown of the Web, news headlines from Reuters, a cartoon, and columns. The documentation is helpful and explains good strategies for making your search more effective.

Open Text  (http://www.opentext.com)  This is a very large database of Web pages with a variety of searching capabilities. You can search by keyword phrases or simple Boolean expressions. The Power Search option enables you to use some filtering. You can use the Weighted Search option for further customization. Output is supposedly by relevance, but without using the more sophisticated searching options, the output tends to contain many irrelevant matches.

infoseek  (http://www.infoseek.com)  The infoseek guide service includes a diverse set of databases (Web pages, Usenet newsgroups, Usenet FAQs, reviewed pages, or topics). This site includes a subject-tree breakdown of the Net, but it is somewhat arbitrary and incomplete. Instructions are included, but the advanced searching options involve a syntax that is difficult to understand, remember, and use. The infoseek professional service allows you to search wire services, business periodicals, and more, but this professional service requires that you open an account and pay a fee to use it.

NlightN  (http://www.nlightn.com)  This service offers a large, diverse database of Web pages, hundreds of specialized databases, news, and references, but the interface is difficult to use. Use requires a (free) user ID and logon password. NlightN supposedly draws on the Lycos spider for its Web database, but sample searches through the interface failed to turn up matches via Lycos. Searches result in links to the full text of news and other articles, which you then can purchase.

Lycos  (http://www.lycos.com/)  The search interface to Lycos' databases provides a way for users to locate documents that contain references to a keyword and to examine a document's outline, keyword list, and excerpt.

Although many early Web spiders infested a particular server with a large number of rapid, sequential accesses, Lycos and other modern-day Web spiders use a random-search behavior to avoid hitting the same server repeatedly in a short period of time. Lycos also complies with the standard for robot exclusion (see http://web.nexor.co.uk/mak/doc/robots/robots.html), which keeps unwanted robots off WWW servers, and identifies itself as Lycos when crawling so that Webmasters know when Lycos has hit their servers. (A sketch of how a spider checks the robot exclusion rules follows this list.)

Others  See http://www.december.com/cmc/info/internet-searching-keyword.html.
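
The robot exclusion standard mentioned under Lycos works through a file named robots.txt at the top of a server's document tree; a well-behaved spider reads that file and asks permission before fetching each page. Here is the sketch promised above, using Python's standard robotparser module purely for illustration (the server and spider names are hypothetical):

from urllib.robotparser import RobotFileParser

site = "http://www.example.com"  # hypothetical server

# A polite spider reads the server's exclusion rules first
rules = RobotFileParser(site + "/robots.txt")
rules.read()

# It then asks permission before each fetch, identifying
# itself by name, just as Lycos identifies itself as Lycos
url = site + "/private/report.html"
if rules.can_fetch("ExampleSpider", url):
    print("allowed to fetch", url)
else:
    print("excluded from", url)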

Searching the Web by Geographical Space

Frequently, a user looking for a particular Web server will know its geographic location. A variety of Web applications allow users to view Web resources organized in graphical maps or geographically organized listings of servers:

The Virtual Tourist I  (http://www.vtourist.com/webmap/)  This server, developed by Brandon Plewe, serves as a visual interface to the geographic distribution of WWW and other network information servers. By clicking a symbol or boxed region, the user obtains more information about that region. Figure 1.12 shows the top-level map of Earth, with regions that contain further information. By clicking these regions, a user can continue to zero in on a geographic location, eventually obtaining a map of servers such as that shown in Figure 1.13.

Figure 1.12 : The Virtual Tourist I home page (courtesy of Brandon Plewe, printed by permission).

Figure 1.13 : The Virtual Tourist showing New York Web servers (courtesy of Brandon Plewe, printed by permission).

The Virtual Tourist II/City.Net  (http://www.vtourist.com/vt/)  Like the Virtual Tourist I, City.Net presents information about geographical locations in clickable maps. City.Net focuses on tourism and city information.

CityLink  (http://usacitylink.com/)  Like the Virtual Tourist II/City.Net, CityLink gathers geographical information about locations on the Web, with clickable maps as an interface. CityLink focuses on United States cities and, as a service, develops Web material for cities.

Searching the Web by Information Space

An information space is the set of all information available worldwide on servers of a particular protocol. Gopher space, for example, consists of all the information and files on publicly available Gopher servers worldwide. Each information space presents its data in its own format and is defined by the collection of all information on all servers of that type. To search information spaces by server machine, you need to find a monster list of all the servers. (These lists are called monster lists because they often are extremely long.) Here are those lists:

FTP Space  (http://www.iaehv.nl/users/perry/ftp-list.html)  This list of FTP sites on the Internet, maintained by Perry Rovers, is a useful catalog of servers providing File Transfer Protocol access. It also is available at ftp://rtfm.mit.edu/pub/usenet/news.answers/ftp-list/.

Gopher Space  (gopher://gopher.micro.umn.edu/)  A user can browse a monster list of Gophers by geographic region from the Minnesota Gopher by selecting Other Gopher and Information Servers. By tradition, most Gophers offer a similar option to browse Gopher space through a geographic breakdown.

Telnet Space: Hytelnet  (http://library.usask.ca/hytelnet/)  Hytelnet organizes Telnet-accessible resources by geography (as well as by subject). Developed by Peter Scott, Hytelnet is particularly useful for services that adopted on-line technology early, such as libraries and community-based FreeNet systems, because Telnet is a widely available interface for dial-up modem users.

Web Space  (http://www.w3.org/hypertext/DataSources/WWW/Servers.html)  The list of Web servers at CERN is organized by geography (and is used by the Virtual Tourist I). Web space itself also can be organized according to machine name.

Searching the Web by People Space

Geographic directories-such as the Virtual Tourist I and II, City.Net, and CityLink-fill a need for organized, geographically based information about information servers and tourist information. A similar need exists for directories to help users find people. Although keyword and even subject-oriented searching methods might locate people (through common interests tied to keywords or subject-oriented resource collections), it sometimes is useful to be able to search for a person within directories of home pages or White Pages directories. This section lists phonebook-style directories, as well as collections of home pages that are helpful for finding people on the Web.

Phonebook-style directories:

Switchboard  (http://www.switchboard.com/)  A database of names of individuals and businesses, with phone numbers and postal addresses. Users can add their electronic mail address or a URL to their listings. This database is very extensive (it contains more than 90 million entries) and nationwide. It is an excellent way to find people who have telephones. Figure 1.14 shows the Switchboard home page.

Figure 1.14 : The Switchboard site (courtesy of Elizabeth Broadhurst, Switchboard Web master).

WhoWhere?  (http://www.whowhere.com)  The directory to use for finding people who have an e-mail account. It is a comprehensive White Page service for locating people and organizations on the Net.

X.500 Directory Services  (http://ds.internic.net/ds/dspgx500.html)  The goal of the X.500 White Pages project is to provide a way for institutions to offer White Pages information. This collection links to many organizations participating in this project.

Open Market's Commercial Sites Index  (http://www.directory.net/)  Lists commercial services and products. Users can search the directory by keyword or by alphabetical listings.

Home page collections:

GeoCities  (http://www.geocities.com)  Web homesteads available in themed collections.

Housernet  (http://www.housernet.com)  A directory of home pages. You can search the database by marital status, age, and gender.

Netizens  (http://nearnet.gnn.com/gnn/netizens/index.html)  The Global Network Navigator's Internet Center Netizen's project, which is a directory of home pages written by GNN users.

People Page  (http://www.peoplepage.com)  Offers a large collection of personal home pages, listed alphabetically.

People-Yahoo!  (http://www.yahoo.com/Entertainment/People/)  Yahoo!'s section on personal home pages, from the Yahoo! subject tree. It contains an alphabetical listing of people who have registered their home page with Yahoo!.

Personal Pages World Wide  (http://www.utexas.edu/world/personal/index.html)  A metacollection of institutional collections of personal pages worldwide, from the University of Texas at Austin.

Who's Online  (http://www.ictp.trieste.it/Canessa/whoiswho.html)  Offers a collective database of non-commercial biographies of people on the Net.

Who's Who  Who's Who on the Internet or The Complete Home Page Directory of Internet.

Personalities  (http://web.city.ac.uk/citylive/pages.html)  Kirk Bowe's Who's Who on the Internet, from CityLive! magazine.

World Alumni Page  (http://www.infophil.com/World/Alumni)  Provides an automated alumni e-mail registry and bulletin board for colleges and high schools all over the world.

Searching the Web for Software

The Web is also a good distribution mechanism for software-particularly freeware and shareware. Here are some important sites to keep in mind:

shareware.com  (http://www.shareware.com/)  An enormous site sponsored by c|net, Inc. that helps you locate all kinds of shareware in its database. You also can get tips here and information about some of the best shareware.

World File Project  (http://www.filepile.com/)  Provides access to shareware files on Exec-PC. It includes an indexed collection of shareware files, including drivers, applications, games, and pictures.

ZD Net Software Library  (http://www.zdnet.com/zdi/software/index.html)  A collection of shareware packages divided by category, including games, Internet, education, programs and utilities, Windows 95, and editor's picks. This is a useful site for finding some of the best software and reviews.

Jumbo  (http://www.jumbo.com/)  Includes many programs in the areas of business, home, programming, utilities, graphics, and more.

Web Introductory Check

As a Web developer, you should have a basic understanding of the origins of the Web in hypertext and hypermedia thought, as well as an excellent understanding of the Web's present components, structure, and place within the larger context of communications in cyberspace.