Chapter 6

Web Analysis


If you have just planned a web, one big question should be on your mind: "Will the web accomplish its purpose?" Even when a web is already deployed and operating, you should regularly investigate whether it is accomplishing its planned objectives. The web analysis techniques presented in this chapter are intended to help you check the elements of a planned or operating web. This analysis process covers the technical validation of a web's HTML implementation as well as analysis of the web's planned or existing content and design. It also touches on usability and style issues. Because of the dynamic information environment in which a web operates, these ongoing efforts to evaluate web quality and usability may be the key to increasing the effectiveness of an organization's Web communication.

Web Analysis Processes

Figure 6.1 summarizes the overall goals and information required for web analysis.

Figure 6.1 : Web analysis people and processes.

The figure shows the key information needs of a web analyst for all the web's six elements: purpose and objective statements, audience and domain information, and specification and presentation. The overall goals of a web analyst follow:

Check to make sure that the web works:

Rhetorically  Is the web accomplishing its stated purpose for its intended audience?

Technically  Is the web functionally operational, and is its implementation consistent with current HTML specifications?

Semantically  Is the web's information content correct, relevant, and complete?

Make recommendations to the other web-development processes:

Advise on new web planning, including administrative and information policy (see Chapter 5, "Web Planning").
Give input to web designers on user problems or redesign ideas (see Chapter 7, "Web Design").
Recommend maintenance to web implementers (see Chapter 8, "Web Implementation").
Give reports to web promoters (see Chapter 9, "Web Promotion") about user experience with the web.
Collaborate with web innovators (see Chapter 10, "Web Innovation") by providing insight for improving the web's content or operation.

The web analyst thus acts as a reviewer, evaluator, and auditor for the web-development process. When practical, therefore, the web analyst should be as independent as possible from the duties of web implementation, design, and planning.

Web Analysis Principles

Based on the characteristics and qualities of the Web as described in Chapter 4, "Web Development Principles and Methodology Overview," web analysis should pay close attention to evaluating how the web is consistent with the following principles:

Strive for continuous, global service.  Because a characteristic of an operating public web is that it is available worldwide, 24 hours a day, an analysis of its content and operation must take into account a multinational, multicultural audience and its needs for continuous access.

Verify links for meaning as well as technical operation.  As networked hypermedia, a web extends and augments its meaning through internal and external links. External links tightly bind a web within larger contexts of communication, culture, and social practice that extend beyond an organization's outlook. A rhetorical and semantic analysis of links in a web therefore must look at how links contribute to a web's meaning. Technical analysis of links must ensure their operation and availability to the degree possible.

Ensure porousness.  A web that contains more than one page offers multiple entry points for its users. An analysis of the usefulness of a web must examine how each of these multiple pathways offers a user the right amount and level of information to use the web well. A close analysis of a web's design should reveal multiple strategies for addressing porousness.

Work with dynamism.  A web operates in an environment in continual flux in terms of meaning and technologies. Not only are new webs introduced all the time that try to accomplish the same purpose and/or reach the same audience of a given web, but methods for implementing and experiencing webs continually are introduced and upgraded. An analyst needs to keep abreast of the state of the Web's information and technical environment in order to evaluate a web's effective operation.

Stay competitive as well as cooperative.  Because of the Web's dynamic nature, an analyst, as well as the web's innovator, must work to know the competing webs that vie for the audience's attention. Opportunities also exist for competing webs to combine, using the features of linked hypertext, to serve their audiences better.

In summary, a web analyst is concerned with principles for the technical and rhetorical integrity of a web. The goal is to create a web that works with the characteristics and qualities of networked hypermedia to best accomplish the web's purpose for its audience.

Information Analysis

A web analyst can evaluate many of the web's technical and rhetorical aspects by analyzing the web's elements (audience information, purpose and objective statements, domain information, web specification, and web presentation) and performance (information about how users have used or are expected to use the web). This information analysis process also involves gathering information about other competitor webs that may be accomplishing a similar purpose or reaching a similar audience. When performed with the other people involved in web development processes, web information analysis serves as a check of the web's overall quality and effectiveness. Web information analysis seeks to uncover the answers to the following general questions:

Is the web accomplishing its stated purpose and meeting its planned objectives?
Is the web operating efficiently?
Are the intended benefits/outcomes being produced?

Although a definitive answer to these questions might be impossible to obtain at all times, web analysis can serve as a check on the other development processes. This section looks at information analysis checkpoints that can be examined during a web's planning or after it is implemented. This analysis process involves gathering information about a web's elements and comparing it to feedback from users and to server statistics.

Figure 6.2 shows an overview of information useful in analysis. In the figure, the web's elements are in rectangles, and supporting or derived information is in ovals. Key checkpoints for analysis are shown in small circles, labeled A through F. At each checkpoint, the web analyst compares information about the elements or information derived from the web elements to see whether the web is working or will work effectively.

Figure 6.2 : Web analysis information checkpoints.

The information about the web elements and derived information varies in completeness depending on how far the developers are into actually implementing the web. A web analyst can obtain information about the web elements from the results of the planning, design, implementation, or development process. If the developers have just started the planning process, web analysts can analyze the checkpoints for which they have information. A web analyst can obtain the derived information through examining web statistics. Ideally, a web analyst will be able to observe representatives from the intended audience as they use the web. If web analysts don't have a working web ready, these audience representatives may give feedback on a mock-up of the web, its purpose statement, or a diagram of its preliminary design.

The key to the analysis process is that it is meant to check the overall integrity of the web. Results from the analysis process are used in other processes to improve the web's performance. If analysis of the web's domain information shows that it is often out of date, for example, the planning process needs to be changed to decrease the time between updating the domain information. The analysis process on the web's elements helps all processes of web weaving work correctly and efficiently. The following sections go through each of the analysis checkpoints shown in Figure 6.2.

Does the Audience Exist on the Web for the Given Purpose? (Checkpoint A)

Before spending too much time in the planning process defining and describing a target audience, web analysts first should check to see whether this audience can use the web at all. Although the interests of the people who use the WWW are growing increasingly diverse, a routine check of the Web's demographics or contents might tell web analysts something about the size of the audience they want to reach. Up-to-date, accurate demographics of Web users are difficult to obtain. Moreover, even an up-to-date demographic profile of current users might say nothing about the massive number of people who are just beginning to use the Web. Therefore, comparing a description of the target audience with any demographic statistics should be done with caution, and it gives web analysts only a rough feel for whether the audience sought is out there. The Graphics, Visualization, and Usability Center at Georgia Tech (http://www.cc.gatech.edu/gvu/user_surveys/) has compiled a good collection of demographic statistics on Web users.

Without demographic statistics, the other way to see whether the audience is on the Web (or the Net) is to check for subject-oriented information resources and forums that are of interest to the audience. If the target audience consists of botanists, for example, what on-line information already exists that shows botanists as active on the Web and the Net? A web analyst can find out by

Searching subject-oriented trees for resource collections related to botany
Locating institutions (academic, commercial, or research) that are involved with botany
Checking Usenet newsgroups and FAQ archives to see what botanists are active on the Net
Checking to see whether there is an on-line mailing list devoted to botany
Checking to see whether professional societies or publications in the field of botany offer an on-line forum or information service

Web analysts can interpret the results of the check of demographic statistics or Net resources related to the subject in two ways. First, if they find nothing, it might mean that the audience has made no forays into the Net: no newsgroups, no mailing lists, and no on-line collections of resources at major institutions. Based on this, web analysts could decide that the web would fill a great need for this audience. In contrast, they might conclude that this particular audience is not interested in on-line communication at all.

To decide which of these two alternatives is more accurate, web analysts should consult representative audience members. Analysts can check with people in the field and ask them, "What if you had an on-line system for information and communication?" Because on-line electronic mail discussion lists have been around longer than many network communications forums, an on-line mailing list that the target audience uses can be a good source of information about that audience's interests. Another aspect of this analysis of audience information is to make sure that the purpose for the web is one that meets the audience's patterns of communication, or at least the patterns in which the audience is willing to engage.

Web analysts might find that certain audiences are not willing to have a publicly available forum for discussion and information because of the nature of their subject matter, for example. Computer security systems administrators might not want to make detailed knowledge of their security techniques or discussions publicly available on a web server.

Certainly, private businesses or people involved in proprietary information might not want to support a web server to share everything they know. These same people might be interested in sharing information for other purposes, however. Computer security administrators might want to support a site that gives users advice about how to increase data security on computer systems. Thus, the web's purpose statement must match the audience's (or information provider's) preferred restrictions on the information. Current technology can support password protection or restricted access to Web information so that specific needs for access can be met.
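
If a web does use password protection or restricted access, an analyst can verify that the restrictions actually hold. The following is a minimal sketch in Python (not a tool discussed in this chapter), and the list of restricted URLs is a hypothetical placeholder; it simply confirms that an unauthenticated request to each URL is refused.

# Sketch: confirm that supposedly restricted URLs refuse unauthenticated requests.
# The URLs below are hypothetical placeholders.
import urllib.error
import urllib.request

RESTRICTED_URLS = [
    "http://www.example.com/internal/security-notes.html",
    "http://www.example.com/internal/incident-reports.html",
]

for url in RESTRICTED_URLS:
    try:
        urllib.request.urlopen(url, timeout=10).close()
        print("WARNING: %s is open to unauthenticated users" % url)
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            print("OK: %s requires authorization (%d)" % (url, err.code))
        else:
            print("CHECK: %s returned unexpected status %d" % (url, err.code))
    except urllib.error.URLError as err:
        print("CHECK: %s could not be reached (%s)" % (url, err.reason))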

Through a check of the audience, purpose, and communication patterns for that audience, web analysts quickly can detect logical problems that might make a web's success impossible. If the web's purpose is to teach new users about the Web, for example, web analysts might have a problem if the audience definition includes only new users. How can new users access the web in the first place? In this case, the audience should be redefined to include web trainers as well as the new users they are helping. This more accurate audience statement reflects the dual purpose of such a training web: getting the attention, approval, understanding, and cooperation of trainers as well as meeting the needs of the new users. If web analysts have an accurate audience statement, all the other processes in web weaving, such as design and development, can work more efficiently because they take the right audience into account.

Is the Purpose Already Accomplished Elsewhere on the Web? (Checkpoint B)

Just as web analysts don't want to reach an audience that doesn't exist or target an audience with a purpose it doesn't care about, they also don't want to duplicate what is being done successfully by another web. Checkpoint B is the "web literature search" part of the analysis: "Is some other web already doing what the web analyst wants to do? What webs out there come close to doing the same thing?" These questions should be asked at the start of web development as well as continuously during the web's use. New webs and information are developed all the time, and someone else might develop a web to accomplish the same purpose for the same audience.

To find out whether someone has built a web for a specific audience and purpose, use subject- and keyword-oriented searching methods. Web analysts also might try surfing for a web like this or for information related to the audience and purpose. During this process, they should save the links they find; if the links are relevant to the audience and purpose, they can become part of the domain information on which the web's developers and users can draw.

The other benefit of this web literature search is that web analysts can find webs that might be accomplishing the same purpose for a different audience. These webs might give web analysts ideas about the kinds of information they can provide for the audience. Also, they might find webs that reach the same audience but for a different purpose. These webs can give useful background or related information that web analysts can include as links in the web. If they find a web that reaches the same audience for the same purpose, they can consider collaborating with the developers to further improve the information.

Do the Purpose, Objective, and Specification Work Together? (Checkpoint C)

One of the most important elements for the integrity of the web is the purpose, objective, and specification triad. These three elements spell out why the web exists and what it offers. The purpose statement serves as the major piece of information the potential audience will read to determine whether they should use the web. If the purpose statement is inaccurate, the audience might not use the web when they could have benefited from it, or they might try to use the web for a goal they won't be able to accomplish.

The check of the purpose-objective-specification triad is to make sure that something wasn't lost in the translation from the purpose (an overall statement of why the web exists) to the objective statement (a more specific statement of what the web will do) to the web specification (a detailed enumeration of the information on the web and constraints on its presentation).

During the development of the specifications, the analyst might find that a piece of information was added that has no relation to the stated purpose. Or some aspects of the stated purpose might not be reflected in the specification at all.

One way to do this check is to make a diagram that traces the links from the purpose statement to the objective statement to the specifications, both top-down and bottom-up. Figure 6.3, for example, shows how a purpose can be matched to specific objectives. Each objective gives rise to specifications for the web. From the bottom up, every specification should be traced to an objective, and each objective should be traced to some aspect of the purpose. The diagram shown in Figure 6.3 is incomplete because the specifications would include a list of all URLs used in the web, as well as a more complete specification of the database. Figure 6.3 shows just the categories for this specification information. When filled out completely, however, every URL and component of the specification should be traced back to an objective, and each objective should be traced back to the purpose statement. If there is a mismatch, more planning must be done to restate the purpose, objectives, or specification so that they all match.

Figure 6.3 : The web's purpose, objective, and specification must work together to accomplish the same aim.
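
To make this top-down and bottom-up trace concrete, an analyst can record the purpose-to-objective and objective-to-specification links in simple tables and flag anything that does not trace back. The following minimal sketch in Python uses hypothetical purpose, objective, and specification labels; it illustrates only the bottom-up check, not a prescribed tool.

# Sketch of a bottom-up traceability check: every specification item should
# map to an objective, and every objective to some aspect of the purpose.
# The labels used here are hypothetical placeholders.

purpose_aspects = {"inform-customers", "support-sales"}

objective_to_purpose = {
    "describe-product-line": "inform-customers",
    "provide-order-form": "support-sales",
}

spec_to_objective = {
    "products.html": "describe-product-line",
    "order-form.html": "provide-order-form",
    "company-picnic.html": None,  # no objective recorded -- a mismatch
}

for spec, objective in spec_to_objective.items():
    if objective not in objective_to_purpose:
        print("Specification %s does not trace to any objective" % spec)
    elif objective_to_purpose[objective] not in purpose_aspects:
        print("Objective %s does not trace to the purpose" % objective)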

Is the Domain Information Accurate? (Checkpoint D)

The quality of the domain information affects the users' perceptions of the web's overall quality. Inaccurate or incomplete information hinders web developers and leads to dissatisfaction by the web's users. The domain information must be checked to make sure that it is accurate, updated, and complete. Periodic checks can be made according to the nature of the domain.

Recall from the definition given in Chapter 4 that there are two kinds of domain information: the information that the web developers need to understand well enough to plan, analyze, design, implement, and develop the web; and the domain information that the web provides to its users. Remember also that domain information of the first type does not need to be located on the Net at all; it might include textbooks or courses the web developers use as a means of getting up to speed in the area of knowledge the web covers. This kind of domain information also can serve as reference information throughout the course of web weaving.

Verifying the accuracy, currency, and completeness of the domain information is a difficult task because the web analyst must have adequate knowledge of the subject matter to make a judgment about the veracity of all domain information. Although off-Net resources, such as books and courses, can be evaluated according to the same judgment the analyst uses for similar off-line materials, the Net information included in the first type of domain information and all the second type of domain information can be checked through a process of Net access and retrieval.

The process for checking Net-accessible domain information follows. For domain information provided to developers but not users of the web (the first type of domain information, which is Net-accessible), check the web page provided to developers in the same manner as described in the following paragraphs.

Verify the freshness of links.  If the web is operational, use the links provided in the web itself to ensure that the links are not stale and that the resource has not moved. (The section "Implementation Analysis," later in this chapter, discusses checking links in more detail.)

Check the accuracy of the information.  If the web purports to respond with the correct solution to a problem given a set of inputs (for example, a physics problem answer through a forms interface), prepare a set of inputs that lead to a known result. Test the web to verify that it yields the expected answer, and vary the test cases used (a minimal sketch of such a check appears after this list).

Use reliable and authoritative sources.  Use these sources, when available, to verify the new information added in the web since the last analysis. If necessary, contact the developer of that information and discuss his or her opinions of the information's accuracy.

In the case of databases, make sure that they are as current as they possibly can be.  This is crucial, for example, if the web serves out time-dependent data, such as earthquake reports. If the web analyst is not getting a direct feed from an information provider who supplies the most current information, check to make sure that the most current reports or data have been downloaded to the database that the web analyst uses in the web.

Compare all specifications to items in the database.  Are there any specifications calling for information that currently is missing?

Check locations on the Net.  Use the methods of navigation described in Part III, "Web Implementation and Tools," to locate more current or reliable domain information.

Check locations on the Net to find other domain information that might be helpful as background to developers.  Also look for information that could be part of the objective statement of the web.

Is the information at the right level of detail?  Are the web weavers getting the right level of information for their work? Are the web's users given the right amount of information, or is what is offered either overwhelming in detail or oversimplified?

Is any of the information not appropriate for the users or the Web community at large?  Is any of the information unethical, illegal, obscene, or otherwise inappropriate? Check links to outside information to verify that users will not encounter inappropriate material. Clearly, for outside sources of information, web analysts will be limited in their ability to control inappropriate content. Include this check in the analysis process to make decisions about which outside links to use.
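
For webs that compute answers through a forms interface, the accuracy check described in this list can be partly automated. The sketch below is in Python; the form URL, field names, and expected values are hypothetical assumptions. It submits known inputs and reports whether the reply contains the expected result.

# Sketch of an accuracy check against a forms interface.
# The URL, field names, and expected values are hypothetical placeholders.
import urllib.parse
import urllib.request

FORM_URL = "http://www.example.com/cgi-bin/physics-solver"

# Each test case pairs known inputs with a string the reply should contain.
test_cases = [
    ({"mass": "2.0", "acceleration": "9.8"}, "19.6"),
    ({"mass": "0.5", "acceleration": "4.0"}, "2.0"),
]

for inputs, expected in test_cases:
    data = urllib.parse.urlencode(inputs).encode("ascii")
    with urllib.request.urlopen(FORM_URL, data=data, timeout=15) as reply:
        body = reply.read().decode("latin-1", "replace")
    status = "OK" if expected in body else "MISMATCH"
    print("%s: %s -> expected %s" % (status, inputs, expected))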

Is the Web Presentation Yielding Results Consistent with the Web's Design and Purpose? (Checkpoint E)

The goal of this checkpoint is to determine whether the web, based on server statistics or feedback from users, is being accessed in a way consistent with how the web analyst intends it to be used. One part of checking this consistency is to find out whether the web server's access statistics show any unusual patterns. A web server administrator should be able to provide the web analyst with a listing of the web's files and how many times they have been accessed over a given period of time. Although this file-access count is a simple measure of web usage, it might reveal some interesting access patterns. A check of the web's files, for example, might show the following access pattern over the past 30 days:

File              Number of Accesses
top.html          10
about.html        9
overview.html     5
comic.html        5800
resources.html    200
people.html       20
newsletter.html   8

This shows a fairly uneven distribution of accesses in which a single file is accessed many times (the 5800 shown for comic.html). Compared to the small number of accesses to the "front door" (top.html) of the web, this pattern shows a problem unless this imbalance was intended. Also, the statistics show that the newsletter isn't being read very much, whereas the resources are being accessed quite a bit. In interpreting the web's access statistics, the analyst should ask whether patterns like these match the web's intended design: was comic.html meant to be the main attraction, and should the front door do more to route users to the rest of the web's content? A minimal log-tally sketch follows.
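
If raw server logs are available rather than a prepared summary, the analyst can tally per-file accesses directly. The following minimal Python sketch assumes the server writes its log in the common log format; the log file path is a hypothetical placeholder.

# Sketch: tally per-file accesses from a server log in common log format.
# The log file path is a hypothetical placeholder.
from collections import Counter

counts = Counter()
with open("/var/log/httpd/access_log") as log:
    for line in log:
        parts = line.split('"')
        if len(parts) < 2:
            continue                      # skip malformed lines
        request = parts[1].split()        # e.g. ['GET', '/comic.html', 'HTTP/1.0']
        if len(request) >= 2:
            counts[request[1]] += 1

for path, hits in counts.most_common():
    print("%6d  %s" % (hits, path))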

Another aspect of verifying the web's consistency of design and purpose is to see that it is listed and used in appropriate subject indexes related to the subject of the web. Does the web analyst find links to the web on home pages of people working in the field? Is the general reputation of the web good? A web analyst can find answers to these questions by doing web spider searches to find which pages on the Web reference the web's pages. Check major subject trees to see whether the web is represented in the appropriate categories. Much of this analysis of the web's reputation is also useful in the web innovation process described in Chapter 10.

Do the Audience Needs, Objectives, and Results of Web Use Correspond to Each Other? (Checkpoint F)

It is very important that web analysts determine whether the audience's needs are being met by the web. To do this, they must compare the audience information (the audience's needs and interests) with the objective statement and the intended and actual benefits and results from the web. Information about the actual benefits and results of the web's use is the most difficult to come by. Web analysts can use several methods, however, to get a view of the effects of the web:

Ask users.  Design and distribute a survey. This could be done using the forms feature of HTML if web analysts are willing to use features not found on all web browsers. They could distribute the survey by e-mail to a random sample of users (if such a sample can be constructed from a listing of registered users or derived from web-access logs; a minimal sampling sketch follows this list). Include in this survey questions about user satisfaction. Are the users satisfied that the web meets their needs? What else would the users like to see on the web? How much do users feel they need each of the features the web offers?

Survey the field.  Is the web used as a standard reference resource in the field of study? This is similar to the analysis performed at checkpoint E, but instead of just focusing on the occurrence of links in indexes and other web pages, web analysts need to analyze the web's reputation in the field of study or business as a whole. Do practitioners generally recommend the web as a good source of information?

Is the purpose being accomplished?  Are the outcomes that the web analysts specifically stated in the purpose actually occurring? If one phrase of the purpose is to "foster research in the field," for example, is there any evidence to support this? Is there research published that was sparked by the interactions the web fostered? For a commercial web, how many sales can the analysts say the web generated? Determine some measure of the purpose's success and apply it during the analysis process.
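
For the survey approach described in this list, one rough way to draw a random sample is from the client hosts recorded in the web-access logs. The following minimal Python sketch assumes a common log format file at a hypothetical path; note that distinct hosts are at best a crude proxy for individual users.

# Sketch: draw a rough random sample of client hosts from an access log
# (common log format). The log path and sample size are hypothetical.
import random

hosts = set()
with open("/var/log/httpd/access_log") as log:
    for line in log:
        fields = line.split()
        if fields:
            hosts.add(fields[0])   # first field is the client host

sample_size = min(30, len(hosts))
for host in random.sample(sorted(hosts), sample_size):
    print(host)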

Web User Tracking and Analysis Software
Software to track and analyze users' experience of a web is available commercially. Internet Profiles Corporation (http://www.ipro.com/) offers products to help track use of a web. Check for more vendors for Internet-related tracking software at
http://www.yahoo.com/Business/Corporations/Computers/Software/Internet/

Another way to look at checkpoint F is to ask the broader question, "Is the web doing some good?" Even though the web might be under development and its objectives still have not truly been met, is there at least some redeeming value of the web? What benefits is it offering to the specific audience or even to the general public? A commercial site that also provides some valuable domain information, for example, is performing a public service by providing education about that topic.

Another approach is to conduct research using theory and methods from fields such as Computer-Mediated Communication (http://www.december.com/cmc/study/center.html), Computer-Supported Cooperative Work, and Human-Computer Interaction (http://www.cs.bgsu.edu/HCI/), or other disciplines that can shed light on the dynamics of networked communication. These fields might yield theories the web analysts can use to form testable hypotheses about how the web is working to meet users' needs, to foster communication, or to effectively convey information.

The key to checkpoint F is to make sure that the other checkpoints, A through E, are working together to produce the desired results. A web analyst will notice that checkpoints A through E in Figure 6.2 each touch on groups of the web's elements. Only checkpoint F addresses the big-picture question: Are the people who use the web (audience information) getting what they need (purpose, objective, benefits/results) from it?

Design and Performance Analysis

Not only should the information in a web be analyzed for its rhetorical and technical integrity, but the overall design of a web also should be evaluated for how well it works as a user interface and for its intended purpose and audience.

This analysis step draws heavily from the "Design Problems" section of the next chapter by asking questions about the web's operation.

Performance

One of the most important impressions a web gives to users is how much it costs them to retrieve the information in it. One aspect of user cost related to the technical composition of a web is retrieval time. Many inline images and extremely large pages can cause long retrieval times. Performance for users varies widely, based on the browsers they use, the type of Internet connections they have, and the amount of traffic on the network and the Web server.

Analysis can be done, however, in general terms, to get some ideas of retrieval times. Here is a possible (not necessarily definitive) checklist for web-performance analysis:

Retrieval time  The analyst can retrieve the pages of the web using a browser and time how long it takes to download them. If the analyst retrieves the web pages from a local server (that is, a server on the same local network as the analyst's browser), these retrieval times, of course, will be less than what a typical user would encounter. Therefore, it might help if an analyst has an account or a browser available that is typical of most users, perhaps an outside account on a commercial service or at a remote site. This remote browser account then can be used to time the retrieval of the web pages. The analyst can report the retrieval times to the web designers. In many cases, it might be difficult to determine exactly what is "too long" for retrieval times. An analyst can look for pages that are very long and pages that contain many inline images, however, and evaluate whether the download costs of these pages are appropriate for the web's audience and purpose. (A rough timing sketch appears after this checklist.)

Readability   This is a simple test to see whether the user can read the text on the pages of the web. With the advent of background images, developers often create textured and colored backgrounds that make reading unpleasant and sometimes nearly impossible.

Figure 6.4 shows a background texture obscuring words. Other problems include extreme font-size variation and blinking text.

Figure 6.4 : A textured background can make it impossible to read text.

Rendering  The analyst should test the web in various browsers just to make sure that the information is available to users. This rendering check should be done to the level specified during the planning stages. If essential information is available in text, the analyst can use text-only browsers to make sure that the information (including information in image ALT fields) is present to guide users who don't load graphics.
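
A rough version of the retrieval-time check in this list can be scripted. The following minimal Python sketch times page downloads from the analyst's own connection only, so it understates what remote users experience, and it does not fetch inline images, which add to the real cost; the URLs are hypothetical placeholders.

# Sketch: rough retrieval-time measurement for a set of pages.
# Times reflect only the analyst's own connection; URLs are hypothetical.
import time
import urllib.request

PAGES = [
    "http://www.example.com/index.html",
    "http://www.example.com/products.html",
]

for url in PAGES:
    start = time.time()
    with urllib.request.urlopen(url, timeout=30) as reply:
        size = len(reply.read())
    elapsed = time.time() - start
    print("%6.2f s  %7d bytes  %s" % (elapsed, size, url))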

Aesthetics

Aesthetics, which are a subjective impression of the pleasing quality of a web, are difficult to test. Some guidelines, however, can help an analyst evaluate the aesthetics of a web:

Does the web exhibit a coherent, balanced design that helps the user focus on its content?  One design problem associated with a lack of aesthetic focus is the clown pants design method: the web consists of pages containing patches of haphazardly organized information. A related (poor) design technique is the K00L page design method: the web designer apparently attempts to use every HTML extension possible, including blinking text, centered text, multiple font sizes, and blaring, gaudy colors. Both the K00L page and clown pants design problems are discussed in detail in the next chapter. An analyst should try to identify page designs that fall outside the purpose of the web or the audience's needs.

Do the web's pages exhibit repeated patterns and cues for consistency, with variation in these patterns for expressiveness?  Repetition with expressive variation is a design principle used in many areas, such as graphic design, architecture, painting, textile design, and poetry. Which graphic elements are repeated on many pages for consistency? What content is varied to convey informational or expressive content?

How is color used?  Color can be used effectively to code information or to focus user attention. Randomly used color can confuse the user, and some users have impaired perception of color. Complementary colors used on top of each other often give a jarring, shimmering effect.

Usability

Analysts can test a web for usability in a variety of ways. The quick ways of usability testing can give inexpensive, rough ideas of how well the web is working. More elaborate methods of usability testing can involve controlled experiments that might be prohibitively expensive. Here's a checklist to analyze the usability of a web, starting with the quick, simple, and inexpensive methods:

Perform a simple web walkthrough.  With the web's purpose and audience definition in mind, analysts can perform a simple check of the pages, looking to see whether the major objectives are met.

Check sample user tasks.  Based on the purpose statement and audience information for the web, analysts should be able to devise a set of tasks that the user is expected to accomplish. They then can use the web to accomplish these tasks, noting any problems along the way.

Test tasks on representative users.  Based on the list defined in the preceding check, analysts can find several representative users and observe them as they complete the tasks. They might ask the users to say aloud what they are thinking when trying to accomplish the tasks. They might record this narrative, gather recordings from several audience members, and then analyze the transcripts. This can help not only in web analysis, but also in generating redesign ideas.

Perform field testing with actual users.  This method attempts to get a true sense of how the web actually is used. Analysts need to be able to select random users of the web and observe them in the settings in which they use the web. The users of a web might not be located in a single geographic area, so, obviously, this type of testing can be very difficult and expensive. Alternatively, extensive interviews of actual users or focus groups of users might give better insight into how the web is being used.

Semantics

Semantics refers to the meaning conveyed by the pages of the web. Through many of the information-analysis steps outlined previously, the analyst would have addressed many aspects of how the web conveys meaning. But a separate check of the web, focusing only on semantics, might reveal problems not detected in other ways:

Check for false navigational cues.  Some designers put arrows on pages, indicating "go back to home" or "go back" to some other location on a web. Because of the web's porous quality, these arrows might make no sense for users encountering them. In general, Back or Forward arrows in hypertext don't make much sense, because linear relationships among pages are rare. Instead of arrows and the word back, cues on pages should indicate the destinations to which they refer.

Check for context cues.  Some designers create pages with no context cues at all. These pages are simple "slabs" of text, perhaps without even any links to cue the users as to how the page's information fits into a large system of information or knowledge. (See The Page from Outer Space design problem in the next chapter.)

Check graphical/symbolic meanings.  If the web uses graphics or icons, an analyst should consider whether the symbols or icons used are standard or can be misinterpreted by members of other cultures or even by the users.

Implementation Analysis

Besides analyzing a web's information and design, web analysts also should take a look at a web's implementation. The HTML that comprises a web should be correct, and, to the extent possible, the links that lead out of a web should not be stale or broken. Validating that a web conforms to current HTML specifications is key to making sure that a web is usable by many different browsers.

This analysis of implementation is not content analysis. The validation tools discussed in this section can help improve the quality of the HTML code, but not the meaning of what that code conveys. Analysts should be careful not to focus entirely on the technical validation of a web; doing so is analogous to treating spelling and grammar as the single most important factors in quality writing. When analysts find problems in internal or external links, they should inform the web implementer.

Directory, File, and URL-Naming Checks

Because you will use the URL of your web in a variety of contexts, you should check to see whether the directory structure and naming conventions used are simple, consistent, and extendible.

First, if you are analyzing a planned web, what will its URL be? In the early days of the Web, many companies' webs were "hosted" on the sites of Web presence providers. This led to situations in which the URL for a company (for example, evergreen) included a reference to its Web presence provider (for example, globalweb.com), resulting in a URL such as http://www.globalweb.com/evergreen/. This URL doesn't clearly convey the ownership or brand of the web. Instead, if you are preparing a web for a company or major brand, consider getting a domain name.

Next, take a look at the planned structure of the directories on the web. Check to see whether the resulting path names make sense, are as simple as possible, and yet allow for growth in the directory tree. One common error is to place all files at a site at the highest level, leaving no room for organizing the files into a structure for easier maintainability and usability.

At the highest level, the URL identifying your server only, such as http://www.example.com/, would be the identifier you most commonly will use in advertising and promotion, particularly in non-Web media. This page therefore should load quickly and contain information to guide users efficiently to the information content of the site.

A Quick Way Users Can Access Your Web
Using a Netscape browser, users quickly can access a web site of the form http://www.example.com/ by just entering the word example in their browser's open Location dialog box. By default, the browser prepends the http://www. and appends the .com to the request. Users also can just enter an http URL without the http:// prefix. Note that this shortcut won't work when writing out a URL in HTML.

For other files at your web site, the directory structure and the file and directory names should identify the resource named by the URL. When I created a directory structure for my on-line periodical, CMC Magazine at http://www.december.com/cmc/mag/, I collected files about editorial policies into a single directory called editorial. This led to URLs to these files, such as the following:

http://www.december.com/cmc/mag/editorial/style.html
http://www.december.com/cmc/mag/editorial/plan.html
http://www.december.com/cmc/mag/editorial/identity.html

These URLs are quite specialized, so I wouldn't expect to list them in a print advertisement. Therefore, their length is not as important as the meaning they convey. The benefit of the directory structure is that the URL can be read as a phrase. The URL http://www.december.com/cmc/mag/editorial/plan.html, for example, is for the CMC Magazine editorial plan.

Avoid redundancy in directory or file naming. For example, the URL to the home page of the following site doesn't need to be so complicated:

http://www.example.com/html/home/examplehome.html

There's often little reason to create a directory for files of a special format (html), to use names like home, or to repeat the site name in a URL. A cleaner solution is http://www.example.com/index.html as the home page of the site. The file index.html is treated as the default page by most Web server software, so you even can leave off the index.html when providing publicity about your site.

Avoid mixed case in your directory names. A convention of giving directory names an initial uppercase letter and file names all lowercase letters might seem useful, but more often than not it leads to confusion. For example,

http://www.example.com/Projects/STAR/Docs/index.html

conveys a good structure for the documents of the STAR project, but its mix of upper- and lowercase might make it cumbersome to reference elsewhere. The case distinctions do convey meaning, but that meaning is redundant when encoded in a URL: clearly, Projects is a directory because it has a subdirectory, index.html is a file because it is in the last position of the URL, and STAR is clearly an acronym. The URL

http://www.example.com/projects/star/docs/index.html

enables the user to concentrate on the logical organization of the files on the server rather than the syntax of this organization.
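
Some of the naming conventions discussed here can be checked mechanically. The following minimal Python sketch flags mixed-case path segments and a few commonly redundant names; the URLs and the list of "redundant" words are illustrative assumptions, not rules.

# Sketch: flag URL paths that use mixed case or commonly redundant segments.
# The URLs and the "redundant" word list are illustrative assumptions.
from urllib.parse import urlparse

REDUNDANT = {"html", "home", "homepage"}

urls = [
    "http://www.example.com/Projects/STAR/Docs/index.html",
    "http://www.example.com/html/home/examplehome.html",
]

for url in urls:
    segments = [s for s in urlparse(url).path.split("/") if s]
    for segment in segments:
        if segment != segment.lower():
            print("Mixed case: %s in %s" % (segment, url))
        if segment.lower() in REDUNDANT:
            print("Possibly redundant: %s in %s" % (segment, url))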

A site that displays a good example of a solution to directory structure and file naming is the United States Senate site (http://www.senate.gov). This site uses a very logical approach to organization. One possible critique of this site is that it doesn't show sensitivity to keeping the directory structure stable over time. Current senators have a particular syntax for their Web pages. Senator Edward M. Kennedy's Web page is at http://www.senate.gov/~kennedy/, for example. Former senators have a different syntax. Henry Clay's Web page is at http://www.senate.gov/history/clay.htm, for example. So if someone writes a hypertext report today about Senator Edward M. Kennedy, he or she can link to his home page, but that page might go away when he leaves the Senate. Instead, the site should be structured so that references to information about senators can be made consistently, regardless of their current membership status in the Senate. (Of course, Henry Clay's staff can be excused from this critique, because they had little knowledge of the World Wide Web during Clay's time in the Senate.) When analyzing any site, think of how the URLs might be affected by the passage of time and design the structure to ensure that references to it can remain as stable as possible.

Look for ways to make the directory structure of your site meaningful and stable, but as simple and extendible as possible. Specific techniques to do this are covered in Chapter 8.

HTML Validation (Internal Links)

The first step in implementation is to check to make sure that the HTML implementing the web is correct. Several on-line validation services are available that can help a web analyst in this task. For a good discussion of validation, see http://www.earth.com/bad-style/why-validate.html. For a list of the current validation checkers available, see http://www.yahoo.com/Computers/World_Wide_Web/HTML/Validation_Checkers/.

The WebTechs Validation Service (http://www.webtechs.com/html-val-svc/) has developed into the standard HTML syntax checker. This is a fussy tool and probably best for advanced users. People interested in quick, informal checks can check out the WWWeblint Service (http://www.unipress.com/cgi-bin/WWWeblint) sponsored by Unipress. This checker is based on Neil Bowers' weblint program (http://www.khoros.unm.edu/staff/neilb/weblint.html) and is a quick-and-dirty way to check an HTML file and get easy-to-understand output as a result. Although this tool is very useful for a quick syntax check, it does have its limitations, because a user can't set the levels of HTML compliance as part of the service.

For example, the following HTML source code has multiple errors:

<HTML>
<!-- Author: M.U. Langdon (mul@xyz.com) -->
<!-- Dept: Corporate Communications -->
<!-- Date: 22 May 95 -->
<!-- Purpose: overview of XYZ products -->
<HEAD>
<TITLE>Overview of XYZ Industries Product Line</TITLE>
<LINK REV="made" HREF="mailto:cc@xyz.com">
</HEAD>
<BODY Background="../images/xyz-back.gif">
<IMG SRC="../images/xyz-logo.gif"> XYZ Industries
<HR>
<H1>The XYZ Industries Product Line</H1>
Founded in July 1994, XYZ Industries has
rapidly become a world leader in
HyperWidget and Odd-Bearing Machine (OBM) technologies.
<P>
The XYZ products currently available for sale and delivery are:
<OL>
<LI>OBM 411, 412, 413, and 440
<LI>HyperWidget 2000, 2000A, 2000A-XL, and 2000A-XL-G
<LI>Alpha, beta, gamma, and delta class HyperWidgets for VR applications
</UL>
(f)
<ADDRESS> <A HREF="http://www.xyz.com/units/cc.html>Corporate
Communications</A>
(<A HREF="mail:cc@xyz.com">cc@xyz.com</A>)</ADDRESS>
</PRE>

The weblint warning messages quickly found the errors:

Weblint Warning Messages
line 13: IMG does not have ALT text defined.
line 29: unmatched </UL> (no matching <UL> seen).
line 31: unknown element (f).
line 33: odd number of quotes in element <A HREF="http://www.xyz.com/uni
ts/cc.html>.
line 34: unknown element </ADDRESS>.
line 0: No closing </HTML> seen for <HTML> on line 1.
line 0: No closing </BODY> seen for <BODY> on line 11.
line 0: No closing </OL> seen for <OL> on line 25.
line 0: No closing </ADDRESS> seen for <ADDRESS> on line 33.

After analyzing the preceding error-filled HTML code, the WebTechs validation service reported the following:

Errors
sgmls: SGML error at -, line 14 at ">":
Out-of-context IMG start-tag ended HTML document
element (and parse)

The user would need to correct this line in order to continue checking the rest of the file, because the parse of the document ended with this error.

htmlchek (http://uts.cc.utexas.edu/~churchh/htmlchek.html) is an in-depth validation package that must be installed and configured locally and has many options.

The Arena Browser (http://www.w3.org/hypertext/WWW/Arena/) was designed to report on bad HTML as it displays a page, providing (somewhat cryptic) identification of errors.

The Doctor Is In!
Check out the Doctor HTML site at http://imagiware.com/RxHTML/ for help on a variety of tests for your web pages. Tests include spelling, syntax, image analysis, table structure, and hyperlinks. The interface is easy to use and the output reports are easy to interpret.

Link Validation (Internal and External Links)

Another aspect of checking a web's links is to examine the links out of a document. This requires network information retrieval to verify that these external links are not stale or broken. Several services are available in this area.

The MOMspider (Multi-Owner Maintenance spider) (http://www.ics.uci.edu/WebSoft/MOMspider/) was developed by Roy T. Fielding. This software is written in Perl and allows users to check for links that do not resolve to a resource (at the time of the check).
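
For an analyst without a maintenance spider available, a minimal link check can be scripted directly. The following Python sketch is not MOMspider; it extracts HREF targets from a single page and reports those that fail to resolve at the time of the check. The starting URL is a hypothetical placeholder, and a real spider does far more (site traversal, politeness delays, and ongoing reporting).

# Sketch: check whether the HREF targets on one page currently resolve.
# The starting URL is hypothetical; a real spider also handles traversal,
# politeness delays, and periodic reporting.
import re
import urllib.error
import urllib.parse
import urllib.request

START = "http://www.example.com/index.html"

with urllib.request.urlopen(START, timeout=15) as reply:
    page = reply.read().decode("latin-1", "replace")

hrefs = re.findall(r'href\s*=\s*"([^"]+)"', page, re.IGNORECASE)

for href in hrefs:
    target = urllib.parse.urljoin(START, href)
    if not target.startswith("http"):
        continue                      # skip mailto:, ftp:, and similar schemes
    try:
        urllib.request.urlopen(target, timeout=15).close()
        print("OK      %s" % target)
    except (urllib.error.URLError, OSError) as err:
        print("BROKEN  %s (%s)" % (target, err))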

A Suite of HTML Validation Suites
Check out the Chicago Computer Society's Suite of HTML Validation Suites (http://www.ccs.org/validate/) for a quick interface to the most popular HTML validation tools. This forms interface allows you to set some of the options and quickly find the validation tool that is right for your needs.

Questions Every Web Analyst Should Ask About a Web

The sections so far in this chapter have approached web analysis from a very formal set of checklist items intended to exhaustively analyze the integrity of any web. In looking at many web sites, I've also come up with an informal list for a web critique. These questions address some of the most common problems I see. In special cases, there might be a very good reason why a web designer or implementer has used a technique or effect mentioned here, so all these questions should be taken in the spirit that they might have a reasonable affirmative answer, but that answer had better be good.

  1. Why have big graphics? What is the point of costing your users time and money by requiring them to download a large graphics image? In particular, why would you make them do this without their prior consent and choice?
  2. Why use graphics for words? If you do have graphics on a page, why would you use graphics to display words? The HTML FONT element allows you to set the size and color of text (for capable browsers). Certainly, you would want to use graphics for the special font used in a logo, brand name, or icon, but why use graphics to display words when it is not necessary? Moreover, even if you do display words with graphics, why not use the Alt attribute to allow users with nongraphics browsers to see the same thing?
  3. Why have a different logo on each page? Why not reuse your company or product logo from page to page so that users who have enabled the memory cache on their browsers won't have to download the logo from each page? Why have so many variations of your company logo or brand name throughout the web? (In one web, I counted more than five variations on the company logo, yet each variation served the same purpose.)
  4. Why have a different background on each page? The use of the Background attribute of the HTML BODY element can allow you to create a unique look and feel for your web. Why would you want to change this background for every page of your web? This causes users to have to download each new background, and each new background gives a whole new visual cue to the user. You lose the benefit of having a common look and feel for all your web pages.
  5. Why use old analogies? Why structure your web so that it looks like TV screens or uses "back" and "forward" analogies for movement (like a VCR or a slide show)? Hypertext allows you to use associative linking so that users can have the choice to access information at the time they want it rather than in a predefined sequence. (One web I encountered used the VCR analogy to an extreme: It showed a picture of a VCR player and used that as the repeated icon on all its pages.)
  6. Why call your web a home page? Are you really creating just a single page to give information to your users? Isn't what you are creating a web of information that is associatively linked? If you are creating Web-based information for an institution or a corporation, you shouldn't be creating a home page. This term implies that you will create a single, gigantic file that contains massive graphics and too much text to see everything on one screen. The term home page generally refers to the default page that a browser displays when it accesses the web site. For example, CNN's home page is http://www.cnn.com/index.html, but the people at CNN do far more than just maintain that one file.
  7. Why do a "bait and switch" if you offer a text alternative to your web? If you provide a text-only alternative to your site, why would you provide any links to pages with graphics on your site after the user has chosen this alternative? You can provide alternatives for users of text-only browsers by using the Alt attribute of your HTML IMG elements. Users who want no graphics often turn off graphics loading. If you are not going to do it consistently, why provide a text-only alternative at all?
  8. Why not get to the point? Hypertext allows you to layer information, and no doubt you can't be psychic about what your users will need to see on any given page of your web. But why not get to the point of your site or the purpose of any given page right away? Why not place any asides or related comments in other pages that users can access if they need that information? Do you need the full text of your legal notice on the home page of your web, for example? Why not provide it in a separate file for people to access it if they want it?
  9. Why not focus on your unique contribution? If your site is about used automobiles, for example, why provide so many links to Internet information? Why not get right to the strength of your site and provide coverage of your subject area in depth? Why provide so much information not related to your area of business or expertise? (I've seen many sites where I couldn't figure out on what area of business or expertise a company was focusing.) If you are a major telecommunications company, why provide a soap opera on your web? What benefit does it give your customers?
  10. Why use all the buzzwords for no reason? Why provide market-speak to your users? Do they really care that your products and services are "interactive, intelligent solutions?" What are your projects? Can your users discern what their benefits are easily and quickly?
  11. Why be cheesy? Why use overly cute graphics and language that is dumbed down so much that it would insult an eight-year-old? If your site is meant for children, clearly identify that; if it is meant for adults, why not give them the impression that you expect them to be intelligent, busy people who are accessing your web to find useful information for their interests?
  12. Why not identify the location of your service or company when it is important? If you are providing a web for a restaurant or store that involves physical contact for the transaction, why not tell the user where that store or restaurant is, including the city, state, and possibly the country? Users of your web might come across the page showing the menu for this wonderful restaurant or the catalog for the fantastic store, but then be faced with the identification, "Find us on Main Street!" What city? What state? What country? Why not tell people where you are if you expect them to find you?

Sample Web Analysis

You can get a good idea of the kind of information a web analyst might gather by looking at the sample web analysis I've prepared for the Web Development web (http://www.december.com/web/develop.html). I continuously update and refine this analysis information. You can take a look at its current state on-line at http://www.december.com/web/develop/wdanalyze.html.

Web Information Analyst's Check

A web analyst examines a web's information, design, and implementation to determine its overall communication effectiveness. This process of analysis involves gathering information about the web's elements and performance and evaluating this information to see whether the web's purpose for its intended audience is being met. This analysis process involves the following: