Chapter 12 Basic HTML 3.2

An Overview of HTML
HTML Description (Levels 0 and 1)
HTML Tutorial
More HTML Features
Key HTML Information Sources
Basic HTML Check

HyperText Markup Language (HTML) is used for creating hypertext on the Web. Conceived as a semantic markup language to mark the logical structure of a document, HTML gives users a way to identify the structural parts of a document. Learning HTML involves finding out what tags are used to mark the parts of a document and how these tags are used in creating an HTML document.

This chapter presents the first two levels of HTML, levels 0 and 1, which just about all browsers can render. First, it describes HTML's relationship to Standard Generalized Markup Language (SGML) to show how the separation of document components and processing lies behind the ideal of a markup language. It then summarizes the basic HTML elements and attributes and places them in the context of how implementers typically work with HTML. An introductory tutorial then walks through implementing a simple web page.

HTML Implementation Information

I've prepared a support web to help you with HTML syntax and reference information: The HTML Station (http://www.december.com/html/). There, you'll find information about all levels of HTML, as well as examples, tag summaries, and links to supporting and reference information.

HTML Lingo

Sometimes people refer to HTML elements as tags. A tag might be just part of an HTML element, however. Many HTML elements include both start tags and end tags. In most cases, if you call an HTML element a tag, people will understand what you mean. But realize that an HTML element can contain other elements between its start and end tags. I use the term element in these chapters to help reinforce this concept, which will become more apparent when I discuss the syntax of specific HTML elements.

The qualifiers of elements are called attributes. In general, an HTML element looks like this:

<ELEMENT Attribute1="value" Attribute2="value">Text or other elements </ELEMENT>

Special symbols in HTML used to render characters, such as the copyright symbol (©), are called entities. Appendix B, "HTML Language Reference," contains a chart of entities.

There's no formal convention for writing out elements, attributes, and their values. My personal convention is to put elements in all capital letters, attributes in initial capital letters, and attribute values all in lowercase, unless the case matters in the attribute value. This capitalization scheme is for my own reading use only; HTML element and attribute names are not case sensitive.

An Overview of HTML

HTML was not originally intended to be a page-layout language; instead, it was to be a language used to mark the structural parts of a document: parts such as paragraphs, lists, headings, block quotations, and others. Based on the identification of these document parts, the programs that render HTML documents (Web browsers) display the HTML in a readable form. This organization allows for a separation of a document's structural specification in the HTML code from its formatted appearance in an HTML browser. In practice, there now are many language constructs you can use in HTML to control the appearance of a document.

HTML and SGML

The separation of document specification from document formatting relates to HTML's relationship to SGML. HTML is defined using SGML, an international standard (ISO 8879:1986, Information Processing-Text and Office Systems (SGML)) for text information processing. SGML itself is a metalanguage (a language to define languages). The goal of SGML is to help format information on-line for efficient electronic distribution, search, and retrieval in a way that is independent of the appearance details of the document. A document marked according to SGML contains no indication of how the document is to be presented. Only when a presentation program merges the SGML document with style information do the physical layout and appearance of the document become apparent. Figure 12.1 illustrates this basic idea of processing.

Figure 12.1 : The organization of SGML document processing.

The data of a document consists of the contents of a document, whether it is text or multimedia, and any information about the information itself (such as administrative or technical information about the document that would not be rendered in its final form). The tags in an SGML document identify the structure: the headings, subheadings, paragraphs, lists, and other components. Finally, the format of a document is its final appearance, after the merging of data, structure, and specifications for how the formatting should be done. Note how all these parts are separable; document data can be created without the author worrying about the structure, structure can be added without worrying about its formatting, and formatting specifications can be created to follow a "house style" or particularities for an organization. And, because all these parts are independent, if the house style of an organization changes, the developers just need to change the specification for the style information instead of all the data or structure of the documents.

Using SGML as a data-encoding standard also is beneficial because it's an international standard not tied to any one vendor. Because information in documents is marked in a standard way, information can be shared by other document-publishing systems (or possibly even automated web-generation programs). The tags in an SGML document make it easier to reuse written information. SGML tags also help in searching, because the tags in an SGML document help mark the meaning of information. Because SGML is a standard for document applications, SGML users can choose the best tools to manipulate documents in SGML.

Because SGML is a metalanguage, developers must specify rules for the structure of a document through a document type definition (DTD). A DTD specifies exactly what a document of a particular SGML language must look like. For a wealth of information on SGML, see SGML Open's web at http://www.sgmlopen.org/.

Using a markup language, authors can create documents without having to worry about the details of a document's appearance. Graphic artists can create a pleasing specification for appearance of documents that can be uniform and consistent for all documents in an organization. Therefore, the writing and production of documents can be expedited. An organization can have a store of reusable chunks of information that easily can be deployed in any publication. It's like having an "information store" expressed in terms of its structure, so that an "information displayer" can translate this store of information into any format.

The Philosophy of HTML

SGML is used to define HTML at all its levels. The DTDs for HTML can be found at http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html. HTML, then, follows the same philosophy of data, structure, and format independence of SGML. Users of HTML create files of marked text analogous to a computer programming language; authors write the information using a specific structure in order for the "computer" (in this case, a hypertext browser) to understand. Although HTML is not as complicated as some computer programming languages, writing HTML requires authors to follow specific rules to tag or mark the parts of the document. This marking sets HTML apart from free-form prose or text created in a word processor. In fact, the whole idea of marking up a text to express its structure comes from a very different approach than the what-you-see-is-what-you-get (WYSIWYG) word processors. In a WYSIWYG word processor, authors concentrate on a document's data, structure, and format all at once. For individual work, this might be very useful. For large systems of documents and information, markup languages are far more efficient.

Why worry so much about a document's structure when a WYSIWYG word processor can show-right away-what a document looks like? The answer lies in the relationship between HTML and all the possible hypertext browsers that might read it (see the lists of browsers in Chapter 3, "Options for Web Connections"). Using HTML, a developer carefully defines the structure of a document so that any present (or future) browser can read it and display it in a way that is best for that browser. This makes it possible to develop information in HTML without having to create a separate version of it for the Lynx browser, another for Cello, and still another for Netscape or Mosaic.

HTML itself is being formally defined by the HTML Working Group of the Internet Engineering Task Force (http://www.ietf.org/). Some browser manufacturers, however, support extensions to HTML that are not yet part of the HTML standards, and some of these extensions push HTML to become more of a layout language than a semantic markup language. This chapter covers levels 0 and 1 HTML, a core set of HTML constructs that should work for presenting information in any Web browser. The next chapter covers level 2 HTML and higher.

Tools are available that can support HTML document editing in a WYSIWYG manner (see Chapter 17, "Implementation Tools") and environments to support systems of documents (see Chapter 18, "Development and Language Environments"). A web implementer, however, should be familiar with "raw" HTML itself. The benefit of HTML is that it is created in plain ASCII text with no control characters or embedded binary codes, so that developers easily can look at or edit an HTML file in a simple text editor or e-mail it.

The Reality of HTML

The reality of HTML doesn't live up to the ideal of an open system of information dissemination and display. The proliferation of proprietary HTML elements that only certain brands of browsers recognize, combined with the wide variety of ways that different browsers on different computer platforms render text and graphics, has made a reliance on browser-independent rendering of content a pragmatic impossibility. Even when following only "strict HTML" (without proprietary extensions), developers usually find incompatibilities in graphics or unsatisfactory rendering of certain effects.

HTML Description (Levels 0 and 1)

An HTML document consists of text and tags used to convey the data of a document and to mark its structure. Listing 12.1 shows a sample HTML document.

Listing 12.1. A sample HTML document.

<HTML>
<HEAD>
<TITLE>Hello World Demonstration Document</TITLE>
</HEAD>
<BODY>
<H1>Hello, World!</H1>
This is a minimal "hello world" HTML document. It demonstrates the
basic structure of an HTML file and anchors.
<P>
For more information, see the HTML Station at:
<A Href="http://www.december.com/html/">
http://www.december.com/html/</A>
<HR>
<ADDRESS>
<A Href="http://www.december.com/john/">John December</A>
(john@december.com) / 04 May 1996
</ADDRESS>
</BODY>
</HTML>

The < and > symbols that, to a new user, might seem to dominate an HTML file, are the beginnings and endings of the tags that mark a document's structure. With an understanding of what these tags do, a developer quickly can learn that tags mark familiar structures: titles, headings, paragraphs, and lists. Once a developer knows the meaning of the tags, the meaning of the document's structure becomes clear, but the document's appearance in a browser can't be determined solely from the HTML file itself.

Elements

The < and > symbols in an HTML document are used to make tags to delimit elements. These elements identify the document's structure. In Listing 12.1, the title of the document, "Hello World Demonstration Document," is identified using the TITLE element, which is delimited by the start tag of <TITLE> and the end tag, </TITLE>.

Basics of Elements

The letters in the element tags are not case sensitive; a browser interprets the word TITLE in the tag <TITLE> the same, whether it's written as <Title>, <title>, or <tItLe>. (Note, however, that the character entities in a document are case sensitive.)

Some elements, such as the LINE BREAK element, can be delimited by just one tag (which is considered the start tag): <BR>. Elements such as the PARAGRAPH element, <P>, can be delimited by just a start tag, but may be delimited by an optional end tag, </P>. Other elements, such as the TITLE element described previously, must be delimited by both a start and end tag.
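As a minimal sketch of these three cases, the following document uses TITLE (both tags required), P (end tag optional), and BR (start tag only). The title and sentence text are invented for illustration.

```html
<HTML>
<HEAD>
<!-- TITLE must have both a start tag and an end tag. -->
<TITLE>Start and End Tags</TITLE>
</HEAD>
<BODY>
<!-- P can be closed explicitly with its optional end tag... -->
<P>This paragraph ends with an explicit end tag.</P>
<!-- ...or left open; the next P start tag implies its end. -->
<P>This paragraph omits the end tag.
<P>This line is broken here<BR>
by BR, which has only a start tag.
</BODY>
</HTML>
```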

Some elements also have attributes. One example is the IMG (image) element, used to place an image in a document. IMG uses the attribute Src to identify the file of the image to be included in the document. The attributes can occur in any order in the element. A sample attribute follows:

<IMG Src="http://host/dir/file.gif">

The attribute Src is set to its value http://host/dir/file.gif by the use of the = and the " marks. It generally is considered to be good marking style to put the quotation marks around every attribute value.

Types of Elements

Elements can be classified according to where they fit in an HTML document, such as in the head or body, or by how they function: as comments, as document structure elements, or as graphics. The following discussion covers each of these types of elements for level 0 and level 1 HTML. Each of the elements described is level 0, except for the semantic and physical character-formatting elements.

Structure, Comment, and Document-Type Declarations

The HTML element brackets all the other elements of a file. Its start tag is <HTML> and its end tag is </HTML>. As shown in Listing 12.1, the HTML element encloses the entire contents of a document and therefore contains all other elements and entities.

A comment can be placed in an HTML file. Start the comment with <!-- and end it with -->. The text between can consist of any characters, and the comment can cross several lines of text. A comment won't be visibly rendered in a browser's display of a document. Comments are useful for administrative control information or notes about HTML documents.
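Here is a short sketch of comments in a source file. The comment delimiters <!-- and --> are standard HTML; the wording inside the comments is made up for illustration.

```html
<!-- Maintained by the web's administrator. Review once a month. -->
<!-- A comment can
     cross several lines
     of the source file. -->
<P>Only this paragraph appears in the browser's display.
```

It is safest to avoid a double hyphen inside the comment text itself, because some parsers treat it as the end of the comment.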

A document-type declaration can be placed at the start of an HTML document that identifies it as a document conforming to a particular level of HTML, as this example shows:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML level 0//EN">

This indicates that the document conforms to the level 0 HTML.

HEAD Elements

The HEAD element is used to identify properties of the whole document, such as the title, links to indicate the relationship of one document to another, and the base URL of the document. These descriptive elements go between the start tag for the HEAD element, <HEAD>, and the end tag, </HEAD>. This information is not displayed as part of the document itself, but is information about the document that is used by browsers in various ways. The elements in the head can be listed in any order.

The level 0 HEAD elements follow:

BASE Records the URL of the original version of a document when the source file is transported elsewhere. The BASE element has one attribute, Href, which is used to define the base URL of the document. Partial URLs in the document are resolved by using this base address as the start of the URL.

ISINDEX Marks the document as searchable. The server on which the document is located must have a search engine defined that supports this searching.

LINK Defines a relationship between the document and other objects or documents. A LINK element can indicate authorship or the tree structure of a document, for example. The LINK element has the same attributes as the ANCHOR (A) element.

LINK Attributes

Href Identifies the document or part of a document to which this link refers.

Methods Describes the HTTP methods the object referred to by the Href of the LINK element supports. One method is searching; a browser could use this Methods attribute to give information to the user about the document defined by the LINK element.

Name Names this link as a possible destination for another hypertext document.

Rel Describes the relationship defined by this link, according to the possible relationships as defined by the HTML Registration Authority's (http://www.w3.org/hypertext/WWW/MarkUp/) list of relationships (http://www.w3.org/hypertext/WWW/MarkUp/Relationships.html).

Rev Indicates the reverse relationship of Rel. A link with Rev="made", for example, indicates that the resource given in the Href is the maker (author) of the current document. A link with Rel="made" would indicate the reverse: that the current document is the maker of the resource given in the Href attribute.

Title This attribute is not a substitute for the TITLE element of the document itself; it is a title for the document given by the Href attribute of the LINK element. This attribute rarely is used or supported by browsers, but may have value for cross referencing the relationships that the LINK element defines.

Urn Indicates the uniform resource name of the document. The specification for URN and other addressing is still in development (see http://www.w3.org/hypertext/WWW/Addressing/Addressing.html).

META Identifies metainformation (information about information) in the document. This element is not meant to take the place of elements that already have a purpose-for example, the TITLE element-but to identify other information useful for parsing.

META Attributes

Content Supplies the content (the value) associated with the given name (defined by the Name attribute) or with the response defined in Http-equiv.

Http-equiv Connects this META element to a particular protocol response, which is generated by the HTTP server hosting the document.

Name Specifies a name for the information in the document-not the title of the document (which should be defined in the TITLE element) but a metaname classifying this information.

NEXTID Used by text-generating software in creating identifiers. Its attribute, N, is used to define the next identifier to be allocated by the text-generator program. Normally, writers of HTML don't use this element, and Web browsers ignore it.

TITLE Has a start tag, <TITLE>, and an end tag, </TITLE>. Every HTML document must have one TITLE element that identifies the contents of the document. The title cannot contain anchors, paragraph elements, or highlighting. A title that is descriptive outside the context of the document's contents works best, because the title is commonly used to identify a document in navigation and indexing applications (for example, hotlists and spiders).

Here is an example title:

<TITLE> An Example title </TITLE>
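To show how several of these HEAD elements might sit together, here is a minimal sketch. The title, host name, and e-mail address are placeholders, and the META name "keywords" is an assumed example of a metaname rather than something the specification requires.

```html
<HEAD>
<!-- One TITLE per document, meaningful outside the document itself. -->
<TITLE>The Virtual Reality Research Center</TITLE>
<!-- BASE: the original location, used to resolve partial URLs. -->
<BASE Href="http://your.host.com/Project/top.html">
<!-- LINK: Rev="made" conventionally identifies the document's author. -->
<LINK Rev="made" Href="mailto:email@host.domain">
<!-- META: a metaname and the content associated with it. -->
<META Name="keywords" Content="virtual reality, research">
</HEAD>
```

None of these elements is displayed in the body of the page, although a browser may show the title in its window frame or hotlist.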

BODY Elements

BODY elements are used to mark text as content of a document. Unlike the HEAD elements, almost all these marks lead to some visual expression in the browser.

The BODY elements follow:

A This is the ANCHOR element, which is used as the basis for linking documents together.

A Attributes

Href Identifies the URL of the hypertext reference for this anchor in the form Href="URL", where the URL given is the resource that the browser retrieves when the user clicks the anchor's hotspot. For example,

<A Name="W3C-reference" Href="http://www.w3.org/">W3C</A>

takes the user to the World Wide Web Consortium's home page.

Methods Provides information about the functions that the user can perform on the Href object.

Name Creates a name for an anchor. This name then can be used within the document or outside the document to refer to the portion of text identified by the name. For example,

<A Name="AnchorName">Text can have a named anchor.</A>
<A Href="#AnchorName">A jump can go to that anchor within the file in which it is named...</A>
<A Href="level0.html#AnchorName"> ...or from another file (perhaps on a remote host).</A>

Rel Defines the relationship from the current document to the target (Href document). (See the discussion of the Rel attribute in the LINK element.)

Rev Defines the relationship from the target (Href document) to the current document. (See the discussion of the Rev attribute in the LINK element.)

Title Specifies the title of the document given by the Href attribute of the anchor. A browser could use this information to display the title before retrieving the document or to provide a title for the Href document when it is retrieved (for example, if the document is at an FTP site, it will not have a title defined).

Urn Indicates the uniform resource name of the target (Href) document. The specification for URN and other addressing is still in development (see http://www.w3.org/hypertext/WWW/Addressing/Addressing.html).

Note that an anchor can have both the Name and Href attributes:

<A Name="W3C-reference" Href="http://www.w3.org/">W3C</A><BR>

ADDRESS Brackets ownership or authorship information, typically at the start or end of a document.

BLOCKQUOTE Brackets text that is an extended quotation from another source. A typical rendering of a BLOCKQUOTE is to provide extra indentation on both sides and possibly highlight the characters in the BLOCKQUOTE.

BODY The BODY element's start (<BODY>) and stop (</BODY>) tags mark the content of an HTML document.

BR Forces a line break. Typically, BR is used to represent postal addresses or text (such as poetry) where line breaks are significant. For example,

The HTML Institute<BR>
45 General Square<BR>
Markup, LA 70462<BR>

is rendered as

The HTML Institute
45 General Square
Markup, LA 70462

DIR (LI) Brackets a list of items that are at most 20 characters wide. The intent is that a browser can render this in columns 24 characters wide. DIR can take the Compact attribute in its start tag (for example, <DIR Compact>). Like MENU, however, the DIR element rarely is used except when it is described in HTML books and lists of HTML elements.

DL (DT, DD) A definition list, or glossary, consists of pairs of these parts:

A term, identified with the <DT> element.

An explanation of the term, which may include several lines of text and is identified with the <DD> element.

DL can have the Compact attribute. Place it in the start tag of the list, as in <DL Compact>.

H1, H2, H3, H4, H5, H6 These are elements that create an information hierarchy in a document in the form of headers. H1 is the major header; H2 and the others are subordinate to it.

HR A horizontal rule separator that divides sections of text.

IMAGES (IMG) Allows graphical browsers to place graphics images in a document at the location of the Element tag (to create an inline image). For example,

<IMG Src="http://www.december.com/images/stats.gif" Alt="statistics sphere" Align="middle">

IMG Attributes

Align Sets the positioning relationship between the graphic and the text that follows it. Values include the following:

bottom Specifies that the text following the graphic should be aligned with the bottom of the graphic.

middle Specifies that the text following the graphic should be aligned with the middle of the graphic.

top Specifies that the text following the graphic should be aligned with the top of the graphic.

Alt A string of characters can be defined that will be displayed in nongraphical browsers or browsers with image loading turned off. Nongraphical browsers otherwise ignore the IMG element.

Ismap Identifies the image as an imagemap, where regions of the graphic are mapped to defined URLs. Hooking up these relationships requires knowledge of setting an imagemap file on the server or the MAP element to define client-side imagemaps. See Chapter 16, "Imagemaps."

Src Indicates the source file of the image.

MENU (LI) The MENU element brackets a more compact unordered list of items. It also employs the LI element to mark the items. Typically, it is rendered using bullets to start items. Very few browsers (none that I know of) actually display the MENU element any differently than the UL element. You also can use the Compact attribute in the start tag of the menu (for example, <MENU Compact>).

OL (LI) The OL element brackets an ordered list of items. It contains one or more LI elements to mark the items. Typically, a Web browser renders these items as a list numbered with Arabic numerals in order, starting with 1. You also can use the Compact attribute in the start tag of the list (for example, <OL Compact>).

P A paragraph start; optionally, it can have a paragraph end tag, </P>.

PRE Sets up a block of text that will be presented in a fixed-width font, with spaces counting as characters.

PRE's one attribute, Width, can be used to specify the width of the presentation. Anchors and character formatting can be placed within PRE, but not elements that define paragraph breaks (for example, headings, address, the P element, and so on).

UL (LI) The UL element brackets an unordered list of items and contains LI elements to mark the items. Typically, the list items are rendered with bullets at the start of each item.

You can use the Compact attribute to suggest to the Web browser that the items in the list should be close together. For example, the tag <UL Compact> is the start tag for a compact unordered list.
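The list and preformatted-text elements described above can be sketched together in one body. The item text and the small table inside PRE are invented for illustration; how each list is indented, bulleted, or numbered is up to the browser.

```html
<BODY>
<H1>List Examples</H1>

<!-- Unordered list: typically rendered with bullets. -->
<UL>
<LI>An unordered item
<LI>Another unordered item
</UL>

<!-- Ordered list: typically numbered 1, 2, ... by the browser. -->
<OL Compact>
<LI>First step
<LI>Second step
</OL>

<!-- Definition list: each DT term is followed by its DD explanation. -->
<DL>
<DT>A term
<DD>An explanation of the term, which may run over
several lines of text.
<DT>Another term
<DD>Its explanation.
</DL>

<!-- PRE keeps spaces and line breaks and uses a fixed-width font. -->
<PRE>
Browser     Type
Lynx        text
Mosaic      graphical
</PRE>
</BODY>
```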

Character Formatting

Level 1 HTML defines several semantic elements for character formatting:

CITE Marks a citation of a book or other work. For example,

<CITE>The Castle</CITE>

CODE Marks computer language source code; often rendered as monospace type, as in this example:

<CODE>

Note that the CODE element's rendering does not keep the line breaks that the PRE element does:

</CODE>

EM Marks <EM>emphasis</EM>; typically rendered the same as the physical tag for italics or as underlined text.

KBD Used in computer instructions to mark text that the user enters on a keyboard. Typically rendered as

<KBD>monospaced text</KBD>

SAMP Delimits a sequence of characters that are to be rendered as is ("sample" text). For example,

<SAMP># @ % * !</SAMP>

STRONG Marks strong emphasis. Often rendered the same as the physical bold element.

VAR Marks a variable used in computer code, equations, or other work. A <VAR>variable</VAR> typically is rendered in italics.

Level 1 HTML also defines several physical format elements that allow the formatting of characters in a document. These are called physical elements because they dictate the appearance of the text rather than the semantic intent of the words (contrast this with level 1's semantic elements for character formatting).

B  Marks bold text.
I  Marks italic (or underlined) text.
TT  Marks teletype (fixed-width typewriter) text.
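The following sketch places the semantic and physical elements side by side. The sentences are invented, and the actual rendering (italic, bold, monospace, or no change at all) depends on the browser.

```html
<!-- Semantic elements: the markup says what the text is. -->
<P>This sentence cites <CITE>The Castle</CITE>, stresses
<EM>one phrase</EM>, and stresses <STRONG>another more strongly</STRONG>.
<P>Type <KBD>help</KBD> at the prompt. In the statement
<CODE>count = count + 1</CODE>, the name <VAR>count</VAR> is a variable,
and <SAMP># @ % * !</SAMP> is literal sample text.

<!-- Physical elements: the markup says how the text should look. -->
<P>This sentence uses <B>bold</B>, <I>italic</I>, and
<TT>teletype</TT> text.
```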

Characters and Entities

You can write text in HTML using any of the ASCII characters, such as all the keyboard characters:

a-z, A-Z, 0-9

! @ # $ % ^ & * ( ) _ + - = | \ { } [ ] : " ~ ; ' ' ? , . / .

Because some characters (for example, <, >, &, and ") are used within HTML to create tags and entities, some browsers don't render them as written. Special entities can be used within documents to represent these characters:

Less than sign  &lt; = <
Greater than sign  &gt; = >
Ampersand  &amp; = &
Quotation mark  &quot; = "

An HTML document also can use the set of ISO Latin 1 character entities. (See the ISO Latin 1 character entity table in Appendix B, "HTML Language Reference," or http://www.december.com/html/spec/latin1.html.)

You also can use numeric codes to represent characters. (See the Numeric Code Entity table in Appendix B or http://www.december.com/html/spec/codes.html.)
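As a brief sketch of entities in use, the paragraphs below write the four special characters, one ISO Latin 1 named entity, and one numeric code. The sentences themselves are invented for illustration.

```html
<!-- Displays the literal text <TITLE> instead of starting an element. -->
<P>The title goes inside the &lt;TITLE&gt; element.
<!-- Displays an ampersand between the two words. -->
<P>Research &amp; Development
<!-- Displays the greeting inside quotation marks. -->
<P>The page says &quot;Hello, World!&quot;
<!-- &eacute; is e with an acute accent; &#169; is the copyright symbol. -->
<P>Caf&eacute; menu, &#169; 1996
```

Remember that entity names are case sensitive, so &eacute; and &Eacute; name different characters.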

An HTML Document Layout

To make an HTML document, you place head elements (information about the document) and body elements (the content of the document) in a file.

It is a good idea to wrap these parts in tags that mark the start and end of the head and body. Then wrap this up inside tags marking the start and end of the HTML code, as shown in this example:

<HTML>
<HEAD>
head elements go here
</HEAD>
<BODY>
body elements go here
</BODY>
</HTML>

HTML Tutorial

The preceding description of the tags, elements, and entities in an HTML document gives an overview of the language syntax. This section now applies that syntax to a sample implementation that illustrates the most popular elements and entities used. Before that, however, developers should be aware of the limitations of using level 0 and 1 HTML. This tutorial begins by leading you through building a generic HTML page, and then specializing this page as a look-and-feel template for the design shown in Figure 11.15.

What HTML Levels 0 and 1 Can't Do

A new user of HTML often wants to do some things that might not seem all that complex for a text formatting language. There are, however, some basic things that level 0 and 1 HTML can't do, including making tables, rendering mathematical equations, specifying multiple columns of text or graphics, using tab characters, including an external HTML file, or embedding a movie into a document. Some of these features are included in HTML at level 2 and higher (see the next chapter). Some features, like mathematical equations, are still in the works.

HTML Features That Many Developers Find Tricky

Writers of HTML also will find that some things seem to create more errors than others. Often, HTML writers find the following difficult:

Beginning users of HTML often spend a great deal of time, and suffer a great deal of frustration, trying to micro-manage what HTML was never meant to do: page layout. Precise alignments and spacing are not always possible; font style and size can't be controlled with basic HTML (HTML 3.2 now allows control of font size and color). And, of course, the width and other characteristics of the user's browser can't be controlled.

Not all HTML elements are implemented in all browsers. Text-based browsers, of course, will not render many of the character-formatting elements. Other elements, such as MENU, aren't rendered in many browsers and therefore are not used often in current practice.

People writing HTML files sometimes have trouble making sure that the < and > all match up when composing an anchor. For example,

The <A Href="http://www.w3.org/hypertext/WWW/MarkUp/Tags.html">Elements of HTML</A> are head, body, and graphics.

Notice how the anchor starts with <A and ends with </A>, and what's between are the Href attribute, the URL of the resource, and the hotspot for the hypertext. The absence of just one of the symbols (", >, <, or /) causes an error.

Developers should check to make sure that many different browsers will read the HTML file without problems. Some browsers are forgiving and let minor errors slide in HTML. Another browser might not be forgiving, so it is a good idea to check HTML code in at least one or two other browsers, as well as to go through some of the HTML validation checks described in Chapter 6, "Web Analysis."

Developers should check the links to other documents if using relative links. It is possible to refer to other HTML files that are located on a server by using relative links. If a developer is writing the top HTML document (top.html), for example, and then is referring to the index document (myindex.html) that is located in the same directory, a link can be made from top.html to myindex.html:

<A Href="myindex.html">Index</A>

Anyone who links to the top document, perhaps from a distant host, will use the link

<A Href="http://your.host.com/Project/top.html">Top Document</A>

After this user clicks the Index hotspot, the reference to myindex.html is resolved to be the URL

http://your.host.com/Project/myindex.html

even though the developer used only myindex.html in the HTML document. This is called relative naming (or relative addressing or linking).
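The sketch below shows how a few relative links in top.html would resolve, assuming the file is served as http://your.host.com/Project/top.html. The docs/notes.html and ../welcome.html files are hypothetical, and the BASE element is optional here; it matters only if a copy of the file is moved away from its original location.

```html
<!-- top.html, served as http://your.host.com/Project/top.html -->
<HTML>
<HEAD>
<TITLE>Top Document</TITLE>
<!-- Optional: records the original location for resolving relative links. -->
<BASE Href="http://your.host.com/Project/top.html">
</HEAD>
<BODY>
<!-- Resolves to http://your.host.com/Project/myindex.html -->
<A Href="myindex.html">Index</A>
<!-- Resolves to http://your.host.com/Project/docs/notes.html -->
<A Href="docs/notes.html">Notes</A>
<!-- Resolves to http://your.host.com/welcome.html -->
<A Href="../welcome.html">Welcome</A>
</BODY>
</HTML>
```

Because none of the links names the host, the whole Project directory can be moved to another server without editing them.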

The Basics of Getting Started

Because there are certain tags a developer will have in all HTML documents, it's a good idea to make a template (create a file called template.html) that contains the basics:

<HTML>
<HEAD>
<TITLE>Document Title</TITLE>
</HEAD>
<BODY>
<ADDRESS>Developer Name (email@host.domain) / Date </ADDRESS>
</BODY>
</HTML>

Using this template as a base, the following discussion adds most of the commonly used HTML structures.

The Document Title

A title often is used as an identifier of the HTML document in many contexts on the Web (in spider databases and in users' hotlists). Therefore, the title should be meaningful outside of the context of a document's contents (but not be overloaded with every conceivable buzz word to grab a Web spider's attention). A document might be the home page for a research center, for example. Using the title "Home Page," however, won't have any meaning to anyone else who might come across this title. The title "Research Center" would be a bit better, but it's still too generic. The title "The Virtual Reality Research Center" would have more meaning to anyone seeing the document's title in a spider list.

The title is placed between the <TITLE> and </TITLE> tags in the head of the document:

<TITLE>The Virtual Reality Research Center</TITLE>

This title does not necessarily show up directly in the document's representation in a browser display. (Some browsers do display the title, although this is not a requirement of HTML, because the title is a head element rather than a body element.) Therefore, it sometimes is good practice to repeat the document's title in the text itself (usually as an H1 heading).

Headings

The six levels of headings give the opportunity to create an information hierarchy within a document. As such, the heading elements are used to indicate semantic hierarchy, not necessarily to take advantage of the varying sizes of type that the headings might offer in some browsers. Therefore, developers should attempt to use these headings in sequence, starting with level 1 <H1> and continuing to <H6> in steps of 1 (that is, not jumping from heading 1 to heading 6). If developers are tempted to violate this rule (to use a high-numbered heading to take advantage of the type display change in graphical browsers), they should remember that not all browsers support a type size change in headings. While Mosaic users see small type with <H6>, Lynx users see <H6> in the same-size type as <H1>. Moreover, automatic tools can be used to generate tables of contents, and these tools depend on the semantic, not the physical, rendering of the headings.

Similar to the title, the headings should be as descriptive as possible, particularly because some spiders use heading information to index a document's content.

The very first heading that might go in a document could reflect the purpose of the document itself. Continuing with the Virtual Reality Research Center example, the first heading in the document might be a reidentification of this. I put this as a major heading:

<H1>The Virtual Reality Research Center</H1>

Because this entire page is devoted to the Virtual Reality Research Center, a developer normally wouldn't put another <H1> heading on the page. Headings showing a hierarchy for the information, such as <H2> and <H3>, one or two levels down from the main description, might be included. Hypertext gives the opportunity to avoid extreme nesting of headings within the same HTML page. Instead of nesting information to many levels with headings, consider breaking the page into several HTML pages (possibly using the clustering technique described in Chapter 7, "Web Design").
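The advice to use headings in sequence can be checked mechanically. The following is a rough sketch, not a validator: it uses Python's standard html.parser module to collect heading levels in document order and report any place where a level is skipped. The names HeadingLevels and skipped_levels are hypothetical helpers invented for this illustration.

```python
from html.parser import HTMLParser


class HeadingLevels(HTMLParser):
    """Collects the levels of H1..H6 start tags in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <H1> arrives as "h1".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(html_text):
    """Returns (previous, current) pairs where a heading jumps down more than one level."""
    parser = HeadingLevels()
    parser.feed(html_text)
    parser.close()
    jumps = []
    previous = 0  # treat the start of the document as level 0
    for level in parser.levels:
        if level > previous + 1:
            jumps.append((previous, level))
        previous = level
    return jumps


in_sequence = "<H1>The Center</H1><H2>People</H2><H3>Staff</H3>"
out_of_sequence = "<H1>The Center</H1><H6>Fine print</H6>"
```

Calling skipped_levels on a page that goes from heading 1 straight to heading 6 reports that jump, while a page that steps from 1 to 2 to 3 produces an empty list.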

Paragraphs

Text in an HTML file that lies outside the < and > of the tags is rendered as paragraph text. Only the <P> tag marks the start of a paragraph, no matter how many blank lines or intervening spaces exist in the source file. Most browsers collapse any extraneous white space between words, so a developer can't format text using spacing (the preformatted text element, <PRE>, should be used for this).

So, after the initial <H1> heading, it is helpful to explain a little bit about The Virtual Reality Research Center:

<P>
Founded in May 1995, The Virtual Reality Research Center is
dedicated to collecting and presenting the most current and
comprehensive collections of on-line information about Virtual Reality.

<P>
The Center seeks to create an on-line community of scholars in VR and
to be a one-stop source for VR-related information.

Notice that in this example, only a <P> is used to mark the start of the two paragraphs. For readability in the HTML source file, blank lines are placed between the paragraphs, but this is not required. Note also that the line breaks in the HTML source file don't matter. The browser wraps and breaks lines based on how wide the browser display area for the text is, not based on the HTML source (unless a <BR> was used to force a line break).
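To see why spacing and line breaks in the source don't survive, it can help to imitate the collapsing step. The following sketch only approximates what a browser does to ordinary paragraph text (it ignores tags, <PRE>, and <BR>); collapse_whitespace is a hypothetical helper name.

```python
def collapse_whitespace(source_text):
    """Reduces every run of spaces, tabs, and line breaks to a single space."""
    # str.split() with no arguments splits on any run of white space
    # and discards leading and trailing white space.
    return " ".join(source_text.split())


source = """Founded in May 1995,    The Virtual Reality
Research Center is


dedicated to collecting"""

collapsed = collapse_whitespace(source)
print(collapsed)
```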

Lists

Lists provide a very useful way to focus a user's attention on a series of items. As described in the previous element summary, a developer has a variety of choices for lists. Generally, an ordered list (OL) works best for steps or directions that must be done in a particular order or for a list of counted items. An unordered list (UL) is useful for listing items that are not necessarily in order but are at the same level of detail.

An unordered list can be used to show the specific features the VR Center provides:

<P>
The VR Center offers information on:
<UL>
<LI>People interested in VR study and research
<LI>Online resources related to VR
<LI>Activities related to VR
</UL>

A developer quickly can change a list that is unordered to an ordered list by changing the starting tag from <UL> to <OL> and the ending tag from </UL> to </OL>.

Links

Links are the essential ingredients in hypertext and are created by the Anchor tag, A.

The home page in the VR Center example includes links to other pages in the VR Center web. These links lead to pages for resources, people, and activities, so resources.html, people.html, and activities.html are good names for these files, which will be implemented later. The unordered list of offerings of the VR Center then can include links to these pages:

<P>
The VR Center offers information on:
<UL>
<LI><A Href="people.html">People</A> interested in VR study and research
<LI><A Href="resources.html">Online resources</A> related to VR
<LI><A Href="activities.html">Activities</A> related to VR
</UL>

Notice that the links shown here are relative links. The basic form of making a link follows:

<A Href="URL">Hotspot</A>

Here, URL is the uniform resource locator for the document referenced by the link, and Hotspot is the explanatory text for the link that usually is highlighted (or underlined) in the browser.

Listing 12.2 shows the VR Center example at this point.

Listing 12.2. The VR Center example.

<HTML>
<HEAD>
<TITLE>The Virtual Reality Research Center</TITLE>
</HEAD>
<BODY>
<H1>The Virtual Reality Research Center</H1>

<P>
Founded in May 1995, The Virtual Reality Research Center is
dedicated to collecting and presenting the most current and
comprehensive collections of on-line information about Virtual Reality.

<P>
The Center seeks to create an on-line community of scholars in VR and
to be a one-stop source for VR-related information.

<P>
The VR Center offers information on:
<UL>
<LI><A Href="people.html">People</A> interested in VR study and research
<LI><A Href="resources.html">Online resources</A> related to VR
<LI><A Href="activities.html">Activities</A> related to VR
</UL>

<ADDRESS>Developer Name (email@host.domain) / Date </ADDRESS>
</BODY>
</HTML>

Some Flairs and Details

In the preceding section, the most common things in an HTML document were shown: the basic document structure, the head and body tags, the title, a heading, some paragraphs, a list, and some links. A few other flairs can go in a document to add visual cues to draw and focus the user's attention. These include small images and horizontal lines, as well as details such as a revision link in the head of the document and comment lines in the HTML source code. This section presents some flairs and details to add to the VR Center example.

A Logo

The page for the VR Center, although providing an overview of its offerings, is a bit dry, particularly for users with graphical browsers. One possible flair is a small logo or inline image in the document. First, a developer needs to create the logo itself (or scan it in) with graphics tools on a computer and create a file in a graphics format that can be recognized by the browsers that users are expected to have. A common type of graphics file that works is a Graphics Interchange Format (GIF) file. Of course, nongraphical browsers won't show the logo.

After a VR Center logo is created (in file vr.gif in the same directory as an HTML page), it can be added to the document by the following line just below the <BODY> start tag:

<IMG Src="vr.gif" ALT="VR Logo"> The Virtual Reality Research Center

The inline image element <IMG> brings the image in the given file directly into the text of the document. The text to the right of the logo helps identify the full name of the organization. Note also that the ALT attribute can be used to include a descriptive title that will be displayed in browsers that do not support graphics or in some graphical browsers with unloaded images. This is important because, otherwise, the users of these browsers will just see the word IMAGE and might wonder what they're missing.
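Because a missing or blank ALT value is easy to overlook, a small script can list the images that lack one. This is a sketch under the assumption that the page parses cleanly with Python's standard html.parser module; ImageAltAudit is a hypothetical class name, and a.gif and b.gif are made-up file names used only to show the failing cases.

```python
from html.parser import HTMLParser


class ImageAltAudit(HTMLParser):
    """Records the Src of every IMG element whose ALT text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # Attribute names arrive in lowercase; values keep their original case.
        attributes = dict(attrs)
        alt_text = attributes.get("alt")
        if alt_text is None or not alt_text.strip():
            self.missing.append(attributes.get("src"))


audit = ImageAltAudit()
audit.feed('<IMG Src="vr.gif" ALT="VR Logo"> <IMG Src="a.gif"> <IMG Src="b.gif" ALT=" ">')
audit.close()
print(audit.missing)
```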

Horizontal Lines

Just as the fine lines going horizontally across the top of a page in a magazine serve to bracket the text visually for a pleasing appearance, horizontal lines in HTML pages help bracket text. The key is to not overuse these lines, but to use them selectively to help guide the reader's attention in a document. If a document has too many horizontal lines, the value of the lines as guides is reduced. A common strategy using classic page design is to have a horizontal line below the document heading information and just above the foot information (see Fig. 11.4). Note that the heading information described here is the content at the top of the displayed page, not the metainformation in the HTML HEAD element.

A horizontal line can be added after the logo:

<IMG Src="vr.gif" ALT="VR Logo"> The Virtual Reality Research Center <HR>

and just before the ADDRESS element:

<HR> <ADDRESS>Developer Name (email@host.domain) / Date </ADDRESS>

The two horizontal lines created by <HR> bracket the body of text that contains the page's main information, with the header being the logo and the signature being the address at the bottom of the page. In this way, this organization corresponds closely with a letter style, in which a company logo starts off the letter, a signature ends it, and the content of the letter is bracketed between.

An Address

An address for the developer or maintainer of a web page is very important as a means for contact. An ADDRESS element is not required in an HTML document, nor is its placement within the BODY element restricted to the bottom of the page. Convention usually places it at the bottom, however.

The contents of the address can be the name of the developer for the page or an organizational unit's name and e-mail address. There also can be a link to a home page for that person or organizational unit. See Chapter 11 for a list of more informational cues that might go in the footer of a page. Another technique is to use the "mailto" link to provide a quick way for users to send a letter to the contact address:

<ADDRESS>VR Web Team (<A Href="mailto:web@vrrc.org">web@vrrc.org</A>) / 31 Oct 95</ADDRESS>

Revision Link

Similar to the tradition of signing a page so that users can contact the developers, including a revision link in the header of the document is a valuable (but not necessary) detail. The revision link is created as in this example:

<HEAD>
<TITLE>The Virtual Reality Research Center</TITLE>
<LINK Rev="made" Href="mailto:web@vrrc.org">
</HEAD>

This LINK Rev element directs anyone who wants to find out more about the revision of this document to contact web@vrrc.org. Although this same information is included in the ADDRESS element, its inclusion in the HEAD element (which actually is not displayed) makes it accessible to browsers that recognize the special function of the LINK Rev element. (In the Lynx browser, for example, pressing C sets up a session to send e-mail to the address given by the LINK Rev="made" element.)

Comments in the HTML Code

Just as the ADDRESS and LINK Rev elements added important contact information as well as documentation to an HTML file, comments add helpful information to an HTML file itself. Although comments are not required (and are not displayed in the browser), they can add significant value to a work by providing background and administrative information, labeling the information to show who wrote it, why, and any special considerations for it. Comments are bracketed within <!-- and -->.

For example,

<!-- text of the comment goes here -->

A Sample HTML Page

Listing 12.3 shows the complete sample HTML page as developed in the preceding discussion.

Listing 12.3. The sample HTML page.

<HTML>
<HEAD>
<TITLE>The Virtual Reality Research Center</TITLE>
<LINK REV="made" Href="mailto:web@vrrc.org">
</HEAD>
<BODY>
<IMG Src="vr.gif" ALT="VR Logo"> The Virtual Reality Research Center
<HR>
<H1>The Virtual Reality Research Center</H1>

<P>
Founded in May 1995, The Virtual Reality Research Center is
dedicated to collecting and presenting the most current and
comprehensive collections of on-line information about Virtual Reality.

<P>
The Center seeks to create an on-line community of scholars in VR and
to be a one-stop source for VR-related information.

<P>
The VR Center offers information on:
<UL>
<LI><A Href="people.html">People</A> interested in VR study and research
<LI><A Href="resources.html">Online resources</A> related to VR
<LI><A Href="activities.html">Activities</A> related to VR
</UL>

<HR>
<ADDRESS>VR Web Team (<A Href="mailto:web@vrrc.org">web@vrrc.org</A>) / 31 Oct 95</ADDRESS>
</BODY>
</HTML>

Figure 12.2 shows this HTML as rendered in the Netscape Navigator for X browser. Figure 12.3 shows this HTML rendered in the Lynx browser.

Figure 12.2 : The Virtual Reality Research Center HTML page example (in Netscape).

Figure 12.3 : The Virtual Reality Research Center HTML page example (in Lynx).

Implementing a Look-and-Feel Template

Besides building HTML documents from scratch, as shown in the Virtual Reality Research Center example, an implementer might use the templates technique to implement pages of a web. Using templates, the look and feel of a web as defined by a diagram such as Figure 11.15 can be used as the basis for implementing all the other pages of a web.

The diagram in Figure 11.15 shows an HTML page that can be created using the basic HTML elements discussed previously. Listing 12.4 shows the HTML source code.

Listing 12.4. The HTML source code.

<HTML>
<HEAD>
<TITLE>WEB TITLE - PAGE TITLE</TITLE>
<LINK REV="made" Href="mailto:userid@host.domain">
</HEAD>
<BODY>
<P>
<IMG Src="icon.gif" ALT="?? WEB"> WEB TITLE - PAGE TITLE
<HR>
<A Href="index.html">home page</A> / <A Href="index.html">index</A>

<P>
information information information information information information
information information information information information
<UL>
<LI>LIST
<LI>LIST
</UL>

<P>
information information information information information information

<P>
last revised: NAME / DATE
<HR>
<A Href="resources.html">resources</A> /
<A Href="activities.html">activities</A> /
<A Href="people.html">people</A>
</BODY>
</HTML>

Based on this HTML template file, the developer can "fill in the blanks" for all the other pages of the web. This technique can speed up implementation time as well as help enforce the visual consistency of the web's design. Figure 12.4 shows this look-and-feel template as rendered in Netscape.

Figure 12.4 : A look-and-feel template rendered in Netscape.
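Filling in the blanks of a look-and-feel template can also be scripted. The following is only a sketch: it uses Python's standard string.Template class with a shortened version of the template, and the $web_title, $page_title, $body, $name, and $date placeholders, as well as the fill_page function, are assumptions made for this illustration rather than part of the template shown in Listing 12.4.

```python
from string import Template

# Shortened, hypothetical variant of the look-and-feel template.
# The $-placeholders mark the blanks that change from page to page.
PAGE_TEMPLATE = Template("""<HTML>
<HEAD>
<TITLE>$web_title - $page_title</TITLE>
</HEAD>
<BODY>
<P>
<IMG Src="icon.gif" ALT="$web_title"> $web_title - $page_title
<HR>
$body
<P>
last revised: $name / $date
<HR>
</BODY>
</HTML>
""")


def fill_page(web_title, page_title, body, name, date):
    """Returns the HTML for one page of the web, built from the shared template."""
    return PAGE_TEMPLATE.substitute(
        web_title=web_title,
        page_title=page_title,
        body=body,
        name=name,
        date=date,
    )


people_page = fill_page(
    "VR Center", "People",
    "<P> People interested in VR study and research",
    "VR Web Team", "31 Oct 95",
)
print(people_page)
```

Because every page comes from the same template text, a change to the shared layout needs to be made in only one place.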

More HTML Features

Although the VR Research Center sample page illustrates many common features of HTML, some features deserve a closer look because of their complexity and their special uses.

Anchors

One kind of anchor links a hotspot in an HTML document to another resource somewhere out on the Net. The phrase "Virtual Reality," for example, can be linked to a collection of VR information at http://www.stars.com/WebStars/VR.html like this:

<A Href="http://www.stars.com/WebStars/VR.html">Virtual Reality</A>

Another kind of anchor links a hotspot in a document to another place in a document (for example, to allow the reader to jump quickly to another section). At the hotspot, a link can be made as the following:

You can find more about this same topic at the <A Href="#JUMP-TO-NAME">Jump Spot</A> elsewhere in this document.

Notice that, instead of a URL after Href=", there is a # symbol followed by a string of characters, JUMP-TO-NAME. At the point in a document to which this phrase refers, a named anchor is made like this:

This <A Name="JUMP-TO-NAME">topic</A> can be defined as follows: ...

This allows users of a document to jump from a hotspot to the portion of the text marked by the destination anchor. The attribute Name identifies a place in the text. These named anchors are where anyone can create a "jump" to that place in the document.

A variation on this anchoring occurs when the document is at a remote place and the jump is between documents on different servers rather than within the same document. If this sample document is in the file example.html on the server www.vrrc.org, a developer who creates a file on another server can jump to the specific place in the example.html document, like this:

You can find more information about <A Href="http://www.vrrc.org/example.html#JUMP-TO-NAME">that topic</A>.

Notice that the full URL of the document was used and then the string #JUMP-TO-NAME to mark the anchor point in that document where the browser should jump.
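The two parts of such a reference, and the pairing of in-document jumps with their named anchors, can be examined with a short script. This is a sketch that assumes Python's standard urllib.parse.urldefrag and html.parser; AnchorCollector and dangling_jumps are hypothetical names, and #MISSING is a made-up fragment included only to show a failing case.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Split a full reference into the document URL and the anchor name.
document_url, anchor_name = urldefrag("http://www.vrrc.org/example.html#JUMP-TO-NAME")


class AnchorCollector(HTMLParser):
    """Collects the Name values and the in-document (#...) Href values of A elements."""

    def __init__(self):
        super().__init__()
        self.names = set()
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("name"):
            self.names.add(attributes["name"])
        href = attributes.get("href")
        if href and href.startswith("#"):
            self.fragments.append(href[1:])


def dangling_jumps(html_text):
    """Returns the fragments that have no matching named anchor in the same text."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return [name for name in collector.fragments if name not in collector.names]


page = (
    'See the <A Href="#JUMP-TO-NAME">Jump Spot</A>. '
    'This <A Name="JUMP-TO-NAME">topic</A> is defined here. '
    'See also <A Href="#MISSING">another spot</A>.'
)
print(dangling_jumps(page))
```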

Nesting

Lists can be nested, as Listing 12.5 shows.

Listing 12.5. A nested list.

Regions of the USA and representative states and cities
<UL>
  <LI>East
  <OL>
    <LI>New York
    <MENU>
      <LI>Rochester
      <LI>Latham
    </MENU>
    <LI>Delaware
  </OL>
  <LI>Great Lakes
  <OL>
    <LI>Michigan
    <MENU>
      <LI>Troy
      <LI>Escanaba
    </MENU>
    <LI>Wisconsin
    <MENU>
      <LI>Milwaukee
      <LI>Appleton
    </MENU>
  </OL>
  <LI>Midwest
  <LI>Plains
  <LI>West
</UL>

Physical or semantic character formatting can't be reliably nested:

<B><I>The House of Seven Gables</I></B> is a great book.

This last line won't necessarily display bold italics (although some browsers, such as Netscape, support such an accumulation of character formatting). If you try this, make sure that you keep the nesting from overlapping. If you begin with the bold start tag, for example, finish the formatted text with the bold end tag.
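The rule about keeping nested formatting from overlapping amounts to a stack discipline: the most recently opened element must be the first one closed. The sketch below checks that rule for character-formatting tags only; OverlapChecker, INLINE_TAGS, and overlapping_tags are hypothetical names, and the tag set is an assumption chosen for the illustration.

```python
from html.parser import HTMLParser

# Character-formatting tags to check (an illustrative, not exhaustive, set).
INLINE_TAGS = {"b", "i", "u", "tt", "em", "strong", "cite", "var", "code"}


class OverlapChecker(HTMLParser):
    """Reports end tags that close an element other than the most recently opened one."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in INLINE_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        # Out-of-order or unmatched end tag: record it, then drop the nearest
        # matching open tag (if any) so that later checks stay meaningful.
        self.problems.append(tag)
        for index in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[index] == tag:
                del self.open_tags[index]
                break


def overlapping_tags(html_text):
    checker = OverlapChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


nested = "<B><I>The House of Seven Gables</I></B> is a great book."
overlapped = "<B><I>The House of Seven Gables</B></I> is a great book."
```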

Semantic versus Physical Tags

The tags used for character highlights (bold and italics) can be physical, defining the appearance of the characters, as in this example:

<B>Bold</B>
<I>Italics</I>
<U>Underline</U>
<TT>Fixed-width</TT>

Or, the tags can be semantic; that is, they define the meaning of the characters highlighted:

<STRONG>Strong emphasis, often same as bold</STRONG>
<VAR>A variable name</VAR>
<CITE>A citation</CITE>

The physical tags go against the HTML and SGML philosophies of marking the meaning and structure rather than the appearance. The existence of the physical tags, however, is an acknowledgment that bold, italics, and other forms of character highlights are meaningful in certain contexts. The semantic tags provide an alternative means to mark the meaning of the character highlights. The semantic tag style uses <STRONG>...</STRONG> to indicate emphasis rather than <B>...</B>, for example.

These semantic alternatives help achieve an appearance-independent HTML file. One problem with semantic tags is that a tag's appearance might not correspond to the context in which it is used. A <CITE>Citation</CITE> tag, for example, typically is rendered in italics. This might be fine for many contexts. It might be, however, that citations within a discipline or field of study always should be marked by quotation marks around the cite (short stories or poem titles, for example). Therefore, the semantic tags in many cases provide a useful alternative to the physical tags and can be used where possible. But in situations where the rendering of the characters is important, such as where a particular physical style is required, a developer must use a physical tag.

Nicks and Cuts

Whenever a developer creates an HTML page, some time should be spent examining its rendering with several different brands of browsers. Often, particularly when working with links to graphics displayed by Mosaic, a developer finds marks and irregularities in the display. One example is a "nick" that can occur when making a logo a hotspot. For example, a developer might make an image a hotspot, as in this code:

<A Href="http://www.vrrc.org/"><IMG Src="vr.gif" ALT="VR Logo"> </A>

Some browsers' interpretation (Mosaic's, for example) of the space between <IMG Src="vr.gif" ALT="VR Logo"> and the end of the anchor, </A>, causes a small line (a nick) to appear in the display, as shown in Figure 12.5. The nick is the small line after the logo. Removing the space between the <IMG> element and </A> cures the nick.

Figure 12.5 : A nick in an icon hotspot (magnified).

Similarly, "cuts" can appear under other conditions in specific browsers. A physical tag such as <I>, for example, can be placed within a hotspot, as in this example:

You can find more about this same topic at the <A Href="#JUMP-TO-NAME">Jump <I>Spot</I></A> later in this document.

Some browsers display a cut or discontinuity in the display of the anchor line in Jump Spot. Although curing all nicks and cuts is not crucial for a successful HTML document (and it actually goes against HTML's own philosophy of not worrying about the browser display), fine-tuning HTML sometimes can help make its appearance more pleasing in a target browser. If an unusual display appears in a browser, it might be an indication that the syntax of HTML has been violated and the browser can't determine a satisfactory way to resolve the error.
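The cure for the nick described above (removing the space between an inline image and the end of its anchor) is simple enough to automate. The following is a minimal sketch using Python's standard re module; the NICK pattern and the remove_nicks function are hypothetical, and the pattern assumes that no attribute value inside the IMG element contains a > character.

```python
import re

# An IMG element, then white space, then the end of the anchor.
NICK = re.compile(r"(<img\b[^>]*>)\s+(</a>)", re.IGNORECASE)


def remove_nicks(html_text):
    """Deletes white space between an inline image and the </A> that follows it."""
    return NICK.sub(r"\1\2", html_text)


with_nick = '<A Href="http://www.vrrc.org/"><IMG Src="vr.gif" ALT="VR Logo"> </A>'
cured = remove_nicks(with_nick)
print(cured)
```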

Key HTML Information Sources

Here is a list of on-line resources that are useful for further information about HTML:

The HTML Writer's Guild (http://www.hwg.org/)

HTML information from the World Wide Web Consortium (http://www.w3.org/hypertext/WWW/MarkUp/HTML.html)

The HTML Station support site for readers of this book. It links you to reference information and demonstrations of HTML. You'll find information about all levels of HTML, as well as examples, tag summaries, and links to supporting and reference information (http://www.december.com/html/).

"WWW Names and Addresses, URIs, URLs, URNs," from the World Wide Web Consortium (http://www.w3.org/hypertext/WWW/Addressing/Addressing.html)

Drafts of Internet Engineering Task Force: Check for information from the HTML working group (http://www.ietf.org/)

Information Quality Virtual Library from Coombs Computing Unit, Research Schools of Social Sciences & Pacific and Asian Studies, The Australian National University (http://coombs.anu.edu.au/WWWVL-InfoQuality.html)

Basic HTML Check

HTML is a way to express information and ideas in hypertext. Based on a philosophy of marking up the meaning of a text rather than its appearance, HTML gives a developer a great deal of flexibility in defining semantic structures in a document but discourages attempts to manipulate the appearance of text in any particular browser.
HTML itself is written in text files following a specific format for elements and entities. Head elements identify information about a document, such as its title, that is not displayed directly in a browser. Body elements such as headings, lists, block quotes, preformatted text, and physical and semantic character highlights mark the structure of a document. The image element embeds inline images in a document. Entities are special characters that a developer can have displayed in most browsers.
To create HTML files, a developer should make a template to hold the basic tags to mark the head, body, and address parts of a document. Based on this template, a developer can add headings, paragraphs, lists, and links. Horizontal rules and inline images can improve the appearance of an HTML file. Comments, the ADDRESS element, or a revision link in the head of the file can help document a file.
The guidelines for making anchors, nesting elements, and using physical and semantic tags can help a developer be prepared for special situations or struggles with the structure of a document. Finally, a careful examination of a document in a variety of browsers might reveal a variety of anomalous displays (nicks and cuts) that can be cured by removing spaces or fixing errors in the HTML itself.
Writing HTML, although conceptually fairly straightforward, involves a great deal of syntax and detail work that might make it cumbersome to routinely produce. You can use various tools to prepare HTML code (see Chapter 17). Also, the basic HTML covered in this chapter doesn't do everything you'll want it to. The next chapter provides an overview of advanced features and extensions of HTML (level 2 and higher).
