Introduction to HTML

HTML is one of three fundamental concepts on which the WWW rests. (The other two are the HTTP protocol and the URL addressing/naming scheme.) Consequently, it is important to understand not just how to use HTML but also its role within the web, its capabilities and limitations, and the directions it may take in the future.
Currently, HTML provides three primary capabilities. First, it provides facilities for describing how information should be represented or displayed. Second, it provides a mechanism for including within a document links or references to other documents; this mechanism includes a concept of behavior, so that when a user "selects" a link, the document associated with that link is accessed. Third, HTML allows a document to include other forms of data, such as images, that are displayed within the document. Other capabilities, such as providing forms that can be filled out and, in turn, used to invoke an associated process, can be viewed as extensions of these three basic functions. Future developments, such as passing active programs from server to browser/interpreter (Java/HotJava), will extend this list of basic capabilities, but for now, the three capabilities noted above are fundamental.
The reason HTML is important within the Web architecture is that it serves as a de facto standard for describing, in general terms, how information is structured and/or should be displayed. Thus, it permits different vendors to develop a variety of browsers that can run on different hardware and software platforms, but display data in approximately the same way. This support for heterogeneity is consistent with the overall concept of global scale that runs through web thinking. However, the consistency provided by HTML comes at a cost: it is currently very limited in its formatting capabilities, far more so than what most users have come to expect from their word processing and page layout software. The temptation -- and, indeed, current practice -- is for individual vendors, such as Netscape, to implement new, nonstandard features in their systems. In some cases, such innovations have been incorporated into standard versions of HTML, but keeping HTML sufficiently coherent and consistent across platforms so that the web remains heterogeneous will be a major challenge over the next few years.
For additional information on HTML, see the HTML Primer for a basic "how-to" manual and the WWW Consortium's HTML page for pointers to technical discussions of HTML and related issues, including discussions and specifications for current and future versions of HTML.

How To
This section provides suggestions for getting started using HTML. It is intended to supplement more thorough discussions of HTML. To assist with this discussion, I have included pointers to two null documents: a blank document that includes the basic HTML forms and a dummy document that includes those forms as well as dummy "content." As you read this introduction, you should view the source and formatted versions of the dummy document.
Basics
Data is structured and formatted in HTML through tags that are embedded within a document. The basic form of a tag is a less-than sign (<), the tag name, and a greater-than sign (>):
 
<tag>
For example:
 
<BODY>
identifies the basic content of a document.
Tags are typically paired, with a beginning and an ending form. The ending form repeats the tag, but with a slash (/) inserted between the less-than sign and the tag name, as in </BODY>.
The document as a whole is bounded by a beginning and an ending HTML tag:
 
<HTML>
the document
</HTML>
What I have referred to as the document is composed of two main parts: HEAD and BODY. The head includes meta information about the document, while the body includes the document's content. Thus, a dummy document would appear as follows:
 
<HTML>

<HEAD>
  meta information
</HEAD>

<BODY>
  document content, including HTML tags
</BODY>

</HTML>
While the head can contain a half-dozen or so forms of information, for now the one most worth including is a title: a short identifier of three or four words that some browsers display. For example:
 
<TITLE> intro to html </TITLE>
The body normally includes the majority of information in a document. The rest of this discussion will outline HTML features that are used in the body. Refer to the dummy HTML document to see how these features are specified.
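Putting these pieces together, a minimal but complete document might look like the following sketch. The title is the one from the example above; the heading and paragraph text are placeholders, and the H1 and P tags used in the body are described in the sections that follow.

```html
<HTML>

<HEAD>
  <!-- Meta information: here, only the title -->
  <TITLE> intro to html </TITLE>
</HEAD>

<BODY>
  <!-- Document content: a first-level heading and one paragraph (placeholder text) -->
  <H1>Introduction to HTML</H1>
  <P>This paragraph is placeholder content.</P>
</BODY>

</HTML>
```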
Headings

HTML provides six levels of headings, as illustrated in the dummy document. All are left justified, although the Netscape CENTER extension gives you some flexibility. However, such "physical" formatting runs contrary to the principle of indicating logical relationships in documents, which underlies the concept of heading levels.
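As an illustration (the heading text is placeholder content), the six levels are written with the tags H1 through H6, with H1 the most important and H6 the least. The last line wraps a heading in the Netscape CENTER extension mentioned above, so that it is centered rather than left justified.

```html
<H1>Level 1: document title</H1>
<H2>Level 2: major section</H2>
<H3>Level 3: subsection</H3>
<H4>Level 4</H4>
<H5>Level 5</H5>
<H6>Level 6: least important</H6>

<!-- Netscape extension: centers the enclosed heading -->
<CENTER><H2>A centered heading</H2></CENTER>
```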

Spacing

There are five basic ways to specify the spacing of text, and several more specialized means. The basic spacing commands include the following:

paragraphs, which denote a logical break
break, which denotes a physical break
preformatted text, which reproduces line spacing in the HTML formatted text as found in the original text
blockquote, which indents long blocks of quoted text
horizontal rule, which indicates a break in content with both space and a horizontal line or bar across the page (or some portion of it)
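A short sketch of these five commands, using placeholder text: P marks a paragraph, BR forces a line break, PRE keeps the line breaks and spacing of the original text, BLOCKQUOTE indents a block of quoted text, and HR draws a horizontal rule. BR and HR have no ending form; the ending </P> is optional, but it is included here for clarity.

```html
<P>This is the first paragraph. Browsers separate
paragraphs with extra space.</P>

<P>First line of an address<BR>
Second line, forced onto a new line by BR</P>

<PRE>
Column A    Column B
  indented text keeps its spacing
</PRE>

<BLOCKQUOTE>
A long passage quoted from another source is
indented from the surrounding text.
</BLOCKQUOTE>

<HR>
```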

Lists

HTML provides five basic kinds of lists:

unordered, or "bulleted," lists
ordered, or numbered, lists
definition lists that include pairs of terms and definitions
menu lists, whose items are normally short anchors
directory lists, whose items are even shorter anchors, usually shown on the same line.
Note, also, that the basic lists can be nested and intermingled.
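The following sketch shows each kind of list with placeholder items; the file names (intro.html and the others) are hypothetical. Items in UL, OL, MENU, and DIR lists are marked with LI, whose ending tag may be omitted; a definition list (DL) pairs each term (DT) with its definition (DD). The second unordered item contains a nested ordered list.

```html
<UL>
  <LI>An unordered (bulleted) item
  <LI>Another item, containing a nested numbered list:
    <OL>
      <LI>First step
      <LI>Second step
    </OL>
</UL>

<DL>
  <DT>HTML
  <DD>The markup language used to describe web documents
  <DT>URL
  <DD>The scheme used to address and name them
</DL>

<!-- Hypothetical file names -->
<MENU>
  <LI><A HREF="intro.html">Introduction</A>
  <LI><A HREF="basics.html">Basics</A>
</MENU>

<DIR>
  <LI><A HREF="faq.html">FAQ</A>
  <LI><A HREF="news.html">News</A>
</DIR>
```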

Fonts

HTML currently supports only three or four fonts. They are designated in two ways. First, fonts are indicated logically, as in text that is to be emphasized (EM) or STRONGLY EMPHASIZED (STRONG). Second, the physical font can be indicated, as in bold (B) or italics (I).
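For example (placeholder text), EM and STRONG describe the logical role of the text and leave the choice of actual font to the browser, while B and I request a specific physical style:

```html
<P>This text is <EM>emphasized</EM>, and this text is
<STRONG>strongly emphasized</STRONG>.</P>

<P>This text is <B>bold</B>, and this text is in
<I>italics</I>.</P>
```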

Anchors

Anchors are symbolic pointers to other web pages or to positions within the current or another page. The syntax for the anchor appears complicated and ugly at first glance, but is fairly simple after you understand it. Note, first, that like most other tags, the anchor tag has an opening and closing form:
<A> followed by </A>
What comes between the opening and closing tags (the text "followed by" in the example above) is regarded as an anchor marker. When a user selects this data, the browser will move to another web page or location on a page. The anchor marker is usually text and is displayed in a particular color and/or is underlined. However, it may also be an image, designated in the normal way (i.e., through an IMG tag with a SRC attribute, explained below).
The "magic" in all of this is the designation of the web page that is to be accessed by the browser when the anchor marker is selected. It is designated through a parameter or attribute, called HREF, within the opening anchor tag and an associated value, most often a URL. While most anchors point to a different web document, they may also point to a specific location, such as a section or heading, within the document. This is done through the NAME facility. See the dummy HTML page or the HTML primer for details.
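A sketch of the common cases follows; the URLs, file names, and the anchor name "images" are hypothetical. The first anchor points to another document by URL; the next pair uses NAME to label a location and HREF with a leading # to jump to it; the fourth points to a named location in another document; and the last uses an image as the anchor marker.

```html
<!-- Link to another document: the HREF value is a URL (hypothetical) -->
<A HREF="http://www.example.com/primer.html">HTML Primer</A>

<!-- Label a location within this document with NAME... -->
<H2><A NAME="images">Images</A></H2>

<!-- ...and link to it from elsewhere in the document with HREF="#name" -->
<A HREF="#images">Jump to the Images section</A>

<!-- Link to a named location in another document -->
<A HREF="http://www.example.com/primer.html#lists">Lists in the primer</A>

<!-- An image used as the anchor marker -->
<A HREF="http://www.example.com/"><IMG SRC="home.gif" ALT="Home"></A>
```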

Images

Images can be included within a document through the IMG tag. The URL for the image is supplied through the SRC attribute. Additional attributes allow the author to specify alternative text (ALT) that will be displayed when the browser cannot locate the image, and to exercise rudimentary control over the placement of the image with respect to right and left alignment (ALIGN). Additional layout options are provided through the Netscape extensions, listed below.
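A short sketch follows; the image file names and alternative text are hypothetical. SRC gives the image's URL (here a relative one), ALT gives the text shown when the image cannot be located or displayed, and ALIGN="LEFT" places the image at the left margin so that the following text flows beside it.

```html
<!-- SRC: URL of the image; ALT: text shown if the image is unavailable -->
<IMG SRC="logo.gif" ALT="Company logo">

<!-- ALIGN: place the image at the left margin, with text flowing to its right -->
<P><IMG SRC="photo.gif" ALT="Photo of the author" ALIGN="LEFT">
This paragraph text flows along the right side of the image.</P>
```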

Misc.

By convention, most documents should include information at the bottom about the author(s), the date of creation or update, and whom to contact for additional information or comments. For the contact, include anchors that point to the contact person's home page, if possible, and to his or her e-mail address. Most documents should also conclude with references to the next document in logical sequence and/or a reference back to a higher-level starting point. This information will, of course, be specified as anchors.
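A sketch of such a closing section follows; the document names, author, date, URLs, and e-mail address are all hypothetical. It uses the ADDRESS tag, which marks contact information about a document's author, and a mailto: URL, which lets browsers that support it open a message to the given address.

```html
<HR>
<!-- Navigation: next document in sequence and the higher-level starting point (hypothetical files) -->
<P>Next: <A HREF="html-forms.html">HTML Forms</A> |
Up: <A HREF="index.html">Tutorial Home</A></P>

<!-- Author, date, and contact information (hypothetical names, URLs, and date) -->
<ADDRESS>
Written by <A HREF="http://www.example.com/~jsmith/">J. Smith</A>,
last updated 1 January 1996.<BR>
Comments to <A HREF="mailto:jsmith@example.com">jsmith@example.com</A>
</ADDRESS>
```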