Lecture 3
XML and XHTML
Ketan Mayer-Patel
University of North Carolina
Announcements
- Get a CS Account
- Keeping up with Readings
- Assignment 1
Character References (Entities)
- The problem
- Most commonly used references:
- &amp; for &
- &lt; for <
- &gt; for >
- &quot; for "
- General numeric character reference form
- Example
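As a rough illustration of the references listed above, the sketch below uses Python's standard html module to escape the four special characters and to decode numeric references. The choice of Python and the sample strings are assumptions of this example, not part of the lecture.

```python
import html

# Text containing the four characters that clash with HTML markup.
raw = 'if (a < b && c > d) say "hi"'

# html.escape replaces &, <, > and (with quote=True) " by references.
escaped = html.escape(raw, quote=True)
print(escaped)

# Numeric references: the decimal (&#169;) and hexadecimal (&#xA9;) forms
# both name the same character, the copyright sign U+00A9.
decoded = html.unescape("&#169; &#xA9;")
print(decoded)
```

Unescaping the escaped string gives back the original text, which is the property a browser relies on when it renders the references.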
Back to document type.
- DTD indicates standard to follow.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
- Identifying string
- URL pointing to a copy of the official document type definition
- Flavors of HTML
- Strict
- Transitional
- Frameset
- The official HTML 4.0 standard
Validation
- Type declaration is machine readable
- Specifies legal elements, attributes, content constraints, etc.
- Advantages of validation
- Tool for validating HTML
- A word about character sets.
HTML shortcomings
- Mixed case
- Inconsistent nesting
- Empty tags and implied tags.
- Ambiguous validation errors.
- Attribute quoting
- Closed tag set for a specific domain
Motivating XML
- Separate structure from semantics
- Allow domain-specific tags
- Provide a way to mix tag vocabularies
- Use it for more than just the web.
- Enforce good structure
- Case sensitivity
- Perfect nesting
- Quoted attributes
- XHTML 1.0 = HTML 4.0 + XML
- Abide by the rules of XML for structure
- Meaning of tags provided by HTML 4.0
XML Overview
- Format for structured documents
- Structure model = hierarchically nested elements
- Domain-specific element tags
- Meaning determined by common agreements and standards
- Separates form from meaning
XML Standards
- Thin, purpose-specific standards

XML Document Structure
- XML Declaration
- Document Type Declaration
- Generally optional, but needed for XHTML
- Document Body
XML Declaration
<?xml version="1.0" encoding="UTF-8" ?>
- Must be first line of the document
- Nothing can come before (not even whitespace)
- version attribute is required
- encoding is optional but widely used
Examples
<?xml version="1.0" ?>
<?xml version="1.0" encoding="UTF-8" ?>
Invalid declarations:
<xml version="1.0" ?>
<?xml version="1.0" >
<?XML version="1.0" ?>
<?xml version=1.0 ?>
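One way to try these declarations out is to hand them to an XML parser and see which ones it accepts. The sketch below assumes Python's standard xml.etree.ElementTree module and a hypothetical one-element body; neither comes from the lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


BODY = "<root/>"  # hypothetical one-element document body

candidates = [
    '<?xml version="1.0" ?>',
    '<?xml version="1.0" encoding="UTF-8" ?>',
    '<xml version="1.0" ?>',    # missing "?" after "<"
    '<?xml version="1.0" >',    # missing "?" before ">"
    '<?XML version="1.0" ?>',   # "xml" must be lowercase
    '<?xml version=1.0 ?>',     # value not quoted
    ' <?xml version="1.0" ?>',  # whitespace before the declaration
]

results = {decl: is_well_formed(decl + BODY) for decl in candidates}
for decl, ok in results.items():
    print(f"{decl!r:45} {'accepted' if ok else 'rejected'}")
```

Only the first two candidates are accepted; the last one shows that even a single leading space before the declaration is rejected.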
DTD Declaration
<!DOCTYPE Root_Element PUBLIC "Public_ID" "DTD_URL">
- Specifies encompassing root element
- ID string is unique and well-known
- DTD URL not guaranteed to work
- DTD form is compatible with HTML 4.0 DTDs
XML Document Body
- Mix of markup elements and content.
- Content also known as character data.
- Root element encloses all.
XML Elements
<element_name attributes*>
End tag form:
</element_name>
Element name rules
- Case sensitive
- Starts with letter, underscore or colon
- May contain letters, underscore, colon, digits, hyphen, or period.
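The name rules above can be approximated with a regular expression. This is a minimal sketch that assumes ASCII-only names; real XML also permits many non-ASCII letters, which the pattern deliberately ignores. The function name is hypothetical.

```python
import re

# ASCII-only approximation of the element name rules:
# first character: letter, underscore or colon;
# remaining characters: letters, digits, underscore, colon, hyphen, period.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string matches the simplified name rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["book", "_private", "xhtml:ul", "chapter-1.2", "1book", "my book"]:
    print(candidate, is_valid_name(candidate))
```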
XML Attributes
name="value"
name='value'
Attribute name rules same as for elements
Values MUST be quoted
Value MUST be given
Difference with HTML
Reserved Attributes
- Commonly used:
- xml:lang
- Language information for an element
- xmlns
- xml:base
- xml:space
Empty Elements
- Special element tag syntax for empty elements
<element_name attributes* />
Example:
<br />
<br id="break1" />
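To see that the empty element form is only a shorthand, the sketch below parses both spellings of the same element and compares them. It assumes Python's standard xml.etree.ElementTree module, which is not mentioned in the lecture.

```python
import xml.etree.ElementTree as ET

# The same empty element written in the short and in the long form.
short_form = ET.fromstring('<br id="break1" />')
long_form = ET.fromstring('<br id="break1"></br>')

# Both parse to an element with the same name, attributes and no content.
same = (
    short_form.tag == long_form.tag
    and short_form.attrib == long_form.attrib
    and short_form.text == long_form.text
    and len(short_form) == len(long_form) == 0
)
print(same)

# When written back out, the serializer uses the empty element form.
serialized = ET.tostring(long_form, encoding="unicode")
print(serialized)
```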
XML Comment Mechanism
- Start of comment: <!--
- End of comment: -->
- Comment content ignored.
- No meaning as either markup or character data
- Restrictions:
- Cannot contain --
- No nesting.
- Cannot be inside start, end, or empty tags
XML Entities
- Same mechanism as HTML character references
- Numeric character value entity forms:
&#decimal_value;
&#xhex_value;
General XML defines only 5 by name:
- <, >, &, ", '
- &lt;, &gt;, &amp;, &quot;, &apos;
Specific uses of XML can define additional named entities
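The sketch below checks these points against a parser: the five predefined names and both numeric forms are resolved, while an HTML-only name such as nbsp is rejected when no DTD defines it. It assumes Python's standard xml.etree.ElementTree module; the element name t is hypothetical.

```python
import xml.etree.ElementTree as ET

# The five named entities every XML parser knows.
named = ET.fromstring("<t>&lt; &gt; &amp; &quot; &apos;</t>").text
print(named)

# Decimal and hexadecimal numeric forms of the same character (U+00A9).
numeric = ET.fromstring("<t>&#169; &#xA9;</t>").text
print(numeric)

# nbsp is defined by (X)HTML DTDs, not by XML itself, so a plain XML
# document that uses it without a DTD is not accepted.
try:
    ET.fromstring("<t>&nbsp;</t>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False
print(nbsp_accepted)
```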
CDATA
- Useful when character data (i.e., non-markup content) contains lots of special characters.
- Start of CDATA: <![CDATA[
- End of CDATA: ]]>
- Whatever is inside is not processed for markup
- Means that we don't have to use entities for &, <, >, etc.
- Example
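A small sketch of the idea, assuming Python's standard xml.etree.ElementTree module and a hypothetical code element: the same character data is written once inside a CDATA section and once with entities, and both parse to identical text.

```python
import xml.etree.ElementTree as ET

# Character data with special characters, wrapped in a CDATA section.
with_cdata = ET.fromstring(
    "<code><![CDATA[if (a < b && c > d) { run(); }]]></code>"
)

# The same character data written with entities instead.
with_entities = ET.fromstring(
    "<code>if (a &lt; b &amp;&amp; c &gt; d) { run(); }</code>"
)

print(with_cdata.text)
print(with_cdata.text == with_entities.text)
```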
Well-Formedness
- XML provides rules for structure (not meaning)
- Conforming documents known as "well-formed"
- Rules for well-formedness:
- Should start with an XML declaration (optional in XML 1.0, but if present it must come first).
- Tags must nest perfectly
- Good: <a><b></b></a>
- Bad: <a><b></a></b>
- Empty tags must use the empty tag form, e.g. <br />
- All attributes must have values
- All attribute values must be quoted
- All elements must be enclosed by a single root tag.
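Several of these rules can be exercised by feeding small fragments to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the sample fragments other than the Good/Bad pair are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


samples = {
    "perfect nesting": "<a><b></b></a>",
    "crossed nesting": "<a><b></a></b>",
    "empty tag form": "<a><br /></a>",
    "unclosed empty tag": "<a><br></a>",
    "attribute without value": "<a checked></a>",
    "unquoted attribute value": "<a width=10></a>",
    "two root elements": "<a></a><b></b>",
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}
for label, ok in results.items():
    print(f"{label:26} {'well-formed' if ok else 'rejected'}")
```

Only "perfect nesting" and "empty tag form" are accepted; each of the other fragments breaks exactly one of the rules listed above.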
Sharing Vocabularies
- Suppose we wanted to invent a set of XML tags for a bookstore.
- What elements might we define?
- How might attributes be used?
- What problems come up if I want to publish or distribute my XML files which describe my books?
Sharing XML Problems
- P1: The meaning of tags is specific to whoever invented them.
- P2: Different people/organizations may use the same tag names.
- Our bookstore vs Library of Congress vs Amazon vs...
- Solutions?
Sharing XML Solutions
- Solution to P1: Standardization and documentation.
- Create standardized description for what particular tag vocabularies mean.
- Still have problem of different tag sets using the same names
- Solution to P2: Namespaces
Namespace
- Motivation:
Define a mechanism for uniquely naming elements and attributes so different
vocabularies can be mixed into an XML document without name conflicts.
-- Sall, XML Family of Specifications, p. 211.
XML Namespaces Overview
- Tag vocabularies identified with a unique URL
- URL does not have to point to anything.
- Often points to explaining documentation or website of organization
- Simply used as a unique string to identify the tag vocabulary
- Associate a prefix with the namespace.
- Use qualified element and attribute names
Qualified Names
prefix:name
Prefix given in namespace declaration.
Name is element or attribute name defined within namespace.
Examples:
- xhtml:ul
- xlink:href
- kmp:book
- foo:bar
Declaring a Namespace
- xmlns attribute
- Form: xmlns:prefix="ID_URL"
- Example: <kmp:bookstore xmlns:kmp="http://kmp-books.com/xml-tags">
- Associates tags and attributes with prefix
- Prefix used in qualified element and attribute names
- Prefix can be almost anything
Namespace Prefix Scope
- Scope of prefix is element where declared
- Applies immediately
- Example
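A sketch of both scope rules, assuming Python's standard xml.etree.ElementTree module. The kmp namespace URL is the one used in this lecture; the shelf element and the isbn attribute are hypothetical. The parser reports each qualified name as {namespace URL}name, which shows that the prefix itself is only a local abbreviation.

```python
import xml.etree.ElementTree as ET

KMP = "http://kmp-books.com/xml-tags"

# "Applies immediately": the element that declares the prefix may use it,
# and so may everything nested inside that element.
in_scope = ET.fromstring(
    f'<kmp:bookstore xmlns:kmp="{KMP}">'
    '<kmp:book kmp:isbn="0000000000">A hypothetical title</kmp:book>'
    "</kmp:bookstore>"
)
book = in_scope[0]
print(in_scope.tag)
print(book.tag, book.attrib)

# Outside the declaring element the prefix is unknown: the second
# kmp:book is a sibling of the declaring element, not a descendant.
out_of_scope = (
    "<shelf>"
    f'<kmp:book xmlns:kmp="{KMP}"/>'
    "<kmp:book/>"
    "</shelf>"
)
try:
    ET.fromstring(out_of_scope)
    sibling_accepted = True
except ET.ParseError:
    sibling_accepted = False
print(sibling_accepted)
```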
Default namespace
- Useful if most/all tags are part of the same tag vocabulary
- Form: xmlns="ID_URL"
- Example:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
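The effect of a default namespace can be observed by parsing a small document and listing the resulting names. This sketch assumes Python's standard xml.etree.ElementTree module and a made-up page body. Note that the default namespace covers unprefixed elements only; an unprefixed attribute such as lang stays outside any namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix

page = ET.fromstring(
    f'<html xmlns="{XHTML_NS}" xml:lang="en" lang="en">'
    "<head><title>Hypothetical page</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)

# Every unprefixed element, including nested ones, is in the XHTML namespace.
tags = [element.tag for element in page.iter()]
print(tags)

# xml:lang is namespace-qualified; plain lang is not.
print(page.attrib)
```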
Validating XML
- XML demands well-formedness
- Meaning of elements/attributes specific to application domain
- Validity defined by some other mechanism
- One mechanism: DTD
- Others: XML Schema, RELAX NG, Schematron
- XML schema language comparison
XHTML (at last!)
- XHTML 1.0 = XML using tags defined by HTML 4.0
- Namespace identified by this URL:
http://www.w3.org/1999/xhtml
Format-specific tags and attributes deprecated
DTDs defined for different levels of conformity for validation.
XHTML DTDs
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Transitional and Frameset DTDs also defined
Correspond to HTML 4.0 DTDs (but the DTDs are not the same!)
Important XHTML Rules
- Must be well-formed XML
- Perfect nesting
- Quoted attribute values
- Empty element form
- Must specify one of the three XHTML DTDs
- Must declare XHTML namespace
- Element and attribute names are all lowercase.
- Small but important difference with HTML 4.0
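The rules above can be spot-checked in a few lines. The sketch below is not DTD validation and does not replace a real validator: it assumes Python's standard xml.etree.ElementTree module, uses a crude substring test for the DOCTYPE, and the function name, messages and sample page are all hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def check_xhtml_basics(document: str) -> list:
    """Return a list of problems found; an empty list means none found."""
    problems = []
    # Crude check that some XHTML 1.0 DTD is named (Strict, Transitional
    # or Frameset all share this prefix in their public identifier).
    if "-//W3C//DTD XHTML 1.0" not in document:
        problems.append("missing XHTML 1.0 DOCTYPE")
    try:
        root = ET.fromstring(document.encode("utf-8"))
    except ET.ParseError as err:
        return problems + [f"not well-formed: {err}"]
    if root.tag != f"{{{XHTML_NS}}}html":
        problems.append("root is not html in the XHTML namespace")
    for element in root.iter():
        # Strip the "{namespace}" part and test the local names.
        for name in [element.tag, *element.attrib]:
            local = name.rsplit("}", 1)[-1]
            if local != local.lower():
                problems.append(f"name is not lowercase: {local}")
    return problems


GOOD = (
    '<?xml version="1.0" encoding="UTF-8" ?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'
    "<head><title>Hypothetical page</title></head>\n"
    "<body><p>Line one<br />line two</p></body>\n"
    "</html>"
)

print(check_xhtml_basics(GOOD))
print(check_xhtml_basics(GOOD.replace("<br />", "<br>")))
```

The second call reports a well-formedness problem because the HTML-style br tag is never closed.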
Learning and Using XHTML
- Official XHTML specification
- W3 Schools XHTML reference and tutorials
State of XHTML Today
- Most current version: 1.1
- Difference between 1.1 and 1.0 very small.
- Work has started on XHTML 2.0
- Work also started on HTML 5.0
- HTML 5.0 != XHTML 2.0
- Likely not to be an application of XML