HTML, SGML, XML, XHTML
<html>
<head>
<title>Example: HelloWorld</title>
</head>
<body>
<h1>HelloWorld Example</h1>
</body>
</html>
In 1980, physicist Tim Berners-Lee, who was an independent contractor at
CERN, proposed and prototyped ENQUIRE, a system for CERN researchers to use
and share documents. In 1989, Berners-Lee and CERN data systems engineer
Robert Cailliau each submitted separate proposals for an Internet-based
hypertext system providing similar functionality. The following year, they
collaborated on a joint proposal, the WorldWideWeb (W3) project, which was
accepted by CERN.
The first publicly available description of HTML was a document called
HTML Tags, first mentioned on the Internet by Berners-Lee in late 1991. It
describes 22 elements comprising the initial, relatively simple design of
HTML. Thirteen of these elements still exist in HTML 4. HTML is a text and
image formatting language used by web browsers to dynamically format web
pages. The semantics of many of its tags can be traced to early text
formatting languages such as RUNOFF. RUNOFF was developed in the early 1960s
for the CTSS (Compatible Time-Sharing System) operating system, and its
formatting commands were derived from the commands used by typesetters to
manually format documents. RUNOFF was later incorporated into the UNIX
operating system in more advanced formatting programs such as roff, nroff,
and troff.
Berners-Lee considered HTML to be, at the time, an application of SGML, but it was not formally defined as such until the mid-1993 publication, by the IETF, of the first proposal for an HTML specification: Berners-Lee and Dan Connolly's "Hypertext Markup Language (HTML)" Internet-Draft, which included an SGML Document Type Definition to define the grammar. The draft expired after six months, but was notable for its acknowledgment of the NCSA Mosaic browser's custom tag for embedding in-line images, reflecting the IETF's philosophy of basing standards on successful prototypes. Similarly, Dave Raggett's competing Internet-Draft, "HTML+ (Hypertext Markup Format)", from late 1993, suggested standardizing already-implemented features like tables and fill-out forms.
After the HTML and HTML+ drafts expired in early 1994, the IETF created an HTML Working Group, which in 1995 completed "HTML 2.0", the first HTML specification intended to be treated as a standard on which future implementations should be based. Published as Request for Comments 1866, HTML 2.0 included ideas from the HTML and HTML+ drafts. There was no "HTML 1.0"; the 2.0 designation was intended to distinguish the new edition from previous drafts.
Further development under the auspices of the IETF was stalled by competing
interests. Since 1996, the HTML specifications have been maintained, with
input from commercial software vendors, by the World Wide Web Consortium
(W3C). However, in 2000, HTML also became an international standard.
The last HTML specification published by the W3C is the HTML
4.01 Recommendation, published in late 1999. Its issues and errors were last
acknowledged by errata published in 2001.
After its inception, HTML and its associated protocols gained acceptance
relatively quickly. However, no clear standards existed in the early years
of the language. Though its creators originally conceived of HTML as a
semantic language devoid of presentation details, practical uses pushed many
presentational elements and attributes into the language, driven largely by
the various browser vendors. The latest standards surrounding HTML reflect
efforts to overcome the sometimes chaotic development of the language and to
create a rational foundation for building both meaningful and well-presented
documents. To return HTML to its role as a semantic language, the W3C has
developed style languages such as CSS and XSL to shoulder the burden of
presentation. In conjunction, the HTML specification has slowly reined in
the presentational elements.
There are two axes differentiating various flavors of HTML as currently
specified: SGML-based HTML versus XML-based HTML (referred to as XHTML) on
one axis, and strict versus transitional (loose) versus frameset on the
other axis.
One difference in the latest HTML specifications lies in the distinction between the SGML-based specification and the XML-based specification. The XML-based specification is usually called XHTML to distinguish it clearly from the more traditional definition; however, the root element name continues to be 'html' even in the XHTML-specified HTML. The W3C intended XHTML 1.0 to be identical to HTML 4.01 except where the limitations of XML relative to the more complex SGML require workarounds. Because XHTML and HTML are closely related, they are sometimes documented in parallel. In such circumstances, some authors conflate the two names as (X)HTML or X(HTML).
Like HTML 4.01, XHTML 1.0 has three sub-specifications: strict, loose, and
frameset.
Aside from the different opening declarations for a document, the
differences between an HTML 4.01 and an XHTML 1.0 document, in each of the
corresponding DTDs, are largely syntactic. The underlying syntax of HTML
allows many shortcuts that XHTML does not, such as elements with optional
opening or closing tags, and even EMPTY elements, which must not have an end
tag. By contrast, XHTML requires all elements to have both an opening tag and
a closing tag. XHTML, however, also introduces a new shortcut: an XHTML
element may be opened and closed within a single tag by including a slash
before the end of the tag, like this: <br/>. This shorthand, which is not
used in the SGML declaration for HTML 4.01, may confuse earlier software
unfamiliar with the new convention.
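The syntactic gap can be seen by handing both styles of markup to a strict XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as a stand-in for an XML parser; the helper name is_well_formed_xml and both sample fragments are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments (hypothetical, not taken from any specification).
# HTML 4.01 style: the EMPTY element <br> has no end tag and </p> is omitted.
HTML_STYLE = "<body><p>First line<br>Second line</body>"
# XHTML 1.0 style: every element is closed; <br /> uses the slash shorthand.
XHTML_STYLE = "<body><p>First line<br />Second line</p></body>"


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed_xml(HTML_STYLE))   # False: <br> and <p> are never closed
    print(is_well_formed_xml(XHTML_STYLE))  # True
```

The HTML-style fragment is rejected only because it is being read as XML; an SGML-based HTML 4.01 parser would be expected to accept the same shortcuts.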
To understand the subtle differences between HTML and XHTML, consider the
transformation of a valid and well-formed XHTML 1.0 document that adheres to
Appendix C (see below) into a valid HTML 4.01 document. Making this
translation requires the following steps:
Those are the main changes necessary to translate a document from XHTML 1.0
to HTML 4.01. Translating from HTML to XHTML would also require the
addition of any omitted opening or closing tags. Whether coding in HTML or
XHTML, it may be best to always include the optional tags within an HTML
document rather than remembering which tags can be omitted.
A well-formed XHTML document adheres to all the syntax requirements of XML.
A valid document adheres to the content specification for XHTML, which
describes the document structure.
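The distinction between well-formed and valid can be sketched in a few lines. This is an illustration under stated assumptions, not a real validator: well-formedness is checked with Python's standard xml.etree.ElementTree parser, while validity is approximated by a toy rule (an html root containing exactly a head with one title, followed by a body). Actual validation requires checking the document against the XHTML DTD, and real XHTML documents also carry an XML namespace, which this sketch ignores. The function name classify is hypothetical.

```python
import xml.etree.ElementTree as ET


def classify(markup: str) -> str:
    """Classify markup as not well-formed, well-formed only, or toy-valid."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # The markup breaks XML syntax rules, e.g. an unclosed element.
        return "not well-formed"

    # Toy stand-in for DTD validation (an assumption made for illustration):
    # <html> must contain exactly <head> then <body>, and <head> one <title>.
    children = [child.tag for child in root]
    head = root.find("head")
    structure_ok = (
        root.tag == "html"
        and children == ["head", "body"]
        and head is not None
        and len(head.findall("title")) == 1
    )
    return "valid (toy check)" if structure_ok else "well-formed only"


if __name__ == "__main__":
    good = "<html><head><title>t</title></head><body><h1>x</h1></body></html>"
    swapped = "<html><body><h1>x</h1></body><head><title>t</title></head></html>"
    unclosed = "<html><head><title>t</title><head><body></body></html>"
    for sample in (good, swapped, unclosed):
        print(classify(sample))
```

The second sample obeys every XML syntax rule yet places body before head, so it is well-formed without being valid; the third never closes its head element and fails at the syntax level.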
The W3C recommends several conventions to ensure an easy migration between
HTML and XHTML.
The following steps can be applied to XHTML 1.0 documents only:
By carefully following the W3C's compatibility guidelines, a user agent
should be able to interpret the document equally as HTML or XHTML. For
documents that are XHTML 1.0 and have been made compatible in this way, the
W3C permits them to be served either as HTML (with a text/html MIME type),
or as XHTML (with an application/xhtml+xml or application/xml MIME type).
When delivered as XHTML, browsers should use an XML parser, which adheres
strictly to the XML specifications for parsing the document's contents.
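How the MIME type steers parsing can be sketched as follows. This is a simplified illustration, not a description of any real browser: it assumes Python's standard html.parser.HTMLParser as the lenient HTML path and xml.etree.ElementTree as the strict XML path. The names TagCollector and parse_as_user_agent are hypothetical, and the sample markup omits the XML namespace a real XHTML document would declare.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# MIME types that select the strict XML parser, as listed in the text above.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml"}


class TagCollector(HTMLParser):
    """Lenient HTML parser that records each start tag it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_user_agent(markup: str, content_type: str) -> List[str]:
    """Return element names in document order, choosing a parser by MIME type."""
    mime = content_type.split(";")[0].strip().lower()
    if mime in XML_MIME_TYPES:
        # Strict path: raises ET.ParseError on any well-formedness error.
        return [element.tag for element in ET.fromstring(markup).iter()]
    if mime == "text/html":
        # Lenient path: tolerates omitted end tags.
        collector = TagCollector()
        collector.feed(markup)
        collector.close()
        return collector.tags
    raise ValueError("unsupported MIME type: " + content_type)


if __name__ == "__main__":
    compatible = "<html><head><title>t</title></head><body><p>a<br />b</p></body></html>"
    print(parse_as_user_agent(compatible, "text/html"))
    print(parse_as_user_agent(compatible, "application/xhtml+xml"))
```

For the compatible sample both paths report the same element sequence, which is the outcome the compatibility guidelines aim for. Markup that relies on HTML shortcuts, such as an unclosed br, would still pass the text/html path but raise a parse error on the XML path.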
As this list demonstrates, the loose flavours of the specification are
maintained for legacy support. However, contrary to popular misconceptions,
the move to XHTML does not imply a removal of this legacy support. Rather,
the X in XML stands for extensible, and the W3C is modularizing the entire
specification and opening it up to independent extensions. The primary
achievement in the move from XHTML 1.0 to XHTML 1.1 is the modularization of
the entire specification. The strict version of HTML is deployed in XHTML
1.1 through a set of modular extensions to the base XHTML 1.1 specification.
Likewise, someone looking for the loose (transitional) or frameset
specifications will find similar extended XHTML 1.1 support (much of it is
contained in the legacy or frame modules). The modularization also allows
separate features to develop on their own timetables. For example,
XHTML 1.1 will allow quicker migration to emerging XML standards such as
MathML (a presentational and semantic math language based on XML) and
XForms, a new, highly advanced web-form technology intended to replace the
existing HTML forms.
In summary, the HTML 4.01 specification primarily consolidated the various
HTML implementations into a single, clearly written specification based on
SGML. XHTML 1.0 ported this specification, as is, to the new XML-defined
specification. Next, XHTML 1.1 takes advantage of the extensible nature of
XML and modularizes the whole specification. XHTML 2.0 will be the first
step in adding new features to the specification in a standards-body-based
approach.