
HTML Overview

HTML (Hypertext Markup Language) is the language used to create web
documents. It defines the syntax and placement of the elements that make
up the structure of a web document. All web page elements are identified by
special tags that give browsers instructions on how to display the content
(the tags themselves do not display). Some HTML tags are used to create
links to other documents, either locally or over a network such as the
Internet.

The HTML Standard


The HTML standard and all other Web-related standards are developed under
the authority of the World Wide Web Consortium (W3C). Standards,
specifications, and drafts of new proposals can be found
at http://www.w3.org. The most recent standard for document markup is the
HTML 4.01 specification.

The W3C has pulled in the reins with the HTML 4.0 specification
(which is further refined in the current 4.01 version). It incorporates
many of the tags introduced by the popular browsers that improve
web functionality. It also officially "deprecates" tags that are used in
common practice but are not in keeping with the priorities of the
markup language (such as keeping style information out of content).

Before HTML there was SGML (Standard Generalized Markup Language),
which established the system of describing documents in terms of their
structure, independent of appearance. SGML is a vast set of rules for
developing markup languages such as HTML, but it is so all-encompassing
that HTML uses only a small subset of its capabilities.

Publishers began storing SGML versions of their documents so that they
could be translated into a variety of end uses. For example, text that is
tagged as a heading may be formatted one way if the end product is a
printed book, but another way for a CD-ROM. The advantage is that a single
source file can be used to create a variety of end products. The way it is
interpreted and displayed (i.e., the way it looks) depends on the end use.

Because HTML is one application of an SGML tagging system, this principle of
keeping style information separate from the structure of the document
remains inherent to the HTML purpose. Over the past few years, this ideal
has been compromised by the creation of HTML tags that contain explicit
style instructions, such as the <font> tag.

Cascading Style Sheets promise to keep style information out of the content
by storing all style instructions in a separate document (or a separate section
of the source document).

Three Flavors of HTML 4.01


While the W3C has definite ideas on how HTML should work, it is also
aware that it will be a while before old browsers are phased out and
web authors begin to mark up documents properly. For that reason, the
HTML 4.01 specification actually encompasses three slightly different
specification documents: one "strict," one "transitional," and one just for
framed documents. These documents, called Document Type Definitions (or
DTDs), define every tag, attribute, and entity along with the rules for their
use. DTDs are written following the rules and conventions of SGML (Standard
Generalized Markup Language).

The HTML 4.01 Strict DTD excludes all deprecated tags and attributes (those
scheduled to be phased out). In an ideal world, all developers would mark up
the structure of their documents according to the strict version of HTML,
leaving all presentation to be handled by style sheets.

The HTML 4.01 Transitional DTD is less restrictive, and it includes many of
the elements dedicated to appearance (such as the <font> tag and
the align attribute) that are in common use today. Most developers today
comply with the transitional specification because it allows more control over
presentation while the industry waits for older browsers (those that don't
support new features such as style sheets) to fade away.

The Frameset DTD is identical to the Transitional DTD, except that it allows
for the <frameset> element to be used in place of the
standard <body> element.
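
A document signals which of the three DTDs it follows with a DOCTYPE declaration on its first line. As a minimal sketch, the following page declares HTML 4.01 Strict; the title and paragraph text are placeholders invented for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Strict example</title>
</head>
<body>
<!-- No <font> tags or align attributes: presentation is left to a style sheet -->
<p>Structure only.</p>
</body>
</html>
```

To target one of the other DTDs, the declaration would instead name the public identifier "-//W3C//DTD HTML 4.01 Transitional//EN" with http://www.w3.org/TR/html4/loose.dtd, or "-//W3C//DTD HTML 4.01 Frameset//EN" with http://www.w3.org/TR/html4/frameset.dtd.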

The Future of HTML


According to the W3C, HTML 4.01 is the end of the line for HTML as we know
it. The next version of HTML is the XHTML Version 1.0 specification. XHTML is
the same HTML specification as we know it today, but rewritten using the
new-and-improved rules of XML (Extensible Markup Language). XHTML uses
all the same HTML 4.01 tags, but it enforces a set of rules (such as closing all
tags, putting attribute values in quotation marks, and keeping tags all
lowercase) that make a document "well-formed." Well-formed XHTML will
work in next-generation XML-based browsers, where HTML will not. Our
current HTML coding standards are incredibly lax by comparison.
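
The difference between lax HTML and well-formed XHTML is easiest to see side by side. The fragment below is only an illustrative sketch (the filename pixie.gif is a placeholder): the first half relies on shortcuts that HTML 4.01 tolerates, and the second half applies the XHTML rules listed above.

```html
<!-- Lax HTML 4.01: uppercase names, an unquoted value, unclosed tags -->
<P>The weather is gorgeous today.
<BR>
<IMG SRC=pixie.gif ALT="Pixie">

<!-- The same content as well-formed XHTML 1.0: lowercase, quoted, closed -->
<p>The weather is gorgeous today.
<br />
<img src="pixie.gif" alt="Pixie" /></p>
```

In XHTML, empty elements such as br and img are closed with a trailing slash inside the tag itself.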

HTML Tags
Elements in the HTML specification are indicated by tags. An HTML tag is
made up of the element name followed by an optional list of attributes, all of
which appears between angle brackets (< >). Nothing within the brackets is
displayed in the browser. The tag name is generally an abbreviation of the
element's name or the tag's function (this makes them fairly simple to learn).
Attributes are properties that extend or refine the tag's function.

In the current specification, the name and attributes within a tag are
not case sensitive. <BODY BGCOLOR=white> works the same
as <body bgcolor=white>. However, values for particular attributes may be
case sensitive, particularly URLs and filenames.

Containers
Most HTML elements or components are containers, meaning they have
a start tag and an end tag. The text enclosed within the tags follows the
tag's instructions. In the following example, the <I> container tags make the
enclosed text italic:

The weather is <I>gorgeous</I> today.

The end tag contains the same name as the start tag, but it is preceded by
a slash (/). You can think of it as an "off" switch for the tag.

For some tags, the end tag is optional and the browser determines when the
tag ends by context. This practice is most common with the <p> (paragraph)
tag. Most browsers automatically end a paragraph when they encounter a
new start tag (although Navigator 4.x has some problems with autoclosing),
so many web authors take advantage of the shortcut. Not all tags allow this,
however, and not all browsers are forgiving, so when in doubt include the
end tag. This is especially important when using Cascading Style Sheets with
your document. The new XHTML standard also requires that all tags be
closed.
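
As a small sketch of the shortcut and the safer alternative (the sentence text is invented for illustration), both halves of this fragment produce two paragraphs, but only the second states explicitly where each paragraph ends:

```html
<!-- Shortcut: each new <P> start tag implicitly ends the previous paragraph -->
<P>The weather is gorgeous today.
<P>Tomorrow looks even better.

<!-- Safer: every paragraph is explicitly closed -->
<P>The weather is gorgeous today.</P>
<P>Tomorrow looks even better.</P>
```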

Empty ("Standalone") Tags


A few tags do not have end tags because they are used to place standalone
elements in the document or on the page. The image tag (<img>) is such a
tag; it simply plops a graphic into the flow of the page. Other standalone
tags include the linebreak (<br>), horizontal rule (<hr>), and tags that
provide information about a document and don't affect its displayed content,
such as the <meta> and <base> tags.

Empty HTML tags

<area>      <base>      <basefont>  <br>        <col>
<frame>     <hr>        <img>       <input>     <isindex>
<link>      <meta>      <param>
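
The following sketch shows several of these standalone tags in context. The page title, description text, and the www.example.com address are placeholders chosen for illustration; none of the empty tags has an end tag.

```html
<HTML>
<HEAD>
<TITLE>Pixie page</TITLE>
<!-- <META> and <BASE> describe the document and display nothing -->
<META NAME="description" CONTENT="A page about pixies">
<BASE HREF="http://www.example.com/pixies/">
</HEAD>
<BODY>
<P>First line<BR>
second line of the same paragraph</P>
<HR>
<P><IMG SRC="graphics/pixie.gif" ALT="Pixie"></P>
</BODY>
</HTML>
```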

Attributes
Attributes are added within a tag to extend or modify the tag's actions.
Attributes always go in the start tag only (end tags never contain attributes).
You can add multiple attributes within a single tag. Tag attributes, if any, go
after the tag name, each separated by one or more spaces. Their order of
appearance is not important.

Most attributes take values, which follow an equals sign (=) after the
attribute's name. Most browsers cannot handle attribute values more than
1,024 characters in length. Values may be case-sensitive, particularly
filenames or URLs.

The syntax for a container tag with attributes is as follows:


<ELEMENT ATTRIBUTE="value">Affected text</ELEMENT>

The following are examples of tags that contain attributes:


<IMG SRC="graphics/pixie.gif" WIDTH="45" HEIGHT="60">
<BODY BGCOLOR="#000000">...</BODY>
<FONT FACE="Trebuchet MS, Arial, Helvetica" SIZE="4">...</FONT>

The HTML 4.01 specification recommends that all attribute values be
enclosed in quotation marks, but it acknowledges that in some cases, they
may be omitted. If the value is a single word containing only letters (a-z or
A-Z), digits (0-9), hyphens (-), periods (.), underscores (_), and colons (:),
then it can be placed directly after the equals sign without quotation marks.
If you are still unsure, using quotation marks consistently for all values
works just fine and is definitely a good idea. In the XHTML specification, all
attribute values must be enclosed in quotation marks in order to be
well-formed.
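
To make the quoting rule concrete, here is a short sketch with illustrative values. The first tag may legally omit quotation marks; the values in the remaining tags contain characters outside the permitted set, so they must be quoted.

```html
<!-- Allowed unquoted: only letters, digits, hyphens, periods, underscores, colons -->
<IMG SRC=pixie.gif WIDTH=45 HEIGHT=60 ALT=Pixie>

<!-- Quotation marks required: a slash, a space, and a # sign appear in the values -->
<IMG SRC="graphics/pixie.gif" ALT="A small pixie">
<FONT COLOR="#000000">...</FONT>
```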

Nesting HTML Tags


HTML elements may be contained within other HTML elements. This is
called nesting, and to do it properly, both the start and end tags of the
inner element must be completely contained within the start and end
tags of the outer element. In this example, bold (<B>) is applied to text
that is already italic:

The weather is <B><I>gorgeous</I></B> today.

Result: The weather is gorgeous today.

Nested tags do not necessarily need to appear right next to each other. In
this example, the bold text is nested within a longer link.

This links to <A HREF="document.html">a really <B>cool</B> page</A>.

Result: This links to a really cool page.

A common mistake is simply overlapping the tags. Nested tags must be
contained entirely (both the start and end tags) within the outer set of tags.
Although some browsers display content marked up this way, other browsers
do not allow the violation, so it is important to nest tags correctly. The
following example shows incorrect nesting of tags (the <I> tag should have
been closed before the <B> tag):

INCORRECT: The weather is <B><I>gorgeous</B></I> today.

Information Browsers Ignore

Some information in an HTML document, including certain tags, is ignored
when the document is viewed in a browser. These include:
Line breaks
Line returns in the HTML document are ignored. Text and elements wrap
continuously until they encounter a <p> or <br> tag within the flow of the
document text. Line breaks are displayed, however, when text is tagged as
preformatted text (<pre>).

Tabs and multiple spaces


When a browser encounters a tab or more than one consecutive blank
character space in an HTML document, it displays it as a single space. So, if
the document contains:

far,      far      away

the browser displays:

far, far away

Extra spaces can be added within the flow of text by using the nonbreaking
space character entity (&nbsp;). Multiple spaces are displayed, however,
when text is tagged as preformatted text (<pre>).
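
The three behaviors described above can be compared in one fragment. This is only a sketch; the trailing comments summarize what the text above says a browser does with each line.

```html
<!-- Runs of spaces collapse: displays as "far, far away" -->
<P>far,      far      away</P>

<!-- Each &nbsp; is displayed, so the first gap stays three spaces wide -->
<P>far,&nbsp;&nbsp;&nbsp;far away</P>

<!-- Preformatted text keeps both the spaces and the line return -->
<PRE>far,      far
      away</PRE>
```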

Multiple <p> tags


A series of paragraph tags (<p>...</p> or <p> alone) with no intervening
text is interpreted as redundant by all browsers and displays as though it
were only a single paragraph break. Most browsers display
multiple <br> tags as multiple line breaks.
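
As a brief sketch of the difference (paragraph text invented for illustration), the empty paragraphs below add no extra space between the two sentences, whereas the repeated <BR> tags in the last paragraph typically do open up blank lines:

```html
<P>First paragraph.</P>
<P></P>
<P></P>
<P>Second paragraph.</P>

<P>Line one<BR><BR><BR>Line two, usually two blank lines further down</P>
```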

Unrecognized tags
A browser simply ignores any tag it doesn't understand or that was
incorrectly specified. Depending on the tag and the browser, this can have
varied results. The browser may display nothing at all, or it may display the
contents of the tag as though it were normal text.

Text in comments
Browsers do not display text between the special <!-- and --> delimiters
used to denote a comment. Here are two sample comments:

<!-- This is a comment -->

<!-- This is a
multiple line comment
that ends here. -->


Leave a space after the initial <!-- and before the final -->; beyond that,
you can put nearly anything inside the comment. You cannot nest
comments. Microsoft Internet Explorer also supports its own proprietary way
of indicating comments with <comment> ... </comment> tags. Comments
are useful for leaving notes within a long HTML file, for example:

<!-- navigation table starts here -->
