What is XML?

• XML is a cross-platform, software- and hardware-independent tool for transmitting information.
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your own tags
• XML can use a Document Type Definition (DTD) or an XML Schema to describe the data
• XML with a DTD or XML Schema is designed to be self-descriptive
• XML is a W3C Recommendation

The Main Difference Between XML and HTML

XML was designed to carry data.

XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.

HTML is about displaying information, while XML is about describing information.

XML tags are not predefined. You must "invent" your own tags.

The tags used to mark up HTML documents and the structure of HTML documents are
predefined. The author of HTML documents can only use tags that are defined in the
HTML standard (like <p>, <h1>, etc.).

XML allows the author to define his own tags and his own document structure.

XML                                            HTML
User-definable tags                            Defined set of tags designed for web display
Content driven                                 Format driven
End tags required for well-formed documents    End tags not required
Quotes required around attributes              Quotes not required
Slash required in empty tags                   Slash not required
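The XML rules in the left-hand column can be checked directly with a parser. The Java sketch below uses the JDK's built-in DOM parser. The class and method names are hypothetical. The parser rejects input that is not well-formed, so markup that HTML would tolerate (an unclosed <br> or an unquoted attribute) fails to parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the string parses as a well-formed XML document
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler suppresses the console message and rethrows fatal errors
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Missing end tags, unquoted attributes, etc. are reported here
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>Hello<br/></p>"));     // true
        System.out.println(isWellFormed("<p>Hello<br></p>"));      // false: <br> is never closed
        System.out.println(isWellFormed("<p class=intro>Hi</p>")); // false: unquoted attribute
    }
}
```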
XML Does not DO Anything

XML was not designed to DO anything.

Maybe it is a little hard to understand, but XML does not DO anything. XML was
created to structure, store, and send information.

The following example is a note to Tove from Jani, stored as XML:

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The note has a header and a message body. It also has sender and receiver information.
But still, this XML document does not DO anything. It is just pure information wrapped
in XML tags. Someone must write a piece of software to send, receive or display it.
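Below is a minimal sketch of such software, written in Java. It assumes the note uses the elements <note>, <to>, <from>, <heading> and <body>. The program parses the note with the JDK's DOM parser and formats it for display. NoteReader, render and the sample string are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoteReader {

    // Hypothetical note layout, assumed for illustration
    public static final String NOTE =
        "<note><to>Tove</to><from>Jani</from>"
        + "<heading>Reminder</heading>"
        + "<body>Don't forget me this weekend!</body></note>";

    // Returns the text content of the first element with the given tag name
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Parses the note and formats it for display
    public static String render(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return "To: " + text(doc, "to") + "\n"
            + "From: " + text(doc, "from") + "\n"
            + text(doc, "heading") + ": " + text(doc, "body");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(NOTE));
    }
}
```

The XML itself stays passive. All of the "doing" happens in the program that reads it.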

The Document Object Model (DOM) is an interface specification maintained by the W3C
DOM Working Group that defines an application-independent mechanism to access, parse, or
update XML data. It defines an interface that enables programs to access and update the
style, structure, and contents of XML documents. In other words, it is a hierarchical
model that allows developers to manipulate XML documents easily.
When we parse an XML document with a DOM parser, we get back a tree structure that
contains all of the elements of the document. The DOM provides a variety of functions
that can be used to examine the contents and structure of the document.
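The tree that a DOM parser returns can be walked recursively. The Java sketch below parses a string with the JDK's DOM parser and prints every element name, indented by its depth in the tree. DomTreeWalk, walk and outline are hypothetical names, and the sample document is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeWalk {

    // Appends each element name, indented two spaces per level of depth
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append("  ".repeat(depth)).append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Parses the whole document into memory, then walks the tree from the root element
    public static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(outline("<library><book><title>XML</title></book><book/></library>"));
    }
}
```

Text nodes are visited but not printed, so only the element structure appears in the outline.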

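import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Create an instance of DocumentBuilderFactory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Get the DocumentBuilder (may throw ParserConfigurationException)
DocumentBuilder parser = factory.newDocumentBuilder();
// Create a blank DOM Document
Document doc = parser.newDocument();

// Create the root element
Element root = doc.createElement("root");
// Add it to the XML tree
doc.appendChild(root);

// Create a comment
Comment comment = doc.createComment("This is comment");
// Add it inside the root element
root.appendChild(comment);

// Create a child element
Element childElement = doc.createElement("Child");
// Add the attribute to the child
childElement.setAttribute("attribute1", "The value of Attribute 1");
// Add the child to the root element
root.appendChild(childElement);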
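A serializer makes the result of building a tree like the one above visible. The Java sketch below rebuilds the same root/comment/Child tree and then uses the JDK's identity Transformer (javax.xml.transform) to write it out as text. BuildAndPrint, build and serialize are hypothetical names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildAndPrint {

    // Builds a root element containing a comment and a Child element with one attribute
    public static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createComment("This is comment"));
        Element child = doc.createElement("Child");
        child.setAttribute("attribute1", "The value of Attribute 1");
        root.appendChild(child);
        return doc;
    }

    // Serializes a DOM tree to a string; a Transformer with no stylesheet copies its input
    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(build()));
    }
}
```

Unless each node is appended with appendChild, it exists in the Document but is not part of the tree and is not serialized.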

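XML Processing with SAX

SAX provides a mechanism for reading data from an XML document.

import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Create a SAX reader (XMLReaderFactory is deprecated since Java 9;
// SAXParserFactory.newInstance().newSAXParser().getXMLReader() is the JAXP alternative)
XMLReader xr = XMLReaderFactory.createXMLReader();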

A SAX Parser functions as a stream parser, with an event-driven API. The user defines a
number of callback methods that will be called when events occur during parsing.
The SAX events include:

• XML Text nodes
• XML Element nodes
• XML Processing Instructions
• XML Comments

An event is fired when each of these XML features is encountered; for elements, a second
event is fired when the end tag is reached. XML attributes are provided as part of the data
passed to element events.
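The callbacks for the event types listed above can be traced with a small handler. The Java sketch below extends org.xml.sax.ext.DefaultHandler2 and logs each event it receives. DefaultHandler2 also implements LexicalHandler, which is where comment events arrive, so it is registered through the standard lexical-handler property. SaxEvents and trace are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class SaxEvents extends DefaultHandler2 {

    private final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        log.append("start ").append(qName);
        // Attributes arrive together with the start-element event
        for (int i = 0; i < atts.getLength(); i++) {
            log.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        log.append('\n');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.append("end ").append(qName).append('\n');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may be delivered in several chunks; whitespace-only chunks are skipped
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) log.append("text ").append(text).append('\n');
    }

    @Override
    public void processingInstruction(String target, String data) {
        log.append("pi ").append(target).append('\n');
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        log.append("comment ").append(new String(ch, start, length)).append('\n');
    }

    public static String trace(String xml) throws Exception {
        SaxEvents handler = new SaxEvents();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Comments are reported through LexicalHandler, registered as a parser property
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.log.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(trace("<note id='7'><!--hi--><?render fast?><to>Tove</to></note>"));
    }
}
```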

SAX parsing is unidirectional; previously parsed data cannot be re-read without starting
the parsing operation again.


SAX parsers have certain benefits over DOM-style parsers. The quantity of memory that
a SAX parser must use in order to function is typically much smaller than that of a DOM
parser. DOM parsers must have the entire tree in memory before any processing can
begin, so the amount of memory used by a DOM parser depends entirely on the size of
the input data. The memory footprint of a SAX parser, by contrast, is based only on the
maximum depth of the XML file (the maximum depth of the XML tree) and the
maximum data stored in XML attributes on a single XML element. Both of these are
always smaller than the size of the parsed tree itself.
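The next sketch illustrates the claim that SAX memory use depends on depth rather than document size. It is a hypothetical SAX handler that tracks only the current and maximum nesting level. Feeding it a thousand sibling elements leaves its state at two integers, because the depth never exceeds 2. A handler that keeps a stack of open elements would grow with depth, not with the number of elements. MaxDepth and of are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MaxDepth extends DefaultHandler {

    private int depth;    // current nesting level
    private int maxDepth; // deepest level seen so far

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    public static int of(String xml) throws Exception {
        MaxDepth handler = new MaxDepth();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.maxDepth;
    }

    public static void main(String[] args) throws Exception {
        // 1000 sibling items: many elements, but the tree is only two levels deep
        StringBuilder xml = new StringBuilder("<items>");
        for (int i = 0; i < 1000; i++) xml.append("<item>").append(i).append("</item>");
        xml.append("</items>");
        System.out.println(of(xml.toString())); // 2
    }
}
```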
Because of the event-driven nature of SAX, processing documents can often be faster
than DOM-style parsers. Memory allocation takes time, so the larger memory footprint of
DOM is also a performance issue.

Due to the nature of SAX, streamed reading from disk is possible. Processing XML
documents that could never fit into memory is only possible through the use of a SAX
parser (or another kind of stream XML parser).


The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.

Certain kinds of XML validation require access to the document in full. For example, a
DTD IDREF attribute requires that there be an element in the document that uses the
given string as a DTD ID attribute. To validate this in a SAX parser, one would need to
keep track of every previously encountered ID attribute and every previously encountered
IDREF attribute, to see if any matches are made. Furthermore, if an IDREF does not
match an ID, the user only discovers this after the document has been parsed; if this
linkage was important for building functional output, then time has been wasted in
processing the entire document only to throw it away.
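The bookkeeping described above can be sketched as a SAX handler. To keep the example self-contained, it does not read a DTD. Instead it assumes a hypothetical convention in which an attribute named "id" stands in for a DTD ID attribute and one named "ref" stands in for an IDREF. A real validator would rely on the DTD's attribute declarations. The handler has to remember every ID and every reference it has seen, and it can only report dangling references in endDocument, after the whole document has been read. IdRefCheck and danglingRefs are illustrative names.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class IdRefCheck extends DefaultHandler {

    private final Set<String> ids = new HashSet<>();
    private final Set<String> refs = new HashSet<>();
    private final Set<String> dangling = new TreeSet<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Hypothetical convention: "id" plays the role of a DTD ID attribute,
        // "ref" the role of a DTD IDREF attribute
        String id = atts.getValue("id");
        if (id != null) ids.add(id);
        String ref = atts.getValue("ref");
        if (ref != null) refs.add(ref);
    }

    @Override
    public void endDocument() {
        // Only after the whole document is read can unmatched references be known
        for (String ref : refs) {
            if (!ids.contains(ref)) dangling.add(ref);
        }
    }

    public static Set<String> danglingRefs(String xml) throws Exception {
        IdRefCheck handler = new IdRefCheck();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.dangling;
    }

    public static void main(String[] args) throws Exception {
        // "b" is referenced before it is defined; "c" is never defined
        String xml = "<doc><link ref='b'/><item id='a'/><item id='b'/><link ref='c'/></doc>";
        System.out.println(danglingRefs(xml)); // [c]
    }
}
```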


Extract Attributes from XML Data

To extract an attribute value with XSLT, use xsl:template to match the appropriate XML
element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates
to continue processing the document.

Example 1.

<xsl:template match="element-name">
Attribute Value:
<xsl:value-of select="@attribute"/>
</xsl:template>
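A template of this kind can be run from Java with the JDK's XSLT processor (javax.xml.transform). The sketch below makes some assumptions for illustration. The stylesheet matches a hypothetical <book> element and selects a hypothetical isbn attribute, and the output method is set to text. The built-in templates walk from the root down to each <book>, so xsl:apply-templates is not needed here. ExtractAttribute and extract are illustrative names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExtractAttribute {

    // Hypothetical stylesheet: prints the isbn attribute of every <book>
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='book'>"
        + "<xsl:text>Attribute Value: </xsl:text>"
        + "<xsl:value-of select='@isbn'/>"
        + "<xsl:text>&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given XML document
    public static String extract(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(extract("<catalog><book isbn='111'/><book isbn='222'/></catalog>"));
    }
}
```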

SOAP (Simple Object Access Protocol) is a protocol for exchanging XML-based
messages over computer networks, normally using HTTP. SOAP forms the foundation
layer of the Web services stack, providing a basic messaging framework that more
abstract layers can build on.
SOAP uses XML to define a protocol for the exchange of information in distributed
computing environments.
SOAP consists of three components:
• an envelope,
• a set of encoding rules, and
• a convention for representing remote procedure calls.
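The sketch below illustrates only the envelope component, built with the JDK's namespace-aware DOM API. It assumes the SOAP 1.1 envelope namespace. The payload element (m:GetPrice in the namespace urn:example:prices) is hypothetical, and the encoding rules and RPC convention are not shown. SAAJ (javax.xml.soap) is no longer bundled with the JDK from Java 11 on, so plain DOM is used instead. SoapEnvelope and build are illustrative names.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SoapEnvelope {

    public static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"; // SOAP 1.1
    static final String APP_NS = "urn:example:prices"; // hypothetical application namespace
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    public static String build(String item) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();

        // Envelope and Body belong to the SOAP namespace
        Element envelope = doc.createElementNS(SOAP_NS, "soap:Envelope");
        envelope.setAttributeNS(XMLNS, "xmlns:soap", SOAP_NS);
        doc.appendChild(envelope);
        Element body = doc.createElementNS(SOAP_NS, "soap:Body");
        envelope.appendChild(body);

        // Hypothetical RPC-style request carried in the Body
        Element request = doc.createElementNS(APP_NS, "m:GetPrice");
        request.setAttributeNS(XMLNS, "xmlns:m", APP_NS);
        Element itemEl = doc.createElementNS(APP_NS, "m:Item");
        itemEl.setTextContent(item);
        request.appendChild(itemEl);
        body.appendChild(request);

        // Serialize the envelope with the identity Transformer
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("Apples"));
    }
}
```

In practice, a string like this would be sent as the body of an HTTP POST to the service endpoint.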

How would you build a search engine for large volumes of XML data?

The way candidates answer this question may provide insight into their view of XML
data. For those who view XML primarily as a way to denote structure for text files, a
common answer is to build a full-text search and handle the data similarly to the way
Internet portals handle HTML pages. Others consider XML as a standard way of
transferring structured data between disparate systems. These candidates often describe
some scheme of importing XML into a relational or object database and relying on the
database's engine for searching. Lastly, candidates who have worked with vendors
specializing in this area often say that the best way to handle this situation is to use a
third-party software package optimized for XML data.