
Pre-reading Materials

Overview

Essentials of XML

Welcome to Essentials of XML.


This class introduces you to the Extensible Markup Language (XML).
It covers the basics of XML, DTD, and XSD, and also looks at DOM
and SAX.

Topics

TIBCO Software Inc.

What Is XML?
XML Enables Platform Independent Data Exchange
XML Separates the Format from the Content
Basic XML File
Anatomy of an XML File
Well-formed vs. Valid XML
Reading XML: The XML Parser
Interpreting XML
XML Schema Standards
XML Transformation (XSLT)
XML Namespaces


XML-NHBC: Essentials of XML

What is XML?

- Widely adopted standard that enables users to create semantic
  vocabulary (i.e., meaningful tags) for describing data and its structure
  - Meta-language for creating mark-up
  - Not a programming language like Java or C++
  - Not a formatting language like HTML


What Is XML?
The Extensible Markup Language (XML) is a markup language for
putting structured data into a text file. Defined by the W3C, the XML
specification provides a syntactically strict set of standards for
document structure while allowing developers, organizations, and
communities to define their own vocabularies. XML files are not
primarily meant to be read by people, but because they are text files,
an expert can debug them with a simple text editor.

XML is considered a markup language because it uses tags to
delimit pieces of data, leaving the interpretation of that data to the
application processing it. XML is a subset of a richer and more
complex meta-language known as SGML (Standard Generalized
Markup Language). XML was developed to overcome the complexity
of SGML and the inflexibility of HTML (Hypertext Markup
Language).


TIBCO Educational Services


XML Enables Platform Independent Data Exchange

- Ideal for data interchange among loosely coupled systems
  - Platform independent
  - Unicode-based (internationalization)
  - Built-in validation (schema or DTD)
  - Predefined programmatic access (DOM & SAX)
  - Self-aware information (address is address)
- Modeling, exchange and retention of nonrelational data (the other 80%)


XML Enables Platform Independent Data Exchange
XML provides a number of significant benefits over other
technologies:

- XML is platform-independent, license-free, and well-supported.
- Because XML separates data from its appearance, the same data can
  be displayed in any number of ways, making it useful and effective
  in a variety of contexts.
- XML provides for the development of flexible Web-based applications.
- XML has emerged as a standard for enterprise data interchange and
  application integration.


XML Separates the Data Content from Its Format

- XML Data Description:

<invoice>
  <header>
    <invoiceNumber>123</invoiceNumber>
    <date>042900</date>
    <!-- etc. -->
  </header>
</invoice>

- HTML Data Presentation (in a Web Browser):

<font size="3" face="Verdana, Arial, Helvetica, sans-serif">
<b>DATE:</b>
</font><font size="2" face="Verdana, Arial, Helvetica, sans-serif">
<b> 042900 </b>
</font>


XML Separates the Format from the Content
XML focuses on describing document and data structure, whereas
HTML focuses on describing document presentation.
XML separates the parts of a document needed by an application (for
example, search engines and data mining tools) from the parts of a
document that can be ignored.
You can get away without closing a tag in HTML, but XML does not
tolerate lazy tagging: an unclosed tag causes the parser to stop
processing and report an error.


Basic XML File

<?xml version="1.0"?>
<name>
  <first>John</first>
  <last>Doe</last>
</name>


Basic XML File


Note the following about the contents of a simple XML document:

- There is a unique, top-level XML element called the document
  element or the root element.
- Other elements may be cleanly nested within the document or root
  element.
- Elements are delimited by start and end tags that describe the data
  found in between, e.g. <first>John</first>.
- The element includes both the start and end tags, as well as the
  data contained within.
- XML tags are case-sensitive.
- XML documents are hierarchical, often abstracted as a tree.
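
The tree view of the basic file can be sketched in Python. This is a minimal illustration that assumes the standard-library xml.etree.ElementTree parser; the helper name outline is hypothetical and not part of the course material.

```python
import xml.etree.ElementTree as ET

# The basic XML file from the slide, with a single root element <name>.
NAME_XML = """<?xml version="1.0"?>
<name>
  <first>John</first>
  <last>Doe</last>
</name>"""


def outline(element, depth=0):
    """Return one indented line per element, showing the tree shape."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(NAME_XML)
tree_lines = outline(root)

# Tag names are case-sensitive: <First> is not the same element as <first>.
lowercase_match = root.find("first")
capitalized_match = root.find("First")
```

Printing "\n".join(tree_lines) shows the root element on the first line with its two children indented beneath it.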


Anatomy of an XML File

<?xml version = "1.0" encoding = "UTF-8"?>
<!-- This is an email XML file. -->
<?Put this file to use somehow.?>
<Email attachment = "no" importance = "normal">
  <To>Tom</To>
  <Cc>Joel</Cc>
  <Bcc/>
  <From>Scott</From>
  <Subject>meeting</Subject>
  <Body>Let's meet Monday afternoon; what do you think?</Body>
  <Received>Thu 1/5/2001 3:13 PM</Received>
</Email>


Anatomy of an XML File


- Prolog: The XML declaration, document type declaration, etc.
- XML declaration: Optional. Ex: <?xml version="1.0"?>
- Elements: Bound by start and end tags, and include everything in
  between. Ex: <Email> and </Email> are the tags; the Email element
  includes those tags plus everything in between.
- Attributes: Additional information describing the content of an
  element. They appear within an element's opening (or empty) tag, with
  the value enclosed in either single or double quotes.
  Ex: attachment = "no"
- Comments: Begin with <!-- and end with -->.
  Ex: <!-- This is an email XML file. -->
- Processing instructions (PI): Application-specific information.
  Begin with <? and end with ?>. Ex: <?Put this file to use somehow.?>
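
As a rough illustration of how an application sees these parts, the sketch below reads an email document with Python's standard-library ElementTree. Two assumptions are made here: the element names are written without a trailing colon, because namespace-aware parsers such as this one treat a colon as a prefix separator, and the helper name describe_email is hypothetical.

```python
import xml.etree.ElementTree as ET

# A shortened email document: declaration, comment, root element with two
# attributes, ordinary child elements, and one empty element (<Bcc/>).
EMAIL_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!-- This is an email XML file. -->
<Email attachment="no" importance="normal">
  <To>Tom</To>
  <Bcc/>
  <Subject>meeting</Subject>
</Email>"""


def describe_email(xml_text):
    """Collect the root tag, its attributes, and each child's text."""
    # Bytes input lets the parser honour the declared encoding itself.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "root": root.tag,
        "attributes": dict(root.attrib),  # attribute name -> value
        # An empty element has no text, so .text is None for <Bcc/>.
        "children": {child.tag: child.text for child in root},
    }


email_parts = describe_email(EMAIL_XML)
```

Note that ElementTree's default parser discards comments, so the comment line does not appear in the result.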


Well-formed vs. Valid XML


<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE Email SYSTEM "email.dtd">
<Email attachment = "no" importance = "normal">
  <To>Tom</To>
  <Cc>Joel</Cc>
  <Bcc/>
  <From>Scott</From>
  <Subject>meeting</Subject>
  <Body>Let's meet Monday afternoon; what do you think?</Body>
  <Received>Thu 1/5/2001 3:13 PM</Received>
</Email>

This document has a governing schema which enables validation of the
structure and data.


Well-formed vs. Valid XML


Well-formed XML:

- Contains start and end tags for every element.
- Contains only one document element.
- Empty elements are formatted correctly.
- Elements nest correctly.
- All attribute values are in quotes.

Valid XML: is well-formed and validates against a DTD or other
XML schema type (e.g. XSD).

The XML document shown in the slide above is well-formed, and, if a
validating parser checks it against email.dtd without errors, it is
also valid. In contrast, the XML document shown on the previous slide,
though well-formed, references no governing schema. As such, it cannot
be considered valid.
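
The well-formedness rules listed above can be exercised with a short Python sketch. It assumes the standard-library ElementTree parser, which is non-validating: it checks well-formedness only and does not validate against a DTD or schema. The function name is_well_formed and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One sample per rule; only the first and last obey all of them.
SAMPLES = {
    "complete": "<name><first>John</first><last>Doe</last></name>",
    "missing end tag": "<name><first>John<last>Doe</last></name>",
    "two document elements": "<first>John</first><last>Doe</last>",
    "bad nesting": "<name><first>John</name></first>",
    "unquoted attribute": "<Email attachment=no/>",
    "empty element": "<Email><Bcc/></Email>",
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
```

Checking validity as well would require a validating parser, which is outside the scope of this sketch.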


Reading XML: The XML Parser

[Figure: XML instances and a DTD or schema are read by the XML parser,
which passes the information to the XML application.]


Reading XML: The XML Parser


XML parsers read XML documents and pass the information to an
application. There are two types of XML parsers:

- Validating parsers must enforce the XML 1.0 specification in its
  entirety: they ensure that the XML document is well-formed and
  also validate it against a particular schema.
- Non-validating parsers are quicker, but only check your XML
  document against well-formedness constraints.


Interpreting XML

- Two common APIs for accessing XML data
  - DOM: Document Object Model
  - SAX: Simple API for XML


Interpreting XML

- Document Object Model (DOM): A DOM parser builds a tree structure
  of an XML document in memory and holds it there, allowing an
  application to write to the document and manipulate the data
  therein. However, because the tree is built and retained in memory
  during processing, DOM processing tends to be memory-intensive and
  time-consuming. Because of this, it is best suited to
  implementations in editors and forms.

- Simple API for XML (SAX): SAX is an event-driven API optimized for
  reading larger XML documents. A SAX processor reads the document
  piece by piece and does not build a representation of the document
  in memory. The advantage of this is speed; the drawback is that a
  SAX implementation does not allow you to write to or manipulate the
  data in your document. SAX is best suited for
  application-to-application exchange.
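
The contrast between the two APIs can be sketched with Python's standard-library implementations, xml.dom.minidom and xml.sax. This is an illustration under those assumptions, not the only way to use either API; the names rename_with_dom, EventRecorder, and list_sax_events are hypothetical.

```python
import xml.sax
from xml.dom import minidom

NAME_XML = "<name><first>John</first><last>Doe</last></name>"


def rename_with_dom(xml_text, new_first):
    """DOM: load the whole tree into memory, edit it, serialize it back."""
    doc = minidom.parseString(xml_text)
    first = doc.getElementsByTagName("first")[0]
    first.firstChild.data = new_first  # the text node inside <first>
    return doc.documentElement.toxml()


class EventRecorder(xml.sax.ContentHandler):
    """SAX: receive callbacks while the parser streams the document."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():  # skip whitespace-only text
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def list_sax_events(xml_text):
    """Parse the text and return the sequence of SAX events seen."""
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


edited = rename_with_dom(NAME_XML, "Jane")
events = list_sax_events(NAME_XML)
```

The DOM function can change the document because it holds the whole tree; the SAX handler only observes events in document order and keeps nothing except what it chooses to record.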


XML Schema Standards


- DTDs
  - Part of 1.0 spec
  - Datatype support now available (www.extensibility.com/dt4dtd)

- XML-Data Reduced (XDR)
  - Implemented in IE5
  - Foundation for BizTalk.org initiative
  - Datatyping support

- SOX v2.0
  - SDK support from CommerceOne
  - Rich feature set (e.g. inheritance)

- XML Schema Definition Language (XSDL)
  - Recommendation 5-2-01
  - Eventual standard (W3C work product)
  - Superset feature set


XML Schema Standards


There are four primary XML schema languages in use today.
DTD was the initial schema language made available for use with XML.
As DTD limitations became apparent, two vendor-sponsored schema
implementations emerged: XDR and SOX.
The XSDL schema standard was released by the W3C and is likely to
replace DTDs and vendor standards.
XSDL is a schema definition language developed by the W3C Schema
working group. Expressed in XML document syntax, XSDL is designed to
support an extensible data typing system, inheritance, and namespaces.


XML Transformation (XSLT)


[Figure: an XML transform between two contact definitions (CBL Contact
Definition and HR XML Contact Definition).
Contact Record in Billing System: PersonName, PositionTitle,
VoiceNumber, E-mail, FaxNumber
Contact Record in CRM System: Identifier, ContactName, Telephone,
Address, Email, Fax]


XML Transformation (XSLT)


Extensible Stylesheet Language Transformations (XSLT) is an
XML-based language used to transform the grammar (tags) and
structure of XML documents. Common applications of XSLT
include:

- Transforming XML files based on one schema to files based on
  another
- Extracting information from one XML document for use in another
- Transforming XML into well-formed HTML for display in a web
  browser

The XML document used to define XSL transformations is called an
XSLT stylesheet. An XSLT processor takes a source XML document
and a stylesheet as input and outputs a new document, called a
result tree, based on the rules contained within that stylesheet.
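
The idea of mapping one vocabulary onto another can be illustrated in Python. Note that this sketch is not XSLT: Python's standard library has no XSLT processor, so the mapping is done by hand with ElementTree to show the same source-to-result idea. The tag pairs in TAG_MAP and the sample values are assumptions made for illustration, loosely based on the field names in the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from billing-system tags to CRM-system tags.
TAG_MAP = {
    "PersonName": "ContactName",
    "VoiceNumber": "Telephone",
    "E-mail": "Email",
    "FaxNumber": "Fax",
}

# Illustrative source document; the values are made up.
BILLING_XML = (
    "<ContactRecord>"
    "<PersonName>John Doe</PersonName>"
    "<PositionTitle>Buyer</PositionTitle>"
    "<VoiceNumber>555-0100</VoiceNumber>"
    "</ContactRecord>"
)


def transform_contact(source_xml):
    """Build a new <Contact> document from the mapped source elements."""
    source = ET.fromstring(source_xml)
    result = ET.Element("Contact")
    for child in source:
        new_tag = TAG_MAP.get(child.tag)
        if new_tag is None:
            continue  # no counterpart in the target vocabulary
        ET.SubElement(result, new_tag).text = child.text
    return ET.tostring(result, encoding="unicode")


crm_xml = transform_contact(BILLING_XML)
```

In an actual XSLT stylesheet, each entry in the mapping would typically be expressed as a template rule rather than a dictionary lookup.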


XML Namespaces

<pr:billingInfo
    xmlns:pr="www.company.com/NS/patientRecords"
    xmlns:mr="www.company.com/NS/mortgageRecords">
</pr:billingInfo>

[Figure: two namespaces and their components.
Mortgage Records Namespace: mr:title, mr:terms, mr:taxValue, mr:rate,
mr:owner, mr:interestRate
Patient Records Namespace: pr:patientRecord, pr:phoneGroup, pr:title,
pr:name]


XML Namespaces
A schema is a collection of type definitions and element declarations
whose names belong to a particular namespace. Namespaces enable
us to differentiate between definitions and declarations from different
vocabularies. Using namespaces helps avoid naming conflicts and
promotes re-usability.
Components from different namespaces are differentiated via a
prefix. This enables an application to understand the definition and
context of elements and other components from many different
sources, even if some components share the same name. (Above:
Namespaces can be used to distinguish a title within the mortgage
industry from a title used as part of a person's name.)
Each prefix is associated with a URI (Uniform Resource Identifier),
which must be unique for each schema definition.
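
A short Python sketch shows how two elements that share the local name title are told apart by namespace. It assumes the standard-library ElementTree parser and reuses the two namespace URIs from the slide; the sample element values and the function name titles_by_namespace are made up for illustration.

```python
import xml.etree.ElementTree as ET

PR_NS = "www.company.com/NS/patientRecords"
MR_NS = "www.company.com/NS/mortgageRecords"

# Both children are called "title", but each belongs to its own namespace.
RECORD_XML = f"""<pr:billingInfo xmlns:pr="{PR_NS}" xmlns:mr="{MR_NS}">
  <pr:title>Dr.</pr:title>
  <mr:title>Deed of Trust</mr:title>
</pr:billingInfo>"""


def titles_by_namespace(xml_text):
    """Look up each title through its namespace rather than its prefix."""
    root = ET.fromstring(xml_text)
    # The prefixes used for lookup are local to this dictionary; only the
    # URIs have to match the ones declared in the document.
    ns = {"pr": PR_NS, "mr": MR_NS}
    return {
        "root_tag": root.tag,
        "patient": root.findtext("pr:title", namespaces=ns),
        "mortgage": root.findtext("mr:title", namespaces=ns),
    }


titles = titles_by_namespace(RECORD_XML)
```

ElementTree stores each name in expanded form, with the URI in braces before the local name, which is why the prefix itself carries no meaning once the document is parsed.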
