XML and Related Technologies certification prep, Part 1: Architecture
Learn where and when to use XML in system design

Skill Level: Intermediate

Mark Lorenz (mlorenz@nc.rr.com)


Senior Application Architect
Hatteras Software, Inc.

29 Aug 2006

A software system's architecture and performance requirements affect your decision of
which XML technologies are most appropriate for your application's needs. This tutorial
on architecture teaches you how to discern where and when to use XML in system
design. It is the first tutorial in a series of five tutorials that you can use to help prepare
for the IBM certification Test 142, XML and Related Technologies.

Section 1. Before you start


In this section, you'll find out what to expect from this tutorial and how to get the
most out of it.

About this series


This series of five tutorials helps you prepare to take the IBM certification Test 142,
XML and Related Technologies, to attain the IBM Certified Solution Developer - XML
and Related Technologies certification. This certification identifies an
intermediate-level developer who designs and implements applications that make
use of XML and related technologies, such as XML Schema, Extensible Stylesheet
Language Transformation (XSLT), and XPath. This developer has a strong
understanding of XML fundamentals; has knowledge of XML concepts and related
technologies; understands how data relates to XML, in particular with issues
associated with information modeling, XML processing, XML rendering, and Web
services; has a thorough knowledge of core XML-related World Wide Web
Consortium (W3C) recommendations; and is familiar with well-known best
practices.

Architecture
© Copyright IBM Corporation 1994, 2006. All rights reserved. Page 1 of 20
developerWorks® ibm.com/developerWorks

Anyone working in software development for the last few years is aware that XML
provides cross-platform capabilities for data, just as the Java™ programming
language does for application logic. This series of tutorials is for anyone who wants
to go beyond the basics of using XML technologies.

About this tutorial


This tutorial is the first in the "XML and Related Technologies certification prep"
series that takes you through the key aspects of effectively using XML technologies
with Java projects. This first tutorial focuses on architecture -- that is, which
technologies to use in which situations in ways that will perform well.

This tutorial lays the groundwork for Part 2, which focuses on information modeling,
including the use of namespaces and the definition of Document Type Definition
(DTD) schemas.

This tutorial is written for Java programmers who have a basic understanding of
XML and whose skills and experience are at a beginning to intermediate level. You
should have a general familiarity with defining, validating, and reading XML
documents and a working knowledge of the Java language.

Objectives
After completing this tutorial, you will know how to:

• Determine the implications of a given architecture on XML design considerations

• Select appropriate XML technologies for a given architecture

• Assess performance considerations for XML parsing, validation, and transformation

• Implement Java classes using Java Architecture for XML Binding (JAXB)

• Address XML security using XML encryption and signatures

Prerequisites
This tutorial is written for developers who have a background in programming and
scripting and who have an understanding of basic computer-science models and
data structures. You should be familiar with the following computer-science
concepts as they apply to XML: tree traversal, recursion, and reuse of data. You should
be familiar with Internet standards and concepts, such as Web browser,
client-server, documenting, formatting, e-commerce, and Web applications.
Experience designing and implementing Java-based computer applications and
working with relational databases is also recommended.

System requirements
You need a system with an up-to-date browser.

Section 2. XML Architecture


This section of the tutorial will discuss the most effective uses of XML technologies
given the particular aspects of your system architecture. By the end of this section,
you will:

• Identify areas of your system as they relate to the use of XML

• Choose optimal XML technologies for different portions of your system, taking into account performance and security

• Understand how to bind XML to Java

Uses of XML abound, such as Asynchronous JavaScript and XML (Ajax) for
dynamic Web pages, and Rich Site Summary (RSS) for blogs and feeds. The future
will bring even more. This series focuses on the core technologies, including Simple
API for XML (SAX), Document Object Model (DOM), DTD, XML Schema, XPath,
XLink, and XQuery.

You can find an alphabet soup of acronyms out there. Just read technical articles
and you'll find XDI, RDF, REST, SVG, XUL, and much more. That's to be expected,
as XML is not just a hot topic, it's the über-hot topic. Why all the hype? The main
reason is that XML offers cross-platform, cross-language capabilities for data, just as
Java offers cross-platform support for application logic. Take a look at some uses of
XML that have hit the world market recently:

• Feeds (RSS and Atom)

• Dynamic Web (Ajax)

• Blogs (Representational State Transfer, or REST)

• Service-Oriented Architecture (SOA) and Web services

These uses are depicted in Figure 1 and Figure 2, which show how you can
integrate XML technologies in an application architecture for e-business and the
dynamic Web, respectively.

Figure 1. e-commerce using XML technologies

Figure 2. Dynamic Web using XML technologies

To benefit from any of these uses, you need grounding in XML technologies, which
is what this tutorial series provides.

What is an architecture and how does it relate to XML?


"An architecture is a framework for the disciplined introduction of
change." -- Tom DeMarco

If you've ever received a support call late at night for a system with a
less-than-optimal architecture, you know how important it is to make wise choices in
the technologies you use. Architecture has several aspects, including physical
and logical. Figure 3 shows an example of a physical architecture.

Figure 3. Example of a physical architecture

"The fundamental organization of a system, embodied in its
components, their relationships to each other and the environment,
and the principles governing its design and evolution." -- ANSI/IEEE
1471-2000, Recommended Practice for Architecture Description of
Software-Intensive Systems

Definitions of system architecture abound. For the purposes of this tutorial,
let's view software system architecture as:

• Building on top of an existing structure, where available (such as
extending a framework and reusing common components)

• Distributing across processes and processors as appropriate for the
requirements, with published interfaces to each piece of the system

A particular technology can help certain areas in an architecture and not help others.
In the example system from Figure 3, XML could play a role in multiple areas:

• Browser
You can render Web pages using XML content and related XSL
stylesheets. XSLT supports this capability as well as conversion to many
different formats.

• Client request
An XMLHttpRequest is at the heart of Ajax.

• Server reply
When the server responds to an XMLHttpRequest, the response contents
can be in XML. But even if they aren't, the browser uses the DOM to
manipulate the Web page. As you'll see in Part 3 of this tutorial series, the
DOM is a W3C standard for representing XML (and HTML) documents as trees.

• Web services
SOAP is an XML-based protocol for exchanging information, most commonly
over HTTP (in other words, over the Web). Its primary use is to request Web
services remotely. It is a successor to XML-RPC (XML Remote Procedure Call).

• Java Message Service (JMS)
JMS is for sending messages between processes asynchronously. With
guaranteed (persistent) delivery, connectivity and latency problems don't
cause messages to be lost. XML content of the messages provides a lingua
franca, so that all parties can understand it, no matter what language they
use or what platform they run on.

• Reporting
Besides rendering XML for Web browsers, PDAs, and other devices, you can
render it as reports. XSLT, which is useful for rendering Web page content,
can also render reports in multiple formats.

• Database
This isn't your dad's database anymore. Not wanting to be left out of the
XML opportunities, both IBM® and Oracle have come out with native XML
databases that store XML document structures and support XQuery. The
third installment of this series will cover this in more detail, but for now
keep in mind that XML is plain text at heart, so you can store it in flat files
and databases even if you don't have an XML-aware database.

BIRT
Business Intelligence and Reporting Tools (BIRT) is an open source
Eclipse-based framework written in Java that supports the design of
reports with output to HTML and PDF. The report designs are
stored on disk as XML .rptdesign files (see Resources).

This is just one example architecture. Kevin Dick's book, XML: A Manager's Guide
(p. 216; see Resources), lists five different enterprise applications that receive
significant benefits from the use of XML:

1. Workforce automation

2. Knowledge management

3. Trading partner coordination

4. Application integration

5. Data integration

The point is that XML can be used in many different domains, including yours.

OK, now that you have some ideas of where XML can play a part, how do you
choose which technologies to use and where in your system to use them? I'll
touch on a number of considerations in this part of the tutorial, so read on.

Using XML with an existing application


One of XML's strengths is its ability to be understood by disparate systems. If you
have an existing application, whether it's written in C and running on a Linux®
machine or in Java code running on a Microsoft® Windows® machine, you can
integrate the legacy application into other parts of the system through XML-based
communication.

In addition, some products and frameworks use XML for configuration files. For
example, Struts uses a struts-config.xml file to define how the controlling servlet
should work, and Web applications use web.xml deployment descriptors to define how
the application is deployed on a server. More peripheral uses of XML are appearing all
the time. Your applications can certainly make good use of these capabilities.

I'll focus instead on the more central, integrated use of XML technologies in your
applications. Table 1 lists some characteristics of applications and gives advice on
when XML technologies can play a role.

Table 1. Advice on XML use


Characteristic: Output targets and formats (PDA, browser, iPod, PDF)
Discussion: The more types of output, the more benefit from XML transformation.
Advice: Use XML when multiple output formats are required.

Characteristic: Content size
Discussion: The larger the content, the more performance hurdles you'll have
to overcome using XML. This leads to consideration of alternatives, such as
compression, or another format entirely, such as Abstract Syntax Notation One
(ASN.1), which loses the human readability benefit.
Advice: Use XML when messaging and processing efficiency is less important
than interoperability and availability of standard tools.

Characteristic: Interoperability
Discussion: XML's greatest strength is arguably its cross-language,
cross-platform format that diverse systems can understand.
Advice: Use XML when you must communicate with diverse systems.

Characteristic: Searching
Discussion: XML supports relatively simple queries through XPath and more
complex queries with the more recent XQuery. While maturing, XML technologies
have been relatively weaker at searching. It is yet to be seen if XML-aware
databases can help with this, since they store the XML in a tree structure.
See XML-aware databases.
Advice: Don't use XML documents when searching is important. Instead, store
the content in a database or use an XML-aware database.

Characteristic: Summarizing
Discussion: XML technologies are weak at summarizing data -- for example, for
reports. See XML-aware databases.
Advice: Don't use XML documents when summarization is important. Instead,
store the content in a database or use an XML-aware database.

Characteristic: Project size
Discussion: To use XML, you need a parser and code to deal with the XML
events or tree.
Advice: For small projects with simple requirements, you might not want to
incur the overhead of XML.
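As an illustration of the "relatively simple queries" mentioned in the Searching row, here is a minimal sketch that runs an XPath expression from Java through the javax.xml.xpath API in JAXP. The catalog document, its element names, and the class XPathSearchSketch are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSearchSketch {
    // Returns the titles of all books whose price is below the given limit.
    public static List<String> cheapTitles(String xml, double limit) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/catalog/book[price < " + limit + "]/title";
        // Evaluating against an InputSource parses the document before running the query.
        NodeList nodes = (NodeList) xpath.evaluate(expr,
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                + "<book><title>XML Basics</title><price>20</price></book>"
                + "<book><title>Advanced XQuery</title><price>55</price></book>"
                + "</catalog>";
        System.out.println(cheapTitles(xml, 30)); // prints [XML Basics]
    }
}
```

Each call parses the whole document before evaluating the expression. When searches are frequent or documents are large, that repeated cost is part of why the table points you toward a database.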

XML-aware databases
Database vendors want to support projects using XML technologies, but relational
databases don't make it easy to store and retrieve XML files. IBM has introduced a
new DB2® version, code-named Viper, which supports XML data storage and
indexing in a native format (in other words, it doesn't pull the XML apart to fit a
relational model). Databases that store XML natively typically support XQuery, which
is to XML roughly what SQL is to relational data.

XML plain-text alternatives


More efficient alternatives to plain-text XML are being examined,
including binary XML and XML compression (see Resources).

So, what do these new database capabilities mean for your projects? The main thing
is that you can achieve the typical strengths of databases, such as searching and
summarizing, with XML data in its native form.
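The sidebar on plain-text alternatives mentions XML compression. The following sketch uses general-purpose GZIP from java.util.zip, which is not a binary XML format, to show why markup with repetitive tags tends to shrink substantially. The class name and the generated order data are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class XmlGzipSketch {
    // Compresses the UTF-8 bytes of an XML string with GZIP.
    public static byte[] compress(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } // closing the GZIP stream writes its trailer
        return bytes.toByteArray();
    }

    // Restores the original XML string from GZIP-compressed bytes.
    public static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder("<orders>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<order><id>").append(i).append("</id><status>shipped</status></order>");
        }
        String xml = sb.append("</orders>").toString();
        byte[] packed = compress(xml);
        System.out.println(xml.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(xml)); // prints true
    }
}
```

The trade-off is the one Table 1 describes for alternatives to plain text: smaller messages in exchange for extra CPU work at each end and bytes that are no longer human-readable in transit.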

Performance
In this section of the tutorial, I'll discuss some of the issues that can affect
performance when using XML technologies.

Choosing an appropriate processing model

As outlined in the book Designing Web Services with the J2EE™ 1.4 Platform:
JAX-RPC, SOAP, and XML Technologies (see Resources), you can choose among
four main XML processing models, available through the following APIs:

1. SAX: Provides an event-based programming model

2. DOM: Provides an in-memory tree-traversal programming model

3. XML data binding: Provides an in-memory Java content class-bound
programming model

4. XSLT: Provides a template-based programming model

SAX and DOM are the most common programming models. Along with XSLT,
these two models are available through the Java API for XML Processing (JAXP). The
XML data binding model is available through the JAXB technology.

All of these choices will be discussed later in this series of tutorials, but let's examine
the implications of the processing model on performance. Table 2 compares some
attributes of the SAX parser to the DOM parser.

Table 2. Parsers: SAX versus DOM


SAX: Event-driven
DOM: Tree manipulation

SAX: Scales to large sizes with little change in memory use
DOM: Larger documents take more memory

SAX: Must write to new document to change the contents
DOM: Can manipulate the document in memory

SAX: More difficult to manage complex changes
DOM: Easier to make complex changes

SAX: In general, faster
DOM: Comparatively slower

SAX: More control over parsing, but can be more work for you
DOM: Generally, less work for you
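To make Table 2 concrete, here is a minimal sketch using the SAX and DOM parsers that JAXP provides in the JDK. Both methods count the elements with a given name; the SAX version keeps only a counter as events stream past, while the DOM version first builds the whole document tree in memory. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVersusDomSketch {
    // SAX: the parser pushes events to the handler; only a counter is kept in memory.
    public static int countWithSax(String xml, String tag) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    // DOM: the whole document becomes an in-memory tree that can be queried or modified later.
    public static int countWithDom(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><item>pen</item><item>ink</item><note>rush</note></order>";
        System.out.println(countWithSax(xml, "item")); // prints 2
        System.out.println(countWithDom(xml, "item")); // prints 2
    }
}
```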

The system requirements, as with most things, usually determine which parser to use.
Some examples include:

• Merging documents
This certainly requires working with a DOM tree. It hurts my head to think
about doing this tag-by-tag using SAX.

• Small devices
If memory is at a premium, SAX uses very little. DOM must build a tree
structure of the entire document.

• Looking for certain tags
If a certain event is to happen whenever a certain tag occurs, SAX will
work nicely.

• Complex manipulation
If changes to different parts of the document are required based upon
values from other portions of the document, then it will most likely be
easier to use the DOM parser.

Finally, you can also use the two parsers in tandem. For example, you can parse a
number of small documents with the SAX parser to pull out information that you
need to merge into an existing document, and modify the document using the DOM
parser and tree manipulation.
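Here is one way the tandem approach could look, as a sketch under assumptions: several small "update" documents carry customer elements that should be appended to an existing customers document. SAX extracts the names cheaply, and DOM then modifies the target tree. All element names and the class name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDomMergeSketch {
    // SAX pass: pull the text of every <customer> element out of a small document.
    public static List<String> extractCustomers(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <customer>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("customer")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("customer")) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    // DOM pass: append the extracted names to the existing document's root element.
    public static Document merge(String targetXml, List<String> updates) throws Exception {
        Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(targetXml.getBytes(StandardCharsets.UTF_8)));
        for (String update : updates) {
            for (String name : extractCustomers(update)) {
                Element customer = target.createElement("customer");
                customer.setTextContent(name);
                target.getDocumentElement().appendChild(customer);
            }
        }
        return target;
    }
}
```

The characters callback can deliver a single element's text in several pieces, which is why the handler accumulates it in a StringBuilder instead of keeping only the last chunk.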

StAX
A new API called Streaming API for XML (StAX) is to be released in
late 2006. It's a pull API, as opposed to SAX's push model, so it
keeps control with the application rather than the parser. StAX can
also write XML (through its XMLStreamWriter interface), so you can produce
a modified copy of a document as you read it. See Resources for more
details.
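A minimal sketch of the pull model with the StAX cursor API (javax.xml.stream, which Java SE 6 includes): the application calls next() to ask for each event, rather than the parser calling back into the application. The feed document and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {
    // Collects the text of each <title> element; the loop, not the parser, decides when to advance.
    public static List<String> titles(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    result.add(reader.getElementText()); // reads up to the matching end tag
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<feed><entry><title>One</title></entry><entry><title>Two</title></entry></feed>";
        System.out.println(titles(xml)); // prints [One, Two]
    }
}
```

Because the loop controls the pace, it could stop reading as soon as it finds what it needs, which a SAX handler can only approximate by throwing an exception.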

Caching stylesheets

If you use XSLT to convert XML documents into different formats, you can cache the
compiled thread-safe stylesheet Templates in memory, and reuse them for
individual users to create their own Transformers (see Figure 4). This results in a
smaller footprint for your application, and it saves the time spent parsing and compiling
the stylesheets.

Figure 4. Caching XSLT stylesheets
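A minimal sketch of the caching approach in Figure 4, using the JAXP transformation API: compiled Templates objects, which are thread-safe, are stored in a map and shared, while every transformation gets its own Transformer, which is not thread-safe. For self-containment, the caller passes the stylesheet text along with a key that is assumed to identify it uniquely; a real cache would more likely load stylesheets from files. The class StylesheetCache is hypothetical; see Chapter 5 of Java and XSLT in Resources for a fuller implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final ConcurrentMap<String, Templates> cache = new ConcurrentHashMap<>();

    // Returns the compiled stylesheet for this key, compiling it only on first use.
    public Templates templatesFor(String key, String xslt) throws TransformerConfigurationException {
        Templates templates = cache.get(key);
        if (templates == null) {
            synchronized (factory) { // TransformerFactory is not guaranteed to be thread-safe
                templates = cache.get(key);
                if (templates == null) {
                    templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
                    cache.put(key, templates);
                }
            }
        }
        return templates;
    }

    // The shared Templates hands out a fresh Transformer for each call.
    public String transform(String key, String xslt, String xml) throws TransformerException {
        Transformer transformer = templatesFor(key, xslt).newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xslt = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\">Hello, <xsl:value-of select=\"/name\"/></xsl:template>"
                + "</xsl:stylesheet>";
        StylesheetCache stylesheets = new StylesheetCache();
        System.out.println(stylesheets.transform("greeting", xslt, "<name>Ann</name>")); // prints Hello, Ann
        System.out.println(stylesheets.transform("greeting", xslt, "<name>Bo</name>")); // prints Hello, Bo
    }
}
```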

Using namespaces

As you might know already, namespaces keep the names declared in your
documents distinct from names declared elsewhere. Without them, name
collisions can become an issue when stylesheets and other documents are
incorporated through statements such as include or import. Collisions can
also be an issue if you merge multiple documents, each with its own grammar.
If you use a colon in an element or attribute name, you can distinguish
between the namespace prefix (to the left of the colon) and the local name
within the namespace (to the right of the colon). For example,
xmlns:prefix="URI" would allow you to use names like this:
prefix:myname.
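To see the prefix and local name in practice, here is a minimal sketch with a namespace-aware DOM parser. The namespace URI http://example.com/orders and the element names are hypothetical. Note that JAXP parser factories do not process namespaces unless you call setNamespaceAware(true).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NamespaceSketch {
    // Returns "prefix | localName | namespaceURI" for the document element.
    public static String describeRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // off by default in JAXP
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        return root.getPrefix() + " | " + root.getLocalName() + " | " + root.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ord:order xmlns:ord=\"http://example.com/orders\"><ord:item/></ord:order>";
        System.out.println(describeRoot(xml)); // prints ord | order | http://example.com/orders
    }
}
```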

An upcoming tutorial in this series will discuss namespaces at length. For now,
though, I'll mention how namespaces affect performance. As you saw earlier, SAX is
an event-based parser. When a namespace-aware SAX parser encounters a namespace
declaration, it sends the application a startPrefixMapping call, and a matching
endPrefixMapping call when the declaring element ends. These extra callbacks add
to your application's processing time. The point is not to avoid namespaces
altogether -- in fact, you probably can't -- but rather to use them sparingly if
you think performance will be an issue.
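A minimal sketch that makes these extra callbacks visible: a namespace-aware SAX handler that counts startPrefixMapping and endPrefixMapping calls. The sample documents and URNs are hypothetical. With namespace processing left off (the JAXP default), the parser does not report these mappings at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class PrefixMappingCounter extends DefaultHandler {
    private int starts;
    private int ends;

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        starts++; // one call per namespace declaration coming into scope
    }

    @Override
    public void endPrefixMapping(String prefix) {
        ends++; // matching call once the declaring element has ended
    }

    // Returns {startPrefixMapping calls, endPrefixMapping calls} for the document.
    public static int[] count(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        PrefixMappingCounter handler = new PrefixMappingCounter();
        factory.newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return new int[] {handler.starts, handler.ends};
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a:root xmlns:a=\"urn:example:a\" xmlns:b=\"urn:example:b\">"
                + "<b:child xmlns:c=\"urn:example:c\"/></a:root>";
        int[] calls = count(xml);
        System.out.println(calls[0] + " start, " + calls[1] + " end"); // prints 3 start, 3 end
    }
}
```

Every declaration costs a pair of calls, including ones such as xmlns:c above that no element actually uses, which is one concrete reason to declare namespaces sparingly.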

Binding to Java classes

As you know, XML documents contain tags and other content in a plain-text format,
and parsing that text incurs a performance cost. What if you could speed this up?
I'll discuss two ways: JAXB and XSLT Compiler (XSLTC).

JAXB

JAXB takes XML documents and creates a semantic tree of Java objects that
represents the document contents (see Figure 5). You can then manipulate these
objects according to the rules defined in the related XML schema, which you
previously compiled and used to create a JAXB binding framework. You can also
use this framework to marshal the tree into a resulting XML document.

Besides speeding up document processing, JAXB enables you to manipulate XML
through Java objects. JAXB also makes it easy to keep up with schema changes.

Figure 5. JAXB

Note: JAXB does not support the use of DTDs -- you must use XML Schema as
your schema language.
Schemas
Technically speaking, DTDs, XML Schemas (capital S), and RELAX
NG are all types of XML schema (little s). XML Schemas (capital S)
are strictly called W3C XML Schemas. In this tutorial, whenever you
see XML Schema, realize that it is the W3C language and not the
generic schema document description.
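Since this series uses XML Schema to mean the W3C language, here is a minimal sketch of validating a document against a W3C XML Schema with the JAXP validation API (javax.xml.validation). The schema, the qty element, and the class name are hypothetical. The compiled Schema object is thread-safe, so, much like a cached stylesheet, it can be built once and shared, with a new Validator created for each check.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaValidationSketch {
    private final Schema schema; // thread-safe; compile once and share

    public SchemaValidationSketch(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        this.schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns false for invalid (or malformed) documents. A Validator is not thread-safe,
    // so a new one is created per call.
    public boolean isValid(String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"qty\" type=\"xs:positiveInteger\"/>"
                + "</xs:schema>";
        SchemaValidationSketch check = new SchemaValidationSketch(xsd);
        System.out.println(check.isValid("<qty>3</qty>"));  // prints true
        System.out.println(check.isValid("<qty>-3</qty>")); // prints false
    }
}
```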

XSLT Compiler

You know what XSL Transformation is. XSLTC adds a compiled aspect to the mix.
XSLTC is composed of two parts (see Figure 6). The first part is a compiler that
creates a translet, which is a set of Java classes, from an XSL stylesheet. The
second part is a processor that applies the translet to an XML instance document to
transform it to the desired output format. This lets you compile the stylesheet
once and reuse it later, which speeds up processing.

Figure 6. XSLTC

Security
Applications must feature end-to-end data security when they communicate over the
Internet. No one whose computer is hit with a virus or whose site is hacked into will
question the importance of securing a company's information.

So, what is available to secure communications involving XML? At its heart, sending
XML document contents over the Internet securely involves both XML encryption
and XML digital signature.

XML encryption involves converting the content into an unintelligible form to enforce
confidentiality. Of course, the intended recipient must be able to convert it back to its
original form. XML encryption has some unique capabilities too, such as being able
to encrypt certain elements or element contents. This is useful, for example, when
conducting sales transactions between a customer, a vendor, and the customer's
bank, where different parties need to read certain portions of the document contents
but should not read other portions.

XML digital signature handles the integrity part of XML security (in other words, it
determines if content was changed in any way). Like its encryption peer, XML digital
signature allows more granularity -- in other words, you can sign portions of
documents.

Some details of XML digital signatures, such as preserving the order of attributes
during document manipulation, determine whether the document can be verified on
the receiving end of a communication. These details are beyond the scope of this
tutorial, but you can read more about them on the JavaWorld Web site (see Resources).
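The JDK includes the XML Digital Signature API (javax.xml.crypto.dsig, since Java SE 6) but no XML Encryption implementation, so the following sketch covers signatures only. It signs a whole document with an enveloped signature and a freshly generated RSA key pair, validates it, and then shows that changing the signed content makes validation fail. Key handling is deliberately simplified: the keys are hypothetical throwaway keys, and although the public key is embedded in KeyInfo, verification trusts only the key the caller supplies. A real application would obtain keys from a trusted source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.util.Collections;
import javax.xml.crypto.dsig.CanonicalizationMethod;
import javax.xml.crypto.dsig.DigestMethod;
import javax.xml.crypto.dsig.Reference;
import javax.xml.crypto.dsig.SignedInfo;
import javax.xml.crypto.dsig.Transform;
import javax.xml.crypto.dsig.XMLSignature;
import javax.xml.crypto.dsig.XMLSignatureFactory;
import javax.xml.crypto.dsig.dom.DOMSignContext;
import javax.xml.crypto.dsig.dom.DOMValidateContext;
import javax.xml.crypto.dsig.keyinfo.KeyInfo;
import javax.xml.crypto.dsig.keyinfo.KeyInfoFactory;
import javax.xml.crypto.dsig.spec.C14NMethodParameterSpec;
import javax.xml.crypto.dsig.spec.TransformParameterSpec;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class EnvelopedSignatureSketch {
    static final String RSA_SHA256 = "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256";

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the XML Signature API expects a namespace-aware DOM
        return dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Appends an enveloped <Signature> covering the whole document to the root element.
    public static void sign(Document doc, KeyPair keys) throws Exception {
        XMLSignatureFactory fac = XMLSignatureFactory.getInstance("DOM");
        Reference ref = fac.newReference("", // "" means the entire document
                fac.newDigestMethod(DigestMethod.SHA256, null),
                Collections.singletonList(
                        fac.newTransform(Transform.ENVELOPED, (TransformParameterSpec) null)),
                null, null);
        SignedInfo signedInfo = fac.newSignedInfo(
                fac.newCanonicalizationMethod(CanonicalizationMethod.INCLUSIVE,
                        (C14NMethodParameterSpec) null),
                fac.newSignatureMethod(RSA_SHA256, null),
                Collections.singletonList(ref));
        KeyInfoFactory kif = fac.getKeyInfoFactory();
        KeyInfo keyInfo = kif.newKeyInfo(
                Collections.singletonList(kif.newKeyValue(keys.getPublic())));
        XMLSignature signature = fac.newXMLSignature(signedInfo, keyInfo);
        signature.sign(new DOMSignContext(keys.getPrivate(), doc.getDocumentElement()));
    }

    // Validates the first <Signature> in the document against the supplied public key.
    public static boolean verify(Document doc, PublicKey publicKey) throws Exception {
        NodeList sigs = doc.getElementsByTagNameNS(XMLSignature.XMLNS, "Signature");
        DOMValidateContext ctx = new DOMValidateContext(publicKey, sigs.item(0));
        XMLSignature signature = XMLSignatureFactory.getInstance("DOM").unmarshalXMLSignature(ctx);
        return signature.validate(ctx);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keys = generator.generateKeyPair(); // throwaway keys for illustration only
        Document doc = parse("<payment><amount>100</amount></payment>");
        sign(doc, keys);
        System.out.println(verify(doc, keys.getPublic())); // prints true
        doc.getElementsByTagName("amount").item(0).setTextContent("999"); // tamper with signed content
        System.out.println(verify(doc, keys.getPublic())); // prints false
    }
}
```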

Section 3. Conclusion
XML technologies have numerous uses in the marketplace. The key to their
successful integration into an application architecture is to recognize where to use
them to leverage their strengths. Knowledge of the core XML technologies, combined
with an understanding of architectural choices, is essential to the successful
introduction of XML into your projects.

Summary
In this tutorial on Architecture, you learned how to:

• Determine the implications of a given architecture on XML design considerations

• Select appropriate XML technologies for a given architecture

• Assess performance considerations for XML parsing, validation, and transformation

• Implement Java classes using JAXB

• Address XML security using XML encryption and signatures

Part 2 of this five-part series focuses on information modeling, including the use of
namespaces and the definition of DTDs and schemas.

Resources
Learn
• XML: A Manager's Guide, Second Edition (Kevin Dick, Addison-Wesley
Professional, 2002): Read about uses of XML technologies in enterprise
applications.
• The BIRT home page: Learn more about Business Intelligence and Reporting
Tools (BIRT).
• An Introduction to StAX (Elliotte Rusty Harold, O'Reilly Media, September 17,
2003): Read more about Streaming API for XML (StAX) in this article.
• Java and XSLT (Eric M. Burke, O'Reilly Media, September 2001): See Chapter 5
for an implementation of a stylesheet cache.
• On Systems Architecture (Tom DeMarco, proceedings of the 1995 Monterey
Workshop on Specification-Based Software Architectures, US Naval
Postgraduate School, Monterey, California, September, 1995): Look at the
technological, economic, political and sociological influences on architecture.
• Yes, you can secure your Web services documents, Part 1: XML Encryption
keeps your XML documents safe and secure (Ray Djajadinata, JavaWorld,
August 23, 2002): Learn about XML encryption -- what the technology is, why you
want to understand it, and how to implement it.
• Yes, you can secure your Web services documents, Part 2: XML Signature
ensures your XML documents' integrity (Ray Djajadinata, JavaWorld, October 11,
2002): Learn about the XML Signature standard, and how to write XML Signature
code.
• XML in a Nutshell, Third Edition (Elliotte Rusty Harold and W. Scott Means,
O'Reilly Media, 2004): Learn about parsing with SAX and DOM as well as
validating using DTDs and XML Schemas.
• Designing Web Services with the J2EE 1.4 Platform: JAX-RPC, SOAP, and XML
Technologies (Inderjeet Singh, Sean Brydon, Greg Murray, Vijay Ramachandran,
Thierry Violleau, and Beth Stearns, Addison-Wesley, 2004): In this free online
book, read about XML technologies and Web services, including advice on
application design.
• Tip: Compress XML files for efficient transmission (Uche Ogbuji, developerWorks,
April 2004): Examine working with binary XML files and compression that
prepares XML for transmission over Web services.
• Managing XML data: Native XML databases (Elliotte Rusty Harold,
developerWorks, June 2005): Read about using XML-aware databases in this
article.
• Binary XML proponents stir the waters (Michael S. Mimoso, November 2004):
Explore binary options for storing and processing XML files.
• IBM XML 1.1 certification: Become an IBM Certified Developer in XML 1.1 and
related technologies.
• XML: See developerWorks XML Zone for a wide range of technical articles and
tips, tutorials, standards, and IBM Redbooks.
• developerWorks technical events and webcasts: Stay current with technology in
these sessions.
Get products and technologies
• IBM® WebSphere® Application Server Version 6.1: Download a free trial version
of this Java 2 Enterprise Edition (J2EE) and Web services technology-based
application platform.
• IBM trial software: Build your next development project with software, available
for download directly from developerWorks.
Discuss
• XML zone discussion forums: Participate in any of several XML-centered forums.
• developerWorks blogs: Get involved in the developerWorks community.

About the author


Mark Lorenz
Mark Lorenz is the founder of Hatteras Software, an object-oriented consulting firm,
and the author of multiple books on software development. He is certified in
object-oriented analysis and design (OOAD), XML, RAD, and Java. He uses XHTML,
Web services, Ajax, JSF, Spring, BIRT, and related Eclipse-based tools to develop
Java enterprise applications. You can read Mark's blog on technology.

Trademarks
IBM, DB2, Lotus, Rational, Tivoli, and WebSphere are trademarks of IBM Corporation
in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the
United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both.
