
Chapter 6

QUERYING XML DOCUMENTS WITH XQUERY


In this chapter, you will learn
How XML generalizes relational databases

The XQuery language

How XML may be supported in relational databases
Querying XML Documents
An ambitious application of XML documents aims at
generalizing the traditional relational model of databases.
The actual generalization is fairly straightforward, as we
shall see.
The main challenge lies in extending the query language
correspondingly, thus inventing an XML analog of the
popular SQL query language.
There are several reasons why a merger of XML and
databases is attractive. For XML developers, the creation of
an XML language is akin to data modeling in databases, only
with fewer constraints on the data format.
There is an immediate desire to extract information directly
from these richer models.
In the database community, there has been a long quest
for a richer data model to replace the plain relational
tables, but no consensus has been reached.
Some of the earlier suggestions have been hierarchical
databases, object-oriented databases and multi-
dimensional databases that all seek in different ways to
extend the relational data model.
XML is a tempting alternative, since we have previously seen
that XML may describe both data-oriented and document-
oriented information. Thus, it directly offers a richer data
model that includes many aspects of the previous
proposals; however, XML comes with a ready-made
consensus.
Another strategic advantage for the database community is
that if XML is the frame of the World Wide Web and XML
becomes the future of databases, then the World Wide
Web will simply become one gigantic database.

From Relations to Trees


XML queries are easily motivated by observing that XML
documents generalize relational tables in a surprisingly
obvious manner.
A few pictures should be sufficient to get the point across.
A table could typically look as follows.
Here, we have a table with four records, each containing
three fields, corresponding to the first names, last names and
ages of some people. Relational database tables are often
like this, but we may, in fact, just as well draw them as
trees.
In this version, the table itself corresponds to the root of the
tree, the records correspond to the nodes in the layer just
below, and the fields are present as the leaves of the tree.
Clearly, this representation is an embedding of the table
view; we can translate a table into a tree and back again
without loss of information.
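The round trip between the two views can be sketched in a few lines of Python. This is only an illustration: the element names (table, record) and the sample people are invented here, since the original figures are not reproduced in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: four records with three fields each.
FIELDS = ["first", "last", "age"]
ROWS = [
    ("John", "Doe", "26"),
    ("Jane", "Dow", "31"),
    ("Jack", "Roe", "17"),
    ("Jill", "Poe", "44"),
]


def table_to_tree(fields, rows):
    """Build a tree: root -> one node per record -> one leaf per field."""
    root = ET.Element("table")
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in zip(fields, row):
            ET.SubElement(record, name).text = value
    return root


def tree_to_table(root):
    """Recover the field names and the rows from such a tree."""
    fields = [leaf.tag for leaf in root[0]] if len(root) else []
    rows = [tuple(leaf.text for leaf in record) for record in root]
    return fields, rows


tree = table_to_tree(FIELDS, ROWS)
fields, rows = tree_to_table(tree)
```

Translating to a tree and back yields the original field names and rows, which is what "without loss of information" amounts to for this small example.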
The potential generalization of XML documents is now
immediate: relational tables correspond to some trees, while
XML documents correspond to all trees. In fact, we can quite
precisely characterize those trees that correspond to tables:
They have height two
The root has an unbounded number of child nodes
All nodes in the second layer have a fixed number of child nodes
Note that the reverse embedding is not immediately
possible for several reasons:
Not all trees satisfy the above characterization; and
Trees are ordered, while both rows and columns of tables
may be permuted without changing the meaning of the
data.
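As a rough sketch, the characterization can be turned into a check that decides whether a given tree could have come from a table. The function name is_table_shaped is made up for this example. The requirement that every record carries the same field names is an assumption about what the characterization intends, not something the text states.

```python
import xml.etree.ElementTree as ET


def is_table_shaped(root):
    """Return True if the tree has height two and uniform records.

    Assumption: all records must have the same field names in the same order.
    """
    records = list(root)
    if not records:
        return True  # an empty table
    signature = [field.tag for field in records[0]]
    for record in records:
        fields = list(record)
        if any(len(field) > 0 for field in fields):
            return False  # a field has children, so the tree is too deep
        if [field.tag for field in fields] != signature:
            return False  # records differ in their fields
    return True


flat = ET.fromstring(
    "<table>"
    "<record><first>A</first><age>1</age></record>"
    "<record><first>B</first><age>2</age></record>"
    "</table>"
)
deep = ET.fromstring(
    "<table><record><name><first>A</first></name></record></table>"
)
ragged = ET.fromstring(
    "<table>"
    "<record><first>A</first></record>"
    "<record><first>B</first><age>2</age></record>"
    "</table>"
)
```

Note that the comparison of field names is order-sensitive, which mirrors the second point above: the tree view fixes an order that the table view does not care about.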
Why, then, do we want to use general trees? As we are now
familiar with XML technology, the answer may seem
obvious, but from a database point of view, we are about to
perform an interesting generalization.
Consider again the example of student records, where we
store information about the student id, name, age, major
and exam results of students.
The first four items fit well into the relational model, since they
just become four fields in a record. However, the number
of exam results is not fixed, which requires a less intuitive
structure. Things are further complicated by the possibility
of double majors. A typical relational representation of
such data would involve a number of tables.
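To make the contrast concrete, here is a small Python sketch with invented data. The table layouts, element names and values are all hypothetical and are not taken from the original figures. Three linked tables are folded into a single nested tree in which the repeating groups simply become repeated child nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational view: one table per repeating group,
# linked through the student id.
STUDENTS = [(100026, "Joe Average", 21)]
MAJORS = [(100026, "Biology"), (100026, "Physics")]
RESULTS = [
    (100026, "Math 101", "C"),
    (100026, "Biology 101", "C+"),
    (100026, "Statistics 101", "D"),
]


def student_to_xml(student, majors, results):
    """Fold the rows belonging to one student into a single tree."""
    sid, name, age = student
    node = ET.Element("student", id=str(sid))
    ET.SubElement(node, "name").text = name
    ET.SubElement(node, "age").text = str(age)
    for owner, major in majors:
        if owner == sid:
            ET.SubElement(node, "major").text = major  # double majors repeat
    results_node = ET.SubElement(node, "results")
    for owner, course, grade in results:
        if owner == sid:
            ET.SubElement(results_node, "result", course=course, grade=grade)
    return node


student = student_to_xml(STUDENTS[0], MAJORS, RESULTS)
```

In the tree, a variable number of exam results or a second major needs no extra table and no join key.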
Usage Scenarios
It is quite easy to envision usage scenarios for a hypothetical
query language on general XML documents. These relate
to the rough classification of XML languages that we
discussed in Section 2.5.
For data-oriented languages, like the above XML
representation of student records, we wish to carry over
the kinds of queries that we performed in the original
relational model.
We need to be able to transform data into new XML
representations, and to integrate data from multiple
heterogeneous data sources.
For document-oriented languages, queries could be used
to retrieve parts of documents, to provide dynamic
indexes, to perform context-sensitive searching and to
generate new documents as combinations of existing
documents.
In fact, these tasks have long been studied in the area
called information retrieval, dating back to the 1960s. The
need for intelligent manipulation of vast amounts of
ordinary documents is made clear by the success of search
engines, which presently do not employ anything as
advanced as a full query language, but basically just use
string matching.
For protocols and programming languages, examples are less
natural, but queries could be used to automatically generate
documentation, similarly to the Javadoc tool.
For hybrid languages, the situation is even more intriguing.
These are documents of which some parts are data-oriented and other
parts are human-readable and less structured.
The archetypical example is a hospital record for a patient. It
contains some highly structured data. Other data is much
less structured, like notes from doctors and nurses or
background interviews. An interesting query might span both of
these: find those patients who experienced a sudden rise in
blood pressure following a certain post-operative medication and
where the GP has previously noted that the patient probably
drinks too much.
Once the idea of performing XML queries has been
suggested, ideas for possible applications are easy to
come by.
The XQuery Design
The XQuery language has been designed with very specific
goals in mind. The working group identified several
technical requirements, of which the most important were
that the language
Must be able to transform and create XML trees and make it
possible to combine information from multiple documents;
Must be declarative;
Must be namespace aware;
Must be coordinated with XML Schema and support simple
and complex data types; and
Must have at least one XML syntax and at least one
human-readable syntax.
Underlying most of these requirements is a desire to maintain
the flavor of SQL and generalize its expressive power to XML
documents.
Note that XQuery, unlike XSLT, will not only have an XML
syntax. In fact, XQuery is supposed to be programmed in a
syntax that resembles that of SQL.
This choice is partly a practical consideration, since XML syntax
is rather verbose and can be hard to read, and to a large
extent a tactical consideration, since the transition from SQL to
XQuery will then be vastly simpler for database programmers.
A similar choice has been made for the schema language
RELAX NG, which also offers a non-XML syntax as an
alternative.
Also note that XQuery is tied to the XML Schema language;
other schema languages are not considered here.
The development of XQuery followed a number of independent
research projects that each made attempts to solve the
fundamental problem of generalizing SQL to arbitrary XML trees.
The most important examples were XML-QL, YATL, Lorel and
Quilt, whose proponents are strongly represented in the XQuery
working group. While these languages turned out very different
and looked nothing alike with respect to syntax, it became
apparent that they describe very similar computational models.
This situation seems to indicate that the design of a
generalized query language could be more canonical than
perhaps previously suspected.
Relationship to XPath
All the early prototypes of query languages contained
mechanisms for pointing at sets or sequences of nodes in
XML trees. This observation was influential in identifying
the need for the XPath language. Furthermore, it has been
the driving force behind the design of XPath 2.0.
The XQuery 1.0 language is designed to be a strict
superset of the XPath 2.0 language. That is, every XPath 2.0
expression is directly an XQuery 1.0 expression.
We have already seen that many XPath expressions feel like
queries, in that they extract fairly complicated information from
XML documents.
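A small sketch may help to show the query-like feel of path expressions. It uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath 1.0 rather than XPath 2.0. The document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical student data, made up for this illustration.
doc = ET.fromstring("""
<students>
  <student id="100026">
    <name>Joe Average</name><age>21</age><major>Biology</major>
  </student>
  <student id="100078">
    <name>Jack Doe</name><age>18</age>
    <major>Physics</major><major>XML Science</major>
  </student>
</students>""")

# "Names of all students": a path selecting a sequence of nodes.
all_names = [n.text for n in doc.findall("./student/name")]

# "Names of students with a Physics major": a path with a predicate,
# which plays a role similar to a WHERE clause.
physics_names = [n.text for n in doc.findall("./student[major='Physics']/name")]

# "The student with a given id": a predicate on an attribute.
by_id = doc.find("./student[@id='100026']")
```

Each expression points at a set of nodes, and the predicates do the filtering that a relational query would express as a condition on rows.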
In fact, the only thing that XQuery needs beyond the
expressive power of XPath is the ability to join information
from different sources and to generate new XML fragments.
XQuery also introduces user-defined functions and thus permits
arbitrary computations.
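The two extra abilities, joining sources and generating new fragments, can be imitated in Python to show what they amount to. This is a sketch of the idea, not XQuery itself. The two documents, the element names and the join_report function are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources that share a student id.
students = ET.fromstring("""
<students>
  <student id="1"><name>Ann</name></student>
  <student id="2"><name>Bob</name></student>
</students>""")

grades = ET.fromstring("""
<grades>
  <grade student="1" course="Math">A</grade>
  <grade student="2" course="Math">C</grade>
  <grade student="1" course="Physics">B</grade>
</grades>""")


def join_report(students, grades):
    """Join the two documents on the student id and build a new fragment."""
    names = {s.get("id"): s.findtext("name") for s in students.iter("student")}
    report = ET.Element("report")
    for grade in grades.iter("grade"):
        name = names.get(grade.get("student"))
        if name is None:
            continue  # no matching student: drop the grade, as in an inner join
        entry = ET.SubElement(report, "entry", course=grade.get("course"))
        ET.SubElement(entry, "name").text = name
        ET.SubElement(entry, "grade").text = grade.text
    return report


report = join_report(students, grades)
print(ET.tostring(report, encoding="unicode"))
```

Path expressions alone could select the name and grade nodes. Pairing them up across the two documents and wrapping the pairs in new entry elements is the part that goes beyond selection.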

Relationship to XSLT
XQuery and XSLT seem to share many ambitions, as they
both are domain-specific languages for combining and
transforming XML data from multiple sources. They are vastly
different in design, partly for historical reasons: XQuery
is designed from scratch with inspiration from SQL, while
XSLT is an intellectual descendant of CSS and has developed
for a long time. Technically, they actually have different
fortes. XSLT is exceedingly good for defining complicated
recursive traversals and transformations to arbitrary depths
of XML documents, where XQuery must use explicit recursion of
user-defined functions. Conversely, XQuery has more the flavor of a
database programming language and allows simple solutions
for simple problems, where XSLT may be more verbose.
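What "explicit recursion of user-defined functions" looks like can be sketched in Python, used here only as a stand-in for a language without template-driven traversal. The function name rename_tags and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def rename_tags(node, mapping):
    """Return a copy of the tree with tags renamed at every depth."""
    copy = ET.Element(mapping.get(node.tag, node.tag), dict(node.attrib))
    copy.text, copy.tail = node.text, node.tail
    for child in node:
        # The traversal has to be spelled out as a recursive call.
        copy.append(rename_tags(child, mapping))
    return copy


doc = ET.fromstring(
    "<doc><sec><b>deep</b><sec><b>deeper</b></sec></sec></doc>"
)
result = rename_tags(doc, {"b": "strong"})
```

The programmer writes the descent into the children by hand. In a template-based style, the same transformation would be a rule for the b element, with the traversal left to the processor.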
It is sometimes claimed that XSLT will soon be deprecated and should be
replaced by XQuery. Apart from the huge amount of legacy
code to consider, XSLT is, however, still the simplest choice
for many applications.
Also, XSLT already has several very efficient
implementations, while the general design of XQuery
seems to pose more of a challenge. Thus, both languages
will most likely be around for a while yet.
When raw expressive power is considered, XQuery and
XSLT are surprisingly evenly matched, as shown in Section
6.6. However, while the languages may emulate each
other, they each display a lack of elegance when doing so,
emphasizing that domain-specific syntax is a valuable
asset.