
Lecture 2 Week 2

Creating a Valid XML Document Using Document Type Definitions (DTDs)

Recap from last week

• Information about a book:
  – Title, Author, Chapters
• Decide on a way to structure the information, in this case:

<?xml version="1.0"?>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>

DWAX 2010.1 1 DWAX 2010.1 2

Parsers, Well-formed and Valid XML Documents

• Parsers
  – Nonvalidating
    • Checks if the document is well formed (contains no syntax errors and conforms to the XML specifications)
  – Validating
    • Checks if the document is well formed
    • Checks if the XML document is valid (conforms to the rules set out in its DTD or Schema)
  – By definition, a valid document is also well formed

DTDs

• Uses Extended Backus-Naur Form (EBNF) grammar
• Used to define the structure of an XML document:
  – Ensure all required elements are present in the document
  – Prevent undefined elements from being used
  – Enforce a specific data structure
  – Specify the use of attributes and define their possible values
  – Define default values for attributes
  – Describe how the parser should access non-XML or non-textual content
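
To make the difference between well formed and valid concrete, here is a minimal sketch of a document that is well formed but not valid. The element names note, to and body are made up for this illustration and do not come from the lecture.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well formed: every tag is closed and properly nested.
     Not valid: the DTD requires a body element after to, and it is missing. -->
<note>
  <to>Jane</to>
</note>
```

A nonvalidating parser would be expected to accept this document, while a validating parser should report the missing body element.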

Declaring a DTD

• A document type definition is a collection of rules or declarations that define the content and structure of the document.
• A document type declaration attaches those rules to the document's content.

Document Type Declaration

• Document Type Declaration
  – Introduces DTDs into XML documents
  – Placed in the XML document's prolog
  – Begins with '<!DOCTYPE' and ends with '>'
  – Often referred to as the DOCTYPE declaration
  – Can point to
    • External subsets
      – Declarations outside the document
      – Exist in a different file, typically ending with the .dtd extension
    • Internal subsets
      – Declarations inside the document
      – Visible only within the document in which they reside


Declaring an internal DTD

• The DOCTYPE declaration for an internal subset is:

<!DOCTYPE root
[
declarations
]>

• Where root is the name of the document's root element, and declarations are the statements that comprise the DTD.

Adding a Document Type Declaration

(Note: no definitions added yet, this is not valid)

<?xml version="1.0"?>
<!DOCTYPE book
[
]>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>

Starting the Document Type Definition: Declaring Document Elements

• Every element used in the document must be declared in the DTD for the document to be valid.
• An element type declaration specifies the name of the element and indicates what kind of content the element can contain.
• The element declaration syntax is:

<!ELEMENT element content-model>

• Where element is the element name and content-model specifies what type of content the element contains.

Declaring Document Elements

• The element name is case sensitive.
• DTDs define five different types of element content:
  – Any elements. No restrictions on the element's content.
    • <!ELEMENT element ANY>
  – Empty elements. The element cannot store any content.
    • <!ELEMENT element EMPTY>
  – Character data. The element can only contain a text string.
    • <!ELEMENT element (#PCDATA)>
    • The keyword #PCDATA stands for "parsed-character data" and is any well-formed text string.
  – Elements. The element can only contain child elements.
    • <!ELEMENT element (child elements)>
  – Mixed. The element can contain both character data and child elements.
    • <!ELEMENT element (#PCDATA|child1|child2|…)*>
    • The parent element can contain character data or any number of the specified child elements, or it can contain no content at all.
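
As a rough illustration of the five content types working together, the following sketch declares one element of each kind. The memo vocabulary (memo, subject, separator, body, em, attachment) is hypothetical and chosen only for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- Elements: memo may contain only these children, in this order -->
  <!ELEMENT memo (subject, separator, body, attachment)>
  <!-- Character data: text only -->
  <!ELEMENT subject (#PCDATA)>
  <!-- Empty: no content allowed -->
  <!ELEMENT separator EMPTY>
  <!-- Mixed: text interleaved with any number of em children -->
  <!ELEMENT body (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
  <!-- Any: any mix of text and declared elements -->
  <!ELEMENT attachment ANY>
]>
<memo>
  <subject>Budget</subject>
  <separator/>
  <body>Please read this <em>today</em>.</body>
  <attachment><em>draft</em> figures attached</attachment>
</memo>
```

Note that each declaration has white space between the element name and its content model, and that even an element declared as ANY may only contain child elements that are themselves declared.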


Types of Element Content

• The declaration

<!ELEMENT customer (phone)>
<!ELEMENT phone (#PCDATA)>

indicates that the customer element must contain exactly one child, named phone. The child element cannot be repeated with this declaration. The phone element can only contain character data.

• Valid XML markup could be:

<customer>
  <phone>02 3333 4444</phone>
</customer>

Sequences

• Sequences
  – Specify the order in which elements occur
  – Comma (,) is used as delimiter

<!ELEMENT element (child1, child2, …)>

<classroom>
  <teacher>1</teacher>
  <student>20</student>
</classroom>

• Defining the classroom element in a DTD:

<!ELEMENT classroom (teacher, student)>

• The order of the child elements in the XML file must match the order defined in the element declaration.
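
The ordering rule can be checked with a counterexample. This sketch reuses the classroom declaration from the slide and only swaps the two children. The declarations for teacher and student are assumed to be plain character data.

```xml
<?xml version="1.0"?>
<!DOCTYPE classroom [
  <!ELEMENT classroom (teacher, student)>
  <!ELEMENT teacher (#PCDATA)>
  <!ELEMENT student (#PCDATA)>
]>
<!-- Not valid: student appears before teacher, but the sequence
     (teacher, student) requires teacher first. -->
<classroom>
  <student>20</student>
  <teacher>1</teacher>
</classroom>
```

The document is still well formed; only a validating parser would be expected to flag the order.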

Sequences

<!ELEMENT chapters (chapter, chapter, chapter)>

• The above element declaration indicates that the element chapters must contain exactly three child elements named chapter.

Occurrence Indicators (Modifying Symbols)

• Occurrence indicators
  – Specify how often an element may occur
  – Plus sign (+) indicates one or more occurrences of the element
    <!ELEMENT album ( song+ )>
  – Asterisk (*) indicates zero or more occurrences of the element (optional and repeatable)
    <!ELEMENT library ( book* )>
  – Question mark (?) indicates zero or one occurrence of the element
    <!ELEMENT seat ( person? )>
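
The three indicators can be combined inside one sequence. The sketch below uses a hypothetical order vocabulary (order, item, note, coupon) that is not part of the lecture material.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- one or more items, then any number of notes, then at most one coupon -->
  <!ELEMENT order (item+, note*, coupon?)>
  <!ELEMENT item (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
  <!ELEMENT coupon (#PCDATA)>
]>
<order>
  <item>Keyboard</item>
  <item>Mouse</item>
  <note>Leave at the front desk</note>
  <!-- coupon is optional and omitted here -->
</order>
```

Removing both item elements would make the document invalid, because + demands at least one occurrence.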


Pipe Characters (Choice)

• Pipe characters (|)
  – Specify choices
  – Present a set of possible child elements
  – Syntax:
    <!ELEMENT element ( child|child )>
  – Example:
    <!ELEMENT dessert ( iceCream|pastry )>

DTD – Internal subset

<?xml version="1.0"?>
<!DOCTYPE book
[
<!ELEMENT book (title,author,chapters)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (#PCDATA)>
]>
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>
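
The choice syntax can be tried in a complete document as well. This sketch builds on the dessert example. The surrounding meal and main elements are assumptions added so that the file stands on its own.

```xml
<?xml version="1.0"?>
<!DOCTYPE meal [
  <!ELEMENT meal (main, dessert)>
  <!ELEMENT main (#PCDATA)>
  <!-- dessert holds exactly one of the two listed children -->
  <!ELEMENT dessert (iceCream | pastry)>
  <!ELEMENT iceCream (#PCDATA)>
  <!ELEMENT pastry (#PCDATA)>
]>
<meal>
  <main>Pasta</main>
  <dessert>
    <pastry>Cannoli</pastry>
  </dessert>
</meal>
```

A dessert containing both an iceCream and a pastry, or neither, would not be valid under this declaration.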

Declaring an external DTD

• The real power of XML comes from an external DTD that can be shared among many documents written by different authors.
• Each XML document can only be linked to one external DTD
• The DOCTYPE declaration for an external subset is:

<!DOCTYPE root SYSTEM "URL">

Or

<!DOCTYPE root SYSTEM "URL"
[
declarations
]>

• Where root is the name of the document's root element, URL is the location and name of the external DTD file, and declarations are the optional statements of the internal subset.

DTD – External Subset

book.xml:

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title>Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>

book.dtd:

<!ELEMENT book (title,author,chapters)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (#PCDATA)>


Internal/External subset precedence

• If a document contains both an internal and an external subset, the internal subset takes precedence over the external subset if there is a conflict between the two. This applies to attribute-list and entity declarations; an element type may be declared only once.
• This way, the external subset would define basic rules for all the documents, and the internal subset would define those rules specific to each document.

Combining an External and Internal DTD Subset

[Figure: how to combine an external and an internal DTD subset]
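
One way to picture the precedence rule is a document whose internal subset redeclares something from its external subset. In this sketch the file name report.dtd, its contents (shown in a comment) and the entity name state are all hypothetical. Entities themselves are covered later in this lecture.

```xml
<?xml version="1.0"?>
<!-- Assumed contents of the external subset, report.dtd (hypothetical file):

       <!ELEMENT report (heading, status)>
       <!ELEMENT heading (#PCDATA)>
       <!ELEMENT status (#PCDATA)>
       <!ENTITY state "draft">
-->
<!DOCTYPE report SYSTEM "report.dtd" [
  <!-- Internal subset: this declaration is read first, so it takes
       precedence over the external declaration of the same entity. -->
  <!ENTITY state "final">
]>
<report>
  <heading>Quarterly summary</heading>
  <status>&state;</status>
</report>
```

With a parser that reads the external subset, the status element should resolve to the internal value, final, while every other document sharing report.dtd keeps the external default.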

Attribute Declarations

• For a document to be valid, all the attributes associated with elements must also be declared.
  – You must add an attribute-list declaration to the document's DTD.
• Attribute Declaration:
  – Specifies all the attributes an element has
  – Uses the ATTLIST attribute-list declaration
    • Lists the names of all attributes associated with a specific element
    • Specifies the data type of the attribute
    • Indicates whether the attribute is required or optional
    • Provides a default value for the attribute, if necessary

Declaring Element Attributes

• The syntax to declare a list of attributes is:

<!ATTLIST element attribute1 type1 default1
                  attribute2 type2 default2
                  attribute3 type3 default3…>

  – element is the name of the element associated with the attributes,
  – attribute is the name of an attribute,
  – type is the attribute's data type, and
  – default indicates whether the attribute is required or implied, and whether it has a fixed or default value.
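
The general syntax can be filled in as follows. This is only a sketch: the attribute names custID, custType and country, and the default value, are invented for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE customer [
  <!ELEMENT customer (#PCDATA)>
  <!-- One ATTLIST can declare several attributes for the same element:
       name, then type, then default, repeated for each attribute. -->
  <!ATTLIST customer
            custID   ID                 #REQUIRED
            custType (home | business)  #IMPLIED
            country  CDATA              "Australia">
]>
<!-- country is omitted, so a validating parser supplies the default value -->
<customer custID="c100" custType="home">Jane Smith</customer>
```

The same three attributes could also be declared in three separate ATTLIST declarations; a single list simply keeps everything for one element together.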


Adding Attributes

book.xml:

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title isbn="0-22-4444">Web Applications</title>
  <author>John Doe</author>
  <chapters>
    <chapter>Introduction</chapter>
    <chapter>ASP</chapter>
    <chapter>XML</chapter>
  </chapters>
</book>

book.dtd:

<!ELEMENT book (title,author,chapters)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title isbn CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapters (chapter+)>
<!ELEMENT chapter (#PCDATA)>

Attribute Types

• Attribute types
  – Strings (CDATA)
    • No constraints on attribute values
      – Except that the literal value cannot contain < or &, or the quote character (' or ") that delimits it
  – Tokenized attributes
    • ID, IDREF, ENTITY and NMTOKEN (plus the plural forms IDREFS, ENTITIES and NMTOKENS)
  – Enumerated attributes
    • Most restrictive, limited to a set of possible values
    • The general form of an enumerated type is:

      attribute (value1 | value2 | value3 | …)

    • For example, the following declaration:

      <!ATTLIST Customer CustType (home | business) #IMPLIED>

    • restricts CustType to either "home" or "business"; the #IMPLIED default makes the attribute optional
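
To see how the attribute type constrains a value, the following sketch contrasts a valid and an invalid item. The catalog vocabulary and the attribute names code, format and comment are assumptions made for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog [
  <!ELEMENT catalog (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
            code     NMTOKEN          #REQUIRED
            format   (print | ebook)  #REQUIRED
            comment  CDATA            #IMPLIED>
]>
<catalog>
  <!-- Valid: code is a single name token, format is one of the listed values -->
  <item code="WA-2010" format="print" comment="Any text, even with spaces">Web Applications</item>
  <!-- Not valid: "two words" contains a space, and audio is not in the list -->
  <item code="two words" format="audio">XML Basics</item>
</catalog>
```

Both items are well formed; the second one should only be rejected by a validating parser.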
Attribute Tokens

[Figure: the seven attribute tokens]

<?xml version = "1.0"?>

<!-- Fig. 6.8: IDExample.xml -->
<!-- Example for ID and IDREF values of attributes -->

<!DOCTYPE bookstore [
   <!ELEMENT bookstore ( shipping+, book+ )>
   <!ELEMENT shipping ( duration )>
   <!-- Each shipping element has a unique identifier (shipID) -->
   <!ATTLIST shipping shipID ID #REQUIRED>
   <!ELEMENT book ( #PCDATA )>
   <!-- Attribute shippedBy points to a shipping element by matching its shipID attribute -->
   <!ATTLIST book shippedBy IDREF #IMPLIED>
   <!ELEMENT duration ( #PCDATA )>
]>

<bookstore>
   <shipping shipID = "s1">
      <duration>2 to 4 days</duration>
   </shipping>

   <shipping shipID = "s2">
      <duration>1 day</duration>
   </shipping>

   <!-- Declare book elements with attribute shippedBy -->
   <book shippedBy = "s2">
      Java How to Program 3rd edition.
   </book>

   <book shippedBy = "s2">
      C How to Program 3rd edition.
   </book>

   <book shippedBy = "s1">
      C++ How to Program 3rd edition.
   </book>
</bookstore>

Attribute Defaults

• Attribute defaults:
  – #REQUIRED:
    • The attribute must appear in the element
    • Document is not valid if the attribute is missing
  – #IMPLIED:
    • The attribute is optional.
  – #FIXED:
    • The attribute is optional, but if one is specified, it must match the default.
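
The attribute defaults can be compared side by side in one declaration. In this sketch the invoice element and its attributes are hypothetical; the fourth attribute shows the plain default value form mentioned in the ATTLIST syntax.

```xml
<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ELEMENT invoice (#PCDATA)>
  <!ATTLIST invoice
            number   CDATA  #REQUIRED
            dueDate  CDATA  #IMPLIED
            currency CDATA  #FIXED "AUD"
            status   CDATA  "unpaid">
]>
<!-- number must be present; dueDate may be omitted; currency, if written,
     must be "AUD"; status is reported as "unpaid" when omitted. -->
<invoice number="1042" currency="AUD">Consulting services</invoice>
```

Writing currency="USD" or leaving out number would make the document invalid, while omitting dueDate or status would not.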

Attribute Declarations (fig. 3.13)

Entities

• Entities are storage units for a document's content.
• The most fundamental entity is the XML document itself, known as the document entity.
• Entities can also refer to:
  – a text string
  – a DTD
  – an element or attribute declaration
  – an external file containing character or binary data


Entities cont

• Entities can be declared in a DTD. How to declare an entity depends on how it is classified.
• There are three factors involved in classifying entities:
  – The content of the entity
  – How the entity is constructed
  – Where the definition of the entity is located.

Working with Entities

[Figure: the three entity classifications]

General Parsed Entities

• General entities are declared in the DTD of a document. The syntax is:

<!ENTITY entity "value">

• Where entity is the name assigned to the entity and value is the general entity's value.
• For example, an entity named "Pixal" can be created to store a company's official name:

<!ENTITY Pixal "Pixal Digital Products">

• After an entity is declared, it can be referenced within the document's content by writing &entity;

<Title>This is the home page of &Pixal;</Title>

• This is interpreted as

<Title>This is the home page of Pixal Digital Products</Title>

General External Entities

• General entities can refer to values located in external files. The syntax is:

<!ENTITY entity SYSTEM "URL">

• For example, in the declaration:

<!ENTITY headlines SYSTEM "http://www.newsflash.com/stories.xml">

• An entity named "headlines" gets its value from the document stories.xml, located at http://www.newsflash.com/stories.xml
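
Placed in a complete document, the Pixal entity might be used as in the sketch below. The page and footer elements and the owner attribute are assumptions added so that the example is self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (Title, footer)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
  <!ATTLIST footer owner CDATA #IMPLIED>
  <!ENTITY Pixal "Pixal Digital Products">
]>
<page>
  <!-- A general entity reference starts with & and ends with ; -->
  <Title>This is the home page of &Pixal;</Title>
  <!-- References to internal entities also work inside attribute values -->
  <footer owner="&Pixal;">Copyright &Pixal;</footer>
</page>
```

External entities such as headlines can be referenced in element content in the same way, but not inside attribute values.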


Parameter Entities

• Parameter entities are used to store the content of a DTD. For internal parameter entities, the syntax is:

<!ENTITY % entity "value">

• where entity is the name of the parameter entity and value is a text string of the entity's value.
• For external parameter entities, the syntax is:

<!ENTITY % entity SYSTEM "URL">

• where entity is the name assigned to the parameter entity and URL is the location of the external file that holds its content.

Parameter Entities cont

• Parameter entity references can only be placed where a declaration would normally occur, such as an internal or external DTD.
• Parameter entities used with an internal DTD do not offer any time or effort savings. However, an external parameter entity can allow XML to use more than one DTD per document by combining declarations from multiple DTDs.
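
Combining declarations from multiple DTDs could look like the following sketch. The file names books.dtd and albums.dtd are hypothetical, and the example assumes that they declare the book and album elements respectively.

```xml
<?xml version="1.0"?>
<!DOCTYPE library [
  <!-- Each external parameter entity names one DTD file (hypothetical names) -->
  <!ENTITY % books  SYSTEM "books.dtd">
  <!ENTITY % albums SYSTEM "albums.dtd">
  <!-- Referencing the entities pulls in the declarations from both files -->
  %books;
  %albums;
  <!-- A declaration local to this document ties the two vocabularies together -->
  <!ELEMENT library (book | album)*>
]>
<library>
  <!-- book and album children would follow the rules declared in the two files -->
</library>
```

A parameter entity reference is written with % instead of &. Whether the external files are actually fetched depends on the parser; a nonvalidating parser is not required to read them.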

Using Parameter Entities to Combine Multiple DTDs

• Go to the DTD section of w3schools for some examples of DTDs:
  http://www.w3schools.com/dtd/dtd_examples.asp


Readings

• Carey: Tutorial 3 – Creating a valid XML Document
