Motivation
A DTD adds syntactic requirements on top of the well-formedness requirement
It helps eliminate errors when creating or editing XML documents
It clarifies the intended semantics
It simplifies the processing of XML documents
An Example
In an address book, where can a phone number appear?
Under <person>, under <name> or under both?
If we have to check for all possibilities, processing takes longer, and it may not be clear to whom a phone number belongs.
Exactly one name
At most one greeting
As many address lines as needed (in order)
As many tel and fax elements as needed

<greet> Dr. H. Simpson </greet>
<addr>1234 Springwater Road </addr>
<addr> Springfield USA, 98765 </addr>
<tel> (321) 786 2543 </tel>
<fax> (321) 786 2544 </fax>
<tel> (321) 786 2544 </tel>
(#PCDATA)
Internal means that the DTD and the XML document are in the same file
Regular Expressions
Each regular expression determines a corresponding finite-state automaton.
Let's start with a simpler example: A double
Another Example
name,address*,(tel | fax)*,email*
[Figure: finite-state automaton for the expression above, with transitions labeled name, address, tel, fax, and email]
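The correspondence between a content model and a regular expression can be tried out directly. The following is a minimal sketch, not a DTD validator: it assumes each child element name is encoded as the name followed by a space, and the function name `matches_content_model` is made up for illustration. Python's `re` module is a backtracking matcher rather than an explicit finite-state automaton, but it accepts the same sequences.

```python
import re

# Content model from the slide: name,address*,(tel | fax)*,email*
# Each child tag is encoded as "<tag> " so the model becomes an ordinary regex.
CONTENT_MODEL = re.compile(r"name (address )*((tel|fax) )*(email )*")


def matches_content_model(child_tags):
    """Return True if the sequence of child element names fits the model."""
    encoded = "".join(tag + " " for tag in child_tags)
    return CONTENT_MODEL.fullmatch(encoded) is not None


print(matches_content_model(["name", "address", "tel", "fax", "email"]))  # True
print(matches_content_model(["name", "email", "tel"]))  # False: tel after email
print(matches_content_model(["address", "tel"]))  # False: name is missing
```

The second call fails because the model fixes the order of the groups: once an email has been read, no tel or fax may follow.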
<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | ... )>
Suppose that there were many more fields!
There are n! different orders of n elements.
The size of such a declaration is not even polynomial in n.
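To get a feel for this growth, the sketch below generates the "any order" content model mechanically and prints n! for small n. The helper name `any_order_content_model` is an assumption made for this illustration.

```python
from itertools import permutations
from math import factorial


def any_order_content_model(fields):
    """Build a DTD content model that lists the fields in every possible order."""
    alternatives = ["(" + ", ".join(order) + ")" for order in permutations(fields)]
    return "( " + " | ".join(alternatives) + " )"


# Three fields already need 3! = 6 alternatives.
print("<!ELEMENT employee " + any_order_content_model(["name", "age", "ssn"]) + ">")

# Number of alternatives needed for n fields.
for n in range(1, 11):
    print(n, factorial(n))
```

With ten fields the declaration would already need 3628800 alternatives, which is why listing every order by hand does not scale.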
The dimension attribute is required
The accuracy attribute is optional
CDATA is the type of the attribute; it means character data, and the attribute may take any literal string as a value
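The required/optional distinction can be checked with a few lines of code. This is only a sketch under assumptions: the element name `height` and the declaration shown in the comment are hypothetical (the slide text above names only the two attributes), and the check is done by hand rather than by a real validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration (the element name "height" is an assumption):
#   <!ATTLIST height dimension CDATA #REQUIRED
#                    accuracy  CDATA #IMPLIED>
REQUIRED = {"dimension"}
OPTIONAL = {"accuracy"}


def attribute_errors(element):
    """Report missing required attributes and attributes that are not declared."""
    present = set(element.attrib)
    errors = []
    for name in sorted(REQUIRED - present):
        errors.append(f"missing required attribute: {name}")
    for name in sorted(present - REQUIRED - OPTIONAL):
        errors.append(f"undeclared attribute: {name}")
    return errors


print(attribute_errors(ET.fromstring('<height dimension="cm">180</height>')))
# []
print(attribute_errors(ET.fromstring('<height accuracy="2">180</height>')))
# ['missing required attribute: dimension']
```

Since both attributes have type CDATA, no check on the attribute values themselves is needed: any literal string is acceptable.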
value: the default value of the attribute if none is given
Recursive DTDs
<!DOCTYPE genealogy [
  <!ELEMENT genealogy (person*)>
  <!ELEMENT person (
      name,
      dateOfBirth,
      person,      -- mother
      person )>    -- father
  ...
]>
What is the problem with this? A parser does not notice it!
Each person should have a father and a mother. This leads to either infinite data or a person who is a descendant of herself.
If a person only has a father, how can you tell that he has a father and does not have a mother?
IDREF attribute: its value must be some other element's ID value in the document.
IDREFS attribute: its value is a set; each element of the set is the ID value of some other element in the document.
<person id="898" father="332" mother="336" children="982 984 986">
An Alternative Specification
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE family [
  <!ELEMENT family (person)*>
  <!ELEMENT person (name, mother?, father?, children?)>
  <!ATTLIST person id ID #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT mother EMPTY>
  <!ATTLIST mother idref IDREF #REQUIRED>
  <!ELEMENT father EMPTY>
  <!ATTLIST father idref IDREF #REQUIRED>
  <!ELEMENT children EMPTY>
  <!ATTLIST children idrefs IDREFS #REQUIRED>
]>
Similarly for all the values of an IDREFS attribute.
ID, IDREF and IDREFS attributes are not typed.
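The reference constraints can be sketched as a small hand-written check. The code below is not a validating parser: it assumes a document shaped like the family DTD with the attribute names `id`, `idref` and `idrefs`, and the sample data and the function name `reference_errors` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data; "bart" is referenced but never declared as an ID.
DOC = """
<family>
  <person id="lisa"><name>Lisa</name><mother idref="marge"/><father idref="homer"/></person>
  <person id="marge"><name>Marge</name><children idrefs="lisa bart"/></person>
  <person id="homer"><name>Homer</name><children idrefs="lisa"/></person>
</family>
"""


def reference_errors(root):
    """Check that IDs are unique and that every IDREF/IDREFS value is some ID."""
    errors = []
    ids = set()
    for person in root.iter("person"):
        pid = person.get("id")
        if pid in ids:
            errors.append(f"duplicate ID: {pid}")
        ids.add(pid)
    for tag in ("mother", "father"):
        for ref in root.iter(tag):
            if ref.get("idref") not in ids:
                errors.append(f"dangling IDREF value: {ref.get('idref')}")
    for ref in root.iter("children"):
        for value in ref.get("idrefs", "").split():
            if value not in ids:
                errors.append(f"dangling IDREFS value: {value}")
    return errors


print(reference_errors(ET.fromstring(DOC)))
# ['dangling IDREFS value: bart']
```

The check only asks whether a referenced value is the ID of some element. It cannot require that a mother reference points to a person, which is what "not typed" means here.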
or external
A start tag does not have two occurrences of the same attribute
Valid Documents
A well-formed XML document is valid if it conforms to its DTD, that is,
The document conforms to the regular-expression grammar,
The types of attributes are correct, and
The constraints on references are satisfied