Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined by the W3C's XML 1.0 Specification[2] and by several other related specifications,[3] all of which are free open standards.[4]

The design goals of XML emphasize simplicity, generality and usability across the Internet.[5] It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures[6] such as those used in web services. Several schema systems exist to aid in the definition of XML-based languages, while many application programming interfaces (APIs) have been developed to aid the processing of XML data.

The material in this section is based on the XML Specification. This is not an exhaustive list of all the constructs that appear in XML; it provides an introduction to the key constructs most often encountered in day-to-day use.

(Unicode) character: By definition, an XML document is a string of characters. Almost every legal Unicode character may appear in an XML document.

Processor and application: The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must do and not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to colloquially as an XML parser.

Markup and content: The characters making up an XML document are divided into markup and content, which may be distinguished by the application of simple syntactic rules. Generally, strings that constitute markup either begin with the character < and end with a >, or they begin with the character & and end with a ;. Strings of characters that are not markup are content. However, in a CDATA section, the delimiters <![CDATA[ and ]]> are classified as markup, while the text between them is classified as content. In addition, whitespace before and after the outermost element is classified as markup.

Tag: A markup construct that begins with < and ends with >. Tags come in three flavors:
- start-tags; for example: <section>
- end-tags; for example: </section>
- empty-element tags; for example: <line-break />

Element: A logical document component which either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content, and may contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. Another is <line-break />.

Attribute: A markup construct consisting of a name/value pair that exists within a start-tag or empty-element tag. In the following example the element img has two attributes, src and alt: <img src="madonna.jpg" alt='Foligno Madonna, by Raphael' />. Another example would be <step number="3">Connect A to B.</step>, where the name of the attribute is "number" and the value is "3". An XML attribute can only have a single value and each attribute can appear at most once on each element. In the common situation where a list of multiple values is desired, this must be done by encoding the list into a well-formed XML attribute[note 1] with some format beyond what XML defines itself. Usually this is either a comma or semi-colon delimited list or, if the individual values are known not to contain spaces,[note 2] a space-delimited list can be used. An example is <div class="inner greeting-box" >Hello!</div>, where the attribute "class" has both the value "inner greeting-box" and also indicates the two CSS class names "inner" and "greeting-box".

XML declaration: XML documents may begin by declaring some information about themselves, as in the following example: <?xml version="1.0" encoding="UTF-8"?>

Characters and escaping

XML documents consist entirely of characters from the Unicode repertoire.
Except for a small number of specifically excluded control characters, any character defined by Unicode may appear within the content of an XML document. XML includes facilities for identifying the encoding of the Unicode characters that make up the document, and for expressing characters that, for one reason or another, cannot be used directly.

Valid characters

Unicode code points in the following ranges are valid in XML 1.0 documents:[10]
- U+0009, U+000A, U+000D: these are the only C0 controls accepted in XML 1.0;
- U+0020–U+D7FF, U+E000–U+FFFD: this excludes some (not all) non-characters in the BMP (all surrogates, U+FFFE and U+FFFF are forbidden);
- U+10000–U+10FFFF: this includes all code points in supplementary planes, including non-characters.

XML 1.1[11] extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009, U+000A, U+000D, and U+0085 by requiring them to be written in escaped form (for example U+0001 must be written as &#x01; or its equivalent). In the case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected.

The code point U+0000 is the only control character that is not permitted in either an XML 1.0 or an XML 1.1 document.

Encoding detection

The Unicode character set can be encoded into bytes for storage or transmission in a variety of different ways, called "encodings". Unicode itself defines encodings that cover the entire repertoire; well-known ones include UTF-8 and UTF-16.[12] There are many other text encodings that predate Unicode, such as ASCII and ISO/IEC 8859; their character repertoires in almost every case are subsets of the Unicode character set.

XML allows the use of any of the Unicode-defined encodings, and any other encodings whose characters also appear in Unicode.
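
To illustrate the idea that one sequence of Unicode characters can be stored as different byte sequences, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element name greeting and the sample text are made up for the example, and whether a given parser accepts encodings beyond UTF-8 and UTF-16 depends on the parser.

```python
import xml.etree.ElementTree as ET

TEXT = "Jörg"  # sample text, chosen only for illustration


def encode_document(encoding: str) -> bytes:
    """Serialize the same small document in the given encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><greeting>{TEXT}</greeting>'
    return doc.encode(encoding)


utf8_bytes = encode_document("UTF-8")
utf16_bytes = encode_document("UTF-16")       # Python prepends a byte order mark
latin1_bytes = encode_document("ISO-8859-1")

# The byte sequences differ, but each one parses back to the same characters.
decoded = [ET.fromstring(data).text for data in (utf8_bytes, utf16_bytes, latin1_bytes)]
```

In this sketch the parser is handed raw bytes rather than a decoded string, so it has to work out the encoding from the document itself.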
XML also provides a mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used.[13] Encodings other than UTF-8 and UTF-16 will not necessarily be recognized by every XML parser.

Escaping

XML provides escape facilities for including characters which are problematic to include directly. For example:
- The characters "<" and "&" are key syntax markers and may never appear in content outside a CDATA section.[14]
- Some character encodings support only a subset of Unicode. For example, it is legal to encode an XML document in ASCII, but ASCII lacks code points for Unicode characters such as "é".
- It might not be possible to type the character on the author's machine.
- Some characters have glyphs that cannot be visually distinguished from other characters: examples are
  - non-breaking space (&#xa0;) " " compare space (&#x20;) " "
  - Cyrillic Capital Letter A (&#x410;) "А" compare Latin Capital Letter A (&#x41;) "A"

There are five predefined entities:
- &lt; represents "<"
- &gt; represents ">"
- &amp; represents "&"
- &apos; represents '
- &quot; represents "

All permitted Unicode characters may be represented with a numeric character reference. Consider the Chinese character "中", whose numeric code in Unicode is hexadecimal 4E2D, or decimal 20,013. A user whose keyboard offers no method for entering this character could still insert it in an XML document encoded either as &#20013; or &#x4e2d;. Similarly, the string "I <3 Jörg" could be encoded for inclusion in an XML document as "I &lt;3 J&#xF6;rg".

"&#0;" is not permitted, however, because the null character is one of the control characters excluded from XML, even when using a numeric character reference.[15] An alternative encoding mechanism such as Base64 is needed to represent such characters.

Comments

Comments may appear anywhere in a document outside other markup. Comments cannot appear before the XML declaration. Comments start with "<!--" and end with "-->".
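
As a rough illustration of the escaping facilities and comment delimiters described above, the following Python sketch uses only the standard library: xml.sax.saxutils.escape for the predefined entities and xml.etree.ElementTree to resolve character references. The element names c, r and v and the sample strings are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entities: escape() rewrites &, < and > in character data.
escaped = escape("I <3 Jörg & co")

# Decimal and hexadecimal character references name the same code point.
han = ET.fromstring("<c>&#20013;&#x4e2d;</c>").text

# A reference to the null character is rejected by the parser.
try:
    ET.fromstring("<c>&#0;</c>")
    null_allowed = True
except ET.ParseError:
    null_allowed = False

# Markup-like text inside a comment needs no escaping; the default
# ElementTree parser simply discards the comment.
root = ET.fromstring("<r><!--no need to escape <code> & such--><v>ok</v></r>")
children = [child.tag for child in root]
```

Here escape() is applied to text before it is placed in a document, while the parser undoes character references when reading one back.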
For compatibility with SGML, the string "--" (double-hyphen) is not allowed inside comments;[16] this means comments cannot be nested. The ampersand has no special significance within comments, so entity and character references are not recognized as such, and there is no way to represent characters outside the character set of the document encoding.

An example of a valid comment: "<!--no need to escape <code> & such in comments-->"

International use

XML 1.0 (Fifth Edition) and XML 1.1 support the direct use of almost any Unicode character in element names, attributes, comments, character data, and processing instructions (other than the ones that have special symbolic meaning in XML itself, such as the less-than sign, "<"). The following is a well-formed XML document including Chinese, Armenian and Cyrillic characters:

    <?xml version="1.0" encoding="UTF-8"?>
    <俄语 լեզու="ռուսերեն">данные</俄语>

Well-formedness and error-handling

The XML specification defines an XML document as a well-formed text, meaning that it satisfies a list of syntax rules provided in the specification. Some key points in the fairly lengthy list include:
- The document contains only properly encoded legal Unicode characters.
- None of the special syntax characters such as < and & appear except when performing their markup-delineation roles.
- The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.
- The element tags are case-sensitive; the beginning and end tags must match exactly. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.
- A single "root" element contains all the other elements.

The definition of an XML document excludes texts that contain violations of well-formedness rules; they are simply not XML. An XML processor that encounters such a violation is required to report such errors and to cease normal processing. This policy, occasionally referred to as "draconian error handling," stands in notable contrast to the behavior of programs that process HTML, which are designed to produce a reasonable result even in the presence of severe markup errors.[17] XML's policy in this area has been criticized as a violation of Postel's law ("Be conservative in what you send; be liberal in what you accept").[18]

The XML specification defines a valid XML document as a well-formed XML document which also conforms to the rules of a Document Type Definition (DTD).

Schemas and validation

In addition to being well-formed, an XML document may be valid. This means that it contains a reference to a Document Type Definition (DTD), and that its elements and attributes are declared in that DTD and follow the grammatical rules for them that the DTD specifies.

XML processors are classified as validating or non-validating depending on whether or not they check XML documents for validity. A processor that discovers a validity error must be able to report it, but may continue normal processing.

A DTD is an example of a schema or grammar. Since the initial publication of XML 1.0, there has been substantial work in the area of schema languages for XML. Such schema languages typically constrain the set of elements that may be used in a document, which attributes may be applied to them, the order in which they may appear, and the allowable parent/child relationships.

Document Type Definition

The oldest schema language for XML is the Document Type Definition (DTD), inherited from SGML.

DTDs have the following benefits:
- DTD support is ubiquitous due to its inclusion in the XML 1.0 standard.
- DTDs are terse compared to element-based schema languages and consequently present more information in a single screen.
- DTDs allow the declaration of standard public entity sets for publishing characters.
- DTDs define a document type rather than the types used by a namespace, thus grouping all constraints for a document in a single collection.

DTDs have the following limitations:
- They have no explicit support for newer features of XML, most importantly namespaces.
- They lack expressiveness. XML DTDs are simpler than SGML DTDs and there are certain structures that cannot be expressed with regular grammars. DTDs only support rudimentary datatypes.
- They lack readability. DTD designers typically make heavy use of parameter entities (which behave essentially as textual macros), which make it easier to define complex grammars, but at the expense of clarity.
- They use a syntax based on regular expression syntax, inherited from SGML, to describe the schema. Typical XML APIs such as SAX do not attempt to offer applications a structured representation of the syntax, so it is less accessible to programmers than an element-based syntax may be.

Two peculiar features that distinguish DTDs from other schema types are the syntactic support for embedding a DTD within XML documents and for defining entities, which are arbitrary fragments of text and/or markup that the XML processor inserts in the DTD itself and in the XML document wherever they are referenced, like character escapes.

DTD technology is still used in many applications because of its ubiquity.

XML Schema

A newer schema language, described by the W3C as the successor of DTDs, is XML Schema, often referred to by the initialism for XML Schema instances, XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages. They use a rich datatyping system and allow for more detailed constraints on an XML document's logical structure.
XSDs also use an XML-based format, which makes it possible to use ordinary XML tools to help process them.

An xs:schema element that defines a schema:

    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    </xs:schema>

RELAX NG

RELAX NG was initially specified by OASIS and is now also an ISO/IEC International Standard (as part of DSDL). RELAX NG schemas may be written in either an XML-based syntax or a more compact non-XML syntax; the two syntaxes are isomorphic, and James Clark's conversion tool, 'Trang', can convert between them without loss of information. RELAX NG has a simpler definition and validation framework than XML Schema, making it easier to use and implement. It also has the ability to use datatype framework plug-ins; a RELAX NG schema author, for example, can require values in an XML document to conform to definitions in XML Schema Datatypes.

Schematron

Schematron is a language for making assertions about the presence or absence of patterns in an XML document. It typically uses XPath expressions.

ISO DSDL and other schema languages

The ISO DSDL (Document Schema Description Languages) standard brings together a comprehensive set of small schema languages, each targeted at specific problems. DSDL includes RELAX NG full and compact syntax, Schematron assertion language, and languages for defining datatypes, character repertoire constraints, renaming and entity expansion, and namespace-based routing of document fragments to different validators. DSDL schema languages do not have the vendor support of XML Schemas yet, and are to some extent a grassroots reaction of industrial publishers to the lack of utility of XML Schemas for publishing.

Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide the infoset augmentation facility and attribute defaults. RELAX NG and Schematron intentionally do not provide these.

Related specifications

A cluster of specifications closely related to XML has been developed, starting soon after the initial publication of XML 1.0. It is frequently the case that the term "XML" is used to refer to XML together with one or more of these other technologies which have come to be seen as part of the XML core.
- XML Namespaces enable the same document to contain XML elements and attributes taken from different vocabularies, without any naming collisions occurring. Although XML Namespaces are not part of the XML specification itself, virtually all XML software also supports XML Namespaces.
- XML Base defines the xml:base attribute, which may be used to set the base for resolution of relative URI references within the scope of a single XML element.
- The XML Information Set or XML infoset describes an abstract data model for XML documents in terms of information items. The infoset is commonly used in the specifications of XML languages, for convenience in describing constraints on the XML constructs those languages allow.
- xml:id Version 1.0 asserts that an attribute named xml:id functions as an "ID attribute" in the sense used in a DTD.
- XPath defines a syntax named XPath expressions which identifies one or more of the internal components (elements, attributes, and so on) included in an XML document. XPath is widely used in other core-XML specifications and in programming libraries for accessing XML-encoded data.
- XSLT is a language with an XML-based syntax that is used to transform XML documents into other XML documents, HTML, or other, unstructured formats such as plain text or RTF. XSLT is very tightly coupled with XPath, which it uses to address components of the input XML document, mainly elements and attributes.
- XSL Formatting Objects, or XSL-FO, is a markup language for XML document formatting which is most often used to generate PDFs.
- XQuery is an XML-oriented query language strongly rooted in XPath and XML Schema. It provides methods to access, manipulate and return XML, and is mainly conceived as a query language for XML databases.
- XML Signature defines syntax and processing rules for creating digital signatures on XML content.
- XML Encryption defines syntax and processing rules for encrypting XML content.

Some other specifications conceived as part of the "XML Core" have failed to find wide adoption, including XInclude, XLink, and XPointer.

Programming interfaces

The design goals of XML include, "It shall be easy to write programs which process XML documents."[5] Despite this, the XML specification contains almost no information about how programmers might go about doing such processing. The XML Infoset specification provides a vocabulary to refer to the constructs within an XML document, but also does not provide any guidance on how to access this information. A variety of APIs for accessing XML have been developed and used, and some have been standardized.

Existing APIs for XML processing tend to fall into these categories:
- Stream-oriented APIs accessible from a programming language, for example SAX and StAX.
- Tree-traversal APIs accessible from a programming language, for example DOM.
- XML data binding, which provides an automated translation between an XML document and programming-language objects.
- Declarative transformation languages such as XSLT and XQuery.

Stream-oriented facilities require less memory and, for certain tasks which are based on a linear traversal of an XML document, are faster and simpler than other alternatives. Tree-traversal and data-binding APIs typically require the use of much more memory, but are often found more convenient for use by programmers; some include declarative retrieval of document components via the use of XPath expressions.

XSLT is designed for declarative description of XML document transformations, and has been widely implemented both in server-side packages and Web browsers.
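
The difference between the stream-oriented and tree-traversal styles can be sketched with Python's standard library, which ships a SAX implementation (xml.sax) and a tree API (xml.etree.ElementTree). The sample document, the class name StepCollector and the element name step are invented for this illustration; it is a sketch of the two styles, not a recommendation of either module.

```python
import xml.sax
import xml.etree.ElementTree as ET

DOC = b"<steps><step number='1'>Connect A to B.</step><step number='2'>Test.</step></steps>"


class StepCollector(xml.sax.ContentHandler):
    """Stream style: react to parser events and track position by hand."""

    def __init__(self):
        super().__init__()
        self.steps = []
        self._in_step = False

    def startElement(self, name, attrs):
        if name == "step":
            self._in_step = True
            self.steps.append([attrs["number"], ""])

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._in_step:
            self.steps[-1][1] += content

    def endElement(self, name):
        if name == "step":
            self._in_step = False


handler = StepCollector()
xml.sax.parseString(DOC, handler)
streamed = [tuple(step) for step in handler.steps]

# Tree style: build the whole document in memory, then navigate it.
tree = ET.fromstring(DOC)
from_tree = [(step.get("number"), step.text) for step in tree.findall("step")]
```

Both approaches yield the same pairs here; the stream version never holds the whole document, while the tree version trades memory for simpler navigation.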
XQuery overlaps XSLT in its functionality, but is designed more for searching of large XML databases.

Simple API for XML

Simple API for XML (SAX) is a lexical, event-driven interface in which a document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document.

Pull parsing

Pull parsing[19] treats the document as a series of items which are read in sequence using the Iterator design pattern. This allows for writing of recursive-descent parsers in which the structure of the code performing the parsing mirrors the structure of the XML being parsed, and intermediate parsed results can be used and accessed as local variables within the methods performing the parsing, or passed down (as method parameters) into lower-level methods, or returned (as method return values) to higher-level methods. Examples of pull parsers include StAX in the Java programming language, XMLReader in PHP, ElementTree.iterparse in Python, System.Xml.XmlReader in the .NET Framework, and the DOM traversal API (NodeIterator and TreeWalker).

A pull parser creates an iterator that sequentially visits the various elements, attributes, and data in an XML document. Code which uses this iterator can test the current item (to tell, for example, whether it is a start or end element, or text), and inspect its attributes (local name, namespace, values of XML attributes, value of text, etc.), and can also move the iterator to the next item. The code can thus extract information from the document as it traverses it. The recursive-descent approach tends to lend itself to keeping data as typed local variables in the code doing the parsing, while SAX, for instance, typically requires a parser to manually maintain intermediate data within a stack of elements which are parent elements of the element being parsed. Pull-parsing code can be more straightforward to understand and maintain than SAX parsing code.

Document Object Model

The Document Object Model (DOM) is an interface-oriented application programming interface that allows for navigation of the entire document as if it were a tree of node objects representing the document's contents. A DOM document can be created by a parser, or can be generated manually by users (with limitations). Data types in DOM nodes are abstract; implementations provide their own programming language-specific bindings. DOM implementations tend to be memory intensive, as they generally require the entire document to be loaded into memory and constructed as a tree of objects before access is allowed.

Data binding

Another form of XML processing API is XML data binding, where XML data are made available as a hierarchy of custom, strongly typed classes, in contrast to the generic objects created by a Document Object Model parser. This approach simplifies code development, and in many cases allows problems to be identified at compile time rather than run-time. Example data binding systems include the Java Architecture for XML Binding (JAXB) and XML Serialization in .NET.[20]

XML as data type

XML has appeared as a first-class data type in other languages. The ECMAScript for XML (E4X) extension to the ECMAScript/JavaScript language explicitly defines two specific objects (XML and XMLList) for JavaScript, which support XML document nodes and XML node lists as distinct objects and use a dot-notation specifying parent-child relationships.[21] E4X is supported by the Mozilla 2.5+ browsers (though now deprecated) and Adobe Actionscript, but has not been adopted more universally.
Similar notations are used in Microsoft's LINQ implementation for Microsoft .NET 3.5 and above, and in Scala (which uses the Java VM). The open-source xmlsh application, which provides a Linux-like shell with special features for XML manipulation, similarly treats XML as a data type, using the <[ ]> notation.[22] The Resource Description Framework defines a data type rdf:XMLLiteral to hold wrapped, canonical XML.[23]

History

XML is an application profile of SGML (ISO 8879).[24]

The versatility of SGML for dynamic information display was understood by early digital media publishers in the late 1980s prior to the rise of the Internet.[25][26] By the mid-1990s some practitioners of SGML had gained experience with the then-new World Wide Web, and believed that SGML offered solutions to some of the problems the Web was likely to face as it grew. Dan Connolly added SGML to the list of W3C's activities when he joined the staff in 1995; work began in mid-1996 when Sun Microsystems engineer Jon Bosak developed a charter and recruited collaborators. Bosak was well connected in the small community of people who had experience both in SGML and the Web.[27]

XML was compiled by a working group of eleven members,[28] supported by a (roughly) 150-member Interest Group. Technical debate took place on the Interest Group mailing list and issues were resolved by consensus or, when that failed, majority vote of the Working Group. A record of design decisions and their rationales was compiled by Michael Sperberg-McQueen on December 4, 1997.[29] James Clark served as Technical Lead of the Working Group, notably contributing the empty-element "<empty />" syntax and the name "XML". Other names that had been put forward for consideration included "MAGMA" (Minimal Architecture for Generalized Markup Applications), "SLIM" (Structured Language for Internet Markup) and "MGML" (Minimal Generalized Markup Language). The co-editors of the specification were originally Tim Bray and Michael Sperberg-McQueen.
Halfway through the project Bray accepted a consulting engagement with Netscape, provoking vociferous protests from Microsoft. Bray was temporarily asked to resign the editorship. This led to intense dispute in the Working Group, eventually solved by the appointment of Microsoft's Jean Paoli as a third co-editor.

The XML Working Group never met face-to-face; the design was accomplished using a combination of email and weekly teleconferences. The major design decisions were reached in a short burst of intense work between August and November 1996,[30] when the first Working Draft of an XML specification was published.[31] Further design work continued through 1997, and XML 1.0 became a W3C Recommendation on February 10, 1998.

Sources

XML is a profile of an ISO standard, SGML, and most of XML comes from SGML unchanged. From SGML comes the separation of logical and physical structures (elements and entities), the availability of grammar-based validation (DTDs), the separation of data and metadata (elements and attributes), mixed content, the separation of processing from representation (processing instructions), and the default angle-bracket syntax. Removed was the SGML declaration (XML has a fixed delimiter set and adopts Unicode as the document character set).

Other sources of technology for XML were the Text Encoding Initiative (TEI), which defined a profile of SGML for use as a "transfer syntax"; and HTML, in which elements were synchronous with their resource, document character sets were separate from resource encoding, the xml:lang attribute was invented, and (like HTTP) metadata accompanied the resource rather than being needed at the declaration of a link. The Extended Reference Concrete Syntax (ERCS) project of the SPREAD (Standardization Project Regarding East Asian Documents) project of the ISO-related China/Japan/Korea Document Processing expert group was the basis of XML 1.0's naming rules; SPREAD also introduced hexadecimal numeric character references and the concept of references to make available all Unicode characters. To support ERCS, XML and HTML better, the SGML standard IS 8879 was revised in 1996 and 1998 with WebSGML Adaptations. The XML header followed that of ISO HyTime.

Ideas that developed during discussion which were novel in XML included the algorithm for encoding detection and the encoding header, the processing instruction target, the xml:space attribute, and the new close delimiter for empty-element tags. The notion of well-formedness as opposed to validity (which enables parsing without a schema) was first formalized in XML, although it had been implemented successfully in the Electronic Book Technology "Dynatext" software;[32] the software from the University of Waterloo New Oxford English Dictionary Project; the RISP LISP SGML text processor at Uniscope, Tokyo; the US Army Missile Command IADS hypertext system; Mentor Graphics Context; Interleaf and Xerox Publishing System.

Versions

There are two current versions of XML. The first (XML 1.0) was initially defined in 1998. It has undergone minor revisions since then, without being given a new version number, and is currently in its fifth edition, as published on November 26, 2008. It is widely implemented and still recommended for general use.

The second (XML 1.1) was initially published on February 4, 2004, the same day as XML 1.0 Third Edition,[33] and is currently in its second edition, as published on August 16, 2006. It contains features (some contentious) that are intended to make XML easier to use in certain cases.[34] The main changes are to enable the use of line-ending characters used on EBCDIC platforms, and the use of scripts and characters absent from Unicode 3.2. XML 1.1 is not very widely implemented and is recommended for use only by those who need its unique features.[35]

Prior to its fifth edition release, XML 1.0 differed from XML 1.1 in having stricter requirements for characters available for use in element and attribute names and unique identifiers: in the first four editions of XML 1.0 the characters were exclusively enumerated using a specific version of the Unicode standard (Unicode 2.0 to Unicode 3.2). The fifth edition substitutes the mechanism of XML 1.1, which is more future-proof but reduces redundancy. The approach taken in the fifth edition of XML 1.0 and in all editions of XML 1.1 is that only certain characters are forbidden in names, and everything else is allowed, in order to accommodate the use of suitable name characters in future versions of Unicode. In the fifth edition, XML names may contain characters in the Balinese, Cham, or Phoenician scripts among many others which have been added to Unicode since Unicode 3.2.[34]

Almost any Unicode code point can be used in the character data and attribute values of an XML 1.0 or 1.1 document, even if the character corresponding to the code point is not defined in the current version of Unicode. In character data and attribute values, XML 1.1 allows the use of more control characters than XML 1.0, but, for "robustness", most of the control characters introduced in XML 1.1 must be expressed as numeric character references (and #x7F through #x9F, which had been allowed in XML 1.0, are in XML 1.1 even required to be expressed as numeric character references[36]). Among the supported control characters in XML 1.1 are two line break codes that must be treated as whitespace. Whitespace characters are the only control codes that can be written directly.

There has been discussion of an XML 2.0, although no organization has announced plans for work on such a project. XML-SW (SW for skunkworks), written by one of the original developers of XML,[37] contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, integration of namespaces, XML Base and XML Information Set (infoset) into the base standard.

The World Wide Web Consortium also has an XML Binary Characterization Working Group doing preliminary research into use cases and properties for a binary encoding of the XML infoset.
The working group is not chartered to produce any official standards. Since XML is by definition text-based, ITU-T and ISO are using the name Fast Infoset for their own binary infoset to avoid confusion (see ITU-T Rec. X.891 | ISO/IEC 24824-1).

Criticism

XML and its extensions have regularly been criticized for verbosity and complexity.[38] Mapping the basic tree model of XML to type systems of programming languages or databases can be difficult, especially when XML is used for exchanging highly structured data between applications, which was not its primary design goal. Other criticisms attempt to refute the claim that XML is a self-describing language[39] (though the XML specification itself makes no such claim). JSON, YAML, and S-Expressions are frequently proposed as alternatives;[40] these formats focus on representing highly structured data rather than documents, which may contain both highly structured and relatively unstructured content.
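
The verbosity point can be made concrete with a small comparison. The following Python sketch serializes one made-up record as XML and as JSON using only the standard library; the record, its field names and the element name painting are invented for the example, and a single case like this says nothing about either format in general.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for this comparison.
record = {"name": "Foligno Madonna", "artist": "Raphael", "file": "madonna.jpg"}

# XML: one child element per field, each needing a start-tag and an end-tag.
root = ET.Element("painting")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

# JSON: the same record as a single object.
as_json = json.dumps(record)

# Character counts of the two serializations.
sizes = {"xml": len(as_xml), "json": len(as_json)}

# Both forms can be read back into the same dictionary.
from_xml = {child.tag: child.text for child in ET.fromstring(as_xml)}
from_json = json.loads(as_json)
```

For this record the XML form is longer because every field name appears twice, once in the start-tag and once in the end-tag.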