An internal Document Type Definition (DTD), if present, appears near the end of the prolog of the document entity.
[28]   doctypedecl     ::=   '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | DeclSep)* ']' S?)? '>'
[75]   ExternalID      ::=   'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[11]   SystemLiteral   ::=   ('"' [^"]* '"') | ("'" [^']* "'")
[12]   PubidLiteral    ::=   '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13]   PubidChar       ::=   #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
The syntax ensures that, between the markup declarations of the internal DTD, the document entity can contain only parameter entity references and white space:
[28a]  DeclSep         ::=   PEReference | S
[29]   markupdecl      ::=   elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
A SystemLiteral is a URI reference, which may be relative; some processors resolve relative references in unexpected ways. Characters that are not allowed in a URI are escaped as %xx, where xx are hexadecimal digits, with one escape for each byte of the UTF-8 encoding. If a SystemLiteral is used in DOCTYPE, it must reference an extSubset.
If a PEReference, i.e., %name;, is used as DeclSep, its replacement text must also be an extSubset:
[30]   extSubset       ::=   TextDecl? extSubsetDecl
[31]   extSubsetDecl   ::=   ( markupdecl | conditionalSect | DeclSep)*
Although the external identifier is written before the internal DTD in the DOCTYPE declaration, the internal DTD is considered to logically precede the external DTD. A conditionalSect may be used only in the external DTD; there, a PEReference may also be used inside a markupdecl.
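As an illustration, a document type declaration that combines an external and an internal DTD might look like the following sketch. The file names book.dtd and common.ent, the entity names, and the element name book are hypothetical and only serve to show where each production applies.

```xml
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!-- internal DTD: logically read before the external book.dtd -->
  <!ENTITY publisher "Example Press">
  <!-- declares an external parameter entity (hypothetical file) -->
  <!ENTITY % common SYSTEM "common.ent">
  <!-- DeclSep: a parameter entity reference between declarations -->
  %common;
]>
<book>&publisher;</book>
```

If book.dtd also declared an entity named publisher, the internal declaration would be the one in effect, because it is encountered first.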
There is a rudimentary mechanism to include or exclude parts of an external DTD:
[61]   conditionalSect      ::=   includeSect | ignoreSect
[62]   includeSect          ::=   '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'
[63]   ignoreSect           ::=   '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'
[64]   ignoreSectContents   ::=   Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65]   Ignore               ::=   Char* - (Char* ('<![' | ']]>') Char*)
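An IGNORE section can contain an INCLUDE section and vice versa; everything inside an IGNORE section is ignored, including a nested INCLUDE section. A conditional section must either be contained completely in the replacement text of a parameter entity or not be part of it at all.
IGNORE and INCLUDE are typically controlled using parameter entities: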
<!ENTITY % draft 'INCLUDE' >
<!ENTITY % final 'IGNORE' >
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>
<![%final;[
<!ELEMENT book (title, body, supplements?)>
]]>
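Because the internal DTD logically precedes the external DTD and the first declaration of an entity is the binding one, a document can flip these switches itself. The sketch below assumes that the declarations above are stored in a hypothetical external DTD named book.dtd; the child elements title and body are assumed to be declared there as well.

```xml
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!-- these declarations are read first, so they override
       the defaults for draft and final in book.dtd -->
  <!ENTITY % draft 'IGNORE'>
  <!ENTITY % final 'INCLUDE'>
]>
<book>
  <title>Final version</title>
  <body>No comments element is allowed here.</body>
</book>
```

With this internal DTD only the second declaration of book in book.dtd takes effect, so the element is declared once, as required.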
An elementdecl defines the element type for an element Name (generic identifier), i.e., it defines the permitted content of elements with that name. An element type may be declared only once.
[45]   elementdecl   ::=   '<!ELEMENT' S Name S contentspec S? '>'
[46]   contentspec   ::=   'EMPTY' | 'ANY' | Mixed | children
[47]   children      ::=   (choice | seq) ('?' | '*' | '+')?
[48]   cp            ::=   (Name | choice | seq) ('?' | '*' | '+')?
[49]   choice        ::=   '(' S? cp ( S? '|' S? cp )+ S? ')'
[50]   seq           ::=   '(' S? cp ( S? ',' S? cp )* S? ')'
[51]   Mixed         ::=   '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' | '(' S? '#PCDATA' S? ')'
EMPTY means that the element must be empty (it can but does not have to be represented using EmptyElemTag). ANY means that the element can contain character data and arbitrary elements, provided the element types have been declared.
Mixed permits text (parsed character data) combined with elements. Unfortunately, a DTD cannot control the number or the sequence of the elements in this case.
<!ELEMENT text ( #PCDATA ) >
<!ELEMENT mixed ( #PCDATA | a | b | c )* >
A variant of EBNF is used to describe nested elements. Sequences employ commas, alternatives employ vertical bars. Alternatives and sequences must be enclosed in parentheses.
<!ELEMENT nest (( a, b ) | ( c, d ))+ >
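To make the content model concrete, here is a small, self-contained document that should be valid against the declaration of nest above. Declaring a, b, c, and d as EMPTY is an assumption made only to keep the sketch short.

```xml
<?xml version="1.0"?>
<!DOCTYPE nest [
  <!ELEMENT nest (( a, b ) | ( c, d ))+ >
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
  <!ELEMENT d EMPTY>
]>
<!-- three repetitions of the group: (a, b), then (c, d), then (a, b) -->
<nest><a/><b/><c/><d/><a/><b/></nest>
```

By contrast, content such as an a followed directly by a c would not match, because each repetition must be a complete a, b pair or a complete c, d pair.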
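Ambiguity is possible (but should not be allowed by a validating processor). In the following declaration, an n in the content could match either the first or the second n of the content model: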
<!ELEMENT bits (( n, e )*, (n, e )*) >
An AttlistDecl declares the attributes that an element type may have; in a document they can be specified in any order. The possible values can be restricted to some degree and there can be defaults. If there is more than one declaration for an attribute, the first one takes precedence:
[52]   AttlistDecl      ::=   '<!ATTLIST' S Name AttDef* S? '>'
[53]   AttDef           ::=   S Name S AttType S DefaultDecl
[54]   AttType          ::=   StringType | TokenizedType | EnumeratedType
[55]   StringType       ::=   'CDATA'
[56]   TokenizedType    ::=   'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
[57]   EnumeratedType   ::=   NotationType | Enumeration
[58]   NotationType     ::=   'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
[59]   Enumeration      ::=   '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
An attribute value can be a string, one or more unique or defined names, or a name from a list.
ID means that the attribute value must be a name that is unique among all ID values in the document. An element type can have only a single attribute of type ID.
IDREF and IDREFS are types for attributes which reference attributes with type ID, i.e., they are cross-references within a document.
ENTITY and ENTITIES are types whose values name unparsed entities; each such entity must be declared in the DTD with a notation.
NMTOKEN and NMTOKENS are a name token and a list of name tokens, respectively. Name tokens have a slightly more permissive definition than names, because any name character may come first:
[7]    Nmtoken    ::=   (NameChar)+
[8]    Nmtokens   ::=   Nmtoken (S Nmtoken)*
An EnumeratedType restricts the value to one of the names listed in the declaration: notation names for a NotationType, name tokens for an Enumeration.
Every attribute must have a default provision:
[60]   DefaultDecl   ::=   '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
#REQUIRED means that the attribute must be specified. #IMPLIED means that the attribute can be omitted and that no default is supplied. An AttValue is a default value that is used if the attribute is not specified explicitly; #FIXED means that the attribute, if specified, must have exactly this default value.
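The attribute types and default provisions can be seen together in a short sketch. The element names list and item and the attribute names id, see, status, and lang are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE list [
  <!ELEMENT list (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    id      ID               #REQUIRED
    see     IDREF            #IMPLIED
    status  (draft | final)  "draft"
    lang    NMTOKEN          #FIXED "en">
]>
<list>
  <!-- status defaults to "draft", lang is supplied as "en" -->
  <item id="i1">first</item>
  <!-- see must match the value of some ID attribute, here i1 -->
  <item id="i2" see="i1" status="final">second</item>
</list>
```

A validating processor should reject an item without id, a second item with id="i1", a see value that matches no ID, or a lang value other than en.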
An EntityDecl, introduced by <!ENTITY, declares a name for a general entity or for a parameter entity.
[70]   EntityDecl    ::=   GEDecl | PEDecl
[71]   GEDecl        ::=   '<!ENTITY' S Name S EntityDef S? '>'
[72]   PEDecl        ::=   '<!ENTITY' S '%' S Name S PEDef S? '>'
[73]   EntityDef     ::=   EntityValue | (ExternalID NDataDecl?)
[74]   PEDef         ::=   EntityValue | ExternalID
[9]    EntityValue   ::=   '"' ([^%&"] | PEReference | Reference)* '"' | "'" ([^%&'] | PEReference | Reference)* "'"
[76]   NDataDecl     ::=   S 'NDATA' S Name
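General entities are referenced with &name; and are used outside of the DTD, in the content and in attribute values of the document. Unparsed entities cannot be referenced in this way; their names can only appear as values of attributes of type ENTITY or ENTITIES.
Parameter entities are referenced with %name; and only inside the DTD. They can reference external entities whose content, however, must fit syntactically at the point of reference.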
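The two kinds of entities can be contrasted in a small sketch; the entity names author, decls, and year and the element name note are hypothetical. Within the internal DTD a parameter entity reference may appear only between declarations, which is why decls contains a complete declaration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!-- GEDecl: internal general entity, referenced as &author; -->
  <!ENTITY author "A. N. Other">
  <!-- PEDecl: internal parameter entity holding one declaration -->
  <!ENTITY % decls "<!ENTITY year '2001'>">
  <!-- reference inside the DTD; declares the general entity year -->
  %decls;
]>
<note>Written by &author; in &year;.</note>
```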
A NotationDecl declares a name for a notation, i.e., for the format of an unparsed entity. The name can then be used as a value following NDATA in an entity definition; the name of the latter can then be used in ENTITY or ENTITIES attributes, i.e., as a reference across documents or files.
[82]   NotationDecl   ::=   '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
[83]   PublicID       ::=   'PUBLIC' S PubidLiteral
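The chain from notation to unparsed entity to attribute value might be sketched as follows. The notation name png, its public identifier, the entity name logo, the file name logo.png, and the element figure are all hypothetical; what a processor does with the notation is up to the application.

```xml
<?xml version="1.0"?>
<!DOCTYPE figure [
  <!ELEMENT figure EMPTY>
  <!-- NotationDecl using the PublicID form -->
  <!NOTATION png PUBLIC "image/png">
  <!-- unparsed entity: ExternalID followed by NDATA and the notation name -->
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
  <!-- the attribute value must be the name of an unparsed entity -->
  <!ATTLIST figure src ENTITY #REQUIRED>
]>
<figure src="logo"/>
```

Note that the attribute contains the bare entity name logo, not a reference of the form &logo;.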