Committee Specification 3 December 2001
- This version:
- Committee Specification: 3 December 2001
- Previous versions:
- Committee Specification: 11 August 2001
Copyright © The Organization for the Advancement of Structured
Information Standards [OASIS] 2001. All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are included
on all such copies and derivative works. However, this document itself
may not be modified in any way, such as by removing the copyright notice
or references to OASIS, except as needed for the purpose of developing
OASIS specifications, in which case the procedures for copyrights
defined in the OASIS Intellectual Property Rights document must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Abstract
This is the definitive specification of RELAX NG, a simple schema
language for XML, based on [RELAX]
and [TREX]. A RELAX NG schema
specifies a pattern for the structure and content of an XML document. A
RELAX NG schema is itself an XML document.
This document specifies
- when an XML document is a correct RELAX NG schema
- when an XML document is valid with respect to a
correct RELAX NG schema
An XML document that is being validated with respect to a RELAX NG schema is
referred to as an instance.
The structure of this document is as follows. Section 2 describes the data model, which is the
abstraction of an XML document used throughout the rest of the document. Section 3 describes the
syntax of a RELAX NG schema; any correct RELAX NG schema must conform to this
syntax. Section 4
describes a sequence of transformations that are applied to simplify a RELAX NG
schema; applying the transformations also involves checking certain restrictions
that must be satisfied by a correct RELAX NG schema. Section 5 describes the syntax that results
from applying the transformations; this simple syntax is a subset of the full
syntax. Section 6 describes the
semantics of a correct RELAX NG schema that uses the simple syntax; the
semantics specify when an element is valid with respect to a RELAX NG schema.
Section 7 describes
restrictions in terms of the simple syntax; a correct RELAX NG schema must be
such that, after transformation into the simple form, it satisfies these
restrictions. Finally, Section
8 describes conformance requirements for RELAX NG validators.
A tutorial is available separately (see [Tutorial]).
RELAX NG deals with XML documents representing both schemas and instances through
an abstract data model. XML documents representing schemas and instances must
be well-formed in conformance with [XML
1.0] and must conform to the constraints of [XML Namespaces].
An XML document is represented by an element. An element consists of
- a name
- a context
- a set of attributes
- an ordered sequence of zero or more children; each
child is either an element or a non-empty string; the sequence never
contains two consecutive strings
A name consists of
- a string representing the namespace URI; the empty
string has special significance, representing the absence of any
namespace
- a string representing the local name; this string
matches the NCName production of [XML Namespaces]
A context consists of
- a base URI
- a namespace map; this maps prefixes to namespace
URIs, and also may specify a default namespace URI (as declared by the
xmlns attribute)
An attribute consists of
- a name
- a string representing the value
A string consists of a sequence of zero or more characters, where a character is
as defined in [XML 1.0].
The element for an XML document is constructed from an instance of the [XML Infoset] as follows. We use
the notation [x] to refer to the value of the x
property of an information item. An element is constructed from a document
information item by constructing an element from the [document element]. An
element is constructed from an element information item by constructing the name
from the [namespace name] and [local name], the context from the [base URI] and
[in-scope namespaces], the attributes from the [attributes], and the children
from the [children]. The attributes of an element are constructed from the
unordered set of attribute information items by constructing an attribute for
each attribute information item. The children of an element are constructed
from the list of child information items first by removing information items
other than element information items and character information items, and then
by constructing an element for each element information item in the list and a
string for each maximal sequence of character information items. An attribute is
constructed from an attribute information item by constructing the name from the
[namespace name] and [local name], and the value from the [normalized value].
When constructing the name of an element or attribute from the [namespace name]
and [local name], if the [namespace name] property is not present, then the name
is constructed from an empty string and the [local name]. A string is
constructed from a sequence of character information items by constructing a
character from the [character code] of each character information item.
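The construction above can be sketched in Python. This is a minimal illustration, not part of the specification: it assumes the standard xml.etree.ElementTree parser as a stand-in for an infoset, represents an element as a hypothetical (name, attributes, children) tuple, and omits the context (base URI and namespace map) because ElementTree does not retain it. The function names split_name and build are illustrative only.

```python
import xml.etree.ElementTree as ET


def split_name(tag):
    """Split an ElementTree '{uri}local' tag into (namespace URI, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (uri, local)
    # No namespace: the empty string represents the absence of a namespace.
    return ("", tag)


def build(elem):
    """Build a (name, attributes, children) tuple; the context is omitted."""
    attrs = {split_name(key): value for key, value in elem.attrib.items()}
    children = []
    if elem.text:  # keep only non-empty strings
        children.append(elem.text)
    for child in elem:
        children.append(build(child))
        if child.tail:
            children.append(child.tail)
    return (split_name(elem.tag), attrs, children)


# A small document with two namespaced, empty child elements.
doc = ('<foo><pre1:bar1 xmlns:pre1="http://www.example.com/n1"/>'
       '<pre2:bar2 xmlns:pre2="http://www.example.com/n2"/></foo>')
model = build(ET.fromstring(doc))
```

Because a string is only ever appended directly before or after an element, the children list never contains two consecutive strings, as the data model requires.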
It is possible for there to be multiple distinct infosets for a single XML
document. This is because XML parsers are not required to process all DTD
declarations or expand all external parsed general entities. Amongst these
multiple infosets, there is exactly one infoset for which [all declarations
processed] is true and which does not contain any unexpanded entity reference
information items. This is the infoset that is the basis for defining the RELAX
NG data model.
Suppose the document http://www.example.com/doc.xml is as follows:
<?xml version="1.0"?>
<foo><pre1:bar1 xmlns:pre1="http://www.example.com/n1"/><pre2:bar2
xmlns:pre2="http://www.example.com/n2"/></foo>
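The element representing this document has
- a name whose namespace URI is the empty string and whose local name is foo
- a context whose base URI is http://www.example.com/doc.xml
- an empty set of attributes
- a sequence of two children, both elements with no attributes and no children:
the first has a name with namespace URI http://www.example.com/n1 and local
name bar1; the second has a name with namespace URI http://www.example.com/n2
and local name bar2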
The following grammar summarizes the syntax of RELAX NG. Although we use a
notation based on the XML representation of a RELAX NG schema as a sequence of
characters, the grammar must be understood as operating at the data model level.
For example, although the syntax uses <text/>, an instance or
schema can use <text></text> instead, because they both
represent the same element at the data model level. All elements shown in the
grammar are qualified with the namespace URI:
http://relaxng.org/ns/structure/1.0
The symbols QName and NCName are defined in [XML Namespaces]. The anyURI symbol has the
same meaning as the anyURI datatype of [W3C XML Schema Datatypes]: it
indicates a string that, after escaping of disallowed values as described in
Section 5.4 of [XLink], is a URI reference
as defined in [RFC 2396] (as modified
by [RFC 2732]). The symbol string
matches any string.
In addition to the attributes shown explicitly, any element can have an
ns attribute and any element can have a datatypeLibrary
attribute. The ns attribute can have any value. The value of the
datatypeLibrary attribute must match the anyURI symbol as described
in the previous paragraph; in addition, it must not use the relative form of URI
reference and must not have a fragment identifier; as an exception to this, the
value may be the empty string.
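The constraints on the datatypeLibrary attribute can be approximated with a short check. This is a rough sketch under stated assumptions: it uses urllib.parse.urlsplit from the Python standard library, treats "has a scheme" as a proxy for "is not a relative reference", and does not perform the escaping step or full anyURI validation. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_datatype_library(value):
    """Approximate the datatypeLibrary constraints (no anyURI escaping check)."""
    if value == "":
        return True  # the empty string is explicitly allowed
    if "#" in value:
        return False  # a fragment identifier is not allowed
    # A relative reference has no scheme component.
    return urlsplit(value).scheme != ""
```

For instance, http://www.w3.org/2001/XMLSchema-datatypes passes this check, while a relative reference such as datatypes does not.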
Any element can also have foreign attributes in addition to the attributes shown
in the grammar. A foreign attribute is an attribute with a name whose namespace
URI is neither the empty string nor the RELAX NG namespace URI. Any element
that cannot have string children (that is, any element other than
value, param and name) may have foreign child
elements in addition to the child elements shown in the grammar. A foreign
element is an element with a name whose namespace URI is not the RELAX NG
namespace URI. There are no constraints on the relative position of foreign
child elements with respect to other child elements.
Any element can also have as children strings that consist entirely of whitespace
characters, where a whitespace character is one of #x20, #x9, #xD or #xA. There
are no constraints on the relative position of whitespace string children with
respect to child elements.
Leading and trailing whitespace is allowed for the value of each name,
type and combine attribute and for the content of each
name element.
pattern ::=
    <element name="QName"> pattern+ </element>
  | <element> nameClass pattern+ </element>
  | <attribute name="QName"> [pattern] </attribute>
  | <attribute> nameClass [pattern] </attribute>
  | <group> pattern+ </group>
  | <interleave> pattern+ </interleave>
  | <choice> pattern+ </choice>
  | <optional> pattern+ </optional>
  | <zeroOrMore> pattern+ </zeroOrMore>
  | <oneOrMore> pattern+ </oneOrMore>
  | <list> pattern+ </list>
  | <mixed> pattern+ </mixed>
  | <ref name="NCName"/>
  | <parentRef name="NCName"/>
  | <empty/>
  | <text/>
  | <value [type="NCName"]> string </value>
  | <data type="NCName"> param* [exceptPattern] </data>
  | <notAllowed/>
  | <externalRef href="anyURI"/>
  | <grammar> grammarContent* </grammar>
param ::=
    <param name="NCName"> string </param>
exceptPattern ::=
    <except> pattern+ </except>
grammarContent ::=
    start
  | define
  | <div> grammarContent* </div>
  | <include href="anyURI"> includeContent* </include>
includeContent ::=
    start
  | define
  | <div> includeContent* </div>
start ::=
    <start [combine="method"]> pattern </start>
define ::=
    <define name="NCName" [combine="method"]> pattern+ </define>
method ::=
    choice
  | interleave
nameClass ::=
    <name> QName </name>
  | <anyName> [exceptNameClass] </anyName>
  | <nsName> [exceptNameClass] </nsName>
  | <choice> nameClass+ </choice>
exceptNameClass ::=
    <except> nameClass+ </except>
Here is an example of a schema in the full syntax for the document in Section 2.1.
<?xml version="1.0"?>
<element name="foo"
xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:a="http://relaxng.org/ns/annotation/1.0"
xmlns:ex1="http://www.example.com/n1"
xmlns:ex2="http://www.example.com/n2">
<a:documentation>A foo element.</a:documentation>
<element name="ex1:bar1">
<empty/>
</element>
<element name="ex2:bar2">
<empty/>
</element>
</element>
The full syntax given in the previous section is transformed into a simpler
syntax by applying the following transformation rules in order. The effect must
be as if each rule was applied to all elements in the schema before the next
rule is applied. A transformation rule may also specify constraints that must
be satisfied by a correct schema. The transformation rules are applied at the
data model level. Before the transformations are applied, the schema is parsed
into an instance of the data model.
Foreign attributes and elements are removed.
Note
It is safe to remove xml:base attributes at this stage because
xml:base attributes are used in determining the [base URI]
of an element information item, which is in turn used to construct the
base URI of the context of an element. Thus, after a document has been
parsed into an instance of the data model, xml:base attributes
can be discarded.
For each element other than value and param, each child
that is a string containing only whitespace characters is removed.
Leading and trailing whitespace characters are removed from the value of each
name, type and combine attribute and from the
content of each name element.
4.3. datatypeLibrary attribute
The value of each datatypeLibrary attribute is transformed by
escaping disallowed characters as specified in Section 5.4 of [XLink].
For any data or value element that does not have a
datatypeLibrary attribute, a datatypeLibrary attribute
is added. The value of the added datatypeLibrary attribute is the
value of the datatypeLibrary attribute of the nearest ancestor
element that has a datatypeLibrary attribute, or the empty string
if there is no such ancestor. Then, any datatypeLibrary attribute
that is on an element other than data or value is
removed.
4.4. type attribute of value element
For any value element that does not have a type attribute,
a type attribute is added with value token and the value
of the datatypeLibrary attribute is changed to the empty
string.
The value of the href attribute on an externalRef or
include element is first transformed by escaping disallowed
characters as specified in Section 5.4 of [XLink]. The URI reference is then resolved into an absolute form
as described in section 5.2 of [RFC
2396] using the base URI from the context of the element that bears
the href attribute.
The value of the href attribute will be used to construct an element
(as specified in Section 2).
This must be done as follows. The URI reference consists of the URI itself
and an optional fragment identifier. The resource identified by the URI is
retrieved. The result is a MIME entity: a sequence of bytes labeled with a
MIME media type. The media type determines how an element is constructed
from the MIME entity and optional fragment identifier. When the media type
is application/xml or text/xml, the MIME entity must be
parsed as an XML document in accordance with the applicable RFC (at the time
of writing [RFC 3023]) and an
element constructed from the result of the parse as specified in Section 2. In particular,
the charset parameter must be handled as specified by the RFC. This
specification does not define the handling of media types other than
application/xml and text/xml. The href
attribute must not include a fragment identifier unless the registration of
the media type of the resource identified by the attribute defines the
interpretation of fragment identifiers for that media type.
Note
[RFC 3023] does not define the
interpretation of fragment identifiers for application/xml or
text/xml.
An externalRef element is transformed as follows. An element is
constructed using the URI reference that is the value of the href
attribute as specified in Section 4.5. This element must match the syntax for pattern. The
element is transformed by recursively applying the rules from this
subsection and from previous subsections of this section. This must not
result in a loop. In other words, the transformation of the referenced
element must not require the dereferencing of an externalRef
element with an href attribute with the same value.
Any ns attribute on the externalRef element is transferred
to the referenced element if the referenced element does not already have an
ns attribute. The externalRef element is then
replaced by the referenced element.
An include element is transformed as follows. An element is
constructed using the URI reference that is the value of the href
attribute as specified in Section 4.5. This element must be a grammar element,
matching the syntax for grammar.
This grammar element is transformed by recursively applying the
rules from this subsection and from previous subsections of this section.
This must not result in a loop. In other words, the transformation of the
grammar element must not require the dereferencing of an
include element with an href attribute with the same
value.
Define the components of an element to be the children of the element
together with the components of any div child elements. If the
include element has a start component, then the
grammar element must have a start component. If the
include element has a start component, then all
start components are removed from the grammar element.
If the include element has a define component, then the
grammar element must have a define component with the
same name. For every define component of the include
element, all define components with the same name are removed from
the grammar element.
The include element is transformed into a div element. The
attributes of the div element are the attributes of the
include element other than the href attribute. The
children of the div element are the grammar element (after
the removal of the start and define components described
by the preceding paragraph) followed by the children of the include
element. The grammar element is then renamed to div.
4.8. name attribute of element and attribute elements
The name attribute on an element or attribute
element is transformed into a name child element.
If an attribute element has a name attribute but no
ns attribute, then an ns="" attribute is added
to the name child element.
For any name, nsName or value element that does
not have an ns attribute, an ns attribute is added. The
value of the added ns attribute is the value of the ns
attribute of the nearest ancestor element that has an ns attribute,
or the empty string if there is no such ancestor. Then, any ns
attribute that is on an element other than name, nsName or
value is removed.
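The inheritance of the ns attribute described above can be sketched as a single top-down pass. This is an illustration only: it assumes the schema has been parsed with xml.etree.ElementTree rather than into the data model of Section 2, and the names RNG, TARGETS and propagate_ns are hypothetical.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"
# Only these elements keep an ns attribute after the transformation.
TARGETS = {RNG + "name", RNG + "nsName", RNG + "value"}


def propagate_ns(elem, inherited=""):
    """Push each ns value down to name, nsName and value; drop it elsewhere."""
    # Remove any ns attribute; fall back to the nearest ancestor's value.
    ns = elem.attrib.pop("ns", inherited)
    if elem.tag in TARGETS:
        elem.set("ns", ns)
    for child in elem:
        propagate_ns(child, ns)


schema = ET.fromstring(
    '<element xmlns="http://relaxng.org/ns/structure/1.0"'
    ' ns="http://www.example.com/n1">'
    '<name>foo</name><empty/></element>')
propagate_ns(schema)
```

After the call, the name child carries ns="http://www.example.com/n1", and neither the element nor the empty element has an ns attribute.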
Note
The value of the ns attribute is not transformed either
by escaping disallowed characters, or in any other way, because the
value of the ns attribute is compared against namespace URIs in
the instance, which are not subject to any transformation.
Note
Since include and externalRef elements are resolved
after datatypeLibrary attributes are added but before
ns attributes are added, ns attributes are inherited
into external schemas but datatypeLibrary attributes are
not.
For any name element containing a prefix, the prefix is removed and
an ns attribute is added replacing any existing ns
attribute. The value of the added ns attribute is the value to
which the namespace map of the context of the name element maps the
prefix. The context must have a mapping for the prefix.
Each div element is replaced by its children.
4.12. Number of child elements
A define, oneOrMore, zeroOrMore,
optional, list or mixed element is transformed so
that it has exactly one child element. If it has more than one child
element, then its child elements are wrapped in a group element.
Similarly, an element element is transformed so that it has exactly
two child elements, the first being a name class and the second being a
pattern. If it has more than two child elements, then the child elements
other than the first are wrapped in a group element.
An except element is transformed so that it has exactly one child
element. If it has more than one child element, then its child elements are
wrapped in a choice element.
If an attribute element has only one child element (a name class),
then a text element is added.
A choice, group or interleave element is
transformed so that it has exactly two child elements. If it has one child
element, then it is replaced by its child element. If it has more than two
child elements, then the first two child elements are combined into a new
element with the same name as the parent element and with the first two
child elements as its children. For example,
<choice> p1 p2 p3 </choice>
is transformed to
<choice> <choice> p1 p2 </choice> p3 </choice>
This reduces the number of child elements by one. The transformation is
applied repeatedly until there are exactly two child elements.
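The repeated pairing of the first two children amounts to a left-nested fold. A minimal sketch, assuming patterns are represented as hypothetical nested tuples of the form (element name, child, child) and that the list of children is non-empty, as the grammar requires:

```python
def binarize(name, children):
    """Left-nest children so every `name` node has exactly two children."""
    if len(children) == 1:
        # A single child replaces its parent.
        return children[0]
    node = (name, children[0], children[1])
    for child in children[2:]:
        # Each step wraps the result so far and consumes one more child.
        node = (name, node, child)
    return node


nested = binarize("choice", ["p1", "p2", "p3"])
# nested == ("choice", ("choice", "p1", "p2"), "p3")
```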
A mixed element is transformed into an interleaving with a
text element:
<mixed> p </mixed>
is transformed into
<interleave> p <text/> </interleave>
An optional element is transformed into a choice with
empty:
<optional> p </optional>
is transformed into
<choice> p <empty/> </choice>
A zeroOrMore element is transformed into a choice between
oneOrMore and empty:
<zeroOrMore> p </zeroOrMore>
is transformed into
<choice> <oneOrMore> p </oneOrMore> <empty/> </choice>
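The three rewrites for mixed, optional and zeroOrMore can be sketched together. This is an illustration under assumptions: patterns are hypothetical nested tuples whose first item is the element name, plain strings stand for opaque patterns such as p, and the rules are applied in one recursive pass rather than one rule at a time, which gives the same result for these three rules.

```python
def desugar(pattern):
    """Rewrite mixed, optional and zeroOrMore in a nested-tuple pattern."""
    if isinstance(pattern, str):
        return pattern  # opaque placeholder such as "p"
    tag, *children = pattern
    children = [desugar(child) for child in children]
    if tag == "mixed":
        return ("interleave", children[0], ("text",))
    if tag == "optional":
        return ("choice", children[0], ("empty",))
    if tag == "zeroOrMore":
        return ("choice", ("oneOrMore", children[0]), ("empty",))
    return (tag, *children)
```

For example, desugar(("zeroOrMore", "p")) gives ("choice", ("oneOrMore", "p"), ("empty",)).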
In this rule, no transformation is performed, but various constraints are
checked.
Note
The constraints in this section, unlike the constraints specified in Section 7, can be
checked without resolving any ref elements, and are accordingly
applied even to patterns that will disappear during later stages of
simplification because they are not reachable (see Section 4.19) or because
of notAllowed (see Section 4.20).
An except element that is a child of an anyName element
must not have any anyName descendant elements. An except
element that is a child of an nsName element must not have any
nsName or anyName descendant elements.
A name element that occurs as the first child of an
attribute element or as the descendant of the first child of an
attribute element and that has an ns attribute with
value equal to the empty string must not have content equal to
xmlns.
A name or nsName element that occurs as the first child of
an attribute element or as the descendant of the first child of an
attribute element must not have an ns attribute with
value http://www.w3.org/2000/xmlns.
Note
The [XML Infoset] defines
the namespace URI of namespace declaration attributes to be
http://www.w3.org/2000/xmlns.
A data or value element must be correct in its use of
datatypes. Specifically, the type attribute must identify a
datatype within the datatype library identified by the value of the
datatypeLibrary attribute. For a data element, the
parameter list must be one that is allowed by the datatype (see Section
6.2.8).
For each grammar element, all define elements with the same
name are combined together. For any name, there must not be more than one
define element with that name that does not have a
combine attribute. For any name, if there is a define
element with that name that has a combine attribute with the value
choice, then there must not also be a define element
with that name that has a combine attribute with the value
interleave. Thus, for any name, if there is more than one
define element with that name, then there is a unique value for
the combine attribute for that name. After determining this unique
value, the combine attributes are removed. A pair of definitions
<define name="n">
p1
</define>
<define name="n">
p2
</define>
is combined into
<define name="n">
<c>
p1
p2
</c>
</define>
where c is the value of the combine attribute. Pairs of definitions
are combined until there is exactly one define element for each
name.
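The combining rule and its two constraints can be sketched as follows. This is a minimal illustration, not a definitive implementation: each define is modelled as a hypothetical (name, combine, pattern) tuple with combine set to None when the attribute is absent, and the function name combine_defines is an assumption.

```python
def combine_defines(defines):
    """Combine same-named defines; return a dict mapping name to pattern."""
    methods = {}          # name -> the unique combine value seen so far
    without = set()       # names that already have a define without combine
    patterns = {}         # name -> patterns in document order
    for name, combine, pattern in defines:
        if combine is None:
            if name in without:
                raise ValueError("more than one define without combine: " + name)
            without.add(name)
        elif methods.setdefault(name, combine) != combine:
            raise ValueError("conflicting combine values: " + name)
        patterns.setdefault(name, []).append(pattern)
    result = {}
    for name, found in patterns.items():
        combined = found[0]
        for pattern in found[1:]:
            # With two or more defines, at least one carries combine.
            combined = (methods[name], combined, pattern)
        result[name] = combined
    return result
```

For example, combine_defines([("n", "choice", "p1"), ("n", None, "p2")]) gives {"n": ("choice", "p1", "p2")}.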
Similarly, for each grammar element all start elements are
combined together. There must not be more than one start element
that does not have a combine attribute. If there is a
start element that has a combine attribute with the value
choice, there must not also be a start element that
has a combine attribute with the value interleave.
In this rule, the schema is transformed so that its top-level element is
grammar and so that it has no other grammar elements.
Define the in-scope grammar for an element to be the nearest ancestor
grammar element. A ref element refers to a
define element if the value of their name attributes
is the same and their in-scope grammars are the same. A parentRef
element refers to a define element if the value of their
name attributes is the same and the in-scope grammar of the
in-scope grammar of the parentRef element is the same as the
in-scope grammar of the define element. Every ref or
parentRef element must refer to a define element. A
grammar must have a start child element.
First, transform the top-level pattern p into
<grammar><start> p </start></grammar>. Next, rename define
elements so that no two define elements anywhere in the schema have
the same name. To rename a define element, change the value of its
name attribute and change the value of the name
attribute of all ref and parentRef elements that refer to
that define element. Next, move all define elements to be
children of the top-level grammar element, replace each nested
grammar element by the child of its start element and
rename each parentRef element to ref.
4.19. define and ref elements
In this rule, the grammar is transformed so that every element
element is the child of a define element, and the child of every
define element is an element element.
First, remove any define element that is not reachable. A
define element is reachable if there is a reachable ref
element referring to it. A ref element is reachable if it is the
descendant of the start element or of a reachable define
element. Now, for each element element that is not the child of a
define element, add a define element to the
grammar element, and replace the element element by a
ref element referring to the added define element. The
value of the name attribute of the added define element
must be different from the value of the name attribute of all other
define elements. The child of the added define element
is the element element.
Define a ref element to be expandable if it refers to a
define element whose child is not an element element.
For each ref element that is expandable and is a descendant of a
start element or an element element, expand it by
replacing the ref element by the child of the define
element to which it refers and then recursively expanding any expandable
ref elements in this replacement. This must not result in a
loop. In other words, expanding the replacement of a ref element
having a name with value n must not require the expansion of a ref element
also having a name with value n. Finally, remove any define element whose
child is not an
element element.
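The reachability test used in the first step of this rule is a graph traversal from the start pattern. A minimal sketch, assuming patterns are hypothetical nested tuples in which ("ref", name) denotes a ref element, and defines is a dict from define names to their child patterns:

```python
def ref_names(pattern):
    """Collect the names of all ref elements inside a nested-tuple pattern."""
    if isinstance(pattern, str):
        return set()
    if pattern[0] == "ref":
        return {pattern[1]}
    names = set()
    for child in pattern[1:]:
        names |= ref_names(child)
    return names


def reachable_defines(start, defines):
    """Return the names of defines reachable from the start pattern."""
    seen = set()
    todo = list(ref_names(start))
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            # A ref inside a reachable define is itself reachable.
            todo.extend(ref_names(defines[name]))
    return seen
```

A define that refers only to itself, with no ref leading to it from the start element, is not reachable and would be removed.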
In this rule, the grammar is transformed so that a notAllowed
element occurs only as the child of a start or element
element. An attribute, list, group,
interleave, or oneOrMore element that has a
notAllowed child element is transformed into a
notAllowed element. A choice element that has two
notAllowed child elements is transformed into a
notAllowed element. A choice element that has one
notAllowed child element is transformed into its other child
element. An except element that has a notAllowed child
element is removed. The preceding transformations are applied repeatedly
until none of them is applicable any more. Any define element that
is no longer reachable is removed.
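The notAllowed rewrites can be sketched as a bottom-up pass over a pattern. This is an illustration under assumptions: patterns are hypothetical nested tuples already reduced to at most two pattern children, and the sketch omits the removal of except elements and of define elements that become unreachable.

```python
NOT_ALLOWED = ("notAllowed",)


def simplify_not_allowed(pattern):
    """Propagate notAllowed upwards through a nested-tuple pattern."""
    if isinstance(pattern, str):
        return pattern  # opaque placeholder or name-class text
    tag, *children = pattern
    children = [simplify_not_allowed(child) for child in children]
    absorbing = ("attribute", "list", "group", "interleave", "oneOrMore")
    if tag in absorbing and NOT_ALLOWED in children:
        return NOT_ALLOWED
    if tag == "choice":
        live = [child for child in children if child != NOT_ALLOWED]
        if not live:
            return NOT_ALLOWED
        if len(live) == 1:
            return live[0]
    return (tag, *children)
```

Because children are simplified before their parent, one pass reaches the point where none of these rewrites applies. An element element is left in place even when its content becomes notAllowed.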
In this rule, the grammar is transformed so that an empty element
does not occur as a child of a group, interleave, or
oneOrMore element or as the second child of a choice
element. A group, interleave or choice element
that has two empty child elements is transformed into an
empty element. A group or interleave element
that has one empty child element is transformed into its other
child element. A choice element whose second child element is an
empty element is transformed by interchanging its two child
elements. A oneOrMore element that has an empty child
element is transformed into an empty element. The preceding
transformations are applied repeatedly until none of them is applicable any
more.
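The empty rewrites follow the same bottom-up shape. A minimal sketch, assuming patterns are hypothetical nested tuples in which group, interleave and choice already have exactly two children:

```python
EMPTY = ("empty",)


def simplify_empty(pattern):
    """Apply the empty rewrites bottom-up to a nested-tuple pattern."""
    if isinstance(pattern, str):
        return pattern  # opaque placeholder such as "p"
    tag, *children = pattern
    children = [simplify_empty(child) for child in children]
    if tag in ("group", "interleave", "choice") and children == [EMPTY, EMPTY]:
        return EMPTY
    if tag in ("group", "interleave") and EMPTY in children:
        # Exactly one child is empty here; keep the other one.
        return children[0] if children[1] == EMPTY else children[1]
    if tag == "choice" and children[1] == EMPTY:
        # Move empty to the first position.
        return ("choice", children[1], children[0])
    if tag == "oneOrMore" and children[0] == EMPTY:
        return EMPTY
    return (tag, *children)
```

For example, simplify_empty(("choice", "p", ("empty",))) gives ("choice", ("empty",), "p").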
After applying all the rules in Section 4, the schema will match the following
grammar:
grammar ::=
    <grammar> <start> top </start> define* </grammar>
define ::=
    <define name="NCName"> <element> nameClass top </element> </define>
top ::=
    <notAllowed/>
  | pattern
pattern ::=
    <empty/>
  | nonEmptyPattern
nonEmptyPattern ::=
    <text/>
  | <data type="NCName" datatypeLibrary="anyURI"> param* [exceptPattern] </data>
  | <value datatypeLibrary="anyURI" type="NCName" ns="string"> string </value>
  | <list> pattern </list>
  | <attribute> nameClass pattern </attribute>
  | <ref name="NCName"/>
  | <oneOrMore> nonEmptyPattern </oneOrMore>
  | <choice> pattern nonEmptyPattern </choice>
  | <group> nonEmptyPattern nonEmptyPattern </group>
  | <interleave> nonEmptyPattern nonEmptyPattern </interleave>
param ::=
    <param name="NCName"> string </param>
exceptPattern ::=
    <except> pattern </except>
nameClass ::=
    <anyName> [exceptNameClass] </anyName>
  | <nsName ns="string"> [exceptNameClass] </nsName>
  | <name ns="string"> NCName </name>
  | <choice> nameClass nameClass </choice>
exceptNameClass ::=
    <except> nameClass </except>
With this grammar, no elements or attributes are allowed other than those
explicitly shown.
The following is an example of how the schema in Section 3.1 can be
transformed into the simple syntax:
<?xml version="1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
<start>
<ref name="foo.element"/>
</start>
<define name="foo.element">
<element>
<name ns="">foo</name>
<group>
<ref name="bar1.element"/>
<ref name="bar2.element"/>
</group>
</element>
</define>
<define name="bar1.element">
<element>
<name ns="http://www.example.com/n1">bar1</name>
<empty/>
</element>
</define>
<define name="bar2.element">
<element>
<name ns="http://www.example.com/n2">bar2</name>
<empty/>
</element>
</define>
</grammar>
Note
Strictly speaking, the result of simplification is an instance of the
data model rather than an XML document. For convenience, we use an XML
document to represent an instance of the data model.
In this section, we define the semantics of a correct RELAX NG schema that has
been transformed into the simple syntax. The semantics of a RELAX NG schema
consist of a specification of what XML documents are valid with respect to that
schema. The semantics are described formally. The formalism uses axioms and
inference rules. Axioms are propositions that are provable unconditionally. An
inference rule consists of one or more antecedents and exactly one consequent.
An antecedent is either positive or negative. If all the positive antecedents
of an inference rule are provable and none of the negative antecedents are
provable, then the consequent of the inference rule is provable. An XML document
is valid with respect to a RELAX NG schema if and only if the proposition that
it is valid is provable in the formalism specified in this section.
Note
This kind of formalism is similar to a proof system. However, a traditional
proof system only has positive antecedents.
The notation for inference rules separates the antecedents from the consequent by
a horizontal line: the antecedents are above the line; the consequent is below
the line. If an antecedent is of the form not(p), then it is a negative
antecedent; otherwise, it is a positive antecedent.
Both axioms and inference rules may use variables. A variable has a name and
optionally a subscript. The name of a variable is italicized. Each variable
has a range that is determined by its name. Axioms and inference rules are
implicitly universally quantified over the variables they contain. We explain
this further below.
The possibility that an inference rule or axiom may contain more than one
occurrence of a particular variable requires that an identity relation be
defined on each kind of object over which a variable can range. The identity
relation for all kinds of object is value-based. Two objects of a particular
kind are identical if the constituents of the objects are identical. For
example, two attributes are considered the same if they have the same name and
the same value. Two characters are identical if their Unicode character codes
are the same.
The main semantic concept for name classes is that of a name belonging to a
name class. A name class is an element that matches the production
nameClass. A name is as defined in Section 2: it consists of a namespace URI and
a local name.
We use the following notation:
- n is a variable that ranges over names
- nc ranges over name classes
- n in nc asserts that name n is a member of name class nc
We are now ready for our first axiom, which is called "anyName 1":
(anyName 1)
    n in <anyName/>
This says for any name n, n belongs to the name class
<anyName/>, in other words
<anyName/> matches any name. Note the
effect of the implicit universal quantification over the variables in the
axiom: this is what makes the axiom apply for any name n.
Our first inference rule is almost as simple:
(anyName 2)
    not(n in nc)
    -----------------------------------------------
    n in <anyName> <except> nc </except> </anyName>
This says that for any name n and for any name class nc, if n does not belong
to nc, then n belongs to <anyName> <except> nc </except> </anyName>. In other
words, <anyName> <except> nc </except> </anyName> matches any name that does
not match nc.
We now need the following additional notation:
- ln ranges over local names; a local name is a string that matches the
  NCName production of [XML Namespaces], that is, a name with no colons
- u ranges over URIs
- name( u, ln ) constructs a name with URI u and local name ln
The remaining axioms and inference rules for name classes are as follows:
| (nsName 1) |
| name( u, ln ) in <nsName ns="u"/> |
|
| (nsName 2) |
| not(name( u, ln ) in nc) |
| -------------------- |
| name( u, ln ) in <nsName ns="u"> <except> nc </except> </nsName> |
|
| (name) |
| name( u, ln ) in <name ns="u"> ln </name> |
|
| (name choice 1) |
| n in nc1 |
| -------------------- |
| n in <choice> nc1 nc2 </choice> |
|
| (name choice 2) |
| n in nc2 |
| -------------------- |
| n in <choice> nc1 nc2 </choice> |
|
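The axioms and inference rules for name classes can be read as a recursive membership test. The following Python sketch is one possible reading under stated assumptions, not a normative algorithm: the class names (Name, AnyName, NsName, NameLit, Choice), the except_ field and the function contains are all invented for this example. A negative antecedent such as not(n in nc) is evaluated by testing the nested, strictly smaller name class nc and negating the result, so the recursion always terminates.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Name:
    uri: str
    local: str


@dataclass(frozen=True)
class AnyName:
    except_: Optional["NameClass"] = None  # nested <except>, if any


@dataclass(frozen=True)
class NsName:
    ns: str
    except_: Optional["NameClass"] = None  # nested <except>, if any


@dataclass(frozen=True)
class NameLit:
    """Models <name ns="u">ln</name>."""
    ns: str
    local: str


@dataclass(frozen=True)
class Choice:
    nc1: "NameClass"
    nc2: "NameClass"


NameClass = Union[AnyName, NsName, NameLit, Choice]


def contains(nc: NameClass, n: Name) -> bool:
    """Return True if 'n in nc' is derivable from the rules above."""
    if isinstance(nc, AnyName):
        # (anyName 1) without <except>; (anyName 2) with <except>.
        return nc.except_ is None or not contains(nc.except_, n)
    if isinstance(nc, NsName):
        # (nsName 1) and (nsName 2): the namespace URI must be identical.
        if n.uri != nc.ns:
            return False
        return nc.except_ is None or not contains(nc.except_, n)
    if isinstance(nc, NameLit):
        # (name): both the URI and the local name must be identical.
        return n.uri == nc.ns and n.local == nc.local
    if isinstance(nc, Choice):
        # (name choice 1) and (name choice 2).
        return contains(nc.nc1, n) or contains(nc.nc2, n)
    raise TypeError(f"not a name class: {nc!r}")
```

For instance, contains(AnyName(NsName("http://example.com/ns")), n) holds exactly for those names n whose namespace URI is not the (made-up) URI http://example.com/ns.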
The axioms and inference rules for patterns use the following notation:
- cx ranges over contexts (as defined in Section 2)
- a ranges over sets of attributes; a set with a single member is considered
  the same as that member
- m ranges over sequences of elements and strings; a sequence with a single
  member is considered the same as that member; the sequences ranged over by
  m may contain consecutive strings and may contain strings that are empty;
  thus, there are sequences ranged over by m that cannot occur as the
  children of an element
- p ranges over patterns (elements matching the pattern production)
- cx |- a; m =~ p asserts that with respect to context cx, the attributes a
  and the sequence of elements and strings m match the pattern p
The semantics of the choice pattern are as follows:
| (choice 1) |
| cx |- a; m =~ p1 |
| -------------------- |
| cx |- a; m =~ <choice> p1 p2 </choice> |
|
| (choice 2) |
| cx |- a; m =~ p2 |
| -------------------- |
| cx |- a; m =~ <choice> p1 p2 </choice> |
|
We use the following additional notation:
- m1, m2 represents the concatenation of the sequences m1 and m2
- a1 + a2 represents the union of a1 and a2
The semantics of the group pattern are as follows:
| (group) |
| cx |- a1; m1 =~ p1 | cx |- a2; m2 =~ p2 |
| -------------------- |
| cx |- a1 + a2; m1, m2 =~ <group> p1 p2 </group> |
|
Note
The restriction in Section 7.3 ensures
that the set of attributes constructed in the consequent will not
have multiple attributes with the same name.
We use the following additional notation:
- ( ) represents an empty sequence
- { } represents an empty set
The semantics of the empty pattern are as follows:
| (empty) |
| cx |- { }; ( ) =~ <empty/> |
|
We use the following additional notation:
- s ranges over strings
The semantics of the text pattern are as follows:
| (text 1) |
| cx |- { }; ( ) =~ <text/> |
|
| (text 2) |
| cx |- { }; m =~ <text/> |
| -------------------- |
| cx |- { }; m, s =~ <text/> |
|
The effect of the above rule is that a text element matches zero
or more strings.
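The way the two text rules build up a match one string at a time can be sketched as a short recursive check. This is an illustrative reading only: the function name matches_text is hypothetical, the attribute set is modelled as any Python collection, strings in the sequence are Python str values, and anything that is not a str stands in for an element.

```python
def matches_text(a, m):
    """Return True if '{ }; m =~ <text/>' is derivable (a must be empty)."""
    if a:
        # Both rules require the empty attribute set { }.
        return False
    m = tuple(m)
    if not m:
        # (text 1): the empty sequence matches.
        return True
    # (text 2): m = (init, s) matches if s is a string and init matches.
    *init, s = m
    return isinstance(s, str) and matches_text(a, init)


only_strings = matches_text(set(), ("abc", "", "def"))
with_element = matches_text(set(), ("abc", object()))
```

Note that the empty string and consecutive strings are accepted, in line with the range of m described earlier.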
We use the following additional notation:
- disjoint(a1, a2) asserts that there is no name that is the name of both an
  attribute in a1 and an attribute in a2
The semantics of the oneOrMore pattern are as follows:
| (oneOrMore 1) |
| cx |- a; m =~ p |
| -------------------- |
| cx |- a; m =~ <oneOrMore> p </oneOrMore> |
|
| (oneOrMore 2) |
| cx |- a1; m1 =~ p | cx |- a2; m2 =~ <oneOrMore> p </oneOrMore> | disjoint(a1, a2) |
| -------------------- |
| cx |- a1 + a2; m1, m2 =~ <oneOrMore> p </oneOrMore> |
|
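To make the roles of disjoint and of the two oneOrMore rules concrete, the following Python sketch searches for a derivation by brute force. It is a minimal illustration under several assumptions, not an efficient or normative algorithm: attributes are modelled as (name, value) pairs, a as a frozenset of such pairs, m as a tuple, the context cx is omitted, and the inner pattern p is supplied as a hypothetical callable matches_p(a, m) that decides whether a; m matches p.

```python
from itertools import combinations


def disjoint(a1, a2):
    """No name is the name of an attribute in a1 and of one in a2."""
    names1 = {name for name, _value in a1}
    names2 = {name for name, _value in a2}
    return names1.isdisjoint(names2)


def subsets(a):
    """Yield every subset of the attribute set a."""
    items = sorted(a)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def matches_one_or_more(a, m, matches_p):
    """Search for a derivation of 'a; m =~ <oneOrMore> p </oneOrMore>'."""
    a, m = frozenset(a), tuple(m)
    # (oneOrMore 1): a single occurrence of p.
    if matches_p(a, m):
        return True
    # (oneOrMore 2): split a into a1 + a2 and m into m1, m2.
    for a1 in subsets(a):
        a2 = a - a1
        if not disjoint(a1, a2):
            continue
        for i in range(len(m) + 1):
            m1, m2 = m[:i], m[i:]
            if not a1 and not m1:
                # This split would recurse on the same (a, m); skipping it
                # loses no derivations and guarantees termination.
                continue
            if matches_p(a1, m1) and matches_one_or_more(a2, m2, matches_p):
                return True
    return False


def one_x(a, m):
    """Hypothetical pattern: no attributes and exactly the string 'x'."""
    return not a and m == ("x",)


def one_id(a, m):
    """Hypothetical pattern: exactly one attribute named 'id', no content."""
    return len(a) == 1 and not m and next(iter(a))[0] == "id"
```

With these stand-ins, a sequence of three "x" strings matches via repeated use of (oneOrMore 2), whereas two attributes that are both named "id" do not: every split that separates them fails the disjoint antecedent, because disjointness is judged on names rather than on whole attributes.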
6.2.6. interleave pattern
We use the following additional notation:
- m1 interleaves m2; m3 asserts that m1 is an interleaving of m2 and m3
The semantics of interleaving are defined by the following rules.
| (interleaves 1) |
| ( ) interleaves ( ); ( ) |
|
| (interleaves 2) |
| m1 interleaves m2; m3 |
| -------------------- |
| m4, m1 interleaves m4, m2; m3 |
|
| (interleaves 3) |
| m1 interleaves m2; m3 |
| -------------------- |
| m4, m1 interleaves m2; m4, m3 |
|
For example, the interleavings of <a/><a/> and
<b/> are <a/><a/><b/>,
<a/><b/><a/>, and
<b/><a/><a/>.
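The example can be reproduced with a small enumeration. The following Python sketch is illustrative only: the function name interleavings is hypothetical, sequences are modelled as tuples, and elements are represented by their serialized form as strings. It takes the empty sequence as the only interleaving of two empty sequences (the base case assumed by this sketch) and then prepends one item at a time from either side, which corresponds to rules (interleaves 2) and (interleaves 3) with m4 a single item.

```python
def interleavings(m2, m3):
    """Yield each m1 such that 'm1 interleaves m2; m3' is derivable.

    The same m1 may be yielded more than once when m2 and m3 share
    identical items; wrap the result in set() to remove duplicates.
    """
    m2, m3 = tuple(m2), tuple(m3)
    if not m2 and not m3:
        # Base case: the empty sequence interleaves two empty sequences.
        yield ()
        return
    if m2:
        # (interleaves 2): take the next item from m2.
        for rest in interleavings(m2[1:], m3):
            yield (m2[0],) + rest
    if m3:
        # (interleaves 3): take the next item from m3.
        for rest in interleavings(m2, m3[1:]):
            yield (m3[0],) + rest


result = sorted(set(interleavings(("<a/>", "<a/>"), ("<b/>",))))
for m1 in result:
    print("".join(m1))
```

Each interleaving preserves the relative order of the items within m2 and within m3, which is why <a/><a/> and <b/> have three interleavings rather than all permutations of three items.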
The semantics of the interleave pattern are as follows:
| (interleave) |
| cx |- a1; m1 =~ p1 | cx |- a2; m2 =~ p2 | m3 interleaves m1; m2 |
| -------------------- |
| cx |- a1 + a2; m3 =~ <interleave> p1 p2 </interleave> |
|
Note
The restriction in Section 7.3 ensures
that the set of attributes constructed in the consequent will not
have multiple attributes with the same name.