In this section I compare how the same requirements can be expressed in two ways:
with strict DTD models, where validation is done by an XML parser, and with the loose
JPTS models, where validation is performed by Schematron.
Attribute values (enumerated list)
Requirement. Article type is required and can be one of three types:
a regular article (rga), a correction (cor), or an editorial (edt).
Strict DTD
<!ATTLIST article
article-type
(rga | cor | edt) #REQUIRED >
JPTS
<!ATTLIST article
article-type
CDATA #IMPLIED >
XML instance (contains a disallowed article type)
<article article-type='xxx'/>
Schematron
<rule context="article">
<assert test="@article-type=('rga','cor','edt')">
@article-type '<value-of select='@article-type'/>'
not allowed, must be 'rga', 'cor', or 'edt'</assert></rule>
Schematron message
@article-type 'xxx' not allowed, must be 'rga', 'cor', or 'edt'
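The sequence comparison in the assertion above requires an XPath 2.0 query binding.
As a minimal sketch for a processor limited to XPath 1.0 (an assumption about the
toolchain, not something the rule above requires), the same test can be spelled out
with or. In both forms a comparison against a missing attribute is false, so the
assertion also fires when article-type is absent, which covers the "required" half
of the requirement.
Schematron (XPath 1.0 sketch)
<rule context="article">
<assert test="@article-type='rga' or @article-type='cor' or
@article-type='edt'">
@article-type '<value-of select='@article-type'/>'
not allowed, must be 'rga', 'cor', or 'edt'</assert></rule>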
Element position and sequence
Requirement. If a journal has a subject grouping (such as table of
contents category or a disciplinary subset) and an article belongs to a special
collection (such as a one-time special section or an ongoing theme), then the
subject grouping metadata must precede the special collection metadata.
Strict DTD
<!ELEMENT article-categories
(subject-group*,
special-collection?) >
JPTS
<!ELEMENT article-categories
(subj-group*) >
XML instance (wrong sequence of subject groups)
<article-categories>
<subj-group subj-group-type="special-section">
<subject content-type="EARLYWARN1">New Methods and
Applications of Earthquake Early Warning</subject>
</subj-group>
<subj-group subj-group-type="toc-category">
<subject content-type="SDE">Solid Earth</subject>
</subj-group>
</article-categories>
Schematron
<rule context="article-categories/subj-group[@subj-group-type=
('special-section','theme')]">
<assert test="not(following-sibling::subj-group[@subj-group-type=
('toc-category','subset')])">
<name/>/@subj-group-type='<value-of select='@subj-group-type'/>'
must appear after a ToC Category or a Subset when either is
present</assert></rule>
Schematron message
subj-group/@subj-group-type='special-section' must appear after
a ToC Category or a Subset when either is present
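The rule fragments in this section omit their enclosing elements. A minimal sketch
of how the fragment above might be wrapped into a complete schema, assuming ISO
Schematron with the XSLT 2.0 query binding (needed for the sequence comparisons);
the pattern id is a hypothetical name chosen for illustration:
Schematron (complete schema, sketch)
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
queryBinding="xslt2">
<!-- hypothetical pattern id -->
<pattern id="subj-group-order">
<rule context="article-categories/subj-group[@subj-group-type=
('special-section','theme')]">
<assert test="not(following-sibling::subj-group[@subj-group-type=
('toc-category','subset')])">
<name/>/@subj-group-type='<value-of select='@subj-group-type'/>'
must appear after a ToC Category or a Subset when either is
present</assert></rule>
</pattern>
</schema>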
References
Validating bibliographic references presents a particular challenge given, on the
one hand, their variety and, on the other, the need to enforce house style. At
one end of the spectrum is a strict approach in which the DTD prescribes a
fixed order of elements and allows no mixed content. In this model, the
punctuation, spacing, and face markup are generated on output.
Strict DTD
<!ELEMENT book-standalone-citation
((person-group | string-name),
year,
source,
edition?,
(person-group | string-name)?,
size?,
elocation-id?,
publisher-name,
publisher-loc) >
<!ATTLIST book-standalone-citation
id ID #REQUIRED >
At the other end of the spectrum is JPTS's
mixed-citation element, which allows any number
of elements in any order, mixed with character data.
JPTS
<!ELEMENT mixed-citation
(#PCDATA | person-group | string-name |
year | source | edition | size |
elocation-id | publisher-name |
publisher-loc | ... | ...)* >
<!ATTLIST mixed-citation
id ID #IMPLIED
publication-type CDATA #IMPLIED >
Example:
Mood, A. M., and F. A. Graybill (1963), Introduction to the Theory of
Statistics, 2nd ed., 295 pp., McGraw-Hill, New York.
XML instance (strict DTD)
<book-standalone-citation id="mood63">
<person-group person-group-type="author">
<name><surname>Mood</surname>
<given-names>A. M.</given-names></name>
<name><surname>Graybill</surname>
<given-names>F. A.</given-names></name>
</person-group>
<year>1963</year>
<source>Introduction to the Theory of Statistics</source>
<edition>2nd</edition>
<size units="page">295 pp</size>
<publisher-name>McGraw-Hill</publisher-name>
<publisher-loc>New York</publisher-loc>
</book-standalone-citation>
XML instance (JPTS)
<mixed-citation publication-type="book-standalone">
<string-name>
<surname>Mood</surname>, <given-names>A. M.</given-names>,
</string-name>
and
<string-name>
<given-names>F. A.</given-names> <surname>Graybill</surname>
</string-name>
(<year>1963</year>),
<source><italic>Introduction to the
Theory of Statistics</italic></source>,
<edition>2</edition>nd ed.,
<size units="page">295</size> pp.,
<publisher-name>McGraw-Hill</publisher-name>,
<publisher-loc>New York</publisher-loc>.
</mixed-citation>
One could use Schematron to check that the required elements are present
<rule context="mixed-citation[@publication-type='book-standalone']">
<assert test="(person-group | string-name) and year and source
and publisher-name and publisher-loc">
required element missing</assert></rule>
and that the elements are in the correct sequence.
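Before turning to sequence, note that the presence check above reports only that
some required element is missing. A sketch of a variant with one assertion per
required element, so that each message names the missing element (the message
wording is illustrative). Within a single pattern only the first matching rule
fires for a given node, so this rule would replace the combined one rather than
sit beside it:
Schematron (one assertion per element, sketch)
<rule context="mixed-citation[@publication-type='book-standalone']">
<assert test="person-group | string-name">
person-group or string-name is missing</assert>
<assert test="year">year is missing</assert>
<assert test="source">source is missing</assert>
<assert test="publisher-name">publisher-name is missing</assert>
<assert test="publisher-loc">publisher-loc is missing</assert>
</rule>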
XML instance (JPTS) (edition is in the wrong place)
<mixed-citation publication-type="book-standalone">
<string-name>
<surname>Mood</surname>, <given-names>A. M.</given-names>,
</string-name>
and
<string-name>
<given-names>F. A.</given-names> <surname>Graybill</surname>
</string-name>
(<year>1963</year>),
<edition>2</edition>nd ed.,
<source><italic>Introduction to the
Theory of Statistics</italic></source>,
<size units="page">295</size> pp.,
<publisher-name>McGraw-Hill</publisher-name>,
<publisher-loc>New York</publisher-loc>.
</mixed-citation>
The following fragment uses the positional predicate [1] to check that
year is immediately followed by source.
Schematron
<rule context="mixed-citation[@publication-type=
'book-standalone']/year">
<assert test="following-sibling::*[1]/self::source">
'<name/>' must be immediately followed by 'source', not by '<value-of
select='name(following-sibling::*[1])'/>'</assert></rule>
Schematron message
'year' must be immediately followed by 'source', not by 'edition'
But how can one check the sequence of required elements when optional elements
may be interspersed between them? The following fragment checks that the required
publisher-name is preceded by the required source, regardless of any optional
elements that occur in between:
Schematron
<rule context="mixed-citation[@publication-type=
'book-standalone']/publisher-name">
<assert test="preceding-sibling::source">
'<name/>' must be preceded by 'source'</assert></rule>
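The same idea extends to the optional elements themselves. A minimal sketch,
assuming the house style wants edition, when present, to fall somewhere between
source and publisher-name as in the strict model; the message wording is
illustrative:
Schematron (position of an optional element, sketch)
<rule context="mixed-citation[@publication-type=
'book-standalone']/edition">
<assert test="preceding-sibling::source and
following-sibling::publisher-name">
'<name/>' must appear after 'source' and before
'publisher-name'</assert></rule>
For the instance above in which edition is in the wrong place, no source precedes
edition, so this assertion fails.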
There is, however, a more elegant approach suggested by Rick Jelliffe, the
inventor of Schematron, which combines the flexibility of the JPTS citation
model with the benefits of the strict element order that a structured DTD
offers. In this ingenious method, the content of each citation is rewritten as
a string of its child element names, and the content model is represented as a
regular expression. Schematron then checks the string of element names against
the regular expression.
Thus, one may have an XML file, e.g., citation-models.xml,
where all allowed structured citation models are specified:
...
<model publication-type="book-standalone">
((string-name | person-group),
year,
source,
edition?,
(string-name | person-group)?,
size?,
elocation-id?,
publisher-name,
publisher-loc)
</model>
...
The Schematron generates an error or a warning message if the content does not
match the model. The method offers a number of advantages:
The caveat, however, is that implementing this approach requires some
clever use of XSLT 2.0.
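One way the idea might be realized is sketched below. This is an illustration
under stated assumptions, not necessarily the implementation the method's author
had in mind: it assumes the XSLT 2.0 query binding, that citation-models.xml can
be loaded with document() (how the relative path resolves depends on the
Schematron implementation), and that models are written in the DTD-like notation
shown above. The pattern id and the variable names are hypothetical. The model is
turned into a regular expression by stripping whitespace, wrapping each element
name as '(name )', and dropping the commas; the occurrence indicators ?, *, and +
carry over unchanged.
Schematron (model as regular expression, sketch)
<pattern id="citation-models">
<!-- assumed location of the models file -->
<let name="models" value="document('citation-models.xml')"/>
<rule context="mixed-citation[@publication-type]">
<!-- names of the child elements, each followed by a space -->
<let name="names" value="string-join(for $e in * return
concat(name($e), ' '), '')"/>
<let name="type" value="@publication-type"/>
<let name="model" value="$models//model[@publication-type = $type][1]"/>
<!-- strip whitespace, wrap each name as '(name )', drop commas -->
<let name="regex" value="translate(replace(replace(string($model),
'\s+', ''), '([a-z][a-z\-]*)', '($1 )'), ',', '')"/>
<!-- citations whose type has no model are skipped -->
<assert test="not($model) or
matches($names, concat('^', $regex, '$'))">
content '<value-of select='$names'/>' does not match the
'<value-of select='$type'/>' model</assert></rule>
</pattern>
Because each name in both the string and the expression ends with a space, a
short name such as year cannot match part of a longer one. Note that a model
admitting a single string-name would reject a citation with two string-name
children, so models used this way would need to state repetition explicitly,
e.g. with +.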