Article XML captures not only article-specific metadata such as the article title and the abstract, but
also additional information about the issue, the volume, and the journal in which the article is
published. We propose extracting all this non-article information into one place: the Issue XML. We have
built a separate IssueXml.dtd on top of the JATS Tag Suite and ported our entire hosted content, which
includes hundreds of thousands of issues, to use it. We view the Issue XML as the natural next step after
defining the article XML: it helps us build a layered structure of articles and issues, much as these
are encountered in the real world. In practice, the Issue XML helps us eliminate redundant journal and issue
metadata from the article XMLs, and allows for a more natural encoding of the Table of Contents, complete
with its issue title, section headings, redefined article metadata, and finally the cover image and its
associated caption.
The JATS Tag Suite [1] was designed with the article as its
centerpiece. It defines a collection of elements and attributes that capture the metadata and the content
of a journal article. The focus on the article is justified both by the complexity of accurately
capturing all elements of an article and by the fact that the article remains the most granular
publishable entity in modern online publishing systems.
A closer look at the front-matter metadata of an article reveals that not all of it describes the same
publication object. Some metadata describe properties of the article itself: for example, the authors who
contributed to the work, the article title, the abstract, and the date it was accepted for publication.
Other information encoded in the article front matter does not pertain to the article itself, but rather to
its publication environment: for example, the full title and the ISSN of the journal where the work is
published, or the volume and issue number of the printed issue that includes the article. Such information
is captured in the article front matter, but it actually refers to distinct publication entities with which
the article maintains a relationship. The same can be said for the first and last page of the printed
article or the Table of Contents heading above the article entry, which depend on the issue where the
article is printed rather than on the article itself.
Yet all this information about the article and the issue in which it is published is bundled
together in the article XML front matter. All other articles published in the same issue share the same
journal information, the same publisher information, the same volume/issue designation, and the same
cover date. If one asks which articles belong to an issue, the only way to locate them through the article
XMLs is to collect all XMLs with matching journal title, volume, and issue numbers. Moreover, publishers who
wish to present an online version of the Table of Contents must capture several more elements, such as the
issue title, the Table of Contents headings, and the cover image and caption, which further burden each and
every article XML.
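The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `make_article` helper, the sample DOIs and the heavily abridged article markup are hypothetical, and only the standard library `xml.etree.ElementTree` module is used. The third sample article deliberately carries an inconsistent journal title to show how replicated metadata can split one issue into two groups.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def make_article(title, volume, issue, doi):
    """Build a heavily abridged, hypothetical JATS article for the demo."""
    return (
        "<article><front>"
        f"<journal-meta><journal-title-group><journal-title>{title}"
        "</journal-title></journal-title-group></journal-meta>"
        f"<article-meta><article-id pub-id-type='doi'>{doi}</article-id>"
        f"<volume>{volume}</volume><issue>{issue}</issue></article-meta>"
        "</front></article>"
    )

ARTICLES = [
    make_article("Presence", "19", "2", "10.1000/a1"),
    make_article("Presence", "19", "2", "10.1000/a2"),
    make_article("PRESENCE", "19", "2", "10.1000/a3"),  # same issue, inconsistent title
]

def group_by_issue(xml_strings):
    """Group article DOIs by the (journal title, volume, issue) they declare."""
    groups = defaultdict(list)
    for text in xml_strings:
        front = ET.fromstring(text).find("front")
        key = (
            front.findtext("journal-meta//journal-title", default=""),
            front.findtext("article-meta/volume", default=""),
            front.findtext("article-meta/issue", default=""),
        )
        doi = front.findtext("article-meta/article-id[@pub-id-type='doi']")
        groups[key].append(doi)
    return dict(groups)

groups = group_by_issue(ARTICLES)
for key, dois in groups.items():
    print(key, dois)
```

Because membership is inferred from string matches, a single divergent copy of the journal title is enough to drop an article from its issue.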
As publishers go down this route, enhancing article XMLs with elements needed for online publishing, these
XMLs become bloated with metadata that do not really belong to the article, yet are replicated across
all articles of an issue. This repetition is a known source of error and inconsistency.
When inconsistency occurs, it is very difficult to determine which set of metadata is correct,
because no single article XML acts as the authoritative source for these
metadata. And when corrections are needed, they must be applied across all article XMLs.
The Case of an Issue-Level XML
There is an implicit assumption in the DTD that an article can only be published in a single journal and
appear as part of a single issue. This is why in JATS DTDs, <journal-meta> can appear at most once inside
<front>, and <volume> and <issue> can appear at most once inside <article-meta>. In present-day publishing
environments, this strict one-article-to-one-issue correspondence is fading rapidly. An article may start
in an online-only 'early' issue, then potentially become part of a 'forthcoming' online issue, then appear
in a regular printed issue, and finally become a member of one or more virtual issues, which are collections
of articles destined for specialized readerships.
It is the same article with the exact same content in all of these publishing instances. Why should one be
forced to update the article metadata each time to reflect the different journal metadata and issue metadata?
Why can't all the information which is not directly related to the article itself, but rather to the publishing
context of the article be extracted out of the article XML, so that:
the article XML need only change when the information of the article itself changes. For
example when the full text body of the article becomes known, or when the references section is made
available.
information about where the article was published each time remains independent and maintained
authoritatively in a single place
metadata about the issue which includes the article is not reproduced in each article XML of
the issue.
These are the questions that triggered the concept of the issue-level XML. It is a single authoritative
XML containing all the metadata of an issue publication. It includes information such as the journal title,
ISSN, publisher, volume and issue numbers, issue title, publication date, and whatever other information can
be associated with an issue, such as editors and page ranges. In addition, it maintains a list of all articles
contained in the issue, along with their first/last page numbers (if it's a printed issue) as well as all the
information necessary to properly generate the Table of Contents of the issue. Finally, it includes information
which uniquely identifies/retrieves the issue such as the issue DOI and the self URI, as well as the list of
files associated with it, such as an issue-wide PDF, the PDF of the Table of Contents, etc.
The notion of the issue-level XML does not necessarily refer to a printed issue. In today's online publishing
platforms, publishers are becoming increasingly creative in introducing numerous other collections of articles,
either prior to the actual publication of the printed issue or occasionally after it. It is also not uncommon
that an article never ends up in a printed issue, but populates solely online-only collections. Any such
collection of articles is considered an issue in the context of our discussion and assumes an Issue XML
keeping all of the collection metadata in a single place.
Now, when an article belongs to two or more collections, you do not have to go back to the article XML to
update it with the metadata of these collections. Similarly, when an article is no longer part of a
collection, you do not need to modify the article XML to reflect the fact that the article no longer
belongs somewhere. Maintaining collection metadata inside the article is like trying to solve the problem the
wrong way around. An article can stand on its own without being part of any collection and without any need
of any journal or issue metadata. But once it becomes part of an issue, this new publication, the issue,
comes into play too, and it is no lesser citizen. The issue possesses its own share of metadata (journal,
publisher, etc.), including among others the identities of the articles that constitute it. Therefore,
it is in this direction that the relationship between issues and articles must be set up, and not the other
way around. Following this logic, articles can freely become members of collections and drop out of
collections without triggering any change to the article XML.
A typical Issue XML for a printed issue is illustrated in Figure 1.

Issue XML sample for a printed issue
Note that the Issue XML essentially consists of three elements, the
<journal-meta> element, the <issue-meta>
element and the <toc> element. The
<journal-meta> element is identical to the corresponding element in the
article XML. It defines all metadata relating to the enclosing journal publication and its publishing institution.
The <issue-meta> element collects all issue-level information defined
in the article XML front matter. Finally, the <toc> element, in its simplest
form lists the articles contained in this issue referencing them via a unique identifier (in this case the DOI).
The <toc> element not only captures the list of articles included
in the issue, but also encodes the order in which these articles appear when the Table of Contents is rendered.
The article XML for these articles may now avoid including all the above issue metadata in their front matter,
thus eliminating redundancy and decoupling the articles from their publishing context. The same article can
become part of another collection by simply including it in the Issue XML of that other collection. The
important thing to note is that by decoupling issue metadata from article metadata, the article XML does not
need to change every time the article becomes part of a new issue or a new article collection. In essence,
an Issue XML includes references to all articles it contains, whereas an article reference may be part
of one or more Issue XMLs. This model offers maximum flexibility in today's online publishing systems, where
an article may change issues and/or find itself as part of multiple collections.
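The one-to-many relationship described above can be made concrete with a small reverse index from article DOIs to the issues that reference them. The following Python sketch is an assumption-laden illustration: the two sample Issue XMLs, their `<issue-id>` values and the DOIs are invented, and the `<issue-id>` element is used here merely as a convenient label for each issue.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two hypothetical, abridged Issue XMLs: a print issue and a virtual issue.
ISSUES = [
    """<issue-xml>
         <issue-meta><issue-id>print-19-2</issue-id></issue-meta>
         <toc>
           <toc-article-meta><article-id pub-id-type="doi">10.1000/a1</article-id></toc-article-meta>
           <toc-article-meta><article-id pub-id-type="doi">10.1000/a2</article-id></toc-article-meta>
         </toc>
       </issue-xml>""",
    """<issue-xml>
         <issue-meta><issue-id>virtual-robotics</issue-id></issue-meta>
         <toc>
           <toc-subject-group>
             <toc-subject-title><subject>Robotics</subject></toc-subject-title>
             <toc-article-meta><article-id pub-id-type="doi">10.1000/a1</article-id></toc-article-meta>
           </toc-subject-group>
         </toc>
       </issue-xml>""",
]

def articles_to_issues(issue_xml_strings):
    """Map each referenced DOI to the ids of all issues whose ToC lists it."""
    index = defaultdict(list)
    for text in issue_xml_strings:
        root = ET.fromstring(text)
        issue_id = root.findtext("issue-meta/issue-id")
        # '//' also reaches entries nested inside toc-subject-group elements.
        for article_id in root.findall("toc//toc-article-meta/article-id"):
            doi = article_id.text.strip()
            if issue_id not in index[doi]:
                index[doi].append(issue_id)
    return dict(index)

index = articles_to_issues(ISSUES)
print(index)
```

Note that neither article XML is consulted or modified: membership lives entirely in the Issue XMLs.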
Ahead of Print Collections
When an article is electronically made available as soon as it goes through the peer review process, it
is typically not part of a specific printed issue yet. At this early stage, it is part of an 'ahead-of-print'
issue for the journal, which consists of all such early articles not yet committed to a printed issue. The
XML excerpt of Figure 2 illustrates the Issue XML for an
ahead-of-print collection.

Issue XML sample for a Published-Ahead-of-Print article collection
Note the issue-type attribute value which can be used to distinguish between different types of issues, such
as in this case where we are dealing with an 'ahead-of-print' issue. The Issue XML encodes only the metadata
available for an issue, which in this case does not include a publication date, a volume/issue designation or
any other such information typically not available for ahead-of-print issues. These issues grow and shrink
as new articles arrive for publication or older articles are reassigned to a print issue, and the
Issue XML for the ahead-of-print collection must reflect its constituent articles at any point in time. Since
maintaining it by hand typically proves tedious, the Issue XML for ahead-of-print collections is managed behind
the scenes, with the system automatically updating it when an article joins or leaves the collection.
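As a rough illustration of such automatic maintenance, the Python sketch below adds and removes `<toc-article-meta>` entries in a `<toc>` element. It is not the platform's actual implementation; the function names and the sample DOIs are hypothetical, and only a flat ToC without headings is handled.

```python
import xml.etree.ElementTree as ET

def find_entry(toc, doi):
    """Return the top-level <toc-article-meta> entry for a DOI, or None."""
    for entry in toc.findall("toc-article-meta"):
        if entry.findtext("article-id[@pub-id-type='doi']") == doi:
            return entry
    return None

def add_article(toc, doi):
    """Append an entry when an article joins the collection (no duplicates)."""
    if find_entry(toc, doi) is None:
        entry = ET.SubElement(toc, "toc-article-meta")
        article_id = ET.SubElement(entry, "article-id", {"pub-id-type": "doi"})
        article_id.text = doi

def remove_article(toc, doi):
    """Drop the entry when an article leaves, e.g. for a print issue."""
    entry = find_entry(toc, doi)
    if entry is not None:
        toc.remove(entry)

def listed_dois(toc):
    return [e.findtext("article-id") for e in toc.findall("toc-article-meta")]

toc = ET.fromstring("<toc/>")
add_article(toc, "10.1000/a1")
add_article(toc, "10.1000/a2")
add_article(toc, "10.1000/a1")     # already present, ignored
remove_article(toc, "10.1000/a1")  # reassigned to a print issue
print(listed_dois(toc))
```

In both operations the article XML itself is never touched.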
Electronic-only Journals and Virtual Journals
Similar in nature to the ahead-of-print case is that of electronic-only journals or e-journals, that
is, journals with no printed counterpart. As in the case of ahead-of-print articles, it is typical for
such publications to publish articles as soon as they become available. The requirement to associate each
article with an Issue XML does not interfere with the different nature of these publications, as long as
the Issue XML is kept current with newly e-published entries. Again, this is best handled automatically
by the system, rather than by manually maintaining an updated version of the Issue XML on a busy
electronic-only title. On the other hand, the publisher may choose to form collections of the published
articles based either on subject classification, or follow some other ad hoc pattern to form a new
collection. An Issue XML will lie at the heart of each such collection as it grows and shrinks over time.
On a similar note, the <journal-meta> element can be omitted
altogether when the Issue XML encodes a multi-disciplinary, multi-journal collection of articles
specifically designed with a focused readership in mind. An example of an Issue XML for a virtual issue
is presented in Figure 3.

Issue XML sample for a virtual issue
However, publishers will often define a virtual journal in such a case, which encompasses the scientific
areas of the selected articles, originally published elsewhere. When this happens, the
<journal-meta> element will include the metadata of the newly defined
virtual journal. Virtual journals may not necessarily include articles from a single publisher,
but may actually consist of a multi-publisher collection of articles towards a common theme.
The separation of issue-level metadata into their own distinct XML lends itself to a more straightforward
capturing of information relating to the issue itself, such as:
the issue editors
the issue title
the sequence of the articles in the issue
the issue page range and page count
the issue DOI and self URI
the issue cover image and its associated caption
supplementary information relating to the issue
links to issues related to this issue
the Table of Contents (ToC) headings
different alternative ToCs for an issue
Atypon was among the first houses to fully adopt the JATS Archiving & Interchange DTD [2] across
its entire online publishing platform, in 2003. For the purposes of Literatum, Atypon's e-publishing
platform [3], we have built an issue-level DTD out of JATS Tag Suite elements and a handful of new elements
used mostly as wrapper tags. This is defined as a distinct DTD root called "IssueXml.dtd", based entirely
on existing JATS elements. The full DTD appears in Figure 4.
It is important to note that the flexibility and reusability of the JATS Tag Suite elements allowed for
a straightforward implementation of this independent offspring DTD. Therefore, the resulting Issue XML
meshes quite naturally with the referenced article XMLs.
By virtue of this issue-level DTD, we have back-ported over 11 million articles belonging to over 600
thousand issues from over 6,000 journals of our hosted content to use this layered XML encoding of article
versus issue metadata. New submissions include an issue-level XML. Typically, issue-level XML is very
simple and merely avoids the repetition of issue-level information on each article XML. For articles that
are not yet part of a printed issue, such as is the case of 'ahead-of-print' articles, there is still an
Issue XML associated with these, but this need not be bundled with the submission. The platform creates
and maintains the Issue XML automatically behind the scenes for such issues. This may take away some of the
flexibility of assigning custom metadata at the article collection level, but one is relieved from the
requirement to maintain an Issue XML for a collection which is constantly updated with article entries
being added and/or subtracted continuously.
Encoding the Table of Contents
The Table of Contents plays a central role in both print and online publishing. When only article XMLs are
available, rendering the Table of Contents presents several challenges:
1. Issue metadata are collected from across all article XMLs comprising the issue. Due to replication,
these metadata tend to be inconsistent across articles of the same issue, leading to difficult decisions
when it comes to choosing one version to render.
2. Only a limited amount of issue metadata is available as part of the article XML, so one must
resort to external ad hoc resources to capture information such as the cover image and caption,
the issue first page/last page, or the issue self-URI and DOI.
3. The sequence of articles presented in the Table of Contents is typically implied through first page/last
page information. This leaves little room for a custom order, and breaks down when multiple articles start
on the same page or when special articles use Latin page numbering.
4. Occasionally, publishers wish to present an article entry in the Table of Contents differently from
the way the article appears on its own page. For example, the contributors may appear with
initials rather than with their first names spelled out, or only the first contributor may be shown
for each article, followed by an "et al." notation. When article metadata in the Table of Contents
are not an exact copy of those presented for the article itself, rendering the Table of Contents becomes
a challenge.
With the introduction of Issue XML, all of the above problems associated with the proper rendering of
the Table of Contents are addressed. All the information necessary to build the Table of Contents is
readily retrievable from either issue or article level metadata.
Problem #1 above is eliminated owing to the lack of redundancy achieved by means of collecting all issue
metadata in a single place rather than maintaining multiple copies of the same information across
different XMLs. Problem #2 is taken care of by defining in the Issue XML DTD a diverse collection of
issue-level metadata, over and beyond what is currently allowed by the
<article-meta> element of the JATS Tag Suite. And this, without actually
introducing new elements not already present in the Tag Suite. For instance, let us focus on the DTD
definition of element <issue-meta>:
<!ELEMENT issue-meta ( contrib-group*, author-notes?,
pub-date*,
series-title*, series-text?,
volume?, volume-id*, volume-series?,
issue?, issue-id*, issue-title*, issue-sponsor*,
issue-part?, supplement?,
fpage?, lpage?, page-range?,
copyright-statement?, copyright-year?, permissions?,
self-uri*, related-issue*, kwd-group*,
funding-group*, conference*,
counts?, custom-meta-group? ) >
Most of its constituent elements (with the exception of <related-issue>, which is the issue-level form of
<related-article>) are defined by the standard JATS Tag Suite. However, all of these elements are now
associated directly with the issue and not with the article. So, for example, <contrib-group> can now fully
define the editors of an issue, and <fpage>, <lpage> and <page-count> now refer to the corresponding page
information for the issue. Similarly, <kwd-group> refers to keywords that characterize the entire issue and
<counts> collects counting statistics referring to the entire issue. Finally, <custom-meta-group> plays the
same 'catch-all' role it serves under <article-meta>, only in this case for issue-level information that
cannot be captured under any other element.
The only new element in the above definition is <related-issue>, which is not really new, simply a renaming
of the corresponding article metadata element <related-article>. In the context of <issue-meta> it is used to
link to other issues that maintain a relationship with the current issue. It uses the same model as
<related-article> though:
<!ELEMENT related-issue ( #PCDATA %related-article-elements; )* >
Problem #3 from the list above can be dealt with easily because article references appear explicitly
in the Issue XML. Therefore, it takes a mere reordering of these references in the Issue XML to
effect a corresponding change of their appearance sequence in the Table of Contents. So, for
instance, in the Issue XML sample of Figure 1, it would simply require
a reordering of the article references under the <toc> element to achieve
a Table of Contents entry rearrangement:
... ...
<toc>
  <toc-article-meta>
    <article-id pub-id-type="doi">10.1162/pres.100387</article-id>
  </toc-article-meta>
  <toc-article-meta>
    <article-id pub-id-type="doi">10.1162/pres.100370</article-id>
  </toc-article-meta>
  <toc-article-meta>
    <article-id pub-id-type="doi">10.1162/pres.100361</article-id>
  </toc-article-meta>
  <toc-article-meta>
    <article-id pub-id-type="doi">10.1162/pres.100340</article-id>
  </toc-article-meta>
  <toc-article-meta>
    <article-id pub-id-type="doi">10.1162/pres.100321</article-id>
  </toc-article-meta>
</toc>
... ...
The above <toc> element will essentially reverse the order in which the articles appear
in the Table of Contents. On a similar note, one can just as easily decorate the Table of
Contents with headings using the <toc-subject-group> element. An example
<toc> element which arranges the above five articles into two groups
of articles with appropriate headings is shown below:
... ...
<toc>
  <toc-subject-group>
    <toc-subject-title>
      <subject>Research Articles</subject>
    </toc-subject-title>
    <toc-article-meta>
      <article-id pub-id-type="doi">10.1162/100387</article-id>
    </toc-article-meta>
    <toc-article-meta>
      <article-id pub-id-type="doi">10.1162/100370</article-id>
    </toc-article-meta>
    <toc-article-meta>
      <article-id pub-id-type="doi">10.1162/100361</article-id>
    </toc-article-meta>
  </toc-subject-group>
  <toc-subject-group>
    <toc-subject-title>
      <subject>Brief Communications</subject>
    </toc-subject-title>
    <toc-article-meta>
      <article-id pub-id-type="doi">10.1162/100340</article-id>
    </toc-article-meta>
    <toc-article-meta>
      <article-id pub-id-type="doi">10.1162/100321</article-id>
    </toc-article-meta>
  </toc-subject-group>
</toc>
... ...
The example shown above presents two headings at the same level. Headings need not all appear at the same
level. Owing to the recursive definition of the <toc-subject-group> element,
arbitrary nesting of headings in a Table of Contents is allowed.
<!ELEMENT toc-subject-group ( toc-subject-title, ( toc-article-meta | p | toc-subject-group )*) >
The example below shows headings nested in two levels, typical for listings arranged by subject heading:
... ...
<toc>
  <toc-subject-group>
    <toc-subject-title>
      <subject>Biology</subject>
    </toc-subject-title>
    <toc-subject-group>
      <toc-subject-title>
        <subject>Cell Biology</subject>
      </toc-subject-title>
      <toc-article-meta>
        <article-id pub-id-type="doi">10.1162/100370</article-id>
      </toc-article-meta>
      <toc-article-meta>
        <article-id pub-id-type="doi">10.1162/100361</article-id>
      </toc-article-meta>
    </toc-subject-group>
    <toc-subject-group>
      <toc-subject-title>
        <subject>Molecular Biology</subject>
      </toc-subject-title>
      <toc-article-meta>
        <article-id pub-id-type="doi">10.1162/100340</article-id>
      </toc-article-meta>
      <toc-article-meta>
        <article-id pub-id-type="doi">10.1162/100321</article-id>
      </toc-article-meta>
    </toc-subject-group>
  </toc-subject-group>
</toc>
... ...
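Because the heading model is recursive, a renderer can walk it with an equally recursive function. The Python sketch below turns a nested `<toc>` into an indented plain-text outline; the `outline` function and the two-space indentation are illustrative choices, not part of the Issue XML DTD, and the sample is an abridged version of the nested listing above.

```python
import xml.etree.ElementTree as ET

TOC = ET.fromstring("""
<toc>
  <toc-subject-group>
    <toc-subject-title><subject>Biology</subject></toc-subject-title>
    <toc-subject-group>
      <toc-subject-title><subject>Cell Biology</subject></toc-subject-title>
      <toc-article-meta>
        <article-id pub-id-type="doi">10.1162/100370</article-id>
      </toc-article-meta>
    </toc-subject-group>
    <toc-subject-group>
      <toc-subject-title><subject>Molecular Biology</subject></toc-subject-title>
      <toc-article-meta>
        <article-id pub-id-type="doi">10.1162/100340</article-id>
      </toc-article-meta>
    </toc-subject-group>
  </toc-subject-group>
</toc>
""")

def outline(node, depth=0):
    """Return indented lines for headings, article entries and <p> notes."""
    lines = []
    indent = "  " * depth
    for child in node:
        if child.tag == "toc-subject-group":
            heading = child.findtext("toc-subject-title/subject", default="")
            lines.append(indent + heading)
            lines.extend(outline(child, depth + 1))  # recurse into the group
        elif child.tag == "toc-article-meta":
            lines.append(indent + child.findtext("article-id", default=""))
        elif child.tag == "p":
            lines.append(indent + "".join(child.itertext()).strip())
        # toc-subject-title is consumed by its parent group above
    return lines

print("\n".join(outline(TOC)))
```

Document order is preserved throughout, so the sequence of entries in the Issue XML is exactly the sequence in the rendered outline.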
The presence of <p> elements interspersed with the article entries
supports cases where arbitrary text is present in the Table of Contents. This addition was prompted by
specific use cases of annotated Table of Contents entries, but in our installed base we have also come
across several other notes that precede or follow an article entry in a Table of Contents. Likewise,
headings are occasionally annotated with text that precedes or follows them, which has prompted the
introduction of zero or more <p> elements to the
<toc-subject-title> element:
<!ELEMENT toc-subject-title ( p*, subject, p*, trans-toc-subject* ) >
Notice the presence of the <trans-toc-subject> element in the above
definition for <toc-subject-title>. It helps define multi-lingual Table
of Contents headings, such as those requested for some of the publications of the National Research Council,
a Canadian publisher wishing to display bilingual English/French Table of Contents headings. Here is how
the <trans-toc-subject> element has been employed for this purpose:
... ...
<toc xml:lang="en">
  <toc-subject-group>
    <toc-subject-title>
      <subject>Biochemistry</subject>
      <trans-toc-subject xml:lang="fr">
        <subject>Biochimie</subject>
      </trans-toc-subject>
    </toc-subject-title>
    <toc-article-meta>
      <article-id pub-id-type="doi">10.1139/z09-115</article-id>
    </toc-article-meta>
    ... ...
  </toc-subject-group>
  ... ...
</toc>
... ...
This use is very much in line with the use of the
<trans-abstract> and
<trans-title> elements to represent multi-lingual article metadata.
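A renderer might pick the heading for a requested language as in the Python sketch below. The fallback to the untranslated `<subject>` when no translation matches is our assumption for the purposes of the example, and the `heading` function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TITLE = ET.fromstring("""
<toc-subject-title>
  <subject>Biochemistry</subject>
  <trans-toc-subject xml:lang="fr">
    <subject>Biochimie</subject>
  </trans-toc-subject>
</toc-subject-title>
""")

def heading(title, lang):
    """Return the heading text for lang, falling back to the main subject."""
    for translation in title.findall("trans-toc-subject"):
        if translation.get(XML_LANG) == lang:
            return translation.findtext("subject")
    return title.findtext("subject")

print(heading(TITLE, "fr"))
print(heading(TITLE, "en"))
```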
Finally, for those cases where an article entry needs to be highlighted, specially indented, fine-printed
or hidden from the Table of Contents, one can make use of a format attribute in
<toc-article-meta> element. For instance, the XML excerpt:
<toc-article-meta format="invisible">
  <article-id pub-id-type="doi">10.1162/100361</article-id>
</toc-article-meta>
designates that the specific article entry must be omitted from the rendered Table of Contents. There
was a use case from a University of Chicago Press title, where an "Answer to a Photo Quiz"
article was requested to be absent from the Table of Contents. In another use case, an Institutional Investor
journal wanted to present a side-table with specific highlighted articles selected from the issue.
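In rendering code, honoring the format attribute can be as simple as a filter. The Python sketch below handles only the "invisible" value shown above; any other format values and their rendering are outside its scope, and the `visible_dois` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

TOC = ET.fromstring("""
<toc>
  <toc-article-meta>
    <article-id pub-id-type="doi">10.1162/100370</article-id>
  </toc-article-meta>
  <toc-article-meta format="invisible">
    <article-id pub-id-type="doi">10.1162/100361</article-id>
  </toc-article-meta>
</toc>
""")

def visible_dois(toc):
    """List the DOIs of entries that should appear in the rendered ToC."""
    return [
        entry.findtext("article-id")
        for entry in toc.iter("toc-article-meta")
        if entry.get("format") != "invisible"
    ]

print(visible_dois(TOC))
```

The hidden article remains a member of the issue; it is merely skipped at rendering time.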
Problem #4 is fully addressed by the versatility of the <toc> element
of the Issue XML DTD. So far, we have only used the <toc-article-meta>
element to point to an article entry in the Table of Contents. A brief look at the definition of its DTD
model shows that this element is capable of a fully custom representation of an article entry:
<!ENTITY % toc-article-meta-model
"( article-id+, title-group?,
( contrib-group | aff | %x.class; )*, author-notes?,
( (fpage?, lpage?, page-range?) | elocation-id )?,
( %address-link.class; | product | supplementary-material)*,
history?, self-uri*, related-article*,
abstract*, trans-abstract*, toc-graphic*,
counts? )" >
This model is an abridged version of the regular %article-meta-model; defined by the JATS Tag
Suite. It includes only those elements from the original model which make sense in the context of a
Table of Contents article entry. Any of these elements defined inside
<toc-article-meta> is meant to override, for the purposes of rendering the Table of
Contents, the corresponding value that the element assumes in the article XML. Figure 5 shows an example
of two articles from the New England Journal of Medicine whose entries in the Table of Contents display
the authors differently from how the same authors appear on the articles themselves [4].
Two articles with different author presentation in the Table of Contents than on the article page
But one can alter a lot more than the article title and the contributors through
<toc-article-meta>. Other elements include the pages, the article
history dates, the TOC version of the abstract, and the list of related-article links. There is a single
element in the <toc-article-meta> model which does not derive directly
from JATS: the <toc-graphic> element, introduced
for publishers, like the American Chemical Society, who associate article entries in the Table of
Contents with a small graphic image which decorates the article
entry [5].
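The override behavior can be pictured as a lookup that prefers the Table of Contents entry and falls back to the article XML. The Python sketch below assumes that an element present in `<toc-article-meta>` replaces all same-named elements of `<article-meta>` as a whole; that granularity, the `effective` and `author_names` helpers, and the sample data are our assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged metadata as it appears in the article XML.
ARTICLE_META = ET.fromstring("""
<article-meta>
  <title-group><article-title>A Study</article-title></title-group>
  <contrib-group>
    <contrib><name><surname>Smith</surname><given-names>John</given-names></name></contrib>
    <contrib><name><surname>Jones</surname><given-names>Mary</given-names></name></contrib>
  </contrib-group>
</article-meta>
""")

# The ToC entry redefines only the contributors (initials, first author).
TOC_ENTRY = ET.fromstring("""
<toc-article-meta>
  <article-id pub-id-type="doi">10.1000/a1</article-id>
  <contrib-group>
    <contrib><name><surname>Smith</surname><given-names>J.</given-names></name></contrib>
  </contrib-group>
</toc-article-meta>
""")

def effective(toc_entry, article_meta, tag):
    """Return the ToC entry's own elements if any, else the article's."""
    own = toc_entry.findall(tag)
    return own if own else article_meta.findall(tag)

def author_names(contrib_groups):
    return [
        f"{name.findtext('given-names')} {name.findtext('surname')}"
        for group in contrib_groups
        for name in group.iter("name")
    ]

print(author_names(effective(TOC_ENTRY, ARTICLE_META, "contrib-group")))
print(effective(TOC_ENTRY, ARTICLE_META, "title-group")[0].findtext("article-title"))
```

The article page would keep rendering both authors with full names, since the article XML is left unchanged.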
Multiple Tables of Contents
Another feature occasionally encountered in publishing is the existence of two or more Tables
of Contents for the same issue. For instance, the Proceedings of the National Academy of Sciences
publishes one Table of Contents where articles are arranged according to the subject categorization
scheme [6] and another where articles are listed according
to their authors' last names [7]. Canadian
publishers and other multi-lingual publications occasionally need to display distinct Tables of Contents
in different languages. The <toc> element of Issue XML comes in handy
in situations such as these, where multiple Tables of Contents are called for. Notice that the DTD
requires at least one <toc> element in the Issue XML and allows more:
<!ELEMENT issue-xml ( journal-meta?, issue-meta, toc+ ) >
The sample XML of Figure 6 showcases such an issue with two
distinct Tables of Contents, achieved through the use of two separate <toc>
elements.

An Issue XML sample with two distinct Tables of Contents
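The top-level content model quoted above (an optional <journal-meta>, one <issue-meta>, one or more <toc>) can be checked with a few lines of Python. This sketch is not a substitute for DTD validation: it inspects only the direct children of the root, and the regular expression and function name are our own.

```python
import re
import xml.etree.ElementTree as ET

# Mirrors: <!ELEMENT issue-xml ( journal-meta?, issue-meta, toc+ ) >
TOP_LEVEL = re.compile(r"(journal-meta )?issue-meta (toc )+")

def follows_top_level_model(issue_xml):
    """Check the order and count of the root's direct children only."""
    root = ET.fromstring(issue_xml)
    if root.tag != "issue-xml":
        return False
    sequence = "".join(child.tag + " " for child in root)
    return TOP_LEVEL.fullmatch(sequence) is not None

two_tocs = "<issue-xml><journal-meta/><issue-meta/><toc/><toc/></issue-xml>"
virtual = "<issue-xml><issue-meta/><toc/></issue-xml>"
no_toc = "<issue-xml><journal-meta/><issue-meta/></issue-xml>"

print(follows_top_level_model(two_tocs))
print(follows_top_level_model(virtual))
print(follows_top_level_model(no_toc))
```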
One of the major design goals of the JATS Archiving and Interchange DTD was to facilitate scholarly
content interchange between different parties in the publishing world. The introduction of the Issue XML DTD
is quite compatible with this direction. It isolates issue metadata from article metadata but at the same
time maintains links between the two in a way that reflects more accurately the network of articles and
their collections, as these exist in publishing.
However, the Issue XML DTD must be integrated into a future version of the JATS Tag Suite and it must be
standardized along with the rest of the JATS Tag Sets before it is widely accepted by publishers,
aggregators and other interested third parties alike. In the first version of the Issue XML DTD, we have
catered for the needs of our 11 million article content store. It is possible that there are additional
features and potentials which have not been explored by this initial prototype. Similar efforts by other
publishers/aggregators can contribute to a complete proposal to be integrated in an upcoming version of
the JATS Tag Suite.
Besides, the idea of separating article metadata from issue metadata when interchanging content is
hardly new. In the CONTRAST standard (Content
Transport Standard), developed by Elsevier Science along with a number
of U.S. universities, there is also a clear separation between the article metadata and all information
pertaining to the issue and its associated files [8].
For cases where article interchange requires the inclusion of journal and issue metadata in the article
XML, all it takes is a simple stylesheet to combine the article XML with one of the Issue XMLs
that include the article before delivering the merged XML document to a third party. The task is simple,
and the choice of the Issue XML to merge with (in case there is more than one)
remains with the delivering party and is not left to chance. One could consistently pick the Issue XML
of the printed issue, or some other Issue XML based on the business requirements at hand.
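The merge step is described above as a stylesheet. As a stand-in, the Python sketch below performs the same kind of combination with the standard library: it copies <journal-meta> and the volume/issue designation from a chosen Issue XML into a copy of the article XML. The function name and sample documents are hypothetical, and the copied elements are simply appended, so a real implementation would still have to place them in the order the JATS content model requires.

```python
import copy
import xml.etree.ElementTree as ET

ARTICLE = ET.fromstring("""
<article><front>
  <article-meta>
    <article-id pub-id-type="doi">10.1000/a1</article-id>
  </article-meta>
</front></article>
""")

ISSUE = ET.fromstring("""
<issue-xml>
  <journal-meta>
    <journal-title-group><journal-title>Presence</journal-title></journal-title-group>
  </journal-meta>
  <issue-meta><volume>19</volume><issue>2</issue></issue-meta>
  <toc>
    <toc-article-meta><article-id pub-id-type="doi">10.1000/a1</article-id></toc-article-meta>
  </toc>
</issue-xml>
""")

def merge_for_interchange(article, issue):
    """Return a copy of the article enriched with journal and issue metadata."""
    merged = copy.deepcopy(article)  # the stored article XML stays untouched
    front = merged.find("front")
    journal_meta = issue.find("journal-meta")
    if journal_meta is not None and front.find("journal-meta") is None:
        front.insert(0, copy.deepcopy(journal_meta))
    article_meta = front.find("article-meta")
    for tag in ("volume", "issue"):
        source = issue.find("issue-meta/" + tag)
        if source is not None and article_meta.find(tag) is None:
            # Appended at the end: element order is not DTD-aware here.
            article_meta.append(copy.deepcopy(source))
    return merged

merged = merge_for_interchange(ARTICLE, ISSUE)
print(ET.tostring(merged, encoding="unicode"))
```

Which Issue XML to pass in remains an explicit decision of the caller, in keeping with the point made above.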
The bottom line is that Issue XML will not complicate content interchange as long as its DTD forms part
of an industry standard and is integrated into the JATS Tag Suite.
The Issue XML DTD is a concise yet versatile addition to the JATS Tag Suite which proves quite effective in
collecting all issue-level metadata in a single place. By doing so, one eliminates redundant journal and
issue metadata from the article XMLs, and can encode all the information necessary for rendering the
Table of Contents accurately. The concept of Issue XML appears to be compatible with the special cases
of ahead-of-print issues, virtual issues and online-only journal publications. In fact, an article
can move freely between collections without any update to the article XML metadata, owing
to the clear separation between article and issue-level metadata. Article interchange does not
pose a problem either, since a simple stylesheet can combine article and issue metadata in the form
employed by the JATS DTD today.
The Issue XML DTD has been used successfully to encode all of Atypon's XML content store
for the past three years. However, the precondition for broader industry-wide acceptance is the
incorporation of the IssueXml.dtd into the JATS Tag Suite. In this essay, we have elaborated on some of
the advantages of moving in this direction.