NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2017.

In pursuit of family harmony: Introducing the JATS Compatibility Meta Model

B. Tommie Usdin, Deborah A. Lapeyre, Laura Randall, and Jeffrey Beck.

JATS is an Open Standard. Users may modify it by adding or removing elements and attributes to suit their needs. Some publishers have extended (added to) JATS based on their own requirements, and there are some public extensions such as BITS, STS, and TaxPub. Users expect significant efficiencies from vocabularies based on JATS, including the ability to intermingle the documents in databases, to use tools created for JATS for their new vocabulary with minimal additional work, and to adopt rendering/formatting applications and change only those aspects specific to the new vocabulary. Some model changes create compatible documents, which can interoperate with JATS documents gracefully. But some model changes are disruptive. We discuss what types of changes to the JATS models can be integrated into existing XML environments and which may be disruptive. We propose a set of criteria to evaluate whether a proposed change will be seamless or might cause problems.

Introduction

JATS was designed to be locally extensible. The models in the DTDs are virtually all expressed in Parameter Entities (PEs), which means that a local user can override the models as needed. The JATS Tag Libraries all contain chapters titled “Modifying This Tag Set”[1], [2], [3]. These chapters explain the design of the tag set, describe how to use the various types of PEs, and discuss how to make new tag sets based on the JATS model. There is even documentation on the JATS naming conventions so developers will understand how to name new structures in ways that will fit into JATS. A section on “How to Build a New Custom Tag Set” in each Tag Library provides a step-by-step guide to the mechanics of building a new tag set based on JATS.

We are aware of quite a few JATS-based tag sets and are confident that there are many more that we don’t know about. Some of the JATS-based tag sets are subsets of JATS; users have removed structures (elements, attributes, and in some cases just attribute values) that they do not need and do not want to occur in their documents. Many of the JATS-based tag sets not only remove structures that are not needed but also add structures that are locally important. The added structures usually include metadata and often include locally-important body content.

Many users extend JATS simply as a convenience; it is easier to start with a known and reasonably robust tag set and make the changes you need than to start from scratch. For many users starting from JATS has additional advantages:

  • Users who have other JATS documents will find that documents tagged to their JATS extension use familiar tags and have a familiar organization.
  • Users who use service vendors for data conversion, typesetting, electronic hosting, or archiving will find that use of a tag set familiar to the vendors increases accuracy and processing speed and decreases costs and learning time.
  • Some users want to combine new document types with existing JATS documents in databases for search, retrieval, and archiving.
  • Some users choose a JATS-based tag set in order to use existing JATS tools for authoring/editing, page layout, and conversion to HTML.

In all of these cases, the users assume that by starting with JATS instead of creating their own tags, and by starting with existing JATS infrastructure, they will be able to build what they need far more effectively (and inexpensively) than if they wrote their own.

Problems with Some JATS Extensions

The users of JATS-based tag sets have usually experienced the advantages they were seeking. However, in some cases they have been unpleasantly surprised. Some of the changes they have made to JATS have left them with incompatibilities they did not expect and did not know how to deal with.

For example, changing the model of an element from character data mixed with phrase-level elements to paragraph-level (paragraph and block-style elements) has meant that the formatting of these structures has behaved unexpectedly – and in at least one case the content simply failed to appear in the display.*

Goals of Compatibility Meta Model

The “JATS Compatibility Model” was developed to “enable creators and maintainers of JATS-based document models to know when the extensions they make to JATS models are JATS-compatible and to suggest ways in which they can achieve their modeling goals in a JATS compatible way.”[4] In other words, to help people who are creating JATS-based models and who want documents tagged to those models to:

  • Be able to use JATS formatting and display tools, with enhancements only for the new structures added to the model
  • Integrate gracefully into databases designed for JATS documents, including compatible search, manipulation, and display
  • Work in existing JATS document processing pipelines and processes with changes only to accommodate the new information in these documents.

In order to enable this, the “Compatibility Model” identifies some design principles in which compatible models as a whole must match JATS and some Compatibility Properties in which each modified model (element or attribute) must match JATS.

Note that there are many properties of XML models that are NOT on this list of Compatibility Properties. In another paper, we discuss properties that seemed as if they might be important for this sort of compatibility but that on deeper exploration were not necessary for compatibility.

Our intention in developing this model is to enable users to customize JATS models to meet their needs in ways that will work well with JATS documents.

Most Important Rule for Compatibility: Respect the Semantics

Respect the semantics of the existing elements and attributes; use the named structure to mean the same things JATS means by that named structure. This may sound obvious, and most XML users would agree with it. That is, UNTIL push comes to shove, and then we start looking for ways to wiggle around it.

For example, in a document set that is about storage and transportation of fuels, it might seem reasonable to tag information about fuels, including the volume of liquid fuels. There is a Volume element in JATS, used to identify the volume of a multi-volume publication. The JATS meaning of "volume" is not a quantity of liquid measure. Even if the fuels documents will never use "volume" in the way JATS does, they should use a different name, perhaps "liquid-volume" to tag the capacity of a gasoline truck.

The developers of JATS have tried to use only one meaning for each word in a name, and to have only one meaning for each element and attribute name. The documentation includes a description of each element and attribute. And that should be the only meaning of each name in a JATS environment.

So, how do developers of JATS-based tag sets get into trouble with this most basic of rules? With the best of intentions, of course! They are extending JATS to meet the needs of a community or group of documents that has its own vocabulary and jargon. And something important to them is called by the same name as something in JATS that they don’t need. We really want our users to recognize the labels we use for their content, and the best way to do this is to use the names they use.

So, delete the models we don’t need, use the names we want for our purposes, and our users are happy. For a while. Until they realize that they aren’t getting some of the advantages they expected from starting with JATS.

Or they simply use the same element in another context. It’s OK, they say. We’ll document that in the original context it means one thing and in our new context it means something else. And what is the problem with that? We can certainly control formatting by context in XML. And we could specify context on all searches. But if these documents get mixed into a database that doesn’t have two meanings for that element, it is a good bet that the users of that database are not in the habit of specifying context on an element that never needed such limitation before; and the library of canned searches will certainly not accommodate this new requirement.
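
The canned-search problem can be illustrated with a small sketch. The collection below is hypothetical: it mixes a JATS-style citation with a fuels document that reuses <volume> for a liquid quantity, which is the practice this section advises against. The <tank-spec> element and the function name are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed collection: <tank-spec> is an invented extension element
# that reuses the JATS name <volume> with a different meaning.
DOCS = """
<collection>
  <element-citation><source>J Fuels</source><volume>12</volume></element-citation>
  <tank-spec><volume>34000 L</volume></tank-spec>
</collection>
"""

def canned_volume_search(root):
    """A context-free search, as an existing JATS database might run it."""
    return [el.text for el in root.iter("volume")]

root = ET.fromstring(DOCS)
hits = canned_volume_search(root)
print(hits)  # ['12', '34000 L']
```

A search written before the second meaning existed returns both values, and nothing in the result says which one is a publication volume. Had the fuels document used a different name, such as the "liquid-volume" suggested above, the existing search would be unaffected.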

Two Design Principles

In addition to “Respect the Semantics”, there are two aspects of design on which a JATS-compatible model must take the same approach that JATS does:

  • Linking direction, and
  • Section model.

Linking Direction

Links in JATS go from the many to the one, not the other way around. (Yes, we know that since there are IDs and IDREFs on a lot of structures it is possible to create valid tagged documents using the JATS tag set that go in the other direction, but links are clearly documented in JATS as going from the many to the one.) For example, since it is possible for there to be more than one citation to a referenced object, each citation has an ID and references (cross-references) to that citation should link to it with an IDREF. The citation does not point to a cross-reference where it is being cited. Similarly, a reference to a section, table, figure, or equation uses an IDREF to point to that section, table, figure, or equation.

There are tag sets that point in the other direction, with special rules for the (admittedly relatively unusual) situation in which there are two references to the same footnote or other portion of the document. However, it is not reasonable to expect this to work in tools that have been developed for JATS users who follow the JATS guidelines.

There are users who want links in both directions in their user interfaces. This can be built from one-way ID/IDREF attributes in the XML files, but some prefer to put ID/IDREFs in both directions into their XML files. We suggest that not only is this contrary to the way JATS was designed, it is also a just plain bad idea! If the XML file contains the link in one direction and the display tool creates the inverse, the links will be symmetrical. If, however, the XML contains both directions you have the possibility of mismatched pointers.
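
As a minimal sketch of how a display tool might derive the inverse links from one-way pointers, the following reads the @rid attributes on <xref> elements and builds a map from each target back to the paragraphs that cite it. The sample article and the function name are assumptions made for illustration, not part of any JATS tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative fragment: links go from the many (<xref rid>) to the one (<ref id>).
ARTICLE = """
<article>
  <body>
    <p id="p1">See <xref rid="r1"/> and <xref rid="r2"/>.</p>
    <p id="p2">Again <xref rid="r1"/>.</p>
  </body>
  <back><ref-list>
    <ref id="r1"/><ref id="r2"/>
  </ref-list></back>
</article>
"""

def build_backlinks(root):
    """Map each target id to the ids of the paragraphs that point at it."""
    backlinks = defaultdict(list)
    for p in root.iter("p"):
        for xref in p.iter("xref"):
            backlinks[xref.get("rid")].append(p.get("id"))
    return dict(backlinks)

backlinks = build_backlinks(ET.fromstring(ARTICLE))
print(backlinks)  # {'r1': ['p1', 'p2'], 'r2': ['p1']}
```

Because the inverse map is computed from the forward links, the two directions cannot disagree, which is the symmetry argument made above.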

Section Model

The defining structure of complex text documents is the headed section. The way headings and the text associated with them are handled is fundamental to the way prose documents are encoded, managed, and displayed. There are several common approaches to this, and all work well in some circumstances:

  • A mixture of paragraph-like things and various levels of headings
  • A sequence of level-identified sections with corresponding levels of headings
  • Nested level-identified sections containing either level-identified or generic headings
  • Recursive sections containing generic headings

JATS uses the last of these: sections that contain headings, paragraph-like things, and possibly lower-level sections. The level of a heading, and thus the way in which it is displayed to the user, is (by default) calculated from the number of sections in which the heading is nested. That is, the title of a section in the body of the document displays with the styling of a first-level heading, the title of a section that is in a section in the body displays with the styling of a second-level heading, and the title of a section that is in a section that is in a section in the body displays with the styling of a third-level heading.
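
The default calculation can be sketched in a few lines. This is an illustration under simple assumptions (every section has at most one <title>, and no display override is in play); the sample body and function name are invented for the example.

```python
import xml.etree.ElementTree as ET

BODY = """
<body>
  <sec><title>Methods</title>
    <sec><title>Sampling</title>
      <sec><title>Sites</title></sec>
    </sec>
  </sec>
  <sec><title>Results</title></sec>
</body>
"""

def heading_levels(parent, depth=1):
    """Yield (level, title text) for each <sec>; level is the nesting depth."""
    for sec in parent.findall("sec"):      # direct child sections only
        title = sec.find("title")
        text = title.text if title is not None else None
        yield (depth, text)
        yield from heading_levels(sec, depth + 1)

levels = list(heading_levels(ET.fromstring(BODY)))
print(levels)  # [(1, 'Methods'), (2, 'Sampling'), (3, 'Sites'), (1, 'Results')]
```

Nothing in the markup names a level; a tool that relies on this calculation will mis-style documents whose tag set encodes levels some other way.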

If you want a tag set based on JATS to work comfortably with existing JATS infrastructure, use this approach to section modeling.

This does not mean that documents tagged using JATS or a JATS-compatible model cannot be as funky as needed. If you want heading levels to be skipped in your documents, either:

  • provide invisible section levels by providing wrappers that contain nothing but the next level of section (allowed in Archiving (Green) but not in Journal Publishing (Blue) or Article Authoring (Pumpkin)), or
  • use the @disp-level attribute (allowed in Archiving (Green)) to specify what the heading should look like.

Subsetting

A proper subset of any content model or model of attribute values is always allowed. What we mean by that is that a model can always be made smaller as long as it keeps the basic structure of the larger model.

We say that any modification to the model that describes a subset of a JATS model is allowed. This includes:

  • Elements may be removed from an “or” group with many elements
  • Elements that are required in JATS may be removed or made optional
  • Values may be removed from the list of specified values of an attribute
  • Attributes may be removed from elements

Sometimes this may allow some surprising modifications. For example, structures that have a section model may be significantly simplified:

  • The model for Abstract may be simplified from:
    • The following, in order:
      • <label> Label (of an Equation, Figure, Reference, etc.), zero or one
      • <title> Title, zero or one
      • <p> Paragraph, zero or more
      • <sec> Section, zero or more
  • to:
    • <p> Paragraph, zero or more
  • or even to:
    • <p> Paragraph
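
One way to see why these simplifications are safe is to treat each content model as a regular expression over the sequence of child element names and check that everything the smaller model accepts is also accepted by the original. The sketch below does this for a handful of sample sequences; it is a spot check, not a proof, and the REORDERED model is a hypothetical counterexample added for contrast.

```python
import re

# Content models written as regexes over space-terminated child names.
ABSTRACT_FULL = r"(label )?(title )?(p )*(sec )*"   # the original model quoted above
ABSTRACT_P_ANY = r"(p )*"                            # first simplification
ABSTRACT_P_ONE = r"p "                               # second simplification
REORDERED = r"(sec )*(p )*"                          # hypothetical change, NOT a subset

def is_valid(model, children):
    """True if the list of child element names matches the content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model, sequence) is not None

def accepted_only_by(candidate, original, samples):
    """Samples the candidate model allows but the original model rejects."""
    return [s for s in samples
            if is_valid(candidate, s) and not is_valid(original, s)]

SAMPLES = [[], ["p"], ["p", "p"], ["title", "p"], ["sec", "p"], ["p", "sec"]]

print(accepted_only_by(ABSTRACT_P_ANY, ABSTRACT_FULL, SAMPLES))  # []
print(accepted_only_by(ABSTRACT_P_ONE, ABSTRACT_FULL, SAMPLES))  # []
print(accepted_only_by(REORDERED, ABSTRACT_FULL, SAMPLES))       # [['sec', 'p']]
```

An empty result for a candidate means no sampled document valid to it would be invalid to the original model. The reordered model fails that check because it admits a section before a paragraph.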

Compatibility Properties

There are also some specific properties where a JATS-compatible model must match JATS. In order to be JATS-compatible, and comfortably expect to interoperate with JATS, new tag sets must match JATS not only on the major design aspects described above but also on all of these properties.

The compatibility properties are all defined in the “JATS Compatibility Model”, which includes an appendix that provides values for all JATS structures through JATS version 1.1.

Here we describe several of these properties that have caused users the most difficulties.

Element versus Attribute

Designers of XML vocabularies often have very strong opinions about what information should be encoded as element content and what should be attribute values. Unfortunately, while this is often the subject of very firm opinions, there is no consensus on how to determine what should be element and what attribute content.

A JATS-compatible tag set should either:

  • follow JATS in encoding the information as element or attribute or
  • create a new element or attribute for the information.

That is, if the designer(s) of a new vocabulary decide that “Prefix word”, which JATS provides as an attribute described as “Word or phrase to be added to the beginning of each item in a list (for example, ‘Step’, ‘Procedure’.)”, should be element content in the new vocabulary, it must not be an element named <prefix-word>; a new name must be used.

Similarly, if the designer(s) of a new vocabulary want to make phone number an attribute on <address> instead of an element that can be contained in address, they must use a new name for this new phone number attribute; it must not be @phone because <phone> is an element in JATS.
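
A tag set developer could check a proposed extension for this kind of clash mechanically. The sketch below compares the names an extension introduces against two small, illustrative excerpts of JATS element and attribute names; the sets are deliberately incomplete and the function name is an assumption. A real check would load the full inventories from the Tag Library or the DTD.

```python
# Small, illustrative excerpts only; not the full JATS inventory.
JATS_ELEMENTS = {"phone", "address", "volume", "list"}
JATS_ATTRIBUTES = {"prefix-word", "list-type", "id"}

def kind_conflicts(new_elements, new_attributes):
    """Report names a proposed extension would move between element and attribute."""
    return {
        "attribute_reused_as_element": sorted(set(new_elements) & JATS_ATTRIBUTES),
        "element_reused_as_attribute": sorted(set(new_attributes) & JATS_ELEMENTS),
    }

# Hypothetical extension: two of these names are fine, two clash with JATS.
report = kind_conflicts(
    new_elements={"prefix-word", "liquid-volume"},
    new_attributes={"phone", "phone-number"},
)
print(report)
# {'attribute_reused_as_element': ['prefix-word'], 'element_reused_as_attribute': ['phone']}
```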

Whitespace Handling

Many XML users, even many XML vocabulary designers, are unaware of the complexities of XML whitespace handling. In XML, there are situations in which whitespace is significant and situations in which it is not. XML processors may, and typically do, behave very differently in these two situations:

  • where whitespace is significant (and @xml:space="preserve" has not been specified):
    • all whitespace characters (space, carriage return, line feed, and tab) become spaces
    • multiple whitespace characters are treated/processed as if there were only one
    • whitespace can be automagically added to a document any place where there is already significant whitespace (this is often done when wrapping lines or indenting to show structure)
  • where whitespace is not significant:
    • processors (are supposed to) ignore it
    • whitespace can be added at any place where whitespace is insignificant (this is often done when wrapping lines or indenting to show structure)
    • insignificant whitespace may be removed by an XML processor at any time (this is, by definition, not changing the XML document)

Elements with element-only models have insignificant whitespace. Elements with mixed content (those that allow character content and perhaps other elements) have significant whitespace.

Because XML processors may remove all insignificant whitespace but must retain one space where whitespace is significant, the difference is critical in processing and formatting XML documents.

For example, JATS Element Citations have insignificant whitespace. There may, or may not be, whitespace between the elements inside an <element-citation>, for example, between the cited article-title and the cited publication date, but since this whitespace is insignificant, a display tool would show the contents of the elements inside the element citation flush one against another unless the tool explicitly inserts space. (In fact, in order to display <element-citation>s correctly, the display tool would usually insert both space and punctuation.)

JATS <mixed-citation>s have significant whitespace. In displaying a <mixed-citation> the display tool would show space where the XML document had space and would display any tagged elements inside the <mixed-citation> that did not have space between them as flush against each other.

Because the way whitespace is treated in displaying XML content is so different between elements with mixed content and those with element content, it is important that this property be maintained for each element in making a JATS-compatible tag set.
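
The difference between the two citation styles can be sketched with a deliberately naive renderer that inserts no space or punctuation of its own. It drops whitespace-only text inside element-only content and collapses runs of whitespace to a single space inside mixed content. The ELEMENT_ONLY set, the function name, and the sample citations are assumptions for illustration; a real tool would take the content type of each element from the schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumption for this sketch: only <element-citation> is element-only here.
ELEMENT_ONLY = {"element-citation"}

def naive_text(citation):
    """Concatenate a citation's text as a tool would if it added nothing itself."""
    element_only = citation.tag in ELEMENT_ONLY
    pieces = []

    def add(text):
        if not text:
            return
        if element_only and text.strip() == "":
            return                                  # insignificant: may be dropped
        pieces.append(re.sub(r"\s+", " ", text))    # significant: collapse to one space

    add(citation.text)
    for child in citation:
        pieces.append("".join(child.itertext()))
        add(child.tail)
    return "".join(pieces).strip()

EC = """<element-citation>
  <article-title>Family harmony</article-title>
  <year>2017</year>
</element-citation>"""

MC = """<mixed-citation><article-title>Family harmony</article-title>.
   <year>2017</year></mixed-citation>"""

ec_text = naive_text(ET.fromstring(EC))
mc_text = naive_text(ET.fromstring(MC))
print(ec_text)  # Family harmony2017
print(mc_text)  # Family harmony. 2017
```

The element citation comes out with its parts flush against each other, which is why display tools insert space and punctuation for it, while the mixed citation keeps the single space and the punctuation that were in the source. If a derived tag set changed which kind of content an element has, the same tool would produce the wrong result for it.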

Alternatives

The “Alternatives” concept in JATS is very powerful, and unfamiliar to users of many other tag sets. Think of an alternatives element as a wrapper that says “all of these things are equivalent”. That is, for display or count, you typically want to use only one of the (possibly many) supplied versions, or you may want to treat one as the preferred version and the others as synonyms.

For example, an <alternatives> that contains:

  • a <graphic>,
  • an <mml:math>, and
  • a <textual-form>

contains 3 ways to express one piece of information; probably a mathematical expression. In formatting this XML document for print, it might be best to use the MathML version, while a screen display might want to use the graphic with the content of the textual form provided as <alt-text> for non-visual uses. Since these are 3 expressions of the same content, it would be quite unusual to display more than one of them.

In another case, if a <name-alternatives> contains several <name> or <string-name> elements it is assumed that these are variations on the name of the same person. Perhaps one is the name in the language of the containing document and others are the name of the same person in other languages or scripts. It is common to display the version of the name that matches the document’s style followed by other versions in parentheses. It is important that when counting the number of contributors to a document this person be counted only once, not once for each name version. Similarly, when counting the number of documents to which this person contributed this document should be counted only once.

Because of the special meaning of alternatives wrapper structures in JATS, processors and display tools handle alternatives differently from other wrappers. It is important that JATS-compatible tag sets use alternatives wrappers in the same way.
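
The counting rule for names can be sketched as follows. The contributor group is invented for the example, and the two function names are assumptions; the point is only the contrast between a count that understands <name-alternatives> and one that does not.

```python
import xml.etree.ElementTree as ET

# Illustrative contributor group: the first person has two name versions.
CONTRIB_GROUP = """
<contrib-group>
  <contrib>
    <name-alternatives>
      <name><surname>Sato</surname><given-names>Yuki</given-names></name>
      <string-name>SATO Yuki</string-name>
    </name-alternatives>
  </contrib>
  <contrib>
    <name><surname>Lee</surname><given-names>Ana</given-names></name>
  </contrib>
</contrib-group>
"""

def count_people(group):
    """Count each <name-alternatives> wrapper once; count bare names individually."""
    total = 0
    for contrib in group.iter("contrib"):
        total += len(contrib.findall("name-alternatives"))   # one person per wrapper
        total += len(contrib.findall("name"))                # direct children only
        total += len(contrib.findall("string-name"))
    return total

def naive_count(group):
    """Count every name element, ignoring the alternatives wrapper."""
    return sum(1 for el in group.iter() if el.tag in ("name", "string-name"))

group = ET.fromstring(CONTRIB_GROUP)
print(count_people(group))  # 2
print(naive_count(group))   # 3
```

A tag set that used an alternatives wrapper to hold things that are not equivalent would be undercounted by the first function, and one that listed equivalent names without the wrapper would be overcounted.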

Note on JATS and These Guidelines

JATS was developed and maintained for quite a while before these principles were articulated. In documenting these guidelines, and this property in particular, we are aware that JATS does not completely conform to these rules. It is our hope that future versions of JATS may “clean up” what we now see as infelicities.

Element or Attribute. It is not an error to have the same name used as an element and an attribute, but it clearly causes confusion and could easily cause someone modifying JATS to believe that it is acceptable to move content from element to attribute or vice versa. At the moment, in JATS, the following are names of both elements and attributes:

  • corresp
  • count
  • country
  • elocation-id
  • glyph-data
  • issue
  • journal-id
  • name
  • object-id
  • version

We are also aware of one instance in which an element has significant whitespace in one JATS tag set and insignificant whitespace in another: History is element content in the Publishing (Blue) model and mixed content in the Archiving (Green) model. This means that whitespace is insignificant in documents that identify themselves as valid to the Publishing model, but if nothing in the document changes except that the Document Type Declaration points to Archiving instead, the whitespace inside <history> becomes significant.

We suggest that these were unfortunate design decisions and advise future vocabulary developers NOT to make these same mistakes.

Recommendation and Request

We, the authors of both this paper and the “JATS Compatibility Model”, recommend that anyone creating a JATS-based tag set, or modifying any of the JATS tag sets, consider whether they need, want, or expect the modified tag set to interoperate with other JATS documents. If you do, we suggest that you read the “JATS Compatibility Model” and follow the guidelines in it when creating your tag set. This may, at times, be inconvenient. It means that a few key architectural principles must match those of JATS (nested sections and pointer direction), but beyond that it means that if you do things differently from the way JATS does them you simply use a different name for your structure.

We request that JATS users, especially those with experience modifying tag sets, read the “JATS Compatibility Model” document and help us improve it. Are there other design principles or compatibility properties that should be added to these guidelines? Are there things we suggest that are not important for interoperability? Is the document unclear?

References

1. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Archiving and Interchange Tag Library, NISO JATS Version 1.1 (ANSI/NISO Z39.96-2015), “Modifying This Tag Set”. December 2016. Available at: https://jats.nlm.nih.gov/archiving/tag-library/1.1/chapter/implementor.html.
2. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Journal Publishing Tag Library, NISO JATS Version 1.1 (ANSI/NISO Z39.96-2015), “Modifying This Tag Set”. December 2016. Available at: https://jats.nlm.nih.gov/publishing/tag-library/1.1/chapter/implementor.html.
3. National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM). Article Authoring Tag Library, NISO JATS Version 1.1 (ANSI/NISO Z39.96-2015), “Modifying This Tag Set”. December 2016. Available at: https://jats.nlm.nih.gov/articleauthoring/tag-library/1.1/chapter/implementor.html.
4. Usdin, B. Tommie; Lapeyre, Deborah A.; Randall, Laura; Beck, Jeffrey. JATS Compatibility Meta-Model Description, Draft 0.7. July 12, 2016. Available at: http://www.niso.org/apps/group_public/download.php/16764/JATS-Compatibility-Model-v0-7.pdf.
5. Usdin, B. Tommie; Lapeyre, Deborah A.; Randall, Laura; Beck, Jeffrey. “Graceful Tag Set Extension.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). DOI: 10.4242/BalisageVol17.Usdin01. Available at: https://www.balisage.net/Proceedings/vol17/html/Usdin01/BalisageVol17-Usdin01.html.

Footnotes

* The parameter entities have been set up to make this particular change difficult for exactly this reason, but users are very clever and have found ways around this modeling.

Copyright Notice

Laura Randall’s and Jeffrey Beck’s contribution to the Work was done as part of their official duties as NIH employees. Consequently, this Work is in the public domain; no copyright may be established in the United States. 17 U.S.C. § 105. If Publisher intends to disseminate the Work outside the U.S., Publisher may secure copyright to the extent authorized under the domestic laws of the relevant country, subject to a paid-up, nonexclusive, irrevocable worldwide license to the United States in such copyrighted work to reproduce, prepare derivative works, distribute copies to the public and perform publicly and display publicly the work, and to permit others to do so.

Bookshelf ID: NBK425547
