Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2017.

Circling in on the JATS Compatibility Meta-Model

The JATS Meta-Model was developed to guide people who want to customize JATS to meet local needs and have their JATS-based vocabularies work gracefully with existing JATS-based infrastructure. From analyzing content models to defining "social behaviors" of XML elements, the process of defining the JATS Compatibility Meta-Model was rarely straightforward and very often led us to surprising conclusions. Why, for instance, is whether or not something is metadata not a defining property of compatibility? This paper aims to explain the process and thinking behind the model—how we came to the conclusions about compatibility and what we even mean by compatibility. We'll look at some of the assertions we started absolutely knowing to be important, and discuss why they're ultimately not in the Meta-Model. By examining the process behind the model and sharing our successes and failures, we hope to improve understanding of the model and its broader implications.

Introduction

The primary goal of the JATS is information interchange. As the JATS is adopted more widely outside its original scope of STEM journal publications, customizations become necessary. Customizations may take the form of subsetting (reducing the number of structures in the JATS model) or extension (creating new structures within the model). As the JATS user base grows, so, too, do the requirements on the Tag Suite: the need for new structures and for expansion of the existing ones. When a group needs a structure that the JATS does not provide, there is an understandable tendency to repurpose something in the JATS that the group is not using rather than to create an extension. This repurposing, however, can cause problems when it comes to sharing those documents. If JATS-defined structures are used in ways that are not consistent with the JATS definitions, that inconsistency can compromise document interoperability and exchange.

When the NLM Journal Archiving and Interchange Tag Suite ("NLM DTD", the predecessor to the NISO JATS) was created more than a decade ago, it was defined as an interchange format[1] rather than one intended for publishing or XML-first workflows. Even the Journal Publishing Tag Set was defined as being "optimized for regularizing an archive or ... aid[ing] print and web production."[2] This is a very different usage than what we're seeing today where publications begin with XML and must capture their own production-specific information in the XML. Because production considerations like a journal's "look and feel" are explicitly not part of the NISO JATS standard, users must (and should) customize the JATS to handle these requirements.

Current JATS implementations also encompass far more phases of a journal lifecycle than we thought or planned for, so there are very real and legitimate use cases that the NISO JATS standard does not support. Similarly, there are far more types of documents being captured in the JATS than just journal articles—books, standards, pamphlets, and even posters. None of these publication types were part of the original scope and intent of the JATS, but the modular nature of the JATS was designed to enable users to expand it to accommodate any of these (or other) publication types with some thoughtful customization.

The JATS Compatibility Meta-Model[3] aims to help those creating these customizations ensure that what they do maintains a high level of compatibility with the JATS, enabling interchange with other JATS documents.

The process of arriving at the compatibility model started with a discussion of what we called "JATS proliferation madness": the idea that the popularity of JATS, while a good thing, is leading to a chaotic environment in which a growing number of users have their own flavors or custom models and there is no guidance on creating these models in a way that maintains compatibility with the base JATS model. This leads to a scenario where there could be duplicate model extensions, some with similar intentions, and, most destructively, extensions that are incompatible with the JATS. We considered this incompatibility destructive because it leads to documents having to be isolated, which is contrary to the JATS goal of interchange and interoperability.

Defining Compatibility

The specifics of JATS compatibility are described in other works[3][4], but for the sake of this paper, compatibility means that document structures behave in predictable, consistent, and generally non-destructive ways. This broad definition was a product of the process of determining the model rather than something we started with.

Predictable

Predictability is important because XML systems don't like surprises. If the system needs to generate punctuation or whitespace to display an element, then it should always have to do so. If an article identifier needs to be indexed in a database, then ideally it will always be found in the same place in the document and be retrievable in the same way. This isn't always realistic, but systems (and systems developers) should never have to guess where to find the information they need.
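As a rough illustration of what "found in the same place and retrievable in the same way" can mean in practice, the sketch below looks up article identifiers at one fixed path using Python's standard library. The sample document, its made-up identifier value, and the function name find_article_ids are assumptions for this example only; the document is a heavily trimmed toy, not a complete JATS article.

```python
import xml.etree.ElementTree as ET

# Toy document: a heavily trimmed JATS-style article (illustrative only;
# the identifier value is made up).
SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.0000/example.1</article-id>
    </article-meta>
  </front>
  <body><p>Text.</p></body>
</article>"""

# The same identifier moved to an unexpected place in the document.
RELOCATED = """<article>
  <front><article-meta/></front>
  <body><p><article-id pub-id-type="doi">10.0000/example.1</article-id></p></body>
</article>"""


def find_article_ids(xml_text):
    """Return {pub-id-type: value} for each article-id at the expected path."""
    root = ET.fromstring(xml_text)
    ids = {}
    for node in root.findall("./front/article-meta/article-id"):
        ids[node.get("pub-id-type", "unknown")] = (node.text or "").strip()
    return ids


print(find_article_ids(SAMPLE))
print(find_article_ids(RELOCATED))
```

An indexer written this way finds the identifier in the first document and nothing in the second, which is the kind of guessing a predictable structure is meant to spare systems and their developers.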

Consistent

Consistent behavior in the JATS models means that you respect the semantics. The JATS includes definitions of elements and attributes, and staying true to these definitions is key to compatibility. Just as systems don't like to be surprised, they don't usually handle semantic variations, either. As discussed in this paper's companion paper[3], if the JATS model is repurposed by a group that does not use the volume element to mean the number of a journal and instead uses that same element to mean the quantity of a liquid measure, the semantic disconnect will cause an interchange problem. Nothing in the tagging allows a system to distinguish which meaning applies to the document, and since the two meanings are significantly different, the documents are not interchangeable and thus not compatible.
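The volume example can be made concrete with a small sketch. The two fragments below are toy markup (the entry wrapper and the function name describe_volume are assumptions made for illustration, not JATS structures); the point is only that the element itself carries nothing a program could use to tell the two meanings apart.

```python
import xml.etree.ElementTree as ET

# Two toy fragments that reuse the same element name for different meanings.
JOURNAL_USE = "<entry>Example Journal, <volume>12</volume></entry>"  # volume number
LIQUID_USE = "<entry>Add water, <volume>12</volume></entry>"         # liquid quantity


def describe_volume(xml_text):
    """Return what a system can see of the first volume element: tag, attributes, text."""
    node = ET.fromstring(xml_text).find(".//volume")
    return (node.tag, dict(node.attrib), node.text)


# Both calls return the same tuple, so the markup alone cannot say which
# meaning was intended.
print(describe_volume(JOURNAL_USE))
print(describe_volume(LIQUID_USE))
```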

Generally non-destructive

The concept of destructive behavior in the context of this compatibility model refers to maintaining the spirit of the JATS and respecting the specific properties defined by the compatibility model. The JATS models were developed with a clear purpose of providing an XML format to enable the capture of journal articles. Altering (not expanding) that purpose is contrary to compatibility. This is also tied heavily to the predictability and consistency aspects of compatibility, but it addresses the larger picture of the JATS environment.

Conceptually, the JATS was created to archive journal article content. Known extensions and customizations expand on this goal and enable capture of similar or related structures like books (BITS) and taxonomic markup used in journal articles (TaxPub). These customizations and many others stay true to the goals of the JATS in enabling scholarly content capture and interchange. It is only when a model changes the inherent purpose or goal of the JATS that it would be considered destructive and would jeopardize interoperability. These kinds of changes cannot be made while preserving compatibility with the JATS environment.

Process

The first step we took in creating this model was to go through the JATS-defined elements and attributes, group them by like behaviors, and list the specific characteristics that informed the groupings. Some of these early element groupings were:

  • paragraph-like: block, no label, can contain text or elements (p, license-p)
  • figure-like: block, has metadata, has title and label (fig, table-wrap)
  • inline toggle: inline, no metadata, toggles (italic, roman, bold, sc, underline)
  • inline not toggle: inline, no metadata, does not toggle (named-content, chem-struct, inline-formula, inline-supplementary-material)
  • list-like: block, label and title, not recursive (list)
  • major structures: superstructure or framework, not directly displayable (article, front, body, back)
  • sec-like: block, has metadata, has title and label (sec, ack, bio)

These initial groupings intermixed the ideas of the role elements play in a document (block, inline, metadata) with what they contain (metadata, text, mixed-content). The combination of these two pieces is what defines an element's social behavior. As we dug into the significance of the characteristics of these behaviors, we came to the difficult conclusion that some of these behaviors, while seemingly important, did not actually matter when determining compatibility.
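One way to see how role and content characteristics combine is to write the early groupings down as data. The sketch below is only an encoding of the list above; the Grouping class, its field names, and the helper functions are assumptions introduced for this example and are not part of the Meta-Model.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Grouping:
    name: str
    role: str           # block, inline, or framework
    traits: frozenset   # characteristics listed for the grouping
    members: tuple      # example elements from the list


GROUPINGS = (
    Grouping("paragraph-like", "block",
             frozenset({"no label", "text or elements"}), ("p", "license-p")),
    Grouping("figure-like", "block",
             frozenset({"metadata", "title", "label"}), ("fig", "table-wrap")),
    Grouping("inline toggle", "inline",
             frozenset({"no metadata", "toggles"}),
             ("italic", "roman", "bold", "sc", "underline")),
    Grouping("inline not toggle", "inline",
             frozenset({"no metadata", "does not toggle"}),
             ("named-content", "chem-struct", "inline-formula",
              "inline-supplementary-material")),
    Grouping("list-like", "block",
             frozenset({"label", "title", "not recursive"}), ("list",)),
    Grouping("major structures", "framework",
             frozenset({"not directly displayable"}),
             ("article", "front", "body", "back")),
    Grouping("sec-like", "block",
             frozenset({"metadata", "title", "label"}), ("sec", "ack", "bio")),
)


def grouping_of(element):
    """Return the grouping that lists the element, or None if it is not listed."""
    for grouping in GROUPINGS:
        if element in grouping.members:
            return grouping
    return None


def same_listed_behavior():
    """Return pairs of grouping names whose role and traits are identical."""
    return [(a.name, b.name) for a, b in combinations(GROUPINGS, 2)
            if a.role == b.role and a.traits == b.traits]


print(grouping_of("bio").name)
print(same_listed_behavior())
```

In this encoding, figure-like and sec-like come out with the same listed characteristics, which hints at how much of the early classification rested on judgment rather than on the recorded properties alone.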

What didn't make the cut

Throughout the process of developing the model, a number of characteristics of the model were discussed, some at great length, and were ultimately not included. One of the hardest things about the process was accepting that some of the things we started out absolutely knowing to be key to compatibility were, in fact, not relevant at all.

Metadata

From the very first discussion of the compatibility model, one of the properties we included was whether or not something was metadata. At first, our analysis of elements focused on the behaviors of elements in a document (their social behaviors): empty elements, elements that contained PCDATA, elements with sequences, mixed content, and element-only content. But this left us with the question: where do metadata elements fit in? They didn't fit into anything we had identified yet, so they were their own category. At first, this provided a useful classification, and specifying whether or not an element is metadata seemed naturally to be an important aspect of identification.

We quickly realized that simply identifying that something could be metadata wasn't adequate. There are a number of elements defined as JATS metadata that appear elsewhere in documents, as well. However, we weren't ready to abandon metadata as a property just yet, so we divided "can be metadata" into two separate categories: meta-only and meta-display.

The classification of an element as meta-display meant that it could appear either in the front matter of an article or in the narrative flow. Something that was meta-only would only be included as metadata. However, the number of elements in the JATS that only occur as metadata is very small, in part because user needs have often been to include these meta-only elements in other places in the document.
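The meta-only versus meta-display distinction can be sketched as a check on where element names occur in a document. Everything in the example below is an assumption made for illustration: the sample is a toy that has not been validated against any JATS schema, the function names are hypothetical, and a single document only shows where elements were used, not where a schema allows them.

```python
import xml.etree.ElementTree as ET

# Toy document (illustrative only, not validated against any JATS schema).
SAMPLE = """<article>
  <front>
    <article-meta>
      <article-id>x1</article-id>
      <contrib-group><contrib><name><surname>Doe</surname></name></contrib></contrib-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <sec-meta>
        <contrib-group><contrib><name><surname>Roe</surname></name></contrib></contrib-group>
      </sec-meta>
      <p>Text.</p>
    </sec>
  </body>
</article>"""


def tags_under(root, container):
    """Return the set of element names found beneath the named child of root."""
    node = root.find(container)
    if node is None:
        return set()
    return {el.tag for el in node.iter()} - {container}


def classify_by_position(xml_text):
    """Split element names seen in the front matter by whether they also occur in the body."""
    root = ET.fromstring(xml_text)
    in_front = tags_under(root, "front")
    in_body = tags_under(root, "body")
    return {
        "meta-only": sorted(in_front - in_body),
        "meta-display": sorted(in_front & in_body),
    }


print(classify_by_position(SAMPLE))
```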

Knowing that this is the case, we came to the conclusion that if an element is typically part of the metadata and becomes part of the narrative flow of the document, it does not affect compatibility. Similarly, we realized that the reverse is also true: something from the narrative flow being included as metadata does not fundamentally change the JATS-defined element and thus isn't of concern when determining compatibility. Recognizing that the "meta-ness" of something does not, in fact, matter in determining compatibility was surprising to us, but undeniable.

Frameworks or Superstructures

Like whether or not something was metadata, the concept of frameworks or superstructures was with us from the beginning of the model development. Initially this included the article, front, body, back, sub-article, and reply elements. In our experience, these elements were key components of a document's structure that provided its identity as a JATS article document.

As the work progressed and we were explicitly defining the properties we were identifying as necessary for compatibility, defining this particular property was surprisingly difficult. Each attempt at a definition led to more confusion. We referred to these elements as superstructures, frameworks, organizing bags, and each time we tried to define the properties of this thing we knew to be key to compatibility, we realized that there was nothing unique about these structures.

For example, when we tried to define frameworks as bags that contain all the components of a certain structural part of the document (as front contains all the metadata), we realized that the JATS is full of elements that fit this description. Everything from title-group to alternatives to even name fits it. For a long time, though, we were convinced that we just hadn't found the right description and that frameworks were still a key aspect of this compatibility model.

After much of the work around the compatibility model was complete and we were still hanging on to frameworks as a property of compatibility, B. Tommie Usdin shared a thought with the rest of the group that, as difficult as it was to accept, led us to drop the property from the model.

If a JATS article can become a BITS book-part with one or two top-level changes, and if the body of that article could become a sec in either an article or a book with similarly minimal changes, perhaps the superstructure Property is unimportant.

I suggest that our inability to clearly communicate what it is and why it affects interoperability...is a good clue that we should dump this one. [5]

Recursion

Like the metadata and superstructure characteristics, recursion was something that seemed very important at the outset. Our original element groupings based on social behavior made a distinction between list-like behaviors (cannot contain myself) and def-list-like behaviors (can contain myself). The issue of recursion, though, was not as straightforward as that. Originally we looked at the most basic and literal form of recursion: an element that can contain itself. However, from the perspective of social behaviors in a document, there are less literal types of recursion. A list cannot contain a list as a child, but it can contain one as a descendant (list-item/p/list). So from a behavioral perspective, lists can be recursive, just not in the most literal interpretation of the schema.

To complicate the matter, the JATS defines structures that are not directly recursive but that allow children exhibiting the same behaviors. For example, bio cannot contain itself, so by the strict interpretation, it is not recursive. However, it does allow the sec element, and the models for sec and bio are identical. By extension, bio cannot contain itself, but it can contain structures that mimic its own behavior. Would that, then, qualify as being a recursive structure?
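The three readings of recursion discussed so far can be compared on a small content model. The sketch below is a deliberately simplified toy: the CONTENT_MODEL table keeps only the child relationships mentioned in the text (with bio and sec given identical entries to mirror the statement that their models match), and all function names are assumptions for this example.

```python
# Toy content model: element name -> element names allowed as direct children.
# Greatly simplified for illustration; not the real JATS content models.
CONTENT_MODEL = {
    "list": {"list-item"},
    "list-item": {"p"},
    "p": {"list"},
    "def-list": {"def-list", "def-item"},
    "def-item": set(),
    "bio": {"sec", "p"},
    "sec": {"sec", "p"},
}


def is_directly_recursive(name, model=CONTENT_MODEL):
    """Literal recursion: the element may contain itself as a direct child."""
    return name in model.get(name, set())


def is_descendant_recursive(name, model=CONTENT_MODEL):
    """Behavioral recursion: the element may appear anywhere beneath itself."""
    seen = set()
    stack = list(model.get(name, set()))
    while stack:
        child = stack.pop()
        if child == name:
            return True
        if child in seen:
            continue
        seen.add(child)
        stack.extend(model.get(child, set()))
    return False


def contains_lookalike(name, model=CONTENT_MODEL):
    """The element allows a differently named child whose model equals its own."""
    own = model.get(name, set())
    return any(child != name and model.get(child) == own for child in own)


for element in ("list", "def-list", "bio"):
    print(element,
          is_directly_recursive(element),
          is_descendant_recursive(element),
          contains_lookalike(element))
```

On this toy model, list is recursive only by descent, def-list is recursive in the literal sense, and bio is recursive under neither test yet still admits a child with an identical model, which is the case the question above leaves open.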

Ultimately we decided to limit the meaning of recursion for this project to an element that could contain itself as a direct child. There were two major reasons for that decision:

  • There are far too many similar (or identical) structures in JATS to have to map what would and would not count for recursion, and
  • If any of these "similar enough" models were to be altered in a customized version of the JATS, the recursive nature of the element may no longer apply.

It was the discussion of this second reason that led us to abandon recursion as a compatibility property for this model. When we discussed the repercussions of having an element that was recursive as defined by JATS become non-recursive in a customization, we decided that there were none. Similarly, if a customized model were to expand an element's model to allow it to be recursive, that alone would not be sufficient to break any of the compatibility properties we had defined.

Moving Forward

The exercise of examining the JATS-defined structures led us on an often circular path, but we eventually got to the JATS Compatibility Model. Like JATS documents, though, this model won't be particularly useful if it exists in isolation. We strongly encourage the JATS community to read the model, to ask questions, and to try to apply the Properties and put them through their paces. Community involvement is what pushed the JATS to where it is today, and we hope that the same community involvement will help make the Compatibility Model an integral tool in creating a much more robust JATS environment.

References

1. NLM Journal Archiving and Interchange Tag Suite [Internet]. Bethesda, MD: National Center for Biotechnology Information; updated 2012 September 13 [cited 2017 April 10]. Available from: https://dtd.nlm.nih.gov/.
2. Journal Publishing Tag Set [Internet]. Bethesda, MD: National Center for Biotechnology Information; updated 2012 September 17 [cited 2017 April 10]. Available from: https://dtd.nlm.nih.gov/publishing/.
3. Usdin, B. Tommie, Lapeyre, Deborah A., Randall, Laura, Beck, Jeffrey. JATS Compatibility Meta-Model Description, Draft 0.7. July 12, 2016. Available at: http://www.niso.org/apps/group_public/download.php/16764/JATS-Compatibility-Model-v0-7.pdf.
4. Usdin, B. Tommie, Lapeyre, Deborah A., Randall, Laura, Beck, Jeffrey. “In pursuit of family harmony.” Forthcoming 2017 April 25.
5. Usdin, B. Tommie. Let's Dump "Superstructure"! [Internet]. Message to: Jeff Beck, Deborah A. Lapeyre, Laura Randall. 2016 March 31 [cited 2017 Mar 28].
6. Usdin, B. Tommie, Lapeyre, Deborah A., Randall, Laura, Beck, Jeffrey. “Graceful Tag Set Extension.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). DOI: 10.4242/BalisageVol17.Usdin01. Available at: https://www.balisage.net/Proceedings/vol17/html/Usdin01/BalisageVol17-Usdin01.html.

Copyright Notice

Laura Randall’s and Jeffrey Beck’s contribution to the Work was done as part of their official duties as NIH employees. Consequently, this Work is in the public domain; no copyright may be established in the United States. 17 U.S.C. § 105. If Publisher intends to disseminate the Work outside the U.S., Publisher may secure copyright to the extent authorized under the domestic laws of the relevant country, subject to a paid-up, nonexclusive, irrevocable worldwide license to the United States in such copyrighted work to reproduce, prepare derivative works, distribute copies to the public and perform publicly and display publicly the work, and to permit others to do so.

Bookshelf ID: NBK425698
