NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2018 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2018.


Fighting the “Inevitable” Expansion of the JATS Tag Sets


The tag set maintenance process is biased toward expansion and loosening. The JATS Standing Committee wants to satisfy the people who request changes while resisting any suggestions that would be backwards-incompatible. Requests include adding new metadata, enriching the expressive capabilities of the tag suite, accommodating the needs of new document types, and better supporting document display. Models can be tightened to better support interchange through layers on top of the Standard, such as guidelines and Schematron published by other players in the community, including vendors and groups such as JATS4R and STS4i. We need to support these efforts and to resist pressure to increase the scope of JATS.

Introduction

This discussion grew out of an email I sent to the JATS Standing Committee, explaining a position I had taken in a meeting of the JATS Standing Committee. I was justifying my position to the group after some private push-back on it. In essence, I was trying to discourage the group from making a requested change. Now, some months and many conversations later, my opinions have evolved a bit, but I still think that a conversation with the whole community about how JATS should evolve would be very helpful. I hope to start that discussion.

How JATS Changes Over Time

As most of you know, there are three JATS tag sets. (I think there is an unwritten law that this list must be presented at least three times at each JATS-Con. Sort of invoking the JATS spirits — and introducing newcomers to some jargon you are really going to need to understand much of what goes on for the next two days.)

  • Archiving. The loosest tag set. Designed to be an easy conversion target, and enforcing virtually no rules. Informally called “Green” because that is the accent color in the online tag set documentation.
  • Publishing. A tag set designed for interchange and display. It provides many options for encoding some key information types, but does enforce some rules that are critical for rendering or display. Informally called “Blue”.
  • Authoring. Designed for people who are creating content in JATS. Authoring is intended to provide a way to encode everything a content creator needs to express in a journal article without providing multiple options or the ability to tag things that are the purview of a publisher. This one is called “Pumpkin”.

While I am introducing the members of the family, I should also mention two more:

  • BITS. The Book Interchange Tag Set is a JATS-based model for books, parts of books, and collections of books. While it could be called “Chocolate” or “Brown” because that is the accent color in its tag library, it is generally called “BITS”.
  • STS. The NISO Standards Tag Suite is a JATS-based model for standards, backward-compatible with ISO STS. A meta-standard, as it were, which is called “STS” (or “NISO STS” to differentiate it from its predecessor “ISO STS”).

Requests to Change JATS

The JATS tag sets are a NISO standard, and are maintained by a NISO committee called the JATS Standing Committee. Suggestions for changes to JATS can be made by anyone by filling out a form on the NISO web site. The JATS Standing Committee meets periodically to review the suggestions. Some of the suggestions result in changes to one or more of the JATS tag sets or to the JATS documentation.

OK, so far there is no news here. JATS is a continuous maintenance standard; the procedure for requesting changes and the responses of the standing committee to the requests are all on the NISO web site, available for anyone to read.

This next bit is not really telling tales out of school, because recordings of all of the JATS Standing Committee meetings are also available on the NISO site. But … it takes a lot of motivation to listen to even one of them and it would surprise me if anyone who wasn’t trying to cure insomnia would listen to more than one. So, I may well be telling those of you who are not on the JATS Standing Committee things you don’t know (and may not care about).

To oversimplify: about twice a year the Standing Committee gathers all of the suggestions that have been made since the last meeting and discusses them in a series of phone meetings. When dealing with a request, the JATS Standing Committee (and equivalent BITS and STS committees) may:

  • agree to make the change(s) suggested,
  • address the requester’s problem by making a functionally similar change (This is the most common occurrence.),
  • decide to address the suggestion with a change to the documentation, the examples, or other non-normative materials,
  • identify the suggestion as not relevant to JATS,
  • reject the suggestion and provide an explanation of why the suggestion was rejected to the requester, or
  • postpone deciding how to respond.

Requests to Change BITS

BITS is maintained by NCBI with the support of a committee that is quite similar to the JATS Standing Committee although it is not an activity of a formal standards body, and its activities are less regulated. Comments on BITS are provided by the members of the committee on the JATS discussion list. The BITS Working Group meets at intervals to review recent comments and suggestions.

Requests to Change STS

STS is a new NISO Standard, and has not yet gotten into routine maintenance. It will be under continuous maintenance as JATS is. A Standing Committee will be established and it will function pretty much the same way the JATS Standing Committee does.

Communication within the JATS Family

These tag sets are a family, and communication among them is provided by the users who use more than one and committee members who serve on more than one of the maintenance committees. Historically, each time the JATS Standing Committee completes decisions on a set of comments, the BITS Committee then meets to discuss any suggestions that have been made for improvement of BITS and to discuss whether to adopt each of the JATS changes into BITS. It seems likely that when the STS Standing Committee meets, it will consider any changes JATS has made as well as any suggestions made on the STS comment form.

Requests to Expand Scope of JATS

We have seen several types of requests to expand the scope of JATS. Some are to accommodate changes in the journal publishing environment. For example, publishers are getting more serious about accessible versions of journal articles and are finding that the mechanisms provided in current JATS are not quite sufficient. The environment is also changing; when JATS was first developed there was no consistent, industry-wide way to identify (to name but a few):

  • the nature of the contributions made by each of the contributors to an article,
  • the institutions with which these people were affiliated,
  • the sources of funding for the work reported on in an article,
  • the non-monetary support provided,
  • the data sources used, where they can be found, and how the data was used (“generated” versus “analyzed”, for example), or
  • the taxonomy or ontology sources of terms and identifiers.

More Document Types

We also hear from people who are using JATS for types of documents that are not journal articles, sometimes document types that are explicitly out of scope. They ask us to add structures to accommodate law books and technical reports. I have been involved in discussions with people who use JATS for pamphlets, posters, training materials, …

Among the forces that led to the creation of BITS (the JATS-based Book Interchange Tag Suite) were the needs of people publishing journal articles to also publish books, and their desire to make JATS articles into chapters of these books easily.

NISO STS grew out of a desire to use tools that had been developed for JATS articles in the publication of standards documents.

Look & Feel

The purpose of JATS, as documented in the Standard [1], is:

“to preserve the intellectual content of journals … independent of the form in which that content was originally delivered … without modeling any particular sequence or textual format”.

In other words, look and feel is explicitly out of scope for a JATS vocabulary.

Nonetheless, JATS does have structures for tagging some appearance features, such as <bold>, <italic>, and <small-caps>. We have recently added additional structures to capture complex emphasis using attributes on <styled-content>. This is because, while we all agree that it would be best to tag phrases that look different from the surrounding text with the reason they look different, this reason is not always known, and it is far better to tag a phrase as <bold> than to lose the fact that there is something that differentiates this phrase from the rest of the text.

However, the committee regularly receives, and generally rejects, requests to improve the presentation tagging in JATS. This is not because we think JATS as it is does a perfect, or even a good, job of tagging presentation; it is because the details of presentation are out of scope for JATS.

JATS Standing Committee Approach to Requests

In general, the JATS committee works from an agenda which is distributed before the meeting. The agenda groups similar or related requests to make discussion easier. Each request appears in the agenda in its original form, with notes that identify which of the JATS models would be affected, notes on possible approaches to the request, and sometimes recommendations from the tag suite maintainers. Any requests that would require backwards-incompatible changes or that would have known negative impacts are identified.

Strong Inclination Toward Making Requested Changes

In the JATS Standing Committee discussions, there are an amazing number of forces pulling in an amazing number of directions. We take the suggestions seriously because someone was interested enough in JATS to go to the trouble to make a suggestion. They are using JATS; they need more; they come to us for help.

It seems to me that the committee has a strong bias toward finding a way to enable the requester to do what was wanted; sometimes by granting the request exactly and more often by doing something that is similar and fits gracefully into the existing JATS structures. We are a group that wants to say “yes” to requests, especially if the request is clear, examples from real publications are provided, and there is a good case that the information in question is obviously relevant to JATS but cannot already be encoded in JATS.

This group has also, on the whole, been sympathetic toward “Greenification of Blue”. That is, toward making models in the Publishing model looser, more like the models in the Archiving model, to accommodate users who prefer to use the Publishing model but who need “just this one thing” loosened so they can.

Specific Changes Users Request

The changes I have seen requested in JATS (and in the JATS-Based tag sets) can be divided into several categories.

Changes to Elements

  • Add a way to tag new information. (E.g., add a way to record non-financial support similar to the tagging for financial support, or create an element for inline media, for content that might be tagged <media> but should be placed similarly to <inline-graphic>.)
  • Add an existing element to a new context. (E.g., allow <inline-formula> inside <source>.)
  • Restructure an existing model to loosen it. (E.g., make <pub-date> optional in Blue.)
  • Restructure an existing model to make it tighter. (E.g., require <title> in <sec>.)
  • Restructure an existing model to change it from element content to mixed content or vice versa. (E.g., allow character data in <def-item> to allow for encoding leader dots between the <term> and the <definition> and for extra notations that are not really part of either the <term> or the <definition>.)
  • Redesign existing structures (changing relationships, making element content into attribute values, etc.) (E.g., replace the current emphasis elements including <bold>, <italic>, <monospace>, <roman>, <strike>, and <underline> with a single <emph> element with a @style attribute.)
  • Rename existing structures (change tag or attribute names). (E.g., change the tag for <list-item> to <li> because that is a more natural tag, and it is what HTML uses.)

Changes to Attributes

  • Add a value to the list of values of an attribute. (E.g., add "bio" to the list of values for the @ref-type attribute in Blue and Pumpkin.)
  • Change the value of an attribute from a specified value list to unspecified character content. (E.g., change the value of the @ref-type attribute from a specified value list to character content.)
  • Split an overloaded attribute value between two attributes. (E.g., the current @pub-id-type may name the type of identifier or the authority/source for the identifier. Make one attribute record the type and a second attribute record the authority/source.)
  • Add an existing attribute to one or more elements. (E.g., add the @style attribute to <disp-quote>.)
  • Add a new attribute to one or more elements. (E.g., add the @vocab and @vocab-identifier attributes to <term> and <kwd>.)

Changes to Tag Library Documentation

  • Clarify or enhance documentation. (E.g., add a best practice recommendation, such as showing the values JATS4R recommends in the examples that show attributes with unconstrained values.)
  • Replace, correct, or change the intent of documentation or parts of existing documentation. (E.g., replace one or more of the examples with examples in the requester’s preferred style.)

Backwards Compatible or Not

More important than which structures would be affected by a change is the impact the change would have on users.

Tag set changes are either backwards-compatible or non-backwards-compatible. There are two types of backwards-compatible changes, and I think they should be discussed separately: we can add additional options (completely new optional tagging) or loosen existing constraints. When we added structures for recording information about the funding for the work reported on in an article or added the possibility of associating people and institutions with unique identifiers, we were adding options to the tag set. Were we to change the model of <def-item> (Definition Item, which currently must contain a <term> followed by zero or more <def>s) to make the <term> optional, we would be loosening a model. (NO, I am NOT suggesting that this would be a good idea!)

A non-backwards-compatible change means either that documents that were valid to the previous model might not be valid according to the new model OR that the meaning of some part of a document changes between the old model and the new model. For example, if we were to make <title> required on <sec>, old documents that had <sec>s that did not contain <title>s would not be valid to the new model. More subtle, and thus perhaps more insidious, are changes to meaning. If we changed the meaning of the <bio> element from “biographical data concerning a contributor or the description of a collaboration” to “biographical data concerning a contributor” and added a new element for “descriptions of collaborations”, all old documents would be valid according to the grammar of the new model, but some might now be incorrect because they included descriptions of collaborations in a location that should only contain information about people.
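The <title>-in-<sec> example can be made concrete with a small sketch. It is only an illustration, not a JATS validator: the sample document and the function names are invented here, and the "old" and "tightened" rules are reduced to this single constraint.

```python
import xml.etree.ElementTree as ET

# A made-up document written against the looser rule:
# the second <sec> has no <title>.
OLD_DOCUMENT = """<article>
  <body>
    <sec><title>Methods</title><p>Text.</p></sec>
    <sec><p>An untitled section.</p></sec>
  </body>
</article>"""


def sections_missing_title(xml_text):
    """Count the <sec> elements that have no <title> child."""
    root = ET.fromstring(xml_text)
    return sum(1 for sec in root.iter("sec") if sec.find("title") is None)


def valid_under_old_rule(xml_text):
    # Old rule in this sketch: <title> is optional in <sec>,
    # so any parseable document passes.
    ET.fromstring(xml_text)
    return True


def valid_under_tightened_rule(xml_text):
    # Hypothetical new rule: every <sec> must contain a <title>.
    return sections_missing_title(xml_text) == 0


print(valid_under_old_rule(OLD_DOCUMENT))
print(valid_under_tightened_rule(OLD_DOCUMENT))
```

The same unchanged document passes the first check and fails the second, which is the sense in which tightening a model is not backwards-compatible. A change of meaning, such as the <bio> example, would not be caught by any check of this kind.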

Some of the requested changes are backwards-compatible (therefore easy to change in the tag set):

  • Adding a value to the list of values for an attribute,
  • Changing the value of an attribute from a specified value list to unspecified character content,
  • Adding an existing attribute to one or more elements,
  • Adding a new attribute to one or more elements,
  • Restructuring an existing model to loosen it, and
  • Clarifying or enhancing documentation.

Some of these changes might be backwards-compatible, depending on how they are done:

  • Adding a new element or entirely new structure,
  • Adding an existing element to a new context, or
  • Clarifying or enhancing documentation.

These can be backwards compatible:

  • If the new elements are optional,
  • If the old version of the model is retained alongside the new version (possibly with additional wrappers), or
  • If the documentation change does not contradict previous guidance.

Not Backwards Compatible

Some of these requested changes are not backwards-compatible. Documents valid to the older form of the tag set might not be valid to a new model if the change:

  • Restructures an existing model to make it tighter,
  • Restructures an existing model to change it from element content to mixed content (or vice versa),
  • Redesigns existing structures (changing relationships, making element content into attribute values, etc.),
  • Renames existing structures (change tag or attribute names), or
  • Replaces, corrects, or changes the intent of documentation (even if fixing an error).

Costs of Model Maintenance

It is the job of the JATS Standing Committee to “maintain” JATS. This is generally assumed to mean that it is our job to change it to meet the needs of people who request changes. On the assumption that a change would not be made unless the requester believed that the change would improve JATS either for their use (most of the changes requested) or for all users (a few of the more sweeping suggestions), it seems as though making these changes would be a good thing.

However, there are significant costs to making changes to something as widely used as JATS is. Even totally backwards-compatible changes are disruptive to many JATS users.

New versions of JATS are released in several steps. First, there is a committee draft, which is (intended to be) available only to the members of the Standing Committee. The Committee is asked to review this draft to see if it accurately reflects the decisions made during the latest set of meetings, and if there are any errors or infelicities we can correct before the draft is made public. After this, a public draft, clearly identified as a draft, is published. Typically, after there have been several drafts, a new version of the Standard is released. There are some users who adopt the new versions at each stage in this process. Some committee members, especially those who have urgent need of some new feature, have been known to put new drafts into production at the committee internal review stage. (I understand, but this is really risky; we DO find and correct errors at this stage.) There are quite a few users who adopt the public versions of the drafts, largely because there are new features they want/need. More users wait for the stability of the new version of the Standard.

The Costs of Change — Any Change

There are a significant number of users who “drop out”, or perhaps more accurately “freeze” at each step in the process. They have complex production systems with high volumes, many players who need to work together, and good reasons to do significant testing and quality assurance work before changing models. We find that each time the standard is revised, even if the revision is totally backward-compatible, some users stay with the old version. This is reasonable. If a content creator adds new information to their documents and passes these documents to a business partner who is not, or not yet, prepared to accept this new information, the result may be confusion, delay, information loss, and expense.

The Costs of Backwards-Incompatible Changes

We have users who are using the pre-JATS immediate predecessor version, NLM 3.0, who didn’t follow us to JATS 1.0. We have more users who are still using a still older NLM version 2.3, the last version before some major changes in citation tagging. It would not surprise me to hear of users still using NLM 1.1, although I don’t know of any. The changes between NLM 3.0 and JATS 1.0 are truly small; the @dtd-version attribute on the <article> element needed to change, and some new optional elements and attributes were added. Nonetheless, we lost some users at this transition.

While a few users are “stuck” at NLM 3.0, a lot more users are still using NLM 2.3. This is because there were significant non-backwards-compatible changes between NLM 2.3 and NLM 3.0.

There are good reasons for this: making a backwards-incompatible change to a well controlled workflow is time consuming, expensive, and disruptive. In order to move to a non-backwards-compatible version of a tag set, a user may need not only to change their document production and coordinate with all of their business partners but also to modify their database(s) and/or their entire backfile. Many JATS users want to have their entire library stored in one system and searchable as a collection. They do not consider it reasonable to require a searcher to know if they are looking for content created before or after a model change; in fact few want their users to know about or think about the models at all.

(If you can just “slide” a non-backwards compatible change into your production workflow, you are either working in total chaos or you have a secret we would really like to know about here at JATS-Con!) When there are non-backwards-compatible changes, a significant number of users stay with the previous version.

The JATS Standing Committee, aware of the high costs of backwards-incompatible versions of the JATS Standard, tries very hard to meet the changing needs of users in backwards-compatible ways. Requests that cannot be met in a backwards-compatible way are either rejected or put on a list of non-backwards-compatible suggestions to consider the next time a full number version (that is, a non-backwards-compatible version) is considered. (A “point release”, for example version 1.3 following version 1.2, is fully backward-compatible. A number release, for example version 2.0 after version 1.8, would contain some non-backward-compatible models. Note: we are now working on point releases, meaning fully backwards-compatible enhancements to JATS 1.0.)
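The numbering convention described above can be expressed as a small check. This is a sketch under the assumption that a version is written as plain "major.minor" (draft suffixes are not handled), and the function names are made up for illustration.

```python
def parse_version(text):
    """Split a version such as "1.3" into the integers (1, 3)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_point_release_upgrade(old, new):
    """True when moving from old to new stays within one full number version.

    By the convention described in the text, documents valid to the old
    version should then still be valid to the new one.
    """
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    return new_major == old_major and new_minor >= old_minor


print(is_point_release_upgrade("1.2", "1.3"))
print(is_point_release_upgrade("1.8", "2.0"))
```

The check says nothing about whether a receiving system is ready for the new optional structures in a point release; it only captures the promise about validity.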

We sometimes find ourselves making awkward changes in order to add the new information that has been requested without making backward-incompatible changes. There have been several changes recently that could have been made much more gracefully, and that would result in much simpler models and more interchangeable documents, if the change had been made without regard to backward compatibility (or if we had known when the original model was written what we know now).

An Unacknowledged Problem: The Costs of Model Relaxation

The costs to the community of losing members at non-backwards-compatible transition points are significant, and may be the most visible costs of model change. However, I think we do not pay enough attention to the costs of one of the types of completely backward-compatible model changes: model loosening. We do not, and are unlikely to, hear of a user dropping JATS because a model was loosened. And, in fact, we rarely if ever hear a complaint about loosening a model, but we may have increased the workload for many JATS users.

By loosening JATS, and most specifically by loosening the Blue/Publishing model, we are reducing the value of that model. The reason many users want to use the Publishing model is that the tighter models help them create XML documents that are easier to interchange, easier to display, and of generally higher value. That’s why when they have a problem encoding something in one of their documents using the models in Blue/Publishing they don’t want to switch to the Green/Archiving model; they want us to loosen Blue just a little bit so they can continue to benefit from the generally tighter rules.

If it were only one model that were loosened, it probably would not matter. But we are not loosening one and only one model. We loosen one, and another, and another. It is sort of like adding a drop of water to a full glass of water. One more drop is probably fine. And perhaps another. And another. But after a while there really is more water in that glass than it can hold, and it spills over.

An Example: The Reference Type Attribute on Citations

An example of loosening existing constraints is changing the value of the attribute @ref-type from a specified value list to character data with a suggested value list in Blue (the Publishing model) and Pumpkin (the Authoring model). It was already totally open with a suggested value list in Green (the Archiving model).

Why Did It Happen

So, how did this come about? The request to the JATS Standing Committee was:

‘add ref-type="bio" to allowable values for xref/@ref-type’ for cross-reference to biographies.

The enumerated value list for @ref-type already included Affiliation, Appendix, Boxed Text, Footnote, List, Section, and Table. Adding Biography seemed reasonable. The Standing Committee discussed it and decided that there was no reason to think that this would be the last thing anyone wanted to link to and that it would be easier if we made the value of @ref-type unlimited text and provided a suggested value list.

I have to confess that I did not object to this change at the time. In retrospect, I wish I had. Part of the reason I did not object is that in addition to the values I just told you about, all of which seem valuable to me, @ref-type values also included some junk that didn’t seem all that valuable. Values included Keyword (who cross-references a keyword?), Plate, Scheme, and my personal least favorite value “Other”. (Why do I dislike “Other”? Because it provides no information that would not be provided by simply not using the @ref-type attribute! I don’t like information-free data.)

Why do I now regret not objecting to opening up the values of the @ref-type attribute? Partly because we did it for a bad reason. We opened it up because there might be more maintenance requests. JATS is under continuous maintenance; we have a mechanism to do that maintenance. Our mission is to respond to explicit requests. The users did not request that much loosening. A bigger problem is that some members of the Standing Committee have since cited that decision as precedent. We have, somehow, in some members’ minds, a policy of unconstraining attribute value lists.

Implications

In XML, at least in XML that is constrained by DTDs, we have only one mechanism for providing content restriction that users can rely on; we have attribute value lists. If the content of an attribute is constrained by a value list, the receiver of a document can know what the possible content is and can be prepared to process it. When we unconstrain the values, the receiver cannot rely on the grammar to provide this regularity. There may be anything given as the attribute value.

When there was a limited list of values, one of the values was “ref”. This was used to identify cross references to citations, which are very important and which are often displayed differently from other cross references. It would be reasonable for a display system that was designed for documents valid to the Publishing model to use this attribute value to create reference links. And, as long as it was the only reasonable option for this link type, that would be reasonable.

But if the value list is unconstrained, even with a suggested value list, things may get tricky. Imagine someone tagging a document that has two lists of citations, one called “References” and one called something else (for some journals one called “References” and one “Data Citations” or in the case of many standards, one called “Normative References” and one called “Bibliography”). For standards, both of these lists contain citations to other documents, but they have very different roles in the standard and very different levels of importance. If the only value for @ref-type that could possibly be used for cross references to either of these is “ref”, that is what people will use. But it would not surprise me if a tagger looked at the suggested values, decided that “ref” was for References, and made up the value “bibl” for cross references to the Bibliography. And systems that know how to render and process ref-type="ref" will not know what to do with these ref-type="bibl" cross references. And, of course, a user might decide to use ref-type="bibliography" or ref-type="bib" for the same thing. No stopping them, at least not in the grammar.
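A receiver who wants the old regularity back has to check for it outside the grammar. The following is a minimal sketch of such a check; the short value names in the set and the sample document are assumptions made for illustration, not the list from any JATS version.

```python
import xml.etree.ElementTree as ET

# Illustrative closed list; the exact spellings are assumptions for this sketch.
CLOSED_REF_TYPES = {"aff", "app", "boxed-text", "fn", "list", "sec", "table", "ref"}

# A made-up fragment in which taggers invented their own values.
DOCUMENT = """<article><body>
  <p>As shown in <xref ref-type="ref" rid="R1">[1]</xref> and
  <xref ref-type="bibl" rid="B1">[2]</xref>; see also
  <xref ref-type="bibliography" rid="B2">[3]</xref>.</p>
</body></article>"""


def unexpected_ref_types(xml_text, allowed):
    """Return the sorted @ref-type values on <xref> that are not in allowed."""
    root = ET.fromstring(xml_text)
    found = {x.get("ref-type") for x in root.iter("xref") if x.get("ref-type")}
    return sorted(found - allowed)


print(unexpected_ref_types(DOCUMENT, CLOSED_REF_TYPES))
```

With an enumerated list in the grammar, a validating parser does this work for every receiver. With open character data, each receiver must write and maintain something like it.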

I should point out that there is another way to achieve the goal of knowing the sort of thing to which a cross-reference points: traverse the link and look. This is an option that will always work, that is not vulnerable to creative tagging, and that will work in Green/Archiving (which never did have a constrained list of values for this). My point is not that we can’t work around this loss, but that this is a loss.
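Traversing the link can be sketched in a few lines. This is an illustration only: the sample document is invented, the function name is hypothetical, and the sketch assumes each @rid holds a single ID.

```python
import xml.etree.ElementTree as ET

# A made-up document: one <xref> carries an invented @ref-type value,
# the other carries no @ref-type at all.
DOCUMENT = """<article>
  <body>
    <p>See <xref ref-type="bibl" rid="B1">[1]</xref> and <xref rid="T1">Table 1</xref>.</p>
    <table-wrap id="T1"><label>Table 1</label></table-wrap>
  </body>
  <back>
    <ref-list><ref id="B1"><mixed-citation>A cited work.</mixed-citation></ref></ref-list>
  </back>
</article>"""


def xref_target_kinds(xml_text):
    """Pair each <xref>'s @rid with the element name of the thing it points to."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an @id.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    kinds = []
    for xref in root.iter("xref"):
        target = by_id.get(xref.get("rid"))
        # None marks a link whose target is not in this document.
        kinds.append((xref.get("rid"), target.tag if target is not None else None))
    return kinds


print(xref_target_kinds(DOCUMENT))
```

The element name of the target answers the question regardless of what the tagger put in @ref-type, at the cost of needing the whole document in hand before a single cross reference can be classified.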

Your Use Case Probably Matters

I find it interesting that many JATS users want to use Blue (Publishing) instead of Green (Archiving). Some have very clear reasons for that, such as that they provide content to a business partner who requires Blue. Others prefer it because they feel that it is somehow better. (Many of these people also feel that Element Citation is better than Mixed Citation.) They want to use the more limiting model because it has more constraints on what their documents may contain, making the documents easier to process and manipulate. And these are the people who make “Greenification” requests. They want to use Blue, and are offended at the suggestion that they use Green, but there is just this one little thing in Blue that is too restrictive for their use case. So, they want us to loosen up just this one little thing. And as I said at the beginning of this talk, odds are good the Standing Committee will give them what they want, each and every one of them, because there is a user with a good use case and examples of documents that need this new looser model. These, in general, are people who look at JATS from the point of view of creating tagged documents. They are not, on the whole, looking at JATS from the point of view of receiving tagged documents and trying to DO something with them. If we take this “make it easy to tag the documents” philosophy to the extreme, anyone will be allowed to put anything anywhere, and nobody will be able to do anything with the tagged documents, making them “Write Only” documents.

The problem, as I see it, is that we have a request from a user saying “loosen this for me” and there is no other user saying “Don’t water down rules I find useful”. Perhaps we need to assign the role of “defender of the status quo” to one participant in each meeting.

Layered Systems: Other Ways to Constrain JATS Documents

Thus far, I have been writing as if the models and documentation produced by the JATS Standing Committee are the beginning and the end of the rules for tagging JATS documents. Fortunately, this is not the case.

JATS is not just a standard, it is the center of an active community that provides support for users in a variety of ways, based on a variety of goals and created by a variety of sources. JATS-Con, the conference at which this paper is presented, is at the heart of the JATS community. JATS-Con provides a place for JATS content creators to talk with each other, with tool and service vendors, and with potential interchange partners. Similarly, it provides tool and service vendors access to users, and provides libraries, archives, and other consumers of JATS documents with access to the creators and their vendors.

Layers of Document Validation

Several speakers at previous JATS-Con conferences have discussed the virtues of layered validation systems: having some document constraints expressed by, and enforced by, unchanging or fundamental rules, with other constraints layered on top. Some have suggested that the layered rules may change for different places in the document lifecycle or with different business partners. Others have discussed customizing JATS through a layer on top instead of changing the JATS models themselves.

Model (DTD, XSD, RNG)

The rules of the JATS Standard are expressed in prose in the Standard. For example, in the Standard, the model for <sec-meta> Section Metadata in the Publishing model is:

  • The following, in order:
    • <contrib-group> Contributor Group, zero or more
    • <abstract> Abstract, zero or more
    • <kwd-group> Keyword Group, zero or more
    • <permissions> Permissions, zero or one

The Standard does not provide the grammar in any useful checkable form. The non-normative materials on the NLM site provide grammars in DTD, XSD, and RNG forms. These models, which most JATS users consider fundamental to JATS, are technically a layer on top of the Standard, and their use is not required by the Standard. These are a convenience, which I suspect most, if not all, JATS users adopt.
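To make the idea of a "checkable form" concrete, here is a minimal Python sketch that tests the children of a <sec-meta> element against the ordered sequence quoted above. It is only an illustration under stated assumptions, not one of the published DTD, XSD, or RNG grammars: the function name `sec_meta_is_valid` is hypothetical, and attributes, namespaces, and the content of the child elements are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Ordered content model quoted above for <sec-meta> in the Publishing model:
# contrib-group (zero or more), abstract (zero or more),
# kwd-group (zero or more), permissions (zero or one).
SEC_META_MODEL = re.compile(
    r"(contrib-group )*(abstract )*(kwd-group )*(permissions )?"
)


def sec_meta_is_valid(xml_text: str) -> bool:
    """Return True if the children of <sec-meta> follow the quoted model.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "sec-meta":
        return False
    # Encode the child element names as a space-terminated sequence,
    # then match the whole sequence against the model.
    children = "".join(child.tag + " " for child in root)
    return SEC_META_MODEL.fullmatch(children) is not None
```

A real grammar does far more than this, but the sketch shows why the DTD, XSD, and RNG forms are a layer: they restate the prose of the Standard in a form a machine can enforce.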

Tag Libraries and Other Documentation

In addition, the JATS Standing Committee provides a significant amount of non-normative documentation. There are Tag Libraries, which contain clarifying remarks about many structures, examples, discussions of tagging and common tagging practice, advice on how to approach some common tagging situations, and help understanding the tag set and the documentation. There are FAQs and pointers to other materials.

Guidelines and User-group Guidance

These materials, provided by the JATS Standing Committee, are just the beginning. Many of the organizations that receive documents in JATS also provide tagging guidelines that describe what their systems expect and how to tag JATS documents that will work gracefully in their environment. Many of these are available only to the customers/users of that service. The PubMed Central Tagging Guidelines [13] are one example of this sort of guideline. They provide detailed instructions on how many of the JATS structures are to be used in articles submitted to them. For example, these guidelines tell users that:

“PMC uses the subject groups tagged within <article-categories> to generate headings for the Tables of Contents (ToC). Articles may have more than one subject group within <article-categories>, but every article must have exactly one subject group with @subj-group-type="heading". PMC will use only the subject group with the specified type of ‘heading’ to generate the ToC.”

This detail might be useful to people creating articles to send other places; it is very important for articles going to PubMed Central. This guidance is not, and should not be, in the general tag library because it is specific to one user organization.
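A recipient-specific rule like the one quoted above is easy to express as a check layered on top of the grammar. The following Python sketch is one hedged way to do it; it is not PMC's own tooling. The function names are hypothetical, and counting only <subj-group> elements that are direct children of <article-categories> (not nested subject groups) is an assumption about how the rule should be read.

```python
import xml.etree.ElementTree as ET


def count_heading_subject_groups(article_xml: str) -> int:
    """Count <subj-group subj-group-type="heading"> elements that are
    direct children of <article-categories>.

    Assumption: nested subject groups are not counted.
    """
    root = ET.fromstring(article_xml)
    count = 0
    for categories in root.iter("article-categories"):
        for group in categories.findall("subj-group"):
            if group.get("subj-group-type") == "heading":
                count += 1
    return count


def meets_heading_rule(article_xml: str) -> bool:
    """True when the article has exactly one 'heading' subject group."""
    return count_heading_subject_groups(article_xml) == 1
```

An article with two subject groups passes as long as exactly one of them has the type "heading"; an article with none, or with two of that type, fails this layer even though it is valid JATS.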

Relatively recently, we have seen the formation of several groups, made up of players from throughout the document ecosystem, that support the interchange of information within the community that shares a tag set. JATS4R (JATS For Reuse) [8] provides guidelines designed to promote interchange or reuse, by describing how JATS, or specific portions of JATS, should be used. JATS4R guidelines recommend tagging practices designed to make the XML documents more tractable for machine manipulation. For example, they recommend:

  • Use a separate <ref> element for each citation.
  • Use a unique and internally consistent identifier for @id. Best practice is an alphanumeric sequence common to all citations in your document, followed by an incremental number matching the sequential order of citations.

JATS allows multiple citations inside one <ref>; there are some journals that put multiple citations in one <ref> to group related citations and/or to save page space. The JATS4R recommendation essentially says “that’s legal JATS but it makes interchange a lot harder, so don’t do it”.
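As an illustration of how such a recommendation can be checked without changing the JATS model, the Python sketch below reports every <ref> that holds more than one citation. This is not the JATS4R validator; the function name is hypothetical, and treating <element-citation>, <mixed-citation>, and <citation-alternatives> as the citation-bearing children of <ref> is an assumption made for the example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Children of <ref> that are each treated as "one citation" here
# (an assumption made for this illustration).
CITATION_TAGS = {"element-citation", "mixed-citation", "citation-alternatives"}


def refs_with_multiple_citations(article_xml: str) -> List[str]:
    """Return the @id of every <ref> that contains more than one citation."""
    root = ET.fromstring(article_xml)
    flagged = []
    for ref in root.iter("ref"):
        citation_count = sum(1 for child in ref if child.tag in CITATION_TAGS)
        if citation_count > 1:
            flagged.append(ref.get("id", "(no id)"))
    return flagged
```

A document that groups related citations in one <ref> remains valid JATS; this layer simply flags it for anyone who has chosen to follow the recommendation.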

Additional Validation Tools

Speakers at previous JATS-Cons have suggested that Schematron be used in conjunction with document grammars (DTDs, XSDs, or RNGs) to add organization-specific business rule checking to JATS and to do different checks at appropriate places in the document life cycle [2, 10, 15, 16, 20].

To oversimplify, the reasons to use Schematron as a rules checker in addition to one of the JATS DTDs, XSDs, or RNGs are:

  • To add additional constraints specific to your documents or environment,
  • To add checks or constraints that cannot be expressed in the document grammar,
  • To be sure that the constraints you add do not make your documents violate the rules of JATS, which might happen accidentally if a non-expert modifies a JATS DTD, XSD, or RNG,
  • To have different constraints at different points in the production process or different constraints for different business partners, and
  • To be able to add new checks and/or constraints easily at any time.
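The idea of different constraints at different points in the production process can be sketched in a few lines. The example below uses Python rather than Schematron purely for illustration, and everything in it is hypothetical: the stage names, the two business rules, and the function names are assumptions, not rules drawn from JATS or from any organization's guidelines. It also assumes the document has already been validated against one of the JATS grammars.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# A rule inspects the parsed document and returns an error message,
# or None when the rule is satisfied.
Rule = Callable[[ET.Element], Optional[str]]


def require_abstract(root: ET.Element) -> Optional[str]:
    """Hypothetical business rule: the article must contain an <abstract>."""
    if root.find(".//abstract") is None:
        return "missing <abstract>"
    return None


def require_doi(root: ET.Element) -> Optional[str]:
    """Hypothetical business rule: the article must carry a non-empty DOI."""
    for article_id in root.iter("article-id"):
        if article_id.get("pub-id-type") == "doi" and (article_id.text or "").strip():
            return None
    return "missing DOI"


# Hypothetical stages: later stages layer more rules on top of earlier ones.
RULES_BY_STAGE: Dict[str, List[Rule]] = {
    "authoring": [require_abstract],
    "delivery": [require_abstract, require_doi],
}


def check(article_xml: str, stage: str) -> List[str]:
    """Run the rules for one stage and collect the error messages."""
    root = ET.fromstring(article_xml)
    messages = []
    for rule in RULES_BY_STAGE[stage]:
        message = rule(root)
        if message is not None:
            messages.append(message)
    return messages
```

Adding a check is a matter of writing one more function and listing it under the stages where it applies; the JATS model itself is never touched, so a document that passes these rules cannot be made invalid JATS by them.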

An example of some JATS-related Schematron is available on GitHub from eLife [4].

JATS4R has made some Schematron, with documentation and support functions, available at GitHub [6].

Other Checking Languages

I do not want to imply that Schematron is the only language in which document validation tools that work on top of or in addition to the published grammars can be written. The PMC Style Checker, for example, is written in XSLT [11].

My Recommendations

So how should JATS evolve and grow, and what can we do to help it move in directions that will be good for the community?

Participate in the Community

JATS is a community effort. I encourage JATS users to participate actively in the community. There are many ways you can do this:

  • Email discussion lists — Subscribe to discussion lists relevant to your tagging interest
    • JATS-List [9]
    • NISO-STS-List [14]
    • PMC-Tagging Guidelines [12]
    • Ask questions on the discussion lists, and answer them
  • Community Groups — Participate in JATS4R [8] [5], STS4i [7], Force11 [3], and other groups working to enhance information interchange. If you can, join. If that is not possible, follow their work, comment on it, and actively decide if their recommendations are relevant to your use.
  • Attend JATS-Con, and talk with your fellow attendees at breaks
  • Share successes and pain by speaking about your use of JATS
  • Provide feedback and suggestions for improvement to the maintainers of the tag set(s) you use

Resist Loosening Models

Guidelines from document recipients provide some balance, and the guidelines and tools from the groups promoting interchange (JATS4R and STS4i) provide increasingly effective balance, but that does not mean that all structure and consistency should be moved from the model layer to the next layer. Think about how generally useful a looser model would be before asking for it, and expect the maintenance groups to consider this as well.

Help Resist Scope Creep

Stand with me to resist scope creep. Gradual expansion of JATS’ scope means gradual reduction in the value and utility of JATS documents. A one-size-fits-all model, a tag set that will work for all documents, will not work for any of them very well.

Expand the Number of Vocabularies, Not Any One Tag Set

Groups that want to expand JATS to accommodate textbooks, law books, patient information sheets, instruction manuals, and other document types that are outside the current scope of JATS should be encouraged to:

  • Form a group of people who have the same or similar needs
  • Figure out what they need changed in or added to JATS to meet their needs
  • Build a JATS-based tag set that is optimized to meet their needs
  • Document that tag set and make it available to all JATS users

If they work in an environment in which it is likely that people will want to work with their documents and with documents tagged with JATS, BITS, or STS (the existing members of the JATS family), I recommend developing the new model within the JATS Compatibility Meta Model [19], which was discussed at JATS-Con 2017 [18] and at Balisage 2016 [17].

References

1.
American National Standards Institute/National Information Standards Organization (ANSI/NISO). ANSI/NISO Z39.96-2015, JATS: Journal Article Tag Suite, version 1.1. 2015. Baltimore: National Information Standards Organization. Available at: https://groups.niso.org/apps/group_public/download.php/15933/z39_96-2015.pdf.
2.
Blair J. “Developing a Schematron–Owning Your Content Markup: A Case Study.” In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2012 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2012. Available at: https://www.ncbi.nlm.nih.gov/books/NBK100373/
3.
Force11. Available at: https://www.force11.org/
4.
GitHub Repositories: elifesciences/elife-schematron. Available at: https://github.com/elifesciences/elife-schematron. (JATS-related Schematron from eLife).
5.
GitHub Repositories: JATS4R (JATS for Reuse). Available at: https://github.com/JATS4R.
6.
GitHub Repositories: JATS4R/validator. Available at: https://github.com/jats4r/validator. (JATS-related Schematron, with documentation and support functions).
7.
GitHub Repositories: STS4i (Standards Tag Suite XML for interoperability). Available at: https://github.com/sts4i.
8.
JATS4R (JATS for Reuse). Available at: https://jats4r.org/
9.
10.
Kraetke M, Bühring F. “A Quality Assurance Tool for JATS/BITS with Schematron and HTML reporting.” In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2016 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2016. Available at: https://www.ncbi.nlm.nih.gov/books/NBK350149/
11.
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), PubMed Central (PMC). PMC Style Checker (written in XSLT). Available at: https://www.ncbi.nlm.nih.gov/pmc/tools/stylechecker/
12.
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), PubMed Central (PMC). PMC-Tagging Guidelines — Resources. Available at: https://www.ncbi.nlm.nih.gov/mailman/listinfo/pmc-tagging-guidelines.
13.
National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), PubMed Central (PMC). PubMed Central Tagging Guidelines. Available at: http://www.ncbi.nlm.nih.gov/pmc/pmcdoc/tagging-guidelines/article/style.html.
14.
15.
Schwarzman AB. “JATS Subset and Schematron: Achieving the Right Balance.” In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2017. Available at: https://www.ncbi.nlm.nih.gov/books/NBK425543/
16.
Schwarzman AB. “Superset Me—Not: Why the Journal Publishing Tag Set Is Sufficient if You Use Appropriate Layer Validation.” In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010. Available at: https://www.ncbi.nlm.nih.gov/books/NBK47084/
17.
Usdin BT, Lapeyre DA, Randall L, Beck J. “Graceful Tag Set Extension.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2-5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). https://doi.org/10.4242/BalisageVol17.Usdin01. Available at: https://www.balisage.net/Proceedings/vol17/html/Usdin01/BalisageVol17-Usdin01.html.
18.
Usdin BT, Lapeyre DA, Randall L, Beck J. “In pursuit of family harmony: Introducing the JATS Compatibility Meta Model.” In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2017. Available at: https://www.ncbi.nlm.nih.gov/books/NBK425547/
19.
Usdin BT, Lapeyre DA, Randall L, Beck J. JATS Compatibility Meta-Model Description, Draft 0.7. July 12, 2016. Available at: http://www.niso.org/apps/group_public/download.php/16764/JATS-Compatibility-Model-v0-7.pdf.
20.
Usdin T, Lapeyre DA, Glass CM. “Superimposing Business Rules on JATS.” In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2015. Available at: https://www.ncbi.nlm.nih.gov/books/NBK279902/
Bookshelf ID: NBK493526
