NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2013 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2013.

What JATS Users should Know about the Book Interchange Tag Suite (BITS)

The Book Interchange Tag Suite (BITS) is a book model based on the JATS article model. Many things can be structured the same way in both a journal article and a book (or a part of a book), and some things are very different. We'll review the things you "get for free" if you are already familiar with the article model, and the parts of the book model that need a little more attention.

Introduction to BITS — History

Why a new Book Tag Set?

The Book Interchange Tag Suite (BITS) is an XML vocabulary that extends NISO Z39.96-2012, the Journal Article Tag Suite (JATS). There are already a number of book XML models, including DocBook[1], the Text Encoding Initiative (TEI)[2], and the NCBI Book Tag Set[3][4]. DocBook was created for technical books and documents, although it has been applied to documents in nearly every discipline. The TEI is used mainly in the social sciences, humanities, and linguistics, and the NCBI Book model has been used in scientific, technical, and medical book publishing. The NCBI Book model grew out of the NLM DTD project for journal articles at the US National Library of Medicine.

The NCBI Book model was written as an extension to the NLM DTDs. Although it has been applied in general SGML book publishing, it was written specifically to support the NCBI Bookshelf. BITS was intended to be a fresh start at a general book model that is familiar to JATS users and not simply an update to the NCBI Book DTD.

What is a book? The Scope of BITS

Many things can be described as "books": novels, cookbooks, textbooks, a series of database records presented together. Because of this, one of the first things that the BITS Working Group did when it met was to set the scope for the project. It was decided that the scope for BITS would be a single completed book or a complete book component such as a chapter, part, or module. It should define both new and legacy book material, and be able to describe both book sets and series.

For documents where time is a factor such as continually updated books or editions that represent a work in multiple versions, the XML model will represent a snapshot of the book or book component at a single point in time. Document versioning should be handled by the publication system and not by the document XML.

The scope of BITS includes

  • Technical monographs
  • Government reports
  • Multivolume monograph series
  • Traditionally website-based content such as Gene Reviews (http://www.ncbi.nlm.nih.gov/books/NBK1116/)[6] where each record can be considered equivalent to a book chapter
  • Reference and professional books
  • Conference proceedings
  • Encyclopedias
Atlases and field guides are not considered out of scope, but they may need additional semantic metadata to be represented completely.

STM textbooks also are not explicitly out of scope, but they typically have semantic tagging and processing requirements that are beyond what is available in BITS version 1.0.

BITS 1.0 explicitly excludes the following:

  • Material where the content depends heavily on typesetting (as exemplified by Wired magazine)
  • Incunabula and historic editions (or any other content where the format is critical to the intellectual processing or where the focus of the analysis is the text rather than the subject of the content)
  • Legislative material such as laws, statutes, and codes of regulations, and legal material such as briefs and court reports
  • Children's books
  • Scripture and sacred literature
  • Cookbooks
  • Pamphlets and package inserts
  • Travel guides
  • Dictionaries, gazetteers, and thesauri
  • TV Guides and other magazine-like publications
  • Textbooks K-12 (primary and secondary schools)

What is the relationship to JATS?

BITS is a project of the National Center for Biotechnology Information (NCBI) at the US National Library of Medicine (NLM). While BITS is an extension of NISO Z39.96 JATS, it is not a NISO Standard.

The first design decision that the BITS Working Group made was that the Tag Set would be based on the most recent version of NISO JATS, including the multi-language capabilities of this structure. If JATS has a named structure and that structure occurs in book content, then the JATS name (and, to the extent possible, the JATS model) will be used. This implies that the BITS Working Group will not itself change the NISO JATS model. However, the BITS Working Group made many comments on NISO JATS version 1.0 that were included in JATS version 1.1d1[7].

A draft version of BITS (0.2)[8] was released in October 2012, and comments were collected until early 2013. The Working Group considered all of the comments and forwarded many of them to the JATS Standing Committee for possible addition to the Journal Article model on which BITS is based.

Once the draft version of JATS 1.1 was released in December 2013, the BITS group built version 1.0 using the latest version of JATS and adding those requested structures that were not added to JATS, like the Question and Answer model.

BITS version 1.0 was released at the end of December 2013. It is not backward-compatible to the version 0.2 draft. The schemas (in DTD, XSD, and RELAX NG) and documentation are available under "extensions" on the JATS non-normative supporting information site hosted by NCBI at http://jats.nlm.nih.gov/extensions/bits. A complete Tag Library is available at http://jats.nlm.nih.gov/extensions/bits/tag-library/1.0/.

Other Working Group Design Decisions

Besides deciding that BITS would be JATS-based, the Working Group reached the following conclusions, which informed the details of BITS:

  1. Although the project is supported by NLM, this book model should be usable beyond life sciences publishing, just as the NISO JATS journal models are useful in physics, social sciences, and chemistry.
  2. There will be two top-level elements: Book (<book>), which contains an entire book, and Book Part Wrapper (<book-part-wrap>), which contains a book part, such as a chapter or module, that is handled as a discrete unit.
  3. This is a new Tag Suite, not based on the NLM Book model, but it is informed by NLM Book 3.0.

Introduction to BITS — The Gory Details

Things you get for free as a JATS user

The main content areas of a book chapter are very similar to those of a journal article (Fig. 1). The <body> of both contains paragraphs, sections, figures, tables, formulae, and other structures that JATS users will be familiar with. The <back> of both journal articles and book chapters contains reference lists, appendices, acknowledgments, and other structures that JATS users will be familiar with.

Fig. 1. Comparison of the main content areas of a journal article (<article>) and a book chapter (<book-part>).

Because of the Working Group's first design decision (that structures with the same names as those in JATS have the same models as those in JATS), a <p> is a <p>, a <fig> is a <fig>, and a <table-wrap> is a <table-wrap>. There are some book-specific models that are allowed within book content (see New Structures Within below) that will need to be handled by any software that processes book content, but this means that if you have the ability to process journal articles in JATS, you are able to process book content (<body> and <back>) in BITS.
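To make the "get for free" point concrete, here is a minimal sketch in Python, using only the standard library. The two sample instances are hypothetical and heavily trimmed (real documents carry far more metadata), and the helper names `section_titles` and `paragraphs` are ours, not part of any JATS or BITS tooling. The point is only that the same functions read the <body> and <back> of either document type without change.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed instances used only for illustration.
ARTICLE = """<article>
  <front><article-meta><title-group>
    <article-title>An Article</article-title>
  </title-group></article-meta></front>
  <body><sec><title>Methods</title><p>Article paragraph.</p></sec></body>
  <back><ack><p>Thanks.</p></ack></back>
</article>"""

BOOK_PART = """<book-part book-part-type="chapter">
  <book-part-meta><title-group>
    <title>A Chapter</title>
  </title-group></book-part-meta>
  <body><sec><title>Background</title><p>Chapter paragraph.</p></sec></body>
  <back><ack><p>Thanks again.</p></ack></back>
</book-part>"""


def section_titles(root):
    """Return the titles of the top-level <sec> elements in <body>."""
    body = root.find("body")
    return [sec.findtext("title") for sec in body.findall("sec")]


def paragraphs(root, area):
    """Return the text of every <p> inside the named area ('body' or 'back')."""
    container = root.find(area)
    return [p.text for p in container.iter("p")]


if __name__ == "__main__":
    # The same code path handles both the article and the book part.
    for source in (ARTICLE, BOOK_PART):
        root = ET.fromstring(source)
        print(root.tag, section_titles(root), paragraphs(root, "back"))
```

Only the metadata containers (<front> versus <book-part-meta>) differ between the two samples; nothing in the content-handling functions needs to know which kind of document it was given.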

Structures above Chapter/Article

Now that we know that the content area of a book chapter is similar to the content area of an article, we only have to worry about how all of the chapters fit together.

The recursive <book-part>

Books can have many different levels of content. Compare the Table of Contents (TOC) of a book divided into Parts (Fig. 2) with one divided simply into chapters (Fig. 3). From the reader's, writer's, publisher's, and editor's perspective, this is not a problem at all. A book can have chapters, or a book can have parts that have chapters, or a book can have sections that have parts that have chapters (that then, of course, have sections).

Fig. 2. A book with Parts and Chapters.

Fig. 3. A book with just Chapters.

But this gives the XML modeler something to think about. Do you create an explicit <Section> element that allows <Part>, and <Part> that allows <Chapter> (which gets us down to our basic unit)? If you do this, then the root element (probably <Book>) will either need to allow <Section> and/or <Part> and/or <Chapter> or you will force anyone tagging a just-chapter book (as in Fig. 3) to tag an "empty" copy of each level until they get to the level that has any content.

Fig. 4. Empty level-specific elements needed to represent a simple, flat book.

The simpler strategy (from a model writing point of view) is to have one element (<book-part> in our case) that has a type attribute (@book-part-type) that describes the type (or level) of the book part. Then, <book-part> is allowed to recurse, or allow <book-part> children. In this way, you can make as many levels as needed, and if @book-part-type is not a controlled list (which it is not in BITS), then you can call them whatever you like.
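A short Python sketch shows what the recursive model buys a processing system: one function handles a flat book and a book with Parts alike. The sample books, the helper names (`nested_parts`, `outline`), and the `book-part-type` values are hypothetical. The sketch also assumes that nested <book-part> elements sit either directly under their parent or inside its <book-body> or <body>; check the Tag Library for the exact content models.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter template; real book parts carry much more content.
CHAPTER = (
    '<book-part book-part-type="chapter">'
    "<book-part-meta><title-group><title>{}</title></title-group></book-part-meta>"
    "<body><p>Text.</p></body>"
    "</book-part>"
)

# A book with just chapters.
FLAT = "<book><book-body>{}{}</book-body></book>".format(
    CHAPTER.format("Chapter 1"), CHAPTER.format("Chapter 2")
)

# A book with one Part that holds the same two chapters.
NESTED = (
    '<book><book-body><book-part book-part-type="part">'
    "<book-part-meta><title-group><title>Part I</title></title-group></book-part-meta>"
    "<body>{}{}</body>"
    "</book-part></book-body></book>"
).format(CHAPTER.format("Chapter 1"), CHAPTER.format("Chapter 2"))


def nested_parts(elem):
    """Return the <book-part> elements one level below elem."""
    # Assumption: parts may sit directly under elem or inside a wrapper.
    parts = list(elem.findall("book-part"))
    for wrapper_name in ("book-body", "body"):
        for wrapper in elem.findall(wrapper_name):
            parts.extend(wrapper.findall("book-part"))
    return parts


def outline(elem, depth=0):
    """Return indented 'type: title' lines for every level of book part."""
    lines = []
    for part in nested_parts(elem):
        kind = part.get("book-part-type", "book-part")
        title = part.findtext("book-part-meta/title-group/title", default="(untitled)")
        lines.append("  " * depth + kind + ": " + title)
        lines.extend(outline(part, depth + 1))  # recurse into deeper levels
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(NESTED))))
```

Because `book-part-type` is free text, the function simply reports whatever label the publisher chose; adding a level called "section" or "module" requires no change to the code.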

Compare Figs. 5 and 6, which both show a book with just chapters.

Fig. 5. A simple chapter-only book tagged with explicit level elements.

Fig. 6. A simple chapter-only book tagged with generalized <book-part>.

A Book, One Document or Many Documents?

One of the basic design decisions of the Working Group was that a book should be able to be represented either as a single XML document (with a single root element, <book>) or "assembled" from a set of <book-part>s (let's say chapters for now). The latter seems a little odd, but it is in line with the JATS article model, which does not describe "issues" or "journals" but includes enough information in each article to allow a publishing system to assemble or create them from a collection of articles.

Modeling books as a single XML document is pretty straightforward. <book> includes book-level metadata (<book-meta>), book-level front matter such as the Preface and Foreword (<front-matter>, to distinguish it from chapter/article-level front matter in <front>), book-level body content (all of the descendant Parts and Chapters, in <book-body>), and book-level back matter (<book-back>).

Building a book from a collection of <book-part>s means that each <book-part> must include (or have associated with it) the book-level metadata, just as each JATS article includes <journal-meta> and issue-level information in <article-meta>. In the NCBI Book DTD, <book-meta> was an optional element allowed within each <book-part>. For BITS, we've created a <book-part-wrap> for the delivery of one or more <book-part>s. The wrapper contains the <book-meta> and then the <book-part>(s). This keeps the book-level metadata out of the <book-part> content. This also means that there are two root elements that can be used to describe a book in BITS: <book> and <book-part-wrap>.
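The wrapper idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a delivery tool: the sample <book-meta> and chapter are hypothetical and minimal, and `wrap_book_parts` and `book_title_for` are names we made up for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, minimal book-level metadata shared by every delivered part.
BOOK_META = ET.fromstring(
    "<book-meta><book-title-group><book-title>Sample Book</book-title>"
    "</book-title-group></book-meta>"
)

# Hypothetical chapter that carries no book-level metadata of its own.
CHAPTER_3 = ET.fromstring(
    '<book-part book-part-type="chapter">'
    "<book-part-meta><title-group><title>Chapter 3</title></title-group></book-part-meta>"
    "<body><p>Text.</p></body>"
    "</book-part>"
)


def wrap_book_parts(book_meta, parts):
    """Return a <book-part-wrap> holding book_meta first, then the parts."""
    wrap = ET.Element("book-part-wrap")
    # Copies keep the shared metadata and the source parts untouched.
    wrap.append(copy.deepcopy(book_meta))
    for part in parts:
        wrap.append(copy.deepcopy(part))
    return wrap


def book_title_for(wrap):
    """Read the book title that travels with a delivered book part."""
    return wrap.findtext("book-meta/book-title-group/book-title")


if __name__ == "__main__":
    wrap = wrap_book_parts(BOOK_META, [CHAPTER_3])
    print(ET.tostring(wrap, encoding="unicode"))
```

A receiving system can then tell which book a chapter belongs to from the wrapper alone, while the <book-part> itself stays free of book-level metadata.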

XInclude

A book represented by a single XML document does not need to be written in a single XML file. Pieces of the document can be declared as XML entities and referenced by a book master file. This is how it was typically done with the NCBI Book DTD, and how it could have been done in the draft version of BITS. See an example of this in Fig. 7. Each of the referenced files (ch1.xml and ch2.xml) needs to be a complete <book-part>, but they cannot be standalone XML documents on their own.

Fig. 7. Referencing XML files with entities.

In BITS 1.0, we added the element <xi:include>,[9] so that books and book parts can be managed as separate files and "included" as needed into a final document. See Fig. 8 for an example.

Fig. 8. Referencing XML documents with XInclude.
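For readers who want to try XInclude-style assembly, the following Python sketch resolves <xi:include> elements with the standard library's `xml.etree.ElementInclude` module. It is not a reproduction of Fig. 8. The master file, the chapter contents, and the in-memory `CHAPTERS` lookup are hypothetical; a real system would read ch1.xml and ch2.xml from disk and would normally validate the result. Placing <xi:include> directly inside <book-body> is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical master file that pulls in two chapter documents.
MASTER = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <book-body>
    <xi:include href="ch1.xml"/>
    <xi:include href="ch2.xml"/>
  </book-body>
</book>"""

CHAPTER = (
    '<book-part book-part-type="chapter">'
    "<book-part-meta><title-group><title>{}</title></title-group></book-part-meta>"
    "<body><p>Text.</p></body>"
    "</book-part>"
)

# Stand-in for files on disk, so the sketch is self-contained.
CHAPTERS = {
    "ch1.xml": CHAPTER.format("Chapter 1"),
    "ch2.xml": CHAPTER.format("Chapter 2"),
}


def load_chapter(href, parse, encoding=None):
    """Loader for ElementInclude that reads from CHAPTERS instead of disk."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(CHAPTERS[href])


def assemble(master_xml):
    """Parse the master file and replace each xi:include with its target."""
    root = ET.fromstring(master_xml)
    ElementInclude.include(root, loader=load_chapter)
    return root


if __name__ == "__main__":
    book = assemble(MASTER)
    for part in book.find("book-body"):
        print(part.get("book-part-type"), part.findtext("book-part-meta/title-group/title"))
```

Unlike entity-referenced files, each included chapter here is parsed as a complete XML document in its own right, which is what lets the parts be managed separately.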

New Structures Within

Book-specific objects

There are two new book- (and book-part-) level objects that need to be mentioned here: <toc>[10] and <index>[11]. Both of these elements allow their objects (a Table of Contents and an Index, respectively) to be tagged explicitly rather than generated from the content. The decision to tag these explicitly as content objects is a business decision, but the BITS Working Group felt that there was a need to be able to do this in the new model. See the BITS Tag Library for details on <toc> and <index>.

Questions and Answers

The BITS Working Group proposed a simple but flexible Question and Answer model to the JATS Standing Committee as a comment on JATS 1.0. It included new elements for <question>, <question-wrap>, <answer>, <answer-set>, and <explanation>. It models only questions and answers, not quizzes or Continuing Medical Education exams, but the elements can be used to build these items. The JATS Standing Committee did not add the Question and Answer model to JATS 1.1d1 because it was not yet a proven model and there was little demand for it in journals. The BITS Working Group added these elements to BITS 1.0 because of the need for them in book content. Details on questions and answers are available in the Tag Library.

Conclusions

BITS is a real improvement over the old NCBI Book DTD. The structure of the book is handled better and there are more book-specific objects that book publishers have been requesting. But at its core, the main content area of book chapters and parts uses the same objects that JATS users are already familiar with and more than likely have built systems to operate on.

References

1. http://www.docbook.org/
2. http://www.tei-c.org/
3. http://dtd.nlm.nih.gov/book/
4. Kasdorf, B. XML Models for Books. http://www.niso.org/news/events/2008/digresources08/agenda/kasdorf_digresources08.pdf
5. http://www.ncbi.nlm.nih.gov/books
6. GeneReviews® [Internet]. Pagon RA, Adam MP, Bird TD, et al., editors. Seattle (WA): University of Washington, Seattle; 1993-2014.
7. http://jats.nlm.nih.gov/archiving/1.1d1/
8. http://jats.nlm.nih.gov/extensions/bits/0.2/index.html
9. http://www.w3.org/TR/xinclude/
10. http://jats.nlm.nih.gov/extensions/bits/tag-library/1.0/index.html?elem=toc
11. http://jats.nlm.nih.gov/extensions/bits/tag-library/1.0/index.html?elem=index

This work is in the public domain and may be freely distributed and copied. However, it is requested that in any subsequent use of this work, the author be given appropriate acknowledgment.
Bookshelf ID: NBK159737
