NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2011 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2011.

Book Publishing with JATS


JATS is gaining in popularity as a viable tag set for book publishing. A number of publishing organizations have adopted the NLM book tag set as their core publishing DTD. This paper will present a series of case studies of not-for-profit and commercial organizations that have adopted the NLM book DTD for titles ranging from reports and reference materials to higher education textbooks.

The paper will examine why the NLM book DTD was chosen and analyse the benefits and challenges post-implementation. Furthermore, the paper will detail how more complex design requirements were achieved within the NLM book ecosystem. Finally, the paper will touch on how content changes made within the layout environment were reconciled back to the NLM source, and note the challenges faced in such a process.

In the textbook publishing environment, textual content can take many forms including paragraph text, poetry, case studies, readings and plays. Images can vary from cited figures to chapter opening vignettes and filler images. It is also common to find dozens of different table styles in a textbook. This paper will demonstrate how a number of publishers used the NLM book DTD v3.0 to produce complex titles requiring all the above variations without extending the DTD and keeping to the original design goals of JATS.

Book production workflows

Book production workflows can be classified into two categories: content centric and design centric.

In a content-centric workflow, the visual design and presentation decisions are secondary to the requirements of the content. In such a workflow concessions may be made in page design, page or column length and other production parameters to accommodate the content unchanged. Such a workflow allows for good separation of content from presentation and therefore the use of XML is appropriate. Most scholarly reference titles would fall into this category and an example is shown in Fig. 1.

Fig. 1. Sample page spread from a typical scholarly reference title. Sample pages from IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 100.

In a design-centric workflow, the visual design and presentation of the publication are as important as, or in some cases more important than, the content. In such a workflow the content may be written and edited with a specific design or page or column length in mind, and the content may be changed during production to better suit these design requirements. Design-centric workflows may not allow for separation of content from presentation and therefore the use of XML is less appropriate. An example of a design-centric book is a children’s illustrated reading book, as shown in Fig. 2.

Fig. 2. Sample page spread from a children’s illustrated book, Geronimo Stilton: The Amazing Voyage.

In many cases the distinction between design centric and content centric is not obvious. Some books can be described as content centric with design-centric elements or design centric with content-centric elements. Fig. 3 shows positioning of different types of books on content versus design-centric axes.

Fig. 3. Different types of books are positioned on a design versus content centric scale. Titles on the left, design centric, are generally unsuitable for a NLM book DTD (or any other type of workflow that is presentation agnostic). The titles on the right side ...


This paper is based on three publishers that use JATS for book publishing. The first publisher, the International Agency for Research on Cancer (IARC), publishes technical reports using a fully content-centric workflow. The second publisher, the World Health Organization (WHO), publishes journals and various book titles using a workflow that can also be described as content centric, but there are occasional design elements that require the editor (and by implication the XML) to be aware of the final design of the outputs. The third case study is on Macmillan Publishing Solutions (MPS Limited), a production outsourcer for textbook publishers. MPS Limited has a design-centric workflow, as the print designs for textbooks are often set well in advance of finalizing the content. In addition to these three specific case studies, the paper will reference other publishers and publications to provide additional examples or clarity.

In all three production processes content is edited in Microsoft Word and exported to NLM book v2.3 or v3.0 using Inera eXtyles. The XML is then passed to Typefi Publish for automated composition within Adobe InDesign. It is possible for the composed InDesign files to be hand edited by designers before publication; however, IARC and WHO make all content changes in Word and then regenerate the XML and InDesign files. Their changes in InDesign are mainly restricted to styling changes (e.g. text spacing changes to fix short pages). In contrast, MPS Limited’s workflow was implemented to allow a substantial number of changes in layout in response to its customers’ requirements. The changed content is then round-tripped back from InDesign to NLM for conversion to publisher-specific schemas and DTDs.

Challenges in book production

Book production, when compared to journal production, generally presents a distinct set of challenges. Whereas a journal article exists in its entirety at the point of acceptance, books are often assembled and read in their entirety for the first time towards the end of the production process.

The length of time it takes to publish a book also presents challenges: the time from acceptance to publication of a journal article may be a few days to a few months, whereas the time from commissioning a book to publication can be anything from six months to a few years (and in some cases decades).

In addition, as books tend to be longer than journal articles, there are almost always a substantial number of changes to book content during production, which can range from simple changes such as updating statistics to the rewriting of whole chapters that were developed early in the publishing process.

Books are more likely than journals to be produced using desktop publishing (DTP) tools such as Adobe InDesign and QuarkXPress, mainly because of the requirement for very complex page designs, as shown in Fig. 4.

Fig. 4. Complex page design in a textbook.

Book chapters may be composed out of sequence (i.e. chapter 5 before chapter 3) as content is received, and the time elapsed between composition of the first chapter and the last chapter can be as much as six to twelve months. This means any corrections to early content may need to be made in the desktop publishing product.

Choosing JATS

Although “Book Tag sets were written with a more modest purpose, to describe volumes for the NCBI online libraries” [6], there are a number of good reasons to choose the NLM book DTD for wider publishing workflows. An important one is that the DTD is a very good fit for the content. This tends to be the case for scholarly publishers with content-centric workflows and is the case for both IARC and WHO.

Publishers that are already using the NLM journal DTD may prefer to adopt the NLM book DTD because it allows them to have a single workflow and a single set of tools across book and journal production. This synergy significantly increases the usability and acceptability of the process and brings down any artificial barriers that may exist between journal and book production staff. In some cases the same staff work on both book and journal production, as is the case with WHO, and using JATS for both enables them to switch easily from book to journal production with very little impact on productivity.

Another advantage of choosing the NLM book DTD is that it is seen as a very good interchange DTD for cases where the publisher needs to deliver XML in various other DTDs. NLM book provides a good meta-data model and a high level of granularity, allowing for automated transformation to other DTDs. This is one of the key reasons for MPS Limited’s decision to use the DTD.

Although this is quite often overlooked, the NLM DTDs are supported by a good set of free and commercial production tools that allow for quick implementations and a fast return on investment. Because the tag suite is open, publishers can purchase off-the-shelf tools and hire established experts for their projects without having to invest in creating a custom schema or DTD and building tools to support it. In all three cases the availability of reliable tools significantly influenced the selection of the NLM DTD.

Using JATS

Based on their output, most publishers will implement either a design-centric or a content-centric workflow within their production team. It is important for a publisher to decide on a content-centric or design-centric workflow based on their publishing requirements, staff competencies, market requirements and available tools. Larger publishers may implement both types of workflows but very rarely would a single production team run both design-centric and content-centric workflows in parallel.

Broadly speaking, the NLM book DTD is not suitable for a purely design-centric workflow. The main reason for this is that one of the key philosophies behind JATS is the separation of content from presentation. The introduction to JATS states: “The exact replication of the look and feel of any particular journal has not been a consideration” [7]. Therefore books at the extreme end of the design-centric axis are considered out of scope of this paper.

For publishers that have a mainly content-centric workflow but also have a number of design-centric elements, there are a number of techniques to facilitate the design-centric requirements in a content-centric workflow.

One such technique would be to extend the NLM book DTD to include presentational meta-data. This is done by publishers such as the American Chemical Society (ACS), which employ multiple DTDs for authoring and production and meet the need for presentational meta-data by adding “an additional set of vocabulary that is focused on page layout” [8].

Another technique is to use attributes such as content-type, fig-type and list-type to provide semantic and formatting-related information. In some cases this is in line with the design principles of JATS. An example of this is the content-type attribute, where the documentation states: “The presence of a @content-type attribute may be used to treat its element in a special way, for example, giving the work, phrase, or structure a different look in print or on display” [9].

Conversely, the documentation for the sec-type attribute states: “best practice, this attribute should be used only if the section is one of the types listed” [10]. But a book chapter may have many kinds of section: it may start with a chapter opening vignette (Fig. 5) and may contain sections such as poems, labs, chapter summaries or exercises. Each of these is a valid semantic section and may have very specific display requirements.

Fig. 5. Sample page design showing how <sec> elements and @sec-type are used to mark up a chapter vignette.
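The use of @sec-type to drive display can be illustrated with a minimal sketch in Python. It assumes a simplified chapter fragment and a lookup table from sec-type values to layout template names; the values "vignette" and "summary", the template names and the function `section_styles` are all hypothetical and are not taken from any of the three workflows described here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical chapter fragment; sec-type values are illustrative.
CHAPTER = """
<book-part book-part-type="chapter">
  <body>
    <sec sec-type="vignette"><title>A day at the coast</title><p>Opening story.</p></sec>
    <sec><title>Landforms</title><p>Main text.</p></sec>
    <sec sec-type="summary"><title>Chapter summary</title><p>Key points.</p></sec>
  </body>
</book-part>
"""

# Assumed mapping from @sec-type to a layout template name.
SECTION_STYLES = {
    "vignette": "ChapterOpener",
    "summary": "SummaryBox",
    "lab": "LabPanel",
}


def section_styles(xml_text):
    """Return (title, template) pairs for each top-level section in the body."""
    root = ET.fromstring(xml_text)
    result = []
    for sec in root.findall("./body/sec"):
        # Sections without a recognised sec-type fall back to a default template.
        template = SECTION_STYLES.get(sec.get("sec-type"), "BodySection")
        result.append((sec.findtext("title"), template))
    return result


for title, template in section_styles(CHAPTER):
    print(f"{template}: {title}")
```

The point of the sketch is that the XML carries only the semantic label; the decision about what a vignette looks like stays in the composition layer.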

Between these two techniques is the list-type attribute, which is documented as: “Identifies the type of prefix character that precedes each list item… Although designed to accept any text as its value, the following are suggested list types, which name the prefix character for each item in the list” [11]. While maintaining the list-prefix requirement, additional list types are often introduced for book publishing, such as learning objective lists (Fig. 6) or step lists where each list item is prefixed by “STEP [number]”.

Fig. 6. Custom list type with “LO” as the prefix for a learning objective in a chapter.
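As a rough illustration of custom list types that still name a prefix, the following Python sketch generates item prefixes from @list-type. The values "step" and "learning-objective", the numbered “LO n” form and the function `render_list` are assumptions made for the example; only “order” and “bullet” come from the suggested values in the tag library.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom list types and the prefix each one stands for.
CUSTOM_PREFIXES = {
    "learning-objective": "LO {n}",
    "step": "STEP {n}",
}

STEPS = (
    '<list list-type="step">'
    "<list-item><p>Open the file.</p></list-item>"
    "<list-item><p>Save a copy.</p></list-item>"
    "</list>"
)


def render_list(xml_text):
    """Return the list items as plain text lines with their prefixes."""
    lst = ET.fromstring(xml_text)
    list_type = lst.get("list-type", "bullet")
    template = CUSTOM_PREFIXES.get(list_type)
    lines = []
    for n, item in enumerate(lst.findall("list-item"), start=1):
        text = "".join(item.itertext()).strip()
        if template is not None:
            prefix = template.format(n=n)   # custom, book-specific prefix
        elif list_type == "order":
            prefix = f"{n}."                # suggested value: numbered list
        else:
            prefix = "•"                    # fallback: bullet
        lines.append(f"{prefix} {text}")
    return lines


print("\n".join(render_list(STEPS)))
```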

The <disp-quote> tag has been used frequently because it can be nested within a <p> tag. In many cases the tag did not hold an “Extract or extended quoted passage from another work” [12] but was used as <disp-quote content-type="code"> or <disp-quote content-type="sonnet">. This is important because some of the material, such as a line of computer code or a song verse, was nested within a logical paragraph.
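A small Python sketch shows why this nesting matters: the block sits inside the paragraph, and the text before and after it remains part of the same logical <p>. The sample paragraph and the function `nested_blocks` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with a block of code nested inside it.
PARA = (
    "<p>The loop is written as "
    '<disp-quote content-type="code"><p>for i in range(3): print(i)</p></disp-quote>'
    " and runs three times.</p>"
)


def nested_blocks(xml_text):
    """Return (content-type, text) for each <disp-quote> directly inside the <p>."""
    para = ET.fromstring(xml_text)
    return [
        (dq.get("content-type", "quote"), "".join(dq.itertext()).strip())
        for dq in para.findall("disp-quote")
    ]


para = ET.fromstring(PARA)
block = para.find("disp-quote")
print(nested_blocks(PARA))
print(repr(para.text))   # text of the paragraph before the block
print(repr(block.tail))  # text of the same paragraph after the block
```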

A further method for adding design-centric elements is the use of processing instructions to provide composition information. Such instructions can range from sizing and orientation information for a composed page to specifying the semantic meaning and the design parameters of a content element (e.g. Fig. 7).

Fig. 7. Another technique used to derive different visual treatments for the same JATS element is to use the content. On the left of the figure is a “Chapter Opener Box”; the content is formatted differently based on the contents of the <title> ...
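How a composition step might read processing instructions can be sketched in Python with the standard-library DOM parser, which keeps processing instructions as nodes. The target name “layout”, its pseudo-attribute and the function `layout_instructions` are invented for this example and do not describe the instructions used by any particular tool.

```python
from xml.dom import Node, minidom

# Hypothetical fragment: a processing instruction placed before a wide table.
FRAGMENT = (
    "<sec><title>Results</title>"
    '<?layout orientation="landscape"?>'
    '<table-wrap id="t1"><label>Table 1</label></table-wrap>'
    "</sec>"
)


def layout_instructions(xml_text, target="layout"):
    """Return (instruction data, name of the next element) for each matching PI."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        for child in node.childNodes:
            if (child.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                    and child.target == target):
                # Find the element the instruction applies to: the next element sibling.
                nxt = child.nextSibling
                while nxt is not None and nxt.nodeType != Node.ELEMENT_NODE:
                    nxt = nxt.nextSibling
                found.append((child.data, nxt.tagName if nxt is not None else None))
            walk(child)

    walk(doc)
    return found


print(layout_instructions(FRAGMENT))
```

Because the instruction is not an element, the document stays valid against the unextended DTD, and a processor that does not know the target can ignore it.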

Editing in layout

In a typical publishing workflow, most content edits and changes are made before composition. After editing, a composition operator or an automated system generates typeset pages. During this process an operator may spend a significant amount of time designing pages (e.g. the page spread in Fig. 4). Edits after composition are therefore generally regarded as undesirable, because these changes will need to be made in the layout tool rather than in the XML. An alternative is to make the changes in the XML (the editorial tool) and re-run the composition. But if significant design resources are required to redesign the pages after automated composition, this approach is generally not economical or practical.

There are a number of reasons why a publisher may wish to edit content in layout. One reason is to take in corrections or updates that arise due to the longer publishing cycle of a book, for example where the first chapter may need corrections after a later chapter had been produced. Another reason is to edit for copy fitting, which is done on a macro scale (e.g. trying to fit the book to a defined page limit) or on a micro scale (e.g. making a margin term definition fit on the page). Another common reason for editing in layout may be imposed by the publisher’s workflow and tools; if the publisher has no way of regenerating a layout from the XML efficiently (e.g. lack of automation) then changes will always be done in layout.

Regardless of the reason, it is important that any changes made in the final composed file be captured in the XML. The only exception would be changes due to copy fitting, where the publisher will continue to use the XML as the master superset of content and accept that the print version is a rendition of a subset of the content.

It is technically possible to round-trip content back from the composed page to XML. To do this effectively the designer has to take due care when editing the composed pages, as there are a number of common issues, including restyled content. For example, a compositor may restyle a heading level 2 as a heading level 3 for spacing or copy fitting purposes and not for semantic reasons. Regardless of the reason, mapping these changes back to NLM can be difficult. The solution here is for the operator to be very diligent when copy fitting and to use a mechanism for introducing new NLM elements within the layout tool.
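One way to catch such restyling before it reaches the XML is to compare heading levels in the source with the paragraph styles found in layout. The Python sketch below assumes that section ids survive composition and that the layout export reports a style name per heading; the style names, the ids and the function `restyled_headings` are hypothetical.

```python
# Hypothetical layout paragraph styles and the heading level each one implies.
HEADING_STYLE_LEVELS = {
    "Heading 1": 1,
    "Heading 2": 2,
    "Heading 3": 3,
}


def restyled_headings(source_levels, layout_styles):
    """Return (section id, source level, layout level) where the two disagree.

    source_levels maps section ids to the nesting level in the source XML;
    layout_styles maps the same ids to the style name applied in layout.
    """
    changes = []
    for sec_id, level in source_levels.items():
        style = layout_styles.get(sec_id)
        if style is None:
            continue  # heading not present in the layout export
        layout_level = HEADING_STYLE_LEVELS.get(style)
        if layout_level != level:
            changes.append((sec_id, level, layout_level))
    return changes


source = {"s1": 1, "s1.1": 2, "s1.2": 2}
layout = {"s1": "Heading 1", "s1.1": "Heading 2", "s1.2": "Heading 3"}
print(restyled_headings(source, layout))
```

A report like this does not say whether a change was semantic or made for copy fitting; it only gives the operator a list of headings to review before the round trip.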

Sometimes a single item of content in the XML is duplicated in the composed title. An example of this is glossary definitions, where the edited content and XML will contain a single set of <def-list> and <def-item> elements but the composed page might contain margin definition boxes in addition to the glossary at the end of the book (particularly in textbooks). In addition to this duplication of content, the text of the margin definition boxes might be edited for fit and context in layout. It is important for the publisher to define how such duplicated, and possibly not identical, content should be handled. One option is to round-trip one set and ignore the duplicates; for example, the publisher might decide to drop the margin definitions and keep the glossary section content. Such an approach may cause difficulties because derived products will not contain content identical to the printed product. This matters, for example, where jurisdictions require that accessible versions of textbooks delivered to disabled students be identical to the printed versions (including mistakes).
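Whatever policy the publisher chooses, it helps to know where the two copies have diverged. The following Python sketch compares glossary definitions in the XML with margin-box text extracted from layout; the sample glossary, the `MARGIN_BOXES` dictionary and the function `divergent_definitions` are hypothetical, and the extraction of margin text from the layout file is assumed to have happened already.

```python
import xml.etree.ElementTree as ET

# Hypothetical glossary from the source XML.
GLOSSARY = """<def-list>
<def-item><term>erosion</term><def><p>The wearing away of land by wind or water.</p></def></def-item>
<def-item><term>delta</term><def><p>A landform at the mouth of a river.</p></def></def-item>
</def-list>"""

# Hypothetical margin-box text as edited in layout, keyed by term.
MARGIN_BOXES = {
    "erosion": "Wearing away of land.",
    "delta": "A landform at the mouth of a river.",
}


def divergent_definitions(glossary_xml, margin_boxes):
    """Return {term: (glossary text, margin text)} where the two copies differ."""
    root = ET.fromstring(glossary_xml)
    diffs = {}
    for item in root.findall("def-item"):
        term = item.findtext("term")
        definition = "".join(item.find("def").itertext()).strip()
        margin = margin_boxes.get(term)
        if margin is not None and margin != definition:
            diffs[term] = (definition, margin)
    return diffs


print(divergent_definitions(GLOSSARY, MARGIN_BOXES))
```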

Lessons learned

  • It is important for a publisher to consider their whole publishing list when deciding whether the NLM book DTD is appropriate. Not all books are suitable for the NLM book DTD but the presence of design-centric books does not preclude its use. If a substantial portion of a publisher’s content is suitable for the NLM book DTD, there are efficient and economical ways to make the rest work too.
  • A substantial percentage of books require corrections in layout. Even though a process that does not require layout changes is more streamlined and efficient, a publisher needs to decide on a process for reconciling the changes made in layout with the XML.
  • The three implementations studied in this article range widely from content-centric to design-centric workflows, and the NLM book DTD has proven adequate for all three without needing any extensions. One can always avoid the introduction of a <lab> tag by using <sec sec-type="lab">.
  • Providing a more generic form of <disp-quote> or another generic block element (as opposed to a specific element like <verse-group>) that can go inline would be useful.
  • Relaxing some of the strict recommendations in the documentation (e.g. the recommendation about sec-type attribute values) would also be advisable. Although they are justified for journals, for books they seem restrictive and may inhibit take-up.


The NLM book DTD has proven to be a very flexible and usable book publishing DTD. The three cases studied in this paper have shown that the NLM book DTD can cater to a wide range of titles, from scholarly and professional reference works to higher education textbooks. If a publisher is already using JATS for journals, adopting the NLM book DTD for books is a straightforward step.


References

1. IARC Working Group on the Evaluation of Carcinogenic Risks to Humans. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans Vol. 100, A Review of Human Carcinogens, Part A: Pharmaceuticals. Lyon: World Health Organization International Agency for Research on Cancer; 2011. ISBN 978-92 83213185.
2. Dami, E. The Amazing Voyage: The Third Adventure in the Kingdom of Fantasy, special edition. Scholastic Inc.; 2011. ISBN 978-0545307710.
3. Cathy B, Vivienne L, Doug C, Alan K, Stephen M. Humanities Alive Geography 1, 2nd edition. John Wiley & Sons Australia; 2010. ISBN 978-174216 1082.
4. World Health Organization. World Report On Disability. Geneva: World Health Organization; 2011. ISBN 978-9241564182.
5. WHO manual of diagnostic ultrasound. Vol. 1, 2nd edition. Geneva: World Health Organization; 2011. ISBN 978-9241547451.
6. Book and Collection Tag Library, version 3, General Introduction. Bethesda: National Library of Medicine; 2007. http://dtd/book/tag-library/n-cu00.html.
7. Archiving and Interchange Tag Set. Bethesda: National Library of Medicine; 2007. http://dtd
8. O'Brien D, Fisher J. Journals and Magazines and Books, Oh My! A Look at ACS' Use of NLM Tagsets. Proceedings of the Journal Article Tag Suite Conference 2010. National Center for Biotechnology Information (US); 2010.
9. Book and Collection Tag Library, version 3.0 [Attribute: content-type]. Bethesda: National Library of Medicine; 2007. http://dtd/book/tag-library/n-eei0.html.
10. Book and Collection Tag Library, version 3.0 [Attribute: sec-type]. Bethesda: National Library of Medicine; 2007. http://dtd/book/tag-library/n-ikq0.html.
11. Book and Collection Tag Library, version 3.0 [Attribute: list-type]. Bethesda: National Library of Medicine; 2007. http://dtd/book/tag-library/n-gvj0.html.
12. Book and Collection Tag Library, version 3.0 [Element: <disp-quote>]. Bethesda: National Library of Medicine; 2007. http://dtd/book/tag-library/n-6eu0.html.

Copyright © Typefi Systems Pty Ltd. All rights reserved.

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

Bookshelf ID: NBK62098