NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2015.

A complete end-to-end publishing system based on JATS

JATS is fast becoming the de facto standard in scholarly publishing. We describe a web-based platform that handles the complete publishing process from authoring to publication, in which the file being worked on is always native JATS XML. This approach minimizes the risk of errors introduced by format conversion and also speeds up the publication process.

The software is written primarily in PHP with MySQL, together with other free software that provides visual formatting. During authoring the file is saved as HTML, but on submission it is converted to JATS. Thereafter, all conventional stages of production, e.g. peer review, copy editing and author proof correction, are carried out directly on the JATS file. Any data that should not appear in the final delivered file (e.g. editor queries to the author) are saved as processing instructions.
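Because editor queries live inside the XML as processing instructions (PIs), they must be removed before the file is delivered. The sketch below shows one way to do that with Python's standard `xml.dom.minidom`. Unlike the default `ElementTree` parser, `minidom` keeps PIs in the tree. The PI target `rv-query` is a hypothetical name: the paper does not say which targets the system uses. Other PIs, such as pagination instructions, are left in place.

```python
from xml.dom import minidom

# Hypothetical PI target for editor/author queries (not given in the paper).
QUERY_TARGET = "rv-query"


def strip_query_pis(jats_xml: str, target: str = QUERY_TARGET) -> str:
    """Return the root element's XML with every PI whose target equals
    `target` removed; elements, text and other PIs are left untouched."""
    doc = minidom.parseString(jats_xml)
    to_remove = []
    stack = [doc]
    # Collect matching PIs first so the tree is not mutated while walking it.
    while stack:
        node = stack.pop()
        for child in node.childNodes:
            if (child.nodeType == child.PROCESSING_INSTRUCTION_NODE
                    and child.target == target):
                to_remove.append(child)
            else:
                stack.append(child)
    for pi in to_remove:
        pi.parentNode.removeChild(pi)
    return doc.documentElement.toxml()


src = ('<article><body><p>Results<?rv-query AQ1: Please check units.?>'
       ' are shown.<?rvf nohyphen?></p></body></article>')
print(strip_query_pis(src))
# <article><body><p>Results are shown.<?rvf nohyphen?></p></body></article>
```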

Once the author's corrections are approved, the XML is automatically converted to PDF, using the TeX engine for pagination. If any fine-tuning is required, it is done by the typesetter, or even by the publisher, through a graphical user interface that embeds the relevant processing instructions (PIs) in the XML and then recreates the PDF file. We will give a live demonstration of the system.

Introduction

Almost all publishers now require their composition suppliers to deliver an XML file as the definitive archive of the publication, and increasingly the DTD of choice is JATS. This ubiquitous requirement for XML poses two main challenges for publishers and their production vendors:

  • Creating a granular XML file from the file submitted by the author (e.g. Word, LaTeX).
  • “Rendering” the XML to obtain PDF, Epub, etc.

It is instructive to review typical present-day scenarios and their associated problems, and then to look at our solution.

A typical workflow

Almost all workflows are based around sending files, via email attachments, web uploads, or APIs, from one individual or group to another. The file is typically a text file, e.g. Word or LaTeX, or a PDF file, e.g. for author proof checking. Although stakeholders are constantly trying to make the process more efficient and more economical, the process of sending files back and forth remains.

Fig. 1. Typical (simplified) workflow with files being sent back and forth between stakeholders.

Creating XML

The most common files submitted by authors to publishers are Word and LaTeX files. In principle, if written perfectly, both can be used to produce structured files automatically, but in practice this is almost unknown. The reasons in each case are as follows:

Microsoft Word

By its nature, Word (or any word processor) is a page-based “WYSIWYG” writing system. The writer gets feedback on what has been written by looking at the screen, so the file records how the text looks rather than what it is. For example, a bold word might signal one of many things: an emphasized word or phrase, a figure label, a mathematical vector, etc. The author would normally choose “bold” from a menu, giving the document the right look. Unfortunately, the resulting file cannot be converted into XML automatically, because each of these cases needs a different tag. In almost all cases the files are sent to a compositor, or typesetter, whose job, amongst other things, is to “tag” the content logically to create an XML file. Third-party automated or semi-automated programs can help with this process by intelligently “reverse engineering” the content and inserting logical tags where needed.
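To illustrate why this tagging needs context, here is a toy heuristic of the kind a semi-automated tool might apply to a run of text that Word marks only as bold. The rules and the function name `tag_bold_run` are hypothetical. Real reverse-engineering tools use far richer evidence, such as styles, position and surrounding text. The `mml:` prefix assumes the MathML namespace is declared elsewhere in the article.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical rule: "Fig. 3", "Figure 12", "Table 2." look like labels.
LABEL_RE = re.compile(r"^(Fig(ure)?\.?|Table)\s*\d+\.?$")


def tag_bold_run(text: str, in_equation: bool = False) -> str:
    """Guess a JATS tagging for a run that Word records only as 'bold'."""
    safe = escape(text)
    if in_equation and len(text) == 1:
        # A bold single letter inside an equation is probably a vector.
        return f'<mml:mi mathvariant="bold">{safe}</mml:mi>'
    if LABEL_RE.match(text):
        return f"<label>{safe}</label>"
    # No better evidence: keep presentational bold.
    return f"<bold>{safe}</bold>"


print(tag_bold_run("Fig. 3"))               # <label>Fig. 3</label>
print(tag_bold_run("v", in_equation=True))  # <mml:mi mathvariant="bold">v</mml:mi>
print(tag_bold_run("never"))                # <bold>never</bold>
```

All three inputs look identical in the Word file. The tag can only be inferred, which is exactly why the conversion is never fully reliable.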

TeX and LaTeX

LaTeX (a user-friendly version of TeX) is not WYSIWYG, and uses markup to distinguish different elements in a text file. LaTeX encourages logical markup, e.g. \emph{important} for emphasis, so in principle it is possible to create a fully structured file that can be converted to XML automatically. In reality, LaTeX is a programming language that gives authors complete freedom in how they build their document. Two LaTeX files might produce identical output yet have very different levels of structure.
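The difference between logical and visual markup can be sketched in a few lines. `\emph{...}` carries meaning and can be converted mechanically. An old-style font switch such as `{\it ...}` produces the same italic output but says nothing about why, so it can only be flagged for a human. The regular expressions and the mapping of `\emph` to JATS `<italic>` are illustrative assumptions, and nested braces are not handled.

```python
import re

# \emph{...} is logical markup: it states that the words are emphasized.
EMPH_RE = re.compile(r"\\emph\{([^{}]*)\}")
# {\it ...}, {\bf ...}, {\sl ...} only switch fonts; the reason is unknown.
SWITCH_RE = re.compile(r"\{\\(it|bf|sl)\s+([^{}]*)\}")


def convert_emphasis(latex: str) -> tuple[str, list[str]]:
    """Map \\emph{...} to JATS <italic> and list font switches needing review."""
    converted = EMPH_RE.sub(r"<italic>\1</italic>", latex)
    flagged = [m.group(0) for m in SWITCH_RE.finditer(converted)]
    return converted, flagged


print(convert_emphasis(r"This is \emph{important}."))
# ('This is <italic>important</italic>.', [])
print(convert_emphasis(r"This is {\it important}."))
# ('This is {\\it important}.', ['{\\it important}'])
```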

In practice, just as with Word files, LaTeX files are sent to specialist compositors who transform them into XML using various techniques, with varying levels of automation.

Rendering XML

The main driving force behind archiving XML is to “future-proof” the content, i.e. to allow future formats to be produced with little or no extra effort. For this reason the XML must be guaranteed to match the PDF and any other formats delivered to the publisher. This is quite a difficult problem. Many typesetters achieve it through meticulous checking and proofreading. The difficulty is that the XML cannot be, or at least almost never is, proofread: it is the PDF that is normally read for accuracy. So there is always a risk that a late change is not reflected in the XML.

In our opinion, the most reliable way to ensure complete fidelity between the XML and conventional formats such as PDF is to generate the PDF directly and fully automatically from the XML. Publishers have in fact specified an “XML-first” workflow for many years, precisely to ensure the accuracy of the XML. However, the interpretation of XML-first varies, and some manual “parallel” corrections no doubt take place, especially during final corrections to a paper. We believe that XML-first should be interpreted in its “pure” form: every PDF, including the proofs sent to authors, is generated fully automatically from the corresponding XML file, which virtually guarantees one-to-one correspondence between the two. We admit that full automation is not an easy task and involves considerable programming, but it is possible, and we believe the effort pays off in the long term.

An alternative to the traditional workflow

We set out to create an alternative to the current publishing workflow with the following requirements:

  • XML is the definitive archive of the publication and its accuracy is paramount
  • To all intents and purposes, XML is the “format of record”
  • Authoring, submission, and checking proofs must be user-friendly experiences
  • File uploads and attachments should be minimized
  • Fast publication speed is paramount
  • All formats (PDF, Epub, HTML, etc) must be produced automatically from XML

As JATS is now the de facto standard in publishing, we have used JATS as the DTD. The system we decided on is shown in Figure 2.

Fig. 2. An alternative to the conventional file-based publishing workflow.

The new workflow is based on a single JATS file in the cloud, with any graphics or supplementary material saved alongside it. Using role-based access control, different stakeholders log into the system and modify the XML through a user-friendly interface. We have used a framework based on PHP and MySQL for most of the modules making up the system, which we call RVPublisher. The system is applicable to any type of publication, from journal papers to major reference works; here we concentrate on a journal workflow. The idea is that no files are uploaded or downloaded until the manuscript has been published. However, the system does allow conversion to PDF at any point in the process, as a temporary quality check.
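The “single file, many roles” model can be sketched as an append-only table of revisions: every save by any role creates a new version of the one JATS document. The paper's system uses MySQL. This hypothetical sketch uses Python's built-in `sqlite3` as a stand-in, and the table and column names are invented for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article_versions (
    article_id TEXT    NOT NULL,
    version    INTEGER NOT NULL,
    role       TEXT    NOT NULL,
    jats_xml   TEXT    NOT NULL,
    PRIMARY KEY (article_id, version)
)
"""


def save_version(conn: sqlite3.Connection, article_id: str, role: str, jats_xml: str) -> int:
    """Store a new revision of the article's single JATS file; return its number."""
    (last,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM article_versions WHERE article_id = ?",
        (article_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO article_versions (article_id, version, role, jats_xml) VALUES (?, ?, ?, ?)",
        (article_id, last + 1, role, jats_xml),
    )
    conn.commit()
    return last + 1


def latest(conn: sqlite3.Connection, article_id: str):
    """Return the current JATS XML of an article, or None if it is unknown."""
    row = conn.execute(
        "SELECT jats_xml FROM article_versions WHERE article_id = ? "
        "ORDER BY version DESC LIMIT 1",
        (article_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_version(conn, "ms-001", "author", "<article><body><p>Draft</p></body></article>")
save_version(conn, "ms-001", "copyeditor", "<article><body><p>Edited</p></body></article>")
print(latest(conn, "ms-001"))  # <article><body><p>Edited</p></body></article>
```

Keeping every revision is what makes full version control and “track changes” possible later in the workflow. A real multi-user system would also have to guard against two users saving the same article at once.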

The modules that make up the system are the following.

Collaborative authoring (RVRite)

The authoring system uses an HTML front end that lets authors write in a familiar, user-friendly environment. Some of the editing features are:

  • Freeform authoring and editing with familiar blog-type features
  • Insertion of figures using batch uploading or drag and drop
  • Support for multiple resolutions of figures
  • Insertion of math elements using point and click or LaTeX code
  • Math displayed using MathJax
  • Author names and affiliations entered using forms or via ORCID log-in
  • References inserted using APIs to reference managers like Zotero
  • Collaborative authoring, allowing any number of authors to edit
  • Full version control
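As an example of turning form input into JATS, the sketch below builds a `<contrib>` element with an ORCID identifier using Python's `xml.etree.ElementTree`. The element and attribute names follow the JATS tag set. The function name and its parameters are hypothetical. The ORCID in the usage line is the well-known ORCID test record, not a real author.

```python
import xml.etree.ElementTree as ET


def contrib_xml(given, surname, aff_id, orcid=None):
    """Build a JATS <contrib> for one author from form (or ORCID) data."""
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    if orcid:
        cid = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": "orcid"})
        cid.text = f"https://orcid.org/{orcid}"
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given
    # Link the author to an <aff id="..."> element elsewhere in <front>.
    ET.SubElement(contrib, "xref", {"ref-type": "aff", "rid": aff_id})
    return ET.tostring(contrib, encoding="unicode")


print(contrib_xml("Josiah", "Carberry", "aff1", orcid="0000-0002-1825-0097"))
```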

When the authors are ready to submit, the corresponding author presses a submit button. At this point the HTML is converted to JATS and submitted to the publisher.
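The HTML-to-JATS step on submission can be sketched for a tiny subset of markup:

* `<h2>` headings open new `<sec>` elements.
* `<p>` paragraphs stay `<p>`.
* Inline `<em>` and `<strong>` become `<italic>` and `<bold>`.

This assumes the editor produces well-formed XHTML. The mapping is illustrative only; a real converter must also handle figures, math, references and unknown tags.

```python
import xml.etree.ElementTree as ET

# Illustrative inline mapping from authoring HTML to JATS.
INLINE_MAP = {"em": "italic", "i": "italic", "strong": "bold", "b": "bold",
              "sub": "sub", "sup": "sup"}


def _copy_inline(src: ET.Element, dst: ET.Element) -> None:
    """Copy text and inline children of src into dst, renaming known tags."""
    dst.text = src.text
    for child in src:
        # Unknown tags are passed through unchanged in this sketch.
        new = ET.SubElement(dst, INLINE_MAP.get(child.tag, child.tag))
        _copy_inline(child, new)
        new.tail = child.tail


def html_to_jats(xhtml: str) -> str:
    """Convert an XHTML fragment of <h2> and <p> blocks into a JATS <body>."""
    root = ET.fromstring(f"<div>{xhtml}</div>")
    body = ET.Element("body")
    current = body
    for block in root:
        if block.tag == "h2":
            # Each heading opens a new <sec> whose first child is its <title>.
            current = ET.SubElement(body, "sec")
            _copy_inline(block, ET.SubElement(current, "title"))
        else:
            # Everything else is treated as a paragraph in the current section.
            _copy_inline(block, ET.SubElement(current, "p"))
    return ET.tostring(body, encoding="unicode")


print(html_to_jats("<h2>Introduction</h2><p>Water is <em>wet</em>.</p>"))
# <body><sec><title>Introduction</title><p>Water is <italic>wet</italic>.</p></sec></body>
```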

Fig. 3. Inserting equations in RVRite.

Peer Review (ReView)

River Valley have created a user-friendly peer review system that works seamlessly with RVRite. As soon as a manuscript is submitted to a journal, an email notification is sent to the next “role” in the workflow, e.g. a Journal Editor. The email contains a deep link to the XML file, which is temporarily rendered as HTML. The editor can choose to accept the manuscript and pass it on for peer review. In that case the next person in line, say an Associate Editor (AE), is automatically guided into ReView to assign reviewers and manage the peer review process.
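The hand-off from role to role amounts to a small state machine. The sketch below encodes a few transitions and records, for each step, which role should be notified and the deep link to send. The state names, actions and the `example.org` URL are hypothetical; the paper names only the Journal Editor, the Associate Editor and reviewers.

```python
from dataclasses import dataclass, field

# Hypothetical transitions: (state, action) -> (new state, role to notify).
TRANSITIONS = {
    ("submitted", "accept_for_review"): ("with_ae", "associate_editor"),
    ("submitted", "reject"): ("rejected", "corresponding_author"),
    ("with_ae", "assign_reviewers"): ("under_review", "reviewer"),
}


@dataclass
class Manuscript:
    ms_id: str
    state: str = "submitted"
    notifications: list = field(default_factory=list)

    def act(self, action: str) -> None:
        """Apply an action, advance the state and queue a notification
        carrying a deep link to the HTML rendering of the JATS file."""
        try:
            new_state, role = TRANSITIONS[(self.state, action)]
        except KeyError:
            raise ValueError(f"{action!r} is not allowed in state {self.state!r}") from None
        self.state = new_state
        link = f"https://example.org/review/{self.ms_id}"  # hypothetical URL
        self.notifications.append((role, link))


ms = Manuscript("ms-001")
ms.act("accept_for_review")
print(ms.state, ms.notifications[-1])
# with_ae ('associate_editor', 'https://example.org/review/ms-001')
```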

Fig. 4. ReView – a user-friendly peer review system.

Copy editing and proof checking (RVEdit and ProofCheck)

RVEdit and ProofCheck allow existing JATS files to be edited through a graphical user interface. RVEdit is for the copy editor, and ProofCheck is for the author checking “proofs” online. The systems have some similarities with RVRite, the authoring platform, but they are aimed at modifying an existing JATS file rather than creating one from scratch. What each user may change can be restricted. For example, an author cannot modify the title or affiliations, but can answer queries and make basic changes to the text. Version control is used to display the modifications made by the copy editor and by the author as “track changes”.
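Both mechanisms in this paragraph can be sketched briefly:

* A per-role list of protected JATS elements that an edit path must not touch.
* A word-level comparison of two versions using Python's `difflib`, the raw material for a track-changes display.

The permission table and function names are assumptions for illustration, not the system's actual rules.

```python
import difflib

# Hypothetical rules: JATS elements each role may NOT change.
PROTECTED = {
    "copyeditor": set(),
    "author": {"article-title", "contrib-group", "aff"},
}


def can_edit(role: str, path: str) -> bool:
    """True if `role` may edit the element at a slash-separated path such as
    'front/article-meta/title-group/article-title'. Unknown roles may edit nothing."""
    if role not in PROTECTED:
        return False
    return not any(part in PROTECTED[role] for part in path.split("/"))


def tracked_changes(old: str, new: str) -> list[str]:
    """Word-level differences between two versions: '- word' removed, '+ word' added."""
    return [d for d in difflib.ndiff(old.split(), new.split()) if d[:1] in ("-", "+")]


print(can_edit("author", "body/sec/p"))                                     # True
print(can_edit("author", "front/article-meta/title-group/article-title"))  # False
print(tracked_changes("the color is red", "the colour is red"))
```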

Fig. 5. ProofCheck, showing changes made by the copy editor.

Fig. 6. Journal editor using ProofCheck to accept or reject changes made by the author.

VeRify

In order to check that the tagging has been applied correctly to content, a platform called VeRify colors the text according to the tags, and so allows quick checking of the content, even by a non-specialist.
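A VeRify-style view can be approximated by walking the JATS tree and wrapping each tagged run in a colored HTML `<span>`. A non-specialist can then see at a glance whether, say, surnames and years were tagged correctly. The color map and the `colorize` function are hypothetical; the paper does not describe VeRify's actual colors or implementation.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical color map; unlisted elements are rendered uncolored.
COLORS = {"surname": "#c0392b", "given-names": "#2980b9",
          "year": "#27ae60", "source": "#8e44ad"}


def colorize(elem: ET.Element) -> str:
    """Render an element as HTML, wrapping each mapped element's content in a
    colored <span> whose title attribute names the JATS tag."""
    inner = html.escape(elem.text or "")
    for child in elem:
        inner += colorize(child) + html.escape(child.tail or "")
    color = COLORS.get(elem.tag)
    if color is None:
        return inner
    return f'<span style="color:{color}" title="{elem.tag}">{inner}</span>'


ref = ('<mixed-citation><person-group person-group-type="author"><name>'
       '<surname>Knuth</surname>, <given-names>D. E.</given-names></name>'
       '</person-group> (<year>1984</year>). <source>The TeXbook</source>.'
       '</mixed-citation>')
print(colorize(ET.fromstring(ref)))
```

A mis-tagged reference then stands out visually, for example a year that appears in the surname's color.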

Fig. 7

RVFormatter

Once a copy-edited manuscript has been approved by the author and is ready for publication, RVFormatter converts it into a PDF file automatically on the server. Obtaining a good PDF is one of the most important aspects of any publishing system. We have looked in detail at how to go from XML to PDF automatically while still achieving good-quality typesetting. To obtain the highest typographic quality in the PDF, we decided to use the TeX typesetting engine for pagination.

There are several methods of going from an XML file to PDF, including ConTeXt, Pandoc, PassiveTeX, and XSL-FO. Having examined them all, we concluded that none met our requirements and those of the publishers, so we wrote our own “filter”, using TeX as the pagination engine.
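The core of such a filter is a recursive mapping from JATS elements to TeX markup, with TeX's special characters escaped in running text. The following is a deliberately minimal sketch covering four elements. The templates are assumptions, not the authors' filter. The output would still need a preamble and a TeX run (e.g. `pdflatex`) to become a PDF.

```python
import xml.etree.ElementTree as ET

# Characters that are special to TeX and must be escaped in running text.
TEX_SPECIALS = {"\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
                "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
                "~": r"\textasciitilde{}", "^": r"\textasciicircum{}"}

# Hypothetical element templates; assumes <title> only occurs as a section title.
TEMPLATES = {"title": "\\section{%s}\n", "p": "%s\n\n",
             "italic": "\\textit{%s}", "bold": "\\textbf{%s}"}


def tex_escape(text: str) -> str:
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def to_tex(elem: ET.Element) -> str:
    """Recursively convert a JATS element and its children to LaTeX source."""
    content = tex_escape(elem.text or "")
    for child in elem:
        content += to_tex(child) + tex_escape(child.tail or "")
    template = TEMPLATES.get(elem.tag)
    return template % content if template else content


sec = ET.fromstring("<sec><title>Results</title>"
                    "<p>Yield rose by 5% in <italic>E. coli</italic>.</p></sec>")
print(to_tex(sec))
```

Because the TeX source is regenerated from the XML on every run, any correction must be made in the XML, which is the “pure” XML-first property argued for above.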

For simple (single-column) papers the filter creates well-typeset pages automatically, usually without the need for manual tweaking. For more complex pages, for instance double-column pages with single- and double-column “floating” figures, high-quality automated pagination is not possible with any system. In these cases RVFormatter allows the user to insert comments or processing instructions via the HTML interface in order to tune the pagination. The user does not see any code, only commands from a drop-down menu, e.g.

  • Do not hyphenate
  • Try to break page after this line
  • Put figure at top of page
  • etc.
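One way the drop-down commands could be stored and expanded is sketched below: each choice becomes a processing instruction in the XML, and the filter replaces it with TeX. The PI target `rvf` and its vocabulary are hypothetical. The TeX itself is standard LaTeX:

* `\mbox{...}` keeps a word on one line, so it cannot be hyphenated.
* `\pagebreak[3]` inside a paragraph requests, without forcing, a page break after the current line.
* `[t]` asks for a float at the top of a page.

Text escaping is omitted for brevity, and an unmatched `nohyphen-start` would leave an unbalanced brace.

```python
from xml.dom import minidom

# Hypothetical PI target and vocabulary for the drop-down commands.
PI_TARGET = "rvf"
INLINE_TEX = {
    "nohyphen-start": r"\mbox{",          # text up to nohyphen-end stays unbroken
    "nohyphen-end": "}",
    "try-page-break": r"\pagebreak[3]",   # request, not force, a break after this line
}


def render(node, out: list) -> list:
    """Append text, with pagination PIs replaced by TeX, to `out`.
    (TeX escaping of the text is omitted for brevity.)"""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            out.append(child.data)
        elif child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            if child.target == PI_TARGET:
                out.append(INLINE_TEX.get(child.data.strip(), ""))
        elif child.nodeType == child.ELEMENT_NODE:
            render(child, out)
    return out


def fig_placement(fig) -> str:
    """LaTeX float placement for a <fig>: [t] if it carries a float-top PI."""
    for child in fig.childNodes:
        if (child.nodeType == child.PROCESSING_INSTRUCTION_NODE
                and child.target == PI_TARGET and child.data.strip() == "float-top"):
            return "[t]"
    return "[tbp]"


p = minidom.parseString(
    '<p>A <?rvf nohyphen-start?>thermodynamic<?rvf nohyphen-end?> limit.'
    '<?rvf try-page-break?></p>').documentElement
print("".join(render(p, [])))  # A \mbox{thermodynamic} limit.\pagebreak[3]
fig = minidom.parseString('<fig id="f1"><?rvf float-top?><label>Fig. 1</label></fig>')
print(fig_placement(fig.documentElement))  # [t]
```

Keeping the instructions as PIs means they travel with the XML but are ignored by any other JATS consumer.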

By putting all pagination instructions in the XML, we achieve two goals simultaneously: truly automated XML-first pagination, and beautifully typeset pages.

Conclusion

We have successfully produced a complete end-to-end publishing solution that uses JATS as the only format saved throughout the entire process. At any point in the workflow, PDF and other formats can be generated to check formatting and pagination. JATS remains the definitive “format of record”, from which all other formats are generated automatically.

Copyright 2015 by Kaveh Bazargan.

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

Bookshelf ID: NBK279828
