NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2015.

Building an Automated XML-Based Journal Production Workflow

The biggest challenge of single-source publishing has been capturing changes during the page proof stage. Until now, direct XML editing in composition has required a high-end system (such as 3B2) and trained operators. Even then, PDF-based proofing places a barrier between the actors (authors and editors) and the source XML file, leading to inefficiency and an increased chance of error. The key to creating a robust, automated XML-based workflow is an XML proofing and editing environment that requires little to no learning for the author and has comprehensive change tracking for the editor. The PDF can then be created by a lights-out composition system at any point in the correction cycle. Here we describe creating an XML-through journal production workflow that takes early JATS XML from copyedited manuscripts through layout and PDF generation, author and editor correction cycles, and delivery of assets, all controlled by a task-driven production management system.

Single-source publishing, defined here as editing, composing, and correcting text and publishing in multiple formats using one source file, has for some time been a goal of scholarly publishers. The benefits publishers seek include increased productivity, fewer errors introduced during production, and validation of reference information (Meyer et al., 2008; Schwarzman et al., 2004). Achievement of this goal, however, has been stymied by the limits of the traditional publishing workflow.

Up to now, publishers have been limited in their choices for XML-based workflows by their composition systems. Desktop publishing systems that may be able to accept XML input do not round-trip the XML; that is, changes made during the correction cycle are not faithfully carried in the XML. This lack of native XML handling leads almost inevitably to inelegant and inefficient back-end conversion of XML. High-end composition systems that can handle XML natively are expensive and require highly trained operators. Even for composition systems that handle XML well, the PDF page proofs used for author and editor corrections divorce the underlying XML from the page being corrected. Marking up PDF page proofs is generally clumsy and inefficient for all involved. Worse still, redundant keying of corrections is error-prone (Meyers et al., 2004).

The solution to these limitations to single-source publishing is early creation of XML and use of a WYSIWYG XML editor coupled with rapid XML-to-PDF creation, with all components tied together through a production management system. In this workflow, the XML editor presentation resembles the familiar online presentation of published articles, providing authors with a comfortable working environment. Special editing tools, resembling the familiar formatting and editing tools of word processors, also aid the author and editors. The PDF page proof is really an offshoot of the XML, becoming a reference point for work done in the XML editor and providing confirmation to the author and editor that all is well with the "page" itself.

At Dartmouth Journal Services (DJS), we have created this workflow with an eye not only to efficiency but also to a successful author and editor experience. ArticleExpress (patent pending) uses a combination of existing third-party tools with new proprietary components, all managed through our internal production tracking system, Artemis. The technical challenges of one small portion of this entire system, development of a track changes system used in the XML editor, were already presented at JATS-Con 2013/2014 (O'Connor et al., 2013). Here we describe the various working parts of ArticleExpress and how it functions as a system capable of integrating with outside production management systems.

Early XML Creation

ArticleExpress begins with the author's article submission in Microsoft Word. We use eXtyles® to structure and clean up manuscripts and run various advanced processes in preparation for copyediting and XML output. DJS and The Sheridan Group, of which DJS is a member, have had a long relationship with Inera in the development and implementation of eXtyles. As the first stage in our single-source workflow, MS Word and eXtyles allow us to provide copy editors a stable platform for editing content, with a good deal of the heavy lifting done for them and no real knowledge of XML needed. That said, editors still need to adhere to strict rules for handling content, because the Word paragraph and character styles added during eXtyles processing form the basis of the structure of the XML file when it is exported. Because many of our copy editors do not have direct access to eXtyles, we provide them with Visual Basic for Applications–based copyediting tools to ensure manuscripts are fully compliant with our systems. For example, we limit insertion of characters to those that can be mapped for output. Each editor, whether our freelancer or a member of a customer's staff, is trained to work on manuscripts to our standards.

Post-copyediting, eXtyles is used to create the XML file from the edited manuscript using its export functionality. Because the export is highly configurable, we can create XML that is valid not only to various DTDs but also to the specific electronic deliverable format requirements of any journal. We have also developed Schematron rules, again configurable per journal, to further validate our XML output. This Schematron validation not only helps enforce editorial style and business rules but is also critical to ensuring that the XML is in a format that downstream automated systems can handle well. Once the XML file is in hand, it progresses through the remaining systems. There is no need, or desire, to make changes in the Word file that must be re-exported.
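The idea of per-journal validation rules can be illustrated with a short sketch. It is written in Python rather than Schematron, and the journal keys, rule names, and the two rules themselves are hypothetical examples chosen for illustration; they are not the rules used in ArticleExpress.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each takes the article root and returns error messages.
def figures_have_labels(root):
    """Every <fig> must contain a <label> child."""
    return [
        f"fig '{fig.get('id', '?')}' has no <label>"
        for fig in root.iter("fig")
        if fig.find("label") is None
    ]

def references_have_years(root):
    """Every <ref> must contain a <year> somewhere inside it."""
    return [
        f"ref '{ref.get('id', '?')}' has no <year>"
        for ref in root.iter("ref")
        if ref.find(".//year") is None
    ]

# Assumed per-journal configuration: which rules apply to which journal.
JOURNAL_RULES = {
    "journal-a": [figures_have_labels, references_have_years],
    "journal-b": [references_have_years],
}

def validate(xml_text, journal):
    """Run the rules configured for `journal` and collect all messages."""
    root = ET.fromstring(xml_text)
    errors = []
    for rule in JOURNAL_RULES[journal]:
        errors.extend(rule(root))
    return errors

SAMPLE = (
    '<article><body><fig id="f1"><caption><p>A figure.</p></caption></fig></body>'
    "<back><ref-list>"
    '<ref id="r1"><mixed-citation>Smith. <year>2010</year>.</mixed-citation></ref>'
    '<ref id="r2"><mixed-citation>Jones.</mixed-citation></ref>'
    "</ref-list></back></article>"
)

if __name__ == "__main__":
    for message in validate(SAMPLE, "journal-a"):
        print(message)
```

The same file yields different messages depending on the journal key, which is the point of keeping rule selection in configuration rather than in the rules themselves.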

The eXtyles Server Implementation (SI) is critical to the success of this system. DJS first implemented SI workflows in 2010. Because we support more than 150 journals, the ability to create a flexible system was essential to successful implementation. We engineered into the system the ability to create customized manifest XML files to drive SI processing as well as notification and file movement solutions. After a user does an initial prep of a manuscript (removing figure graphics, converting tables to MS Word and styling paragraphs), all the eXtyles processing is done by SI.

XML Proofing/Editing Cycle

The centerpiece of the ArticleExpress workflow is the web-based, WYSIWYG XML editor. After the copyediting stage, all corrections to content are made directly in the XML by the person whose current task is to review the proofs. Changes made by authors, for example, do not need to be interpreted by editors and then conveyed to typesetters who then pass the corrected proof back to the editor to review. Instead, editors review the changes that the author has made and, in the same file, accept, reject, or modify them. Making changes directly in a single XML file removes the inefficiency and inaccuracy caused by interpretation and rekeying.

An XML-based proofing and editing system must embody three core virtues in order to be successful: First, it must be easy and intuitive to use, because the majority of the users, authors, will be encountering the system at the same moment they are expected to use it to review and correct their proofs. Second, it must have a robust change-tracking mechanism, so that editors have a record of every change made throughout the correction cycle. Last, it must be highly configurable, so that the system can handle XML to fit the requirements of various online hosts while conforming to the editorial style of any particular journal.

To make the system easy and intuitive to use, we started by basing the editor on SDL Xopus. The vast majority of changes that any user will make are simple insertions, deletions, and formatting changes. For these, Xopus presents an interface very much like the word processor the author was likely to have used when writing her paper originally. We then developed other functionality to fit the robust model we need in our publishing environment. More complex XML structures are edited through forms that appear when a user attempts to edit the associated text. The form uses natural language and controls to guide the user through the input of the information necessary to populate the elements and attributes in these structures. For example, instead of requiring a user to input the text of a reference citation and then hoping that they choose the right element and enter the correct rid attribute to connect to the corresponding reference, the system simply presents a list of references available to cite and then takes care of the structuring of the XML without further user interaction. As a bonus, the system only lists references that exist in the document, preventing the author from entering an ambiguous citation that the editor will have to follow up on.
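The reference-citation form described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Xopus-based implementation: the function names are hypothetical, and the sketch assumes each `<ref>` carries an `id` attribute and a `<label>` child.

```python
import xml.etree.ElementTree as ET

def list_citable_references(article):
    """Return (id, label) pairs for every <ref> that exists in the document."""
    return [
        (ref.get("id"), (ref.findtext("label") or "").strip())
        for ref in article.iter("ref")
        if ref.get("id")
    ]

def make_citation(article, ref_id):
    """Build a bibliographic <xref> for a reference chosen from the list.

    The user never types the element name or the rid attribute; both are
    filled in here. An id that is not in the document is rejected.
    """
    choices = dict(list_citable_references(article))
    if ref_id not in choices:
        raise ValueError(f"no reference with id {ref_id!r} in this document")
    xref = ET.Element("xref", {"ref-type": "bibr", "rid": ref_id})
    xref.text = choices[ref_id]
    return xref

ARTICLE = ET.fromstring(
    "<article><back><ref-list>"
    '<ref id="r1"><label>1</label><mixed-citation>Imsieke 2010</mixed-citation></ref>'
    '<ref id="r2"><label>2</label><mixed-citation>Meyer 2008</mixed-citation></ref>'
    "</ref-list></back></article>"
)

if __name__ == "__main__":
    print(list_citable_references(ARTICLE))
    print(ET.tostring(make_citation(ARTICLE, "r2"), encoding="unicode"))
```

Because the citation is built only from ids already present in the reference list, a dangling or ambiguous citation cannot be created through this path.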

An intuitive and easy-to-use interface would mean almost nothing if an editor did not have the ability to see exactly what changes the author made. In a presentation to JATS-Con last year, we covered the technical challenges that had to be overcome to achieve a comprehensive and useful change-tracking solution (O'Connor et al., 2013). In our change-tracking system, editors not only see the changes made by a paper's authors, they are given an interface through which they can easily accept or reject these changes. In addition, the system enforces a particular order of change acceptance/rejection to protect the underlying structure of the XML, even when changes have been made by a multitude of users.

As a publications services company, DJS must deliver JATS XML to online hosts with different tagging requirements for journals that have a wide range of editorial styles. For example, journal style may be to cite footnotes with letters, numbers, or symbols (which may appear in an order particular to the journal). Graphics may be cited as "Fig. 1", "Figure 1", "Fig 1", etc. The system must be configured not just to support a particular journal's style but also to enforce it. To this end, configuration files are stored in our production tracking system and delivered to the editing environment with each article that enters the system. These configuration files go beyond defining journal style to enforce business rules and role access to the editor's various tools.
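As an illustration of how such configuration might drive citation style, consider the sketch below. The `JournalStyle` fields, the journal keys, and the rule of doubling footnote marks once the sequence is exhausted are all assumptions made for this example; the actual format of the configuration files is not described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JournalStyle:
    """Hypothetical per-journal style settings."""
    figure_prefix: str      # e.g. "Fig.", "Figure", or "Fig"
    footnote_marks: tuple   # marks in the order the journal assigns them

STYLES = {
    "journal-a": JournalStyle("Fig.", ("*", "†", "‡", "§")),
    "journal-b": JournalStyle("Figure", ("a", "b", "c", "d")),
}

def figure_citation(style, number):
    """Format an in-text figure citation in the journal's style."""
    return f"{style.figure_prefix} {number}"

def footnote_mark(style, index):
    """Return the mark for the footnote at zero-based position `index`.

    Assumption: when the sequence runs out, marks are doubled, then
    tripled, and so on (*, †, ‡, §, **, ††, ...).
    """
    repeats, position = divmod(index, len(style.footnote_marks))
    return style.footnote_marks[position] * (repeats + 1)

if __name__ == "__main__":
    for key, style in STYLES.items():
        marks = [footnote_mark(style, i) for i in range(5)]
        print(key, figure_citation(style, 1), marks)
```

Since the editor generates the citation text from the configuration, the journal's style is enforced rather than merely supported: a user cannot enter "Figure 1" in a journal configured for "Fig. 1".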

PDF Proof Creation

As much as we may want to focus only on the XML content, we cannot escape having to create a PDF for online delivery and, yes, print. Contrary to our assumptions when we started this project, the expectations for the quality of the layout in a composed page have not fallen dramatically with the increasing dominance of online content delivery. For this reason, we could not avail ourselves of technologies, such as XSL-FO, that lack the typographical and layout sophistication of more traditional composition systems. Instead, we use Typefi Publish, an InDesign-based automated composition engine.

XML-based workflows that rely on InDesign have been described as a "rope of sand" (Imsieke, 2010), because it is difficult to get structured XML into InDesign and nearly impossible to get it out. Because InDesign is not an XML-validating editing environment, changes made in InDesign are difficult to translate back into the XML. Attempts to create systems to "round-trip" JATS XML through InDesign, which may have solved problems related to carrying insertions, deletions, and formatting changes through to the XML, have foundered on changes that alter the structure of the XML, such as adding sections or changing heading levels. In addition, any changes that require an update to the cross-references within the article, for example, addition of a reference into the middle of the list, are difficult for such a system to handle.

In our workflow, Typefi Publish solves the problem of getting XML in. It transforms the JATS XML into a layout-oriented Content XML that the Typefi engine uses to render the page in InDesign. Because we use an XML editor for all content changes through the correction cycle, we never need to get XML out. Instead, with each workflow step that involves changing the content, we can generate new pages.

This one-way trip into InDesign, however, presents a different challenge. The pages created at any particular workflow step need to be as perfect as possible, because any tweaking done to the layout in InDesign will be lost the next time the pages are generated. Typefi does a very good job of laying out pages, but there are complexities in journal articles that may present difficulties. Floating elements, for example, can vary greatly in size and even extend beyond a single page. These variations can in turn create difficulties with heading placement and column balancing.

To overcome these difficulties, we worked with the programmers at Typefi to take advantage of the wide array of scripting opportunities in InDesign. We have scripts that break long tables across pages, balance columns, and apply different master pages depending on the delivery target. In this way, we avoid having to fix the layout at any point during the correction cycle. However, because the file is in a fully functional composition system, we can fine-tune the layout just prior to delivery if we wish, an opportunity not available to users of strictly one-way composition systems like XSL-FO.

Workflow Backbone

DJS uses a proprietary production tracking system, Artemis, to integrate the ArticleExpress components. Artemis is workflow and task based. A workflow consists of various tasks and decision points, and, at the simplest level, when one task completes, the following task in the workflow opens.

Artemis does more than push tasks, though. It communicates with all the ArticleExpress systems as well as external systems and passes files and data to drive those systems. Files are stored in the cloud with virtual pointers in a File Manager. Files and data are received from external systems such as peer review systems, workflows are initiated, and information and files are then passed by Artemis to the other systems using Web service calls or various proprietary XML notification protocols. Email notifications are also available.

Tasks can be automated or user driven. A typical ArticleExpress workflow would consist of the following tasks:

MS Prep -> EditExpress Processing -> EditExpress Processing QC -> Send Files to the Copy Editor -> Copyediting -> XML Creation -> Composition -> Composition QC -> Author Proof -> Author Proof Review

Depending on the needs of the situation, the processing tasks can all be automated, while the prep, QC, and review tasks are done by users. If an article does not pass muster at any stage, the user has the ability to pass it back through the automated system.
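A task-driven workflow of this kind can be modeled in a few lines. The sketch below is not Artemis: the `Task` class and `next_task` function are hypothetical, the automated/user flags on individual tasks are assumptions based on the description above, and "passing back" is interpreted here as reopening the nearest earlier automated task.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    automated: bool  # assumed flag; True means no user action is needed

WORKFLOW = [
    Task("MS Prep", False),
    Task("EditExpress Processing", True),
    Task("EditExpress Processing QC", False),
    Task("Send Files to the Copy Editor", True),
    Task("Copyediting", False),
    Task("XML Creation", True),
    Task("Composition", True),
    Task("Composition QC", False),
    Task("Author Proof", True),
    Task("Author Proof Review", False),
]

def next_task(workflow, current_name, passed=True):
    """Return the task that opens when `current_name` completes.

    On success the following task opens (None at the end of the workflow).
    On failure the article goes back to the nearest earlier automated task;
    if there is none, the current task simply reopens.
    """
    names = [task.name for task in workflow]
    i = names.index(current_name)
    if passed:
        return workflow[i + 1] if i + 1 < len(workflow) else None
    for task in reversed(workflow[:i]):
        if task.automated:
            return task
    return workflow[i]

if __name__ == "__main__":
    print(next_task(WORKFLOW, "Composition").name)
    print(next_task(WORKFLOW, "Composition QC", passed=False).name)
```

A real system would also carry files and data between tasks and include decision points; the sketch shows only the rule that one task's completion opens the next.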

Artemis was not engineered specifically for ArticleExpress, but its versatility has allowed it to adapt well to ArticleExpress. When offering ArticleExpress software as a service to customers, user interaction takes place entirely through the customer's production management system. From the customer perspective, development to interact with Artemis may be required; however, user training is minimized because users do not need to learn an entirely new tracking system. We do not at this time offer customers a way to interact directly with Artemis, though we have planned development of a customer portal to allow customers to upload and download files, access real-time information, and run customized reports.


Though there have been quite a few challenges in building the ArticleExpress system, as we roll out the system to our customers, we expect to reap many benefits as well. Foremost among these are efficiency and accuracy. Authors reviewing their proofs no longer indicate what change they wish to make; they simply make it. Editors no longer interpret the change, translate it into a format understandable to typesetters, and then review the typesetters' work to ensure that it was done correctly. They just accept or reject the change.

Authors also benefit by reviewing their proofs in an environment very much like the word processor they used to write their article in the first place. Authors using our legacy PDF proofing system have complained, "Why do I have to learn a new program [annotation in Acrobat] in order to review my proofs?" In fact, a Web-based XML editing environment offers tools to aid the author that word processors do not. They can use search to find and insert references from PubMed. They can add links to databases like GenBank and immediately test whether they resolve correctly. They can share proof review responsibilities with their colleagues and see what changes their colleagues have made.

Such a system also helps publishers enforce journal style and business rules. A citation for a table that does not exist in the document cannot be added. References that are added or edited are automatically formatted per journal style. Authors must respond to all queries before being allowed to submit their proofs. Editing privileges can be restricted or extended based on role or workflow step. Such rules that were once handled through procedure are now handled automatically.

Using Typefi, with a healthy dose of scripting, offers "best of both worlds" benefits as well. Automated, XML-based composition technologies such as XSL-FO allow you to create pages without the intervention of an operator, but XSL-FO cannot reproduce the complex layouts that Typefi/InDesign can. As well, XSL-FO and other one-way systems offer very little flexibility when the resulting page turns out not quite right. Fixing problems in these PDFs requires much effort and/or skills that are not common. As good as the pages are that come out of Typefi, when a problem with layout does arise, it can be easily solved by a person with rather common InDesign skills.

Though we are realizing many benefits from the ArticleExpress system, its flexibility as well as its basis on our production tracking system, Artemis, provide a roadmap of potential improvements. In the near future, we will be adding automated graphic processing. Artemis will also allow us to develop a feature to package and deliver assets to online hosts. New tools can be added to the online proofing system, for example, integrating data validation through ORCID and FundRef or encouraging authors to apply taxonomic terms.

Centering a journal production workflow on JATS XML frees authors and editors to focus on the most important consideration, the content.


  1. Imsieke, Gerrit. 2010. XML-first Workflows in InDesign: Ropes of Sand? Publishing Geekly. http://publishinggeekly.com/2010/11/indesign-xml-rope-of-sand.
  2. Meyer, Carol Anne. 2008. Reference Accuracy: Best Practices for Making the Links. The Journal of Electronic Publishing. 11(2); http://dx.3998/3336451.0011.206.
  3. Meyers, Barbara. 2004. Editing Tools that Help to Streamline the Publishing Process. Science Editor. 27(5); 155.
  4. O'Connor, Charles et al. 2013. Tracking Changes to JATS XML in an Online Proofing System. Journal Article Tag Suite Conference (JATS-Con) Proceedings 2013. http://www.ncbi.nlm.nih.gov/books/NBK159965/
  5. Schwarzman, Alexander et al. 2004. XML-Centric Workflow Offers Benefits to Scholarly Publishers. XML 2004 Conference & Exhibition Proceedings, November 15–19, 2004. http://people.ischool/202/Schwarzman-XML-CentricWorkflow.pdf.
© O'Connor et al.

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

Bookshelf ID: NBK279927

