NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2011 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2011.

Annotum: An open-source authoring and publishing platform based on WordPress

Solvitor, LLC

Authoring, reviewing, and publishing scholarly articles remains an expensive, time-consuming process that can require significant up-front investment and technical expertise. Coupled with lengthy review processes, this can create delays of up to a year before new scientific findings are published. Annotum, a new open-source, open-access authoring and publishing tool based on the WordPress content management system, builds on the earlier work of the Public Library of Science's Currents publication and provides an easy-to-use alternative to existing publishing systems that supports very rapid expert review and professional online publishing.

Introduction

Despite significant advances in most forms of publishing, from blogs to news sites and other user-generated web content, authoring scholarly articles remains an expensive, time-consuming process that can require significant up-front investment and technical expertise. While a number of electronic publishing and workflow management systems exist, those intended for the scientific publishing community provide at best only rudimentary authoring tools, and in many cases simply provide a repository for document files created in other formats. It is as if the entire revolution in online, web-based content authoring tools has passed by the scientific publishing community. And despite the development of advanced document formats such as the National Library of Medicine's (NLM) Journal Article Tag Suite (JATS), virtually no current system allows scientific authors to easily create structured XML documents using simple, web-based tools.

Project Background

The inspiration for Annotum comes from the extensive work of the Public Library of Science (PLoS), which, in conjunction with Google, launched PLoS: Currents Influenza in 2009. In the words of Harold Varmus:

The key goal of PLoS: Currents is to accelerate scientific discovery by allowing researchers to share their latest findings and ideas immediately with the world's scientific and medical communities.

PLoS: Currents incorporated some key publication elements that are central to achieving its goal of rapidly disseminating scientific knowledge. Articles must be date-stamped and citable, reviewed by expert researchers, released as open-access, and archived in a public repository such as PubMed Central (PMC). Again, Varmus:

To enable contributions to PLoS: Currents: Influenza to be shared as rapidly as possible, they will not be subject to in-depth peer review; however, unsuitable submissions will be screened out by a board of expert moderators.

Thus, a key driver for creating PLoS: Currents was to speed up the process of disseminating new science.  Consider the following schematic:

Fig. 1. Scientific Review Models.

The PLoS: Currents review model enables very rapid review and publishing of scientific research results. (Diagram courtesy of Mark Patterson, PLoS)

By reducing the time for review from up to one year to as little as one day, Currents articles are able to make new findings available very quickly.

Google Knol, an article authoring, collaboration and hosting service, was selected as the platform for PLoS: Currents.  Knol provides a web-based authoring environment with a rather rich toolset including text formatting, tables, figures, and mathematical equations, and an extensive set of collaboration tools.  Knols can be edited by multiple authors, incorporate both author and reader comments, and include a rating system, all important features for authoring scholarly articles.  Moreover, Knol's moderated collection feature allows a designated editor, in effect the lead reviewer, to manage a simple expert review process whereby experts in a given field are invited to review each submission, providing comments and a rating.  The editor then makes a final decision, and the article is either accepted and published, sent back to the author(s) for revision, or rejected.  Published articles appear immediately online, and via an arrangement with PubMed Central are quickly imported and made available within that repository.

Following the success of PLoS: Currents Influenza, several additional sections (moderated collections) have been launched under the PLoS: Currents brand, including sections for Huntington's Disease, Tree of Life, and Evidence on Genomic Tests, with additional sections in the planning stages. In all, several hundred articles have been published in the two years since the PLoS: Currents launch.

However, along with this rich feature set, the Knol platform also brought a number of limitations.

Firstly, DTD/structural conformance has been a challenge. Although Google generously enhanced the Knol feature set to include an XML output that was loosely compatible with JATS, the lack of control on the authoring side meant that quite a bit of poorly structured content made it into the output and had to be removed, sometimes by hand-editing the exported XML.

For example, when using a web-based tool, or Microsoft Word, for that matter, authors may select a heading style for headings, or they may simply make the heading text boldface with a larger font size.  To the eye, or in print, headings marked in this disparate fashion may appear exactly the same, but for tagging purposes the headings not explicitly tagged as such will not show up in a table of contents or other summary created based on tags.  Beyond the limitations of Knol's XML output, Knol has no provision at all for importing articles; all articles must be entered via web-based editing tools.  This makes it difficult for authors to compose and edit offline; if authors use another tool such as Microsoft Word to create articles, the formatting and structure can be quite messy when the article text is pasted into Knol's editor, further exacerbating the tag conformance issue.
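
The heading problem described above can be illustrated with a short sketch. The Python snippet below uses only the standard library; the sample markup and the `build_toc` helper are hypothetical illustrations, not code from Knol or Annotum. It builds a table of contents from explicitly tagged headings and shows that a heading styled only with bold text and a larger font never reaches the list.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of explicitly tagged headings (h1 to h6)."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # text fragments of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._current is not None:
            self.headings.append("".join(self._current).strip())
            self._current = None


def build_toc(html):
    """Return the list of heading texts found in an HTML fragment."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


# Hypothetical article fragment: one tagged heading, one "visual" heading.
sample = (
    "<h2>Methods</h2><p>Tagged heading above.</p>"
    '<p><b><span style="font-size: 18pt">Results</span></b></p>'
    "<p>The heading above only looks like one.</p>"
)

print(build_toc(sample))  # prints ['Methods']
```

Both headings look alike to a reader, but only "Methods" carries a heading tag, so only it appears in the generated summary.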

And it was not only the XML output (and lack of XML input) that raised issues. The Knol platform itself had significant limitations in terms of online presentation and customization. Unlike a web site or content management system (CMS) that is under the control of the publishing entity, Knol is set up as a central service. Users cannot alter the basic design, beyond adding a logo, and the feature set (everything from how author and editor information is used and displayed to the design and arrangement of content) is relatively fixed. Adding new features, such as nicely formatted PDF or print output, more robust reference and citation handling, or modification of the existing review and publishing workflow, was simply not possible on the Google-hosted Knol platform.

Given the very real benefits of the Knol platform in enabling the publication of PLoS: Currents, but also the limitations outlined above, Google generously decided to fund a successor system that would continue to facilitate the rapid review  and publishing of web-based scientific publications while also addressing some of the Knol system's limitations.  That project, and the product it was to produce, is called Annotum (from the Latin for an individual annotation).

Project Objectives

The overall objective of Annotum is to develop a simple, robust, easy-to-use authoring system to create and edit scholarly articles using JATS, and to deliver a working, functional system that can be used to create, maintain, and publish scholarly articles.

To meet this objective, Annotum must provide the following capabilities:

  • Allow publication owners to produce peer-reviewed journals online
  • Allow authors to create content collaboratively in the NLM DTD
  • Replicate the PLoS: Currents authoring and workflow including export to PubMed Central
  • Address key limitations of the Knol toolset:
    • DTD conformance (enforcement) and XML import/export
    • Support for rich designs and additional output formats
  • Provide flexible hosting options via open source code
    • Publications should be inexpensive (or free) to host/export
    • System should require minimal technical skills to install, configure, operate, and maintain

It is worth noting that the scope of Annotum version 1.0 was very explicitly limited: it is not intended to replace all print/online journals, tools, and/or systems. Annotum is an incremental step, but not the ultimate one, in the evolution of scientific publishing systems.

Project Approach

When considering the goals for Annotum, an obvious question arises: why not use an existing product? After all, a number of products seem to provide many of the capabilities outlined above, from full-blown CMSs to standalone applications and add-ins for popular word processing applications. For example, the Public Knowledge Project's (PKP) Open Journal Systems (OJS) and Phase2 Technology's OpenPublish provide complete online publishing systems including workflow; many existing scientific and other publications use these systems in production and have done so for years. Some standalone product offerings are also quite robust, such as Inera's eXtyles or the Microsoft Word Article Authoring add-in; these focus on the creation of JATS-compliant XML from word processing documents. Hybrids such as PKP's Lemon8-XML are set up as hosted systems and manage the conversion of different document formats into JATS. Outside of the scientific publishing sphere, the open source WordPress software provides a comprehensive CMS with extremely simple setup and free (on the hosted version) or very inexpensive hosting options.

One option that was not considered in depth for Annotum was to create a completely new system from scratch. There are too many potential existing systems that provide many of the desired features for version 1.0 to make a new application the best choice, simply from a cost and time perspective. Therefore, in devising an approach to meeting the version 1.0 goals, the Annotum team focused on which existing tools could provide the best starting point for further customization.

One approach would be to take the existing stand-alone applications and attempt to merge them with an online tool. However, systems such as eXtyles are both proprietary and expensive; even the free options such as the MS Word Article Authoring add-in do not really lend themselves to integration with a publishing system, and furthermore rarely support multiple platforms (PC, Mac, Linux). Even if we were to select an existing stand-alone tool to meet some of the Annotum version 1.0 requirements, a hosted publication CMS would still be required.

So we took a look at a number of hosted systems. Lemon8-XML does provide JATS XML compliance, but it has too few of the other required features to make a feasible starting point. This leaves the hosted site-publication CMS options: OJS, OpenPublish, and WordPress. All of these platforms have limitations. OpenPublish and OJS both provide options for publications out of the box, such as subscription models and highly configurable workflow options, but they also tend to be rather complicated systems to set up and maintain. Upgrades for Drupal (the platform on which OpenPublish is based) are time-consuming and complex; maintenance and operation of OJS usually requires a full-time technician, putting it beyond the reach of a small research group. Other limitations surfaced: WordPress has no built-in workflow; conversely, OJS and OpenPublish have very robust, but perhaps overly complex, workflow functionality, well beyond what is required for Annotum. Finally, OJS is more of a document handling system with no provision for web-based editing of articles, a key Annotum requirement.

After weighing the limitations of these three options, we narrowed our scope to OJS and WordPress. Both are open source, both are extensible via a plugin architecture, and both support customized templates (themes) for presentation. In the final analysis, OJS's complex maintenance needs, overly complex workflow system, and lack of the core web-based editing capability led us to select WordPress as the basis for Annotum version 1.0.

WordPress is extremely simple to set up and run, with numerous free or inexpensive hosting options available, and it comes with a rich set of user-friendly web-based editing controls. And WordPress functionality is easily extended using plugins and themes. One could argue that OJS is also extensible, but the WordPress platform has spawned a far more diverse and productive 'ecosystem' of developers for themes, plugins, and extensions, meaning a much larger set of options for adding functionality to Annotum in the future. And finally, the setup and operation of a WordPress site is among the simplest of any web-based application. On WordPress.com, for example, users with an account need only provide a name for their site to have a fully-functioning site available in seconds.

WordPress, whether in the hosted (WordPress.com) service or the open-source and freely downloadable (WordPress.org) software package, has seen extremely wide adoption for a very broad range of web sites:

  • 14.7% of the top million websites worldwide  (State of the Word, 2011);
  • 55.9 million sites run WordPress software (Stats—WordPress.com);
  • Over 290 million people view more than 2.5 billion pages per month on WordPress.com (Stats—WordPress.com)

Of course, setting up a journal authoring, review, and publishing system isn't a popularity contest, but it is important to recognize the real benefits of using a platform with a very large user and developer/designer base. Because so many people work on WordPress, journal publishers using a WordPress-based CMS have literally thousands of existing development and design shops, and themes and plugins, from which to choose.

Despite these many advantages, WordPress lacks some key capabilities that the Annotum project requires:

  • Support for multiple authors, article review workflow, and version comparison
  • Scholarly features such as citations, equations, and controlled document structure (headings, lists of figures/equations/tables)
  • Export to and import from the NLM/PubMed Journal Article DTD and other structured formats

Thus, development of Annotum version 1.0 focused on providing these additional features (shown in more detail below).

Annotum v1.0 Feature Set

  • Rich, web-based authoring and editing:
    • "What you mean is what you get" (WYMIWYG) authoring with rich toolset (equations, figures, tables, citations and references)
    • Coauthoring, comments, version tracking, and revision comparisons
  • Strict conformance to the NLM Journal Article DTD
  • Multiple import and export formats:
    • Export to PDF, JATS XML, WXR (WordPress eXtended RSS) formats
    • Import JATS XML format for "round-tripping" of content
    • Articles can be cited, exported, imported across systems/sites
  • Simple editorial workflow for authoring and reviewer/editor approval
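
To make the idea of "round-tripping" concrete, the following is a minimal sketch in Python (standard library only), assuming a deliberately tiny article model. The element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `body`, `sec`, `title`, `p`) are taken from JATS, but the `export_article` and `import_article` helpers and the dictionary layout are hypothetical, and nothing here validates against the DTD or reflects how Annotum itself performs import and export.

```python
import xml.etree.ElementTree as ET


def export_article(article):
    """Serialize a simple article dict to a JATS-like XML string."""
    root = ET.Element("article")
    meta = ET.SubElement(ET.SubElement(root, "front"), "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = article["title"]
    body = ET.SubElement(root, "body")
    for section in article["sections"]:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = section["title"]
        for paragraph in section["paragraphs"]:
            ET.SubElement(sec, "p").text = paragraph
    return ET.tostring(root, encoding="unicode")


def import_article(xml_text):
    """Parse the XML produced by export_article back into a dict."""
    root = ET.fromstring(xml_text)
    sections = []
    for sec in root.findall("./body/sec"):
        sections.append({
            "title": sec.findtext("title", default=""),
            "paragraphs": [p.text or "" for p in sec.findall("p")],
        })
    title_path = "./front/article-meta/title-group/article-title"
    return {
        "title": root.findtext(title_path, default=""),
        "sections": sections,
    }


# Hypothetical sample content used only to exercise the round trip.
original = {
    "title": "A Sample Article",
    "sections": [
        {"title": "Introduction", "paragraphs": ["First paragraph.", "R&D notes."]},
    ],
}

xml_text = export_article(original)
restored = import_article(xml_text)
print(restored == original)
```

In this sketch a round trip is considered successful when the imported structure equals the one that was exported; a real system would also need to carry figures, tables, references, and metadata through the same path.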

Annotum is provided as open-source software: all software code and other materials will be made available to the open-source community for use and future enhancement or development. More information about Annotum can be found at [http://annotum.wordpress.com], and the source code is available on GitHub [git://github.com/Annotum/Annotum.git].

Results

As of this writing (September 2011), Annotum version 1.0 is nearing the end of its software development phase and about to enter the initial beta test period.  Annotum version 1.0 is scheduled to be released in Fall 2011, both as a separate WordPress theme available for installation on self-hosted (.org) WordPress sites and, thanks again to the generous contributions of both Automattic, Inc. and Google, as a free theme on the WordPress.com hosted service.  This means that anyone will be able to create a new journal with all of the features listed in this paper at zero cost with very little if any technical expertise required.

The following section describes aspects of how Annotum is implemented, and is followed by a brief walk-through of key Annotum features and setup.

Annotum Implementation

Annotum is provided as a single WordPress theme, including its own plugins, templates, and custom code.  Although some features are already available as stand-alone plugins, a key design philosophy of Annotum was to keep the installation as simple as possible.  WordPress plugins and themes can at times conflict; the Annotum team sought to reduce the chance of such conflicts by having a single theme contain all of the features to be delivered.

The Annotum theme is based on the Carrington theme engine, a CMS framework provided by Crowd Favorite Ltd., who also provided software engineering resources for Annotum. Carrington provides an elegant framework for creating sophisticated WordPress themes, and supports multiple child themes for sites with multiple publications (for example, a professional society with multiple journals or sections, as with PLoS: Currents). In Annotum the Carrington engine is enhanced with a workflow and permissions engine, along with a custom post type ("article") that supports the additional requirements. An enhanced editor, based on the TinyMCE editor that comes with the base WordPress package and implemented as a series of TinyMCE plugins, rounds out the package.

It is perhaps difficult to overstate the challenge of adding JATS compliance to the editing component. Many tools have attempted to provide a WYSIWYG environment for creating structured content, with varying degrees of success. The approach in Annotum is to provide a very basic and simple set of formatting options, and rigorously strip from the content any tags or other elements that are not compliant. This entails some overhead for the author, particularly if she has spent time laboriously crafting a document in Microsoft Word and, for example, formatted all of her headings using sized fonts rather than a heading style. Once pasted into Annotum, the font sizes are stripped out. Only by ensuring structure conformance at authoring time can we ensure that all text in the system is compliant with the underlying schema. In the case of Annotum, the schema used is a subset of JATS called Kipling.
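
The stripping approach can be sketched as a simple whitelist filter. The Python example below (standard library only) is an illustration under stated assumptions: the `ALLOWED_TAGS` set is a hypothetical stand-in and not the actual Kipling element list, and Annotum's own filtering is implemented in its editor and theme code rather than in Python. The filter keeps text and allowed tags, and drops every attribute and every tag that is not on the list.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; not the real Kipling subset.
ALLOWED_TAGS = {"h2", "p", "b", "i", "ul", "ol", "li"}


class WhitelistFilter(HTMLParser):
    """Re-emit only allowed tags (without attributes) plus all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes such as style or class are intentionally dropped.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept; re-escape it so the output stays well-formed.
        self.parts.append(escape(data, quote=False))


def strip_noncompliant(html):
    """Return the fragment with disallowed tags and all attributes removed."""
    html_filter = WhitelistFilter()
    html_filter.feed(html)
    html_filter.close()
    return "".join(html_filter.parts)


# Hypothetical fragment of the kind produced by pasting from a word processor.
pasted = (
    '<p><span style="font-size: 18pt"><b>Results</b></span></p>'
    '<p class="MsoNormal">Text &amp; more.</p>'
)

print(strip_noncompliant(pasted))
```

Note that the font-sized "heading" survives only as bold text inside a paragraph: the filter removes non-compliant markup but cannot guess that the author meant a heading, which is the overhead for authors described above.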

Enforcing this XML conformance while still retaining both appropriate web formatting for the published pages and a true WYMIWYG display at editing time was one of the more challenging tasks facing the development team. It is much easier to display article content (either in the editor or as a previewed or published web page) when it is kept in an HTML format, but at the same time the XML format must be retained for use in exporting and for validation. Annotum resolves these divergent goals by storing both the 'filtered' XML content and the 'unfiltered' HTML content. The 'normal' post content location, the post_content field in the wp_posts table, is used for the unfiltered content, while the XML version is stored in the post_content_filtered field. The figure below shows a comparison of the unfiltered and filtered content for a sample article section containing text with a heading and a table.

Fig. 2. Comparison of unfiltered (HTML) and filtered (XML) content stored by Annotum.
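
The two-field storage arrangement described above can be sketched as follows. This Python example uses an in-memory SQLite table as a simplified stand-in for the WordPress `wp_posts` table (WordPress itself runs on MySQL, and the real table has many more columns). The field names `post_content` and `post_content_filtered` come from the text; the helper functions and the sample HTML and XML strings are hypothetical and do not reproduce Annotum's actual stored markup.

```python
import sqlite3

# Simplified stand-in for wp_posts; only the columns needed for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts ("
    " ID INTEGER PRIMARY KEY,"
    " post_type TEXT,"
    " post_content TEXT,"            # unfiltered HTML, used for display and editing
    " post_content_filtered TEXT)"   # filtered XML, used for export and validation
)


def save_article(conn, html, xml):
    """Store both representations of one article and return its row ID."""
    cursor = conn.execute(
        "INSERT INTO wp_posts (post_type, post_content, post_content_filtered)"
        " VALUES (?, ?, ?)",
        ("article", html, xml),
    )
    conn.commit()
    return cursor.lastrowid


def load_for_display(conn, post_id):
    """Fetch the HTML version, as a web page or editor view would."""
    row = conn.execute(
        "SELECT post_content FROM wp_posts WHERE ID = ?", (post_id,)
    ).fetchone()
    return row[0]


def load_for_export(conn, post_id):
    """Fetch the XML version, as an export or validation step would."""
    row = conn.execute(
        "SELECT post_content_filtered FROM wp_posts WHERE ID = ?", (post_id,)
    ).fetchone()
    return row[0]


# Hypothetical sample content for one article section.
sample_html = "<h2>Methods</h2><p>Sample text.</p>"
sample_xml = "<sec><title>Methods</title><p>Sample text.</p></sec>"

post_id = save_article(conn, sample_html, sample_xml)
print(load_for_display(conn, post_id))
print(load_for_export(conn, post_id))
```

Keeping both versions in the same row means display code never has to transform XML on the fly, at the cost of having to update the two fields together whenever the article is saved.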

Annotum Feature Walkthrough and Demo

[This is a screen-shot version of the Annotum live demo.]

The basic workflow in Annotum is as follows:

  • Authors must log in to reach the Annotum dashboard, then either open an existing article or create a new one.
    Fig. 3. Admin Dashboard.

    Fig. 4. Article Listing.

  • Once they arrive at the article editing screen, authors invite any coauthors with whom they may wish to collaborate.
    Fig. 5. Main Article Editing Screen.

    Fig. 6. Adding Co-authors.

    Fig. 7. The editor can be expanded to fill the screen.

  • The editor provides a number of features. For example, authors may:
    • Insert and format text
      Fig. 8. Text Formatting.

      Note how section indicators show which text is a section, title, or paragraph.

    • Add formatted tables
      Fig. 9. Inserting a table.

      Fig. 10. Table in the article editor.

    • Add figures
      Fig. 11. Inserting a Figure.

      The image dialog includes a number of options for capturing structured text information about the figure.

      Fig. 12. Editor with inserted figure.

    • Add equations and quotations
      Fig. 13. Inserting an equation.

      Fig. 14. Inserting a quotation.

    • Create a reference list and insert references:
      Fig. 15. Create a new reference.

      Fig. 16. DOI reference lookup.

      Fig. 17. PubMed ID reference lookup.

      Fig. 18. Inserting references.

      Fig. 19. References in editor and reference list.

    • Among the collaboration features of Annotum is support for internal discussion comments.
      Fig. 20. Authoring comments.

      All authors have the use of a back-end commenting feature by which they can exchange non-public comments with each other and with the editor.

      Editors, reviewers, and authors can also view a log of work on the article and compare article revisions.
      Fig. 21. Audit log.

      Fig. 22. Version compare.

  • Once editing is complete, authors submit their article for review. The submission process is a single click:
    Fig. 23. Submit article for review.

    Different users (authors, editors, etc.) see a different view of the article status pane depending on their permission level:
    Fig. 24. Submitted: author view.

  • Once the article is submitted, Annotum then optionally notifies the editor (if email notification is disabled, the editor may simply monitor the article queue for new submissions), and the editor assigns one or more reviewers to the task.
    Fig. 25. Submitted: editor view and adding reviewers.

  • Next, reviewers sign in, read the article, and enter comments if desired. Reviewers also have a separate, private comment area, one not visible to authors. Optionally, site administrators can enable a form of open-process review in which review comments (and the identity of each reviewer) are visible to the authors.
    Fig. 26. Reviewer comments.

    After the reviewer enters any relevant comments, which might be a question back to the editor (in effect the lead reviewer), he or she makes a recommendation: Approve, Reject, or Request Revisions. 
    Fig. 27. Reviewer comments.

  • When all the reviews are submitted (or whatever portion is sufficient according to the editorial policy of the publication) the editor makes a final ruling on the article, which again can be Approve, Reject, or Request Revisions.
    Fig. 28. Approved.

    Once approved, the article is ready for final copyediting and publishing by a site admin (editors cannot publish); if revisions are requested, the article returns to draft status for further editing by the author(s). If the article is rejected, it is removed from the publication queue. In all cases, notification of the final decision is sent to the authors and editors. The publication staff (admins) make whatever final tweaks are required and publish the article live on the public-facing web site.
    Fig. 29. Published Article.

    Fig. 30. Article citation.

    Fig. 31. Published article - sample XML.

    Annotum XML is based on the Kipling subset of JATS, and will validate via the PMC XML Validator.
  • If the publication has made arrangements with PMC for inclusion of published work in the repository, PMC will monitor that publication's Annotum RSS feed for newly-published articles.  When a new article is available, PMC will request the XML version and import it directly into PMC for publication there.
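
The editorial workflow walked through above can be summarized as a small state machine. The Python sketch below is an illustration, not Annotum's implementation: the state names, action names, and the `apply_action` helper are hypothetical, and the assumption that each action belongs to exactly one role is a simplification (the text states only that authors submit, editors make the final ruling, and admins, not editors, publish). Reviewer recommendations are advisory in this model and do not change the article's state.

```python
# (current state, action) -> (next state, roles allowed to perform the action)
# All names here are hypothetical labels for the steps described in the text.
TRANSITIONS = {
    ("draft", "submit"): ("in_review", {"author"}),
    ("in_review", "approve"): ("approved", {"editor"}),
    ("in_review", "reject"): ("rejected", {"editor"}),
    ("in_review", "request_revisions"): ("draft", {"editor"}),
    ("approved", "publish"): ("published", {"admin"}),
}


def apply_action(state, action, role):
    """Return the next article state, or raise if the step is not allowed."""
    try:
        next_state, allowed_roles = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} an article in state {state!r}"
        ) from None
    if role not in allowed_roles:
        raise PermissionError(f"role {role!r} may not {action!r}")
    return next_state


# One possible path: submit, revise, resubmit, approve, publish.
state = "draft"
for action, role in [
    ("submit", "author"),
    ("request_revisions", "editor"),
    ("submit", "author"),
    ("approve", "editor"),
    ("publish", "admin"),
]:
    state = apply_action(state, action, role)
    print(action, "->", state)
```

Encoding the transitions as a table keeps the rule that editors cannot publish in one place: an editor attempting the `publish` action raises `PermissionError` rather than changing the article's state.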

Conclusions

Annotum significantly reduces the barriers to entry for new scientific journals built on the type of rapid-review process pioneered by PLoS: Currents. It is hoped that the availability of both free software and free hosting will foster a vibrant open-source community around Annotum, in which scholars and others contribute new features and further the spread and improvement of this publishing tool.

Where can this growth lead? Given the ability of Annotum to both import and export NLM-JATS tagged content, it is possible to envision a number of compelling use cases, from individual groups of interested parties self-publishing journals to an entire ecosystem of content re-use and republication across professional societies, individuals, universities, and public knowledge repositories such as PMC.

Fig. 32. Local Collaboration Use-case.

A group of collaborators uses a local Annotum installation to author and collaborate on a series of articles, which are then published on the web or printed (PDF) for distribution to their friends and colleagues.

Fig. 33. eJournal Publishing Use-case.

An online journal or university site accepts submissions via XML import or content authored on its in-house Annotum system. The journal can use Annotum editorial workflow features, or those of any existing system they prefer. Approved, reviewed articles are published to the web and/or exported in XML format to a public repository such as NLM’s PubMed Central.

Fig. 34. System Vision—Annotum "ecosystem".

A somewhat more expansive system vision with multiple publicly- and privately-hosted Annotum systems, collection of articles into journals and textbooks, and other scenarios.

For more information about Annotum, please visit the Annotum home page, download the code from the GitHub repository, participate in discussions and get support via the Annotum discussion group, or follow @annotum on Twitter.

The author may be reached via solvitor.com or directly at carl@solvitor.com.

Acknowledgments

Annotum is a production of Solvitor LLC with heavy lifting provided by Crowd Favorite, and special thanks to: Google, PLoS, NIH/NLM/NCBI, and Automattic.

Copyright © 2011 by Solvitor LLC.

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

Bookshelf ID: NBK63828
