An oXygen XML editor framework for checking JATS articles according to business rules and building journal issue
packages for Atypon Literatum is presented. In addition, re-use of the framework's unaltered Schematron, XProc, and
XSLT in the context of a Web-based service is presented. This demonstrates the power of the standards-based XML stack
in conjunction with diverse runtime platforms such as oXygen XML editor and le-tex transpect.
Introduction
Hogrefe Publishing is making their journal content available through the Atypon Literatum [1]
platform. Besides the JATS DTD that the articles must conform to, Literatum imposes further constraints,
some of them as additional DTDs (submission manifest, issue XML), some of them as documented naming and tagging
conventions. On top of these, there are Hogrefe’s copy editing conventions, a list of journals and their proper abbreviations,
a list of permitted article types, etc.
Many of these conventions may be formalized as Schematron rules, both on the article and on the issue packaging level.
The packaging itself is an error-prone process that may be automated. This process involves checking the content and
the ancillary files, copying image files to their package locations, checking the presence, correct names, and file
types of the referenced images and article PDFs, creating an issue table of contents, and finally zipping the whole
issue.
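The file checks and the final zipping step described above can be illustrated with a minimal Python sketch. It is not the framework's implementation, which is an XProc pipeline. The naming pattern, the list of permitted extensions, and the function names check_referenced_files and zip_issue are assumptions made for illustration only.

```python
import re
import zipfile
from pathlib import Path

# Hypothetical naming convention (NOT Hogrefe's or Literatum's actual rule):
# lowercase letters, digits, underscores, hyphens, and a permitted extension.
ALLOWED_NAME = re.compile(r"^[a-z0-9_-]+\.(pdf|tif|tiff|eps|jpg|png)$")


def check_referenced_files(issue_dir: Path, referenced: list[str]) -> list[str]:
    """Return human-readable problems for the referenced image and PDF files."""
    problems = []
    for name in referenced:
        if not ALLOWED_NAME.match(name):
            problems.append(f"bad name or file type: {name}")
        elif not (issue_dir / name).is_file():
            problems.append(f"missing file: {name}")
    return problems


def zip_issue(issue_dir: Path, target: Path) -> None:
    """Zip the whole issue directory; target must lie outside issue_dir."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(issue_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the issue folder is kept.
                zf.write(path, path.relative_to(issue_dir.parent).as_posix())
```

In such a sketch, the package would only be zipped once check_referenced_files returns an empty list.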
The author forked Wendell Piez’s JATS framework [2], extended it with these Schematron
rules and XProc pipelines, and published the derived framework on GitHub [3]. The framework
has been successfully employed by typesetters who use oXygen. There are, however, typesetters who use a different XML
editor and don’t want to switch. They also had difficulties making the article Schematron run at all – it relies on
XPath 2 and XSLT 2 functions, access to external files, and other features that are well supported by oXygen and the
default ISO Schematron implementation, but not necessarily by other implementations. In addition, the messages need to
be rendered in a digestible form, so some XSLT rendering of the resulting SVRL is necessary. This is cumbersome to set
up for arbitrary heterogeneous editing/production environments.
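To give an idea of what rendering SVRL into a digestible form involves, the following Python sketch groups the messages of an SVRL document by severity. It is an illustration only; the framework itself uses XSLT for this purpose. The sketch assumes that the severity is carried in the role attribute of svrl:failed-assert and svrl:successful-report, and the fallback value "error" as well as the function name messages_by_severity are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"
MESSAGE_TAGS = (f"{{{SVRL_NS}}}failed-assert", f"{{{SVRL_NS}}}successful-report")


def messages_by_severity(svrl_xml: str) -> dict[str, list[tuple[str, str]]]:
    """Group (location, text) pairs from an SVRL document by their role."""
    root = ET.fromstring(svrl_xml)
    grouped = defaultdict(list)
    for el in root.iter():
        if el.tag not in MESSAGE_TAGS:
            continue
        # Assumption: severity is in @role; treat a missing role as "error".
        severity = el.get("role", "error")
        text_el = el.find(f"{{{SVRL_NS}}}text")
        # Collapse whitespace so that the message fits on one report line.
        text = " ".join("".join(text_el.itertext()).split()) if text_el is not None else ""
        grouped[severity].append((el.get("location", ""), text))
    return dict(grouped)
```

The resulting dictionary could then be written out as an HTML list or table, one section per severity.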
The author suggested making the framework’s Schematron and XProc available for use in Web
services [4], utilizing the open-source
transpect framework with its HTML reports [5].
Framework Customization
Schematron Rules
There are currently 142 report and assert statements in the article Schematron file. The effects of using
this framework in oXygen may be seen in the two screenshots below.
Schematron validation in oXygen text mode
Schematron validation in oXygen author mode
It can be seen from the screenshots that there are different types of checks: DTD validation (the iss
element) and Schematron validation for enforcing naming, metadata, and styling conventions.
XProc Package Building, Consolidated Report
As stated above, the package building is implemented as an XProc pipeline that may be invoked from oXygen as a
transformation scenario. It processes a manifest file and then
collects the issue’s content from a folder that corresponds to the ext-id element. It checks
the directory structure and generates an issue ToC.
Invoking the package building transformation scenario
It will also give a summary of all checks for the articles,
differentiated by severity (fatal-error, error, warning, info), in an HTML page. If an article is not DTD-valid, it won’t be included but flagged as an error. The user then needs
to go back to oXygen and react to the interactive validation messages.
The summary report created by the oXygen transformation scenario
In the absence of fatal errors, the HTML page
will contain a link to the package in the file system.
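The summary logic described above (counting messages per severity, leaving out DTD-invalid articles, and offering the package link only in the absence of fatal errors) might be sketched as follows. This is an illustrative Python model under stated assumptions, not the pipeline's actual code: the ArticleResult structure and the summarize function are hypothetical, and counting an excluded article as exactly one message of severity "error" is one possible reading of the text.

```python
from collections import Counter
from dataclasses import dataclass, field

SEVERITIES = ("fatal-error", "error", "warning", "info")


@dataclass
class ArticleResult:
    name: str
    dtd_valid: bool
    # One severity string per Schematron message raised for this article.
    severities: list[str] = field(default_factory=list)


def summarize(results: list[ArticleResult]) -> dict:
    """Count messages per severity and decide what goes into the package."""
    counts = Counter({sev: 0 for sev in SEVERITIES})
    included = []
    for result in results:
        counts.update(result.severities)
        if result.dtd_valid:
            included.append(result.name)
        else:
            # Assumption: a DTD-invalid article is left out and counted as one error.
            counts["error"] += 1
    return {
        "counts": dict(counts),
        "included": included,
        # The link to the package is only offered if there are no fatal errors.
        "show_package_link": counts["fatal-error"] == 0,
    }
```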
Putting the Framework on the Web
Although it might be desirable to have a fully validating (including custom Schematron) JATS editor on the
Web, the premise on which this Web interface was built is as follows:
Typesetters continue using their preferred desktop XML editor. They zip their files and upload them to a service
that checks them. In principle, a summary report like the one created by the oXygen transformation scenario would be sufficient
if the DTD checks are performed by their editors (which all editors that they use support).
However, they’d need to be able to copy the error’s XPath from the report and navigate to the corresponding location
in their document. This is a feature that not every XML editor supports.
So we wanted to use transpect’s HTML reports, which allow the messages to be displayed in an HTML rendering of the
input, at the error locations. transpect HTML reports may consolidate validation messages of multiple intermediate steps,
or, in our case, schema and Schematron validations.
In order to use schema validation, the DTD had to be converted to Relax NG (which is straightforward) because transpect
is only able to render Relax NG and Schematron validation output back into the original document or a rendering thereof
(cf. [5]).
GitHub Project
The transpect project resides in a git repository that mostly consists of submodule specifications.
Most of the functionality is contained in xpl/process-manifest-transpect.xpl that is
a front end to the original framework pipeline, build-issue/process-manifest.xpl. This original
pipeline had to be tweaked a bit for transpect use: It now has options to also read DTD-invalid files, to automatically
patch location information into the source files, and to include this information in the Schematron messages, so
that the messages may be attached to the error locations. The oXygen framework now uses a different front end pipeline
to this common pipeline. The necessary changes to the pipeline have been quite moderate,
and there is no duplicated code in the two git repositories.
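The idea of patching location information into the source can be illustrated with a small Python sketch that writes a positional path into a srcpath attribute on every element. This is only a model of the technique. The actual pipeline does this in XProc/XSLT, and the value format used here (a plain positional path such as /article[1]/body[1]/p[2]) is an assumption that need not match what transpect generates. The function name add_srcpaths is hypothetical, and the sketch assumes elements in no namespace.

```python
import xml.etree.ElementTree as ET


def add_srcpaths(root: ET.Element) -> None:
    """Set @srcpath on every element to a positional path from the root."""

    def visit(elem: ET.Element, path: str) -> None:
        elem.set("srcpath", path)
        seen: dict[str, int] = {}
        for child in elem:
            # Position among preceding siblings with the same element name.
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    visit(root, f"/{root.tag}[1]")
```

A Schematron message that reports the srcpath value of its context element can then be matched with the element in any rendering that has kept the attribute.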
HTML Reports
Maintaining the location information (the @srcpath attributes) when rendering JATS to
HTML turned out to be a problem with the framework’s bundled jats-html.xsl XSLT. The
XSLT simply is not designed with extensibility in mind. A major rework of these third-party stylesheets would
be necessary to be able to pass @srcpath attributes from the patched JATS to the HTML rendering.
Therefore we used transpect’s jats2html
renderer, whose default output looks less fancy but which is more customizable.
The consolidated report is shown in the figure below. Please note that this
is a rendering of the whole package, including directory structure and ancillary files, not just of the articles.
The full HTML report, with Relax NG errors and Schematron errors/warnings attached to their locations
in a rendering of the package contents
Upload Interface
The transpect front-end pipeline may be invoked on the command line, using the XML Calabash runtime that is
bundled with the transpect project as a submodule.
There is a simple Web GUI and WebDAV upload interface that is in use with many transpect projects (and that
will be open-sourced soon, too). This Rails application is already in use at Hogrefe for other conversion
pipelines (IDML→BITS→EPUB, BITS→docx), so it was quite easy to add this conversion pipeline to that server.
The upload interface is shown in the figure below.
transpect upload interface
Conclusion/Outlook
It has been demonstrated that an oXygen framework that uses standard technologies such as schema/Schematron validation,
XSLT and XProc may be ported to transpect to yield something with similar functionality (apart from interactive editing)
on the Web, with moderate effort.
Currently the oXygen framework is being enhanced with Schematron Quick Fixes (SQF, [6]) for
greater ease of use, i.e., correction suggestions. It would be desirable to port SQF to transpect’s HTML reports as well. The
form elements (text entry fields, acceptance buttons, drop-down lists) may easily be rendered in the HTML report, and
the users’ choices may be posted to another pipeline that patches the changes into the source files. At least for a
single-step XML conversion as seen in the current application, applying the changes to the input should not be too
complicated.
In summary, typesetters who are unable to run Schematron checks, access XPath locations, or run XProc pipelines
may use transpect’s Web interface to check and package their journal production work for a friction-reduced upload to
Atypon Literatum.