NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010.

Fitting the Journal Publishing 3.0 Preview Stylesheets to Your Needs: Capabilities and Customizations

Mulberry Technologies, Inc.

An introduction to the NCBI/NLM Journal Publishing 3.0 Preview XSLT stylesheets, which provide for basic styled display of Journal Publishing 3.0 data, in HTML and PDF, with an emphasis on features enabling extension and customization. With demonstrations.

Stylesheets for preview: defining the problem

When we set out to design and implement stylesheets to present JATS "blue" (publishing) data for preview, we immediately faced the problem of defining the problem: not simply how a preview is to look, but what a preview is, and what a preview stylesheet has to do. This may seem simple, and in basic outline it is; moreover, we simplified our design problem further by deciding that the basic look-and-feel of the earlier iteration of JATS preview stylesheets, for version 2.3, did not require overhauling but was acceptable enough, although significant improvements were possible to make them easier to use and customize. In fact, if the basic appearance of the results of a preview stylesheet is not much more than legible -- if it is not particularly pretty -- that has advantages, since our intention was to fall definitively short of the line where a publisher might see these results and say "this looks nice, let's use it". Whatever a preview stylesheet is, it isn't a production stylesheet, by which I mean a stylesheet capable of rendering your JATS data nicely enough to appear creditable to a reading, possibly paying, audience that expects things to look professional. And even apart from production values as such -- the level of "polish" or "finish" that is appropriate -- a publisher naturally has concerns about house style, look-and-feel, customized functionalities, branding, and so forth -- all things that are not the proper task of a freely distributed toolkit and were therefore out of scope for this task.

Yet even apart from these issues, there will necessarily be variances, sometimes considerable ones, between different approaches to tagging JATS data in detail. A preview stylesheet for general use needs to be as permissive of this variation as possible, avoiding situations where, by catering to one organization's house style (or XML developer's notion of how things should be done), it becomes useless for another's. Similarly, it isn't practical for its coverage of JATS tagging in all its possible permutations to be absolutely complete. Part of the strength of the NLM/NISO Journal Article tag set, after all, is in the allowances it makes for local practice; so rather than completeness, we considered that a good preview stylesheet will be transparent, in the sense that it will not make assumptions about the "right way" to do anything which the tag set itself does not mandate. Then too, while designing a production stylesheet was not in scope, we were aware that a good preview stylesheet, which rendered the semantics of Journal Publishing correctly and faithfully to the extent possible, should take any reasonable application of the tagging and display it in a reasonable way. Accordingly, we felt it important that we aim for an 80/20 solution. We would handle normal things normally, and the logic of the stylesheet in preview would make minimal assumptions about tagging semantics -- ideally none, beyond the semantics asserted for tags by the Blue document model and its documentation.

A further complication we encountered had to do with the best strategy for handling error conditions. What is a preview stylesheet to do when things are not tagged properly? For production stylesheets, there are basically two possible approaches, each one suited to a different scenario for document authoring, editing and maintenance. When it is intended to work on arbitrary JATS data from different sources, with minimal control over the details of tag usage (and by this I mean questions such as whether labels are present on figures and tables, or whether xref elements are expected to have their own content or to acquire it from the target of the cross-reference), a production stylesheet has to do its best to make things look good, while letting nothing look bad and not distressing the user with error messages. On the other hand, in a system with good controls over tagging and tag usage -- since, over and above schema validation, it is possible and arguably essential to assert other normative practices as well -- a production stylesheet can forgo strenuous exception handling and error reporting, on the assumption that since rules are being followed and errors have been found and fixed, such problems will simply not occur; thus it achieves, at relatively low cost, a much higher degree of consistency and correspondingly better production values. In contrast to both these strategies, a preview stylesheet falls somewhere in the middle. It should do its best to display arbitrary inputs, since it is intended to be useful "out of the box" to the widest range of users. That is, it should follow the "garbage in, garbage out" rule. But when things are obviously awry -- and in particular, when bad tagging will result in problems that are hard to spot (such as missing data) -- it is probably better for it to show some sort of indication of the problem rather than simply to fail silently. While a preview stylesheet is not quite a "quality assurance" transformation in the sense that it performs validation over its input, in some respects it needs to be prepared to behave like one.

Finally, and perhaps foremost, we wanted to support extension and customization by users to bridge the gap between what we offered and what they needed. In turn, this meant we had to aim for the widest possible range of processing architectures (since a customization capability would be useless to users who couldn't use the stylesheets at all). In effect, this sets up a dilemma: the preview stylesheets both have to be general, and also have the capability of local customization and enhancement. And this is complicated because customizations themselves vary. Some are "inward-looking", in the sense that they fit processing to local tagging practice; others are "outward-looking", in the sense that they aren't designed to alter or extend the semantics of the transformation, but only to alter the appearance of the output. Some customizations are harder to achieve than others. Some are going to be more common or popular than others. And finally, some particular kinds of customization will be much easier to accomplish in XSLT 2.0 than XSLT 1.0 -- at the cost of limiting which XSLT processors can be used to apply it.

In outline, this means that customizations fall across a spectrum from easy to hard, while in development we also needed to be concerned with the fact that some customizations would be popular and common, while others would be very peculiar and local. We needed to cater especially to customizations we expect to be common, while not making it more difficult than necessary to do those that are hard.

To balance all these concerns, we decided to aim for preview stylesheets (both for HTML and XSL-FO) with the following characteristics:

  • Basic preview stylesheets creating HTML (for browser view) and XSL-FO (for PDF production using an XSL formatter). At least the HTML preview stylesheet should be in XSLT 1.0 so it can be used directly on XML documents in web browsers (on either a local network or the Internet). Both of these stylesheets should support local extension and modification, in order to be adaptable not only to particular publishers' profiles (tag subsets, supersets, local practices) of JATS, but also to their specific requirements for formatting and display, and to whatever extent possible, within an XSLT 1.0 framework.
  • Yet we also wanted to demonstrate how more difficult customizations could be accomplished, even if they weren't expected to be common. A good example of this would be logic to punctuate bibliographic references in PMC format. Not only is this a fairly difficult transformation technically (in fact, for PMC format we gratefully borrowed code from NCBI rather than write our own from scratch), but also, another organization might not want it at all -- they want APA format, while a third wants MLA or CMS (Chicago Manual of Style), and a fourth wants no logic at all, since punctuated references are provided by other means.

A good rule of thumb to follow in programming as in life is to separate problems, when they become too complex to manage, into parts. Following that principle, we determined that we could best meet the combined need for capability and flexibility by complementing the basic XSLT 1.0 customization mechanism with a pipeline architecture. A basic XSLT 1.0 stylesheet would provide for preview of JATS data in web browsers; similarly an XSLT 1.0 stylesheet would be available for producing PDF previews using an XSL-FO engine supporting XSLT 1.0.* But these basic stylesheets could also be augmented in a framework supporting XSLT 2.0 to do more complex operations, using separate processes that could be applied either before or after the main transformation, as needed.

Architectures for customization

Probably the commonest way developers approach customization is what might be called "monolithic": the developer simply intervenes directly in the code and changes it to do what is necessary. While it is common -- and for understandable reasons, if only because it is frequently expedient, or seems to be -- this isn't generally regarded as the right way to do it. Its main disadvantage is in maintenance: a developer whose customizations are made directly in the code of the base stylesheet is unable to use a new version, when one is released to fix bugs or provide enhancements, without significant effort to sort things out and reproduce all the changes in it. A more robust customization architecture allows the code for customizations to be kept entirely separate from the main stylesheet, which greatly eases forward migration.

XSLT supports a mechanism, xsl:import, by which stylesheets can be modularized by dividing into "higher-level" and "lower-level" components. A stylesheet at a higher level imports logic from one or more stylesheets at lower levels (you can have as many tiers as you need). The lower-level stylesheets give whatever logic is "generic" and/or "fallback"; then the higher-level stylesheet can either use this logic as given (simply by silently accepting it), or override it with its own customized processing. In keeping with this metaphor, it is convenient to call this kind of arrangement "vertical customization". The basic idea is illustrated in "Vertical customization". Both the basic HTML and XSL-FO stylesheets in the preview package are designed to support this well-understood arrangement. Since the code for both (or all) layers in the import hierarchy is compiled by the XSLT processor, there is no impact on the system except to the developer and maintainer, who can more easily keep generic logic and local customizations separate. This is especially useful when they are authored by different parties: if NCBI distributes bug fixes or enhancements to its stylesheets, a new lower-level module can simply be swapped in for the old one; this greatly simplifies whatever local adjustments and alterations are required.

Figure: Vertical customization. The usual XSLT approach to customization uses the xsl:import instruction to provide a modular architecture. A stylesheet working at a "higher level" provides customization logic, while also invoking a "lower-level" or fallback ...

The xsl:import mechanism has been standard in XSLT 1.0 since its inception (1999), and is well supported in XSLT processors. Accordingly, it also works when deploying XSLT to web browsers. All major web browsers have had support for XSLT for several years now, and can perform all necessary compiling and transformation on the fly. An illustration of this arrangement appears in "Vertical customization in a web browser". The only difference between this arrangement, and customization in a standalone process, is that a literal output file is never created: the browser simply accepts all inputs at runtime.

Figure: Vertical customization in a web browser. As long as they are all limited to XSLT 1.0 and generate HTML results (not XSL-FO), the same stylesheet modules used in a standalone ("batch") or server-side architecture (as illustrated in "Vertical customization") ...
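
To make the browser arrangement concrete, here is a minimal sketch of how a source document might point a browser at a customization layer. It relies on the standard xml-stylesheet processing instruction; the file name my-preview.xsl is a hypothetical placeholder for an XSLT 1.0 module that imports the main HTML stylesheet, and the article content is abbreviated for illustration (it is not a complete, valid Journal Publishing document).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: my-preview.xsl is a hypothetical customization layer
     (XSLT 1.0) that itself imports the main HTML preview stylesheet.
     The browser fetches it, resolves its imports, and transforms this
     document on the fly. -->
<?xml-stylesheet type="text/xsl" href="my-preview.xsl"?>
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Sample article for browser preview</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <p>Body content, abbreviated for illustration.</p>
  </body>
</article>
```

Because the processing instruction names only the top-level module, swapping in a different customization layer means changing a single href; the imported lower-level stylesheet is untouched.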

But some kinds of modification are better supported through a different approach, namely running pre- or post-processes over data. We can call this "horizontal customization"; it takes advantage of a general method called "pipelining". A pre-process is essentially a preparation phase of processing, in which data may be normalized or otherwise processed (altered, filtered, enhanced) before the application of the main transformation, which generates HTML or XSL formatting objects from JATS source data. A post-process, similarly, is a modification or tune-up that occurs to the HTML or XSL-FO result, before it is processed further in a web browser (which actually displays it) or XSL formatter (which converts XSL-FO into PDF). Either type of transformation can be an easy way to introduce customized logic into a transformation. And of course, you can have multiples of either as well.

In contrast to modification by import, pipelining is appropriate for operations that can be distinguished from the data conversion (the mapping of elements from source data to target format) that is performed by the main transformation. It has two main advantages:

  • Since the process is isolated, a customization can perform a radical or complex operation more easily than can be done by way of a standard stylesheet modularization, which is better for ad-hoc changes. In general, the more complex the transformation requirements, the more advantageous it becomes to break operations apart into stages.
  • For the same reason, when a customization is achieved as a pre-process -- which will be a near-identity (JATS to JATS) transformation, since that is the format the main module expects -- it is entirely neutral with respect to what happens next. Consequently, the same pre-process can be used in pipelines that create different forms of final output (such as HTML and PDF by way of XSL-FO), without modification.

In the Journal Publishing Tag Set preview stylesheet package as delivered, several different functionalities are offered this way. First and foremost, logic provided for the auto-punctuation of bibliographic references (citation elements) is deployed as a pre-process, for several reasons:

  • Not every transformation will need this logic; this will vary according to local tagging policy; it is convenient to be able to switch it in and out easily.
  • When it is required, it will often need to be different. One organization wants references formatted in PMC format; another in APA format; yet a third according to MLA or CMS (Chicago Manual of Style). That is, according to the particulars of the target format, a different module needs to be used.
  • As explained above, the same module can be used for either HTML or PDF production.

In the preview stylesheets package, we demonstrate this principle by providing two such modules, for PMC and APA citation format respectively. (The PMC format was modified from code developed at NCBI, which we gratefully adapted with very few changes. The APA format, we developed ourselves as a demonstration.)

Figure: Horizontal customization, by means of pipelining. Some kinds of transformation problems become too complex to manage in a single transformation pass; or it may simply be very convenient to manage them as "pre-" or "post-processes". The name developers ...

Other pre- and post-processes we developed and distributed with the package were much simpler, but also show what kinds of customization are easily accomplished through this method:

  • Specialized logic for filtering content based on the production medium was accomplished using near-identity transforms that selected web-only or print-only versions of graphics or tables, based on use of the alternatives element in conjunction with flags given on the specific-use attributes of objects to be included or removed. Inside alternatives, only elements with a given specific-use would be included in the result. This is easily accomplished with a small stylesheet applied as a pre-process.
  • Since the main HTML stylesheet generated generic HTML, and some users would want XHTML, a post-process was provided that would perform this conversion.
    Along exactly the same lines, specialized forms of HTML or XHTML such as ePub format, or HTML5 optimized for an iPad or other specialized display, could similarly be managed straightforwardly via post-processes running over the results of a main display transformation, in a pipeline.
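
The filtering pre-process described in the first item above can be sketched as a near-identity transform along the following lines. This is an illustration written for this discussion, not the module distributed in the package: the parameter name medium and the values 'web-only' and 'print-only' are assumptions about how specific-use might be flagged locally.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Hypothetical runtime parameter: the specific-use value to keep
     (for example 'web-only' or 'print-only') -->
<xsl:param name="medium" select="'web-only'"/>

<!-- Identity template: by default, everything is copied unchanged,
     so the result is still JATS -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

<!-- Inside alternatives, copy only those element children whose
     specific-use matches the requested medium; others are dropped -->
<xsl:template match="alternatives/*">
  <xsl:if test="@specific-use = $medium">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

Since the output is again JATS, the same filter could sit in front of either the HTML or the XSL-FO transformation. Note that in this sketch a child of alternatives with no specific-use attribute is dropped as well; whether that is desirable depends on local tagging policy.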

While pipelining is extremely powerful, it has one significant disadvantage, namely that it isn't supported natively in XSLT 1.0 processors; and while it can be performed internally in XSLT 2.0, the mechanisms available for doing so are somewhat too rigid to support the flexibility we needed. (And if we chose this approach, we could forget about XSLT 1.0 processors and browser-based transformations.) The best ways of accomplishing it are not native to XSLT at all, but rely on an external controller or processor to manage it. This introduces a new architectural dependency. (These issues are discussed in more detail in "How to pipeline XSLT".)

Because of this problem, this paper limits itself mostly to discussing modifications made "vertically", in an XSLT 1.0 context (generating HTML output), as these will be the most widely useful. And the two approaches can be combined.

Examples: customizing the main HTML module

In "vertical customization", as described earlier, one makes modifications to the off-the-shelf stylesheets by writing code in separate stylesheets that import the main stylesheet, overriding its logic selectively.

Rather than explain XSLT here, we proceed by simple examples that show useful results. The common theme here is that the modifications themselves are straightforward once the operations of the main stylesheet, including the customization features built into it, are understood. In the CSS, these features take the form of ID and class attribute values assigned in the HTML, which CSS can take advantage of. In the XSLT, they take the form of variable and parameter assignments, named and matched templates and (in the case of XSLT 2.0) stylesheet functions, all of which can be replaced by customized versions in an importing (higher-level) stylesheet.

Customizing look-and-feel of HTML output via modified CSS

Your first choice for customization of HTML output is to modify the CSS. Like customization in general, this can be done in either of two ways: you can modify the packaged CSS directly (monolithic); or you can maintain your own CSS separately, which may or may not invoke the packaged CSS for fallback logic (vertical customization). For the same reasons as in XSLT, the second method is recommended for ease of long-term maintenance (it will make upgrades easier to integrate), but it does require that you change a setting in your XSLT in order for your HTML results to invoke a different CSS file.

Here we have an XSLT customization layer that does a single thing, namely override the name of the CSS file to be invoked by the output:

Providing your own CSS to your HTML output

This XSLT is included as file newcss-mod-html.xsl.

The preview created by this customization, along with the CSS in "CSS modifying the body font of HTML preview", is shown in "Customized CSS" (see "Screenshots of HTML customizations").

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="jpub3-html.xsl"/>
  
<xsl:param name="css">custom-preview.css</xsl:param>

</xsl:stylesheet>

This indicates that the CSS file invoked from the result of the transformation should be custom-preview.css. (XSLT developers will notice that the setting is a global parameter: it could be set at runtime without needing this XSLT module.)

Having changed the css parameter, the result file will now call the CSS named, which might (for example) look like this:

CSS modifying the body font of HTML preview

This CSS is included as file custom-preview.css.

@import "jpub-preview.css";

div.body,
div.back
{ font-family: monospace;
  font-size: 12pt;
  line-height: 140% }
   
/* since the body will be in monospace, we make tt sans-serif instead */
tt { font-family: sans-serif }

This has the result of changing the base font for display into a monospace font with extra leading (to make the preview look more like a preview).

Similarly broad changes can be made just as easily, and any web developer will have the skills to analyze the HTML generated by the stylesheet and write CSS with the logic needed to accomplish specific results.

Customizing HTML preview content via modified XSLT

Modifying the CSS is a simple and effective way of changing the basic "look and feel" or appearance of HTML results. But in order to make changes to the content of the HTML itself, either in its organization or in the specialized production of generated content such as labels or numbering, deeper modifications need to be introduced. And the same thing goes for the XSL-FO for PDF production, since in this case the CSS logic is not maintained separately from the stylesheet. Fortunately, these modifications are similarly easy to code, once the underlying logic is clear. Here are a few examples.

Auto-numbering article structures

Whether a preview stylesheet should automatically generate numbering for anything at all is a vexed question. On the one hand, the ability to automatically generate numbering (and thereby manage numbering consistently) is a particular strength of an XML-based system, removing significant overhead from document maintenance. On the other hand, not everyone wants numbering, and even when they do, they do not necessarily want it to be auto-generated. Then too, even if we know we want auto-numbering, this could be described as a simple data enhancement or normalization, which could be achieved easily in a pre-process (that is, in a pipeline). Yet not everyone can use pipelines or afford the overhead they require.

In order to thread this needle, the preview stylesheets provide an easy mechanism to switch on auto-numbering for some structures for which it is common, such as figures and tables. For others, such as sections (where numbering is typically more complex, as sections might only be numbered to a certain level and where numbering itself is frequently hierarchical), a somewhat deeper modification is needed. Both these examples are demonstrated here.

XSLT modifications for auto-numbering

This XSLT is included as file auto-number-mod-html.xsl. A screenshot of the output appears in "Auto-numbering".

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="jpub3-html.xsl"/>

<!-- We switch on autonumbering for boxed-text, fig and
     table-wrap elements by declaring variables with
     Boolean (true/false) values -->
  
<xsl:variable name="auto-label-boxed-text" select="true()"/>

<xsl:variable name="auto-label-fig" select="true()"/>

<xsl:variable name="auto-label-table-wrap" select="true()"/>

<!-- We provide for section numbering as part of the section
     title, using templates that override the imported
     stylesheet and then call its logic, parameterizing
     its contents -->

<xsl:template match="sec/title" priority="1">
  <xsl:call-template name="main-title">
    <!-- It turns out that 'main-title' is the right template
         to call: it generates an h2 -->
    <xsl:with-param name="contents">
      <xsl:number count="sec" level="multiple" format="1.1 "/>
      <xsl:text> </xsl:text>
      <xsl:apply-templates/>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>
  
<xsl:template match="sec/sec/title" priority="2">
  <xsl:call-template name="section-title">
    <!-- The 'section-title' template generates an h3 -->
    <xsl:with-param name="contents">
      <xsl:number count="sec" level="multiple" format="1.1 "/>
      <xsl:text> </xsl:text>
      <xsl:apply-templates/>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>

</xsl:stylesheet>

Developers will have a couple of questions regarding the techniques here:

  • If a mechanism can simply be switched on for elements like fig, table-wrap and boxed-text, why not also for sections?
    Sections and section numbering are more complicated than other elements, not only because they are more often hierarchical, but also because in the Journal Article Publishing models, sections (sec elements) can have both labels and titles. It is an open question where the numbering belongs (different tagging practices will be different), and the stylesheets do not take sides on this issue. The elements for which auto-numbering logic is built in and switchable are those that are either more commonly auto-numbered (such as fig or app), or where only label elements and not titles are implicated, or both.
  • Okay then, so why do the templates here call other templates by name, passing parameters, instead of simply inserting the needed content?
    That alternative approach (by which the templates in this example would simply insert h2 and h3 elements in the HTML) will certainly work. But the method used here better insulates the developer from lapses or errors in the HTML results, and particularly of values there (such as class attributes) that are used as hooks by the CSS layer. Reusing the logic of the base stylesheet is an easy way to keep things consistent, at the price of a little extra coding overhead.
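
For comparison, the alternative mentioned in the second answer above might look something like the following sketch, shown here only for top-level sections. It writes the h2 directly instead of calling the base stylesheet's named template, so whatever class attributes or other hooks that template supplies are not reproduced; which ones those are would have to be checked against the base stylesheet, and none are assumed here.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="jpub3-html.xsl"/>

<!-- Direct alternative: emit the heading element ourselves.
     Simpler to read, but the result no longer passes through the
     base stylesheet's 'main-title' logic, so any attributes that
     logic adds (and that the CSS may rely on) are missing. -->
<xsl:template match="sec/title" priority="1">
  <h2>
    <xsl:number count="sec" level="multiple" format="1.1"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates/>
  </h2>
</xsl:template>

</xsl:stylesheet>
```

If the packaged CSS styles headings by class, output from this version may look different from the rest of the preview, which is the kind of inconsistency the call-template approach is meant to avoid.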

Suppressing warning messages

The preview stylesheets are designed to generate in-line warning messages when the source data is defective in certain particulars. Because, again, there will be a great deal of variability in local requirements, we have limited this capability to reporting when xref (cross-reference) elements cannot be expanded properly. Basically, xref elements are handled as follows:

  • If an xref has any content, this is used without further elaboration.
  • If it has no content, however, the stylesheet tries to derive content from the target element of the xref. Typically this will be a label element on the target (fig, table-wrap, boxed-text, what have you).
  • If a label element is not present on the target, but the stylesheet is generating a label (to accommodate an auto-generated number), the generated label content is used.
  • If all these fail, a warning text is displayed in three places: at the point of the xref that requires the label; at the point the label is expected; and in a summary at the end of the document.

This hierarchy of conditions is designed to accommodate the widest possible variety of tagging strategies. If you place literal content into your xref elements, or you always provide label elements, you can expect not to drop cross-references. If you insert xref elements without content, but labels are generated, then as long as you do this using the apparatus as the stylesheet expects, again your cross-references will display reasonably. Only if labels are not generated, and you have an xref with no content of its own targeting an element with no label, will you get a warning message. This might actually be useful.

In order to facilitate modification, all this logic is concentrated in a single template. The HTML stylesheet's version looks like this:

  <xsl:template name="make-label-text">
    <xsl:param name="auto" select="false()"/>
    <xsl:param name="warning" select="false()"/>
    <xsl:param name="auto-text"/>
    <xsl:choose>
      <xsl:when test="$auto">
        <span class="generated">
          <xsl:copy-of select="$auto-text"/>
        </span>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates mode="label-text"
          select="label | @symbol"/>
        <xsl:if test="$warning and not(label|@symbol)">
          <span class="warning">
            <xsl:text>{ label</xsl:text>
            <xsl:if test="self::fn"> (or @symbol)</xsl:if>
            <xsl:text> needed for </xsl:text>
            <xsl:value-of select="local-name()"/>
            <xsl:for-each select="@id">
              <xsl:text>[@id='</xsl:text>
              <xsl:value-of select="."/>
              <xsl:text>']</xsl:text>
            </xsl:for-each>
            <xsl:text> }</xsl:text>
          </span>
        </xsl:if>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

Most of the action here is in how parameters are used to drive the behavior of the template. Two parameters have Boolean values (which can be either true or false), exclusively for reference by conditional tests in the template. The third carries an auto-generated label, if there is one. The body of the template works as follows: if a label has been auto-generated, it is used. If not, a label element or symbol attribute from the template's context (the element being labeled) is processed. Then, if no label element or symbol attribute is present and the parameter warning is set to true(), a warning message is generated.

Since this template is called anytime a label is needed, suppressing the warning behavior is easy. (So is enhancing labels, if wanted, by passing in values of auto.) Simply remove the code that does this work:

Modifications to suppress labeling

This XSLT is included as file custom5-html.xsl. Results are shown in "Suppressing warnings".

<xsl:template name="make-label-text">
  <xsl:param name="auto" select="false()"/>
  <xsl:param name="warning" select="false()"/>
  <xsl:param name="auto-text"/>
  <xsl:choose>
    <xsl:when test="$auto">
      <span class="generated">
        <xsl:copy-of select="$auto-text"/>
      </span>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates mode="label-text" select="label | @symbol"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Of course, we haven't said this is a good idea. It depends on how you want your errors handled.

Customizations for local semantics

The examples so far suggest changes that might be made to alter or enhance display of JATS data as tagged conventionally. But JATS tagging also offers many more possibilities beyond the normal generic encoding of article structures. Specialized semantics can be expressed both by means of attribute flags (such as sec-type attributes on sections) and through other means, such as named-content elements in mixed content.

For example, this article uses named-content elements to indicate where XML element and attribute names, along with several other controlled terms, appear in the text. A couple of modifications to the stylesheet can introduce distinctive visual cues to show them, so that (for example) <named-content content-type='gi'>tag</named-content> will appear as "<tag>" in output. (The old SGML term for an element name, still used in some contexts, is "generic identifier" or "gi".)

This example also shows the range of possibilities here. The desired effect could be accomplished in any of three ways:

  • This code shows the enhancement of content based on assigned values of content-type at the same time as the named-content elements are converted to HTML. Place this code in an importing stylesheet module, and it can be used by "vertical customization" (since this behavior overrides the imported stylesheet's behavior for these elements).
  • A web developer, however, will note that none of the transformations called for here fall outside of what is possible in CSS, given appropriate expression of the element semantics in HTML class attributes. And indeed the main stylesheet as delivered assigns the content-type of named-content as a class. So some fairly simple modification in the CSS can achieve the same outcome.
  • On the other hand, it would also be possible to achieve exactly the same elaboration in a different transformation altogether. As part of a near-identity pre-process, changing named-content into elements already handled by the preview stylesheet (such as monospace or whatever is appropriate), while inserting the literal content wanted, is trivial.
    Such a pre-process could also be used as part of a PDF pipeline, making it easier to maintain HTML and PDF pathways in sync. This version of the stylesheet is shown in "A pre-process to express local markup semantics".

XSLT modifications expressing local markup semantics

This XSLT is included as file named-content-mod-html.xsl.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="jpub3-html.xsl"/>

<xsl:template match="named-content[@content-type='gi']">
  <!-- when content-type='gi', angle brackets are provided -->
  <tt>
    <xsl:text>&lt;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&gt;</xsl:text>
  </tt>
</xsl:template>
  
<xsl:template match="named-content[@content-type='attr']">
  <!-- when content-type='attr', an '@' is provided -->
  <tt>
    <xsl:text>@</xsl:text>
    <xsl:apply-templates/>
  </tt>
</xsl:template>
  
<xsl:template match="named-content[@content-type='command'] |
  named-content[@content-type='file'] |
  named-content[@content-type='variable']">
  <!-- several other types of named-content map to monospace -->
  <tt>
    <xsl:apply-templates/>
  </tt>
</xsl:template>
  
</xsl:stylesheet>

Modifying the metadata header

As described above, we did not feel it appropriate to deploy a preview stylesheet that would do an especially nice job in formatting, at least inasmuch as this would require much in the way of design. Similarly, we thought it was important to attempt to render more or less every bit of data present in the source: a preview stylesheet should not hide things. Both of these are reasons why local users will want to introduce enhancements and modifications.

A particular area where this will happen is in the metadata header, for two reasons. First, it is in displaying metadata that your own design is most likely to dictate particular requirements, which cannot be anticipated in a stylesheet meant for general use. And secondly, even given that we do not attempt to be very elaborate, this is a very difficult target to hit. When local users do not want a more elaborate rendition of document metadata (perhaps with table of contents, linking apparatus and so forth), it is likely they will want a simpler one.

Nevertheless, following the principle of vertical integration, components of the stylesheet can be overridden in an importing layer. Here is an example of a simple metadata header which strips out presentation of significant elements in the journal article front matter.

Modifying the metadata header

This XSLT is included as file metadata-mod-html. A screenshot may be seen in "Modifying metadata display".

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="jpub3-html.xsl"/>

<xsl:param name="css">custom4-preview.css</xsl:param>

<xsl:template match="front | front-stub">
  <xsl:for-each select="article-meta | self::front-stub">
    <!-- changing context to where the elements needed are found -->
    <xsl:apply-templates mode="metadata" select="title-group"/>
    <xsl:apply-templates mode="metadata" select="contrib-group"/>
    <xsl:for-each select="abstract">
      <div class="metadata-group">
        <xsl:if test="not(normalize-space(title))">
          <h4 class="subsection-title">
            <span class="generated">
              <xsl:text>Abstract</xsl:text>
            </span>
          </h4>
        </xsl:if>
        <xsl:apply-templates select="*"/>
      </div>
    </xsl:for-each>
    <xsl:call-template name="toc"/>
  </xsl:for-each>
</xsl:template>

<xsl:template mode="metadata" match="contrib-group">
  <div class="metadata-group">
    <xsl:for-each select="contrib">
      <xsl:apply-templates mode="metadata" select="anonymous | collab | name"/>
      <xsl:apply-templates mode="metadata"
        select="address | aff | author-comment | bio | email |
              ext-link | on-behalf-of | role | uri"/>
    </xsl:for-each>
  </div>
</xsl:template>

<xsl:template name="toc">
  <xsl:if test="/article/body/sec">
    <div class="metadata-group">
      <h4 class="subsection-title">
        <span class="generated">Contents</span>
      </h4>
      <xsl:apply-templates select="/article/body/sec" mode="toc"/>
    </div>
  </xsl:if>
</xsl:template>

<xsl:template match="sec" mode="toc">
  <xsl:apply-templates select="title | sec" mode="toc"/>
</xsl:template>

<xsl:template match="sec/sec" mode="toc">
  <div style="margin: 0em; margin-left: 1em">
    <xsl:apply-templates select="title | sec" mode="toc"/>
  </div>
</xsl:template>

<xsl:template match="sec/title" mode="toc">
  <xsl:variable name="sec-id">
    <xsl:value-of select="../@id"/>
    <xsl:if test="not(normalize-space(../@id))">
      <xsl:value-of select="generate-id(..)"/>
    </xsl:if>
  </xsl:variable>
  <p style="margin:0em">
    <a href="#{$sec-id}">
      <xsl:apply-templates/>
    </a>
  </p>
</xsl:template>

</xsl:stylesheet>

The details of what is happening here are less important than the general principle: this customization is achieved by copying the relevant sections of the main stylesheet and then stripping them down. To the extent possible, components in the main stylesheet are reused -- so here, for example, a number of named templates are invoked in the main stylesheet without being rewritten, because they remain useful as-is.

This customization builds a simple document header containing only title, author(s) and abstract(s), with a table of contents pointing into the document.

Generating PDF using the XSL-FO stylesheet

The XSL-FO stylesheet in the distribution generates output formatted for print, usually in PDF, using an XSL formatting engine such as Antenna House XSL Formatter (the engine we used for testing) or RenderX XEP. The design of this stylesheet is essentially similar to that of the HTML preview stylesheet, with one main exception. Unlike an HTML browser, an XSL formatter doesn't accept a separate CSS file at runtime, so the formatting properties that would otherwise be designated in a CSS file must be embedded in the stylesheet itself.

In order to make modification of the XSL-FO stylesheet as easy as possible, however, these properties are all exposed at the top of the main stylesheet as XSLT attribute sets (using the xsl:attribute-set element).
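
As a rough illustration of the mechanism (not a copy of the distributed code), an attribute set and a template that uses it could look like this. The set name demo-paragraph and the property values are hypothetical; the names actually declared at the top of the XSL-FO stylesheet should be checked before overriding anything.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical attribute set: plays the role that a CSS rule
       plays for the HTML output. -->
  <xsl:attribute-set name="demo-paragraph">
    <xsl:attribute name="font-family">serif</xsl:attribute>
    <xsl:attribute name="font-size">10pt</xsl:attribute>
    <xsl:attribute name="space-before">6pt</xsl:attribute>
  </xsl:attribute-set>

  <!-- The formatting properties are attached to the generated block
       by naming the attribute set. -->
  <xsl:template match="p">
    <fo:block xsl:use-attribute-sets="demo-paragraph">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

In XSLT, attribute sets with the same name are merged, and a definition in an importing stylesheet takes precedence over the imported one. An importing layer can therefore redeclare a set with only the attributes it wants to change.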

Apart from this, however, the customization process for PDF output is essentially similar, so it is not explained in detail.

Pre- and post-processing stylesheets

Further customizations can be accomplished by introducing new pre- and post-processing steps into a transformation pipeline. This is explained above, and in the package documentation, so it does not require much explanation here. But it is useful to consider the strengths and weaknesses of the vertical and horizontal methods. In general, the tradeoff is that pipelining requires more infrastructure to support, but becomes a more attractive alternative as the requirements of the transformation become more complex. In particular, the more simply a transformation can be conceived as a data normalization, as opposed to a format conversion, the more sense a pipeline makes. Data normalization, of course, can be easy or hard. When it is easy, there is no harm in doing it at the same time (within the same process) as the format conversion that constitutes the main transform. But when it is hard, it is generally much easier to design, code and maintain in a separate process. (Indeed, one enhancement we could perform as a separate step in a pipeline chain would be auto-numbering, since numbers can be added to sections or structures as a data normalization before the main transformation. In this case, when the main transformation was run, the numbers would already be in place, and no further logic would be required.)

For example, here is a pre-processing stylesheet that achieves the same result as the modification in "XSLT modifications expressing local markup semantics":

A pre-process to express local markup semantics

This XSLT is included as file named-content-prep.xsl. The only difference between this module and the one shown in "XSLT modifications expressing local markup semantics" is that this one generates JATS, not HTML. It "dumbs down" the descriptive elements it recognizes into a form that the main stylesheet will accept without modification.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="named-content[@content-type='gi']">
  <!-- when content-type='gi', angle brackets are provided -->
  <monospace>
    <xsl:text>&lt;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&gt;</xsl:text>
  </monospace>
</xsl:template>

<xsl:template match="named-content[@content-type='attr']">
  <!-- when content-type='attr', an '@' is provided -->
  <monospace>
    <xsl:text>@</xsl:text>
    <xsl:apply-templates/>
  </monospace>
</xsl:template>

<xsl:template match="named-content[@content-type='command'] |
  named-content[@content-type='file'] |
  named-content[@content-type='variable']">
  <!-- several other types of named-content map to monospace -->
  <monospace>
    <xsl:apply-templates/>
  </monospace>
</xsl:template>

</xsl:stylesheet>
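
The auto-numbering enhancement mentioned above could be approached the same way. The following is a minimal sketch rather than part of the distribution: it assumes that section numbers are wanted only for sections inside body, that a section lacking a label should receive a dotted number such as 2.1, and that every section to be numbered has a title. It uses the same identity template as the module above.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Identity template: copy everything through unchanged by default -->
<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>

<!-- For a body section with no label of its own, emit a generated
     label just before the title, then copy the title as usual -->
<xsl:template match="body//sec[not(label)]/title">
  <label>
    <!-- Counts the enclosing sec elements, level by level, below body -->
    <xsl:number level="multiple" count="sec" from="body" format="1.1"/>
  </label>
  <xsl:copy>
    <xsl:apply-templates select="node() | @*"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Since the output is still JATS, with label elements in place, the main stylesheet can be run on it without any numbering logic of its own.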

Screenshots of HTML customizations

Customized CSS

As explained in "Providing your own CSS to your HTML output" and "CSS modifying the body font of HTML preview", simple but dramatic alterations to display can be achieved simply by calling in a modified CSS file. Here, the body font has been changed.

Auto-numbering

See "XSLT modifications for auto-numbering". Auto-numbering can simply be switched on for elements like boxed-text and fig, while for others (such as sec), code can be provided (depending on exactly how you want it to be done).

Suppressing warnings

When labels are needed for cross-references, but are not found in the data or generated by the stylesheet, the preview stylesheet will generate warning messages. These can be suppressed by a modification like that shown in "Modifications to suppress labeling".

The first screenshot here shows what the preview stylesheet does without modification. The second shows the result when there are no warning messages. Another way to remedy the problems of the first is to provide for auto-generated labels, as shown in "XSLT modifications for auto-numbering": then these labels will be used.

Of course, the yellow highlighting was introduced by hand to show the missing warnings (it is not part of the result).

Expressing local markup semantics

In this example (see "XSLT modifications expressing local markup semantics"), XSLT is provided to take advantage of certain values given on inline elements in the source data, namely gi (for element names), attr (for attribute names) and a few others.

Again, two screenshots are shown, the first without the modification (with the lapses highlighted), the second with it.

Developers interested in pipelining will see that the same effect can be achieved using a pre-process, as explained in "A pre-process to express local markup semantics".

Modifying metadata display

In our example (see the code in "Modifying the metadata header"), we have opted to simplify display of the document metadata. But we have added a table of contents.

How to pipeline XSLT

The good news is that pipelining is essentially simple in concept, and has been practiced successfully for many years (and not only in XSLT). The bad news is that this means there are many ways to do it. The table below lists a few (see "Comparison of XSLT pipelining methods"), but this list is by no means exhaustive. (In particular, it leaves out platforms that support pipelining dynamically on web servers, such as Apache Cocoon. Although these might be useful for some publishing systems, these lie outside the scope of this paper so they are not discussed.)

When we first distributed the version 3.0 JATS Preview stylesheets, we decided on a mechanism using an extension function in Michael Kay's Saxon processor to achieve pipelining. At that time, it offered two major advantages. First, it introduced no new dependencies to the workflow, since Saxon was required, for all practical purposes, to run XSLT 2.0. This avoided problems both with platform dependency and with new components such as a make infrastructure (long used on Unix systems for this sort of thing) or Apache Ant. Secondly, since the process could be engineered internally, pipelines using Saxon would be very fast and clean, creating no extra files that would have to be cleaned up after running a transformation. For this purpose we developed a very generic and general-purpose pipelining method using a Saxon extension to call each stylesheet in the pipeline in turn. Pipelines could then be represented simply, each with its own "shell" stylesheet which could be invoked like a normal XSLT 2.0 stylesheet.

Since that time, reasons have emerged to want to migrate away from Saxon. First, Michael Kay (Saxon's developer) changed the level of support for extension functions across his product line; they are no longer available in the basic (and free) version of Saxon. In effect, you have to use either an older version of the product, or a commercial paid version, in order to use the Saxon-based mechanism; to some extent this reverses the presumed advantage of platform-independence. (That is, while we can still run on any platform, we are locked into Saxon and Java, which it requires.) Secondly, however, a more robust and general-purpose option has become available in the form of the W3C XProc standard, which is designed specifically with the necessary functionality in mind. XProc is now supported by several processors, and performs as well as Saxon (both as cleanly and as fast, in informal tests). So it is probably the method of choice at present.

Be that as it may, there is no one single way to do it, and you may have many reasons to run your pipelines differently. In particular, if you are already running documents through pipelines of transformations for whatever purpose, it could make sense simply to run the Preview stylesheets through the same system.
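
To give a sense of what the XProc option involves, here is a minimal sketch of an XProc 1.0 pipeline that chains the pre-process shown earlier with the main HTML stylesheet. It is an illustration under assumptions, not the pipeline shipped with the package: it assumes that named-content-prep.xsl and jpub3-html.xsl sit in the same directory as the pipeline file, and that the JATS document is supplied on the pipeline's default source port by whichever XProc processor is used.

```xml
<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc">

  <!-- Step 1: normalize named-content into elements the main
       stylesheet already handles -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="named-content-prep.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 2: run the unmodified preview stylesheet on the result
       of step 1 to produce HTML -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="jpub3-html.xsl"/>
    </p:input>
  </p:xslt>

</p:pipeline>
```

Each step reads the output of the one before it, so no intermediate files need to be written to disk or cleaned up afterwards.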

Comparison of XSLT pipelining methods

Different approaches to implementing pipelines (sequences of transformations on a single data source) have different affordances on your system. Are you comfortable with a command line? Do you need to create intermediate files on disk, or avoid doing so? How often do you need to alter your setup? What are your users and development team already comfortable with?

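| Pipelining method      | Portability | Performance | Familiarity | Ease of use | Ease of maintenance |
|------------------------|-------------|-------------|-------------|-------------|---------------------|
| Batch files or scripts | None        | Poor        | Good        | Good        | Variable            |
| XML shell utility      | Some        | Good        | Maybe       | Variable    | Variable            |
| Ant                    | Good (Java) | Poor        | Maybe       | Variable    | Good                |
| Saxon                  | Good (Java) | Excellent   | Good (XSLT) | Good        | Good                |
| XProc                  | Excellent   | Excellent   | Improving?  | Good        | Good                |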

Demonstration files for download

XSLT and CSS files to run the demonstrations described in this paper are available for download as jats-xslt-demonstrations.zip.

Footnotes

*

The main XSL-FO stylesheet is marked XSLT 2.0 to signal the intention that it fit within an XSLT 2.0 framework, but it will actually function correctly in an XSLT 1.0 processor, including the processors that are generally used in conjunction with XSL formatters. The HTML preview stylesheet is marked as XSLT 1.0 in order to minimize chances for unexpected behaviors in browsers. In effect, both stylesheets use the subset of XSLT 2.0 that is also XSLT 1.0.

Copyright Notice

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

Bookshelf ID: NBK47104