
Journal Article Tag Suite Conference (JATS-Con) Proceedings 2020/2021 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2021.


WYSIMTWYG: What you see is more than what you get when using XML


Authoring standards documents in XML provides publishing options not available when using traditional word-processing-based desktop publishing. Organizations that have not yet used XML may be unaware of the opportunity to tailor back-room renditions of standards documents: renditions augmented to be useful to authors and editors, yet never published and never seen by users of the final result. This is a case study about meeting a client's requirements to convert existing standards to NISO STS, update them, and add semantic markup. It describes the use of a tailored rendition of standards documents to expose and review markup not visible to the end user. Agreement on the semantic markup is critical to the use of the content from the published document in downstream processes. Tailored augmented renditions are important tools to support the review of hidden markup, and they are not as readily available when XML is not the basis for publishing.

Overview

Often there are important business requirements that underpin the technical work we do when we deploy XML. The technology helps us improve the products we already produce. And as business requirements change over time, more sophisticated use of the technology is needed. Unfortunately, that sophistication can make the technology more difficult to use.

The business requirements for the standards-writing world of ISO and other standards developing organizations (SDOs) are good examples of exhibiting such changes over time. SDOs are migrating from a history of producing simple paper products to a future of technology-supported conformance and compliance testing tools. We are in the midst of these changes and at the point where our authors are using new sophisticated XML-based technologies where they have not done so before.

This leads to important questions that need to be asked about how we are producing XML-based standards documents, such as:

  • how can an SDO review the quality of the XML content when the original document was not authored by an XML expert, or even a Word expert?
  • how can the accessibility of a standards document be improved to respond to the increasing pressure on an SDO from governments and society for such features?
  • how can an SDO get more value from the content of a standard that could be consumed digitally in many ways other than a printed page image (or why bother with XML in the first place)?

For a long time the instant feedback of a WYSIWYG (What You See Is What You Get) editing environment has dominated the authoring of standards documents. In the case of ISO, writers of specifications were required to produce the PDF according to the mandated Directives. Eventually, a Microsoft Word template was developed to assist those writers willing to use the commercial tool in the document creation. At all times the writer was responsible for the final layout, and their adherence to the Directives was monitored as required by ISO.

Then ISO developed an XML vocabulary targeted for converting their back catalogue of standards documents into a structure used for republishing the content and making the content available in new and planned end-user products. The Standards Tag Set (STS) is derived from the Journal Article Tag Suite (JATS).

But the editing process has remained oriented towards the word processing environment with focus on the final Directives appearance. Authors still have available word processing templates that drive the transformation into an XML structure suitable for publishing. But ISO and committee reviewers of draft documents have had only the visual final layout to work with. Unfortunately, the end-user visual layout does not expose critically-important information needed in the review process to validate the proper syntactic use of structure and content for publishing.

Moreover, real value unrelated to the publishing objective can be distilled from a specification marked up in XML. Perhaps unlike other XML journal workflows, though certainly not excluding them, the standards-development process presents unique opportunities for identifying embedded semantic information. There are requirements management database systems that promote standards requirements compliance. But for such database systems to provide value they must be accurately and completely filled with the requirements dictated by and expressed within the text of a standard. In turn, this mandates that the obligations imposed by a standard be properly identified, recorded, and loaded into the system. The fewer steps involved in loading requirements into the management system, the faster and more reliable the loading task is achieved.

Real-world experience suggests subject matter experts are unwilling or, at the least, unhappy to be burdened with the loading of requirements into such systems. Often the standards author is a different resource from the subject matter expert, tasked with supporting the expert by capturing the expert's knowledge in the document. But the expert still retains the responsibility to review and validate the content of the standard. Subsequent loading of a requirements management system, possibly by a third resource, needs validation again by the subject matter expert to ensure that the standard has been properly interpreted.

With XML now in the authoring/publishing pipeline, semantic markup is available to interleave within the syntactic markup. The content of semantic markup is governed by an agreed-upon ontology of concepts necessary for downstream processing. This ontology is reflected by a taxonomy of terms and values injected in the semantic markup to be detected by programs analyzing the XML. A typical ontology suitable for use in standards documents identifies the many possible kinds of conformance requirements that need to be managed. The corresponding taxonomy is used to label the authored requirements for downstream identification and subsequent automation. But there are hard and soft costs incurred to get experts familiar with the authoring tools, or even familiar with XML itself, let alone the cost of the tools themselves.

Therefore, new review tools are needed. XML users long have known the leverage provided by using markup that facilitates the production of multiple outputs from a single marked-up input. Production of a single standards document in PDF, HTML, and EPUB from its source in XML is an obvious example of this.

But, as mentioned above, there are more parties involved in the production process than just the target end users. And it is useful to remember that from the XML one can publish renditions that are useful to these other parties while being useless or irrelevant to end users. It is possible to publish augmented renditions of the content to expose the syntactic and semantic markup found in the XML. Reviewers and subject matter experts already are familiar with the simple tools that present formatted renditions such as PDF files. When the augmented renditions satisfy the review process there is no need to incur the cost in training, the time in learning and operating, nor the wrath from participants unable or unwilling to work directly with the XML markup.

An additional challenge is that a committee typically has a number of experts, not just one expert. The PDF allows a group of experts to review, debate, and agree the content without recourse to collaborative XML editing tools. And if you thought it was difficult to get one expert to use such tools, imagine the challenge of getting all the experts to do so.

This case study reviews how Standards Digital leverages simple augmented PDF renditions of NISO STS standards documents in the review of syntactic and semantic markup. These augmented renditions are off-the-shelf features of the Réalta Online Publishing Solutions Limited subscription-based publishing cloud service. Standards Digital has assembled a quality control process engaging subject matter experts in the production of rich, thorough, and accurate syntactic and semantic markup in specifications published by Standards Norway without exposing the experts to the rigours (and the tools) of XML. The resulting high-quality XML files then are transformed for the direct importation by requirements management systems using the systems’ existing facilities.

Tools are needed for the review process

Authors long have had their focus solely on the final appearance of a standards document because, in the past, the printed standards document was the only end-user product being produced. As long as the editor put the words and diagrams on the page according to the mandated ISO Directives, both ISO and the end users would have a (fairly) consistent way of working with the concept. The end user then gleans the appropriate conformance content from the publication.

ISO released the XML vocabulary ISOSTS for standards publications and based it on JATS. Subsequently the NISO STS (Standards Tag Suite) project refined the vocabulary that is used today. The XML is created by the SDO converting the back catalogue and by authors using XML-based tools. Some of these tools leverage word processing document formats. There are publishing systems that take the XML and produce the final output according to the ISO Directives Part 2.

But there are constructs in the XML markup that are not revealed on the printed page according to the Directives. Also, there is hidden content placed in the XML markup that is targeted for downstream processes, such as the use in semantic constructs of the values of a taxonomy representing an ontology. That ontology need not be restricted only to the use of STS and the standards-development process, as any organization having adopted JATS for their documentation structures has the opportunity to develop, deploy, and benefit from an ontology of concepts captured by the words on the page. Harvesting the use of the ontology in a non-traditional downstream process increases the value of the effort put into the use of JATS for traditional publishing.

But if those semantic constructs are incorrectly structured or populated, the quality of the end result is diminished even if the printed page is suitable for the end-user view.

For committee members and experts in the standards community in the past, this end-user view has been the only alternative to using XML-based tools for reviewing the content.

But XML long has been used for producing multiple outputs from a single source. It is a straightforward extension to the end-user PDF print rendition to conceptualize an augmented PDF print rendition targeted for the editors and the experts. Such a rendition exposing the hidden markup and hidden content to the reader would be inappropriate for the end users, but entirely suitable to the content editor and subject matter expert reviewer. Content reviewers can trust the publishing system to produce the suitable end-user rendition and focus on the exposed augmentations. Subject matter experts can review distilled semantic content for accuracy without regard for its appearance in the document, but still seeing the essence of the layout. All downstream processes benefit, not only the publishing of the words for end users.

PDF commenting tools are ubiquitous. Reviewers and subject matter experts likely already are familiar with their operation. A document review process incorporating simple PDF tools has been found to be very effective in engaging stakeholders who have no need (nor desire) to learn arcane XML tools and technology. This validates the effort put into publishing augmented renditions for quality assurance processes.

Exposition of syntax (semantics of the JATS vocabulary)

Getting the markup right

The names of the elements and the attributes in any XML vocabulary used to mark up content simply are labels associating the content with the semantics of the vocabulary.

Consider, then, what semantics are being represented by the syntax of elements and attributes in JATS:

Box 1

JATS is an XML vocabulary (similar in purpose to other document-based XML vocabularies such as DocBook or TEI) designed to model current journal articles. JATS is a named collection of XML elements and attributes that can be used to mark the structure and semantics of a single journal article.

… and in the NISO STS specialization of JATS:

Box 2

ANSI/NISO Z39.102-2017 Standards Tag Suite (STS) is a standard for the XML encoding of standards documents, with the goal of enabling interchange of tractable versions of standards documents. STS provides tags for Standards and Adoptions of Standards, Errata and Corrigenda, and other normative documents.

The structural nature of journal articles and standards documents dictates that the semantics of the vocabulary be the kinds of constructs often, though not exclusively, consumed visually. Sections of the document contain paragraphs, figures, formulae, tables, etc. These are laid out on the page or screen in an order and appearance anticipated by the reader so that the reader can absorb the words as intended. Consider, for example, how a table conveys components of information under column headers, or correlations of information at the intersections of columns and rows.

The semantics of the markup include concepts leveraged only in some target renderings, not all. Consider that a figure includes concepts that are visualized in most target renderings, such as the graphic image appearing on both physical paper and the screen, and concepts that are visualized only in a subset of target renderings, such as the alternative text associated with the graphic only on the screen.

Example of figure markup

See End-user view for an example of the printed page of an STS annex that begins with a <p> element, a <disp-formula> element, another <p> element, and a <fig> element. The markup for that image is as follows; it may be a challenge to find the <alt-text> element quickly in the XML should one need to review what text will be presented for the figure to an eventual user with a screen reader or other assistive technology:

<sec id="sec_I.1">
<label>I.1</label><title>General</title>
<p>The hook tilting resistance factor <italic>C</italic><sub>t</sub> is defined through
 <xref ref-type="disp-formula" rid="formula_I.1">Equation (I.1)</xref> as follows:</p>
<disp-formula id="formula_I.1"><mml:math display="block" id="mml_m174">
 <mml:mrow>
  <mml:msub>
   <mml:mi>C</mml:mi>
   <mml:mtext>t</mml:mtext>
  </mml:msub>
  <mml:mo>=</mml:mo><mml:mfrac>
   <mml:mi>M</mml:mi>
   <mml:mi>F</mml:mi>
  </mml:mfrac>
 </mml:mrow>
</mml:math>	<label>(I.1)</label></disp-formula>
<p>where</p>
<fig fig-type="figure" id="fig_I.1">
<label>Figure I.1</label>
<caption><title>General presentation of hook tilting resistance</title></caption>
<array>
<table frame="void" rules="groups">
<col width="4.08%"/>
<col width="4.07%"/>
<col width="91.85%"/>
<tbody>
<tr>
<td align="left" scope="row" valign="top"/>
<td align="left" valign="top"><italic>C</italic><sub>t</sub></td>
<td align="left" valign="top">is the hook tilting resistance factor;</td>
</tr>
<tr>
<td align="left" scope="row" valign="top"/>
<td align="left" valign="top"><italic>M</italic></td>
<td align="left" valign="top">is the moment resisting the tilting
 movement of the hook, see
 <xref ref-type="fig" rid="fig_I.1">Figure I.1</xref>;</td>
</tr>
<tr>
<td align="left" scope="row" valign="top"/>
<td align="left" valign="top"><italic>F</italic></td>
<td align="left" valign="top">is the vertical force acting on the hook.</td>
</tr>
</tbody></table></array>
<graphic xlink:href="fig_I.1.png">
 <alt-text>A hook opening to the left is illustrated indicating
 with an arrow the down direction of vertical force, the
 counter-clockwise direction of moment, and the clockwise direction
 of the movement.</alt-text>
</graphic>
<table-wrap content-type="fig-index" id="tab_ar" position="float">
<caption>
<title>Key</title></caption>
<table rules="groups">
<col width="4.08%"/>
<col width="95.92%"/>
<tbody>
<tr>
<td align="left" scope="row" valign="top"><italic>β</italic></td>
<td align="left" valign="top">direction of the tilting movement</td>
</tr>
</tbody></table></table-wrap>
</fig>
</sec>

Looking at the PDF rendering of the section, one cannot see the alternative text. The reviewer is obliged to review the XML source, or the HTML source, or hover over each individual diagram in the HTML browser. These options may be untenable for the reviewer.

Fig. 1. End-user view.

Consider in Augmented view, then, how the reviewer is served by augmenting the PDF presentation with a visual indication of the names and values of the source XML constructs from which each of the laid-out components is derived.

With clients this augmented rendition is termed “the editorial view” of the document.

In the editorial view some of the obvious constructs are not annotated, such as titles and paragraphs. However, should such an obvious construct have any attributes, the construct name and its attributes are revealed so that the reviewer does not miss important information. The initial use of such a rule quickly led to adding a feature of indicating which attributes are to be ignored in the rendering. For one client each and every construct has an id= attribute with a 36-character UUID that clutters the page enough to make the rendering less useful.

The markup reveals some non-obvious details. The table is included in an array and visually appears to have two columns but has been marked up with three.

The markup for the graphic exposes the file name referenced in the xlink:href= attribute, and the complete text of the <alt-text> is exposed. These two pieces of information likely are very useful for the reviewer to validate for any graphic.

Fig. 2. Augmented view.

The alternative text for a graphic is becoming increasingly important in meeting accessibility guidelines. The editorial view supports this task by summarizing for the editor every use of <graphic> and <inline-graphic> and indicating that construct’s specification (or lack thereof) of alternative text. In Summary of the use of alternative text for graphics one can see that the graphics used in figures 2 and 4 have alternative text, but none can be found for the graphic in figure 3, which therefore requires attention.

Fig. 3. Summary of the use of alternative text for graphics.
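A check of this kind is straightforward to script. The following is a minimal sketch, not the tool's actual code, and the sample markup is invented; it scans STS/JATS content for <graphic> and <inline-graphic> elements that lack an <alt-text> child:

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced attributes in Clark notation.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def graphics_missing_alt_text(xml_text):
    """Return the hrefs of graphics that have no <alt-text> child."""
    root = ET.fromstring(xml_text)
    missing = []
    for tag in ("graphic", "inline-graphic"):
        for g in root.iter(tag):
            if g.find("alt-text") is None:
                missing.append(g.get(XLINK_HREF, "(no href)"))
    return missing

# Invented sample: one graphic with alt text, one without.
sample = """<fig xmlns:xlink="http://www.w3.org/1999/xlink">
  <graphic xlink:href="fig_1.png"><alt-text>A hook.</alt-text></graphic>
  <graphic xlink:href="fig_2.png"/>
</fig>"""
print(graphics_missing_alt_text(sample))  # → ['fig_2.png']
```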

Not everything seen on the page comes from the author. Consider in Augmented view of massaged content the editorial-view rendering of a figure of a portion of a table found in an earlier draft of this document, which is authored using the JATS authoring model. The authoring model does not include labels for sections, tables, graphics, and other constructs. Yet the rendering of the JATS content needs such labels, and so these are injected as part of the publishing process of a JATS authoring instance. Simply exposing these injected constructs as if they were authored constructs may mislead the editor into thinking the labels were created by the author. By distinguishing the colour (magenta vs. red) and the bracketing (brace vs. square) of the construct name it becomes obvious that the figure label shown is different from the figure title shown: the label is injected and the title is authored.

Fig. 4. Augmented view of massaged content.

Features and configuration of the editorial view

As mentioned, not every bit of markup needs to be exposed to be useful. In fact, when every included element and attribute is exposed, it becomes difficult to separate the wheat from the chaff. But the tool cannot make all the decisions on what to reveal and what to keep hidden.

A configuration file controls each client’s leverage of the editorial features. By default, all features of the editorial view are engaged and so the client chooses which features to suppress. An example is in Tool configuration. The generic design of <named-content> allows the editorial tool to support any project’s taxonomy without any changes to the code, only to the configuration.

Fig. 5. Tool configuration.

Note the indicated feature of suppressing the XPath address. When not suppressed, every element that is exposed includes in its attributes an absolute XPath address that can be copied and pasted into XML authoring tools (for example, oXygenXML), allowing the author to jump to the precise location in the input file. Lengthy XPath addresses are accommodated by using a near-zero point size padded left and right with a full-size underscore. The underscores provide targets for the cursor when selecting the XPath text between the underscores for copy and paste. This feature has proven helpful for the tool developers but, so far, uninteresting to clients.
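The idea of an absolute, positionally-indexed XPath address can be sketched as follows. This is a simplified illustration using the Python standard library, not the publishing tool's implementation:

```python
import xml.etree.ElementTree as ET

def absolute_xpath(root, target):
    """Return an address like /sec[1]/p[2]/graphic[1] locating target."""
    def walk(elem, path):
        if elem is target:
            return path
        counts = {}  # per-tag sibling counters give the [n] predicates
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            found = walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")
            if found:
                return found
        return None
    return walk(root, f"/{root.tag}[1]")

# Invented miniature document for illustration.
doc = ET.fromstring("<sec><title/><p/><p><graphic/></p></sec>")
graphic = doc.find(".//graphic")
print(absolute_xpath(doc, graphic))  # → /sec[1]/p[2]/graphic[1]
```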

Exposition of values (semantics of the user content)

Establishing and using user-defined semantics

As mentioned before, an XML vocabulary for marking up a document is a set of labels representing the semantics of the information being marked-up. In JATS, we use labels representing concepts of journal articles such as sections, paragraphs, etc. In the STS derivative of JATS yet more labels are available representing concepts of standards documents, though primarily in the metadata and not in the body content. But the labels published, standardized, and then constrained by the available schemas cannot, themselves, be extended to include yet more labels. Such would violate the published schemas. Publishing yet another schema may inhibit using downstream processing systems that recognize only the labels found in the STS and JATS vocabularies.

Thankfully, the JATS designers accommodate the need to seed the document with user-defined semantics by providing the element <named-content>. Using this element one can wrap journal constructs in a way that does not impact their presentation, because the construct itself is transparent to the reader of the article. Downstream processes can leverage the semantic markup for their own purposes, re-purposing the wrapped journal markup for whatever need must be met. These downstream needs must be identified, enumerated, and catalogued so that all processes know what all other processes are going to be recognizing, from authoring through to utilization.

There are some basic steps to take when establishing user-defined semantics. The collection of identified concepts as a whole is the ontology of the subject area. Each concept must be catalogued with its definition and its relationship to other identified concepts. The labels used to classify the concepts in their relationships are the taxonomy of the ontology.

The author’s objective, then, is to recognize in the journal content any important association of that content to the ontology and then record its classification using the taxonomy in the user-deployed semantic markup. Downstream processes can then extract the journal content associated with a semantic concept and leverage it for their own objectives beyond simply its visualization as a journal article.

Moreover, the Schematron assertion-based schema language is ideally suited to validate the appropriate use of the taxonomy found in the XML elements’ attributes the author has used for semantic markup. Unlike a structural schema language, the use of Schematron inspects the contextual use of taxonomy values in content against the rules that are asserted by the designer of the ontology.
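A pattern along the following lines gives the flavour of such a Schematron check. This is a sketch, not the project's actual rules; the attribute values are the taxonomy terms described later in this paper:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <pattern>
    <rule context="named-content[@vocab='requirement']">
      <!-- Illustrative taxonomy values, as agreed by the ontology designer -->
      <assert test="@vocab-term = ('can', 'may', 'should', 'shall', 'must')">
        A requirement must carry one of the agreed vocabulary terms.
      </assert>
      <assert test="@content-type = ('capability', 'health', 'safety')">
        A requirement must be classified by an agreed content type.
      </assert>
    </rule>
  </pattern>
</schema>
```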

Process case study

Requirements-management databases and their systems are tools to help remove ambiguities from a project’s requirements. Articulating requirements against which implementations are measured vastly improves the task of meeting those requirements in the end result.

The quality of that articulation is predicated on the quality of the inputs. If the inputs are incomplete, ambiguous, or simply wrong, the measures against which results are based will skew the results away from what is intended. The inputs, then, should be the responsibility of the experts in the know, to ensure the quality of the results. In the real world, however, the experts find the repeated demands on their time and effort to be a burden. Often the task of requirements definition falls to a project manager or someone with limited domain knowledge or technical proficiency.

Standards documents are good examples of a source of requirements against which performance or implementation must be measured consistently across all users of the document. Conformance may be critical to interoperability, safety, or community. Identifying the requirements in a reliable fashion assures the standards publisher of the utility of the document to its users. Without such, and as has been for a long time in standardization, identifying critical requirements of a standard has been subjective to the reader based solely on the way the words have been used and presented, and thereby interpreted. When different readers have differing interpretations of requirements, the goal of consistency is lost and the purpose of having written the standard in the first place is not met.

Presumably, the domain experts have participated sufficiently in the writing of the standards content in order to capture the requirements at the necessary level of detail, granularity, and arrangement to be measured in a conforming instantiation of the concepts. The standards document writer is responsible for arranging and expressing the requirements for the benefit of the reader.

Having to repeat the process, likely involving the experts once again, to populate a requirements-management database may be a bridge too far for their enthusiastic engagement. As a result, the quality of the database is lower than the quality of the standard itself. It follows, then, that implementations measured against the database will fall short of an ideal implementation of the standard. Likely this ends up with extra work going back to the original standard, trying to ascertain conformance criteria by the age-old method of gleaning what one can from the words in a private interpretation.

Standards Norway, a national body SDO, recognizes the need to properly identify conformance requirements in their standards documents. By creating a standards conformance ontology of concepts reflecting their requirements-management database, and then marking up a standards document with a taxonomy of classifications of the authored content, the standards document itself becomes a mechanically-processed input to be imported in a lossless fashion into the requirements database. When using STS the authored content in XML, semantically identified by an ontology using its taxonomy, can be extracted and transformed, say using XSLT, into any XML needed for database import.

Standards Digital, a Norwegian consulting firm servicing SDOs, prepared an ontology including (but not limited to) the following concepts (illustrated in the figures). Requirements to be loaded into the management database, based on the subject matter, are considered for three types of capabilities: general, health, and safety. For each requirement its weight, based on ISO Directives, is considered as being possible, permitted, recommended, required, or constrained.

Accordingly, for the semantic vocabulary identified as “requirement”, the taxonomy classifies the content types as “capability”, “health”, and “safety”, and the vocabulary terms as “can”, “may”, “should”, “shall”, and “must”.

Using the JATS <named-content> element in XML, various statements of the standard document are then semantically marked up for their appropriately-identified content type and vocabulary term within the given vocabulary.
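As a sketch of what such markup looks like (the sentence wording here is invented for illustration; the attribute values come from the taxonomy above), a recommendation containing a subordinate mandatory requirement might be tagged:

```xml
<p><named-content content-type="safety" vocab="requirement"
    vocab-term="should">Lifting hooks should be inspected before each
  use. <named-content content-type="safety" vocab="requirement"
    vocab-term="shall">Where inspection reveals deformation, the hook
  shall be withdrawn from service.</named-content></named-content></p>
```

The nesting records that the mandatory requirement is in play only within the context of the surrounding recommendation, while leaving the presentation of the paragraph untouched.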

A formal review process involving subject matter experts was launched and proved to be very successful in revealing, refining, and repairing the complete set of requirements that were meant to be conveyed to the reader of the document. The marked-up XML was then input to the transformation that produced the file used to import the requirements into the database.
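The extraction step can be sketched in a few lines, shown here in Python rather than the XSLT the paper mentions; the record fields and sample markup are assumptions, not the actual import format:

```python
import xml.etree.ElementTree as ET

def extract_requirements(xml_text):
    """Flatten each <named-content> into a record for database import."""
    root = ET.fromstring(xml_text)
    rows = []
    # Document order gives the same ordinals used in the editorial view.
    for i, nc in enumerate(root.iter("named-content"), start=1):
        rows.append({
            "ordinal": i,
            "content_type": nc.get("content-type"),
            "vocab_term": nc.get("vocab-term"),
            # itertext() gathers text across inline children;
            # split/join normalizes whitespace.
            "text": " ".join("".join(nc.itertext()).split()),
        })
    return rows

# Invented sample requirement for illustration.
sample = """<sec>
  <p><named-content content-type="safety" vocab="requirement"
      vocab-term="shall">The hook shall be load-tested.</named-content></p>
</sec>"""
print(extract_requirements(sample))
```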

An interesting byproduct of putting the effort into semantically marking up the content is the improvement of the content itself in the written word. Consider the possibility that a standards writer has written a sentence describing a mandatory requirement without using the word “shall”. Should the domain expert instruct the sentence to be marked up semantically as a requirement, it becomes obvious that the wording falls short. Repairing the wording improves the quality of the document for the historical reader of only the words who isn’t using a database of any kind.

The mechanics of the semantic markup

Consider in End-user PDF view the end-user PDF rendition of some content of a NORSOK standard. This is what the reader of the document sees, as it follows the particular style guide for presentation of simple paragraphs.

Fig. 6. End-user PDF view.

In End-user HTML view the end-user HTML rendition shows nothing different from the PDF, as the same style guide applies.

Fig. 7. End-user HTML view.

For those readers not using a requirements database, the onus is on them to distill the requirements from only the words being used. One sees the keyword “should” in the first, third, and fourth paragraphs. One also sees the word “shall” in the third paragraph. But the wording of the “shall” appears to be within the context of the wording with the “should”. This is an example of a hierarchical requirement incorporating contexts of some requirements within others.

In Authoring view the XML illustrates how the sentences have been marked up using <named-content> with content-type=, vocab=, and vocab-term=. There is no semantic value found within the second paragraph. The first and fourth paragraphs each have their entire content marked up as separate suggestions. The whole of the third paragraph also appears as a suggestion, but it wholly contains a subordinate mandatory requirement that is in play should the suggestion itself be in play for the end user.

Fig. 8. Authoring view.

Such a nuance as found in the third paragraph, of the last sentence being a subordinate requirement rather than a standalone requirement, is not readily apparent in the simple formatting of the paragraph according to the style guide. It is allowed that some paragraphs have multiple standalone requirements while other paragraphs such as this have subordinate requirements. Different readers of only the rendered text published by the SDO may selectively interpret the published requirements differently. This may impact on interoperability or even on health and safety. This underscores the importance of having the intent of the subject matter experts properly conveyed to the reader of the information.

The editorial view of semantic markup reveals the attributes used per the restrictions specified by the configuration. In addition, it colours the content using different background colours for different content types, in this case a deep yellow. This allows the reader of the editorial view to readily find where semantic statements are marked up and where they are not.

In Content marked up with user semantics one can see that the second paragraph of the four does not include any semantic markup but the other three paragraphs do.

Note in the image how the <named-content> elements are enumerated with ordinals from the beginning of the document. This numbering helps in a few ways. When referencing a particular element in the review process, one can cite its ordinal reference. When a paragraph contains multiple elements, one can readily establish whether the elements are nested or not. The “closing” markup showing the ordinal reference is more concise than ending with the element name, and the reference makes clear which of the possibly many elements is being ended.

Fig. 9. Content marked up with user semantics.

Walking through each of the pages of a standards document is laborious and time consuming for the subject matter expert, whose time is considered the most valuable.

At its end, the editorial view includes a summary that collects all of the <named-content> elements and repeats the surrounding content for review; it is excerpted in Summary of each marking of a semantic to show the four constructs from the four example paragraphs. Note how the text of requirement “5.1” can be found also at the end of requirement “5”. The configuration elides reporting the vocab= attribute value, and so only the content-type= and vocab-term= columns are shown.

To assist the reviewer in establishing the context in which the requirements are found, the columns are hyperlinked to the content in context, and the editorial exposition text is hyperlinked to the content in summary. Jumping back and forth is easy to do.

Thus the table becomes a summary of all of the content targeted for downstream semantic processing. It allows the subject matter expert to ensure that the content being sent downstream is appropriate and that nothing has been included inadvertently. Reviewing the body of the document for all content without a coloured background allows the subject matter expert to see readily what is not being marked up for downstream processing, possibly identifying something that was missed during editing.

Fig. 10. Summary of each marking of a semantic.

Illustrating the wider application of the ontology in a document, Semantics of different content types shows the use of alternative colours. These colours are reflected in the summary as shown in Summary of semantics of different content types.

Fig. 11. Semantics of different content types.

Fig. 12. Summary of semantics of different content types.

It is not uncommon for a standard to have hundreds of requirements identified using semantic markup. To review the overall application of a taxonomy within an instance, the editorial review also produces a summary of the summary as shown in A summary of the summary of semantics. This particular example reveals there are six uses of semantic markup absent a vocabulary term. This is inappropriate for loading into the requirements management database. While this would be caught by a downstream application of Schematron validation to check all values, the summary of the summary reveals the list of oversights to the reviewer instantly.
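A minimal sketch of the kind of Schematron check alluded to above, assuming ISO Schematron and the attribute names used in this case study (the rule context and message wording are assumptions, not the project's actual rules):

```xml
<!-- Minimal sketch: flag any semantic marker lacking a vocabulary
     term; the specific context and wording are assumptions. -->
<schema xmlns="http://purl.oclc.org/dsdl/schematron"
    queryBinding="xslt2">
  <pattern id="semantic-markup">
    <rule context="named-content[@vocab]">
      <assert test="normalize-space(@vocab-term)">
        A semantic marker with
        content-type="<value-of select="@content-type"/>"
        must carry a vocab-term= value before loading into the
        requirements management database.
      </assert>
    </rule>
  </pattern>
</schema>
```

Such a rule would catch the same oversights downstream, but only after the fact; the summary of the summary surfaces them to the reviewer during the editorial pass.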

Fig. 13. A summary of the summary of semantics.

A common tool for the subject matter expert

Given that the complexities of the XML semantic markup now are represented with visual distinction in the editorial PDF rendition, there is no need for the subject matter experts and other reviewers to engage any XML-based tools for review.

In Expert feedback one can see the use of a PDF commenting tool. The author “doro” has marked up two successive sentences as requirements enumerated as “128” and “129”. The subject matter expert “by232” picked some text arbitrarily on which to attach a comment, succinctly stating “129=128.1”, which the author acknowledges. This is simple and unambiguous, and it doesn’t take long for the subject matter expert to record their observation.

Fig. 14. Expert feedback.

When using XML you can see more than what you get

XML tools and technology are being deployed to address new important business requirements in publishing, particularly within standards developing organizations. Authors of standards need tools to help in producing quality XML content. Moreover, their editors and other expert stakeholders need tools to help in reviewing the quality of that content. In tandem, these tools help to extract more value from the writing investment, producing more accessible documents and supporting downstream tools, such as conformance and compliance systems, with richly encoded information associated with the written word.

PDF commenting tools are ubiquitous and easy to use by stakeholders involved in the quality assurance process of publications. Costs and frustration both are reduced when familiar tools are used and there is no need to engage the stakeholders in XML technology.

This illuminates the utility of augmented renditions of documents that expose hidden markup and content used in the XML but not visible in the end-user publications. This editorial review rendition is simply another result of applying the long-standing principle of “write once, publish everywhere”, but to internal requirements in addition to the usual external ones.

Whether for standards development or any other semantic ontology of concepts represented by the words, a document review process incorporating simple PDF tools has been found to be very effective in engaging stakeholders who have no need (nor desire) to learn arcane XML tools and technology.

This validates the effort put into publishing augmented renditions, where for quality assurance processes it is very valuable to see more than what you get in the end-user rendition.

Copyright © 2021 Réalta Online Publishing Solutions Limited, Ireland.

The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 Ireland License.

Bookshelf ID: NBK556169
