General Tagging Practice

DTD

2.3

PMC will accept content tagged in any version of the NLM DTD or NISO JATS 1.0. Details are provided below.

The XML should conform to the NLM Journal Publishing DTD, version 2.3. (http://dtd.nlm.nih.gov/publishing/2.3/index.html)

The DTD is available on the web: http://dtd.nlm.nih.gov/publishing/2.3/journalpublishing.dtd

The complete Tag Library is available on the web: http://dtd.nlm.nih.gov/publishing/tag-library/2.3/index.html

All of the files are available by FTP: ftp://ftp.ncbi.nih.gov/pub/archive_dtd/publishing

3.0

PMC will accept content tagged in any version of the NLM DTD or NISO JATS 1.0. Details are provided below.

The XML should conform to the NLM Journal Publishing DTD, version 3.0. (http://dtd.nlm.nih.gov/publishing/3.0/index.html)

The DTD is available on the web: http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd

The complete Tag Library is available on the web: http://dtd.nlm.nih.gov/publishing/tag-library/3.0/index.html

All of the files are available by FTP: ftp://ftp.ncbi.nih.gov/pub/archive_dtd/publishing

1.0

PMC will accept content tagged in any version of the NLM DTD or NISO JATS 1.0. Details are provided below.

The XML should conform to the JATS Journal Publishing DTD, version 1.0. (http://jats.nlm.nih.gov/publishing/1.0/index.html)

The DTD is available on the web: http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd

The complete Tag Library is available on the web: http://jats.nlm.nih.gov/publishing/tag-library/1.0/index.html

All of the files are available by FTP: ftp://ftp.ncbi.nih.gov/pub/jats/publishing/1.0/

Associating Schemas

PMC prefers to receive data associated with DTDs but will accept content associated with W3C XML Schema, RELAX NG, or RELAX NG compact syntax schema types.

A schema may be associated with an XML document in several ways: a DTD can be referenced using a DOCTYPE declaration; a W3C XML Schema can be referenced using the @schemaLocation or @noNamespaceSchemaLocation attributes on the root element; and any schema type may be associated using the <?xml-model?> processing instruction defined by the W3C (http://www.w3.org/XML/2010/01/xml-model/).

In PMC, we strongly prefer the DOCTYPE declaration for associating DTDs and the @schemaLocation or @noNamespaceSchemaLocation attributes for W3C XML Schema association, but we will accept the <?xml-model?> processing instruction for DTD, W3C Schema, RELAX NG, and RELAX NG compact syntax when the following rules are observed:

  1. The processing instruction(s) must be placed after the XML declaration and before the root element.
  2. Each processing instruction MUST have an @href pseudo attribute. The content of this attribute must be either the filename of the schema or the complete URL of the schema (e.g. "JATS-journalpublishing1.rng" or "http://jats.nlm.nih.gov/publishing/1.0/rng/JATS-journalpublishing1.rng"). Local or relative paths should not be used (../schemas/JATS-journalpublishing1.rnc or C:\work\schemas\JATS-journalpublishing1.rnc).
  3. Each processing instruction MUST identify the schema type that is being referenced with either an @type or @schematypens pseudo attribute according to the list below:
    1. for DTD, set "type='application/xml-dtd'"
    2. for XSD, set "schematypens='http://www.w3.org/2001/XMLSchema'"
    3. for RNG, set "schematypens='http://relaxng.org/ns/structure/1.0'"
    4. for RNC, set "type='application/relax-ng-compact-syntax'"

<?xml-model href="JATS-journalpublishing1.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://dtd.nlm.nih.gov/publishing/2.3/xsd/journalpublishing.xsd"
     schematypens="http://www.w3.org/2001/XMLSchema"?>
<?xml-model href="journalpublishing3.dtd" type="application/xml-dtd"?>
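
Rules 2 and 3 above can be checked mechanically. The following Python sketch is only an illustration, not a PMC tool: the function names (parse_xml_model, href_is_acceptable, schema_kind) are hypothetical, and treating http, https, and ftp as the "complete URL" schemes is an assumption. It does not check rule 1 (placement of the processing instruction).

```python
import re
from urllib.parse import urlparse

# Schema-type identifiers listed in rule 3.
TYPE_VALUES = {
    "application/xml-dtd": "DTD",
    "application/relax-ng-compact-syntax": "RNC",
}
SCHEMATYPENS_VALUES = {
    "http://www.w3.org/2001/XMLSchema": "XSD",
    "http://relaxng.org/ns/structure/1.0": "RNG",
}

PI_PATTERN = re.compile(r"<\?xml-model\s+(.*?)\?>", re.DOTALL)
PSEUDO_ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def parse_xml_model(pi_text):
    """Return the pseudo attributes of one <?xml-model?> instruction as a dict."""
    match = PI_PATTERN.search(pi_text)
    if match is None:
        raise ValueError("not an xml-model processing instruction")
    attrs = {}
    for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(match.group(1)):
        attrs[name] = double_quoted or single_quoted
    return attrs


def href_is_acceptable(href):
    """True for a bare filename or a complete URL; False for local or relative paths."""
    parsed = urlparse(href)
    # Assumption: only these schemes count as a "complete URL".
    if parsed.scheme in ("http", "https", "ftp") and parsed.netloc:
        return True
    # A bare filename has no path separators and no drive-letter colon.
    return bool(href) and not any(ch in href for ch in "/\\:")


def schema_kind(attrs):
    """Map the type or schematypens pseudo attribute to DTD, XSD, RNG, or RNC."""
    if attrs.get("type") in TYPE_VALUES:
        return TYPE_VALUES[attrs["type"]]
    if attrs.get("schematypens") in SCHEMATYPENS_VALUES:
        return SCHEMATYPENS_VALUES[attrs["schematypens"]]
    return None  # rule 3 is not satisfied
```

For the first example above, parse_xml_model returns the href "JATS-journalpublishing1.rng", and schema_kind reports "RNG".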

If more than one <?xml-model?> processing instruction is provided in an article, or if they are provided along with a DOCTYPE declaration or article attributes that reference an XSD file, PMC will process the article using the first available schema in the following order:

  1. DTD (from DOCTYPE declaration or <?xml-model?>)
  2. XSD (from @noNamespaceSchemaLocation or <?xml-model?>)
  3. RNG (from the <?xml-model?>)
  4. RNC (from the <?xml-model?>)
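
Reading the list above as an order of preference (an interpretation, not something PMC states in these terms), the selection could be sketched in Python as follows. The choose_schema function and the (kind, location) pairs are hypothetical names used only for illustration.

```python
# Order taken from the list above: DTD, then XSD, then RNG, then RNC.
PRECEDENCE = ["DTD", "XSD", "RNG", "RNC"]


def choose_schema(available):
    """Return the (kind, location) pair that ranks first in PRECEDENCE.

    `available` is a list of (kind, location) pairs collected from the
    DOCTYPE declaration, root-element attributes, and xml-model
    processing instructions. Unknown kinds are ignored.
    """
    ranked = sorted(
        (item for item in available if item[0] in PRECEDENCE),
        key=lambda item: PRECEDENCE.index(item[0]),
    )
    return ranked[0] if ranked else None
```

Under this reading, an article that carries both a DOCTYPE declaration and a RELAX NG processing instruction would be handled with the DTD.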

Capitalization

Use title case capitalization for PMC, particularly in <article-title> and <subject>.

Continuous Makeup Articles

When more than one article starts on the same page in a printed journal, treat each as an individual article and tag each in its own file with a unique filename. Articles that start on the same page will have the same <fpage> value. Use the @seq attribute on <fpage> to assign sequence letters so that each article has a unique combination of first page and sequence.
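
As a rough illustration of the uniqueness requirement, the Python sketch below collects the first-page value and @seq from a set of article files and reports any combination that occurs more than once. The function names and the minimal sample markup in the usage note are assumptions for demonstration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def fpage_key(article_xml):
    """Return (first page, seq) for one article, or None if <fpage> is absent."""
    root = ET.fromstring(article_xml)
    fpage = root.find(".//article-meta/fpage")
    if fpage is None:
        return None
    return ((fpage.text or "").strip(), fpage.get("seq", ""))


def duplicate_keys(articles):
    """List the (first page, seq) combinations shared by more than one article."""
    counts = Counter(fpage_key(xml_text) for xml_text in articles)
    return [key for key, n in counts.items() if n > 1 and key is not None]
```

Two articles tagged <fpage seq="a">101</fpage> and <fpage seq="b">101</fpage> produce no duplicates, while two articles that both carry <fpage>101</fpage> without @seq are reported.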

Embargo Delay

Generally speaking, all articles for a journal in PMC have the same delay time between the publication date and the time they are available in PMC. In some cases, however, these delays need to be set at the article level rather than at the journal level. See "Release Delay" under Processing Instructions for details on how to tag embargo delays for individual articles.

Empty Elements

Do not use empty elements for formatting or any other purpose.

All required elements should have content.

Formatted Text

As a rule, use formatted text (<bold>, <italic>, <sc>, etc.) only to set off a piece of information. Do not set entire elements in formatted text. For example, if a <title> is set completely in boldface, do not tag it with <bold>. However, if a title has a word or some words set in boldface for emphasis, tag those words using <bold>. This applies mainly to <title>, <p> in <abstract>, and <label>. It may also apply to <aff> and special sections, such as <ack>.
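
The distinction can be illustrated with a small Python check that flags an element whose entire content is wrapped in a single formatting element. This is a sketch under simple assumptions: the wholly_formatted name is hypothetical, and only the three formatting elements named above are considered.

```python
import xml.etree.ElementTree as ET

# Formatting elements named in the text; other formatting elements are not covered.
FORMATTING = {"bold", "italic", "sc"}


def wholly_formatted(xml_text):
    """True if the element's whole content is one formatting child element."""
    element = ET.fromstring(xml_text)
    # Text before the first child, or more than one child, means mixed content.
    if len(element) != 1 or (element.text or "").strip():
        return False
    child = element[0]
    # Text after the child also means only part of the content is formatted.
    return child.tag in FORMATTING and not (child.tail or "").strip()
```

Here <title><bold>Methods</bold></title> is flagged, whereas <title>Effects of <bold>p53</bold> loss</title> is not, because only part of the title is set off.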

Languages

Based on the agreement between the publisher and NLM, PubMed Central may accept non-English articles and/or English articles with non-English parts (titles, abstracts, etc.).

Non-English content must be identified with @xml:lang. Unlike nearly all other attributes in XML, the value of @xml:lang is inherited. This means that all elements inside the one with the language attribute (its descendants) are assumed to be in the same language unless they explicitly set their own @xml:lang attribute.

The rule for tagging language is that the primary language of the article should be set in the @xml:lang on <article>. Any item within the article that is in a different language must be tagged with an @xml:lang to identify the language of that piece.

Note: English is the default value for @xml:lang on <article>, <response>, and <sub-article> and does not need to be set explicitly at these levels.

See the multiple-language examples in <abstract> and Article Title for tagging details.
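
The inheritance rule can be made concrete with a short Python sketch that computes the effective language of every element. The function name and the sample markup in the usage note are illustrative assumptions; the default of "en" follows the note above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(xml_text, default="en"):
    """Return (tag, language) pairs, applying xml:lang inheritance."""
    root = ET.fromstring(xml_text)
    result = []

    def walk(element, inherited):
        # An explicit xml:lang overrides the inherited value for this subtree.
        lang = element.get(XML_LANG, inherited)
        result.append((element.tag, lang))
        for child in element:
            walk(child, lang)

    walk(root, default)
    return result
```

For an <article xml:lang="fr"> that contains a <trans-title-group xml:lang="en">, the <article-title> resolves to "fr" and the <trans-title> inside the group resolves to "en".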

Links

2.3

Tag all links to objects within the document (e.g. tables, figures, display formulas) with the <xref> element and include the appropriate @ref-type. See <xref>.

Tag all external links with <ext-link> and include the appropriate @ext-link-type. See <ext-link>.

Tag all links to related articles (e.g. from a correction to the corrected article) with <related-article> and include the appropriate @ext-link-type, @related-article-type, and citation information. See <related-article>.

3.0

Tag all links to objects within the document (e.g. tables, figures, display formulas) with the <xref> element and include the appropriate @ref-type. See <xref>.

Tag all external links with <ext-link> and include the appropriate @ext-link-type. See <ext-link>.

Tag all links to related articles (e.g. from a correction to the corrected article) with <related-article> and include the appropriate @ext-link-type, @related-article-type, and citation information. See <related-article>.

1.0

Tag all links to objects within the document (e.g. tables, figures, display formulas) with the <xref> element and include the appropriate @ref-type. See <xref>.

Tag all external links with <ext-link> and include the appropriate @ext-link-type. See <ext-link>.

Tag all links to related articles (e.g. from a correction to the corrected article) with <related-article> or <related-object>.

Math

Tag all display formulas with MathML mixed or presentation markup. Tag inline formulas with MathML mixed or presentation markup when they cannot be represented by regular article elements and Unicode™ characters. MathML 2.0 is included in the DTD. Each <mml:math> should have an id.

Do not set math as <tex-math> except within <alternatives>.

Mathematical elements (<disp-formula> and <inline-formula>) should contain a single mathematical expression. That is, they should contain a single representation of the expression in plain markup, <mml:math>, or <graphic>. If the expression is being supplied in more than one format, the representations should be tagged to indicate that they are alternative representations of the same expression (see Alternate Versions of a Single Object).

The MathML 2.0 Specification describes a number of requirements that are not enforced with the MathML DTD or W3C Schema. See <mml:math> for details on these requirements.
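
Some of the rules in this section lend themselves to a simple automated check. The Python sketch below is a partial illustration only: the formula_problems name and the message strings are hypothetical, and it does not detect plain-markup representations mixed with <mml:math>.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"
# Direct children that each count as one representation of the expression.
REPRESENTATIONS = {MML + "math", "graphic", "tex-math"}


def formula_problems(xml_text):
    """Return a list of rule violations found in formula markup."""
    root = ET.fromstring(xml_text)
    problems = []
    for formula in root.iter():
        if formula.tag not in ("disp-formula", "inline-formula"):
            continue
        # Only direct children are inspected; content of <alternatives> is allowed.
        direct = [child.tag for child in formula if child.tag in REPRESENTATIONS]
        if len(direct) > 1:
            problems.append("multiple representations outside <alternatives>")
        if "tex-math" in direct:
            problems.append("<tex-math> outside <alternatives>")
    for math in root.iter(MML + "math"):
        if math.get("id") is None:
            problems.append("<mml:math> without id")
    return problems
```

A <disp-formula> holding one <mml:math id="M1"> passes, while one holding both an <mml:math> without an id and a <graphic> as siblings yields two messages.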

Publication Dates

Requirements for publication dates are based on the publishing model of the journal in which the articles are published.

There are two basic classes of publication: issue-based and article-based.

  • Issue-based publication is when an entire issue is published at one time, whether in print, online, or both. In this class of publication, issue and article publication dates coincide, so the publication date of each article is the same as the publication date for the issue.
  • Article-based publication is when articles are published individually or in small groups. They may be published in issues or collected in some other way (by volume or by year). In this class of publication, each article has two publication dates: the date (including day, month, and year) on which the article was published and the publication’s broader collection date.

For specific examples of various publishing models and their corresponding date types, see <pub-date>.

Series Articles

Sometimes articles are part of a series. The series may be either a group of articles all in one issue or a series of articles (like a recurring column) spanning issues. Use <series-title> to identify the title of the series to which the article belongs.

Character Encoding and Special Characters

If the XML is in an encoding other than UTF-8 (or ASCII, which is a subset of UTF-8), the character encoding should be declared in the XML declaration in the prolog of each file.

Tag special characters as numeric character references using the Unicode hex value (for example, &#x2203; for ∃) or directly in the declared encoding. For accented characters that cannot be represented by a single Unicode value, use the base character and Combining Diacritical Marks (x0300 to x036F) or Combining Diacritical Marks for Symbols (x20D0 to x20E3).
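
A minimal Python sketch of this convention follows. The to_char_refs name is hypothetical, and composing with Unicode NFC before conversion is an assumption about how one might prefer single code points where they exist. The sketch does not escape markup characters such as & or <.

```python
import unicodedata


def to_char_refs(text):
    """Write non-ASCII characters as hexadecimal numeric character references."""
    # NFC composes a base character and combining mark into a single code
    # point when Unicode defines one; otherwise the sequence is left as is.
    composed = unicodedata.normalize("NFC", text)
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in composed
    )
```

With this sketch, "e" followed by the combining acute accent (x0301) becomes the single reference &#x00E9;, while "q" followed by the combining tilde (x0303) has no single-code-point form and is written as the base character plus &#x0303;.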

Do not use values from the Private Use Areas: xE000–xF8FF, xF0000–xFFFFD, and x100000–x10FFFD.

Do not use Unicode values designated as control codes. These ranges include but are not limited to x0000–x0020 and x0080–x009F.
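
The two restrictions above can be screened for with a short Python sketch. The disallowed_characters name is hypothetical. Two assumptions are made: control codes are identified by the Unicode general category "Cc" rather than by the exact ranges quoted above, and tab, line feed, and carriage return are exempted because XML 1.0 permits them as whitespace.

```python
import unicodedata

# Private Use Areas listed above.
PRIVATE_USE = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def disallowed_characters(text):
    """Return (index, code point, reason) for each character that should not be used."""
    problems = []
    for index, ch in enumerate(text):
        code = ord(ch)
        if any(low <= code <= high for low, high in PRIVATE_USE):
            problems.append((index, f"U+{code:04X}", "private use"))
        # Assumption: tab, LF, and CR are ordinary XML whitespace, not errors.
        elif unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            problems.append((index, f"U+{code:04X}", "control"))
    return problems
```

Running the check over the text content of a file gives a list of positions to review; an empty list means neither restriction was triggered.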

Subjects

PMC uses subjects (under <article-categories>) to sort the issue contents and build the Table of Contents. Subjects can be hierarchical. They may describe the content of the article: Physical Sciences. Or they may give an indication of the type of article: Erratum.

Supplementary Issues

Each named supplement is considered an issue and may have one or many articles. Tag this information in <issue> within the <article-meta>.

See Funding Information for specifics on capturing funding sources, grants, and conflicts of interest.