NLM Standard Publisher Data Format

Last Updated: February 14, 2008

This is the standard data format that publishers are required to use when submitting citation data to NLM for processing into PubMed. It is a tagged format: each part of a citation is preceded by an opening <Tag> string and followed by a closing </Tag> string, where Tag is an appropriate label. The XML tags are listed below, followed by several examples. Additional information on the XML tagged format is available at the following Web sites: Xmlu.com, XML.com, and OASIS. For further assistance, please send an e-mail to publisher@ncbi.nlm.nih.gov.


Note:

  • This format is required for submission of citation and abstract data to NLM. Other formats are not acceptable. Only journals that are already approved for inclusion in PubMed should be submitted. See our Journal Submission FAQs for more information about journals indexed for MEDLINE.
  • If you wish to have non-ASCII characters in your citations you must use standard SGML entity names. It is not possible to keep a separate translation table for each publisher, given the number of possible non-ASCII characters.
  • Links to your Web site, if available, may be submitted using LinkOut.

Return to Information for Publishers re: XML-Tagged Data for additional publisher information.


The XML Tags

Data Tags (R = Required, O = Optional). Tags are case sensitive. Required tags must always be included; optional tags must be included whenever the requested data appears in the print or electronic issue, and omitted otherwise.

  • File Header (R) The header information should include: <!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.0//EN" "http://www.ncbi.nlm.nih.gov:80/entrez/query/static/PubMed.dtd" >
  • ArticleSet (R) An entire submission of the set of articles for each issue. Each issue of a given journal must be enclosed in these tags.
    • Article (R) Each article must be enclosed in these tags. Do not submit data for the following items: book reviews, advertisements, announcements, erratum notices, software and equipment reviews, and papers to appear in forthcoming issues. In addition, do not submit individual citations for abstracts or shortened versions of presentations or papers from conference proceedings unless the full text of the article is published. In most instances, NLM creates a single citation to cover a group of meeting abstracts or shortened versions of conference proceedings; for example, see PMIDs 12526142, 12516608, and 12516600.
      • Journal (R) Bibliographic information about the journal issue contained in the file.
        • PublisherName (R) The publisher name.
        • JournalTitle (R) The standard MEDLINE abbreviation for the journal title. If you do not know the abbreviation, see the Journals Database.
        • Issn (R) The ISSN or ESSN of the journal.
        • Volume (R) The volume name or number of the journal, including any supplement information, e.g., 12 Suppl 2, 514 ( Pt 2), 19 Suppl A, etc.
        • Issue (O) The issue number, e.g., 6 Pt 2, 7-8, etc.
        • PubDate (R) The publication date information must be enclosed in the following date tags. NOTE: Print publication dates should accurately reflect the date format on the cover of the journal and online publication dates should accurately reflect the date format on the journal Web site. The PubDate tag includes the PubStatus attribute, which may contain only one of the following values:

ppublish - published in print (default value)
epublish - electronically published only, never published in print
aheadofprint - electronically published, but followed by print

The latest (current) article status with the date of this status must be submitted in PubDate within Journal. Which value you choose to use depends on whether the article is a print, electronic or ahead of print article. See our page entitled Properly Coding Print, Electronic and Ahead of Print Articles for more details.

          • Year (R) The 4-digit year of publication. <Year> can only contain a 4-digit number between 1966 and 2010.
          • Month (O) The month of publication. <Month> can only contain the numbers 1-12, the month (in English) or the first three letters of the English months. NOTE: The only PubStatus attribute that allows for a dual month in <Month> is ppublish.
          • Season (O) The season of publication (do not use if a month is available).
          • Day (O) The day of publication. <Day> can only contain the numbers 1-31.
        • Replaces (O) The identifier of the article that this one replaces. Do not use this tag for new articles. The <Replaces> tag can be used to update an Ahead of Print citation, or to correct an error in citations with [PubMed - as supplied by publisher] status. The Replaces tag includes the IdType attribute, which may contain only one of the following values:

pmid - PubMed ID (PMID) (default value)
pii - controlled publisher identifier
doi - Digital Object Identifier

See our Instructions for Replacement Files for more details.

        • ArticleTitle (O) The article title, in English, if published in English or translated to English in the journal. Do not submit this tag if the published title is not in English or is not translated to English in the journal. See VernacularTitle.
        • VernacularTitle (O) The article title in the original language, if not in English. Used only for Latin based alphabets. See our Instructions for Non-English Languages.
        • FirstPage (R/O) This tag is required if and only if the ELocationID tag is not present and filled. This tag should contain the first page on which the article appears. If an article appears in more than one language with consecutive pagination, pagination should be inclusive of all texts.
        • LastPage (O) The last page on which the article appears. If an article appears on one page, this is the same as FirstPage. If an article appears on non-consecutive pages this tag should still contain the last page on which the article appears. If an article appears in more than one language in the same issue, pagination should be inclusive of all the texts.
        • Language (O) The language the article is in. This should be chosen from the language codes in ISO 639. If unspecified, EN (English) is assumed. If an article appears in more than one language in the same issue, submit multiple language tags listed in the order in which the texts appear in the journal, not in the alphabetical order of the symbols. If one of the languages is English, enter EN first. NOTE: NLM requires transliteration of Cyrillic letters as outlined here. See our Instructions for Non-English Languages.
        • AuthorList (O) The author information must be enclosed in these tags. If a given article has one or more authors, this tag must be submitted. Authors should be listed in the same order as in the printed article, and author name format should accurately reflect the printed article. Do not use all upper case letters.
          • Author (R) Information about a single Author must begin with this tag.
            • FirstName (O) The Author's full first name is required if it appears in the print or online version of the journal. First initial is acceptable if full name is not available. To represent a Single Personal Author Name use the FirstName EmptyYN attribute value "Y".
            • MiddleName (O) The Author's full middle name(s), or initial(s) if the full name(s) are not available.
            • LastName (O) The Author's last name.
            • Suffix (O) The Author's suffix, if any, e.g. "Jr", "Sr", "II", "IV". Do not include honorific titles, e.g. "M.D.", "Ph.D.".
            • CollectiveName (O) The name of the authoring committee or organization. CollectiveName can be used instead of or in addition to a personal name.
            • Affiliation (O) The institution(s) that the Author is affiliated with. If a given article contains affiliations, this tag must be submitted. Please submit the affiliation for the first author only. If there are multiple affiliations and it cannot be determined which is the first author's affiliation, use the first affiliation. The data should be provided as a simple string within the <Affiliation> </Affiliation> tags. The body of the affiliation should include the following data, if applicable, separated by commas: division of the institution, institution name, city, state, postal or zip code, country (use USA for the United States) followed by a period, then a space followed by the e-mail address which itself should not end in a period. Do not include the word 'e-mail'.
      • PublicationType (O) Used to identify the type of article. The only available PublicationTypes are NEWS, LETTER or EDITORIAL. The default value, JOURNAL ARTICLE, will be added to citations if this tag is left blank or an invalid PublicationType is used.
      • ArticleIdList (O) The list of Article Identifiers.
        • ArticleId (R) The Article Identifier. The ArticleId tag includes the IdType attribute, which may include only one of the following values for each identifier:

pii - controlled publisher identifier (default value)
doi - Digital Object Identifier

See our Journal Submission FAQs for more information about Article Identifiers.


      • History (O) The history of a publication (e.g., received, accepted, revised, published, ahead of print). Publishers may supply PubDates and PubStatus in History using the PubDate format detailed above. History PubDate is optional; however the PubDate within Journal, outlined above, is required. The History PubDate tag includes the PubStatus attribute, which may contain only one of the following values for each date in the publication history:

received  - date manuscript received for review
accepted - accepted for publication
revised - article revised by publisher or author
aheadofprint - published electronically
*The <History> tag plays an important part in the process of submitting Replacement Files for Ahead of Print citations.


      • Abstract (O) The article's abstract. Include all text as a single ASCII paragraph. Headings of structured abstracts (e.g., OBJECTIVE, DESIGN) should be capitalized and end with a colon, followed by a space before the text. Our DTD does not support text formatting tags such as line breaks, italics, or boldface; the only acceptable formatting tags are for superscript (<sup></sup>) or subscript (<inf></inf>). Do not include KEYWORDS or bibliographic citations in the Abstract tag.
      • CopyrightInformation (O) The Copyright information associated with this article.
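
The PubDate constraints listed above (the allowed PubStatus values, the 4-digit Year between 1966 and 2010, the accepted Month forms, and the Day range) can be checked locally before a file is submitted. The following Python is a minimal sketch, not NLM's own validator: the function name check_pub_date and the constant names are hypothetical, case-insensitive matching of month names is an assumption, and dual months (allowed only under ppublish) are not handled.

```python
# Hypothetical pre-submission check for PubDate values; names are illustrative.
PUB_STATUS_VALUES = {"ppublish", "epublish", "aheadofprint"}

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Full English month names plus their first three letters, lower-cased.
# Assumption: matching is case-insensitive.
_MONTH_WORDS = {m.lower() for m in MONTHS} | {m[:3].lower() for m in MONTHS}


def _is_number(text):
    """True only for non-empty strings of ASCII digits."""
    return text.isascii() and text.isdigit()


def check_pub_date(year, month=None, day=None, pub_status="ppublish"):
    """Return a list of problems; an empty list means the values pass."""
    problems = []
    if pub_status not in PUB_STATUS_VALUES:
        problems.append(f"PubStatus {pub_status!r} is not an allowed value")
    if not (len(year) == 4 and _is_number(year) and 1966 <= int(year) <= 2010):
        problems.append(f"Year {year!r} must be 4 digits between 1966 and 2010")
    if month is not None:
        if _is_number(month):
            month_ok = 1 <= int(month) <= 12
        else:
            month_ok = month.lower() in _MONTH_WORDS
        if not month_ok:
            problems.append(f"Month {month!r} is not 1-12 or an English month")
    if day is not None and not (_is_number(day) and 1 <= int(day) <= 31):
        problems.append(f"Day {day!r} must be a number from 1 to 31")
    return problems
```

Calling check_pub_date("2007", "Mar", "15") returns an empty list, while an out-of-range year or day adds one message per failed field. A check like this complements, and does not replace, validation against the DTD.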

XML File Validator

The PubMed Citation File Validator is available at http://www.ncbi.nlm.nih.gov/entrez/publisher/citvalidator.cgi. Use this utility to validate your citation files against the NCBI PubMed DTD before submitting them to NCBI.

Special Characters

Characters not in the standard ASCII character set must be represented using standard SGML entity codes. For example, use "&ccedil;" to represent c, cedilla, "&rsquo;" for a right single-quote, <sup> for superscript, and <inf> for inferior or subscript. Where they occur within the text of any tag, the following symbols must be represented by entities: & (ampersand), < (less than), > (greater than). Where these three occur in tag names or entities, simply use the ASCII characters. For example:

Entities:
&uuml; NOT &amp;uuml;
&apos; NOT &amp;apos;

Tag Names:
<Month> NOT &lt;Month&gt;

Text:
[P &lt; 0.01] NOT [P < 0.01]
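
The rules above can be sketched in Python as follows. This is an illustrative helper under stated assumptions, not an NLM tool: the name escape_citation_text is hypothetical, and the mapping from non-ASCII characters to entity names relies on Python's html.entities table, which is assumed here to agree with the standard SGML entity names for the characters it covers (for example ccedil, uuml, rsquo). Existing entities and the <sup> and <inf> tags are passed through unchanged.

```python
import re
from html.entities import codepoint2name

# An ampersand that already starts an entity (&uuml; or &#233;) is left alone.
_BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;)")
# The only formatting tags accepted inside text; kept as literal tags.
_ALLOWED_TAG = re.compile(r"(</?(?:sup|inf)>)")


def _escape_plain(chunk):
    """Escape a run of text that contains no allowed formatting tag."""
    chunk = _BARE_AMP.sub("&amp;", chunk)
    chunk = chunk.replace("<", "&lt;").replace(">", "&gt;")
    out = []
    for ch in chunk:
        if ord(ch) < 128:
            out.append(ch)
        elif ord(ch) in codepoint2name:
            # Assumption: the HTML entity name matches the SGML entity name.
            out.append("&" + codepoint2name[ord(ch)] + ";")
        else:
            raise ValueError(f"no entity name known for {ch!r}")
    return "".join(out)


def escape_citation_text(text):
    """Escape &, <, > and non-ASCII characters in the text of a tag."""
    parts = _ALLOWED_TAG.split(text)
    # Splitting on a capturing group puts the matched tags at odd indexes.
    return "".join(
        part if i % 2 else _escape_plain(part) for i, part in enumerate(parts)
    )
```

The function is meant for the text inside a tag only; tag names themselves are written with plain ASCII angle brackets, as the examples above show. Characters with no entry in the table raise an error rather than being passed through silently.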
 

XML File Examples

Standard File Example - A typical file submitted to PubMed.

Ahead of Print Example - A file sent "Ahead of Print", and the Replacement File that follows.
 

Subset of Language Codes

The following is a subset of the ISO 639 standard for language codes. NOTE: NLM requires transliteration of Cyrillic letters as outlined here.

CODE  LANGUAGE
EN    English
AF    Afrikaans
SQ    Albanian
AM    Amharic
AR    Arabic
AZ    Azerbaijani
HY    Armenian
BN    Bengali
BS    Bosnian
BG    Bulgarian
CA    Catalan
ZH    Chinese
HR    Croatian
CS    Czech
DA    Danish
NL    Dutch
EO    Esperanto
ET    Estonian
FI    Finnish
FR    French
GD    Scottish Gaelic
KA    Georgian
DE    German
EL    Greek, Modern
HE    Hebrew
HU    Hungarian
HI    Hindi
IS    Icelandic
ID    Indonesian
IT    Italian
JA    Japanese
RW    Kinyarwanda
KO    Korean
LA    Latin
LV    Latvian
LT    Lithuanian
MK    Macedonian
ML    Malayalam
MI    Maori
MS    Malay
MU    Multilingual
NO    Norwegian
FA    Persian
PL    Polish
PT    Portuguese
PS    Pushto
RO    Romanian
RU    Russian
SA    Sanskrit
SR    Serbo-Croatian, Cyrillic
SR    Serbo-Croatian, Roman
SK    Slovak
SL    Slovene
ES    Spanish
SV    Swedish
TH    Thai
TR    Turkish
UK    Ukrainian
UR    Urdu
VI    Vietnamese
CY    Welsh
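
The ordering rule for multiple Language tags (list the languages in the order the texts appear in the journal, but enter EN first if English is one of them) can be sketched in Python. This is only an illustration: the function name language_tags is hypothetical, and upper-casing the codes and dropping repeated codes are assumptions rather than documented requirements.

```python
def language_tags(codes_in_order_of_appearance):
    """Build <Language> tags: EN first if present, the rest in journal order."""
    ordered = []
    for code in codes_in_order_of_appearance:
        code = code.strip().upper()  # assumption: codes are written upper-case
        if code and code not in ordered:
            ordered.append(code)
    if "EN" in ordered:
        # English always leads, regardless of where its text appears.
        ordered.remove("EN")
        ordered.insert(0, "EN")
    # An empty result means no tag is emitted, so EN is assumed by default.
    return [f"<Language>{code}</Language>" for code in ordered]
```

For an article whose texts appear in French, English, and German, language_tags(["FR", "EN", "DE"]) yields the EN tag first, followed by FR and DE in their original order.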