
Please note that current APIs will be replaced in PubTator3 (see details here) on May 1, 2024.

Export Annotations

Export our annotated publications in batches of up to 100 per GET request or 1000 per POST request, in BioC, pubtator or JSON format.
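The batch limits above can be respected on the client side by splitting a long identifier list before any request is made. The following is a minimal sketch; `batch_ids` and the two constants are hypothetical names chosen for illustration, not part of PubTator.

```python
from typing import Iterator, List, Sequence

# Batch limits quoted in the text above.
MAX_IDS_PER_GET = 100
MAX_IDS_PER_POST = 1000


def batch_ids(ids: Sequence[str], method: str = "GET") -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` no larger than the limit for `method`."""
    limits = {"GET": MAX_IDS_PER_GET, "POST": MAX_IDS_PER_POST}
    size = limits[method.upper()]  # KeyError for anything other than GET/POST
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

Each yielded batch can then be sent as one export request.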

In order not to overload the PubTator server, we require that users send no more than three requests per second.
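One way to stay within the three-requests-per-second limit is a small sliding-window throttle that is called before every request. This is only a sketch under that assumption; the class name `RequestThrottle` and its parameters are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestThrottle:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls: int = 3, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # times of recent requests

    def wait(self) -> None:
        """Block until another request may be sent, then record it."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Calling `throttle.wait()` immediately before each HTTP request keeps a single-threaded client at or below the stated rate.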

To programmatically retrieve text-mined results in PubTator, one can use web queries as follows:

https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export/[Format]?[Type]=[Identifiers]&concepts=[Bioconcepts]

Parameter     Description

Format        One of:
                • pubtator (pubtator)
                • biocxml (BioC-XML)
                • biocjson (BioC-JSON)
              Click here to see our format descriptions

Type          pmids (for abstracts) or pmcids (for full texts)

Identifiers   A single pmid (or pmcid) or a comma-delimited list (e.g. pmids=23819905,23819906).
              Please note that pmcids can only be used to retrieve publications in biocxml or biocjson formats.

Bioconcepts   Optional comma-delimited list of the bioconcept types to include in the results, one or more of: gene, disease, chemical, species, mutation and/or cellline. If this parameter is not present, the results will contain all six bioconcepts.
              This parameter is only compatible with pubtator and biocxml formats.
              Setting concepts=none will hide all annotations.

Some examples
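As an illustration, the query template can be assembled in Python. This is a minimal sketch: `build_export_url` and its constants are hypothetical names, not part of PubTator, and the checks simply restate the parameter notes above. The base URL is copied from the template and may change once the current APIs are replaced.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

# Base URL copied from the query template above.
EXPORT_BASE = "https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/export"

FORMATS = {"pubtator", "biocxml", "biocjson"}
ID_TYPES = {"pmids", "pmcids"}
CONCEPTS = {"gene", "disease", "chemical", "species", "mutation", "cellline"}


def build_export_url(fmt: str, id_type: str, identifiers: Iterable[str],
                     concepts: Optional[Iterable[str]] = None) -> str:
    """Build an export URL, rejecting combinations the text rules out."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if id_type not in ID_TYPES:
        raise ValueError(f"unknown identifier type: {id_type}")
    if id_type == "pmcids" and fmt == "pubtator":
        raise ValueError("pmcids require the biocxml or biocjson format")
    params = {id_type: ",".join(str(i) for i in identifiers)}
    if concepts is not None:
        chosen = list(concepts)
        if fmt == "biocjson":
            raise ValueError("concepts only works with pubtator and biocxml")
        if chosen != ["none"] and not set(chosen) <= CONCEPTS:
            raise ValueError(f"unknown bioconcept in {chosen}")
        params["concepts"] = ",".join(chosen)
    # safe="," keeps the comma-delimited lists readable in the query string.
    return f"{EXPORT_BASE}/{fmt}?{urlencode(params, safe=',')}"


# Abstracts for the two PMIDs used in the parameter description, genes only.
print(build_export_url("pubtator", "pmids", ["23819905", "23819906"], ["gene"]))
```

The printed URL ends in `pubtator?pmids=23819905,23819906&concepts=gene`.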

Process Raw Text

Run our state-of-the-art NER tools on your own texts in BioC, pubtator or JSON formats.

First, submit a request with the file to annotate:

curl -X POST --data-binary @[Inputfile] https://www.ncbi.nlm.nih.gov/research/pubtator-api/annotations/annotate/submit/[Bioconcept]

A session number will be returned to you in the format of XXXX-XXXX-XXXX-XXXX.

For example:

  • curl -X POST --data-binary @examples/ex.PubTator https://www.ncbi.nlm.nih.gov/research/pubtator-api/annotations/annotate/submit/Gene
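The same submission can be sketched with Python's standard library. The function names and the session-number pattern are assumptions made for illustration: the text only gives the shape XXXX-XXXX-XXXX-XXXX, so the pattern accepts any four groups of four letters or digits. The submit URL is copied from the curl command above.

```python
import re
import urllib.request

# Submit URL copied from the curl command above.
SUBMIT_BASE = ("https://www.ncbi.nlm.nih.gov/research/pubtator-api/"
               "annotations/annotate/submit")

# Assumption: each X in XXXX-XXXX-XXXX-XXXX is a letter or a digit.
SESSION_PATTERN = re.compile(r"^[0-9A-Za-z]{4}(-[0-9A-Za-z]{4}){3}$")


def build_submit_request(payload: bytes, bioconcept: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(f"{SUBMIT_BASE}/{bioconcept}",
                                  data=payload, method="POST")


def submit(payload: bytes, bioconcept: str, timeout: float = 30.0) -> str:
    """Send the file contents and return the session number from the response."""
    request = build_submit_request(payload, bioconcept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        session = response.read().decode("utf-8").strip()
    if not SESSION_PATTERN.match(session):
        raise ValueError(f"unexpected response: {session!r}")
    return session
```

A call such as `submit(open("examples/ex.PubTator", "rb").read(), "Gene")` would mirror the curl example, assuming that file exists locally.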

Then retrieve the annotated file by submitting the session number for your job:

curl https://www.ncbi.nlm.nih.gov/research/pubtator-api/annotations/annotate/retrieve/[SessionNumber]

[SessionNumber] is the number previously returned by your submitted request, e.g. 1441-7295-7121-9907.

If the result is not yet ready when this request is submitted, the system returns the warning message "[Warning] : The Result is not ready" with a 404 (Not Found) HTTP status code.

For example:

  • curl https://www.ncbi.nlm.nih.gov/research/pubtator-api/annotations/annotate/retrieve/1441-7295-7121-9907
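Because a 404 response only means that the result is not ready yet, a client usually has to poll. The loop below is a hedged sketch rather than an official client: `fetch_once`, `wait_for_result`, the ten-second interval and the attempt limit are all hypothetical choices, and a 404 is treated as "not ready" even though a mistyped session number might well produce the same status.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Retrieve URL copied from the curl command above.
RETRIEVE_BASE = ("https://www.ncbi.nlm.nih.gov/research/pubtator-api/"
                 "annotations/annotate/retrieve")


def fetch_once(session: str, timeout: float = 30.0) -> Optional[bytes]:
    """Return the annotated file, or None while the server answers 404."""
    try:
        with urllib.request.urlopen(f"{RETRIEVE_BASE}/{session}",
                                    timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # "[Warning] : The Result is not ready"
            return None
        raise


def wait_for_result(session: str,
                    fetch: Callable[[str], Optional[bytes]] = fetch_once,
                    interval: float = 10.0, max_attempts: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Poll until the result is available or the attempts run out."""
    for attempt in range(max_attempts):
        result = fetch(session)
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # pause between polls to spare the server
    raise TimeoutError(f"no result for session {session}")
```

For instance, `wait_for_result("1441-7295-7121-9907")` would keep requesting the session shown above until the annotated file is returned or the attempt limit is reached.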