Author Manuscript Collection

The PMC Author Manuscript Collection (“Collection”) consists of articles in author manuscript form that have been made available in PMC in compliance with the NIH Public Access Policy or similar policies of other funders. The text of manuscripts in the Collection may be downloaded in XML and plain text formats. These files are available for text mining. They may also be used consistent with the principles of applicable copyright law. The Collection encompasses all NIH manuscripts posted to PMC since July 2008.

Search Filters

Find all Author Manuscripts in the Collection:

These search filters limit a search by publication date so that it returns only author manuscripts included in the Author Manuscript Collection. For PMC, use AND ("2008/07/01"[PubDate] : "3000/12/31"[PubDate]); for PubMed, use AND ("2008/07/01"[Date - Publication] : "3000/12/31"[Date - Publication]).
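As an illustration of how these filters combine with a search, the following is a minimal Python sketch that appends the date-range clause to a query string. The two clauses are copied from the text above; the helper name add_collection_filter, the dictionary keys, and the choice to wrap the query in parentheses are assumptions made for this example and are not part of any NCBI tool.

```python
# Date-range clauses copied from the text above; the dictionary keys are our own labels.
COLLECTION_DATE_FILTERS = {
    "pmc": 'AND ("2008/07/01"[PubDate] : "3000/12/31"[PubDate])',
    "pubmed": 'AND ("2008/07/01"[Date - Publication] : "3000/12/31"[Date - Publication])',
}


def add_collection_filter(query: str, database: str) -> str:
    """Append the Collection date filter for 'pmc' or 'pubmed' (hypothetical helper)."""
    clause = COLLECTION_DATE_FILTERS[database.lower()]
    # Parenthesize the caller's query so the AND applies to all of it.
    return f"({query.strip()}) {clause}"
```

For example, add_collection_filter("asthma", "pmc") produces a string that can be pasted into the PMC search box.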

Download Methods

The Author Manuscript Collection is available for download via the FTP service and BioC API.

Download via FTP Service

The files can be accessed using PMC’s FTP service. The URL of the Collection on the FTP site is ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/.
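A minimal sketch of listing the Collection directory with Python's standard ftplib module might look like the following. The URL is the one given above; the function names are hypothetical, and the sketch assumes the server accepts an anonymous login.

```python
from ftplib import FTP
from urllib.parse import urlparse

# Collection URL as given in the text above.
COLLECTION_URL = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/manuscript/"


def split_ftp_url(url: str) -> tuple[str, str]:
    """Return (host, directory) for an ftp:// URL."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def list_collection_files(url: str = COLLECTION_URL) -> list[str]:
    """List the names in the Collection directory (requires network access)."""
    host, directory = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()         # anonymous login; assumed to be accepted by the server
        ftp.cwd(directory)  # move to the Collection directory
        return ftp.nlst()   # names of the files in that directory
```

Calling list_collection_files() returns the file names currently present on the server, which can then be fetched individually with ftp.retrbinary or any other FTP client.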

The Collection files are packaged by PMCID. For example, an author manuscript with the PMCID PMC3947720 is packaged in the file PMC003XXXXXX.xml.tar.gz. Note that these files are large (up to 4 GB).
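The mapping from a PMCID to its package can be sketched in Python as follows. The rule used here (zero-pad the numeric part of the PMCID to nine digits, keep the first three, and replace the rest with XXXXXX) is inferred from the single example above and is an assumption, not a documented specification; package_name is a hypothetical helper.

```python
import re


def package_name(pmcid: str) -> str:
    """Guess the XML package file name for a PMCID.

    Assumed rule, inferred from the PMC3947720 example: zero-pad the
    numeric part to nine digits and keep only the first three.
    """
    match = re.fullmatch(r"PMC(\d{1,9})", pmcid.strip())
    if match is None:
        raise ValueError(f"not a PMCID: {pmcid!r}")
    padded = match.group(1).zfill(9)  # "3947720" -> "003947720"
    return f"PMC{padded[:3]}XXXXXX.xml.tar.gz"
```

With this rule, package_name("PMC3947720") gives PMC003XXXXXX.xml.tar.gz, which matches the example in the text. Check the result against the actual FTP directory listing before relying on it.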

The files that contain the XML of all of the articles are named as follows:

The plain text files containing the extracted full text are:

These files are updated twice a week, on Mondays and Thursdays.

Suggested FTP client configuration

After a series of experiments using FTP clients with NCBI's FTP server, we've found that client configuration can seriously affect performance. NCBI recommends setting the TCP buffer size to 32Mb. For more information, please see https://ftp.ncbi.nlm.nih.gov/README.ftp.


Last updated: Fri, 13 Mar 2020