The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits. The advent of high-throughput, cost-effective methods for genotyping and sequencing has provided powerful tools that allow for the generation of the massive amount of genotypic data required to make these analyses possible.
dbGaP provides two levels of access, open and controlled, to allow broad release of non-sensitive data while providing oversight and investigator accountability for sensitive data sets involving personal health information. Summaries of studies and the contents of measured variables, as well as original study document text, are generally available to the public, while access to individual-level data, including phenotypic data tables and genotypes, requires varying levels of authorization. More complete descriptions of the dbGaP system are available in PubMed Central and the NCBI Bookshelf.
The data in dbGaP will be pre-competitive, and will not be protected by intellectual property patents. Investigators who agree to the terms of dbGaP data use may not restrict other investigators' use of primary dbGaP data by filing intellectual property patents on it. However, the use of primary data from dbGaP to develop commercial products and tests to meet public health needs is encouraged.
Submitters who are not federally funded and affiliated with an NIH IC will need to work with an NIH DAC so that the proposed submission can be reviewed for consistency with appropriate policies to protect the privacy of research participants and the confidentiality of their data. Submissions to dbGaP will not be accepted without assurance that the submitting institution approves the submission and has verified that the data submission is consistent with all applicable laws and regulations, as well as institutional policies. Submitters must also identify any data use limitations that are specifically set for individual research participants (e.g., through their informed consent). Please see the NIH data sharing policy website for more details.
Open-access data can be browsed online or downloaded from dbGaP without prior permission or authorization. These data will include, but may not be limited to, the following:
|dbGaP Data Type||Where to Find It|
|Studies||'Study' column when browsing studies|
| ||Result of a search under the tab 'Studies'|
| ||Part of the breadcrumb path of a variable or document|
|Study Documents||Link from 'Browse Studies'|
| ||Link under 'Associated Documents' on study report|
| ||Result of a search under the tab 'Study Documents'|
|Phenotypic Variables||Link under 'Browse Studies'|
| ||Link under 'Associated Variables' on study report|
| ||Result of a search under the tab 'Variables'|
|Genotype-Phenotype Analyses||Link under 'Associated Analyses' on variable report|
| ||Link under 'Associated Analyses' on study report|
Please note that this is a general description of what is available to open-access users. Data available to open-access users may vary between studies and may also differ from what is described here without notice. You can find more details regarding data access policies for specific studies on the individual study report pages.
Controlled-access data can be obtained only if a user has been authorized by the appropriate Data Access Committee (DAC). Information on requesting controlled data access is available below. Data available to authorized investigators may include the following:
Since data access policies are determined on a per-study basis, data available to users with controlled access authorization may vary between studies and may also change from what is described here without notice. You can find more details regarding data access policies for a specific study on the study report page along with a link to the appropriate authorizing body.
Access to controlled data in dbGaP will be granted by an NIH Data Access Committee or DAC. Users wishing access to controlled data must submit a Data Use Certification, or DUC, to the appropriate NIH DAC for approval. DAC approval for controlled data access will be dependent upon completion of the DUC, and confirmation that the proposed research use is consistent with patient consent forms and any constraints identified by the institutions that submitted the dataset(s) to dbGaP. Links to a study's DAC will be found on the study report page. Consult this instructional video to see how to make a request.
Submitters of controlled-access data housed in dbGaP may retain the exclusive right to publish an analysis of their submitted data for a specified period of time. Users of controlled-access data should consult the DUC or the study report page to determine the specific publishing exclusivity period for that study.
Access to individual-level data housed in dbGaP is under the jurisdiction of the sponsoring institute. Therefore, any questions regarding access to controlled data should be directed to the DAC for the study in question, and not to NCBI.
Prior to submitting data to dbGaP, a study must be registered by an NIH Genomic Program Administrator (GPA). Submitters should consult this high-level overview of the study registration and data submission process. In addition to individual-level phenotype and genotype data, dbGaP requires the submission of sufficient metadata to enable NCBI to provide a browsable interface for a study.
The following should be included in submissions to dbGaP:
Please note that these are general submission requirements. Since data submission policies are still being developed by participating studies/institutions, submission requirements may vary between studies and may also change from what is described here without notice.
As dbGaP is an NCBI data distribution service, the control and management of the data housed in dbGaP is under the jurisdiction of the sponsoring institute or study; therefore, any questions regarding submission requirements or other data issues should be directed to the DAC for the study in question.
Data Access Committee (DAC): Data Access Committees are established based on programmatic areas of interest as well as technical and ethical expertise. All DACs will operate through common principles and under similar mechanisms to ensure the consistency and transparency of the controlled-data access process.
Data Use Certification (DUC): A Data Use Certification is the application a user submits to a particular study's Data Access Committee (DAC) for consideration for authorized use of controlled dbGaP data. The Data Use Certification should include a list of the controlled data set(s) required by the user and a brief description of the proposed research use of the requested data. The user must also offer the following assurances in the Data Use Certification that:
Finally, the completed DUC must be co-signed by a designated official representing the institution for which the applicant works.
Please note that this is a general description of the DUC. Since data access policies are still being developed by participating studies/institutions, controlled-access requirements, and hence DUC requirements, may vary between studies and may also change from what is described here without notice. Additional details regarding controlled-access requirements for a specific study will be provided on the study report page.