Nucleic Acids Res. Jan 2009; 37(Database issue): D858–D862.
Published online Oct 23, 2008. doi: 10.1093/nar/gkn770
PMCID: PMC2686576

MDPD: an integrated genetic information resource for Parkinson's disease

Abstract

Parkinson's disease (PD) is the second most common neurodegenerative disorder, affecting millions of people. Both environmental and genetic factors play important roles in its causation and development. Genetic analysis has shown that over 100 genes are correlated with the etiology and pathology of PD. However, accessing this genetic information in a consistent and productive way is not an easy task. The Mutation Database for Parkinson's Disease (MDPD) is designed to meet the need for information integration so that users can easily retrieve, inspect and extend their knowledge of PD. The database contains 2391 entries on 202 genes extracted from 576 publications and manually examined by biomedical researchers. Each genetic substitution and its resulting impact are clearly labelled and linked to the primary reference. Every reported gene has a summary page that provides information on the variation impact, mutation type, studied population, mutation position and reference collection. In addition, MDPD provides a unique function that lets users compare the types of mutations found in different ethnic groups. We hope that MDPD will serve as a valuable tool to bridge the gap between genetic analysis and clinical practice. MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.

INTRODUCTION

Parkinson's disease (PD) is a progressive neurological disease that affects millions of people worldwide, including ~1.8% of the population aged over 65 years (1). The death of dopaminergic neurons in the substantia nigra is a pathological hallmark of the disease. PD is a complex disease in which both environmental and genetic factors play various roles in its causation and development. Genetic evidence comes from the observation that first-degree relatives of patients with familial PD are more vulnerable to PD than the general population, an effect that is especially significant for early-onset PD (2,3). However, reliable biomarkers or tests to facilitate early and accurate diagnosis are currently not available (4). Genetic testing, if available, could complement clinical diagnostic criteria. Several genetic mutations and variants (point substitutions, deletions, insertions or even polymorphisms) have been positively associated with PD (5,6). PARK2, LRRK2, PINK1, SNCA, UCHL1 and PARK7 are the most frequently studied genes (7). Some genetic variants are considered causal factors, while others may cause neuronal dysfunction indirectly. For example, mutations in the 5′-UTR of NR4A2 significantly decrease the expression of the NR4A2 gene and of its downstream gene, tyrosine hydroxylase (8). At present, over 100 genes have been reported to be associated with PD in various forms. Some of these genetic variants are widespread among patients, while others are ethnicity-related risk factors. LRRK2 G2019S is a common pathogenic mutation found in 5–7% of familial PD and 1–2% of sporadic PD worldwide (9,10). At the same time, this mutation shows ethnicity-specific prevalence, with an exceptionally high frequency in North African Arabs (37–42% in familial and 41% in sporadic PD) (11), but it is rare in Chinese populations (12,13). Another population-specific example involves the GBA gene: both the R496H and c.84insGGfs mutations have been found in Ashkenazi Jewish patients (14,15) and have not been reported in other ethnic groups.
Genetic screening and treatment strategies could be improved if these genetic features were well characterized. Further elucidation of such information may also lead to new diagnostic methods and early treatment.

Owing to advances in genetic technology and the polygenic nature of PD, genetic information about the disease has accumulated rapidly in the past decade. The sheer volume of data makes it daunting for individual researchers to search for and examine the information they need. For instance, a PubMed search with the keywords ‘Parkinson's disease’ AND ‘mutation’ returns over 1300 reports. The mix of gene names, official symbols and aliases used in the literature further complicates the retrieval of relevant information. In addition, although information on gene function, gene sequences, protein structure and mutation reports is searchable, it is scattered across various databases such as Entrez Gene (16), GenBank (17), Swiss-Prot (18), OMIM (19) and PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed). The Human Gene Mutation Database [HGMD; (20)], which covers mutation information for over 2800 human genes, is the most comprehensive, but its free public version is limited to less up-to-date information. Moreover, it is not built specifically for PD: a search of HGMD for PD may return mutation information for fewer than 20 genes. Thus, to obtain the desired information, researchers have to perform at least two time-consuming tasks: (i) examine a large volume of data; and (ii) query a number of different databases. A few databases specialize in PD mutation information, but they are limited in either coverage or functionality. For example, the LOVD Parkinson's disease mutation database developed by the Parkinson's Institute in Leiden University (http://www.grenada.lumc.nl/LOVD2/TPI/home.php) covers only six genes with a total of 71 variants. PDGENE (http://www.pdgene.org/) is more comprehensive and contains the most up-to-date list of PD candidate genes, with an emphasis on genetic association studies.
However, it lacks functionality for ethnic comparison and does not provide summary or statistical reports. To address some of the current limitations of PD databases and to facilitate effective and comprehensive information acquisition, we have developed the Mutation Database for PD (MDPD). Through a single online location with user-friendly interfaces, researchers can retrieve the latest information on PD, covering genetic variation (mutations and polymorphisms), population studies, literature evidence and gene sequences. Various cross-references to public databases are incorporated to assist further exploration and evaluation.

DATA SOURCE AND CLASSIFICATION

MDPD is a specialist database that presents human mutation information relevant to PD as an online resource. Data from animal models and cell lines are not included. Evidence of mutations was extracted from the PubMed database (from 1995 to 10 June 2008) using a keyword search [‘Parkinson's disease/genetics’ (MeSH)]. Sequences, variants and general gene information were obtained from the other databanks mentioned above (the latest update of MDPD was on 4 September 2008). Every mutation entry has been manually examined by a researcher specializing in genetics. Important information, such as the size of the study sample, the control group, population, age of onset, type of PD (sporadic versus familial), mutation outcome, reference sequence and possible impact of the variant, has been manually extracted from the data source.
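As a hedged illustration, the PubMed query window described above can be expressed as a request to NCBI's E-utilities esearch service. The function name build_esearch_url, the 1 January 1995 start date and the field tag are our assumptions: the paper quotes the term as ‘Parkinson's disease/genetics’, whereas the MeSH descriptor itself is ‘Parkinson Disease’. The sketch only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, mindate: str, maxdate: str, retmax: int = 100) -> str:
    """Build an esearch URL for `term`, limited to a publication-date window.

    Dates are given as YYYY/MM/DD strings.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,    # maximum number of PMIDs returned in one response
    }
    return ESEARCH + "?" + urlencode(params)


# Assumed reconstruction of the MDPD search window (1995 to 10 June 2008).
url = build_esearch_url("Parkinson Disease/genetics[MeSH Terms]", "1995/01/01", "2008/06/10")
print(url)
```

The returned PMIDs would still need the manual examination described above; the query only narrows the candidate literature.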

In addition to mutations, single nucleotide polymorphisms (SNPs) can reveal associations between genotype and susceptibility to disease (21). SNPs can fall within either coding or non-coding regions. A SNP in a non-coding region does not change the protein sequence, but it may still affect the risk of disease by altering a splice site or a transcription factor binding site. Some SNPs have protective effects, others may increase the risk of PD, and still others show no significant effect and need further research. Because we believe SNP information plays an important role both in the design of genetic tests and in understanding the mechanisms of the disease, SNPs are included in our database. On the other hand, variations that lack precise genetic locations, such as variants in approximate chromosome regions or markers in intergenic regions, are not included in MDPD. We believe genetic location is critical in mutation studies, and such approximate results need to be resolved before inclusion in the database.
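Whether a coding-region substitution changes the protein can be read directly from the standard genetic code, and the same logic underlies the silent, missense and nonsense categories used in MDPD. The sketch below is illustrative only: MDPD takes its categories from the primary references rather than recomputing them, and the function name classify_substitution, as well as the choice to report stop-loss changes as missense, are our assumptions.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_substitution(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base change at `position` (0, 1 or 2) of a reference codon.

    Returns 'silent', 'nonsense' or 'missense'; '*' marks a stop codon.
    """
    codon = codon.upper()
    alt_codon = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    # Any other change, including loss of a stop codon, is reported as missense here.
    return "missense"


print(classify_substitution("GGC", 0, "A"))  # GGC (Gly) -> AGC (Ser)
```

For example, a G>A change at the first position of a GGC (Gly) codon gives AGC (Ser) and is classified as missense, while a C>T change at the first position of CGA (Arg) gives the stop codon TGA and is classified as nonsense.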

Case–control studies and genome-wide association studies (GWAS) have shown that PD is a polygenic disease in which mutations in multiple genes contribute to the disease (7,22). Even monogenic forms of PD can result from a variety of mutations, and the mutations in a particular gene can have diverse consequences. A change in the DNA sequence may lead to a multitude of possible outcomes, including an amino acid exchange (missense mutation), no amino acid change (silent mutation), peptide truncation (nonsense mutation), absence of the protein (deletion) and even the production of a different protein (frame shift or insertion). Other types of mutation, such as duplications and compound mutations (more than one type of mutation), are also found in PD. We therefore categorize genetic variations as follows: missense mutation, silent mutation, nonsense mutation, compound mutation, deletion, insertion, duplication, triplication, frame shift, short repeat and SNP. All of these classifications are based on the primary reference as the trusted data source. We believe that such categorization helps users examine and compare the variations more efficiently. A second classification in MDPD is the variation's impact, which reflects its reported outcome. Divergent impacts are expected owing to differences in methods, sample sizes and the populations studied, so one variation may have more than one ‘Impact’. For example, the V380L substitution in the PARK2 gene is marked ‘Associated’ because it has been found in a sample of early-onset PD patients (23). It is also tagged ‘Negative Result’ on another occasion owing to the lack of a significant association with patients (24). Such unavoidable discrepancies in impact reflect the complicated nature of PD, in which multiple genetic factors play various roles and interactions between genetic and environmental factors may dynamically influence the end result.
According to the reported effects of each variation, we classify its ‘Impact’ as ‘protective factor’, ‘risk factor’, ‘associated’, ‘questionable’ or ‘negative result’. For example, if a ‘protective’ or ‘risk’ effect is mentioned in the primary reference for a variant, we assign its impact as ‘protective factor’ or ‘risk factor’, respectively. ‘Associated’ is assigned to variants showing a significant difference between patients and controls. If a variant has been reported in both patients and controls without a statistically significant difference, we label it ‘questionable’. The classification aims to help users summarize information according to the outcome of the comparison.
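The rules above can be written as a small decision function. This is a minimal sketch assuming a hypothetical Finding record; the precedence order (a stated effect first, then a significant difference, then co-occurrence in patients and controls) is our reading of the text, and because the paragraph does not state when ‘negative result’ is assigned, the sketch leaves such cases to the curator by returning None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical summary of what one primary reference reports for one variant."""
    reported_effect: Optional[str]          # "protective", "risk", or None if not stated
    significant_difference: Optional[bool]  # patients vs controls; None if not tested
    in_patients: bool
    in_controls: bool


def assign_impact(f: Finding) -> Optional[str]:
    """Map a finding to an MDPD-style 'Impact' label, or None if no rule applies."""
    # An effect stated explicitly in the primary reference takes precedence.
    if f.reported_effect == "protective":
        return "protective factor"
    if f.reported_effect == "risk":
        return "risk factor"
    # Significant difference between patients and controls.
    if f.significant_difference:
        return "associated"
    # Seen in both patients and controls with no statistically significant difference.
    if f.in_patients and f.in_controls and f.significant_difference is False:
        return "questionable"
    # Remaining cases (including 'negative result') are left to the curator.
    return None


print(assign_impact(Finding(None, False, True, True)))  # questionable
```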

DATABASE STRUCTURE AND USAGE

MDPD is designed to be a publicly accessible online resource with user-friendly interface. MySQL, a reliable and proven relational database, is used to organize information. A web-based user interface to the database is provided via an Apache 2.0 HTTP server with PHP scripting engine.
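The relational layout itself is not described here, so the following is only a hypothetical sketch of how MDPD-style records could be organized. It uses Python's built-in sqlite3 module as a stand-in for MySQL; every table and column name is an assumption, loosely following the fields listed under DATA SOURCE AND CLASSIFICATION. Storing each (variant, reference) finding as its own row lets one variant carry several impacts, as in the V380L example.

```python
import sqlite3

# Hypothetical schema; not MDPD's actual table layout.
SCHEMA = """
CREATE TABLE gene (
    symbol      TEXT PRIMARY KEY,   -- official gene symbol
    entrez_id   INTEGER,
    swissprot   TEXT
);
CREATE TABLE gene_alias (
    alias       TEXT PRIMARY KEY,
    symbol      TEXT NOT NULL REFERENCES gene(symbol)
);
CREATE TABLE variation (
    id              INTEGER PRIMARY KEY,
    symbol          TEXT NOT NULL REFERENCES gene(symbol),
    name            TEXT,     -- e.g. a protein-level change such as V380L
    variation_type  TEXT,     -- missense, deletion, SNP, ...
    impact          TEXT,     -- protective factor, risk factor, associated, ...
    population      TEXT,
    pd_type         TEXT CHECK (pd_type IN ('sporadic', 'familial')),
    age_of_onset    TEXT,
    n_cases         INTEGER,
    n_controls      INTEGER,
    pmid            INTEGER   -- primary reference
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = open_db()
conn.execute("INSERT INTO gene (symbol) VALUES (?)", ("PARK2",))
# One variant, two findings with different impacts (the V380L case from the text).
conn.executemany(
    "INSERT INTO variation (symbol, name, variation_type, impact) VALUES (?, ?, ?, ?)",
    [("PARK2", "V380L", "missense", "associated"),
     ("PARK2", "V380L", "missense", "negative result")],
)
rows = conn.execute(
    "SELECT impact, COUNT(*) FROM variation WHERE name = ? GROUP BY impact ORDER BY impact",
    ("V380L",),
).fetchall()
print(rows)
```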

MDPD contains the following functional pages: Browser, Search, Compare, Statistics and Variation Report. Three search options are available on the Search page: a variation search can be based on the gene name, the gene ID or the Swiss-Prot accession number. In accordance with the recommendations of the HUGO Gene Nomenclature Committee, official gene symbols are used in each record, but MDPD also lists all known aliases on the result pages for easy reference and retrieval. For example, both PARK1 and SNCA are valid search terms in the database. Searching for mutation information by geographic region or by author name in the reference collection are two other helpful options. Searching by geographic region lets a user quickly see which genes have been studied in that region (or population), and the number of related publications may indicate regional research effort. Searching by a reference author's name, on the other hand, can help researchers identify leading authors' research interests and their collaborators. Hyperlinks to each reference and gene symbol are provided on the report page.
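A minimal sketch of alias-aware lookup, assuming a hypothetical in-memory table that maps every accepted search key (official symbol, alias, Entrez Gene ID or Swiss-Prot accession) to the official symbol. Only SNCA is filled in, using the PARK1 alias from the text together with its Entrez Gene ID (6622) and Swiss-Prot accession (P37840); the function names and toy records are illustrative, not MDPD's code.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: every accepted search key -> official gene symbol.
SEARCH_KEYS: Dict[str, str] = {
    "SNCA": "SNCA",     # official symbol
    "PARK1": "SNCA",    # alias
    "6622": "SNCA",     # Entrez Gene ID
    "P37840": "SNCA",   # Swiss-Prot accession
}


def resolve_symbol(query: str) -> Optional[str]:
    """Normalize a gene name, alias, gene ID or accession to the official symbol."""
    return SEARCH_KEYS.get(query.strip().upper())


def search_variations(records: List[dict], query: str) -> List[dict]:
    """Return records whose official symbol matches the resolved query."""
    symbol = resolve_symbol(query)
    if symbol is None:
        return []
    return [r for r in records if r["symbol"] == symbol]


# Toy records with hypothetical variant labels.
records = [
    {"symbol": "SNCA", "variant": "example-1"},
    {"symbol": "LRRK2", "variant": "example-2"},
]
print(search_variations(records, "park1"))  # only the SNCA record
```

Resolving every key to one official symbol before querying keeps the stored records free of aliases, which is consistent with the use of official symbols in each record.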

Various useful features have been built into MDPD, spanning several web pages for ease of use and navigation. The complete list of web pages and their corresponding features is given in Table 1. We highlight the ‘Variation Report’ page here because we believe it is a particularly useful feature of MDPD. In addition to generic gene information and hyperlinks to Entrez, Swiss-Prot and OMIM, the ‘Variation Report’ page covers detailed information on ‘Variation Impact’, ‘Variation Type’, ‘Studied countries’, ‘Variation sequence’ (at both the amino acid and nucleic acid levels) and ‘PubMed collection’. For ‘Variation Impact’ and ‘Variation Type’, as mentioned previously, we do not attempt to modify the findings of the primary reference. Users are advised to judge the classification based on their own knowledge and up-to-date research. To understand the impact of a genetic substitution, the user should be aware of any conflicting results from different ethnic groups or from specific subsets of patients; such information is readily accessible in MDPD. ‘Studied countries’ provides information about the geographic regions and ethnic groups of the patients studied. This information is also valuable for refining population screening targets to avoid wasting resources. Researchers may also use it to identify key genetic factors, namely those for which impactful variations have been reported in many geographic regions (or ethnic groups). For example, MDPD includes 74 literature reports that describe 258 different variants of PARK2 from 33 geographic regions. Deletions, duplications, triplications, insertions, missense mutations, nonsense mutations, silent mutations and compound mutations were all found in this gene. Among them, missense mutations and deletions are the most frequently reported, with 140 and 86 records, respectively. Based on such information, a user can quickly make a reasonable inference about the importance of PARK2. To confirm this inference, the user can examine the primary references through ‘PubMed collection’ and investigate mutation ‘hot spots’ through ‘Variation sequence’.
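The per-gene figures quoted for PARK2 (records, distinct variants, regions, references and counts by type) are simple aggregations. The sketch below assumes hypothetical record dictionaries with symbol, variant, type, region and pmid keys and uses made-up toy data; it is not MDPD's implementation.

```python
from collections import Counter
from typing import Dict, List


def variation_report(records: List[dict], symbol: str) -> Dict[str, object]:
    """Aggregate one gene's records into summary figures for a report page."""
    rows = [r for r in records if r["symbol"] == symbol]
    return {
        "records": len(rows),
        "variants": len({r["variant"] for r in rows}),
        "regions": len({r["region"] for r in rows}),
        "references": len({r["pmid"] for r in rows}),
        # (type, count) pairs, most frequent first
        "by_type": Counter(r["type"] for r in rows).most_common(),
    }


# Toy data with hypothetical values, not MDPD content.
toy = [
    {"symbol": "PARK2", "variant": "v1", "type": "missense", "region": "A", "pmid": 1},
    {"symbol": "PARK2", "variant": "v2", "type": "missense", "region": "B", "pmid": 1},
    {"symbol": "PARK2", "variant": "v3", "type": "deletion", "region": "A", "pmid": 2},
    {"symbol": "SNCA", "variant": "v4", "type": "missense", "region": "A", "pmid": 3},
]
report = variation_report(toy, "PARK2")
print(report)
```

Counting records and distinct variants separately mirrors the text, where PARK2 has 258 variants but missense and deletion counts are given in records.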

Table 1.
Functional summary of MDPD

Allowing users to compare mutations between ethnic groups is another helpful element of MDPD. Users can readily obtain a list of mutated genes for an ethnic group of interest from ‘Search’, and the mutations of two ethnic groups of interest can be compared easily in ‘Comparison’. Currently, more than 2300 entries covering 202 human genes are stored in MDPD. Through systematic data mining of the integrated information, MDPD offers researchers new means of inspecting and making sense of the mutation evidence in published findings.
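One plausible way to implement a two-group comparison is with set operations over the variants reported for each population. This is a sketch under that assumption (MDPD may instead compare counts by mutation type); the function name, record keys, population labels and variant labels are all hypothetical placeholders.

```python
from typing import Dict, Iterable, Set


def compare_groups(records: Iterable[dict], group_a: str, group_b: str) -> Dict[str, Set[str]]:
    """Split the variants reported in two populations into unique and shared sets."""
    records = list(records)  # allow any iterable to be scanned twice
    a = {r["variant"] for r in records if r["population"] == group_a}
    b = {r["variant"] for r in records if r["population"] == group_b}
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}


# Hypothetical records; population and variant labels are placeholders.
records = [
    {"variant": "var-1", "population": "group X"},
    {"variant": "var-2", "population": "group X"},
    {"variant": "var-2", "population": "group Y"},
    {"variant": "var-3", "population": "group Y"},
]
result = compare_groups(records, "group X", "group Y")
print(result)
```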

MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.

DISCUSSION

Gene mutations and variations have become a focus of PD research in the last decade. Linkage mapping, case–control studies, pedigree analysis and GWAS are powerful approaches for identifying and characterizing genetic contributions to PD. However, how various mutations and variants affect the disease and shape its development remains unclear. In addition, each of these approaches has certain limitations. Various research biases and errors reduce the reproducibility of association findings and diminish the power to assess associations between genetic variants and the risk of common disease (25,26). To partially overcome the limitations of accumulated imperfect data, we aim to include all published studies that report precise sample sizes (both cases and controls), variation positions, variation impacts and geographic locations. We expect that combining multiple independent genetic studies could yield meaningful results. At the same time, we remind users to be cautious in interpreting deductions drawn from published data.

Many genes, and multiple variants within a gene, have been reported to correlate positively with PD. Gathering the available information is an essential first step towards understanding the disease and, eventually, developing effective strategies for diagnosis and treatment. MDPD is an integrated information system that aims to facilitate PD research. It contains records for over 100 PD-associated genes verified by various genetic tests. Among them, the 10 genes with the most reported mutations are LRRK2, PARK2, SNCA, CYP2D6, MAPT, PINK1, UCHL1, PARK7, MAOB and APOE (Table 2). The data in MDPD also reveal that current research has focused on certain key genetic targets: the top 10 genes account for 1053 entries from 326 publications, which make up 44% of the total records and more than half (57%) of the literature reports in MDPD. At the same time, we note that many of the 202 genes are somewhat under the current research radar.
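The two percentages follow from the database totals given in the Abstract (2391 entries extracted from 576 publications):

```latex
\frac{1053}{2391} \approx 0.440 \;(44\%), \qquad \frac{326}{576} \approx 0.566 \;(57\%)
```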

Table 2.
Top 10 genes with the most published references in MDPD

Another interesting outcome of MDPD is the high frequency of ‘negative result’ labels in the variation reports. For example, >30% of records are labelled ‘negative’ in 9 of the 10 most reported genes (the exception is PARK2, with 15.2% negative reports). This has at least three possible implications: (i) many variants have a low incidence in PD patients and may not be good screening targets; (ii) these variants may have an insignificant impact on PD; and (iii) the discrepancies may be caused by ethnicity-related genetic variation, sample size, the methods used or research errors. The variants with the most positive reports could be valuable genetic targets, and further studies of them may lead to breakthroughs in diagnosis and treatment.
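The >30% figure is a per-gene proportion of ‘negative result’ labels. A minimal sketch of that computation follows; the (gene, impact) input format, the function names and the toy data are hypothetical, and the gene labels are placeholders rather than MDPD genes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def negative_fraction(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per gene, the fraction of records whose impact is 'negative result'.

    `records` yields (gene_symbol, impact) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    negatives: Dict[str, int] = defaultdict(int)
    for symbol, impact in records:
        totals[symbol] += 1
        if impact == "negative result":
            negatives[symbol] += 1
    return {symbol: negatives[symbol] / totals[symbol] for symbol in totals}


def genes_above(fractions: Dict[str, float], threshold: float = 0.30) -> List[str]:
    """Genes whose negative-result fraction exceeds `threshold`, sorted by name."""
    return sorted(s for s, f in fractions.items() if f > threshold)


# Hypothetical toy data: GENE_A has 2 of 4 negative records, GENE_B 1 of 10.
toy = [("GENE_A", "negative result")] * 2 + [("GENE_A", "associated")] * 2
toy += [("GENE_B", "negative result")] + [("GENE_B", "risk factor")] * 9
fractions = negative_fraction(toy)
print(fractions, genes_above(fractions))
```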

PD is a multifactorial disease for which environmental factors and genetic elements are likely to be equally important. Studies have shown that lifestyle (such as smoking and coffee consumption), pesticide and metal exposure, and even drinking well water are factors that influence the risk of disease in both sporadic and early-onset PD (27,28,29). The involvement of multiple genes, the high incidence in the aging population and the high percentage of sporadic cases suggest the possibility of multiple interactions and connections in the etiology of PD. In many cases, it is difficult to isolate environmental factors and to specify short- and long-term exposure. We therefore did not include environmental study information in MDPD, but users should be mindful of potential interactions with environmental factors.

FUTURE WORK

Discovering the relationships between the various genetic factors is an essential step toward understanding the mechanisms of complex diseases such as PD. From MDPD, we know that at least 202 genes have been examined for their possible involvement in PD. Our future work will be to develop an information system that can assess the impact of disease-causing mutations in terms of the functional changes in their encoded proteins and their interactions. Further integrated information, such as protein–protein interactions at multiple levels and the roles of genetic variants in various neurodegenerative pathways, will hopefully provide insights that lead to novel treatments for PD.

FUNDING

Institute for Infocomm Research (I2R); Agency for Science, Technology and Research (A*STAR), Singapore.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENT

We thank Kar Leong Tew for editing the article.

REFERENCES

1. de Rijk MC, Launer LJ, Berger K, Breteler MMB, Dartigues J-F, Baldereschi M, Fratiglioni L, Lobo A, Martinez-Lage J, Trenkwalder C, et al. Prevalence of Parkinson. Neurology. 2000;54:21–23.
2. Clark LN, Ross BM, Wang Y, Mejia-Santana H, Harris J, Louis ED, Cote LJ, Andrews H, Fahn S, Waters C, et al. Mutations in the glucocerebrosidase gene are associated with early-onset Parkinson disease. Neurology. 2007;69:1270–1277.
3. Marder K, Levy G, Louis ED, Mejia-Santana H, Cote L, Andrews H, Harris J, Waters C, Ford B, Frucht S, et al. Familial aggregation of early- and late-onset Parkinson's disease. Ann. Neurol. 2003;54:507–513.
4. Chan DK. Parkinson disease and its differentials. Diagnoses made easy. Aust. Fam. Physician. 2001;30:1053–1056.
5. Dodson MW, Guo M. Pink1, Parkin, DJ-1 and mitochondrial dysfunction in Parkinson. Curr. Opin. Neurobiol. 2007;17:331–337.
6. Tan EK, Skipper LM. Pathogenic mutations in Parkinson disease. Hum. Mutat. 2007;28:641–653.
7. Farrer MJ. Genetics of Parkinson disease: paradigm shifts and future prospects. Nat. Rev. Genet. 2006;7:306–318.
8. Le WD, Xu P, Jankovic J, Jiang H, Appel SH, Smith RG, Vassilatis DK. Mutations in NR4A2 associated with familial Parkinson disease. Nat. Genet. 2003;33:85–89.
9. Gilks WP, Abou-Sleiman PM, Gandhi S, Jain S, Singleton A, Lees AJ, Shaw K, Bhatia KP, Bonifati V, Quinn NP, et al. A common LRRK2 mutation in idiopathic Parkinson's disease. Lancet. 2005;365:415–416.
10. Kachergus J, Mata IF, Hulihan M, Taylor JP, Lincoln S, Aasly J, Gibson JM, Ross OA, Lynch T, Wiley J, et al. Identification of a novel LRRK2 mutation linked to autosomal dominant parkinsonism: evidence of a common founder across European populations. Am. J. Hum. Genet. 2005;76:672–680.
11. Benamer HT, de Silva R, Siddiqui KA, Grosset DG. Parkinson. Mov. Disord. 2008;23:1205–1210.
12. Lu CS, Simons EJ, Wu-Chou YH, Fonzo AD, Chang HC, Chen RS, Weng YH, Rohé CF, Breedveld GJ, Hattori N, et al. The LRRK2 I2012T, G2019S, and I2020T mutations are rare in Taiwanese patients with sporadic Parkinson's disease. Parkinsonism Relat. Disord. 2005;11:521–522.
13. Skipper L, Li Y, Bonnard C, Pavanni R, Yih Y, Chua E, Sung WK, Tan L, Wong MC, Tan EK, et al. Comprehensive evaluation of common genetic variation within LRRK2 reveals evidence for association with sporadic Parkinson's disease. Hum. Mol. Genet. 2005;14:3549–3556.
14. Aharon-Peretz J, Rosenbaum H, Gershoni-Baruch R. Mutations in the glucocerebrosidase gene and Parkinson. N. Engl. J. Med. 2004;351:1972–1977.
15. Gan-Or Z, Giladi N, Rozovski U, Shifrin C, Rosner S, Gurevich T, Bar-Shira A, Orr-Urtreger A. Genotype-phenotype correlations between GBA mutations and Parkinson disease risk and onset. Neurology. 2008;70:2277–2283.
16. Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2007;35:26–31.
17. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2008;36:D25–D30.
18. Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A. UniProtKB/Swiss-Prot: the manually annotated section of the UniProt KnowledgeBase. Methods Mol. Biol. 2007;406:89–112.
19. McKusick VA. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 2007;80:588–604.
20. Stenson PD, Ball E, Howells K, Phillips A, Mort M, Cooper DN. Human Gene Mutation Database: towards a comprehensive central mutation database. J. Med. Genet. 2008;45:124–126.
21. Naccarati A, Pardini B, Hemminki K, Vodicka P. Sporadic colorectal cancer and individual susceptibility: a review of the association studies investigating the role of DNA repair genetic polymorphisms. Mutat. Res. 2007;635:118–145.
22. Maraganore DM, de Andrade M, Lesnick TG, Strain KJ, Farrer MJ, Rocca WA, Pant PV, Frazer KA, Cox DR, Ballinger DG. High-resolution whole-genome association study of Parkinson disease. Am. J. Hum. Genet. 2005;77:685–693.
23. West A, Periquet M, Lincoln S, Lücking CB, Nicholl D, Bonifati V, Rawal N, Gasser T, Lohmann E, Deleuze JF, et al. Complex relationship between Parkin mutations and Parkinson disease. Am. J. Med. Genet. 2002;114:584–591.
24. Oliveira SA, Scott WK, Nance MA, Watts RL, Hubble JP, Koller WC, Lyons KE, Pahwa R, Stern MB, Hiner BC, et al. Association study of Parkin gene polymorphisms with idiopathic Parkinson disease. Arch. Neurol. 2003;60:975–980.
25. Ioannidis JP, Gwinn M, Little J, Higgins JP, Bernstein JL, Boffetta P, Bondy M, Bray MS, Brenchley PE, Buffler PA, et al. A road map for efficient and reliable human genome epidemiology. Nat. Genet. 2006;38:3–5.
26. Yesupriya A, Evangelou E, Kavvoura FK, Patsopoulos NA, Clyne M, Walsh MC, Lin BK, Yu W, Gwinn M, Ioannidis JP, et al. Reporting of human genome epidemiology (HuGE) association studies: an empirical assessment. BMC Med. Res. Methodol. 2008;8:31.
27. McCulloch CC, Kay DM, Factor SA, Samii A, Nutt JG, Higgins DS, Griffith A, Roberts JW, Leis BC, Montimurro JS, et al. Exploring gene-environment interactions in Parkinson. Hum. Genet. 2008;123:257–265.
28. Elbaz A, Tranchant C. Epidemiologic studies of environmental exposures in Parkinson. J. Neurol. Sci. 2007;262:37–44.
29. Aguiar Pde C, Lessa PS, Godeiro C Jr, Barsottini O, Felício AC, Borges V, Silva SM, Saba RA, Ferraz HB, Moreira-Filho CA. Genetic and environmental findings in early-onset Parkinson. Mov. Disord. 2008;23:228–233.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
