Copyright © 2008 The Author(s)

Gene–disease relationship discovery based on model-driven data integration and database view definition

1Laboratory for Human Genetics, Nancy Medical Faculty, rue du Morvan, 54500 Vandoeuvre-les-Nancy cedex and 2LORIA UMR7503, CNRS, INRIA, Nancy-Université, BP239, 54506 Vandoeuvre-les-Nancy cedex, France

Associate Editor: Alex Bateman

*To whom correspondence should be addressed.

Received July 1, 2008; Revised November 20, 2008; Accepted November 21, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Motivation: Computational methods are widely used to discover gene–disease relationships hidden in vast masses of available genomic and post-genomic data. In most current methods, a similarity measure is calculated between gene annotations and known disease genes or disease descriptions. However, more explicit gene–disease relationships are required for better insights into the molecular bases of diseases, especially for complex multi-gene diseases.

Results: Explicit relationships between genes and diseases are formulated as candidate gene definitions that may include intermediary genes, e.g. orthologous or interacting genes. These definitions guide data modelling in our database approach for gene–disease relationship discovery and are expressed as views which ultimately lead to the retrieval of documented sets of candidate genes. A system called ACGR (Approach for Candidate Gene Retrieval) has been implemented and tested with three case studies including a rare orphan gene disease.

Availability: The ACGR sources are freely available at http://bioinfo.loria.fr/projects/acgr/acgr-software/.
See especially the file ‘disease_description’ and the folders ‘Xcollect_scenarios’ and ‘ACGR_views’.

Contact: devignes/at/loria.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Understanding the molecular basis of a disease ultimately means correlating disease symptoms with altered gene function(s), thus highlighting gene–disease relationships. Identifying the genes responsible for human diseases is a first step towards this goal. More than 6100 disease phenotypes are described in the OMIM (Online Mendelian Inheritance in Man) database (DB). Among these phenotypes, more than 2400 have at least one known molecular basis (entries prefixed with #). Thus, about 3700 disease phenotypes described in the OMIM DB are not yet associated with any responsible gene. These disease phenotypes are particularly challenging since they include rare syndromes for which limited experimental data are available and complex multi-genic disorders involving various causative and susceptibility genes (Botstein and Risch, 2003).

Integrative genomics approaches are becoming indispensable tools for discovering new gene–disease relationships. These approaches rely on efficient exploitation of functional genomics data sources (Giallourakis et al., 2005) and take advantage of numerous computer-based systems that have been developed in the last 5 years. These systems can be classified into three main groups.

First, generalist systems predict disease genes based on their properties or interactions (Adie et al., 2005; Calvo et al., 2007; Lopez-Bigas and Ouzounis, 2004; Lopez-Bigas et al., 2006; Oti et al., 2006; Tu et al., 2006; Xu and Li, 2006). Consistent features are thus detected among approximately 1600 disease genes listed in the OMIM morbid map and used for these studies.
Indeed, disease genes tend to be longer, are composed of more exons, show a higher degree of interspecies conservation, and are involved in more interactions than other genes. However, these approaches are unable to establish the correspondence between a given disease and a set of genes. The second group of systems apply strategies relying on the hypothesis that similar diseases are most likely caused by similar genes. These strategies are often called prioritization methods since they aim to rank a given list of genes with respect to their probability to cause a disease (Adie et al., 2006; Aerts et al., 2006; Freudenberg and Propping, 2002; George et al., 2006; Perez-Iratxeta et al., 2002, 2005; Rossi et al., 2006; Turner et al., 2003). Additionally, alternative strategies based on the same similarity hypothesis aim to characterize user-defined groups of genes (Barillot et al., 2004; Chiang et al., 2006; Masseroli et al., 2004, 2005; Sun et al., 2006). In order to find additional responsible genes, prioritization methods are often applied to a single disease whose associated chromosomal loci are known. A pool of statistical methods is then used to compute similarity measures dealing with various gene features. Such gene features are particularly well covered in the endeavour system (Aerts et al., 2006), e.g. sequence similarity, domain composition, tissue expression, Gene Ontology (GO) annotation, interspecies conservation, protein–protein interactions, involved pathways and cis-regulatory elements. However, this type of prioritization strategy requires at least one well-known gene to be used as a reference candidate gene. Finally, a third group of methods gathers integrated systems that help users to formulate complex multi-criteria queries to retrieve appropriate collections of relevant genes. 
For instance, the GeneSeeker system (van Driel et al., 2005) and the GeneSorter functionality proposed by UCSC Genome Browser (Kent et al., 2005) allow experts to test various hypotheses on criteria that can link genes to diseases. An example is found in Tiffin et al. (2005), who developed a strategy to identify genes expressed in tissue affected by a disease. Hence, candidate genes are selected if their corresponding annotations with respect to a controlled vocabulary (i.e. eVOC, which is used in Ensembl EST annotation) match the disease annotation. Relevant eVOC annotations for the studied diseases were derived from PubMed abstracts using text-mining techniques.

The Approach for Candidate Gene Retrieval (ACGR) presented in this article is inspired by this last group of methods. Indeed, we propose three steps to guide the discovery of gene–disease relationships. First, several precise definitions of candidate genes are formulated. Next, these definitions are used to design a relational data model and to populate a dedicated DB with relevant data extracted from various internet resources. Finally, to retrieve sets of candidate genes, DB views that express candidate gene definitions are created. Available experimental data can be included in the disease gene definitions and thus exploited together with public annotation data. The approach presented here is tested with three case studies, including a rare orphan gene syndrome.

2 SYSTEMS AND METHODS

2.1 Explicit gene–disease relationships

The definition of a candidate gene provided by the Webster Medical Dictionary is ‘any gene thought likely to cause a disease’. This definition implies that a candidate gene is a gene which is somehow related to a disease. However, specific gene–disease relationships that exist between candidate genes and studied diseases can be articulated in more useful ways by considering information that is available in various public DBs as well as wet-lab datasets.
The most obvious relationship between candidate genes and disease, hereafter called ‘is_co-localized_with’ (denoted by l), expresses the inferred relationship between the localization of a candidate gene and a chromosomal region linked to a given disease. The principle embodied in this relationship has long guided positional cloning. The precision of disease localization on chromosomes is highly variable depending on available data. Thanks to recent techniques such as array-CGH (Shaw-Smith et al., 2004; Vermeesch et al., 2007; Vissers et al., 2005), available localization data can be refined using experimental data.

Another direct relationship is tissue or developmental co-expression of both genes and disease features. This relationship has been used in various prioritization methods (Tiffin et al., 2005). A variant of this relationship called ‘is_dysregulated_in’ (denoted by d) considers the dysregulation (over-expression or repression) of candidate genes in transcriptomic studies involving patient samples.

Functional annotation of genes is improving in most available DBs and can be connected to disease descriptions. Hence a relationship called ‘has_similar_functional_annotation_with’ (denoted by f) is defined on the basis of a similarity measure between functional annotations of a gene and a disease.

One key aspect of our approach is that the relationship between a candidate gene and a disease may also involve an intermediate gene which satisfies some relationship with the disease. Here, we explore two types of intermediate genes, namely orthologous and interacting genes. It is noteworthy that the co-localization relationship l only applies to the candidate gene itself, whereas both the dysregulation d and functional similarity f relationships apply to intermediate genes as well.
Complex definitions are then constructed in the form: ‘a candidate gene is a gene that is co-localized with the disease and is orthologous to a gene that has similar functional annotation with the disease’ and ‘a candidate gene is a gene that is co-localized with the disease and that interacts with a gene that is dysregulated in patients affected by the disease’. The former definition assumes the existence of two relationships, namely l and f, which connect the disease with the candidate gene and with one of its orthologs in a model organism, respectively. The latter definition assumes the existence of two relationships, namely l and d, which connect the disease with the candidate gene and with one of its interaction partners, respectively. Further complex definitions can be formulated similarly, such as ‘a candidate gene is a gene that is co-localized with the disease and that interacts with a gene which is in turn orthologous to a gene having similar functional annotation with the disease’. Retrieving sets of candidate genes which match such complex definitions from masses of biological data is the challenge taken up by the ACGR approach described in this article.

2.2 Relevance of functional gene–disease relationships

In order to assess the relevance of discovered gene–disease relationships, we introduce a measure quantifying the functional similarity relationship f between a gene and a disease. However, to date, no common vocabulary is available to describe functional features of both diseases and genes, which impedes any straightforward comparison of disease and gene functional annotations. Current prioritization methods quantify the functional similarity between test genes and training genes based on their GO annotations (Khatri and Draghici, 2005). Ideally the disease functional features should be described with GO vocabulary so that the similarity between gene and disease can be obtained by calculating the similarity between their GO annotations.
In practice such disease annotation is performed by an expert of the disease. This procedure for assessing the relevance of gene–disease relationship presents three main advantages. First, an initial set of training genes is no longer required. Second, available knowledge about the disease is included in disease description. Finally, the rich GO annotations that are available for genes from model organisms will be propagated to human genes thanks to candidate gene definitions involving intermediate orthologous genes.

2.3 Overall presentation of the ACGR approach

The following five steps conceptually describe the proposed in silico methodology for candidate gene retrieval. (i) Our system takes as input a functional description of a disease, established by an expert using the GO vocabulary (see Section 3.2), as well as available experimental datasets. The system then collects data from various public DBs. (ii) It first retrieves genes sharing GO annotations with the input disease from either human or model organisms. (iii) Next, relevant annotations of these genes are added, including cytogenetic localization, functional annotation, interacting genes and human orthologs of genes from model organisms. (iv) All retrieved genes are then assigned similarity values that are calculated on the basis of their annotation similarity with the input disease. (v) Finally, sets of candidate genes along with relevant annotation data are built that correspond to various candidate gene definitions.

Our system's architecture is centred on a DB which is controlled by a DataBase Management System (DBMS). There are three main features of a DBMS that make it attractive to use: centralized data management, data independence and data integration. This contrasts with conventional data processing systems in which each application program has direct access to the data it manipulates.
In a DBMS, all data are integrated thereby reducing redundancies and inconsistencies and making data management more efficient. Finally, the existence of a domain data model ensures global data coherence.

The most commonly used conceptual framework for a DBMS is the three-level architecture suggested by the ANSI/SPARC committee (ANSI/X3/SPARC, 1975). The three levels are considered as three different views on the data: (i) the external level or individual user view; (ii) the conceptual level or community user view; and (iii) the internal level or storage view. This three-level DB architecture allows a clear separation of the information meaning (conceptual view) from the physical data structure layer. A DB system that can separate these modelling levels is likely to be flexible and adaptable. The external level is a restricted view on the data, and the same DB may provide a number of different views for different categories of users or needs. In our approach, the candidate gene definitions proposed in Section 2.1 constitute external views on data collected about genes and diseases. The conceptual level determines the data model of the domain of interest, and includes all the information that will be represented in the DB. Finally, the physical model will be replaced here with the so-called ‘logical model’ (Teorey et al., 2006) because the latter is independent of any particular commercial DBMS.

3 ALGORITHM

3.1 DB design

The detailed definitions and relationships presented in Section 2.1 lead to a specification of the various types of data relevant for the retrieval of candidate genes. The resulting conceptual data model is presented in Figure 1.
Queries corresponding to any candidate gene definition (Section 2.1) can be addressed to a DB constructed according to the model shown in Figure 1.
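As a rough illustration of such a model, the tables named in Section 3.2 could be declared as follows. This is only a minimal sketch, written with Python's built-in sqlite3 module rather than the MySQL DBMS mentioned in Section 4. The table names and the ‘Author_ID’ attribute come from the text; every other column name, the key choices and the two helper functions are assumptions made for illustration, since Figure 1 is not reproduced here. The text does not name the table holding orthology links, so it is left out.

```python
import sqlite3

# Table names follow Section 3.2. Apart from Author_ID, all column names
# and keys below are hypothetical placeholders, not the authors' schema.
SCHEMA = """
CREATE TABLE Disease (
    Disease_ID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Author_ID  TEXT NOT NULL          -- distinguishes descriptions of one disease
);
CREATE TABLE GO_Term (
    GO_ID TEXT PRIMARY KEY,
    Label TEXT
);
CREATE TABLE Disease_GO_Term (
    Disease_ID INTEGER REFERENCES Disease(Disease_ID),
    GO_ID      TEXT    REFERENCES GO_Term(GO_ID),
    PRIMARY KEY (Disease_ID, GO_ID)
);
CREATE TABLE Gene (
    Gene_ID  INTEGER PRIMARY KEY,
    Symbol   TEXT,
    Species  TEXT,
    Cyto_Loc TEXT                     -- cytogenetic localization
);
CREATE TABLE Experiment (
    Experiment_ID INTEGER PRIMARY KEY,
    Disease_ID    INTEGER REFERENCES Disease(Disease_ID)
);
CREATE TABLE Gene_In_Experiment (
    Gene_ID       INTEGER REFERENCES Gene(Gene_ID),
    Experiment_ID INTEGER REFERENCES Experiment(Experiment_ID),
    Ratio         REAL,               -- dysregulation ratio
    PRIMARY KEY (Gene_ID, Experiment_ID)
);
CREATE TABLE Interaction (
    Gene_ID_A INTEGER REFERENCES Gene(Gene_ID),
    Gene_ID_B INTEGER REFERENCES Gene(Gene_ID),
    Source    TEXT,                   -- where the interaction was reported
    PRIMARY KEY (Gene_ID_A, Gene_ID_B)
);
CREATE TABLE Gene_Disease_Similarity (
    Gene_ID    INTEGER REFERENCES Gene(Gene_ID),
    Disease_ID INTEGER REFERENCES Disease(Disease_ID),
    Tool       TEXT,                  -- one tuple per gene and per ranking tool
    Similarity REAL,
    PRIMARY KEY (Gene_ID, Disease_ID, Tool)
);
"""


def create_acgr_db(path=":memory:"):
    """Create an empty DB with the sketched tables and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """List the user tables of the DB in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Calling `create_acgr_db()` without arguments builds the tables in memory, which is convenient for trying out queries before committing to a particular DBMS.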
3.2 Populating the DB

On the basis of the relational data model, it is possible to specify the initialization steps of the ACGR DB. Entering a disease description consists of inserting one row of data, hereafter called a tuple, into the ‘Disease’ table and several tuples in the ‘GO_Term’ and ‘Disease_GO_Term’ tables. To this aim, an expert of the studied disease has to carefully (i) extract from her knowledge and from OMIM the phenotypes which characterize the disease, (ii) associate keywords to these phenotypes and (iii) retrieve the most relevant GO terms corresponding to these keywords. The ‘Author_ID’ attribute is useful to distinguish different descriptions of the same disease. When available, experimental data are entered by inserting one tuple into the ‘Experiment’ table for each performed experiment, and several tuples into the ‘Gene’ and ‘Gene_In_Experiment’ tables, representing all signature genes and their dysregulation ratios. Finally, the system retrieves from public DBs all human, mouse and fly genes that are annotated by at least one GO term associated with the studied disease. Only gene identifiers are inserted into the ‘Gene’ table at the initialization stage.

The data collection process consists of first retrieving identifiers of human orthologs for mouse and fly genes and then retrieving all required annotations for all gene identifiers present in the ‘Gene’ table. In particular, interacting genes are retrieved and inserted into the ‘Interaction’ table. Identifiers for interacting genes which are not present in the ‘Gene’ table are then added and undergo their own data collection process. At this stage, however, the interaction partners of these newly added genes are not retrieved, to prevent an explosion of relationships.

The specification of data wrappers implies selecting appropriate DBs (see Section 4) and mapping the relevant fields onto the ACGR relational data model.
Specific wrappers have been designed to plug in external ranking tools for calculating functional similarity values between genes and diseases. Such wrappers will insert tuples into the ‘Gene_Disease_Similarity’ table, i.e. one tuple per gene and per ranking tool.

3.3 Building sets of candidate genes

In order to express the candidate gene definitions, views are defined in Structured Query Language (SQL) at the logical level of our conceptual framework. A view associates an SQL query with a view name leading to the creation of a virtual table. We have selected four basic definitions leading to the four views described below. The corresponding SQL queries can be found in the Supplementary Material. For the sake of readability, the datasets produced upon view execution are called ‘Datasets’.

Dataset1: genes ranked according to their functional similarity with disease description. This first view retrieves the gene symbol, species, cytogenetic localization and similarity of all ACGR DB genes, sorted by decreasing similarity value. Human, mouse and fly genes are thus collated according to their similarity with disease description. Mouse genes are often ranked better than their human orthologs because of the richer annotation in the model organism. The higher a gene is ranked in Dataset1, the stronger is the functional relationship with the disease.

Dataset2: human orthologs of model organism genes listed in Dataset1. This second view displays all features of Dataset1 for genes retrieved from model organisms (here, mouse and fly) together with the gene symbol, cytogenetic localization and similarity of their human orthologs. Good ranking of a mouse gene can pull its human ortholog to the top of Dataset2 when it was formerly at the bottom of Dataset1 because of poor GO annotation in human. This behaviour is observed, for example, for the CHD7 gene in CHARGE syndrome (see Section 5.3).

Dataset3: genes interacting with the genes listed in Dataset1.
For each gene in Dataset1, the symbol, cytogenetic localization and similarity of the genes reported as interacting with it (mostly via the gene products but other types of interactions are not excluded) are displayed. The source of information concerning these interactions is also displayed. Only intra-species interactions are listed here. Genes that display proper cytogenetic localization but poor similarity values may reveal good disease candidates because of interactions with well-ranked genes mapped elsewhere in the genome.

Dataset4: human orthologs of model organism genes listed in Dataset3. Dataset4 is intended to display candidate genes which are human orthologs of model organism genes that interact with well-ranked genes.

When experimental data are available, they can be included in each of the views described above, thereby producing four supplementary views: from Dataset1Exp to Dataset4Exp. An example of this is presented below in the case study on AICARDI syndrome.

Further queries on the basic ACGR views can then provide customized lists of candidate genes. Indeed, creating sets of annotated candidate genes as SQL views allows biologists to benefit from the numerous advantages of this powerful approach. First, writing new queries is simplified. Second, the views are automatically refreshed whenever the DB is updated. Finally, defining views contributes to the integrity and security of the DB because end-users may be given tuned privileges on views rather than on the underlying data tables.

4 IMPLEMENTATION

The technical implementation choices described in this work are not mandatory since other techniques are conceivable depending on the target deployment environment. For example, the wrappers for retrieving and integrating data from various data sources have been implemented here as scenarios of the Xcollect software (Devignes et al., 2005).
Xcollect scenarios are configured to formulate queries automatically, send them to a remote web resource, parse the returned document and store the desired data in an XML document. Capturing the date of last DB update is included in each scenario to help track data quality. The specific Xcollect scenarios used here are available in the Supplementary Material.

In this work, data sources were selected according to their updating frequencies, annotation quality and coverage. Thus, GO terms corresponding to keywords describing the disease were retrieved from AMIGO DB; genes annotated with selected GO terms, as well as all gene annotations, were retrieved from Entrez-Gene at NCBI. Symbols of orthologous genes were retrieved from Entrez-HomoloGene. The storage of the collected data in the ACGR DB was performed with the help of XSL transformations designed to convert each Xcollect session document into appropriate SQL commands.

Besides Xcollect wrappers, we developed a wrapper to invoke the GO-Family program available in the GOToolBox (Martin et al., 2004). The program was modified slightly because it must take as input a list of GO terms, rather than a list of reference gene symbols, together with the list of genes to be ranked. Briefly, the program fetches all GO terms annotating a candidate gene as well as their parent terms. It also fetches all parents of the disease-specific GO terms. Then it calculates a similarity percentage taking into account identical and non-identical terms between the set of GO terms associated with the candidate gene and the set of disease-specific GO terms.

The EasyPHP package was used for data management and user interface development. This package includes a web server (Apache), a DBMS (MySQL) and a scripting language (PHP). The corresponding programs along with a user guide are available at http://bioinfo.loria.fr/projects/acgr/acgr-software/.
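The ranking step described above (expanding both term sets with their parents, then comparing identical and non-identical terms) can be sketched as follows. The exact GO-Family formula is not given in this article, so the sketch substitutes a plain Jaccard-style percentage, i.e. shared terms divided by all terms; this choice, the function names and the toy hierarchy with made-up term names are all assumptions for illustration and do not reproduce the GOToolBox code.

```python
def with_ancestors(terms, parents):
    """Return the given terms together with all of their ancestor terms.

    `parents` maps a term to the list of its direct parent terms.
    """
    closure = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(parents.get(term, ()))
    return closure


def similarity_percent(gene_terms, disease_terms, parents):
    """Jaccard-style stand-in for the similarity percentage (an assumption).

    Both term sets are first expanded with their ancestors, then the share
    of identical terms among all terms is returned as a percentage.
    """
    gene_set = with_ancestors(gene_terms, parents)
    disease_set = with_ancestors(disease_terms, parents)
    union = gene_set | disease_set
    if not union:
        return 0.0
    return 100.0 * len(gene_set & disease_set) / len(union)


# Toy hierarchy with made-up term names: b -> a -> root, c -> root.
TOY_PARENTS = {"a": ["root"], "b": ["a"], "c": ["root"]}
```

With this toy hierarchy, a gene annotated with `b` and a disease described by `a` and `c` share two of four expanded terms, so `similarity_percent(["b"], ["a", "c"], TOY_PARENTS)` evaluates to 50.0.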
5 RESULTS AND DISCUSSION

5.1 Three case studies

The ACGR approach was initially motivated by the need to analyse results obtained for AICARDI syndrome (OMIM %304050) which is currently being investigated experimentally (Yilmaz et al., 2007). To date, no responsible gene is known for this disease. Two other rare syndromes, CHARGE (OMIM #214800) and GOLTZ (OMIM #305600), were selected from the literature. The genes responsible for these two syndromes have recently been reported (Grzeschik et al., 2007; Vissers et al., 2004; Wang et al., 2007b), but this information is not included in the annotations collected in the ACGR DB. It is therefore relevant to test the ACGR approach on these recently elucidated diseases.

5.2 Populating the DB

Table 2 shows for the three case studies the correspondence between disease phenotypes and Biological Process GO terms. Phenotypes were selected from OMIM notices regarding diagnoses. Keywords (data not shown, see Supplementary Material) were chosen to characterize each phenotype. For a given keyword, GO terms were selected at the relevant level of the GO hierarchy. A GO term is included when all its children are relevant. In the case of AICARDI syndrome, a third phenotype (infantile spasms) is frequently observed but does not correspond to any specific GO term. According to the clinicians, this phenotype is covered by the ‘Forebrain development’ GO term.
Experimental data were inserted into the DB for the AICARDI syndrome as explained in Section 3.2. These data concern 300 genes which ANOVA analysis of several transcriptomic experiments found to be dysregulated (Yilmaz, 2007). For these genes the ratio attribute was set to 1; whereas, it was set to 0 for any other gene. Table 3 summarizes the contents of the ACGR DB for the three case studies. The #GO column displays the number of GO terms specific to the disease. The #fly, #mouse and #human columns show the number of genes annotated by at least one of these GO terms for each organism. The ‘#dysregulated’ column indicates the number of experimentally determined human dysregulated genes stored in the DB. The last column gives the total number of genes after the inclusion of other orthologous and interacting genes.
5.3 Building sets of annotated candidate genes

Dataset1 to Dataset4 were constructed for each case study as described in Section 3.3 to enable queries reflecting expert hypotheses about candidate genes to be formulated. The complete tables are available as Supplementary Material. Table 4 displays the first three tuples from CHARGE Dataset2. The human CHD7 gene that is responsible for this disease (Vissers et al., 2004) appears in second position as orthologous to the mouse Chd7 gene which has a high similarity to disease description (48%). It is worth noting that the low similarity of the human CHD7 gene annotation to CHARGE GO terms (4%) relegates it to the bottom of Dataset1. Selecting human genes from chromosome 8 in CHARGE Dataset2 yields the CHD7 gene as the first-ranked candidate gene.
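The chromosome 8 selection on CHARGE Dataset2 can be sketched as a query on a view. The authors' SQL is in the Supplementary Material and is not reproduced here; the sketch below is an assumption-laden stand-in using Python's sqlite3 module. The ‘Orthology’ table, all column names and the ToyM/TOYH rows are hypothetical; only the Chd7/CHD7 similarity values (48% and 4%) and the 8q12 band are taken from the text. Sorting is applied at query time rather than inside the views, which is a simplification of the sketch.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory DB with toy rows and two ACGR-like views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Gene (Gene_ID INTEGER PRIMARY KEY, Symbol TEXT,
                       Species TEXT, Cyto_Loc TEXT);
    CREATE TABLE Gene_Disease_Similarity (Gene_ID INTEGER, Similarity REAL);
    -- Hypothetical table: the text does not say how orthology is stored.
    CREATE TABLE Orthology (Model_Gene_ID INTEGER, Human_Gene_ID INTEGER);

    -- Dataset1: every gene with its similarity to the disease description.
    CREATE VIEW Dataset1 AS
    SELECT g.Gene_ID, g.Symbol, g.Species, g.Cyto_Loc, s.Similarity
    FROM Gene g JOIN Gene_Disease_Similarity s ON s.Gene_ID = g.Gene_ID;

    -- Dataset2: model organism genes of Dataset1 with their human orthologs.
    CREATE VIEW Dataset2 AS
    SELECT m.Symbol AS Model_Symbol, m.Similarity AS Model_Similarity,
           h.Symbol AS Human_Symbol, h.Cyto_Loc AS Human_Cyto_Loc,
           h.Similarity AS Human_Similarity
    FROM Dataset1 m
    JOIN Orthology o ON o.Model_Gene_ID = m.Gene_ID
    JOIN Dataset1 h ON h.Gene_ID = o.Human_Gene_ID
    WHERE m.Species <> 'human';
    """)
    conn.executemany("INSERT INTO Gene VALUES (?, ?, ?, ?)", [
        (1, "Chd7", "mouse", None),
        (2, "CHD7", "human", "8q12"),
        (3, "ToyM", "mouse", None),      # made-up gene pair for contrast
        (4, "TOYH", "human", "2p13"),
    ])
    conn.executemany("INSERT INTO Gene_Disease_Similarity VALUES (?, ?)",
                     [(1, 48), (2, 4), (3, 60), (4, 10)])
    conn.executemany("INSERT INTO Orthology VALUES (?, ?)", [(1, 2), (3, 4)])
    return conn


def human_candidates_on_chromosome(conn, chromosome):
    """Query Dataset2 for human orthologs located on one chromosome.

    Rows are sorted by decreasing similarity of the model organism gene.
    """
    query = """
        SELECT Human_Symbol, Model_Symbol, Model_Similarity, Human_Similarity
        FROM Dataset2
        WHERE Human_Cyto_Loc LIKE ? OR Human_Cyto_Loc LIKE ?
        ORDER BY Model_Similarity DESC
    """
    # Match both chromosome arms so that '8' does not also match '18q...'.
    return conn.execute(query, (chromosome + "p%", chromosome + "q%")).fetchall()
```

In this toy DB, CHD7 comes second in the unfiltered Dataset2 because the made-up ToyM gene scores higher, and it becomes the only, hence first, row once the query is restricted to chromosome 8.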
The CHARGE case study shows that the ACGR approach would have been able to designate the CHD7 gene as the best candidate gene in the group of nine genes identified by the authors at 8q12 thus prioritizing its sequencing. It is worth noting that although the association of CHD7 with CHARGE syndrome was established 3 years ago, the GO annotation of this gene does not reflect this association. Table 5 shows the first six tuples from GOLTZ Dataset4. Despite its low similarity to disease description (7%), the responsible human PORCN gene appears at the fifth position in GOLTZ Dataset4 that contains 51 lines and as the first candidate gene located on chromosome X. This is due to the fact that the mouse Porcn gene is reported as interacting with the mouse Wnt7a gene which has good similarity to the disease description. Hence the ACGR approach could have pointed to the PORCN gene even before the localization refinement of the disease provided by the CGH array experiment (Grzeschik et al., 2007; Wang et al., 2007b).
In the case of AICARDI syndrome, Dataset1Exp to Dataset4Exp were produced including transcriptomic data. A first query on Dataset1Exp retrieved 71 genes located on human chromosome X. Table 6 displays the first four genes of this list. The best-ranked gene, PLXNA3, seems to be an interesting candidate. Its annotation is rather similar to the AICARDI GO terms (56%). However, to date, it has not been associated with any human disease. The next two genes, ARX and SOX3, are responsible for MRX54 (OMIM #300419) and MRGH (OMIM #300123), respectively, two diseases involving mental retardation. The following gene, DCX, is a good internal control since it is responsible for X-linked lissencephaly (LISX, OMIM #300067), a disease which, like AICARDI syndrome, involves agenesis of the corpus callosum and multiple heterotopia.
Further queries were applied to AICARDI Dataset3Exp to explore possible interactions between dysregulated genes and candidate genes. Table 7 shows four candidate genes (‘Interac_Symbol’ column) from Dataset3Exp, located on chromosome X and interacting with the four best-ranked dysregulated genes (‘Symbol’ column). The MAGED1 gene interacts with the DLX5 gene which is dysregulated in our transcriptomic experiments and its GO annotation displays 50% similarity with the AICARDI-specific GO terms. The interaction between these two gene products is based on in vivo experiments (Masuda et al., 2001).
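The definition behind this query, ‘a candidate gene is a gene that is co-localized with the disease and that interacts with a gene that is dysregulated in patients’, can also be written as a small filter. The sketch below is an illustration in plain Python, not the SQL view used by ACGR; the function names and the two TOY genes are hypothetical, while MAGED1, its Xp11.23 band and its interaction with the dysregulated DLX5 gene are taken from the text.

```python
def on_chromosome(band, chromosome):
    """True if a cytogenetic band such as 'Xp11.23' lies on the chromosome."""
    if not band.startswith(chromosome):
        return False
    # The chromosome name must be followed by an arm ('p' or 'q') or nothing,
    # so that chromosome '1' does not match a band on chromosome 11.
    return band[len(chromosome):len(chromosome) + 1] in ("p", "q", "")


def candidates_by_interaction(genes, interactions, dysregulated, chromosome):
    """Return genes on the chromosome that interact with a dysregulated gene.

    genes: dict mapping gene symbol to cytogenetic band
    interactions: iterable of (symbol_a, symbol_b) pairs, treated as symmetric
    dysregulated: set of symbols found dysregulated in patient samples
    """
    partners = {}
    for a, b in interactions:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    result = {}
    for symbol, band in genes.items():
        if on_chromosome(band, chromosome):
            hits = partners.get(symbol, set()) & dysregulated
            if hits:
                result[symbol] = sorted(hits)
    return result


# Toy input: TOY1 and TOY2 are made-up genes added for contrast.
GENES = {"MAGED1": "Xp11.23", "TOY1": "Xq28", "TOY2": "11q13"}
INTERACTIONS = [("MAGED1", "DLX5"), ("TOY2", "DLX5")]
DYSREGULATED = {"DLX5"}
```

Here `candidates_by_interaction(GENES, INTERACTIONS, DYSREGULATED, "X")` keeps only MAGED1: TOY1 is on chromosome X but has no dysregulated partner, and TOY2 has such a partner but lies on another chromosome.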
5.4 Discussion

Overall, the ACGR approach has received enthusiastic feedback from experimentalists. Indeed, the experiments conducted yielded very satisfying results in the CHARGE and GOLTZ case studies. We have shown that in both cases the gene responsible for the disease is found at the first rank when chromosome localization is taken into account. Thus, the ACGR approach would have been useful at the time of the discovery of these responsible genes to avoid unnecessary sequencing.

In the case of AICARDI syndrome, the ACGR approach provided several meaningful and promising candidate genes that are currently being analysed further. For instance, the MAGED1 gene displays several features associated with disease genes (Tu et al., 2006). It is a 99.3 kb long gene due to a large intron (91 kb) separating the first exon from the 12 other exons that are grouped over the remaining 8 kb. Interestingly, two of the retrieved candidate genes (MAGED1 and UBQLN2) are located in the same cytogenetic band (Xp11.23), which is known to be correlated with several neuro-psychiatric disorders. It should be noted that for this disease, the small number of recruited patients hampers the application of purely experimental protocols.

In addition to the presented case studies, ongoing investigations indicate that the approach presented here may facilitate future endeavours to identify susceptibility genes for complex diseases. The robustness and flexibility of our approach make it possible to explore various alternative strategies, including varying the ranking procedure and the selection of primary data sources. For example, data about interaction networks could be retrieved from the protein complexes curated by Lage et al. (2007). The GO-Family algorithm used for gene ranking in this study could be replaced by any other similarity measurement between GO terms (Lord et al., 2003; Wang et al., 2007a; Zhang et al., 2006).
The similarity between eVOC terms annotating both gene expression and affected tissues could be used to assess ‘is_co-expressed’ relationships (Tiffin et al., 2005), for example. A possible limitation of the current work may be the low number of case studies analysed. Since an expert of each studied disease has to be involved in the first step of the approach, this clearly hampers automated large-scale evaluation. Moreover, it should be stressed that success in retrieving at a good rank the gene responsible for a disease strongly depends on both user's expertise and the quality of available data. Nevertheless, the results presented here clearly demonstrate the explicit querying capabilities of the ACGR system and the originality of this approach for providing explanations on why a certain gene is related to a disease. [Supplementary Data]
ACKNOWLEDGEMENTS

We thank Sylvain Lambermont for his contribution at an early stage of the work, Dr Leheup for helping to select disease-specific GO terms, and Amine Rouhane-Hacène and Dave Ritchie for careful reading of the manuscript. S.Y. was supported by the AAL (Amis d'Anne-Lorène) association and Région Lorraine.

Funding: Contrat de Plan Etat-Région Lorraine (PRST Intelligence Logicielle).

Conflict of Interest: none declared.