Copyright © 2004 by The American Society of Human Genetics. All rights reserved.

Genetic Inheritance of Gene Expression in Human Cell Lines

1Department of Statistics, Oklahoma State University, Stillwater, OK; Departments of 2Biostatistics and 3Pharmacology, University of Washington, and 4Rosetta Inpharmatics LLC, Seattle; 5Department of Epidemiology, Johns Hopkins University, Baltimore; and 6Merck Research Laboratories, Merck & Co., Rahway, NJ

Address for correspondence and reprints: Dr. Stephanie A. Monks, Department of Statistics, 301G Mathematics, Statistics, and Computer Science Building, Oklahoma State University, Stillwater, OK 74078-1056. E-mail: stephanie.monks/at/okstate.edu

Received July 9, 2004; Accepted October 1, 2004.

Abstract

Combining genetic inheritance information, for both molecular profiles and complex traits, is a promising strategy not only for detecting quantitative trait loci (QTLs) for complex traits but for understanding which genes, pathways, and biological processes are also under the influence of a given QTL. As a primary step in determining the feasibility of such an approach in humans, we present the largest survey to date, to our knowledge, of the heritability of gene-expression traits in segregating human populations. In particular, we measured expression for 23,499 genes in lymphoblastoid cell lines for members of 15 Centre d'Etude du Polymorphisme Humain (CEPH) families. Of the total set of genes, 2,430 were found to be expressed, of which 31% had significant heritability when a false-discovery rate of 0.05 was used. QTLs were detected for 33 genes on the basis of at least one P value <.000005. Of these, 13 genes possessed a QTL within 5 Mb of their physical location. Hierarchical clustering was performed on the basis of both Pearson correlation of gene expression and genetic correlation.
Both reflected biologically relevant activity taking place in the lymphoblastoid cell lines, with greater coherency represented in Kyoto Encyclopedia of Genes and Genomes database (KEGG) pathways than in Gene Ontology database pathways. However, more pathway coherence was observed in KEGG pathways when clustering was based on genetic correlation than when clustering was based on Pearson correlation. As more expression data in segregating populations are generated, viewing clusters or networks based on genetic correlation measures and shared QTLs will offer potentially novel insights into the relationship among genes that may underlie complex traits. Introduction In 1980, Botstein et al. proposed that sequence differences be treated as markers, in order to map genes involved in inherited traits. Since that time, the number of genes mapped to positions in the human genome has grown exponentially. Mapping these genes for inherited traits has been extremely successful for simple Mendelian diseases; however, finding such genes for diseases—and their associated risk traits—that are of large public health interest has proven difficult. Reasons for this difficulty include disease heterogeneity (disease subtypes with some or no overlapping genetic causes), misclassification (from using discrete classifications of disease from thresholds and combinations of thresholds), and unaccounted-for environmental influences. With the advent of technology to measure changes in molecular profiles—for example, changes in mRNA transcript abundance, protein levels, and metabolite levels—it should be possible to unravel some of the complexity of these complex diseases. In particular, gene expression can be viewed as a more refined phenotype, since it is a measure of phenotypic variation at the molecular level. In addition, each gene-expression phenotype provides annotation, pathway, and genome location data. 
Combining these data with genetic-inheritance information, for both molecular profiles and complex traits, is a promising strategy not only for detecting QTLs for complex traits but for understanding which genes, pathways, and biological processes are also under the influence of a given QTL. Jansen and Nap (2001) were among the first to suggest the use of expression profiles in segregating populations. They discussed the power of using well-developed methods and designs available for dissecting quantitative traits along with the rapidly expanding collection of methods for large-scale sets of phenotypes. They provided an illustration that combined linkage data from a set of genes with known genomic locations, to construct a putative pathway. Jin et al. (2001) studied the contributions of sex, genotype, and age on transcription in Drosophila melanogaster through a study of two inbred lines of Drosophila. They observed a large sex effect on expression and less of an effect due to genotype and age, although there was evidence for sex-by-genotype interactions. Brem et al. (2002) and Yvert et al. (2003) provided an in-depth exploration into the genetics of gene expression in yeast. These studies indicated significant control of gene expression by genetic variation with both cis- and trans-acting mechanisms. In addition, support was provided for linkage “hotspots” that controlled large sets of functionally related genes by a single QTL. Cowles et al. (2002) explored the role of cis-acting QTLs in mice and found evidence for regulatory variation, some of which was tissue specific. Schadt et al. (2003) provided a survey of the genetics of gene expression in maize, mice, and humans. This study further supported significant genetic control of gene expression in both cis- and trans-acting regulatory mechanisms. In addition, gene expression was utilized to subphenotype mice such that the underlying genetics for each subtype could be dissected. 
Data were also provided to support heritable influences on gene expression in human lymphoblastoid cell lines. Yan et al. (2002) took one of the initial steps for extending these studies to include humans. Their technique compared two alleles of the same gene within the same cellular sample, to identify differences in expression between the two alleles. Thirteen genes were studied in 96 individuals. Of the 13 genes, 6 showed differences in expression due to allelic variation. In addition, they presented three families with expression levels segregating according to allelic variation. Another study in lymphoblastoid cell lines further established familial aggregation of gene expression and related functional classification to expression (Cheung et al. 2003). The goal of the present article is to move beyond family aggregation, or heritability, and to more fully explore the genetic component of gene expression in humans through the use of lymphoblastoid cell lines in a sample of CEPH families (Dausset et al. 1990). Our study includes (1) determination of expressed genes, (2) estimation of the heritability of gene expression, (3) linkage analysis to establish oligogenic effects, and (4) characterization of cis and trans effects of detected QTLs. In addition, gene annotation will be studied in the context of each of the steps above, and we provide an example establishing the extra information, with regard to biological pathways, that is obtained by considering shared genetic influences. This study presents the largest survey to date, to our knowledge, of the heritability of gene-expression traits in segregating human populations. Material and Methods Families Fifteen families from the CEPH/Utah family collection were selected for profiling. The family identifiers were 1334, 1340, 1345, 1346, 1349, 1350, 1358, 1362, 1375, 1377, 1408, 1418, 1421, 1424, and 1477. 
These families were selected because of the availability of genotypes and lymphoblastoid cell lines for all three generations and because of their large numbers of children. In total, the families represent 210 individuals. Of these, 167 individuals provided adequate quantity and quality of RNA for expression profiling.

Tissue Growth, Processing, and Profiling

Lymphoblastoid cell lines were obtained from Coriell Repositories and propagated. All cell lines were grown in media and supplements purchased from the Invitrogen Corporation. The culture media consisted of RPMI supplemented with 15% fetal bovine serum, 1% penicillin/streptomycin, and 0.5% sodium pyruvate. To minimize variability between experiments, all fetal bovine serum used was from lot number 10082147 1129480. The cell lines were grown at 37°C in humidified incubators, in an atmosphere of 5% CO2. Experiment series were set up by seeding 25-ml cultures in T25 flasks at a density of 2.5 × 10^5 cells/ml. Each culture was grown for 48 h or until the cell density was at least 780,000 cells/ml. To harvest the cells, the cultures were centrifuged, the media was decanted, and 500 μl of guanidine isothiocyanate cell lysis buffer (Buffer RLT, Qiagen) was added. Cell lysates were then transferred to 96-well block format and stored at −80°C.

Total RNA was isolated using RNeasy 96 kits (Qiagen) with the following protocol modifications. Harvesting of cells was performed in 500 μl, instead of in the 150 μl specified by the protocol. To eliminate DNA contamination, the appended DNase protocol was used in concert with the isolation protocol. DNase was added to the membrane after the first 350-μl RW1 wash (guanidinium thiocyanate and ethanol) and was allowed to sit on an RNeasy membrane for 30 min. An additional 350-μl RW1 buffer wash and an additional 500-μl RPE buffer wash were performed.
To quantitate and perform quality control on the experiments, the A260/A280 ratio was taken through use of a Spectramax spectrophotometer (Molecular Devices). Samples whose A260/A280 ratio deviated ±0.2 from the accepted ratio value of 2.0 were excluded. Formaldehyde gels (1.2%) were run on each sample to ensure that ribosomal RNA bands were intact and that significant degradation had not occurred. Samples that met the minimal mass requirement of 13 μg (for two replicates) and whose ribosomal bands were visible in the QC gel were transferred from the 96-well block and aliquoted into microcentrifuge tubes by use of a Multiprobe II EX (Packard BioScience Company). For samples of individuals that were to be used in the pool, 46 μg of RNA was allocated by use of the same procedure. In total, 167 individuals in 15 pedigrees provided adequate quantity and quality of RNA for expression profiling. The microcentrifuge tubes were vacuum dried and stored at −80°C before processing. Dried total RNA samples were reconstituted, and 3 μg of total RNA was used from each sample for subsequent RT-PCR–in vitro transcription amplification using the T7 promoter, which produced allyl-UTP–labeled single-stranded complementary RNA (sscRNA) (Hughes et al. 2001). Amplified cRNA was purified using the RNeasy purification kit (Qiagen) and was coupled with either cy3 or cy5 (Hughes et al. 2001). Purified cy3/cy5-labeled cRNA was fragmented using a ZnOAc/EDTA addition and was hybridized to at least two DNA microarray slides with fluor reversal for 24 h in a hybridization chamber, washed, and scanned using a laser confocal scanner (Hughes et al. 2001). Arrays were quantified on the basis of the intensity of each spot relative to background, by use of the Qhyb program (Rosetta Inpharmatics) (Marton et al. 1998). Expression profiling of lymphoblastoid cell lines was performed using a 25K human gene oligonucleotide microarray. 
All individuals were compared with a common pool created from equal portions of RNA from all samples that passed quality control and were from founders within the 15 pedigrees (Gene Expression Omnibus Web site). Sequences for the microarray were selected from the RefSeq database (NCBI Reference Sequence Web site; see the Electronic-Database Information section for genes and accession numbers) and EST contigs (van’t Veer et al. 2002).

Genotype Data and Genetic Maps

Genotype data for 346 autosomal genetic markers for 210 of the pedigree members were obtained from the CEPH genotype database, version 9.0 (CEPH Genotype Database Web site). Genetic markers were selected from the 14,404 markers represented in the full database, so that at least 75% of the pedigrees had genotypes available for at least 75% of the family members. The median intermarker distance was 11 cM, on the basis of the deCODE genetic map (Kong et al. 2002). Marker-allele frequencies available from the CEPH genotype database were used for estimating identity-by-descent probabilities.

Statistical Methods

For each profile, genes were tested to assess differential expression relative to the pool, by use of procedures described elsewhere (Hughes et al. 2000). The color displays given in figure 4
Variance-components methodology was used to estimate the overall and QTL-specific heritabilities of gene expression and to test for linkage across the genome at 4-cM steps, as described below (Almasy and Blangero 1998). For consistency, we follow the notation of Almasy and Blangero (1998). Consider a phenotype denoted by y. A linear model is used to relate variation in y to covariates, QTLs, polygenic background, and random error:
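In the standard variance-components formulation described by Almasy and Blangero (1998), a model of this kind is usually written along the following lines. The symbols below are chosen for illustration and are an assumption; they are not necessarily the notation used in this article. Here y_i is the phenotype of individual i, x_ij are covariates, q_ik is the effect of the k-th QTL, g_i is the residual polygenic effect, and e_i is random error; \hat{\Pi}_k is the matrix of estimated identity-by-descent sharing at QTL k, \Phi is the kinship matrix, and I is the identity matrix.

```latex
% Illustrative notation only (assumed, not copied from the article).
\begin{align}
  y_i &= \mu + \sum_{j} \beta_j x_{ij} + \sum_{k=1}^{n} q_{ik} + g_i + e_i \\
  \Omega &= \sum_{k=1}^{n} \hat{\Pi}_k \, \sigma^2_{q_k}
            + 2\Phi \, \sigma^2_{g} + I \, \sigma^2_{e} \\
  h^2 &= \frac{\sum_{k=1}^{n} \sigma^2_{q_k} + \sigma^2_{g}}
              {\sum_{k=1}^{n} \sigma^2_{q_k} + \sigma^2_{g} + \sigma^2_{e}},
  \qquad
  h^2_{q_k} = \frac{\sigma^2_{q_k}}
              {\sum_{k=1}^{n} \sigma^2_{q_k} + \sigma^2_{g} + \sigma^2_{e}}
\end{align}
```

Under this sketch, \Omega is the covariance matrix of the phenotype within a pedigree, h^2 is the overall heritability, and h^2_{q_k} is the QTL-specific heritability; a test for linkage at a given position compares the likelihood with \sigma^2_{q_k} estimated against the likelihood with \sigma^2_{q_k} fixed at zero.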
In addition, bivariate segregation analyses were conducted using variance-components models (Almasy et al. 1997; Williams et al. 1999). Consider a vector,
All analyses, with the exception of the bivariate segregation analyses and Pearson correlation (PC), were adjusted for age, sex, and age-by-sex interaction, and likelihood-ratio tests were utilized for tests of heritability, linkage, and GC. Variance-components models were analyzed using the software package SOLAR, the sequential oligogenic linkage analysis routines (Almasy and Blangero 1998). Multiple testing for significance of total heritability was taken into account through the use of false-discovery rate procedures (Benjamini and Hochberg 1995). Here, P values are computed for the set of 2,430 genes that were selected on the basis of expression data alone. These P values are ordered such that P1 ≤ P2 ≤ … ≤ P2,430. The first k tests are significant, where k is the largest i such that Pi ≤ (i/2,430)α. This rule controls the false-discovery rate at level α.

For genes with available annotation in the Proteome BioKnowledge Library (Incyte), key phrases were compared between differentially expressed genes and the full set of 23,499 genes by use of a Fisher's exact test. A total of 4,783 categories are represented in the full set of genes. Hence, tests were conducted for each of the 4,783 possible categories represented in the full set of genes, and significance was assessed using a Bonferroni correction for a familywise type I error rate of 0.05.

Results

Genes Expressed in Lymphoblast Cell Lines

Of the 23,499 genes represented on the microarray, 2,430 were differentially expressed (type I error rate 0.05) in at least half of the children. Nine key phrases were enriched within the differentially expressed set (testwise type I error rate 0.05/4,783); these include “immune response,” “response to viruses,” and “inflammatory response.” Table 1 contains a list of the nine key phrases, along with P values and corresponding occurrence counts.
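The false-discovery rate rule described in the Statistical Methods can be sketched in a few lines. This is a minimal illustration of the Benjamini and Hochberg (1995) step-up procedure, not the code used in the study; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests declared significant by the BH step-up rule.

    The P values are ranked from smallest to largest; k is the largest rank i
    with P_(i) <= (i / m) * alpha, and the k smallest P values are significant.
    """
    m = len(p_values)
    # Indices of the tests, ordered by increasing P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k = rank  # keep the largest rank that satisfies the rule
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical P values for ten heritability tests.
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

Because k is the largest rank meeting the condition, a test can be declared significant even when its own P value exceeds its rank-specific threshold, provided a larger P value further down the ordering meets its threshold. In the study, m would be the 2,430 expressed genes.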
Heritability of Gene Expression

Heritability analysis was conducted for the set of differentially expressed genes. When a false-discovery rate of 0.05 was used, 762 genes (31%) were detected as heritable. The median heritability for the 762 genes is 0.34.

[Figure 1]

It is noted that heritability estimates were >0.44 for only 25% of the genes. Given that adjustments have been made for age and sex, this implies a large environmental or nongenetic influence on expression for the majority of genes. Of the 762 heritable genes, 705 had a median fold change between 1 and 2, 46 had a median fold change between 2 and 3, 6 had a median fold change between 3 and 4, and 5 had a median fold change >4.

Annotation, obtained from the Proteome BioKnowledge Library (Incyte), was compared between genes with significant heritability and those that were differentially expressed. Each of the 4,783 categories that were represented in the full set of genes was considered. No significant differences were detected (testwise type I error rate 0.05/4,783). Comparison of annotation for genes with significant heritability and the full set of genes yielded results comparable to those shown in table 1; however, a couple of differences do exist. In particular, only two categories are statistically significant: immune response (P=.000000256) and defense/immunity protein activity (P=.00000386). Several of the categories found to be enriched in the differentially expressed genes are no longer among the most enriched categories for the subset of heritable genes.
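The annotation comparisons above rest on one Fisher's exact test per category, judged against a Bonferroni-adjusted threshold of 0.05/4,783. The sketch below shows one way such a test could be computed. It assumes a one-sided (enrichment) test, which the text does not specify, and the function names are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact P value for enrichment.

    N: genes in the full set; K: genes in the full set carrying the category;
    n: genes in the selected subset; k: selected genes carrying the category.
    Returns P(X >= k) for X following the hypergeometric distribution.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities of outcomes at least as extreme as k.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


def is_significant(p_value, n_categories=4783, familywise_alpha=0.05):
    """Bonferroni rule: compare each test with alpha / number of categories."""
    return p_value <= familywise_alpha / n_categories


if __name__ == "__main__":
    # Small made-up counts, purely to show the call.
    print(enrichment_p_value(k=3, n=6, K=5, N=20))
    # The two P values reported for the heritable genes pass the threshold.
    print(is_significant(0.000000256), is_significant(0.00000386))
```

With 4,783 categories the testwise threshold is roughly 0.00001, so both P values reported for the heritable genes fall below it.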
We previously conducted a heritability pilot study on four CEPH families (Schadt et al. 2003). Although the reference pool utilized in the pilot study was substantially different from the reference pool used in the present study, we expected to see some consistency between the two. For the 440 genes found to be differentially expressed and heritable in the pilot sample (false-discovery rate 0.05), 65% were confirmed in the present study.

Expression QTLs

Multipoint-based identity-by-descent sharing was computed and utilized in a linkage analysis at 4-cM steps across all autosomal chromosomes.

[Figure 2]

There were 33 genes with QTLs defined by at least one P value <.000005, 50 defined by at least one P value <.00005, and 132 defined by at least one P value <.0005. Not surprisingly, genes with significant linkages correspond well to those genes that were found to have significant heritability and that are associated with immune-related functions.
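The pointwise P value thresholds above can be related to LOD scores with a short calculation. The sketch assumes the usual asymptotic result for variance-components linkage tests, in which the likelihood-ratio statistic follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 df; the article does not spell out this conversion, and the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate pointwise P value for a variance-components LOD score.

    Assumes the likelihood-ratio statistic 2 * ln(10) * LOD is distributed as
    a 50:50 mixture of a point mass at 0 and a chi-square with 1 df.
    """
    lod = max(lod, 0.0)
    chi_sq = 2.0 * math.log(10.0) * lod
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    tail = math.erfc(math.sqrt(chi_sq / 2.0))
    return 0.5 * tail


if __name__ == "__main__":
    for lod in (3.0, 6.53):
        print(lod, lod_to_pointwise_p(lod))
```

Under this assumption a LOD score of 3 corresponds to a pointwise P value of about .0001, which is the conventional benchmark for a single linkage scan.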
For the 33 genes with significant linkages at the .000005 level, there was minimal correlation among expression levels. The maximum absolute correlation was 0.61; however, the third quartile of all pairwise correlations was 0.29. Twenty of these genes have significant expression QTLs even after a Bonferroni correction is applied to genomewide significance levels to obtain a familywise error rate of 0.05. That is, when significance is assessed for each of the 2,430 differentially expressed genes on the basis of a genomewide significance level of 0.05/2,430, 20 genes have significant expression QTLs (LOD score threshold 6.53). Interestingly, 8 of these genes have QTLs that overlap with their physical location within 5 Mb. This is in contrast to 13 of 33, 18 of 50, and 25 of 132 genes with QTLs significant at the pointwise level of .000005, .00005, and .0005, respectively. For genes with significant linkages at the .000005 level, most (25 of 33) had only a single QTL detected from the linkage scan. Six genes had 2 QTLs, one gene had 3 QTLs, and one had 15 QTLs.

[Figure 3]
Lack of Evidence for Linkage Hotspots Previous studies have detected linkage “hotspots” in studies of the genetics of gene expression (Brem et al. 2002; Schadt et al. 2003; Yvert et al. 2003). Our linkage analyses were conducted at 4-cM steps, for a total of 816 positions along the autosomal genome. At the pointwise significance level of .000005, there were 586 locations with no linkage hits, 159 with one linkage hit, 59 with two linkage hits, 6 with three linkage hits, 3 with five linkage hits, and 3 with six linkage hits. Simulations were used to study the distribution of linkage counts per location under the assumption that linkages are distributed randomly through the genome. On the basis of 60,000 simulations, the probabilities of seeing three, four, five, or six linkage hits at some location in the genome were estimated to be 0.4488, 0.04505, 0.00315, and 0.0001666667, respectively. Hence, the locations with five and six linkages are consistent with nonrandom clusters of QTLs. In addition, the QTLs are all located in a single area on chromosome 6 and correspond to linkages for six genes. Four of the genes (HLA-DPB1, HLA-DRB3, HLA-DRB5, and HLA-G) correspond to the major histocompatibility complex. One of the transcripts corresponds to an EST that is highly similar to a Homo sapiens major histocompatibility complex, class II, DR51 haplotype, and the last of the six genes is the cubilin gene. Two of the HLA genes, HLA-DPB1 and HLA-DRB3, are located close to the shared linkage segment on chromosome 6; however, two factors cast doubt on whether the linkages are due to pleiotropic effects. First, HLA genes are highly polymorphic; therefore, probe selection without regard to such variation is likely to yield a probe that is subject to more-extensive SNP variation than would be realized in other genomic regions. 
For example, the 60-mer probe for HLA-DRB5 has nine SNPs, that for HLA-DPB1 has seven SNPs, and that for HLA-DRB3 has five SNPs, on the basis of the dbSNP database (dbSNP Home Page). Hence, it is likely that the presence of genetic variation in the probe location could mimic expression patterns similar to those for genetic inheritance of a QTL. Hughes et al. (2001) demonstrated that variation in probe intensities realized on the microarray platform used in the present study was not significant if the probe contained fewer than five mismatches to the corresponding RNA sequence, but significant variation was observed for probes containing five or more mismatches. Second, HLA genes are highly similar to one another, making gene-specific probe selection difficult and leaving the resulting expression measures subject to significant cross-hybridization.

Clustering of Genes on the Basis of GC

Estimating the genetic correlation (GC) between any two traits provides a measure of the extent of variation between two traits explained by common genetic components. Toward this end, we identified a set of genes by taking into account gene-expression activity criteria, heritability measures, and linkage analysis. Genes found to be transcriptionally active in at least 10% of the CEPH samples and that had a statistically significant heritability component (type I error rate 0.01) or at least one QTL with an associated LOD score ≥3 were identified for further analysis. This resulted in a set of 574 genes that was carried forward into a bivariate analysis performed on each pair of traits in the set, to estimate GC. In addition, PCs for all gene pairs were calculated. For each correlation measure, agglomerative hierarchical clustering was applied to the gene-expression and experiment (individuals from the CEPH families) dimensions (Hastie et al. 2001).
Color matrix displays, in addition to the experiment and gene-expression cluster trees generated from this procedure, are shown in figure 4.

One measure of whether the GC- or PC-based cluster is providing more meaningful information is to examine the extent to which genes in known pathways are seen as clustering more closely together. Using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases (Gene Ontology Consortium Web site; KEGG Genes Database Web site), we determined that 164 and 32 genes, respectively, mapped to pathways represented in the GO and/or KEGG databases by ≥2 genes in the 574-gene set. To assess the extent of pathway coherence represented in the set of genes that mapped to these databases, we computed the average distance between genes in the cluster tree that mapped to the same pathway, for each pathway having ≥2 genes represented from the 574-gene set. For the PC clustering, the median distance for the GO pathways was 21 (minimum and maximum distances were 2 and 50, respectively), and the median distance for the KEGG pathways was 20 (minimum and maximum distances were 8 and 45, respectively). After 1,000 rounds of Monte Carlo simulation, the P values associated with the GO and KEGG pathway median distances were estimated to be .03 and .007, respectively.

The above results indicate that the pathway information in these databases does reflect biologically relevant activity taking place in the lymphoblastoid cell lines, with greater coherency represented in the KEGG pathways than in the GO pathways. Given this, we wanted to assess whether the GC clustering provided increased coherency over all pathways represented in the 574-gene set, compared with PC clustering based on the observed expression values. The median distances between genes in the GC clusters were 23 for the GO pathways and 15 for the KEGG pathways.
Although the median distance was higher for the GO pathways, it was interesting to note that, of the 67 GO pathways represented by ≥2 genes in the 574-gene set, 29 had smaller distance measures in the GC cluster, compared with 30 having smaller distance measures in the PC cluster (8 had identical measures). In light of this, the degree of pathway coherence with respect to the GO pathways does not appear to be significantly different between the GC and PC clusters. This may reflect the more general nature of the GO pathways represented. However, the increased pathway coherence observed in the KEGG pathways is well reflected by the GC cluster; not only did the GC cluster show increased coherency for the KEGG pathways, but 29 of the 39 KEGG pathways represented in the 574-gene set had smaller distance measures in the GC cluster than in the PC cluster, compared with 8 having smaller distance measures in the PC cluster.

An example of two genes represented in a KEGG pathway that grouped more tightly together on the basis of genetic correlation is highlighted in figure 4.

Discussion

We have demonstrated that there is a genetic component to the control of gene expression in human lymphoblastoid cell lines. In fact, of the differentially expressed genes, 31% were heritable, on the basis of a false-discovery rate of 0.05. These genes were enriched for immunity-related functions, including immune response, defense/immunity protein activity, response to virus, and inflammatory response. Estimates of heritability were on the same order as that observed for complex traits. This is perhaps not surprising, given that the cell lines utilized were created for use in genotyping. Hence, a large amount of experimental variation could be diminishing any existing genetic effects. In particular, each family may not have been collected at a comparable time or with comparable sampling methods.
Further, it is possible that the cell lines have not undergone the necessary procedures for immortalization at the same time or in the same experimental environment. Despite these potential problems, linkage analysis yielded several QTLs controlling gene expression, even when a conservative Bonferroni correction was used to maintain the false-positive rate, across all linkage analyses for all genes, at a level of .05. For the given sample size, the study is powered to detect major QTLs only. In fact, QTLs detected at a pointwise significance level of .000005 account for 50% of the trait variance, with 75% of the QTLs having heritabilities >0.76.

It is of note that other studies of the genetics of gene expression have detected “hotspots” in which a single QTL controls the expression of a large set of genes. Our results did not detect such a phenomenon. It may be that our study was not powered to pick up such effects. However, other studies have focused on genetic crosses of inbred lines that had been selected to be divergent for a particular phenotype. One could argue that this type of ascertainment would enrich for a small set of QTLs that underlie the observed phenotypic differences among the resulting offspring. Given that the resulting phenotypic differences are due to many changes at the molecular level, one would expect to see these types of hotspots for expression QTLs. Our sample was not ascertained on the basis of a particular phenotype. Given a random sample of families, one might not expect to see hotspots for control of gene expression.

One advantage of generating gene-expression data in a segregating population is the ability to decompose the relationship between any two traits into genetic and environmental components. Estimating genetic correlation between any two traits provides a direct measure of the extent of variation between two traits explained by common genetic components.
It is this level of information that allows for more-direct causal inferences among the expression traits and clinical phenotypes of interest (e.g., disease-related phenotypes). The GC information among the gene-expression traits could be incorporated into standard Bayesian network reconstruction models, as a way of formalizing the reconstruction of genetic networks underlying complex phenotypes of interest. Successes achieved using more heuristic methods of incorporating genetic information into network reconstruction processes (Zhu et al. 2004) suggest that this is a worthy area of research to investigate. Here, we conducted analyses that showed clusters of genes based on GC corresponded well to biological pathways, and we provided an example of two genes involved in the same pathway that have been shown to be active in lymphoblastoid cell lines. These genes were more tightly clustered when GC, as opposed to PC, was utilized. Although there are similar examples in which PC offers a tighter association among genes of a given pathway, clustering based on GC offers a different view of the data that may enhance the information that can be derived from the clusters of gene-expression data and that has not previously been exploited. As more expression data in segregating populations are generated, viewing clusters or networks based on GC measures will offer potentially novel insights into the relationship among genes that may underlie complex traits. Our results establish the existence of genetic control of gene expression and include a description of what this control looks like in a random sample of families. In addition, clustering based on GC provided groupings of genes that are consistent with biological pathways. For a sample ascertained for the study of a complex trait, such information could provide in-depth functional information that could be overlaid with inheritance data for the complex trait. 
Studies in maize, mice, and yeast are starting to provide such examples (Brem et al. 2002; Schadt et al. 2003; Yvert et al. 2003). More studies are needed to determine the utility of such an approach in humans. For instance, what tissues are amenable to sampling? What types of traits could be studied with such tissues? Peripheral blood is easy to obtain, but for what diseases or risk factors will this be relevant? Studies have already shown influences of age, sex, time of sample draw, blood cell count, and health status in peripheral blood (Whitney et al. 2003). Another question of interest is, to what extent does expression in cell lines relate to expression in the original tissue? Also, should the focus be on expression in the tissue or, perhaps, changes in expression due to a challenge? The present study suffers from many of the same limitations of gene-expression studies, in that a large number of variables are tested. We tried to minimize multiple testing, by focusing on genes that we deemed to be expressed in the lymphoblastoid cell lines. Of course, there are other ways to determine a set of differentially expressed genes; however, we expect results based on alternative gene-selection methods to be comparable to those presented here. Although results were presented for 167 profiled samples, only 15 families were utilized. Given such a large number of tests on this sample size, it is likely that the asymptotic distributions, on which P values are based, are not appropriate for all genes. Permutation-based test statistics could be used to estimate such P values; however, the computation time required makes this approach infeasible in a reasonable amount of time. Regardless of these limitations, the results presented here establish the existence of genetic control of gene expression and provide a glimpse into the possibilities of using such an approach to better understand complex traits. 
Acknowledgment

Rosetta Inpharmatics is a wholly owned subsidiary of Merck & Co., Inc.

Electronic-Database Information

Accession numbers and URLs for data presented herein are as follows:

CEPH Genotype Database, http://www.cephb.fr/cephdb/
dbSNP Home Page, http://www.ncbi.nlm.nih.gov/SNP/index.html
Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/ (for expression data for the 167 individuals utilized in the present study [GEO accession number GSE1726])
Gene Ontology Consortium, http://www.geneontology.org/ (for the GO database)
Kyoto Encyclopedia of Genes and Genomes (KEGG) Genes Database, http://www.genome.ad.jp/kegg/genes.html
NCBI Reference Sequence, http://www.ncbi.nlm.nih.gov/RefSeq/ (for Pip5K2a [accession number NM_005028], Pip5K1a [accession number NM_003557], cubilin [accession number NM_001081], HLA-DRB5 [accession number NM_002125], HLA-G [accession number NM_002127], HLA-DPB1 [accession number NM_002121], and HLA-DRB3 [accession number V00522])

References

Almasy L, Blangero J (1998) Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 62:1198–1211.
Almasy L, Dyer TD, Blangero J (1997) Bivariate quantitative trait linkage analysis: pleiotropy versus co-incident linkages. Genet Epidemiol 14:953–958. doi: 10.1002/(SICI)1098-2272(1997)14:6<953::AID-GEPI65>3.0.CO;2-K
Belmaker RH, Shapiro J, Vainer E, Nemanov L, Ebstein RP, Agam G (2002) Reduced inositol content in lymphocyte-derived cell lines from bipolar patients. Bipolar Disord 4:67–69. doi: 10.1034/j.1399-5618.2002.00108.x
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300.
Botstein D, White RL, Skolnick M, Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314–331.
Brem RB, Yvert G, Clinton R, Kruglyak L (2002) Genetic dissection of transcriptional regulation in budding yeast. Science 296:752–755. doi: 10.1126/science.1069516
Cheung VG, Conlin LK, Weber TM, Arcaro M, Jen KY, Morley M, Spielman RS (2003) Natural variation in human gene expression assessed in lymphoblastoid cells. Nat Genet 33:422–425. doi: 10.1038/ng1094
Cowles CR, Hirschhorn JN, Altshuler D, Lander ES (2002) Detection of regulatory variation in mouse genes. Nat Genet 32:432–437. doi: 10.1038/ng992
Dausset J, Cann H, Cohen D, Lathrop M, Lalouel JM, White R (1990) Centre d’etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics 6:575–577.
Fulker DW, Cherny SS, Cardon LR (1995) Multipoint interval mapping of quantitative trait loci, using sib pairs. Am J Hum Genet 56:1224–1233.
Hastie T, Tibshirani R, Friedman JH (2001) The elements of statistical learning. Springer-Verlag, New York.
Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, Linsley PS (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 19:342–347. doi: 10.1038/86730
Hughes TR, Roberts CJ, Dai H, Jones AR, Meyer MR, Slade D, Burchard J, Dow S, Ward TR, Kidd MJ, Friend SH, Marton MJ (2000) Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet 25:333–337. doi: 10.1038/77116
Jansen RC, Nap JP (2001) Genetical genomics: the added value from segregation. Trends Genet 17:388–391. doi: 10.1016/S0168-9525(01)02310-1
Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, Gibson G (2001) The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet 29:389–395. doi: 10.1038/ng766
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K (2002) A high-resolution recombination map of the human genome. Nat Genet 31:241–247.
Marton MJ, DeRisi JL, Bennett HA, Iyer VR, Meyer MR, Roberts CJ, Stoughton R, Burchard J, Slade D, Dai H, Bassett DE Jr, Hartwell LH, Brown PO, Friend SH (1998) Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat Med 4:1293–1301. doi: 10.1038/3282
Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G, Linsley PS, Mao M, Stoughton RB, Friend SH (2003) Genetics of gene expression surveyed in maize, mouse and man. Nature 422:297–302. doi: 10.1038/nature01434
van’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530–536. doi: 10.1038/415530a
Whitney AR, Diehn M, Popper SJ, Alizadeh AA, Boldrick JC, Relman DA, Brown PO (2003) Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci USA 100:1896–1901. doi: 10.1073/pnas.252784499
Williams JT, Van Eerdewegh P, Almasy L, Blangero J (1999) Joint multipoint linkage analysis of multivariate qualitative and quantitative traits. I. Likelihood formulation and simulation results. Am J Hum Genet 65:1134–1147.
Yan H, Yuan W, Velculescu VE, Vogelstein B, Kinzler KW (2002) Allelic variation in human gene expression. Science 297:1143. doi: 10.1126/science.1072545
Yvert G, Brem RB, Whittle J, Akey JM, Foss E, Smith EN, Mackelprang R, Kruglyak L (2003) Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet 35:57–64. doi: 10.1038/ng1222
Zhu J, Lum PY, Lamb J, GuhaThakurta D, Edwards SW, Thieringer R, Berger JP, Wu MS, Thompson J, Sachs AB, Schadt EE (2004) An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet Genome Res 105:363–374. doi: 10.1159/000078209