Science. 2010 Oct 29;330(6004):641-6. doi: 10.1126/science.1197005.
Diversity of human copy number variation and multicopy genes.
Altshuler DL, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Collins FS, De La Vega FM, Donnelly P, Egholm M, Flicek P, Gabriel SB, Gibbs RA, Knoppers BM, Lander ES, Lehrach H, Mardis ER, McVean GA, Nickerson DA, Peltonen L, Schafer AJ, Sherry ST, Wang J, Wilson RK, Gibbs RA, Deiros D, Metzker M, Muzny D, Reid J, Wheeler D, Wang J, Li J, Jian M, Li G, Li R, Liang H, Tian G, Wang B, Wang J, Wang W, Yang H, Zhang X, Zheng H, Lander ES, Altshuler DL, Ambrogio L, Bloom T, Cibulskis K, Fennell TJ, Gabriel SB, Jaffe DB, Shefler E, Sougnez CL, Bentley DR, Gormley N, Humphray S, Kingsbury Z, Koko-Gonzales P, Stone J, McKernan KJ, Costa GL, Ichikawa JK, Lee CC, Sudbrak R, Lehrach H, Borodina TA, Dahl A, Davydov AN, Marquardt P, Mertes F, Nietfeld W, Rosenstiel P, Schreiber S, Soldatov AV, Timmermann B, Tolzmann M, Egholm M, Affourtit J, Ashworth D, Attiya S, Bachorski M, Buglione E, Burke A, Caprio A, Celone C, Clark S, Conners D, Desany B, Gu L, Guccione L, Kao K, Kebbel A, Knowlton J, Labrecque M, McDade L, Mealmaker C, Minderman M, Nawrocki A, Niazi F, Pareja K, Ramenani R, Riches D, Song W, Turcotte C, Wang S, Mardis ER, Wilson RK, Dooling D, Fulton L, Fulton R, Weinstock G, Durbin RM, Burton J, Carter DM, Churcher C, Coffey A, Cox A, Palotie A, Quail M, Skelly T, Stalker J, Swerdlow HP, Turner D, De Witte A, Giles S, Gibbs RA, Wheeler D, Bainbridge M, Challis D, Sabo A, Yu F, Yu J, Wang J, Fang X, Guo X, Li R, Li Y, Luo R, Tai S, Wu H, Zheng H, Zheng X, Zhou Y, Li G, Wang J, Yang H, Marth GT, Garrison EP, Huang W, Indap A, Kural D, Lee WP, Leong WF, Quinlan AR, Stewart C, Stromberg MP, Ward AN, Wu J, Lee C, Mills RE, Shi X, Daly MJ, DePristo MA, Altshuler DL, Ball AD, Banks E, Bloom T, Browning BL, Cibulskis K, Fennell TJ, Garimella KV, Grossman SR, Handsaker RE, Hanna M, Hartl C, Jaffe DB, Kernytsky AM, Korn JM, Li H, Maguire JR, McCarroll SA, McKenna A, Nemesh JC, Philippakis AA, Poplin RE, Price A, Rivas MA, Sabeti PC, Schaffner SF, Shefler E, 
Shlyakhter IA, Cooper DN, Ball EV, Mort M, Phillips AD, Stenson PD, Sebat J, Makarov V, Ye K, Yoon SC, Bustamante CD, Clark AG, Boyko A, Degenhardt J, Gravel S, Gutenkunst RN, Kaganovich M, Keinan A, Lacroute P, Ma X, Reynolds A, Clarke L, Flicek P, Cunningham F, Herrero J, Keenen S, Kulesha E, Leinonen R, McLaren WM, Radhakrishnan R, Smith RE, Zalunin V, Zheng-Bradley X, Korbel JO, Stütz AM, Humphray S, Bauer M, Cheetham RK, Cox T, Eberle M, James T, Kahn S, Murray L, Chakravarti A, Ye K, De La Vega FM, Fu Y, Hyland FC, Manning JM, McLaughlin SF, Peckham HE, Sakarya O, Sun YA, Tsung EF, Batzer MA, Konkel MK, Walker JA, Sudbrak R, Albrecht MW, Amstislavskiy VS, Herwig R, Parkhomchuk DV, Sherry ST, Agarwala R, Khouri HM, Morgulis AO, Paschall JE, Phan LD, Rotmistrovsky KE, Sanders RD, Shumway MF, Xiao C, McVean GA, Auton A, Iqbal Z, Lunter G, Marchini JL, Moutsianas L, Myers S, Tumian A, Desany B, Knight J, Winer R, Craig DW, Beckstrom-Sternberg SM, Christoforides A, Kurdoglu AA, Pearson JV, Sinari SA, Tembe WD, Haussler D, Hinrichs AS, Katzman SJ, Kern A, Kuhn RM, Przeworski M, Hernandez RD, Howie B, Kelley JL, Melton SC, Abecasis GR, Li Y, Anderson P, Blackwell T, Chen W, Cookson WO, Ding J, Kang HM, Lathrop M, Liang L, Moffatt MF, Scheet P, Sidore C, Snyder M, Zhan X, Zöllner S, Awadalla P, Casals F, Idaghdour Y, Keebler J, Stone EA, Zilversmit M, Jorde L, Xing J, Eichler EE, Aksay G, Alkan C, Hajirasouliha I, Hormozdiari F, Kidd JM, Sahinalp SC, Sudmant PH, Mardis ER, Chen K, Chinwalla A, Ding L, Koboldt DC, McLellan MD, Dooling D, Weinstock G, Wallis JW, Wendl MC, Zhang Q, Durbin RM, Albers CA, Ayub Q, Balasubramaniam S, Barrett JC, Carter DM, Chen Y, Conrad DF, Danecek P, Dermitzakis ET, Hu M, Huang N, Hurles ME, Jin H, Jostins L, Keane TM, Le SQ, Lindsay S, Long Q, MacArthur DG, Montgomery SB, Parts L, Stalker J, Tyler-Smith C, Walter K, Zhang Y, Gerstein MB, Snyder M, Abyzov A, Balasubramanian S, Bjornson R, Du J, Grubert F, Habegger L, Haraksingh R, Jee J, 
Khurana E, Lam HY, Leng J, Mu XJ, Urban AE, Zhang Z, Li Y, Luo R, Marth GT, Garrison EP, Kural D, Quinlan AR, Stewart C, Stromberg MP, Ward AN, Wu J, Lee C, Mills RE, Shi X, McCarroll SA, Banks E, DePristo MA, Handsaker RE, Hartl C, Korn JM, Li H, Nemesh JC, Sebat J, Makarov V, Ye K, Yoon SC, Degenhardt J, Kaganovich M, Clarke L, Smith RE, Zheng-Bradley X, Korbel JO, Humphray S, Cheetham RK, Eberle M, Kahn S, Murray L, Ye K, De La Vega FM, Fu Y, Peckham HE, Sun YA, Batzer MA, Konkel MK, Walker JA, Xiao C, Iqbal Z, Desany B, Blackwell T, Snyder M, Xing J, Eichler EE, Aksay G, Alkan C, Hajirasouliha I, Hormozdiari F, Kidd JM, Chen K, Chinwalla A, Ding L, McLellan MD, Wallis JW, Hurles ME, Conrad DF, Walter K, Zhang Y, Gerstein MB, Snyder M, Abyzov A, Du J, Grubert F, Haraksingh R, Jee J, Khurana E, Lam HY, Leng J, Mu XJ, Urban AE, Zhang Z, Gibbs RA, Bainbridge M, Challis D, Coafra C, Dinh H, Kovar C, Lee S, Muzny D, Nazareth L, Reid J, Sabo A, Yu F, Yu J, Marth GT, Garrison EP, Indap A, Leong WF, Quinlan AR, Stewart C, Ward AN, Wu J, Cibulskis K, Fennell TJ, Gabriel SB, Garimella KV, Hartl C, Shefler E, Sougnez CL, Wilkinson J, Clark AG, Gravel S, Grubert F, Clarke L, Flicek P, Smith RE, Zheng-Bradley X, Sherry ST, Khouri HM, Paschall JE, Shumway MF, Xiao C, McVean GA, Katzman SJ, Abecasis GR, Blackwell T, Mardis ER, Dooling D, Fulton L, Fulton R, Koboldt DC, Durbin RM, Balasubramaniam S, Coffey A, Keane TM, MacArthur DG, Palotie A, Scott C, Stalker J, Tyler-Smith C, Gerstein MB, Balasubramanian S, Chakravarti A, Knoppers BM, Peltonen L, Abecasis GR, Bustamante CD, Gharani N, Gibbs RA, Jorde L, Kaye JS, Kent A, Li T, McGuire AL, McVean GA, Ossorio PN, Rotimi CN, Su Y, Toji LH, Tyler-Smith C, Brooks LD, Felsenfeld AL, McEwen JE, Abdallah A, Juenger CR, Clemm NC, Collins FS, Duncanson A, Green ED, Guyer MS, Peterson JL, Schafer AJ, Abecasis GR, Altshuler DL, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA.
Source
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA.
Abstract
Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrated accurate estimation of absolute copy number for duplications as small as 1.9 kilobase pairs, ranging from 0 to 48 copies. We identified 4.1 million "singly unique nucleotide" positions informative in distinguishing specific copies and used them to genotype the copy and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
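The read-depth idea in the abstract can be illustrated with a minimal sketch. It assumes that depth has already been counted in fixed windows and that a set of control windows known to be diploid is available; the function names and the normalization below are hypothetical and are not taken from the paper, whose actual mapping and correction steps are not described in this record.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, control_copies=2):
    """Scale window read depth to absolute copy number.

    Assumption: control_depths come from regions carrying exactly
    control_copies copies, so their median depth calibrates depth per copy.
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control regions must have positive depth")
    per_copy = baseline / control_copies
    return [depth / per_copy for depth in window_depths]


def call_integer_copies(estimates):
    """Round continuous estimates to non-negative integer copy calls."""
    return [max(0, round(e)) for e in estimates]


controls = [30, 28, 32, 30]   # hypothetical depths in diploid control windows
windows = [0, 15, 31, 90]     # hypothetical depths in windows of interest
estimates = estimate_copy_number(windows, controls)
calls = call_integer_copies(estimates)
```

A real analysis would also need to correct for sequence-dependent coverage biases, which this sketch leaves out.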
PMID: 21030649 [PubMed - indexed for MEDLINE]
PMCID: PMC3020103
Fig. 2
Validation and application. (A) Single-channel array CGH data are highly correlated (r = 0.95) with read depth–based genotypes for the highly duplicated TBC1D3 gene (copy number range 5 to 53). Note the reduced copy number of this gene family among Europeans (color coding as in Fig. 1C). (B) Heatmap of a 340-kbp region proximal to the facioscapulohumeral muscular dystrophy (FSHD) region on chromosome 4 identifies a polymorphic segmental duplication ranging from 5 to 8 copies. In the human reference genome (build 36) this segment is annotated as a single copy (i.e., unique), but all humans carry duplications mapping to chromosomes 4, 13, 14, and 21.
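The agreement reported in panel A is a correlation coefficient between two sets of measurements on the same samples. As a rough sketch, a Pearson r can be computed as follows; the function name and the input values are hypothetical, and the caption does not state which correlation statistic the authors used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical values: array intensity vs. read-depth copy estimates.
array_signal = [1.0, 2.0, 3.0, 4.0]
depth_copies = [10.0, 20.0, 30.0, 40.0]
r = pearson_r(array_signal, depth_copies)
```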
Fig. 4
Paralog-specific copy number resolution and genotyping. (A) Schematic showing SUN identifiers among four high-identity duplications. SUNs (orange) uniquely distinguish one duplicated copy from all others, in contrast to paralogous sequence variants (PSVs, blue), which may be shared among copies. (B) Resolving duplication mirror effects with paralog-specific genotyping. Total read depth and array CGH fail to distinguish the origin of copy number variation between two high-identity (98.5%) segmental duplications mapping to chromosome 1p13.1 and 7q11.23. SUN read-depth mapping, however, predicts that copy number variation is restricted to 7q11.23 and not 1p13.1. FISH on these samples confirms copy number gains and losses on 7q11.23 (fig. S51).
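The distinction in panel A between SUNs and PSVs can be sketched in a few lines, assuming the duplicated copies are already aligned to equal length. The function name, the gap symbol, and the toy sequences are hypothetical; this illustrates the definition in the caption and is not the authors' pipeline.

```python
from collections import Counter


def find_suns(aligned_copies):
    """Map each copy name to the positions where its base is unique.

    A position is reported for a copy only when no other copy shares
    its base there. Differences shared by two or more copies (PSV-like
    sites) are therefore not reported for any of those copies.
    """
    names = list(aligned_copies)
    length = len(aligned_copies[names[0]])
    if any(len(aligned_copies[name]) != length for name in names):
        raise ValueError("all aligned copies must have the same length")
    suns = {name: [] for name in names}
    for pos in range(length):
        column = {name: aligned_copies[name][pos] for name in names}
        counts = Counter(column.values())
        for name, base in column.items():
            if base != "-" and counts[base] == 1:  # skip alignment gaps
                suns[name].append(pos)
    return suns


# Four hypothetical high-identity copies.
copies = {
    "copy1": "ACGTA",
    "copy2": "ACGTA",
    "copy3": "ACTTA",
    "copy4": "GCTTA",
}
suns = find_suns(copies)
```

In this toy input, position 0 distinguishes copy4 from every other copy, while position 2 separates the copies into two pairs and so marks no single copy.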
Fig. 1
Landscape of human copy number variation. (A) CNV heatmap of a 734-kbp duplicated region flanking the 17q21.31 MAPT locus in 13 individuals (11 sequenced to high coverage). Read depth–based copy number (CN) estimations (3-kbp windows) are indicated by color (scale provided to the right). FISH at two separate loci validates these absolute CN predictions across five individuals (9). (B) Copy number landscape of the 17q21.31 locus across three different populations showing marked population stratification (159 genomes analyzed). A European-enriched duplication overlaps the gene KIAA1267 and is present on two haplotypes: a long form (205 kbp) and a short form (155 kbp). A 210-kbp duplication of the NSF gene ranges from two to six copies with increased copy number in Asians. For validation with array CGH, see fig. S31. (C) Copy number frequency histograms of the KIAA1267 and NSF duplications based on median read depth predict discrete copies. Duplications of the KIAA1267 locus are specific to Europeans at a frequency of 72%. 25% of Asians have six copies of NSF.
Fig. 3
Human gene family copy number diversity and evolution. (A) The genes most stratified by copy number in the human genome on the basis of Vst analysis of European, African, and Asian populations. (B) Human-specific gene family expansions.
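The caption refers to Vst without defining it. A commonly used definition is Vst = (V_T - V_S) / V_T, where V_T is the variance of copy number across all individuals and V_S is the average within-population variance weighted by population size. The sketch below assumes that definition; whether the authors used exactly this form is not stated in this record, and the function name and data are hypothetical.

```python
from statistics import pvariance


def vst(populations):
    """Vst for one locus from per-population lists of copy numbers.

    Assumes every population list is non-empty.
    """
    all_values = [v for pop in populations.values() for v in pop]
    total_var = pvariance(all_values)
    if total_var == 0:
        return 0.0  # no variation at all, hence no stratification
    within_var = sum(
        pvariance(pop) * len(pop) for pop in populations.values()
    ) / len(all_values)
    return (total_var - within_var) / total_var


# Hypothetical copy numbers for one gene in two populations.
stratified = vst({"pop_a": [2, 2, 2, 2], "pop_b": [4, 4, 4, 4]})
unstratified = vst({"pop_a": [2, 3], "pop_b": [2, 3]})
```

Values near 1 indicate that most of the variance lies between populations, and values near 0 indicate that the populations have similar copy number distributions.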
Fig. 5
Paralog-specific gene family copy number variation. (A) Paralog-specific copy number estimates of 990 duplicated genes show that most, on average, are diploid within the human species (median psCN = 2 ± 0.5), and nearly half show little variation in copy. Among 49.2% of duplicated genes, deviation from the median copy occurs rarely (≤5% of individuals). By contrast, genes outside of segmental duplications and other known regions of copy number variation are nearly devoid of common CNVs (blue), even when genotyping with randomly subsampled positions (gray) to mimic the restricted density of SUN markers within duplicated genes. (B) Population stratification and paralog-specific copy variability of a human expanded-gene family of unknown function, NBPF (neuroblastoma breakpoint gene family). Certain paralogs (e.g., NBPF1) are highly amplified, extremely variable, and stratified by population, whereas others are nearly fixed and diploid (e.g., NBPF7).
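The "deviation from the median copy occurs rarely (≤5% of individuals)" criterion in panel A can be expressed as a short calculation. This is a sketch under the assumption that paralog-specific estimates are rounded to integer calls before comparison; the function name and the sample values are hypothetical.

```python
from statistics import median_low


def deviation_fraction(copy_estimates):
    """Fraction of individuals whose integer copy call differs from the median call."""
    if not copy_estimates:
        raise ValueError("need at least one individual")
    calls = [round(c) for c in copy_estimates]
    # median_low always returns an observed call, even for an even sample size.
    typical = median_low(calls)
    return sum(1 for c in calls if c != typical) / len(calls)


# Hypothetical psCN estimates: 19 near-diploid individuals and one gain.
estimates = [2.1, 1.9] * 9 + [2.0, 3.2]
fraction = deviation_fraction(estimates)
rarely_variable = fraction <= 0.05
```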