Genetics. Jul 2006; 173(3): 1555–1570.
PMCID: PMC1526686

Rapid Evolution of Major Histocompatibility Complex Class I Genes in Primates Generates New Disease Alleles in Humans via Hitchhiking Diversity


A plausible explanation for many MHC-linked diseases is lacking. Sequencing of the MHC class I region (coding units or full contigs) in several human and nonhuman primate haplotypes allowed an analysis of single nucleotide variations (SNV) across this entire segment. This diversity was not evenly distributed; rather, it was concentrated within two gene-rich clusters, each centered on, but importantly not limited to, the antigen-presenting HLA-A and HLA-B/-C loci. Rapid evolution of MHC-I alleles, as evidenced by an unusually high number of haplotype-specific (hs) and hypervariable (hv; not traceable to a single species or haplotype) SNVs within the classical MHC-I genes, seems to have carried along by hitchhiking not only alleles of nearby genes but also deleterious mutations in these same unrelated loci. The overrepresentation of a fraction of these hvSNV (hv1SNV), along with hsSNV, as compared to those SNV that appear to have been maintained throughout primate evolution (trans-species diversity; tsSNV; included within hv2SNV), tends to establish that the majority of MHC polymorphism is de novo (species specific). This most likely reflects the selection of these hsSNV and hv1SNV in adaptation to the constantly evolving microbial antigenic repertoire.

THE human major histocompatibility complex (MHC; also known as HLA), a minute (4-Mb) segment of the genome, harbors the full range of challenges awaiting a genome-scale search for loci predisposing to complex disorders (Hirschhorn and Daly 2005). The MHC is characterized by a set of highly polymorphic (>1700 alleles) antigen-presenting HLA class I and II genes, embedded among well over 230 loci that are collectively associated with >100 pathologies (HLA 2004). Remarkably, recent genomewide scans have shown that, for a majority of these diseases, the MHC remains the first and foremost genetic contributor to pathogenesis (Onengut-Gumuscu and Concannon 2002). However, to date, with few exceptions (see below), it has been extremely difficult to identify the genuine mutations/polymorphisms at the origin of the observed associations. This has been due in large part to two facts. The first is the lack (until recently) of anonymous markers, i.e., microsatellites and SNPs in addition to the highly polymorphic MHC genes themselves. The second is the strong linkage disequilibrium across the region, which gives rise to extended haplotypes (Ceppellini et al. 1955), a well-established fact that has gained renewed momentum with the launch of the international HapMap project (Gabriel et al. 2002). To overcome both hurdles, following the report of the first human MHC sequence (MHC Sequencing Consortium 1999), significant numbers of individual MHC haplotypes must be sequenced in fine detail. As medically important as the identification of the molecular basis of HLA-disease association is, a more fundamental question is why so many diseases are linked to the MHC in the first place. The simple answer lies “somewhere” in the fact that the MHC is polymorphic, which immediately raises a second question: through which mechanism(s) has this level of diversity been created and maintained within the MHC?
The answer to the second question (and by inference to the first) has occupied the field for the last 30 years. Here we aim to capitalize on sequence analysis of both human and nonhuman primate MHC haplotypes to answer these important questions.



Cell lines:

Genomic DNA was extracted from the HLA homozygous AKIBA (HLA-A24, -B52, -DR15 haplotype, IHW number 9286), LKT3 (HLA-A24, -B54, -DR4 haplotype, IHW number 9107), and JPKO (HLA-A33, -B44, -DR13 haplotype) cell lines (kindly provided by F. Numano, T. Kaneko, and Y. Ishikawa) (http://www.ecacc.org.uk/), representing the highest (8.2%), fourth highest (2.3%), and second highest (5.2%) population frequency, respectively, within the Japanese population (haplotype frequency data of the 11th International Histocompatibility Workshop; http://www.ihwg.org).

Long-range PCR amplifications:

Ninety pairs of primers were designed with the assistance of Primer Express software (Applied Biosystems, Foster City, CA) for long-range PCR (LR-PCR) amplification of 55 expressed or potentially coding genes embedded within the 1.9-Mb HLA class I region and representing the entire coding content of the region linking the centromeric LTB to the telomeric HLA-F (Table S1 at http://www.genetics.org/supplemental/; Figure 1). In brief, the 50-μl amplification reaction contained 500 ng of genomic DNA, 2.5 units of TaKaRa long amplified (LA) Taq polymerase (TaKaRa Shuzo, Otsu, Shiga, Japan), 1× PCR buffer, 400 μM of each dNTP, and 0.2 μM of each primer. The cycling parameters were as follows: an initial denaturation at 98° for 5 min, followed by 30 cycles of 98° for 10 sec and 60°, 63°, or 68° for 10 min, and a final cycle at 72° for 10 min. The LR-PCR products of the HCG2P7 and HCG4P6 regions could not be amplified, confirming the deletion of these two genes in the HLA-A24 haplotype. The LR-PCR products average 5.7 kb and range from 1757 to 9448 bp (Table S1 at http://www.genetics.org/supplemental/). The entire nucleotide sequence of each gene, including the 5′-flanking region, promoter/enhancer region, exons, introns, and the 3′-flanking region, was determined by direct sequencing with 4403 sequencing primers (Table S2 at http://www.genetics.org/supplemental/). However, the exact length of 55 poly(A) or poly(T) stretches longer than 10 nucleotides could not be determined accurately in 37 LR-PCR fragments amplified from HLA-G, ZNRD1, TRIM40-1, TRIM40-2, TRIM10-1, TRIM15-1, TRIM26-1, TRIM26-6, TRIM39-1, TRIM39-2, TRIM39-3, RANP1, HLA-E, GNL1-1, ABCF1-1, ABCF1-2, ABCF1-3, PPP1R10-1, PPP1R10-2, PPP1R10-3, MRPS18B-1, MRPS18-2, DHX16-1, DHX16-2, MDC1-1, MDC1-2, FLOT1/IER3-1, FLOT1/IER3-2, FLOT1/IER3-3, GTF2H4, CDSN-3, TCF19-1, POU5F1-2, BAT1-1, NFKB/ATP6G-1, NFKB/ATP6G-3, and NFKB/ATP6G-4.
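As a worked illustration of the reaction recipe above, the sketch below converts the stated final amounts (500 ng of DNA, 2.5 units of LA Taq, 1× buffer, 400 μM of each dNTP, and 0.2 μM of each primer in 50 μl) into pipetting volumes with C1V1 = C2V2. All stock concentrations (100 ng/μl DNA, 5 U/μl enzyme, 10× buffer, 2.5 mM of each dNTP, 10 μM primers) are hypothetical placeholders, not values reported in this study, and the function names are illustrative.

```python
"""Per-reaction pipetting volumes for a 50-ul LR-PCR (C1 * V1 = C2 * V2).

Final concentrations follow the Methods text; every stock concentration
here is a hypothetical placeholder, not a value from the study.
"""

REACTION_UL = 50.0

# name: (final concentration, stock concentration); units match within each pair
CONCENTRATION_DOSED = {
    "10x PCR buffer": (1.0, 10.0),          # 1x final from an assumed 10x stock
    "dNTPs, each (uM)": (400.0, 2500.0),    # assumed 2.5 mM-each dNTP mix
    "forward primer (uM)": (0.2, 10.0),     # assumed 10 uM primer stock
    "reverse primer (uM)": (0.2, 10.0),
}


def stock_volume(final: float, stock: float, total_ul: float = REACTION_UL) -> float:
    """Volume V1 of stock that yields the final concentration in total_ul."""
    return final * total_ul / stock


def reaction_recipe(dna_ng: float = 500.0, dna_ng_per_ul: float = 100.0,
                    taq_units: float = 2.5, taq_units_per_ul: float = 5.0) -> dict:
    """Return component -> microlitres, topped up with water to REACTION_UL."""
    recipe = {name: stock_volume(final, stock)
              for name, (final, stock) in CONCENTRATION_DOSED.items()}
    # Template and enzyme are dosed by amount rather than by concentration.
    recipe["genomic DNA"] = dna_ng / dna_ng_per_ul              # assumed 100 ng/ul
    recipe["LA Taq polymerase"] = taq_units / taq_units_per_ul  # assumed 5 U/ul
    used = sum(recipe.values())
    if used > REACTION_UL:
        raise ValueError("stocks are too dilute to fit in the reaction volume")
    recipe["water"] = REACTION_UL - used
    return recipe


if __name__ == "__main__":
    for component, volume in reaction_recipe().items():
        print(f"{component:22s} {volume:5.2f} ul")
```

With these placeholder stocks, the components take 20.5 μl and water makes up the remaining 29.5 μl; a master mix would scale each volume by the number of reactions plus a small excess.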
In addition, the nucleotide sequences of 15 short genomic regions (40–120 bp) could not be determined by direct sequencing in 13 LR-PCR fragments amplified from HLA-F, TRIM10-2, TRIM26-1, TRIM26-4, CAT75X, ABCF1-3, PPP1R10-1, MDC1-2, FLOT1/IER3-2, CDSN-3, C6orf18-2, MICA, and NFKB/ATP6G-1. This sequencing difficulty was caused mainly by repetitive elements such as Alu and LINE sequences; these regions were therefore determined by sequencing cloned material. The total lengths of the sequenced nucleotides, including overlaps between long-range PCR products, were 535,285, 535,086, and 535,711 bp for the LKT3, AKIBA, and JPKO cell lines, respectively. When the overlaps were excluded, the nonredundant nucleotide lengths in the LKT3, AKIBA, and JPKO cell lines were 475,879, 475,686, and 488,433 bp, respectively. The 59.4 kb of overlapping sequence showed no nucleotide differences attributable to PCR and/or assembly errors, attesting to the high quality of the sequence data and confirming the homozygous nature of the cell lines.
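The overlap check described above can be sketched as follows: adjacent amplicons are merged only where the end of one is identical to the start of the next, and the total overlap is the summed amplicon length minus the nonredundant contig length. The function names and toy sequences are hypothetical, and this is not the assembly procedure actually used (Shiina et al. 1999). The last lines apply the same subtraction to the totals reported above, which gives about 59.4 kb of overlap for LKT3 and AKIBA and about 47.3 kb for JPKO.

```python
"""Merge ordered amplicons by exact suffix-prefix overlaps (illustrative sketch)."""


def exact_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Longest k >= min_len such that the last k bases of left equal the first k of right; 0 if none."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if k > 0 and left[-k:] == right[:k]:
            return k
    return 0


def assemble(amplicons: list, min_len: int = 20) -> tuple:
    """Chain amplicons in map order; reject any neighbouring pair lacking an error-free overlap."""
    contig = amplicons[0]
    summed = len(amplicons[0])
    for nxt in amplicons[1:]:
        k = exact_overlap(contig, nxt, min_len)
        if k == 0:
            raise ValueError("adjacent amplicons lack an identical overlap")
        contig += nxt[k:]
        summed += len(nxt)
    # summed length minus nonredundant length = total bases lying in overlaps
    return contig, summed, summed - len(contig)


# Totals including overlaps and nonredundant lengths (bp), as reported in the text.
REPORTED = {"LKT3": (535_285, 475_879), "AKIBA": (535_086, 475_686), "JPKO": (535_711, 488_433)}
OVERLAP_BP = {line: total - nonredundant for line, (total, nonredundant) in REPORTED.items()}

if __name__ == "__main__":
    print(assemble(["ACGTACGTTTGACCA", "TTGACCAGGCATT"], min_len=5))
    print(OVERLAP_BP)
```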

Figure 1.
An operational map of long-range PCR regions spanning HLA-F to LTB. Solid and open boxes indicate expressed genes and potential coding regions or pseudogenes, respectively. Striped lines around the HLA-A gene indicate the deleted segment of LKT3 ...

Direct sequencing strategy, assembly, and analyses:

Direct sequencing was performed using the ABI PRISM BigDye terminator cycle sequencing kit with AmpliTaq DNA polymerase (Applied Biosystems) and 4403 custom-designed primers (Table S2 at http://www.genetics.org/supplemental/). Gaps or areas of ambiguity were resolved by sequencing subcloned material (TA cloning kit, Invitrogen, Groningen, the Netherlands). Reactions were run on ABI 377 and 3100 sequencing systems. Assembly and database analyses were performed manually and with computer software following previously established procedures (Shiina et al. 1999).

MHC genomic sequences from three Caucasian cell lines with different HLA haplotypes:

The gene sequences of the Caucasian cell lines COX (IHW 9022; South African Caucasoid, consanguineous, with the A1, B8, Cw7, DR3 haplotype), PGF (IHW 9318; European Caucasoid, consanguineous, with the A3, B7, Cw7, DR15 haplotype), and QBL (IHW 9020; European Caucasoid, consanguineous, with the A26, B18, DR3 haplotype) were obtained from the Wellcome Trust Sanger Institute (ftp://ftp.sanger.ac.uk/) (Stewart et al. 2004).

Rhesus macaque (Macaca mulatta):

Bacterial artificial chromosome clones, contig map, and sequencing of the macaque MHC class I region:

One bacterial artificial chromosome (BAC) library, CHORI-250, constructed from white blood cells of a male rhesus macaque, was obtained from the BACPAC Resource Center at the Children's Hospital Oakland Research Institute (Oakland, CA). Hybridization screening was performed following the recommended protocols. Hybridization probes, ~1 kb in length, were PCR generated from the macaque MHC class I and MIC genes (exons 2–4), from 15 unique non-MHC genes (LTB, POU5F1, HCR, CDSN, GTF2H4, DDR1, FLOT1, DHX16, ABCF1, CAT75X, TRIM26, TRIM10, TRIM31, C6orf12, and ETF1P1), and from seven MHC-based sequence-tagged sites (STS), using cloned macaque genomic DNA as template. The final contig map was constructed by comparison with the complete sequence of the human MHC (MHC Sequencing Consortium 1999). A total of 122 BACs were thus isolated and assembled into a single contig by Southern hybridization with clone-derived PCR products and EcoRI fragments. Of these, 25 BACs defining a minimal tiling path linking both ends of the contig were selected (Figure S1 at http://www.genetics.org/supplemental/). They were subjected to complete bidirectional shotgun sequencing with an average 7.2× redundancy, which was sufficient for assembly and analysis of the entire sequence using previously established procedures (Shiina et al. 1999) (Figure S1 at http://www.genetics.org/supplemental/). The total length of the contig, linking BAT3 to Mamu-F, was 3,284,914 bp (Figure S1 and Table S3 at http://www.genetics.org/supplemental/). All clone overlaps were ascertained at the nucleotide level. Because the sequenced animal was MHC heterozygous, the reported sequence was derived from both chromosomes.
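Selecting 25 minimal-tiling-path BACs out of 122 can be illustrated with the classic greedy interval-cover algorithm: starting from one end, always take the clone that overlaps the covered stretch and reaches furthest. This sketch assumes that each clone already has start and end coordinates on the contig; in the study, clone order was established by hybridization, Southern blotting, and comparison with the human sequence, and the exact selection procedure is not described. Clone names and coordinates are hypothetical.

```python
"""Greedy minimal tiling path over positioned clones (illustrative sketch)."""
from typing import List, Optional, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) on the contig, end exclusive


def minimal_tiling_path(clones: List[Clone], start: int, end: int,
                        min_overlap: int = 0) -> List[Clone]:
    """Fewest clones covering [start, end), each overlapping the previous one by >= min_overlap.

    Among the clones that begin early enough, always take the one reaching
    furthest right; clones passed over can never extend coverage later.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path: List[Clone] = []
    reach = start
    i = 0
    while reach < end:
        # The first clone must start at or before `start`; later ones must overlap.
        limit = reach - min_overlap if path else reach
        best: Optional[Clone] = None
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= reach:
            raise ValueError(f"coverage gap at position {reach}")
        path.append(best)
        reach = best[2]
    return path


if __name__ == "__main__":
    clones = [("A", 0, 100), ("B", 10, 60), ("C", 50, 180),
              ("D", 90, 200), ("E", 150, 260), ("F", 190, 300)]
    print([name for name, _, _ in minimal_tiling_path(clones, 0, 300, min_overlap=5)])
```

The greedy choice is optimal for this idealized problem because the set of eligible clones only grows as coverage extends; real clone maps add uncertainty in the coordinates that this sketch ignores.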
BACs 251L06–144G11 (covering 1600 kb from BAT3 to LOC285833) and BACs 118F10–25A4 (spanning 400 kb from HCGII-16 to RPN2P1) were derived from one haplotype, whereas BACs 399F22–151J3 (covering 400 kb from LOC285833 to HCGII-16) and BACs 164G20–48M22 (covering 900 kb from RPN2P1 to P5-1-44) were derived from the other haplotype. The genomic sequence was assembled into a single contig from 24 overlapping segments of the 25 BAC clones. Three overlaps located at haplotype boundaries (144G11/399F22, 151J3/118F10, and 25A4/164G20) were validated by very high nucleotide identity (>99.9% over at least 2 kb), whereas the other 20 overlaps, lying within a single haplotype, were established through complete nucleotide identity. The sequence was annotated using our previously published human and chimpanzee sequence data (Shiina et al. 1999; Anzai et al. 2003) as well as data publicly available at NCBI (http://www.ncbi.nlm.nih.gov/locuslink/). Sequence alignments and homology searches were performed with the programs of the Genetyx v11 package (http://www.sdc.co.jp/genetyx). Nucleotide diversity was calculated from pairwise sequence alignments generated by MAVID (http://baboon.math.berkeley.edu/mavid/) with three human (two Japanese and the COX haplotypes) and chimpanzee sequences (Anzai et al. 2003). The diversity profile was then plotted with Microsoft Excel. All insertions/deletions (indels) were removed from the alignments to standardize the number of nucleotides examined within each window. After our experimental work was completed, another macaque MHC genomic sequence was published (Daza-Vamenta et al. 2004); however, that sequence is not annotated, and ours is therefore formally the first annotated macaque MHC sequence.
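The windowed diversity calculation described above (indels removed, then the percentage of nucleotide differences per nonoverlapping 1-kb window, as plotted in Figure 5) can be sketched for one pairwise alignment as follows. The alignment itself would come from a tool such as MAVID; this sketch assumes gapped, uppercase sequences with gaps written as "-" and simply drops a trailing partial window. The same percent-difference function also expresses the >99.9% identity criterion applied to the haplotype-boundary overlaps. All function names are hypothetical.

```python
"""Percent nucleotide difference in nonoverlapping windows after indel removal (sketch)."""
from typing import List, Tuple


def drop_gap_columns(seq_a: str, seq_b: str, gap: str = "-") -> Tuple[str, str]:
    """Remove every alignment column in which either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    kept = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of gap-free aligned positions at which the two sequences differ."""
    if not seq_a:
        raise ValueError("empty alignment")
    return 100.0 * sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def window_profile(seq_a: str, seq_b: str, window: int = 1000) -> List[float]:
    """Percent differences in consecutive nonoverlapping windows (partial tail dropped)."""
    a, b = drop_gap_columns(seq_a, seq_b)
    return [percent_difference(a[i:i + window], b[i:i + window])
            for i in range(0, len(a) - window + 1, window)]


def passes_boundary_check(seq_a: str, seq_b: str, min_len: int = 2000,
                          min_identity: float = 99.9) -> bool:
    """True if a gap-free overlap of >= min_len bp has identity above min_identity percent."""
    a, b = drop_gap_columns(seq_a, seq_b)
    return len(a) >= min_len and 100.0 - percent_difference(a, b) > min_identity
```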

Chimpanzee (Pan troglodytes):


Genomic DNA was extracted from the chimpanzee MHC-heterozygous Ericka (Patr-A0601/0901, -B0101/1701, -C0401/0601 haplotype) and Borie (Patr-A0301/0401, -B0101/2401, -C0401/0901 haplotype) cell lines (kindly provided by Peter Parham at Stanford University). These individuals belong to the subspecies Pan troglodytes verus, as ascertained by mitochondrial DNA D-loop region analysis (data not shown).

DNA typing of MHC class I genes:

To determine the allelic sequences of the Ericka and Borie Patr-A, -B, and -C loci, long-range PCR amplifications were performed (Table S4 at http://www.genetics.org/supplemental/). The products were subcloned with the TOPO XL PCR cloning system (Invitrogen), and eight clones for each allele were sequenced (Table S5 at http://www.genetics.org/supplemental/). The genomic sequences of the Patr-A, -B, and -C genes in the Ericka and Borie cell lines perfectly matched those found in DNA databases.
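Sequencing eight clones per allele guards against errors introduced by PCR or cloning. The text does not state how the clone reads were combined; a majority-rule consensus, sketched below under that assumption, is one common approach: each position takes the base seen in more than half of the clones, otherwise the ambiguity code N. The function name and threshold are illustrative, and the reads are assumed to be already aligned to equal length.

```python
"""Majority-rule consensus of aligned clone reads (illustrative sketch)."""
from collections import Counter
from typing import List


def majority_consensus(clones: List[str], min_fraction: float = 0.5) -> str:
    """Per column, keep the most frequent base if its share exceeds min_fraction, else 'N'."""
    if not clones or len({len(read) for read in clones}) != 1:
        raise ValueError("need one or more aligned reads of equal length")
    consensus = []
    for column in zip(*clones):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) > min_fraction else "N")
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGT"] * 6 + ["ACTT", "ACGA"]  # two reads each carry one error
    print(majority_consensus(reads))          # ACGT
```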

LR-PCR, sequencing, and analysis:

Among 98 pairs of human LR-PCR amplicons, 93 amplified well in chimpanzees. The following five pairs, however, did not: TRIM26-4, PPP1R10-2, TUBB, POU5F1-1, and POU5F1-2. These pairs were redesigned with the assistance of Primer Express software (Applied Biosystems) (Table S2 at http://www.genetics.org/supplemental/). The LR-PCR procedure was the same as for humans. The entire nucleotide sequence of each gene, including the 5′-flanking region, the promoter/enhancer region, exons, introns, and the 3′-flanking region, was determined by direct sequencing with 4174 sequencing primers (3987 human primers and 187 newly designed chimpanzee primers) (Tables S2 and S5 at http://www.genetics.org/supplemental/), as described above for the human cell lines. However, direct sequencing could not determine the exact length of poly(A) or poly(T) stretches longer than 10 nucleotides or of microsatellite repeats, nor the sequence of 10 short genomic regions (50–400 bp) in seven LR-PCR fragments amplified from Patr-F, HCG4P6, C6orf12-3, TRIM26-4, CAT75X, MDC1-4, and POU5F1-2. These sequencing difficulties, as well as gaps or areas of ambiguity, were resolved by sequencing subcloned material (TA cloning kit, Invitrogen). The nonredundant nucleotide lengths in the Ericka and Borie cell lines were 472,506 and 472,528 bp, respectively.

Calculation of the statistical significance of SNP numbers in hitchhiked areas:

Classical MHC class I genes such as HLA-A, -B, and -C in humans and Patr-A, -B, and -C in chimpanzees are designated “hitchhiking attackers” (as they are the prime polymorphic loci), while the remaining loci are considered “hitchhiking receivers.” “Hitchhiked” areas are therefore those in which hitchhiking receivers have been “attacked” by hitchhiking attackers. This allowed us to perform hierarchical statistical analyses based on the percentage of SNPs in hitchhiking receivers in each species. Significance was assessed by t-test comparing the average with the region-specific percentages of SNPs (37 regions in humans and 34 in chimpanzees, excluding HLA-A, -B, and -C) after arcsine transformation. Twelve regions in humans (HLA-F, HLA-G, HCG2P7, HCG4P6, HCG9, HCG8, CDSN, C6orf18, TCF19, MICA, HCP5, and 3.8-1.1) had a significantly increased percentage of SNPs (P < 0.05), and these 12 gene regions differed significantly (P = 0.00001) from the other 25. In chimpanzees, 11 gene regions (Patr-F, HCG2P7, HCG4P6, HCG9, HCG8, RNF39, TRIM31, TRIM40, CDSN, C6orf18, and TCF19) had a significantly high percentage of SNPs (P < 0.05), and these 11 gene regions differed significantly (P < 0.0001) from the other 23.
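A minimal sketch of the group comparison described above, assuming that the “arc sin transformation” is the usual angular transform asin(√p) applied to each percentage expressed as a proportion, and that the two groups of regions (hitchhiked vs. other) are compared with Welch's unequal-variance t-test; the text specifies neither detail. Only the t statistic and its degrees of freedom are computed with the standard library, and the percentages in the example are invented, not the study's data.

```python
"""Arcsine-transformed SNP percentages compared by Welch's t statistic (illustrative sketch)."""
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def arcsine_sqrt(percent: float) -> float:
    """Angular transform asin(sqrt(p)), with p the percentage expressed as a proportion."""
    return math.asin(math.sqrt(percent / 100.0))


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    se2_a = variance(group_a) / len(group_a)  # squared standard error; sample variance uses n - 1
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(group_a) - 1)
                                 + se2_b ** 2 / (len(group_b) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical per-region SNP percentages, NOT the study's data.
    hitchhiked = [0.45, 0.52, 0.38, 0.61, 0.40]
    others = [0.15, 0.22, 0.18, 0.09, 0.20, 0.25, 0.12]
    t, df = welch_t([arcsine_sqrt(p) for p in hitchhiked],
                    [arcsine_sqrt(p) for p in others])
    print(f"t = {t:.2f}, df = {df:.1f}")
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same Welch statistic together with a two-sided p-value, and `equal_var=True` gives Student's pooled-variance test instead.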


Our target is the telomeric half of the MHC. This 1.9-Mb segment links the centromeric LTB to the telomeric HLA-F in humans and corresponds to syntenic segments in chimpanzees and in the rhesus macaque. It is known to contain the most polymorphic genes of the MHC and therefore of vertebrate genomes (HLA 2004). Moreover, it is the only segment of the MHC subject to intense interspecies variability, perhaps reflecting the selective microbial pressure facing each of these species (Kumanovics et al. 2003). In humans, there are 55 expressed or potentially expressed genes in this area, including 8 HLA class I (HLA-A/B/C/E/F/G) and class I-related (MICA/B) genes (Bahram et al. 1994; Shiina et al. 1999). Using LR-PCR on genomic DNA from three HLA-homozygous Japanese sources (typing cell lines and, for JPKO, an individual), namely AKIBA (HLA-A24, -B52, -DR15), LKT3 (A24, B54, DR4), and JPKO (A33, B44, DR13), we amplified all 55 loci (with the exception of 2 in LKT3 and AKIBA) within a set of 100 amplicons partitioned into 38 mini-contigs (40 in JPKO) (Figure 1; Table S1 at http://www.genetics.org/supplemental/). These were fully sequenced using 4403 primers (Table S2 at http://www.genetics.org/supplemental/), yielding nonredundant nucleotide lengths of 475,879, 475,686, and 488,433 bp for LKT3, AKIBA, and JPKO, respectively. The complete sequences of three Caucasoid cell lines, COX (A1, B8, Cw7, DR3), PGF (A3, B7, Cw7, DR15), and QBL (A26, B18, DR3), were obtained from the Wellcome Trust Sanger Institute (ftp://ftp.sanger.ac.uk/) (Stewart et al. 2004).

Cross-comparing all six cell lines over a 487,229-bp template (after exclusion of indels) (Table 1) unveiled 2761 single nucleotide variations (SNV) (1/176 bp, or 0.57/100 bp), 389 of which lie in coding regions, the majority of these (221) being nonsynonymous (Table 1). [Two types of nucleotide diversity are dealt with in this article: intra- and interspecies. While the former is typically called SNP, using the same term for the latter is incorrect, as the “P” in SNP refers to polymorphism, which is by definition intraspecies. Hence, to avoid confusion and to achieve consistency within this work, SNV is used throughout to refer to both intra- and interspecies nucleotide diversity (for further subdivisions, see below).] The SNV were not evenly distributed and showed a strong regional bias, which we dissect below. The greatest overall diversity was centered around two MHC-I-bearing islands, i.e., the centromeric HLA-B/-C/MICA and the telomeric HLA-A islands, where substitutions were significantly above the average diversity index (DI) of 0.57 (0.24) (the number in parentheses refers to the normalized percentage of SNVs) (Figure 2; Table 1; see materials and methods for calculation of statistical significance and Figure 3's legend for calculation of DI). Interestingly, diversity was not limited to these immune response genes (Figure 2; Table 1) (HLA 2004). For instance, while HLA-A [DI: 5.20 (2.46)] and HLA-B [DI: 4.60 (2.17)] are the most and second most diverse genes within the MHC class I region, respectively, the 250-kb peri-HLA-A (linking HLA-F to HCG8) and 350-kb peri-HLA-B/-C (linking CDSN to 3.8-1.1) segments displayed significantly (P < 0.05) above-average DIs of 0.69 (0.31)–0.83 (0.31) and 1.01 (0.44)–0.81 (0.34), respectively (Figure 2; Table 1). Intervening loci displayed lower-than-average diversity of 0.50 (0.21)–0.27 (0.10) (Figure 2; Table 1).
At the opposite end of the spectrum were a number of loci with almost no diversity (<0.26%) (Figure 2; Table 1). Thus, unexpectedly, the data presented here unveil the remarkable fact that MHC diversity is not limited to the antigen/T-cell receptor (TCR)-interacting sites of the HLA class I molecules (Bjorkman and Parham 1990) but spreads to the surrounding loci. These data document the long-suspected, but never proven, hitchhiking diversity within the MHC (Smith and Haigh 1974; Thomson 1977; Nei et al. 1997; Takahata and Satta 1998; Satta et al. 1999; Gaudieri et al. 2000; O'hUigin et al. 2000).
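The headline figures above (2761 SNV over a 487,229-bp indel-free template, i.e., 1 per 176 bp or 0.57 per 100 bp) can be reproduced with a simple column count. This sketch assumes that an SNV position is any gap-free alignment column showing more than one base among the compared haplotypes, and that the DI corresponds to SNV per 100 bp, which matches the 0.57 average; the normalized percentage given in parentheses is defined in the Figure 3 legend and is not reproduced. Function names are hypothetical.

```python
"""Count SNV positions across aligned haplotypes and express them per 100 bp (sketch)."""
from typing import List, Tuple


def variable_sites(haplotypes: List[str], gap: str = "-") -> Tuple[int, int]:
    """Return (SNV columns, compared columns); any column containing a gap is excluded."""
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned to equal length")
    snv = compared = 0
    for column in zip(*haplotypes):
        if gap in column:
            continue  # indels are removed before counting, as in the text
        compared += 1
        if len(set(column)) > 1:  # more than one base present = one SNV position
            snv += 1
    return snv, compared


def snv_per_100bp(snv: int, template_bp: int) -> float:
    """Hypothetical diversity-index helper: SNV density per 100 bp of template."""
    return 100.0 * snv / template_bp


if __name__ == "__main__":
    print(variable_sites(["ACGTA", "ACGTT", "ACG-T"]))  # (1, 4)
    print(round(snv_per_100bp(2761, 487_229), 2))       # 0.57
```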

Figure 2.
Genomic diversity plot of the human and chimpanzee MHC class I region. Diversity plot drawn upon comparison of six human haplotypes (red) and four chimpanzee haplotypes (blue). Arrows below the plot depict the location of identified disease susceptibility ...
Figure 3.
Definition of SNV categories in a sample sequence. a, b, c, d, e, hv1, and hv2 indicate SNV generated after birth of human haplotypes; SNV generated in humans after speciation of humans and chimpanzees but before birth of human haplotypes; SNV generated ...
SNV analysis of the MHC class I region genes in six human haplotypes

To further explore the extent of this haplotypic diversity, we sequenced the equivalent region in the rhesus macaque, a primate species of prime biomedical relevance, as well as in the chimpanzee, our closest primate relative. The 3,284,914-bp (3.28-Mb) Mamu class I region, linking BAT3 to Mamu-F, was considerably (1.3 and 1.45 Mb, respectively) larger than the equivalent segments in humans (1.9 Mb) (Shiina et al. 1999) and chimpanzees (1.75 Mb) (Anzai et al. 2003) (Figure S1 at http://www.genetics.org/supplemental/ and Figure 4). It contained a total of 312 genes (Table S3 at http://www.genetics.org/supplemental/): 45 expressed, 37 potentially expressed, and 230 pseudogenes (Figure S1 and Table S3 at http://www.genetics.org/supplemental/). The region contained a remarkable number of MHC-I genes (at least 64), which accounts for its expansion (Figure 4), in comparison with 18 and 17 class I loci in the HLA (human) and Patr (chimpanzee) regions, respectively (Figure 4). Among the 64 MHC-I, 23 were expressed or putatively expressed, including 5 already known (Mamu-A1, -A2, -B12, -B15, and -AG3) and 18 previously unaccounted for; the remaining 41 were pseudogenes. These results are partially corroborated by recent independent efforts aimed at unraveling the complexity of the macaque MHC (Daza-Vamenta et al. 2004; Otting et al. 2005). Among the 18 newly identified MHC-I, 12 were novel Mamu-B genes, two were assigned as Mamu-E and -F, and four as Mamu-AG1, -AG2, -AG4, and -AG5 (Table S6 at http://www.genetics.org/supplemental/). No -C locus homolog was identified and, as previously reported, all Mamu-G genes were pseudogenes (Boyson et al. 1997) (Figure 4).
Alignment of the putative peptide- and/or TCR-binding sites of these new Mamu-B molecules with their HLA counterparts points to a major diversification of the peptide-binding repertoire of the species (Table S7 and supporting information at http://www.genetics.org/supplemental/). This should eventually widen epitope-selection opportunities for the development of simian immunodeficiency virus/simian–human immunodeficiency virus/human immunodeficiency virus vaccines and help to better understand the cytotoxic T-cell response in this important animal model (Yang 2004).

Figure 4.
Comparative genomic map of HLA, Patr, and Mamu class I regions. Lines show orthologous relationships. The Mamu-B region from 300 to 1200 kb and the Mamu-A, -G, -F region from 2400 to 3285 kb are shown only with respect to Mamu class I loci. Arrows show segments ...

To further our knowledge of the evolutionary descent of MHC-I alleles, we established the sequence of four chimpanzee class I haplotypes (Anzai et al. 2003) (see materials and methods; Tables S4 and S5 at http://www.genetics.org/supplemental/; Table 2). Integration of these data allowed an initial assessment of cross-species SNV content in this immunologically crucial region. The average intrahuman, human–chimpanzee, and human–macaque nucleotide differences were 0.53/0.14% (including/excluding indels), 6.90/1.29%, and 29.4/6.55%, respectively (Figures 4 and 5). As depicted in Figure 6A, Table 3, and Table 4, a total of 25,702 normalized SNV were uncovered by pairwise comparison of all 11 sequences (6 human, 4 chimpanzee, and 1 rhesus macaque) over a common template of 422,762 bp. Not surprisingly, the number of SNV correlated well with the evolutionary time of divergence among species (Figure 6A). Figure 6B breaks down the SNV count into three categories: non-MHC genes (left pie chart); the nonclassical MHC-E and -F genes (center pie chart), which serve as “internal controls” because, despite being MHC genes (MHC-E at least binds peptides), they show little, if any, diversity; and, finally, the classical, polymorphic, antigen-presenting MHC-A and -B loci (right pie chart). Whereas the left and center pie charts in Figure 6B are very similar, the right pie chart is radically different (Figure 6B; Tables 3 and 4). This is due to two overrepresented categories of SNV in the MHC-A and -B genes. The first category, “a” (red) in humans and “c” (dark blue) in chimpanzees, is “haplotype specific,” as it is new with respect to the virtual common HLA or Patr framework haplotypes. The second category (hv1 and hv2: yellow and green, respectively) corresponds to “hypervariable SNVs” (hvSNVs), as they cannot be traced to any particular species and/or haplotype(s). hvSNV are further subdivided into hv1 and hv2.
The former is defined by positions encompassing two or more nucleotides while invariant in chimpanzees and macaques, and the latter encompasses the remainder (see Figure 3 for definitions). The hv2 fraction therefore theoretically includes the trans-species (ts) diversity. However, to formally calculate the number of tsSNV within hv2SNV, one would need the sequence of the syntenic genes in our most recent common ancestor or, in more practical terms, the sequences of a larger number of haplotypes in all three species. In any event, and whatever the fraction of tsSNV within hv2SNV, the totality of hv2SNV is still smaller than the de novo polymorphism defined by hsSNV (a + b) + hv1SNV, i.e., 17.46 < 23.4% in Figure 6. These data hence tend to settle the long-debated issue of the origin of MHC diversity, in which “trans-species polymorphism” (Figueroa et al. 1988; Lawlor et al. 1988) has been opposed to species-specific de novo diversity (Bergstrom et al. 1998). It is also notable that while hsSNV and hv1SNV are enriched within classical MHC-I genes with respect to non-MHC genes (×4.4 for human hsSNV, i.e., “a” in Figure 6; ×7 for chimpanzee hsSNV, i.e., “b” in Figure 6; and ×10.6 for hv1SNV), framework SNV (those common to human or chimpanzee haplotypes) are reduced 12.1-fold, showing that a great deal of functional MHC diversity is species specific and likely aimed at arming each species against the specific microbiological threats that it faces (Figure 6B). Finally, in humans and chimpanzees, SNV were further divided into those located within coding and noncoding regions. Within coding regions, synonymous (dS) and nonsynonymous (dN) SNV were distinguished, and the resulting dN/dS ratios were calculated (Tables 1 and 2) (Nei and Gojobori 1986). These ratios, however, should be interpreted with caution because of the small number of haplotypes analyzed. The McDonald–Kreitman test likewise gave inconclusive results (McDonald and Kreitman 1991).
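The dN/dS ratios above follow Nei and Gojobori (1986). The sketch below implements that method under common conventions that the text does not spell out: mutations to stop codons are excluded when counting sites, differences between codons that differ at two or three positions are averaged over all stop-free mutational pathways, and both proportions are corrected with the Jukes–Cantor formula. Input sequences are assumed to be aligned, gap-free, in frame, and written in A/C/G/T. This is not the authors' code, and with only a handful of haplotypes the resulting ratios carry the caveat stated above.

```python
"""Nei-Gojobori (1986) dN and dS with Jukes-Cantor correction (illustrative sketch)."""
import math
from itertools import permutations, product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_sites(codon: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous sites of one codon; changes to stop codons are ignored."""
    syn = 0.0
    for pos in range(3):
        mutants = [codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]]
        sense = [m for m in mutants if CODE[m] != "*"]
        if sense:
            syn += sum(CODE[m] == CODE[codon] for m in sense) / len(sense)
    return syn, 3.0 - syn


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous differences averaged over all mutational
    pathways from c1 to c2, skipping pathways that pass through a stop codon."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # intermediate stop codon: discard this pathway
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        else:  # pathway completed without hitting a stop codon
            syn_total += syn
            nonsyn_total += nonsyn
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return syn_total / n_paths, nonsyn_total / n_paths


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nei_gojobori(seq1: str, seq2: str) -> Tuple[float, float]:
    """Return (dN, dS) for two aligned, gap-free, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("internal stop codon")
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        s_sites += (s1 + s2) / 2  # sites averaged over the two sequences
        n_sites += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        s_diffs += sd
        n_diffs += nd
    return jukes_cantor(n_diffs / n_sites), jukes_cantor(s_diffs / s_sites)
```

The ratio is then `d_n / d_s`, which is undefined when no synonymous differences are observed.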

Figure 5.
SNV vs. indels in the MHC class I region. The aligned sequence (excluding indels) is shown along the horizontal axis and the percentage of nucleotide differences calculated per 1 kb of nonoverlapping windows is shown along the vertical axis. (A) Human ...
Figure 6.
Classification of SNV by species and gene category unveils the origin of MHC diversity. Red designates SNVs generated after birth of four human haplotypes; purple those generated after speciation of chimpanzees and humans but prior to divergence of the ...
SNV analysis of the MHC class I region genes in four chimpanzee haplotypes
SNV analysis of 33 genomic regions within the MHC class I regions of humans, chimpanzees, and macaques
Cross-species SNV analysis

What is the biological significance of this peri-HLA hitchhiking polymorphism? Examining the MHC class I region sequenced here is revealing. Disease association is not random: most, if not all, diseases mapped to this region are linked directly to the HLA-B/-C loci and their surroundings (Figure 2). Although for a few of these diseases the incriminated loci seem to be the MHC-I genes themselves (e.g., ankylosing spondylitis and HLA-B27), for most it is becoming ever more clear that the hitchhiked area described above is implicated (Oka et al. 1999; Matsuzaka et al. 2002; Okamoto et al. 2003) (Figure 2). In contrast, few, if any, diseases are associated with the equally polymorphic HLA-A. In light of the data presented here, this is no longer unexpected, as the peri-HLA-A region is considerably poorer in genes: it contains only four genes (including an expressed pseudogene), compared with 18 in the HLA-B/-C segment (Figures 1 and 2). This observation seems to hold for the MHC as a whole. Indeed, to date, three diseases have been unequivocally linked to a mutation or indel in the HLA region. In chronological order of their identification, these are adrenal hyperplasia, due to genomic deletions of the MHC class III complement C4-linked 21-hydroxylase gene (White et al. 1985); hypotrichosis simplex of the scalp, an autosomal dominant variant of alopecia caused by nonsense mutations in the CDSN (corneodesmosin) gene (Levy-Nissenbaum et al. 2003); and the HLA-encoded component of sarcoidosis, due to a splice mutation leading to a premature stop codon in the HLA-DRB-linked butyrophilin-like 2 (BTNL2) gene (Valentonyte et al. 2005). What is the common denominator of these three unrelated mutations?
A quick look at the MHC map reveals that they are all within genes located in the immediate vicinity of polymorphic MHC genes: C4 is the single most polymorphic gene in the MHC class III region, HLA-DRB1 has the highest level of diversity among MHC-II genes, and finally, HLA-B/-C tandem genes are the most polymorphic genes in the genome (MHC Sequencing Consortium 1999).

In summary, we demonstrate that MHC-I diversity is not limited to the antigen/TCR-binding sites but spreads to surrounding segments. Corroborating data suggest that this hitchhiking diversity, otherwise eliminated by purifying selection at most other genomic sites, has persisted within the MHC, perhaps because of the strong biological incentive for constant generation and maintenance of a species-specific diverse allelic repertoire (Hill et al. 1991; Kiepiela et al. 2004). This is evidenced by the existence of a large reservoir of hvSNV, consistent with most MHC diversity being generated de novo rather than resulting from trans-species inheritance as initially thought (Figueroa et al. 1988; Lawlor et al. 1988). This result finally brings the MHC in line with the bulk of population and evolutionary genetics data, which firmly conclude that a narrow bottleneck occurred at the origin of our species (Cann et al. 1987; Hammer 1995), a fact inconsistent with the massive flow of alleles from one species to the next required by the trans-species postulate (Ayala et al. 1994). Moreover, this fitness in fighting infections seems to have taken its toll on neighboring loci, as the most gene-rich and polymorphic segment, around HLA-B, is also where most diseases of the MHC class I region map. Finally, this observation is not limited to the MHC class I region, as it extends to other MHC-disease associations and perhaps to the wider genome.


We thank P. Parham, T. Kaneko, A. Kimura, Y. Ishikawa, and F. Numano for cell lines. PGF, COX, and QBL sequences were produced by “Team 50” at the Wellcome Trust Sanger Institute (ftp://ftp.sanger.ac.uk). This work was supported by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan, Tokai University School of Medicine, “Séquençage à Grande Echelle” [Institut National de la Santé et de la Recherche Médicale (INSERM)/Centre National de la Recherche Scientifique/Ministère de la Recherche], Association de Recherche contre le Cancer, Agence de Biomédecine, as well as an INSERM/Japan Society for the Promotion of Science grant jointly awarded to S. Bahram and H. Inoko.


Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AB088082–AB088115, AB103588–AB103621, AB110931–AB110940, AB201549–AB201552, and AB202079–AB202114 (human); AB210139–AB210212 (chimpanzee); AB128049, AB128833–AB128841, AB128843–AB128846, AB128848–AB128849, AB128852–AB128856, and AB128858–AB128860 (macaque).


  • Anzai, T., T. Shiina, N. Kimura, K. Yanagiya, S. Kohara et al., 2003. Comparative sequencing of human and chimpanzee MHC class I regions unveils insertions/deletions as the major path to genomic divergence. Proc. Natl. Acad. Sci. USA 100: 7708–7713.
  • Ayala, F. J., A. Escalante, C. O'Huigin and J. Klein, 1994. Molecular genetics of speciation and human origins. Proc. Natl. Acad. Sci. USA 91: 6787–6794.
  • Bahram, S., M. Bresnahan, D. E. Geraghty and T. Spies, 1994. A second lineage of mammalian major histocompatibility complex class I genes. Proc. Natl. Acad. Sci. USA 91: 6259–6263.
  • Bergstrom, T. F., A. Josefsson, H. A. Erlich and U. Gyllensten, 1998. Recent origin of HLA-DRB1 alleles and implications for human evolution. Nat. Genet. 18: 237–242.
  • Bjorkman, P. J., and P. Parham, 1990. Structure, function, and diversity of class I major histocompatibility complex molecules. Annu. Rev. Biochem. 59: 253–288.
  • Boyson, J. E., K. K. Iwanaga, T. G. Golos and D. I. Watkins, 1997. Identification of a novel MHC class I gene, Mamu-AG, expressed in the placenta of a primate with an inactivated G locus. J. Immunol. 159: 3311–3321.
  • Cann, R. L., M. Stoneking and A. C. Wilson, 1987. Mitochondrial DNA and human evolution. Nature 325: 31–36.
  • Ceppellini, R., M. Siniscalco and C. A. Smith, 1955. The estimation of gene frequencies in a random-mating population. Ann. Hum. Genet. 20: 97–115.
  • Daza-Vamenta, R., G. Glusman, L. Rowen, B. Guthrie and D. E. Geraghty, 2004. Genetic divergence of the rhesus macaque major histocompatibility complex. Genome Res. 14: 1501–1515.
  • Figueroa, F., E. Gunther and J. Klein, 1988. MHC polymorphism pre-dating speciation. Nature 335: 265–267.
  • Gabriel, S. B., S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy et al., 2002. The structure of haplotype blocks in the human genome. Science 296: 2225–2229.
  • Gaudieri, S., R. L. Dawkins, K. Habara, J. K. Kulski and T. Gojobori, 2000. SNP profile within the human major histocompatibility complex reveals an extreme and interrupted level of nucleotide diversity. Genome Res. 10: 1579–1586.
  • Hammer, M. F., 1995. A recent common ancestry for human Y chromosomes. Nature 378: 376–378.
  • Hill, A. V., C. E. Allsopp, D. Kwiatkowski, N. M. Anstey, P. Twumasi et al., 1991. Common west African HLA antigens are associated with protection from severe malaria. Nature 352: 595–600.
  • Hirschhorn, J. N., and M. J. Daly, 2005. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6: 95–108.
  • HLA 2004, 2004. Immunobiology of the Human MHC: Proceedings of the 13th International Histocompatibility Workshop and Congress, edited by J. A. Hansen and B. Dupont. IHWG Press, Seattle.
  • Kiepiela, P., A. J. Leslie, I. Honeyborne, D. Ramduth, C. Thobakgale et al., 2004. Dominant influence of HLA-B in mediating the potential co-evolution of HIV and HLA. Nature 432: 769–775.
  • Kumanovics, A., T. Takada and K. F. Lindahl, 2003. Genomic organization of the mammalian MHC. Annu. Rev. Immunol. 21: 629–657.
  • Lawlor, D. A., F. E. Ward, P. D. Ennis, A. P. Jackson and P. Parham, 1988. HLA-A and B polymorphisms predate the divergence of humans and chimpanzees. Nature 335: 268–271.
  • Levy-Nissenbaum, E., R. C. Betz, M. Frydman, M. Simon, H. Lahat et al., 2003. Hypotrichosis simplex of the scalp is associated with nonsense mutations in CDSN encoding corneodesmosin. Nat. Genet. 34: 151–153.
  • Matsuzaka, Y., S. Makino, K. Okamoto, A. Oka, A. Tsujimura et al., 2002. Susceptibility locus for non-obstructive azoospermia is localized within the HLA-DR/DQ subregion: primary role of DQB1*0604. Tissue Antigens 60: 53–63.
  • McDonald, J. H., and M. Kreitman, 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652–654.
  • MHC Sequencing Consortium, 1999. Complete sequence and gene map of a human major histocompatibility complex. Nature 401: 921–923.
  • Nei, M., and T. Gojobori, 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418–426.
  • Nei, M., X. Gu and T. Sitnikova, 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. USA 94: 7799–7806.
  • O'hUigin, C., Y. Satta, A. Hausmann, R. L. Dawkins and J. Klein, 2000. The implications of intergenic polymorphism for major histocompatibility complex evolution. Genetics 156: 867–877.
  • Oka, A., G. Tamiya, M. Tomizawa, M. Ota, Y. Katsuyama et al., 1999. Association analysis using refined microsatellite markers localizes a susceptibility locus for psoriasis vulgaris within a 111 kb segment telomeric to the HLA-C gene. Hum. Mol. Genet. 8: 2165–2170.
  • Okamoto, K., S. Makino, Y. Yoshikawa, A. Takaki, Y. Nagatsuka et al., 2003. Identification of I kappa BL as the second major histocompatibility complex-linked susceptibility locus for rheumatoid arthritis. Am. J. Hum. Genet. 72: 303–312.
  • Onengut-Gumuscu, S., and P. Concannon, 2002. Mapping genes for autoimmunity in humans: type 1 diabetes as a model. Immunol. Rev. 190: 182–194.
  • Otting, N., C. M. Heijmans, R. C. Noort, N. G. de Groot, G. G. Doxiadis et al., 2005. Unparalleled complexity of the MHC class I region in rhesus macaques. Proc. Natl. Acad. Sci. USA 102: 1626–1631.
  • Satta, Y., H. Kupfermann, Y. J. Li and N. Takahata, 1999. Molecular clock and recombination in primate MHC genes. Immunol. Rev. 167: 367–379.
  • Shiina, T., G. Tamiya, A. Oka, N. Takishima, T. Yamagata et al., 1999. Molecular dynamics of MHC genesis unraveled by sequence analysis of the 1,796,938-bp HLA class I region. Proc. Natl. Acad. Sci. USA 96: 13282–13287.
  • Smith, J. M., and J. Haigh, 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.
  • Stewart, C. A., R. Horton, R. J. Allcock, J. L. Ashurst, A. M. Atrazhev et al., 2004. Complete MHC haplotype sequencing for common disease gene mapping. Genome Res. 14: 1176–1187.
  • Takahata, N., and Y. Satta, 1998. Footprints of intragenic recombination at HLA loci. Immunogenetics 47: 430–441.
  • Thomson, G., 1977. The effect of a selected locus on linked neutral loci. Genetics 85: 753–788.
  • Valentonyte, R., J. Hampe, K. Huse, P. Rosenstiel, M. Albrecht et al., 2005. Sarcoidosis is associated with a truncating splice site mutation in BTNL2. Nat. Genet. 37: 357–364.
  • White, P. C., D. Grossberger, B. J. Onufer, D. D. Chaplin, M. I. New et al., 1985. Two genes encoding steroid 21-hydroxylase are located near the genes encoding the fourth component of complement in man. Proc. Natl. Acad. Sci. USA 82: 1089–1093.
  • Yang, O. O., 2004. CTL ontogeny and viral escape: implications for HIV-1 vaccine design. Trends Immunol. 25: 138–142.

Articles from Genetics are provided here courtesy of Genetics Society of America

