- Journal List
- HHS Author Manuscripts
- PMC5759959

Loci associated with skin pigmentation identified in African populations
Nicholas G. Crawford
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Derek E. Kelly
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
2Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
Matthew E. B. Hansen
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Marcia H. Beltrame
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Shaohua Fan
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Shanna L. Bowman
3Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
4Department of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Ethan Jewett
5Department of Electrical Engineering and Computer Sciences, University of California–Berkeley, Berkeley, CA 94704, USA
6Department of Statistics, University of California–Berkeley, Berkeley, CA 94704, USA
Alessia Ranciaro
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Simon Thompson
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Yancy Lo
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Susanne P. Pfeifer
7School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
Jeffrey D. Jensen
7School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
Michael C. Campbell
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
8Department of Biology, Howard University, Washington, DC 20059, USA
William Beggs
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Farhad Hormozdiari
9Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
10Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA
Sununguko Wata Mpoloka
11Department of Biological Sciences, University of Botswana, Gaborone, Botswana
Gaonyadiwe George Mokone
12Department, of Biomedical Sciences, University of Botswana School of Medicine, Gaborone, Botswana
Thomas Nyambo
13Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dares Salaam, Tanzania
Dawit Wolde Meskel
14Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
Gurja Belay
14Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
Jake Haut
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Harriet Rothschild
15Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children’s Hospital and Dana Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
Leonard Zon
15Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children’s Hospital and Dana Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
16Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
Yi Zhou
15Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children’s Hospital and Dana Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
17Harvard Stem Cell Institute, Harvard University, Cambridge, MA 02138, USA
Michael A. Kovacs
18Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
Mai Xu
18Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
Tongwu Zhang
18Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
Kevin Bishop
19Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Jason Sinclair
19Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Cecilia Rivas
20Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Eugene Elliot
20Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Jiyeon Choi
18Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
Shengchao A. Li
21Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
22Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD 21701, USA
Belynda Hicks
21Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
22Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD 21701, USA
Shawn Burgess
19Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Christian Abnet
21Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
Dawn E. Watkins-Chow
20Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Elena Oceana
23Department of Molecular Pharmacology, Physiology and Biotechnology, Brown University, Providence, RI 02912, USA
Yun S. Song
5Department of Electrical Engineering and Computer Sciences, University of California–Berkeley, Berkeley, CA 94704, USA
6Department of Statistics, University of California–Berkeley, Berkeley, CA 94704, USA
24Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
25Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
26Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
Eleazar Eskin
27Department of Computer Science, Department of Human Genetics, University of California–Los Angeles, Los Angeles, CA 90095, USA
Kevin M. Brown
18Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
Michael S. Marks
3Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
4Department of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
Stacie K. Loftus
20Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
William J. Pavan
20Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
Meredith Yeager
21Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
22Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD 21701, USA
Stephen Chanock
21Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
Sarah Tishkoff
1Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
25Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
Abstract
Despite the wide range of skin pigmentation in humans, little is known about its genetic basis in global populations. Examining ethnically diverse African genomes, we identify variants in or near SLC24A5, MFSD12, DDB1, TMEM138, OCA2 and HERC2 that are significantly associated with skin pigmentation. Genetic evidence indicates that the light pigmentation variant at SLC24A5 was introduced into East Africa by gene flow from non-Africans. At all other loci, variants associated with dark pigmentation in Africans are identical by descent in southern Asian and Australo-Melanesian populations. Functional analyses indicate that MFSD12 encodes a lysosomal protein that affects melanogenesis in zebrafish and mice, and that mutations in melanocyte-specific regulatory regions near DDB1/TMEM138 correlate with expression of UV response genes under selection in Eurasians.
Variation in epidermal pigmentation is a striking feature of modern humans. Human pigmentation is correlated with geographic and environmental variation (Fig. 1). Populations at lower latitudes have darker pigmentation than populations at higher latitudes, suggesting that skin pigmentation is an adaptation to differing levels of ultraviolet radiation (UVR) (1). Because equatorial regions receive more UVR than temperate regions, populations from these regions (including sub-Saharan Africans, South Asians, and Australo-Melanesians) have darker pigmentation (Fig. 1), which likely mitigates the negative impact of high UVR exposure such as skin cancer and folate degradation (1). In contrast, the synthesis of vitamin D3 in response to UVR, needed to prevent rickets, may drive selection for light pigmentation at high latitudes (1).

(A) Global variation in skin pigmentation indicated by Melanin index (Ml). This data was integrated with Ml data for global populations from (1, 102). (B) Mean erythemal dose rate. (C) Manhattan plot of -log10 transformed p-values from GWAS of skin pigmentation with the Illumina Omni5 SNP array. (D) Quantile-Quantile plot of observed vs expected p-values from the GWASIn both (C) and (D) significant SNPs at p < 5 × 10−8 are highlighted in purple. (E to L) Allele frequencies of genetic variants associated with skin pigmentation in global populations. African populations included are 1. Ethiopia Nilosaharan, 2. Ethiopia Omotic, 3. Ethiopia and Tanzania Cushitic, 4. Ethiopia Semitic, 5. Tanzania Nilosaharan, 6. Tanzania Hadza, 7. Tanzania Sandawe, 8. Botswana Bantu 9. Botswana San/Bantu admixed, and 10. Botswana San. The Melanesian (MEL) samples are from (12) and the Australian Aboriginal and Papua New Guinean samples (merged) are from the SGDP (PNG) (13). All other populations are from the TGP (10). NonAboriginal populations in the Americans are indicated: CEU (European ancestry), ASW (African-American Southwest US) and ACB (African Caribbean in Barbados).
The basal layer of human skin contains melanocytes, specialized pigment cells that harbor subcellular organelles called melanosomes in which melanin pigments are synthesized and stored and then transferred to keratinocytes (2). Melanosome morphology and content differs between melanocytes that synthesize mainly eumelanins (black-brown pigments) or pheomelanins (pigments which range from yellow to reddish-brown) (3). Variation in skin pigmentation is due to the type and quantity of melanins generated, melanosome size, and the manner in which keratinocytes sequester and degrade melanins (4).
While over 350 pigmentation genes have been identified in animal models, only a subset of these genes have been linked to normal variation in humans (5). Of these, there is limited knowledge about loci that affect pigmentation in populations with African ancestry (6, 7).
Skin pigmentation is highly variable within Africa
To identify genes affecting skin pigmentation in Africa, we used a DSM II ColorMeter to quantify light reflectance from the inner arm as a proxy for melanin levels in 2,092 ethnically and genetically diverse Africans living in Ethiopia, Tanzania, and Botswana (table S1 and figs. S1 and S2) (8). Skin pigmentation levels vary extensively among Africans, with darkest pigmentation observed in Nilo-Saharan speaking pastoralist populations in Eastern Africa and lightest pigmentation observed in San hunter-gatherer populations from southern Africa (Fig. 2 and table S1).

Histograms of melanin index computed from under-arm measurements with a DSM II ColorMeter (68) for all individuals in each population as described in (67). Skin tones were visualized by displaying the mean red, green, and blue values from the ColorMeter for individuals binned by melanin index.
A locus associated with light skin color in Europeans is common in East Africa
We genotyped 1,570 African individuals with quantified pigmentation levels using the Illumina Infinium Omni5 Genotyping array. After quality control, we retained ~4.2 million biallelic single nucleotide polymorphisms (SNPs) for analysis. A GWAS analysis with linear mixed models, controlling for age, sex, and genetic relatedness (9), identified four regions with multiple significant associations (p-value < 5 × 10−8) (Fig. 1, fig. S3, and tables S2 and S3).
We then performed fine-mapping using local imputation of high coverage sequencing data from a subset of 135 individuals and data from the Thousand Genomes Project (TGP) (10) (Fig. 3 and table S3). We ranked potential causal variants within each locus using CAVIAR, a fine-mapping method that accounts for linkage disequilibrium (LD) and effect sizes (11) (Table 1). We characterized global patterns of variation at these loci using whole genome sequences from West African, Eurasian and Australo-Melanesian populations (10,12,13).

Plot of - log10(p-value) versus genomic position for variants near the four regions with most strongly associated SNPs from GWAS, including annotations for genes, MITF ChIP-seq data for melanocytes (45), a CTCF ChIP-seq track for NHEK keratinocytes, and H3K27ac, DNase hypersensitive sites (DHS), and chromHMM tracks for melanocytes and keratinocytes from the Roadmap Epigenomics dataset (29). Genome-wide significant variants are highlighted in red. Circles, squares, and triangles denote noncoding, synonymous, and non-synonymous variants, respectively. (A) SLC24A5 locus, (B) MFSD12 locus, (C) DDB1/TMEM138, locus, and (D) OCA2/HERC2 locus.
Table 1
Top candidate causal variants for the four regions identified based on analysis with CAVIAR (11). For each variant, the genomic position (Location), RSID, and Ancestral>Derived alleles are shown, with the allele associated with dark pigmentation in bold. Beta and standard error (Beta(SE)) and the p-values from the GWAS (F test, linear mixed model) are given. For functional genomic data, nearest genes are given and variants overlapping DHS sites for melanocytes (E059) (DHS melanocytes) and/or other cell types (DHS other) available from Roadmap Epigenomics are indicated with X (29, 45). Variants intersecting regions with enhancer activity confirmed by luciferase expression assays are indicated with X (Luciferase Activity) as are chromatin interactions with-nearby genes measured in MCF7 or K562 cell lines as identified by ChlA-PET (Chromatin Interactions) (46, 47). Those variants where luciferase activity was tested were labeled with Y (significant enhancer activity) or N (no enhancer activity) (fig. S7). SNPs that are in strong LD (r2 > 0.7 in East Africans) are numerically labeled in the column titled LD Block.
| Location | RSID | Ancestral> Derived | Beta(SE) | p-value | DHS Melanocytes | DHS Other | Luciferase Activity | LD Block | Nearest Gene | Chromatin Interactions | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15:48485926 | rs2413887 | T>C | 7.70(0.44) | 4.9 × 10−62 | 1 | CTXN2 | MYEF2 | ||||
| 15:48426484 | rs1426654 | G>A | 7.69(0.44) | 5.5 × 10−62 | X | X | N | 1 | SLC24A5 | ||
| 15:48392165 | rs1834640 | G>A | .56(0.44) | 3.2 × 10−61 | X | 1 | SLC24A5 | ||||
| 15:48400199 | rs2675345 | G>A | 7.62(0.44) | 6.7 × 10−61 | 1 | SLC24A5 | |||||
| 15:48460188 | rs8028919 | G>A | −4.95(0.41) | 5.0 × 10−32 | 1 | MYEF2 | |||||
| 19:3545022 | rs10424065 | C>T | 4.48(0.48) | 5.1 × 10−20 | X | X | Y | 2 | MFSD12 | CACTIN,MFSD12 | |
| 19:3544892 | rs56203814 | C>T | −4.38(0.50) | 3.6 × 10−18 | X | Y | 2 | MFSD12 | CACTIN,MFSD12 | ||
| 19:3566631 | rs111317445 | C>T | 3.51(0.42) | 1.7 × 10−16 | N | 3 | HMG20B | MFSD12 | |||
| 19:3547955 | rs10414812 | C>T | 4.38(0.53) | 3.8 × 10−16 | X | Y | 2 | MFSD12 | CACTIN,FZR1,MFSD12 | ||
| 19:3565599 | rs112332856 | T>C | 3.52(0.43) | 3.8 × 10−16 | X | X | Y | 3 | HMG20B | MFSD12 | |
| 19:3565253 | rs6510760 | G>A | 3.54(0.45) | 6.5 × 10−15 | X | X | Y | 3 | MFSD12 | MFSD12 | |
| 19:3545150 | rs73527942 | T>G | −3.58(0.47) | 4.8 × 10−14 | X | X | Y | 2 | MFSD12 | CACTIN,MFSD12 | |
| 19:3547685 | rs142317543 | C>T | 6.99(0.92) | 5.0 × 10−14 | X | X | Y | 2 | MFSD12 | CACTIN,FZR1,MFSD12 | |
| 19:3566513 | rs7254463 | C>T | 2.90(0.50) | 9.0 × 10−9 | X | N | 3 | HMG20B | MFSD12 | ||
| 19:3565357 | rs7246261 | C>T | 2.71(0.47) | 1.1 × 10−8 | X | X | 3 | HMG20B | MFSD12 | ||
| 19:3565909 | rs6510761 | T>C | 2.79(0.50) | 2.2 × 10−8 | X | 3 | HMG20B | MFSD12 | |||
| 11:61137147 | rs7948623 | A>T | −2.94(0.44) | 2.2 × 10−11 | X | X | Y | 4 | TMEM138 | TKFC,DDB1, TMEM138 | |
| 11:61148456 | rs397709980 | G/GA | −2.90(0.43) | 2.4 × 10−11 | X | N | 4 | TMEM216 | |||
| 11:61152630 | rs4453253 | C>T | −2.85(0.43) | 5.4 × 10−11 | X | X | N | 4 | TMEM216 | CYB561A3,TKFC,DDB1,TMEM138,TMEM216 | |
| 11:61153401 | rs4939520 | C>T | −2.79(0.43) | 1.4 × 10−10 | X | N | 4 | TMEM216 | CYB561A3,TKFC,DDB1,TMEM138,TMEM216 | ||
| 11:61142943 | rs4939519 | C>T | −2.47(0.39) | 2.8 × 10−10 | X | 4 | TMEM138 | TMEM138 | |||
| 11:61106525 | rs2512809 | C>T | −2.93(0.47) | 7.4 × 10−10 | N | 5 | TKFC | TKFC,DDB1, TMEM138 | |||
| 11:61046876 | rs11230658 | T>C | 3.01(0.49) | 9.4 × 10−10 | 5 | VWCE | |||||
| 11:61084180 | rs12289370 | G>A | 2.99(0.49) | 1.3 × 10−9 | X | 5 | DDB1 | TKFC,DDB1 | |||
| 11:61144652 | rs1377457 | C>A | −3.01(0.49) | 1.5 × 10−9 | X | 5 | TMEM138 | CYB561A3,TKFC,DDB1,TMEM138 | |||
| 11:61088140 | rs7934735 | G>T | 2.98(0.49) | 1.5 × 10−9 | 5 | DDB1 | |||||
| 11:61141476 | rs7394502 | G>A | −2.41(0.40) | 1.6 × 10−9 | 5 | TMEM138 | TMEM138 | ||||
| 11:61141164 | rs10897155 | C>T | −2.41(0.40) | 1.6 × 10−9 | Y | 5 | TMEM138 | TMEM138 | |||
| 11:61139869 | rs11230678 | G>A | −2.41(0.40) | 1.7 × 10−9 | 5 | TMEM138 | TMEM138 | ||||
| 11:61115821 | rs148172827 | C/CATCAA | −2.95(0.49) | 1.8 × 10−9 | X | X | Y | 5 | TKFC | CYB561A3,TKFC,DDB1,TMEM138 | |
| 11:61144707 | rs1377458 | C>T | −2.40(0.40) | 2.1 × 10−9 | X | 5 | TMEM138 | CYB561A3,TKFC,DDB1,TMEM138 | |||
| 11:61076372 | rs11230664 | C>T | 2.95(0.49) | 2.1 × 10−9 | X | N | 5 | DDB1 | DDB1 | ||
| 11:61122878 | rs7951574 | G>A | 2.92(0.49) | 2.8 × 10−9 | 5 | CYB561A3 | |||||
| 11:61054892 | rs1108769 | A>C | 2.82(0.47) | 3.0 × 10−9 | X | 5 | VWCE | TKFC,DDB1 | |||
| 11:61141259 | rs57265008 | T>C | 2.34(0.39) | 3.7 × 10−9 | Y | 4 | TMEM138 | TMEM138 | |||
| 11:61222635 | rs3017597 | G>A | −2.77(0.47) | 5.4 × 10−9 | 5 | SDHAF2 | |||||
| 11:61075524 | rs12275843 | T>C | 2.64(0.45) | 5.5 × 10−9 | 5 | DDB1 | DDB1 | ||||
| 11:61043773 | rs73490303 | G>C | 2.67(0.46) | 7.2 × 10−9 | 5 | VWCE | VWCE | ||||
| 11:61018855 | rs653173 | A>G | 2.68(0.46) | 8.2 × 10−9 | X | 5 | PGA5 | ||||
| 11:61063156 | rs10897150 | G>T | 2.79(0.48) | 8.8 × 10−9 | X | 5 | VWCE | TKFC,DDB1,VWCE | |||
| 11:61108974 | rs2260655 | G>A | 2.63(0.46) | 9.0 × 10−9 | 5 | TKFC | CYB561A3,TKFC,DDB1,TMEM138 | ||||
| 11:61152028 | rs12791961 | C>A | 2.90(0.50) | 9.7 × 10−9 | X | 5 | TMEM216 | CYB561A3,TKFC,DDB1,TMEM138,TMEM216 | |||
| 11:61055014 | GACTA/G | 2.61(0.45) | 1.1 × 10−8 | X | 5 | VWCE | TKFC,DDB1 | ||||
| 11:61080557 | rs7120594 | T>C | 2.58(0.45) | 1.2 × 10−8 | 5 | DDB1 | |||||
| 11:61044470 | rs9704187 | G>C | 2.58(0.45) | 1.3 × 10−8 | 5 | VWCE | VWCE | ||||
| 11:61106892 | rs2513329 | G>C | −2.73(0.48) | 1.6 × 10−8 | N | 5 | TKFC | CYB561A3,TKFC,DDB1,TMEM138,TMEM216 | |||
| 11:61033525 | rs2001746 | T>A | 2.56(0.45) | 1.7 × 10−8 | 5 | VWCE | |||||
| 11:61112802 | rs2305465 | C>T | 2.62(0.46) | 1.8 × 10−8 | X | X | 5 | TKFC | CYB561A3,TKFC,DDB1,TMEM138 | ||
| 11:61037389 | ATT/A | 2.51(0.45) | 3.5 × 10−8 | 5 | VWCE | ||||||
| 15:28514281 | rs4932620 | C>T | −2.85(0.48) | 3.2 × 10−9 | 6 | HERC2 | |||||
| 15:28532639 | rs1667393 | C>T | −2.82(0.48) | 6.3 × 10−9 | X | N | 6 | HERC2 | |||
| 15:28535675 | rs1635167 | C>T | −2.88(0.50) | 8.9 × 10−9 | 6 | HERC2 | |||||
| 15:28545148 | rs2905952 | A>G | −3.16(0.55) | 9.0 × 10−9 | 6 | HERC2 | |||||
| 15:28396894 | rs12915877 | T>G | −2.76(0.48) | 1.1 × 10−8 | X | 6 | HERC2 | ||||
| 15:28487069 | rs4932618 | G>A | −2.69(0.47) | 1.6 × 10−8 | 6 | HERC2 | |||||
| 15:28235773 | rs1800404 | C>T | 2.54(0.45) | 1.6 × 10−8 | X | 7 | OCA2 | ||||
| 15:28238158 | rs1868333 | G>A | −2.53(0.45) | 2.2 × 10−8 | 7 | OCA2 | |||||
| 15:28419497 | . | TA/T | −3.73(0.67) | 2.6 × 10−8 | 6 | HERC2 | |||||
| 15:28238895 | rs735066 | A>G | −2.50(0.45) | 3.5 × 10−8 | X | 7 | OCA2 |
The SNPs with strongest association with skin color in Africans were on chromosome 15 at or near the Solute Carrier Family 24 Member 5 (SLC24A5) gene (Figs. 1 and and33 and tables S2 and S3). A functional non-synonymous mutation within SLC24A5 (rsl426654) (14) was significantly associated with skin color (F-test, p-value = 5.5 × 10−62) and was identified as potentially causal by CAVIAR (Table 1). The rsl426654 (A) allele is at high frequency in European, Pakistani, and Indian populations (Fig. 1) and is a target of selection in Europeans, Central Asians and North Indians (15-17). In Africa this variant is common (28-50% frequency) in populations from Ethiopia and Tanzania with high Afroasiatic ancestry (18, 19) and is at moderate frequency (5-11%) in San and Bantu populations from Botswana with low levels of East African ancestry and recent European admixture (20, 21) (Fig. 1 and figs. S2 and S4). We observe a signature consistent with positive selection at SLC24A5 in Europeans on the basis of extreme values of Tajima’s D statistic (fig. S5).
Based on coalescent analysis with sequence data from the Simons Genomic Diversity Project (SGDP) (13), the time to most recent common ancestor (TMRCA) of most Eurasian lineages containing the rsl426654 (A) allele is 29 kya (95% credible interval (CI) 28-31 kya), consistent with prior studies (6, 16) (Fig. 4). Haplotype analysis indicates that the rsl426654 (A) variant in Africans is on the same extended haplotype background as Europeans (Fig. 5 and fig. S6), likely reflecting gene flow from western Eurasia over at least the past 3-9 ky (22). The rsl426654 (A) variant is at high frequency (28%) in Tanzanian populations, suggesting a lower bound (~5 kya) for introduction of this allele into East Africa, the time of earliest migration from Ethiopia into Tanzania (23). Further, the frequency of the rsl426654 (A) variant in eastern and southern Africans exceeds the inferred proportion of non-African ancestry (figs. S2 and S4). Estimates of genetic differentiation (FST) at the rsl426654 SNP between the West African Yoruba (YRI) and Ethiopian Amhara populations is 0.76, among the top 0.01% of values on chromosome 15 (table S4). These results are consistent with selection for the rsl426654 (A) allele in African populations following introduction, although complex models of demographic history cannot be ruled out.

Inferred genealogies for regions flanking candidate causal loci. Each leaf corresponds to a single sampled chromosome from one of 278 individuals in the Simons Genome Diversity Project (13). Leaf nodes are colored by the population of origin of the individual and sequences carrying the light allele are indicated with an open dot. Node heights and 95% credible intervals are presented for a subset of internal nodes. Gene genealogies are shown for regions flanking (A) SLC24A5, rs1426654 (15:48426484), (B) MFSD12, rs10424065 (19:3545022), (C) MFSD12, rs6510760 (19:3565253), (D) TMEM138, rs7948623 (11:61137147), (E) DDB1, rs11230664 (11:61076372), (F) OCA2, rs1800404 (15:28235773), (G) HERC2, rs4932620 (15:28514281), and (H) HERC2, rs6497271 (15:28365431).

Median joining haplotype networks of regions containing candidate causal variants. Connections between circles indicate genetic relatedness while size is relative to the frequency of haplotypes. Ancestry proportions are displayed as pie charts. Yellow/red subfigures indicate which haplotypes contain the allele associated with dark pigmentation (red) or light pigmentation (yellow). (A) 75kb region flanking the causal variant at SLC24A5. (B and C) 3kb regions flanking rs10424065 in MFSD12 and rs6510760 upstream of MFSD12. (D) 195 kb region flanking DDB1 extending from PGA5 through SDHAF2. (E, F, and G) 50 kb flanking regions 1, 3, and 2 at OCA2 and FIERC2 (ordered based on highest to lowest probability of being causal from CAVIAR analysis).
A lysosomal transporter protein associated with skin pigmentation
The region with the second strongest genetic association with skin pigmentation contains the Major Facilitator Superfamily Domain Containing 12 (MFSD12) gene on chromosome 19 (Figs. 1 and and33 and tables S2 and S3). MFSD12 is homologous to other genes containing MFS domains, conserved throughout vertebrates, which function as transmembrane solute transporters (24). MFSD12 mRNA levels are low in de-pigmented skin of vitiligo patients (25), likely due to autoimmune related destruction of melanocytes.
The MFSD12 locus is in a region with extensive recombination, enabling us to fine-map eight potentially causal SNPs (Table 1 and table S3) that cluster in two regions: one within MFSD12 and the other ~7,600–9,000 bp upstream of MFSD12 (Fig. 3). Many SNPs are in predicted regulatory regions active in melanocytes and/or keratinocytes (Table 1 and Fig. 3) and show enhancer activity in luciferase expression assays in a WM88 melanoma cell line (Table 1, table S5, and fig. S7). Within MFSD12, the two SNPs that CAVIAR identifies as having highest probability of being causal are rs56203814 (F-test, p-value = 3.6 × 10−18), a synonymous variant within exon 9, and rsl0424065 (F-test, p-value = 5.1 × 10−20), located within intron 8. They are 130 bp apart, in strong LD, and impact gene expression in luciferase expression assays (1.5 – 2.7 × higher expression than the minimal promoter; fig. S7). The SNPs upstream of MFSD12 with highest probability of being causal are rsll2332856 (F-test, p-value = 3.8 × 10−16) and rs6510760 (F-test, p-value = 6.5 × 10−15). They are 346 bp apart, in strong LD, and impact gene expression in luciferase expression assays (4.0 - 19.7 × higher expression than the minimal promoter; fig. S7).
The derived rs56203814 and rsl0424065 (T) alleles associated with dark pigmentation are present only in African populations (or those of recent African descent) and are most common in East African populations with Nilo-Saharan ancestry (Fig. 1 and fig. S4). Coalescent analysis of the SGDP dataset indicates that the rsl0424065 (T) allele predates the 300 kya origin of modern humans (26) (estimated TMRCA of 612 kya, 95% CI 515-736 kya) (Fig. 4).
At rs6510760 and rsll2332856, the ancestral (G) and (T) alleles, respectively, associated with light pigmentation, are nearly fixed in Europeans and East Asians and are common in San as well as Ethiopian and Tanzanian populations with Afroasiatic ancestry (Fig. 1 and fig. S4). The derived rs6510760 (A) and rs112332856 (C) alleles (associated with dark pigmentation) are common in all sub-Saharan Africans except the San, as well as in South Asian and Australo-Melanesian populations (Fig. 1 and fig. S4). Haplotype analysis places the rs6510760 (A) allele (and linked rsll2332856 (C) allele) in Australo-Melanesians on similar haplotype backgrounds relative to central and eastern Africans (Fig. 5 and fig. S6), suggesting they are identical by descent from an ancestral African population. Coalescent analysis of the SGDP dataset indicates that the TMRCA for the derived rs6510760 (A) allele is 996 kya (95% CI 0.82-1.2 mya; Fig. 4).
We do not detect evidence for positive selection at MFSD12 using Tajima’s D and iHS statistics [figs. S5 and S8, as expected if selection were ancient (27)]. However, levels of genetic differentiation are elevated when comparing East African Nilo-Saharan and western European (CEU) populations (e.g., FST=0.85 for rsll2332856, top 0.05% on chromosome 19), consistent with differential selection at this locus (28) (table S4).
MFSD12 is within a cluster of 10 genes with high expression levels in primary human melanocytes relative to primary human keratinocytes (29), with MFSD12 the most differentially expressed (90X; table S6). The genomic region (chr19:3541782-3581062) encompassing MFSD12 and neighboring gene HMG20B (a transcription factor common in melanocytes) has numerous DNase I hypersensitive sites and is enriched for H3K27ac enhancer marks in melanocytes (top 0.1% genome-wide; Fig. 3), suggesting this region may regulate expression of genes critical to melanocyte function (30).
Analyses of gene expression using RNA-sequencing data from 106 primary melanocyte cultures (table S7), indicates that African ancestry is significantly correlated with decreased MFSD12 gene expression (Pearson Correlation Coefficient (PCC) p-value = 5.0 × 10−2; fig. S9). We observed significant associations between genotypes at rs6510760 and rsll2332856 with expression of HMG20B (Bonferroni adjusted p-value (Padj) < 4.9 × 10−3) and MFSD12 (Padj < 3.4 × 10−2) (fig. S9). In each case, the alleles associated with dark pigmentation correlate with decreased gene expression. Allele-specific expression (ASE) analysis indicates that individuals heterozygous for either rs6510760 or rsll2332856 show increased allelic imbalance, relative to homozygotes, for MFSD12 (Mann-Whitney-Wilcoxon (MWW) test, p-value = 4.9 × 10−3 and 1.3 × 10−2, respectively), consistent with regulation of gene expression in cis. A haplotype containing the rs6510760(A)/rsll2332856(C) variants associated with dark pigmentation showed 4.9 times lower expression in luciferase assays than the haplotype containing rs6510760(G)/rs112332856(T) variants associated with light pigmentation [Kruskal-Wallis Rank Sum (KWRS) Test, p- value=7.7 × 10−7; fig. S7 and table S5]. We did not have power to detect an association between expression of MFSD12 and rs56203814 or rsl0424065 due to low frequency (~2%) of the alleles associated with dark pigmentation in the primary melanocyte cultures.
MFSD12 suppresses eumelanin biogenesis in melanocytes from lysosomes
We silenced expression of the mouse ortholog of MFSD12 (Mfsdl2) using small hairpin RNAs (shRNAs) in immortalized melan-Ink4a mouse melanocytes derived from C57BL/6J-Ink4a−/− mice (31) which almost exclusively make eumelanin (Fig. 6). Reduction of Mfsd12 mRNA by ~80% with two distinct lentivirally encoded shRNAs (Fig. 6A) caused a 30-50% increase in melanin content compared to control cells (Fig. 6B), and a higher percentage of melanosomes per total cell area in most cells compared to cells transduced with nontarget shRNA (Fig. 6, C and D). A fraction of MFSD12-depleted cells harbored large clumps of melanin in autophagosome-like structures (fig. S10). These data suggest that MFSD12 suppresses eumelanin content in melanocytes and may offset autophagy.

Immortalized melan-lnk4a melanocytes expressing non-target (sh NT) shRNA or either of two shRNA plasmid clones (#1 and 2) targeting Mfsd12 were analyzed for Mfsd12 mRNA content by (A) qRT-PCR, (B) melanin content by spectrophotometry, or (C) percent of cell area containing melanin by bright field microscopy; (D) shows quantification. Data in (A) to (C) represent mean ± s.e.m., normalized to sh NT samples, from three separate experiments; in (C), n(sh NT) = 97 cells, n (shMfsd12 #1) = 68 cells, n (shMfsd12 #2) = 71 cells. Bar in (C), 10 μm. (E to G) Melan-ink4a melanocytes transiently expressing MFSD12-HA (E) or not transfected (F) and (G) were fixed, immunolabeled for HA (E) and for LAMP2 to mark lysosomes E) and (G) or for TYRP1 to mark melanosomes (G), and analyzed by immunofluorescence and bright field microscopy. Bright field (melanin) images show pigmented melanosomes (pseudocolored red in the merged images). Insets, 4x magnification of boxed regions. Arrows, MFSD12-containing structures that overlap LAMP2 (E) or TYRP1-containing structures that overlap melanosomes; arrowheads, structures that do not overlap. Bars, 10 μm. (H) Quantification of overlap for structures labeled by MFSD12, TYRP1, LAMP2 and pigment. Data represent mean ± s.e.m. from three independent experiments, n = 17 cells (MFSD12 overlap with LAMP2 and melanin), 33 cells (TYRP1 overlap with melanin), or 23 cells (LAMP2 and melanin).
We assessed the localization of human MFSD12 isoform c (RefSeq {"type":"entrez-nucleotide","attrs":{"text":"NM_174983.4","term_id":"566559814"}}NM_174983.4) tagged at the C terminus with the HA epitope (MFSD12-HA). By immunofluorescence microscopy, MFSD12-HA localized to punctate structures throughout the cell. Surprisingly, these puncta, like those labeled by the endogenous lysosomal membrane protein LAMP2, but not the melanosomal enzyme TYRP1, overlapped only weakly with pigmented melanosomes (Fig. 6, E to G; quantified in Fig. 6H). Instead, MFSD12-HA co-localized with LAMP2 (Fig. 6E, quantified in Fig. 6H), indicating that MFSD12 protein localizes to late endosomes and/or lysosomes in melanocytes and not to eumelanosomes.
MFSD12 influences pigmentation in zebrafish xanthophore pigment cells
We targeted transmembrane domain 2 (TMD2) in the highly conserved zebrafish ortholog of mfsdl2a with CRISPR/Cas9 (Fig. 7). We focused on mfsd12a as its paralog mfsd12b is predicted to be a pseudogene (32). Although pigmentation was not qualitatively altered in melanophores, the cells that make eumelanin, compound heterozygotes of mfsd12a alleles exhibited reduced staining of xanthophores, the cells responsible for pteridine-based yellow pigmentation in wild type zebrafish (Fig. 7, A and B) (33, 34). This was not due to a failure of the xanthophores to develop in mfsd12a mutants, since GFP labeled xanthophores were robust along the lateral line in both wild-type and mfsd12a mutant zebrafish (Fig. 7, C and D). Taken together, these results suggest that MFSD12 influences xanthophore pigment production in pterinosomes.

(A and B) Representative images of methylene blue staining in wild-type TAB5. (A) and compound heterozygous mfsd12a zebrafish (6dpf). Note the absence of stained xanthophores in the mfsd12a mutant (B). (C and D) No difference was observed in the number or distribution of xanthophores detected by mosaic Tg(aox5:PALM-GFP) expression in injected wild-type TAB5 (C) or compound heterozygous mutant mfsd12a (D) zebrafish (5dpf). (E) A wild-type agouti mouse (left) is shown with a gray Mfsd12 targeted littermate (right). (F) Flair from the Mfsd12 targeted mouse has grossly normal eumelanin (lower black region of the hair shaft), however, the upper subapical yellow band in WT (E, left) appears white in the Mfsd12 mutant (E, right) due to a reduction in pheomelanin.
Functional characterization of MFSD12 in mice
CRISPR/Cas9 was used to generate a Mfsd12 null allele in a wild-type mouse background (Fig. 7E and fig. S11). Four founders were observed with a uniformly gray coat color, rather than the expected agouti coat color (fig. S11, A and B). These four gray founders harbored deletions at the targeted site (fig. S11C). Microscopic observation revealed a lack of pheomelanin, resulting in white, rather than yellow, banding of hairs in Mfsd12 mutants (Fig. 7F).
The Mfsd12 knockout coat color appeared phenotypically similar to that of grizzled (gr) mice, an allele previously mapped to a syntenic ~2 Mb region overlapping Mfsd12 (35). Like our CRISPR/Cas9 Mfsd12 knockout, homozygous gr/gr mice are characterized by a gray coat resulting from dilution of yellow pheomelanin pigment from the sub-terminal agouti band of the hair shaft. Exome sequencing of an archived gr/gr DNA sample, subsequently confirmed by Sanger sequencing in an independent colony, identified a 9 bp in-frame deletion within exon2 of Mfsd12 (fig. S12) as the sole mutation affecting a coding sequence in this mapped candidate region. The deleted amino acids for the gr/gr allele, Mfsd12 p.Leu163_Ala165del, are in the cytoplasmic loop between the transmembrane domains TM4 and TM5 within a highly conserved MFS domain (fig. S13). These results indicate that mutation of Mfsd12 is responsible for the gray coat color of gr/gr mutant mice, and that loss of Mfsd12 reduces pheomelanin within the hairs of agouti mice.
Taken together, these results indicate that MFSD12 plays a conserved role in vertebrate pigmentation. Depletion of MFSD12 increases eumelanin content in a cell-autonomous manner in skin melanocytes, consistent with the lower levels of MFSD12 expression observed in melanocytes from individuals with African ancestry. Since MFSD12 localizes to lysosomes and not to eumelanosomes, this may reflect an indirect effect through modified lysosomal function. By contrast, loss of MFSD12 has the opposite effect on pheomelanin production, reflecting a more direct effect on function of pheomelanosomes, which have a distinct morphology (3), gene expression profile (36), and, like zebrafish pterinosomes, a potentially different intracellular origin from eumelanosomes (37). While disruption of MFSD12 alone accounts for changes in pigmentation, the role of neighboring loci such as HMG20B on pigmentation remains to be explored.
Skin pigmentation associated loci that play a role in UV response are targets of selection
Another genomic region associated with pigmentation encompasses a ~195 kb cluster of genes on chromosome 11 that play a role in UV response and melanoma risk including the Damage Specific DNA Binding Protein 1 (DDB1) gene (Figs. 1 and and33 and table S3). DDB1 (complexed with DDB2 and XPC) functions in DNA repair (38); levels of DDB1 are regulated by UV exposure and MC1R signaling, a regulatory pathway of pigmentation (39). DDB1 is a component of CUL4-RING E3 ubiquitin ligases that regulate several cellular and developmental processes (40); it is critical for follicle maintenance and female fertility in mammals (41) and for plastid size and fruit pigmentation in tomatoes (42). Knockouts of DDB1 orthologs are lethal in both mouse and fruit fly development (43), and DDB1 only exhibits rare (<1% frequency) non-synonymous mutations in the TGP dataset. Genetic variants near DDB1 were associated with human pigmentation in an African population with high levels of European admixture (7).
Due to extensive LD in this region, CAVIAR identified 33 SNPs predicted to be causal (Table 1). The most strongly associated SNPs are located in a region conserved across vertebrates flanked by TMEM138 and TMEM216 (44) ~36-44 kb upstream of DDB1, and are in high LD within this cluster (r2>0.7 in East Africans) (Fig. 3, Table 1, and table S3). Among these, the most significantly associated SNP is rs7948623 (F-test, p-value = 2.2 × 10−11), located 172 bp downstream of TMEM138, which shows enhancer activity in WM88 melanoma cells (91.9 - 140.8 × higher than the minimal promoter; fig. S7 and table S5).
A second group of tightly linked SNPs (LD r2 > 0.7 in East Africans) with predicted high probability of containing causal variants spans a ~195 kb region encompassing DDB1 and TMEM138 (Table 1 and Fig. 3). Two SNPs that tag this LD block are rs1377457 (F-test, p-value = 1.5× 10−9), located ~7600 bp downstream of TMEM138, and rs148172827 (F-test, p-value = 1.8 × 10−9), an insertion/deletion polymorphism at TKFC (Triokinase And FMN Cyclase) located in an enhancer active in WM88 melanoma cells (67.6 - 76.2 × higher than the minimal promoter; fig. S7 and table S5) which overlaps a MITF binding site in melanocytes (29, 45); both SNPs interact with the promoters of DDB1 and neighboring genes in MCF-7 cells (46, 47) (Table 1 and Fig. 3). SNPs within introns of DDB1 (rs12289370, rs7934735, rs11230664, rs12275843, and rs7120594) also tag this LD block (Table 1 and Fig. 3).
RNA-Seq data from 106 primary melanocyte cultures indicate that African ancestry is significantly correlated with increased DDB1 gene expression (PCC, p-value = 2.6 × 10−5; fig. S9). Association tests using a permutation approach indicated that, of the 35 protein-coding genes with a transcription start site within 1Mb of rs7948623, expression of DDB1 is most strongly associated with a SNP in an intron of DDB1, rs7120594, at marginal statistical significance after correction for ancestry and multiple testing (Padj = 0.06; fig. S9). The allele associated with dark pigmentation at rs7120594 correlates with increased DDB1 expression. We did not have power to detect an association between expression of DDB1 and SNPs in LD with rs7948623 due to low minor allele frequencies (~2%). The role of DDB1 and neighboring loci on human pigmentation remains to be further explored.
The derived rs7948623 (T) allele near TMEM138 (associated with dark pigmentation) is most common in East African Nilo-Saharan populations and is at moderate to high frequency in South Asian and Australo-Melanesian populations (Fig. 1 and fig. S4). At SNP rs11230664, within DDB1, the ancestral (C) allele (associated with dark pigmentation) is common in all sub-Saharan African populations, having the highest frequency in East African Nilo-Saharan, Hadza, and San populations (88-96%), and is at moderate to high frequency in South Asian and Australo-Melanesian populations (12 - 66%) (Fig. 1 and fig. S4). The derived (T) allele (associated with light pigmentation) is nearly fixed in European, East Asian, and Native American populations.
In South Asians and Australo-Melanesians, the alleles associated with darker pigmentation reside on closely related, or identical, haplotypes to those observed in Africa (Fig. 5 and fig. S6), suggesting that they are identical by descent. The TMRCAs for the derived dark allele at rs7948623 and the derived light allele at rs11230664 are estimated to be older than 600 kya and 250 kya, respectively (Fig. 4).
Consistent with a selective sweep, we see an excess of rare alleles (and extreme negative Tajima’s D values) and high levels of homozygosity extending ~350-550 kb in Europeans and Asians, respectively (figs. S5 and S14). We observe extreme negative Tajima’s D values in East African Nilo-Saharans and San over a shorter distance (115 kb and 100 kb, respectively) (fig. S5). A haplotype extending greater than 195 kb is common in Eurasians and rare in Africans (Fig. 5) and tags the alleles associated with light skin pigmentation. The TMRCA of a large number of haplotypes carrying the rs7948623 (A) allele in non-Africans, associated with light pigmentation, is 60 kya (95% CI: 58-62 kya), close to the inferred time of the migration of modern humans out of Africa (48, 49) (Fig. 4). These results, combined with large FST values between Africans and Europeans at SNPs tagging the extended haplotype near DDB1 (e.g., FST = 0.98 between Nilo-Saharans and CEU at rs7948623, within the top 0.01% of values on chromosome 11, table S4) are consistent with differential selection of alleles associated with light and dark pigmentation in Africans and non-Africans at this locus.
Identification of variation at OCA2 and HERC2 impacting skin pigmentation
Another region of significantly associated SNPs encompasses the OCA2 and HERC2 loci on chromosome 15 (Fig. 3 and table S3). HERC2 was identified in GWAS for eye, hair, and skin pigmentation traits (6, 7, 50-52). The oculocutaneous albinism II gene (OCA2, formerly called the P gene) encodes a 12- transmembrane domain-containing chloride transporter protein and affects pigmentation by modulating melanosomal pH (53). The most common types of albinism in Africans are caused by mutations in OCA2 (54).
Due to extensive LD in the OCA2 and HERC2 region, CAVIAR predicted 10 potentially causal SNPs (Table 1) that cluster within three regions. We order these clusters on the basis of physical distance; region 1 is located within OCA2, and regions 2 and 3 are located within introns of HERC2 (Fig. 3).
The SNP with highest probability of being causal from CAVIAR analysis is rs1800404 (F-test, p-value = 1.0), a synonymous variant located in region 1 within exon 10 of OCA2 (Fig. 3, Table 1, and table S3). The ancestral rs1800404 (C) allele, associated with dark pigmentation, is common in most Africans as well as southern/eastern Asians and Australo-Melanesians, whereas the derived (T) allele, associated with light pigmentation, is most common (frequency >70%) in Europeans and San (Fig. 1 and fig. S4). Haplotype (Fig. 5) and coalescent analyses (Fig. 4 and fig. S6) show two divergent clades, one enriched for the rs1800404 (C) allele and the other for the rs1800404 (T) allele. Coalescent analysis indicates that the TMRCA of all lineages is 1.7 mya (95% CI: 1.52.0 mya) and the TMRCA of lineages containing the derived (T) allele is 629 kya (95% CI 426-848 kya) (Fig. 4). The deep coalescence of lineages, and the positive Tajima’s D values in this region in both African and non-African populations (fig. S5), is consistent with balancing selection acting at this locus.
The SNP with highest probability of being causal in region 3 is rs4932620 (F-test, p-value = 3.2 × 10−9) located within intron 11 of HERC2 (Fig. 3, Table 1, and table S3). This SNP is 917 bp from rs916977, a SNP associated with blue-eye color in Europeans (55, 56), and is in strong LD (r2 = 1.0 in most East African populations) with SNPs extending into region 2 of HERC2 (Table 1). The derived rs4932620 (T) allele associated with dark skin pigmentation is most common in Ethiopian populations with high levels of Nilo-Saharan ancestry and is at moderate frequency in other Ethiopian, Hadza, and Tanzania Nilo-Saharan populations (Fig. 1 and fig. S4). Haplotype analysis indicates that the rs4932620 (T) allele in South Asians and Australo-Melanesians is on the same or similar haplotype background as in Africans (Fig. 5 and fig. S6), suggesting it is identical by descent. The TMRCA of haplotypes containing the rs4932620 (T) allele is 247 kya (95% CI: 158-345 kya) (Fig. 4).
We also observe an LD block of SNPs within HERC2 that are associated with skin pigmentation independently of the SNPs described above, though they do not reach genome-wide significance (table S3). These are in a region with enhancer activity in Europeans (50). For example, SNP rs6497271 (F-test, p-value = 1.8 × 10−6), which is located 437 bp from SNP rs12913832, has been associated with skin color in Europeans (50) and is in a consensus SOX2 motif (a transcription factor which modulates levels of MITF in melanocytes) (57) (Fig. 3). The ancestral rs6497271 (A) allele associated with dark pigmentation is on haplotypes in South Asians and Australo-Melanesians similar or identical to those in Africans (Fig. 5 and fig. S6), suggesting they are identical by descent. The derived (G) allele associated with light skin pigmentation is most common in Europeans and San and dates to 921 kya (CI: 700 kya-1.3 mya) (Figs. 1 and and44 and figs. S4 and S6). SNPs associated with pigmentation at all three regions show high allelic differentiation when comparing East African Nilo-Saharans and CEU (FST = 0.72 - 0.85, top 0.5% on chromosome 15) (table S4).
Analyses of RNA-Seq data from 106 primary melanocyte cultures indicates that African ancestry is significantly correlated with increased OCA2 gene expression (PCC, Padj = 6.1 × 10−7) (fig. S9). A permutation approach identified significant associations between OCA2 expression and SNPs within an LD block tagged by rs4932620 extending across regions 2 and 3 (Padj = 2.2 × 10−2). Alleles in this LD block associated with dark pigmentation correlate with increased OCA2 expression. We did not observe associations between the candidate causal variants in region 1 and OCA2 expression despite a high minor allele frequency (34%). However, we observe a significant association between a haplotype tagged by rs1800404 and alternative splicing resulting in inclusion/exclusion of exon 10 (linear regression t test, p-value = 9.1 × 10−40). Exon 10 encodes the amino acids encompassing the third transmembrane domain of OCA2 and is the location of several albinism-associated OCA2 mutations (58, 59), raising the possibility that the shorter transcript encodes a non-functional channel. Comparing splice junction usage across individuals, we estimate that each additional copy of the light rs1800404 (T) allele reduces inclusion of exon 10 by ~20% (95% CI 17.9-21.5%; fig. S9). Homozygotes for the light rsl800404 (T) allele are, therefore, expected to produce ~60% functional OCA2 protein (compared to individuals with albinism who produce no functional OCA2 protein).
Skin pigmentation is a complex trait
To estimate the proportion of pigmentation variance explained by the top eight candidate SNPs at SLC24A5, MFSD12, DDB1/TMEM138 and OCA2/HERC2, we used a linear mixed model with two genetic random effect terms, one based on the genome-wide kinship matrix, and the other based on the kinship matrix derived from the set of significant variants. Approximately 28.9% (S.E. 10.6%) of the pigmentation variance is attributable to these SNPs. Considering each locus in turn and all significantly associated variants (p- value < 5 × 10−8), the trait variation attributable to each locus is: SLC24A5 (12.8%, S.E. 3.5%), MFSD12 (4.5%, S.E. 2.1%), DDB1/TMEM138 (2.2%, S.E. 1.5%), and OCA2/HERC2 (3.9%, S.E. 2.9%). Thus ~29% of the additive heritability of skin pigmentation in Africans is due to variation at these four regions. This observation indicates that the genetic architecture of skin pigmentation is simpler (i.e., fewer genes of stronger effect) than other complex traits, such as height (60). Additionally, most candidate causal variants are in non-coding regions, indicating the importance of regulatory variants influencing skin pigmentation phenotypes.
Evolution of skin pigmentation in modern humans
Skin pigmentation is highly variable within Africa. Populations such as the San from southern Africa are almost as lightly pigmented as Asians (1), while the East African Nilo- Saharan populations are the most darkly pigmented in the world (Fig. 1). Most alleles associated with light and dark pigmentation in our dataset are estimated to have originated prior to the origin of modern humans ~300 ky ago (26). In contrast to the lack of variation at MC1R, which is under purifying selection in Africa (61), our results indicate that both light and dark alleles at MFSD12, DDB1, OCA2, and HERC2 have been segregating in the hominin lineage for hundreds of thousands of years (Fig. 4). Further, the ancestral allele is associated with light pigmentation in approximately half of the predicted causal SNPs; Neanderthal and Denisovan genome sequences, which diverged from modern human sequences 804 kya (62), contain the ancestral allele at all loci. These observations are consistent with the hypothesis that darker pigmentation is a derived trait that originated in the genus Homo within the past ~2 million years after human ancestors lost most of their protective body hair, though these ancestral hominins may have been moderately, rather than darkly, pigmented (63, 64). Moreover, it appears that both light and dark pigmentation has continued to evolve over hominid history.
Individuals from South Asia and Australo-Melanesia share variants associated with dark pigmentation at MFSD12, DDB1/TMEM138, OCA2 and HERC2 that are identical by descent from Africans. This raises the possibility that other phenotypes shared between Africans and some South Asian and Australo-Melanesian populations may also be due to genetic variants identical by descent from African populations rather than convergent evolution (65). This observation is consistent with a proposed southern migration route out of Africa ~80 kya (66). Alternatively, it is possible that light and dark pigmentation alleles segregated in a single African source population (13, 48) and that alleles associated with dark pigmentation were maintained outside of Africa only in the South Asian and Australo-Melanesian populations due to selection.
By studying ethnically, genetically, and phenotypically diverse Africans, we identify novel pigmentation loci that are not highly polymorphic in other populations. Interestingly, the loci identified in this study appear to affect multiple phenotypes. For example, DDB1influences pigmentation (42), cellular response to the mutagenic effect of UVR (39) and female fertility (41). Thus, some of the pigmentation-associated variants identified here may be maintained due to pleiotropic effects on other aspects of human physiology.
It is important to note that genetic variants that do not reach genome-wide significance in our study might also impact the pigmentation phenotype. Indeed, the 1000 most strongly associated SNPs exhibit enrichment for genes involved in pigmentation and melanocyte physiology in the mouse phenotype database and in ion transport and pyrimidine metabolism in humans (table S8). Future research in larger numbers of ethnically diverse Africans may reveal additional loci associated with skin pigmentation and will further shed light on the evolutionary history, and adaptive significance, of skin pigmentation in humans.
Materials and Methods
Individuals in the study were sampled from Ethiopia, Tanzania and Botswana. Written informed consent was obtained from all participants, and research/ethics approval and permits were obtained from all relevant institutions (67). To measure skin pigmentation we used a DSM II ColorMeter to quantify reflectance from the inner under arm. Red reflectance values were converted to a standard melanin index score (68). DNA was extracted from whole blood using a salting out procedure (PureGen).
A total of 1,570 samples were genotyped on the Illumina Omni5M SNP array (5M dataset) that includes ~4.5 million SNPs. Genotypes were clustered and called in Genome Studio software. Variant positions are reported in hg19/37 coordinates. The overall completion rate was 98.8%. Each individual’s sex was verified based on X chromosome inbreeding coefficients. We used Beagle 4.0 (69) to phase the Illumina 5M SNP array data merged with SNPs from the TGP dataset that were filtered to exclude related individuals.
High coverage (>30 X) Illumina Sequencing was performed on a subset of the genotyped individuals (N=135). Variants were called following the approach described in (13). Adapter sequences were trimmed with trimadap. Reads were aligned using bwa mem to the human reference sequence build 37 (hg19). After alignment we marked duplicate reads prior to calling variants with GATK HaplotypeCaller (70). To select high quality variants we employed a two-set filtering strategy. First, we used the GATK variant quality score recalibration to score variant sites. We used TGP, OMIM, and our curated genotypes from the Illumina Omni 5M SNP array as training data. After recalibration we discarded sites with the lowest scores. In addition, we discarded sites in low-complexity regions listed in (71) and duplicate regions identified with Delly (72).
We performed local imputation around each of the regions showing significant associations with skin pigmentation from GWAS using the Illumina Omni 5M SNP dataset. We extracted array genotypes within 1Mb (500 kb upstream and 500 kb downstream) of top GWAS variants from each region and phased them using SHAPEIT2 (73, 74). The reference panel came from two datasets: filtered variants from the 135 African genomes and TGP (10). After phasing, imputation was performed using Minimac3 (75). Imputation performed very well at most loci (R2 > 0.91 with MAF ≥ 0.05) (table S3).
To identify SNPs associated with pigmentation, GWAS was performed first on the Illumina Omni 5M SNP dataset, and independently with imputed variants at candidate regions, using linear mixed models implemented in EMMAX software (9). Age and sex were included as covariates, and we corrected for genetic relatedness with an IBS kinship matrix. We used CAVIAR to identify variants in the imputed dataset most likely to be causal (11). Ontology enrichment for genes near the top 1000 most strongly associated variants from the 5M dataset was obtained using the annotation tool.
GREAT (76)
We estimated the contribution to the variation in melanin index from the top candidate causal variants with a restricted maximum likelihood (REML) analysis implemented in the Genome-wide Complex Trait Analysis (GCTA) software (77). The variance parameters for two genetic relationship matrices (GRMs) are estimated: one GRM is constructed (78) from genome-wide background variants with MAF>0.01, and one GRM is constructed from the set of 8 top pigmentation-associated variants. The contribution of each locus to the melanin index variation is estimated similarly, using all genome-wide significant (p-value <5 × 10−8) variants within each locus to construct the pigmentation-associated GRM (table S3). REML iterations are based on maximizing the Average Information matrix.
To test for neutrality in the regions flanking our top GWAS variants we calculated Tajima’s D, FST, and extended haplotype homozygosity using iHS (79-81). Tajima’s D was measured along chromosomes 11 and 15 using 50 kb sliding windows. Due to a high recombination rate observed near the MFSD12 locus, we used 10 kb windows in that region (chromosome 19). Vcftools was used to calculate both Tajima’s D and FST (82). To calculate extended haplotype homozygosity (iHS) we used Selscan (83). Unstandardized iHS scores were normalized within 100kb bins according to the frequency of the derived allele. We then identified the signals of positive selection by calculating the proportion of SNPs with |iHS| >2 in non-overlapping windows (80). To identify outlier windows we calculated 5th and 95th percentiles. Population differentiation was assessed with Weir and Cockerham’s fixation index FST (79) between each pair of populations. Outliers were identified using empirical p-values.
Median joining haplotype networks (84) for the Illumina Omni 5M SNP dataset were constructed and visualized at genomic regions of interest using NETWORK (85). In addition, we constructed genealogies of regions flanking candidate causal SNPs using a hierarchical clustering approach with sequence data from the Simons Genome Diversity Project (13). Briefly, we considered a single copy of each chromosome from each of the 279 individuals from (13). We inferred recombination breakpoints within a symmetric window surrounding each locus using the program kwarg (86) and identified the longest shared haplotype between each pair of sequences in which no recombination events occurred. We then computed the expected coalescence time between each pair of sequences, conditional on the observed number of mutations in the non-recombining region. Genealogies were constructed by applying the WPGMA hierarchical clustering algorithm to the estimated pairwise coalescence times. Our estimator accounts for recombination events and the population size history. However, simulation studies indicate that accounting for time-varying population size has relatively little effect on our estimates when the size changes according to previously inferred histories for human populations (67, 87). Because the true population sizes and relationships among the populations we considered are complex and imprecisely known, we assumed a constant population size of N = 104 in our analyses. The robustness analysis presented in (67) describes how our time estimates would change under different demographic histories and selective pressures.
To identify candidate causal GWAS variants altering gene expression we visualized and intersected variants with chromHMM tracks (88), DNase-1 hypersensitivity peaks, H3K27ac signal tracks for keratinocytes and melanocytes (29), CTCF signal tracks from keratinocytes (89), and ChIP-seq signal tracks from MITF (45). Variants were intersected with chromatin annotations using bedtools (90). Functional consequences of variants were also assessed using deepSEA (91) and deltaSVM (92). The effect of genetic variants on transcription factor binding was predicted using the MEME suite (93) for all transcription factors in the JASPAR 2016 CORE Vertebrate motif set (94).
To test for associations between gene expression, genetic variation, and ancestry we used eQTL and allele-specific expression (ASE) analyses on transcriptomes and genotype data from primary cultures of human melanocytes, isolated from foreskin of 106 individuals of assorted ancestries. All 106 individuals were genotyped on Illumina OmniExpress arrays and genotypes were subsequently imputed using the Michigan Imputation Server (75) based on the TGP reference panel and using SHAPEIT for phasing (73). RNA sequencing was performed to a mean depth of 87 million reads per sample. STAR (95) was used for aligning reads, and RSEM (96) was used to quantify the gene expression. Quantile normalization was applied in all samples to get the final RSEM value. To account for hidden factors driving expression variabilities, a set of covariates were further identified using the PEER method (97 and applied to calculate the normalized expression matrices. Principal components analysis was performed using genotypic data to capture population structure and ancestry using the struct.pea module of GLU. Using the normalized expression values and principal components, Pearson correlation between gene expression levels and ancestry was calculated, and associations between GWAS variant genotypes and gene expression levels were evaluated using ordinary least squares regression.
To identify associations between our GWAS candidate causal variants and expression of nearby genes (using the 106 melanocyte transcriptomes), we first found all protein-coding genes with transcription start site (TSS) within 1 Mb of the top GWAS variant for each locus and RSEM values greater than 0.5 in the primary melanocyte cultures. Pearson correlation was used to measure the association between ancestry and gene expression.
For each locus, we tested whether any genes with a transcription start site within 1Mb of the top SNP had an eQTL amongst the set of pigmentation QTLs using an additive linear model with the first two principal components of ancestry as covariates. To identify significant variant-gene associations we used a permutation approach for each locus independently (67). This was repeated for each gene and focal variants across all genes were adjusted for multiple testing using the Bonferroni correction (98).
We also carried out allele-specific expression (ASE) analyses for each significant eQTL SNP. Sites with at least 30 mapped reads, <5% mapping bias, and ENCODE 125 bp mappability score >=1 were retained for further analysis. For genes with a heterozygous coding variant amongst the melanocyte transcriptomes, allelic expression (AE) was computed as AE= |0.5 - NA/(NA+ NR)|, where NR is the number of reads carrying the reference allele and NA is the number of reads containing the alternative allele. For each GWAS variant, differences in gene AE between GWAS heterozygotes and homozygotes was evaluated by Wilcoxon rank-sum test. This was repeated for all possible genes and GWAS variants and Bonferroni-corrected p-values less than 0.05 were considered significant. For several genes, including HMG20B and DDB1, ASE could not be measured for some or all variants of interest because no individuals were heterozygous for both the test- variant and a coding variant.
For OCA2 we tested for an association between inclusion rates of exon 10, which contains our top candidate causal variant in the region, rs1800404, and individual genotypes at rs1800404. For each melanocyte transcriptome, reads spanning the exon9-exon10 and exon9-exonll junctions were extracted from the splice-junction files output by STAR. For each individual, a percent spliced in (PSI) value was calculated. To estimate the effect of variation at rs1800404 on exon 10 inclusion, ordinary least squares regression was carried out between PSI and dosage of the alternative allele for rs1800404 across individuals. A two-sided t test was used to calculate a p-value.
To test the functional impact of a subset of GWAS variants on gene expression, predicted regulatory sequences containing variants were cloned into a pGLA.23 firefly luciferase reporter vector. Vectors were transfected into a WM88 melanocytic melanoma cell line and co-transfected with renilla luciferase control vector (pRL-CMV) in a dual luciferase assay. Relative luciferase activity (firefly/renilla luminescence ratio) is presented as fold change compared to cells transfected with the empty pGL4.23 vector. Data were analyzed with a modified Kruskal-Wallis Rank Sum test and pairwise comparisons between groups were performed using the Conover method. P values were corrected for multiple comparisons with the Benjamini-Hochberg method using the R package PMCMR, and p-values less than 0.05 were considered significant.
We characterized the function of MFSD12 in vitro in immortalized melanocytes and in vivo in both zebrafish and mice. Immortalized melan-Ink4a melanocytes from C57BL/6 Ink4a-Arf1−/− mice were cultured as described (31). To deplete MFSD12, cells were infected with recombinant lentiviruses – generated by transient transfection in HEK293T cells – to express Mfsd12-targeted shRNAs or non-target controls. Cells resistant to puromycin (also encoded by the lentiviruses) were analyzed 5-7 days after infection. Knockdown efficiency of Mfsd12 mRNA in cells expressing Mfsd12-specific shRNAs relative to non-target shRNA was quantified by reverse transcription/quantitative PCR (detected with SYBR Green; Applied Biosystems) relative to Tubb4b (encoding β-tubulin) as a reference gene. Melanin content in cell lysates was determined by a spectrophotometric assay as described (99), and melanin coverage in intact cells was determined by bright field microscopy and analysis using the “Analyze Particles” plug-in in ImageJ (National Institutes of Health). To analyze MFSD12-HA localization, melan-Ink4a melanocytes were transiently transfected with MFSD12-HA expression plasmids and analyzed 48 hours later by bright field and immunofluorescence microscopy as described (100) using the TA99 monoclonal antibody to TYRP1 (American Type Culture Collection) to detect melanosomes, rabbit anti-LAMP2A (Abcam) to detect lysosomes, and rat anti-HA (Roche) to detect the transgene. Percent signal overlap in the cell periphery was determined from background-subtracted, thresholded, binary images using the “Analyze Particles” plug-in in ImageJ. Statistical significance was determined using unpaired, twotailed student’s t tests: p < 0.05: *, p < 0.01: **, p < 0.001: ***, p < 0.0001: ****. Details are provided in (67).
Zebrafish mutagenesis using CRISPR/Cas9 was performed to target mfsd12a. Compound heterozygous mutant fish for analysis were generated from F1 incrosses of mutant founder fish. For methylene blue staining, embryos were collected following fertilization and placed in zebrafish system water containing 0.0002% methylene blue and analyzed at 6 dpf. For GFP analysis, embryos were injected with 25 pg Tg(aox5:PALM-GFP) and 80 pg tol2 mRNA and GFP expression was evaluated in mosaic injected fish at 5 dpf. Larvae were anesthetized in sub-lethal 1× tricane solution and placed in 100ul of a low melt agarose solution (0.8%).
In mice two targets for CRISPR/Cas9 cleavage were selected within Exon 2 of Mfsd12 to generate a 134bp deletion resulting in a null allele of Mfsd12. A mixture of Cas9 mRNA (TriLink BioTechnologies) and each of the two synthesized gRNAs was used for pronuclear injection into C57BL/6J × FVB/N F1 hybrid zygotes. Mutation carrying mice were viable and presented with grey coat color distinct from littermates. Hairs were plucked from postnatal day 18 mice and individual awl hairs were mounted in permount and imaged with a stereomicroscope (Zeiss SteREO Discovery.V12) at the base of the sub apical yellow band where the switch from eumelanin to pheomelanin is visible.
To characterize Mfsd12 in grizzled mice, Illumina generated whole genome sequences of grizzled, JIGR/DN (gr/gr) reads were mapped using bwa mem to GRCm38/mm10 (available at SRA Accession SRR5571237). Sequence variants between JIGR/DN gr/gr and C57BL/6J reference genome within the gr/gr candidate region were identified using SnpEff(101). Validation of a 12 bp deletion within Mfsd12 was performed using samples from an independently maintained gr/gr colony provided by the laboratory of Dr. Margit Burmeister.
Acknowledgments
We thank J. Akey, R. McCoy for Melanesian genotype data, A. Clark, C. Brown, YS Park for critical review of the manuscript, members of the Tishkoff lab for helpful discussion, Dr. A. Weeraratna at Wistar Institute, Philadelphia for providing the WM88 melanocytic cell line, Dr. D. Parichy for the aox5:palmGFP plasmid, Dr. G. Xu and Dr. R Yang at University of Pennsylvania for technical assistance, Dr. M. Burmeister at University of Michigan for grizzled mouse samples, L. Garrett at the Embryonic Stem Cell and Transgenic Mouse Core (NHGRI), R. Sood in the Zebrafish Core (NHGRI), and the African participants. We acknowledge the contribution of the staff members of the Cancer Genomics Research Laboratory (NCI), the NIH Intramural Sequencing Center, NCI Center for Cancer Research Sequencing Facility, the Yale University Skin SPORE Specimen Resource Core and the Botswana-University of Pennsylvania Partnership. This work utilized computational resources of the NIH HPC Biowulf cluster. This research was funded by the following grants: NIH grants 1R01DK104339-0, 1R01GM113657-01 and NSF grant BCS-1317217 to SAT. NIH grant R01 AR048155 from NIAMS to MM. NIH grant R01 AR066318 from NIAMS to EO, NIH grants 5R240D017870-04 and 1U54DK110805-01 to LZ and YZ, NIH grant R01-GM094402 to YSS, NIH K12 GM081259 from NIGMS to SB. MHB was partly supported by a “Science Without Borders” fellowship from CNPq – Brazil. YSS is a Chan Zuckerberg Biohub investigator. This work was supported in part by the Center of Excellence in Environmental Toxicology (NIH P30-ES013508. T32-ES019851 to MH) (NIEHS). the Intramural Program of the National Human Genome Research Institute and the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, federal funds from NCI under Contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. L.I.Z. is a founder and stock holder of Fate Therapeutics, Marauder Therapeutics, and Scholar Rock. Data are available at dbGAP accession number phs001396.v1.p1 and SRA BioProject PRJNA392485.
Footnotes
SUPPLEMENTARY MATERIALS
www.sciencemag.org/cgi/content/full/science.aan8433/DC1)
Materials and Methods
NISC Comparative Sequencing Program Collaborator List