Allele frequencies for 40 autosomal SNP loci typed for US population samples using electrospray ionization mass spectrometry

Aim To type a set of 194 US African American, Caucasian, and Hispanic samples (self-declared ancestry) for 40 autosomal single nucleotide polymorphism (SNP) markers intended for human identification purposes. Methods Genotyping was performed on an automated commercial electrospray ionization time-of-flight mass spectrometer, the PLEX-ID. The 40 SNP markers were amplified in eight unique 5plex PCRs, desalted, and resolved based on amplicon mass. Statistical analyses were performed on the resulting genotypes for each of the three US sample groups. Results The assay was robust and genotyped the 40 SNP markers while consuming approximately 4 ng of template per sample. The combined random match probabilities for the 40 SNP assay ranged from 10⁻¹⁶ to 10⁻²¹. Conclusion The multiplex PLEX-ID SNP-40 assay is the first fully automated genotyping method capable of typing a panel of 40 forensically relevant autosomal SNP markers on a mass spectrometry platform. The data provide the first allele frequency estimates for these 40 SNPs in a National Institute of Standards and Technology US population sample set. No population bias was detected, although one locus deviated from its expected level of heterozygosity.

The forensic community has addressed the application of autosomal single nucleotide polymorphism (SNP) markers for human identification (1)(2)(3)(4). SNPs may be of utility when working with highly degraded DNA because they can be assayed with very small polymerase chain reaction (PCR) amplicons. Over the past 10 years, various SNP assays and candidate marker panels have been described (5)(6)(7)(8)(9)(10). One set of interest is a panel of 40 autosomal SNP markers intended as a universal individual identification panel. These markers were selected for high heterozygosity and low Fst values in studies of 40 populations to complement CODIS STR loci (8). Initially these markers were screened and typed for world populations by singleplex TaqMan-based assays. More recently, there have been attempts to develop multiplex assays for typing the 40 SNP panel (11). One of these is the PLEX-ID SNP-40 assay, composed of eight unique 5plex PCRs and developed by Abbott Molecular. The PLEX-ID instrument platform is a commercial electrospray ionization mass spectrometer capable of automated analysis of short PCR amplicons (less than 140 bp) generated by proprietary assays (see SNP-40, mtDNA 2.0). The instrument desalts each PCR reaction through the use of magnetic bead chemistry and injects the desalted PCR reaction into the mass spectrometer. The peaks are separated and resolved based on time-of-flight analysis. With the emerging development of ultra-high throughput sequencing applied to forensics, it will become more commonplace to utilize these "core" SNP marker sets. Here we report the assay performance and allele frequencies for a subset of our National Institute of Standards and Technology (NIST) US population samples.

Methods
For this study, samples (n = 194) were selected from three population groups representative of major population segments in the United States (African Americans = 74, Caucasians = 75, and Hispanics = 45).
Whole blood with anonymous identifiers and self-described ancestry was obtained from commercial blood banks (Interstate Blood Bank, Memphis, TN and Millennium Biotech, Fort Lauderdale, FL, USA). Blood samples were subjected to bulk DNA extraction using a modified salt-out procedure as described previously (12). DNA concentrations in extracts were determined using Quantifiler Human DNA Quantification kit (Life Technologies, Carlsbad, CA, USA) on an Applied Biosystems model 7500 (Life Technologies) real-time PCR instrument. Quantification values were then used to normalize all DNA extracts to a final concentration of 0.1 ng/µL for PCR amplification. All samples were previously examined with 15 autosomal short tandem repeats and the amelogenin sex-typing marker using the AmpFl-

SNP typing
The 40 SNP markers typed by the PLEX-ID SNP-40 assay were previously selected and characterized on multiple world populations (8). The following data were obtained for the 40 SNP markers typed in the assay: dbSNP reference SNP (rs) number, nucleotide position (according to the Human February 2009 (GRCh37/hg19) assembly), chromosomal band, and physical distance from adjacent markers located on the same chromosome (Table 1).
PCR amplification was performed as recommended by the manufacturer by adding a total of 0.5 ng of template DNA in a 5 µL volume to each of eight wells in a column of a pre-fabricated SNP-40 assay plate (Abbott Molecular, Des Plaines, IL, USA). In total, eight unique 5plex reactions were run per sample, requiring approximately 4 ng of DNA template per sample. Template DNA was added to each well by using a pipette tip to pierce the foil seal covering the well. On each 96-well plate, ten unique templates were run in parallel with a no-template control and a positive amplification control, 9947a DNA (Promega Corp., Madison, WI, USA), at 0.1 ng/µL. After template addition, the PCR plate was re-sealed using PCR Foil seals (Abbott Molecular) on an ALPS 50V Heat Sealer (ThermoFisher Scientific, Waltham, MA, USA) by compressing the foil seal and PCR plate for 4 seconds at 180°C. The prepared 96-well plate was then briefly centrifuged and placed in a Mastercycler ProS thermal cycler (Eppendorf AG, Hamburg, Germany) for thermal cycling with the following program: initial denaturation at 96°C for 10 minutes; 40 cycles of denaturation at 96°C for 20 seconds, annealing at 58.5°C for 2 minutes, and extension at 72°C for 10 seconds; followed by a final extension step at 72°C for 4 minutes and a 4°C hold.
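The reaction setup above implies some simple run-planning arithmetic, sketched here in Python. All constants are taken from the protocol as described; the function names are illustrative only, and the cycling time counts programmed hold times without temperature ramping, so the real run is longer.

```python
# Run-planning arithmetic for one SNP-40 plate (illustrative helper names).
WELLS_PER_SAMPLE = 8         # eight unique 5plex reactions, one column per sample
LOCI_PER_WELL = 5            # each reaction is a 5plex
TEMPLATE_PER_WELL_NG = 0.5   # template mass added to each well
TEMPLATE_VOLUME_UL = 5.0     # volume in which that template is delivered


def template_concentration_ng_per_ul():
    """Concentration the extracts must be normalized to."""
    return TEMPLATE_PER_WELL_NG / TEMPLATE_VOLUME_UL


def template_per_sample_ng():
    """Total template consumed by one sample across its eight wells."""
    return WELLS_PER_SAMPLE * TEMPLATE_PER_WELL_NG


def wells_used(n_samples, n_controls=2):
    """Wells occupied by samples plus the no-template and positive controls."""
    return (n_samples + n_controls) * WELLS_PER_SAMPLE


def cycling_hold_seconds(cycles=40):
    """Sum of programmed hold times; ramp times and the 4 degree hold are ignored."""
    initial_denaturation = 10 * 60
    per_cycle = 20 + 2 * 60 + 10   # denaturation + annealing + extension
    final_extension = 4 * 60
    return initial_denaturation + cycles * per_cycle + final_extension


if __name__ == "__main__":
    print("normalized concentration (ng/uL):", template_concentration_ng_per_ul())
    print("template per sample (ng):", template_per_sample_ng())
    print("loci per sample:", WELLS_PER_SAMPLE * LOCI_PER_WELL)
    print("wells used for 10 samples:", wells_used(10))
    print("cycling hold time (min):", cycling_hold_seconds() / 60)
```

Ten samples plus two controls fill all twelve columns of a 96-well plate, which is consistent with the plate layout described above.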
Following PCR amplification, the 96-well plate was briefly centrifuged and placed in the input stacker of the PLEX-ID instrument for automated desalting and mass determination as per manufacturer's recommended procedure.
PCR products were purified by the PLEX-ID instrument using a proprietary magnetic bead chemistry to remove salts, enzymes, unincorporated nucleotides, and any other PCR components that might interfere with collection of mass spectra. Purified PCR product was eluted in a buffer containing two peptide standards with masses of 727.4 Da and 1347.7 Da, which act as calibrants to facilitate data processing. The electrospray ionization source operates in negative mode at approximately -4000 V (depending on the individual instrument's tuning parameters, which are not user configurable) and 300°C. PCR products were sprayed into the ionization source at a flow rate of 280 µL per hour with dry compressed air used as a countercurrent to aid in analyte desolvation. The time-of-flight analyzer collects 5000 scans per second, for a period of approximately 28 seconds. Masses were resolved based on differences in time elapsed to traverse the flight tube due to mass-to-charge ratio (m/z). Resultant mass spectra were processed by proprietary software (Ibis Track version 2.7), which performs several steps to produce a background-subtracted, deconvolved representation of the mass spectral data as if only the singly charged mass peak were detected, with mass (Daltons) on the x-axis and signal strength (arbitrary units) on the y-axis. Successfully detected masses were stored in a table that resides in the Ibis Track database. The resulting mass spectra were inspected visually in IbisTrack software and any masses not correctly assigned by the software were manually added or deleted.
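The deconvolution performed by the instrument software is proprietary and is not described here. As a general illustration only, the relation between an observed m/z value and neutral mass for a multiply deprotonated ion in negative-mode electrospray can be sketched as follows; the function names and the example mass and charge are hypothetical and are not taken from the assay.

```python
# Generic negative-mode ESI relation for an ion [M - zH]^(z-):
#     m/z = (M - z * m_proton) / z
# This is a textbook relation, not the vendor's deconvolution algorithm.
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def expected_mz(neutral_mass_da, charge):
    """m/z at which a neutral mass appears after losing `charge` protons."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return neutral_mass_da / charge - PROTON_MASS_DA


def neutral_mass(mz, charge):
    """Neutral mass recovered from one peak with a known charge state."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return charge * (mz + PROTON_MASS_DA)


def total_scans(scans_per_second, seconds):
    """Number of scans accumulated over one acquisition window."""
    return scans_per_second * seconds


if __name__ == "__main__":
    # Hypothetical 30 000 Da strand carrying 20 negative charges.
    mz = expected_mz(30000.0, 20)
    print("m/z:", mz)
    print("recovered mass:", neutral_mass(mz, 20))
    print("scans per injection:", total_scans(5000, 28))
```

At the stated acquisition rate and duration, each injection accumulates roughly 140,000 scans.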
Following review, genotypes for each sample group were exported from Ibis Track to Microsoft Excel 2010 for formatting and further analysis with Power Marker, version 3.25 and Arlequin software, version 3.5.1.2 (14,15). Allele frequencies, expected and observed heterozygosity values, and P values (Fisher exact test for Hardy-Weinberg equilibrium) for each marker were calculated for the three US sample groups. The combined random match probabilities (RMP) for each sample were calculated using Excel 2010.
Figure 1 illustrates an example mass spectrum obtained from this study. Each spectrum contains the products of a 5plex PCR reaction. Four signal peaks are typically observed for heterozygous loci, two forward and two reverse strands of DNA (see markers rs2272998 and rs445251 in Figure 1). Only two signal peaks are observed for homozygous loci (see markers rs6591147, rs321198, and rs3780962). Of the 194 samples examined in this study, incomplete or partial genotypes were observed for 21 loci (21/7760 = 0.27%). Ten of the failures occurred because data did not transfer to the PLEX-ID server owing to a communication error. The remaining incomplete genotypes coincided with amplification reactions that exhibited poor signal over the entire 5plex. This may have been due to inefficient PCR or desalting in those specific amplification reactions. There was no evidence of a single locus dropping out due to underlying SNPs that would affect PCR primer binding.
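The per-locus statistics named in this section can be illustrated for a bi-allelic marker with a minimal Python sketch. It is not the PowerMarker or Arlequin implementation: the expected heterozygosity here is the plain 2pq estimate without a small-sample correction, the exact test fully enumerates heterozygote counts, and the genotype counts in the usage example are hypothetical.

```python
from math import exp, lgamma, log


def locus_summary(n_aa, n_ab, n_bb):
    """Allele frequencies and heterozygosities from genotype counts at one locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    return {
        "freq_A": p,
        "freq_B": q,
        "het_observed": n_ab / n,
        "het_expected": 2.0 * p * q,  # assumption: no small-sample correction
    }


def _log_prob_het(n_ab, n_a, n_b, n):
    """Log probability of n_ab heterozygotes given the allele counts, under HWE."""
    n_aa = (n_a - n_ab) // 2
    n_bb = (n_b - n_ab) // 2
    return (
        lgamma(n + 1) - lgamma(n_aa + 1) - lgamma(n_ab + 1) - lgamma(n_bb + 1)
        + n_ab * log(2.0)
        + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1)
    )


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg P value: total probability of all heterozygote
    counts that are no more likely than the observed one."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    observed = exp(_log_prob_het(n_ab, n_a, n_b, n))
    p_value = 0.0
    # The heterozygote count must have the same parity as the allele counts.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        prob = exp(_log_prob_het(het, n_a, n_b, n))
        if prob <= observed * (1.0 + 1e-9):  # tolerance for floating-point ties
            p_value += prob
    return min(p_value, 1.0)


if __name__ == "__main__":
    counts = (20, 45, 10)  # hypothetical AA, AB, BB counts for one sample group
    print(locus_summary(*counts))
    print("HWE exact P:", hwe_exact_p(*counts))
```

Published software may apply an unbiased heterozygosity estimator or a Markov-chain version of the exact test, so values from this sketch need not match reported tables to the last digit.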

Results
The genotype data for the 194 samples was evaluated for the following parameters: allele frequencies, observed heterozygosity, expected heterozygosity, and P value (Table 2). The combined RMP for each sample was calculated based on the determined allele frequencies for the corresponding sample group (Table 3).
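The text states only that the combined RMP was computed in Excel from the group allele frequencies; the formula itself is not given. A common approach, assumed here, is the product rule: each locus contributes its Hardy-Weinberg genotype frequency (p², 2pq, or q²), and the per-locus values are multiplied on the assumption that the loci are independent and no subpopulation correction is applied. The genotypes and frequencies in the usage example are hypothetical.

```python
from math import prod


def genotype_frequency(genotype, freq_a):
    """Hardy-Weinberg frequency of a bi-allelic genotype, given allele A frequency."""
    if not 0.0 <= freq_a <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    freq_b = 1.0 - freq_a
    if genotype == "AA":
        return freq_a * freq_a
    if genotype == "AB":
        return 2.0 * freq_a * freq_b
    if genotype == "BB":
        return freq_b * freq_b
    raise ValueError(f"unknown genotype: {genotype!r}")


def combined_rmp(genotypes, allele_a_freqs):
    """Product of per-locus genotype frequencies for one profile.

    Assumes independent loci and no theta (subpopulation) correction.
    """
    if len(genotypes) != len(allele_a_freqs):
        raise ValueError("one allele frequency is needed per genotype")
    return prod(genotype_frequency(g, p) for g, p in zip(genotypes, allele_a_freqs))


if __name__ == "__main__":
    # Hypothetical three-locus profile with hypothetical allele A frequencies.
    profile = ["AB", "AA", "BB"]
    freqs = [0.45, 0.60, 0.52]
    print("combined RMP:", combined_rmp(profile, freqs))
```

With allele frequencies near 0.5, each locus contributes a factor between 0.25 and 0.5, which is why a 40-locus profile reaches very small combined values.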

Discussion
A total of 6 of the 120 tests (40 loci × 3 sample groups) for Hardy-Weinberg equilibrium indicated a deviation from the expected result. Three were observed in the Caucasian sample group (rs1019029, rs1358856, and rs6811238), two in the African American group (rs1523537 and rs447818), and one in the Hispanic group (rs13182883). P values below 0.05 were considered significant at the 95% confidence level. At that level, approximately 5% of the comparisons, or 6 out of 120, can be expected to deviate from Hardy-Weinberg equilibrium by chance alone (16,17). The Bonferroni-corrected significance threshold for each population was 0.05/40 = 0.00125. Using this criterion, only the SNP marker rs1019029 would still be considered significant.
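The multiple-testing arithmetic in this paragraph can be checked with a short Python sketch. The locus names and P values in the usage example are placeholders, not values from Table 2.

```python
ALPHA = 0.05  # per-test significance level used in the text


def expected_chance_deviations(alpha, n_tests):
    """Number of tests expected to fall below alpha when every null is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_after_correction(p_values, alpha, n_tests):
    """Names of loci whose P value falls below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    print("expected chance deviations:", expected_chance_deviations(ALPHA, 40 * 3))
    print("per-population threshold:", bonferroni_threshold(ALPHA, 40))
    # Placeholder P values for three hypothetical loci in one population.
    example = {"locus_1": 0.00005, "locus_2": 0.012, "locus_3": 0.43}
    print(significant_after_correction(example, ALPHA, 40))
```

The correction is applied per population (40 tests each), which matches the 0.05/40 threshold stated in the text rather than a threshold over all 120 tests.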
Typically the minimum number of samples needed to provide a robust estimate for allele frequencies with loci containing 5 to 15 alleles is 100 to 150 samples for each population (18). Since in this study we measured bi-allelic markers that only had three possible genotypes (AA, BB, or AB), a smaller number of samples is deemed to be sufficient, provided that a minimum allele frequency of 5/2N is utilized (19). An examination of the data for each sample group did not find any frequency measurements (out of the 360 total) below the 5/2N threshold.
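The 5/2N minimum allele frequency depends on the number of individuals N in each group. A minimal Python sketch for the three sample sizes used in this study follows; the function names are illustrative.

```python
# Sample sizes as stated in the Methods section.
SAMPLE_SIZES = {"African American": 74, "Caucasian": 75, "Hispanic": 45}


def minimum_allele_frequency(n_individuals):
    """The 5/2N threshold: five observations among 2N sampled chromosomes."""
    if n_individuals <= 0:
        raise ValueError("sample size must be positive")
    return 5.0 / (2 * n_individuals)


def below_threshold(frequency, n_individuals):
    """True when an observed allele frequency falls under the 5/2N minimum."""
    return frequency < minimum_allele_frequency(n_individuals)


if __name__ == "__main__":
    for group, n in SAMPLE_SIZES.items():
        print(f"{group}: N = {n}, 5/2N = {minimum_allele_frequency(n):.4f}")
```

The thresholds work out to about 0.034 for the African American group, 0.033 for the Caucasian group, and 0.056 for the smaller Hispanic group.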
In the Caucasian sample group, the SNP marker rs1019029 exhibited a low P value (<0.0001) as well as a high observed heterozygosity of 0.733. The same marker gave an observed heterozygosity of 0.472 in the study by Pakstis et al. over the 40 populations examined (8). An additional review of the mass spectral data did not reveal an obvious error with the genotyping assay. The high heterozygosity was not observed in the African American or Hispanic sample groups, suggesting that testing additional Caucasian samples and/or an alternate typing method would be needed to confirm this result.
This is the first demonstration of a multiplex assay for typing this specific panel of 40 autosomal SNPs. We found the PLEX-ID instrument and SNP-40 assay to be a robust and automated method to type SNP markers. The time required to genotype 40 SNPs for 10 samples from start to finish (PCR to amplicon detection) was approximately 4.5 hours. The average time required to review a plate (10 samples plus positive and negative controls) in the IbisTrack software was approximately 15 minutes. The allele frequencies calculated for the US sample groups were found to be in agreement with published values, with the possible exception of rs1019029. The allele frequencies are the first derived from the NIST US population sample set for this panel of 40 SNP markers intended as a universal panel for individual identification (8).
Acknowledgment We thank Thomas Hall (Abbott) for assistance with IbisTrack software.
Funding Provided by the FBI Biometrics Center of Excellence. Project title: DNA as a Biometric Tool.
NIST disclaimer Certain commercial equipment, instruments and materials are identified in order to specify experimental procedures as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology or the Federal Bureau of Investigation nor does it imply that any of the materials, instruments or equipment identified are necessarily the best available for the purpose.
Ethical approval All samples had an Institutional Review Board exemption.
Declaration of authorship KMK performed all of the experiments and initial data analysis and manuscript preparation. PMV analyzed the data and wrote the manuscript.

Competing interests All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.