Evaluation of OpenArray™ as a Genotyping Method for Forensic DNA Phenotyping and Human Identification

A custom plate of OpenArray™ technology was evaluated to test 60 single-nucleotide polymorphisms (SNPs) validated for the prediction of eye color, hair color, and skin pigmentation, and for personal identification. The SNPs were selected from already validated subsets (HIrisPlex-S, Precision ID Identity SNP Panel, and ForenSeq DNA Signature Prep Kit). The concordance rate and call rate for every SNP were calculated by analyzing 314 sequenced DNA samples. The sensitivity of the assay was assessed by preparing a dilution series of 10.0, 5.0, 1.0, and 0.5 ng. The OpenArray™ platform obtained an average call rate of 96.9% and a concordance rate near 99.8%. Sensitivity testing performed on serial dilutions demonstrated that a sample with 0.5 ng of total input DNA can be correctly typed. The profiles of the 17 autosomal SNPs among the 19 selected for human identification reached a random match probability (RMP) of, on average, 10⁻⁸. An analysis of 21 examples of biological evidence from 8 individuals that generated single short tandem repeat profiles during the routine workflow demonstrated the applicability of this technology in real cases. Seventeen samples were correctly typed, revealing a call rate higher than 90%. Accordingly, the phenotype prediction revealed the same accuracy described in the corresponding validation data. Despite the reduced discrimination power of this system compared to STR-based kits, the OpenArray™ System can be used to exclude suspects and prioritize samples for downstream analyses, providing well-established information about the prediction of eye color, hair color, and skin pigmentation. More studies will be needed for further validation of this technology and to consider the opportunity to implement this custom array with more SNPs to obtain a lower RMP and to include markers for studies of ancestry and lineage.


Introduction
Short tandem repeats (STRs) are genetic variants widely used in forensic investigations for the identification of offenders and in cases concerning relationship testing [1,2]. However, the multiplexing capacity of STRs is relatively low because of the need for capillary electrophoresis and inability to accommodate large numbers of markers. In the genomic era, it is well known that STRs are not the only source of genetic variation in humans [3]. Single-nucleotide polymorphisms (SNPs) are the most common type of genetic variation among people and can provide additional information such as the inferred ancestry of a DNA sample and phenotypes (eye and hair color) [4]. From a technical point of view, SNPs can be used to support forensic DNA analyses because of an abundance of potential markers, amenability to automation, and potential reduction in required fragment length to only 60-80 bp [5][6][7].
Forensic analysis usually involves comparing genetic profiles extracted from biological samples collected at a crime scene with those of objects and suspects thought to be associated with that crime. In the absence of suspects, the power of forensic DNA analysis has been enhanced through the development of DNA databases, allowing the identification of unknown offenders and serial offenders by linking different crimes [1,8,9]. However, in criminal cases without an STR profile match, police investigation can be aided by different kinds of genetic tests.
In this context, the analysis of SNPs for the study of phenotypic variability (forensic DNA phenotyping, FDP) and for the determination of biogeographical origin is particularly promising [1,10]. The main advantage of such analysis is to narrow down the number of potential crime scene trace donors to a smaller group of people who most likely have the externally visible characteristics and biogeographic ancestry that were inferred from the crime scene DNA [11].
Forensic DNA phenotyping is explicitly regulated and permitted by law in two EU member states (the Netherlands and Slovakia) and practiced in compliance with existing laws in six more (Poland, Czech Republic, Sweden, Hungary, Austria, and Spain) and the United Kingdom [11].
In the "omic era", different approaches have been developed to test thousands of genetic variations for forensic purposes (phenotype, ancestry, and identification). Real-time PCR using TaqMan chemistry is a well-known technology routinely used in the workflow of forensic laboratories to quantify the DNA extracted from biological samples [12,13]. The OpenArray™ system (Thermo Fisher Scientific, Waltham, MA, USA) applies TaqMan chemistry for the simultaneous and massive study of samples and SNPs [14,15].
Here we report a pilot study aimed at evaluating the applicability of OpenArray™ technology (Thermo Fisher Scientific, Waltham, MA, USA) for the testing of SNPs for human identification and for inferring phenotypes. To this end, we selected the entire panel of 41 SNPs for eye, hair, and skin color prediction from the validated tool HIrisPlex-S and 19 SNPs for individual identification from the Precision ID Identity SNP Panel (Thermo Fisher Scientific, Waltham, MA, USA) and ForenSeq DNA Signature Prep Kit (Verogen Inc, San Diego, CA, USA). A total of 314 samples were typed to test the genotyping accuracy of OpenArray™ through comparison with resequencing data.

Selection of SNPs
In this work we selected 41 SNPs for eye, hair, and skin color prediction from the HIrisPlex-S system and 19 SNPs from a universal individual identification SNP panel validated for forensic purposes [16][17][18][19]. The SNPs for human identification were selected based on their chromosomal location (each SNP is mapped to a different chromosome to avoid linkage and linkage disequilibrium effects). These genetic variants were included in a customized panel of 60 SNPs able to provide both human identification and phenotyping information in the same analytical session. The TaqMan probe for rs1805008 could not be developed. Therefore, rs11538871 was chosen as a proxy because of its complete linkage disequilibrium (LD) with rs1805008 [20].
The selected SNPs are summarized in Table 1.

Ethical Committee Approval
These investigations were carried out following the rules of the Declaration of Helsinki of 1975, revised in 2013. According to Point 23 of this declaration, approval from the ethical committee of Policlinico Tor Vergata (protocol number: Prot. CE/PROG.177/20) was obtained.

Reference DNA Samples
A total of 314 reference DNA samples were used. In particular, 114 genomic DNA samples were obtained from the 1000 Genomes Project (Coriell Institute for Medical Research, Camden, NJ, USA) and 200 blood samples were obtained from the Laboratory of Genomic Medicine UILDM (Italian Muscular Dystrophy Association) Santa Lucia Foundation [21]. The 114 samples from 1000 Genomes were selected from a small town near Florence in the Tuscany region of Italy. The genotype data of genomic DNA samples from 1000 Genomes were extracted from the Ensembl genome browser [22]. An additional four volunteer donors were used for the sensitivity testing.

Biological Samples from Real Cases
We analyzed a total of 21 samples of biological evidence that had generated single STR profiles during the routine workflow and that had been assigned to their reference perpetrator. This evidence came from different tissues (10 from saliva, 2 from semen, 7 from blood, 2 from urine) and was attributed to 8 different subjects [23,24]. All samples were marked with a specific internal code, as shown in Table 2 [2]. The forensic samples were loaded twice on the same plate to verify the reproducibility of genotype assignment.

DNA Purification and Quantification
In accordance with the quality standards of forensic laboratories, which require the use of automated instruments, the genomic DNA of 225 samples was extracted using the Maxwell® 16 MDx Instrument (Promega, Madison, WI, USA) and the DNA IQ Casework Pro Kit for Maxwell® 16 (Promega, Madison, WI, USA).
Each DNA sample was quantified using the Quantifiler ® Trio DNA Quantification Kit (Thermo Fisher Scientific, Waltham, MA, USA).

OpenArray™ Technology
OpenArray™ Technology (Thermo Fisher Scientific, Waltham, MA, USA) is a high-throughput real-time PCR genotyping method that allows for rapid screening of several TaqMan assays in many samples [13]. This real-time method involves the use of an array composed of 3072 through-holes running on the QuantStudio 12K Flex Real Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA) with an OpenArray™ block. The OpenArray™ system is composed of a specific plate (OpenArray™ plate) enabling 192 samples to be typed in a single run [14].
For each sample, 5 ng (volume of 3 µL) of extracted DNA from the reference samples and 3 µL of 2× TaqMan OpenArray™ Genotyping Master Mix were manually loaded into 384-well plates according to the manufacturer's instructions (Thermo Fisher Scientific, Waltham, MA, USA). A negative control was obtained by adding 3 µL of pure distilled water to the Master Mix. The QuantStudio 12K Flex OpenArray™ AccuFill System transferred the previously generated mix to the TaqMan OpenArray™ plate. The amplification was performed using the QuantStudio 12K Flex Real Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA) instrument, and the results were analyzed using TaqMan Genotyper Software (Thermo Fisher Scientific, Waltham, MA, USA) [14]. For each SNP, the call rate and concordance rate were calculated as described in the Statistical Analysis section [25].

Sensitivity
For sensitivity studies, genotyping results from four volunteer donors were assessed on serial dilutions of DNA with inputs of 10 ng, 5 ng, 1 ng, and 0.5 ng. Each of the four volunteer donors provided one sample (a buccal swab).

Resequencing Analysis
The genotyping accuracy was assessed through comparison with resequencing data. Resequencing data were available from the Ensembl genome browser for the 1000 Genomes

Statistical Analysis
At the end of the analytical assay, TaqMan Genotyper Software indicated the percentage of successful genotyping, reported as the call rate. The call rate is defined as the percentage of SNPs that were assigned a genotype call by the software out of the total number of SNPs typed. According to the software's instructions, a call rate above 90% is considered successful. The concordance rate is defined as the percentage of SNPs for which the allele calls generated by OpenArray™ technology are identical to the genotypes resulting from resequencing data [25].
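The two definitions above can be illustrated with a minimal Python sketch. The data layout (a dictionary from SNP identifier to genotype string, with `None` for a no-call), the function names, and the choice to compute concordance only over SNPs called on both platforms are assumptions made for illustration; they do not reproduce the behavior of TaqMan Genotyper Software.

```python
from typing import Dict, Optional

# Hypothetical data layout: SNP id -> genotype string, or None for a no-call.
Genotypes = Dict[str, Optional[str]]


def normalize(genotype: str) -> str:
    """Sort the two alleles so that 'A/G' and 'G/A' compare as equal."""
    return "/".join(sorted(genotype.split("/")))


def call_rate(array_calls: Genotypes) -> float:
    """Percentage of typed SNPs that received a genotype call."""
    called = sum(1 for g in array_calls.values() if g is not None)
    return 100.0 * called / len(array_calls)


def concordance_rate(array_calls: Genotypes, reference: Genotypes) -> float:
    """Percentage of SNPs called on both platforms whose genotypes agree.

    Assumption: no-calls are excluded from the denominator.
    """
    shared = [
        snp for snp, g in array_calls.items()
        if g is not None and reference.get(snp) is not None
    ]
    if not shared:
        raise ValueError("no SNP was called on both platforms")
    matches = sum(
        1 for snp in shared
        if normalize(array_calls[snp]) == normalize(reference[snp])
    )
    return 100.0 * matches / len(shared)


# Illustrative genotypes only; these are not data from the study.
array_calls = {"snp1": "A/G", "snp2": "C/C", "snp3": None, "snp4": "T/C"}
reference = {"snp1": "G/A", "snp2": "C/C", "snp3": "A/A", "snp4": "T/T"}

print(f"call rate: {call_rate(array_calls):.1f}%")  # 3 of 4 SNPs called
print(f"concordance: {concordance_rate(array_calls, reference):.1f}%")  # 2 of 3 agree
```

In this toy example, one of four SNPs is a no-call, giving a call rate of 75.0%, which would fall below the 90% threshold described above.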
For phenotypic characterization of biological forensic samples (n = 17), we generated statistical predictions using the HIrisPlex-S webtool, available at https://hirisplex.erasmusmc.nl/ (accessed on 20 January 2021). Each phenotype was predicted by applying established algorithms for the color of the eyes, hair, and skin, each of which returns a p-value (prediction probability) for every category [19]. In terms of eye color, the category with the highest p-value is taken as the most probable phenotype [19,26]. For the interpretation of hair color, we considered the hair color prediction guide developed by Walsh S. et al. [23,27]. This guide examines categorical hair color probabilities in combination with light/dark hair color shade probabilities as obtained from genotype data [23,27]. For the interpretation of skin color, we used the guide developed by L. Chaitanya et al. [16]. This guide considers the highest p-value in combination with the second highest p-value [16,28]. For example, if the pale category obtains a p-value greater than 0.7, the predicted phenotype is further shaped by the second highest category when the p-value of that category is greater than 0.15. In particular, the skin will appear darker if the second most likely category is intermediate, whereas it will appear lighter if the second most likely category is very pale [16].
Visual phenotyping was performed beforehand through blind assessment by two independent individuals from our laboratory who were not involved in this study, in order to verify the validity of the software prediction.
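The interpretation rules for eye and skin color can be sketched as follows. This is only an illustration of the single worked example given in the text, not the full published guide: the function names, the category labels, and their ordering from lightest to darkest are assumptions, and the probabilities are invented.

```python
from typing import Dict, Optional, Tuple

# Skin categories ordered from lightest to darkest (labels assumed for this sketch).
SKIN_ORDER = ["very pale", "pale", "intermediate", "dark", "dark to black"]


def most_probable(probabilities: Dict[str, float]) -> str:
    """Return the category with the highest prediction probability."""
    return max(probabilities, key=probabilities.get)


def interpret_skin(
    probabilities: Dict[str, float],
    top_threshold: float = 0.7,
    second_threshold: float = 0.15,
) -> Tuple[str, Optional[str]]:
    """Return the top category and an optional 'darker'/'lighter' modifier.

    The modifier is set only when the top probability exceeds top_threshold
    and the second highest probability exceeds second_threshold.
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    top, second = ranked[0], ranked[1]
    modifier = None
    if probabilities[top] > top_threshold and probabilities[second] > second_threshold:
        if SKIN_ORDER.index(second) > SKIN_ORDER.index(top):
            modifier = "darker"
        else:
            modifier = "lighter"
    return top, modifier


# Invented probabilities, for illustration only.
eye = {"blue": 0.85, "intermediate": 0.05, "brown": 0.10}
skin = {
    "very pale": 0.03,
    "pale": 0.75,
    "intermediate": 0.18,
    "dark": 0.03,
    "dark to black": 0.01,
}

print(most_probable(eye))    # blue
print(interpret_skin(skin))  # ('pale', 'darker')
```

Cases in which the top probability does not exceed 0.7 are not covered by the example in the text, so the sketch simply returns the top category without a modifier.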
In order to verify the discrimination power of 17 human identification SNPs (not including rs9786184 and rs2032652 on the Y chromosome), we determined the random match probability (RMP) [29]. The random match probability is the estimated frequency of a genetic profile in the reference population and is calculated using allele frequencies from that population group. Therefore, the RMP is a useful value for evaluating the discriminating power of a DNA profiling system, and it indicates the probability of obtaining a match between two distinct and unrelated individuals [29].
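A common way to obtain a profile frequency of this kind is to multiply per-locus genotype frequencies computed under Hardy-Weinberg equilibrium, treating the loci as independent. The sketch below follows that approach as an assumption; the text does not state the exact formula used. The SNP names, allele frequencies, and function names are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def genotype_frequency(genotype: Genotype, allele_freqs: Dict[str, float]) -> float:
    """Expected genotype frequency under Hardy-Weinberg equilibrium."""
    a, b = genotype
    p, q = allele_freqs[a], allele_freqs[b]
    # Homozygote: p^2; heterozygote: 2pq.
    return p * p if a == b else 2.0 * p * q


def random_match_probability(
    profile: Dict[str, Genotype],
    frequencies: Dict[str, Dict[str, float]],
) -> float:
    """Product of per-locus genotype frequencies, assuming independent loci."""
    return prod(
        genotype_frequency(genotype, frequencies[snp])
        for snp, genotype in profile.items()
    )


# Hypothetical three-locus profile and allele frequencies, for illustration only.
profile = {"snp1": ("A", "G"), "snp2": ("C", "C"), "snp3": ("T", "C")}
frequencies = {
    "snp1": {"A": 0.5, "G": 0.5},
    "snp2": {"C": 0.4, "T": 0.6},
    "snp3": {"T": 0.3, "C": 0.7},
}

rmp = random_match_probability(profile, frequencies)
print(f"RMP = {rmp:.4f}")  # 0.5 * 0.16 * 0.42
```

As an order-of-magnitude check under the same assumptions, a profile of 17 biallelic SNPs with both allele frequencies at 0.5 would give a value between 0.25^17 (about 5.8 × 10⁻¹¹, all homozygous) and 0.5^17 (about 7.6 × 10⁻⁶, all heterozygous).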

Genotyping of the Control Sample and Concordance
The OpenArray™ System (Thermo Fisher Scientific, Waltham, MA, USA) successfully analyzed 100% of the reference DNA samples. The average call rate and the average concordance rate were 96.9% ± 1.72% and 99.8% ± 0.56%, respectively [25]. The call rate was also calculated for each SNP. The average SNP call rate was 98.9% ± 1.67%, with values ranging from 91.3% to 100% (Table 1).
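Summary figures of this form (mean ± dispersion and range) can be derived from per-SNP call rates as in the sketch below. It assumes that the value after ± is a sample standard deviation, which the text does not specify; the SNP names, rates, and the `summarize` helper are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Union

Summary = Dict[str, Union[float, List[str]]]


def summarize(call_rates: Dict[str, float], threshold: float = 90.0) -> Summary:
    """Mean, sample SD, and range of call rates, plus SNPs not above the threshold."""
    values = list(call_rates.values())
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
        "not_above_threshold": [s for s, r in call_rates.items() if r <= threshold],
    }


# Hypothetical per-SNP call rates in percent; not data from the study.
per_snp = {"snp1": 100.0, "snp2": 98.0, "snp3": 96.0, "snp4": 94.0}

summary = summarize(per_snp)
print(f"{summary['mean']:.1f}% ± {summary['sd']:.2f}% "
      f"(range {summary['min']:.1f}-{summary['max']:.1f}%)")
```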

Sensitivity Study
The results of the sensitivity study are shown in Table 3. A call rate higher than or equal to 90% was obtained for all samples. These results demonstrate that the system is able to correctly type DNA with inputs as low as 0.5 ng.

Genotyping of Biological Evidence from Real Cases
A total of 21 single biological evidence samples from different tissues (10 from saliva, 2 from semen, 7 from blood, 2 from urine), previously assigned to their reference perpetrator, were analyzed using the OpenArray™ System. This evidence was attributed to eight different subjects [23,24] (Table 2). A total of 17 samples presented call rates higher than 90%, with only 4 samples showing call rates below 90% (Table 2). As expected, these 4 samples were characterized by an extremely low starting concentration (Table 2).
We typed two replicates of each sample to verify the genotype call reliability. We obtained concordant results between the two replicates for all samples.

Predicting Human Appearance from Forensic Samples
Once the correct allelic call was ascertained for all subjects, it was possible to carry out predictive analysis on the HIrisPlex-S System software [19].
The data reported in Supplementary Table S1 describe the outputs (p-values) obtained by the HIrisPlex-S System Software for each subject [19]. Table 2 reports the phenotype predictions given by the HIrisPlex-S System software [19].

RMP Calculation
Calculation of the random match probability (RMP) was performed. The calculation considered 17 SNPs located on the autosomal chromosomes. The frequencies relating to the European (EUR), African (AFR), and East Asian (EAS) populations of the 1000 Genomes Project, available on the Ensembl genome browser, were used [22]. Table 4 shows the RMP values obtained for each subject. These values show that the discriminatory power of the analyzed SNPs is not sufficient to uniquely identify a subject in the world population but can be used to verify genetic compatibility with suspects to be confirmed by traditional STR analysis [30,31].

Discussion
To the best of our knowledge, this is the first application of the OpenArray™ System in the forensic field. We designed a panel of 60 SNPs for phenotype estimation and personal identification [2]. SNPs for eye, hair, and skin color prediction were directly selected from the HIrisPlex-S system, while SNPs for human identification were selected from already validated subsets (the Precision ID Identity SNP Panel (Thermo Fisher Scientific, Waltham, MA, USA) and the ForenSeq DNA Signature Prep Kit (Verogen Inc, San Diego, CA, USA)) [16][17][18][19]. These SNPs were first tested on 314 reference samples, revealing an average call rate of 96.9% and a concordance rate near 99.8%.
Sensitivity testing performed on serial dilutions demonstrated that samples with 0.5 ng of total input DNA can be correctly typed.
Of 21 biological evidence samples from different tissues (10 from saliva, 2 from semen, 7 from blood, 2 from urine), a total of 17 were correctly typed using OpenArray™, revealing a call rate higher than 90%.
Regarding identification purposes, since most SNPs are biallelic, they are less informative for identity testing than STR analyses [32]. Thus, many more SNPs will be needed to achieve the same level of discrimination afforded by the commonly used STR loci [33]. However, our system is not intended to replace the present STR analysis for human identification. Despite the reduced discrimination power of this system compared to STR-based kits, the OpenArray™ System can be used to exclude suspects and to prioritize samples for downstream analyses, providing well-established information about the prediction of eye color, hair color, and skin pigmentation.
It should be used primarily as a tool for phenotype prediction, while identification should also be supported by traditional methods. A total of 17 SNPs for human identification were used in this study, yielding a mean RMP of approximately 10⁻⁸.
It is important to outline that our work was intended to provide a new tool, based on previously validated SNPs, for phenotype prediction and prioritizing samples for downstream analyses for human identification [16][17][18][19]34,35]. The HIrisPlex-S panel and the related software were previously validated elsewhere [16,19]. Thus, our work was aimed at verifying the ability of OpenArray™ to correctly type the selected SNPs.
The main strengths of the OpenArray™ System are the speed of the analysis (less than four hours against the current 24-48 h), the standardized interpretation of the results, the low cost for the analysis of a single sample (about 10 euros, in the region of 20% of the current average) and the opportunity to provide both a first evaluation of biological compatibility and phenotypic information. Moreover, TaqMan chemistry is a well-known technology routinely used in the workflow of forensic laboratories to quantify the DNA extracted from biological samples.
The main current weakness is that samples must still be analyzed with a second system based on traditional STRs to determine biological compatibility. Furthermore, the OpenArray™ System is not yet part of the workflow of forensic genetics laboratories, which limits its availability for routine use.

Conclusions
To the best of our knowledge, this study represents the first application of the OpenArray™ System as a genotyping tool in forensic genetics. In particular, the results obtained suggest that the OpenArray™ System is robust and reliable and is suitable for use in forensic genotyping. Herein, we developed a panel for phenotypic prediction and human identification. Although the 19 SNPs used provide useful information for personal identification, a greater number of SNPs is required for the reliable identification of a subject.
The quality of the results obtained from biological evidence also promotes the application of this system in forensic laboratories to assist investigators in resolving cases.
Finally, these data suggest the applicability of this system as a genotyping method in the forensic field. In particular, it could be suitable as a first-line method to verify genetic compatibility between evidence and suspects and to obtain rapid investigative information, while also providing well-founded predictions of eye color, hair color, and skin pigmentation. More studies will be needed for further validation of this technology and to consider opportunities to implement this custom array with more SNPs to lower the RMP and to include markers for studies of ancestry and lineage.