Identification and characterization of the highly polymorphic locus D14S739 in the Han Chinese population

Aim To systematically select and evaluate short tandem repeats (STRs) on chromosome 14 and obtain new STR loci as expanded genotyping markers for forensic application. Methods STRs on chromosome 14 were filtered from the Tandem Repeats Database and further selected based on their positions on the chromosome, repeat patterns of the core sequences, sequence homology of the flanking regions, and suitability of flanking regions for primer design. The STR locus with the highest heterozygosity and polymorphism information content (PIC) was selected for further analysis of genetic polymorphism, forensic parameters, and the core sequence. Results Among 26 STR loci selected as candidates, D14S739 had the highest heterozygosity (0.8691) and PIC (0.8432), and showed no deviation from the Hardy-Weinberg equilibrium. Fourteen alleles were observed, ranging in size from 21 to 34 tetranucleotide units in the core region of (GATA)9-18 (GACA)7-12 GACG (GACA)2 GATA. Paternity testing showed no mutations. Conclusion D14S739 is a highly informative STR locus and could be a suitable genetic marker for forensic applications in the Han Chinese population.

Short tandem repeats (STRs) consist of repeat units 2 base pairs (bp) to 7 bp in length (1). Because of their high degree of length polymorphism, which results from variation in the number of repeat units, and the short size of their amplification products, they have become the most popular genetic markers for the identification of individuals and paternity testing (2). However, only a small number of STRs with a high degree of length polymorphism are suitable for use as genotyping markers. Multiplex assays commonly include non-coding tetranucleotide and pentanucleotide repeats, which enable high combined power of discrimination (CPD) and combined power of exclusion (CPE) in a single test. Currently, commercial kits, such as PowerPlex® Fusion System (Promega, Madison, WI, USA) and GlobalFiler® Express Kit (Thermofisher Scientific Inc., Waltham, MA, USA), allow simultaneous amplification of more than 20 autosomal STR loci, which simplifies forensic DNA profiling (3,4).
STRs are prone to mutation in meiosis, which might result in a false maternal or paternal exclusion due to gain or loss of repeat units. Therefore, additional genetic information is required to increase the combined paternity index (CPI), which allows the detection of true parental relationships in a pedigree and reduces the chances of false exclusion. Currently, commercially available kits include some STR loci with a low power of discrimination (PD) and low power of exclusion (PE), such as TPOX. Furthermore, STRs included in Combined DNA Index System (CODIS) and European Standard Set (ESS) belong to only 18 of the 22 autosomal chromosomes (5). Therefore, some new multiplex STR typing systems were developed to provide additional information for paternity testing, such as 26plex STR assay (6). However, most STR loci used in the expanded assays, such as D14S1434, also have low PD and PE (7).
The development of six dyes permits simultaneous detection of more STR loci in a multiplex STR typing system (4). CPD and CPE can be increased if an STR locus with low PD and PE in a multiplex STR typing system is replaced by a new STR locus with high PD and PE from the same chromosome, or if such a locus is added to the multiplex STR typing system. This is especially important for new STR loci with high PD and PE from the chromosomes that are not included in multiplex STR typing systems. The addition of these may help to avoid potential linkage between STR loci. Therefore, it is necessary to systematically select and evaluate new STR loci as genotyping markers for forensic application (8). For this purpose, we aimed to identify STR loci with a high degree of polymorphism on chromosome 14. In fact, no STR locus on chromosome 14 has been included in common multiplex STR typing systems, even the latest PowerPlex® Fusion System and GlobalFiler® Express Kit. Although several STR loci on chromosome 14 have been used as expanded genotyping markers, including D14S1434 and D14S608, the use of these loci has several disadvantages. D14S1434 has been reported to have low PD and PE (9), and while D14S608 has relatively high PD, its allele frequency does not show normal distribution in all tested populations (10-14). D14S608 was also observed to deviate significantly from Hardy-Weinberg equilibrium (HWE) in the German population (11).
In this study, STR loci on chromosome 14 were filtered from the Tandem Repeats Database (TRDB) (15) and their core and flanking sequences were further evaluated. D14S739 was shown to be highly polymorphic in a small sample size and was further characterized in the Han Chinese population.

Selection of STR loci
A total of 386 repeats on chromosome 14 were preliminarily filtered from TRDB using the following rules: 'Pattern Size' was equal to 4; 'Copy Number' was ≥8 and ≤30; 'the content of GC' was 20%-55%, '%Indels' was equal to 0, and '%Matches' was ≥90%. A set of 26 STR loci was selected based on the positions on the chromosome, repeat patterns of core sequences, sequence homology of flanking regions, and suitability of flanking regions in primer design.
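The preliminary filter described above can be sketched as a simple predicate. This is a minimal illustration, assuming each TRDB record has already been parsed into a record with the five fields named in the text; the class name `Repeat`, its field names, and the function `passes_filter` are hypothetical and do not correspond to any TRDB export format.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    """Hypothetical container for one tandem repeat record."""
    pattern_size: int      # 'Pattern Size' (bp per repeat unit)
    copy_number: float     # 'Copy Number'
    gc_percent: float      # GC content, in percent
    indel_percent: float   # '%Indels'
    match_percent: float   # '%Matches'


def passes_filter(r: Repeat) -> bool:
    """Apply the preliminary selection rules stated in the text."""
    return (
        r.pattern_size == 4
        and 8 <= r.copy_number <= 30
        and 20 <= r.gc_percent <= 55
        and r.indel_percent == 0
        and r.match_percent >= 90
    )


# Illustrative records only, not real TRDB entries.
records = [
    Repeat(4, 12.0, 35.0, 0.0, 95.0),   # kept
    Repeat(4, 7.5, 35.0, 0.0, 95.0),    # too few copies
    Repeat(5, 12.0, 35.0, 0.0, 95.0),   # not a tetranucleotide repeat
]
candidates = [r for r in records if passes_filter(r)]
```

The later selection steps (chromosomal position, repeat pattern, flanking homology, primer suitability) involve manual judgment and are not captured by this predicate.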

Primer design, amplification, and electrophoresis
Primers were designed by using Primer v5.0 (Premier Biosoft International, Palo Alto, CA, USA). STR loci were amplified by polymerase chain reaction (PCR) in a 25 μL final reaction volume containing 2.5 μL 10 × PCR buffer (with MgCl2), 2.0 μL deoxynucleotide mixture (2.5 mM), 1.0 μL FAM™-labeled or unlabeled primer set (100 μM, Sangon Biotech., Shanghai, China), 1.0 μL rTaq DNA polymerase (5 U/μL), and 1.0 μL sample DNA. After an initial denaturation at 94°C for 3 minutes, PCR was carried out for 31 cycles of denaturation at 94°C for 30 seconds, annealing at 58°C for 35 seconds, and extension at 72°C for 30 seconds, followed by a final extension at 72°C for 25 minutes. PCR products were separated by agarose gel electrophoresis or capillary electrophoresis in ABI PRISM 3130xL Genetic Analyzer (Thermofisher Scientific Inc.).

Croat Med J. 2015;56:482-9 www.cmj.hr

Naming of the alleles and allelic ladder
The pilot investigation of genetic polymorphism was performed with 35 individual DNA samples. The number of alleles of each STR locus was determined and the forensic parameters were evaluated. The PCR products of each allele were cloned in plasmid vectors and sequenced by 3130xL Genetic Analyzer. The alleles were named according to the sequencing results and the recommendations of the DNA Commission of the International Society of Forensic Genetics (ISFG) (16). The alleles were amplified, and then the products were diluted, mixed together, analyzed, and balanced to produce the allelic ladder (17). Panel and bin files for GeneMapper ID software v3.2 were programmed by using the fixed sizes of the allelic ladder.

Population investigation and data analysis
The bloodstains were collected from 511 unrelated individuals after informed consent had been obtained, and the DNA samples were prepared with 10% Chelex-100 solution (Bio-Rad Laboratories, Hercules, CA, USA) and proteinase K (18). The allelic ladder, panel, and bins were updated when new alleles were observed. The values for allele frequencies, observed heterozygosity (Ho), expected heterozygosity (He), polymorphism information content (PIC), PD, and PE were calculated, and the exact test of HWE was performed using the PowerStats v1.2 software (19) and PowerMarker software v3.25 (20). The study was approved by the ethics committee of Shanghai Medical College, Fudan University.
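For readers who want to reproduce parameters of this kind, the following is a minimal sketch using the commonly cited textbook formulas: He = 1 − Σp², PIC = 1 − Σp² − Σ2p_i²p_j² over allele pairs, PD = 1 − Σ(genotype frequency)², and PE = h²(1 − 2hH²) with h the observed heterozygote proportion and H = 1 − h. It is an assumption that these match what PowerStats and PowerMarker compute internally; the function name `forensic_parameters` is hypothetical, and the exact HWE test is not included.

```python
from collections import Counter
from itertools import combinations


def forensic_parameters(genotypes):
    """Compute basic parameters from (allele, allele) tuples of unrelated people."""
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)

    ho = sum(1 for a, b in genotypes if a != b) / n          # observed heterozygosity
    he = 1 - sum(p * p for p in freqs.values())              # expected, uncorrected
    pic = he - sum(2 * p * p * q * q
                   for p, q in combinations(freqs.values(), 2))
    pd = 1 - sum((c / n) ** 2 for c in genotype_counts.values())
    hom = 1 - ho                                             # homozygote proportion
    pe = ho ** 2 * (1 - 2 * ho * hom ** 2)
    return {"freqs": freqs, "Ho": ho, "He": he, "PIC": pic, "PD": pd, "PE": pe}


# Toy data for illustration only (not from the study).
toy = [(21, 22), (21, 22), (21, 21), (22, 22)]
params = forensic_parameters(toy)
```

As a plausibility check on the PE expression, inserting the reported heterozygosity of 0.8691 gives approximately 0.733, which agrees with the reported PE of 0.7328.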

Selection of STR loci on chromosome 14
From a total of 27 552 loci in TRDB, we obtained 386 STR loci. The sequence homology of flanking regions was evaluated by the Blat tool (http://genome.ucsc.edu/cgi-bin/hgBlat) and the suitability of flanking regions in primer design was assessed by Oligo v. 7.0 software (Molecular Biology Insights, West Cascade, CO, USA). A set of 26 STR loci with a spacing of about 3 Mb from each other was selected for further investigation (Figure 1 and Table 1).

Pilot investigation of genetic polymorphism
The specificity of primer sets for the 26 STR loci was tested by PCR amplification and agarose gel electrophoresis (Table 1 and Figure 2) and further evaluated by capillary electrophoresis. Pilot investigation of genetic polymorphism showed that the locus with the highest heterozygosity, PIC, PD, and PE was locus No. 20, with 9 alleles. University of California Santa Cruz (UCSC) Genome Browser analysis (http://genome.ucsc.edu) showed that locus No. 20 had an identical location on chromosome 14 as D14S739. Therefore, D14S739 was further analyzed.

The population analysis of D14S739
The allelic ladder with 9 alleles of D14S739 was prepared and the genetic diversity of D14S739 in the Han population was investigated. In all tested samples we observed 14 alleles. Insertion-deletion polymorphisms (Indels), which result in microvariants, were not observed. The forensic parameters of D14S739 including allele frequencies, Ho, He, PIC, PD, and PE were calculated and no deviation from HWE was observed (Table 2). Compared with the polymorphism and forensic parameters of CODIS STRs obtained from our laboratory (21), D14S739 was comparable to the FGA locus and superior to other loci.

The core sequence analysis of D14S739
We next analyzed the sequence of D14S739 in the human genome version 19 (Hg19). In its core region, there are two repeat motifs, GTCT and ATCT. However, D14S739 was originally cloned with the oligonucleotide probe of GATA repeats (22). According to the nomenclature for STR alleles, the repeat motifs of D14S739 should be defined as the GATA motif and the GACA motif (16). To further determine the nucleotide sequences of all 14 alleles, representative samples containing the alleles of D14S739 were used to amplify the target region and the PCR products were cloned into pMD™ [...]
Because of the combination of two repeat motifs in the core region, alleles with the same size had different repeat patterns (Figure 3A and B). Single nucleotide variation in alleles was also observed. The transition of cytosine to thymine in the GACA motif led to the appearance of a GATA motif (Figure 3C). Other alleles might have a similar pattern, although we did not sequence all the alleles in the population. In fact, the single nucleotide polymorphism in the core region of D14S739 was confirmed by the UCSC Genome Browser Database (http://genome.ucsc.edu).
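The point that alleles of equal size can carry different repeat patterns can be illustrated with a short sketch that collapses a core sequence into tetranucleotide blocks and counts the total number of units. The two sequences below are constructed for illustration from the reported core structure (GATA)9-18 (GACA)7-12 GACG (GACA)2 GATA and are not sequenced alleles from the study; the function names are hypothetical.

```python
from itertools import groupby


def repeat_blocks(core):
    """Split a core sequence into 4-bp units and collapse consecutive runs."""
    if len(core) % 4 != 0:
        raise ValueError("core length is not a multiple of 4 bp")
    units = [core[i:i + 4] for i in range(0, len(core), 4)]
    return [(motif, len(list(run))) for motif, run in groupby(units)]


def bracket_notation(blocks):
    """Render blocks as, e.g., '(GATA)10 (GACA)8 GACG (GACA)2 GATA'."""
    return " ".join(f"({m}){n}" if n > 1 else m for m, n in blocks)


def allele_name(blocks):
    """Size-based allele name: total number of tetranucleotide units."""
    return sum(n for _, n in blocks)


# Two hypothetical cores with the same length but different block structure.
core_a = "GATA" * 11 + "GACA" * 7 + "GACG" + "GACA" * 2 + "GATA"
core_b = "GATA" * 10 + "GACA" * 8 + "GACG" + "GACA" * 2 + "GATA"

for core in (core_a, core_b):
    blocks = repeat_blocks(core)
    print(allele_name(blocks), bracket_notation(blocks))
```

Both constructed cores are named allele 22 by size, yet they differ by a single C/T position, which is the kind of difference that fragment-length analysis cannot resolve but sequencing can.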

Detection of D14S739 in paternity testing
The allelic ladder with 14 alleles of D14S739 was prepared and the performance of D14S739 in paternity testing was investigated in 200 trio paternity tests using PowerPlex® 21 System (Promega, Madison, WI, USA). The transmission of alleles from parents to their offspring conformed to Mendelian laws and no mutation was observed. The representative genotypes of one trio paternity test together with the allelic ladder are shown in Figure 4.
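Conformity with Mendelian transmission at a single locus can be checked programmatically: the child must carry one allele found in the mother and the other found in the father. The sketch below is a generic illustration with made-up genotypes, not the software or data used in the study, and the function name is hypothetical.

```python
def is_mendelian_consistent(child, mother, father):
    """Return True if the child's two alleles can be explained by one
    allele inherited from the mother and one from the father."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


# Made-up trio genotypes (allele names are repeat-unit counts).
trios = [
    {"child": (23, 27), "mother": (23, 25), "father": (27, 30)},  # consistent
    {"child": (23, 27), "mother": (23, 25), "father": (24, 30)},  # inconsistent
]
inconsistent = [t for t in trios if not is_mendelian_consistent(**t)]
```

An inconsistent trio at one locus is not by itself evidence of non-paternity, since it may reflect a mutation, which is why such events are counted when estimating mutation rates.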

Discussion
In this study, we performed a comprehensive screening of STR loci on chromosome 14 and identified D14S739 as a highly polymorphic STR locus in the Han Chinese population. Generally, the degree of polymorphism for a genetic locus can be measured by two distinct parameters, heterozygosity and PIC (24). Our results showed that D14S739 had higher heterozygosity (0.8691) and PIC (0.8432) in the Han Chinese population than D14S1434 (0.682 and 0.645, respectively) (9) and D14S608 (0.8110 and 0.8399, respectively) (14). Similarly, D14S739 had higher PD (0.9615) and PE (0.7328) than D14S1434 (0.863 and 0.378, respectively) (9) and D14S608 (0.9504 and 0.6659, respectively) (14). Therefore, the inclusion of D14S739 in multiplex STR typing systems could help to achieve high CPD and CPE.
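The effect of adding a locus on the combined values can be sketched with the standard product rule, CPD = 1 − Π(1 − PD_i) and CPE = 1 − Π(1 − PE_i), which assumes that the loci are independent (unlinked). The panel values below are hypothetical placeholders; only the D14S739 PE comes from the text, and the function name is illustrative.

```python
from math import prod


def combined_power(per_locus_values):
    """Combine per-locus PD (or PE) values, assuming independent loci."""
    return 1 - prod(1 - v for v in per_locus_values)


# Hypothetical PE values for an existing panel (placeholders, not real kit data).
panel_pe = [0.50, 0.60, 0.70]
d14s739_pe = 0.7328  # reported in this study

cpe_before = combined_power(panel_pe)
cpe_after = combined_power(panel_pe + [d14s739_pe])
print(f"CPE without D14S739: {cpe_before:.4f}, with D14S739: {cpe_after:.4f}")
```

Because each factor (1 − PE_i) lies between 0 and 1, adding any informative locus can only raise the combined value, and loci with higher individual PE raise it faster.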
The addition of independent STR loci with a high degree of polymorphism in multiplex STR typing systems could minimize adventitious matches in forensic casework. However, not all STR loci are suitable for forensic purposes, so thorough evaluation is needed to filter out unsuitable ones (8). In this study, 26 STR loci with a spacing of 3 Mb on the chromosome were used as candidates. It is possible that we missed some of the suitable loci. In fact, it is difficult to evaluate a large number of STRs by manual screening. Therefore, a primary alignment using results from whole genome sequencing against the reference genome could provide an overall view of the variation of all STR loci in a population, which can decrease the chance of missing polymorphic STR loci during the screening.
D14S739, also known as GATA65G10, was first cloned with the oligonucleotide probe of GATA repeats and subsequently used for the construction of human genetic maps (22,25). The transition of cytosine to thymine creates a GATA motif in the repetitive GACA region, with the consequence that alleles with the same size have different DNA sequences. The sequence variants make it difficult to accurately determine the DNA sequence of alleles with the same size. In these cases, sequencing of a large number of alleles should be performed. Sequence polymorphism in the repeat motif was also observed in other STR loci, such as vWA (26). The internal allele variation might not be an important consideration in forensic casework since STR variation is primarily size-based and alleles of several STR loci with the same size, such as D21S11 and FGA, contain variable repeat blocks in the core region (26). In this study, the alleles of D14S739 were named based on the size of the core region. The size-based variation of D14S739 leads to a high degree of polymorphism and therefore provides enough discriminating power for forensic purposes. Besides the fragment length, the sequence variation of D14S739 can provide additional information for the application of next-generation sequencing in forensic practice.

Figure 4. Electrophoretograms of D14S739 in a trio paternity test. The serial numbers 015A, 015B, and 015C represent father, son, and mother, respectively.

During meiosis an STR locus might lose or gain one or more repeat units, which affects the interpretation of paternity testing results. Previous studies showed the highest mutation rate for FGA and D21S11, which can be derived from the relatively large number of repeat units (27). However, FGA and D21S11 alleles with incomplete repeat units were widely observed (28). We did not observe incomplete repeat units at D14S739 locus, although D14S739 has 21-34 repeat units. We also did not observe mutation of D14S739 in 200 trio paternity tests. Therefore, D14S739 might have a relatively low mutation rate during meiosis. Since this study had a relatively small sample size, studies with larger sample sizes are needed to further determine the mutation rate of D14S739.
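The caution about sample size can be made concrete with a rough calculation. When zero events are observed in n independent trials, an exact one-sided upper confidence bound on the event rate is 1 − (1 − confidence)^(1/n). The sketch below assumes that each of the 200 trios contributes two parent-to-child transmissions (400 meioses in total); that count is an assumption for illustration, not a figure stated in the study.

```python
def zero_event_upper_bound(n_trials, confidence=0.95):
    """Exact one-sided upper bound on a rate when 0 events occur in n_trials."""
    return 1 - (1 - confidence) ** (1 / n_trials)


# Assumption: 200 trios x 2 transmissions each = 400 meioses.
n_meioses = 200 * 2
upper = zero_event_upper_bound(n_meioses)
print(f"95% upper bound on the mutation rate: {upper:.4f} per meiosis")
```

Under these assumptions the bound is roughly 0.0075 per meiosis, so the absence of observed mutations is still compatible with a rate of several per thousand transmissions.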
Acknowledgments The authors thank Prof. Ziqin Zhao, Dr Huaigu Zhou, Prof. Chengtao Li, and Prof. Shilin Li for their valuable help with experiment design and manuscript preparation.
Funding This work was supported by the National Natural Science Fund (81571853 and 31270862).
Ethical approval This study was conducted in accordance with the ethical guidelines of the Declaration of Helsinki 2008 and was approved by the ethics committee of Shanghai Medical College, Fudan University.
Authorship declaration CS designed the experiments, performed the experiments, and analyzed the data. YZ, YZ, WZ, HX, and ZL provided technical expertise necessary for completion of this study. QT designed the experiments and reviewed the manuscript. YS and JX designed the experiments, analyzed the data, and wrote the manuscript.

Competing interests All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.