Episodes of Diversification and Isolation in Island Southeast Asian and Near Oceanian Male Lineages

Abstract Island Southeast Asia (ISEA) and Oceania host one of the world’s richest assemblages of human phenotypic, linguistic, and cultural diversity. Despite this, the region’s male genetic lineages are globally among the last to remain unresolved. We compiled ∼9.7 Mb of Y chromosome (chrY) sequence from a diverse sample of over 380 men from this region, including 152 first reported here. The granularity of this data set allows us to fully resolve and date the regional chrY phylogeny. This new high-resolution tree confirms two main population bursts: multiple rapid diversifications following the region’s initial settlement ∼50 kya, and extensive expansions <6 kya. Notably, ∼40–25 kya the deep rooting local lineages of C-M130, M-P256, and S-B254 show almost no further branching events in ISEA, New Guinea, and Australia, matching a similar pause in diversification seen in maternal mitochondrial DNA lineages. The main local lineages start diversifying ∼25 kya, at the time of the last glacial maximum. This improved chrY topology highlights localized events with important historical implications, including pre-Holocene contact between Mainland and ISEA, potential interactions between Australia and the Papuan world, and a sustained period of diversification following the flooding of the ancient Sunda and Sahul continents as the insular landscape observed today formed. The high-resolution phylogeny of the chrY presented here thus enables a detailed exploration of past isolation, interaction, and change in one of the world’s least understood regions.


Introduction
Island Southeast Asia (ISEA) and Near Oceania form a diverse interlocking geographical and cultural region with a population of over 400 million people. Comprising the Indonesia, Philippine, and Taiwan archipelagoes, ISEA hosts three of the five largest island states in the world, with Indonesia alone covering an area equivalent to that from Ireland to the Caspian Sea. Oceania is of course even larger, stretching from the eastern edge of ISEA, to New Guinea, and out to the furthest islands of the remote Pacific. Within Oceania, this study focuses on the Near Oceania region, particularly New Guinea and the Bismarck Archipelago, including their potential connections to Australia.
Over time, climatic changes have substantially transformed the physical landscape of this region from islands to continents and back. During the Pleistocene, long periods $65-15 thousand years ago (kya) saw many of today's islands connected into substantial land masses, with the two largest called Sunda and Sahul ( fig. 1) (Voris 2000). Deep sea trenches around Wallacean Islands kept these land masses separate (Hanebuth et al. 2000), but within them, lakes and river systems likely served as potential pathways for communication and movement, as they still do today. The Sunda and Sahul continents reached their maximum extent when sea levels were lowest during the last glacial maximum (LGM) $25 kya. Water levels subsequently rose with bursts of flooding taking place $14 kya, and the land links between modern islands and mainland Asia were largely drowned by $9 kya (Hanebuth et al. 2000;Voris 2000).
Archaeology shows that modern humans reached Australia $65-52 kya (O'Connell and Allen 2015; Clarkson et al. 2017), mainland New Guinea $49-40 kya (Summerhayes et al. 2010;Groube 1986), with the easternmost fringes of the Bismarck Archipelago being settled soon after, $45-35 kya (Leavesley and Chappell 2004;Torrence et al. 2004). Much later during the Holocene (<10 kya), the advent and widespread adoption of farming in mainland Asia triggered influential movements of people. This led to the spread of Austronesian languages and Neolithic practices through ISEA and out into Oceania, starting around 6 kya in Taiwan and reaching eastern  Karmin et al. . doi:10.1093/molbev/msac045 MBE Indonesia by $3.5 kya (Bellwood 2006;Xu et al. 2012;Kusuma et al. 2015;Brucato et al. 2016;Kirch 2017;Deng et al. 2020). Movements within the region, in part stimulated by independent instances of plant cultivation (Denham and Haberle 2008), were also extensive (Hudjashov et al. 2017). The result is that Indonesia and Papua New Guinea today host the richest linguistic diversity in the world, with over 700 and 800 languages spoken in those two modern nations. Coupled with this linguistic richness is an astonishing diversity of sociocultural practices-post marital residence patterns, kinship systems, social norms, and community structures, many of which are only now transitioning away from traditional practices (Godelier 1982;Lansing et al. 2008;Guillot et al. 2013;Tumonggor et al. 2014).
Despite this extraordinary human diversity, knowledge of genetic diversity has lagged substantially behind other global regions. Of particular note is the Y chromosome (chrY), whose phylogeny has been resolved elsewhere in the world (Hallast et al. 2015;Karmin et al. 2015;Poznik et al. 2016), but is still only partially known in ISEA and Oceania (Mona et al. 2007;Karafet et al. 2010;Bergström et al. 2016). Despite the trend toward whole genome data (Hudjashov et al. 2017;Jacobs et al. 2019;Brucato et al. 2021;Larena et al. 2021), the lack of a robust regional chrY phylogeny has limited the identification and molecular dating of sex-specific historical processes. By undertaking extensive new regional sampling and chromosomal-level sequencing, this study presents the first fully resolved regional chrY phylogeny. Here, we fit newly identified lineages into the broader global phylogeny of chrY diversity and characterize patterns that reflect the specific regional history of ISEA and Near Oceania.

Results and Discussion
High-Resolution chrY Lineages for One of the Last Understudied Global Regions To develop a fully resolved picture of chrY diversity in this region, we extracted chrY sequences from complete human genomes in a set of 152 samples sequenced on the Illumina platform that have not been analyzed before for their chrY diversity. This includes 14 newly reported samples (3 from New Guinea, 7 from Mentawai, and 4 from Sumba), 112 men from ISEA (Jacobs et al. 2019) and 26 men from Papua New Guinea (Brucato et al. 2021) (supplementary table S1, Supplementary Material online). Samples were chosen explicitly to fill gaps in earlier geographical coverage and lineage distributions. We then combined this new data set with previously published chrY sequences from the target region, as well as from broader geographical and phylogenetic contexts. The final global data set consisted of 795 full chrY sequences extracted from complete human genomes or from targeted sequencing, including samples from neighboring East Asia, Mainland Southeast Asia (MSEA), and other world regions ( fig. 1 and supplementary Guinea typically fall into  the major chrY haplogroups C-M130, M-P256, S-B254, and O-M175, respectively sampled here from 77, 50, 66, and 103  individuals (fig. 1 and supplementary table S1, Supplementary Material online). Distinct sublineages of C-M130 and O-M175 that occur in ISEA, New Guinea, and Australia largely fall within previously described diversity (Kayser et al. 2001;Karafet et al. 2010;Bergström et al. 2016;Nagle et al. 2016;Bergström et al. 2017). In contrast, M-P256 and S-B254 are typically restricted to New Guinea and neighboring islands and have much less well known phylogenies. We identify new phylogenetic structures and subhaplogroups, and resolve a substantial number of lineages that were previously classified only as paragroup K*-M526 (Kayser et al. 2006;Bergström et al. 2016;Nagle et al. 2016;Bergström et al. 2017). We also provide additional resolution to the rare lineage NO-M2313, previously classified as K2a1*-M2313 . The result is a fully resolved phylogeny of regional chrY diversity, with all individuals uniquely assigned and no paragroups ( fig. 1 and supplementary figs. S2-S4, Supplementary Material online). Sequence data allows dating of diversifications and expansions. Coalescence times were inferred using the Bayesian algorithms of BEAST (Drummond et al. 2012) and dates were calibrated using previously published split times and clock rates (supplementary table S2, Supplementary Material online) (Karmin et al. 2015).
Previous studies have highlighted the critical importance of robust filtering due to the complex repeat structure of the chrY (Karmin et al. 2015;Poznik et al. 2016). We employed a series of filters and sequence masks to limit the data set to high quality variant calls (detailed in Materials and Methods and supplementary table S9, Supplementary Material online), resulting in $9.7 Mb of complete chrY sequence per sample. Across the whole data set, there is an average of 648 (SD ¼ 18) mutations from tips to the common BT node, with a unimodal and nonskewed distribution (supplementary fig. S1, Supplementary Material online). Each chrY lineage was assigned to a haplogroup according to the alternating letter/ number rules of the Y Chromosome Consortium (Hammer 2002), taking account of additional modifications proposed subsequently to accommodate the volume of variant data from full chrY sequencing (Karmin et al. 2015). Because our study resolves several previously unresolved paragroups, we define a number of new clades in the global chrY tree, and necessarily relabel some previously known clades while keeping as close as possible to historic nomenclature.

Major Radiations around 50 kya Reflect First Settlement in the Region
In concordance with previous findings from global chrY phylogenies (Karmin et al. 2015;Poznik et al. 2016), rapid diversification of lineages C-M130, M-P256, and S-B254 indicate human settlement of ISEA, New Guinea, and Australia over a very short time frame close to 50 kya. These major haplogroups arose in the region with few distinguishing mutations, indicating rapid diversification ( fig. 2   , which also arose during this same time period of 50-40 kya, are geographically restricted. Of these, four Australian individuals belong to S7 0 8-FGC38729 and one to S12-Z41926, whereas S1-B273 is found    Existing lineages in C-M130, M-P256, and S-B254 survived through this gap, but most show no evidence of further radiation during this time. This striking lack of new diversity between 40 and 25 kya is also observed in the regional mitochondrial DNA (mtDNA) lineages (Pedro et al. 2020). ChrY haplogroup S4 0 6-PR495.2, which is mostly found in New Britain (eastern Papua New Guinea), is the only lineage that split during the hiatus, $30 kya (95% CI: 28.0-33.2 kya).

MS-PR2099
Noting the limitation that modern sampling may miss lineages that have gone extinct, the observed widespread pause in phylogenetic diversification appears to be characteristic of low population size and persistent population structure during much of the Upper Pleistocene, in agreement with maternal lineage patterns (Pedro et al. 2020), the relatively small numbers of Late Pleistocene archaeological sites identified (Williams 2013;Summerhayes et al. 2017) and the strong territoriality and geographical stability seen in modern ethnographic studies of New Guinea and Indigenous Australian societies (Muke 1993;Sutton 2003). In notable contrast, haplogroup O-M175 on Mainland Asia, which is now the dominating lineage in Eastern Eurasians and comprises more than a quarter of all male lineages in the world (Yan et al. 2014), continued to radiate throughout the Pleistocene. About 33 kya (95% CI: 30.6-36.6 kya), O-M175 began to diversify rapidly, and during the next 10,000 years, six major lineages arose ( fig. 2 and supplementary fig. S4, Supplementary Material online), which are now spread across Asian mainland. Some of these sublineages later swept into ISEA and out into the Pacific (Kayser et al. 2006;Karafet et al. 2010). Therefore a distinction exists between Mainland Asia, with extensive population expansions and mobility throughout the Upper Pleistocene, and ISEA, New Guinea with Australia, where phylogenies reflect much smaller population sizes and more restricted male mobility, which may have been enhanced both by geography and local cultural practices.
A Second Bout of Diversification 25-15 kya Included a Split between New Guinean and Australian Lineages After 25 kya, major periods of lineage radiation are observed across ISEA, New Guinea, and Australia. These lineages stem from different chrY subbranches arising from northern Sahul diversity ( fig. 3 and supplementary fig. S3 fig. S2, Supplementary Material online). M1-M106 split $21 kya (95% CI: 19.5-23.9 kya) into M1a 0 g-Page1 and M1h-MY6 (42 and 4 men, respectively), and is found today in New Guinea and New Britain, but also radiated westward ( The prevailing view of regional history is of two major human movements-the initial settlement (Leavesley and Chappell 2004;Summerhayes et al. 2010;Clarkson et al. 2017;O'Connell et al. 2018) and the arrival of Austronesian speakers 3-3.5 kya (Bellwood 2004(Bellwood , 2006, implicitly suggesting that little occurred in this region during the Upper Pleistocene. However, the more resolved chrY tree with denser sampling uncovers a far more complex paternal demographic history, consistent with some of the newer studies in genetics (Gomes et al. 2015;Pedro et al. 2020;Purnomo et al. 2021), linguistics (Schapper 2017), and archaeology (Summerhayes 2007;Summerhayes et al. 2017;Bellwood 2019). In particular, the increase in genetic lineage diversification correlates well with increasing population interactions and population sizes postulated from the archaeological record around 25-20 kya, based on animal, plant, and object (e.g., obsidian) translocation between Northern Sahul regions, changes in the subsistence economy, increasing occupation of Highland New Guinea, and settlement of distant islands (e.g., Manus), all of which suggest population size increases and connection of local dynamics to larger regional networks (Leavesley and Chappell 2004;Specht 2005;Summerhayes 2007;Summerhayes et al. 2017).
We see these connections within a number of haplogroups. In S-B254, the first split within the S4 0 6-PR495.2 lineages of New Britain occurred earlier, $30 kya (95% CI: 28.0-33.2 kya). Next, S5 0 6-FT17887 diversified first $23 kya (95% CI: 20.9-25.3 kya) and then $16 kya (95% CI: 14.4-17.7 kya) ( fig. 3 and supplementary fig. S3, Supplementary Material online). Its sister lineage, S4-P315, appears to have been Karmin et al. . doi:10.1093/molbev/msac045 MBE isolated for some time, only diverging $18 kya (95% CI: 15.9-20.3 kya) when it split between the New Britain and Queensland (Australia) groups. A much more recent split $3.6 kya (95% CI: 2.7-4.7 kya) is also seen between men from these two locations ( fig. 3 and supplementary fig. S3, Supplementary Material online). Hence, despite perceptions of long-term isolation between New Guinea and Australia Nagle et al. 2016), joint analyses of chrY from Papua New Guinea (Vernot et al. 2016) and Australia ) reveals phylogenetic relationships between lineages from New Britain and Australia that are much older than previously thought. This phylogenetic connection is beyond simply the Torres Strait region and is similar to that seen in autosomal data (Brucato et al. 2021). The coalescence date suggests an upper estimate of the Late Pleistocene for possible contact between these regions ( fig. 3  and supplementary fig. S2, Supplementary Material online).
Although no archaeological evidence supports direct contact between New Britain and Australia, the Torres Strait Islands (between New Guinea and Australia) may have served as a bridge between these two regions. Indeed, trading and intermarriage has been reported between Torres Strait Islanders and Indigenous Australians (Beckett 1987), and some New Guinean lineages (mtDNA Q1, P2 and P3, and chrY M1c6-Z42300) are shared by men with Torres Strait ancestry (supplementary fig. S3, Supplementary Material online) Pedro et al. 2020). This genetic affinity between Bismarck Islanders and Indigenous Australians may also result from genetic exchanges predating the divergence between these populations during the postglacial period (18-10 kya) ( fig. 3 and supplementary fig. S3 An Explosion of Activity Occurred at the Pleistocene-Holocene Boundary 15-10 kya The sea level began rising $14 kya with flooding bursts over the next five millennia leading to the current island landscape (Hanebuth et al. 2000;Voris 2000). During this time of major geologic changes, the autosomal genetic data indicates a marked increase of expansion and diversification in the Philippines (Larena et al. 2021), gene flows between islands in the region (Brucato et al. 2021), and the maternal mtDNA data from Sahul and Wallacea shows population expansion (Pedro et al. 2020). Similarly, archaeological data suggest a strong population increase from the Holocene transition onwards, as part of a longer process that started in the late Pleistocene (25-20 kya) and culminated with the Austronesian dispersal across the Northern Sahul region (Specht 2005;Bellwood 2006;Summerhayes 2007;Donohue and Denham 2010;Williams 2013;Summerhayes et al. 2017). We also see a marked increase in the radiation of chrY lineages, coupled with extensive mobility across the region. During this time, haplogroup O-M175 diversification patterns are similar to those of other chrY haplogroups in the region. Between 14 and 10 kya, major splits occurred in O1a-F31, O2-PK4, and many other O-M175 lineages, some of which are shared between Mainland and ISEA. Of particular note, O3f-M7 diversified $15 kya (95% CI: 11.2-17.4 kya), giving rise to branches found on Mainland Asia, Taiwan, and the Philippines, with the latter two lineages diverging from each other $10 kya (95% CI: 7.6-13.0 kya) within O3f-Y26403. O3f-M7 occurs widely across western Indonesia at low frequency today, and had previously been thought to reflect connections with mainland China during the historic era (Karafet et al. 2010). Although sampling remains low, the current O3f-M7 phylogeny suggests a potentially much earlier upper limit to the movement of people carrying this lineage, either in the Late Pleistocene or starting $15 kya ( fig. 3 and supplementary fig. S4 and

Holocene Dispersals Carried Few External Lineages from Mainland East Asia
The sea had risen to near its current level by 8-7 kya, leading to the creation of rich coastal floodplains (Terrell et al. 2011), and rising temperatures gradually led to the higher altitudes of New Guinea being more suitable for permanent human occupation (Summerhayes et al. 2017). Indigenous food plant cultivation in the highlands of New Guinea began $10 kya ( (Yan et al. 2014). Within ISEA, O3i-B451 and O2a2-F789 stem from this time of major haplogroup O-M175 expansion. However, the lineage with the greatest impact was O1-M119, which arose $33 kya (95% CI: 24.5-36.9 kya), diverged into sublineages $17 kya (95% CI: 12.3-19.3 kya), but underwent a striking period of radiation during the mid-Holocene ( fig. 4 and supplementary fig.  S4 and table S5, Supplementary Material online).
O1-M119 contains three main sublineages: O1a-F31, found in 35 men from across Mainland Asia, ISEA, and in New Guinea; O1b-F819 found in nine men from the Philippines, Sunda, and the Wallacean islands; and O1c-M110 found in 16 Haplogroup O-M175 sublineages in ISEA, New Guinea, and Oceania. A number of distinct sublineages (black) of O-M175 have penetrated from the Asian mainland into ISEA and beyond, and are likely associated with the expansion of Austronesian speakers. Most of these lineages coalesce $6-5 kya within subsets of the wider Asian diversity, except for O1b-F819 and O3f-M7 sublineages, which have earlier diversification times (details in supplementary fig. S4 and table S1 and S5, Supplementary Material online). In contrast, lineage NO2-MY1506, found today on Sunda and Sulawesi, split very early from its South Asian sister lineage. Karmin et al. . doi:10.1093/molbev/msac045 MBE men from the Philippines, western Indonesia, and New Guinea. O1a-F31 radiated rapidly between $8 and 4 kya, with a subbranch in Taiwan and Malaysia separating $4 kya (95% CI: 2.7-4.9 kya). O1b-F819 lineages in the Philippines and Indonesia split $9 kya (95% CI: 7.0-11.0 kya), but radiated rapidly $4 kya (95% CI: 3.1-5.0 kya). Today, O1c-M110 is found only within ISEA and New Guinea, where it shows evidence of rapid expansion 6-5 kya ( fig. 4 and supplementary fig. S4 and table S5, Supplementary Material online). We note the limitation that all molecular dates have some level of uncertainty (supplementary tables S3-S5, Supplementary Material online), and not all known O1-M119 diversity is included here (Sun et al. 2021). Nevertheless, the striking radiations observed are characteristic of a diverse population of Mainland Asian origin moving into ISEA and expanding rapidly. Due to their timing and current geographical distribution, these genetic lineages are typically associated with Austronesian-speaking peoples and Neolithic practices, which spread over a period of time, starting around 6 kya in Taiwan and reaching eastern Indonesia and New Guinea $3.5 kya, before subsequently moving out into Oceania (Bellwood 2006;Karafet et al. 2010;Xu et al. 2012;Kirch 2017;Deng et al. 2020).

Holocene Expansions in Indigenous chrY Lineages
Although arrivals from Mainland Asia have been the traditional focus of mid-Holocene studies in ISEA, we identify a series of simultaneous radiations in lineages with longstanding local connections in C-M130, M-P256, and S-B254. In addition, C8-F725 seeded many new lineages in a short time period between $6.5 and 5 kya, including radiating into the Philippines, western Indonesia (Borneo and Mentawai), and the Wallacean islands of eastern Indonesia (supplementary fig. S2, Supplementary Material online). A second wave of expansions occurred around $4-3 kya when lineages in Borneo split from their neighbors in Malaysia or Sulawesi. Local interisland radiations on Flores and Mentawai also began $2.5-1.5 kya (supplementary fig. S2, Supplementary Material online). In haplogroup M1a 0 g-Page1, multiple lineages experienced radiations between 7.5 and 2 kya (downstream of M1c-B256, M1b-MY39, M1a-MY26, and M1g-P87), mostly within New Guinea, Wallacea, and the Torres Islands ( fig. 3 and supplementary fig. S3, Supplementary Material online). In haplogroup S-B254, the previously mentioned split within S4c-Z41912 also appeared between Queensland (Australia) and New Britain $3.6 kya (95% CI: 2,730-4,719 kya) ( fig. 3 and supplementary fig. S3, Supplementary Material online). Many of these expansions and population movements that occurred after $4 kya were likely triggered by the introduction of agriculture from Austronesian-speaking cultural communities (Deng et al. 2020;Alam et al. 2021), perhaps coupled in some places with the growing influence of local Papuan agricultural practices that had commenced as early as $10 kya (Denham and Haberle 2008).

Population Mobility and Lineage Expansion Has Continued to the Present
Of course, chrY lineage evolution continues, with local expansions observed across nearby islands within the last two millennia. On Flores, sublineages that shared a common ancestor as far back as 49 kya expanded rapidly and locally within the past few thousand years (such as C8-F725, C7-B67, and M1c6-Z42300). Similar patterns of recent expansion are seen in western Indonesia (C8-F725, O3i-B450), eastern Indonesia (M1a3-MY33, O3i-B452), and New Guinea (C2a-M208), including the concomitant expansion of this same C2 lineage into and within Polynesia. Many of the fundamental driving forces observed during the earlier periods of ISEA, New Guinea, and Australian history have therefore continued right up to the present.

Conclusions
Phylogenetic relationships of the major paternal genetic lineages of ISEA and Near Oceania are among the last in the world to be resolved. Analyzing $9.7 Mb of chrY sequence from a geographically and culturally diverse set of men has enabled us to resolve and date the phylogeny from this region. Two well-known population expansions are confirmed, with multiple rapid diversifications between $50 and 40 kya reflecting rapid early settlement, and lineages expanding $6-5 kya indicating extensive movements and interactions. Interestingly, in the period $40-25 kya, between these major bursts of chrY diversity, we observe almost no branching events, with old lineages persisting but not diversifying. A similar pause in diversification is apparent in maternal mtDNA lineages from the region (Pedro et al. 2020). Starting from the LGM and intensifying in the subsequent warming period, multiple diversification events follow the flooding of the Sunda and Sahul continents as the insular landscape we see today formed (Voris 2000) and local populations dispersed.
The improved chrY topology also highlights more localized events with important historical implications. O3-M7 lineages from Taiwan and the Philippines diverged from mainland Asian groups $15 kya, earlier than previously thought (Karafet et al. 2010). This coalescence date provides an upper limit to the movements of people, possibly supporting the growing view from other evidence of early movements between MSEA and ISEA well predating the Neolithic period (Karafet et al. 2010;Trejaut et al. 2014;Vall ee, Luciani and Cox 2016). Despite the scarcity of chrY sequence data from Australia , we find that some Australian lineages are nested within New Britain diversity, indicating that Australia experienced at least some interaction with the Papuan world. These and other contacts emphasize the potential of less common chrY lineages to illuminate the subtleties of regional contacts. The resolved phylogenetic framework of the chrY presented here will thus enable future exploration of isolation, interaction, and change in one of the world's least understood regions. Diversification and Isolation in ISEA and Near Oceanian Male Lineages . doi:10.1093/molbev/msac045

Human Subjects
To develop a fully resolved picture of chrY diversity in this region, we sequenced 14 new male samples ( fig. 1 and supplementary table S1, Supplementary Material online) on the Illumina platform, and combined these with 26 chrY sequences from Papua New Guinea (Brucato et al. 2021) and 112 chrY sequences from ISEA (Jacobs et al. 2019) that were not previously analyzed for their chrY diversity. We then combined this new data set with all available chrY sequences from the region, supplemented with samples from a broader geographic and phylogenetic context taken from previously published full chrY studies on neighboring East Asia, Mainland Southeast Asia, and other world regions (Abecasis et al. 2010;Drmanac et al. 2010;Wong et al.2013;Carmi et al. 2014;Ball et al. 2014;Karmin et al. 2015;Bergström et al. 2016;Ilum€ ae et al. 2016;Malaspinas et al. 2016;Mallick et al. 2016;Poznik et al. 2016;Vernot et al. 2016;Zheng-Bradley et al. 2017;Hudjashov et al. 2018;Bergström et al. 2020;Byrska-Bishop 2021 Sequencing, Mapping, and Genotyping All newly generated chrY sequences were generated using the Illumina technology (Illumina, San Diego, CA, USA) on the HiSeq instrument (PCR-free protocol) with 30Â genomewide coverage. We employed a series of filters and sequence masks to limit the data set to high quality variant calls, resulting in 9.7 Mb of complete chrY sequence per sample. We used the same processing pipeline for all Illumina data. FASTQ files were mapped with BWA-MEM v0.7.12 (Li and Durbin 2009) to the human reference hs37d5 (http://ftp. 1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_ reference_assembly_sequence; last accessed March 8, 2022). Read duplicates were removed with Picard v2.12.0 (http:// broadinstitute.github.io/picard/; last accessed March 8, 2022) and the remaining unique reads were realigned around known indels, followed by base quality score recalibration (BQSR) using GATK v3.8 (Poplin et al. 2018). Variant calling was performed with GATK HaplotypeCaller in haploid mode. All-sites VCF files were filtered with bcftools v1.9 (Li 2011). The Illumina data and previously filtered data from the Complete Genomics technology (supplementary table S1, Supplementary Material online) were merged with GATK CombineVariants v3.8 (Poplin et al. 2018). We extracted the effective overlap between the two data sets by masking out all positions with a 5% or higher proportion of missing genotypes in either the Illumina or Complete Genomics data sets. We additionally excluded regions with poor mappability as described previously (Karmin et al. 2015;Poznik et al. 2016), resulting in a total of 9.7 Mb of analyzed chrY sequence. Variant positions used for phylogenetic reconstruction in each haplogroup are listed in supplementary tables S6-S8, Supplementary Material online.

Phylogeny Reconstruction and Dating
For the complete data set of 795 samples, a maximum-likelihood (ML) tree was constructed with GTRCAT using 200 rapid bootstrap inferences followed by a thorough ML search executed by RaXML v8.0.0 (Stamatakis 2014) ( fig. 1 and supplementary fig. S1, Supplementary Material online). Initial haplogroup labels were assigned with yHaplo (Poznik 2016). All identified variants for shared nodes were annotated to the ML trees and curated manually (supplementary tables S6-S8, Supplementary Material online). We then used the software package BEAST v1.7.5 (Drummond et al. 2012) to simultaneously reconstruct phylogenies and estimate coalescence times. We ran three separate lineage-based analyses for haplogroups C-M130, MS-PR2099, and O-M175. For each haplogroup we used the same parameters: a Bayesian skyline coalescent tree prior, the general time reversible (GTR) substitution model with gamma-distributed rates, a relaxed lognormal clock and the piecewise-constant coalescent model with the number of groups set to 10. For each haplogroup, four parallel runs with different random number seeds were performed. Each run had at least 60 million chains logged every 3,000 steps. The results were visualized and checked for effective sample size above 200 using Tracer v1.4. Four parallel runs were combined with LogCombiner and the initial 25% discarded as burn-in. Coalescence time estimates were computed with normally distributed age priors from a previously published phylogeny that had used a mutation rate of 0.74 Â 10 À9 (95% CI: 0.63-0.95 Â 10 À9 ) per base per year (Karmin et al. 2015). The standard deviation (SD) was set to cover the published confidence intervals of the calibration nodes (supplementary table S2 (Karmin et al. 2015), with SD set to 61,835 in BEAST. We used Newick Utilities (Junier and Zdobnov 2010) for tree processing, statistical analyses and plotting was conducted in R v4.1.2 (R Core Team, 2014) with the package ggplot2 (Wickham, 2009

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.