Gigascience. 2019 Sep; 8(9): giz117.
Published online 2019 Sep 18. doi: 10.1093/gigascience/giz117
PMCID: PMC6904868
PMID: 31531675

High-coverage genomes to elucidate the evolution of penguins


Abstract

Background

Penguins (Sphenisciformes) are a remarkable order of flightless wing-propelled diving seabirds distributed widely across the southern hemisphere. They share a volant common ancestor with Procellariiformes close to the Cretaceous-Paleogene boundary (66 million years ago) and subsequently lost the ability to fly but enhanced their diving capabilities. With ∼20 species among 6 genera, penguins range from the tropical Galápagos Islands to the oceanic temperate forests of New Zealand, the rocky coastlines of the sub-Antarctic islands, and the sea ice around Antarctica. To inhabit such diverse and extreme environments, penguins evolved many physiological and morphological adaptations. However, they are also highly sensitive to climate change. Therefore, penguins provide an exciting target system for understanding the evolutionary processes of speciation, adaptation, and demography. Genomic data are an emerging resource for addressing questions about such processes.

Results

Here we present a novel dataset of 19 high-coverage genomes that, together with 2 previously published genomes, encompass all extant penguin species. We also present a well-supported phylogeny to clarify the relationships among penguins. In contrast to recent studies, our results demonstrate that the genus Aptenodytes is basal and sister to all other extant penguin genera, providing intriguing new insights into the adaptation of penguins to Antarctica. As such, our dataset provides a novel resource for understanding the evolutionary history of penguins as a clade, as well as the fine-scale relationships of individual penguin lineages. Against this background, we introduce a major consortium of international scientists dedicated to studying these genomes. Moreover, we highlight emerging issues regarding ensuring legal and respectful indigenous consultation, particularly for genomic data originating from New Zealand Taonga species.

Conclusions

We believe that our dataset and project will be important for understanding evolution, enriching cultural heritage, and guiding the conservation of this iconic southern hemisphere species assemblage.

Keywords: genomics, Sphenisciformes, comparative evolution, phylogenetics, speciation, biogeography, demography, climate change, Antarctica, evolution

Data Description

Context

Penguins (Sphenisciformes) are a unique order of seabirds distributed widely across the southern hemisphere (Fig. 1). Approximately 20 extant penguin species are recognized across 6 well-defined genera (Aptenodytes, Pygoscelis, Eudyptula, Spheniscus, Eudyptes, and Megadyptes [1–3]). Debate has surrounded species/lineage boundaries in a few key areas:

  1. Divisions between New Zealand little blue (Eudyptula minor minor), New Zealand white-flippered (Eudyptula minor albosignata), and Australian fairy penguins (Eudyptula novaehollandiae) [4–6].

  2. Divisions between northern rockhopper (Eudyptes moseleyi), western rockhopper (Eudyptes chrysocome), and eastern rockhopper penguins (Eudyptes filholi) [3, 7, 8].

  3. Divisions between Fiordland crested (Eudyptes pachyrhynchus) and Snares crested penguins (Eudyptes robustus) [9, 10].

  4. Divisions between macaroni (Eudyptes chrysolophus chrysolophus) and royal penguins (Eudyptes chrysolophus schlegeli) [3, 8, 11].

Figure 1:

Locations of penguin breeding colonies and sampling sites for the final genomes, adapted from Ksepka et al. [1]. Sampling locations are shown with a small white ellipse. Note that the sampling location of the Humboldt penguin (Spheniscus humboldti) is unclear because this individual was bred at Copenhagen Zoo from ancestors imported from Peru and Chile in 1972. AMS: Amsterdam Island; ANT: Antipodes Islands; AUC: Auckland Islands; BOU: Bouvet; CAM: Campbell Island; CHA: Chatham Islands; CRZ: Crozet; FAL: Falkland Islands/Malvinas; GAL: Galápagos Islands; GOU: Gough Island; HEA: Heard Island; KER: Kerguelen; MAC: Macquarie Island; NZ: New Zealand; PEI: Prince Edward/Marion Island; SG: South Georgia; SNA: The Snares; SO: South Orkney Islands; SS: South Sandwich Islands.

Penguins have an extensive fossil record, with >50 extinct species documented to date [3, 12, 13], extending back >60 million years [12]. Extant penguins span a modest range of sizes [14, 15], with the emperor penguin (Aptenodytes forsteri) the largest (30 kg) and Eudyptula penguins the smallest (1 kg). In contrast, the fossil record reveals that many extinct penguin species were giants (surpassing 100 kg in body mass [13]).

The radiation of penguins provides an excellent case study for researching biogeographic impacts on speciation processes. Penguins inhabit every major coastline in the southern hemisphere and almost every island archipelago in the Southern Ocean [16]. Their range extends across distinct ecological niches, from the tropical Galápagos Islands (Galápagos penguin, Spheniscus mendiculus) to the oceanic temperate forests of New Zealand (Eudyptes pachyrhynchus), the rocky coastlines of the sub-Antarctic islands (E. filholi), and the sea ice around Antarctica (Aptenodytes forsteri) [17]. Accordingly, penguins have evolved many adaptations specific to this variety of ecological environments. Previous studies have suggested that global climate change during the Eocene [18, 19], major oceanographic currents [7], and geological island uplift [3] were key drivers of penguin diversification. Although the phylogenetic relationships within penguins are relatively well understood [1, 3, 18, 20], it remains uncertain which lineage diverged first from the other penguins. Molecular analyses have differed on whether Aptenodytes, Pygoscelis, or both together represent the sister taxa to all other extant penguins [3]. Both of these genera are endemic to coastal Antarctica and Antarctic and subantarctic islands, and thus a sequential branching pattern would suggest a polar ancestral area for extant penguins. In contrast, morphological data and the fossil record suggest that the more temperate-adapted genus Spheniscus was the first to diverge [3, 20]. How penguins diversified in response to geological and climatic change remains a substantial gap in our knowledge of the biogeographic history of these iconic birds.

Although penguins are tied to landmasses for breeding and nesting [21], all species spend most of their lives at sea [22] and are therefore important components of terrestrial, coastal, and marine ecosystems [23]. While some taxa inhabit environments with strong winds and extreme cold temperatures, experiencing seasonal fluctuations in the length of daylight across the breeding and chick-rearing seasons [24], others inhabit relatively temperate or even tropical climates, with little variation in day length. The unique morphological and physiological adaptations that have evolved within penguins include the complete loss of aerial flight, where penguins instead use their flipper-like wings in wing-propelled diving [25], densely packed waterproof and insulating feathers [26, 27], visual sensitivity of the eye lens for underwater predation [28–30], dense bones, stiff wing joints and reduced distal wing musculature to overcome buoyancy in water [31–33], enhanced thermoregulation for extreme low temperatures, long-term fasting, ability to digest secreted food, delayed digestion [34–40], different plumage [41] and crest ornaments [42], and catastrophic moult [43]. As such, penguins are an excellent system to study comparative evolution of adaptive traits.

Penguins are also sentinels of the Southern Ocean [16], being particularly sensitive to human and environmental change [44, 45]. Extensive demographic monitoring programs have indicated that many penguin species are declining in response to global warming [44–46], pollution, environmental degradation, and competition with fisheries, which are considered key drivers of these population declines [47–50]. Demographic coalescent models have demonstrated dramatic population declines during the Pleistocene ice ages, followed by rapid population expansions in response to global warming [51–54]. Future global warming is predicted to cause significant population declines [44, 55–57]. Understanding past demographic histories and inferring future demographic trajectories therefore remain important steps for predicting ecosystem-wide changes in this rapidly warming part of the planet.

Although penguins are a relatively well-studied group, previous evolutionary studies have been limited by the genetic markers used, such as short mitochondrial [2, 10, 58–60] or nuclear sequences [1, 8, 61, 62], microsatellites [63, 64], partial mitochondrial genomes [3, 65], or single-nucleotide polymorphisms [11, 53, 54, 66–68]. Several studies have hinted at associations between biological patterns and climate change [51–54, 60, 69]. Only a few studies have explored genome-wide evolutionary processes among penguins [51, 70] or between penguins and other birds [71–73], and these studies have focussed on just 2 Antarctic taxa: the Adélie penguin (Pygoscelis adeliae) and Aptenodytes forsteri. These previous studies have created a basic framework to understand the timing of penguin diversification, identify population fluctuations during past climate cycles, and have hinted at the molecular basis for a range of physiological and morphological adaptations [51]. The molecular genomic basis for the unique morphological and physiological adaptations of penguins, compared to other aquatic and terrestrial birds, remains largely unknown. No previous study has attempted to explore the evolution of all penguins under a comparative genomic or evolutionary framework. In this Data Note, we present 19 new high-quality genomes that, together with the 2 previously reported genomes [51], encompass all extant penguin species. We demonstrate the quality and application of this new dataset by constructing a well-supported phylogenomic tree of penguins. These data provide a critical resource for understanding the drivers of penguin evolution, the molecular basis of morphological and physiological adaptations, and demographic characteristics. For species naming, we follow standard nomenclature; however, for Eudyptula we follow Grosser et al. [5, 74] and for Eudyptes and Megadyptes we follow Cole et al. [3].

Methods

Sample collection, library construction, and sequencing

While it is possible to recover genome sequences from historical museum samples [75], such genomes are often of low quality and/or fragmented [76], which limits downstream analyses. Our project design (see below) relies on high-coverage genomes with little missing data (see Li et al. [51]). We therefore restricted our sample collection to high-quality blood samples. We collected 94 blood samples spanning 19 penguin species (1–28 samples per species; Supplementary Table 1). Samples were obtained from the wild, zoological parks, or wildlife hospitals under strict permitting procedures and animal ethics approvals, and in consultation with indigenous representatives (Supplementary Table 1).

DNA was extracted from each sample at 1 of 3 laboratories: with the HiPire Blood DNA Midi Kit II at BGI (Hong Kong), the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA, USA) at the University of Oxford (United Kingdom), and the KingFisher Cell and Tissue Kit in combination with the KingFisher Duo Prime Purification System at the University of Copenhagen (Denmark). All downstream methods were conducted at BGI. We diluted each DNA extract to 20 μL using Tris-EDTA buffer. DNA concentration was estimated from 1 μL of each extract on a microplate reader, and DNA fragment size was evaluated by pulse gel electrophoresis or 1% agarose gel electrophoresis. Following quality control, a single sample per species was chosen for genomic library construction (Table 1).

Table 1:

Sample collection information for the 21 penguin genomes (including 2 obtained in Li et al. [51])

| Latin name | Common name | Sample type | Sampling location | Sample label | Date extracted |
|---|---|---|---|---|---|
| Eudyptes chrysolophus schlegeli | Royal | Wild | Green Gorge, Macquarie Island | 4458 | October 2017 |
| Eudyptes chrysolophus chrysolophus | Macaroni | Wild | Marion Island, Prince Edward Islands | MP PEI 1 | October 2017 |
| Eudyptes pachyrhynchus | Fiordland-crested | Wild | Harrison Cove, Milford Sound, New Zealand South Island | MS 9 | May 2017 |
| Eudyptes robustus | Snares-crested | Dunedin Wildlife Hospital | The Snares, New Zealand sub-Antarctic | 68M 28/09/13 | September 2018 |
| Eudyptes sclateri | Erect-crested | Wild | Antipodes Island, New Zealand sub-Antarctic | Ant 5 | September 2018 |
| Eudyptes filholi | Eastern rockhopper | Wild | Crozet Island | GS 12 | May 2016 |
| Eudyptes chrysocome | Western rockhopper | Wild | Falkland Islands/Malvinas | RH 110–1 | May 2016 |
| Eudyptes moseleyi | Northern rockhopper | Wild | Amsterdam Island | NRP 118–1 | May 2016 |
| Megadyptes antipodes antipodes | Yellow-eyed | Wild | Otago Peninsula, New Zealand South Island | OT 2 9/2/18 | August 2018 |
| Spheniscus magellanicus | Magellanic | Wild | Chiloe Island, Chile | AH 6 | May 2016 |
| Spheniscus demersus | African | Wild | Luderitz, Namibia | AP 173 | July 2018 |
| Spheniscus mendiculus | Galápagos | Wild | Galápagos Islands | GAPE 212 | October 2017 |
| Spheniscus humboldti | Humboldt | Copenhagen Zoo | Peru and Chile lineage | Z-67–15 | October 2016 |
| Eudyptula minor albosignata | White-flippered | Christchurch Antarctic Centre | Banks Peninsula, Canterbury, New Zealand South Island | Fred | July 2018 |
| Eudyptula minor minor | Little blue | National Aquarium of New Zealand | New Zealand North Island | Gonzo | August 2018 |
| Eudyptula novaehollandiae | Fairy | Wild | Phillip Island, Victoria, Australia | 10/9/18–1 | October 2018 |
| Pygoscelis adeliae | Adélie | Wild | Inexpressible Island, Antarctica | [51] | NA |
| Pygoscelis papua | Gentoo | Wild | West Antarctic Peninsula, Antarctica | Gentoo penguin DNA -4 | January 2018 |
| Pygoscelis antarctica | Chinstrap | Wild | Thule Island, South Sandwich Islands | CP TH 060 | November 2017 |
| Aptenodytes patagonicus | King | Wild | Fortuna Bay, South Georgia | KP FORT 001 | November 2017 |
| Aptenodytes forsteri | Emperor | Wild | Emperor Island, Antarctica | [51] | NA |

We constructed 1 or more genomic libraries for each of the 19 penguin species, depending on DNA quality. For species from which we obtained high-molecular-weight DNA (main band longer than 40 kb), we constructed 10X Genomics libraries to produce 100× coverage sequencing data (Table 2). Following the standard protocols of the Chromium™ Genome Solution, each long DNA molecule was broken into short fragments and a unique barcode was attached to 1 end of each fragment, so that fragments derived from the same long molecule share a barcode. Because the protocol uses >1 million distinct barcodes in a single solution, it reduces the chance that short fragments carrying the same barcode derive from unrelated long molecules. For species with shorter DNA fragments (<40 kb), we constructed genomic libraries following Illumina (San Diego, CA, [77]) or BGIseq 500 [78] protocols. These protocols yielded several paired-end libraries with insert sizes of either 250 or 500 bp, in addition to several mate-pair libraries with insert sizes from 2 to 10 kb (Table 2), from which we generated 100–320× coverage sequencing data. We found no significant difference in assembly quality between Illumina and BGIseq data, whereas the 10x strategy generally produced better assemblies than the strategy using multiple insert-size libraries (Table 3). In total, we generated 3.24 Tb of sequencing reads across all 19 penguin species, with >111 Gb of data per species (Table 2).

Table 2:

Details of the sequencing platform used and the data statistics for the 19 newly sequenced penguin genomes

| Species | Library construction strategy | Sequencing platform | Raw data (Gb) | Clean data (Gb) |
|---|---|---|---|---|
| Eudyptes chrysolophus chrysolophus | 10X | BGIseq500 | 145.9 | 126.9 |
| Megadyptes antipodes antipodes | 10X | BGIseq500 | 111.9 | 104.1 |
| Spheniscus demersus | 10X | BGIseq500 | 141.1 | 131.3 |
| Spheniscus mendiculus | 10X | BGIseq500 | 112.2 | 104.4 |
| Eudyptula minor albosignata | 10X | BGIseq500 | 132.5 | 124.8 |
| Eudyptula minor minor | 10X | BGIseq500 | 121.4 | 112.7 |
| Eudyptula novaehollandiae | 10X | BGIseq500 | 180.4 | 168.5 |
| Pygoscelis papua | 10X | BGIseq500 | 134.5 | 124.0 |
| Pygoscelis antarctica | 10X | BGIseq500 | 154.5 | 139.7 |
| Aptenodytes patagonicus | 10X | BGIseq500 | 147.6 | 134.0 |
| Eudyptes chrysolophus schlegeli | 250 bp, 2 kb, 5 kb, 10 kb | BGIseq500 | 402.6 | 296.6 |
| Eudyptes pachyrhynchus | 250 bp, 2 kb, 5 kb, 10 kb | HiSeq X Ten and HiSeq 4000 | 146.4 | 104.7 |
| Eudyptes robustus | 250 bp, 2 kb | HiSeq X Ten and HiSeq 4000 | 171.2 | 107.6 |
| Eudyptes sclateri | 250 bp, 2 kb, 5 kb | HiSeq X Ten and HiSeq 4000 | 156.2 | 103.2 |
| Eudyptes filholi | 250 bp, 2 kb, 5 kb, 10 kb | HiSeq X Ten and HiSeq 4000 | 195.0 | 146.8 |
| Eudyptes chrysocome | 250 bp, 2 kb, 5 kb | HiSeq X Ten and HiSeq 4000 | 195.1 | 111.6 |
| Eudyptes moseleyi | 250 bp, 2 kb, 5 kb, 10 kb | HiSeq X Ten and HiSeq 4000 | 173.6 | 133.1 |
| Spheniscus magellanicus | 250 bp, 2 kb, 5 kb, 10 kb | HiSeq X Ten and HiSeq 4000 | 212.6 | 150.7 |
| Spheniscus humboldti | 250 bp, 2 kb, 5 kb, 10 kb | HiSeq X Ten and HiSeq 4000 | 208.8 | 137.2 |

HiSeq X Ten was used for sequencing small insert size libraries; HiSeq 4000 was used for sequencing mate-pair libraries.

Table 3:

Assembly statistics and BUSCO results (based on 4,915 conserved avian orthologs) for 21 penguin genomes

| Library construction strategy | Species | Contig N50 (bp) | Scaffold N50 (bp) | Genome size (bp) | Complete | Duplication | Fragmented | Missing |
|---|---|---|---|---|---|---|---|---|
| 10x | Eudyptes chrysolophus chrysolophus | 163,848 | 13,794,837 | 1,368,663,695 | 85.40% | 7.70% | 4.40% | 2.50% |
| 10x | Megadyptes antipodes antipodes | 83,954 | 23,315,117 | 1,317,732,923 | 91.80% | 1.20% | 4.20% | 2.80% |
| 10x | Spheniscus demersus | 101,408 | 15,386,364 | 1,278,371,924 | 91.30% | 0.90% | 4.70% | 3.10% |
| 10x | Spheniscus mendiculus | 72,552 | 380,950 | 1,300,348,609 | 88.90% | 1.60% | 5.70% | 3.80% |
| 10x | Eudyptula minor albosignata | 95,773 | 21,866,543 | 1,374,338,381 | 85.60% | 7.40% | 4.20% | 2.80% |
| 10x | Eudyptula minor minor | 88,190 | 21,127,646 | 1,466,686,831 | 84.00% | 8.60% | 4.60% | 2.80% |
| 10x | Eudyptula novaehollandiae | 122,461 | 29,280,209 | 1,357,427,560 | 89.00% | 4.70% | 3.80% | 2.50% |
| 10x | Pygoscelis papua | 93,785 | 2,780,837 | 1,309,329,553 | 90.70% | 1.50% | 5.00% | 2.80% |
| 10x | Pygoscelis antarctica | 118,336 | 6,180,260 | 1,265,661,676 | 91.30% | 1.20% | 4.60% | 2.90% |
| 10x | Aptenodytes patagonicus | 116,769 | 2,903,810 | 1,256,739,118 | 91.50% | 1.10% | 4.20% | 3.20% |
| Multi-libraries | Eudyptes chrysolophus schlegeli | 24,191 | 1,877,548 | 1,310,605,488 | 93.20% | 1.50% | 3.30% | 2.00% |
| Multi-libraries | Eudyptes pachyrhynchus | 33,319 | 8,795,033 | 1,310,923,788 | 80.20% | 7.70% | 4.30% | 7.80% |
| Multi-libraries | Eudyptes robustus | 29,712 | 363,310 | 1,248,618,553 | 87.30% | 1.10% | 5.10% | 6.50% |
| Multi-libraries | Eudyptes sclateri | 69,562 | 1,921,244 | 1,211,737,899 | 93.60% | 1.10% | 3.20% | 2.10% |
| Multi-libraries | Eudyptes filholi | 74,280 | 6,429,221 | 1,223,976,468 | 93.20% | 1.00% | 3.60% | 2.20% |
| Multi-libraries | Eudyptes chrysocome | 66,005 | 1,949,323 | 1,231,067,970 | 93.80% | 1.00% | 3.00% | 2.20% |
| Multi-libraries | Eudyptes moseleyi | 21,362 | 2,248,088 | 1,306,699,575 | 93.60% | 1.20% | 3.00% | 2.20% |
| Multi-libraries | Spheniscus magellanicus | 41,455 | 12,679,469 | 1,262,636,738 | 93.10% | 1.30% | 3.50% | 2.10% |
| Multi-libraries | Spheniscus humboldti | 19,849 | 6,229,819 | 1,243,403,142 | 93.30% | 1.10% | 3.50% | 2.10% |
| Multi-libraries | Pygoscelis adeliae | 22,195 | 5,118,896 | 1,216,600,033 | 92.80% | 0.60% | 4.00% | 2.60% |
| Multi-libraries | Aptenodytes forsteri | 31,730 | 5,071,598 | 1,254,347,440 | 93.20% | 0.80% | 3.60% | 2.40% |

Genome assembly and quality evaluation

Sequences from the 250-bp insert size libraries and the 10x libraries were used to estimate the genome size of each penguin with a k-mer approach [79]. Reads were scanned with a 17-bp window sliding 1 bp at a time, and the frequency of each 17-mer was recorded. After all reads were scanned, the 17-mer frequency distribution was plotted and the depth with the highest frequency (K_dep) was identified. Genome size was estimated as G = N × (L − 17 + 1) / K_dep, where N is the number of reads and L is the read length. For the 10x libraries, filtered reads were used only for this 17-mer genome size estimate, whereas all reads were used for the Supernova assembly.
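The k-mer genome size estimate described above can be sketched in a few lines of Python. This is a minimal illustration assuming error-free reads held in memory as plain strings; the `min_depth` cut-off used to skip the low-depth peak caused by sequencing errors, and all function names, are our own assumptions rather than part of the pipeline used here.

```python
from collections import Counter


def count_kmers(reads, k=17):
    """Count every k-mer by sliding a k-bp window 1 bp at a time along each read."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def peak_depth(kmer_counts, min_depth=2):
    """Return K_dep, the depth shared by the largest number of distinct k-mers.

    Depths below min_depth are ignored (assumption) so that the peak formed by
    error-containing k-mers is not mistaken for the coverage peak.
    """
    histogram = Counter(kmer_counts.values())  # depth -> number of distinct k-mers
    candidates = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    return max(candidates, key=candidates.get)


def estimate_genome_size(n_reads, read_length, k_dep, k=17):
    """G = N * (L - k + 1) / K_dep: total k-mers divided by the peak k-mer depth."""
    return n_reads * (read_length - k + 1) / k_dep


# Hypothetical input: 1e9 reads of 150 bp with a 17-mer peak depth of 100.
print(estimate_genome_size(1_000_000_000, 150, 100))  # about 1.34e9 bp
```

The numerator N × (L − k + 1) is simply the total number of k-mers observed, so dividing by the typical depth of a single-copy k-mer recovers the number of distinct genomic positions.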

Sequencing errors strongly affect genome assembly because they both introduce mistakes into the assembly and reduce its contiguity. Several features are associated with sequencing noise, including low-quality bases, adaptor contamination, and duplication [80]. To remove potential biases introduced by sequencing noise, we filtered the raw sequencing reads prior to genome assembly using strict criteria: (i) discarding paired-end reads containing overlaps, (ii) removing reads in which >20% of bases had a quality score <10, (iii) removing reads with >5% ambiguous (N) bases, (iv) removing paired-end reads with identical sequences, which are likely PCR duplicates, and (v) removing reads containing adaptor sequences. After filtering, each genome retained >103 Gb of data, and we obtained a total of 2.56 Tb of high-quality data for all 19 penguin genomes (Table 2).
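Criteria (ii) to (v) can be expressed as a small Python filter. This is a sketch under stated assumptions: qualities are Phred+33 encoded, adaptors are detected by exact substring match (dedicated trimming tools also catch partial adaptors at read ends), and criterion (i) is omitted because overlap detection depends on the aligner used. The class, function names, and the example adaptor sequence are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    seq: str
    qual: str  # Phred+33 quality string (assumed encoding)


def low_quality_fraction(read, min_q=10, offset=33):
    """Fraction of bases whose Phred quality is below min_q (criterion ii)."""
    low = sum(1 for c in read.qual if ord(c) - offset < min_q)
    return low / len(read.seq)


def n_fraction(read):
    """Fraction of ambiguous N bases (criterion iii)."""
    return read.seq.upper().count("N") / len(read.seq)


def read_passes(read, adaptor, max_low_q=0.20, max_n=0.05):
    """Criteria (ii), (iii) and (v); adaptor detection is a plain substring match here."""
    return (
        low_quality_fraction(read) <= max_low_q
        and n_fraction(read) <= max_n
        and adaptor not in read.seq
    )


def filter_pairs(pairs, adaptor):
    """Apply the per-read criteria, then drop pairs identical to one already kept (iv)."""
    seen = set()
    kept = []
    for r1, r2 in pairs:
        if not (read_passes(r1, adaptor) and read_passes(r2, adaptor)):
            continue
        key = (r1.seq, r2.seq)
        if key in seen:  # identical pair: likely a PCR duplicate
            continue
        seen.add(key)
        kept.append((r1, r2))
    return kept


ADAPTOR = "AGATCGGAAGAGC"  # example adaptor sequence, assumed for illustration
good = Read("ACGT" * 5, "I" * 20)
low_q = Read("ACGT" * 5, "#" * 5 + "I" * 15)  # 25% of bases below Q10
print(len(filter_pairs([(good, good), (good, good), (low_q, good)], ADAPTOR)))  # 1
```

Quality filtering runs before duplicate removal so that a low-quality copy of a fragment cannot cause a good copy with the same sequence to be discarded.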

Both SOAPdenovo v. 2–2.04 (SOAPdenovo2, RRID:SCR_014986) [81] and Allpaths-LG (ALLPATHS-LG, RRID:SCR_010742) [82] were used to assemble the genomes sequenced with libraries of multiple insert sizes. For SOAPdenovo, paired-end reads from the small insert size libraries were used to construct de Bruijn graphs with k-mer sizes ranging from 23 to 47. Contigs were then constructed using the contig module with the “-D 1 -g” parameters to remove edges with coverage no larger than 1. Next, “map -k 35 -g” was used to map mate-pair reads onto contigs with a k-mer size of 35. Finally, we scaffolded with the parameters “scaff -g -F” to assemble contigs into longer linkages. For each species, the best assembly across the k-mer sizes tested in the graph construction step was chosen as the SOAPdenovo representative. We also assembled the same libraries using Allpaths-LG with default parameters. By comparing the SOAPdenovo and Allpaths-LG assemblies in terms of both scaffold N50 and total length, we chose the better assembly as the representative for each of these species. Supernova v. 2.0 [83], which is recommended for 10x genomic data [83], was used with default parameters to assemble the species sequenced with 10x libraries. The assembly strategy chosen for each penguin species is listed in Supplementary Table 2. For each assembly, we used GapCloser v. 1.12 (GapCloser, RRID:SCR_015026) [81] with default parameters to locally assemble and close gaps within scaffolds.
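The comparison of assemblies by scaffold N50 and total length can be illustrated with a short Python sketch. The text does not state how the two metrics were weighed against each other, so ranking by N50 first and using total length only as a tie-breaker is an assumption, as are the function names and the toy scaffold lengths.

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def pick_assembly(assemblies):
    """Name of the assembly with the largest scaffold N50, ties broken on total length.

    assemblies maps an assembly name to its list of scaffold lengths.
    """
    return max(assemblies, key=lambda name: (n50(assemblies[name]), sum(assemblies[name])))


# Hypothetical scaffold lengths for two assemblies of the same species.
candidates = {"SOAPdenovo_k31": [10, 9, 8, 2], "Allpaths-LG": [20, 1, 1]}
print(pick_assembly(candidates))  # Allpaths-LG
```

The same `n50` function applied to contig lengths gives the contig N50 reported in Table 3.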

All penguin genomes (including those obtained in Li et al. [51]) were estimated to be ∼1.3 Gb (Fig. 2), with little variation among species. Most assemblies have both a longer scaffold N50 and a longer contig N50 than the Aptenodytes forsteri and Pygoscelis adeliae assemblies obtained by Li et al. [51] (Fig. 2). In total, 19 of the 21 genomes have a scaffold N50 >1 Mb, and 13 of these have a scaffold N50 >3 Mb. All penguin genomes have a contig N50 >19 kb, and 16 of them exceed 30 kb. The maximum contig N50 is 163 kb, for the macaroni penguin (Eudyptes chrysolophus chrysolophus) (Fig. 2), and the largest scaffold N50 is 29.3 Mb, for Eudyptula novaehollandiae. Overall, these statistics indicate consistently high quality across the 21 penguin genomes (Fig. 2).

Figure 2:

Genome assembly statistics of all penguin species. A, Dot plot of assembly contiguity showing contig N50 (maximum is Eudyptes chrysolophus chrysolophus with 163,848 bp; minimum is Spheniscus humboldti with 19,849 bp) and scaffold N50 (maximum is Eudyptula novaehollandiae with 29,280,209 bp; minimum is Eudyptes robustus with 363,310 bp). Each symbol indicates a penguin species; the x-axis shows the scaffold N50 and the y-axis the contig N50 of each species. B, Genome size of each penguin species (maximum is Eudyptula minor minor with 1,466,686,831 bp; minimum is Eudyptes sclateri with 1,211,737,899 bp). C, BUSCO assessments of all penguin genomes, showing the percentage of complete, duplicated, fragmented, or missing genes. See Table 3 for more details. The symbols for each penguin species correspond to those used in Figs 1 and 3.

Assembly completeness provides a further measure of assembly quality. We used BUSCO v. 3.0.2 (BUSCO, RRID:SCR_015008) [84] to evaluate the newly assembled penguin genomes against the avian database aves_odb9, which comprises 4,915 conserved avian orthologs (Table 3). On average, only ∼3% of the core genes in aves_odb9 could not be found in the 21 penguin genomes (range 2%–7.8%), indicating that all 21 penguin genomes are near-complete. We identified an average of 90% complete core genes per genome, with the highest value (93.8%) in Eudyptes chrysocome. Genes annotated in >1 copy were counted as duplicated; duplication rates among the 21 penguin genomes ranged only from 0.6% to 8.6%. In addition, only ∼4% of the core genes were fragmented (partly annotated) in each genome (Fig. 2). Overall, we obtained near-complete, high-quality genomes, and our dataset (including the genomes obtained in Li et al. [51]) encompasses all extant penguin species.
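The summary figures quoted above (mean completeness, missing-gene range, duplication range) can be re-derived from Table 3 with a short Python script. The values below are transcribed from Table 3, in which the four categories sum to 100% for each genome; the script only aggregates those published percentages and does not run BUSCO itself.

```python
from statistics import mean

# (complete, duplication, fragmented, missing) percentages transcribed from Table 3.
BUSCO = {
    "Eudyptes chrysolophus chrysolophus": (85.4, 7.7, 4.4, 2.5),
    "Megadyptes antipodes antipodes": (91.8, 1.2, 4.2, 2.8),
    "Spheniscus demersus": (91.3, 0.9, 4.7, 3.1),
    "Spheniscus mendiculus": (88.9, 1.6, 5.7, 3.8),
    "Eudyptula minor albosignata": (85.6, 7.4, 4.2, 2.8),
    "Eudyptula minor minor": (84.0, 8.6, 4.6, 2.8),
    "Eudyptula novaehollandiae": (89.0, 4.7, 3.8, 2.5),
    "Pygoscelis papua": (90.7, 1.5, 5.0, 2.8),
    "Pygoscelis antarctica": (91.3, 1.2, 4.6, 2.9),
    "Aptenodytes patagonicus": (91.5, 1.1, 4.2, 3.2),
    "Eudyptes chrysolophus schlegeli": (93.2, 1.5, 3.3, 2.0),
    "Eudyptes pachyrhynchus": (80.2, 7.7, 4.3, 7.8),
    "Eudyptes robustus": (87.3, 1.1, 5.1, 6.5),
    "Eudyptes sclateri": (93.6, 1.1, 3.2, 2.1),
    "Eudyptes filholi": (93.2, 1.0, 3.6, 2.2),
    "Eudyptes chrysocome": (93.8, 1.0, 3.0, 2.2),
    "Eudyptes moseleyi": (93.6, 1.2, 3.0, 2.2),
    "Spheniscus magellanicus": (93.1, 1.3, 3.5, 2.1),
    "Spheniscus humboldti": (93.3, 1.1, 3.5, 2.1),
    "Pygoscelis adeliae": (92.8, 0.6, 4.0, 2.6),
    "Aptenodytes forsteri": (93.2, 0.8, 3.6, 2.4),
}

CATEGORIES = ("complete", "duplication", "fragmented", "missing")


def summarize(table):
    """Mean (rounded to 2 decimals), minimum and maximum of each category across genomes."""
    summary = {}
    for i, name in enumerate(CATEGORIES):
        values = [row[i] for row in table.values()]
        summary[name] = (round(mean(values), 2), min(values), max(values))
    return summary


for category, (avg, low, high) in summarize(BUSCO).items():
    print(f"{category:>11}: mean {avg}%, range {low}-{high}%")
```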

Repeat annotation

We used RepeatMasker v. 4.0.7 (RepeatMasker, RRID:SCR_012954) [85, 86], TRF v. 4.09 [87], and RepeatModeler v. 1.0.8 (RepeatModeler, RRID:SCR_015027) [88, 86] to identify repetitive sequences in each penguin genome. We compared our genomes with 5 avian outgroups: wedge-rumped storm petrel (Hydrobates tethys), Wilson's storm petrel (Oceanites oceanicus), Atlantic yellow-nosed albatross (Thalassarche chlororhynchos), zebra finch (Taeniopygia guttata), and chicken (Gallus gallus). Genome sequences were aligned to RepBase23.04 [89] with RepeatMasker, and each hit was further classified into detailed categories. Tandem repeats, which consist of >2 adjacent copies of a DNA sequence, were identified with TRF using default parameters. In addition, we used RepeatModeler to identify repeat families de novo. All identified repeat elements were classified into 7 categories (DNA, long interspersed nuclear element [LINE], short interspersed nuclear element [SINE], long terminal repeat [LTR], other, unknown, and tandem repeat) according to the classifications in the repeat databases. Annotations from the 3 methods were combined into a non-redundant repeat annotation for each penguin genome and each of the 5 outgroups.
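Combining the three annotations into a non-redundant set amounts to taking the union of the repeat intervals reported by all tools, so that bases flagged by more than one tool are counted once. A minimal Python sketch, assuming each tool's output has already been parsed into half-open (start, end) coordinates per sequence; parsing of the RepeatMasker and TRF output formats is not shown, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Union of half-open [start, end) intervals; overlapping or touching ones are joined."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def nonredundant_repeat_fraction(annotations, genome_size):
    """Pool per-tool repeat intervals and return the fraction of the genome they cover.

    annotations maps a tool name to {sequence_id: [(start, end), ...]}.
    """
    by_sequence = {}
    for per_tool in annotations.values():
        for seq_id, intervals in per_tool.items():
            by_sequence.setdefault(seq_id, []).extend(intervals)
    covered = sum(
        end - start
        for intervals in by_sequence.values()
        for start, end in merge_intervals(intervals)
    )
    return covered / genome_size


toy = {  # hypothetical annotations on a 100-bp "genome"
    "RepeatMasker": {"scaffold1": [(0, 10)]},
    "TRF": {"scaffold1": [(5, 20)]},
    "RepeatModeler": {"scaffold2": [(0, 5)]},
}
print(nonredundant_repeat_fraction(toy, genome_size=100))  # 0.25
```

The "% in genome" values in Table 4 follow the same length-over-assembly-size arithmetic: for example, 45,959,293 bp of tandem repeats in the 1,306,699,575-bp E. moseleyi assembly (Table 3) corresponds to about 3.52%.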

Approximately 10% of each penguin genome was identified as repeat elements, similar to the 5 outgroups (Table 4). Although all penguin genomes had similar total repeat content, the content of individual categories varied. In all penguins and outgroups, the most abundant repeat category was LINE. Eudyptes moseleyi has the highest tandem repeat content (3.52%), substantially greater than that of Aptenodytes forsteri (2.24%), which has the second highest tandem repeat content among penguins. Eudyptula minor minor has the largest proportion of sequence identified as LTR (4.26%). See Table 4 for details of the repeat annotations for each species.

Table 4:

Repeat annotation results for 21 penguins and 5 outgroups

| Species | DNA length (bp) | DNA % in genome | LINE length (bp) | LINE % in genome | SINE length (bp) | SINE % in genome | LTR length (bp) | LTR % in genome | Other length (bp) | Other % in genome | Unknown length (bp) | Unknown % in genome | TRF length (bp) | TRF % in genome | Total length (bp) | Total % in genome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eudyptes chrysolophus schlegeli | 10,967,993 | 0.84 | 56,600,258 | 4.32 | 1,886,042 | 0.14 | 23,772,820 | 1.81 | 1,709 | 0.00013 | 7,181,843 | 0.55 | 27,041,073 | 2.06 | 122,778,314 | 9.37 |
| Eudyptes chrysolophus chrysolophus | 9,840,577 | 0.72 | 81,007,897 | 5.92 | 2,325,630 | 0.17 | 42,950,488 | 3.14 | 2,109 | 0.00015 | 6,349,669 | 0.46 | 7,624,752 | 0.56 | 147,221,283 | 10.80 |
| Eudyptes pachyrhynchus | 9,700,549 | 0.74 | 57,537,411 | 4.39 | 1,761,671 | 0.13 | 26,951,871 | 2.06 | 7,163 | 0.00055 | 8,778,995 | 0.67 | 15,315,109 | 1.17 | 115,154,499 | 8.78 |
| Eudyptes robustus | 10,035,161 | 0.80 | 54,876,908 | 4.40 | 1,694,896 | 0.14 | 21,900,240 | 1.75 | 1,197 | 0.000096 | 6,793,784 | 0.54 | 13,082,350 | 1.05 | 105,161,038 | 8.42 |
| Eudyptes sclateri | 9,603,106 | 0.79 | 57,388,336 | 4.74 | 1,648,534 | 0.14 | 22,555,283 | 1.86 | 2,155 | 0.00018 | 5,455,896 | 0.45 | 7,045,858 | 0.58 | 101,615,942 | 8.39 |
| Eudyptes filholi | 9,447,824 | 0.77 | 58,471,185 | 4.78 | 1,894,915 | 0.16 | 23,146,953 | 1.89 | 2,662 | 0.00022 | 8,146,713 | 0.67 | 7,812,634 | 0.64 | 104,766,914 | 8.56 |
| Eudyptes chrysocome | 9,067,962 | 0.74 | 58,040,264 | 4.71 | 1,608,644 | 0.13 | 22,515,809 | 1.83 | 2,095 | 0.00017 | 7,321,722 | 0.60 | 7,332,611 | 0.60 | 103,276,447 | 8.39 |
| Eudyptes moseleyi | 9,367,954 | 0.72 | 58,805,425 | 4.50 | 1,990,469 | 0.15 | 23,593,767 | 1.81 | 2,664 | 0.00020 | 9,786,633 | 0.75 | 45,959,293 | 3.52 | 141,103,330 | 10.80 |
| Megadyptes antipodes antipodes | 9,608,349 | 0.73 | 78,978,618 | 5.99 | 1,728,524 | 0.13 | 46,464,418 | 3.53 | 1,059 | 0.000080 | 8,168,785 | 0.62 | 7,802,048 | 0.59 | 148,977,693 | 11.30 |
| Spheniscus magellanicus | 10,393,349 | 0.82 | 65,351,067 | 5.18 | 1,812,355 | 0.14 | 26,759,543 | 2.12 | 1,546 | 0.00012 | 9,851,237 | 0.78 | 10,398,934 | 0.82 | 118,099,179 | 9.35 |
| Spheniscus demersus | 9,811,467 | 0.77 | 72,969,293 | 5.71 | 1,610,171 | 0.13 | 34,709,683 | 2.72 | 1,509 | 0.00012 | 20,385,557 | 1.59 | 6,712,698 | 0.53 | 130,219,709 | 10.2 |
| Spheniscus mendiculus | 10,792,037 | 0.83 | 80,340,773 | 6.18 | 1,694,428 | 0.13 | 43,906,026 | 3.38 | 2,265 | 0.00017 | 13,023,335 | 1.00 | 7,421,979 | 0.57 | 147,721,431 | 11.4 |
| Spheniscus humboldti | 9,850,523 | 0.80 | 63,427,971 | 5.10 | 2,095,439 | 0.17 | 26,032,187 | 2.09 | 2,610 | 0.00021 | 7,051,364 | 0.57 | 10,846,563 | 0.87 | 115,794,679 | 9.31 |
| Eudyptula minor albosignata | 10,287,254 | 0.75 | 86,732,446 | 6.31 | 2,230,442 | 0.16 | 49,548,759 | 3.61 | 2,285 | 0.00017 | 10,370,641 | 0.76 | 8,661,285 | 0.63 | 160,541,239 | 11.70 |
| Eudyptula minor minor | 10,691,141 | 0.73 | 95,293,482 | 6.50 | 1,790,448 | 0.12 | 62,515,534 | 4.26 | 2,245 | 0.00015 | 8,460,299 | 0.58 | 9,083,782 | 0.62 | 183,740,284 | 12.5 |
| Eudyptula novaehollandiae | 10,542,998 | 0.78 | 87,757,466 | 6.46 | 1,654,900 | 0.12 | 53,144,657 | 3.92 | 1,522 | 0.00011 | 12,914,720 | 0.95 | 8,531,830 | 0.63 | 164,989,801 | 12.20 |
| Pygoscelis adeliae | 8,905,965 | 0.73 | 52,089,816 | 4.28 | 1,643,684 | 0.14 | 17,580,686 | 1.45 | 1,685 | 0.00014 | 6,938,950 | 0.57 | 8,565,483 | 0.70 | 93,839,128 | 7.71 |
| Pygoscelis papua | 10,878,036 | 0.83 | 79,578,503 | 6.08 | 1,683,574 | 0.13 | 47,004,788 | 3.59 | 2,163 | 0.00017 | 8,393,877 | 0.64 | 7,857,958 | 0.60 | 151,240,877 | 11.60 |
| Pygoscelis antarctica | 10,021,109 | 0.79 | 75,467,782 | 5.96 | 1,660,023 | 0.13 | 36,515,988 | 2.89 | 1,645 | 0.00013 | 5,649,521 | 0.45 | 6,850,733 | 0.54 | 133,620,728 | 10.60 |
| Aptenodytes patagonicus | 9,883,830 | 0.79 | 72,143,844 | 5.74 | 1,669,248 | 0.13 | 33,210,718 | 2.64 | 2,273 | 0.00018 | 5,987,857 | 0.48 | 6,868,165 | 0.55 | 126,913,554 | 10.10 |
| Aptenodytes forsteri | 9,648,988 | 0.77 | 47,421,228 | 3.78 | 1,755,252 | 0.14 | 14,998,979 | 1.20 | 1,055 | 0.000084 | 5,984,114 | 0.48 | 28,075,518 | 2.24 | 103,411,467 | 8.24 |
| Hydrobates tethys | 10,174,835 | 0.85 | 43,642,750 | 3.65 | 1,593,248 | 0.13 | 13,363,132 | 1.12 | 1,780 | 0.00015 | 6,044,078 | 0.51 | 10,375,034 | 0.87 | 82,871,365 | 6.93 |
| Oceanites oceanicus | 8,172,757 | 0.69 | 53,982,174 | 4.58 | 1,518,213 | 0.13 | 19,561,601 | 1.66 | 2,202 | 0.00019 | 6,101,243 | 0.52 | 10,501,141 | 0.89 | 97,111,623 | 8.24 |
| Thalassarche chlororhynchos | 10,390,449 | 0.93 | 41,856,139 | 3.74 | 1,766,094 | 0.16 | 14,374,696 | 1.29 | 2,035 | 0.00018 | 5,822,959 | 0.52 | 6,943,803 | 0.62 | 79,491,403 | 7.11 |
| Taeniopygia guttata | 5,985,051 | 0.49 | 51,144,902 | 4.15 | 883,324 | 0.072 | 50,817,604 | 4.12 | 4,713 | 0.00038 | 13,099,829 | 1.06 | 25,800,776 | 2.09 | 137,289,217 | 11.10 |
| Gallus gallus | 13,929,789 | 1.33 | 78,779,279 | 7.52 | 571,067 | 0.055 | 21,043,114 | 2.01 | 1,638 | 0.00016 | 20,514,532 | 1.96 | 10,603,861 | 1.01 | 129,394,288 | 12.40 |

Protein-coding gene annotation

We used the annotation methods developed by the Bird 10,000 Genomes (B10K) consortium [90] to annotate the 21 penguin genomes. Before annotating protein-coding genes, we generated a non-redundant avian reference gene set consisting of protein sequences from Taeniopygia guttata and Gallus gallus [71]. Whole-genome protein sequences from the Ensembl gene sets (release 85) of Taeniopygia guttata and Gallus gallus were used to identify 12,337 orthologs based on whole-genome synteny relationships downloaded from the UCSC Genome Browser [91]. For each ortholog, we compared the Taeniopygia guttata and Gallus gallus proteins and selected for the reference gene set the one with the longer sequence homologous to the human ortholog protein. Of the 12,337 orthologs, 6,888 Taeniopygia guttata and 5,449 Gallus gallus proteins were selected. Species-specific genes were then added to the reference gene set: 5,084 Taeniopygia guttata genes without Gallus gallus orthologs and 3,158 G. gallus genes that had not been identified as orthologs of Taeniopygia guttata genes. Finally, protein sequences were removed if they contained <50 amino acids, were annotated as transposons/retrotransposons, or contained only a single non-functional exon. The final avian reference gene set contained 20,181 protein-coding genes.
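The construction of the reference gene set can be sketched in Python as a selection step followed by a filter. How the homologous length to the human ortholog and the "single non-functional exon" status were determined is not described here, so both are represented as precomputed fields; the class, function names, and toy proteins are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefProtein:
    gene_id: str
    species: str
    seq: str
    human_homolog_len: int = 0              # residues homologous to the human ortholog (assumed precomputed)
    is_transposon: bool = False
    single_nonfunctional_exon: bool = False  # assumed precomputed flag


def passes_filters(protein, min_aa=50):
    """Final filters: at least min_aa residues, not a transposon, not a lone non-functional exon."""
    return (
        len(protein.seq) >= min_aa
        and not protein.is_transposon
        and not protein.single_nonfunctional_exon
    )


def build_reference(ortholog_pairs, finch_only, chicken_only, min_aa=50):
    """Pick one protein per finch-chicken ortholog pair, add species-specific genes, then filter.

    For each pair the protein with the longer region homologous to the human ortholog
    is kept; ties go to the first protein of the pair.
    """
    chosen = [max(pair, key=lambda p: p.human_homolog_len) for pair in ortholog_pairs]
    candidates = chosen + list(finch_only) + list(chicken_only)
    return [p for p in candidates if passes_filters(p, min_aa)]


finch = RefProtein("g1", "Taeniopygia guttata", "M" * 300, human_homolog_len=250)
chicken = RefProtein("g1", "Gallus gallus", "M" * 320, human_homolog_len=290)
short = RefProtein("g2", "Taeniopygia guttata", "M" * 40)
te = RefProtein("g3", "Gallus gallus", "M" * 500, is_transposon=True)
print([p.species for p in build_reference([(finch, chicken)], [short], [te])])  # ['Gallus gallus']
```

Applied to the counts in the text, the candidate set before filtering holds 6,888 + 5,449 + 5,084 + 3,158 = 20,579 proteins, so the final filters removed 398 of them to leave 20,181.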

To annotate the protein-coding genes in the penguin genomes, protein sequences from the avian reference gene set were mapped to each of the 21 penguin genomes. First, protein sequences were aligned to each penguin genome using TBLASTN v. 2.2.2 (TBLASTN, RRID:SCR_011822) [92] with an e-value cut-off of 1e−5. Multiple adjacent hits from the same protein were then linked using genBlastA v. 1.0.4 [93] to obtain candidate gene boundaries. A candidate hit was removed if <30% of the protein's amino acids aligned to the penguin genome. For each remaining candidate hit, we extracted the genomic sequence covering the hit, extended by 2 kb upstream and downstream. The extracted genomic sequences and the corresponding homologous protein sequences were used as input for GeneWise v. 2.4.1 (GeneWise, RRID:SCR_015054) [94]. GeneWise predicted the protein-coding gene models, including exon and intron boundaries. The coding sequence of each gene model was extracted from the genome and translated into a protein sequence. Each annotated protein was then aligned with its corresponding homologous protein using MUSCLE v. 3.8.31 (MUSCLE, RRID:SCR_011812) [95], and annotated proteins with <40% identity to the homologous protein were removed. We also removed annotated proteins with <30 amino acids, >2 frameshifts, or a premature stop codon. If a genomic locus had been annotated with several gene models, we selected the gene model with the highest identity to its homologous protein. As a result, the annotated gene sets for our penguin genomes contain no overlapping genes.
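The post-GeneWise filters and the per-locus choice of gene model can be illustrated with the short Python sketch below. The thresholds come from the text. The record layout is hypothetical. The overlap resolution is one plausible reading of "the gene model with the highest identity was selected": a greedy pass from highest to lowest identity that keeps a model only if it does not overlap one already kept.

```python
from dataclasses import dataclass

@dataclass
class GeneModel:
    model_id: str
    scaffold: str
    start: int             # 1-based, inclusive
    end: int               # inclusive
    protein_len: int       # translated length in amino acids
    identity: float        # % identity to the homologous reference protein
    frameshifts: int
    premature_stops: int

def passes_filters(m: GeneModel) -> bool:
    """Thresholds from the text: >=40% identity, >=30 aa, <=2 frameshifts, no premature stop."""
    return (m.identity >= 40.0 and m.protein_len >= 30
            and m.frameshifts <= 2 and m.premature_stops == 0)

def overlaps(a: GeneModel, b: GeneModel) -> bool:
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end

def select_models(models):
    """Greedy overlap resolution (assumed): keep the highest-identity model at each locus."""
    kept = []
    for m in sorted(filter(passes_filters, models), key=lambda m: m.identity, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return kept

models = [
    GeneModel("m1", "scf1", 100, 5000, 350, 92.0, 0, 0),
    GeneModel("m2", "scf1", 4000, 9000, 300, 88.0, 0, 0),    # overlaps m1, lower identity
    GeneModel("m3", "scf1", 20000, 26000, 25, 95.0, 0, 0),   # fails: <30 aa
    GeneModel("m4", "scf2", 100, 3000, 200, 35.0, 0, 0),     # fails: <40% identity
    GeneModel("m5", "scf2", 5000, 8000, 210, 75.0, 3, 0),    # fails: >2 frameshifts
    GeneModel("m6", "scf2", 9000, 12000, 180, 70.0, 1, 0),
]
print([m.model_id for m in select_models(models)])  # ['m1', 'm6']
```

Because filtering happens before overlap resolution, a failing model such as m3 never blocks a lower-identity model at the same locus.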

Protein sequences from human (hg38) and from avian transcripts were also mapped to each penguin genome and annotated into gene models, as described above. For the avian transcript dataset, we obtained 71 avian transcriptomic samples from NCBI [96] (Supplementary Table 3). We assembled them into transcripts using Newbler v2.9 [97] for 454 sequencing data or Trinity v20140717 [98] for Illumina sequencing data. We used ORFfinder [96] to identify open reading frames (ORFs) in the transcripts and translated the ORFs into protein sequences. These protein sequences were then mapped to the avian reference gene set and the human protein sequences. Those with similarity to either set were removed, so that only transcripts not already represented were retained. Transcripts with an ORF length <150 bp were also removed. Protein sequences from the remaining 5,257 transcripts were then used for annotation. The 3 gene model sets, annotated from the avian reference gene set, the human protein sequences, and the transcriptome, were combined into a final non-redundant gene set. The sets were prioritized in the following order: avian reference gene set > human protein > transcriptome.
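The transcript filtering and the priority-ordered merge can be sketched in Python as follows. This is an illustration under assumptions, not the consortium's code. The similarity hits are taken as a precomputed set of hypothetical transcript IDs. "Non-redundant" is interpreted as "no overlapping gene models on the same scaffold", and a lower-priority model is skipped whenever it overlaps a model already kept.

```python
from typing import NamedTuple

class Model(NamedTuple):
    model_id: str
    scaffold: str
    start: int
    end: int

# Priority order stated in the text: avian reference > human protein > transcriptome.
PRIORITY = ("avian_reference", "human", "transcriptome")

def novel_transcript_proteins(orf_lengths, similar_ids, min_orf_bp=150):
    """Keep transcripts whose ORF is >=150 bp and that had no hit to the avian/human sets."""
    return [tid for tid, length in orf_lengths.items()
            if length >= min_orf_bp and tid not in similar_ids]

def overlaps(a, b):
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end

def merge_gene_sets(sets_by_source):
    """Add models source by source; skip any model overlapping one already kept."""
    merged = []
    for source in PRIORITY:
        for m in sets_by_source.get(source, []):
            if not any(overlaps(m, kept) for kept, _ in merged):
                merged.append((m, source))
    return merged

novel = novel_transcript_proteins({"tr1": 600, "tr2": 120, "tr3": 900}, similar_ids={"tr3"})
print(novel)  # ['tr1']

sets = {
    "avian_reference": [Model("a1", "scf1", 100, 5000)],
    "human": [Model("h1", "scf1", 4500, 7000), Model("h2", "scf1", 10000, 12000)],
    "transcriptome": [Model("t1", "scf1", 11000, 13000), Model("t2", "scf2", 1, 900)],
}
result = merge_gene_sets(sets)
print([(m.model_id, s) for m, s in result])
# [('a1', 'avian_reference'), ('h2', 'human'), ('t2', 'transcriptome')]
```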

Using these methods, we annotated the 19 newly assembled penguin genomes, as well as the 2 previously published penguin genomes [51]. We identified ∼16,000 genes in each penguin genome, a number similar to that of the Taeniopygia guttata and Gallus gallus genomes. The average gene length and coding sequence length are ∼19 kb and ∼1.3 kb, respectively. Each gene contains ∼8 exons with an average exon length of ∼170 bp, and the average intron length is ∼2.6 kb (Table 5).
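As a rough internal consistency check on these averages, the coding length should be close to the exon count multiplied by the mean exon length. The gene length should be close to the coding length plus (exons − 1) × mean intron length. This back-of-envelope reasoning is ours, not an analysis from the study. The Python sketch below applies it to one row of Table 5, assuming n exons imply n − 1 introns and ignoring UTRs.

```python
def expected_lengths(mean_exons, mean_exon_bp, mean_intron_bp):
    """Rough mean CDS and gene lengths (bp) implied by per-gene averages.

    Assumes n exons imply n - 1 introns and ignores UTRs; because a mean of
    products is not a product of means, this is only an approximation.
    """
    cds = mean_exons * mean_exon_bp
    gene = cds + (mean_exons - 1) * mean_intron_bp
    return cds, gene

# Eudyptes chrysolophus schlegeli row of Table 5: 7.9 exons, 171 bp exons, 2,540 bp introns
cds, gene = expected_lengths(7.9, 171, 2540)
print(round(cds), round(gene))  # 1351 18877  (Table 5 reports 1,351 and 18,860)
```

The rounded genome-wide figures give a similar result: 8 × 170 bp ≈ 1.36 kb of coding sequence, and 1.36 kb + 7 × 2.6 kb ≈ 19.6 kb per gene.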

Table 5:

Protein-coding gene statistics of all 21 penguin genomes and 5 outgroups

Species | Number of protein-coding genes | Mean gene length (bp) | Mean coding sequence length (bp) | Mean exons per gene | Mean exon length (bp) | Mean intron length (bp)
Eudyptes chrysolophus schlegeli | 17,191 | 18,860 | 1,351 | 7.9 | 171 | 2,540
Eudyptes chrysolophus chrysolophus | 16,311 | 20,248 | 1,392 | 8.2 | 170 | 2,623
Eudyptes pachyrhynchus | 19,170 | 17,394 | 1,306 | 7.4 | 178 | 2,535
Eudyptes robustus | 17,126 | 16,254 | 1,295 | 7.4 | 174 | 2,329
Eudyptes sclateri | 15,786 | 19,627 | 1,402 | 8.2 | 171 | 2,527
Eudyptes filholi | 15,963 | 19,959 | 1,407 | 8.2 | 171 | 2,562
Eudyptes chrysocome | 16,280 | 19,436 | 1,382 | 8.1 | 171 | 2,555
Eudyptes moseleyi | 16,812 | 19,767 | 1,370 | 8.0 | 171 | 2,621
Megadyptes antipodes antipodes | 16,563 | 18,509 | 1,334 | 7.8 | 171 | 2,533
Spheniscus magellanicus | 16,795 | 19,311 | 1,381 | 8.1 | 171 | 2,535
Spheniscus demersus | 16,134 | 19,029 | 1,344 | 7.8 | 171 | 2,584
Spheniscus mendiculus | 16,390 | 17,097 | 1,311 | 7.6 | 172 | 2,382
Spheniscus humboldti | 16,587 | 19,642 | 1,387 | 8.1 | 170 | 2,558
Eudyptula minor albosignata | 17,424 | 18,837 | 1,338 | 7.8 | 172 | 2,574
Eudyptula minor minor | 17,802 | 19,078 | 1,349 | 7.8 | 172 | 2,598
Eudyptula novaehollandiae | 17,188 | 19,271 | 1,355 | 7.9 | 172 | 2,609
Pygoscelis adeliae | 14,463 | 20,595 | 1,385 | 8.3 | 168 | 2,648
Pygoscelis papua | 16,698 | 18,276 | 1,333 | 7.8 | 172 | 2,503
Pygoscelis antarctica | 15,488 | 19,520 | 1,381 | 8.1 | 171 | 2,558
Aptenodytes patagonicus | 15,195 | 19,596 | 1,384 | 8.1 | 170 | 2,552
Aptenodytes forsteri | 15,593 | 19,844 | 1,381 | 8.1 | 170 | 2,584
Hydrobates tethys | 15,915 | 17,898 | 1,344 | 8.1 | 165 | 2,323
Oceanites oceanicus | 16,055 | 17,936 | 1,356 | 8.0 | 170 | 2,377
Thalassarche chlororhynchos | 13,347 | 10,029 | 1,110 | 6.4 | 175 | 1,667
Taeniopygia guttata | 19,174 | 14,787 | 1,196 | 7.2 | 167 | 2,198
Gallus gallus | 17,883 | 16,965 | 1,414 | 8.3 | 171 | 2,135

Gene function annotation

To assign functions to each gene, we aligned its protein sequence against 3 functional databases: Swiss-Prot release-2019_03 [99], InterPro v. 68.0 (InterPro, RRID:SCR_006695) [100], and KEGG v89.1 (KEGG, RRID:SCR_012773) [101]. Protein sequences were aligned to the Swiss-Prot database using BLASTP [92], and the function of the best hit was assigned to the gene. We then searched the InterPro member databases (ProDom, PRINTS, Pfam, SMART, PANTHER, ProSiteProfiles, and ProSitePatterns) to identify the motifs and domains of each gene. Gene Ontology [102] terms for each gene were obtained from the corresponding InterPro entries. To identify the pathways in which each gene might be involved, protein sequences were aligned against the KEGG database using BLASTP. In each penguin genome, >99% of the protein-coding genes were assigned ≥1 functional annotation, comparable to the 5 outgroups (96.53–99.15%; Table 6). In every penguin genome, >95% of the protein-coding genes were also assigned a Swiss-Prot function, indicating high-quality gene sets.
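The per-database and "Overall" columns of Table 6 can be understood as simple set arithmetic. A gene counts toward "Overall" if it received at least 1 annotation from any database. The Python sketch below illustrates this with hypothetical gene IDs and assumes that per-database hits are already available as sets. It then applies the same percentage formula to one Table 6 entry.

```python
def annotation_summary(total_genes, hits_by_db):
    """Per-database counts/percentages plus an 'Overall' row for genes with >=1 annotation."""
    summary = {}
    annotated_any = set()
    for db, genes in hits_by_db.items():
        summary[db] = (len(genes), 100.0 * len(genes) / total_genes)
        annotated_any |= genes          # union across databases
    summary["Overall"] = (len(annotated_any), 100.0 * len(annotated_any) / total_genes)
    return summary

genes = [f"g{i}" for i in range(1, 11)]      # 10 hypothetical gene IDs
hits = {
    "Swiss-Prot": set(genes[:9]),             # g1..g9
    "KEGG": set(genes[:7]),                   # g1..g7
    "InterPro": set(genes[1:]),               # g2..g10
}
summary = annotation_summary(len(genes), hits)
print(summary["Overall"])    # (10, 100.0)

# Same percentage formula applied to Table 6 (E. c. schlegeli, Swiss-Prot):
schlegeli_pct = round(100 * 16739 / 17191, 2)
print(schlegeli_pct)         # 97.37
```

Because "Overall" is a union, it is always at least as large as the largest single-database count. This is consistent with every row of Table 6.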

Table 6:

Function annotation results for protein-coding genes for 21 penguins and 5 outgroups

Species | Swiss-Prot (n) | Swiss-Prot (%) | KEGG (n) | KEGG (%) | InterPro (n) | InterPro (%) | Overall (n) | Overall (%)
Eudyptes chrysolophus schlegeli | 16,739 | 97.37 | 15,347 | 89.27 | 16,916 | 98.40 | 17,064 | 99.26
Eudyptes chrysolophus chrysolophus | 15,863 | 97.25 | 14,646 | 89.79 | 16,051 | 98.41 | 16,191 | 99.26
Eudyptes pachyrhynchus | 18,680 | 97.44 | 17,250 | 89.98 | 18,873 | 98.45 | 19,028 | 99.26
Eudyptes robustus | 16,580 | 96.81 | 15,500 | 90.51 | 16,816 | 98.19 | 16,988 | 99.19
Eudyptes sclateri | 15,383 | 97.45 | 14,172 | 89.78 | 15,540 | 98.44 | 15,664 | 99.23
Eudyptes filholi | 15,555 | 97.44 | 14,362 | 89.97 | 15,696 | 98.33 | 15,840 | 99.23
Eudyptes chrysocome | 15,692 | 96.39 | 14,732 | 90.49 | 15,977 | 98.14 | 16,148 | 99.19
Eudyptes moseleyi | 16,377 | 97.41 | 15,153 | 90.13 | 16,540 | 98.38 | 16,688 | 99.26
Megadyptes antipodes antipodes | 15,755 | 95.12 | 14,993 | 90.52 | 16,264 | 98.19 | 16,445 | 99.29
Spheniscus magellanicus | 16,371 | 97.48 | 15,136 | 90.12 | 16,532 | 98.43 | 16,670 | 99.26
Spheniscus demersus | 15,388 | 95.38 | 14,579 | 90.36 | 15,839 | 98.17 | 16,001 | 99.18
Spheniscus mendiculus | 15,714 | 95.88 | 14,801 | 90.31 | 16,090 | 98.17 | 16,254 | 99.17
Spheniscus humboldti | 16,172 | 97.50 | 14,954 | 90.15 | 16,319 | 98.38 | 16,460 | 99.23
Eudyptula minor albosignata | 16,615 | 95.36 | 15,778 | 90.55 | 17,098 | 98.13 | 17,297 | 99.27
Eudyptula minor minor | 16,994 | 95.46 | 16,073 | 90.29 | 17,476 | 98.17 | 17,663 | 99.22
Eudyptula novaehollandiae | 16,423 | 95.55 | 15,561 | 90.53 | 16,892 | 98.28 | 17,060 | 99.26
Pygoscelis adeliae | 13,964 | 96.55 | 13,054 | 90.26 | 14,220 | 98.32 | 14,348 | 99.20
Pygoscelis papua | 15,931 | 95.41 | 15,097 | 90.41 | 16,378 | 98.08 | 16,553 | 99.13
Pygoscelis antarctica | 15,050 | 97.17 | 13,853 | 89.44 | 15,224 | 98.30 | 15,360 | 99.17
Aptenodytes patagonicus | 14,808 | 97.45 | 13,493 | 88.80 | 14,954 | 98.41 | 15,063 | 99.13
Aptenodytes forsteri | 15,053 | 96.54 | 14,112 | 90.50 | 15,308 | 98.17 | 15,478 | 99.26
Hydrobates tethys | 15,493 | 97.35 | 14,273 | 89.68 | 15,628 | 98.20 | 15,775 | 99.12
Oceanites oceanicus | 15,622 | 97.30 | 14,412 | 89.77 | 15,775 | 98.26 | 15,919 | 99.15
Thalassarche chlororhynchos | 12,958 | 97.09 | 11,881 | 89.02 | 13,072 | 97.94 | 13,219 | 99.04
Taeniopygia guttata | 18,367 | 95.79 | 17,115 | 89.26 | 18,537 | 96.68 | 18,918 | 98.66
Gallus gallus | 16,760 | 93.72 | 15,585 | 87.15 | 17,079 | 95.50 | 17,263 | 96.53

Phylogenomic reconstruction

To understand the evolutionary history of all extant penguins, we reconstructed a penguin phylogeny from genome-wide orthologs. We used the concatenation-based method ExaML and the coalescent-based methods MP-EST and ASTRAL [103–105]. We first applied rigorous filtering steps to obtain 7,235 high-quality orthologs. Starting from ∼13,214 orthologs (BLAST reciprocal best hits [RBHs]) present in the Taeniopygia guttata genome and in the 21 penguin and 5 avian outgroup genomes (described above), we retained orthologs with no missing data and removed sequences containing internal stop codons. We aligned and filtered the data as follows:
(i) protein sequences were aligned using MAFFT v. 7.313 [106] with the “linsi” (L-INS-i) parameters, an iterative refinement method that incorporates local pairwise alignment information;
(ii) column-based alignment filtering was applied using trimAl v. 1.4.rev22 [107] with the parameter “automated1”, which heuristically chooses trimming parameters based on the characteristics of the input alignment;
(iii) nucleotide alignments were obtained using trimAl with the parameter “backtrans” to back-translate each amino acid alignment.
Two datasets were then generated. The first was derived from the column-filtered alignments by removing all missing data and retaining alignments >50 bp long (7,229 orthologs, the “TrimAl data” set). The second was produced by applying full-matrix occupancy to the no-missing-data set (7,011 orthologs, the “No missing data” set), following a previously published pipeline [108]. In this full-matrix occupancy step, only loci with no missing taxa were retained, alignment columns containing gaps, undetermined bases (Ns), or ambiguity characters were removed, and loci with a post-filtering alignment length <200 bp were discarded.
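The full-matrix occupancy step described above can be sketched in Python. This is not the published pipeline [108]. It assumes that each locus is a dictionary of equal-length aligned nucleotide sequences and that the full set of required taxa is known. The set of disallowed symbols (gap, "?", N, and the IUPAC ambiguity codes) is our assumption of what "gaps, Ns, or ambiguity characters" covers.

```python
# Columns containing any of these symbols (gap, missing, N, IUPAC ambiguity codes) are removed.
DISALLOWED = set("-?NRYKMSWBDHV")

def clean_alignment(seqs, required_taxa, min_len=200):
    """Return a gap/ambiguity-free locus, or None if it fails the filters.

    seqs: dict mapping taxon -> aligned nucleotide sequence (equal lengths).
    """
    if set(seqs) != set(required_taxa):
        return None                      # locus has missing taxa
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = zip(*(s.upper() for s in seqs.values()))
    keep = [i for i, col in enumerate(columns) if not DISALLOWED.intersection(col)]
    if len(keep) < min_len:
        return None                      # too short after column filtering
    return {taxon: "".join(s[i] for i in keep) for taxon, s in seqs.items()}

aln = {"A": "ACGT-A", "B": "ACNTTA", "C": "ACGTTA"}
print(clean_alignment(aln, "ABC", min_len=3))  # {'A': 'ACTA', 'B': 'ACTA', 'C': 'ACTA'}
print(clean_alignment(aln, "ABC"))             # None (4 bp < 200 bp)
```

Filtering whole columns, rather than masking individual cells, is what makes the matrix fully occupied. Every retained site is observed in every taxon.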

We constructed gene trees for each locus using RAxML v8.2.12 (RAxML, RRID:SCR_006086) [109]. From these gene trees, we then constructed phylogenomic trees using 2 coalescent-based methods, MP-EST v. 2.0 and ASTRAL-III. First, we used RAxML v. 8.2.12 to infer the highest-scoring maximum likelihood tree from the unpartitioned alignment of each locus. These inferences used a GTR+GAMMA substitution model, 20 independent tree searches beginning from random starting topologies, and 500 bootstrap replicates per locus. The resulting gene trees were rooted with Gallus gallus using the “ape” package in R v. 3.5.2 [110]. We then inferred a coalescent-based species tree using MP-EST v. 2.0 [104], which estimates the species tree from a set of rooted gene trees by maximizing a pseudo-likelihood function. Species tree and bootstrap topology searches were performed in 3 independent replicates, each with a different starting seed and 10 independent tree searches per run. The highest-scoring of the 10 trees was kept for each replicate. Because the 3 final MP-EST trees shared the same topology, we kept the highest-scoring tree as the final tree for further analysis. Branch lengths were then re-estimated in substitutions per site by constraining the alignments to the MP-EST topology using the “-f E” option in ExaML v.3.0.21 [103]. Bootstrap values were plotted using RAxML based on the bootstrap replicates, and trees were outgroup-rooted with G. gallus. In addition, we used the coalescent-based method ASTRAL-III [105] with default parameters to find the species tree that shares the maximum number of induced quartet trees with the set of unrooted gene trees. The ASTRAL-III search was constrained to a predefined set of bipartitions. The inferred tree had the same topology as the MP-EST tree. Finally, concatenation-based phylogenomic inference was conducted using ExaML v3.0.21.
This was achieved using a GTR+GAMMA substitution model on the partitioned concatenated alignments, with each locus as a separate partition. The topology was inferred from 21 full maximum likelihood tree searches: 20 beginning with random starting trees, and 1 beginning with a random stepwise-addition-order parsimony tree generated using RAxML. For each dataset, 100 ExaML bootstrap replicates were conducted. Convergence was assessed by a posteriori bootstopping analysis using the majority-rule consensus tree criterion in RAxML (option “-I autoMRE”). We then compared the trees obtained from the “TrimAl data” and “No missing data” sets using the coalescent-based MP-EST and ASTRAL and the concatenation-based ExaML (Supplementary Fig. 1).
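The quartet criterion that ASTRAL-III optimizes can be made concrete with a small Python sketch. This code only scores a given species tree. It does not search for the optimal tree, and it is far slower than ASTRAL's dynamic programming. It assumes unrooted trees are represented by their non-trivial splits, each given as one side of a bipartition. For every set of 4 taxa, it counts the gene trees that induce the same quartet topology as the species tree.

```python
from itertools import combinations

def induced_quartet(splits, quartet):
    """Topology ab|cd that a tree (given as its splits) induces on 4 taxa, or None."""
    q = frozenset(quartet)
    for side in splits:
        inside = q & side
        if len(inside) == 2:             # this split separates 2 taxa from the other 2
            return frozenset({inside, q - inside})
    return None

def quartet_score(species_splits, gene_trees, taxa):
    """Number of (gene tree, quartet) pairs whose induced topology matches the species tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        expected = induced_quartet(species_splits, quartet)
        if expected is None:
            continue
        score += sum(induced_quartet(g, quartet) == expected for g in gene_trees)
    return score

def splits(*sides):
    return [frozenset(s) for s in sides]

taxa = "ABCDE"
species = splits("AB", "DE")               # unrooted ((A,B),C,(D,E))
gene_trees = [
    splits("AB", "DE"),                    # identical to the species tree
    splits("AC", "DE"),                    # ((A,C),B,(D,E))
]
print(quartet_score(species, gene_trees, taxa))  # 8 of 10 possible
```

With 5 taxa there are 5 quartets per gene tree. The second gene tree disagrees on ABCD and ABCE, which gives 5 + 3 = 8 shared quartets. ASTRAL returns the tree that maximizes this kind of score.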

The topologies of the outgroups Hydrobates tethys, Oceanites oceanicus, and Thalassarche chlororhynchos differ slightly between the coalescent-based and concatenation-based methods, but the penguin topologies are identical under both approaches (Fig. 3). Our final phylogeny of all extant penguins (Fig. 3) differs slightly from a recent phylogenetic study based on mitochondrial genomes [3]. Specifically, the mitochondrial phylogeny placed Aptenodytes+Pygoscelis as sister to all other penguins, whereas our genome-wide phylogeny places Aptenodytes alone as sister to all other penguins. This result is consistent with earlier analyses that combined a small set of mitochondrial genes with the nuclear RAG-1 gene [1, 62]. It also provides intriguing new evidence on the historical biogeography and evolution of penguin adaptation to Antarctica. We expect this genomic dataset to provide further important insights into the evolution of penguins in the southern hemisphere.

Figure 3:

Phylogenomic reconstruction of penguins inferred by ExaML from the “No missing data” set. All clades were strongly supported (bootstrap support: 100). The topology and support values were identical using the MP-EST and ASTRAL methods (with no missing data), with 2 exceptions. The first was within the outgroup (bootstrap support for the split between Hydrobates tethys and Oceanites oceanicus: 37). The second was within the penguin genus Spheniscus (bootstrap support for the split between the African penguin [Spheniscus demersus] and the Magellanic penguin [S. magellanicus]: 97).
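The bootstrap values in the caption are split frequencies: the percentage of bootstrap replicate trees that contain a given bipartition. The Python sketch below illustrates how such values and a majority-rule consensus are derived. It is not RAxML's implementation, and it omits the bootstopping test behind “-I autoMRE”. It assumes unrooted trees are given as lists of split sides. Each split is stored as the side that excludes a fixed anchor taxon (here a hypothetical outgroup "E"), so that a split and its complement compare equal.

```python
from collections import Counter

def canonical(side, taxa, anchor):
    """Represent a bipartition by the side without `anchor`, so a split equals its complement."""
    side = frozenset(side)
    return side if anchor not in side else frozenset(taxa) - side

def split_counts(replicates, taxa, anchor):
    counts = Counter()
    for tree in replicates:
        counts.update({canonical(side, taxa, anchor) for side in tree})
    return counts

def bootstrap_support(reference, replicates, taxa, anchor):
    """Percentage of replicate trees containing each split of the reference tree."""
    counts = split_counts(replicates, taxa, anchor)
    n = len(replicates)
    support = {}
    for side in reference:
        s = canonical(side, taxa, anchor)
        support[s] = 100.0 * counts[s] / n
    return support

def majority_rule(replicates, taxa, anchor):
    """Splits present in more than half of the replicates (majority-rule consensus)."""
    counts = split_counts(replicates, taxa, anchor)
    return {s for s, c in counts.items() if c > len(replicates) / 2}

taxa = set("ABCDE")
reference = ["AB", "CD"]                 # unrooted ((A,B),(C,D),E)
replicates = [
    ["AB", "CD"],
    ["AB", "CD"],
    ["AB", "CE"],                        # CE is the same split as ABD
    ["AC", "DE"],
]
support = bootstrap_support(reference, replicates, taxa, anchor="E")
consensus = majority_rule(replicates, taxa, anchor="E")
print({"".join(sorted(s)): v for s, v in support.items()})  # {'AB': 75.0, 'CD': 50.0}
print(sorted("".join(sorted(s)) for s in consensus))       # ['AB']
```

A split supported by 37% of replicates, like the outgroup split in the caption, would therefore not appear in a majority-rule consensus of those replicates.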

Re-use Potential

Consortium organization and further research plans

The 19 high-coverage genomes presented here, along with the Aptenodytes forsteri and Pygoscelis adeliae genomes presented by members of our consortium in 2014 [51], provide an exciting resource for understanding evolutionary diversification, the molecular basis for unique functional adaptation, and demographic histories of penguins. The Penguin Genome Consortium is an international team of scientists with backgrounds in marine ornithology, ecology, molecular biology, evolutionary and comparative genomics, phylogenetics, physiology, palaeontology, veterinary science, and bioinformatics. The diverse skills encompassed within our highly collaborative consortium will be essential to study these genomes under comparative genomic and evolutionary frameworks. In doing so, we will expand on [51] by investigating 3 key areas related to penguin evolution and adaptation.

Evolutionary relationships and taxonomic boundaries

With a deep evolutionary history and a diverse radiation, penguins provide an exciting system for understanding the evolutionary drivers of diversification [3]. Moreover, robust taxonomic frameworks can be crucial for directing limited conservation resources for maximum gains. Significant uncertainty remains regarding species/lineage boundaries between some closely related penguin taxa. The genomes generated here therefore provide a new dataset for examining the taxonomic, phylogenomic, and biogeographical patterns of penguin evolution.

Comparative genomics and adaptation

Penguins provide an excellent system for studying comparative evolutionary adaptation [51]. We will use our genomes to explore comparative evolution among penguins, and between penguins and other avian orders. By examining loci under positive selection, we aim to reveal the molecular basis of the unique physiological and morphological adaptations that penguins exhibit in different environments and ecologies.

Penguins in a changing world

Penguins are sensitive indicators of environmental change [44, 45]. Future climate change is predicted to cause significant declines in many penguin populations [47–50]. Demographic assessments can guide conservation management decisions. However, a substantial gap remains in predicting ecosystem-wide responses to future climate change. Demographic analyses of these genomes will therefore be critical for the conservation management of penguins and other Southern Ocean assemblages.

Cultural significance

The context in which wildlife research is undertaken in New Zealand is evolving rapidly and moving into new legal and cultural territory [111–114]. Recent initiatives, such as granting the legal rights of a person to Te Urewera, a former national park, set an international precedent for this change in approach [115]. It is therefore critical to obtain research permissions and to conduct appropriate indigenous consultation with Iwi, Rūnanga, Whānau, and Hapū. The Department of Conservation is the regulatory arm of the New Zealand government in this process. It is legally required to give effect to the Principles of the Treaty of Waitangi [116] in administering the legislation under which Authorities are issued.

At another level, the Ngāi Tahu Deed of Settlement Act recognizes all native penguin species as Taonga, or treasured possessions [117]. Consequently, rigorous Māori consultation is a legal requirement when studying Taonga [118, 119]. In addition, the Department of Conservation must have particular regard to the views of Iwi, Rūnanga, Whānau, or Hapū when considering whether to authorize any application. Recent discussions have also emphasized that Taonga genomes are sacred (tapu), because they are considered to contain both the living and future generations (whakapapa, mauri, and wairua of tipuna). This view has given rise to Māori concerns about the commercialization, ownership, storage, and modification of Taonga genomes [120]. We generated the following Taonga genomes: hoiho (yellow-eyed penguin, Megadyptes antipodes antipodes), kororā (little penguin, Eudyptula spp.), pokotiwha (Snares-crested penguin, E. robustus), tawaki (Fiordland-crested penguin, E. pachyrhynchus), and the erect-crested penguin (Eudyptes sclateri). These genomes were obtained following rigorous Department of Conservation permitting procedures, including collection, holding, and exporting permits. They were also obtained following Department of Conservation consultation with Iwi, Rūnanga, Whānau, or Hapū (Supplementary Table 1). Several of the Taonga genomes studied here were collected as part of broader research projects, and additional consultation was undertaken for those projects. We emphasize that there will be no commercialization, ownership, or modification of any of the genomes presented here. These Taonga genomes will be publicly available. However, it is critical that new researchers studying these genomes take the appropriate steps to seek additional Māori permissions and consultation, to ensure respect for New Zealand cultural values.

The emerging issues surrounding the generation and use of Taonga genomes also highlight that Māori consultation should be undertaken when obtaining genomes from Taonga housed in overseas museum collections. We hope that the data and research questions presented here, together with our future research outputs using these genomes, will be valuable both for cultural heritage and for the conservation management of penguin populations.

Early-release use of the data

The Fort Lauderdale [121] and Toronto [122] agreements state that, in exchange for the early release of datasets, the data producers retain the right to be the first to describe and analyse the complete datasets in peer-reviewed publications. Comparative and evolutionary genomic analyses are currently being carried out, and the consortium welcomes new members interested in contributing to this work. While this work is underway, we have published these 19 penguin genomes to provide early access. We request that researchers intending to use these data for similar cross-species comparisons continue to follow the long-running Fort Lauderdale and Toronto rules.

Conclusions

Genomics is costly: it requires high-quality samples and extensive laboratory and bioinformatic skills. The genomics era has been boosted by global research consortia, which bring together contextual, technical, and analytical skills spanning networks of international collaborators [123–126]. Our consortium and the dataset introduced here are no exception. We expect our future research using these genomes to bring together additional collaborators with a wide range of expertise in penguin biology and physiology. At another level, collecting high-quality fresh blood samples from some of the most remote regions of the Southern Ocean remains technically and logistically difficult. It requires long-term effort and organization from many collaborators and expedition programs. While this study is an exciting development for understanding penguin evolution, the global effort involved in designing our study, obtaining samples, and developing appropriate sequencing and bioinformatic pipelines has been extensive. The dataset and project design introduced here highlight the need for transparent research projects and global collaborations. Together, these maximize the use of samples while minimizing sequencing costs and laboratory and analytical effort.

In this study, we have presented 19 new high-coverage penguin genomes. Together with 2 genomes previously obtained by members of our consortium [51], this combined dataset encompasses the genomes of all extant penguin species. We have also constructed a comprehensive phylogenomic tree of all extant penguins. We will use these datasets to address a range of evolutionary, adaptive, biogeographic, and demographic questions about penguins. We hope that our ongoing projects using these genomes will provide novel insights into the broad evolution and adaptation of avifauna to different environments. We also hope that this knowledge will support cultural heritage and aid conservation management decisions for remote Southern Ocean regions.

Availability of supporting data and materials

The genome sequencing data and assemblies of this study have been deposited in the CNSA (https://db.cngb.org/cnsa/) of the CNGBdb database with the accession number CNP0000605. They have also been deposited in the NCBI database with the BioProject ID PRJNA556735 (Aptenodytes patagonicus: SAMN12384866; Eudyptes chrysolophus chrysolophus: SAMN12384869; E. c. schlegeli: SAMN12384870; E. chrysocome: SAMN12384872; E. filholi: SAMN12384873; E. moseleyi: SAMN12384871; E. pachyrhynchus: SAMN12384875; Eudyptes robustus: SAMN12384876; E. sclateri: SAMN12384874; Eudyptula minor albosignata: SAMN12384880; E. m. minor: SAMN12384879; E. novaehollandiae: SAMN12384878; Megadyptes antipodes antipodes: SAMN12384877; Pygoscelis antarctica: SAMN12384868; P. papua: SAMN12384867; Spheniscus demersus: SAMN12384881; S. humboldti: SAMN12384883; S. magellanicus: SAMN12384882; S. mendiculus: SAMN12384884). Data from all of the penguin species are also available from the GigaScience GigaDB database [127].

Abbreviations

BLAST: Basic Local Alignment Search Tool; bp: base pairs; BUSCO: Benchmarking Universal Single-Copy Orthologs; CNSA: CNGB Nucleotide Sequence Archive; ExaML: Exascale Maximum Likelihood; Gb: gigabase pairs; kb: kilobase pairs; KEGG: Kyoto Encyclopedia of Genes and Genomes; LINE: long interspersed nuclear element; LTR: long terminal repeat; Mb: megabase pairs; NCBI: National Center for Biotechnology Information; ORF: open reading frame; RAxML: Randomized Axelerated Maximum Likelihood; SINE: short interspersed nuclear element; TRF: Tandem Repeat Finder; UCSC: University of California Santa Cruz.

Ethics approval and consent to participate

All samples were obtained under valid animal ethics permits.

Competing interests

The authors declare that they have no competing interests.

Funding

This project was supported by the National Key R&D Program of China (MOST) grant 2018YFC1406901 and by the Science, Technology and Innovation Commission of Shenzhen Municipality grant Nos. JCYJ20170817150721687 and JCYJ20170817150239127. T.L.C. was supported by an Otago University postgraduate publishing bursary. G.Z. was supported by the Lundbeckfonden (grant No. R190–2014-2827), Carlsbergfondet (grant No. CF16–0663), the Villum Foundation (grant No. 25900), and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant Nos. XDB13000000 and XDB31020000). M.T.P.G. was supported by the ERC Consolidator Grant 681396 “Extinction Genomics”.

Authors’ contributions

G.Z. developed the concept; G.Z., D.-X.Z., T.L.C., and H.P. designed the project and wrote the manuscript; L.S.A., J.L.B., M.F.B., P.D.B., T.L.C., Y.C., P.D., U.E., S.R.F., S.G., D.M.H., P.H., T.H., E.K., K.L., G.M., T.M., L.J.N., P.P., P.G.R., D.R.T., H.T., and M.J.Y. collected and/or provided samples; J.L.B., T.L.C., A.H.R., T.H., K.J., B.M., T.S., D.R.T., and G.Z. facilitated sample collection; H.P., S.R.F., M.R.E., M.-H.S.S., and G.P. undertook laboratory work. H.P., X.B., M.F., C.Z., and Z.Y. undertook the bioinformatics work; G.Z., T.L.C., H.P., D.T.K., C.-A.B., M.R.E., P.G.B., M.T.P.G., T.H., J.F.M., R.A.P., A.J.D.T., L.D.S., M.-H.S.S., and P.Q. helped design sampling and project directions. All authors contributed to the final manuscript.

Additional files

Supplementary Figure 1: Phylogenomic trees.

Supplementary Table 1: Sampling and permitting details of all penguin samples tested.

Supplementary Table 2: Assemblers and Kmer sizes used for each penguin.

Supplementary Table 3: Information of 71 avian transcriptomic samples downloaded from NCBI.

giz117_GIGA-D-19-00280_Original_Submission

giz117_GIGA-D-19-00280_Revision_1

giz117_Response_to_Reviewer_Comments_Original_Submission

giz117_Reviewer_1_Report_Original_Submission

Hyun Park -- 8/14/2019 Reviewed

giz117_Reviewer_2_Report_Original_Submission

Taras K Oleksyk, Ph.D. -- 8/26/2019 Reviewed

giz117_Supplemental_File

ACKNOWLEDGEMENTS

We thank the following: John Cockrem, Scott Flemming, Helen McConnell, Chris Rickard, Sarah Fraser, Otto Whitehead, Kyle Morrison, and Amy Van Buren for help collecting samples; Jonathan Banks, Kirsten Rodgers, and Jo Hiscock for sample information; Manuel Paredes Oyarzún and Hernán Rivera Meléndez for facilitating permits and sample collection; Lauren Tworkowski, Richard O'Rorke, and Joanna Sumner for facilitating sample collection; Adrian Smith for providing laboratory support to extract 2 DNA samples; Peter Dearden, Neil Fowke, Michael Knapp, Hoani Langsbury, Claire Porima, Nic Rawlence, Paul Scofield, Jonathan Waters, Janet Wilmshurst, and Jamie Wood for informal discussions regarding New Zealand indigenous consultation; the New Zealand Department of Conservation for facilitating New Zealand indigenous consultation and approving permits, particularly Neil Fowke and Jesse Mason for facilitating permits and/or obtaining past permit details; Brett Gartrell and Pauline Nijman for providing animal ethics details; and the China National Genebank for contributing the sequencing resources for this project. The Penguin Genome Consortium welcomes participation and collaboration in our ongoing work on the comparative and evolutionary genomics of penguins.

References

1. Ksepka DT, Bertelli S, Giannini NP. The phylogeny of the living and fossil Sphenisciformes (penguins). Cladistics. 2006;22(5):412–41.
2. Cole TL, Waters J, Shepherd LD, et al. Ancient DNA reveals that the ‘extinct’ Hunter Island penguin (Tasidyptes hunteri) is not a distinct taxon. Zool J Linn Soc. 2018;182(2):459–64.
3. Cole TL, Ksepka DT, Mitchell KJ, et al. Mitogenomes uncover extinct penguin taxa and reveal island formation as a key driver of speciation. Mol Biol Evol. 2019;36(4):784–97.
4. Challies CW, Burleigh RR. Abundance and breeding distribution of the white-flippered penguin (Eudyptula minor albosignata) on Banks Peninsula, New Zealand. Notornis. 2004;51(1):1–6.
5. Grosser S, Rawlence NJ, Anderson CNK, et al. Invader or resident? Ancient-DNA reveals rapid species turnover in New Zealand little penguins. Proc Biol Sci. 2016;283(1824):20152879.
6. Mattern T, Wilson K-J. New Zealand penguins – current knowledge and research priorities. A report compiled for Birds New Zealand. 2018. http://www.birdsnz.org.nz/wp-content/uploads/2019/06/1904-NZ-Penguin-Research-Priorities-Report-Mattern-Wilson.pdf. Accessed 11 September 2019.
7. Banks J, Van Buren A, Cherel Y, et al. Genetic evidence for three species of rockhopper penguins, Eudyptes chrysocome. Polar Biol. 2006;30(1):61–67.
8. Frugone M-J, Lowther A, Noll D, et al. .. Contrasting phylogeographic pattern among Eudyptes penguins around the Southern Ocean. Sci Rep. 2018;8(1):17481. [PMC free article] [PubMed] [Google Scholar]
9. Christidis L, Boles WE. Systematics and Taxonomy of Australian Birds. Canberra, Australia: CSIRO; 2008:98. [Google Scholar]
10. Cole TL, Rawlence NJ, Dussex N, et al. .. Ancient DNA of crested penguins: Testing for temporal genetic shifts in the world's most diverse penguin clade. Mol Phylogenet Evol. 2019;131:72–79. [PubMed] [Google Scholar]
11. Frugone M-J, López ME, Segovia NI, et al. .. More than the eye can see: Genomic insights into the drivers of genetic differentiation in Royal/Macaroni penguins across the Southern Ocean. Mol Phylogenet Evol. 2019;139:106563. [PubMed] [Google Scholar]
12. Slack KE, Jones CM, Ando T, et al. .. Early penguin fossils, plus mitochondrial genomes, calibrate avian evolution. Mol Biol Evol. 2006;23(6):1144–55. [PubMed] [Google Scholar]
13. Mayr G, Scofield RP, De Pietri VL, et al. .. A Paleocene penguin from New Zealand substantiates multiple origins of gigantism in fossil Sphenisciformes. Nat Commun. 2017;8(1):1927. [PMC free article] [PubMed] [Google Scholar]
14. Stonehouse B. The general biology and thermal balances of penguins. Adv Ecol Res. 1967;4:131–96. [Google Scholar]
15. Marchant S, Higgins PJ. Handbook of Australian, New Zealand and Antarctic Birds. Vol. 1, Pt. B Melbourne, Australia: Oxford University Press; 1990. [Google Scholar]
16. Boersma PD. Penguins as marine sentinels. Bioscience. 2008;58(7):597–607. [Google Scholar]
17. , Ropert-Coudert Y, Hindell MA, Phillips R, De Broyer C Koubbi P, Griffiths HJ, Raymond B, Udekem d'Acoz Cd', Van de Putte AP, Danis B, David B, Grant S, Gutt J, Held C, Hosie G, Huettmann F, Post A, Ropert-Coudert Y et al.., et al.., Cambridge, Scientific Committee on Antarctic Research; et al. Biogeographic patterns of birds and mammals. In: The Biogeographic Atlas of the Southern Ocean. Scientific Committee on Antarctic Research. 2014:364–87. [Google Scholar]
18. Baker AJ, Pereira SL, Haddrath OP, et al. Multiple gene evidence for expansion of extant penguins out of Antarctica due to global cooling. Proc Biol Sci. 2006;273(1582):11–17.
19. Acosta Hospitaleche C, Reguero M, Scarano A. Main pathways in the evolution of the Paleogene Antarctic Sphenisciformes. J South Am Earth Sci. 2013;43:101–11.
20. Bertelli S, Giannini NP. A phylogeny of extant penguins (Aves: Sphenisciformes) combining morphology and mitochondrial sequences. Cladistics. 2005;21(3):209–39.
21. Garcia Borboroglu P, Boersma PD. Penguins: Natural History and Conservation. Seattle, WA, USA: University of Washington Press; 2013:328.
22. Thiébot JB, Cherel Y, Trathan PN, et al. Coexistence of oceanic predators on wintering areas explained by population-scale foraging segregation in space or time. Ecology. 2012;93(1):122–30.
23. Woehler EJ, Cooper J, Croxall JP, et al. A Statistical Assessment of the Status and Trends of Antarctic and Sub-Antarctic Seabirds. Cambridge, UK: Scientific Committee on Antarctic Research; 2011.
24. Goldsmith R, Sladen WJ. Temperature regulation of some Antarctic penguins. J Physiol. 1961;157:251–62.
25. Ksepka DT, Ando T. Penguins past, present, and future: trends in the evolution of the Sphenisciformes. In: Dyke G, Kaiser G, eds. Living Dinosaurs. Oxford, UK: Wiley; 2011:155–86.
26. Watson M. Report on the Anatomy of the Spheniscidae Collected by HMS Challenger, During the Years 1873–1876. Edinburgh, UK: Neill and Co; 1883.
27. Taylor JRE. Thermal insulation of the down and feathers of pygoscelid penguin chicks and the unique properties of penguin feathers. Auk. 1986;103:160–8.
28. Sivak JG. The role of a flat cornea in the amphibious behaviour of the blackfoot penguin (Spheniscus demersus). Can J Zool. 1976;54:1341–5.
29. Sivak JG, Millodot M. Optical performance of the penguin eye in air and water. J Comp Physiol. 1977;119:241–7.
30. Bowmaker JK, Martin GR. Visual pigments and oil droplets in the penguin, Spheniscus humboldti. J Comp Physiol A Neuroethol Sens Neural Behav Physiol. 1985;156:71–77.
31. Meister W. Histological structure of the long bones of penguins. Anat Rec. 1962;143:377–87.
32. Raikow RJ, Bicanovsky L, Bledsoe AH. Forelimb joint mobility and the evolution of wing-propelled diving in birds. Auk. 1988;105:446–51.
33. Schreiweis DO. A comparative study of the appendicular musculature of penguins (Aves: Sphenisciformes). Smithsonian Contrib Zool. 1982;341:1–46.
34. Frost PGH, Siegfried WR, Greenwood PJ. Arterio-venous heat exchange systems in the Jackass penguin Spheniscus demersus. J Zool. 1975;175:231–41.
35. Groscolas R. Metabolic adaptations to fasting in emperor and king penguins. In: Davis LS, Darby JT, eds. Penguin Biology. San Diego, CA, USA: Academic Press; 1990:269–96.
36. Cherel Y, Gilles J, Handrich Y, Le Maho Y. Nutrient reserve dynamics and energetics during long-term fasting in the king penguin (Aptenodytes patagonicus). J Zool. 1994;234:1–12.
37. Groscolas R, Robin JP. Long-term fasting and re-feeding in penguins. Comp Biochem Physiol A Mol Integr Physiol. 2001;128:645–55.
38. Gauthier-Clerc M, Le Maho Y, Clerquin Y, et al. Seabird reproduction in an unpredictable environment: How King penguins provide their young chicks with food. Mar Ecol Prog Ser. 2002;237:291–300.
39. Thouzeau C, Le Maho Y, Froget G, et al. Spheniscins, avian β-defensins in preserved stomach contents of the king penguin, Aptenodytes patagonicus. J Biol Chem. 2003;278:51053–8.
40. Thomas DB, Fordyce RE. The heterothermic loophole exploited by penguins. Aust J Zool. 2008;55:317–21.
41. Thomas DB, McGoverin CM, McGraw KJ, et al. Vibrational spectroscopic analyses of unique yellow feather pigments (spheniscins) in penguins. J R Soc Interface. 2013;10(83):20121065.
42. Cairns DK. Plumage colour in pursuit-diving seabirds: Why do penguins wear tuxedos? Bird Behav. 1986;6(2):58–65.
43. Croxall JP. Energy costs of incubation and moult in petrels and penguins. J Anim Ecol. 1982;177–94.
44. Barbraud C, Weimerskirch H. Emperor penguins and climate change. Nature. 2001;411(6834):183–6.
45. Forcada J, Trathan PN, Reid K, et al. Contrasting population changes in sympatric penguin species in association with climate warming. Glob Change Biol. 2006;12(3):411–23.
46. Fretwell PT, Trathan PN. Emperors on thin ice: Three years of breeding failure at Halley Bay. Antarct Sci. 2019;31(3):133–8.
47. Trivelpiece WZ, Hinke JT, Miller AK, et al. Variability in krill biomass links harvesting and climate warming to penguin population changes in Antarctica. Proc Natl Acad Sci U S A. 2011;108(18):7625–8.
48. Lynch HJ, Naveen R, Trathan PN, et al. Spatially integrated assessment reveals widespread changes in penguin populations on the Antarctic Peninsula. Ecology. 2012;93(6):1367–77.
49. Mattern T, Meyer S, Ellenberg U, et al. Quantifying climate change impacts emphasises the importance of managing regional threats in the endangered yellow-eyed penguin. PeerJ. 2017;5:e3272.
50. Heerah K, Dias MP, Delord K, et al. Important areas and conservation sites for a community of globally threatened marine predators of the Southern Indian Ocean. Biol Conserv. 2019;234(1):192–201.
51. Li C, Zhang Y, Li J, et al. Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment. Gigascience. 2014;3(1):27.
52. Trucchi E, Gratton P, Whittington JD, et al. King penguin demography since the last glaciation inferred from genome-wide data. Proc Biol Sci. 2014;281(1787):20140528.
53. Cristofari R, Bertorelle G, Ancel A, et al. Full circumpolar migration ensures evolutionary unity in the Emperor penguin. Nat Commun. 2016;7:11842.
54. Cristofari R, Liu X, Bonadonna F, et al. Climate-driven range shifts of the king penguin in a fragmented ecosystem. Nat Clim Change. 2018;8(3):245.
55. Le Bohec C, Durant JM, Gauthier-Clerc M, et al. King penguin population threatened by Southern Ocean warming. Proc Natl Acad Sci U S A. 2008;105(7):2493–7.
56. Jenouvrier S, Caswell H, Barbraud C, et al. Demographic models and IPCC climate projections predict the decline of an emperor penguin population. Proc Natl Acad Sci U S A. 2009;106(6):1844–7.
57. Jenouvrier S, Holland M, Stroeve J, et al. Projected continent-wide declines of the emperor penguin under climate change. Nat Clim Change. 2014;4(8):715–8.
58. Boessenkool S, Austin JA, Worthy TH, et al. Relict or colonizer? Extinction and range expansion of penguins in southern New Zealand. Proc Biol Sci. 2008;276(1658):815–21.
59. Clucas GV, Dunn MJ, Dyke G, et al. A reversal of fortunes: Climate change ‘winners’ and ‘losers’ in Antarctic Peninsula penguins. Sci Rep. 2014;4:5024.
60. Younger JL, Clucas GV, Kooyman G, et al. Too much of a good thing; sea ice extent may have forced emperor penguins into refugia during the last glacial maximum. Glob Change Biol. 2015;21(6):2215–26.
61. Subramanian S, Beans-Picón G, Swaminathan SK, et al. Evidence for a recent origin of penguins. Biol Lett. 2013;9(6):20130748.
62. Gavryushkina A, Heath TA, Ksepka DT, et al. Bayesian total evidence dating reveals the recent crown radiation of penguins. Syst Biol. 2017;66(1):57–73.
63. Grosser S, Burridge CP, Peucker AJ, et al. Coalescent modelling suggests recent secondary-contact of cryptic penguin species. PLoS One. 2015;10(12):e0144966.
64. Vianna JA, Noll D, Mura-Jornet I, et al. Comparative genome-wide polymorphic microsatellite markers in Antarctic penguins through next generation sequencing. Genet Mol Biol. 2017;40(3):676–87.
65. Ramos B, González-Acuña D, Loyola DE, et al. Landscape genomics: natural selection drives the evolution of mitogenome in penguins. BMC Genomics. 2018;19:53.
66. Clucas GV, Younger JL, Kao D, et al. Dispersal in the sub-Antarctic: King penguins show remarkably little population genetic differentiation across their range. BMC Evol Biol. 2016;16(1):211.
67. Younger JL, Clucas GV, Kao D, et al. The challenges of detecting subtle population structure and its importance for the conservation of Emperor penguins. Mol Ecol. 2017;26(15):3883–97.
68. Clucas GV, Younger JL, Kao D, et al. Comparative population genomics reveals key barriers to dispersal in Southern Ocean penguins. Mol Ecol. 2018;27(23):4680–97.
69. Younger J, Emmerson L, Southwell C, et al. Proliferation of East Antarctic Adélie penguins in response to historical deglaciation. BMC Evol Biol. 2015;15(1):236.
70. Zhao H, Li J, Zhang J. Molecular evidence for the loss of three basic tastes in penguins. Curr Biol. 2015;25(4):R141–2.
71. Zhang G, Li C, Li Q, et al. Comparative genomics reveals insights into avian genome evolution and adaptation. Science. 2014;346(6215):1311–20.
72. Borges R, Khan I, Johnson WE, et al. Gene loss, adaptive evolution and the co-evolution of plumage coloration genes with opsins in birds. BMC Genomics. 2015;16:751.
73. Jarvis ED, Mirarab S, Aberer AJ, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346(6215):1320–31.
74. Grosser S, Scofield RP, Waters JM. Multivariate skeletal analyses support a taxonomic distinction between New Zealand and Australian Eudyptula penguins (Sphenisciformes: Spheniscidae). Emu. 2017;117:276–83.
75. Bi K, Linderoth T, Vanderpool D, et al. Unlocking the vault: Next-generation museum population genomics. Mol Ecol. 2013;22(24):6018–32.
76. Stiller J, Zhang G. Comparative phylogenomics, a stepping stone for bird biodiversity studies. Diversity. 2019;11(7):115.
77. Edmunds S. HiSeq 4000 sequencing protocol. protocols.io. 2018. doi:10.17504/protocols.io.q58dy9w.
78. Huang J, Liang X, Xuan Y, et al. A reference human genome dataset of the BGISEQ-500 sequencer. Gigascience. 2017;6(5):1–9.
79. Teh BT, Lim K, Yong CH, et al. The draft genome of tropical fruit durian (Durio zibethinus). Nat Genet. 2017;49:1633–41.
80. Heydari M, Miclotte G, Demeester P, et al. Evaluation of the impact of Illumina error correction tools on de novo genome assembly. BMC Bioinformatics. 2017;18:374.
81. Luo R, Liu B, Xie Y, et al. SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18.
82. Gnerre S, Maccallum I, Przybylski D, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci U S A. 2011;108(4):1513–8.
83. Weisenfeld NI, Kumar V, Shah P, et al. Direct determination of diploid genome sequences. Genome Res. 2017;27(5):757–67.
84. Simão FA, Waterhouse RM, Ioannidis P, et al. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.
85. Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015. RepeatMasker Home Page. http://www.repeatmasker.org. Accessed on 1 June 2019.
86. RepeatMasker. RepeatMasker Home Page. http://www.repeatmasker.org. Accessed on 1 June 2019.
87. Benson G. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–80.
88. Smit AFA, Hubley RR, Green PR. Open-1.0. 2008–2015. Seattle, WA, USA: Institute for Systems Biology; 2008.
89. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6(1):11.
90. Bird 10,000 Genomes (B10K) Project. http://b10k.genomics.cn.
92. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
93. She R, Chu JS, Wang K, et al. GenBlastA: Enabling BLAST to identify homologous gene sequences. Genome Res. 2009;19(1):143–9.
94. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14(5):988–95.
95. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
96. Wheeler DL, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2006;35(suppl 1):D5–12.
97. Silva GG, Dutilh BE, Matthews TD, et al. Combining de novo and reference-guided assembly with scaffold_builder. Source Code Biol Med. 2013;8(1):23.
98. Grabherr MG, Haas BJ, Yassour M, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644.
99. Boeckmann B, Bairoch A, Apweiler R, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31(1):365–70.
100. Jones P, Binns D, Chang HY, et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics. 2014;30(9):1236–40.
101. Kanehisa M, Sato Y, Furumichi M, et al. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 2018;47(D1):D590–5.
102. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: Tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
103. Kozlov AM, Aberer AJ, Stamatakis A. ExaML version 3: A tool for phylogenomic analyses on supercomputers. Bioinformatics. 2015;31(15):2577–9.
104. Liu L, Yu L, Edwards SV. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 2010;10(1):302.
105. Zhang C, Rabiee M, Sayyari E, et al. ASTRAL-III: Polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018;19(Suppl 6):153.
106. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
107. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–3.
108. Sackton TB, Grayson P, Cloutier A, et al. Convergent regulatory evolution and loss of flight in paleognathous birds. Science. 2019;364(6435):74–8.
109. Stamatakis A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
110. Paradis E, Claude J, Strimmer K. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20(2):289–90.
111. Tipene-Matua B, Henaghan M. Establishing a Māori ethical framework for genetic research with Māori. In: Henaghan M, ed. Genes, Society and the Future. Dunedin, New Zealand: Human Genome Research Project; 2007:1–44.
112. Wilcox PL, Charity JA, Roberts MR, et al. A values-based process for cross-cultural dialogue between scientists and Māori. J R Soc N Z. 2008;38:215–27.
113. Hudson M, Milne M, Reynolds P, et al. Te Ara Tika Guidelines for Māori research ethics: A framework for researchers and ethics committee members. New Zealand: Health Research Council of New Zealand; 2010.
114. Galla SJ, Buckley TR, Elshire R, et al. Building strong relationships between conservation genetics and primary industry leads to mutually beneficial genomic advances. Mol Ecol. 2016;25(21):5267–81.
115. New Zealand Biodiversity Action Plan 2016–2020. Wellington: Department of Conservation; 2016. ISBN: 978-0-478-15095-7.
116. Waitangi Tribunal. http://www.waitangitribunal.govt.nz/. Accessed on 27 July 2019.
118. Wong PB, Wiley EO, Johnson WE, et al. Tissue sampling methods and standards for vertebrate genomics. Gigascience. 2012;1(1):8.
119. New Zealand Department of Conservation. Iwi/hapū/whānau consultation. https://www.doc.govt.nz/get-involved/apply-for-permits/iwi-consultation/. Accessed on 27 July 2019.
120. Greig E. The Māori right to development and new forms of property. PhD thesis. University of Otago; 2010.
121. National Human Genome Research Institute. Reaffirmation and Extension of NHGRI Rapid Data Release Policies: Large-scale Sequencing and Other Community Resource Projects. https://www.genome.gov/10506537/reaffirmation-and-extension-of-nhgri-rapid-data-release-policies. Accessed on 27 July 2019.
122. Toronto International Data Release Workshop Authors. Prepublication data sharing. Nature. 2009;461:168–70.
123. Lindblad-Toh K, Garber M, Zuk O, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478(7370):476–82.
124. i5K Consortium. The i5K Initiative: Advancing arthropod genomics for knowledge, human health, agriculture, and the environment. J Hered. 2013;104(5):595–600.
125. Koepfli KP, Paten B, Genome 10K Community of Scientists, et al. The Genome 10K Project: A way forward. Annu Rev Anim Biosci. 2015;3(1):57–111.
126. Wang Y, Zhang C, Wang N, et al. Genetic basis of ruminant headgear and rapid antler regeneration. Science. 2019;364(6446):eaav6335.
127. Pan H, Cole T, Bi X, et al. High-coverage genomes of all extant penguin taxa. GigaScience Database. 2019. doi:10.5524/100649.

Articles from GigaScience are provided here courtesy of Oxford University Press