Parvovirus dark matter in the cloaca of wild birds

Abstract With the development of viral metagenomics and next-generation sequencing technology, more and more novel parvoviruses have been identified in recent years, including entirely new lineages. The Parvoviridae family includes a diverse group of viruses that can infect a wide variety of animals. In this study, systematic analysis was performed to identify the “dark matter” (datasets that cannot be easily attributed to known viruses) of parvoviruses and to explore their genetic diversity in cloacal swab samples from wild birds. We have tentatively defined this parvovirus “dark matter” as a highly divergent lineage in the Parvoviridae family. All parvoviruses shared several characteristics, including two major protein-coding genes and similar genome lengths. Moreover, we observed that the novel parvo-like viruses share a similar genome organization with most viruses in Parvoviridae but could not be clustered with the established subfamilies in phylogenetic analysis. We also found some new members associated with the Bidnaviridae family, which may be derived from parvoviruses. This suggests that systematic analysis of domestic and wild animal samples is necessary to explore the genetic diversity of parvoviruses and to mine for more of this potential dark matter.


Introduction
Line 55. Elaborate on what is known about pathology, prevalence, etc., of parvoviruses. =We now describe the pathogenicity of GPV and MDPV, as well as the prevalence and transmission of parvoviruses. Please see lines 65-74.
Line 57. The threshold for species demarcation is 85%, what is the threshold for genus demarcation? =NS1 proteins of members of the same genus should share at least 35-40% amino acid sequence identity with a coverage of >80% between any two members. Please see lines 53-60.
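The genus criterion quoted in this reply can be written as a simple pairwise check. The sketch below is illustrative only: the function name and the choice of 35% as the default lower bound (the reply gives a range of 35-40%) are assumptions, and the identity and coverage values are assumed to come from a pairwise NS1 protein alignment computed elsewhere.

```python
def same_genus_by_ns1(identity_pct, coverage_pct,
                      min_identity=35.0, min_coverage=80.0):
    """Check one pair of NS1 proteins against the genus criterion.

    identity_pct: amino acid identity of the pairwise NS1 alignment, in percent.
    coverage_pct: alignment coverage between the two proteins, in percent.
    min_identity: assumed lower bound; the reply states "at least 35-40%".
    min_coverage: coverage must be strictly greater than this value (>80%).
    """
    return identity_pct >= min_identity and coverage_pct > min_coverage


# Hypothetical pairs, for illustration only (not values from the study).
pairs = {
    "pair_a": (42.0, 85.0),  # meets both conditions
    "pair_b": (30.0, 90.0),  # identity too low
    "pair_c": (50.0, 80.0),  # coverage not strictly above 80%
}
for name, (identity, coverage) in pairs.items():
    print(name, same_genus_by_ns1(identity, coverage))
```

Raising `min_identity` to 40.0 applies the stricter end of the stated range.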

Materials and methods
Line 73. Change "vigorously" to the actual speed used. ="Vigorously" has been changed to "1,800 rpm". Please see line 92.
Line 74. The centrifuge speed is incorrect. =The correct centrifugal force is "15,000 × g". Please see line 93.
Line 75. "Sample collection and preparation" section does not properly explain the amount of fecal content represented in each one of the supernatants collected after preparation. =Each sample pool was prepared by combining about 0.1 mL of supernatant from each cloacal swab specimen of the same bird species. Details of each pool can be found in Supplementary Table 1.
Line 88. Please describe the threshold that was used to define "low-sequencing-quality reads". =Low-sequencing-quality reads include low-quality bases and reads, tag sequences, duplicates, etc. On most NGS platforms, base quality degrades as the run progresses, so it is common to see the quality of base calls falling towards the end of a read. We used 10 as the cutoff value to define "low-sequencing-quality reads".
Line 89. Why did the authors set the threshold to 10? Normally this value is 20. =Our previous research used 10 as the cutoff value. Thank you for your professional reminder. We will use 20 as the cutoff value in future studies.
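To make the difference between a cutoff of 10 and a cutoff of 20 concrete, the following is a minimal sketch of 3'-end quality trimming. It assumes the cutoff refers to Phred quality scores, that qualities use the common Phred+33 ASCII encoding, and that trimming simply removes bases from the read tail while they fall below the cutoff; the function names are hypothetical and this is not the trimming tool used in the study.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_low_quality_tail(seq, quals, cutoff=10):
    """Drop bases from the 3' end while their quality is below the cutoff."""
    end = len(seq)
    while end > 0 and quals[end - 1] < cutoff:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read whose quality decays towards the 3' end.
read = "ACGTACGT"
scores = [30, 30, 30, 25, 15, 12, 8, 5]
for cutoff in (10, 20):
    trimmed, _ = trim_low_quality_tail(read, scores, cutoff=cutoff)
    print(cutoff, trimmed, phred_to_error_prob(cutoff))
```

Under the Phred definition, a score of 10 corresponds to a 1 in 10 chance that a base call is wrong and a score of 20 to 1 in 100, so the higher cutoff removes more of the read tail.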
Line 91. Why did the adapters get removed at the end of the bioinformatics pipeline and not after removing the barcodes? =We generally remove adapters after trimming low-sequencing-quality tails, because we find that removing adapters before this trimming step causes more errors in de novo assembly.
Line 119. The phylogenetic analysis is missing a few important details. For instance, what does a Bayesian prior set? Were recombinant events investigated within the whole genome? =We use the parameter "Aamodelpr" to set the rate matrix for amino acid data. Please see lines 148-150. No recombination events were found in the 170 new viral genomes identified in this study.
How was the best amino acid substitution model evaluated, and which amino acid substitution model was used to construct the phylogenetic tree? =In the phylogenetic analysis, we used MrBayes to integrate over a predetermined set of fixed rate matrices, then summarized the MCMC samples and calculated the posterior probability estimate for each of these models. In our study, the posterior probability overwhelmingly supported the blosum model, so this may be the only model sampled after the burn-in phase.
Figure 1A, "unclassified RNA viruses-Shi M.2016" is not a proper taxon. Does it mean these RNA viruses are highly similar to the unclassified RNA viruses reported by Shi M? =Thank you for your professional advice. We carefully reviewed the virus taxonomic information and replotted the stacked bar graphs to show the composition of individual virus families in the bird cloacal samples. Please see Fig. 1A.
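The reply on the substitution model describes estimating each model's posterior probability from the MCMC samples. A minimal sketch of that summary step follows, assuming the sampled model at each recorded generation is available as a list of labels and that a fixed fraction of early samples is discarded as burn-in. The function name, the 25% burn-in fraction, and the toy chain are assumptions for illustration; this is not MrBayes code or output.

```python
from collections import Counter


def model_posteriors(sampled_models, burnin_fraction=0.25):
    """Estimate posterior probabilities of substitution models.

    sampled_models: model label recorded at each sampled MCMC generation.
    burnin_fraction: assumed share of initial samples discarded as burn-in.
    Returns a dict mapping each model to its post-burn-in sampling frequency.
    """
    start = int(len(sampled_models) * burnin_fraction)
    kept = sampled_models[start:]
    if not kept:
        return {}
    counts = Counter(kept)
    return {model: n / len(kept) for model, n in counts.items()}


# Hypothetical chain: other matrices are visited early, then only blosum.
chain = ["wag", "jones", "blosum", "blosum",
         "blosum", "blosum", "blosum", "blosum"]
print(model_posteriors(chain))
```

If only one model is sampled after burn-in, as the reply suggests for blosum, its estimated posterior probability under this summary is 1.0.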

Results
Line 159. The conserved PLA2 motif of VP1 was found and stated within the manuscript. Were the authors able to find conserved domains in NS1? Please refer to manuscript above for guidance with this motif. =In NS1, several conserved domains were identified, including a replication initiator domain (xxHxHxxxxx) and an SF3 helicase domain with an ATP- or GTP-binding Walker A loop (GxxxxGKT), a Walker B loop (xxxxEE), and a Walker B' loop (KxxxxGxxxxxxxK). These conserved domains are shown in Fig. 2B.
Please indicate in the manuscript whether recombination events have been found. Birds could travel long distances and are major vectors for virus spread. Did they find any highly similar parvoviruses (potential transmission) among birds from the same habitat/location, and among birds from different provinces? This could be mentioned and discussed. =No recombination events were found in the 170 new viral genomes identified in this study. Many parvoviruses were found among different birds at MES mountain, such as MW046591, MW046463 and MW046598, which grouped as unclassified Parvoviridae. Some highly similar viruses were also found in birds from different regions: for example, strains fcc107par07 (MW046633) and wiw119par01 (MW046604), both belonging to the subfamily Densovirinae, were found in birds from Jilin and Heilongjiang provinces (Supplementary Fig. 3). Future studies are needed to evaluate their potential for spillover to other species and to better understand the risks to human health. Please see lines 174-184.
In the results, archaea, bacteria, and phage … were removed. What was the percentage of virus reads before the non-viral reads were removed? And what are the proportions of RNA and DNA viruses in this study? =In total, the 228 libraries generated 483,194,686 sequence reads, of which virus sequence reads accounted for 9.62%. The total number of reads in each library has been added in column G of Supplementary Table 1. As shown in Figure 1A, the proportions of RNA virus reads and DNA virus reads were 49.06% and 34.38%, respectively.
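As a quick arithmetic check on the figures in this reply, the total read count and the viral percentage imply roughly 46.5 million virus reads. The sketch below only multiplies the two reported numbers; because 9.62% is a rounded value, the result is an approximation and not a count reported by the authors, and the function name is hypothetical.

```python
TOTAL_READS = 483_194_686   # reads from the 228 libraries, as stated in the reply
VIRAL_FRACTION = 0.0962     # 9.62% virus reads, as stated in the reply


def estimate_viral_reads(total_reads=TOTAL_READS, viral_fraction=VIRAL_FRACTION):
    """Approximate number of virus reads implied by the rounded percentage."""
    return round(total_reads * viral_fraction)


print(f"{estimate_viral_reads():,}")
```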

Discussion
Line 252. The viruses identified in this study were from cloacal swabs, please discuss whether or not cloacal swab is associated with infection within the birds. =Thank you for your suggestion, please see lines 320 to 325.