Scheduled Seminars on 9/6/2011

David Kristensen at 11:00

Systems Biology Of Phage Proteins And New Dimensions Of The Virus World Discovered Through Metagenomics
Phages are extremely active players in the global ecosystem, as well as the most abundant organisms on planet Earth, but much remains unknown about how these viruses function in their natural environments. Advances in full-genome sequencing technologies have produced a collection of hundreds of phage genomes, which allows deep insight into their genetic evolution, and metagenomics technologies seem to promise further rewarding glimpses into their lifecycles and community structures.
Recently we developed an automated approach to assemble a collection of orthologous gene clusters of double-stranded DNA phages (Phage Orthologous Groups, or POGs). This approach follows the well-known Clusters of Orthologous Groups (COGs) framework to identify sets of orthologs by examining top-ranked sequence similarities between proteins in complete genomes without the use of arbitrary similarity cutoffs, and thus represents a natural system for fast-evolving and slow-evolving proteins alike. The approach was designed to keep up with the rapid and accelerating growth of full-genome information from sequencing projects. In particular, we employ a faster graph-theoretical COG-building algorithm that vastly improves our ability to deal with larger numbers of genomes (N) by reducing the worst-case complexity from O(N^6) to O(N^3 log N). This system encompasses >2,000 groups from the almost 600 known phage genomes deposited at NCBI, and is in the process of being expanded to also include single-stranded DNA and single- and double-stranded RNA phages.
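To make the cutoff-free idea concrete, the following is a minimal Python sketch of COG-style clustering on toy data. It is an illustration under stated assumptions, not the speaker's implementation: it uses strictly symmetric best hits, ignores in-paralogs, and is the naive procedure rather than the faster O(N^3 log N) algorithm mentioned above. The function names, the (genome, protein) tuples, and the similarity scores are all hypothetical.

```python
from itertools import combinations


def best_hits(scores):
    """Map (query protein, target genome) -> top-scoring protein in that genome.

    `scores` maps (query, target) -> similarity; proteins are (genome, name) tuples.
    No score cutoff is applied: only the ranking within each genome matters.
    """
    best = {}
    for (query, target), score in scores.items():
        if query[0] == target[0]:
            continue  # skip within-genome hits (in-paralogs are ignored in this sketch)
        key = (query, target[0])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}


def symmetric_best_hits(best):
    """Keep pairs of proteins that are each other's best hit across two genomes."""
    edges = set()
    for (query, _), target in best.items():
        if best.get((target, query[0])) == query:
            edges.add(frozenset((query, target)))
    return edges


def triangles(edges):
    """Find triples of proteins that are pairwise symmetric best hits."""
    neighbors = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = set()
    for edge in edges:
        a, b = tuple(edge)
        # Edges only link different genomes, so a, b, c come from three genomes.
        for c in neighbors[a] & neighbors[b]:
            found.add(frozenset((a, b, c)))
    return found


def merge_triangles(tris):
    """Merge triangles that share a side (two proteins) using union-find."""
    tris = list(tris)
    parent = list(range(len(tris)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    side_owner = {}
    for i, tri in enumerate(tris):
        for side in combinations(sorted(tri), 2):
            if side in side_owner:
                parent[find(i)] = find(side_owner[side])
            else:
                side_owner[side] = i
    groups = {}
    for i, tri in enumerate(tris):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(group) for group in groups.values())


def build_clusters(scores):
    """Run the full toy pipeline: best hits -> symmetric pairs -> merged triangles."""
    return merge_triangles(triangles(symmetric_best_hits(best_hits(scores))))


# Hypothetical similarity scores between proteins of four toy genomes A-D.
_pairs = {
    (("A", "a1"), ("B", "b1")): 90, (("A", "a1"), ("C", "c1")): 80,
    (("B", "b1"), ("C", "c1")): 85, (("B", "b1"), ("D", "d1")): 70,
    (("C", "c1"), ("D", "d1")): 75, (("A", "a2"), ("B", "b2")): 60,
}
toy_scores = {}
for (p, q), s in _pairs.items():
    toy_scores[(p, q)] = s  # record each hit in both directions
    toy_scores[(q, p)] = s

clusters = build_clusters(toy_scores)
print(clusters)
```

In this toy input, the two triangles share the side b1-c1 and therefore merge into a single four-genome group, while the a2-b2 pair is left out because it never forms a triangle across three genomes. Because only the rank of each hit is used, a weakly conserved family and a strongly conserved one are grouped by the same rule.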
Using this approach, we found that more than half of the POGs have no or very few evolutionary connections to their cellular hosts, indicating that dsDNA phages combine the ability to share and transduce host genes with the ability to maintain a large fraction of unique, phage-specific genes. Such genes are useful for targeted research strategies, for example as diagnostic indicators or as fundamental units of systems biology studies. We employed this set of dsDNA phage-specific genes to probe the composition of several oceanic metagenome samples. Although virus-enriched samples do contain more homologous matches to phage-specific POGs than a full metagenomic sample that also contains cellular DNA, the total gene repertoire of the marine DNA virome is dramatically different from that of known bacteriophages. In particular, it is dominated by rare genes, many of which might be contained within virus-like entities such as gene transfer agents (GTAs) rather than true viruses. This result might suggest a necessity for a radical re-thinking of what constitutes the "virus world", since the major component of viromes could be GTAs that encapsidate bacterial and archaeal genes.
We are also using the POGs to investigate the evolutionary connections between the viruses that infect prokaryotes and those that infect eukaryotes, which are not currently thought to be related by descent from a single common ancestor. Defining the nature and scope of these relationships will shed light on the ancient origins of these two broad groups of viruses, and may have implications for the origins of cellular life forms as well.
