Search results

Items: 1 to 20 of 27

1.
Bioinformatics. 2018 Nov 5. doi: 10.1093/bioinformatics/bty922. [Epub ahead of print]

High-Complexity Regions in Mammalian Genomes are Enriched for Developmental Genes.

Author information

1
RWTH Aachen University.
2
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany.
3
Department of Mathematical Stochastics, Freiburg University.
4
Buchenallee 8, 24306 Plön, Germany.

Abstract

Motivation:

Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a previously published variant of the Lempel-Ziv complexity, the match complexity, Cm, and augment it by deriving its null distribution for random sequences. We then apply Cm to the human and mouse genomes to investigate the relationship between sequence complexity and function.

Results:

We implemented Cm in the program macle and show through simulation that the newly derived null distribution of Cm is accurate. This allows us to delineate high-complexity regions in the human and mouse genomes. Using our program macle2go, we find that these regions are two-fold enriched for genes. Moreover, the genes contained in these regions are more than 10-fold enriched for developmental functions.
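As a rough illustration of the Lempel-Ziv idea that the Motivation refers to, the sketch below counts classic Lempel-Ziv (1976-style) factors with a naive search. It is not the match complexity Cm and not the algorithm used in macle; the function name lz_factors is hypothetical and chosen only for this example. Repetitive sequences decompose into few factors, unique sequences into many.

```python
def lz_factors(seq: str) -> list[str]:
    """Split seq into Lempel-Ziv (1976-style) factors.

    Each factor is the longest prefix of the remaining sequence that also
    starts at an earlier position (overlap allowed), plus one further
    character when one is left. Naive search, for illustration only.
    """
    factors = []
    i, n = 0, len(seq)
    while i < n:
        length = 0
        # Extend while seq[i:i+length+1] also occurs starting before i.
        while i + length < n and seq.find(seq[i:i + length + 1], 0, i + length) != -1:
            length += 1
        factors.append(seq[i:i + length + 1])
        i += length + 1
    return factors


# A repeated block is absorbed into a single long factor.
print(lz_factors("acgtacgt"))
```

Dividing the number of factors by the sequence length gives a crude per-site complexity; a production tool would use an index structure rather than repeated substring searches.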

Availability:

Source code for macle and macle2go is available from www.github.com/evolbioinf/macle and www.github.com/evolbioinf/macle2go, respectively; Cm browser tracks from guanine.evolbio.mpg.de/complexity.

2.
Bioinformatics. 2016 Aug 15;32(16):2554-5. doi: 10.1093/bioinformatics/btw195. Epub 2016 Apr 13.

hotspot: software to support sperm-typing for investigating recombination hotspots.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany.

Abstract

MOTIVATION:

In many organisms, including humans, recombination clusters within recombination hotspots. The standard method for de novo detection of recombinants at hotspots is sperm typing. This relies on allele-specific PCR at single nucleotide polymorphisms. Designing allele-specific primers by hand is time-consuming. We have therefore written a package to support hotspot detection and analysis.

RESULTS:

hotspot consists of four programs: asp looks up SNPs and designs allele-specific primers; aso constructs allele-specific oligos for mapping recombinants; xov implements a maximum-likelihood method for estimating the crossover rate; six, finally, simulates typing data.

AVAILABILITY AND IMPLEMENTATION:

hotspot is written in C. Sources are freely available under the GNU General Public License from http://github.com/evolbioinf/hotspot/

CONTACT:

haubold@evolbio.mpg.de

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID:
27153632
PMCID:
PMC4978934
DOI:
10.1093/bioinformatics/btw195
[Indexed for MEDLINE]
Free PMC Article
3.
Life (Basel). 2016 Mar 7;6(1). pii: E11. doi: 10.3390/life6010011.

Support Values for Genome Phylogenies.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany. kloetzl@evolbio.mpg.de.
2
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany. haubold@evolbio.mpg.de.

Abstract

We have recently developed a distance metric for efficiently estimating the number of substitutions per site between unaligned genome sequences. These substitution rates are called "anchor distances" and can be used for phylogeny reconstruction. Most phylogenies come with bootstrap support values, which are computed by resampling with replacement columns of homologous residues from the original alignment. Unfortunately, this method cannot be applied to anchor distances, as they are based on approximate pairwise local alignments rather than the full multiple sequence alignment necessary for the classical bootstrap. We explore two alternatives: pairwise bootstrap and quartet analysis, which we compare to classical bootstrap. With simulated sequences and 53 human primate mitochondrial genomes, pairwise bootstrap gives better results than quartet analysis. However, when applied to 29 E. coli genomes, quartet analysis comes closer to the classical bootstrap.

KEYWORDS:

bootstrap; distance matrix; phylogeny; quartet analysis; support value
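The classical bootstrap described in the abstract above resamples alignment columns with replacement. A minimal sketch of one replicate follows, assuming the alignment is given as a list of equal-length strings; the function name bootstrap_alignment is hypothetical. The pairwise bootstrap and quartet analysis explored in the paper are not shown here.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Return one bootstrap replicate: columns resampled with replacement."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same picks to every row so that columns stay intact.
    return ["".join(row[c] for c in picks) for row in alignment]


aln = ["ACGT", "ACGA", "TCGA"]
replicate = bootstrap_alignment(aln, random.Random(1))
print(replicate)
```

Support values would then come from building a tree for each of many replicates and counting how often each clade recurs.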

4.
PLoS One. 2015 Aug 12;10(8):e0133988. doi: 10.1371/journal.pone.0133988. eCollection 2015.

Social exclusion changes histone modifications H3K4me3 and H3K27ac in liver tissue of wild house mice.

Author information

1
Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
2
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany.

Abstract

Wild house mice form social hierarchies with aggressive males defending territories, in which females, young mice and submissive adult males share nests. In contrast, socially excluded males are barred from breeding groups, have numerous bite wounds and patches of thinning fur. Since their feeding times are often disrupted, we investigated whether social exclusion leads to changes in epigenetic marks of metabolic genes in liver tissue. We used chromatin immunoprecipitation and quantitative PCR to measure enrichment of two activating histone marks at 15 candidate loci. The epigenetic profiles of healthy males sampled from nest boxes differed significantly from the profiles of ostracized males caught outside of nests and showing bite wounds indicative of social exclusion. Enrichment of histone-3 lysine-4 trimethylation (H3K4me3) changed significantly at genes Cyp4a14, Gapdh, Nr3c1, Pck1, Ppara, and Sqle. Changes at histone-3 lysine-27 acetylation (H3K27ac) marks were detected at genes Fasn, Nr3c1, and Plin5. A principal components analysis separated the socialized from the ostracized mice. This was independent of body weight for the H3K4me3 mark, and partially dependent for H3K27ac. There was no separation, however, between healthy males that had been sampled from two different nests. A hierarchical cluster analysis also separated the two phenotypes, which was independent of body weight for both markers. Our study shows that a period of social exclusion during adult life leads to quantitative changes in histone modification patterns in mouse liver tissue. Similar epigenetic changes might occur during the development of stress-induced metabolic disorders in humans.

PMID:
26267652
PMCID:
PMC4534140
DOI:
10.1371/journal.pone.0133988
[Indexed for MEDLINE]
Free PMC Article
5.
Bioinformatics. 2015 Apr 15;31(8):1169-75. doi: 10.1093/bioinformatics/btu815. Epub 2014 Dec 10.

andi: fast and accurate estimation of evolutionary distances between closely related genomes.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, 24306 Plön, Germany, Institute for Neuro- and Bioinformatics, Lübeck University, 23562 Lübeck, Germany and Mathematical Stochastics, Mathematical Institute, Freiburg University, Germany.
2
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, 24306 Plön, Germany, Institute for Neuro- and Bioinformatics, Lübeck University, 23562 Lübeck, Germany and Mathematical Stochastics, Mathematical Institute, Freiburg University, Germany.

Abstract

MOTIVATION:

A standard approach to classifying sets of genomes is to calculate their pairwise distances. This is difficult for large samples. We have therefore developed an algorithm for rapidly computing the evolutionary distances between closely related genomes.

RESULTS:

Our distance measure is based on ungapped local alignments that we anchor through pairs of maximal unique matches of a minimum length. These exact matches can be looked up efficiently using enhanced suffix arrays and our implementation requires approximately only 1 s and 45 MB RAM/Mbase analysed. The pairing of matches distinguishes non-homologous from homologous regions leading to accurate distance estimation. We show this by analysing simulated data and genome samples ranging from 29 Escherichia coli/Shigella genomes to 3085 genomes of Streptococcus pneumoniae.

AVAILABILITY AND IMPLEMENTATION:

We have implemented the computation of anchor distances in the multithreaded UNIX command-line program andi for ANchor DIstances. C sources and documentation are posted at http://github.com/evolbioinf/andi/

CONTACT:

haubold@evolbio.mpg.de

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID:
25504847
DOI:
10.1093/bioinformatics/btu815
[Indexed for MEDLINE]
6.
Genetics. 2014 Sep;198(1):269-81. doi: 10.1534/genetics.114.166843. Epub 2014 Jun 19.

Genome-wide linkage-disequilibrium profiles from single individuals.

Author information

1
Department of Biology, Indiana University, Bloomington, Indiana 47401.
2
Faculty of Mathematics and Physics, University of Freiburg, Freiburg 79104, Germany.
3
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön 24306, Germany.

Abstract

Although the analysis of linkage disequilibrium (LD) plays a central role in many areas of population genetics, the sampling variance of LD is known to be very large with high sensitivity to numbers of nucleotide sites and individuals sampled. Here we show that a genome-wide analysis of the distribution of heterozygous sites within a single diploid genome can yield highly informative patterns of LD as a function of physical distance. The proposed statistic, the correlation of zygosity, is closely related to the conventional population-level measure of LD, but is agnostic with respect to allele frequencies and hence likely less prone to outlier artifacts. Application of the method to several vertebrate species leads to the conclusion that >80% of recombination events are typically resolved by gene-conversion-like processes unaccompanied by crossovers, with the average lengths of conversion patches being on the order of one to several kilobases in length. Thus, contrary to common assumptions, the recombination rate between sites does not scale linearly with distance, often even up to distances of 100 kb. In addition, the amount of LD between sites separated by <200 bp is uniformly much greater than can be explained by the conventional neutral model, possibly because of the nonindependent origin of mutations within this spatial scale. These results raise questions about the application of conventional population-genetic interpretations to LD on short spatial scales and also about the use of spatial patterns of LD to infer demographic histories.

KEYWORDS:

gene conversion; linkage disequilibrium; population genomics; recombination

PMID:
24948778
PMCID:
PMC4174938
DOI:
10.1534/genetics.114.166843
[Indexed for MEDLINE]
Free PMC Article
7.
PLoS One. 2014 May 21;9(5):e97568. doi: 10.1371/journal.pone.0097568. eCollection 2014.

Genome-wide quantitative analysis of histone H3 lysine 4 trimethylation in wild house mouse liver: environmental change causes epigenetic plasticity.

Author information

1
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany.
2
Institute for Evolution and Ecology, University of Tübingen, Tübingen, Germany.
3
Cologne Center for Genomics, University of Cologne, Köln, Germany.

Abstract

In mammals, exposure to toxic or disease-causing environments can change epigenetic marks that are inherited independently of the intrauterine environment. Such inheritance of molecular phenotypes may be adaptive. However, studies demonstrating molecular evidence for epigenetic inheritance have so far relied on extreme treatments, and are confined to inbred animals. We therefore investigated whether epigenomic changes could be detected after a non-drastic change in the environment of an outbred organism. We kept two populations of wild-caught house mice (Mus musculus domesticus) for several generations in semi-natural enclosures on either standard diet and light cycle, or on an energy-enriched diet with longer daylight to simulate summer. As epigenetic marker for active chromatin we quantified genome-wide histone-3 lysine-4 trimethylation (H3K4me3) from liver samples by chromatin immunoprecipitation and high-throughput sequencing as well as by quantitative polymerase chain reaction. The treatment caused a significant increase of H3K4me3 at metabolic genes such as lipid and cholesterol regulators, monooxygenases, and a bile acid transporter. In addition, genes involved in immune processes, cell cycle, and transcription and translation processes were also differently marked. When we transferred young mice of both populations to cages and bred them under standard conditions, most of the H3K4me3 differences were lost. The few loci with stable H3K4me3 changes did not cluster in metabolic functional categories. This is, to our knowledge, the first quantitative study of an epigenetic marker in an outbred mammalian organism. We demonstrate genome-wide epigenetic plasticity in response to a realistic environmental stimulus. In contrast to disease models, the bulk of the epigenomic changes we observed were not heritable.

PMID:
24849289
PMCID:
PMC4029994
DOI:
10.1371/journal.pone.0097568
[Indexed for MEDLINE]
Free PMC Article
8.
Brief Bioinform. 2014 May;15(3):407-18. doi: 10.1093/bib/bbt083. Epub 2013 Nov 29.

Alignment-free phylogenetics and population genetics.

Author information

1
Corresponding author. Bernhard Haubold. haubold@evolbio.mpg.de.

Abstract

Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on comparative data, today usually DNA sequences. These have become so plentiful that alignment-free sequence comparison is of growing importance in the race between scientists and sequencing machines. In phylogenetics, efficient distance computation is the major contribution of alignment-free methods. A distance measure should reflect the number of substitutions per site, which underlies classical alignment-based phylogeny reconstruction. Alignment-free distance measures are either based on word counts or on match lengths, and I apply examples of both approaches to simulated and real data to assess their accuracy and efficiency. While phylogeny reconstruction is based on the number of substitutions, in population genetics, the distribution of mutations along a sequence is also considered. This distribution can be explored by match lengths, thus opening the prospect of alignment-free population genomics.

KEYWORDS:

match length; mutation distance; phylogenetics; population genetics; suffix tree

PMID:
24291823
DOI:
10.1093/bib/bbt083
[Indexed for MEDLINE]
9.
Bioinformatics. 2013 Dec 15;29(24):3121-7. doi: 10.1093/bioinformatics/btt550. Epub 2013 Sep 23.

An alignment-free test for recombination.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, 24306 Plön, Institute for Neuro- and Bioinformatics, Lübeck University, 23562 Lübeck and Mathematical Stochastics, Mathematical Institute, Freiburg University, 79104 Freiburg, Germany.

Abstract

MOTIVATION:

"Why recombination?" is one of the central questions in biology. This has led to a host of methods for quantifying recombination from sequence data. These methods are usually based on aligned DNA sequences. Here, we propose an efficient alignment-free alternative.

RESULTS:

Our method is based on the distribution of match lengths, which we look up using enhanced suffix arrays. By eliminating the alignment step, the test becomes fast enough for application to whole bacterial genomes. Using simulations we show that our test has power similar to that of established tests when applied to long pairs of sequences. When applied to 58 genomes of Escherichia coli, we pick up the strongest recombination signal from a 125 kb horizontal gene transfer engineered 20 years ago.

AVAILABILITY AND IMPLEMENTATION:

We have implemented our method in the command-line program rush. Its C sources and documentation are available under the GNU General Public License from http://guanine.evolbio.mpg.de/rush/.

PMID:
24064419
PMCID:
PMC5994939
DOI:
10.1093/bioinformatics/btt550
[Indexed for MEDLINE]
Free PMC Article
10.
PLoS Pathog. 2013;9(7):e1003503. doi: 10.1371/journal.ppat.1003503. Epub 2013 Jul 25.

Genomic analysis of the Kiwifruit pathogen Pseudomonas syringae pv. actinidiae provides insight into the origins of an emergent plant disease.

Author information

1
New Zealand Institute for Advanced Study and Allan Wilson Centre, Massey University, Auckland, New Zealand.

Erratum in

  • PLoS Pathog. 2013;9(9). doi:10.1371/annotation/af157ddc-200a-4105-b243-3f01251cc677. Vanneste, Joel [corrected to Vanneste, Joel L].

Abstract

The origins of crop diseases are linked to domestication of plants. Most crops were domesticated centuries, even millennia, ago, thus limiting opportunity to understand the concomitant emergence of disease. Kiwifruit (Actinidia spp.) is an exception: domestication began in the 1930s with outbreaks of canker disease caused by P. syringae pv. actinidiae (Psa) first recorded in the 1980s. Based on SNP analyses of two circularized and 34 draft genomes, we show that Psa is comprised of distinct clades exhibiting negligible within-clade diversity, consistent with disease arising by independent samplings from a source population. Three clades correspond to their geographical source of isolation; a fourth, encompassing the Psa-V lineage responsible for the 2008 outbreak, is now globally distributed. Psa has an overall clonal population structure; however, genomes carry a marked signature of within-pathovar recombination. SNP analysis of Psa-V reveals hundreds of polymorphisms; however, most reside within PPHGI-1-like conjugative elements whose evolution is unlinked to the core genome. Removal of SNPs due to recombination yields an uninformative (star-like) phylogeny consistent with diversification of Psa-V from a single clone within the last ten years. Growth assays provide evidence of cultivar specificity, with rapid systemic movement of Psa-V in Actinidia chinensis. Genomic comparisons show a dynamic genome with evidence of positive selection on type III effectors and other candidate virulence genes. Each clade has highly varied complements of accessory genes encoding effectors and toxins with evidence of gain and loss via multiple genetic routes. Genes with orthologs in vascular pathogens were found exclusively within Psa-V. Our analyses capture a pathogen in the early stages of emergence from a predicted source population associated with wild Actinidia species. In addition to candidate genes as targets for resistance breeding programs, our findings highlight the importance of the source population as a reservoir of new disease.

PMID:
23935484
PMCID:
PMC3723570
DOI:
10.1371/journal.ppat.1003503
[Indexed for MEDLINE]
Free PMC Article
11.
G3 (Bethesda). 2012 Aug;2(8):883-9. doi: 10.1534/g3.112.002527. Epub 2012 Aug 1.

Alignment-free population genomics: an efficient estimator of sequence diversity.

Author information

1
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany. haubold@evolbio.mpg.de

Abstract

Comparative sequencing contributes critically to the functional annotation of genomes. One prerequisite for successful analysis of the increasingly abundant comparative sequencing data is the availability of efficient computational tools. We present here a strategy for comparing unaligned genomes based on a coalescent approach combined with advanced algorithms for indexing sequences. These algorithms are particularly efficient when analyzing large genomes, as their run time ideally grows only linearly with sequence length. Using this approach, we have derived and implemented a maximum-likelihood estimator of the average number of mismatches per site between two closely related sequences, π. By allowing for fluctuating coalescent times, we are able to improve a previously published alignment-free estimator of π. We show through simulation that our new estimator is fast and accurate even with moderate recombination (ρ ≤ π). To demonstrate its applicability to real data, we compare the unaligned genomes of Drosophila persimilis and D. pseudoobscura. In agreement with previous studies, our sliding window analysis locates the global divergence minimum between these two genomes to the pericentromeric region of chromosome 3.

KEYWORDS:

Drosophila; alignment-free; genetic diversity; match length distribution; maximum-likelihood

PMID:
22908037
PMCID:
PMC3411244
DOI:
10.1534/g3.112.002527
[Indexed for MEDLINE]
Free PMC Article
12.
Nat Commun. 2012 Jun 26;3:919. doi: 10.1038/ncomms1930.

Emergence of stable polymorphisms driven by evolutionary games between mutants.

Author information

1
Evolutionary Theory Group, Max-Planck-Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany.

Abstract

Under neutrality, polymorphisms are maintained through the balance between mutation and drift. Under selection, a variety of mechanisms may be involved in the maintenance of polymorphisms, for example, sexual selection or host-parasite coevolution on the population level or heterozygote advantage in diploid individuals. Here we address the emergence of polymorphisms in a population of interacting haploid individuals. In our model, each mutation generates a new evolutionary game characterized by a payoff matrix with an additional row and an additional column. Hence, in general, the fitness of new mutations is frequency-dependent rather than constant. This dynamical process is distinct from the sequential fixation of advantageous traits and naturally leads to the emergence of polymorphisms under selection. It causes substantially higher diversity than observed under the established models of neutral or frequency-independent selection. Our framework allows for the coexistence of an arbitrary number of types, but predicts an intermediate average diversity.
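To make the payoff-matrix picture in the abstract above concrete, the sketch below extends a matrix by one row and one column for a new mutant and computes frequency-dependent fitness. It assumes that a type's fitness is its average payoff against the current population, which is a common convention in evolutionary game theory but is not stated in the abstract; the function names and the example payoff values are hypothetical.

```python
def fitnesses(payoff: list[list[float]], freqs: list[float]) -> list[float]:
    """Fitness of each type as average payoff: f_i = sum_j payoff[i][j] * freqs[j]."""
    return [sum(a * x for a, x in zip(row, freqs)) for row in payoff]


def add_mutant(payoff: list[list[float]], new_row: list[float],
               new_col: list[float]) -> list[list[float]]:
    """Return the payoff matrix extended by one row and one column.

    new_row: the mutant's payoffs against every type, itself included (n + 1 values).
    new_col: each resident type's payoff against the mutant (n values).
    """
    n = len(payoff)
    if len(new_row) != n + 1 or len(new_col) != n:
        raise ValueError("dimension mismatch")
    extended = [row + [c] for row, c in zip(payoff, new_col)]
    extended.append(list(new_row))
    return extended


# One resident type; a mutant arrives with illustrative (made-up) payoffs.
game = add_mutant([[1.0]], new_row=[2.0, 0.5], new_col=[1.5])
print(fitnesses(game, [0.5, 0.5]))
```

Because each fitness depends on freqs, the advantage of a mutant changes as it spreads, which is what distinguishes this setting from constant selection.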

PMID:
22735447
PMCID:
PMC3621454
DOI:
10.1038/ncomms1930
[Indexed for MEDLINE]
Free PMC Article
13.
Mob Genet Elements. 2011 Sep;1(3):230-235. Epub 2011 Sep 1.

Alignment-free detection of horizontal gene transfer between closely related bacterial genomes.

Author information

1
Faculty of Electrical Engineering and Computing; Department of Applied Computing; University of Zagreb; Zagreb, Croatia.

Abstract

Bacterial epidemics are often caused by strains that have acquired their increased virulence through horizontal gene transfer. Due to this association with disease, the detection of horizontal gene transfer continues to receive attention from microbiologists and bioinformaticians alike. Most software for detecting transfer events is based on alignments of sets of genes or of entire genomes. But despite great advances in the design of algorithms and computer programs, genome alignment remains computationally challenging. We have therefore developed an alignment-free algorithm for rapidly detecting horizontal gene transfer between closely related bacterial genomes. Our implementation of this algorithm is called alfy for "ALignment Free local homologY" and is freely available from http://guanine.evolbio.mpg.de/alfy/. In this comment we demonstrate the application of alfy to the genomes of Staphylococcus aureus. We also argue that, contrary to popular belief and in spite of increasing computer speed, algorithmic optimization is becoming more, not less, important if genome data continues to accumulate at the present rate.

14.
PLoS One. 2011;6(5):e18155. doi: 10.1371/journal.pone.0018155. Epub 2011 May 26.

Estimating parameters of speciation models based on refined summaries of the joint site-frequency spectrum.

Author information

1
Department of Biology II, Section of Evolutionary Biology, LMU University of Munich, Planegg-Martinsried, Germany. tellier@biologie.uni-muenchen.de

Abstract

Understanding the processes and conditions under which populations diverge to give rise to distinct species is a central question in evolutionary biology. Since recently diverged populations have high levels of shared polymorphisms, it is challenging to distinguish between recent divergence with no (or very low) inter-population gene flow and older splitting events with subsequent gene flow. Recently published methods to infer speciation parameters under the isolation-migration framework are based on summarizing polymorphism data at multiple loci in two species using the joint site-frequency spectrum (JSFS). We have developed two improvements of these methods based on a more extensive use of the JSFS classes of polymorphisms for species with high intra-locus recombination rates. First, using a likelihood based method, we demonstrate that taking into account low-frequency polymorphisms shared between species significantly improves the joint estimation of the divergence time and gene flow between species. Second, we introduce a local linear regression algorithm that considerably reduces the computational time and allows for the estimation of unequal rates of gene flow between species. We also investigate which summary statistics from the JSFS allow the greatest estimation accuracy for divergence time and migration rates for low (around 10) and high (around 100) numbers of loci. Focusing on cases with low numbers of loci and high intra-locus recombination rates we show that our methods for the estimation of divergence time and migration rates are more precise than existing approaches.

PMID:
21637331
PMCID:
PMC3102651
DOI:
10.1371/journal.pone.0018155
[Indexed for MEDLINE]
Free PMC Article
15.
Bioinformatics. 2011 Jun 1;27(11):1466-72. doi: 10.1093/bioinformatics/btr176. Epub 2011 Apr 6.

Alignment-free detection of local similarity among viral and bacterial genomes.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, 24306 Plön, Germany.

Abstract

MOTIVATION:

Bacterial and viral genomes are often affected by horizontal gene transfer observable as abrupt switching in local homology. In addition to the resulting mosaic genome structure, they frequently contain regions not found in close relatives, which may play a role in virulence mechanisms. Due to this connection to medical microbiology, there are numerous methods available to detect horizontal gene transfer. However, these are usually aimed at individual genes and viral genomes rather than the much larger bacterial genomes. Here, we propose an efficient alignment-free approach to describe the mosaic structure of viral and bacterial genomes, including their unique regions.

RESULTS:

Our method is based on the lengths of exact matches between pairs of sequences. Long matches indicate close homology, short matches more distant homology or none at all. These exact match lengths can be looked up efficiently using an enhanced suffix array. Our program implementing this approach, alfy (ALignment-Free local homologY), efficiently and accurately detects the recombination break points in simulated DNA sequences and among recombinant HIV-1 strains. We also apply alfy to Escherichia coli genomes where we detect new evidence for the hypothesis that strains pathogenic in poultry can infect humans.

AVAILABILITY:

alfy is written in standard C and its source code is available under the GNU General Public License from http://guanine.evolbio.mpg.de/alfy/. The software package also includes documentation and example data.
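The exact match lengths described in the Results above can be illustrated with a naive search. This sketch is not alfy's implementation, which according to the abstract uses an enhanced suffix array; the function name match_lengths is hypothetical. For each position in a query it reports the length of the longest exact match starting there that occurs anywhere in a subject sequence.

```python
def match_lengths(query: str, subject: str) -> list[int]:
    """Length of the longest exact match starting at each query position.

    Naive repeated substring search, for illustration only; an index such
    as a suffix array would be used for genome-scale data.
    """
    lengths = []
    for i in range(len(query)):
        length = 0
        # Grow the match while the extended substring still occurs in subject.
        while i + length < len(query) and query[i:i + length + 1] in subject:
            length += 1
        lengths.append(length)
    return lengths


print(match_lengths("ACGTT", "ACGA"))
```

Comparing such profiles against several subjects along a sliding window would point to the subject with the longest matches in each region, which is the intuition behind detecting a switch in local homology.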

PMID:
21471011
DOI:
10.1093/bioinformatics/btr176
[Indexed for MEDLINE]
16.
Bioinformatics. 2011 Feb 15;27(4):449-55. doi: 10.1093/bioinformatics/btq689. Epub 2010 Dec 14.

Alignment-free estimation of nucleotide diversity.

Author information

1
Department of Evolutionary Genetics, Albert-Ludwigs University, Freiburg, Germany.

Abstract

MOTIVATION:

Sequencing capacity is currently growing more rapidly than CPU speed, leading to an analysis bottleneck in many genome projects. Alignment-free sequence analysis methods tend to be more efficient than their alignment-based counterparts. They may, therefore, be important in the long run for keeping sequence analysis abreast with sequencing.

RESULTS:

We derive and implement an alignment-free estimator of the number of pairwise mismatches. Our implementation of this estimator, pim, is based on an enhanced suffix array and inherits the superior time and memory efficiency of this data structure. Simulations demonstrate that the estimator is accurate if mutations are distributed randomly along the chromosome. While real data often deviate from this ideal, the estimator remains useful for identifying regions of low genetic diversity using a sliding window approach. We demonstrate this by applying it to the complete genomes of 37 strains of Drosophila melanogaster, and to the genomes of two closely related Drosophila species, D. simulans and D. sechellia. In both cases, we detect the diversity minimum and discuss its biological implications.

PMID:
21156730
DOI:
10.1093/bioinformatics/btq689
[Indexed for MEDLINE]
17.
Mol Ecol. 2010 Mar;19 Suppl 1:277-84. doi: 10.1111/j.1365-294X.2009.04482.x.

mlRho - a program for estimating the population mutation and recombination rates from shotgun-sequenced diploid genomes.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany. haubold@evolbio.mpg.de

Abstract

Improvements in sequencing technology over the past 5 years are leading to routine application of shotgun sequencing in the fields of ecology and evolution. However, the theory to estimate evolutionary parameters from these data is still being worked out. Here we present an extension and implementation of part of this theory, mlRho. This program can efficiently compute the following three maximum likelihood estimators based on shotgun sequence data obtained from single diploid individuals: the population mutation rate (4N(e)mu), the sequencing error rate, and the population recombination rate (4N(e)c). We demonstrate the accuracy of mlRho by applying it to simulated data sets. In addition, we analyse the genomes of the sea squirt Ciona intestinalis and the water flea Daphnia pulex. Ciona intestinalis is an obligate outcrosser, while D. pulex is a cyclic parthenogen, and we discuss how these contrasting life histories are reflected in our parameter estimates. The program mlRho is freely available from http://guanine.evolbio.mpg.de/mlRho.

PMID:
20331786
PMCID:
PMC4870015
DOI:
10.1111/j.1365-294X.2009.04482.x
[Indexed for MEDLINE]
Free PMC Article
18.
Mol Ecol. 2010 Mar;19 Suppl 1:162-75. doi: 10.1111/j.1365-294X.2009.04471.x.

Nucleotide divergence vs. gene expression differentiation: comparative transcriptome sequencing in natural isolates from the carrion crow and its hybrid zone with the hooded crow.

Author information

1
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, Plön, Germany. wolf@evolbio.mpg.de

Abstract

Recent advances in sequencing technology promise to provide new strategies for studying population differentiation and speciation phenomena in their earliest phases. We focus here on the black carrion crow (Corvus [corone] corone), which forms a zone of hybridization and overlap with the grey-coated hooded crow (Corvus [corone] cornix). However, although these semispecies are taxonomically distinct, previous analyses based on several types of genetic markers did not reveal significant molecular differentiation between them. We here corroborate this result with sequence data obtained from a set of 25 nuclear intronic loci. Thus, the system represents a case of a very early phase of species divergence that requires new molecular approaches for its description. We have therefore generated RNAseq expression profiles using barcoded massively parallel pyrosequencing of brain mRNA from six individuals of the carrion crow and five individuals from a hybrid zone with the hooded crow. We obtained 856 675 reads from two runs, with an average read length of 270 nt and coverage of 8.44. Reads were assembled de novo into 19 552 contigs, 70% of which could be assigned to annotated genes in chicken and zebra finch. This resulted in a total of 7637 orthologous genes and a core set of 1301 genes that could be compared across all individuals. We find a clear clustering of expression profiles for the pure carrion crow animals and dispersed profiles for the animals from the hybrid zone. These results suggest that gene expression differences may indeed be a sensitive indicator of initial species divergence.

[Indexed for MEDLINE]
19.
Bioinformatics. 2009 Dec 15;25(24):3221-7. doi: 10.1093/bioinformatics/btp590. Epub 2009 Oct 13.

Efficient estimation of pairwise distances between genomes.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, 24306 Plön, Germany.

Abstract

MOTIVATION:

Genome comparison is central to contemporary genomics and typically relies on sequence alignment. However, genome-wide alignments are difficult to compute. We have, therefore, recently developed an accurate alignment-free estimator of the number of substitutions per site based on the lengths of exact matches between pairs of sequences. The previous implementation of this measure requires n(n-1) suffix tree constructions and traversals, where n is the number of sequences analyzed. This does not scale well for large n.

RESULTS:

We present an algorithm to extract pairwise distances in a single traversal of a single suffix tree containing n sequences. As a result, the run time of the suffix tree construction phase of our algorithm is reduced from O(n^2 L) to O(nL), where L is the length of each sequence. We implement this algorithm in the program kr version 2 and apply it to 825 HIV genomes, 13 genomes of enterobacteria and the complete genomes of 12 Drosophila species. We show that, depending on the input dataset, the new program is at least 10 times faster than its predecessor.
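
To give a feel for the scaling argument, the short Python sketch below evaluates the n(n-1) suffix tree constructions quoted for the previous implementation at the three dataset sizes named in this abstract, next to the single tree used by the new algorithm. It is plain arithmetic on the figures in the text; the function name is hypothetical and nothing here reproduces kr itself.

```python
def constructions_previous(n):
    """Suffix tree constructions quoted for the previous implementation: n(n-1)."""
    return n * (n - 1)


# Dataset sizes taken from the abstract; the new algorithm builds one tree.
datasets = [("HIV genomes", 825), ("enterobacteria", 13), ("Drosophila species", 12)]
for label, n in datasets:
    print(f"{label}: n={n}, previous={constructions_previous(n)}, new=1")
```

For the 825 HIV genomes this comes to 679,800 constructions under the previous scheme.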

AVAILABILITY:

Version 2 of kr can be tested via a web interface at http://guanine.evolbio.mpg.de/kr2/. It is written in standard C and its source code is available under the GNU General Public License from the same web site.

CONTACT:

haubold@evolbio.mpg.de

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID:
19825795
DOI:
10.1093/bioinformatics/btp590
[Indexed for MEDLINE]
20.
J Comput Biol. 2009 Oct;16(10):1487-500. doi: 10.1089/cmb.2009.0106.

Estimating mutation distances from unaligned genomes.

Author information

1
Department of Evolutionary Genetics, Max-Planck-Institute for Evolutionary Biology, Plön, Germany. haubold@evolbio.mpg.de

Abstract

Alignment-free distance measures are generally less accurate but more efficient than traditional alignment-based metrics. In the context of genome sequence analysis, the efficiency gain is often so substantial that it outweighs the loss in accuracy. However, a further disadvantage of alignment-free distances is that their relationship to evolutionary events such as substitutions is generally unknown. We have therefore derived an estimator of the number of substitutions per site between two unaligned DNA sequences, K(r). Simulations show that this estimator works well with "ideal" data. We compare K(r) to two alternative alignment-free distances: a k-tuple distance and a measure of relative entropy based on average common substring length. All three measures are applied to 27 primate mitochondrial genomes, eight whole genomes of Streptococcus agalactiae strains, and 12 whole genomes of Drosophila species. In each case, the cluster diagrams based on K(r) are equivalent to or significantly better than those based on the two alternative measures. This is because, in contrast to the alternative measures, K(r) is derived from an explicit model of evolution. The computation of K(r) is efficiently implemented in the program kr, which can be downloaded freely from the internet.
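
As an illustration of what a k-tuple distance can look like, here is a minimal Python sketch. The abstract does not say which definition was used, so this assumes one common variant: the Euclidean distance between the relative k-tuple frequency vectors of two sequences. The function names are hypothetical, and this is not the K(r) estimator.

```python
from collections import Counter
from math import sqrt


def ktuple_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k):
    """Euclidean distance between relative k-tuple frequencies (assumed definition)."""
    ca, cb = ktuple_counts(a, k), ktuple_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    words = set(ca) | set(cb)
    # Counter returns 0 for k-tuples absent from one of the sequences.
    return sqrt(sum((ca[w] / na - cb[w] / nb) ** 2 for w in words))


print(ktuple_distance("ACGTACGT", "ACGTACGT", 2))  # identical sequences
print(ktuple_distance("AAAA", "CCCC", 2))          # no shared k-tuples
```

Unlike K(r), a distance of this kind has no direct interpretation as substitutions per site, which is the limitation the abstract points out.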

PMID:
19803738
DOI:
10.1089/cmb.2009.0106
[Indexed for MEDLINE]
