PMC Full-Text Search Results

Items: 7

1.
Figure 2.

Figure 2. Domain sequence lengths. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

The bar chart shows how many domain sequences have 1 domain, 2 domains, and so on, up to 25 domains. These are all non-overlapping hits in the protein. Single-domain proteins make up more than half of the total number.

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
2.
Figure 3.

Figure 3. Gene family distribution. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

The distribution of how many domain sequence families are found in 1, 2, …, 347 genomes. There are 909 ORFan families (leftmost bar) and 479 core families (rightmost bar); in total there are 5724 unique domain sequence families (sum of all bars).

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
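The counting behind the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the toy presence/absence matrix and all variable names are assumptions, and ORFan and core families are taken to be those found in exactly one genome and in every genome, respectively, as the caption's leftmost and rightmost bars suggest.

```python
import numpy as np

# Hypothetical toy data: rows are genomes, columns are domain sequence families.
# A 1 means the family is present in that genome.
presence = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=int)

n_genomes = presence.shape[0]

# Number of genomes in which each family occurs.
genomes_per_family = presence.sum(axis=0)

# distribution[k - 1] = number of families found in exactly k genomes.
distribution = np.bincount(genomes_per_family, minlength=n_genomes + 1)[1:]

orfans = int(distribution[0])   # families seen in exactly one genome (leftmost bar)
core = int(distribution[-1])    # families seen in every genome (rightmost bar)
total = int(distribution.sum()) # all families seen in at least one genome

print(distribution.tolist(), orfans, core, total)
```

With real data the matrix would have 347 rows, and the bar heights of the figure would correspond to `distribution`.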
3.
Figure 5.

Figure 5. Functional distances. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

A histogram over all pairwise Manhattan distances between the genomes. The distance between two genomes is defined as the number of domain sequence families whose presence/absence status differs between them; i.e., a distance of 500 means that 500 families are contained in one genome but not the other.

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
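The distance described in the Figure 5 caption can be sketched in a few lines. The toy matrix and the function name `manhattan_distances` are assumptions made for illustration; only the definition (count of families whose presence/absence status differs) comes from the caption.

```python
import numpy as np

def manhattan_distances(presence):
    """Pairwise Manhattan distances between rows of a 0/1 presence/absence matrix."""
    p = np.asarray(presence, dtype=int)
    # Broadcast to (genomes, genomes, families) and count differing families.
    return np.abs(p[:, None, :] - p[None, :, :]).sum(axis=2)

# Hypothetical toy data: rows are genomes, columns are domain sequence families.
presence = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
])

dist = manhattan_distances(presence)

# The upper triangle holds each genome pair once; these values feed the histogram.
pairwise = dist[np.triu_indices_from(dist, k=1)]
print(pairwise.tolist())
```

For binary data the Manhattan distance equals the Hamming distance, which is why it can be read directly as a count of differing families.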
4.
Figure 4.

Figure 4. Effect of significance cutoff. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

The horizontal axis is the -log10 of the HMMER3 E-value cutoff and runs from E = 10^0 on the left to E = 10^-15 on the right. The blue curve shows how the number of core families drops as the cutoff becomes stricter (going from left to right), and the red curve shows the same for the number of ORFan families.

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
5.
Figure 6.

Figure 6. Genomes in functional space. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

Each dot corresponds to a genome plotted in the first two principal component directions of the E. coli functional space defined by the presence/absence of domain sequence families. There are four large subsets of genomes in the data set, and these dots are marked with colors (see figure legend). The first principal component accounts for 11% of the total data variation, and the second component for 8%. Only the relative positions of the genomes (dots) are important; the absolute scores on each axis lack interpretation.

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
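A projection like the one in Figure 6 can be sketched with a plain singular value decomposition. This is an assumption-laden illustration: the toy matrix, the function name `pca_scores`, and the choice to center columns without scaling them are all hypothetical, since the caption does not say how the authors preprocessed the data.

```python
import numpy as np

def pca_scores(presence, n_components=2):
    """Project genomes onto the leading principal components of a 0/1 matrix."""
    x = np.asarray(presence, dtype=float)
    centered = x - x.mean(axis=0)  # center each family (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of total variation per component
    return scores, explained[:n_components]

# Hypothetical toy data: rows are genomes, columns are domain sequence families.
presence = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
])

scores, explained = pca_scores(presence)
print(scores.shape, explained)
```

The sign of each component returned by an SVD is arbitrary, which is consistent with the caption's remark that only the relative positions of the dots carry meaning.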
6.
Figure 7.

Figure 7. Binomial mixture model. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

The left pie chart visualizes the E. coli pangenome and the right pie chart a single E. coli genome, as described by a binomial mixture model. There are 12 sectors, and the colors indicate the selection probabilities as displayed on the right-hand colorbar. The size of each sector shows its relative contribution to the pangenome (left) or to a single genome (right). The pangenome is dominated by domain sequences having either a very large (darker blue sectors) or a very small selection probability (pink sectors). In a single genome the highly conserved domain sequences (darker blue) clearly dominate.

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
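The relationship between the two pie charts in Figure 7 can be illustrated with a small calculation. This is a sketch under stated assumptions, not the authors' fitted model: the three components, their mixing proportions, and their selection probabilities are invented, and a component's share of a single genome is assumed to be proportional to its mixing proportion times its selection probability.

```python
# Hypothetical three-component mixture (the figure uses 12 components).
mixing = [0.5, 0.2, 0.3]          # share of each component in the pangenome
select_prob = [0.02, 0.5, 0.99]   # probability that a family is present in a genome

# Expected contribution of each component to one genome, assumed
# proportional to mixing proportion times selection probability.
weights = [m * p for m, p in zip(mixing, select_prob)]
total = sum(weights)
genome_share = [w / total for w in weights]

for m, p, g in zip(mixing, select_prob, genome_share):
    print(f"selection prob {p:.2f}: pangenome share {m:.2f}, genome share {g:.2f}")
```

Under these assumptions, rarely selected components take up a large part of the pangenome but a small part of any single genome, which matches the pattern the caption describes.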
7.
Figure 1.

Figure 1. Complete and draft genomes. From: A domain sequence approach to pangenomics: applications to Escherichia coli.

The box and whisker plots illustrate the differences between completed and draft genomes in this study. The left panel shows that the 56 complete genomes are somewhat smaller in size, measured in megabases. This is most likely due to unresolved overlaps between the contigs in the draft genomes. The middle panel shows the number of unique genes predicted by the three gene finders after the elimination of all partial predictions (those lacking a start or stop codon). Note the large number of predicted genes in virtually all cases; annotated E. coli genomes usually have 4500–5500 genes. Some draft genomes have very few predicted genes; these appear as circles. The rightmost panel shows the number of predicted genes with at least one Pfam-A hit. Except for four draft genomes with extremely few genes, the differences between complete and draft genomes are now negligible.

Lars-Gustav Snipen, et al. F1000Res. 2012;1:19.
