FIG. 1. From: GADEM: A Genetic Algorithm Guided Formation of Spaced Dyads Coupled with an EM Algorithm for Motif Discovery.

Flowchart of the GADEM algorithm. The algorithm is divided into three parts: formation of spaced dyads (left large box), GA (center large box) and motif declaration (right box). The three parts constitute one cycle of GADEM. GADEM automatically carries out several such cycles until no further motifs with E-values below a pre-specified threshold can be found. Within each GADEM cycle, the steps in the blue box are repeated for a user-specified number of generations (indexed by g, with g = 0 at the beginning of the GA), whereas the steps in the red and green boxes are carried out only once. GADEM begins by enumerating the matching instances of all k-mers (k = 3, 4, 5, 6). For each k, the words are rank-ordered by their z-scores, which yields four groups of top-ranked k-mers. A spaced dyad is formed by randomly choosing two words (a1 and a2) independently from any of the four groups and joining them with a spacer of randomly chosen width between 0 and d (e.g., d = 10). In the GA stage, a “population” indexed by r (e.g., ) of such spaced dyads is generated. The r spaced dyads are converted into r position weight matrices (PWMs), θ. Each PWM is subjected to a user-specified number (e.g., 40) of EM steps, or fewer if EM converges first. The score distribution of the integerized form of the PWM is computed. The same integerized PWM is also used to scan for binding sites in the data. A subsequence of length w is declared a binding site when the p-value of its PWM score is below a threshold (e.g., ≤ 2.5 × 10⁻⁵). The entropy score of the aligned binding sites (the motif) is computed, and the logarithm of its statistical significance (E-value) is used as the fitness score for the spaced dyad from which the motif is derived. Next, all spaced dyads in the population except the best-performing one(s) (those with the lowest E-value) are subjected to either mutation or crossover. This process (blue box) is repeated until the maximal number of generations (e.g., 5) has been reached.
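
The dyad-formation and dyad-to-PWM steps described in the caption can be illustrated with a minimal Python sketch. It is not the GADEM implementation: the function names, the choice to pick a group uniformly before picking a word, and the `match_prob = 0.7` weighting used to turn a word into PWM columns are all assumptions made for illustration, since the caption does not specify them. Spacer positions are given a uniform base distribution, also as an assumption.

```python
import random

ALPHABET = "ACGT"


def random_spaced_dyad(word_groups, max_spacer=10, rng=random):
    """Form one spaced dyad (a1, spacer_width, a2).

    word_groups: one list of top-ranked words per k (hypothetical input).
    Each word is drawn independently: first a group, then a word in it
    (assumed sampling scheme). The spacer width is uniform on 0..max_spacer.
    """
    a1 = rng.choice(rng.choice(word_groups))
    a2 = rng.choice(rng.choice(word_groups))
    spacer = rng.randint(0, max_spacer)  # randint includes both endpoints
    return a1, spacer, a2


def dyad_to_pwm(a1, spacer, a2, match_prob=0.7):
    """Convert a spaced dyad into a list of per-position base distributions.

    Word positions put match_prob on the word's base and share the rest
    equally (assumed weighting); spacer positions are uniform.
    """
    other = (1.0 - match_prob) / (len(ALPHABET) - 1)

    def word_columns(word):
        return [
            {b: (match_prob if b == base else other) for b in ALPHABET}
            for base in word
        ]

    spacer_columns = [{b: 0.25 for b in ALPHABET} for _ in range(spacer)]
    return word_columns(a1) + spacer_columns + word_columns(a2)


if __name__ == "__main__":
    # Toy word groups for k = 3, 4, 5, 6 (made-up words, not real results).
    groups = [["ACG", "TTG"], ["ACGT", "GGCC"], ["GATTC"], ["TTGACA"]]
    rng = random.Random(0)
    population = [random_spaced_dyad(groups, max_spacer=10, rng=rng) for _ in range(5)]
    for a1, spacer, a2 in population:
        pwm = dyad_to_pwm(a1, spacer, a2)
        print(a1, spacer, a2, "-> PWM width", len(pwm))
```

In a full cycle, each such starting PWM would then be refined by EM, scored, and the dyads mutated or crossed over; those steps are omitted here because the caption does not give enough detail to sketch them faithfully.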

Leping Li. J Comput Biol. 2009 February;16(2):317-329.
