J Theor Biol. 2016 Oct 21;407:318-327. doi: 10.1016/j.jtbi.2016.07.032. Epub 2016 Jul 25.

Comparison of genomic data via statistical distribution.

Author information

1. University of Wisconsin-Green Bay, Department of Natural and Applied Sciences, Green Bay, WI, USA. Electronic address: saeid.amiri1@gmail.com.
2. Statistics Online Computational Resource (SOCR), Michigan Institute for Data Science (MIDAS), School of Nursing, University of Michigan, Ann Arbor, MI 49109, USA. Electronic address: dinov@umich.edu.

Abstract

Sequence comparison has become an essential tool in bioinformatics because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate the measurement and quantitative characterization of genetic differences, variability and complexity. However, recent developments in next-generation and whole-genome sequencing technologies give rise to new challenges related to measuring similarity and capturing rearrangements of large segments of the genome. This work illustrates different methods recently introduced for quantifying sequence distances and variability. Most alignment-free methods rely on counting words, which are small contiguous fragments of the genome. Our approach instead considers the locations of nucleotides in the sequences and relies more on appropriate statistical distributions. The results of this technique for comparing sequences, which extracts information and compares matching fidelity and location regularization information, are very encouraging, particularly for classifying mutation sequences.
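
To make the word-counting idea concrete, the sketch below shows one common form of alignment-free comparison: count every overlapping k-tuple in each sequence, normalize the counts to relative frequencies, and take the Euclidean distance between the two frequency profiles. This illustrates the conventional word-count baseline that the abstract contrasts against, not the authors' location-based method. The function names `kmer_counts` and `kmer_distance`, the default `k=3`, and the choice of Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k (k-tuple) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(seq_a, seq_b, k=3):
    """Euclidean distance between relative k-tuple frequencies.

    Illustrative choice only: other alignment-free measures use
    different normalizations or distance functions.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each sequence must contain at least k characters")
    # Compare over the union of words seen in either sequence;
    # a Counter returns 0 for words it has not seen.
    words = set(counts_a) | set(counts_b)
    return sqrt(sum(
        (counts_a[w] / total_a - counts_b[w] / total_b) ** 2
        for w in words
    ))


if __name__ == "__main__":
    print(kmer_distance("ACGTACGTAC", "ACGTTCGTAC", k=3))
```

Because the profile depends only on which words occur and how often, two sequences with the same word content in a different order get a distance of zero, which is one reason a method that also uses nucleotide locations can capture information that word counts miss.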

KEYWORDS: Alignment-free; Clustering; Distance; K-tuple

PMID: 27460589
PMCID: PMC5361063
DOI: 10.1016/j.jtbi.2016.07.032
[Indexed for MEDLINE]
Free PMC Article
