ESMP: A high-throughput computational pipeline for mining SSR markers from ESTs

Ranjan Sarmah; Jagajjit Sahu; Budheswar Dehury; Kishore Sarma; Smita Sahoo; Mousumi Sahu; Madhumita Barooah; Priyabrata Sen; Mahendra Kumar Modi

doi:10.6026/97320630008206

Bioinformation. 2012;8(4):206-8. doi: 10.6026/97320630008206. Epub 2012 Feb 28.

Authors

Ranjan Sarmah¹, Jagajjit Sahu, Budheswar Dehury, Kishore Sarma, Smita Sahoo, Mousumi Sahu, Madhumita Barooah, Priyabrata Sen, Mahendra Kumar Modi

Affiliation

¹ Agri-Bioinformatics Promotion Programme, Department of Agricultural Biotechnology, Assam Agricultural University, Jorhat 785013, Assam, India.

Abstract

With the advent of high-throughput sequencing technology, sequences from many genomes are being deposited in public databases at a brisk rate. Open access to large amounts of expressed sequence tag (EST) data in public databases has provided a powerful platform for simple sequence repeat (SSR) development in species where sequence information is not available. SSRs are markers of choice because of their high reproducibility, abundant polymorphism and high inter-specific transferability. Mining SSRs from ESTs requires several high-throughput computational tools that must be executed individually, a process that is computationally intensive and time-consuming. To reduce this time lag and to streamline the cumbersome process of SSR mining from ESTs, we have developed a user-friendly, web-based EST-SSR pipeline, the "EST-SSR-MARKER PIPELINE (ESMP)". This pipeline integrates EST pre-processing, clustering and assembly, followed by mining of SSRs from the assembled EST sequences. The mining of SSRs from ESTs provides valuable information on the abundance of SSRs in ESTs and will facilitate the development of markers for genetic analysis and related applications such as marker-assisted breeding.
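To make the final step of such a pipeline concrete, the following is a minimal Python sketch of perfect-SSR detection in a single assembled sequence. It is not ESMP's implementation: the abstract does not name the tools or parameters the pipeline uses, so the function names (`find_ssrs`, `is_primitive`) and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen purely for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (mono- to hexanucleotide).
# These thresholds are illustrative assumptions, not ESMP's actual settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Start positions are 0-based. Only uninterrupted tandem repeats of
    A/C/G/T motifs are reported; compound or imperfect SSRs are ignored.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of the given size followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip e.g. "AA" as a dinucleotide; it is reported as mono "A".
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


if __name__ == "__main__":
    contig = "GGC" + "AG" * 7 + "TTC" + "CAG" * 5 + "T"
    for start, motif, count in find_ssrs(contig):
        print(start, motif, count)
```

In a full workflow, a function like this would be applied to each contig and singleton produced by the clustering and assembly steps, so that redundant ESTs from the same transcript do not inflate the SSR counts.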

Availability: ESMP is freely available at http://bioinfo.aau.ac.in/ESMP.

Keywords: ESMP; Expressed Sequence Tag; Simple Sequence Repeats; Single Nucleotide Polymorphism.