J Proteome Res. 2018 Jul 6;17(7):2511-2520. doi: 10.1021/acs.jproteome.8b00262. Epub 2018 May 24.

Functional Annotation of Proteins Encoded by the Minimal Bacterial Genome Based on Secondary Structure Element Alignment.

Yang Z1,2,3, Tsui SK2,3,4.

Author information

1. College of Life Information Science & Instrument Engineering, Hangzhou Dianzi University, Hangzhou 310018, China.
2. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
3. Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
4. Centre for Microbial Genomics and Proteomics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.

Abstract

In synthetic biology, a key focus is building a minimal artificial cell that can serve as a basic chassis for functional studies. Recently, the J. Craig Venter Institute published the latest version of the minimal bacterial genome, JCVI-syn3.0, which encodes only 438 essential proteins. However, the functions of 149 of these proteins remain unknown because of the lack of an effective annotation method. Here, we report a secondary structure element alignment method called SSEalign, based on an effective training data set extracted from various bacterial genomes. Experimentally validated homologous genes in different species were selected as training positives, while unrelated genes in different species were selected as training negatives. Moreover, SSEalign used a set of well-defined basic alignment elements together with the backtracking line search algorithm to derive the best parameters for accurate prediction. Experimental results showed that SSEalign achieved 88.2% test accuracy, outperforming existing prediction methods. SSEalign was subsequently applied to identify the functions of the unannotated proteins in JCVI-syn3.0. Results indicated that at least 136 of the 149 unannotated proteins in the JCVI-syn3.0 genome could be annotated by SSEalign. Our method is effective for identifying protein homology in JCVI-syn3.0 and can be used to annotate hypothetical proteins in other bacterial genomes.
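To illustrate the general idea of aligning proteins by their secondary structure elements, here is a minimal sketch that is not the SSEalign implementation. It assumes each protein has already been reduced to a string over a hypothetical three-letter alphabet (H = helix, E = strand, C = coil) and scores a global Needleman-Wunsch alignment of two such strings. The function name `sse_align_score` and the `match`, `mismatch`, and `gap` values are illustrative assumptions; the abstract does not specify the alignment elements or scoring parameters, which the authors report deriving from training data with a backtracking line search.

```python
def sse_align_score(a: str, b: str,
                    match: float = 2.0,
                    mismatch: float = -1.0,
                    gap: float = -2.0) -> float:
    """Global (Needleman-Wunsch) alignment score of two
    secondary-structure strings, e.g. "HHHCCEEE".

    The scoring values are placeholders (assumptions), not the
    parameters used by SSEalign.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        # First column: a[:i] aligned entirely against gaps.
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                          prev[j] + gap,        # gap in b
                          curr[j - 1] + gap)    # gap in a
        prev = curr
    return prev[m]
```

With these placeholder defaults, two identical four-element strings such as `"HHEC"` score 8.0, and each unmatched element costs one gap penalty. A homology classifier in this spirit would threshold such a score, with the parameters tuned on the positive and negative training pairs.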

KEYWORDS:

JCVI-syn3.0; essential gene; homology identification; protein secondary structure
