Proteomics. 2005 Feb;5(2):450-60.

Multigenic families and proteomics: extended protein characterization as a tool for paralog gene identification.

Author information

Laboratoire de Spectrométrie de Masse Bio-Organique, Strasbourg, France.


In classical proteomic studies, protein database searches mostly identify protein function by homology, because the databases are not exhaustive. The quality of the identification depends on the organism studied, its complexity and its representation in the protein databases. This basic functional identification is nevertheless insufficient for certain applications, notably the development of RNA-based gene-silencing strategies, commonly termed RNA interference (RNAi) in animals and post-transcriptional gene silencing (PTGS) in plants, which require unambiguous identification of the targeted gene sequence. A PTGS strategy was considered in the study of the infection of Oryza sativa by the Rice Yellow Mottle Virus (RYMV). RYMV is suspected to recruit host proteins after entering plant cells, forming a complex that facilitates virus multiplication and spread. The protein partners of this complex were identified by a classical proteomic approach, nano liquid chromatography tandem mass spectrometry. Among the identified proteins, several were retained for a PTGS strategy. However, most of the protein candidates appear to be members of multigenic families for which not all paralog genes are present in protein databases, so the paralog gene actually expressed cannot be identified with classical protein database searches. Consequently, because the genome contains all genes and thus all paralog genes, a whole-genome search strategy was developed to determine the specific expressed paralog gene. With this approach, the identification of peptides matching only a single gene, called discriminant peptides, provides definitive proof that this gene is expressed. This strategy has two requirements: (i) a completely sequenced and accessible genome; (ii) high protein sequence coverage.
In the present work, through three examples, we report and validate for the first time a genome database search strategy to specifically identify paralog genes belonging to multigenic families expressed under specific conditions.
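
The idea of a discriminant peptide can be illustrated with a minimal sketch. It assumes that the peptides have already been identified as plain amino acid sequences and that the candidate paralogs are available as translated protein sequences. The function name, gene names and sequences are hypothetical and invented for illustration; this is not the search pipeline used in the study, which matches tandem mass spectra against the genome.

```python
from collections import defaultdict


def find_discriminant_peptides(peptides, paralogs):
    """Return {gene: [peptides found in that gene's sequence and no other]}.

    peptides: iterable of peptide sequences (assumed already identified).
    paralogs: dict mapping a gene name to its translated protein sequence.
    """
    discriminant = defaultdict(list)
    for peptide in peptides:
        # Collect every paralog whose sequence contains this peptide.
        hits = [gene for gene, seq in paralogs.items() if peptide in seq]
        # A peptide is discriminant only if it matches exactly one gene.
        if len(hits) == 1:
            discriminant[hits[0]].append(peptide)
    return dict(discriminant)


# Toy data (hypothetical): two paralogs differing by a single residue.
paralogs = {
    "paralog_A": "MKTAYIAKQRLSDEGHK",
    "paralog_B": "MKTAYIAKQRLSNEGHK",
}
peptides = ["TAYIAK", "LSDEGHK", "QR"]

result = find_discriminant_peptides(peptides, paralogs)
print(result)
```

In this toy case only the peptide covering the residue that differs between the two paralogs is discriminant, which is why the strategy depends on high protein sequence coverage: the more of the sequence is observed, the more likely a variable position is covered.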

[Indexed for MEDLINE]
