Items 1 - 20 of 26
1: Bioinformatics. 2008 Aug 15;24(16):i187-92.

Segment-based multiple sequence alignment.

International Max Planck Research School for Computational Biology and Scientific Computing, Ihnestr 63-73, 14195 Berlin, Germany. rausch@inf.fu-berlin.de

MOTIVATION: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and wide-spread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far. RESULTS: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences. AVAILABILITY: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A novel version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. The usage of the tool is described in both documentations.

PMID: 18689823 [PubMed - indexed for MEDLINE]

2: Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W10-3. Epub 2008 May 15.

R-Coffee: a web server for accurately aligning noncoding RNA sequences.

Swiss Institute of Bioinformatics (SIB), Quartier Sorge - Genopode, UNIL, CH-1015 Lausanne, Switzerland.

The R-Coffee web server produces highly accurate multiple alignments of noncoding RNA (ncRNA) sequences, taking into account predicted secondary structures. R-Coffee uses a novel algorithm recently incorporated in the T-Coffee package. R-Coffee works along the same lines as T-Coffee: it uses pairwise or multiple sequence alignment (MSA) methods to compute a primary library of input alignments. The program then computes an MSA highly consistent with both the alignments contained in the library and the secondary structures associated with the sequences. The secondary structures are predicted using RNAplfold. The server provides two modes. The slow/accurate mode is restricted to small datasets (fewer than 5 sequences, each shorter than 150 nucleotides) and combines R-Coffee with Consan, a very accurate pairwise RNA alignment method. For larger datasets a fast method can be used (RM-Coffee mode), which uses R-Coffee to combine the outputs of the three packages found to perform best on RNA (MUSCLE, MAFFT and ProbConsRNA). Our BRAliBase benchmarks indicate that the R-Coffee/Consan combination is one of the best ncRNA alignment methods for short sequences, while the RM-Coffee gives comparable results on longer sequences. The R-Coffee web server is available at http://www.tcoffee.org.

PMID: 18483080 [PubMed - indexed for MEDLINE]

PMCID: PMC2447777

3: Curr Protoc Bioinformatics. 2004 Feb;Chapter 3:Unit3.8.

Computing multiple sequence/structure alignments with the T-coffee package.

Swiss Institute of Bioinformatics, Epalinges, Switzerland.

The FASTA package provides a comprehensive set of similarity searching programs, similar to those provided by the BLAST package, and some additional programs for searching with short peptides and oligonucleotides that are not provided by BLAST. The FASTA programs work with a wide variety of database formats, including mySQL sequence databases. FASTA provides very accurate statistical significance estimates, and is more sensitive than BLASTN when comparing DNA sequences. These protocols describe how to use the FASTA programs to characterize protein and DNA sequences, using protein:protein, protein:DNA, and DNA:DNA comparisons.

PMID: 18428722 [PubMed - indexed for MEDLINE]

4: Nucleic Acids Res. 2008 May;36(9):e52. Epub 2008 Apr 17.

R-Coffee: a method for multiple alignment of non-coding RNA.

The Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Ireland.

R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).

PMID: 18420654 [PubMed - indexed for MEDLINE]

PMCID: PMC2396437

5: BMC Genomics. 2007 Oct 31;8:398.

Vertebrate conserved non coding DNA regions have a high persistence length and a short persistence time.

Computational Cancer Genomics Group, Swiss Institute of Bioinformatics, Lausanne, Switzerland. Dorota.Retelska@isrec.ch

BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.

PMID: 17973996 [PubMed - indexed for MEDLINE]

PMCID: PMC2211324

6: PLoS Comput Biol. 2007 Aug;3(8):e123.

Recent evolutions of multiple sequence alignment algorithms.

Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology, Parc Scientifique de Luminy, Marseille, France. cedric.notredame@europe.com

PMID: 17784778 [PubMed - indexed for MEDLINE]

PMCID: PMC1963500

7: Nucleic Acids Res. 2007 Jul;35(Web Server issue):W645-8. Epub 2007 May 25.

The M-Coffee web server: a meta-method for computing multiple sequence alignments by combining alternative alignment methods.

Swiss Institute of Bioinformatics, Bâtiment Génopode, UNIL, CH-101 Lausanne.

The M-Coffee server is a web server that makes it possible to compute multiple sequence alignments (MSAs) by running several MSA methods and combining their output into one single model. This allows the user to simultaneously run all his methods of choice without having to arbitrarily choose one of them. The MSA is delivered along with a local estimation of its consistency with the individual MSAs it was derived from. The computation of the consensus multiple alignment is carried out using a special mode of the T-Coffee package [Notredame, Higgins and Heringa (T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000; 302: 205-217); Wallace, O'Sullivan, Higgins and Notredame (M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006; 34: 1692-1699)]. Given a set of sequences (DNA or proteins) in FASTA format, M-Coffee delivers a multiple alignment in the most common formats. M-Coffee is a freeware open source package distributed under a GPL license and it is available either as a standalone package or as a web service from www.tcoffee.org.

PMID: 17526519 [PubMed - indexed for MEDLINE]

PMCID: PMC1933118

8: Bioinformatics. 2006 Oct 1;22(19):2439-40.

APDB: a web server to evaluate the accuracy of sequence alignments using structural information.

CNRS UPR2589, Institute for Structural Biology and Microbiology (IBSM), Parc Scientifique de Luminy, 163 Avenue de Luminy, FR-13288, Marseille cedex 09, France.

The APDB webserver uses structural information to evaluate the alignment of sequences with known structures. It returns a score correlated to the overall alignment accuracy as well as a local evaluation. Any sequence alignment can be analyzed with APDB provided it includes at least two proteins with known structures. Sequences without a known structure are simply ignored and do not contribute to the scoring procedure. AVAILABILITY: APDB is part of the T-Coffee suite of tools for alignment analysis; it is available on www.tcoffee.org. A stand-alone version of the package is also available as open source freeware from the same address.

PMID: 17032685 [PubMed - indexed for MEDLINE]

9: Bioinformatics. 2006 Jul 15;22(14):e35-9.

The iRMSD: a local measure of sequence alignment accuracy using structural information.

Laboratoire Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology (IBSM) Parc Scientifique de Luminy, case 934, 163 Avenue de Luminy, FR-13288, Marseille cedex 09.

MOTIVATION: We introduce the iRMSD, a new type of RMSD, independent from any structure superposition and suitable for evaluating sequence alignments of proteins with known structures. RESULTS: We demonstrate that the iRMSD is equivalent to the standard RMSD although much simpler to compute and we also show that it is suitable for comparing sequence alignments and benchmarking multiple sequence alignment methods. We tested the iRMSD score on 6 established multiple sequence alignment packages and found the results to be consistent with those obtained using an established reference alignment collection like Prefab. AVAILABILITY: The iRMSD is part of the T-Coffee package and is distributed as an open source freeware (http://www.tcoffee.org/).

PMID: 16873492 [PubMed - indexed for MEDLINE]
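
The iRMSD abstract above describes an RMSD that needs no structure superposition. As a rough illustration of how such a measure can work in general, the sketch below compares intra-molecular distances between aligned residue pairs. It is not the published iRMSD definition: the function name `distance_rmsd`, its inputs, and the choice to use all aligned residue pairs are assumptions made for this example only.

```python
import math


def distance_rmsd(coords_a, coords_b, aligned_pairs):
    """Superposition-free RMSD over intra-molecular distances (illustrative only).

    coords_a, coords_b: lists of (x, y, z) tuples, one per residue.
    aligned_pairs: list of (i, j) meaning residue i of A is aligned to residue j of B.
    """
    squared_diffs = []
    for x in range(len(aligned_pairs)):
        for y in range(x + 1, len(aligned_pairs)):
            i1, j1 = aligned_pairs[x]
            i2, j2 = aligned_pairs[y]
            # Distance between the same two aligned positions, measured
            # separately inside each structure; no superposition is needed.
            d_a = math.dist(coords_a[i1], coords_a[i2])
            d_b = math.dist(coords_b[j1], coords_b[j2])
            squared_diffs.append((d_a - d_b) ** 2)
    if not squared_diffs:
        raise ValueError("need at least two aligned residue pairs")
    return math.sqrt(sum(squared_diffs) / len(squared_diffs))


# Hypothetical toy coordinates: B is A rotated and translated,
# so every intra-molecular distance is preserved.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
identity_alignment = [(0, 0), (1, 1), (2, 2)]
score = distance_rmsd(a, b, identity_alignment)
```

Because only distances within each structure are compared, a correct alignment of two identical folds scores close to zero whatever their orientation, while a shifted alignment would pair up distances that disagree.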

10: Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W604-8.

Expresso: automatic incorporation of structural information in multiple sequence alignments using 3D-Coffee.

Laboratoire Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology (IBSM), Parc Scientifique de Luminy, 163 Avenue de Luminy, FR- 13288, Marseille cedex 09, France.

Expresso is a multiple sequence alignment server that aligns sequences using structural information. The user only needs to provide sequences. The server runs BLAST to identify close homologues of the sequences within the PDB database. These PDB structures are used as templates to guide the alignment of the original sequences using structure-based sequence alignment methods like SAP or Fugue. The final result is a multiple sequence alignment of the original sequences based on the structural information of the templates. An advanced mode makes it possible to either upload private structures or specify which PDB templates should be used to model each sequence. Providing the suitable structural information is available, Expresso delivers sequence alignments with accuracy comparable with structure-based alignments. The server is available on http://www.tcoffee.org/.

PMID: 16845081 [PubMed - indexed for MEDLINE]

PMCID: PMC1538866

11: Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W600-3.

PROTOGENE: turning amino acid alignments into bona fide CDS nucleotide alignments.

Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology (IBSM), Parc Scientifique de Luminy, 163 Avenue de Luminy, FR 13288, Marseille cedex 09, France.

We describe Protogene, a server that can turn a protein multiple sequence alignment into the equivalent alignment of the original gene coding DNA. Protogene relies on a pipeline where every initial protein sequence is BLASTed against RefSeq or NR. The annotation associated with potential matches is used to identify the gene sequence. This gene sequence is then aligned with the query protein using Exonerate in order to extract a coding nucleotide sequence matching the original protein. Protogene can handle protein fragments and will return every CDS coding for a given protein, even if they occur in different genomes. Protogene is available from http://www.tcoffee.org/.

PMID: 16845080 [PubMed - indexed for MEDLINE]

PMCID: PMC1538918

12: Nucleic Acids Res. 2006 Mar 23;34(6):1692-9. Print 2006.

M-Coffee: combining multiple sequence alignment methods with T-Coffee.

The Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Ireland.

We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performances can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and Balibase. We also show that on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment than any individual method. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from http://www.tcoffee.org/.

PMID: 16556910 [PubMed - indexed for MEDLINE]

PMCID: PMC1410914
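
The M-Coffee abstract above describes combining the output of several alignment methods and using consistency to estimate a consensus. A minimal sketch of one ingredient of that idea follows: counting how many input alignments support each aligned residue pair. The function names, the dictionary-based alignment format, and the toy sequences are assumptions for illustration; the actual T-Coffee library weighting, library extension, and progressive alignment steps are not shown.

```python
from collections import Counter


def residue_pairs(msa):
    """Return the set of aligned residue pairs in one alignment.

    msa: dict mapping sequence name -> gapped string ('-' marks a gap).
    Each pair is ((name_a, index_a), (name_b, index_b)), where the indices
    count residues in the ungapped sequences.
    """
    names = sorted(msa)
    length = len(msa[names[0]])
    if any(len(msa[name]) != length for name in names):
        raise ValueError("all rows of an alignment must have the same length")
    next_index = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every two residues sharing a column form one aligned pair.
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def pair_support(msas):
    """Count, for each residue pair, how many input alignments contain it."""
    support = Counter()
    for msa in msas:
        support.update(residue_pairs(msa))
    return support


# Two hypothetical alignments of the same two sequences from different methods.
method_1 = {"s1": "AC-G", "s2": "ACTG"}
method_2 = {"s1": "ACG-", "s2": "ACTG"}
support = pair_support([method_1, method_2])
```

Pairs that both methods agree on receive a count of 2 and the disputed placement of the final residue of `s1` receives a count of 1 for each alternative; a consensus step could then favour the better-supported pairs.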

13: Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W606-9.

CaspR: a web server for automated molecular replacement using homology modelling.

Information Génomique & Structurale (UPR CNRS 2589), Institut de Biologie Structurale et Microbiologie, 31, chemin Joseph Aiguier, 13402 Marseille Cedex 20, France.

Molecular replacement (MR) is the method of choice for X-ray crystallography structure determination when structural homologues are available in the Protein Data Bank (PDB). Although the success rate of MR decreases sharply when the sequence similarity between template and target proteins drops below 35% identical residues, it has been found that screening for MR solutions with a large number of different homology models may still produce a suitable solution where the original template failed. Here we present the web tool CaspR, implementing such a strategy in an automated manner. On input of experimental diffraction data, of the corresponding target sequence and of one or several potential templates, CaspR executes an optimized molecular replacement procedure using a combination of well-established stand-alone software tools. The protocol of model building and screening begins with the generation of multiple structure-sequence alignments produced with T-COFFEE, followed by homology model building using MODELLER, molecular replacement with AMoRe and model refinement based on CNS. As a result, CaspR provides a progress report in the form of hierarchically organized summary sheets that describe the different stages of the computation with an increasing level of detail. For the 10 highest-scoring potential solutions, pre-refined structures are made available for download in PDB format. Results already obtained with CaspR and reported on the web server suggest that such a strategy significantly increases the fraction of protein structures which may be solved by MR. Moreover, even in situations where standard MR yields a solution, pre-refined homology models produced by CaspR significantly reduce the time-consuming refinement process. We expect this automated procedure to have a significant impact on the throughput of large-scale structural genomics projects. CaspR is freely available at http://igs-server.cnrs-mrs.fr/Caspr/.

PMID: 15215460 [PubMed - indexed for MEDLINE]

PMCID: PMC441538

14: Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W37-40.

3DCoffee@igs: a web server for combining sequences and structures into a multiple sequence alignment.

Information Génomique et Structurale UPR2589-CNRS, CNRS, 31, Chemin Joseph Aiguier, 13 402 Marseille Cedex 20, France.

This paper presents 3DCoffee@igs, a web-based tool dedicated to the computation of high-quality multiple sequence alignments (MSAs). 3D-Coffee makes it possible to mix protein sequences and structures in order to increase the accuracy of the alignments. Structures can be either provided as PDB identifiers or directly uploaded into the server. Given a set of sequences and structures, pairs of structures are aligned with SAP while sequence-structure pairs are aligned with Fugue. The resulting collection of pairwise alignments is then combined into an MSA with the T-Coffee algorithm. The server and its documentation are available from http://igs-server.cnrs-mrs.fr/Tcoffee/.

PMID: 15215345 [PubMed - indexed for MEDLINE]

PMCID: PMC441520

15: J Mol Biol. 2004 Jul 2;340(2):385-95.

3DCoffee: combining protein sequences and structures within multiple sequence alignments.

Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland.

Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve the accuracy of our HOMSTRAD dataset by four percentage points when using one structure only per dataset. Using two structures yields an improvement of ten percentage points. The measures carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of number of provided structure to total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment.

PMID: 15201059 [PubMed - indexed for MEDLINE]

16: J Struct Funct Genomics. 2003;4(2-3):141-57.

Structural genomics of highly conserved microbial genes of unknown function in search of new antibacterial targets.

Structural and Genomic Information Laboratory, UMR 1889 CNRS-AVENTIS, 13402 Marseille cedex 20, France. Chantal.Abergel@igs.cnrs-mrs.fr

With more than 100 antibacterial drugs at our disposal in the 1980s, the problem of bacterial infection was considered solved. Today, however, most hospital infections are insensitive to several classes of antibacterial drugs, and deadly strains of Staphylococcus aureus resistant to vancomycin--the last resort antibiotic--have recently begun to appear. Other life-threatening microbes, such as Enterococcus faecalis and Mycobacterium tuberculosis, are already able to resist every available antibiotic. There is thus an urgent and continuous need for new, preferably large-spectrum, antibacterial molecules, ideally targeting new biochemical pathways. Here we report on the progress of our structural genomics program aiming at the discovery of new antibacterial gene targets among evolutionary conserved genes of uncharacterized function. A series of bioinformatic and comparative genomics analyses were used to identify a set of 221 candidate genes common to Gram-positive and Gram-negative bacteria. These genes were split between two laboratories. They are now submitted to a systematic 3-D structure determination protocol including cloning, protein expression and purification, crystallization, X-ray diffraction, structure interpretation, and function prediction. We describe here our strategies for the 111 genes processed in our laboratory. Bioinformatics is used at most stages of the production process and out of 111 genes processed--and 17 months into the project--108 have been successfully cloned, 103 have exhibited detectable expression, 84 have led to the production of soluble protein, 46 have been purified, 12 have led to usable crystals, and 7 structures have been determined.

PMID: 14649299 [PubMed - indexed for MEDLINE]

17: Bioinformatics. 2003;19 Suppl 1:i215-21.

APDB: a novel measure for benchmarking sequence alignment methods without reference alignments.

Department of Biochemistry, University College, Cork, Ireland.

MOTIVATION: We describe APDB, a novel measure for evaluating the quality of a protein sequence alignment, given two or more PDB structures. This evaluation does not require a reference alignment or a structure superposition. APDB is designed to efficiently and objectively benchmark multiple sequence alignment methods. RESULTS: Using existing collections of reference multiple sequence alignments and existing alignment methods, we show that APDB gives results that are consistent with those obtained using conventional evaluations. We also show that APDB is suitable for evaluating sequence alignments that are structurally equivalent. We conclude that APDB provides an alternative to more conventional methods used for benchmarking sequence alignment packages.

PMID: 12855461 [PubMed - indexed for MEDLINE]

18: Nucleic Acids Res. 2003 Jul 1;31(13):3503-6.

Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments.

Information Genomique et Structurale, CNRS, 31 Chemin Joseph Aiguier, 13 402 Marseille Cedex 20, France.

This paper presents Tcoffee@igs, a new server provided to the community by Hewlett Packard computers and the Centre National de la Recherche Scientifique. This server is a web-based tool dedicated to the computation, the evaluation and the combination of multiple sequence alignments. It uses the latest version of the T-Coffee package. Given a set of unaligned sequences, the server returns an evaluated multiple sequence alignment and the associated phylogenetic tree. This server also makes it possible to evaluate the local reliability of an existing alignment and to combine several alternative multiple alignments into a single new one. Tcoffee@igs can be used for aligning protein, RNA or DNA sequences. Datasets of up to 100 sequences (2000 residues long) can be processed. The server and its documentation are available from: http://igs-server.cnrs-mrs.fr/Tcoffee/.

PMID: 12824354 [PubMed - indexed for MEDLINE]

PMCID: PMC168929

19: Pharmacogenomics. 2002 Jan;3(1):131-44.

Recent progress in multiple sequence alignment: a survey.

Information Génétique et Structurale, UMR 1889, 31 Chemin Joseph Aiguier, 13 006 Marseille, France. cedric.notredame@igs.cnrs-mrs.fr

The assembly of a multiple sequence alignment (MSA) has become one of the most common tasks when dealing with sequence analysis. Unfortunately, the wide range of available methods and the differences in the results given by these methods makes it hard for a non-specialist to decide which program is best suited for a given purpose. In this review we briefly describe existing techniques and expose the potential strengths and weaknesses of the most widely used multiple alignment packages.

PMID: 11966409 [PubMed - indexed for MEDLINE]

20: Bioinformatics. 2001 Apr;17(4):373-4.

Mocca: semi-automatic method for domain hunting.

Information Genetique et Structurale, CNRS-UMR 1889, 31 Ch. Joseph Aiguier, 13 402 Marseille, France.

MOTIVATION: Multiple OCCurrences Analysis (Mocca) is a new method for repeat extraction. It is based on the T-Coffee package (Notredame et al., JMB, 302, 205-217, 2000). Given a sequence or a set of sequences, and a library of local alignments, Mocca extracts every segment of sequence homologous to a pre-specified master. The implementation is meant for domain hunting and makes it fast and easy to test for new boundaries or extend known repeats in an interactive manner. Mocca is designed to deal with highly divergent protein repeats (less than 30% amino acid identity) of more than 30 amino acids.

PMID: 11301309 [PubMed - indexed for MEDLINE]
