CRISPRcompar: a website to compare clustered regularly interspaced short palindromic repeats

Copyright © 2008 The Author(s)

¹Univ. Paris-Sud 11, CNRS, UMR8621, Institut de Génétique et Microbiologie, 91405 Orsay and ²DGA/D4S - Mission pour la Recherche et l’Innovation Scientifique, 7, rue des Mathurins, 00470 Armées, France

*To whom correspondence should be addressed. +33 1 69 15 30 01; +33 1 69 15 66 78; Email: ibtissem.grissa/at/igmors.u-psud.fr

Received January 25, 2008; Revised April 6, 2008; Accepted April 11, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Clustered regularly interspaced short palindromic repeat (CRISPR) elements are a particular family of tandem repeats found in prokaryotic genomes (in almost all archaea and in about half of bacteria) that participate in a mechanism of acquired resistance against phages. They consist of a succession of direct repeats (DRs) of 24–47 bp separated by unique sequences of similar size (spacers). In the large majority of cases, the DRs are highly conserved, while the number and nature of the spacers are often quite diverse, even among strains of the same species. Furthermore, new units (DR + spacer) have been shown to be acquired almost exclusively at one end of the locus. CRISPRs are therefore attractive genetic markers for the comparative and evolutionary analysis of closely related bacterial strains. CRISPRcompar is a web service created to assist biologists in the CRISPR typing process. Two tools facilitate the in silico investigation: CRISPRcomparison and CRISPRtionary. The website is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/.
INTRODUCTION

The clustered regularly interspaced short palindromic repeat (CRISPR)-associated system (CASS) comprises the CRISPR repeated element itself, the promoter for its transcription (also called the leader) and a set of cas genes responsible for its maintenance and function (1,2). It is found in most archaea and in about 40% of bacteria, and is linked to a mechanism of acquired resistance against bacteriophages (3). Some genomes harbour a large number of CRISPRs [18 in Methanocaldococcus jannaschii DSM 2661, with three different direct repeats (DRs)] (4). When several CRISPRs with the same DR are present in a genome, they have a very similar leader and generally different spacers, and only one is associated with cas genes (5). When CRISPRs from different CRISPR families coexist in the same genome, a specific set of cas genes is present for each family. Finally, within a species, different strains may have different CRISPRs. The three sequenced strains of Streptococcus thermophilus illustrate this situation well: three CRISPRs were identified in this species, but only strain LMD-9 possesses all three (4).

CRISPRs evolve either by deletion or by acquisition of units (a DR and a spacer), following a mechanism first proposed by Pourcel et al. (6) and recently confirmed (7–9). In the majority of cases, new units are added at the end of the CRISPR adjacent to the leader, whereas deletions of units can occur at random positions in the locus. Independent acquisition of the same spacer twice is possible, but it is infrequent and easily detected. Thus, the presence of identical spacers in the same CRISPR locus in distinct strains reflects shared ancestry.

The polymorphism of CRISPRs can be used for molecular typing. The classical technique, developed for Mycobacterium tuberculosis typing (10), is spoligotyping, which detects the presence or absence of a range of spacers.
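Spoligotyping reduces a CRISPR locus to a presence/absence profile over a fixed panel of spacers. The Python sketch below illustrates only that idea, under simplifying assumptions: spacers are already extracted as sequence strings, "presence" means an exact sequence match (a simplification of the laboratory assay), and the panel, the strain names and the short placeholder sequences (real spacers are similar in size to the 24–47-bp DRs) are hypothetical. Counting the panel spacers shared by two profiles gives a crude view of the shared ancestry discussed above.

```python
from typing import Dict, List


def presence_profile(panel: List[str], allele_spacers: List[str]) -> List[int]:
    """Return 1 for each panel spacer found (exact match) in the allele, else 0."""
    found = set(allele_spacers)
    return [1 if spacer in found else 0 for spacer in panel]


def shared_spacers(profile_a: List[int], profile_b: List[int]) -> int:
    """Count panel spacers present in both profiles."""
    return sum(a & b for a, b in zip(profile_a, profile_b))


# Hypothetical panel of four spacers and the spacer content of two strains.
panel = ["ACGTTGCA", "TTGACCGT", "GGCATACG", "CATGGTAC"]
strains: Dict[str, List[str]] = {
    "strain_A": ["ACGTTGCA", "GGCATACG", "CATGGTAC"],
    "strain_B": ["ACGTTGCA", "CATGGTAC"],
}
profiles = {name: presence_profile(panel, sp) for name, sp in strains.items()}
print(profiles)  # {'strain_A': [1, 0, 1, 1], 'strain_B': [1, 0, 0, 1]}
print(shared_spacers(profiles["strain_A"], profiles["strain_B"]))  # 2
```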
This technique and other PCR-based typing methods have been applied to CRISPR genotyping of other bacterial species (6,11–16). We recently implemented a program (CRISPRFinder) that identifies CRISPR structures based on a thorough characterization of their components, i.e. the DRs and the spacers (17). Using this program, public genome sequences are analysed and the extracted CRISPRs are stored in a database (CRISPRdb) (4). CRISPRFinder and CRISPRdb are accessible on the web together with several tools that help recover spacer and DR sequences and BLAST them against GenBank. We now report the development of a new website dedicated to comparing CRISPRs between strains and to labelling spacers when multiple alleles are analysed. CRISPRcompar is freely accessible at http://crispr.u-psud.fr/CRISPRcompar/index.php.

METHODS AND IMPLEMENTATION

CRISPRcompar is a user-friendly web resource offering tools to compare CRISPRs between strains of a given species or between closely related species, and to classify spacers. Its core routines were developed in Perl under Debian Linux. It is composed of two main applications: CRISPRcomparison and CRISPRtionary.

CRISPRcomparison identifies and compares the CRISPRs of two or more genomes (complete or partial sequences). It is particularly useful when strains of a species possess several CRISPRs whose positions on the genome might vary, for instance as a result of large-scale genome rearrangements or of presence–absence polymorphism of CRISPR loci in the genomes of interest. Two CRISPRs are considered similar when they have an identical consensus DR and similar flanking sequences. Flanking sequences are compared by ClustalW alignment of the 200-bp sequences adjacent to the CRISPR, with a similarity threshold of 90%. In the majority of cases, when multiple CRISPRs with the same DR are present in a genome, only one flanking sequence is similar: the one corresponding to the leader.
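The grouping rule used by CRISPRcomparison (identical consensus DR, flanking sequences at least 90% similar) can be sketched as follows. This is not CRISPRcompar's code: a plain Needleman–Wunsch global alignment with arbitrary scores (match +1, mismatch -1, gap -1) stands in for ClustalW, identity is taken as identical columns divided by alignment length, and requiring only one similar flank pair (typically the leader side) is an assumption drawn from the last sentence above. The `Crispr` record, its fields and all sequences are hypothetical, and both CRISPRs are assumed to be in the same orientation.

```python
from dataclasses import dataclass


@dataclass
class Crispr:
    consensus_dr: str
    left_flank: str   # hypothetical: sequence adjacent to one end of the array
    right_flank: str  # hypothetical: sequence adjacent to the other end


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Needleman-Wunsch alignment; returns identical columns / alignment length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting identical columns.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return identical / length if length else 1.0


def same_locus(x: Crispr, y: Crispr, threshold: float = 0.90) -> bool:
    """Identical consensus DR plus at least one flank pair at >= threshold identity."""
    if x.consensus_dr != y.consensus_dr:
        return False
    return (global_identity(x.left_flank, y.left_flank) >= threshold
            or global_identity(x.right_flank, y.right_flank) >= threshold)


dr = "ACGGTTAACCGGTTAACCGGTTAACCGGTTAC"  # hypothetical consensus DR
a = Crispr(dr, "ACGT" * 50, "TTGA" * 50)
b = Crispr(dr, "ACGT" * 49 + "ACGA", "CCAG" * 50)
c = Crispr(dr[:-1] + "G", "ACGT" * 50, "TTGA" * 50)
print(same_locus(a, b))  # True: same DR, left flanks 199/200 identical
print(same_locus(a, c))  # False: consensus DRs differ
```

The quadratic alignment is fine for 200-bp flanks; a real pipeline would call a dedicated aligner instead.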
CRISPRtionary lists the spacers from different alleles of the same CRISPR locus and annotates them in a polarized fashion. Such data are produced, for instance, when the diversity (evolution) of CRISPRs within a species is investigated by sequencing the locus in different isolates. The tool can then be used to number spacers automatically, produce a ‘dictionary’ (repertoire) of spacers and code the alleles using this dictionary. CRISPRFinder is used to identify the DR and to order the spacers according to the DR sequence. When PCR products are sequenced, the first few nucleotides may be missed or be of poor quality. In addition, in this context CRISPRFinder may miss the first DR, which is often partial and degenerate (up to 50% of differences have been observed). For this reason, a filter that searches the flanking sequence for stretches of additional DR was added so as to correctly identify the first spacer: the two halves of the DR are BLASTed against the remaining nucleotides of the allele sequence. Given the mechanism of acquisition of new spacers, we recommend orienting the CRISPR so that the degenerate DR is at the left end and the leader is on the right. This orientation makes it convenient to assign increasing numbers to the spacers from left to right according to their order of acquisition, i.e. the most recently added spacer, next to the leader, receives the highest number.

Input

The CRISPRcompar program automatically retrieves from CRISPRdb all strains containing a CRISPR and offers them for comparison in an alphabetical list (alternatively, all strains of a given genus can be selected at once using the ‘strain taxonomy browser’). To compare unpublished sequences and genomes, a private database modelled on CRISPRdb (4) must first be created (http://crispr.u-psud.fr/CRISPRcompar/private/). Sequences from the private database can then be added to the comparison.
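CRISPRtionary's dictionary-building and allele-coding step, described before the Input subsection, can be pictured with the following sketch. It is an illustration, not CRISPRtionary's implementation: spacers are assumed to be already extracted, oriented as recommended (degenerate DR on the left, leader on the right, so each allele lists its oldest spacer first) and compared by exact sequence identity, and distinct spacers are numbered in order of first appearance. The function name `build_dictionary`, the allele names and the four-letter placeholder "spacers" are hypothetical.

```python
from typing import Dict, List, Tuple


def build_dictionary(
    alleles: Dict[str, List[str]],
) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """Number distinct spacers by first appearance and code each allele with them."""
    dictionary: Dict[str, int] = {}
    codes: Dict[str, List[int]] = {}
    for name, spacers in alleles.items():
        for spacer in spacers:
            if spacer not in dictionary:
                dictionary[spacer] = len(dictionary) + 1  # next free label
        codes[name] = [dictionary[spacer] for spacer in spacers]
    return dictionary, codes


# Hypothetical alleles, each listed oldest spacer first.
alleles = {
    "isolate_1": ["AAGT", "CCTG", "GGAC"],
    "isolate_2": ["AAGT", "GGAC", "TTCA"],
    "isolate_3": ["CCTG", "GGAC", "TTCA", "ATTG"],
}
dictionary, codes = build_dictionary(alleles)
for name, code in codes.items():
    print(name, "-".join(map(str, code)))
# isolate_1 1-2-3
# isolate_2 1-3-4
# isolate_3 2-3-4-5
```

Numbering by first appearance is simple, but it is not guaranteed to produce labels that increase along every allele; that is the situation the re-annotation step addresses.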
Once sequences have been selected, the ‘compare’ button leads to a page where the user can choose the strain to be used as the reference for CRISPR annotation. At this step, sequences can also be removed from or added to the comparison. When several alleles of a given locus are present in the submitted sequences, their spacers can be annotated using CRISPRtionary. FASTA files containing sequenced CRISPR alleles can also be submitted directly to CRISPRtionary.

Output

For the CRISPRcomparison application, the result is shown in a table in which CRISPRs are grouped (Figure 1).
An optional final step, called re-annotation, can be added to improve the output. When a collection of alleles has been analysed, it can be useful to re-annotate the spacers so that numbering increases from one end of the CRISPR. We propose that the oldest spacer, i.e. the one next to the degenerate DR when the latter is identified, be given the label 1 and subsequent spacers increasing numbers. The re-annotation tool modifies the labels so that all labels inside an allele are in increasing order, and produces a new set of output files. Occasionally, one or several spacers are duplicated; in this case, the term ‘bis’ is added to the spacer label in the CRISPR code. See Figure 2.
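One way to obtain such a numbering is to treat each allele as a set of ordering constraints (a spacer on the left is older than the spacer to its right) and to derive a single order consistent with all alleles by topological sorting. The sketch below does this with Kahn's algorithm; it is an assumption about how re-annotation could work, not the published tool's algorithm. Ties are broken by order of first appearance, repeated occurrences of a spacer within an allele are written with a 'bis' suffix (which copy receives 'bis' is also an assumption), and alleles that disagree on spacer order raise an error because no increasing numbering then exists. The function name `reannotate` and the spacer and allele names are hypothetical.

```python
import heapq
from typing import Dict, List, Set


def reannotate(alleles: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Relabel spacers so labels increase along every allele, oldest = 1.

    Each allele lists spacers oldest first (degenerate-DR side on the left).
    Repeated occurrences within one allele are labelled '<n>bis'.
    """
    first_seen: Dict[str, int] = {}       # spacer -> order of first appearance
    successors: Dict[str, Set[str]] = {}  # spacer -> spacers known to be younger
    indegree: Dict[str, int] = {}
    for spacers in alleles.values():
        unique = list(dict.fromkeys(spacers))  # drop repeats, keep order
        for spacer in unique:
            if spacer not in first_seen:
                first_seen[spacer] = len(first_seen)
                successors[spacer] = set()
                indegree[spacer] = 0
        for older, younger in zip(unique, unique[1:]):
            if younger not in successors[older]:
                successors[older].add(younger)
                indegree[younger] += 1

    # Kahn's topological sort; ties broken by first appearance.
    ready = [(first_seen[s], s) for s, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    label: Dict[str, int] = {}
    while ready:
        _, spacer = heapq.heappop(ready)
        label[spacer] = len(label) + 1
        for younger in successors[spacer]:
            indegree[younger] -= 1
            if indegree[younger] == 0:
                heapq.heappush(ready, (first_seen[younger], younger))
    if len(label) != len(first_seen):
        raise ValueError("alleles disagree on spacer order; no increasing numbering exists")

    coded: Dict[str, List[str]] = {}
    for name, spacers in alleles.items():
        seen: Set[str] = set()
        coded[name] = []
        for spacer in spacers:
            coded[name].append(str(label[spacer]) + ("bis" if spacer in seen else ""))
            seen.add(spacer)
    return coded


alleles = {
    "isolate_1": ["S_a", "S_c"],
    "isolate_2": ["S_a", "S_b", "S_c"],
    "isolate_3": ["S_a", "S_b", "S_b", "S_c"],
}
print(reannotate(alleles))
# {'isolate_1': ['1', '3'], 'isolate_2': ['1', '2', '3'], 'isolate_3': ['1', '2', '2bis', '3']}
```

Here, numbering by first appearance would have labelled S_c as 2 and S_b as 3, coding isolate_2 as 1-3-2; the topological order restores increasing labels in every allele.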
DISCUSSION AND CONCLUSIONS

The CRISPRcompar web server offers a set of bioinformatic tools that help biologists develop and set up a CRISPR genotyping scheme. In the pre-processing phase, comparing CRISPRs is essential and can be done with the CRISPRcomparison tool, which helps select the most appropriate CRISPR loci and associated primers for PCR amplification. CRISPRcomparison allows the identification of families of strains sharing a CRISPR within species of high genetic diversity, or the identification of homologous CRISPRs within species containing multiple CRISPR loci. In the post-processing phase, the CRISPRtionary program is particularly useful because it lets the user easily compare multiple alleles of a CRISPR locus investigated in a collection of strains and obtain pre-calculated files that can be used directly in clustering analysis. Many clustering methods are applicable and may provide a good clustering of the strains, even though they usually do not take full advantage of the rules of CRISPR evolution, which could be used to assess parental relations between taxa in addition to forming groups of related strains. The primary evolutionary events considered are unit insertion and deletion. For CRISPRs that are inactive in terms of spacer acquisition, only deletions are possible, and the Camin–Sokal parsimony model (19) may be considered. In Camin–Sokal parsimony, two states are considered (e.g. 0 and 1), and no transition from the derived state back to the ancestral state is allowed. For an inactive CRISPR locus, the ancestral state is the presence of a unit and the derived state is its absence, so only deletions are allowed. Future developments of CRISPRcompar will incorporate applications such as the MIX program of the PHYLIP package (Felsenstein), which implements the Camin–Sokal parsimony method. It can be applied to the binary file with minor modifications.
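Under this deletion-only model, the cost of a given tree is straightforward to compute. The sketch below scores one fixed rooted tree against a binary presence/absence matrix (rows are strains, columns are units); it is not the PHYLIP MIX program, which also searches among trees. It assumes every unit was present in the common ancestor, so the minimum number of deletions for a unit equals the number of maximal clades in which all strains lack it. The nested-tuple tree encoding, the matrix layout, the function names and the strain names are hypothetical and do not reproduce CRISPRtionary's binary file format.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: a leaf is a strain name, an internal node a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def deletions_for_unit(tree: Tree, present: Dict[str, int]) -> Tuple[bool, int]:
    """Return (every strain in this clade lacks the unit, deletions needed inside it)."""
    if isinstance(tree, str):
        return present[tree] == 0, 0
    results = [deletions_for_unit(child, present) for child in tree]
    if all(absent for absent, _ in results):
        # Whole clade lacks the unit: one deletion on the branch above it suffices.
        return True, 0
    # The unit is still present at this node; each all-absent child clade
    # needs its own deletion, since a lost unit can never be regained.
    return False, sum(1 if absent else cost for absent, cost in results)


def camin_sokal_score(tree: Tree, matrix: Dict[str, List[int]]) -> int:
    """Total deletions over all units, assuming each unit was ancestral (state 1)."""
    n_units = len(next(iter(matrix.values())))
    total = 0
    for k in range(n_units):
        column = {strain: row[k] for strain, row in matrix.items()}
        all_absent, cost = deletions_for_unit(tree, column)
        total += 1 if all_absent else cost
    return total


# Hypothetical presence (1) / absence (0) matrix for three units in four strains.
matrix = {
    "s1": [1, 1, 1],
    "s2": [1, 0, 1],
    "s3": [0, 0, 1],
    "s4": [0, 0, 0],
}
tree_a = (("s1", "s2"), ("s3", "s4"))
tree_b = (("s1", "s3"), ("s2", "s4"))
print(camin_sokal_score(tree_a, matrix), camin_sokal_score(tree_b, matrix))  # 4 5
```

Tree A needs four deletions and tree B five, so under this model tree A is the more parsimonious of the two.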
ACKNOWLEDGEMENTS

The CNRS and Université Paris Sud 11 have funded this project. I.G. is supported by the TBChina EU project grant LSHPCT-2005-012166. Funding to pay the Open Access publication charges for this article was provided by Association Vaincre la Mucoviscidose.

Conflict of interest statement. None declared.

REFERENCES

1. Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct. 2006;1:7.
2. Sorek R, Kunin V, Hugenholtz P. CRISPR - a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat. Rev. Microbiol. 2008;6:181–186.
3. Deveau H, Barrangou R, Garneau JE, Labonte J, Fremaux C, Boyaval P, Romero DA, Horvath P, Moineau S. Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 2008;190:1390–1400.
4. Grissa I, Vergnaud G, Pourcel C. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinform. 2007;8:172.
5. Jansen R, Embden JD, Gaastra W, Schouls LM. Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 2002;43:1565–1575.
6. Pourcel C, Salvignol G, Vergnaud G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology. 2005;151:653–663.
7. Lillestol RK, Redder P, Garrett RA, Brugger K. A putative viral defence mechanism in archaeal cells. Archaea. 2006;2:59–72.
8. Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P. CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007;315:1709–1712.
9. Tyson GW, Banfield JF. Rapidly evolving CRISPRs implicated in acquired resistance of microorganisms to viruses. Environ. Microbiol. 2008;10:200–207.
10. Groenen PM, Bunschoten AE, van Soolingen D, van Embden JD. Nature of DNA polymorphism in the direct repeat cluster of Mycobacterium tuberculosis; application for strain differentiation by a novel typing method. Mol. Microbiol. 1993;10:1057–1065.
11. Mokrousov I, Limeschenko E, Vyazovaya A, Narvskaya O. Corynebacterium diphtheriae spoligotyping based on combined use of two CRISPR loci. Biotechnol. J. 2007;2:901–906.
12. Mokrousov I, Narvskaya O, Limeschenko E, Vyazovaya A. Efficient discrimination within a Corynebacterium diphtheriae epidemic clonal group by a novel macroarray-based method. J. Clin. Microbiol. 2005;43:1662–1668.
13. Hoe N, Nakashima K, Grigsby D, Pan X, Dou SJ, Naidich S, Garcia M, Kahn E, Bergmire-Sweat D, Musser JM. Rapid molecular genetic subtyping of serotype M1 group A Streptococcus strains. Emerg. Infect. Dis. 1999;5:254–263.
14. Schouls LM, Reulen S, Duim B, Wagenaar JA, Willems RJ, Dingle KE, Colles FM, Van Embden JD. Comparative genotyping of Campylobacter jejuni by amplified fragment length polymorphism, multilocus sequence typing, and short repeat sequencing: strain diversity, host range, and recombination. J. Clin. Microbiol. 2003;41:15–26.
15. DeBoy RT, Mongodin EF, Emerson JB, Nelson KE. Chromosome evolution in the Thermotogales: large-scale inversions and strain diversification of CRISPR sequences. J. Bacteriol. 2006;188:2364–2374.
16. Vergnaud G, Li Y, Gorge O, Cui Y, Song Y, Zhou D, Grissa I, Dentovskaya SV, Platonov ME, Rakin A, et al. Analysis of the three Yersinia pestis CRISPR loci provides new tools for phylogenetic studies and possibly for the investigation of ancient DNA. Adv. Exp. Med. Biol. 2007;603:327–338.
17. Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007;35:W52–W57.
18. van Embden JD, van Gorkom T, Kremer K, Jansen R, van Der Zeijst BA, Schouls LM. Genetic variation and evolutionary origin of the direct repeat locus of Mycobacterium tuberculosis complex bacteria. J. Bacteriol. 2000;182:2393–2401.
19. Camin J, Sokal R. A method for deducing branching sequences in phylogeny. Evolution. 1965;19:311–326.