Bioinformatics. 2000 Jul;16(7):628-38.

Object-oriented parsing of biological databases with Python.

Author information

  • European Molecular Biology Laboratory, Meyerhofstrasse 1, Postfach 10.2209, Heidelberg, Germany.



While biological databases are growing rapidly, rather little has been done to parse them in a simple, object-oriented way.


We present here an elegant, simple yet powerful way of parsing biological flat-file databases, taking EMBL, SWISS-PROT and GenBank as examples. EMBL and SWISS-PROT have closely similar format structures, whereas the GenBank format differs considerably from both. Extracting the desired fields of an entry (for example, a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community; this is illustrated with tools for building new splice-site databases. The interface to the parser is abstract in the sense that access to all the databases is independent of their different formats, since the parsing instructions are hidden.
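
To make the idea of a format-independent interface concrete, the following is a minimal sketch and not the parser described in the article. All names (`Entry`, `parse_entry`, `subsequence`) and the two toy records are assumptions introduced for illustration, and only the accession and sequence fields of heavily simplified EMBL-style and GenBank-style entries are handled.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical format-independent view of one database entry."""
    accession: str
    sequence: str

    def subsequence(self, start, end):
        # 1-based, inclusive coordinates, as used in feature tables.
        return self.sequence[start - 1:end]


def _clean(seq_lines):
    # Keep letters only: drops spacing and position counters in sequence lines.
    return re.sub(r"[^A-Za-z]", "", "".join(seq_lines)).upper()


def _parse_embl(text):
    # EMBL-style: two-letter line codes, "AC" for accession, "SQ" opens the sequence.
    accession, seq_lines, in_seq = "", [], False
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if line.startswith("AC") and not accession:
            accession = line[2:].strip().split(";")[0]
        elif line.startswith("SQ"):
            in_seq = True
        elif in_seq:
            seq_lines.append(line)
    return Entry(accession, _clean(seq_lines))


def _parse_genbank(text):
    # GenBank-style: keyword lines, "ACCESSION" for accession, "ORIGIN" opens the sequence.
    accession, seq_lines, in_seq = "", [], False
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if line.startswith("ACCESSION") and not accession:
            fields = line.split()
            accession = fields[1] if len(fields) > 1 else ""
        elif line.startswith("ORIGIN"):
            in_seq = True
        elif in_seq:
            seq_lines.append(line)
    return Entry(accession, _clean(seq_lines))


# The format-specific parsing instructions are hidden behind one entry point.
_PARSERS = {"embl": _parse_embl, "genbank": _parse_genbank}


def parse_entry(text, fmt):
    return _PARSERS[fmt](text)


# Toy records (made-up accession and sequence, for illustration only).
EMBL_TEXT = (
    "ID   X00001; SV 1; linear; mRNA; STD; PLN; 12 BP.\n"
    "AC   X00001;\n"
    "SQ   Sequence 12 BP;\n"
    "     acgtacgtac gt        12\n"
    "//\n"
)
GENBANK_TEXT = (
    "LOCUS       X00001  12 bp\n"
    "ACCESSION   X00001\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)

embl_entry = parse_entry(EMBL_TEXT, "embl")
genbank_entry = parse_entry(GENBANK_TEXT, "genbank")
```

In this sketch both calls yield the same kind of `Entry` object, so downstream code such as `embl_entry.subsequence(2, 4)` does not need to know which flat-file format the record came from.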

[PubMed - indexed for MEDLINE]
Free full text