NCBI C++ ToolKit
CSeqDB Class Reference


CSeqDB.

#include <objtools/blast/seqdb_reader/seqdb.hpp>


Classes

struct TOffsetPair
 Structure to represent a range.

struct TSequenceRanges
 List of sequence offset ranges.

Public Types

enum EOidListType { eOidList, eOidRange }
 Indicates how a block of OIDs was returned.

enum ESeqType { eProtein, eNucleotide, eUnknown }
 Sequence types (eUnknown tries protein, then nucleotide).

enum ESummaryType { eUnfilteredAll, eFilteredAll, eFilteredRange }
 Types of summary information available.

enum EMmapFileTypes { eMmap_IndexFile, eMmap_SequenceFile }
 File type for which mmap strategy may be set.

enum EMmapStrategies { eMmap_Normal, eMmap_Sequential, eMmap_WillNeed }
 Permitted mmap strategies.

typedef TSeqDBAliasFileValues TAliasFileValues
 Import type to allow a shorter name.

typedef int TOID
 Sequence type accepted and returned for OID indices.

typedef int TPIG
 Sequence type accepted and returned for PIG indices.

typedef TGi TGI
 Sequence type accepted and returned for GI indices.

typedef set< pair< int, int > > TRangeList
 List of sequence offset ranges.

Public Types inherited from CObject
enum EAllocFillMode { eAllocFillNone = 1, eAllocFillZero, eAllocFillPattern }
 Control filling of newly allocated memory.

typedef CObjectCounterLocker TLockerType
 Default locker type for CRef.

typedef CAtomicCounter TCounter
 Counter type is CAtomicCounter.

typedef TCounter::TValue TCount
 Alias for value type of counter.

Public Member Functions

 CSeqDB (const string &dbname, ESeqType seqtype, CSeqDBGiList *gilist=0)
 Short Constructor.

 CSeqDB (const string &dbname, ESeqType seqtype, CSeqDBNegativeList *nlist)
 Short Constructor with Negative ID list.

 CSeqDB (const string &dbname, ESeqType seqtype, CSeqDBIdSet ids)
 Short Constructor with Computed ID list.

 CSeqDB (const vector< string > &dbs, ESeqType seqtype, CSeqDBGiList *gilist=0)
 Short Constructor.

 CSeqDB (const string &dbname, ESeqType seqtype, int oid_begin, int oid_end, bool use_mmap, CSeqDBGiList *gi_list=0)
 Constructor with MMap Flag and OID Range.

 CSeqDB (const vector< string > &dbname, ESeqType seqtype, int oid_begin, int oid_end, bool use_mmap, CSeqDBGiList *gi_list=0)
 Constructor with MMap Flag and OID Range.

 ~CSeqDB ()
 Destructor.

int GetSeqLength (int oid) const
 Returns the sequence length in base pairs or residues.

TGi GetSeqGI (int oid) const
 Returns the first GI (if any) of the sequence.

int GetSeqLengthApprox (int oid) const
 Returns an unbiased, approximate sequence length.

CRef< CBlast_def_line_set > GetHdr (int oid) const
 Get the ASN.1 header for the sequence.

void GetLeafTaxIDs (int oid, map< TGi, set< int > > &gi_to_taxid_set, bool persist=false) const
 Get taxid for an OID.

void GetLeafTaxIDs (int oid, vector< int > &taxids, bool persist=false) const
 Get taxids for an OID.

void GetTaxIDs (int oid, map< TGi, int > &gi_to_taxid, bool persist=false) const
 Get taxid for an OID.

void GetTaxIDs (int oid, vector< int > &taxids, bool persist=false) const
 Get taxids for an OID.

CRef< CBioseq > GetBioseq (int oid, TGi target_gi=ZERO_GI, const CSeq_id *target_seq_id=NULL) const
 Get a CBioseq for a sequence.

CRef< CBioseq > GetBioseqNoData (int oid, TGi target_gi=ZERO_GI, const CSeq_id *target_seq_id=NULL) const
 Get a CBioseq for a sequence without sequence data.

int GetSequence (int oid, const char **buffer) const
 Get a pointer to raw sequence data.

int GetAmbigSeq (int oid, const char **buffer, int nucl_code) const
 Get a pointer to sequence data with ambiguities.

int GetAmbigSeq (int oid, const char **buffer, int nucl_code, int begin_offset, int end_offset) const
 Get a pointer to a range of sequence data with ambiguities.

int GetAmbigSeqAlloc (int oid, char **buffer, int nucl_code, ESeqDBAllocType strategy, TSequenceRanges *masks=NULL) const
 Get a pointer to sequence data with ambiguities.

void RetSequence (const char **buffer) const
 Returns any resources associated with the sequence.

void RetAmbigSeq (const char **buffer) const
 Returns any resources associated with the sequence.

list< CRef< CSeq_id > > GetSeqIDs (int oid) const
 Gets a list of sequence identifiers.

void GetGis (int oid, vector< TGi > &gis, bool append=false) const
 Gets a list of GIs for an OID.

ESeqType GetSequenceType () const
 Returns the type of database opened: protein or nucleotide.

string GetTitle () const
 Returns the database title.

string GetDate () const
 Returns the construction date of the database.

int GetNumSeqs () const
 Returns the number of sequences available.

int GetNumSeqsStats () const
 Returns the number of sequences available.

int GetNumOIDs () const
 Returns the size of the (possibly sparse) OID range.

Uint8 GetTotalLength () const
 Returns the sum of the lengths of all available sequences.

Uint8 GetExactTotalLength ()
 Returns the exact sum of the lengths of all available sequences.

Uint8 GetTotalLengthStats () const
 Returns the sum of the lengths of all available sequences.

Uint8 GetVolumeLength () const
 Returns the sum of the lengths of all volumes.

void GetTotals (ESummaryType sumtype, int *oid_count, Uint8 *total_length, bool use_approx=true) const
 Returns the sum of the sequence lengths.

int GetMaxLength () const
 Returns the length of the largest sequence in the database.

int GetMinLength () const
 Returns the length of the shortest sequence in the database.
 
CSeqDBIter Begin () const
 Returns a sequence iterator.

bool CheckOrFindOID (int &next_oid) const
 Find an included OID, incrementing next_oid if necessary.

EOidListType GetNextOIDChunk (int &begin_chunk, int &end_chunk, int oid_size, vector< int > &oid_list, int *oid_state=NULL)
 Return a chunk of OIDs, and update the OID bookmark.

void ResetInternalChunkBookmark ()
 Resets this object's internal chunk bookmark, which is used when the oid_state argument to GetNextOIDChunk is NULL.

const string & GetDBNameList () const
 Get list of database names.

const CSeqDBGiList * GetGiList () const
 Get GI list attached to this database.

CSeqDBIdSet GetIdSet () const
 Get IdSet list attached to this database.

void SetMemoryBound (Uint8 membound, Uint8 slice_size=0)
 Set upper limit on memory and mapping slice size.

bool PigToOid (int pig, int &oid) const
 Translate a PIG to an OID.

bool OidToPig (int oid, int &pig) const
 Translate an OID to a PIG.

bool TiToOid (Int8 ti, int &oid) const
 Translate a TI to an OID.

bool OidToGi (int oid, TGi &gi) const
 Translate an OID to a GI.

bool GiToOid (TGi gi, int &oid) const
 Translate a GI to an OID.

bool GiToOidwFilterCheck (TGi gi, int &oid) const
 Translate a GI to an OID with filter check.

bool GiToPig (TGi gi, int &pig) const
 Translate a GI to a PIG.

bool PigToGi (int pig, TGi &gi) const
 Translate a PIG to a GI.

void AccessionToOids (const string &acc, vector< int > &oids) const
 Translate an Accession to a list of OIDs.

void SeqidToOids (const CSeq_id &seqid, vector< int > &oids) const
 Translate a Seq-id to a list of OIDs.

bool SeqidToOid (const CSeq_id &seqid, int &oid) const
 Translate a Seq-id to any matching OID.

int GetOidAtOffset (int first_seq, Uint8 residue) const
 Find the sequence closest to the given offset into the database.

CRef< CBioseq > GiToBioseq (TGi gi) const
 Get a CBioseq for a given GI.

CRef< CBioseq > PigToBioseq (int pig) const
 Get a CBioseq for a given PIG.

CRef< CBioseq > SeqidToBioseq (const CSeq_id &seqid) const
 Get a CBioseq for a given Seq-id.

void FindVolumePaths (vector< string > &paths, bool recursive=true) const
 Find volume paths.

void SetIterationRange (int oid_begin, int oid_end)
 Set Iteration Range.

void GetAliasFileValues (TAliasFileValues &afv)
 Get Name/Value Data From Alias Files.

CRef< CSeq_data > GetSeqData (int oid, TSeqPos begin, TSeqPos end) const
 Fetch data as a CSeq_data object.

void GetSequenceAsString (int oid, CSeqUtil::ECoding coding, string &output, TSeqRange range=TSeqRange()) const
 Get a sequence in a given encoding.

void GetSequenceAsString (int oid, string &output, TSeqRange range=TSeqRange()) const
 Get a sequence in a readable text encoding.

void ListColumns (vector< string > &titles)
 List column titles found in this database.

int GetColumnId (const string &title)
 Get an ID number for a given column title.

const map< string, string > & GetColumnMetaData (int column_id)
 Get all metadata for the specified column.

const string & GetColumnValue (int column_id, const string &key)
 Look up the value for a specific column metadata key.

const map< string, string > & GetColumnMetaData (int column_id, const string &volname)
 Get all metadata for the specified column.

void GetColumnBlob (int col_id, int oid, CBlastDbBlob &blob)
 Fetch the data blob for the given column and oid.

void GetAvailableMaskAlgorithms (vector< int > &algorithms)
 Get a list of algorithm IDs for which mask data exists.

int GetMaskAlgorithmId (const string &algo_name) const
 Get the numeric algorithm ID for a string.

string GetAvailableMaskAlgorithmDescriptions ()
 Returns a formatted string with the list of available masking algorithms in this database for display purposes.

vector< int > ValidateMaskAlgorithms (const vector< int > &algorithm_ids)
 Validates the algorithm IDs passed to this function, returning a vector of those algorithm IDs not present in this object.

void GetMaskAlgorithmDetails (int algorithm_id, objects::EBlast_filter_program &program, string &program_name, string &algo_opts)
 Get information about one type of masking available here.

void GetMaskAlgorithmDetails (int algorithm_id, string &program, string &program_name, string &algo_opts)

void GetMaskData (int oid, const vector< int > &algo_ids, TSequenceRanges &ranges)
 Get masked ranges of a sequence.

void GetMaskData (int oid, int algo_id, TSequenceRanges &ranges)
 Get masked ranges of a sequence.

void GarbageCollect (void)
 Invoke the garbage collector to free up memory.

void SetOffsetRanges (int oid, const TRangeList &offset_ranges, bool append_ranges, bool cache_data)
 Apply a range of offsets to a database sequence.

void RemoveOffsetRanges (int oid)
 Remove any offset ranges for the given OID.

void FlushOffsetRangeCache ()
 Flush all cached offset ranges.

void SetNumberOfThreads (int num_threads, bool force_mt=false)
 Set the number of threads.

Int8 GetSliceSize () const
 Retrieve the current slice size used for mmap.

Int8 GetDiskUsage () const
 Retrieve the disk usage in bytes for this BLAST database.

void SetVolsMemBit (int mbit)
 Set the membership of all volumes.

void DebugDump (CDebugDumpContext ddc, unsigned int depth) const
 Dump debug information for this object.
 
Public Member Functions inherited from CObject
 CObject (void)
 Constructor.

 CObject (const CObject &src)
 Copy constructor.

virtual ~CObject (void)
 Destructor.

CObject & operator= (const CObject &src) THROWS_NONE
 Assignment operator.

bool CanBeDeleted (void) const THROWS_NONE
 Check if object can be deleted.

bool IsAllocatedInPool (void) const THROWS_NONE
 Check if object is allocated in memory pool (not system heap).

bool Referenced (void) const THROWS_NONE
 Check if object is referenced.

bool ReferencedOnlyOnce (void) const THROWS_NONE
 Check if object is referenced only once.

void AddReference (void) const
 Add reference to object.

void RemoveReference (void) const
 Remove reference to object.

void ReleaseReference (void) const
 Remove reference without deleting object.

virtual void DoNotDeleteThisObject (void)
 Mark this object as not allocated in heap; do not delete this object.

virtual void DoDeleteThisObject (void)
 Mark this object as allocated in heap; object can be deleted.

void * operator new (size_t size)
 Define new operator for memory allocation.

void * operator new[] (size_t size)
 Define new[] operator for 'array' memory allocation.

void operator delete (void *ptr)
 Define delete operator for memory deallocation.

void operator delete[] (void *ptr)
 Define delete[] operator for memory deallocation.

void * operator new (size_t size, void *place)
 Define new operator.

void operator delete (void *ptr, void *place)
 Define delete operator.

void * operator new (size_t size, CObjectMemoryPool *place)
 Define new operator using memory pool.

void operator delete (void *ptr, CObjectMemoryPool *place)
 Define delete operator.

Public Member Functions inherited from CDebugDumpable
 CDebugDumpable (void)

virtual ~CDebugDumpable (void)

void DebugDumpText (ostream &out, const string &bundle, unsigned int depth) const

void DebugDumpFormat (CDebugDumpFormatter &ddf, const string &bundle, unsigned int depth) const

void DumpToConsole (void) const
 

Static Public Member Functions

static string ESeqType2String (ESeqType type)
 Converts a CSeqDB sequence type into a human-readable string.

static void SetMmapStrategy (EMmapFileTypes filetype, EMmapStrategies strategy)
 Sets the mmap strategy to be used when mapping index or sequence files.

static string GenerateSearchPath ()
 Returns the default BLAST database search path configured for this local installation of BLAST.

static CRef< CBlast_def_line_set > ExtractBlastDefline (const CBioseq &bioseq)
 Extract a Blast-def-line-set object from a Bioseq retrieved by CSeqDB.

static CRef< CBlast_def_line_set > ExtractBlastDefline (const CBioseq_Handle &handle)
 Extract a Blast-def-line-set object from a Bioseq_Handle retrieved by CSeqDB.

static CTime GetDate (const string &dbname, ESeqType seqtype)
 Returns the construction date of the database.

static void FindVolumePaths (const string &dbname, ESeqType seqtype, vector< string > &paths, vector< string > *alias_paths=NULL, bool recursive=true, bool expand_links=true)
 Find volume paths.

static void GetTaxInfo (int taxid, SSeqDBTaxInfo &info)
 Get taxonomy information.

static void SetDefaultMemoryBound (Uint8 bytes)
 Set global default memory bound for SeqDB.

Static Public Member Functions inherited from CObject
static NCBI_NORETURN void ThrowNullPointerException (void)
 Define method to throw null pointer exception.

static NCBI_NORETURN void ThrowNullPointerException (const type_info &type)

static EAllocFillMode GetAllocFillMode (void)

static void SetAllocFillMode (EAllocFillMode mode)

static void SetAllocFillMode (const string &value)
 Set mode from configuration parameter value.

Static Public Member Functions inherited from CDebugDumpable
static void EnableDebugDump (bool on)
 

Static Public Attributes

static const string kOidNotFound
 String containing the error message in exceptions thrown when a given OID cannot be found.

static const char * kBlastDbDateFormat = "b d, Y H:m P"
 Format string for the date returned by CSeqDB::GetDate.

Static Public Attributes inherited from CObject
static const TCount eCounterBitsCanBeDeleted = 1 << 0
 Define possible object states.

static const TCount eCounterBitsInPlainHeap = 1 << 1
 Heap signature was found.

static const TCount eCounterBitsPlaceMask
 Mask for 'in heap' state flags.

static const int eCounterStep = 1 << 2
 Skip over the "in heap" bits.

static const TCount eCounterValid = TCount(1) << (sizeof(TCount) * 8 - 2)
 Minimal value for valid objects (reference counter is zero). Must be a single-bit value.

static const TCount eCounterStateMask
 Valid object, and object in heap.
 

Protected Member Functions

 CSeqDB ()
 No-argument Constructor.

Protected Member Functions inherited from CObject
virtual void DeleteThis (void)
 Virtual method "deleting" this object.


Protected Attributes

class CSeqDBImpl * m_Impl
 Implementation details are hidden (see seqdbimpl.hpp).
 

Detailed Description

CSeqDB.

User interface class for BLAST databases.

This class provides the top-level interface for BLAST database users. It accesses the database components by calling methods on objects that represent the various database files, such as the index, header, sequence, and alias files.

Definition at line 159 of file seqdb.hpp.

Member Typedef Documentation

typedef TSeqDBAliasFileValues CSeqDB::TAliasFileValues

Import type to allow a shorter name.

Definition at line 162 of file seqdb.hpp.

typedef TGi CSeqDB::TGI

Sequence type accepted and returned for GI indices.

Definition at line 220 of file seqdb.hpp.

typedef int CSeqDB::TOID

Sequence type accepted and returned for OID indices.

Definition at line 214 of file seqdb.hpp.

typedef int CSeqDB::TPIG

Sequence type accepted and returned for PIG indices.

Definition at line 217 of file seqdb.hpp.

typedef set< pair<int, int> > CSeqDB::TRangeList

List of sequence offset ranges.

Definition at line 1410 of file seqdb.hpp.

Member Enumeration Documentation

enum CSeqDB::EMmapFileTypes

File type for which mmap strategy may be set.

Enumerator
eMmap_IndexFile 

Index files (name ends with ".pin" or ".nin").

eMmap_SequenceFile 

Sequence files (name ends with ".psq" or ".nsq").

Definition at line 193 of file seqdb.hpp.

enum CSeqDB::EMmapStrategies

Permitted mmap strategies.

Enumerator
eMmap_Normal 

Normal; no special behavior (this should undo the effect of the next two options).

eMmap_Sequential 

Expect sequential page references.

eMmap_WillNeed 

Expect access in the near future.

Definition at line 202 of file seqdb.hpp.

enum CSeqDB::EOidListType

Indicates how a block of OIDs was returned.

Enumerator
eOidList 
eOidRange 

Definition at line 165 of file seqdb.hpp.

enum CSeqDB::ESeqType

Sequence types (eUnknown tries protein, then nucleotide).

Enumerator
eProtein 
eNucleotide 
eUnknown 

Definition at line 171 of file seqdb.hpp.
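To make the eUnknown search order concrete, here is a minimal sketch. It assumes that a database counts as present for a molecule type when its index file (".pin" or ".nin", as listed under EMmapFileTypes) or its alias file (".pal" or ".nal") exists. SeqType, ResolveUnknownSeqType, and the file_exists callback are hypothetical names for illustration, not toolkit API; the real lookup also consults the BLAST database search path and multi-volume naming, which this sketch ignores.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the outcome of resolving eUnknown.
enum class SeqType { kProtein, kNucleotide, kNotFound };

// Sketch of "try protein, then nucleotide": a database counts as present for
// a molecule type if its index file (.pin/.nin) or alias file (.pal/.nal)
// exists. file_exists is injected so no real file system access is needed.
SeqType ResolveUnknownSeqType(
    const std::string& dbname,
    const std::function<bool(const std::string&)>& file_exists)
{
    if (file_exists(dbname + ".pin") || file_exists(dbname + ".pal"))
        return SeqType::kProtein;          // protein is tried first
    if (file_exists(dbname + ".nin") || file_exists(dbname + ".nal"))
        return SeqType::kNucleotide;       // then nucleotide
    return SeqType::kNotFound;
}
```

Injecting file_exists keeps the sketch testable without touching the file system; a database with both protein and nucleotide files resolves to protein because that type is tried first.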

enum CSeqDB::ESummaryType

Types of summary information available.

Enumerator
eUnfilteredAll 

Sum of all sequences, ignoring GI and OID lists and alias files.

eFilteredAll 

Values from alias files, or summation over all included sequences.

eFilteredRange 

Sum of included sequences with OIDs within the iteration range.

Definition at line 181 of file seqdb.hpp.
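The three summary types differ only in which OIDs are counted. The sketch below models that with plain vectors; it is an illustration under stated assumptions, not the GetTotals() implementation. In particular, the real eFilteredAll may take its totals from alias-file values rather than summing, and the included flags here stand in for whatever GI lists, OID lists, and alias files select. The iteration range is treated as half-open, consistent with the constructor documentation that says oid_end itself is not returned.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in mirroring CSeqDB::ESummaryType.
enum class SummaryType { kUnfilteredAll, kFilteredAll, kFilteredRange };

struct Totals {
    int oid_count = 0;
    std::uint64_t total_length = 0;
};

// lengths[oid]  : length of the sequence at each OID
// included[oid] : whether filtering keeps that OID (same size as lengths)
// [range_begin, range_end) : iteration range, used only by kFilteredRange
Totals ComputeTotals(SummaryType type,
                     const std::vector<int>& lengths,
                     const std::vector<bool>& included,
                     int range_begin, int range_end)
{
    Totals t;
    for (int oid = 0; oid < static_cast<int>(lengths.size()); ++oid) {
        bool counted = false;
        switch (type) {
        case SummaryType::kUnfilteredAll:
            counted = true;                      // ignore all filtering
            break;
        case SummaryType::kFilteredAll:
            counted = included[oid];             // filtered, whole database
            break;
        case SummaryType::kFilteredRange:
            counted = included[oid] && oid >= range_begin && oid < range_end;
            break;
        }
        if (counted) {
            ++t.oid_count;
            t.total_length += static_cast<std::uint64_t>(lengths[oid]);
        }
    }
    return t;
}
```

With lengths {10, 20, 30, 40}, OID 1 excluded, and the range [1, 3), the three types count 4, 3, and 1 sequences respectively.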

Constructor & Destructor Documentation

CSeqDB::CSeqDB ( const string & dbname,
ESeqType  seqtype,
CSeqDBGiList * gilist = 0 
)

Short Constructor.

This version of the constructor assumes memory mapping and that the entire possible OID range will be included. Please use quotes ("") around database names that contain space characters.

Parameters
dbname: A list of database or alias names, separated by spaces.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
gilist: The database will be filtered by this GI list if non-null.

Definition at line 155 of file seqdb.cpp.

References m_Impl, NCBI_THROW, s_GetSeqTypeChar(), s_SeqDBInit(), and CSeqDBImpl::Verify().
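Because the single-string constructors take a space-separated list, a name that itself contains spaces must be quoted. The hypothetical helper SplitDbNames below (not a toolkit function) sketches how such a list can be tokenized. It assumes double quotes are the only quoting character and never appear inside a name; SeqDB's own parser is not documented here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a toolkit function): split a SeqDB-style name
// list on spaces, treating a double-quoted run as a single name so that
// names containing spaces survive. Quote characters are not kept.
std::vector<std::string> SplitDbNames(const std::string& list)
{
    std::vector<std::string> names;
    std::string current;
    bool in_quotes = false;
    for (char c : list) {
        if (c == '"') {
            in_quotes = !in_quotes;            // enter or leave a quoted name
        } else if (c == ' ' && !in_quotes) {
            if (!current.empty()) {            // collapse repeated spaces
                names.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty())
        names.push_back(current);
    return names;
}
```

For example, the input nr "my db" swissprot yields three names, the middle one being my db.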

CSeqDB::CSeqDB ( const string & dbname,
ESeqType  seqtype,
CSeqDBNegativeList * nlist 
)

Short Constructor with Negative ID list.

This version of the constructor assumes the entire OID range will be included, and applies filtering by a negative ID list. Please use quotes ("") around database names that contain space characters.

Parameters
dbname: A list of database or alias names, separated by spaces.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
nlist: The database will be filtered to exclude these GIs or TIs.

Definition at line 177 of file seqdb.cpp.

References m_Impl, NCBI_THROW, NULL, s_GetSeqTypeChar(), s_SeqDBInit(), and CSeqDBImpl::Verify().

CSeqDB::CSeqDB ( const string & dbname,
ESeqType  seqtype,
CSeqDBIdSet  ids 
)

Short Constructor with Computed ID list.

This version of the constructor takes a computed CSeqDBIdSet list, which can be positive or negative. This is equivalent to building a positive or negative list from the IdSet object and passing it into one of the previous constructors.

Parameters
dbname: A list of database or alias names, separated by spaces.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
ids: The database will be filtered by this set of IDs.

Definition at line 213 of file seqdb.cpp.

References CSeqDBIdSet::Blank(), CSeqDBIdSet::GetNegativeList(), CRef< C, Locker >::GetPointerOrNull(), CSeqDBIdSet::GetPositiveList(), CSeqDBIdSet::IsPositive(), m_Impl, NCBI_THROW, s_GetSeqTypeChar(), s_SeqDBInit(), and CSeqDBImpl::Verify().

CSeqDB::CSeqDB ( const vector< string > &  dbs,
ESeqType  seqtype,
CSeqDBGiList * gilist = 0 
)

Short Constructor.

This version of the constructor assumes memory mapping and that the entire possible OID range will be included.

Parameters
dbs: A list of database or alias names.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
gilist: The database will be filtered by this GI list if non-null.

Definition at line 244 of file seqdb.cpp.

References dbname(), m_Impl, NCBI_THROW, s_GetSeqTypeChar(), s_SeqDBInit(), SeqDB_CombineAndQuote(), and CSeqDBImpl::Verify().
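The References list for this constructor includes SeqDB_CombineAndQuote(), which suggests the vector of names is joined back into a single quoted string. The sketch below shows one plausible joining step, assuming single-space separators and double quotes around any name that contains a space. CombineAndQuoteNames is a hypothetical stand-in; the exact behavior of the toolkit function is not documented on this page.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the toolkit's SeqDB_CombineAndQuote): join names
// with single spaces, wrapping any name that contains a space in quotes so
// the result can be split back into the original names.
std::string CombineAndQuoteNames(const std::vector<std::string>& dbs)
{
    std::string out;
    for (const std::string& name : dbs) {
        if (!out.empty())
            out += ' ';
        if (name.find(' ') != std::string::npos)
            out += '"' + name + '"';
        else
            out += name;
    }
    return out;
}
```

Quoting only the names that need it keeps the common case (no spaces) identical to a plain space-joined list.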

CSeqDB::CSeqDB ( const string & dbname,
ESeqType  seqtype,
int  oid_begin,
int  oid_end,
bool  use_mmap,
CSeqDBGiList * gi_list = 0 
)

Constructor with MMap Flag and OID Range.

If the oid_end value is specified as zero, or as a value larger than the number of OIDs, it will be adjusted to the number of OIDs in the database. Specifying 0 for both oid_begin and oid_end will cause inclusion of the entire database. This version of the constructor is obsolete because the sequence type is specified as a character (eventually only the ESeqType version will exist). Please use quotes ("") around database names that contain space characters.

Parameters
dbname: A list of database or alias names, separated by spaces.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
oid_begin: Iterator will skip OIDs less than this value. Only OIDs found in the OID lists (if any) will be returned.
oid_end: Iterator will return up to (but not including) this OID.
use_mmap: If kSeqDBMMap is specified, memory mapping is attempted. If kSeqDBNoMMap is specified, or if memory mapping fails or is not supported on this platform, the less efficient read and write calls are used instead.
gi_list: The database will be filtered by this GI list if non-null.

Definition at line 267 of file seqdb.cpp.

References m_Impl, NCBI_THROW, s_GetSeqTypeChar(), s_SeqDBInit(), and CSeqDBImpl::Verify().
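The oid_end adjustment rule above is small enough to state as code. AdjustOidRange is a hypothetical helper written for illustration; it applies only the documented rule (an oid_end of zero or beyond the number of OIDs becomes the number of OIDs) and does not guess how an out-of-range oid_begin is handled.

```cpp
#include <utility>

// Hypothetical helper applying the documented rule: an oid_end of 0, or any
// value larger than the number of OIDs, becomes the number of OIDs. The
// result is the half-open range [oid_begin, oid_end).
std::pair<int, int> AdjustOidRange(int oid_begin, int oid_end, int num_oids)
{
    if (oid_end == 0 || oid_end > num_oids)
        oid_end = num_oids;
    return {oid_begin, oid_end};
}
```

So (0, 0) on a 100-OID database covers the whole database, while (10, 20) is left unchanged and visits OIDs 10 through 19.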

CSeqDB::CSeqDB ( const vector< string > &  dbname,
ESeqType  seqtype,
int  oid_begin,
int  oid_end,
bool  use_mmap,
CSeqDBGiList * gi_list = 0 
)

Constructor with MMap Flag and OID Range.

If the oid_end value is specified as zero, or as a value larger than the number of OIDs, it will be adjusted to the number of OIDs in the database. Specifying 0 for both oid_begin and oid_end will cause inclusion of the entire database. This version of the constructor is obsolete because the sequence type is specified as a character (eventually only the ESeqType version will exist).

Parameters
dbname: A list of database or alias names.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
oid_begin: Iterator will skip OIDs less than this value. Only OIDs found in the OID lists (if any) will be returned.
oid_end: Iterator will return up to (but not including) this OID.
use_mmap: If kSeqDBMMap is specified, memory mapping is attempted. If kSeqDBNoMMap is specified, or if memory mapping fails or is not supported on this platform, the less efficient read and write calls are used instead.
gi_list: The database will be filtered by this GI list if non-null.

Definition at line 290 of file seqdb.cpp.

References dbname(), m_Impl, NCBI_THROW, s_GetSeqTypeChar(), s_SeqDBInit(), SeqDB_CombineAndQuote(), and CSeqDBImpl::Verify().

CSeqDB::~CSeqDB ( )

Destructor.

This will return resources acquired by this object, including any gotten by the GetSequence() call, whether or not they have been returned by RetSequence().

Definition at line 635 of file seqdb.cpp.

References m_Impl, and CSeqDBImpl::Verify().
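Since the destructor reclaims any buffers still outstanding, a missed RetSequence() is not a permanent leak, but the buffer may stay held until the database object is destroyed. A scope guard returns each buffer as soon as it goes out of scope. The sketch below is generic over any type with GetSequence(int, const char**) and RetSequence(const char**), the signatures listed for CSeqDB above. FakeSeqSource and SequenceGuard are hypothetical; the fake exists only so the example runs without the toolkit.

```cpp
// Hypothetical stand-in for a source of sequence buffers that expects every
// buffer handed out by GetSequence() to be given back via RetSequence().
class FakeSeqSource {
public:
    int GetSequence(int /*oid*/, const char** buffer) const {
        ++outstanding_;
        *buffer = "ACGT";              // every fake sequence is "ACGT"
        return 4;
    }
    void RetSequence(const char** buffer) const {
        --outstanding_;
        *buffer = nullptr;
    }
    int Outstanding() const { return outstanding_; }

private:
    mutable int outstanding_ = 0;      // buffers not yet returned
};

// Scope guard: fetches a sequence on construction and returns it on
// destruction, including on early return or exception.
template <class Source>
class SequenceGuard {
public:
    SequenceGuard(Source& src, int oid) : src_(src) {
        length_ = src_.GetSequence(oid, &buffer_);
    }
    ~SequenceGuard() {
        if (buffer_ != nullptr)
            src_.RetSequence(&buffer_);
    }
    SequenceGuard(const SequenceGuard&) = delete;
    SequenceGuard& operator=(const SequenceGuard&) = delete;

    const char* Data() const { return buffer_; }
    int Length() const { return length_; }

private:
    Source& src_;
    const char* buffer_ = nullptr;
    int length_ = 0;
};
```

Assuming the signatures listed above, the intended use with the real class would be along the lines of SequenceGuard<const CSeqDB> guard(db, oid); copying is disabled so a buffer can never be returned twice.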

CSeqDB::CSeqDB ( )
protected

No-argument Constructor.

This version of the constructor is used as an extension by the 'expert' interface in seqdbexpert.hpp.

Definition at line 316 of file seqdb.cpp.

References m_Impl, and CSeqDBImpl::Verify().

Member Function Documentation

void CSeqDB::AccessionToOids ( const string & acc,
vector< int > &  oids 
) const

Translate an Accession to a list of OIDs.

CSeqDBIter CSeqDB::Begin ( void  ) const

Returns a sequence iterator.

This gets an iterator designed to allow traversal of the database from beginning to end.

Definition at line 643 of file seqdb.cpp.

Referenced by BOOST_AUTO_TEST_CASE().
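Besides the Begin() iterator, the member list above offers CheckOrFindOID() for walking included OIDs in order. In the loop idiom below, CheckOrFindOID() advances the candidate OID past excluded ones, the body processes it, and ++oid moves to the next candidate. FakeOidFilter is a hypothetical stand-in that models the one-line contract given for CheckOrFindOID() (advance next_oid to an included OID, or return false), so the example runs without a database; CollectIncludedOids is written as a template so it could, under that assumption, be used with a CSeqDB as well.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in modelling CheckOrFindOID() over a fixed inclusion mask.
class FakeOidFilter {
public:
    explicit FakeOidFilter(std::vector<bool> included)
        : included_(std::move(included)) {}

    // Move next_oid forward to the first included OID at or after it and
    // return true, or return false if no included OID remains.
    bool CheckOrFindOID(int& next_oid) const {
        const int n = static_cast<int>(included_.size());
        for (int oid = next_oid < 0 ? 0 : next_oid; oid < n; ++oid) {
            if (included_[oid]) {
                next_oid = oid;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<bool> included_;
};

// The traversal idiom: CheckOrFindOID() may advance oid; ++oid then steps
// past the OID that was just processed.
template <class Db>
std::vector<int> CollectIncludedOids(const Db& db)
{
    std::vector<int> oids;
    for (int oid = 0; db.CheckOrFindOID(oid); ++oid)
        oids.push_back(oid);
    return oids;
}
```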

bool CSeqDB::CheckOrFindOID ( int & next_oid ) const

Find an included OID, incrementing next_oid if necessary.

void CSeqDB::DebugDump ( CDebugDumpContext  ddc,
unsigned int  depth 
) const
virtual

Dump debug information for this object.

See also
CDebugDumpable

Reimplemented from CObject.

Definition at line 1486 of file seqdb.cpp.

References CObject::DebugDump(), CDebugDumpContext::Log(), m_Impl, and CDebugDumpContext::SetFrame().

string CSeqDB::ESeqType2String ( ESeqType  type)
static

Converts a CSeqDB sequence type into a human readable string.

Definition at line 1266 of file seqdb.cpp.

References eNucleotide, eProtein, and eUnknown.

Referenced by CBlastDbMetadata::GetMoleculeType().

CRef< CBlast_def_line_set > CSeqDB::ExtractBlastDefline ( const CBioseq & bioseq)
static

Extract a Blast-def-line-set object from a Bioseq retrieved by CSeqDB.

CRef< CBlast_def_line_set > CSeqDB::ExtractBlastDefline ( const CBioseq_Handle & handle)
static

Extract a Blast-def-line-set object from a Bioseq_Handle retrieved by CSeqDB.

Parameters
handle: Bioseq handle retrieved from CSeqDB [in]

Definition at line 1203 of file seqdbvol.cpp.

References s_ExtractBlastDefline().

void CSeqDB::FindVolumePaths ( const string & dbname,
ESeqType  seqtype,
vector< string > &  paths,
vector< string > *  alias_paths = NULL,
bool  recursive = true,
bool  expand_links = true 
)
static

Find volume paths.

Find the base names of all volumes (and alias nodes). This method builds an alias hierarchy (which should be much faster than constructing an entire CSeqDB object), and returns the resolved volume/alias file base names from that hierarchy.

Parameters
dbname: The input name of the database.
seqtype: Specify eProtein, eNucleotide, or eUnknown.
paths: The set of resolved database volume file names.
alias_paths: The set of resolved database alias file names.
recursive: If true, the search will traverse the full alias node tree.
expand_links: If true, the search will expand soft links.

Definition at line 968 of file seqdb.cpp.

References eNucleotide, eProtein, and CSeqDBImpl::FindVolumePaths().

Referenced by BOOST_AUTO_TEST_CASE(), CheckForFreqRatioFile(), CLocalRPSBlast::CLocalRPSBlast(), DeleteBlastDb(), CMetaDataTest::DoTest(), CIndexedDb_New::EnumerateDbVolumes(), GetDate(), GetDiskUsage(), CProfileData::Load(), CMkIndexApplication::Run(), CSeqdb2CreateApplication::Run(), CSeqDBDemo_Threaded::Run(), s_MapDbToThread(), CDbTest::x_GetVolumeList(), CDirTest::x_GetVolumeList(), CBlastRPSInfo::x_Init(), CBlastDBCmdApp::x_PrintBlastDatabaseInformation(), BlastdbCopyApplication::x_ShouldCopyPIGs(), and BlastdbCopyApplication::x_ShouldParseSeqIds().
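As a rough model of what building an alias hierarchy involves, the sketch below expands a database name through an in-memory table of alias DBLIST entries until it reaches names with no alias entry, which it treats as volumes, and records the alias nodes it passes through. AliasTree, ResolvedPaths, ExpandName, and FindVolumeNames are hypothetical; the real method reads .pal/.nal files from disk, resolves full paths, and honors the recursive and expand_links flags, none of which this sketch models.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory alias tree: an alias name maps to the names on its
// DBLIST line. Any name with no entry is treated as a database volume.
using AliasTree = std::map<std::string, std::vector<std::string>>;

struct ResolvedPaths {
    std::set<std::string> volumes;  // leaf volume base names
    std::set<std::string> aliases;  // alias nodes visited along the way
};

// Depth-first expansion of one name. Each alias is expanded at most once,
// which also stops infinite recursion if alias files form a cycle.
void ExpandName(const AliasTree& tree, const std::string& name, ResolvedPaths& out)
{
    auto it = tree.find(name);
    if (it == tree.end()) {
        out.volumes.insert(name);
        return;
    }
    if (!out.aliases.insert(name).second)
        return;                     // already expanded this alias
    for (const std::string& child : it->second)
        ExpandName(tree, child, out);
}

ResolvedPaths FindVolumeNames(const AliasTree& tree, const std::string& dbname)
{
    ResolvedPaths out;
    ExpandName(tree, dbname, out);
    return out;
}
```

Keeping volumes and aliases in separate sets loosely mirrors the separate paths and alias_paths outputs of the static overload.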

void CSeqDB::FindVolumePaths ( vector< string > &  paths,
bool  recursive = true 
) const

Find volume paths.

Find the base names of all volumes. This method returns the resolved base names of all referenced blast database volumes.

Parameters
paths: The returned set of resolved database path names.
recursive: If true, the search will traverse the full alias node tree.

Definition at line 990 of file seqdb.cpp.

References CSeqDBImpl::FindVolumePaths(), m_Impl, and CSeqDBImpl::Verify().

void CSeqDB::FlushOffsetRangeCache ( )

Flush all cached offset ranges.

Definition at line 1254 of file seqdb.cpp.

References CSeqDBImpl::FlushOffsetRangeCache(), and m_Impl.

Referenced by s_SeqDbResetChunkIterator().

void CSeqDB::GarbageCollect ( void  )

Invoke the garbage collector to free up memory.

Definition at line 1228 of file seqdb.cpp.

References CSeqDBImpl::GarbageCollect(), and m_Impl.

Referenced by CSeqDbSeqInfoSrc::GarbageCollect().

string CSeqDB::GenerateSearchPath ( )
static

Returns the default BLAST database search path configured for this local installation of BLAST.

Definition at line 1278 of file seqdb.cpp.

References CSeqDBAtlas::GenerateSearchPath().

Referenced by CBlastDBCmdApp::Run().

void CSeqDB::GetAliasFileValues ( TAliasFileValues & afv)

Get Name/Value Data From Alias Files.

SeqDB treats each alias file as a map from a variable name to a value. This method returns a map from the basename of each alias file to a vector of maps, each mapping variable names to values for the entries in that file. For example, the value of the "DBLIST" entry in the "wgs.nal" file would be afv["wgs"][0]["DBLIST"].

The lines returned have been processed somewhat by SeqDB: tabs are normalized to spaces, leading and trailing whitespace is trimmed, and comments and other non-value lines are removed.

Take care when using the values returned by this method. SeqDB uses an internal "virtual" alias file entry with a filename of "-", containing a single entry that maps "DBLIST" to SeqDB's database name input; this entry is the root of the alias file inclusion tree. Also note that an alias file that appears in several places in the inclusion tree may differ from one place to another, because SeqDB's internal editing distributes GI lists over sub-alias files. This is why the value type of the returned data is a vector.

Parameters
afv: The alias file contents will be returned here.

Definition at line 1026 of file seqdb.cpp.

References CSeqDBImpl::GetAliasFileValues(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().
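Indexing the result directly as afv["wgs"][0]["DBLIST"] works when every level exists, but std::map::operator[] inserts missing keys and indexing a vector past its end is undefined behavior. A read-only lookup avoids both. This sketch assumes TAliasFileValues has the shape described above, a map from file basename to a vector of string-to-string maps; AliasFileValues and LookupAliasValue are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed shape of TAliasFileValues: alias file basename -> vector of
// maps -> variable name -> value.
using AliasFileValues =
    std::map<std::string, std::vector<std::map<std::string, std::string>>>;

// Read-only lookup. Unlike afv["wgs"][0]["DBLIST"], it never inserts empty
// entries and never indexes past the end of the vector. Returns false if
// the file, the vector element, or the key is missing.
bool LookupAliasValue(const AliasFileValues& afv, const std::string& file,
                      std::size_t index, const std::string& key,
                      std::string& value)
{
    auto file_it = afv.find(file);
    if (file_it == afv.end() || index >= file_it->second.size())
        return false;
    const std::map<std::string, std::string>& entries = file_it->second[index];
    auto key_it = entries.find(key);
    if (key_it == entries.end())
        return false;
    value = key_it->second;
    return true;
}
```

Taking afv by const reference also makes accidental insertion a compile error rather than a silent change to the data.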

int CSeqDB::GetAmbigSeq ( int  oid,
const char **  buffer,
int  nucl_code 
) const

Get a pointer to sequence data with ambiguities.

In the protein case, this is identical to GetSequence(). In the nucleotide case, it stores one base per byte instead of the four bases per byte returned by GetSequence(). The third parameter indicates the encoding for nucleotide data, either kSeqDBNuclNcbiNA8 or kSeqDBNuclBlastNA8, and is ignored if the sequence is a protein sequence. When done, resources should be returned with RetSequence.

Parameters
oid: The ordinal ID of the sequence.
buffer: A returned pointer to the data in the sequence.
nucl_code: The encoding to use for the returned sequence data.
Returns
The return value is the sequence length (in base pairs or residues). In case of an error, an exception is thrown.

Definition at line 488 of file seqdb.cpp.

References CSeqDBImpl::GetAmbigSeq(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CLocalBlastDbAdapter::GetSequence(), GetSequenceAsString(), s_TestPartialAmbigRange(), and CSeqDBDemo_Thread::x_UseOID().
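To illustrate the size difference between the two kinds of buffer, the sketch below unpacks ncbi2na data (2 bits per base, four bases per byte, first base in the high-order bits) into one character per base. It emits letters for readability, whereas kSeqDBNuclNcbiNA8 and kSeqDBNuclBlastNA8 produce numeric codes, and it ignores ambiguity data and SeqDB's on-disk end-of-sequence conventions. Unpack2na is a hypothetical helper, not toolkit API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Unpack 'length' bases of ncbi2na data (A=0, C=1, G=2, T=3; four bases per
// byte, first base in the two high-order bits) into one letter per base.
std::string Unpack2na(const std::uint8_t* packed, int length)
{
    static const char kBases[4] = {'A', 'C', 'G', 'T'};
    std::string out;
    out.reserve(static_cast<std::size_t>(length));
    for (int i = 0; i < length; ++i) {
        int shift = 6 - 2 * (i % 4);            // 6, 4, 2, 0 within each byte
        int code = (packed[i / 4] >> shift) & 0x3;
        out.push_back(kBases[code]);
    }
    return out;
}
```

For example, the two bytes 0x1B and 0xE4 unpack to ACGTTGCA: eight bases from two bytes, where an 8-bit encoding would need eight bytes.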

int CSeqDB::GetAmbigSeq ( int  oid,
const char **  buffer,
int  nucl_code,
int  begin_offset,
int  end_offset 
) const

Get a pointer to a range of sequence data with ambiguities.

This is like GetAmbigSeq(), but only a range of the sequence data is computed and returned. When done, resources should be returned with RetSequence.

Parameters
oid: The ordinal id of the sequence.
buffer: A returned pointer to the data in the sequence.
nucl_code: The encoding to use for the returned sequence data.
begin_offset: The zero-based offset at which to start translating.
end_offset: The zero-based offset at which to end translation.
Returns
The return value is the subsequence length (in base pairs or residues). In case of an error, an exception is thrown.

Definition at line 508 of file seqdb.cpp.

References CSeqDBImpl::GetAmbigSeq(), m_Impl, and CSeqDBImpl::Verify().

int CSeqDB::GetAmbigSeqAlloc ( int  oid,
char **  buffer,
int  nucl_code,
ESeqDBAllocType  strategy,
TSequenceRanges *  masks = NULL
) const

Get a pointer to sequence data with ambiguities.

This is like GetAmbigSeq(), but the returned buffer is owned by the caller, who must free it. This is intended for users who are going to modify the sequence data, or who are mixing data from multiple sources into a container and want to free all of it in the same way. The fourth parameter should be one of the values of ESeqDBAllocType (eMalloc or eNew), and the matching deallocation method must be used: free() for eMalloc, and "delete[]" (not "delete") for eNew.

Parameters
oid: Ordinal ID.
buffer: Address of a char pointer to access the sequence data.
nucl_code: The NA encoding, kSeqDBNuclNcbiNA8 or kSeqDBNuclBlastNA8.
strategy: Indicates which allocation strategy to use.
masks: If not empty, the returned sequence will be (hard) masked. Masks are cleared on return.
Returns
The return value is the sequence length (in base pairs or residues). In case of an error, an exception is thrown.

Definition at line 529 of file seqdb.cpp.

References eMalloc, eNew, CSeqDBImpl::GetAmbigSeq(), m_Impl, NCBI_THROW, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), and s_SeqDbGetSequence().
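The essential rule is that deallocation must match the allocation strategy: memory from new[] is released with delete[], and memory from malloc() with free(). The self-contained sketch below models that rule; AllocStrategy and its enumerators are hypothetical stand-ins for ESeqDBAllocType's eNew and eMalloc, and AllocCopy plays the role of the toolkit's allocation.

```cpp
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical mirror of the two strategies (eMalloc, eNew).
enum class AllocStrategy { kMalloc, kNew };

// Allocates a copy of `len` bytes using the requested strategy.
char* AllocCopy(const char* src, std::size_t len, AllocStrategy strategy)
{
    char* out = nullptr;
    if (strategy == AllocStrategy::kNew) {
        out = new char[len];
    } else {
        // malloc(0) may legitimately return NULL, so request at least 1 byte.
        out = static_cast<char*>(std::malloc(len > 0 ? len : 1));
        if (out == nullptr) {
            throw std::bad_alloc();
        }
    }
    std::memcpy(out, src, len);
    return out;
}

// Frees a buffer with the deallocator that matches its strategy.
void FreeBuffer(char* buffer, AllocStrategy strategy)
{
    if (strategy == AllocStrategy::kNew) {
        delete[] buffer;   // never plain `delete` for an array
    } else {
        std::free(buffer);
    }
}
```

Keeping the strategy next to the pointer (as FreeBuffer requires) is what lets mixed-source containers free everything uniformly.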

string CSeqDB::GetAvailableMaskAlgorithmDescriptions ( )

Returns a formatted string listing the masking algorithms available in this database, for display purposes (e.g., in help output).

Definition at line 1170 of file seqdb.cpp.

References CSeqDBImpl::GetAvailableMaskAlgorithmDescriptions(), and m_Impl.

Referenced by CSequenceIStreamBlastDB::CSequenceIStreamBlastDB(), CSequenceIStreamBlastDB::ShowSupportedFilters(), and CBlastDBCmdApp::x_PrintBlastDatabaseInformation().

void CSeqDB::GetAvailableMaskAlgorithms ( vector< int > &  algorithms)

Get a list of algorithm IDs for which mask data exists.

Multiple sources of masking data may be used when building blast databases. This method retrieves a list of the IDs used to identify those types of filtering data to SeqDB. If the blast database volumes used by this instance of SeqDB were built with conflicting algorithm ID definitions, SeqDB will resolve the conflicts by renumbering some of the conflicting descriptions. For this reason, the IDs reported here may not match what was given to WriteDB when the database was created.

Parameters
algorithms: List of algorithm IDs. [out]

Definition at line 1160 of file seqdb.cpp.

References CSeqDBImpl::GetAvailableMaskAlgorithms(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CRawSeqDBSource::CRawSeqDBSource(), s_SeqDbSrcNew(), ValidateMaskAlgorithms(), and CSearchDatabase::x_ValidateMaskingAlgorithm().

CRef< CBioseq > CSeqDB::GetBioseq ( int  oid,
TGi  target_gi = ZERO_GI,
const CSeq_id *  target_seq_id = NULL
) const

Get a CBioseq for a sequence.

This builds and returns the header and sequence data corresponding to the indicated sequence as a CBioseq. If target_gi is non-zero or target_seq_id is non-null, the header information will be filtered to only include the defline associated with that gi/seq_id.

Parameters
oid: The ordinal id of the sequence.
target_gi: If nonzero, the target gi to filter the header information by.
target_seq_id: If non-null, the target seq_id to filter the header information by.
Returns
A CBioseq object corresponding to the sequence.

Definition at line 442 of file seqdb.cpp.

References CSeqDBImpl::GetBioseq(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CBlastDbBioseqSource::CBlastDbBioseqSource(), CSeqFormatter::DumpAll(), CMaskBDBReader::GetNextSequence(), CSequenceIStreamBlastDB::next(), s_CheckIdLookup(), s_DupIdsBioseq(), s_TestDatabase(), CBlastDBExtractor::SetSeqId(), CTestAction::TestOID(), CBlastDB_FastaFormatter::Write(), CBlastDB_BioseqFormatter::Write(), and CSearch< LEGACY, NHITS >::WriteBioseqs().

CRef< CBioseq > CSeqDB::GetBioseqNoData ( int  oid,
TGi  target_gi = ZERO_GI,
const CSeq_id *  target_seq_id = NULL
) const

Get a CBioseq for a sequence without sequence data.

This builds and returns the data corresponding to the indicated sequence as a CBioseq, but without the sequence data. It is used when processing large sequences, to avoid accessing unused parts of the sequence.

Parameters
oid: The ordinal id of the sequence.
target_gi: If nonzero, the target gi to filter the header information by.
target_seq_id: If non-null, the target seq_id to filter the header information by.
Returns
A CBioseq object corresponding to the sequence, but without sequence data.

Definition at line 452 of file seqdb.cpp.

References CSeqDBImpl::GetBioseq(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CLocalBlastDbAdapter::GetBioseqNoData(), and CBlastDBExtractor::SetSeqId().

void CSeqDB::GetColumnBlob ( int  col_id,
int  oid,
CBlastDbBlob &  blob
)

Fetch the data blob for the given column and oid.

Parameters
col_id: The column to fetch data from. [in]
oid: The OID of the blob. [in]
blob: The data will be returned here. [out]

Definition at line 1153 of file seqdb.cpp.

References CSeqDBImpl::GetColumnBlob(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), and CRawSeqDBSource::GetNext().

int CSeqDB::GetColumnId ( const string &  title)

Get an ID number for a given column title.

For a given column title, this returns an ID that can be used to access that column in the future. The returned ID number is specific to this instance of SeqDB. If the database does not have a column with this name, -1 will be returned.

Parameters
title: Column title to search for. [in]
Returns
Column ID number for this column, or -1.

Definition at line 1129 of file seqdb.cpp.

References CSeqDBImpl::GetColumnId(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CRawSeqDBSource::CRawSeqDBSource(), and CRawSeqDBSource::GetColumnId().

const map< string, string > & CSeqDB::GetColumnMetaData ( int  column_id)

Get all metadata for the specified column.

Columns may contain user-defined metadata as a list of key-value pairs. For the specified column, this returns that column's metadata as a map. If multiple volumes are present and they define contradictory metadata (this is more common when multiple databases are opened at once), this method returns the first value it finds for each metadata key. If this is unsatisfactory, the two-argument version of this method may be used to get more precise values for specific volumes.

Parameters
column_id: The column id from GetColumnId. [in]
Returns
The map of metadata for this column.

Definition at line 1135 of file seqdb.cpp.

References CSeqDBImpl::GetColumnMetaData(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CRawSeqDBSource::GetColumnMetaData(), and GetColumnValue().
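The "first value wins" rule for multiple volumes can be illustrated with std::map::insert, which leaves an existing key untouched. This is a hypothetical model of the documented behaviour, not the toolkit's implementation, and MergeFirstWins is an illustrative name.

```cpp
#include <map>
#include <string>
#include <vector>

using MetaData = std::map<std::string, std::string>;

// Combines per-volume metadata in volume order. For a key defined by
// several volumes, the value from the earliest volume is kept.
MetaData MergeFirstWins(const std::vector<MetaData>& per_volume)
{
    MetaData merged;
    for (const MetaData& volume : per_volume) {
        for (const auto& kv : volume) {
            merged.insert(kv);  // no-op if the key is already present
        }
    }
    return merged;
}
```

When a later volume's value matters, the per-volume overload avoids this loss of information entirely.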

const map< string, string > & CSeqDB::GetColumnMetaData ( int  column_id,
const string &  volname
)

Get all metadata for the specified column.

Columns may contain user-defined metadata as a list of key-value pairs. For the specified database volume and column id, this returns that column's metadata (as defined for that volume) as a map. The volume name should match the string returned by FindVolumePaths(vector<string>&).

Parameters
column_id: The column id from GetColumnId. [in]
volname: The volume to get metadata for. [in]
Returns
The map of metadata for this column and volume.

Definition at line 1147 of file seqdb.cpp.

References CSeqDBImpl::GetColumnMetaData(), and m_Impl.

const string & CSeqDB::GetColumnValue ( int  column_id,
const string &  key
)

Look up the value for a specific column metadata key.

Columns can contain user-defined metadata as a list of key-value pairs. For the specified column, this returns the value associated with one particular key.

Parameters
column_id: The column id from GetColumnId. [in]
key: The metadata key to look up. [in]
Returns
The value corresponding to the specified key.

Definition at line 1140 of file seqdb.cpp.

References GetColumnMetaData(), and SeqDB_MapFind().

Referenced by BOOST_AUTO_TEST_CASE().

string CSeqDB::GetDate ( void  ) const

Returns the construction date of the database.

This is encoded in the database. If multiple databases or multiple volumes were accessed, the latest date will be used.

Definition at line 555 of file seqdb.cpp.

References CSeqDBImpl::GetDate(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CMetaDataTest::DoTest(), CBlastDbMetadata::GetDate(), s_FillDbInfoLocally(), CBuildDatabase::SetSourceDb(), and CBlastDBCmdApp::x_PrintBlastDatabaseInformation().

CTime CSeqDB::GetDate ( const string &  dbname,
ESeqType  seqtype 
)
static

Returns the construction date of the database.

Parameters
dbname: The database name.
seqtype: The type of database (nucleotide or protein).
Returns
The latest date.

Definition at line 561 of file seqdb.cpp.

References eProtein, f(), FindVolumePaths(), in(), CTime::IsEmpty(), ITERATE, offset(), and SeqDB_GetStdOrd().

const string & CSeqDB::GetDBNameList ( ) const
Int8 CSeqDB::GetDiskUsage ( ) const

Retrieve the disk usage in bytes for this BLAST database.

Definition at line 1409 of file seqdb.cpp.

References _ASSERT, eProtein, ERR_POST, Error(), CFile::Exists(), file, FindVolumePaths(), CFile::GetLength(), CDirEntry::GetPath(), GetSequenceType(), ITERATE, and SeqDB_GetFileExtensions().

Referenced by BOOST_AUTO_TEST_CASE(), and CBlastDbMetadata::GetDiskUsage().

Uint8 CSeqDB::GetExactTotalLength ( )

Returns the exact sum of the lengths of all available sequences.

Calling this function may trigger a complete database scan if the total length cannot be determined without iterating through the sequences, e.g., for a database filtered by a GI list.

Definition at line 610 of file seqdb.cpp.

References CSeqDBImpl::GetExactTotalLength(), and m_Impl.

Referenced by CBlastDBCmdApp::x_PrintBlastDatabaseInformation().

const CSeqDBGiList * CSeqDB::GetGiList ( ) const

Get GI list attached to this database.

This returns the GI list attached to this database, or NULL, if no GI list was used. The effects of changing the contents of this GI list are undefined. This method only deals with the GI list passed to the top level CSeqDB constructor; it does not consider volume GI lists.

Returns
A pointer to the attached GI list, or NULL.

Definition at line 1048 of file seqdb.cpp.

References CSeqDBImpl::GetGiList(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDbSeqInfoSrc::HasGiList(), and s_GetFilteredRedundantGis().

void CSeqDB::GetGis ( int  oid,
vector< TGi > &  gis,
bool  append = false 
) const

Gets a list of GIs for an OID.

This returns the GIs associated with the sequence specified by the given OID. If append is true, gis will be appended to the end of the provided vector; otherwise the vector will be emptied first.

Parameters
oid: The oid of the sequence.
gis: The returned list of gis.
append: Specify true to append to gis, keeping existing elements.

Definition at line 998 of file seqdb.cpp.

References GetSeqIDs(), ITERATE, m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), and s_MapAllGis().

CRef< CBlast_def_line_set > CSeqDB::GetHdr ( int  oid) const

Get the ASN.1 header for the sequence.

Do not modify the object returned here (e.g. by removing some of the deflines), as the object is cached internally and future operations on this OID may be affected.

Parameters
oid: The ordinal ID of the sequence.
Returns
The blast deflines for this sequence.

Definition at line 362 of file seqdb.cpp.

References CSeqDBImpl::GetHdr(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CRawSeqDBSource::GetNext(), s_DupIdsBioseq(), s_DupIdsRaw(), CSearch< LEGACY, NHITS >::SetResult(), CBlastDB_SeqFormatter::Write(), CBuildDatabase::x_DupLocal(), and CBlastDBExtractor::x_InitDefline().

CSeqDBIdSet CSeqDB::GetIdSet ( ) const

Get IdSet list attached to this database.

This returns the ID set used to filter this database. If a CSeqDBGiList or CSeqDBNegativeList was used instead, then an ID set object will be constructed and returned (and cached here). This method only deals with filtering applied to the top level CSeqDB constructor; it does not consider GI or TI lists attached from alias files. If no filtering was used, a 'blank' list will be returned (an empty negative list).

Returns
The ID set used to filter this database (a blank set if no filtering was applied).

Definition at line 1053 of file seqdb.cpp.

References CSeqDBImpl::GetIdSet(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), and s_SeqDbGetSequence().

void CSeqDB::GetLeafTaxIDs ( int  oid,
map< TGi, set< int > > &  gi_to_taxid_set,
bool  persist = false 
) const

Get leaf-node taxids for an OID.

This finds the leaf-node taxids associated with a given OID and computes a mapping from GI to a set of taxids. This mapping is added to the map<TGi, set<int>> provided by the user. If the "persist" flag is set to true, the new associations will simply be added to the map. If it is false (the default), the map will be cleared first.

Parameters
oid: The ordinal id of the sequence.
gi_to_taxid_set: A returned mapping from GI to set of taxids.
persist: If false, the map will be cleared before adding new entries.

Definition at line 411 of file seqdb.cpp.

References CSeqDBImpl::GetLeafTaxIDs(), ITERATE, m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().
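The persist flag follows a common "clear or accumulate" output-parameter pattern. The sketch below models it for a GI-to-taxid-set map; plain long long stands in for TGi, and AddLeafTaxIds and its input list are hypothetical rather than part of the toolkit.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

using Gi = long long;  // hypothetical stand-in for the toolkit's TGi type

// Adds (gi, taxid) associations for one OID to `out`.
// When persist is false, previous contents are discarded first.
void AddLeafTaxIds(const std::vector<std::pair<Gi, int>>& associations,
                   std::map<Gi, std::set<int>>& out,
                   bool persist = false)
{
    if (!persist) {
        out.clear();
    }
    for (const auto& assoc : associations) {
        out[assoc.first].insert(assoc.second);
    }
}
```

Passing persist = true across a loop over OIDs accumulates the mapping for the whole batch in one container.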

void CSeqDB::GetLeafTaxIDs ( int  oid,
vector< int > &  taxids,
bool  persist = false 
) const

Get taxids for an OID.

This finds the leaf-node taxids associated with a given OID and returns them in a vector. If the "persist" flag is set to true, the new taxids will simply be appended to the vector. If it is false (the default), the vector will be cleared first. One advantage of this interface over the map<TGi, set<int>> version is that the vector interface works with databases that have local IDs but lack GIs.

Parameters
oid: The ordinal id of the sequence.
taxids: A returned vector of taxids.
persist: If false, the vector will be cleared before adding new entries.

Definition at line 430 of file seqdb.cpp.

References CSeqDBImpl::GetLeafTaxIDs(), m_Impl, and CSeqDBImpl::Verify().

void CSeqDB::GetMaskAlgorithmDetails ( int  algorithm_id,
objects::EBlast_filter_program &  program,
string &  program_name,
string &  algo_opts
)

Get information about one type of masking available here.

For a given algorithm_id, this method fetches information describing the basic algorithm used, as well as options passed to that algorithm to generate the data stored here. Each sequence in the database can provide sequence masking data from one or more sources. There can also be multiple types of masking data from the same algorithm (such as DUST), but generated with different sets of input parameters.

Parameters
algorithm_id: The ID as from GetAvailableMaskAlgorithms. [in]
program: The filtering program used (DUST, SEG, etc.). [out]
program_name: String representation of program. [out]
algo_opts: Describes options passed to program. [out]

Definition at line 1196 of file seqdb.cpp.

References NStr::fConvErr_NoThrow, CSeqDBImpl::GetMaskAlgorithmDetails(), m_Impl, and NStr::StringToNumeric().

Referenced by BOOST_AUTO_TEST_CASE(), CRawSeqDBSource::CRawSeqDBSource(), and s_FillDbInfoLocally().

void CSeqDB::GetMaskAlgorithmDetails ( int  algorithm_id,
string &  program,
string &  program_name,
string &  algo_opts
)

Definition at line 1209 of file seqdb.cpp.

References CSeqDBImpl::GetMaskAlgorithmDetails(), and m_Impl.

int CSeqDB::GetMaskAlgorithmId ( const string &  algo_name) const

Get the numeric algorithm ID for a string.

Parameters
algo_name: The name of the filtering algorithm.

Definition at line 1165 of file seqdb.cpp.

References CSeqDBImpl::GetMaskAlgorithmId(), and m_Impl.

Referenced by CSequenceIStreamBlastDB::CSequenceIStreamBlastDB(), CBlastDBCmdApp::x_InitSearchRequest(), and CSearchDatabase::x_TranslateFilteringAlgorithm().

void CSeqDB::GetMaskData ( int  oid,
const vector< int > &  algo_ids,
TSequenceRanges &  ranges
)
inline

Get masked ranges of a sequence.

For the provided OID and list of algorithm IDs, this method gets the masked areas of that sequence for the first algorithm ID in the list. The list of masked areas is returned via the ranges parameter.

Parameters
oid: The ordinal ID of the sequence. [in]
algo_ids: The algorithm IDs to get data for; only the first is used. [in]
ranges: The list of sequence offset ranges. [out]

Definition at line 1382 of file seqdb.hpp.

Referenced by CSeqDbSeqInfoSrc::GetMasks(), CRawSeqDBSource::GetNext(), CSequenceIStreamBlastDB::next(), s_GetSeqMask(), s_SeqDbGetSequence(), and CBlastDB_FastaFormatter::Write().

void CSeqDB::GetMaskData ( int  oid,
int  algo_id,
TSequenceRanges &  ranges
)

Get masked ranges of a sequence.

For the provided OID and algorithm ID, this method gets a list of masked areas of that sequence. The list of masked areas is returned via the ranges parameter.

Parameters
oid: The ordinal ID of the sequence. [in]
algo_id: The algorithm ID to get data for. [in]
ranges: The list of sequence offset ranges. [out]

Definition at line 1218 of file seqdb.cpp.

References CSeqDBImpl::GetMaskData(), and m_Impl.
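To visualise what hard masking with such ranges means (as done by the masks parameter of GetAmbigSeqAlloc()), the sketch below overwrites masked residues in a one-byte-per-residue buffer. It assumes each range is a zero-based, half-open [first, second) pair and uses a caller-chosen mask byte; both are illustrative assumptions, and OffsetRange and HardMask are hypothetical names rather than TSequenceRanges itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical range type, assumed zero-based and half-open [first, second).
using OffsetRange = std::pair<std::size_t, std::size_t>;

// Overwrites every masked residue with `mask_char`; ranges extending past
// the end of the sequence are clipped.
void HardMask(std::string& seq, const std::vector<OffsetRange>& ranges,
              char mask_char)
{
    for (const OffsetRange& r : ranges) {
        std::size_t begin = std::min(r.first, seq.size());
        std::size_t end = std::min(r.second, seq.size());
        for (std::size_t i = begin; i < end; ++i) {
            seq[i] = mask_char;
        }
    }
}
```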

int CSeqDB::GetMaxLength ( ) const

Returns the length of the largest sequence in the database.

This uses summary information stored in the database volumes or alias files. This might be used to choose buffer sizes.

Definition at line 625 of file seqdb.cpp.

References CSeqDBImpl::GetMaxLength(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDBDemo_Threaded::Run(), s_SeqDbGetMaxLength(), s_SeqDbGetSupportsPartialFetching(), and CBlastDBCmdApp::x_PrintBlastDatabaseInformation().

int CSeqDB::GetMinLength ( ) const

Returns the length of the shortest sequence in the database.

This uses summary information stored in the database volumes or alias files. This might be used to choose a cutoff score.

Definition at line 630 of file seqdb.cpp.

References CSeqDBImpl::GetMinLength(), and m_Impl.

Referenced by s_SeqDbGetMinLength().

CSeqDB::EOidListType CSeqDB::GetNextOIDChunk ( int &  begin_chunk,
int &  end_chunk,
int  oid_size,
vector< int > &  oid_list,
int *  oid_state = NULL
)

Return a chunk of OIDs, and update the OID bookmark.

This method allows the caller to iterate over the database by fetching batches of OIDs. It either returns a list of OIDs in a vector, or sets a pair of integers to indicate a range of OIDs; the return value indicates which technique was used. The caller sets the number of OIDs to get by setting the size of the vector. If eOidRange is returned, begin_chunk is the first included OID and end_chunk is the OID after the last included OID. If eOidList is returned, the vector contains the included OIDs, and may be resized to a smaller value if fewer entries are available (for the last chunk). In some cases it may be desirable to have several concurrent, independent iterations over the same database object. If this is required, the caller should pass the address of an int as the optional parameter oid_state. This variable should be initialized to zero before the iteration begins and should otherwise not be modified by the calling code (except that it can be reset to zero to restart the iteration). For the normal case of one iteration per program, this parameter can be omitted.

Parameters
begin_chunk: First included OID (if eOidRange is returned).
end_chunk: OID after the last included one (if eOidRange is returned).
oid_size: Number of OIDs to retrieve (ignored in a multithreaded environment).
oid_list: An empty list. Will contain the OID list if eOidList is returned.
oid_state: Optional address of a state variable (for concurrent iterations).
Returns
eOidList in enumeration case, or eOidRange in begin/end range case.

Definition at line 659 of file seqdb.cpp.

References CSeqDBImpl::GetNextOIDChunk(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDBDemo_Thread::Main(), CSeqDBDemo_ChunkIteration::Run(), and s_SeqDbGetNextChunk().
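The range form of this iteration contract, with one state variable per independent iteration, can be modelled as below. NextRange is a hypothetical, single-threaded stand-in that only ever produces ranges; the real method may also return explicit OID lists (eOidList) and handles thread safety internally. What the sketch shows is the contract itself: begin is the first included OID, end is one past the last, and a zero-initialized state starts (or restarts) the walk.

```cpp
#include <algorithm>

// Returns false when the iteration is finished. Otherwise sets
// [begin, end) to the next chunk of at most chunk_size OIDs (chunk_size
// must be positive) and advances *state, which the caller initializes
// to zero.
bool NextRange(int num_oids, int chunk_size, int* state, int& begin, int& end)
{
    if (*state >= num_oids) {
        return false;
    }
    begin = *state;
    end = std::min(num_oids, begin + chunk_size);
    *state = end;
    return true;
}
```

Two iterations with separate state variables never interfere, which is the point of passing oid_state explicitly.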

int CSeqDB::GetNumOIDs ( ) const
int CSeqDB::GetNumSeqs ( void  ) const
int CSeqDB::GetNumSeqsStats ( ) const

Returns the number of sequences available.

This may be overridden by the STATS_NSEQ key.

Definition at line 595 of file seqdb.cpp.

References CSeqDBImpl::GetNumSeqsStats(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), s_SeqDbGetNumSeqsStats(), and CLocalRPSBlast::x_AdjustDbSize().

int CSeqDB::GetOidAtOffset ( int  first_seq,
Uint8  residue 
) const

Find the sequence closest to the given offset into the database.

The database volumes can be viewed as a single array of residues, partitioned into sequences by OID order. The length of this array is given by GetTotalLength(). Given an offset between 0 and this length, this method returns the OID of the sequence at the given offset into the array. It is normally used to split the database into sections with approximately equal numbers of residues.

Parameters
first_seq: First OID to consider (the result will always be this OID or higher).
residue: The approximate residue offset to search for.
Returns
An OID near the specified residue offset.

Definition at line 851 of file seqdb.cpp.

References CSeqDBImpl::GetOidAtOffset(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().
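The splitting use case rests on viewing the volumes as one concatenated residue array. The sketch below computes approximately equal split points from an explicit list of sequence lengths using prefix sums and binary search. It is a hypothetical model: real code would not materialize the lengths, and would instead call GetOidAtOffset() with offsets such as k * GetTotalLength() / parts.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// For each of `parts` (> 0) equally spaced residue offsets, returns the
// OID of the sequence containing that offset, i.e. the first OID whose
// cumulative end lies beyond it.
std::vector<int> SplitPoints(const std::vector<std::uint64_t>& lengths, int parts)
{
    std::vector<std::uint64_t> ends;  // ends[i] = total residues in OIDs 0..i
    std::uint64_t total = 0;
    for (std::uint64_t len : lengths) {
        total += len;
        ends.push_back(total);
    }
    std::vector<int> starts;
    for (int k = 0; k < parts; ++k) {
        std::uint64_t offset = total * k / parts;
        auto it = std::upper_bound(ends.begin(), ends.end(), offset);
        starts.push_back(static_cast<int>(it - ends.begin()));
    }
    return starts;
}
```

Each returned OID begins one section; a section ends where the next one starts, so a long sequence can make sections noticeably unequal.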

CRef< CSeq_data > CSeqDB::GetSeqData ( int  oid,
TSeqPos  begin,
TSeqPos  end 
) const

Fetch data as a CSeq_data object.

All or part of the sequence is fetched in a CSeq_data object. The portion of the sequence returned is specified by begin and end. An exception will be thrown if begin is greater than or equal to end, or if end is greater than or equal to the length of the sequence. Begin and end should be specified in bases; a range like (0,1) specifies 1 base, not 2. Nucleotide data will always be returned in ncbi4na format.

Parameters
oid: Specifies the sequence to fetch.
begin: Specifies the start of the data to get. [in]
end: Specifies the end of the data to get. [in]
Returns
The sequence data as a Seq-data object.

Definition at line 477 of file seqdb.cpp.

References CSeqDBImpl::GetSeqData(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().

TGi CSeqDB::GetSeqGI ( int  oid) const

Returns the first Gi (if any) of the sequence.

This method does NOT check whether the OID in question belongs to the BLAST database after all filtering is applied (e.g., GI list restriction or membership bit). If you need those checks, please use GetGis().

See also
GetGis

Definition at line 696 of file seqdb.cpp.

References CSeqDBImpl::GetSeqGI(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE().

list< CRef< CSeq_id > > CSeqDB::GetSeqIDs ( int  oid) const

Gets a list of sequence identifiers.

This returns the list of CSeq_id identifiers associated with the sequence specified by the given OID.

Parameters
oid: The oid of the sequence.
Returns
A list of Seq-id objects for this sequence.

Definition at line 685 of file seqdb.cpp.

References CSeqDBImpl::GetSeqIDs(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CLBLASTObjectLoader::Execute(), GetGis(), CSeqDbSeqInfoSrc::GetId(), CBlastSequenceSource::GetSeqID(), CLocalBlastDbAdapter::GetSeqIDs(), CSequenceIStreamBlastDB::next(), CSeqdb2CreateApplication::processBatch(), s_GetFilteredRedundantGis(), s_SeqDbGetSequence(), CTestAction::TestOID(), CElementaryMatching::x_CreateRemapData(), CElementaryMatching::x_LoadRemapData(), and CBuildDatabase::x_ResolveFromSource().

int CSeqDB::GetSeqLength ( int  oid) const
int CSeqDB::GetSeqLengthApprox ( int  oid) const

Returns an unbiased, approximate sequence length.

For protein DBs, this method is identical to GetSeqLength(). In the nucleotide case, computing the exact length requires examination of the sequence data. This method avoids doing that, returning an approximation ranging from L-3 to L+3 (where L indicates the exact length), and unbiased on average.

Definition at line 353 of file seqdb.cpp.

References CSeqDBImpl::GetSeqLengthApprox(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().

int CSeqDB::GetSequence ( int  oid,
const char **  buffer 
) const

Get a pointer to raw sequence data.

Get the raw sequence (strand data). When done, resources should be returned with RetSequence. The data pointed to by *buffer is in read-only memory (where supported).

Parameters
oid: The ordinal id of the sequence.
buffer: A returned pointer to the data in the sequence.
Returns
The return value is the sequence length (in base pairs or residues). In case of an error, an exception is thrown.

Definition at line 468 of file seqdb.cpp.

References CSeqDBImpl::GetSequence(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDBSequence::CSeqDBSequence(), CBlastSequenceSource::GetSeq(), CLocalBlastDbAdapter::GetSequence(), CSeqDBDemo_GetSequence::Run(), s_SeqDbGetSequence(), CTestAction::TestOID(), CElementaryMatching::x_CreateIndex(), CSeqDBIter::x_GetSeq(), CElementaryMatching::x_InitFilteringVector(), and CSeqDBDemo_Thread::x_UseOID().
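For a nucleotide database the raw data returned here is packed. Assuming the NCBI2na layout (two bits per base, A=0, C=1, G=2, T=3, with the first base in the most significant bits of each byte), the sketch below unpacks the first `length` bases. It is illustrative only: ambiguous bases cannot be represented in this packing (GetAmbigSeq() is the method for those), and the function name Unpack2na is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Unpacks `length` bases of NCBI2na-style data (4 bases per byte,
// most significant bit pair first) into an ACGT string.
std::string Unpack2na(const unsigned char* packed, int length)
{
    static const char kBases[4] = {'A', 'C', 'G', 'T'};
    std::string out;
    out.reserve(length > 0 ? static_cast<std::size_t>(length) : 0);
    for (int i = 0; i < length; ++i) {
        unsigned char byte = packed[i / 4];
        int shift = 6 - 2 * (i % 4);   // 6, 4, 2, 0 within each byte
        out.push_back(kBases[(byte >> shift) & 0x3]);
    }
    return out;
}
```

The sequence length returned by the fetch call is what bounds `length`; the final byte may contain unused bit pairs.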

void CSeqDB::GetSequenceAsString ( int  oid,
CSeqUtil::ECoding  coding,
string &  output,
TSeqRange  range = TSeqRange() 
) const

Get a sequence in a given encoding.

This method gets the sequence data for the given OID, converts it to the specified encoding, and returns it in a string. It supports all values of the CSeqUtil::ECoding enumeration (but the type must match the database type). This method returns the same data as GetAmbigSeq() (or GetSequence() for protein), but may be less efficient due to the cost of translation and string allocation.

Parameters
oid: The OID of the sequence to fetch.
coding: The encoding to use for the data.
output: The returned sequence data as a string.
range: The range of the sequence to retrieve; if empty, the entire sequence will be retrieved. [in]

Definition at line 1074 of file seqdb.cpp.

References buffer, CSeqConvert::Convert(), CSeqUtil::e_Ncbi8na, CSeqUtil::e_Ncbistdaa, eProtein, GetAmbigSeq(), COpenRange< Position >::GetFrom(), GetSequenceType(), COpenRange< Position >::GetToOpen(), kSeqDBNuclNcbiNA8, COpenRange< Position >::NotEmpty(), result, and RetAmbigSeq().

Referenced by BOOST_AUTO_TEST_CASE(), CBlastDBExtractor::ExtractHash(), CBlastDBExtractor::ExtractSeqData(), GetSequenceAsString(), CBlastDB_SeqFormatter::x_GetSeq(), and CBlastDB_SeqFormatter::x_GetSeqHash().

void CSeqDB::GetSequenceAsString ( int  oid,
string &  output,
TSeqRange  range = TSeqRange() 
) const

Get a sequence in a readable text encoding.

This method gets the sequence data for an OID, converts it to a human-readable encoding (either Iupacaa for protein, or Iupacna for nucleotide), and returns it in a string. This is equivalent to calling the version of this method that takes an explicit encoding with one of those encodings.

Parameters
oid: The OID of the sequence to fetch.
output: The returned sequence data as a string.
range: The range of the sequence to retrieve; if empty, the entire sequence will be retrieved. [in]

Definition at line 1063 of file seqdb.cpp.

References CSeqUtil::e_Iupacaa, CSeqUtil::e_Iupacna, eProtein, GetSequenceAsString(), and GetSequenceType().

CSeqDB::ESeqType CSeqDB::GetSequenceType ( void  ) const
Int8 CSeqDB::GetSliceSize ( ) const

Retrieve the current slice size used for mmap.

Definition at line 1402 of file seqdb.cpp.

References CSeqDBImpl::GetSliceSize(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().

void CSeqDB::GetTaxIDs ( int  oid,
map< TGi, int > &  gi_to_taxid,
bool  persist = false 
) const

Get taxids for an OID.

This finds the taxids associated with a given OID and computes a mapping from GI to taxid. This mapping is added to the map<TGi, int> provided by the user. If the "persist" flag is set to true, the new associations will simply be added to the map. If it is false (the default), the map will be cleared first.

Parameters
oid: The ordinal id of the sequence.
gi_to_taxid: A returned mapping from GI to taxid.
persist: If false, the map will be cleared before adding new entries.

Definition at line 385 of file seqdb.cpp.

References map_checker< std::map< Key, T, Compare > >::clear(), CSeqDBImpl::GetTaxIDs(), ITERATE, m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CLocalBlastDbAdapter::GetTaxId(), and CSearch< LEGACY, NHITS >::Search().

void CSeqDB::GetTaxIDs ( int  oid,
vector< int > &  taxids,
bool  persist = false 
) const

Get taxids for an OID.

This finds the taxids associated with a given OID and returns them in a vector. If the "persist" flag is set to true, the new taxids will simply be appended to the vector. If it is false (the default), the vector will be cleared first. One advantage of this interface over the map<TGi, int> version is that the vector interface works with databases that have local IDs but lack GIs.

Parameters
oid: The ordinal id of the sequence.
taxids: A returned list of taxids.
persist: If false, the vector will be cleared before adding new entries.

Definition at line 402 of file seqdb.cpp.

References CSeqDBImpl::GetTaxIDs(), m_Impl, and CSeqDBImpl::Verify().

void CSeqDB::GetTaxInfo ( int  taxid,
SSeqDBTaxInfo &  info
)
static

Get taxonomy information.

This method returns taxonomy information for a single taxid. This information does not vary with sequence type (protein vs. nucleotide) and is the same for all blast databases. If the taxonomy database is not available or the taxid is not found, this method will throw an exception.

Parameters
taxid: An integer identifying the taxid to fetch.
info: A structure containing taxonomic description strings.

Definition at line 1033 of file seqdb.cpp.

References CSeqDBImpl::GetTaxInfo().

Referenced by BOOST_AUTO_TEST_CASE(), CMetaDataTest::DoTest(), CBlastDBExtractor::ExtractBlastName(), CBlastDBExtractor::ExtractCommonTaxonomicName(), CBlastDBExtractor::ExtractLeafCommonTaxonomicNames(), CBlastDBExtractor::ExtractLeafScientificNames(), CBlastDBExtractor::ExtractScientificName(), CBlastDBExtractor::ExtractSuperKingdom(), s_GetTaxName(), s_SeqAlignToXMLHit(), CGeneFileWriter::x_GetOrgnameForTaxId(), CTaxFormat::x_InitBlastDBTaxInfo(), CBlastTabularInfo::x_SetTaxInfo(), and CBlastTabularInfo::x_SetTaxInfoAll().

string CSeqDB::GetTitle ( void  ) const

Returns the database title.

This is usually read from database volumes or alias files. If multiple databases were passed to the constructor, this will be a concatenation of those databases' titles.

Definition at line 550 of file seqdb.cpp.

References CSeqDBImpl::GetTitle(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CLBLASTObjectLoader::CreateLoader(), CMetaDataTest::DoTest(), CBlastDbMetadata::GetTitle(), BlastdbCopyApplication::Run(), CSeqdb2CreateApplication::Run(), s_FillDbInfoLocally(), CBuildDatabase::SetSourceDb(), CMakeBlastDBApp::x_BuildDatabase(), and CBlastDBCmdApp::x_PrintBlastDatabaseInformation().

Uint8 CSeqDB::GetTotalLength ( void  ) const

Returns the sum of the lengths of all available sequences.

This uses summary information stored in the database volumes or alias files. It returns an approximate value, without iterating over individual sequences, in cases where scanning the database would be the only way to determine the exact total length (for example, a database filtered by a GI list); GetExactTotalLength() returns the exact value.

Definition at line 605 of file seqdb.cpp.

References CSeqDBImpl::GetTotalLength(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CMetaDataTest::DoTest(), CBlastDbMetadata::GetDbLength(), CBlastSequenceSource::GetTotalLength(), s_FillDbInfoLocally(), s_SeqDbGetSupportsPartialFetching(), s_SeqDbGetTotLen(), SDbSumInfo::SDbSumInfo(), CLocalRPSBlast::x_AdjustDbSize(), CElementaryMatching::x_CreateIndex(), CElementaryMatching::x_InitFilteringVector(), and CBlastDBCmdApp::x_PrintBlastDatabaseInformation().

Uint8 CSeqDB::GetTotalLengthStats ( ) const

Returns the sum of the lengths of all available sequences.

This uses summary information stored in the database volumes or alias files. It provides either an exact value or a value overridden in the alias files by the STATS_TOTLEN key.

Definition at line 615 of file seqdb.cpp.

References CSeqDBImpl::GetTotalLengthStats(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), s_SeqDbGetTotLenStats(), and CLocalRPSBlast::x_AdjustDbSize().

void CSeqDB::GetTotals ( ESummaryType  sumtype,
int *  oid_count,
Uint8 *  total_length,
bool  use_approx = true 
) const

Returns the sum of the sequence lengths.

This uses summary information and iteration to compute the total length and number of sequences for some subset of the database. If eUnfilteredAll is specified, it uses information from the underlying database volumes, without filtering. If eFilteredAll is specified, all of the included sequences are used, for all possible OIDs. If eFilteredRange is specified, the returned values correspond to the sum over only those sequences that survive filtering and are within the iteration range. If oid_count or total_length is NULL, that result is not returned. In some cases the results can be computed in constant time; other cases require iteration proportional to the length of the database or of the included OID range (see SetIterationRange()).

Parameters
sumtype: Specifies the subset of sequences to include.
oid_count: The returned number of included OIDs.
total_length: The returned sum of included sequence lengths.
use_approx: Whether to use approximate lengths for nucleotide sequences.

Definition at line 1038 of file seqdb.cpp.

References CSeqDBImpl::GetTotals(), m_Impl, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), and s_ComputeNumSequencesAndDbLength().
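The three summary types and the "NULL means not returned" convention of GetTotals() can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not the CSeqDB implementation: ToyDb, SummaryType and the per-OID inclusion flags are hypothetical stand-ins, and use_approx is omitted because the toy stores exact lengths.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CSeqDB::ESummaryType; not the NCBI enum itself.
enum class SummaryType { UnfilteredAll, FilteredAll, FilteredRange };

// Toy database: one length per OID plus an alias-file style inclusion flag.
struct ToyDb {
    std::vector<std::uint64_t> lengths;  // sequence length for each OID
    std::vector<bool> included;          // does the OID survive filtering?
    int range_begin = 0;                 // iteration range [begin, end)
    int range_end = 0;

    // Mirrors the documented out-parameter contract: a null pointer means
    // "do not return this result".
    void GetTotals(SummaryType sumtype, int* oid_count,
                   std::uint64_t* total_length) const {
        assert(included.size() == lengths.size());
        int begin = 0;
        int end = static_cast<int>(lengths.size());
        if (sumtype == SummaryType::FilteredRange) {
            begin = range_begin;
            end = range_end;
        }
        assert(0 <= begin && begin <= end &&
               end <= static_cast<int>(lengths.size()));
        int count = 0;
        std::uint64_t total = 0;
        for (int oid = begin; oid < end; ++oid) {
            // Only the unfiltered summary ignores the inclusion flags.
            if (sumtype != SummaryType::UnfilteredAll && !included[oid]) {
                continue;
            }
            ++count;
            total += lengths[oid];
        }
        if (oid_count != nullptr) *oid_count = count;
        if (total_length != nullptr) *total_length = total;
    }
};
```

With lengths {10, 20, 30, 40}, OID 1 filtered out and an iteration range of [1, 3), the toy reports 4 OIDs / 100 residues unfiltered, 3 / 80 filtered, and 1 / 30 for the filtered range.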

Uint8 CSeqDB::GetVolumeLength ( ) const

Returns the sum of the lengths of all volumes.

This uses summary information stored in the database volumes (but not the alias files). It provides an exact value, without iterating over individual sequences. It includes all OIDs regardless of inclusion by the filtering mechanisms of the alias files.

Definition at line 620 of file seqdb.cpp.

References CSeqDBImpl::GetVolumeLength(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), CMetaDataTest::DoTest(), and SDbSumInfo::SDbSumInfo().

CRef< CBioseq > CSeqDB::GiToBioseq ( TGi  gi) const

Get a CBioseq for a given GI.

This builds and returns the header and sequence data corresponding to the indicated GI as a CBioseq.

Parameters
gi: The GI of the sequence.
Returns
A CBioseq object corresponding to the sequence.

Definition at line 915 of file seqdb.cpp.

References CSeqDBImpl::GetBioseq(), CSeqDBImpl::GiToOid(), m_Impl, NULL, and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().

bool CSeqDB::GiToOid ( TGi  gi,
int &  oid 
) const
bool CSeqDB::GiToOidwFilterCheck ( TGi  gi,
int &  oid 
) const

Translate a GI to an OID with filter check.

Definition at line 737 of file seqdb.cpp.

References CSeqDBImpl::GiToOidwFilterCheck(), m_Impl, and CSeqDBImpl::Verify().

Referenced by CBlastDBCmdApp::x_GetOids().

bool CSeqDB::GiToPig ( TGi  gi,
int &  pig 
) const

Translate a GI to a PIG.

Definition at line 774 of file seqdb.cpp.

References CSeqDBImpl::GiToOid(), m_Impl, CSeqDBImpl::OidToPig(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().

void CSeqDB::ListColumns ( vector< string > &  titles)

List column titles found in this database.

This returns a list of the column titles of all user-created (and system-generated) columns found in any of this database's volumes. Column titles appearing in more than one volume are listed only once.

Parameters
titles: Column titles are returned here. [out]

Definition at line 1124 of file seqdb.cpp.

References CSeqDBImpl::ListColumns(), and m_Impl.

Referenced by BOOST_AUTO_TEST_CASE(), and CRawSeqDBSource::CRawSeqDBSource().
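The "listed only once" rule of ListColumns() amounts to a set union over the per-volume title lists. The sketch below models that with a hypothetical MergeColumnTitles() helper over plain vectors; it is not how CSeqDBImpl gathers columns, and the sorted output order is a side effect of std::set that the documentation above does not promise.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: volumes[i] holds the column titles stored in volume i.
// A title present in several volumes is reported once. std::set also sorts
// the result, an ordering the ListColumns() documentation does not specify.
std::vector<std::string> MergeColumnTitles(
    const std::vector<std::vector<std::string>>& volumes) {
    std::set<std::string> unique;
    for (const auto& volume : volumes) {
        unique.insert(volume.begin(), volume.end());
    }
    return std::vector<std::string>(unique.begin(), unique.end());
}
```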

bool CSeqDB::OidToGi ( int  oid,
TGi &  gi 
) const

Translate an OID to a GI.

Definition at line 746 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::OidToGi(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().

bool CSeqDB::OidToPig ( int  oid,
int &  pig 
) const

Translate an OID to a PIG.

Definition at line 710 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::OidToPig(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), and CBlastDBExtractor::ExtractPig().

CRef< CBioseq > CSeqDB::PigToBioseq ( int  pig) const

Get a CBioseq for a given PIG.

This builds and returns the header and sequence data corresponding to the indicated PIG (a numeric identifier used for proteins) as a CBioseq.

Parameters
pig: The protein identifier group ID of the sequence.
Returns
A CBioseq object corresponding to the sequence.

Definition at line 932 of file seqdb.cpp.

References CSeqDBImpl::GetBioseq(), m_Impl, NULL, CSeqDBImpl::PigToOid(), CSeqDBImpl::Verify(), and ZERO_GI.

Referenced by BOOST_AUTO_TEST_CASE().

bool CSeqDB::PigToGi ( int  pig,
TGi &  gi 
) const

Translate a PIG to a GI.

Definition at line 757 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::OidToGi(), CSeqDBImpl::PigToOid(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE().
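The References line for PigToGi() lists CSeqDBImpl::PigToOid() and CSeqDBImpl::OidToGi(), which suggests the PIG-to-GI translation is a two-step lookup through the OID. The toy index below sketches that composition with the same bool-plus-out-parameter convention; ToyIdIndex and its maps are hypothetical, GIs are held in long long rather than TGi, and whether the real method leaves gi untouched on failure is not stated in this documentation.

```cpp
#include <cassert>
#include <map>

// Hypothetical toy index standing in for the database's ID lookups.
struct ToyIdIndex {
    std::map<int, int> pig_to_oid;
    std::map<int, long long> oid_to_gi;  // GIs as long long; the toolkit uses TGi

    bool PigToOid(int pig, int& oid) const {
        auto it = pig_to_oid.find(pig);
        if (it == pig_to_oid.end()) return false;
        oid = it->second;
        return true;
    }

    bool OidToGi(int oid, long long& gi) const {
        auto it = oid_to_gi.find(oid);
        if (it == oid_to_gi.end()) return false;
        gi = it->second;
        return true;
    }

    // Two-step translation; && short-circuits, so in this toy gi is
    // written only when both lookups succeed.
    bool PigToGi(int pig, long long& gi) const {
        int oid = -1;
        return PigToOid(pig, oid) && OidToGi(oid, gi);
    }
};
```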

bool CSeqDB::PigToOid ( int  pig,
int &  oid 
) const

Translate a PIG to an OID.

Definition at line 701 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::PigToOid(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CBlastDBExtractor::SetSeqId(), and CBlastDBCmdApp::x_ProcessEntry().

void CSeqDB::RemoveOffsetRanges ( int  oid)

Remove any offset ranges for the given OID.

Parameters
oid: OID of the sequence.

Definition at line 1248 of file seqdb.cpp.

References SetOffsetRanges().

Referenced by CLocalBlastDbAdapter::GetSequence(), and s_SeqDbGetSequence().

void CSeqDB::ResetInternalChunkBookmark ( )

Resets this object's internal chunk bookmark, which is used when the oid_state argument to GetNextOIDChunk is NULL.

This allows several iterations to be performed over the same CSeqDB object.

Definition at line 675 of file seqdb.cpp.

References m_Impl, and CSeqDBImpl::ResetInternalChunkBookmark().

Referenced by BOOST_AUTO_TEST_CASE(), and s_SeqDbResetChunkIterator().
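The bookmark idea can be sketched with a toy chunk iterator: when the caller supplies no state pointer, an internal cursor is advanced, and resetting that cursor allows a second pass. ToyChunker and NextChunk() are hypothetical and deliberately do not reproduce the GetNextOIDChunk() signature; only the "null state means internal bookmark" behavior is taken from the text above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical toy iterator (not the CSeqDB API) showing the bookmark idea.
class ToyChunker {
public:
    ToyChunker(int num_oids, int chunk_size)
        : num_oids_(num_oids), chunk_size_(chunk_size) {
        assert(num_oids >= 0 && chunk_size > 0);
    }

    // Hands out the next [begin, end) block of OIDs. Uses *state when the
    // caller provides one, otherwise the internal bookmark. Returns false
    // once every OID has been handed out.
    bool NextChunk(int& begin, int& end, int* state = nullptr) {
        int& cursor = state ? *state : bookmark_;
        if (cursor >= num_oids_) return false;
        begin = cursor;
        end = std::min(cursor + chunk_size_, num_oids_);
        cursor = end;
        return true;
    }

    // Rewinds only the internal bookmark; caller-owned states are unaffected.
    void ResetInternalChunkBookmark() { bookmark_ = 0; }

private:
    int num_oids_;
    int chunk_size_;
    int bookmark_ = 0;
};
```

With 10 OIDs and a chunk size of 4, one pass yields [0, 4), [4, 8) and [8, 10); after the reset, the next call starts again at OID 0.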

void CSeqDB::RetAmbigSeq ( const char **  buffer) const

Returns any resources associated with the sequence.

Calls to GetAmbigSeq() either increment a counter corresponding to a section of the database where the sequence data lives, or allocate a buffer to return to the user. This method decrements that counter or frees the allocated buffer, so that the memory can be used by other processes. Each allocating call should be paired with a returning call. This does not apply to methods such as GetBioseq() or GetHdr().

Parameters
buffer: A pointer to the sequence data to release.

Definition at line 501 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::RetAmbigSeq(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CLocalBlastDbAdapter::GetSequence(), GetSequenceAsString(), s_FetchRawData(), s_TestPartialAmbigRange(), CSeqDBDemo_Thread::x_UseOID(), and CReturnSeqBuffer::~CReturnSeqBuffer().

void CSeqDB::RetSequence ( const char **  buffer) const

Returns any resources associated with the sequence.

Calls to GetSequence() either increment a counter corresponding to a section of the database where the sequence data lives, or allocate a buffer to return to the user. This method decrements that counter or frees the allocated buffer, so that the memory can be used by other processes. Each allocating call should be paired with a returning call. This does not apply to methods such as GetBioseq() or GetHdr().

Parameters
buffer: A pointer to the sequence data to release.

Definition at line 461 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::RetSequence(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CRawSeqDBSource::ClearSequence(), CRawSeqDBSource::GetNext(), CLocalBlastDbAdapter::GetSequence(), CBlastSequenceSource::RetSequence(), CSeqDBDemo_GetSequence::Run(), s_SeqDbReleaseSequence(), CTestAction::TestOID(), CElementaryMatching::x_CreateIndex(), CElementaryMatching::x_InitFilteringVector(), CSeqDBIter::x_RetSeq(), CSeqDBDemo_Thread::x_UseOID(), CRawSeqDBSource::~CRawSeqDBSource(), CSeqDBSequence::~CSeqDBSequence(), and CSequenceReturn::~CSequenceReturn().
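Pairing every GetSequence() with exactly one RetSequence() is a natural fit for RAII. The Referenced-by list shows the toolkit already has helpers of this kind (for example CSeqDBSequence::~CSeqDBSequence()); the guard below is an independent sketch, not those classes. It assumes a GetSequence(int oid, const char** buffer) accessor returning the sequence length, which is not declared in this section, and it is exercised against a hypothetical ToySeqDb that counts outstanding buffers.

```cpp
#include <cassert>
#include <string>

// Hypothetical RAII guard: acquires a sequence buffer in the constructor and
// hands it back with RetSequence() in the destructor, so the two calls stay
// paired even on early return or exception.
template <class Db>
class SequenceGuard {
public:
    SequenceGuard(const Db& db, int oid) : db_(db), buffer_(nullptr), length_(0) {
        length_ = db_.GetSequence(oid, &buffer_);
    }
    ~SequenceGuard() {
        if (buffer_ != nullptr) db_.RetSequence(&buffer_);
    }
    SequenceGuard(const SequenceGuard&) = delete;             // one owner only
    SequenceGuard& operator=(const SequenceGuard&) = delete;

    const char* data() const { return buffer_; }
    int length() const { return length_; }

private:
    const Db& db_;
    const char* buffer_;
    int length_;
};

// Toy database used only to exercise the guard; it counts outstanding buffers.
struct ToySeqDb {
    std::string residues = "MKTAYIAKQR";
    mutable int outstanding = 0;

    int GetSequence(int /*oid*/, const char** buffer) const {
        *buffer = residues.data();
        ++outstanding;
        return static_cast<int>(residues.size());
    }
    void RetSequence(const char** buffer) const {
        assert(*buffer != nullptr);
        *buffer = nullptr;
        --outstanding;
    }
};
```

Once the guard goes out of scope, the toy's outstanding count drops back to zero without an explicit RetSequence() call at the use site.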

CRef< CBioseq > CSeqDB::SeqidToBioseq ( const CSeq_id &  seqid) const

Get a CBioseq for a given Seq-id.

This builds and returns the header and sequence data corresponding to the indicated Seq-id as a CBioseq. Note that certain forms of Seq-id map to more than one OID. If this is the case for the provided Seq-id, the first matching OID will be used.

Parameters
seqid: The Seq-id identifier of the sequence.
Returns
A CBioseq object corresponding to the sequence.

Definition at line 949 of file seqdb.cpp.

References CSeqDBImpl::GetBioseq(), m_Impl, CSeqDBImpl::SeqidToOids(), CSeqDBImpl::Verify(), and ZERO_GI.

Referenced by BOOST_AUTO_TEST_CASE(), CSeqDBDemo_SeqidToBioseq::Run(), and CIgBlast::x_AnnotateDomain().

bool CSeqDB::SeqidToOid ( const CSeq_id &  seqid,
int &  oid 
) const
void CSeqDB::SeqidToOids ( const CSeq_id &  seqid,
vector< int > &  oids 
) const
void CSeqDB::SetDefaultMemoryBound ( Uint8  bytes)
static

Set global default memory bound for SeqDB.

The memory bound for individual SeqDB objects can be adjusted with SetMemoryBound(), but this cannot be called until after the object is constructed. Until that time, the value used is set from a global default. This method allows that global default value to be changed. Any SeqDB object constructed after this method is called will use this value as the initial memory bound. If zero is specified, an appropriate default will be selected based on system information.

Definition at line 1058 of file seqdb.cpp.

References CSeqDBImpl::SetDefaultMemoryBound().

Referenced by BOOST_AUTO_TEST_CASE(), and CRPSTBlastnApp::Run().

void CSeqDB::SetIterationRange ( int  oid_begin,
int  oid_end 
)

Set Iteration Range.

This method sets the iteration range as a pair of OIDs. Iteration proceeds from begin up to, but not including, end. If end is 0, negative, or greater than the number of OIDs, it is adjusted to the number of OIDs.

Parameters
oid_begin: Iterator will skip OIDs less than this value. Only OIDs found in the OID lists (if any) will be returned.
oid_end: Iterator will return up to (but not including) this OID.

Definition at line 1021 of file seqdb.cpp.

References m_Impl, and CSeqDBImpl::SetIterationRange().

Referenced by BOOST_AUTO_TEST_CASE(), and s_SeqDbSrcNew().
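The end-adjustment rule and the half-open range can be sketched as two small functions. NormalizeIterationRange() and VisitOids() are hypothetical helpers, not part of CSeqDB; begin is passed through unchanged because the text does not describe adjusting it, and OID-list filtering is left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the documented rule: an end that is 0,
// negative, or past num_oids is replaced by num_oids.
std::pair<int, int> NormalizeIterationRange(int oid_begin, int oid_end, int num_oids) {
    assert(num_oids >= 0);
    if (oid_end <= 0 || oid_end > num_oids) {
        oid_end = num_oids;
    }
    return {oid_begin, oid_end};
}

// Visits the half-open range [begin, end) after normalization. Filtering by
// OID lists, which the real iterator also applies, is omitted here.
std::vector<int> VisitOids(int oid_begin, int oid_end, int num_oids) {
    std::pair<int, int> range = NormalizeIterationRange(oid_begin, oid_end, num_oids);
    std::vector<int> visited;
    for (int oid = range.first; oid < range.second; ++oid) {
        visited.push_back(oid);
    }
    return visited;
}
```

For example, with 5 OIDs, a range of (2, 0) visits OIDs 2, 3 and 4, because the end of 0 is replaced by the OID count.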

void CSeqDB::SetMemoryBound ( Uint8  membound,
Uint8  slice_size = 0 
)

Set upper limit on memory and mapping slice size.

This sets a (not precisely enforced) upper limit on the memory CSeqDB uses to memory-map disk files (and for some large arrays). Setting this to a low value may degrade performance; setting it too high may cause address space exhaustion. Normally, SeqDB starts with a large bound and reduces it if memory exhaustion or a mapping failure is detected. Applications that use a lot of memory outside of SeqDB may want to call this method to scale back SeqDB's demands. Note that the slice size is no longer externally adjustable, and the slice_size parameter may be removed in the future.

Parameters
membound: Maximum memory for SeqDB.
slice_size: No longer used.

Definition at line 846 of file seqdb.cpp.

References m_Impl, and CSeqDBImpl::SetMemoryBound().

Referenced by BOOST_AUTO_TEST_CASE(), CBlastSequenceSource::SetMemoryBound(), CElementaryMatching::x_CreateIndex(), and CElementaryMatching::x_InitFilteringVector().

void CSeqDB::SetMmapStrategy ( EMmapFileTypes  filetype,
EMmapStrategies  strategy 
)
static

Sets mmap strategy to be used when mapping index or sequence files.

This method sets internal flags of type EMemoryAdvise for a call to MemoryAdvise in CRegionMap::MapMmap. These are only hints; the system may or may not alter its behavior when mapping these files.

Definition at line 322 of file seqdb.cpp.

References eMADV_Normal, eMADV_Sequential, eMADV_WillNeed, eMmap_IndexFile, eMmap_SequenceFile, eMmap_Sequential, eMmap_WillNeed, CRegionMap::SetMmapStrategy_Index(), and CRegionMap::SetMmapStrategy_Sequence().
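The hints involved are of the kind an OS exposes through madvise-style calls. As a swapped-in illustration, the sketch below maps a local mirror of the three EMmapStrategies values onto POSIX posix_madvise() advice codes and applies one to a mapped region. This is not what CRegionMap or MemoryAdvise do internally, MmapStrategy is a hypothetical enum, and the code assumes a POSIX system with <sys/mman.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Local mirror of CSeqDB::EMmapStrategies; names and values are illustrative.
enum class MmapStrategy { Normal, Sequential, WillNeed };

// Translate a strategy into a POSIX advice code.
int ToPosixAdvice(MmapStrategy s) {
    switch (s) {
        case MmapStrategy::Sequential: return POSIX_MADV_SEQUENTIAL;
        case MmapStrategy::WillNeed:   return POSIX_MADV_WILLNEED;
        case MmapStrategy::Normal:     break;
    }
    return POSIX_MADV_NORMAL;
}

// Apply the hint to an already mapped, page-aligned region. posix_madvise()
// returns 0 on success; even then the kernel is free to ignore the hint.
bool AdviseMapping(void* addr, std::size_t len, MmapStrategy s) {
    return posix_madvise(addr, len, ToPosixAdvice(s)) == 0;
}
```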

void CSeqDB::SetNumberOfThreads ( int  num_threads,
bool  force_mt = false 
)

Set the number of threads.

This should be called by the master thread, before and after multiple threads run.

Parameters
num_threads: Number of threads.

Definition at line 1259 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::SetNumberOfThreads(), and CSeqDBImpl::Verify().

Referenced by BOOST_AUTO_TEST_CASE(), CMagicBlastApp::Run(), CSeqDBDemo_Threaded::Run(), and s_SeqDbSetNumberOfThreads().

void CSeqDB::SetOffsetRanges ( int  oid,
const TRangeList &  offset_ranges,
bool  append_ranges,
bool  cache_data 
)

Apply a range of offsets to a database sequence.

The GetAmbigSeq() method requires an amount of work (and I/O) proportional to the size of the sequence data (more if ambiguities are present). In some cases, only certain subranges of this data will be used. This method allows the user to specify which parts of a sequence are actually needed. (Care should be taken if one SeqDB object is shared by several program components.) Offsets above the length of the sequence do not generate an error; they are replaced by the sequence length.

If ranges are specified for a sequence, data within the specified ranges will be accurate, but data outside them should not be accessed, and no guarantees are made about what it will contain. If the append_ranges flag is true, the new ranges are added to the existing ranges; if false, the existing ranges are flushed and replaced by the new ones. To remove ranges, call this method with an empty list of ranges (and append_ranges == false); future calls will then return the complete sequence.

If the cache_data flag is set, data for this sequence will be kept for the duration of SeqDB's lifetime. To disable caching (and flush cached data) for this sequence, call the method again, but specify cache_data to be false.

Parameters
oid: OID of the sequence.
offset_ranges: Ranges of sequence data to return.
append_ranges: Append new ranges to the existing list.
cache_data: Keep sequence data for future callers.

Definition at line 1233 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::SetOffsetRanges(), and CSeqDBImpl::Verify().

Referenced by CSubjectRangesSet::ApplyRanges(), CLocalBlastDbAdapter::GetSequence(), RemoveOffsetRanges(), and s_SeqDbSetRanges().
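The append/replace/reset semantics of SetOffsetRanges() can be modelled with a per-OID registry over the same set-of-pairs shape as TRangeList. ToyRangeRegistry is hypothetical and is not the CSeqDBImpl implementation: it takes the sequence length as an explicit argument, treats each pair as (begin, end) offsets (an assumption), and ignores cache_data.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Same shape as CSeqDB::TRangeList: a set of (begin, end) offset pairs.
using TRangeList = std::set<std::pair<int, int>>;

// Hypothetical per-OID registry modelling the documented semantics.
class ToyRangeRegistry {
public:
    void SetOffsetRanges(int oid, int seq_length, const TRangeList& ranges,
                         bool append_ranges) {
        TRangeList& current = ranges_[oid];
        if (!append_ranges) current.clear();      // flush and replace
        for (const auto& r : ranges) {
            // Offsets past the end of the sequence are replaced by its length.
            current.insert({std::min(r.first, seq_length),
                            std::min(r.second, seq_length)});
        }
        if (current.empty()) ranges_.erase(oid);  // no ranges: full sequence
    }

    // True if the complete sequence would be returned for this OID.
    bool ReturnsFullSequence(int oid) const { return ranges_.count(oid) == 0; }

    const TRangeList& Ranges(int oid) const { return ranges_.at(oid); }

private:
    std::map<int, TRangeList> ranges_;
};
```

In this toy, appending (90, 500) to a 100-residue sequence stores (90, 100), and passing an empty list with append_ranges == false restores full-sequence behavior.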

void CSeqDB::SetVolsMemBit ( int  mbit)

Set the membership of all volumes.

Definition at line 1283 of file seqdb.cpp.

References m_Impl, and CSeqDBImpl::SetVolsMemBit().

Referenced by BlastdbCopyApplication::Run().

bool CSeqDB::TiToOid ( Int8  ti,
int &  oid 
) const

Translate a TI to an OID.

Definition at line 719 of file seqdb.cpp.

References m_Impl, CSeqDBImpl::TiToOid(), and CSeqDBImpl::Verify().

vector< int > CSeqDB::ValidateMaskAlgorithms ( const vector< int > &  algorithm_ids)

Validates the algorithm IDs passed to this function, returning a vector of those algorithm IDs not present in this object.

Definition at line 1175 of file seqdb.cpp.

References copy(), GetAvailableMaskAlgorithms(), and ITERATE.

Referenced by CSeqFormatter::CSeqFormatter(), and s_IsMaskAlgoIdValid().
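The contract of ValidateMaskAlgorithms() ("return the requested IDs that are not present") is an order-preserving difference between two ID lists. The free function below is a hypothetical sketch of that contract over plain vectors; the real method obtains the available IDs itself via GetAvailableMaskAlgorithms().

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch: return the requested algorithm IDs that do not appear
// among the available ones, keeping the order in which they were requested.
std::vector<int> InvalidMaskAlgorithms(const std::vector<int>& requested,
                                       const std::vector<int>& available) {
    std::vector<int> invalid;
    for (int id : requested) {
        if (std::find(available.begin(), available.end(), id) == available.end()) {
            invalid.push_back(id);
        }
    }
    return invalid;
}
```

An empty result means every requested ID is valid, which is how a caller would typically use it before requesting mask data.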

Member Data Documentation

const char * CSeqDB::kBlastDbDateFormat = "b d, Y H:m P"
static

Format string for the date returned by CSeqDB::GetDate.

See also
CTime

Definition at line 797 of file seqdb.hpp.

const string CSeqDB::kOidNotFound
static
class CSeqDBImpl* CSeqDB::m_Impl
protected

Implementation details are hidden. (See seqdbimpl.hpp).

Definition at line 1481 of file seqdb.hpp.

Referenced by AccessionToOids(), CheckOrFindOID(), CSeqDB(), DebugDump(), FindVolumePaths(), FlushOffsetRangeCache(), GarbageCollect(), GetAliasFileValues(), GetAmbigSeq(), GetAmbigSeqAlloc(), GetAvailableMaskAlgorithmDescriptions(), GetAvailableMaskAlgorithms(), GetBioseq(), GetBioseqNoData(), GetColumnBlob(), GetColumnId(), GetColumnMetaData(), GetDate(), GetDBNameList(), GetExactTotalLength(), CSeqDBExpert::GetGiBounds(), GetGiList(), GetGis(), GetHdr(), GetIdSet(), GetLeafTaxIDs(), GetMaskAlgorithmDetails(), GetMaskAlgorithmId(), GetMaskData(), GetMaxLength(), GetMinLength(), GetNextOIDChunk(), GetNumOIDs(), GetNumSeqs(), GetNumSeqsStats(), GetOidAtOffset(), CSeqDBExpert::GetPigBounds(), CSeqDBExpert::GetRawSeqAndAmbig(), GetSeqData(), GetSeqGI(), GetSeqIDs(), GetSeqLength(), GetSeqLengthApprox(), GetSequence(), CSeqDBExpert::GetSequenceHash(), GetSequenceType(), GetSliceSize(), CSeqDBExpert::GetStringBounds(), GetTaxIDs(), GetTitle(), GetTotalLength(), GetTotalLengthStats(), GetTotals(), GetVolumeLength(), GiToBioseq(), GiToOid(), GiToOidwFilterCheck(), GiToPig(), CSeqDBExpert::HashToOids(), ListColumns(), OidToGi(), OidToPig(), PigToBioseq(), PigToGi(), PigToOid(), ResetInternalChunkBookmark(), RetAmbigSeq(), RetSequence(), SeqidToBioseq(), SeqidToOid(), SeqidToOids(), SetIterationRange(), SetMemoryBound(), SetNumberOfThreads(), SetOffsetRanges(), SetVolsMemBit(), TiToOid(), CSeqDBExpert::Verify(), and ~CSeqDB().


The documentation for this class was generated from the following files:
Modified on Sat Apr 22 17:07:16 2017 by modify_doxy.py rev. 533848