NCBI C++ ToolKit
CRegexp Class Reference

CRegexp: wrapper class for the Perl-compatible regular expression (PCRE) library.

#include <util/xregexp/regexp.hpp>

Public Types

enum  ECompile {
  fCompile_default = 0x80000000, fCompile_ignore_case = 0x80000001, fCompile_dotall = 0x80000002, fCompile_newline = 0x80000004,
  fCompile_ungreedy = 0x80000008, fCompile_extended = 0x80000010
}
 Flags for compiling regular expressions.
 
enum  ECompile_deprecated {
  eCompile_default = fCompile_default, eCompile_ignore_case = fCompile_ignore_case, eCompile_dotall = fCompile_dotall, eCompile_newline = fCompile_newline,
  eCompile_ungreedy = fCompile_ungreedy
}
 
enum  EMatch { fMatch_default = 0x80000000, fMatch_not_begin = 0x80000001, fMatch_not_end = 0x80000002, fMatch_not_both = fMatch_not_begin | fMatch_not_end }
 Flags for matching a string against a precompiled pattern.
 
enum  EMatch_deprecated { eMatch_default = fMatch_default, eMatch_not_begin = fMatch_not_begin, eMatch_not_end = fMatch_not_end, eMatch_not_both = fMatch_not_both }
 
typedef unsigned int TCompile
 Type definitions used for code clarity.
 
typedef unsigned int TMatch
 Match options.
 

Public Member Functions

 CRegexp (CTempStringEx pattern, TCompile flags=fCompile_default)
 Constructor.
 
virtual ~CRegexp ()
 Destructor.
 
void Set (CTempStringEx pattern, TCompile flags=fCompile_default)
 Set and compile PCRE.
 
CTempString GetMatch (CTempString str, size_t offset=0, size_t idx=0, TMatch flags=fMatch_default, bool noreturn=false)
 Get matching pattern and subpatterns.
 
bool IsMatch (CTempString str, TMatch flags=fMatch_default)
 Check whether the string contains a substring that matches the specified pattern.
 
CTempString GetSub (CTempString str, size_t idx=0) const
 Get pattern/subpattern from previous GetMatch().
 
void GetSub (CTempString str, size_t idx, string &dst) const
 Get pattern/subpattern from previous GetMatch().
 
int NumFound () const
 Get number of patterns + subpatterns.
 
const int * GetResults (size_t idx) const
 Get location of pattern/subpattern for the last GetMatch().
 

Static Public Member Functions

static string Escape (CTempString str)
 Escape all regular expression meta characters in the string.
 
static string WildcardToRegexp (CTempString mask)
 Convert wildcard mask to regular expression.
 

Private Member Functions

 CRegexp (const CRegexp &)
 
void operator= (const CRegexp &)
 

Private Attributes

void * m_PReg
 Pointer to compiled PCRE pattern.
 
void * m_Extra
 Pointer to extra structure used for pattern study.
 
int m_Results [(kRegexpMaxSubPatterns+1)*3]
 Array of locations of patterns/subpatterns resulting from the last call to GetMatch().
 
int m_NumFound
 The total number of pattern + subpatterns resulting from the last call to GetMatch.
 

Detailed Description

CRegexp: wrapper class for the Perl-compatible regular expression (PCRE) library.

Internally, this class holds a compiled regular expression used for matching against strings passed as an argument to the GetMatch() member function. The regular expression is passed as an argument to the constructor or to the Set() member function.

Throws an exception on error.

Definition at line 69 of file regexp.hpp.

Member Typedef Documentation

typedef unsigned int CRegexp::TCompile

Type definitions used for code clarity.

Compilation options.

Definition at line 73 of file regexp.hpp.

typedef unsigned int CRegexp::TMatch

Match options.

Definition at line 74 of file regexp.hpp.
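
Since TCompile and TMatch are plain unsigned int values and the listing defines fMatch_not_both as fMatch_not_begin | fMatch_not_end, options appear to be combined with bitwise OR. The sketch below only reproduces that arithmetic with constants copied from the enum listing on this page; CombineCompileFlags and DemoFlags are hypothetical helpers, not part of the toolkit.

```cpp
#include <cassert>

// Typedefs and values copied from the ECompile / EMatch listing above.
typedef unsigned int TCompile;
typedef unsigned int TMatch;

const TCompile fCompile_default     = 0x80000000;
const TCompile fCompile_ignore_case = 0x80000001;
const TCompile fCompile_dotall      = 0x80000002;
const TMatch   fMatch_not_begin     = 0x80000001;
const TMatch   fMatch_not_end       = 0x80000002;
const TMatch   fMatch_not_both      = fMatch_not_begin | fMatch_not_end;

// Hypothetical helper: merge two compile options into one TCompile value.
inline TCompile CombineCompileFlags(TCompile a, TCompile b)
{
    return a | b;
}

// Illustration only: checks the bit arithmetic implied by the listing.
inline void DemoFlags()
{
    // Both low bits set, high marker bit kept.
    assert(fMatch_not_both == 0x80000003u);
    // Every flag already carries the 0x80000000 bit, so OR-ing with the
    // default value changes nothing.
    assert(CombineCompileFlags(fCompile_default, fCompile_dotall) == fCompile_dotall);
}
```

With the real class, a combined value of this kind would presumably be passed as the flags argument of the constructor or Set().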

Constructor & Destructor Documentation

CRegexp::CRegexp ( CTempStringEx  pattern,
TCompile  flags = fCompile_default 
)

Constructor.

Set and compile the PCRE pattern specified by argument according to compile options. Also allocate memory for compiled PCRE.

Parameters
pattern  Perl regular expression to compile.
flags  Regular expression compilation flags.
See also
ECompile

Definition at line 103 of file regexp.cpp.

References Set().

CRegexp::~CRegexp ( )
virtual

Destructor.

Deallocate compiled Perl-compatible regular expression.

Definition at line 110 of file regexp.cpp.

References m_Extra, and m_PReg.

CRegexp::CRegexp ( const CRegexp & )
private

Member Function Documentation

string CRegexp::Escape ( CTempString  str)
static

Escape all regular expression meta characters in the string.

Definition at line 198 of file regexp.cpp.

References CTempString::data(), CTempString::find_first_of(), CTempString::length(), NPOS, out(), prev(), s_Special, and str().

Referenced by CFindASN1Dlg::OnReplaceButton().
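
To make the idea of escaping concrete, here is a minimal standalone sketch that prefixes each metacharacter with a backslash. It is not the toolkit's implementation: the metacharacter set is an assumption (the contents of s_Special are not shown on this page), and EscapeSketch and DemoEscape are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Assumed metacharacter set; the toolkit's own list (s_Special) may differ.
static const std::string kAssumedSpecial = "\\^$.|?*+()[]{}";

// Hypothetical stand-in for an Escape-like function: every character from
// the assumed set is preceded by a backslash, all others are copied as-is.
std::string EscapeSketch(const std::string& str)
{
    std::string out;
    out.reserve(str.size());
    for (char c : str) {
        if (kAssumedSpecial.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Illustration only.
inline void DemoEscape()
{
    assert(EscapeSketch("a.b*c") == "a\\.b\\*c");
    assert(EscapeSketch("plain") == "plain");
}
```

Escaping of this kind is useful when user-supplied text must be matched literally inside a larger pattern.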

CTempString CRegexp::GetMatch ( CTempString  str,
size_t  offset = 0,
size_t  idx = 0,
TMatch  flags = fMatch_default,
bool  noreturn = false 
)

Get matching pattern and subpatterns.

Return a string corresponding to the match to pattern or subpattern. Set noreturn to true when GetSub() or GetResults() will be used to retrieve the pattern and subpatterns. Calling GetMatch() causes the entire search to be performed again. If you want to retrieve a different pattern/subpattern from an already performed search, it is more efficient to use GetSub() or GetResults(). If you need the numeric offset of the found pattern or subpattern, use the GetResults() method. Do not use functions like strstr() or string's find() method to locate it, because in general they give wrong results; whether they happen to work depends heavily on the regular expression used.

Parameters
str  String to search.
offset  Starting offset in str.
idx  (Sub) match to return. Use idx = 0 for complete pattern. Use idx > 0 for subpatterns.
flags  Flags to match.
noreturn  Return empty string if noreturn is true.
Returns
Return (sub) match with number idx or empty string when no match found or if noreturn is true.
See also
EMatch, GetSub, GetResults

Definition at line 173 of file regexp.cpp.

References CTempString::data(), GetSub(), int, kRegexpMaxSubPatterns, CTempString::length(), m_Extra, m_NumFound, m_PReg, m_Results, pcre_exec(), and s_GetRealMatchFlags().

Referenced by BrBookURLToCCddBookRef(), BrFcgiBookTermToEutilsTerm(), CapitalizeAfterApostrophe(), CPepXML::ConvertScanID(), DoesPatternMatchHighlightedResidues(), CRegexpUtil::Exists(), CRegexpUtil::Extract(), CFindPattern::Find(), FixAffiliationShortWordsInElement(), FixOrdinalNumbers(), CSpectrumSet::LoadMultDTA(), PortalBookURLToCCddBookRef(), CRegexpUtil::Replace(), CRegexpUtil::ReplaceRange(), COrfSearchJob::x_DoSearch(), CSequenceSearchJob::x_GetMatches(), and CFeatureSearchJob::x_Match().
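
The warning about strstr() and find() can be illustrated independently of the toolkit. The sketch below uses std::regex purely as a stand-in engine (an assumption made so the example is self-contained); RegexMatchOffset, NaiveOffset and DemoOffsets are hypothetical names. It compares the offset reported by the engine with the offset obtained by searching for the matched text again.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Offset of the first regex match in `text`, as reported by the engine,
// or std::string::npos when there is no match.
std::size_t RegexMatchOffset(const std::string& text, const std::string& pattern)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) {
        return std::string::npos;
    }
    return static_cast<std::size_t>(m.position(0));
}

// The unreliable approach: take the matched text and look for it again
// with a plain substring search.
std::size_t NaiveOffset(const std::string& text, const std::string& pattern)
{
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) {
        return std::string::npos;
    }
    return text.find(m.str(0));
}

// Illustration only: "cat$" matches the final "cat", but a substring search
// for "cat" stops at the one inside "concat".
inline void DemoOffsets()
{
    const std::string text = "concat cat";
    assert(RegexMatchOffset(text, "cat$") == 7);
    assert(NaiveOffset(text, "cat$") == 3);
}
```

The same mismatch can arise with any anchored or context-dependent pattern, which is why the offsets should come from GetResults() rather than from a second search.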

const int * CRegexp::GetResults ( size_t  idx) const
inline

Get location of pattern/subpattern for the last GetMatch().

Parameters
idx  Index of the pattern/subpattern to obtain. Use idx = 0 for pattern, idx > 0 for sub patterns.
Returns
Return array where index 0 is location of first character in pattern/sub pattern and index 1 is 1 beyond last character in pattern/sub pattern. Throws if called with idx >= NumFound().
See also
GetMatch(), NumFound()

Definition at line 564 of file regexp.hpp.

References m_NumFound, and m_Results.

Referenced by CapitalizeAfterApostrophe(), CFindPattern::Find(), FixAffiliationShortWordsInElement(), FixOrdinalNumbers(), CNcbiApplogApp::GetRawAppName(), CRegexpUtil::Replace(), CNcbiApplogApp::Run(), and CSequenceSearchJob::x_GetMatches().
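
The returned pair is a half-open range: index 0 is the first matched character and index 1 is one past the last. A minimal sketch of turning such a pair into a substring follows; SubstringFromOffsets and DemoRange are hypothetical helpers, and treating negative offsets as "no match" is an assumption rather than documented behaviour of this class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: convert a [begin, end) offset pair, laid out as the
// GetResults() description states, into the matched substring.
std::string SubstringFromOffsets(const std::string& subject, const int* range)
{
    assert(range != nullptr);
    // Assumption: negative or inverted offsets mean "nothing matched".
    if (range[0] < 0 || range[1] < range[0]) {
        return std::string();
    }
    return subject.substr(static_cast<std::size_t>(range[0]),
                          static_cast<std::size_t>(range[1] - range[0]));
}

// Illustration only: offsets 7..10 select the last three characters.
inline void DemoRange()
{
    const int range[2] = {7, 10};
    assert(SubstringFromOffsets("concat cat", range) == "cat");
}
```

The length of the match is simply range[1] - range[0], so an empty match has equal begin and end offsets.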

CTempString CRegexp::GetSub ( CTempString  str,
size_t  idx = 0 
) const

Get pattern/subpattern from previous GetMatch().

Should only be called after GetMatch() has been called with the same string. GetMatch() internally stores the locations in the string where the pattern and subpatterns were found.

Parameters
str  String to search.
idx  (Sub) match to return.
Returns
Return the substring at location of pattern match (idx 0) or subpattern match (idx > 0). Return empty string when no match.
See also
GetMatch(), GetResults()

Definition at line 159 of file regexp.cpp.

References CTempString::data(), m_NumFound, and m_Results.

Referenced by BrBookURLToCCddBookRef(), BrFcgiBookTermToEutilsTerm(), GetMatch(), PortalBookURLToCCddBookRef(), s_ChrName(), CTabularFormatter::SetFormat(), sParseVersion(), CRegexpTemplateTester::x_CompareLines(), and CSeq_id_Resolver__LRG::x_Create().

void CRegexp::GetSub ( CTempString  str,
size_t  idx,
string &  dst 
) const

Get pattern/subpattern from previous GetMatch().

Deprecated:

Definition at line 143 of file regexp.cpp.

References CTempString::data(), CTempString::erase(), m_NumFound, and m_Results.

bool CRegexp::IsMatch ( CTempString  str,
TMatch  flags = fMatch_default 
)

Check whether the string contains a substring that matches the specified pattern.

int CRegexp::NumFound ( ) const
inline

Get number of patterns + subpatterns.

Returns
Return the number of patterns + subpatterns found as a result of the most recent GetMatch() call (check on >= 0).
See also
GetMatch, IsMatch

Definition at line 557 of file regexp.hpp.

References m_NumFound.

Referenced by BrBookURLToCCddBookRef(), CapitalizeAfterApostrophe(), DoesPatternMatchHighlightedResidues(), CRegexpUtil::Exists(), FixAffiliationShortWordsInElement(), FixOrdinalNumbers(), PortalBookURLToCCddBookRef(), CRegexpUtil::Replace(), CRegexpUtil::ReplaceRange(), CRegexpTemplateTester::x_CompareLines(), and CSequenceSearchJob::x_GetMatches().

void CRegexp::operator= ( const CRegexp & )
private

void CRegexp::Set ( CTempStringEx  pattern,
TCompile  flags = fCompile_default 
)

Set and compile PCRE.

Set and compile the PCRE pattern specified by argument according to compile options. Also deallocate/allocate memory for compiled PCRE.

Parameters
pattern  Perl regular expression to compile.
flags  Regular expression compilation flags.
See also
ECompile

Definition at line 117 of file regexp.cpp.

References CTempString::data(), CTempStringEx::HasZeroAtEnd(), m_Extra, m_PReg, NCBI_THROW, NULL, pcre_compile(), pcre_study(), and s_GetRealCompileFlags().

Referenced by CRemoveDescDlg::ApplyToCSeq_entry(), and CRegexp().

string CRegexp::WildcardToRegexp ( CTempString  mask)
static

Convert wildcard mask to regular expression.

Escapes all regular expression meta characters in the string, except '*' and '?', which are replaced with '.*' and '.' respectively.

Parameters
mask  Wildcard mask.
Returns
Regular expression.
See also
Escape, NStr::MatchesMask

Definition at line 226 of file regexp.cpp.

References CTempString::data(), CTempString::find_first_of(), CTempString::length(), mask, NPOS, out(), prev(), and s_Special.

Referenced by CSequenceSearchJob::x_DoSearch(), and CFeatureCheckPanel::x_InitTree().
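
The conversion described for WildcardToRegexp() can be sketched in a few lines of standalone C++. This is not the toolkit's code: the metacharacter set is an assumption (s_Special is not shown on this page), and WildcardToRegexSketch and DemoWildcard are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a WildcardToRegexp-like conversion:
// '*' becomes ".*", '?' becomes ".", other metacharacters are escaped.
std::string WildcardToRegexSketch(const std::string& mask)
{
    // Assumed metacharacters, excluding '*' and '?' which are translated.
    static const std::string special = "\\^$.|+()[]{}";
    std::string out;
    for (char c : mask) {
        if (c == '*') {
            out += ".*";
        } else if (c == '?') {
            out += '.';
        } else {
            if (special.find(c) != std::string::npos) {
                out += '\\';
            }
            out += c;
        }
    }
    return out;
}

// Illustration only: the literal dot is escaped, the wildcards are not.
inline void DemoWildcard()
{
    assert(WildcardToRegexSketch("*.txt") == ".*\\.txt");
    assert(WildcardToRegexSketch("file?.c") == "file.\\.c");
}
```

Note that the result is not anchored; whether a caller needs '^' and '$' around it depends on how the pattern is used.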

Member Data Documentation

void* CRegexp::m_Extra
private

Pointer to extra structure used for pattern study.

Definition at line 285 of file regexp.hpp.

Referenced by GetMatch(), IsMatch(), Set(), and ~CRegexp().

int CRegexp::m_NumFound
private

The total number of pattern + subpatterns resulting from the last call to GetMatch.

Definition at line 295 of file regexp.hpp.

Referenced by GetMatch(), GetResults(), GetSub(), IsMatch(), and NumFound().

void* CRegexp::m_PReg
private

Pointer to compiled PCRE pattern.

Definition at line 284 of file regexp.hpp.

Referenced by GetMatch(), IsMatch(), Set(), and ~CRegexp().

int CRegexp::m_Results[(kRegexpMaxSubPatterns+1)*3]
private

Array of locations of patterns/subpatterns resulting from the last call to GetMatch(). Also contains 1/3 extra space used internally by the PCRE C library.

Definition at line 291 of file regexp.hpp.

Referenced by GetMatch(), GetResults(), GetSub(), and IsMatch().
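
The array size (kRegexpMaxSubPatterns+1)*3 follows from the description above: two ints (begin and end) per pattern or subpattern, plus one third reserved for the PCRE library. The arithmetic can be sketched as follows; the value of kRegexpMaxSubPatterns is not given on this page, so it appears as a parameter, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Total number of ints in an array declared as [(maxSub + 1) * 3].
// `maxSub` stands in for kRegexpMaxSubPatterns, whose value is not shown here.
constexpr std::size_t ResultsArraySize(std::size_t maxSub)
{
    return (maxSub + 1) * 3;
}

// Two ints (begin, end) for the whole pattern and for each subpattern.
constexpr std::size_t OffsetSlots(std::size_t maxSub)
{
    return (maxSub + 1) * 2;
}

// The remaining third, described above as used internally by PCRE.
constexpr std::size_t WorkspaceSlots(std::size_t maxSub)
{
    return ResultsArraySize(maxSub) - OffsetSlots(maxSub);
}

// Illustration only, using 9 subpatterns as an arbitrary example value.
inline void DemoResultsSize()
{
    assert(ResultsArraySize(9) == 30);
    assert(OffsetSlots(9) == 20);
    assert(WorkspaceSlots(9) == 10);
}
```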


The documentation for this class was generated from the following files:
Modified on Fri Apr 20 12:41:02 2018 by modify_doxy.py rev. 546573