README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available in three alternative formats
from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.zip

There is a mailing list for discussion about the development of PCRE at

  pcre-dev@exim.org

Please read the NEWS file if you are upgrading from a previous release.
The contents of this README file are:

  The PCRE APIs
  Documentation for PCRE
  Contributions by users of PCRE
  Building PCRE on non-Unix systems
  Building PCRE on Unix-like systems
  Retrieving configuration information on Unix-like systems
  Shared libraries on Unix-like systems
  Cross-compiling on Unix-like systems
  Using HP's ANSI C++ compiler (aCC)
  Making new tarballs
  Testing PCRE
  Character tables
  File manifest


The PCRE APIs
-------------

PCRE is written in C, and it has its own API. The distribution also includes a
set of C++ wrapper functions (see the pcrecpp man page for details), courtesy
of Google Inc.

In addition, there is a set of C wrapper functions that are based on the POSIX
regular expression API (see the pcreposix man page). These end up in the
library called libpcreposix. Note that this just provides a POSIX calling
interface to PCRE; the regular expressions themselves still follow Perl syntax
and semantics. The POSIX API is restricted and does not give full access to
all of PCRE's facilities.

The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE with
an existing program that uses the POSIX API, pcreposix.h will have to be
renamed or pointed at by a link.

If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, then in addition to the regex.h header file
issue mentioned above, you must take care when linking programs to ensure that
they link with PCRE's libpcreposix library. Otherwise they may pick up the
POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE with the addition of
-Dregcomp=PCREregcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.
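
The calling sequence that the POSIX-style wrapper provides can be sketched as
follows. This is a minimal example under stated assumptions: it includes the
system's <regex.h> and passes REG_EXTENDED, so as written it compiles and links
against whatever POSIX regex library is installed, not against PCRE. To route
the same calls to PCRE you would use pcreposix.h (renamed or linked as
described above) and link with libpcreposix (and, typically, libpcre); the
pattern would then follow Perl syntax, and whether your version of pcreposix.h
defines REG_EXTENDED is worth checking. The helper first_group() is
hypothetical and not part of either API.

```c
#include <assert.h>
#include <regex.h>    /* system POSIX header; pcreposix.h declares the same calls */
#include <string.h>

/* Hypothetical helper: match PATTERN against SUBJECT and copy the text
   captured by the first parenthesized group into OUT, NUL-terminated and
   truncated to fit OUTLEN bytes. OUT is left empty if there is no match or
   the group did not take part in the match.
   Returns 0 on a match, REG_NOMATCH if there is none, or the nonzero
   regcomp() error code if the pattern does not compile. */
int first_group(const char *pattern, const char *subject,
                char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];              /* m[0]: whole match, m[1]: group 1 */
    int rc;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;                /* nothing to free on failure */

    rc = regexec(&re, subject, 2, m, 0);
    if (rc == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen)
            len = outlen - 1;     /* truncate, leaving room for the NUL */
        memcpy(out, subject + m[1].rm_so, len);
        out[len] = '\0';
    }

    regfree(&re);                 /* release the compiled pattern */
    return rc;
}
```

For example, first_group("a(b+)c", "xxabbbcyy", buf, sizeof buf) returns 0 and
leaves "bbb" in buf. That pattern means the same thing in POSIX extended syntax
and in Perl syntax, so it behaves identically whichever library is linked.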


Documentation for PCRE
----------------------

If you install PCRE in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre". The one that is just
called "pcre" lists all the others. In addition to these man pages, the PCRE
documentation is supplied in two other forms:

  1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
     doc/pcretest.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except
     those that summarize individual functions. The other two are the text
     forms of the section 1 man pages for the pcregrep and pcretest commands.
     These text forms are provided for ease of scanning with text editors or
     similar tools. They are installed in <prefix>/share/doc/pcre, where
     <prefix> is the installation prefix (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre/html.

Users of PCRE have contributed files containing the documentation for various
releases in CHM format. These can be found in the Contrib directory of the FTP
site (see next section).


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

There is a README file giving brief descriptions of what they are. Some are
complete in themselves; others are pointers to URLs containing relevant files.
Some of this material is likely to be well out-of-date. Several of the earlier
contributions provided support for compiling PCRE on various flavours of
Windows (I myself do not use Windows). Nowadays there is more Windows support
in the standard distribution, so these contributions have been archived.


Building PCRE on non-Unix systems
---------------------------------

For a non-Unix system, please read the comments in the file NON-UNIX-USE,
though if your system supports the use of "configure" and "make" you may be
able to build PCRE in the same way as for Unix-like systems. PCRE can also be
configured in many platform environments using the GUI facility of CMake's
CMakeSetup. It creates Makefiles, solution files, etc.

PCRE has been compiled on many different operating systems. It should be
straightforward to build PCRE on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE on Unix-like systems
----------------------------------

If you are using HP's ANSI C++ compiler (aCC), please see the special note
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.

The following instructions assume the use of the widely used "configure, make,
make install" process. There is also support for CMake in the PCRE
distribution; there are some comments about using CMake in the NON-UNIX-USE
file, though it can also be used on Unix-like systems.

To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
the file INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

cd /build/pcre/pcre-xxx
/source/pcre/pcre-xxx/configure

PCRE is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.

. If you want to suppress the building of the C++ wrapper library, you can add
  --disable-cpp to the "configure" command. Otherwise, when "configure" is run,
  it will try to find a C++ compiler and C++ header files, and if it succeeds,
  it will try to build the C++ wrapper.

. If you want to make use of the support for UTF-8 Unicode character strings in
  PCRE, you must add --enable-utf8 to the "configure" command. Without it, the
  code for handling UTF-8 is not included in the library. Even when included,
  it still has to be enabled by an option at run time. When PCRE is compiled
  with this option, its input can only be either ASCII or UTF-8, even when
  running on EBCDIC platforms. It is not possible to use both --enable-utf8 and
  --enable-ebcdic at the same time.

. If, in addition to support for UTF-8 character strings, you want to include
  support for the \P, \p, and \X sequences that recognize Unicode character
  properties, you must add --enable-unicode-properties to the "configure"
  command. This adds about 30K to the size of the library (in the form of a
  property table); only the basic two-letter properties such as Lu are
  supported.

. You can build PCRE to recognize either CR or LF or the sequence CRLF or any
  of the preceding, or any of the Unicode newline sequences as indicating the
  end of a line. Whatever you specify at build time is the default; the caller
  of PCRE can change the selection at run time. The default newline indicator
  is a single LF character (the Unix standard). You can specify the default
  newline indicator by adding --enable-newline-is-cr or --enable-newline-is-lf
  or --enable-newline-is-crlf or --enable-newline-is-anycrlf or
  --enable-newline-is-any to the "configure" command, respectively.

  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  the standard tests will fail, because the lines in the test files end with
  LF. Even if the files are edited to change the line endings, there are likely
  to be some failures. With --enable-newline-is-anycrlf or
  --enable-newline-is-any, many tests should succeed, but there may be some
  failures.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE considers to
  be the end of a line (see above). However, the caller of PCRE can restrict \R
  to match only CR, LF, or CRLF. You can make this the default by adding
  --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. When called via the POSIX interface, PCRE uses malloc() to get additional
  storage for processing capturing parentheses if there are more than 10 of
  them in a pattern. You can increase this threshold by setting, for example,

  --with-posix-malloc-threshold=20

  on the "configure" command.

. PCRE has a counter that can be set to limit the amount of resources it uses.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre_exec() can supply their own value. There is more discussion on the
  pcreapi man page.

. There is a separate counter that limits the depth of recursive function calls
  during a matching process. This also has a default of ten million, which is
  essentially "unlimited". You can change the default by setting, for example,

  --with-match-limit-recursion=500000

  Recursive function calls use up the runtime stack; running out of stack can
  cause programs to crash in strange ways. There is a discussion about stack
  sizes in the pcrestack man page.

. The default maximum compiled pattern size is around 64K. You can increase
  this by adding --with-link-size=3 to the "configure" command. You can
  increase it even more by setting --with-link-size=4, but this is unlikely
  ever to be necessary. Increasing the internal link size will reduce
  performance.

. You can build PCRE so that its internal match() function that is called from
  pcre_exec() does not call itself recursively. Instead, it uses memory blocks
  obtained from the heap via the special functions pcre_stack_malloc() and
  pcre_stack_free() to save data that would otherwise be saved on the stack. To
  build PCRE like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE runs more slowly in this mode, but it may be
  necessary in environments with limited stack sizes. This applies only to the
  pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
  use deeply nested recursion. There is a discussion about stack sizes in the
  pcrestack man page.

. For speed, PCRE uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you obey "make". It builds a source file called pcre_chartables.c. If you do
  not specify this option, pcre_chartables.c is created as a copy of
  pcre_chartables.c.dist. See "Character tables" below for further information.

. It is possible to compile PCRE for use on systems that use EBCDIC as their
  character code (as opposed to ASCII) by specifying

  --enable-ebcdic

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8.

. It is possible to compile pcregrep to use libz and/or libbz2, in order to
  read .gz and .bz2 files (respectively), by specifying one or both of

  --enable-pcregrep-libz
  --enable-pcregrep-libbz2

  Of course, the relevant libraries must be installed on your system.

. It is possible to compile pcretest so that it links with the libreadline
  library, by specifying

  --enable-pcretest-libreadline

  If this is done, when pcretest's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcretest linked in this way, there may be licensing issues.

  Setting this option causes the -lreadline option to be added to the pcretest
  build. In many operating environments with a system-installed readline
  library this is sufficient. However, in some environments (e.g. if an
  unmodified distribution version of readline is in use), it may be necessary
  to specify something like LIBS="-lncurses" as well. This is because, to quote
  the readline INSTALL file, "Readline uses the termcap functions, but does not
  link with the termcap or curses library itself, allowing applications which
  link with readline to choose an appropriate library." If you get error
  messages about missing functions tgetstr, tgetent, tputs, tgetflag, or tgoto,
  this is the problem, and linking with the ncurses library should fix it.
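
The two resource counters described above (the match limit and the
recursion-depth limit) can be illustrated with a toy backtracking matcher. The
sketch below is NOT PCRE's match() function: it is a minimal Kernighan-style
matcher for literal characters, '.', 'x*' and a trailing '$', anchored at the
start of the subject, and its names (toy_match(), match_budget, TOY_LIMIT) are
hypothetical. Every call to the recursive routine is charged against
max_calls, in the spirit of --with-match-limit, and the nesting depth is
checked against max_depth, in the spirit of --with-match-limit-recursion;
exceeding either makes the whole match fail instead of running on.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes for this toy matcher (hypothetical names). */
enum { TOY_NOMATCH = 0, TOY_MATCH = 1, TOY_LIMIT = -1 };

/* Budget loosely modelled on PCRE's two counters:
   max_calls plays the role of the match limit,
   max_depth plays the role of the recursion-depth limit. */
typedef struct {
    unsigned long calls;          /* calls to match_here() so far */
    unsigned long max_calls;
    unsigned long max_depth;
} match_budget;

static int match_here(const char *re, const char *s,
                      match_budget *b, unsigned long depth);

/* "c*": try zero, one, two, ... copies of c, backtracking on failure. */
static int match_star(char c, const char *re, const char *s,
                      match_budget *b, unsigned long depth)
{
    do {
        int r = match_here(re, s, b, depth + 1);
        if (r != TOY_NOMATCH)
            return r;             /* matched, or a limit was exceeded */
    } while (*s != '\0' && (*s++ == c || c == '.'));
    return TOY_NOMATCH;
}

static int match_here(const char *re, const char *s,
                      match_budget *b, unsigned long depth)
{
    /* Charge every call to the budget before doing any work. */
    if (++b->calls > b->max_calls || depth > b->max_depth)
        return TOY_LIMIT;
    if (re[0] == '\0')
        return TOY_MATCH;
    if (re[1] == '*')
        return match_star(re[0], re + 2, s, b, depth);
    if (re[0] == '$' && re[1] == '\0')
        return *s == '\0' ? TOY_MATCH : TOY_NOMATCH;
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return match_here(re + 1, s + 1, b, depth + 1);
    return TOY_NOMATCH;
}

/* Match RE against the start of S within the given limits. */
int toy_match(const char *re, const char *s,
              unsigned long max_calls, unsigned long max_depth)
{
    match_budget b = { 0, max_calls, max_depth };
    assert(re != NULL && s != NULL);
    return match_here(re, s, &b, 0);
}
```

With generous limits, toy_match("ab*c", "abbbc", 100, 100) returns TOY_MATCH
after seven calls; lowering max_calls to 3, or max_depth to 2, makes the same
call return TOY_LIMIT. A pattern such as a*a*a*a*a*a*a*a*b applied to a long
run of "a" characters exhausts a budget of 10000 calls long before it could
finish, because the number of ways of dividing the subject among the starred
items grows combinatorially; this is the kind of case a finite default limit
guards against.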

The "configure" script builds the following files for the basic C library:

. Makefile is the makefile that builds the library
. config.h contains build-time configuration options for the library
. pcre.h is the public PCRE header file
. pcre-config is a script that shows the settings of "configure" options
. libpcre.pc is data for the pkg-config command
. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests on the basic C library
. RunGrepTest is a script for running tests on the pcregrep command

Versions of config.h and pcre.h are distributed in the PCRE tarballs under
the names config.h.generic and pcre.h.generic. These are provided for the
benefit of those who have to build PCRE without using "configure". If you use
"configure", the .generic versions are not used.

If a C++ compiler is found, the following files are also built:

. libpcrecpp.pc is data for the pkg-config command
. pcrecpparg.h is a header file for programs that call PCRE via the C++ wrapper
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". It builds two libraries, called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. If a C++ compiler was found on your system, "make" also builds the C++
wrapper library, which is called libpcrecpp, and some test programs called
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
Building the C++ wrapper can be disabled by adding --disable-cpp to the
"configure" command.

The command "make check" runs all the appropriate tests. Details of the PCRE
tests are given below in a separate section of this document.

You can use "make install" to install PCRE into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcretest
    pcregrep
    pcre-config

  Libraries (lib):
    libpcre
    libpcreposix
    libpcrecpp (if C++ support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre.pc
    libpcrecpp.pc (if C++ support is enabled)

  Header files (include):
    pcre.h
    pcreposix.h
    pcre_scanner.h      )
    pcre_stringpiece.h  ) if C++ support is enabled
    pcrecpp.h           )
    pcrecpparg.h        )

  Man pages (share/man/man{1,3}):
    pcregrep.1
    pcretest.1
    pcre.3
    pcre*.3 (lots more pages, all starting "pcre")

  HTML documentation (share/doc/pcre/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre.txt       (a concatenation of the man(3) pages)
    pcretest.txt   (the pcretest man page)
    pcregrep.txt   (the pcregrep man page)

If you want to remove PCRE from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information on Unix-like systems
---------------------------------------------------------

Running "make install" installs the command pcre-config, which can be used to
recall information about the PCRE configuration and installation. For example:

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs the linker options needed to use the library, including where it is
installed. This command can be included in makefiles for programs that use
PCRE, saving the programmer from having to remember too many details.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --cflags pcre

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries on Unix-like systems
-------------------------------------

The default distribution builds PCRE as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE using static libraries only, you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling on Unix-like systems
------------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the dftables.c source
file is compiled and run on the local host, in order to generate the inbuilt
character tables (the pcre_chartables.c file). This will probably not work,
because dftables.c needs to be compiled with the local compiler, not the cross
compiler.

When --enable-rebuild-chartables is not specified, pcre_chartables.c is created
by making a copy of pcre_chartables.c.dist, which is a default set of tables
that assumes ASCII code. Cross-compiling with the default tables should not be
a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre_chartables.c.dist out of the way, then compile dftables.c by hand and
run it on the local host to make a new version of pcre_chartables.c.dist.
Then when you cross-compile PCRE this new version of the tables will be used.


Using HP's ANSI C++ compiler (aCC)
----------------------------------

Unless C++ support is disabled by specifying the "--disable-cpp" option of the
"configure" script, you must include the "-AA" option in the CXXFLAGS
environment variable in order for the C++ components to compile correctly.

Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
needed libraries fail to get included when specifying the "-AA" compiler
option. If you experience unresolved symbols when linking the C++ programs,
use the workaround of specifying the following environment variable prior to
running the "configure" script:

  CXXLDFLAGS="-lstd_v2 -lCsup_v2"


Making new tarballs
-------------------

The command "make dist" creates three PCRE tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE
------------

To test the basic PCRE library on a Unix system, run the RunTest script that is
created by the configuring process. There is also a script called RunGrepTest
that tests the options of the pcregrep command. If the C++ wrapper library is
built, three test programs called pcrecpp_unittest, pcre_scanner_unittest, and
pcre_stringpiece_unittest are also built.

Both the scripts and all the program tests are run if you obey "make check" or
"make test". For other systems, see the instructions in NON-UNIX-USE.

The RunTest script runs the pcretest test program (which is documented in its
own man page) on each of the testinput files in the testdata directory in
turn, and compares the output with the contents of the corresponding testoutput
files. A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:

  RunTest 2

The first test file can also be fed directly into the perltest.pl script to
check that Perl gives the same results. The only difference you should see is
in the first few lines, where the Perl version is given instead of the PCRE
version.

The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flags to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f], the
output from the test run will contain [\x00-\xff], and similarly in some other
cases. This is not a bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

[If you are trying to run this test on Windows, you may be able to get it to
work by changing "fr_FR" to "french" everywhere it occurs. Alternatively, use
RunTest.bat. The version of RunTest.bat included with PCRE 7.4 and above uses
Windows versions of test 2. More info on using RunTest.bat is included in the
document entitled NON-UNIX-USE.]

The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can also be fed directly to the perltest script,
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
commented in the script, can be used.)

The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.

The sixth test checks the support for Unicode character properties. It is not
run automatically unless PCRE is built with Unicode property support. To do
this you must set --enable-unicode-properties when running "configure".

The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
property support, respectively. The eighth and ninth tests are not run
automatically unless PCRE is built with the relevant support.


Character tables
----------------

For speed, PCRE uses four tables for manipulating and identifying characters
whose code point values are less than 256. The final argument of the
pcre_compile() function is a pointer to a block of memory containing the
concatenated tables. A call to pcre_maketables() can be used to generate a set
of tables in the current locale. If the final argument for pcre_compile() is
passed as NULL, a set of default tables that is built into the binary is used.

The source file called pcre_chartables.c contains the default set of tables. By
default, this is created as a copy of pcre_chartables.c.dist, which contains
tables for ASCII coding. However, if --enable-rebuild-chartables is specified
for ./configure, a different version of pcre_chartables.c is built by the
program dftables (compiled from dftables.c), which uses the ANSI C character
handling functions such as isalnum(), isalpha(), isupper(), islower(), etc. to
build the table sources. This means that the default C locale which is set for
your system will control the contents of these default tables. You can change
the default tables by editing pcre_chartables.c and then re-building PCRE. If
you do this, you should take care to ensure that the file does not get
automatically re-generated. The best way to do this is to move
pcre_chartables.c.dist out of the way and replace it with your customized
tables.

When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set. If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:

  ./dftables -L pcre_chartables.c.special

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256.
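
A minimal sketch of how tables of this kind can be derived from the standard
<ctype.h> functions, following the description above: a 256-byte lower-casing
table, a 256-byte case-flipping table, and 32-byte bit maps (256 bits, one per
code point) for digits, word characters, and white space. The struct
char_tables and the helper functions are hypothetical, and the results depend
on whatever locale is current when the code runs; this is not the code in
dftables.c or pcre_maketables(), and real PCRE tables may be laid out
differently.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical container for the tables described above. */
typedef struct {
    unsigned char lcc[256];     /* lower-casing table */
    unsigned char fcc[256];     /* case-flipping table */
    unsigned char digit[32];    /* bit map: decimal digits */
    unsigned char word[32];     /* bit map: "word" characters */
    unsigned char space[32];    /* bit map: white space */
} char_tables;

/* Set bit C in a 32-byte (256-bit) map. */
static void bitmap_set(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Test bit C in a 32-byte (256-bit) map. */
int bitmap_test(const unsigned char *map, int c)
{
    assert(c >= 0 && c < 256);
    return (map[c / 8] & (1u << (c % 8))) != 0;
}

/* Fill all tables using the <ctype.h> functions of the current locale. */
void build_char_tables(char_tables *t)
{
    int c;
    memset(t, 0, sizeof *t);
    for (c = 0; c < 256; c++) {
        t->lcc[c] = (unsigned char)tolower(c);
        /* Flip case: lower -> upper, anything else through tolower(). */
        t->fcc[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
        if (isdigit(c))
            bitmap_set(t->digit, c);
        if (isalnum(c) || c == '_')
            bitmap_set(t->word, c);
        if (isspace(c))
            bitmap_set(t->space, c);
    }
}
```

Because the <ctype.h> functions consult the current locale, calling setlocale()
before build_char_tables() can change the results; this is the same reason the
contents of tables generated by dftables depend on the locale in use.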

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that have the 128 bit set, as that
will cause PCRE to malfunction.
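
To make the bit values concrete, here is a sketch that computes one entry of
such a character-type table from the <ctype.h> functions. The constant names
and the function ctype_entry() are hypothetical, and the particular set of
metacharacters given the 128 bit (besides binary zero) is an assumption chosen
for illustration, not a copy of PCRE's source.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Bit values from the table above (constant names are hypothetical). */
enum {
    CT_SPACE  = 1,    /* white space character */
    CT_LETTER = 2,    /* letter */
    CT_DIGIT  = 4,    /* decimal digit */
    CT_XDIGIT = 8,    /* hexadecimal digit */
    CT_WORD   = 16,   /* alphanumeric or '_' */
    CT_META   = 128   /* regular expression metacharacter or binary zero */
};

/* Assumed set of characters that get the 128 bit (besides binary zero). */
static const char metachars[] = "\\*+?{^.$|()[";

/* Compute the character-type bits for code point C (0-255). */
unsigned char ctype_entry(int c)
{
    unsigned char x = 0;
    assert(c >= 0 && c < 256);
    if (isspace(c))  x |= CT_SPACE;
    if (isalpha(c))  x |= CT_LETTER;
    if (isdigit(c))  x |= CT_DIGIT;
    if (isxdigit(c)) x |= CT_XDIGIT;
    if (isalnum(c) || c == '_') x |= CT_WORD;
    /* Test binary zero explicitly: strchr() would also "find" the string's
       terminating NUL, which is easy to overlook. */
    if (c == 0 || strchr(metachars, c) != NULL) x |= CT_META;
    return x;
}
```

In the C locale, for instance, 'a' gets 2 + 8 + 16 = 26 (letter, hexadecimal
digit, word character), 'g' gets 2 + 16 = 18, and '*' gets only 128.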


File manifest
-------------

The distribution should contain the following files:

(A) Source files of the PCRE library functions and their headers:

  dftables.c              auxiliary program for building pcre_chartables.c
                            when --enable-rebuild-chartables is specified

  pcre_chartables.c.dist  a default set of character tables that assume ASCII
                            coding; used, unless --enable-rebuild-chartables is
                            specified, by copying to pcre_chartables.c

  pcreposix.c             )
  pcre_compile.c          )
  pcre_config.c           )
  pcre_dfa_exec.c         )
  pcre_exec.c             )
  pcre_fullinfo.c         )
  pcre_get.c              ) sources for the functions in the library,
  pcre_globals.c          )   and some internal functions that they use
  pcre_info.c             )
  pcre_maketables.c       )
  pcre_newline.c          )
  pcre_ord2utf8.c         )
  pcre_refcount.c         )
  pcre_study.c            )
  pcre_tables.c           )
  pcre_try_flipped.c      )
  pcre_ucd.c              )
  pcre_valid_utf8.c       )
  pcre_version.c          )
  pcre_xclass.c           )
  pcre_printint.src       ) debugging function that is #included in pcretest,
                          )   and can also be #included in pcre_compile()
  pcre.h.in               template for pcre.h when built by "configure"
  pcreposix.h             header for the external POSIX wrapper API
  pcre_internal.h         header for internal use
  ucp.h                   header for Unicode property handling

  config.h.in             template for config.h, which is built by "configure"

  pcrecpp.h               public header file for the C++ wrapper
  pcrecpparg.h.in         template for another C++ header file
  pcre_scanner.h          public header file for C++ scanner functions
  pcrecpp.cc              )
  pcre_scanner.cc         ) source for the C++ wrapper library

  pcre_stringpiece.h.in   template for pcre_stringpiece.h, the header for the
                            C++ stringpiece functions
  pcre_stringpiece.cc     source for the C++ stringpiece functions

(B) Source files for programs that use PCRE:

  pcredemo.c              simple demonstration of coding calls to PCRE
  pcregrep.c              source of a grep utility that uses PCRE
  pcretest.c              comprehensive test program

(C) Auxiliary files:

  132html                 script to turn "man" pages into HTML
  AUTHORS                 information about the author of PCRE
  ChangeLog               log of changes to the code
  CleanTxt                script to clean nroff output for txt man pages
  Detrail                 script to remove trailing spaces
  HACKING                 some notes about the internals of PCRE
  INSTALL                 generic installation instructions
  LICENCE                 conditions for the use of PCRE
  COPYING                 the same, using GNU's standard name
  Makefile.in             ) template for Unix Makefile, which is built by
                          )   "configure"
  Makefile.am             ) the automake input that was used to create
                          )   Makefile.in
  NEWS                    important changes in this release
  NON-UNIX-USE            notes on building PCRE on non-Unix systems
  PrepareRelease          script to make preparations for "make dist"
  README                  this file
  RunTest                 a Unix shell script for running tests
  RunGrepTest             a Unix shell script for pcregrep tests
  aclocal.m4              m4 macros (generated by "aclocal")
  config.guess            ) files used by libtool,
  config.sub              )   used only when building a shared library
  configure               a configuring shell script (built by autoconf)
  configure.ac            ) the autoconf input that was used to build
                          )   "configure" and config.h
  depcomp                 ) script to find program dependencies, generated by
                          )   automake
  doc/*.3                 man page sources for the PCRE functions
  doc/*.1                 man page sources for pcregrep and pcretest
  doc/index.html.src      the base HTML page
  doc/html/*              HTML documentation
  doc/pcre.txt            plain text version of the man pages
  doc/pcretest.txt        plain text documentation of test program
  doc/perltest.txt        plain text documentation of Perl test program
  install-sh              a shell script for installing files
  libpcre.pc.in           template for libpcre.pc for pkg-config
  libpcrecpp.pc.in        template for libpcrecpp.pc for pkg-config
  ltmain.sh               file used to build a libtool script
  missing                 ) common stub for a few missing GNU programs while
                          )   installing, generated by automake
  mkinstalldirs           script for making install directories
  perltest.pl             Perl test program
  pcre-config.in          source of script which retains PCRE information
  pcrecpp_unittest.cc          )
  pcre_scanner_unittest.cc     ) test programs for the C++ wrapper
  pcre_stringpiece_unittest.cc )
  testdata/testinput*     test data for main library tests
  testdata/testoutput*    expected test results
  testdata/grep*          input and output for pcregrep tests

(D) Auxiliary files for cmake support:

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for VPASCAL:

  makevp.bat
  makevp_c.txt
  makevp_l.txt
  pcregexp.pas

(F) Auxiliary files for building PCRE "by hand":

  pcre.h.generic          ) a version of the public PCRE header file
                          )   for use in non-"configure" environments
  config.h.generic        ) a version of config.h for use in non-"configure"
                          )   environments

(G) Miscellaneous:

  RunTest.bat             a script for running tests under Windows

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 21 March 2009