PCRE Regular Expression Details

The syntax and semantics of the regular expressions supported by PCRE are described below. Regular expressions are also described in the Perl documentation and in a number of books, some of which have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", published by O'Reilly, covers regular expressions in great detail. This description of PCRE's regular expressions is intended as reference material.

The original operation of PCRE was on strings of one-byte characters. However, there is now also support for UTF-8 character strings. To use this, you must build PCRE to include UTF-8 support, and then call pcre_compile() with the PCRE_UTF8 option. How this affects pattern matching is mentioned in several places below. There is also a summary of UTF-8 features in the section on UTF-8 support in the main PCRE page.

A regular expression is a pattern that is matched against a subject string from left to right. Most characters stand for themselves in a pattern, and match the corresponding characters in the subject. The following sections describe the use of each of the metacharacters.

The backslash character has several uses. Firstly, if it is followed by a non-alphanumeric character, it takes away any special meaning that character may have. This use of backslash as an escape character applies both inside and outside character classes.

For example, if you want to match a * character, you write \* in the pattern. This escaping action applies whether or not the following character would otherwise be interpreted as a metacharacter, so it is always safe to precede a non-alphanumeric with backslash to specify that it stands for itself. In particular, if you want to match a backslash, you write \\.

If a pattern is compiled with the PCRE_EXTENDED option, whitespace in the pattern (other than in a character class) and characters between a # outside a character class and the next newline character are ignored. An escaping backslash can be used to include a whitespace or # character as part of the pattern.

If you want to remove the special meaning from a sequence of characters, you can do so by putting them between \Q and \E. This is different from Perl in that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in Perl, $ and @ cause variable interpolation.

The precise effect of \cx is as follows: if x is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus \cz becomes hex 1A.

After \x, up to two hexadecimal digits are read. After \0, up to two further octal digits are read. In both cases, if there are fewer than two digits, just those that are present are used. Thus the sequence \0\x\07 specifies two binary zeros followed by a BEL character (code value 7). Make sure you supply two digits after the initial zero if the pattern character that follows is itself an octal digit.

The handling of a backslash followed by a digit other than 0 is complicated. Outside a character class, PCRE reads it and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses in the expression, the entire sequence is taken as a back reference. A description of how this works is given later, following the discussion of parenthesized subpatterns.

Inside a character class, or if the decimal number is greater than 9 and there have not been that many capturing subpatterns, PCRE re-reads up to three octal digits following the backslash, and generates a single byte from the least significant 8 bits of the value. Any subsequent digits stand for themselves.

Each pair of character type escape sequences, such as \s and its negation \S, partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

These character type sequences can appear both inside and outside character classes. They each match one character of the appropriate type. If the current matching point is at the end of the subject string, all of them fail, since there is no character to match.

For compatibility with Perl, \s does not match the VT character (code 11). This makes it different from the POSIX "space" class. The \s characters are HT (9), LF (10), FF (12), CR (13), and space (32).

A "word" character is an underscore or any character less than 256 that is a letter or digit. The definition of letters and digits is controlled by PCRE's low-valued character tables, and may vary if locale-specific matching is taking place (see "Locale support" in the pcreapi page).
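
The rules in this description for a backslash followed by a digit other than 0 are easy to misread, so the following Python sketch restates them as a small decision function. It only illustrates the prose rules and is not PCRE's actual code: the function name interpret_backslash_digits and its parameters are hypothetical, and the case where no octal digit follows the backslash is not covered by the text, so the sketch simply rejects it.

```python
OCTAL_DIGITS = "01234567"
DECIMAL_DIGITS = "0123456789"


def interpret_backslash_digits(digits, groups_so_far, inside_class=False):
    """Classify the run of digits that follows a backslash.

    digits        -- the digit characters after the backslash (first is 1-9)
    groups_so_far -- number of capturing left parentheses seen so far
    inside_class  -- True if the sequence appears inside a character class

    Returns ("backref", n) or ("octal", byte_value, literal_rest).
    Hypothetical helper written only to mirror the prose rules.
    """
    if (not digits or digits[0] == "0"
            or any(ch not in DECIMAL_DIGITS for ch in digits)):
        raise ValueError("expected decimal digits starting with 1-9")

    if not inside_class:
        # Outside a class: read all the digits as one decimal number.
        number = int(digits)
        if number < 10 or number <= groups_so_far:
            return ("backref", number)

    # Otherwise re-read up to three octal digits after the backslash.
    octal = ""
    for ch in digits[:3]:
        if ch not in OCTAL_DIGITS:
            break
        octal += ch
    if not octal:
        # Assumption: the text does not say what happens here.
        raise ValueError("no octal digit follows; not covered by the text")

    # Keep only the least significant 8 bits of the value.
    byte_value = int(octal, 8) & 0xFF
    # Any digits that were not consumed stand for themselves.
    return ("octal", byte_value, digits[len(octal):])


print(interpret_backslash_digits("7", 0))    # ('backref', 7)
print(interpret_backslash_digits("11", 11))  # ('backref', 11)
print(interpret_backslash_digits("11", 3))   # ('octal', 9, '')
print(interpret_backslash_digits("1234", 0)) # ('octal', 83, '4')
```

With the same digits, the outcome depends on how many capturing parentheses precede the sequence: \11 is a back reference once eleven groups have been opened, and otherwise the byte with octal code 11. Writing the code point with a leading zero, which follows the separate \0 rule, avoids that ambiguity.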