Meta-Character

Description

^

This meta-character, the caret, matches the beginning of a string or, if the /m option is used, the beginning of a line. It is one of two pattern anchors; the other anchor is the $.

.

This meta-character will match any single character except for the newline character unless the /s option is specified. If the /s option is specified, then the newline will also be matched.

$

This meta-character will match the end of a string or, if the /m option is used, the end of a line. It is one of two pattern anchors; the other anchor is the ^.

|

This meta-character, called alternation, lets you specify two values that can cause the match to succeed. For instance, m/a|b/ means that the $_ variable must contain the "a" or "b" character for the match to succeed.

*

This meta-character indicates that the "thing" immediately to the left should be matched zero or more times in order to be evaluated as true (thus .* matches any number of characters).

+

This meta-character indicates that the "thing" immediately to the left should be matched one or more times in order to be evaluated as true.

?

This meta-character indicates that the "thing" immediately to the left should be matched zero or one times to be evaluated as true. When used in conjunction with the *, +, ?, or {n,m} quantifiers, it means that the regular expression should be non-greedy and match the smallest possible string.
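The behavior of the anchors, the dot, alternation, and the quantifiers described above can be checked with a short script. This is only a minimal sketch; the sample strings and variable names ($text, $html, and so on) are made up for illustration.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /m, ^ anchors only at the very start of the string.
my @starts_plain = ($text =~ /^(\w+)/g);     # ("first")

# With /m, ^ also anchors right after each embedded newline.
my @starts_multi = ($text =~ /^(\w+)/mg);    # ("first", "second")

# The dot refuses to match a newline unless /s is given.
my $dot_plain  = ($text =~ /line.second/)  ? 1 : 0;   # 0
my $dot_single = ($text =~ /line.second/s) ? 1 : 0;   # 1

# Alternation succeeds if either side matches.
my $alt = ("cab" =~ /a|b/) ? 1 : 0;                   # 1

# Greedy versus non-greedy: .+ grabs as much as it can,
# while .+? stops at the first place the match can succeed.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /(<.+>)/;     # "<b>bold</b>"
my ($lazy)   = $html =~ /(<.+?>)/;    # "<b>"
```

The last two lines show why the non-greedy form is usually what you want when picking one delimited item out of a longer string.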

Meta-Brackets

Description

()

The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the "Special Variables" chapter for more details.

(?...)

If a question mark immediately follows the left parenthesis, it indicates that an extended mode component is being specified; this is new to Perl 5.

(?#comment)

Extension: comment is any text.

(?:regx)

Extension: regx is any regular expression but () are not saved as a backreference.

(?=regx)

Extension: Allows matching of zero-width positive lookahead characters (that is, the regular expression is matched but not returned as being matched).

(?!regx)

Extension: Allows matching of zero-width negative lookahead characters (that is, the negated form of (?=regx)).

(?options)

Extension: Applies the specified options to the pattern, bypassing the need for the option to be specified in the normal way. Valid options are: i (case insensitive), m (treat as multiple lines), s (treat as single line), and x (allow whitespace and comments).

{n,m}

Braces let you specify how many times the "thing" immediately to the left should be matched. {n} means that it should be matched exactly n times. {n,} means it must be matched at least n times. {n,m} means that it must be matched at least n times but not more than m times.

[]

Square brackets let you create a character class. For instance, m/[abc]/ evaluates to true if any of "a", "b", or "c" is contained in $_. When the choices are single characters, square brackets are a more readable alternative to the alternation meta-character.
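The bracket forms above are easier to tell apart side by side. The following is a small sketch with invented sample strings and variable names, not a complete tour of the extensions.

```perl
use strict;
use warnings;

# () remembers what it matched; (?:...) groups without remembering.
my @caps = "foobar" =~ /(?:foo)(bar)/;          # ("bar") only

# (?=...) must match, but is not part of the matched text.
my ($ahead) = "foobar" =~ /(foo(?=bar))/;       # "foo"

# (?!...) succeeds only when the lookahead pattern does NOT follow.
my $neg_baz = ("foobaz" =~ /foo(?!bar)/) ? 1 : 0;   # 1
my $neg_bar = ("foobar" =~ /foo(?!bar)/) ? 1 : 0;   # 0

# (?i) turns on case-insensitive matching from inside the pattern.
my $inline = ("HELLO" =~ /(?i)hello/) ? 1 : 0;      # 1

# {n,m} bounds the repeat count: here two or three "a" characters.
my $three = ("aaa"  =~ /^a{2,3}$/) ? 1 : 0;         # 1
my $four  = ("aaaa" =~ /^a{2,3}$/) ? 1 : 0;         # 0

# [abc] matches any one of the listed characters.
my $class = ("cat" =~ /[abc]/) ? 1 : 0;             # 1
```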

Meta-Sequences

Description

\

This meta-character "escapes" the character which follows. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign in a pattern, you must use \$ to avoid Perl's variable interpolation. Use \\ to specify the backslash character in your pattern.

\nnn

Any octal byte where nnn represents the octal number; this allows any character to be specified by its octal number.

\a

The alarm character; this is a special character which, when printed, produces a warning bell sound.

\A

This meta-sequence represents the beginning of the string. Its meaning is not affected by the /m option.

\b

This meta-sequence represents the backspace character inside a character class; otherwise, it represents a word boundary. A word boundary is the spot between word (\w) and non-word (\W) characters. For this purpose, Perl treats the imaginary characters just before the start and just after the end of the string as if they matched \W.

\B

Match a non-word boundary.

\cn

Any control character where n is the character (for example, \cY for Ctrl+Y).

\d

Match a single digit character.

\D

Match a single non-digit character.

\e

The escape character.

\E

Terminate the \L, \U, or \Q sequence.

\f

The form feed character.

\G

Match only where the previous m//g left off.

\l

Change the next character to lowercase.

\L

Change the following characters to lowercase until a \E sequence is encountered.

\n

The newline character.

\Q

Quote regular expression meta-characters literally until the \E sequence is encountered.

\r

The carriage return character.

\s

Match a single whitespace character.

\S

Match a single non-whitespace character.

\t

The tab character.

\u

Change the next character to uppercase.

\U

Change the following characters to uppercase until a \E sequence is encountered.

\v

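The vertical tab character. Note that in Perl 5.10 and later, \v inside a pattern matches any vertical whitespace character (the vertical tab, newline, form feed, and so on), not only the vertical tab.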

\w

Match a single word character. Word characters are the alphanumeric and underscore characters.

\W

Match a single non-word character.

\xnn

Any hexadecimal byte where nn represents the hexadecimal number.

\Z

This meta-sequence represents the end of the string (or the spot just before a newline that ends the string). Its meaning is not affected by the /m option.

\$

The dollar character.

\@

The at sign (@) character.

\%

The percent character.
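A few of the meta-sequences above behave differently from what a first reading might suggest, so here is a short sketch that exercises them. The strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# \b matches between a word and a non-word character, so the "cat"
# buried inside "concat" is skipped.
my @cats = "cat concat cat." =~ /\bcat\b/g;       # two matches

# \d+ collects runs of digits.
my @nums = "a1b22c333" =~ /\d+/g;                 # ("1", "22", "333")

# \A ignores /m, while ^ does not.
my @caret = "one\ntwo" =~ /^(\w+)/mg;             # ("one", "two")
my @big_a = "one\ntwo" =~ /\A(\w+)/mg;            # ("one")

# \Q...\E makes the dot in $literal match only a real dot.
my $literal  = "a.b";
my $unquoted = ("axb" =~ /^$literal$/)     ? 1 : 0;   # 1
my $quoted   = ("axb" =~ /^\Q$literal\E$/) ? 1 : 0;   # 0

# \u and \U...\E change case while a string is interpolated.
my $name  = "perl";
my $title = "\u$name";                            # "Perl"
my $shout = "\U$name\E rocks";                    # "PERL rocks"

# \G continues exactly where the previous m//g stopped.
my $s       = "aaabbb";
my $found_a = $s =~ /a+/g;
my $rest    = ($s =~ /\G(b+)/g) ? $1 : undef;     # "bbb"

# \$ and \@ stop Perl from interpolating a variable.
my $dollar = ("cost: \$5" =~ /\$\d/)                 ? 1 : 0;   # 1
my $at     = ("user\@example.com" =~ /\@example/)    ? 1 : 0;   # 1
```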

