Amazing Perl

The amazing world of Perl for engineers!

Backreferences

As we mentioned in the "Backslashed Tokens" section earlier in this chapter, pattern matching produces quantities known as backreferences: the parts of your string that matched parts of the pattern. You tell Perl to store them by surrounding the relevant parts of your regular expression with parentheses. Inside the pattern itself you refer to them as \1, \2, and so on; after the match they are available as $1, $2, and so on. The following example determines whether the user typed three consecutive four-letter words:

while (<>) {
    /\b(\S{4})\s(\S{4})\s(\S{4})\b/ && print "Gosh, you said $1 $2 $3!\n";
}

The first four-letter word lies between a word boundary (\b) and a whitespace character (\s), and consists of four non-whitespace characters (\S). If the subexpression \b(\S{4})\s matches, that is, if a four-letter word is found, the matching substring is stored as the first backreference (\1 within the pattern, $1 afterward), and the search continues. When the search is complete, you can refer to the backreferences as $1, $2, and so on.
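To make the difference between \1 and $1 concrete, here is a minimal sketch that uses \1 inside the pattern to find a word typed twice in a row, then reads the capture back as $1. The helper name repeated_word and the sample string are assumptions chosen for illustration; they are not part of the original example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first immediately repeated word, or undef.
sub repeated_word {
    my ($line) = @_;
    # \1 inside the pattern must match the same text that group 1 captured.
    if ($line =~ /\b(\w+)\s+\1\b/) {
        return $1;    # after the match, the capture is read as $1
    }
    return;
}

my $word = repeated_word("This is is a test");
print "Repeated word: $word\n" if defined $word;
```

The pattern uses \w+ rather than \S{4} so that it works for words of any length; that is a choice made for this sketch only.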

What if you don't know in advance how many matches to expect? Perform the match in a list context (older Perl documentation calls this an array context); Perl returns the matches as a list. Consider this example:

@hits = ("Yon Yonson, Wisconsin" =~ /(\won)/g);
print "Matched on ", join(', ', @hits), ".\n";

We'll start at the right side and work backward. The regular expression (\won) means that we match any word character (a letter, digit, or underscore) followed by on and store all three characters. The g option after the // operator means that we keep searching the rest of the string after each match instead of stopping at the first one. The =~ operator means that we carry out this operation on a given string (Yon Yonson, Wisconsin). Finally, the whole thing is evaluated in a list context, so Perl returns the list of matches, and we store it in the @hits array. Following is the output from this example:

Matched on Yon, Yon, son, con.
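When you would rather handle the matches one at a time, the same pattern can be used in a scalar context inside a while loop; each pass picks up where the previous match ended, and pos reports that position. The following is a rough sketch under that assumption, with hits_with_offsets as a hypothetical helper name introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collects every match of /(\won)/ with its end offset.
sub hits_with_offsets {
    my ($text) = @_;
    my @found;
    # In scalar context, each pass of the loop resumes where the last match ended.
    while ($text =~ /(\won)/g) {
        push @found, [ $1, pos($text) ];
    }
    return @found;
}

for my $hit (hits_with_offsets("Yon Yonson, Wisconsin")) {
    printf "%s ends at offset %d\n", @$hit;
}
```

The list-context form shown earlier is shorter when you only need the matched strings; the loop form is useful when you also need to act on each match as it is found.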
