
1 Perl & Regular Expressions (RegEx)

2 Regular Expressions
A regular expression (regex for short) describes a text pattern to search for within a string. A regex can be a simple string, which matches letter for letter, or a more complex expression written in Perl's regex grammar. Examples and explanations follow.

3 Regular Expressions: The Match Operator
m/PATTERN/ is Perl's built-in match operator. It operates on a scalar (by default, on $_). It searches the scalar for the text described by PATTERN and returns 1 for success or "" (the empty string) for failure.
To test whether PATTERN occurs in a particular string, bind the string to the operator with the =~ binding operator. The !~ binding operator is equivalent to !( =~ ).
Important: == and != are numeric comparisons, not binding operators; using them instead of =~ and !~ gives wrong results.
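A small sketch of these return values, using a made-up $text variable. `use strict` and `use warnings` are added here although the slides omit them:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Perl regular expressions";

# In scalar context, m// returns 1 on success and "" (empty string) on failure
my $found   = ($text =~ m/regular/);
my $missing = ($text =~ m/Python/);

# !~ returns the logical negation of =~
my $absent = ($text !~ m/Python/);

# With no binding operator, m// searches $_
$_ = "default topic variable";
my $on_topic = m/topic/ ? "yes" : "no";

print "found=[$found] missing=[$missing] absent=[$absent] on_topic=$on_topic\n";
# prints: found=[1] missing=[] absent=[1] on_topic=yes
```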

4 Regular Expressions: The Match Operator
    #!/usr/bin/perl
    $name = "Amir Sahar";
    if ($name =~ m/Dov/) {
        print "I thought that I was Amir?!\n";
    } elsif ($name =~ m/Amir/) {
        print "whew… I still remember my name\n";
    } else {
        print "what the hell is my name?\n";
    }

5 Regular Expressions: The Match Operator
The m// operator interpolates its contents like a double-quoted string, which lets you use variables in your search patterns.
m// can use any non-alphanumeric, non-whitespace delimiter instead of /. This comes in handy when the pattern itself contains /, such as a path name.
Instead of:
    m/\/usr\/local\/bin/
prefer:
    m!/usr/local/bin!   or   m(/usr/local/bin)
When using a delimiter other than /, the m must be specified; with / it may be omitted: /PATTERN/
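A minimal sketch of interpolation and alternative delimiters; $path and $dir are illustrative names, not part of the slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";
my $dir  = "local";

# The variable is interpolated into the pattern, just like in "..." strings
my $has_dir = ($path =~ m/$dir/) ? 1 : 0;

# Alternative delimiters avoid escaping every slash in the path
my $bang  = ($path =~ m!/usr/$dir/bin!) ? 1 : 0;
my $paren = ($path =~ m(/usr/$dir/bin)) ? 1 : 0;

# With the default / delimiter the leading m can be dropped
my $short = ($path =~ /perl/) ? 1 : 0;

print "$has_dir $bang $paren $short\n";   # prints: 1 1 1 1
```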

6 Regular Expressions: Built-in Match Variables
A few built-in variables help with processing the results of a pattern match:
    $&  contains the matched string
    $`  contains everything before the matched string
    $'  contains everything after the matched string
If a match fails, these variables keep their values from the previous successful match, so they can't be used to test whether a match succeeded.
    #!/usr/bin/perl
    $problem = "I am running out of funny examples";
    $problem =~ m/out/;
    print "before:$`\nafter:$'\nmatched:$&\n";
Will produce (note the trailing space after "running" and the leading space before "of"):
    before:I am running 
    after: of funny examples
    matched:out
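The warning about failed matches can be seen directly in a short sketch (the "zebra" pattern is just an illustrative pattern that cannot match):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $problem = "I am running out of funny examples";

$problem =~ m/out/;                  # succeeds: $& is now "out"
my ($before, $match, $after) = ($`, $&, $');

# A failed match leaves $&, $` and $' holding the previous values,
# so test the return value of m// rather than these variables.
my $ok    = ($problem =~ m/zebra/);  # fails, returns ""
my $still = $&;                      # still "out"

print "[$before] [$match] [$after] ok=[$ok] still=[$still]\n";
# prints: [I am running ] [out] [ of funny examples] ok=[] still=[out]
```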

7 The Substitute Operator
In addition to m/PATTERN/, Perl has an s/PATTERN/SUBSTITUTION/ operator. It searches the string for PATTERN and replaces it with SUBSTITUTION, which is treated as a double-quoted string, including interpolation of variables. s/// is bound to a string with =~ (or works on $_ by default), just like m//.
    #!/usr/bin/perl
    $wish = "I wish that this course would be over";
    $wish =~ s/be over/continue forever/;
    print "$wish\n";
The script above will produce:
    I wish that this course would continue forever
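A short sketch showing interpolation in the replacement part and the value s/// returns (the $new variable is an illustrative addition):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $wish = "I wish that this course would be over";
my $new  = "continue forever";

# The replacement is interpolated like a double-quoted string;
# s/// returns the number of substitutions made ("" if none)
my $count = ($wish =~ s/be over/$new/);
my $none  = ($wish =~ s/be over/$new/);   # nothing left to replace

# With no binding operator, s/// works on $_
$_ = "be over";
s/over/done/;

print "$count [$none] $wish | $_\n";
# prints: 1 [] I wish that this course would continue forever | be done
```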

8 Metasymbols
So far we've seen patterns that match a fixed text sequence. More generic text patterns can be described using various predefined metasymbols and metacharacters. The most useful metasymbols are described in the following slides.

9 Metasymbols
Each of the metasymbols in this table matches a single character in the text being searched:
    Metasymbol   Matches
    \w           Any alphanumeric character, as well as underscore
    \W           Any character that \w doesn't match
    \d           Any digit character
    \D           Any non-digit character
    \s           Any whitespace character
    \S           Any non-whitespace character
    .            Anything but a newline (\n); this is a "don't care" match

10 Metasymbols
    #!/usr/bin/perl
    $number = 1234;
    print "huh?\n" if $number =~ /\D/;                  # won't print anything
    print "well, I knew that\n" if $number =~ /\w/;     # will print: well, I knew that
    print "matching a don't care\n" if $number =~ /./;  # will print: matching a don't care
    print "of course there is no whitespace in numbers\n" unless $number =~ /\s/;
                                   # will print: of course there is no whitespace in numbers
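To see the table side by side, here is a sketch with a hypothetical classify() helper that reports which metasymbols match a single character. Note that "\n" matches \s but not the "don't care" dot:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: list the metasymbols from the table that match $ch
sub classify {
    my ($ch) = @_;
    my @hits;
    push @hits, '\w' if $ch =~ /\w/;
    push @hits, '\d' if $ch =~ /\d/;
    push @hits, '\s' if $ch =~ /\s/;
    push @hits, '.'  if $ch =~ /./;
    return join ' ', @hits;
}

# Printable names for the invisible characters
my %label = ("a" => "a", "_" => "_", "7" => "7",
             " " => "space", "\n" => "newline", "-" => "-");

for my $ch ("a", "_", "7", " ", "\n", "-") {
    printf "%-8s %s\n", $label{$ch}, classify($ch);
}
```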

11 Metasymbols
The following metasymbols describe where in the string PATTERN must be found (they match a position, not a character):
    (Meta)symbol   Meaning
    ^              Beginning of string (or of any line, with the /m modifier)
    $              End of string (or of any line, with the /m modifier)
    \b             Word boundary (between \W and \w, or vice versa)
    \B             Anything but a word boundary
    \A             Beginning of string only, even with /m
    \Z             End of string only, even with /m

12 Metasymbols
    $string = "beetlejuice beetlemania beetles";
    $string =~ /^beetlemania/;    # won't match anything
    $string =~ /^beetlejuice$/;   # won't match anything
    $string =~ /\bbeetle\b/;      # won't match anything
    $string =~ /\bbeetle/;        # will match the first word
    $string =~ /\bbeetles$/;      # will match the last word
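A sketch contrasting \b and \B on a few sample strings; whole_word() and inside_word() are hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# \bcat\b finds "cat" only as a whole word; \Bcat finds it only
# when it is preceded by another word character
sub whole_word  { return $_[0] =~ /\bcat\b/ ? 1 : 0 }
sub inside_word { return $_[0] =~ /\Bcat/   ? 1 : 0 }

for my $s ("cat", "concat", "cats", "the cat sat") {
    printf "%-12s whole=%d inside=%d\n", $s, whole_word($s), inside_word($s);
}
```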

13 Metacharacters
Metacharacters are characters that have a special meaning when they appear in regular expressions. To search for one of the metacharacters itself, prefix it with a backslash (\). The metacharacters are:
    \ | ( ) [ ] { } ^ $ * + ? .
So, to search for a ?, your match pattern should look like this:
    m/is this a question\?/
We'll see what these characters mean in the following slides.
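A sketch of why escaping matters, plus Perl's built-in quotemeta (and the equivalent \Q...\E) for escaping text held in a variable; the sample strings are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "pi is roughly 3x14";

# Unescaped, . is a metacharacter that matches any character (here "x")
my $loose  = ($text =~ /3.14/)  ? 1 : 0;
# Escaped, \. matches only a literal dot
my $strict = ($text =~ /3\.14/) ? 1 : 0;

# quotemeta (or \Q...\E inside a pattern) escapes every non-word character,
# which is handy when the text comes from a variable
my $literal = "1+1=2";
my $quoted  = quotemeta($literal);
my $found   = ("is 1+1=2?" =~ /\Q$literal\E/) ? 1 : 0;

print "$loose $strict $quoted $found\n";   # prints: 1 0 1\+1\=2 1
```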

14 Quantifiers
To search for a repeated pattern, put a quantifier after the PATTERN to specify how many times it should repeat. Perl has three basic quantifiers:
    1. ?  succeeds if the preceding pattern appears 0 or 1 times
    2. *  succeeds if the preceding pattern appears 0 or more times in succession
    3. +  succeeds if the preceding pattern appears 1 or more times in succession
Note that quantifiers have no meaning on their own; they always modify the immediately preceding regex element (a character, metasymbol, character set, or parenthesized group).

15 Quantifiers
    $fruit = "fifteen (15) bananas";
    $fruit =~ /e+/;      print "$&\n";        # will print ee
    $fruit =~ /an*/;     print "$&\n";        # will print an
    $fruit =~ /(an)+/;   print "$&\n";        # will print anan
    $fruit =~ /e*/;      print "$&\n";        # succeeds with an empty match at the start, so prints an empty line
    print "ok\n" if $fruit =~ m{(abc)?};      # will print ok
    print "ok\n" if $fruit =~ m{(abc)+};      # fails, prints nothing
    print "ok\n" if $fruit =~ /tef*en/;       # will print ok (matches "teen")
    $fruit =~ /\w+/;     print "$&\n";        # will print fifteen
    $fruit =~ /\b\d+\b/; print "$&\n";        # will print 15
    $fruit =~ /\(.*\)/;  print "$&\n";        # will print (15)
    print "not found\n" if $fruit !~ /^banana/;   # will print not found

16 Quantifiers
More precise specification of repeated matches is possible with the additional quantifiers below. The maximal (greedy) version of a quantifier matches as many times as possible; the minimal (non-greedy) version, written with a trailing ?, matches as few times as possible while still letting the overall match succeed.
    Maximal     Minimal      Meaning
    *           *?           Match 0 or more times
    +           +?           Match 1 or more times
    ?           ??           Match 0 or 1 times
    {COUNT}     {COUNT}?     Match exactly COUNT times
    {MIN,}      {MIN,}?      Match at least MIN times
    {MIN,MAX}   {MIN,MAX}?   Match at least MIN but not more than MAX times

17 Quantifiers
    $phrase = "Hold your horses";
    $phrase =~ /.+o/;  print "$&\n";                  # will print Hold your ho
    $phrase =~ /.+?o/; print "$&\n";                  # will print Ho
    print "match\n" if $phrase =~ /.+H/;              # will print nothing
    print "match\n" if $phrase =~ /.*H/;              # will print match
    print "match\n" if $phrase =~ /^H.{14}s$/;        # will print match
    print "hold is a word\n" if $phrase =~ /\BHold/;  # will print nothing
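The greedy/non-greedy difference is easiest to see on markup-like text. A sketch with made-up input strings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# Maximal (greedy): .+ runs as far as possible, up to the LAST ">"
$html =~ /<.+>/;
my $greedy = $&;

# Minimal (non-greedy): .+? stops at the FIRST ">" that lets the match succeed
$html =~ /<.+?>/;
my $lazy = $&;

# The same contrast with a counted quantifier
my $as = "aaaaa";
$as =~ /a{2,4}/;
my $max = $&;
$as =~ /a{2,4}?/;
my $min = $&;

print "$greedy\n$lazy\n$max $min\n";
```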

18 Alternatives
To match one of several possible subexpressions, separate them with the | token and enclose the group in round parentheses. Spaces inside a pattern are literal, so don't put any around the | tokens.
    #!/usr/bin/perl
    $course = "Perl course";
    print "I am in the right course\n" if $course =~ /(Perl|Tcl|C) course/;
The script above is expected to produce:
    I am in the right course
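The parentheses also limit how far the alternation reaches, because | has the lowest precedence in a pattern. A sketch with hypothetical loose() and grouped() helpers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# | has the lowest precedence: /^cat|dog$/ means "^cat" OR "dog$"
sub loose   { return $_[0] =~ /^cat|dog$/   ? 1 : 0 }
# Parentheses restrict the alternation to the grouped words
sub grouped { return $_[0] =~ /^(cat|dog)$/ ? 1 : 0 }

for my $s ("cat", "dog", "hotdog", "catalog") {
    printf "%-8s loose=%d grouped=%d\n", $s, loose($s), grouped($s);
}
```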

19 Character Sets
To match any one of a set of possible characters, surround them with square brackets:
    [0-9] is the same as \d (for ASCII text)
To match anything except the characters in the square brackets, put a caret (^) right after the opening bracket:
    [^0-9] is the same as \D
Keep in mind the difference between square brackets, which group a set of characters, and round parentheses, which group alternative expressions:
    [fee|fie|foe] is the same as [feio|]
    m%(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}%   # matches a date in dd/mm/yyyy format
Why not m%([0-2]\d)/(0\d|1[0-2])/\d{4}% ?
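One way to explore the question is to run both patterns on a few dates. In this sketch the patterns are anchored with ^ and $ (an addition, so the whole string must match) and written with qr{} instead of m%%:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The slide's dd/mm/yyyy pattern and the simpler alternative,
# both anchored here with ^ and $ so the whole string must match
my $precise = qr{^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$};
my $naive   = qr{^([0-2]\d)/(0\d|1[0-2])/\d{4}$};

for my $date ("17/05/2024", "31/12/2024", "00/00/2024") {
    printf "%s precise=%d naive=%d\n", $date,
        ($date =~ $precise ? 1 : 0), ($date =~ $naive ? 1 : 0);
}
```

The simpler pattern rejects valid days 30 and 31 and accepts day 00 and month 00.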

20 Captured Matches
To remember matches for later use, Perl has the built-in variables $1, $2, $3, … . Each contains the text matched by one pair of round parentheses in the pattern. The parentheses are numbered by the order of their opening parenthesis, from left to right.
These variables are replaced by every successful match, so it is good practice to copy them into your own variables right away; a subroutine call may overwrite them without your knowledge.
If the match operator is evaluated in list context, it returns a list whose elements are what $1, $2, … would have contained.
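A sketch of the numbering rule with nested parentheses, using a made-up date string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $stamp = "2024-05-17";

# Groups are numbered by their opening parenthesis, left to right:
# $1 = ((\d{4})-(\d\d))  $2 = (\d{4})  $3 = (\d\d)  $4 = the last (\d\d)
if ($stamp =~ /((\d{4})-(\d\d))-(\d\d)/) {
    # Copy the captures right away; the next successful match replaces them
    my ($year_month, $year, $month, $day) = ($1, $2, $3, $4);
    print "$year_month | $year | $month | $day\n";   # prints: 2024-05 | 2024 | 05 | 17
}

# In list context the match returns the captures directly
my @parts = ($stamp =~ /((\d{4})-(\d\d))-(\d\d)/);
print "@parts\n";   # prints: 2024-05 2024 05 17
```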

21 Captured Matches
    $song = "oh lord won't you buy me a Mercedes Benz";
    @matches = ($song =~ /(.*o)([a-z\s]*)(.*)/);
$1 and $matches[0] will equal "oh lord won't yo"
$2 and $matches[1] will equal "u buy me a "
$3 and $matches[2] will equal "Mercedes Benz"
$1, $2, etc. can also be used in the substitution part:
    $time = "12:34";
    $time =~ s/(..):(..)/$2:$1/;
    print $time;
is expected to produce: 34:12

22 Modifiers
The matching rules for a pattern can be changed by flags placed after the closing delimiter of the match operator:
    Modifier   Meaning
    /i         Ignore alphabetic case distinctions (case-insensitive)
    /s         Let . also match newline characters
    /m         Let ^ and $ match next to embedded newlines in the string
    /x         Ignore (most) whitespace and permit comments in the pattern
    /o         Compile the pattern only once

23 Modifiers
    print "match\n" if "Perl" =~ /perl/i;             # will print match, because case is ignored
    print "match\n" if "line 1\nLine 2" =~ /^l.*2/s;  # will print match, because . also matches \n
    print "match\n" if "line 1\nLine 2" =~ /^l.*2/m;  # will print nothing: . stops at \n, and line 2 starts with L
    print "match\n" if "line 1\nLine 2" =~ /^L.*2/m;  # will print match
The following 3 regexes all match the same thing:
    m/\w+:(\s+\w+)\s*\d+/;          # A word, colon, spaces, word, spaces, digits.
    m/\w+: (\s+ \w+) \s* \d+/x;     # A word, colon, spaces, word, spaces, digits.
    m{
        \w+:    # Match a word and a colon.
        (       # (begin group)
          \s+   # Match one or more spaces.
          \w+   # Match another word.
        )       # (end group)
        \s*     # Match zero or more spaces.
        \d+     # Match some digits
    }x;
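To check that the three forms really are equivalent, this sketch stores them with qr// and applies each to one made-up input string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "temperature: celsius 21";

# The three equivalent patterns from the slide, stored with qr//
my @patterns = (
    qr/\w+:(\s+\w+)\s*\d+/,
    qr/\w+: (\s+ \w+) \s* \d+/x,
    qr{
        \w+:        # a word and a colon
        (           # begin group
          \s+ \w+   # spaces, then another word
        )           # end group
        \s* \d+     # optional spaces, then digits
    }x,
);

# A list-context match returns the single capture for each pattern
my @captures = map { my ($c) = $input =~ $_; $c } @patterns;
print join("|", @captures), "\n";   # prints:  celsius| celsius| celsius
```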

24 Modifiers
An additional modifier is /g, the global match. It behaves differently for m// and s///:
For s///, every occurrence of PATTERN in the string is replaced, not just the first one.
For m//, each match resumes from where the previous match left off, so repeated m//g matches step through the string.

25 Regular Expressions
The following script
    #!/usr/bin/perl
    $fruit = "banana";
    $counter = 0;
    while ($fruit =~ m/a/g) {
        print ++$counter, "\n";
    }
    print "$fruit\n";
is expected to produce (the loop runs once per "a"):
    1
    2
    3
    banana
The following script
    #!/usr/bin/perl
    $fruit = "banana";
    $counter = 0;
    while ($fruit =~ s/a/q/g) {
        print ++$counter, "\n";
    }
    print "$fruit\n";
is expected to produce (the first s///g replaces all three a's at once; the second finds nothing to replace, ending the loop):
    1
    bqnqnq
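A sketch of two more /g behaviors: pos() reports where the next scalar-context m//g will resume, and in list context m//g returns all matches at once. The "a1b22c333" string is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fruit = "banana";

# Scalar context: each m//g resumes where the previous one stopped;
# pos() reports the offset just past the last match
my @positions;
while ($fruit =~ m/a/g) {
    push @positions, pos($fruit);
}

# List context: m//g returns every match at once
my @all = ("a1b22c333" =~ m/\d+/g);

# Count matches by assigning the list to an empty list in scalar context
my $count = () = $fruit =~ m/a/g;

print "@positions | @all | $count\n";   # prints: 2 4 6 | 1 22 333 | 3
```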

