1 Regular Expression (1)
Learning Objectives:
1. To understand the concept of regular expressions
2. To learn commonly used operations involving regular expressions / pattern matching
3. To learn the special cases that occur in regular expressions / pattern matching

2 COMP111 Lecture 16 / Slide 2
Simple Uses of Regular Expressions
• In Perl, we can make Shakespeare a regular expression by enclosing it in slashes:
    if(/Shakespeare/){
        print $_;
    }
• What is tested in the if-statement? Answer: $_.
• Can you write an even shorter statement using &&?
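
One possible answer to the && question, as a minimal sketch. The sample string assigned to $_ is an assumption made for illustration; a real script would read its input instead.

```perl
use strict;
use warnings;

# Hypothetical sample line standing in for one line of real input.
$_ = "Shakespeare in Love\n";

# Long form from the slide.
if (/Shakespeare/) { print $_; }

# Shorter form: && evaluates print only when the match succeeds,
# and print with no arguments prints $_.
/Shakespeare/ && print;

# A match used as a condition is simply true or false.
my $matched = /Shakespeare/ ? 1 : 0;
```

Both forms print the line here, because the match succeeds in each case.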

3 Simple Uses of Regular Expressions
    if(/Shakespeare/){
        print $_;
    }
• The previous example tests only one line, and prints out the line if it contains Shakespeare.
• To work on all lines, add a loop:
    while(<>){
        if(/Shakespeare/){
            print;
        }
    }

4 Simple Uses of Regular Expressions
• What if we are not sure how to spell Shakespeare?
• Certainly the first part, Shak, is easy, and there must be an r near the end.
• How can we express our idea?
  grep:
    grep "Shak.*r" movie > result
  Perl:
    while(<>){
        if(/Shak.*r/){
            print;
        }
    }
• .* means "zero or more of any character" (any character except newline).
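
A small sketch of how loose /Shak.*r/ is. The list of titles is invented for illustration and stands in for the lines of the movie file.

```perl
use strict;
use warnings;

# Hypothetical titles, including a misspelling of "Shakespeare".
my @titles = (
    "Shakespeare in Love",
    "Shakspere in Love",
    "Shakira Live",
    "Star Wars",
);

# Keep the titles that contain "Shak", then anything, then an "r".
my @hits = grep { /Shak.*r/ } @titles;

print "$_\n" for @hits;
```

Note that "Shakira Live" is also kept: the pattern tolerates misspellings, but it also accepts anything else that has an r somewhere after Shak.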

5 Single-Character Patterns
• The dot "." matches any single character except the newline (\n).
• For example, the pattern /a./ matches any two-character sequence that starts with a, except "a\n".
• Use \. if you really want to match the period.
    $ cat test
    hi
    hi bob.
    $ cat sub3
    #!/usr/local/bin/perl5 -w
    while(<>){
        if(/\./){
            print;
        }
    }
    $ sub3 test
    hi bob.
    $
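
The difference between the dot and an escaped dot can be checked directly. This is a sketch; the test strings are chosen for illustration.

```perl
use strict;
use warnings;

# /a./ needs an "a" followed by one character that is not a newline.
my $two_chars = ("ab"  =~ /a./) ? 1 : 0;   # "b" follows the "a"
my $newline   = ("a\n" =~ /a./) ? 1 : 0;   # only a newline follows

# An unescaped dot matches any character; \. matches only a period.
my $any_char   = ("hi bob"  =~ /./)  ? 1 : 0;
my $no_period  = ("hi bob"  =~ /\./) ? 1 : 0;
my $has_period = ("hi bob." =~ /\./) ? 1 : 0;

print "$two_chars $newline $any_char $no_period $has_period\n";   # prints "1 0 1 0 1"
```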

6 Single-Character Groups (1)
• If you want to specify one out of a group of characters to match, use [ ]:
    /[abcde]/
  This matches a string containing any one of the first 5 lowercase letters, while:
    /[aeiouAEIOU]/
  matches any of the 5 vowels in either upper or lower case.

7 Single-Character Groups (2)
• If you want ] in the group, put a backslash before it, or put it as the first character in the list:
    /[abcde]]/      # matches [abcde] + ]
    /[abcde\]]/     # okay
    /[]abcde]/      # also okay
• Use - for ranges of characters (like a through z):
    /[0123456789]/  # any single digit
    /[0-9]/         # same
• If you want - in the list, put a backslash before it, or put it at the beginning/end:
    /[X-Z]/         # matches X, Y, Z
    /[X\-Z]/        # matches X, -, Z
    /[XZ-]/         # matches X, Z, -
    /[-XZ]/         # matches -, X, Z

8 Single-Character Groups (3)
• More range examples:
    /[0-9\-]/        # match 0-9, or minus
    /[0-9a-z]/       # match any digit or lowercase letter
    /[a-zA-Z0-9_]/   # match any letter, digit, underscore
• There is also a negated character group, which starts with a ^ immediately after the left bracket. This matches any single character not in the list.
    /[^0123456789]/  # match any single non-digit
    /[^0-9]/         # same
    /[^aeiouAEIOU]/  # match any single non-vowel
    /[^\^]/          # match any single character except ^
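
The bracket rules from the last three slides can be exercised with a small classifier. This is an illustrative sketch: the function name classify and its labels are made up here, and it assumes it is given a one-character string.

```perl
use strict;
use warnings;

# Hypothetical helper: label a one-character string using character groups.
sub classify {
    my ($ch) = @_;
    return "digit"           if $ch =~ /[0-9]/;          # range
    return "vowel"           if $ch =~ /[aeiouAEIOU]/;   # explicit list
    return "bracket or dash" if $ch =~ /[]\-]/;          # ] first, - escaped
    return "word character"  if $ch =~ /[a-zA-Z0-9_]/;   # several ranges
    return "other";
}

print "$_ => ", classify($_), "\n" for ("7", "E", "]", "-", "x", "!");
```

The order of the tests matters: "E" is reported as a vowel only because the vowel test runs before the more general word-character test.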

9 Single-Character Groups (4)
• For convenience, some common character groups are predefined:
    Predefined        Group            Negated          Negated Group
    \d (a digit)      [0-9]            \D (non-digit)   [^0-9]
    \w (word char)    [a-zA-Z0-9_]     \W (non-word)    [^a-zA-Z0-9_]
    \s (space char)   [ \t\n\r\f]      \S (non-space)   [^ \t\n\r\f]
• \d matches any digit
• \w matches any letter, digit, underscore
• \s matches any space, tab, newline (and also carriage return and form feed)
• You can use these predefined groups in other groups:
    /[\da-fA-F]/   # match any hexadecimal digit
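
A short sketch of a predefined group used inside a bracketed group. The helper name has_hex_digit is hypothetical. The last two lines show why the brackets matter: without them, the pattern asks for a digit followed by the literal text a-fA-F.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains a hexadecimal digit.
sub has_hex_digit {
    my ($s) = @_;
    return $s =~ /[\da-fA-F]/ ? 1 : 0;
}

print has_hex_digit("0x1F"), "\n";   # 1: contains 0, 1 and F
print has_hex_digit("xyz"),  "\n";   # 0: x, y and z are not hex digits

# Without brackets this is a sequence, not a group.
my $sequence_hit  = ("7a-fA-F" =~ /\da-fA-F/) ? 1 : 0;   # 1
my $sequence_miss = ("ff"      =~ /\da-fA-F/) ? 1 : 0;   # 0
```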

10 Split (1)
• The split function allows you to break a string into fields.
• split takes a regular expression and a string, and breaks up the string wherever the pattern occurs.
    $ cat split1
    #!/usr/local/bin/perl5 -w
    $line = "Bill Shakespeare in love with Bill Gates";
    @fields = split(/ /,$line); # split $line using space as delimiter
    print "$fields[0] $fields[3] $fields[6]\n";
    $ split1
    Bill love Gates
    $

11 Split (2)
• You can use $_ with split.
• With no arguments, split breaks up $_ on whitespace.
    $ cat split2
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in love with Bill Gates";
    @fields = split; # split $_ using whitespace (default) as delimiter
    print "$fields[0] $fields[3] $fields[6]\n";
    $ split2
    Bill love Gates
    $
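
The two forms of split are not quite interchangeable. The sketch below uses an invented string with two spaces after Bill to show the difference: / / splits at every single space, while the default treats any run of whitespace as one delimiter.

```perl
use strict;
use warnings;

# Hypothetical input; note the two spaces after "Bill".
my $line = "Bill  Shakespeare in love";

# A single-space pattern yields an empty field between the two spaces.
my @by_space = split(/ /, $line);

# The default form works on $_ and splits on runs of whitespace.
$_ = $line;
my @by_default = split;

print scalar(@by_space), " fields with / /\n";        # 5
print scalar(@by_default), " fields by default\n";    # 4
```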

12 Pattern Memory (1)
• How would we match a pattern that starts and ends with the same letter or word?
• For this, we need to remember what part of the pattern matched.
• Use ( ) around any pattern to put that part of the string into memory (it has no effect on what the pattern matches).
• To recall memory, include a backslash followed by an integer.
    /Bill(.)Gates\1/

13 Pattern Memory (2)
• Example:
    /Bill(.)Gates\1/
  This example matches Bill, followed by any single non-newline character, followed by Gates, followed by that same single character.
• So, it matches:
    Bill!Gates!
    Bill-Gates-
  but not:
    Bill?Gates!
    Bill-Gates_
  (Note that /Bill.Gates./ would match all four.)

14 Pattern Memory (3)
• More examples:
    /a(.)b(.)c\2d\1/
• This example matches a, followed by a single character (#1), followed by b, another single character (#2), c, the character #2, d, and the character #1.
• So it matches:
    a-b!c!d-

15 Pattern Memory (4)
• The remembered part can have more than a single character.
• For example:
    /a(.*)b\1c/
• This example matches an a, followed by any number of characters (even zero), followed by b, followed by the same sequence of characters, followed by c.
• So it matches aBillbBillc and abc, but not aBillbBillGatesc.
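
The pattern-memory examples can be checked with grep over the sample strings used on the slides. This is a sketch; only the array names are new.

```perl
use strict;
use warnings;

# Strings taken from the slides.
my @single = ("Bill!Gates!", "Bill-Gates-", "Bill?Gates!", "Bill-Gates_");
my @multi  = ("aBillbBillc", "abc", "aBillbBillGatesc");

# \1 must repeat exactly what the first ( ) matched.
my @same_char = grep { /Bill(.)Gates\1/ } @single;
my @same_text = grep { /a(.*)b\1c/ }      @multi;

# Without memory, any character is accepted in both positions.
my @any_char = grep { /Bill.Gates./ } @single;

print "with \\1: @same_char\n";
print "with .:  @any_char\n";
print "multi:   @same_text\n";
```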

16 Or
• How about picking from a set of alternatives when there is more than one character in the patterns?
• The following example matches either Gates or Clinton or Shakespeare:
    /Gates|Clinton|Shakespeare/
• For single-character alternatives, /[abc]/ is the same as /a|b|c/.

17 Anchoring Patterns
• Anchors require that the pattern be at the beginning or end of the line.
• ^ matches the beginning of the line (only if ^ is the first character of the pattern):
    /^Bill/    # match lines that begin with Bill
    /^Gates/   # match lines that begin with Gates
    /Bill\^/   # match lines containing Bill^ somewhere
    /\^/       # match lines containing ^
• $ matches the end of the line (only if $ is the last character of the pattern):
    /Bill$/    # match lines that end with Bill
    /Gates$/   # match lines that end with Gates
    /$Bill/    # match the contents of scalar $Bill
    /\$/       # match lines containing $
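
A brief sketch of the anchors in action. The sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ("Bill Gates", "Gates, Bill", "Bill^2");

my @starts = grep { /^Bill/ }  @lines;   # begin with Bill
my @ends   = grep { /Bill$/ }  @lines;   # end with Bill
my @caret  = grep { /Bill\^/ } @lines;   # contain a literal Bill^

# $ also matches just before a newline at the end of the string,
# so lines read from a file do not need to be chomped first.
my $with_newline = ("Gates, Bill\n" =~ /Bill$/) ? 1 : 0;

print "starts: @starts\n";
print "ends:   @ends\n";
print "caret:  @caret\n";
```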

18 Using =~ (1)
• What if you want to match a different variable than $_?
• Answer: Use =~.
• Examples:
    $name = "Bill Shakespeare";
    $name =~ /^Bill/;     # true
    $name =~ /(.)\1/;     # also true (matches ll)
    if($name =~ /(.)\1/){
        print "$name\n";
    }

19 Using =~ (2)
• An example using =~ to match <STDIN>:
    $ cat match1
    #!/usr/local/bin/perl5 -w
    print "Quit (y/n)? ";
    if(<STDIN> =~ /^[yY]/){
        print "Quitting\n";
        exit;
    }
    print "Continuing\n";
    $ match1
    Quit (y/n)? y
    Quitting
    $

20 Ignoring Case
• In the previous example, we used [yY] to match either upper or lower case.
• Perl has an "ignore case" option for pattern matching: /somepattern/i
    $ cat match1a
    #!/usr/local/bin/perl5 -w
    print "Quit (y/n)? ";
    if(<STDIN> =~ /^y/i){
        print "Quitting\n";
        exit;
    }
    print "Continuing\n";
    $ match1a
    Quit (y/n)? Y
    Quitting
    $
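
The same test can be wrapped in a function so it can be tried without typing at a prompt. This is a sketch: wants_to_quit is a hypothetical name, and the answers are hard-coded where a real script would read from standard input.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the answer starts with y or Y.
sub wants_to_quit {
    my ($answer) = @_;
    return $answer =~ /^y/i ? 1 : 0;
}

# Sample answers, with the trailing newline a typed line would have.
for my $answer ("y\n", "Yes\n", "no\n", "maybe\n") {
    my $shown = $answer;
    chomp $shown;
    print "$shown: ", (wants_to_quit($answer) ? "Quitting" : "Continuing"), "\n";
}
```

Because of the ^ anchor, "maybe" does not count as a yes even though it contains a y.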

21 Slash and Backslash
• If your pattern has a slash character (/), you must precede each with a backslash (\):
    $ cat slash1
    #!/usr/local/bin/perl5 -w
    print "Enter path: ";
    $path = <STDIN>;
    if($path =~ /^\/usr\/local\/bin/){
        print "Path is /usr/local/bin\n";
    }
    $ slash1
    Enter path: /usr/local/bin
    Path is /usr/local/bin
    $

22 Different Pattern Delimiters
• If your pattern has lots of slash characters (/), you can also use a different pattern delimiter with the form: m#somepattern#
• The # can be any non-alphanumeric, non-whitespace character.
    $ cat slash1a
    #!/usr/local/bin/perl5 -w
    print "Enter path: ";
    $path = <STDIN>;
    if($path =~ m#^/usr/local/bin#){
    #if($path =~ m@^/usr/local/bin@){ # also works
        print "Path is /usr/local/bin\n";
    }
    $ slash1a
    Enter path: /usr/local/bin
    Path is /usr/local/bin
    $
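
The following sketch puts the escaped-slash form next to two alternative delimiters, using a hard-coded path in place of user input. The braces form m{...} is an addition not shown on the slides; Perl accepts paired brackets as delimiters as well.

```perl
use strict;
use warnings;

# Hypothetical path standing in for a line typed by the user.
my $path = "/usr/local/bin/perl5";

my $escaped = ($path =~ /^\/usr\/local\/bin/) ? 1 : 0;   # every / escaped
my $hashes  = ($path =~ m#^/usr/local/bin#)   ? 1 : 0;   # # as delimiter
my $braces  = ($path =~ m{^/usr/local/bin})   ? 1 : 0;   # paired delimiters

print "$escaped $hashes $braces\n";   # prints "1 1 1"
```

All three patterns are the same regular expression; only the amount of escaping differs.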

23 Special Read-Only Variables (1)
• After a successful pattern match, the variables $1, $2, $3, … are set to the same values as \1, \2, \3, …
• You can use $1, $2, $3, … later in your program.
    $ cat read1
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in Love";
    /(\w+)\W+(\w+)/; # match first two words
    # $1 is now "Bill" and $2 is now "Shakespeare"
    print "The first name of $2 is $1\n";
    $ read1
    The first name of Shakespeare is Bill

24 Special Read-Only Variables (2)
• You can also get the values of $1, $2, $3, … directly by placing the match in a list context:
    $ cat read2
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in Love";
    ($first, $last) = /(\w+)\W+(\w+)/;
    print "The first name of $last is $first\n";
    $ read2
    The first name of Shakespeare is Bill
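
Since $1, $2, … are only set by a successful match, it is safer to test the match before using them. A minimal sketch of the list-context form with such a check; the function name first_two_words is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first two words, or undef if
# the text does not contain two words.
sub first_two_words {
    my ($text) = @_;
    # A failed match returns an empty list, so the condition is false.
    if (my ($first, $second) = $text =~ /(\w+)\W+(\w+)/) {
        return "$first $second";
    }
    return undef;
}

my $found   = first_two_words("Bill Shakespeare in Love");
my $nothing = first_two_words("!!!");

print "found: $found\n";
print "nothing found\n" unless defined $nothing;
```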

25 Special Read-Only Variables (3)
• Other read-only variables:
  • $& is the part of the string that matched the pattern.
  • $` is the part of the string before the match.
  • $' is the part of the string after the match.
    $ cat read3
    #!/usr/local/bin/perl5 -w
    $_ = "Bill Shakespeare in Love";
    / in /;
    print "Before: $`\n";
    print "Match: $&\n";
    print "After: $'\n";
    $ read3
    Before: Bill Shakespeare
    Match: in
    After: Love
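
A short sketch that copies the three variables into ordinary variables right after a successful match, which makes it easy to see that the matched text includes the surrounding spaces. The variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $text = "Bill Shakespeare in Love";
my ($before, $match, $after);

# Copy the special variables only if the match succeeded.
if ($text =~ / in /) {
    ($before, $match, $after) = ($`, $&, $');
}

# Brackets make the spaces inside the match visible.
print "Before: [$before]\n";
print "Match:  [$match]\n";
print "After:  [$after]\n";
```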

26 Repeat {n}
• /(fred){5,15}/
  Match from five to fifteen repetitions of "fred".
• /a{5,}/
  Match five or more repetitions of "a".
• /\w{8}/
  Match exactly 8 word characters.
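
The repeat counts can be tried on a few invented strings. One point the sketch illustrates: without anchors, /\w{8}/ also succeeds on a longer run of word characters, because it only needs to find 8 of them in a row somewhere in the string.

```perl
use strict;
use warnings;

# The x operator repeats a string: "fred" x 5 is "fredfredfredfredfred".
my $five_freds = (("fred" x 5) =~ /(fred){5,15}/) ? 1 : 0;   # 1
my $four_freds = (("fred" x 4) =~ /(fred){5,15}/) ? 1 : 0;   # 0

my $five_a = ("aaaaa" =~ /a{5,}/) ? 1 : 0;   # 1
my $four_a = ("aaaa"  =~ /a{5,}/) ? 1 : 0;   # 0

my $seven_chars = ("abc_123"   =~ /\w{8}/) ? 1 : 0;   # 0: only 7 word characters
my $nine_chars  = ("abc_12345" =~ /\w{8}/) ? 1 : 0;   # 1: contains a run of 8

print "$five_freds $four_freds $five_a $four_a $seven_chars $nine_chars\n";
```

To require exactly 8 word characters and nothing else, the pattern can be anchored at both ends, as in /^\w{8}$/.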

