$address =~ m/(\d*.*)\n(.*?), ([A-Z]{2}) (\d{5})-?(\d{0,5})/

Introduction to Regular Expressions
- It's all about patterns
- Character classes match any text of a certain type
- Repetition operators specify a recurring pattern
- Search flags change how the RegEx operates
- In this presentation, green denotes a character class, yellow a repetition quantifier, and orange a search flag or other symbol
- My examples use Perl syntax

Introduction to Regular Expressions
Basic syntax
- In Perl, a RegEx is normally written between slashes: /something/ (the m operator lets you pick another delimiter, e.g. m{something})
- Escaping reserved characters is crucial
  - /(i.e. / is invalid because the ( must be closed
  - However, /\(i\.e\. / is valid for finding '(i.e. '
- Reserved characters include: . * ? + ( ) [ ] { } / \ | ^ $ (the / only because it is the delimiter)
- Some characters, such as ^ and -, also have special meanings that depend on their position in the statement
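
The same escaping rule applies in JavaScript, the other scripting language this deck touches on. Below is a minimal sketch that assumes a hypothetical helper named escapeRegExp (it is not a built-in). The helper backslash-escapes every reserved character so that arbitrary text can be matched literally. The sketch also shows that an unescaped ( is rejected outright.

```javascript
// Hypothetical helper (not a built-in): backslash-escape every regex
// metacharacter so arbitrary text can be matched literally.
function escapeRegExp(text) {
  // "$&" in the replacement string re-inserts the matched character.
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// An unescaped "(" is rejected, just as in Perl.
let parseError = null;
try {
  new RegExp("(i.e. ");
} catch (e) {
  parseError = e; // a SyntaxError (unterminated group)
}

const literalIe = new RegExp(escapeRegExp("(i.e. "));
console.log(literalIe.test("see (i.e. this)")); // true
```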

Regular Expression Matching
Text matching
- A RegEx can match plain text, e.g. if ($name =~ /Dan/) { print "match"; }
- But this will match Dan, Danny, Daniel, etc.
Full text matching with anchors
- You might want to match a whole line (or string), e.g. if ($name =~ /^Dan$/) { print "match"; }
- This will only match Dan (in Perl, $ also matches just before a trailing newline, so "Dan\n" matches too)
- ^ anchors to the start of the string and $ anchors to the end (with the m flag they anchor to each line instead)
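
Here is a small JavaScript sketch of the difference between an unanchored and an anchored pattern. The list of names is made up for illustration. One difference from Perl is worth knowing: in JavaScript, $ without the m flag matches only at the very end of the input, not before a trailing newline.

```javascript
const names = ["Dan", "Danny", "Daniel", "Jordan"];

// Unanchored: "Dan" may appear anywhere (case-sensitive, so not in "Jordan").
const contains = names.filter((n) => /Dan/.test(n));
// Anchored: the whole string must be exactly "Dan".
const exact = names.filter((n) => /^Dan$/.test(n));

console.log(contains); // [ 'Dan', 'Danny', 'Daniel' ]
console.log(exact);    // [ 'Dan' ]
// Unlike Perl, $ here does not match before a trailing newline.
console.log(/^Dan$/.test("Dan\n")); // false
```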

Regular Expression Matching
Order of results
- The search begins at the start of the string and reports the leftmost match
- This can be altered, but don't ask yet
Every character is important
- Any plain text in the expression is treated literally
- Nothing is ignored, and "close" doesn't count: / s/ (one space) is not the same as /  s/ (two spaces)
- Far easier to write than to debug!
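
The following JavaScript sketch (the sample strings are made up) illustrates both points: the leftmost match is the one reported, and a space in the pattern counts like any other character.

```javascript
// Scanning starts at index 0, so the leftmost "dog" is the one reported.
const firstDog = "dog cat dog".search(/dog/);

// Plain text is literal, and that includes every space.
const oneSpace = / s/;
const twoSpaces = /  s/;

console.log(firstDog);               // 0
console.log(oneSpace.test("a s"));   // true
console.log(twoSpaces.test("a s"));  // false
console.log(twoSpaces.test("a  s")); // true
```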

Regular Expression Char Classes
- A character class allows only certain characters
- [dofZ] matches only the letters d, o, f, and Z
  - Given the string 'dog', /[dofZ]/ matches only 'd', even though 'o' is also in the class
  - So this expression can be read as "match one of either d, o, f, or Z"
- [A-Za-z] matches any (ASCII) letter
- [a-fA-F0-9] matches any hexadecimal digit
- [^*$/\\] matches anything BUT *, $, /, or \
  - The ^ at the front of the char class means 'not'
- Inside a char class you only need to escape \ ] - ^ (the - and ^ only where they would otherwise be special), plus the / delimiter when the class sits inside /.../
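
Here are the same character classes in JavaScript, as a minimal sketch with made-up inputs. isHex is a hypothetical helper name. Inside a /.../ literal, the / in the negated class has to be escaped as \/.

```javascript
// A class matches exactly ONE character; the first position that fits wins.
const firstHit = "dog".match(/[dofZ]/)[0];

// Ranges inside a class: hexadecimal digits only, anchored to the whole string.
const isHex = (s) => /^[a-fA-F0-9]+$/.test(s);

// Negated class: delete everything except *, $, / and \.
const specials = "a*b$c/d\\e".replace(/[^*$\/\\]/g, "");

console.log(firstHit);      // d
console.log(isHex("1A3f")); // true
console.log(isHex("xyz"));  // false
console.log(specials);      // *$/\
```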

Regular Expression Char Classes
Special character classes match specific characters
- \d matches a single digit
- \w matches a word character (A-Z, a-z, 0-9, _)
- \b matches a word boundary, e.g. /\bword\b/ (it matches a zero-width position, not a character)
- \s matches a whitespace character (space, tab, newline, etc.)
- . (the wildcard) matches everything except newlines
  - Use it very carefully, you could get anything!
- To match "anything but…", capitalize the char class, e.g. \D matches anything that isn't a digit
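
The sketch below checks each special class in JavaScript, whose versions of these classes behave the same way for ASCII text. The sample strings are made up. Note in particular that \w includes digits, and that \b rejects "word" inside "swordfish".

```javascript
// \w covers letters, digits and the underscore.
const wordRun = "abc_123!".match(/\w+/)[0];

// \b is a zero-width boundary, so "word" inside "swordfish" is rejected.
const wholeWord = /\bword\b/;

// Capitalized classes negate: \D is any non-digit.
const digitsOnly = "555-1234".replace(/\D/g, "");

// The . wildcard does not match a newline by default.
const dotMatchesNewline = /./.test("\n");

console.log(wordRun);                       // abc_123
console.log(wholeWord.test("a word here")); // true
console.log(wholeWord.test("swordfish"));   // false
console.log(digitsOnly);                    // 5551234
console.log(dotMatchesNewline);             // false
```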

Regular Expression Char Classes
Character class examples
- $bodyPart =~ /e\w\w/; matches ear, eye, etc.
- $thing = '1, 2, 3 strikes!'; $thing =~ /\s\d/; matches ' 2'
- $thing = '1, 2, 3 strikes!'; $thing =~ /[\s\d]/; matches '1'
- It's not always useful to match single characters: $phone =~ /\d\d\d-\d\d\d-\d\d\d\d/;
- There's a better way…
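
Here are the slide's examples translated into JavaScript. The inputs "eye", "head" and the phone number are made up. Because /e\w\w/ is unanchored, it also finds the "ead" inside "head", which may or may not be what you want.

```javascript
const thing = "1, 2, 3 strikes!";

// Whitespace followed by a digit: the first such pair is " 2".
const spaceDigit = thing.match(/\s\d/)[0];
// Either whitespace or a digit: "1" comes first.
const spaceOrDigit = thing.match(/[\s\d]/)[0];

// Unanchored, so e\w\w also hits the "ead" in "head".
const bodyPart = /e\w\w/;
const phone = /\d\d\d-\d\d\d-\d\d\d\d/;

console.log(JSON.stringify(spaceDigit));                  // " 2"
console.log(spaceOrDigit);                                // 1
console.log(bodyPart.test("eye"), bodyPart.test("head")); // true true
console.log(phone.test("555-867-5309"));                  // true
```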

Regular Expression Repetition
- Repetition allows for flexibility
Range of occurrences
- $weight =~ /^\d{2,3}$/; matches any two- or three-digit weight (10 to 999, assuming no leading zeros); without the anchors, /\d{2,3}/ would also accept 1000, because it contains three digits in a row
- $name =~ /^\w{5,}$/; matches any name of 5 or more characters
- if ($SSN !~ /^\d{9}$/) { print "Invalid SSN!"; } complains unless the input is exactly 9 digits
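
Anchoring matters for these repetition checks. The JavaScript sketch below shows why. checkSSN is a hypothetical function, not from the original deck. Its point is that "Invalid SSN!" belongs on the branch where the 9-digit test fails.

```javascript
// Unanchored: satisfied by ANY run of 2-3 digits inside the input.
const looseWeight = /\d{2,3}/;
// Anchored: the entire input must be 2 or 3 digits.
const weight = /^\d{2,3}$/;
// {5,} means "5 or more".
const longName = /^\w{5,}$/;

// Hypothetical validator: valid only if the input is exactly 9 digits.
function checkSSN(ssn) {
  return /^\d{9}$/.test(ssn) ? "ok" : "Invalid SSN!";
}

console.log(looseWeight.test("1000"), weight.test("1000")); // true false
console.log(longName.test("Bobby"), longName.test("Bob"));  // true false
console.log(checkSSN("123456789"));                         // ok
```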

Regular Expression Repetition
General quantifiers
- Some more special characters: * means zero or more, + one or more, ? zero or one
- $favoriteNumber =~ /\d*/; matches a number of any size, or no number at all (so it always succeeds)
- $firstName =~ /\w+/; matches one or more word characters
- $middleInitial =~ /\w?/; matches one or zero word characters
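
The "no number at all" case has a consequence that surprises many people. On input that starts with letters, \d* succeeds immediately with an empty match instead of going on to find the digits. The JavaScript sketch below uses made-up inputs to show this.

```javascript
// \d* may match zero digits, so on "abc42" it succeeds at once with an
// empty match at index 0 instead of finding "42".
const star = "abc42".match(/\d*/);
// \d+ needs at least one digit, so it moves on to "42".
const plus = "abc42".match(/\d+/);
// \w? allows zero or one word character.
const initial = /^\w?$/;

console.log(JSON.stringify(star[0]), star.index); // "" 0
console.log(plus[0]);                             // 42
console.log(initial.test(""), initial.test("J"), initial.test("JK")); // true true false
```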

Regular Expression Repetition
Greedy vs nongreedy matching
- Greedy quantifiers take as much text as possible while still letting the overall match succeed
- Nongreedy (lazy) quantifiers, written with a trailing ?, take as little as possible
- Let's say $robot = 'The12thRobotIs2ndInLine';
- $robot =~ /\w*\d+/; (greedy) matches The12thRobotIs2, maximizing the length of \w*
- $robot =~ /\w*?\d+/; (nongreedy) matches The12, minimizing the length of \w*?
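
JavaScript quantifiers are greedy or lazy in the same way, so the robot example gives the same results there. Because \w also matches digits, the greedy version first grabs the whole string. It then backs off just far enough for \d+ to match the last digit.

```javascript
const robot = "The12thRobotIs2ndInLine";

// Greedy: \w* first swallows the whole string, then gives characters
// back until \d+ can match the last digit.
const greedy = robot.match(/\w*\d+/)[0];
// Lazy: \w*? grows one character at a time and stops at the first digits.
const lazy = robot.match(/\w*?\d+/)[0];

console.log(greedy); // The12thRobotIs2
console.log(lazy);   // The12
```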

Regular Expression Repetition
Greedy vs nongreedy matching
- Suppose $txt = 'something is so cool';
- $txt =~ /something/; matches 'something'
- $txt =~ /so(mething)?/g; matches 'something' and then the second 'so'
- $txt =~ /so(mething)??/g; matches only 'so', then the second 'so'
- A lazy optional group at the end of a pattern always matches nothing, so it doesn't really make sense to do this
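
In JavaScript, String.prototype.match with the g flag collects every match at once. That makes the difference between ? and ?? easy to see.

```javascript
const txt = "something is so cool";

// With the g flag, match() returns every non-overlapping match.
const greedyAll = txt.match(/so(mething)?/g);
// ?? prefers to skip the optional group, so it never takes "mething".
const lazyAll = txt.match(/so(mething)??/g);

console.log(greedyAll); // [ 'something', 'so' ]
console.log(lazyAll);   // [ 'so', 'so' ]
```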

Regular Expression Real Life Examples
Using what you've learned so far, you can…
- Validate a standard 8.3 file name: $path =~ /^\w{1,8}\.[A-Za-z0-9]{2,3}$/
- Account for poorly spelled user input:
  - $answer =~ /^ban{1,2}an{1,2}a$/
  - $iansLastName =~ /^P[ae]t{1,2}ers[oe]n$/
  - $iansFirstName =~ /^E?[Ii]?[aeo]?n$/ matches Ian, Ean, Eian, Eon, Ien, Ein
- At least everyone gets the n right…
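
The JavaScript sketch below tests these patterns against a few made-up inputs. It also shows a side effect of making every part except the n optional: a lone "n" passes as well.

```javascript
const firstName = /^E?[Ii]?[aeo]?n$/;
const spellings = ["Ian", "Ean", "Eian", "Eon", "Ien", "Ein"];
const banana = /^ban{1,2}an{1,2}a$/;
const fileName83 = /^\w{1,8}\.[A-Za-z0-9]{2,3}$/;

console.log(spellings.every((n) => firstName.test(n))); // true
// Everything except the n is optional, so a lone "n" passes too.
console.log(firstName.test("n"));                       // true
console.log(banana.test("bannana"), banana.test("bananana")); // true false
console.log(fileName83.test("README.TXT"), fileName83.test("LONGFILENAME.TXT")); // true false
```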

Alternation
- Alternation allows multiple possibilities
- Let $story = 'He went to get his mother';
  $story =~ /^(He|She)\b.*?\b(his|her)\b.*? (mother|father|brother|sister|dog)/;
  This also matches 'She punched her fat brother'
- Make sure the grouping is correct!
  - $ans =~ /^(true|false)$/ matches only 'true' or 'false'
  - $ans =~ /^true|false$/ (same as /(^true|false$)/) matches 'true never' or 'not really false', because each anchor binds only to its own alternative
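
Alternation has the lowest precedence in JavaScript too, so the same grouping pitfall applies. The sketch below checks both story sentences and both true/false patterns.

```javascript
const story = /^(He|She)\b.*?\b(his|her)\b.*? (mother|father|brother|sister|dog)/;
const strictBool = /^(true|false)$/;
// Alternation binds loosest: this means (^true) OR (false$).
const looseBool = /^true|false$/;

console.log(story.test("He went to get his mother"));   // true
console.log(story.test("She punched her fat brother")); // true
console.log(strictBool.test("true never"));             // false
console.log(looseBool.test("true never"), looseBool.test("not really false")); // true true
```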

Grouping for Backreferences
Backreferences
- With all these wildcards and possible matches, we usually need to know what the expression finally ended up matching
- Backreferences let you see what was matched
- They can be used after the expression has been evaluated, or even inside the expression itself
- They are handled very differently in different languages
- Groups are numbered from left to right by their opening parenthesis, starting at 1
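
Numbering by opening parenthesis matters once groups are nested. The JavaScript sketch below makes this concrete. Index 0 of the match array always holds the whole match.

```javascript
// Groups are numbered by the position of their OPENING parenthesis:
// group 1 is the outer ((a)(b)), group 2 is (a), group 3 is (b).
const nested = "ab".match(/((a)(b))/);

console.log(nested[0]);       // ab
console.log(nested.slice(1)); // [ 'ab', 'a', 'b' ]
```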

Grouping for Backreferences
Perl backreferences
- Used inside the expression: $txt =~ /\b(\w+)\s+\1\b/
  - Finds any duplicated word; you must use \1 here
- Used after the expression: $class =~ /(.+?)-(\d+)/
  - The text before the hyphen is stored in the Perl variable $1 (not \1) and the number after it goes in $2
  - print "I am in class $1, section $2";
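
For comparison, here is a JavaScript sketch of both uses. JavaScript accepts the same \1 syntax inside a pattern. After a match, however, the groups come back as array elements rather than as $1/$2 variables. The sample sentence and the schedule string "CS-101" are made up.

```javascript
// Inside the pattern: \1 must repeat whatever group 1 captured.
const dup = "this is is a test".match(/\b(\w+)\s+\1\b/);

// After the match: groups are elements of the returned array.
const mySchedule = "CS-101"; // hypothetical input
const cls = mySchedule.match(/(.+?)-(\d+)/);
const message = `I am in class ${cls[1]}, section ${cls[2]}`;

console.log(dup[0], "/", dup[1]); // is is / is
console.log(message);             // I am in class CS, section 101
```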

Grouping for Backreferences
Java backreferences
- Annoying but still useful (needs import java.util.regex.*;):
  Pattern p = Pattern.compile("(.+?)-(\\d+)");
  Matcher m = p.matcher(mySchedule);
  if (m.find()) {
      System.out.println("I am in class " + m.group(1) + ", section " + m.group(2));
  }
- Ugly, but usually better than the alternative
- Always check the result of find(): calling group() after a failed match throws an IllegalStateException
- m.group() returns the entire matched text

Grouping for Backreferences
JavaScript backreferences
- Used inside the expression: supported, with the same \1 syntax as Perl
- Used after the expression:
  var parts = /(.+?)-(\d+)/.exec(mySchedule); alert(parts[1]);
  str = str.replace(/(\S+)\s+(\S+)/, "$2 $1");
- Note that class is a reserved word in JavaScript, so it can't be used as a variable name
- The legacy RegExp.$1 … RegExp.$9 properties (e.g. /(.+?)-(\d+)/.test(mySchedule); alert(RegExp.$1);) and Perl-like extras such as RegExp.lastMatch still work, but they are deprecated; prefer the array returned by exec() or match()
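
Here is a runnable version of the "after the expression" cases, using exec() instead of the deprecated RegExp.$1. The input "Math-204" is hypothetical. exec() returns null when nothing matches, so the sketch checks for that before reading the groups.

```javascript
const mySchedule = "Math-204"; // hypothetical input

// exec() returns an array of groups, or null when nothing matches.
const parts = /(.+?)-(\d+)/.exec(mySchedule);
const course = parts ? parts[1] : null;
const section = parts ? parts[2] : null;

// In a replacement string, $1 and $2 refer to the captured groups.
const swapped = "hello world".replace(/(\S+)\s+(\S+)/, "$2 $1");

console.log(course, section); // Math 204
console.log(swapped);         // world hello
```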

Grouping for Backreferences
PHP/Python backreferences
- Allow specifically named backreferences, e.g. (?P<name>...)
- Named groups also keep their numbers
.NET backreferences
- Allow named backreferences
- Named groups are numbered after all the unnamed groups, so accessing them by number can give surprising results
- Check the web for info on how to use backreferences in these and other languages
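
JavaScript has supported named groups since ES2018, with the (?<name>...) syntax. As in PHP and Python, a named group still keeps its number. Below is a minimal sketch; the group names course and section are illustrative choices.

```javascript
// (?<name>...) creates a named group; it is still numbered as well.
const named = "CS-101".match(/(?<course>.+?)-(?<section>\d+)/);

// $<name> refers to a named group inside a replacement string.
const reordered = "CS-101".replace(/(?<course>.+?)-(?<section>\d+)/, "$<section>/$<course>");

console.log(named.groups.course, named.groups.section); // CS 101
console.log(named[1] === named.groups.course);          // true
console.log(reordered);                                 // 101/CS
```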

Grouping without Backreferences
- Sometimes you just need to make a group
- If important groups must be backreferenced, disable backreferencing for any unimportant groups
- $sentence =~ /(?:He|She) likes (\w+)\./;
  - I don't care if it's a he or she
  - All I want to know is what he/she likes
  - Therefore I use (?:) to forgo the backreference
  - $1 will contain the thing that he/she likes
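
JavaScript uses the same (?:...) syntax. The sketch below (sample sentence made up) shows that the non-capturing group adds no entry to the match array.

```javascript
const likes = "She likes pizza.".match(/(?:He|She) likes (\w+)\./);

// Only one capturing group exists: [0] is the whole match, [1] is the thing liked.
console.log(likes.length); // 2
console.log(likes[1]);     // pizza
```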

Matching Modes
- Matching has different functional modes
- Modes can be set by flags outside the expression (only in some languages & implementations)
  - $name =~ /[a-z]+/i; i turns off case sensitivity
  - $xml =~ /title="([\w ]*)".*keywords="([\w ]*)"/s; s lets . match newlines
  - $report =~ /^\s*Name:[\s\S]*?The End\.\s*$/m; m makes ^ and $ match at the start and end of each line, not just of the whole string
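
JavaScript has the same i, s and m flags; the s flag arrived in ES2018. The sketch below shows each one changing the result. The XML and report snippets are made-up inputs.

```javascript
// i: case-insensitive
const caseless = /[a-z]+/i.test("DAN");

// s: let . cross newlines
const xml = 'title="My Page"\nkeywords="regex perl"';
const withoutS = /title="([\w ]*)".*keywords="([\w ]*)"/.exec(xml);
const withS = /title="([\w ]*)".*keywords="([\w ]*)"/s.exec(xml);

// m: ^ and $ match at every line start/end
const report = "Intro\nName: Ada\nThe End.";
const withoutM = /^Name:/.test(report);
const withM = /^Name:/m.test(report);

console.log(caseless);                // true
console.log(withoutS);                // null
console.log(withS[1], "|", withS[2]); // My Page | regex perl
console.log(withoutM, withM);         // false true
```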

Matching Modes
- Modes can also be set by flags inside the expression (not in JavaScript)
  - $password =~ /^[a-z](?i)[a-jp-xz0-9]{4,11}$/;
    For when an insane web site specifies that your password must begin with a lowercase letter followed by 4 to 11 upper/lower alphanumeric characters, excluding k through o and y
  - $element =~ /^(?i)[A-Z](?-i)[a-z]?$/;
    (?i) makes the first letter case insensitive (if they type o but meant O, we still know they mean oxygen)
    (?-i) makes sure the second letter is lowercase; otherwise it's 2 elements
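
JavaScript has no standalone (?i) / (?-i) switch. One workaround, sketched below, is to spell out both cases in exactly the character classes where case-insensitivity is wanted. The patterns are hand-translated equivalents of the slide's rules, not output of any tool.

```javascript
// Chemical symbol: first letter either case, optional second letter lowercase.
const element = /^[A-Za-z][a-z]?$/;

// Password rule: lowercase first letter, then 4-11 characters from
// a-j, p-x, z (either case) or digits.
const password = /^[a-z][a-jp-xzA-JP-XZ0-9]{4,11}$/;

console.log(["O", "o", "Fe", "fe"].every((s) => element.test(s))); // true
console.log(element.test("FE"));                                    // false
console.log(password.test("ab1C2"), password.test("Ab1C2"));        // true false
console.log(password.test("akbcd"), password.test("aYbcd"));        // false false
```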

Regular Expression Replacing
- Replacements simplify complex data modification
- Generally the first part of a replace command is the regular expression and the second part is the text to replace the match with
- Usually a backreference variable can be used in the replacement text to refer to a group matched in the expression
- The RegEx engine resumes searching right after the matched text, so the replacement text itself is never re-scanned
- Replacements use all the same syntax, but have several unique features and are implemented very differently in various languages
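
Because the search resumes after the original match, a replacement that re-inserts the pattern still terminates. The JavaScript sketch below shows this, plus a $1/$2 reference in the replacement text (the date string is made up).

```javascript
// Each "a" becomes "aa", but the search resumes after the ORIGINAL match,
// so inserted text is never re-scanned and the replacement terminates.
const doubled = "aaa".replace(/a/g, "aa");

// $1 and $2 in the replacement refer to the captured groups.
const monthFirst = "2024-05".replace(/(\d+)-(\d+)/, "$2/$1");

console.log(doubled);    // aaaaaa
console.log(monthFirst); // 05/2024
```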

Regular Expression Replacing
Perl replacement syntax
- $phone =~ s/\D//; removes the first non-digit character in a phone number
  - Note that leaving the replacement blank deletes the matched text
- $html =~ s/^(\s*)/$1\t/; adds a tab to the indentation of a line of HTML, using a backreference
- $sample =~ s/[abc]/[ABC]/; might not do what is expected: it replaces the first a, b, or c with the literal text [ABC]
  - The second part is NOT a regular expression, it's a string (for character-by-character translation, Perl has tr/abc/ABC/)
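
Here are the Perl substitutions translated into JavaScript's String.prototype.replace, with made-up inputs. Without the g flag only the first match is replaced, and the replacement is plain text apart from $-references.

```javascript
const phone = "(555) 123-4567";

// Without g, only the first non-digit "(" is removed.
const firstGone = phone.replace(/\D/, "");

// Keep the existing indentation ($1) and append one tab.
const indented = "  <p>".replace(/^(\s*)/, "$1\t");

// The replacement is a plain string, not a character class.
const sample = "abc".replace(/[abc]/, "[ABC]");

console.log(firstGone); // 555) 123-4567
console.log(sample);    // [ABC]bc
```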

Regular Expression Replacing
Java replacement syntax (sucks)
  Pattern p = Pattern.compile("\\\\\\\\server(\\d)");
  netPath = p.matcher(netPath).replaceAll("\\\\\\\\workstation$1");
- Yes, you actually have to use 8 backslashes to make \\, in the replacement as well as in the expression
- Every \ is doubled once for the Java string literal and again for the regex (or replacement) parser
- The Matcher parses the replacement for $1
- Strings are immutable, so the result has to be assigned back to netPath
- This has the same effect as netPath = netPath.replaceAll("\\\\\\\\server(\\d)", "\\\\\\\\workstation$1"); but precompiling the Pattern is faster when it is reused, because String.replaceAll() compiles the pattern on every call
- No, you can't use .replace() for this: String.replace() treats its arguments as literal text, not as a regex
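
JavaScript needs fewer backslashes for the same job, as the sketch below shows (the network path is made up). A regex literal avoids the string-literal layer of escaping. In a replacement string only $ is special, not the backslash, so four backslashes in the source produce \\ on both sides.

```javascript
// The source text "\\\\server3\\share" is the string \\server3\share.
const netPath = "\\\\server3\\share";

// Regex literal: \\\\ matches two literal backslashes.
// Replacement string: "\\\\" is just two backslashes; only $1 is special.
const renamed = netPath.replace(/\\\\server(\d)/, "\\\\workstation$1");

console.log(renamed); // \\workstation3\share
```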

Replacement Modes
- Replacements can be performed singly or globally
- The examples I have been using replace only the first occurrence of a pattern
- Use the g flag to replace every occurrence in the string
  - $phone =~ s/\D//g; removes all non-digits in the phone number
  - $myGarage =~ s/Jeep|Cougar/Boeing/g; gives me jets in exchange for cars
- Don't use it if it's not necessary
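
JavaScript's replace follows the same rule: with g every match is replaced, and without it only the first. The garage contents below are made up.

```javascript
const allDigits = "(555) 123-4567".replace(/\D/g, "");
const garage = "Jeep, Cougar, bike".replace(/Jeep|Cougar/g, "Boeing");
// Without g only the first occurrence changes.
const oneJet = "Jeep, Cougar".replace(/Jeep|Cougar/, "Boeing");

console.log(allDigits); // 5551234567
console.log(garage);    // Boeing, Boeing, bike
console.log(oneJet);    // Boeing, Cougar
```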

Combining Replace and Match Modes
- Combining modes is easy: to combine modes, just append the flags
  - $alphabet =~ s/Q//gi; gets rid of the pesky letter Q (and q too)
  - $response =~ /(?im)"([aeiou].*?)"(?-m)(.*)/;
    This example sucks. The point is you can combine modes inside the statement, too.
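
In JavaScript, flags also combine by listing them after the closing slash. The sketch below applies g and i together; the input string is made up.

```javascript
// g (every match) + i (any case), written side by side.
const noQ = "Quick quiz".replace(/q/gi, "");
const combined = /q/gi;

console.log(noQ);            // uick uiz
console.log(combined.flags); // gi
```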

References for Learning More
- Tutorials for other programming languages: http://www.regular-expressions.info/
- In-depth syntax: http://kobesearch.cpan.org/htdocs/perl/perlreref.html
- Code Search (e.g. 'ip address regex'): http://www.google.com/codesearch

