Using regular expressions Search for a single occurrence of a specific string. Search for all occurrences of a string. Approximate string matching.


1 Using regular expressions Search for a single occurrence of a specific string. Search for all occurrences of a string. Approximate string matching.

2 Forming RegExps Strings Variables Patterns

3 Strings and Variables /Joey Ramone/ - match a specific string. /$name/, where $name = “Joey Ramone” - match the string stored in a variable. /Joey $name/ - matching a pattern defined by a mixture of strings and variables.
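
The three forms on this slide can be tried out with a minimal Perl sketch. The sample text and the helper name contains_name are made up for illustration. The \Q...\E wrapper is an addition not shown on the slide: it makes any regex metacharacters inside the variable match as plain text.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the slides.
my $line = 'Lead vocals: Joey Ramone';
my $name = 'Joey Ramone';

# Returns 1 if $text contains the literal contents of $needle, else 0.
sub contains_name {
    my ($text, $needle) = @_;
    # \Q...\E quotes metacharacters so the variable is matched as plain text.
    return $text =~ /\Q$needle\E/ ? 1 : 0;
}

# A specific string.
print "literal match\n" if $line =~ /Joey Ramone/;

# A string stored in a variable.
print "variable match\n" if contains_name($line, $name);

# A mixture of literal text and a variable.
my $surname = 'Ramone';
print "mixed match\n" if $line =~ /Joey $surname/;
```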

4 Character classes
abc – match “abc”.
. – match any single character except, by default, a newline (e.g. a.b matches “a”, then any one character, then “b”).
[abc] – match “a” or “b” or “c”.
[0123456789] – match “0” or “1” or … or “9”.
[0-9] – same as the previous class.
[a-z] – match “a” or “b” or … or “z”.
[A-Z] – same as the previous class, but with capital letters.
[] – match any single occurrence of any of the characters found within.
[0-9a-zA-Z-] – match any alphanumeric character or the minus sign.
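
As a rough illustration of these classes, the sketch below collects every match of a pattern in a string. The helper all_matches and the sample strings are hypothetical. The /g modifier, which returns all matches in list context, is not covered on this slide.

```perl
use strict;
use warnings;

# Returns every match of a compiled pattern in a string.
# Must be called in list context; /g then yields all matches.
sub all_matches {
    my ($re, $text) = @_;
    return $text =~ /$re/g;
}

my $text = 'Room 42-B, floor 3';   # hypothetical sample string

print join(',', all_matches(qr/[0-9]/, $text)), "\n";           # each digit
print join(',', all_matches(qr/[A-Z]/, $text)), "\n";           # each capital letter
print join(',', all_matches(qr/[0-9a-zA-Z-]/, 'a-1 !')), "\n";  # alphanumerics and '-'
print join(',', all_matches(qr/o.m/, $text)), "\n";             # 'o', any character, 'm'
```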

5 Negated character classes [^0-9] – match any single character that is not a numeric digit. [^aeiouAEIOU] – match any single character that is not a vowel. Works only for single characters. We’ll discuss matching negated strings of characters later.
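
A short sketch of negated classes in use. The function names are made up for illustration. The substitution operator s///, used here to delete characters, goes beyond what the slide covers.

```perl
use strict;
use warnings;

# Removes every character that is not a digit.
sub digits_only {
    my ($text) = @_;
    (my $copy = $text) =~ s/[^0-9]//g;
    return $copy;
}

# Counts the characters that are not vowels.
sub count_non_vowels {
    my ($text) = @_;
    my $count = () = $text =~ /[^aeiouAEIOU]/g;
    return $count;
}

print digits_only('319-337-3663'), "\n";
print count_non_vowels('Ramone'), "\n";
```

Note that “not a vowel” is broader than “consonant”: [^aeiouAEIOU] also matches digits, spaces, and punctuation.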

6 Escape characters \ - use the backslash to match any special character as the character itself. /\$name/ - match the literal string “$name”. /a\.b/ - match the literal string “a.b” rather than “a” followed by any character, followed by “b”.
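
The two escapes above might be exercised like this. The sample text and the helper name are assumptions for illustration only.

```perl
use strict;
use warnings;

# Single quotes: Perl does not interpolate $name inside this string.
my $text = 'The value is stored in $name, see a.b';

# Returns 1 if the string contains the literal text "a.b", else 0.
sub has_literal_a_dot_b {
    my ($s) = @_;
    return $s =~ /a\.b/ ? 1 : 0;
}

print "found literal \$name\n" if $text =~ /\$name/;
print "found literal a.b\n"    if has_literal_a_dot_b($text);
print "axb is rejected\n"      unless has_literal_a_dot_b('axb');
```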

7 Convenience character classes
\d (a digit) - [0-9]
\D (digits, not!) - [^0-9]
\w (word char) - [a-zA-Z0-9_]
\W (words, not!) - [^a-zA-Z0-9_]
\s (space char) - [ \r\t\n\f]
\S (space, not!) - [^ \r\t\n\f]
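
A small sketch of the shorthand classes at work. The helper names and sample strings are hypothetical. The ^ and $ anchors, which tie a pattern to the start and end of the string, are not introduced in these slides.

```perl
use strict;
use warnings;

# Splits a line on runs of whitespace and keeps the fields made only of digits.
sub numeric_fields {
    my ($line) = @_;
    return grep { /^\d+$/ } split /\s+/, $line;
}

# Returns the runs of word characters in a string.
sub words_in {
    my ($text) = @_;
    return $text =~ /\w+/g;
}

my @nums  = numeric_fields("track 12\tof 40");
my @words = words_in('rock-n-roll high_school');

print "@nums\n";
print scalar(@words), " words\n";
```

Because \w includes the underscore but not the hyphen, “high_school” counts as one word while “rock-n-roll” splits into three.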

8 Sequences
+ – match one or more of the preceding pattern.
/[a-zA-Z]+/ (match a string of alphabetic characters, such as a name)
? – match zero or one instance of the preceding pattern.
/[a-zA-Z]+-?[a-zA-Z]+/ (now we can match hyphenated names)
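
To see how + and ? interact, here is a minimal sketch that tests whole strings against the hyphenated-name pattern. The function name and the candidate strings are made up. The ^ and $ anchors are added here so that the pattern must cover the entire string.

```perl
use strict;
use warnings;

# Returns 1 if the whole string is letters with at most one internal hyphen.
sub is_name {
    my ($s) = @_;
    return $s =~ /^[a-zA-Z]+-?[a-zA-Z]+$/ ? 1 : 0;
}

for my $candidate ('Lennon', 'Smith-Jones', 'Smith--Jones', 'X') {
    printf "%-12s %s\n", $candidate, is_name($candidate) ? 'match' : 'no match';
}
```

One side effect worth noticing: because the pattern contains two separate [a-zA-Z]+ pieces, a one-letter name such as “X” does not match.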

9 Sequences
* – match zero or more of the preceding pattern.
Example – list of names:
–George Harrison
–Paul McCartney
–Richard "Ringo" Starkey
–John Winston Lennon
/[a-zA-Z]+ [a-zA-Z]+/ (match first and last name)
/[a-zA-Z]+( [a-zA-Z"]+)* [a-zA-Z]+/ (match first name, any middle names, and last name; the parentheses group a space with each middle name, so a name without a middle name still matches on a single space)
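
The name list can be checked with a short sketch. This version applies * to a parenthesized group made of a space plus a middle name, so that zero middle names leaves exactly one space between first and last name. The function name is hypothetical, and the ^ and $ anchors are an addition so the whole string must match.

```perl
use strict;
use warnings;

# Returns 1 if the string is a first name, zero or more middle names
# (possibly quoted, like "Ringo"), and a last name.
sub is_full_name {
    my ($s) = @_;
    return $s =~ /^[a-zA-Z]+( [a-zA-Z"]+)* [a-zA-Z]+$/ ? 1 : 0;
}

my @names = (
    'George Harrison',
    'Paul McCartney',
    'Richard "Ringo" Starkey',
    'John Winston Lennon',
);

print "$_: ", (is_full_name($_) ? 'match' : 'no match'), "\n" for @names;
```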

10 Sequences
{k} – match exactly k instances of the preceding pattern.
Example: floating point numbers to 2 decimal places
–/[0-9]+\.[0-9]{2}/
{k,j} – match at least k instances of the preceding pattern, but no more than j.
Example: floating point numbers that may or may not have a decimal component
–/[0-9]+\.?[0-9]{0,2}/
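
A sketch comparing the two counted patterns on a few made-up inputs. The function names are hypothetical, and the ^ and $ anchors are added so that extra digits cannot slip through unmatched.

```perl
use strict;
use warnings;

# Exactly two decimal places, e.g. 3.14.
sub is_two_decimal {
    my ($s) = @_;
    return $s =~ /^[0-9]+\.[0-9]{2}$/ ? 1 : 0;
}

# Optional decimal point followed by at most two digits, e.g. 3, 3.1 or 3.14.
sub is_up_to_two_decimal {
    my ($s) = @_;
    return $s =~ /^[0-9]+\.?[0-9]{0,2}$/ ? 1 : 0;
}

for my $n ('3.14', '3.1', '3', '3.141') {
    print "$n: ", is_two_decimal($n), ' ', is_up_to_two_decimal($n), "\n";
}
```

Since the decimal point and the digits after it are optional independently of each other, the second pattern also accepts a bare trailing point such as “3.”.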

11 Grouping
/(John|Paul|George|Ringo)/ – matches any one of “John”, “Paul”, “George”, or “Ringo”.
/((John|Paul|George|Ringo) )+/ – matches the Beatles’ names listed in any order, each followed by a space:
–John Paul George Ringo
–Paul George John Ringo
–Ringo Paul George John
Actually, this will also match:
–Paul Paul Paul Paul Paul Paul Paul Paul Paul
Be careful about what assumptions you make.
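
The grouping idea can be sketched as a whole-string check. This variant is an assumption rather than the slide's exact pattern: it puts the space before each later name so that no trailing space is required, and it adds ^ and $ anchors. The function name is made up.

```perl
use strict;
use warnings;

# Returns 1 if the string is one or more Beatles first names
# separated by single spaces, else 0.
sub is_beatles_list {
    my ($s) = @_;
    return $s =~ /^(John|Paul|George|Ringo)( (John|Paul|George|Ringo))*$/ ? 1 : 0;
}

print is_beatles_list('Ringo Paul George John'), "\n";
print is_beatles_list('Paul Paul Paul'), "\n";   # repeats are still accepted
print is_beatles_list('Paul Pete'), "\n";
```

As the slide warns, alternation inside a repeated group says nothing about each name appearing only once.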

12 Problem Write a regular expression that will match a Social Security number. Format: 555-55-5555

13 A solution /[0-9]{3}-[0-9]{2}-[0-9]{4}/
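
One way to use this solution to pull a number out of surrounding text is sketched below. The function name is hypothetical. The capturing parentheses and the \b word-boundary assertions are additions, included so that a longer run of digits is not mistaken for a match.

```perl
use strict;
use warnings;

# Returns the first string of the form ddd-dd-dddd found in the text, or undef.
sub find_ssn {
    my ($text) = @_;
    return $text =~ /\b([0-9]{3}-[0-9]{2}-[0-9]{4})\b/ ? $1 : undef;
}

my $found = find_ssn('SSN on file: 555-55-5555.');
print defined $found ? "found $found\n" : "none found\n";
```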

14 Problem Write a regular expression that will match a phone number. Formats –319-337-3663 –319.337.3663

15 A solution /[0-9]{3}[\.-][0-9]{3}[\.-][0-9]{4}/

16 Add another format 3193373663

17 A solution /[0-9]{3}[\.-]?[0-9]{3}[\.-]?[0-9]{4}/
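
The three phone formats can be checked with a sketch like the following. Both function names are made up. The second function is a stricter variant that is not in the slides: it captures the first separator, which may be empty, and uses the backreference \1 to require the same separator the second time. The ^ and $ anchors are also an addition.

```perl
use strict;
use warnings;

# Accepts ddd-ddd-dddd, ddd.ddd.dddd, dddddddddd, and mixtures of these.
sub is_phone {
    my ($s) = @_;
    return $s =~ /^[0-9]{3}[.-]?[0-9]{3}[.-]?[0-9]{4}$/ ? 1 : 0;
}

# Same, but both separators must be identical (or both absent).
sub is_phone_consistent {
    my ($s) = @_;
    return $s =~ /^[0-9]{3}([.-]?)[0-9]{3}\1[0-9]{4}$/ ? 1 : 0;
}

for my $p ('319-337-3663', '319.337.3663', '3193373663', '319-337.3663') {
    print "$p: ", is_phone($p), ' ', is_phone_consistent($p), "\n";
}
```

Making each separator optional on its own means the slide's pattern also accepts mixed forms such as 319-337.3663, which is the case the stricter variant rejects.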

18 Problem Write a regular expression that will match an email address. Legal characters for names are: –Letters, numbers, “-”, and “_” Legal characters for domain names are: –Letters only Assume form: username@machine.domain.suffix

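19 A solution /[a-z0-9_-]+\@[a-z]+(\.[a-z]+){2}/ More general version: /[a-z0-9_-]+\@[a-z]+(\.[a-z]+)+/ (The hyphen is placed last in the character class so that it cannot be read as part of a range. Add the /i modifier to accept capital letters as well.)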
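
A sketch of the more general version as a whole-string check, under the simplified rules stated in the problem (real email addresses allow far more than this). The function name and sample addresses are made up. The /i modifier and the ^ and $ anchors are additions.

```perl
use strict;
use warnings;

# Returns 1 if the string looks like username@machine.domain... under the
# simplified rules of the exercise, else 0.
sub is_email {
    my ($s) = @_;
    return $s =~ /^[a-z0-9_-]+\@[a-z]+(\.[a-z]+)+$/i ? 1 : 0;
}

for my $addr ('joey@mail.example.com', 'Joey_R@example.org', 'joey@localhost') {
    print "$addr: ", (is_email($addr) ? 'match' : 'no match'), "\n";
}
```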

20 Problem Write a regular expression that will match an HTML anchor start tag. Assume the anchor tag is of the form: –<a href="url">some anchor text</a>

21 A solution /<a href="[^"]+">/ Actually, quotes are not required. So it should be: –/<a href="?[^"> ]+"?>/ How would we assign the URL to a variable?

22 A solution ($url) = ($htmlText =~ m/<a href="?([^"> ]+)"?>/);
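
Capturing into a variable might look like the sketch below. It assumes the anchor tag has the simple form <a href="url">, with the quotes optional, and that href is the only attribute. The function names and sample HTML are made up. The second function uses the /g modifier to collect every URL rather than only the first.

```perl
use strict;
use warnings;

# Returns the URL of the first matching anchor tag, or undef if there is none.
# In list context a successful match returns the captured groups.
sub extract_url {
    my ($html) = @_;
    my ($url) = $html =~ m/<a href="?([^"> ]+)"?>/;
    return $url;
}

# Returns the URLs of all matching anchor tags.
sub extract_all_urls {
    my ($html) = @_;
    return $html =~ m/<a href="?([^"> ]+)"?>/g;
}

my $page = '<p><a href="a.html">A</a> and <a href=b.html>B</a></p>';
print extract_url($page), "\n";
print join(', ', extract_all_urls($page)), "\n";
```

This is fine for an exercise, but tags with extra attributes, single quotes, or different capitalization will not match; for real HTML a dedicated parser is usually the safer choice.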

23 Take Away There is almost always a pattern that will match what you want it to match. The best way to learn is to simply jump in and start writing your own patterns. If you have a question about how to construct one, feel free to ask me. One typically learns Perl by asking people with more experience.

