CIT 383: Administrative Scripting: Regular Expressions


1 CIT 383: Administrative Scripting
Regular Expressions

2 CIT 383: Administrative Scripting
Topics
  1. Creating Regexp objects
  2. Regular expression syntax
  3. Pattern matching
  4. Substitution

3 CIT 383: Administrative Scripting
Regular Expressions
Used to match patterns against strings.
  UNIX commands: egrep, awk, sed
  Ruby provides an expanded regexp syntax.
Applications of regular expressions
  Find every login failure in a log file.
  Find every address you received email from.
  Find every IP address in a file.
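As a rough illustration of two of the applications listed above, the sketch below filters login failures and collects IP addresses from a small sample log. The log lines and their format are invented for the example, and the IP pattern is deliberately loose (it does not check that each octet is between 0 and 255).

```ruby
# Hypothetical log lines; the sshd-style format is an assumption for illustration.
log = <<~LOG
  Jan 10 10:01:02 host sshd[311]: Failed password for root from 192.168.1.20 port 4022 ssh2
  Jan 10 10:01:09 host sshd[311]: Accepted password for alice from 10.0.0.5 port 5111 ssh2
  Jan 10 10:02:30 host sshd[312]: Failed password for bob from 192.168.1.20 port 4031 ssh2
LOG

# Keep only the lines that report a login failure.
failures = log.lines.grep(/Failed password/)

# Loose dotted-quad pattern: four groups of 1-3 digits separated by dots.
ip_re = /\b\d{1,3}(?:\.\d{1,3}){3}\b/

# scan returns every match; uniq drops the repeated address.
ips = log.scan(ip_re).uniq

puts "#{failures.size} failed logins"
puts ips
```

For a real file, the same calls could be applied to the result of File.read or to each line from File.foreach.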

4 CIT 383: Administrative Scripting
Creating a Regexp object
Three methods
  re = Regexp.new('^\s*[a-z]')
  re = /^\s*[a-z]/
  re = %r|^\s*[a-z]|
Modifiers
  i: ignore case when matching text
  m: multiline match, allow . to match \n
  x: extended syntax with comments + whitespace
  o: perform #{} interpolations only once
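A minimal sketch of the three construction forms and the i, m, and x modifiers. The variable names and sample strings are chosen for illustration only.

```ruby
# Three ways to build the same pattern.
re1 = Regexp.new('^\s*[a-z]')
re2 = /^\s*[a-z]/
re3 = %r|^\s*[a-z]|

# All three carry the same pattern source.
same_source = re1.source == re2.source && re2.source == re3.source

# i: ignore case.
ci = /ruby/i
puts ci.match?("RUBY")      # true

# m: let . match a newline as well.
ml = /a.b/m
puts ml.match?("a\nb")      # true
puts(/a.b/.match?("a\nb"))  # false without the modifier

# x: whitespace and comments inside the pattern are ignored.
ext = /
  \d+   # one or more digits
  \s*   # optional whitespace
  kb    # literal unit
/x
puts ext.match?("512 kb")   # true
```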

5 CIT 383: Administrative Scripting
Pattern Syntax
Characters match themselves except ., |, (, ), [, ], {, }, +, \, ^, $, *, ?
  Use \ to escape, e.g. \| will match a |
  The . metacharacter matches any character except a newline (unless the m modifier is used).
Anchors require the match to occur at the start or end of a line or string
  ^ matches the beginning of a line
  $ matches the end of a line
  \A matches the beginning of a string
  \Z matches the end of a string (or just before a final newline)
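The difference between line anchors and string anchors is easiest to see on a two-line string. This is a small sketch; \z (lowercase) is not on the slide and is included only for contrast with \Z.

```ruby
text = "first line\nsecond line\n"

at_line_start   = text =~ /^second/    # ^ matches after the embedded newline
at_string_start = text =~ /\Asecond/   # \A only matches at the very start
first_line_end  = text =~ /line$/      # $ matches before the first newline
before_final_nl = text =~ /line\Z/     # \Z tolerates one trailing newline
absolute_end    = text =~ /line\z/     # \z (not on the slide) does not

# A backslash turns a metacharacter into a literal.
pipe_index = "a|b" =~ /\|/

p at_line_start     # 11
p at_string_start   # nil
p first_line_end    # 6
p before_final_nl   # 18
p absolute_end      # nil
p pipe_index        # 1
```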

6 CIT 383: Administrative Scripting
Regexp Escape Sequences
Similar to double quotes
  \t is tab
  \n is newline
  etc.
Word boundaries
  /red/ matches "red", "bred", "reddened"
  /\bred\b/ matches only "red"
  \B matches nonword boundaries
  /\brub\B/ matches "ruby" but not "rub"
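The word-boundary examples above can be checked directly. A short sketch using the same words as the slide:

```ruby
words = %w[red bred reddened]

# Without boundaries the pattern matches anywhere inside a word.
anywhere = words.grep(/red/)

# \b on both sides requires "red" to stand alone as a word.
whole_word = words.grep(/\bred\b/)

# \B requires that "rub" is NOT followed by a word boundary.
in_ruby = "ruby" =~ /\brub\B/
in_rub  = "rub"  =~ /\brub\B/

# \t inside a regexp means a tab, as in a double-quoted string.
tab_index = "a\tb" =~ /\t/

p anywhere     # ["red", "bred", "reddened"]
p whole_word   # ["red"]
p in_ruby      # 0
p in_rub       # nil
p tab_index    # 1
```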

7 CIT 383: Administrative Scripting
Character Classes
Set of characters between brackets
  [aeiou] will match any vowel
  [0123456789] will match any digit
  Most special characters aren't special inside []; the exceptions are ^, -, ], and \
Additional syntax
  [A-Z] is a range including all capital letters
  [A-Za-z0-9] combines three ranges to match any alphanumeric
  [^A-Z] matches any character except a capital letter
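A brief sketch of the character classes above, using String#scan to list what each class picks out of a sample string (the sample strings are arbitrary):

```ruby
vowels     = "regexp".scan(/[aeiou]/)
digits     = "Room 101".scan(/[0-9]/)
alnum      = "Room 101".scan(/[A-Za-z0-9]/).join
not_upper  = "Room 101".scan(/[^A-Z]/).join

# + is an ordinary character inside brackets, so no escape is needed.
plus_index = "a+b" =~ /[+]/

p vowels      # ["e", "e"]
p digits      # ["1", "0", "1"]
p alnum       # "Room101"
p not_upper   # "oom 101"
p plus_index  # 1
```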

8 CIT 383: Administrative Scripting
Special Character Classes
Abbreviations
  \d is [0-9]
  \D is [^0-9]
  \s is [ \t\r\n\f]
  \S is [^ \t\r\n\f]
  \w is [A-Za-z0-9_]
  \W is [^A-Za-z0-9_]
POSIX classes (used inside a bracket expression, e.g. [[:digit:]]; ASCII equivalents shown)
  [:alnum:] is [A-Za-z0-9]
  [:alpha:] is [A-Za-z]
  [:digit:] is [0-9]
  [:xdigit:] is [0-9A-Fa-f]
  [:lower:] is [a-z]
  [:upper:] is [A-Z]
  [:space:] is [ \t\r\n\f]
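The abbreviations are handy for pulling fields out of a line of text. The sketch below uses an invented key=value line; note that a POSIX class has to sit inside an outer pair of brackets.

```ruby
# Hypothetical input line, made up for this example.
line = "uid=1001 name=alice_b shell=/bin/sh"

numbers = line.scan(/\d+/)     # runs of digits
words   = line.scan(/\w+/)     # runs of letters, digits, and underscore
fields  = line.split(/\s+/)    # split on runs of whitespace

# POSIX class inside a bracket expression: [[:xdigit:]]
is_hex = "0x1F".match?(/\A0x[[:xdigit:]]+\z/)

p numbers  # ["1001"]
p words    # ["uid", "1001", "name", "alice_b", "shell", "bin", "sh"]
p fields   # ["uid=1001", "name=alice_b", "shell=/bin/sh"]
p is_hex   # true
```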

9 CIT 383: Administrative Scripting
Alternation
Vertical bar matches the pattern before or after it
  pattern1|pattern2
Precedence
  red|blue matches either "red" or "blue"
  red ball|blue sky has the alternatives "red ball" and "blue sky"; | binds more loosely than concatenation, so it does not mean "red", then "ball" or "blue", then "sky"
Use parentheses to group in an expression
  red (ball|blue) sky matches "red ball sky" or "red blue sky"
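To make the precedence rule concrete, the sketch below anchors both patterns with \A and \z so that only whole-string matches count. The anchors and the non-capturing group (?: ) are additions for the example, not part of the slide.

```ruby
# Alternation of two complete phrases.
flat = /\A(?:red ball|blue sky)\z/

# Alternation limited to the middle word by parentheses.
grouped = /\Ared (ball|blue) sky\z/

phrases = ["red ball", "blue sky", "red ball sky", "red blue sky"]

flat_hits    = phrases.grep(flat)
grouped_hits = phrases.grep(grouped)

p flat_hits     # ["red ball", "blue sky"]
p grouped_hits  # ["red ball sky", "red blue sky"]
```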

10 CIT 383: Administrative Scripting
Repetition
Repetition operators are greedy, matching as many occurrences as possible.
  re* matches zero or more occurrences of re
  re+ matches one or more occurrences of re
  re? matches zero or one occurrences of re
  re{n} matches exactly n occurrences of re
  re{n,} matches n or more occurrences of re
  re{n,m} matches at least n and at most m occurrences of re
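Each operator can be tried on a short string. In this sketch, String#[] with a regexp returns the matched text, which makes the greedy behavior visible; the sample strings are arbitrary.

```ruby
star_all   = "aaa"[/a*/]          # takes every a it can
star_empty = "baa"[/a*/]          # zero occurrences match at position 0
plus_run   = "baa"[/a+/]          # needs at least one a, so it skips the b
optional   = "color colour".scan(/colou?r/)
exact      = "2024-01-15".match?(/\A\d{4}-\d{2}-\d{2}\z/)
at_least   = "aaaaa"[/a{2,}/]
bounded    = "aaaaa"[/a{2,3}/]    # greedy, so it takes 3 rather than 2

p star_all    # "aaa"
p star_empty  # ""
p plus_run    # "aa"
p optional    # ["color", "colour"]
p exact       # true
p at_least    # "aaaaa"
p bounded     # "aaa"
```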

11 CIT 383: Administrative Scripting
Additional features
Backreferences
  Regular expressions remember what each set of () matched
  /([Rr])uby&\1ails/ will match
    Ruby&Rails
    ruby&rails
  /(\w+) \1/ will match a repeated word
Greedy and non-greedy matching
  /<.*>/ is greedy, will match "<ruby>perl>"
  /<.*?>/ is non-greedy, will match "<ruby>"
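A short sketch of backreferences and of greedy versus non-greedy repetition. The angle-bracket test string is an assumption chosen for illustration.

```ruby
# \1 must repeat exactly what the first group captured.
re = /([Rr])uby&\1ails/
same_case  = "Ruby&Rails" =~ re
mixed_case = "Ruby&rails" =~ re   # \1 is "R", so "rails" does not match

# A word followed by a space and the same word again.
repeated = "the the cat"[/(\w+) \1/]

# Greedy .* runs to the last >, non-greedy .*? stops at the first.
markup  = "<ruby>perl>"
greedy  = markup[/<.*>/]
minimal = markup[/<.*?>/]

p same_case   # 0
p mixed_case  # nil
p repeated    # "the the"
p greedy      # "<ruby>perl>"
p minimal     # "<ruby>"
```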

12 CIT 383: Administrative Scripting
Pattern Matching
Pattern-matching uses the =~ operator
  re = /[Rr]uby|[Pp]ython/
  re =~ "Ruby is better than PHP."
After a successful match, details can be retrieved:
  data = Regexp.last_match
  data.string: the string that was compared
  data.to_s: the part of the string that matched
  data.pre_match: portion of string before match
  data.post_match: portion of string after match
  data[1]: what first set of () matched
  data[2]: what second set of () matched
  data.captures: what all sets of parentheses matched
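The MatchData accessors are easier to follow with a pattern that actually contains groups. The pattern and sentence below are assumptions made for this sketch; the slide's own pattern has no parentheses, so data[1] would be nil there.

```ruby
re  = /(\w+) is better than (\w+)/
str = "I think Ruby is better than PHP."

position = re =~ str          # index of the match, or nil if there is none
data     = Regexp.last_match  # MatchData for the most recent match

whole    = data.to_s
before   = data.pre_match
after    = data.post_match
first    = data[1]
second   = data[2]
captures = data.captures

p position  # 8
p whole     # "Ruby is better than PHP"
p before    # "I think "
p after     # "."
p captures  # ["Ruby", "PHP"]
```

Checking the result of =~ before using Regexp.last_match avoids calling methods on nil when nothing matched.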

13 CIT 383: Administrative Scripting
Pattern Matching Methods
Slicing
  "ruby123"[/\d+/]                # "123"
  "ruby123"[/([a-z]+)(\d+)/,1]    # "ruby"
  "ruby123"[/([a-z]+)(\d+)/,2]    # "123"
  r = "ruby123"
  r.slice(/\d+/)                  # "123"
  r.slice!(/\d+/)                 # "123", r is now "ruby"
Splitting
  s = "one, two, three"
  s.split                         # ["one,", "two,", "three"]
  s.split(', ')                   # ["one", "two", "three"]
  s.split(/\s*,\s*/)              # ["one", "two", "three"]
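The slicing and splitting calls above can be run as-is. In this sketch, dup is used so that slice! works on a mutable copy, and the irregularly spaced string is an extra example showing why the regexp split is more forgiving than a fixed separator.

```ruby
r = "ruby123".dup

digits  = r[/\d+/]                 # whole match
letters = r[/([a-z]+)(\d+)/, 1]    # first group only
removed = r.slice!(/\d+/)          # returns the match and deletes it from r

s = "one, two, three"
by_space = s.split
by_comma = s.split(", ")

# Irregular spacing around the commas (made-up input).
messy  = "one ,two,   three"
by_sep = messy.split(/\s*,\s*/)

p digits    # "123"
p letters   # "ruby"
p removed   # "123"
p r         # "ruby"
p by_space  # ["one,", "two,", "three"]
p by_comma  # ["one", "two", "three"]
p by_sep    # ["one", "two", "three"]
```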

14 CIT 383: Administrative Scripting
Substitutions
The String class provides RE substitutions
  sub(re, str): return a new string in which the first substring matching re is replaced by str
  sub!(re, str): replace the first substring matching re with str, modifying the string in place
  gsub(re, str): return a new string in which all substrings matching re are replaced by str
  gsub!(re, str): replace all substrings matching re with str, modifying the string in place
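A small sketch contrasting the four methods. The sample string is arbitrary; the nil return from the bang methods when nothing matches is standard String behavior worth knowing before chaining calls.

```ruby
line = "rails and rails"

first_only = line.sub(/rails/, "Rails")    # new string, first match replaced
every_one  = line.gsub(/rails/, "Rails")   # new string, all matches replaced
untouched  = line                          # the original is unchanged

copy = line.dup
copy.gsub!(/rails/, "Rails")               # modifies copy in place

# The bang forms return nil when no substitution was made.
no_match = copy.sub!(/python/, "Python")

p first_only  # "Rails and rails"
p every_one   # "Rails and Rails"
p untouched   # "rails and rails"
p copy        # "Rails and Rails"
p no_match    # nil
```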

15 CIT 383: Administrative Scripting
Substitution Examples
Remove Ruby-style comments
  line.sub!(/#.*$/, "")
Remove all non-digits
  line.gsub!(/\D/, "")
Capitalize specified words
  line.gsub!(/\brails\b/, 'Rails')
Change "John Smith" to "Smith, John"
  name.sub!(/(\w+)\s+(\w+)/, '\2, \1')
Flip UNIX slashes to Windows slashes
  path.gsub!(%r|/|, '\\')
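The examples above, applied to sample data. All input values here are invented for illustration, and the non-bang forms are used so each result can be stored in its own variable. The last example uses the block form of gsub, a variation on the slide's version that avoids reasoning about backslash escapes in the replacement string.

```ruby
# Strip a trailing comment from a line of code.
code = "x = 1  # set x".sub(/#.*$/, "")

# Keep only the digits of a phone number.
phone = "(859) 555-1234".gsub(/\D/, "")

# Capitalize a specific word wherever it stands alone.
text = "ruby on rails, not derails".gsub(/\brails\b/, "Rails")

# Swap two words; single quotes keep \1 and \2 as backreferences.
name = "John Smith".sub(/(\w+)\s+(\w+)/, '\2, \1')

# Replace each / with one backslash; the block's return value is used literally.
path = "/usr/local/bin".gsub(%r|/|) { "\\" }

p code    # "x = 1  "
p phone   # "8595551234"
p text    # "ruby on Rails, not derails"
p name    # "Smith, John"
puts path # \usr\local\bin
```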

16 CIT 383: Administrative Scripting
References
  1. Michael Fitzgerald, Learning Ruby, O'Reilly, 2008.
  2. David Flanagan and Yukihiro Matsumoto, The Ruby Programming Language, O'Reilly, 2008.
  3. Hal Fulton, The Ruby Way, 2nd edition, Addison-Wesley, 2007.
  4. Robert C. Martin, Clean Code, Prentice Hall, 2008.
  5. Dave Thomas with Chad Fowler and Andy Hunt, Programming Ruby, 2nd edition, Pragmatic Programmers, 2005.

