Regular Expressions in Perl CS/BIO 271 – Introduction to Bioinformatics.


1 Regular Expressions in Perl CS/BIO 271 – Introduction to Bioinformatics

2 Types & Regular Expressions: Regular Expressions
- Regular expressions are a powerful tool for matching patterns against strings
- Available in many languages (AWK, sed, Perl, Python, Ruby, C/C++ via libraries, others)
- Matching strings with regexps is usually fast, although patterns that force heavy backtracking can be slow

3 RegExp basics
- A regular expression is a pattern that can be compared to a string
- A regular expression is created using the / / delimiters: /^[abc].*f$/
- A regular expression is matched using the =~ (binding) operator
- A regular expression match returns true or false:
  if ($mystring =~ /^[abc].*f$/) { }

4 String Matching
- Examples of a few simple regular expressions
  $a = "Fats Waller";
  $a =~ /a/   » 1 (true)
  $a =~ /z/   » "" (false)
  $a =~ /ll/  » 1 (true)

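5 Regular Expression Patterns
- Most characters match themselves
- Wildcard: . (period) = any character except newline (any character at all with the /s modifier)
- Anchors
  ^ = start of the string (start of each line with the /m modifier)
  $ = end of the string, or just before a final newline (end of each line with the /m modifier)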
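The pattern elements on this slide can be tried out with a short sketch. The DNA string and the variable names are made up for illustration and are not part of the original slides.

```perl
use strict;
use warnings;

my $seq = "ATGCCGTAA";   # hypothetical DNA string, for illustration only

# Literal characters match themselves anywhere in the string.
my $has_ccg = $seq =~ /CCG/ ? 1 : 0;

# '.' stands for any single character (except newline by default).
my $a_any_g = $seq =~ /A.G/ ? 1 : 0;

# '^' and '$' anchor the pattern to the start and end of the string.
my $starts_atg = $seq =~ /^ATG/ ? 1 : 0;
my $ends_taa   = $seq =~ /TAA$/ ? 1 : 0;
my $ends_atg   = $seq =~ /ATG$/ ? 1 : 0;

print "$has_ccg $a_any_g $starts_atg $ends_taa $ends_atg\n";   # prints "1 1 1 1 0"
```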

6 Character Classes
- Character classes appear within [] pairs
  - Most special regexp characters (^, $, etc.) are turned off
  - Escape sequences (\n etc.) still work
  - Examples: [aeiou] [0-9]
  - ^ as the first character negates the class
  - ] is literal if it appears first, and - is literal if it appears first or last: []abn-z-]
    (written as []-abn-z], the ]-a part would be read as a range from ] to a)
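A small sketch of these rules, counting matching characters in a made-up sequence. The strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical sequence with ambiguity codes (N) and a gap (-).
my $seq   = "ATGNNCGT-A";
my $punct = "a]b-c";

# Assigning a /g match to an empty list and then to a scalar counts the matches.
my $bases     = () = $seq =~ /[ACGT]/g;    # characters listed in the class
my $non_bases = () = $seq =~ /[^ACGT]/g;   # a leading ^ negates the class
my $n_or_gap  = () = $seq =~ /[N-]/g;      # '-' is literal when it comes last

# ']' is literal when it comes first; '-' is literal when it comes last.
my $brackets  = () = $punct =~ /[]-]/g;

print "$bases $non_bases $n_or_gap $brackets\n";   # prints "7 3 3 2"
```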

7 Predefined character classes
- These work inside or outside []'s (the bracketed equivalents hold for ASCII text):
  \d = digit = [0-9]
  \D = non-digit = [^0-9]
  \s = whitespace, \S = non-whitespace
  \w = word character = [a-zA-Z0-9_]
  \W = non-word character = [^a-zA-Z0-9_]
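As a rough illustration of the predefined classes, the following sketch pulls fields out of a made-up annotation line. The line's format and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = "gene_42 starts at 1337";   # hypothetical annotation line

my ($id)    = $line =~ /^(\w+)/;       # leading run of word characters
my ($pos)   = $line =~ /(\d+)$/;       # trailing run of digits
my @fields  = $line =~ /\S+/g;         # every run of non-whitespace
my ($mixed) = $line =~ /([\d_]+)/;     # \d used inside a [] class

print "$id $pos ", scalar(@fields), " $mixed\n";   # prints "gene_42 1337 4 _42"
```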

8 Repetition in Regexps
- These quantify the preceding character or class:
  * = zero or more
  + = one or more
  ? = zero or one
  {m,n} = at least m and at most n
  {m,} = at least m
- High precedence: a quantifier applies only to the single preceding character or class, unless grouped: /^ran*$/ vs. /^r(an)*$/
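The difference between /^ran*$/ and /^r(an)*$/ is easiest to see by running both over a few candidate strings. The word list below is invented for this sketch.

```perl
use strict;
use warnings;

my @words = qw(r ra ran rannn ranan);   # hypothetical test strings

# Without grouping, * applies only to the 'n'.
my @char_only = grep { /^ran*$/ } @words;

# With grouping, * applies to the whole "an" unit.
my @grouped = grep { /^r(an)*$/ } @words;

# {m,n} bounds the repeat count: two or three A's here.
my @runs = grep { /^A{2,3}$/ } qw(A AA AAA AAAA);

print "@char_only\n";   # prints "ra ran rannn"
print "@grouped\n";     # prints "r ran ranan"
print "@runs\n";        # prints "AA AAA"
```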

9 Alternation
- | is like "or": it matches either the regexp before the | or the one after
- Low precedence: it alternates entire regexps unless grouped
  /red ball|angry sky/ matches the text "red ball" or "angry sky", not "red ball sky" or "red angry sky"
  /red (ball|angry) sky/ matches "red ball sky" or "red angry sky"
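Because an unanchored pattern can match part of a longer string, the effect of grouping shows up most clearly in which text is matched. The helper `matched_text` below is a hypothetical name introduced for this sketch.

```perl
use strict;
use warnings;

my $loose = qr/red ball|angry sky/;      # alternates the two whole phrases
my $tight = qr/red (?:ball|angry) sky/;  # alternates only the middle word

# Hypothetical helper: returns the matched portion of $str, or undef.
sub matched_text {
    my ($str, $re) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

my $m1 = matched_text("red ball sky",  $loose);   # only "red ball" is matched
my $m2 = matched_text("red angry sky", $loose);   # only "angry sky" is matched
my $m3 = matched_text("red ball sky",  $tight);   # the whole phrase is matched
my $m4 = matched_text("red ball",      $tight);   # no match: " sky" is required

print "$m1 | $m2 | $m3 | ", defined $m4 ? $m4 : "no match", "\n";
# prints "red ball | angry sky | red ball sky | no match"
```

The sketch uses a non-capturing group `(?:...)` so that `$1` always refers to the whole match inside the helper.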

10 Side Effects (Perl Magic)
- After a successful regular expression match, some "special" Perl variables are set automatically:
  $& = the part of the string that matched the pattern
  $` = the part of the string before the match
  $' = the part of the string after the match
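A minimal sketch of the three match variables, reusing the "Fats Waller" string from the earlier slide. The variable names `$pre`, `$match` and `$post` are chosen here for illustration.

```perl
use strict;
use warnings;

my $text = "Fats Waller";
my ($pre, $match, $post);

if ($text =~ /ll/) {
    # Copy the special variables right away; the next successful match overwrites them.
    ($pre, $match, $post) = ($`, $&, $');
    print "[$pre][$match][$post]\n";   # prints "[Fats Wa][ll][er]"
}
```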

11 Side effects and grouping
- When you use ()'s for grouping, Perl makes the text matched by the first () pair available as:
  \1 within the pattern
  $1 outside the pattern
  "mississippi" =~ /^.*(iss)+.*$/  » $1 = "iss"
  /([aeiou][aeiou]).*\1/ matches a pair of vowels that occurs again later in the string
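Both forms can be exercised in a few lines. The word list is invented for this sketch; the "mississippi" pattern is the one from the slide.

```perl
use strict;
use warnings;

# $1 holds the text captured by the first () pair after a successful match.
my $captured;
if ("mississippi" =~ /^.*(iss)+.*$/) {
    $captured = $1;
}

# \1 refers back to the same captured text inside the pattern itself.
my @words    = qw(voodoo moon foolproof);          # hypothetical test words
my @repeated = grep { /([aeiou][aeiou]).*\1/ } @words;

print "$captured\n";    # prints "iss"
print "@repeated\n";    # prints "voodoo foolproof"
```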

12 Repetition and greediness
- By default, repetition is greedy, meaning that it will match as many characters as possible while still letting the overall pattern succeed.
- You can make a repetition modifier non-greedy by adding '?'
  (showRE is a helper, not a Perl built-in; its output marks the matched text with << >>)
  $a = "The moon is made of cheese";
  showRE($a, /\w+/)             » <<The>> moon is made of cheese
  showRE($a, /\s.*\s/)          » The<< moon is made of >>cheese
  showRE($a, /\s.*?\s/)         » The<< moon >>is made of cheese
  showRE($a, /[aeiou]{2,99}/)   » The m<<oo>>n is made of cheese
  showRE($a, /mo?o/)            » The <<moo>>n is made of cheese
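Greedy and non-greedy matching can be compared directly on a string with repeated delimiters. This is a sketch under assumptions: the markup string and the helper name `first_match` are hypothetical and not from the slides.

```perl
use strict;
use warnings;

my $markup = "<b>bold</b> and <i>italic</i>";   # hypothetical input

# Hypothetical helper: returns the text matched by $re in $str, or undef.
sub first_match {
    my ($str, $re) = @_;
    return $str =~ $re ? $& : undef;
}

# Greedy: .* runs to the last '>' in the string.
my $greedy = first_match($markup, qr/<.*>/);

# Non-greedy: .*? stops at the first '>' that lets the match succeed.
my $lazy = first_match($markup, qr/<.*?>/);

print "$greedy\n";   # prints "<b>bold</b> and <i>italic</i>"
print "$lazy\n";     # prints "<b>"
```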

13 RegExp Substitutions
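Only the title of this slide survives in the transcript, so the following is a generic sketch of Perl's s/// substitution operator rather than the slide's own examples. The strings and variable names are assumptions.

```perl
use strict;
use warnings;

my $dna = "ATGCTT";   # hypothetical sequence

# s/PATTERN/REPLACEMENT/ changes the first match only.
my $first_only = $dna;
$first_only =~ s/T/U/;

# With /g, every match is replaced; the operator returns the number of replacements.
my $rna     = $dna;
my $changed = ($rna =~ s/T/U/g);

# Captured groups can be reused in the replacement as $1, $2, ...
my $name = "Waller, Fats";
$name =~ s/^(\w+), (\w+)$/$2 $1/;

print "$first_only $rna $changed\n";   # prints "AUGCTT AUGCUU 3"
print "$name\n";                       # prints "Fats Waller"
```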

14 Using RegExps
- Repeated regexps with list context and /g
- Single matches
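The two usage styles named on this slide might look like the following. The sequence and variable names are made up, and the scalar-context loop is included as a commonly used companion to the list-context form.

```perl
use strict;
use warnings;

my $seq = "ATGAAATGCCATGA";   # hypothetical sequence

# List context with /g: returns every match at once.
my @hits  = $seq =~ /ATG/g;
my $count = scalar @hits;

# Scalar context with /g: each iteration resumes after the previous match.
my @positions;
while ($seq =~ /ATG/g) {
    push @positions, $-[0];   # $-[0] is the start offset of the current match
}

# Single match (no /g): in list context it returns the captured groups.
my ($next_codon) = $seq =~ /ATG(...)/;

print "$count\n";        # prints "3"
print "@positions\n";    # prints "0 5 10"
print "$next_codon\n";   # prints "AAA"
```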

