1
Perl Regular expression: string manipulation
2
substr function
string = substr(string2, start pos (starts with 0), length)
- returns the substring of string2 that begins at the start position and is length characters long
- string2 is not changed
- $str2 = "Hi There";
- $str = substr($str2, 3, 2);
  - $str = "Th"; # the 4th and 5th characters
substr(string, start pos, length) = string2
- replaces length characters of string, beginning at the start position, with string2
- $str2 = "Hi There"; $str = "hi";
- substr($str2, 3, 3) = $str; # insert and replace
  - $str2 = "Hi hire";
- substr($str2, 3, 0) = $str; # insert only, applied to the result above
  - $str2 = "Hi hihire";
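A minimal sketch of the two forms of substr described on this slide, using the slide's own strings. The four-argument form at the end is not on the slide; it is a standard Perl alternative to assigning to substr, shown here only for comparison.

```perl
use strict;
use warnings;

my $str2 = "Hi There";

# Read form: 2 characters starting at index 3 (indices start at 0).
my $piece = substr($str2, 3, 2);
print "$piece\n";                # Th

# Assignment form: replace 3 characters starting at index 3.
substr($str2, 3, 3) = "hi";
print "$str2\n";                 # Hi hire

# A length of 0 replaces nothing, so this is a pure insert.
substr($str2, 3, 0) = "hi";
print "$str2\n";                 # Hi hihire

# Four-argument form (not on the slide): same effect as the assignment form.
my $str3 = "Hi There";
substr($str3, 3, 3, "hi");
print "$str3\n";                 # Hi hire
```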
3
index and rindex
index string, substring [, offset]
- returns the zero-based position of the first occurrence of substring in string, else -1
- with offset, the search starts at that position
rindex string, substring [, offset]
- returns the position of the last occurrence of substring, else -1
- with offset, the rightmost position that may be returned
$pos = index $str, $str2;
- returns the position where $str2 is found in $str
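The following sketch exercises each form of index and rindex on one string. The string is the one used in the deck's substr and index example; the search term "here" is chosen for illustration because it occurs twice.

```perl
use strict;
use warnings;

my $str = "There there Jim";

my $first   = index($str, "here");       # first occurrence
my $next    = index($str, "here", 2);    # search starts at offset 2
my $last    = rindex($str, "here");      # last occurrence
my $bounded = rindex($str, "here", 6);   # match must start at or before 6
my $missing = index($str, "Fred");       # not found

print "$first $next $last $bounded $missing\n";   # 1 7 7 1 -1
```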
4
example of substr and index
$str = "There there Jim"; $sstr = "Jim"; $replace = "Fred";
substr($str, (index $str, $sstr), 3) = $replace;
- replaces Jim with Fred in $str
- $str = "There there Fred";
The substitution operator is an easier way to do this.
5
grep
LIST = grep EXPR, LIST
LIST = grep BLOCK LIST
Like map, each element is assigned to $_ in turn and BLOCK or EXPR is evaluated; unlike map, grep returns the elements for which the result is true.
@new = grep /[a-zA-Z]/, @lines;
NOTE: altering $_ will alter the original list
@list = qw(barney fred dino wilma);
@greplist = grep {s/^[bfd]//} @list;
- @greplist = "arney", "red", "ino"
- @list = "arney", "red", "ino", "wilma"
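Because $_ is an alias for each list element, a plain match inside grep leaves the list alone while a substitution changes it. The sketch below shows one way to filter without side effects and then strip the prefix on a copy; the variable names @matched and @stripped are illustrative.

```perl
use strict;
use warnings;

my @list = qw(barney fred dino wilma);

# Filtering only: the pattern match reads $_ but does not change it.
my @matched = grep { /^[bfd]/ } @list;
print "@matched\n";              # barney fred dino
print "@list\n";                 # barney fred dino wilma (unchanged)

# To get the stripped names without touching @list, copy each element first.
my @stripped = map { (my $copy = $_) =~ s/^[bfd]//; $copy } @matched;
print "@stripped\n";             # arney red ino
print "@list\n";                 # barney fred dino wilma (still unchanged)
```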
6
s/// Operator (Substitution)
$str =~ s/pattern to match/replacement/;
- finds the first match and replaces it
$str =~ s/pattern to match/replacement/g;
- finds all matches and replaces each of them
Simple substitution
$str = "3 dogs bit 1 dog";
$str =~ s/dog/cat/;
- $str = "3 cats bit 1 dog";
$str =~ s/dog/cat/g;
- $str = "3 cats bit 1 cat";
7
s/// Operator (Substitution) (2)
s/pattern//;
- removes the pattern found (without =~, s/// operates on $_)
$str = "abad"; $str =~ s/a//g;
- $str = "bd";
From the substr and index slide
$str =~ s/$sstr/$replace/;
OR
$str =~ s/Jim/Fred/;
8
case insensitive substitution
/i ignore case
$str = "Dog, dog, dOg"; $str =~ s/DOG/cat/ig;
- $str = "cat, cat, cat";
$str = "Dog, dog, dOg"; $str =~ s/DOG/cAt/ig;
- $str = "cAt, cAt, cAt";
- The replacement string is inserted exactly as written; /i affects only the match.
9
examples
$str = "fred xxx barney"; # each substitution below starts from this string
- $str =~ s/x/boom/;
  - $str = "fred boomxx barney";
- $str =~ s/x/boom/g;
  - $str = "fred boomboomboom barney";
- $str =~ s/x+/boom/;
  - $str = "fred boom barney";
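The three results above can be reproduced side by side by copying the string before each substitution. The last step relies on a detail the slides do not mention: s/// returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $str = "fred xxx barney";

# Copy first, then substitute on the copy, so $str stays intact.
(my $once = $str) =~ s/x/boom/;     # first match only
(my $all  = $str) =~ s/x/boom/g;    # every match
(my $run  = $str) =~ s/x+/boom/;    # one match covering the whole run

print "$once\n";   # fred boomxx barney
print "$all\n";    # fred boomboomboom barney
print "$run\n";    # fred boom barney

# s/// returns the number of substitutions it made.
my $count = ($str =~ s/x/boom/g);
print "$count\n";  # 3
```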
10
alternation and group matching
| allows an or'd match
$str = "Wilma Flintstone";
$str =~ s/Fred|Wilma|Pebbles/Dino/g;
- $str = "Dino Flintstone";
- Replaces all instances of Fred or Wilma or Pebbles with Dino.
$str = "1st time winner";
$str =~ s/(1st|2nd|3rd) time/Last place/;
- $1 is the group match, "1st"; the entire match is "1st time"
- $str = "Last place winner";
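To see the difference between the group match and the entire match, the sketch below wraps the whole pattern in an extra pair of parentheses and reuses the group in the replacement. The input string and the "place" replacement are variations on the slide's example, chosen for illustration.

```perl
use strict;
use warnings;

my $str = "2nd time winner";

# Outer parentheses capture the entire match, inner ones the alternative taken.
my ($whole, $place) = $str =~ /((1st|2nd|3rd) time)/;
print "$whole | $place\n";       # 2nd time | 2nd

# The captured alternative can be reused in the replacement as $1.
(my $kept = $str) =~ s/(1st|2nd|3rd) time/$1 place/;
print "$kept\n";                 # 2nd place winner
```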
11
single character substitution
Using []
$str =~ s/[abc]/d/; # replace the first a, b, or c with d
$str =~ s/[Fred]/x/g;
- If $str was "Fred", afterwards it would be "xxxx"
$str =~ s/[^aeiouAEIOU]/_/g;
- replaces every character that is not a vowel with an _
Common mistake:
$str =~ s/[a-z]/[A-Z]/g;
- intended to replace each lower case letter with its upper case letter, but the replacement side is a literal string (not a pattern)
- if $str = "hi", then it would become "[A-Z][A-Z]";
- NOTE: $str = uc $str; # upper cases a string.
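The common mistake and three working alternatives can be compared directly. Only uc comes from the slide; the tr/// version uses the operator covered later in the deck, and the \U escape in the replacement is a standard Perl feature the slides do not cover.

```perl
use strict;
use warnings;

# The mistake: the replacement side is a literal string.
(my $wrong = "hi") =~ s/[a-z]/[A-Z]/g;
print "$wrong\n";                # [A-Z][A-Z]

# Working alternatives.
my $with_uc = uc "hi";                          # built-in function
(my $with_tr = "hi") =~ tr/a-z/A-Z/;            # character-by-character mapping
(my $with_U  = "hi") =~ s/([a-z])/\U$1/g;       # \U upper-cases what follows

print "$with_uc $with_tr $with_U\n";            # HI HI HI
```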
12
matching quantifiers
$str =~ s/a{3}/b/;
- the first instance of aaa is replaced with b
$str = "aaaaa"; # each example below starts from this string
$str =~ s/a{3,}/b/; # max matching
- $str = "b"
$str =~ s/a{3,}?/b/; # min matching
- $str = "baa"; # only 3 a's are replaced, to make a min match
$str =~ s/(a{3,}?)(a*)/b/;
- $str = "b"; $1 = "aaa"; $2 = "aa";
$str =~ s/(a{3,})(a*)/b/;
- $str = "b"; $1 = "aaaaa"; $2 = "";
$str =~ s/(a{3,}?)(a*?)/b/; # min match on both
- $str = "baa"; $1 = "aaa"; $2 = "";
13
matching quantifiers (2)
$str = "aaaaab"; # each example below starts from this string
$str =~ s/a{3,}?b/c/;
- $str = "c". Why? The match starts at the first a, and to reach the b it has to take all the a's.
+ is 1 or more and ? is 0 or 1 time (both max match)
$str =~ s/(a+)(b?)/c/;
- $str = "c", $1 = "aaaaa" and $2 = "b"
$str =~ s/(a+?)(b??)/c/; # min match
- $str = "caaaab"; $1 = "a"; $2 = "";
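One way to check what each group captured is to run the same patterns as plain matches in list context, which returns the captures directly. This is a small sketch using the strings from the quantifier slides; the variable names are illustrative.

```perl
use strict;
use warnings;

# Max (greedy) versus min (lazy) matching on "aaaaab".
my ($g1, $g2) = "aaaaab" =~ /(a+)(b?)/;
my ($l1, $l2) = "aaaaab" =~ /(a+?)(b??)/;
print "[$g1] [$g2]\n";           # [aaaaa] [b]
print "[$l1] [$l2]\n";           # [a] []

# A lazy group followed by a greedy one on "aaaaa": the second takes the rest.
my ($m1, $m2) = "aaaaa" =~ /(a{3,}?)(a*)/;
print "[$m1] [$m2]\n";           # [aaa] [aa]
```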
14
matching quantifiers (3)
Example: Perl doesn't always do what you think.
$str = "ddogg"; $str =~ s/d.*g/cat/;
- $str = "cat" # max match, makes sense
$str = "ddogg"; $str =~ s/d.*?g/cat/;
- $str = "catg"; # min match, but not the shortest possible match: the regex engine takes the leftmost starting point (the first d) and only then minimizes the length
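The "catg" result follows from the leftmost-match rule. If the goal really is the shortest "d...g" stretch, one possible approach (an assumption about the intent, not something the slide proposes) is to forbid another d inside the match with a negated character class.

```perl
use strict;
use warnings;

(my $greedy = "ddogg") =~ s/d.*g/cat/;       # longest match from the first d
(my $lazy   = "ddogg") =~ s/d.*?g/cat/;      # shortest match from the first d
(my $tight  = "ddogg") =~ s/d[^d]*?g/cat/;   # no second d allowed inside

print "$greedy\n";   # cat
print "$lazy\n";     # catg
print "$tight\n";    # dcatg
```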
15
matching quantifiers (4)
More Examples (with $_ variable)
$_ = "a xxx c xxxxx c xxx d"; # each example below starts from this string
s/x{1,}/d/g; produces "a d c d c d d"
s/x{1,}?/d/g; produces "a ddd c ddddd c ddd d"
s/x{1,2}/d/g; produces "a dd c ddd c dd d"
s/x{1,3}/d/g; produces "a d c dd c d d"
s/x{2,2}/d/g; produces "a dx c ddx c dx d"
- or s/x{2}/d/g;
16
Anchoring
$str = "Fred Flintstone Fred"; # each example below starts from this string
$str =~ s/Fred/Wilma/g;
- Replaces all instances of Fred with Wilma
$str =~ s/Fred$/Wilma/g;
- Only the last instance, "Fred Flintstone Wilma", even with the /g flag
$str =~ s/^Fred/Wilma/g;
- Only the first instance, "Wilma Flintstone Fred", even with the /g flag
$str = "abcd"; $str =~ s/^[abc]+/d/;
- $str = "dd";
17
Parentheses as memory
s/a(.)b(.)c\2d\1/a mess/;
- "adbecedd" is converted to "a mess"
- "adbecdde" is not converted.
s/a(.*)b\1c/a mess/;
- "addbddc" changes to "a mess"
- "adddbddc" is not changed
To keep the text a group matched, use \1..\9 inside the pattern and $1..$9 in the replacement (\1 also works in the replacement, but Perl warns about it when warnings are enabled)
s/a(.*)b\1c/What is this: $1/;
- "addbddc" is converted to "What is this: dd"
- again $1 = "dd"
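The backreference examples can be checked with a short script. This sketch uses \1 and \2 inside the pattern and $1 in the replacement, and copies each input so the originals stay readable; the variable names are illustrative.

```perl
use strict;
use warnings;

# Two single-character groups, referenced in reverse order.
(my $mess = "adbecedd") =~ s/a(.)b(.)c\2d\1/a mess/;
print "$mess\n";                 # a mess

# \1 must repeat exactly what (.*) captured.
(my $hit  = "addbddc")  =~ s/a(.*)b\1c/What is this: $1/;
(my $miss = "adddbddc") =~ s/a(.*)b\1c/What is this: $1/;
print "$hit\n";                  # What is this: dd
print "$miss\n";                 # adddbddc (no match, unchanged)
```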
18
metasymbols
a very common substitution
- s/\s+/ /g; # replace each run of whitespace with a single space
  - " a b\t c" changes to " a b c"
remove word character duplicates
- $str = "11aabbdccaa";
- $str =~ s/(\w)\1/$1/g;
  - $str = "1abdca"
Remove any duplicates
- $str = "11,,aa";
- $str =~ s/(.)\1/$1/g;
  - $str = "1,a"
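Note that (\w)\1 collapses pairs only, so a run of three identical characters is reduced to two. The sketch below shows that behaviour next to a variant with \1+ that collapses whole runs; the \1+ variant and the "aaab" input are additions for illustration, not taken from the slide.

```perl
use strict;
use warnings;

# Whitespace runs (two spaces, then a tab plus a space) become single spaces.
(my $ws = " a  b\t c") =~ s/\s+/ /g;
print "[$ws]\n";                 # [ a b c]

# Pairs only: each doubled character is reduced to one.
(my $pairs = "11aabbdccaa") =~ s/(\w)\1/$1/g;
print "$pairs\n";                # 1abdca

# A run of three is only reduced to two by the pair pattern ...
(my $three = "aaab") =~ s/(\w)\1/$1/g;
print "$three\n";                # aab

# ... while \1+ collapses the whole run.
(my $runs = "aaab") =~ s/(\w)\1+/$1/g;
print "$runs\n";                 # ab
```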
19
Exercise 10
What is the outcome of the following substitutions? Use $_ = "ad dog cd"
1. s/dog//;
2. while (/ /) { s/ / /g;}
3. s/(\w+)\s+(\w+)/$2 $1/g;
4. s/(.+)d/Dd/g;
5. s/(.+?)d/Dd/g;
6. s/(\S+)/=\1=/g;
7. Write a substitution to change each vowel to an X.
20
s/// flags
like the match operator
/m let ^ and $ match next to embedded \n
/s let . match newline
/x ignore whitespace and permit comments
flags that affect the replacement
/g replace globally, i.e. all occurrences
/e evaluate the right side as an expression
- in other words, Perl runs the right side as Perl code and uses its return value as the replacement
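A short sketch of the three flags shared with the match operator. The input strings, including the date used for /x, are made up for illustration.

```perl
use strict;
use warnings;

# /m: ^ also matches after each embedded newline.
my $text = "Fred\nFred\n";
(my $no_m   = $text) =~ s/^Fred/Wilma/g;     # only the start of the string
(my $with_m = $text) =~ s/^Fred/Wilma/mg;    # the start of every line

# /s: . also matches a newline.
my $plain  = ("a\nb" =~ /a.b/)  ? 1 : 0;     # 0
my $with_s = ("a\nb" =~ /a.b/s) ? 1 : 0;     # 1

# /x: whitespace in the pattern is ignored and comments are allowed.
(my $date = "2024-01-15") =~ s/
    (\d{4}) - (\d{2}) - (\d{2})   # year, month, day
/$3.$2.$1/x;

print $no_m, $with_m;            # Wilma, Fred, then Wilma, Wilma (one per line)
print "$plain $with_s\n";        # 0 1
print "$date\n";                 # 15.01.2024
```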
21
/e flag
s/(\d+)/sprintf("%#x",$1)/ge;
- converts all numbers to hex
- "2581" would be converted to "0xa15"
returning to the leap year example, with the ternary operator
s/(\d+)/
    $1 % 4   ? "$1 (not a leap year)"
  : $1 % 100 ? "$1 (a leap year)"
  : $1 % 400 ? "$1 (not a leap year)"
  :            "$1 (a leap year)"
/gxe;
- "2000" is changed to "2000 (a leap year)"
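Both /e examples can be run as follows. Moving the ternary chain into a helper keeps the substitution readable; leap_label is a hypothetical name introduced here, not part of the original slides, and the input strings are illustrative.

```perl
use strict;
use warnings;

# Hex conversion: the replacement is the return value of sprintf.
(my $hex = "2581 and 255") =~ s/(\d+)/sprintf("%#x", $1)/ge;
print "$hex\n";                  # 0xa15 and 0xff

# Hypothetical helper holding the slide's ternary chain.
sub leap_label {
    my ($year) = @_;
    return $year % 4   ? "$year (not a leap year)"
         : $year % 100 ? "$year (a leap year)"
         : $year % 400 ? "$year (not a leap year)"
         :               "$year (a leap year)";
}

(my $years = "1900, 2000, 2024") =~ s/(\d+)/leap_label($1)/ge;
print "$years\n";
# 1900 (not a leap year), 2000 (a leap year), 2024 (a leap year)
```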
22
tr/// Operator (Transliteration)
same as in sed; you can also use y/// instead of tr///
DOES NOT use pattern matching; instead it scans character by character and replaces each character found in the search list with the corresponding character in the replacement list
tr/SEARCHLIST/REPLACEMENTLIST/cds;
Example:
- $str = "AABBCCDDEE";
- $str =~ tr/ABC/XYZ/;
  - $str = "XXYYZZDDEE";
- $str =~ tr/DE/!/; # if the replacement list is too short, its last character is used as many times as needed
  - $str = "XXYYZZ!!!!";
23
tr/// Operator (Transliteration) (2)
Duplicates in the search list are ignored; the first occurrence wins
- $str = "AABBCCDDEE";
- $str =~ tr/AAB/xyz/;
  - $str = "xxzzCCDDEE";
/c means complement: the characters NOT in the search list
- $str = "AABBCCDDEE";
- $str =~ tr/ABC/x/c;
  - $str = "AABBCCxxxx";
24
tr/// Operator (Transliteration) (3)
/d deletes characters that are found but have no replacement
- this changes how tr handles a short replacement list: the leftover search characters are deleted instead of reusing the last replacement character
- $str = "AABBCCDDEE";
- $str =~ tr/ABC/xy/d;
  - $str = "xxyyDDEE";
- $str =~ tr/DE//d;
  - $str = "xxyy";
25
tr/// Operator (Transliteration) (4)
/s squeezes each run of the same replaced character down to one
- $str = "AABBCCDDEE";
- $str =~ tr/ABC/xyz/s;
  - $str = "xyzDDEE";
tr/// returns the number of characters found/replaced.
$str = "AABBCCDDEE"; $count = ($str =~ tr/ABC/xyz/);
- $count = 6; $str = "xxyyzzDDEE";
$str = "AABBCCDDEE"; $count = ($str =~ tr/ABC//);
- $count = 6; $str = "AABBCCDDEE";
  - No replacement list, so it just counted them and made no replacements. Note s/// would have removed them.
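The flag behaviour from the tr/// slides can be compared side by side by applying each variant to a copy of the same string. This is a sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $str = "AABBCCDDEE";

# Counting only: an empty replacement list without /d changes nothing.
my $count = ($str =~ tr/ABC//);
print "$count $str\n";           # 6 AABBCCDDEE

(my $squeezed   = $str) =~ tr/ABC/xyz/s;   # squeeze runs of replaced characters
(my $deleted    = $str) =~ tr/ABC/xy/d;    # C has no replacement, so it is deleted
(my $complement = $str) =~ tr/ABC/x/c;     # everything except A, B, C becomes x

print "$squeezed\n";             # xyzDDEE
print "$deleted\n";              # xxyyDDEE
print "$complement\n";           # AABBCCxxxx
```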
26
More tr/// Examples
$str = "AABBCCDDEE"; $str =~ tr/D//d; # delete found characters
- $str = "AABBCCEE";
$str = "AABBCCDDEE"; $str =~ tr/ABD/xy/ds; # delete D, replace A with x and B with y, and squeeze duplicate replacements
- $str = "xyCCEE";
$str =~ tr/a-zA-Z//dc;
- removes any non-letters from $str.
$str =~ tr/A-Za-z/N-ZA-Mn-za-m/;
- rotates the letters by 13 places (ROT13) for simple encryption.
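Because rotating by 13 twice covers the whole 26-letter alphabet, applying the same tr/// a second time restores the original text. A minimal sketch, with an input string chosen for illustration:

```perl
use strict;
use warnings;

my $plain = "Hello, World!";

# Rotate every letter by 13 places; other characters are left alone.
(my $secret = $plain) =~ tr/A-Za-z/N-ZA-Mn-za-m/;
print "$secret\n";               # Uryyb, Jbeyq!

# The same transliteration applied again undoes it.
(my $back = $secret) =~ tr/A-Za-z/N-ZA-Mn-za-m/;
print "$back\n";                 # Hello, World!

# Complement plus delete: strip everything that is not a letter.
(my $letters = $plain) =~ tr/a-zA-Z//cd;
print "$letters\n";              # HelloWorld
```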
27
Exercise 11
What is the outcome of the following transliteration? Use $_ = "fred and barney"
1. tr/abcde/ABCDE/;
2. tr/a-z/ABCDE/d;
3. $count = tr/a-z/A-Z/;
4. tr/a-z/_/c;
5. tr/a-m/X/s;
6. tr/aeiou/X/cs;
7. $count = tr/aeiou//c;
Change the letters bdr to X and count the number of changes.
28
Q & A