CSC 4630 Meeting 21 April 4, 2007. Return to Perl Where are we? What is confusing? What practice do you need?


1 CSC 4630 Meeting 21 April 4, 2007

2 Return to Perl Where are we? What is confusing? What practice do you need?

3 Ray’s Problem
Given a string of the form
1 b 2 b 3 b 4 b 5 b 6 b 7 b 8 b 9 = 100
replace the 8 b’s with
–one plus sign
–two minus signs
–five empty strings, each meaning "close up the spacing to make a multi-digit number"
and find which replacements yield a true statement.

4 Ray’s Problem (2)
Thoughts on the answer:
1234-56-78+9 = 100 is an example of a candidate string (this one is false: the left side evaluates to 1109)
How many possible strings are there?
Proof by exhaustion may be the best approach
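The proof by exhaustion suggested above can be sketched in Perl. This is one possible approach, not necessarily the one intended in class: it tries every assignment of '', '+' and '-' to the eight gaps, keeps those with exactly one plus and two minus signs, and evaluates each string by summing its signed numbers. The names candidates and value are made up for this sketch.

```perl
use strict;
use warnings;

# Build every string over the digits 1..9 that uses exactly one '+',
# two '-' and five empty strings in the eight gaps.
sub candidates {
    my @ops = ('', '+', '-');
    my @found;
    for my $code (0 .. 3**8 - 1) {
        # Read $code as an 8-digit base-3 number, one digit per gap.
        my ($n, @gap) = ($code);
        for (1 .. 8) { push @gap, $ops[$n % 3]; $n = int($n / 3); }
        next unless (grep { $_ eq '+' } @gap) == 1
                 && (grep { $_ eq '-' } @gap) == 2;
        my $expr = '1';
        $expr .= $gap[$_ - 2] . $_ for 2 .. 9;
        push @found, $expr;
    }
    return @found;
}

# Evaluate a string such as "1234-56-78+9" by summing its signed numbers.
sub value {
    my ($expr) = @_;
    my $total = 0;
    $total += $_ for $expr =~ /[+-]?\d+/g;
    return $total;
}

my @all = candidates();
print scalar(@all), " candidate strings\n";
print "$_ = 100\n" for grep { value($_) == 100 } @all;
```

Summing the matches of /[+-]?\d+/g avoids calling eval on generated strings, which keeps the evaluation step easy to check by hand.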

5 Regular Expressions Revisited
Returning to a fundamental structure
Theoretically defined
Implemented in grep, egrep
Implemented in awk, gawk, nawk
Implemented in Perl

6 RE(2)
Theoretically, a RE defines a set of strings on an alphabet.
In implementation, matching with a RE checks whether the current string is an element of a set of strings that is constructed from the strings defined theoretically.

7 RE(3)
A single character c
Theoretically defines the set of strings {c}
Which generates the set of matching lines {ScT}, where S and T are arbitrary, possibly empty strings.
In implementation,
–grep c somelines returns ______________
–awk "/c/" somelines returns ______________
–if (/c/) {print $_;} returns ______________

8 RE(4)
so grep c somelines is equivalent to perl re1 <somelines, where re1 is the Perl program
while (<>) { if (/c/) {print $_;} }

9 RE(5)
Theoretically, if r and s are regular expressions defining languages L and M respectively, then
–rs defines the language LM, meaning concatenate a string in L with a string in M
Hence,
–grep abc somelines
–awk "/abc/" somelines
–while (<>) { if (/abc/) {print $_;} }

10 RE(6)
all return the lines that are contained in the set {SabcT}, where S and T are arbitrary, possibly empty strings.
Details:
/a/ defines {a}, /b/ defines {b}, /c/ defines {c}
/abc/ defines {abc} by concatenation
Lines matching /abc/ are in {SabcT}
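A small sketch of the {SabcT} idea, using a list of lines in place of the file somelines. The function name lines_with_abc and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Return the lines that contain "abc" somewhere, i.e. lines of the form SabcT.
sub lines_with_abc {
    return grep { /abc/ } @_;
}

my @lines = ("xabcx\n", "abc\n", "ab c\n", "cba\n");
print lines_with_abc(@lines);
```

Here S and T are both empty for the line "abc" and both "x" for the line "xabcx"; the other two lines contain the letters but not the concatenation.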

11 RE(7)
The * operator means that the preceding simple regular expression is repeated 0 or more times.
/ab*c/ defines the language formed as the union of the languages defined by /ac/, /abc/, /abbc/, /abbbc/, etc.
This is the set {ab^n c | n = 0, 1, 2, …} (an infinite set)
Hence /ab*c/ matches any string of the form S ab^n c T
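To try the * operator on a few strings, a sketch along these lines may help. The helper name has_ab_star_c and the test strings are assumptions made for this example.

```perl
use strict;
use warnings;

# 1 if the string contains a, then zero or more b's, then c.
sub has_ab_star_c { return $_[0] =~ /ab*c/ ? 1 : 0 }

for my $s ('ac', 'abc', 'abbbc', 'xxabbcyy', 'abxc', 'bc') {
    printf "%-10s %s\n", $s, has_ab_star_c($s) ? 'match' : 'no match';
}
```

The string abxc fails because something other than b sits between the a and the c, and bc fails because there is no a at all.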

12 RE(8)
The symbol . designates any character in the alphabet (What is the alphabet we’re using?) except \n, which stands for newline. (This is the Perl definition; check the various shells and the various awks.)
Thus . defines the language A-{\n}
And . matches any line that contains at least one character.
Officially, an empty line looks like \n, and every line ends with \n
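The behavior of . on empty and nonempty lines can be checked with a short sketch. The helper names are invented here; the /s modifier, which lets . match \n as well, is standard Perl but is not covered on the slide and is shown only for contrast.

```perl
use strict;
use warnings;

# 1 if the line contains at least one character other than newline.
sub has_char { return $_[0] =~ /./ ? 1 : 0 }

# Same test with the /s modifier, under which . also matches \n.
sub has_char_s { return $_[0] =~ /./s ? 1 : 0 }

for my $line ("\n", " \n", "x\n") {
    (my $shown = $line) =~ s/\n/\\n/;   # make the newline visible when printing
    printf "%-6s /./ %d   /./s %d\n", "\"$shown\"", has_char($line), has_char_s($line);
}
```

A line holding only a space still counts as containing a character, so only the truly empty line fails the plain /./ test.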

13 RE(9)
Exercise: Construct all possible lines of text that will not be matched by /a./
Exercise: Construct all possible lines of text that will be matched by /.a.b./
Exercise: Regardless of their content, what lines of text will not be matched by /.a.b./?

14 RE(10) Character Classes
Any set of characters enclosed in brackets
–The vowels [aeiou]
Any range of consecutive ASCII coded characters enclosed in brackets
–The lower case letters [a-z]
–The digits [0-9]
–The hex digits [0-9A-F]

15 RE(12)
Including special characters in the set
–To get ], use \] or []a-z] (Think about reading this string character by character to learn its meaning.)
–To get -, use \- or [a-z-]
Complementing (not complimenting) a set
–Use ^ as the leading character, [^0-9] or [^aeiou]
More special characters
–To get ^, use \^ or place it away from the first position [a-z^_]
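The placement rules for ], - and ^ can be tried out with a sketch like the following. The helper in_class is hypothetical; it anchors the class with \A and \z so that exactly one character is tested.

```perl
use strict;
use warnings;

# 1 if the single character $ch belongs to the bracketed class $class.
sub in_class {
    my ($class, $ch) = @_;
    return $ch =~ /\A$class\z/ ? 1 : 0;
}

my @checks = (
    [ '[]a-z]',  ']' ],   # ] placed first is an ordinary member
    [ '[a-z-]',  '-' ],   # - placed last is an ordinary member
    [ '[a-z^_]', '^' ],   # ^ away from the first position is an ordinary member
    [ '[^0-9]',  '7' ],   # leading ^ complements the set, so a digit fails
);
printf "%-8s %s  %d\n", @$_, in_class(@$_) for @checks;
```

The same helper works for the other classes on these slides, since each class is passed in as an ordinary string.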

16 RE(13) The Matching Game:
[0123456789]
[0-9]
[0-9\-]
[a-z0-9]
[a-zA-Z0-9_]
[^0-7]
[^A-M.,;]
[^\^]
[0 - 9]
[.]

17 RE(14) Short character set names
\d means [0-9]
\D means [^0-9]
\w means [a-zA-Z0-9_] (identifier characters)
\W means [^a-zA-Z0-9_]
\s means [ \r\t\n\f]
\S means [^ \r\t\n\f]
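As a quick illustration of the short names, the sketch below sorts a single ASCII character into one of the sets. The function classify and its labels are invented for this example; note that the digit test comes first because every digit is also a \w character.

```perl
use strict;
use warnings;

# Label one ASCII character using the short character set names.
sub classify {
    my ($ch) = @_;
    return 'digit' if $ch =~ /\d/;
    return 'word'  if $ch =~ /\w/;   # letters and underscore reach this line
    return 'space' if $ch =~ /\s/;
    return 'other';
}

printf "%-5s %s\n", $_->[0], classify($_->[1])
    for [ '7', '7' ], [ '_', '_' ], [ 'tab', "\t" ], [ '-', '-' ];
```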

18 RE(15) More repetition symbols
b* means zero or more repetitions of b, as does b{0,}
b+ means one or more repetitions of b, as does b{1,}
b? means zero or one repetition of b, as does b{0,1}
b{5,8} means five, six, seven or eight repetitions of b
b{4} means exactly four repetitions of b
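Counted repetition is easiest to see when the pattern is anchored to the whole string. The sketch below assumes a helper named whole_run, made up for this example, and adds one unanchored case to show that /b{4}/ still matches inside a longer run of b's.

```perl
use strict;
use warnings;

# 1 if the entire string is a run of five to eight b's.
sub whole_run { return $_[0] =~ /\Ab{5,8}\z/ ? 1 : 0 }

for my $n (4, 5, 8, 9) {
    printf "%d b's: %d\n", $n, whole_run('b' x $n);
}

# Without anchors, b{4} finds four b's inside a longer run.
print "unanchored /b{4}/ on six b's: ", ('bbbbbb' =~ /b{4}/ ? 1 : 0), "\n";
```

So "exactly four repetitions" describes the matched substring, not the whole line, unless the pattern is anchored.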

19 RE(16) Splitting a string
split(/:/,$line) divides $line into substrings at the colons and places the substrings in a list (array)
Note: Two adjacent colons :: produce an empty string between them.
split(/:+/,$line) treats each run of colons as a single separator, so no empty substrings arise between fields
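The difference between the two separators shows up on any line with adjacent colons. In this sketch the sample line and the variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $line = 'root::0:0';

my @with_empty = split /:/,  $line;   # the empty field between :: is kept
my @no_empty   = split /:+/, $line;   # the run :: acts as one separator

print scalar(@with_empty), ' fields: ', join(' ', map("[$_]", @with_empty)), "\n";
print scalar(@no_empty),   ' fields: ', join(' ', map("[$_]", @no_empty)),   "\n";
```

Which form is appropriate depends on whether field positions matter: collapsing runs of separators shifts every later field to the left.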

20 Andy’s Problem
Lines from a text file look like
105028|Adam Mrugalski|AJM Residential|1067 Shoecraft rd|Webster|NY|14580||||||ajmresidential@yahoo.com||No ||No|||Thu Dec 21 21:23:23 2006|
105029|robert ritchey|robert industries|po box 472|crockett |ca|94525|510-787-7290|||||send2rr@gmail.com||No||No|||Fri Dec 22 02:54:54 2006|
105030|Jack Still|WISE TV|PO BOX 280|Coeburn|VA|24230|2763959339|||||wisetv19@msn.com||No||No||9feet 1inch floor to floor. Connects to balcony. Need oak 4 feet round with landing at top. Send me a quote. J. Still WISE TV |Fri Dec 22 03:18:19 2006|

21 Andy (2)
The lines need to be cleaned and parsed into several reports:
Phone contact information
Email contact information
Address labels
Full database, checking for unique entries
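A possible starting point for the cleaning and parsing step is sketched below. It is only a sketch: the function name parse_record is invented, the field positions (1 name, 7 phone, 12 email) are assumptions counted from the sample lines rather than a documented layout, and the sample record is the second line from the slide with its phone number written without the wrap break.

```perl
use strict;
use warnings;

# Split one pipe-delimited record and trim whitespace from each field.
# The -1 limit keeps trailing empty fields so positions stay aligned.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @f = split /\|/, $line, -1;
    s/^\s+|\s+$//g for @f;
    return @f;
}

my $line = '105029|robert ritchey|robert industries|po box 472|crockett |ca|94525|'
         . '510-787-7290|||||send2rr@gmail.com||No||No|||Fri Dec 22 02:54:54 2006|';
my @f = parse_record($line);

# Assumed positions: 1 = name, 7 = phone, 12 = email.
print "Phone: $f[1], $f[7]\n"  if $f[7]  ne '';
print "Email: $f[1], $f[12]\n" if $f[12] ne '';
```

Splitting on /\|/ rather than /\|+/ matters here, because the empty fields carry positional meaning; the duplicate check for the full database could then key a hash on whichever fields are taken to identify an entry.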

