LING/C SC/PSYC 438/538 Lecture 8, Sandiway Fong

1 LING/C SC/PSYC 438/538 Lecture 8 Sandiway Fong

2 Administrivia: Homework 4 not yet graded …

3 Today's Topics: Homework 4 review; Perl regex

4 Homework 2 Review First try.. just try to detect a repeated word [sample data file not shown]

5 Homework 2 Review [sample data file and sample output not shown]

6 Homework 2 Review
Key: think algorithmically…
– think of a specific example first: w1 w2 w3 w4 w5
– Compare w1 with w2
– Compare w2 with w3
– Compare w3 with w4
– Compare w4 with w5

7 Homework 2 Review
Generalize the specific example, then code it up:
– Compare w_1 with w_{1+1}
– Compare w_2 with w_{2+1}
– …
– Compare w_{n-2} with w_{n-2+1}
– Compare w_{n-1} with w_n
“for” loop implementation over the array @words: words_0, words_1, …, words_{n-1}
– Array indices start from 0…
– $#words is the last index of @words, so the loop index stops just before $#words (each step compares a word with the one after it)…
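The generalization above can be sketched as a short Perl loop. This is only an illustration, not the homework solution shown in class: the sub name repeated_words and the whitespace-based word splitting are assumptions.

```perl
use strict;
use warnings;

# Return every word that is immediately followed by an identical word.
# (repeated_words is a hypothetical name used for this sketch.)
sub repeated_words {
    my ($line) = @_;
    my @words = split ' ', $line;          # split on runs of whitespace
    my @found;
    # $#words is the last index, so $i stops one short of it:
    # each step compares $words[$i] with $words[$i + 1].
    for (my $i = 0; $i < $#words; $i++) {
        push @found, $words[$i] if $words[$i] eq $words[$i + 1];
    }
    return @found;
}

print join(",", repeated_words("this is is a test test")), "\n";   # is,test
```

An empty line is handled without a special case: $#words is -1, so the loop body never runs.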

8 Homework 2 Review

14 a decent first pass …

15 Homework 2 Review [sample data file and output not shown]

16 Homework 2 Review Second try.. merging multiple occurrences

17 Homework 2 Review Second try.. merging multiple occurrences [sample data file and output not shown]
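One plausible reading of "merging multiple occurrences" is that a run such as "the the the" should be reported once rather than twice. The sketch below assumes that reading; the sub name repeated_words_merged is hypothetical and this is not necessarily the code shown on the slide.

```perl
use strict;
use warnings;

# Report each run of identical adjacent words only once.
sub repeated_words_merged {
    my ($line) = @_;
    my @words = split ' ', $line;
    my @found;
    for (my $i = 0; $i < $#words; $i++) {
        next unless $words[$i] eq $words[$i + 1];
        push @found, $words[$i];
        # Skip over the rest of the run so it is not reported again.
        $i++ while $i < $#words && $words[$i] eq $words[$i + 1];
    }
    return @found;
}

print join(",", repeated_words_merged("a the the the b b")), "\n";   # the,b
```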

18 Homework 2 Review Third try.. implementing a simple table of exceptions

19 Homework 2 Review Third try.. table of exceptions [sample data file and output not shown]
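A simple table of exceptions can be sketched with a Perl hash. The entries below ("had", "that") are hypothetical examples of words whose repetition may be legitimate; the actual table from the slide is not preserved, and the sub name is an assumption.

```perl
use strict;
use warnings;

# Hypothetical exception table: repeated words that should not be flagged.
my %exception = map { $_ => 1 } qw(had that);

# Return repeated adjacent words, ignoring those in the exception table.
sub repeated_words_filtered {
    my ($line) = @_;
    my @words = split ' ', $line;
    my @found;
    for (my $i = 0; $i < $#words; $i++) {
        next unless $words[$i] eq $words[$i + 1];
        next if $exception{ lc $words[$i] };     # listed exception: skip
        push @found, $words[$i];
    }
    return @found;
}

print join(",", repeated_words_filtered("he had had the the book")), "\n";   # the
```

A hash lookup keeps the exception check constant-time no matter how long the table grows.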

20 Perl regex
More powerful than simple wildcard matching on file names, e.g.
– rm *.jpg
– rm PIC000?.JPG
Regular expression pattern matching:
– regular expressions are patterns built using operators: * (zero or more occurrences), + (one or more occurrences), ? (optional), | (disjunction)
– widely used in many areas
– theoretically, they describe exactly the Type-3 (regular) languages of the Chomsky hierarchy, which are less powerful than context-free languages etc.
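The four operators can be tried out with a few small patterns. The patterns and test strings below are made up for illustration, and whole_match is a hypothetical helper that anchors the pattern so the entire string has to match.

```perl
use strict;
use warnings;

# Return 1 if the whole string matches the pattern, 0 otherwise.
sub whole_match {
    my ($re, $str) = @_;
    return $str =~ /^(?:$re)$/ ? 1 : 0;
}

my @cases = (
    [ 'ab*c',    'ac'    ],   # * : zero or more b's, so "ac" matches
    [ 'ab+c',    'ac'    ],   # + : at least one b required, so "ac" fails
    [ 'colou?r', 'color' ],   # ? : the u is optional
    [ 'cat|dog', 'dog'   ],   # | : either alternative
);

for my $case (@cases) {
    my ($re, $str) = @$case;
    printf "%-8s %-6s %s\n", $re, $str,
        whole_match($re, $str) ? "match" : "no match";
}
```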

21 Perl regex
Perl regular expression (re) matching:
– $a =~ /foo/
– /…/ contains a regular expression
– will evaluate to true/false depending on what’s contained in $a
Perl regular expression (re) match and substitute:
– $a =~ s/foo/bar/
– s/…match…/…substitute…/
– will modify $a by looking for the first occurrence of match and replacing it with substitute
– s/…match…/…substitute…/g
– g = flag: global match and substitute (every occurrence is replaced)
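A minimal sketch of the difference between matching, single substitution, and global substitution. The sample string and the sub names are assumptions; the variable is called $line rather than $a because $a is also used by Perl's sort.

```perl
use strict;
use warnings;

# Replace only the first occurrence of "foo".
sub replace_first {
    my ($text) = @_;
    $text =~ s/foo/bar/;
    return $text;
}

# Replace every occurrence of "foo" (g flag).
sub replace_all {
    my ($text) = @_;
    $text =~ s/foo/bar/g;
    return $text;
}

my $line = "foo and more foo";
print "found foo\n" if $line =~ /foo/;   # matching alone leaves $line unchanged
print replace_first($line), "\n";         # bar and more foo
print replace_all($line), "\n";           # bar and more bar
```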

22 Perl regex
Typically useful with the standard code template for reading in a file line-by-line:
    open($txtfile, $ARGV[0]) or die "$ARGV[0] not found!\n";
    while ($line = <$txtfile>) {
        if ($line =~ /..regex../) {
            do stuff…
        }
    }

23 Chapter 2: JM character class: Perl lingo

24 Chapter 2: JM

25 Backslash plus a lowercase letter names a character class (e.g. \d, \w, \s); the uppercase variant (\D, \W, \S) matches everything except that class
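The lowercase/uppercase pairing can be seen by extracting runs of digits (\d) and runs of non-digits (\D) from the same string. The sample string and sub names are made up for this sketch.

```perl
use strict;
use warnings;

# Runs of digit characters.
sub digit_runs {
    my ($text) = @_;
    my @runs = $text =~ /\d+/g;
    return @runs;
}

# Runs of everything except digit characters.
sub non_digit_runs {
    my ($text) = @_;
    my @runs = $text =~ /\D+/g;
    return @runs;
}

my $text = "Room 42, floor 3";
print join("|", digit_runs($text)), "\n";       # 42|3
print join("|", non_digit_runs($text)), "\n";   # Room |, floor 
```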

26 Unicode and \w
\w is [0-9A-Za-z_]; the definition is expanded for Unicode:
    use utf8;
    use open qw(:std :utf8);
    my $str = "school école École šola trường स्कूल škole โรงเรียน ";
    @words = ($str =~ /(\w+)/g);    # list context: returns all the matches
    foreach $word (@words) { print "$word\n" }

27 Chapter 2: JM

28 Sheeptalk

29 Chapter 2: JM

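30 Precedence of operators
– Example: Column 1 Column 2 Column 3 …
– /Column [0-9]+ */ matches a single column followed by any number of spaces (the * applies only to the space)
– /(Column [0-9]+ *)*/ matches any number of columns
– /house(cat(s|)|)/ matches house, housecat or housecats
Perl:
– in a regular expression, the string matched by the pattern within a pair of parentheses is stored in designated variables $1 (and $2 and so on)
Precedence hierarchy (highest to lowest): parentheses (); counters * + ? {}; sequences and anchors; disjunction |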
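The effect of parentheses on both precedence and capturing can be checked directly. This is a sketch: matched_text and house_groups are hypothetical helper names, and the ^ and $ anchors in the house pattern are added here so that only whole words are accepted.

```perl
use strict;
use warnings;

# Return the text matched by a pattern, or undef if there is no match.
sub matched_text {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? "$1" : undef;
}

# Return the contents of $1 and $2 for the nested house/cat/s pattern.
sub house_groups {
    my ($word) = @_;
    return unless $word =~ /^house(cat(s|)|)$/;
    my ($g1, $g2) = ($1, $2);      # copy before the match variables change
    return ($g1, $g2);
}

my $row = "Column 1 Column 2 Column 3";
print "[", matched_text(qr/Column [0-9]+ */, $row), "]\n";      # [Column 1 ]
print "[", matched_text(qr/(Column [0-9]+ *)*/, $row), "]\n";   # [Column 1 Column 2 Column 3]

my ($g1, $g2) = house_groups("housecats");
print "\$1=$g1 \$2=$g2\n";                                       # $1=cats $2=s
```

For the word "house" the outer group matches the empty string and the inner group does not take part in the match, so $2 is undefined.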

31 Chapter 2: JM
A shortcut: list context for matching (http://perldoc.perl.org/perlretut.html)
– in list context, a match returns a list of the captured groups
– in scalar context, it returns 1 (true) or “” (the empty string, if false)
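A small sketch of the two contexts, in the spirit of the time-extraction example in perlretut. The sub name parse_time and the sample strings are assumptions.

```perl
use strict;
use warnings;

# List context: the match returns the captured groups ($1, $2, $3).
sub parse_time {
    my ($time) = @_;
    my @parts = ($time =~ /(\d\d):(\d\d):(\d\d)/);
    return @parts;
}

my ($hours, $minutes, $seconds) = parse_time("12:05:30");
print "$hours h $minutes m $seconds s\n";        # 12 h 05 m 30 s

# Scalar context: the match returns 1 on success and "" on failure.
my $found = ("no digits here" =~ /\d/);
print "scalar result is the empty string\n" if $found eq "";
```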

32 Chapter 2: JM
s/([0-9]+)/<\1>/ – what does this do?
Backreferences give Perl regexps more expressive power than finite state automata (fsa)
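Two small sketches of captured text being reused. The first assumes the substitution wraps the captured integer in angle brackets, as in the corresponding example in Jurafsky and Martin; in Perl the replacement side is normally written with $1 rather than \1. The second uses a backreference inside the pattern itself to find a doubled word. Both sub names are hypothetical.

```perl
use strict;
use warnings;

# Wrap the first integer in the text in angle brackets.
sub bracket_first_integer {
    my ($text) = @_;
    $text =~ s/([0-9]+)/<$1>/;
    return $text;
}

# Return the first word that is immediately repeated, or undef.
# \1 must match exactly the same text that group 1 matched.
sub first_repeated_word {
    my ($text) = @_;
    if ($text =~ /\b(\w+)\s+\1\b/) {
        my $word = $1;
        return $word;
    }
    return;
}

print bracket_first_integer("35 boxes and 7 lids"), "\n";   # <35> boxes and 7 lids
print first_repeated_word("Paris in the the spring"), "\n"; # the
```

The second pattern is the kind of match a finite state automaton cannot express in general, since it requires remembering an arbitrary earlier substring.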

33 Shortest vs. Greedy Matching
Default behavior in Perl RE matching:
– a repetition operator takes the longest possible matching string
– aka greedy matching
This behavior can be changed, see next slide

34 Shortest vs. Greedy Matching
from http://www.perl.com/doc/manual/html/pod/perlre.html
Example:
    $_ = "The food is under the bar in the barn.";
    if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }
Output:
– matched <d is under the >
Notes:
– ? immediately following a repetition operator like * (or +) makes the operator work in non-greedy mode

35 Shortest vs. Greedy Matching
from http://www.perl.com/doc/manual/html/pod/perlre.html
Example:
    $_ = "The food is under the bar in the barn.";
    if ( /foo(.*?)bar/ ) { print "matched <$1>\n"; }
Output:
– greedy (.*): matched <d is under the bar in the >
– shortest (.*?): matched <d is under the >
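The two modes can be compared side by side on the same sentence. This is a sketch with hypothetical sub names; the only difference between the two patterns is the ? after the *.

```perl
use strict;
use warnings;

# Greedy: .* takes as much as possible, so the match ends at the last "bar".
sub greedy_capture {
    my ($text) = @_;
    return $text =~ /foo(.*)bar/ ? "$1" : undef;
}

# Non-greedy: .*? takes as little as possible, so it ends at the first "bar".
sub shortest_capture {
    my ($text) = @_;
    return $text =~ /foo(.*?)bar/ ? "$1" : undef;
}

my $sentence = "The food is under the bar in the barn.";
print "greedy:   <", greedy_capture($sentence), ">\n";    # <d is under the bar in the >
print "shortest: <", shortest_capture($sentence), ">\n";  # <d is under the >
```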

36 Shortest vs. Greedy Matching
RE search is supposed to be fast
– but search time is not necessarily proportional to the length of the input being searched
– in fact, Perl RE matching can take exponential time (in the length of the input)
– matching is non-deterministic: the matcher may need to backtrack (revisit earlier choices) if a partial match fails part of the way through
[Plots: time vs. length, linear; time vs. length, exponential]

37 Global Matching: scalar context
– g flag in the condition of a while-loop: each evaluation continues from where the previous match ended
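A minimal sketch of the while-loop idiom. The slide's own example is not preserved, so the sample string and the sub name word_positions are assumptions. pos() reports the offset just past the most recent match.

```perl
use strict;
use warnings;

# Collect each word together with the offset where its match ended.
sub word_positions {
    my ($text) = @_;
    my @out;
    while ($text =~ /(\w+)/g) {            # scalar context: one match per iteration
        push @out, "$1 ends at " . pos($text);
    }
    return @out;
}

print "$_\n" for word_positions("cat dog house");
# cat ends at 3
# dog ends at 7
# house ends at 13
```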

38 Global Matching: list context g flag in list context

39 Split
@array = split /re/, string
– splits string into a list of substrings, using matches of re as the separators
– each substring is stored as an element of @array
Examples (from perlrequick tutorial):
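The slide's examples are not preserved in the transcript; the sketch below is modeled on the whitespace and comma examples in perlrequick, wrapped in two hypothetical subs.

```perl
use strict;
use warnings;

# Split on runs of whitespace.
sub split_words {
    my ($text) = @_;
    return split /\s+/, $text;
}

# Split on a comma followed by optional whitespace.
sub split_constants {
    my ($text) = @_;
    return split /,\s*/, $text;
}

my @word  = split_words("Calvin and Hobbes");
my @const = split_constants("1.618,2.718,   3.142");
print join("|", @word), "\n";    # Calvin|and|Hobbes
print join("|", @const), "\n";   # 1.618|2.718|3.142
```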

40 Split

41 Matched Positions
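The body of this slide is not preserved. One common way to obtain match positions in Perl is through the special arrays @- and @+, which hold the start and end offsets of the last successful match; the sketch below assumes that is the topic, and match_span is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the start and end offsets of the first match of $re in $str.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    my ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
    return ($start, $end);
}

my ($start, $end) = match_span("the cat sat", qr/cat/);
print "match spans offsets $start to $end\n";   # match spans offsets 4 to 7
```

The end offset is one past the last matched character, so $end - $start gives the length of the match.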
