Presentation is loading. Please wait.

Presentation is loading. Please wait.

LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong.

Similar presentations


Presentation on theme: "LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong."— Presentation transcript:

1 LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong

2 Chapter 2: JM s/([0-9]+)/<\1>/
Backreferences technically give Perl regexs more expressive power than Finite State Automata. I.e. no finite state machine can ever do this.

3 Shortest vs. Greedy Matching
default behavior in Perl RE match: take the longest possible matching string aka greedy matching This behavior can be changed, see next slide

4 Shortest vs. Greedy Matching
from Example: $_ = "The food is under the bar in the barn."; if ( /foo(.*?)bar/ ) { print ”matched <$1>\n"; } Output: greedy (.*): matched <d is under the bar in the > shortest (.*?): matched <d is under the > Notes: ? immediately following a repetition operator like * (or +) makes the operator work in non-greedy mode (.*?) (.*)

5 Regex: exponential time
Regex search is supposed to be fast but searching is not necessarily proportional to the length of the string (or corpus) being searched in fact, Perl RE matching can can take exponential time (in length) non-deterministic may need to backtrack (revisit) if it matches incorrectly part of the way through Consider a?a?a?aaa matching against the string aaa Let's do a perl –e time length exponential time length linear

6 Regex: exponential time
Now consider scaling up a?a?a?aaa, i.e. (a?)nan matching against an Reference:

7 Global Matching: scalar context
g flag in the condition of a while-loop \w is a word character []A-Za-z0-9_]

8 Global Matching: scalar context
Python: please read

9 Global Matching: list context
g flag in list context Disjunction: /(\w+)s|(\w+[a-rt-z])/g; /\w+[a-rt-z]/g

10 Split @array = split /re/, string
splits string into a list of substrings split by re. Each substring is stored as an element Examples (from perlrequick tutorial):

11 Split m allows you to redefine / the regex delimiter character

12 Python: split

13 Matched Positions

14 Matched Positions

15 Variables in regex Example: qr/regex/ $text =~ /(a?){$n}a{$n}/
$n defined earlier {..} is an repetition operator qr/regex/ This operator quotes (and possibly compiles) regex as a regular expression. $re = qr/regex/; $line =~ m!$re $re!;

16 Zipf's Law Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. See: Brown Corpus (1,015,945 words): only 135 words are needed to account for half the corpus. On a Log – Log scale: almost straight line

17 Character Frequency Counting
Sample code is rather interesting (-e flag): (?{ Perl code }) Slightly modified but easier to read: note: lowercase

18 Character Frequency Counting


Download ppt "LING/C SC/PSYC 438/538 Lecture 12 Sandiway Fong."

Similar presentations


Ads by Google