LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.

1 LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15

2 Administrivia reminder –optional homework exercises (from lecture 5) –due tomorrow (usual rules apply) –for those of you who missed one or more questions on homework 1

3 Administrivia homework 2 –out next week –requires access to Microsoft Word –or an alternative: OpenOffice (free download, see openoffice.org)

4 Today’s Topic Regular Expressions (RE)

5 Regular Expressions: (formally) equivalent to –finite state automata (FSA), and –regular grammars; used in –string pattern matching, typically for a single word form; search text: Unix (e)grep, Perl, Microsoft Word; caution: –differences in notation and implementation. [Diagram relating Regular Grammars, FSA and Regular Expressions]

6 Regular Expressions: shorthand for describing sets of strings. String –sequence of zero or more characters –(typically, unbroken by spaces) Examples –aaa –john –mary45 –NT$ –ε (empty string)

7 Regular Expressions –shorthand: string^n –exactly n occurrences of string –n = 0,1,2,3,... examples –a^4 b^3 = aaaabbb –(uv)^2 = uvuv –((ab)^2 (ba)^2)^2 = ababbabaababbaba Note: –parentheses are used to group sequences of characters (strings)
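
The exponent notation on this slide can be checked mechanically. The following is a minimal sketch in Python, not part of the lecture; the helper name `power` is hypothetical and simply wraps Python's built-in string repetition.

```python
def power(s: str, n: int) -> str:
    """Return s concatenated with itself exactly n times (n = 0 gives the empty string)."""
    return s * n

# The three examples from the slide, built from the definition.
a4b3 = power("a", 4) + power("b", 3)
uv2 = power("uv", 2)
nested = power(power("ab", 2) + power("ba", 2), 2)

print(a4b3)    # aaaabbb
print(uv2)     # uvuv
print(nested)  # ababbabaababbaba
```

The parentheses in the notation correspond to the nesting of the `power` calls: the inner group is built first, then repeated as a unit.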

8 Regular Expressions: shorthand for describing sets of strings. string+ –set of one or more occurrences of string –i.e. the set {string^1, string^2, string^3, ...} –Note: set is infinite; examples –a+ = {a, aa, aaa, aaaa, aaaaa, …} –(abc)+ = {abc, abcabc, abcabcabc, …}

9 Regular Expressions: shorthand for describing sets of strings. string* –set of zero or more occurrences of string –i.e. the set {string^0, string^1, string^2, string^3, ...} –string^0 = ε (the empty string); examples –a* = {ε, a, aa, aaa, aaaa, …} –(abc)* = {ε, abc, abcabc, …} Note: –a a* = a+ –a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …} Language = a set of strings
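
As a rough illustration of the + and * sets and of the note that a a* = a+, here is a sketch using Python's `re` module (an assumption: the lecture itself refers to grep, Perl and Microsoft Word, whose notations differ). The generator `occurrences` and the variable names are hypothetical. Because the sets are infinite, the code lists only their first few members, and the equivalence is checked on a handful of test strings rather than proved.

```python
import re
from itertools import count, islice

def occurrences(s, start):
    """Yield s repeated start times, start+1 times, ... (an infinite generator)."""
    for n in count(start):
        yield s * n

# First few members of a+ (one or more) and (abc)* (zero or more).
plus_sample = list(islice(occurrences("a", 1), 4))
star_sample = list(islice(occurrences("abc", 0), 3))

# a a* and a+ should accept exactly the same strings.
candidates = ["", "a", "aa", "aaaa", "b", "ab"]
same = all(
    bool(re.fullmatch(r"aa*", c)) == bool(re.fullmatch(r"a+", c))
    for c in candidates
)

print(plus_sample)  # ['a', 'aa', 'aaa', 'aaaa']
print(star_sample)  # ['', 'abc', 'abcabc']
print(same)         # True
```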

10 Regular Expressions: Wildcard Characters (match a range of characters) –. (period) matches any single character; examples –.+ed = set of all strings of length 3 or greater containing ed and having at least one character preceding it; matches: worked, bed, pre-education; does not match: ed, education –.*fix = set of all strings of length 3 or greater containing fix; matches: prefix, infix, infixed, suffix, fix
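
The wildcard examples can be tried out with a short sketch. It assumes Python's `re` module and uses `re.search`, which, like grep, looks for the pattern anywhere inside the string; the helper name `containing` is hypothetical.

```python
import re

def containing(pattern, words):
    """Return the words in which the pattern matches somewhere (grep-style search)."""
    return [w for w in words if re.search(pattern, w)]

# .+ed needs at least one character before "ed".
ed_hits = containing(r".+ed", ["worked", "bed", "pre-education", "ed", "education"])

# .*fix allows zero characters before "fix".
fix_hits = containing(r".*fix", ["prefix", "infix", "infixed", "suffix", "fix"])

print(ed_hits)   # ['worked', 'bed', 'pre-education']
print(fix_hits)  # ['prefix', 'infix', 'infixed', 'suffix', 'fix']
```

Note that "ed" and "education" are rejected by `.+ed` because nothing precedes their only occurrence of "ed".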

11 Regular Expressions: Wildcard Characters (match a range of characters) –[characters] (list of matching characters) matches any single character in the list; examples –[s,z]ation: organization, organisation (in grep the list is written without separators, [sz]ation; a comma inside the brackets would itself be matched) –[a-z]: any character in the range lowercase a to z (Note: not uppercase) –[0-9]: any digit
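
A small sketch of character lists and ranges, assuming Python's `re` module, whose bracket syntax agrees with grep on these points. The test words are made up for illustration. The last check shows why the comma in `[s,z]` matters in this notation: it is treated as one more character in the list.

```python
import re

words = ["organization", "organisation", "organigation"]
sz_hits = [w for w in words if re.search(r"[sz]ation", w)]

# Ranges: lowercase letters only, and digits only.
lowercase = re.findall(r"[a-z]", "NT$ mary45")
digits = re.findall(r"[0-9]", "NT$ mary45")

# With a comma inside the brackets, a literal comma is also accepted.
comma_hit = bool(re.search(r"[s,z]ation", "organi,ation"))

print(sz_hits)    # ['organization', 'organisation']
print(lowercase)  # ['m', 'a', 'r', 'y']
print(digits)     # ['4', '5']
print(comma_hit)  # True
```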

12 Regular Expressions: grep. Excerpts from the manpage –The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line. –The symbol \b matches the empty string at the edge of a word. –The symbols \< and \> respectively match the empty string at the beginning and end of a word. terminology –word: unbroken sequence of digits, underscores and letters
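
The anchors can be explored with the sketch below. It assumes Python's `re` module, which supports ^, $ and \b but, unlike grep, has no \< and \> operators, so only the first three are shown. The sample line and the helper `starts` are hypothetical.

```python
import re

line = "cat concatenate bobcat cat"

def starts(pattern):
    """Return the start offsets of every match of pattern in the sample line."""
    return [m.start() for m in re.finditer(pattern, line)]

anywhere = starts(r"cat")       # every occurrence of "cat"
at_line_start = starts(r"^cat")  # only at the beginning of the line
at_line_end = starts(r"cat$")    # only at the end of the line
word_initial = starts(r"\bcat")  # "cat" beginning at a word edge
word_final = starts(r"cat\b")    # "cat" ending at a word edge
whole_word = starts(r"\bcat\b")  # "cat" as a complete word

print(anywhere, at_line_start, at_line_end)
print(word_initial, word_final, whole_word)
```

The anchors match positions (the empty string), not characters, which is why `^cat` and `cat` report the same start offset for the first word.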

13 Regular Expressions: grep. Excerpts from the manpage –A regular expression may be followed by one of several repetition operators: ? The preceding item is optional and matched at most once. * The preceding item will be matched zero or more times. + The preceding item will be matched one or more times. {n} The preceding item is matched exactly n times. {n,} The preceding item is matched n or more times. {n,m} The preceding item is matched at least n times, but not more than m times.
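
To see the six repetition operators side by side, here is a sketch assuming Python's `re` module, which accepts the same operator syntax as the manpage excerpt. The helper `accepted` and the candidate list are hypothetical; `re.fullmatch` is used so that the whole candidate must be covered by the pattern.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa", "aaaaa"]

def accepted(pattern):
    """Return the candidates matched in full by the pattern."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

results = {
    "a?": accepted(r"a?"),          # optional: zero or one
    "a*": accepted(r"a*"),          # zero or more
    "a+": accepted(r"a+"),          # one or more
    "a{3}": accepted(r"a{3}"),      # exactly 3
    "a{3,}": accepted(r"a{3,}"),    # 3 or more
    "a{2,4}": accepted(r"a{2,4}"),  # between 2 and 4
}

for pattern, hits in results.items():
    print(pattern, hits)
```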

14 Regular Expressions: GNU grep Excerpts from the manpage concatenation –Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions. disjunction – Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression.

15 Regular Expressions: Examples Regular Expression –gupp(y|ies) examples –guppy –guppies Regular Expression –beds? examples –bed –beds
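
Both examples on this slide combine concatenation with either disjunction or optionality. A minimal sketch, assuming Python's `re` module (which writes these two patterns the same way); the extra test words are made up to show what is rejected.

```python
import re

guppy = re.compile(r"gupp(y|ies)")  # "gupp" followed by either "y" or "ies"
bed = re.compile(r"beds?")          # "bed" followed by an optional "s"

guppy_hits = [w for w in ["guppy", "guppies", "guppys", "gupp"] if guppy.fullmatch(w)]
bed_hits = [w for w in ["bed", "beds", "bedss", "be"] if bed.fullmatch(w)]

print(guppy_hits)  # ['guppy', 'guppies']
print(bed_hits)    # ['bed', 'beds']
```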

16 Regular Expressions: Examples Example –\b99 matches 99 in “there are 99 bottles …” –but not 99 in “there are 299 bottles …” –Note: in $99 the $ is not a word character, so there is a word edge between $ and 99, and \b99 will match 99 here –word: unbroken sequence of digits, underscores and letters
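
The three cases can be verified with a short sketch, assuming Python's `re` module, where \b has the same word-edge meaning for these inputs. The third sentence is a made-up example containing $99.

```python
import re

def has_word_initial_99(text):
    """True if "99" occurs immediately after a word edge in text."""
    return re.search(r"\b99", text) is not None

plain = has_word_initial_99("there are 99 bottles")
inside_number = has_word_initial_99("there are 299 bottles")
after_dollar = has_word_initial_99("it costs $99")

print(plain)          # True
print(inside_number)  # False: the 2 before 99 is a word character
print(after_dollar)   # True: $ is not a word character
```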

17 Regular Expressions: Examples Example (sheeptalk) –ba! –baa! –baaa! … regular expression –baa*! –ba+!
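
The two sheeptalk expressions describe the same language, and, since regular expressions are equivalent to finite state automata, the same language can also be recognized by a small hand-written automaton. The sketch below assumes Python's `re` module; the function `sheep_fsa`, its state numbering and the sample strings are hypothetical, and the agreement is only checked on those samples.

```python
import re

def sheep_fsa(s: str) -> bool:
    """A finite state automaton for b, one or more a, then !."""
    # States: 0 = start, 1 = seen "b", 2 = seen "b" plus at least one "a", 3 = accept.
    transitions = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 2, (2, "!"): 3}
    state = 0
    for ch in s:
        state = transitions.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state == 3

samples = ["ba!", "baa!", "baaa!", "b!", "ba", "baa!!", "aba!"]
agree = all(
    bool(re.fullmatch(r"baa*!", s)) == bool(re.fullmatch(r"ba+!", s)) == sheep_fsa(s)
    for s in samples
)

print([s for s in samples if sheep_fsa(s)])  # ['ba!', 'baa!', 'baaa!']
print(agree)                                 # True
```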

18 Regular Expressions: Microsoft Word terminology: –wildcard search

19 Regular Expressions: Microsoft Word

