Week 14 - Friday CS221.


1 Week 14 - Friday CS221

2 Last time
What did we talk about last time? String matching

3 Questions?

4 Project 4

5 Assignment 7

6 Regular Expressions Student Lecture

7 Regular Expressions

8 Regular expressions
In theoretical CS, a language is a set of strings. A notation called a regular expression lets us express some languages precisely and compactly. Given an alphabet, we can define regular expressions recursively:
Base: The empty string ε and any individual character in the alphabet are regular expressions.
Recursion: If r and s are regular expressions, then the following are too: concatenation (rs), alternation (r | s), and Kleene star (r*).
Restriction: Nothing else is a regular expression.

9 Examples
Let our alphabet be {a, b, c}. ε is a special symbol that means the empty string.
Let our regular expression be: a | (b | c)* | (ab)*. Write 5 strings that match this regular expression.
Let our regular expression be: ab*(c | ε). Write 5 strings that match this regular expression.
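The two example expressions can be tried out directly in Java. This is a minimal sketch, not part of the original slides: the class name ExampleMatches and the helper accepts are hypothetical, and because Java has no ε symbol, an empty alternative (c|) is assumed to stand in for (c | ε).

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the lecture.
class ExampleMatches {
    // a | (b | c)* | (ab)* written in Java syntax (spaces removed)
    static final Pattern FIRST = Pattern.compile("a|(b|c)*|(ab)*");
    // ab*(c | ε): the empty alternative plays the role of ε
    static final Pattern SECOND = Pattern.compile("ab*(c|)");

    // True if the whole string belongs to the language of the pattern.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"", "a", "bcb", "abab", "abc", "abbc", "ac"};
        for (String s : candidates) {
            System.out.println("\"" + s + "\": first=" + accepts(FIRST, s)
                    + ", second=" + accepts(SECOND, s));
        }
    }
}
```

Running main prints, for each candidate string, whether it belongs to each of the two languages.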

10 Order of precedence
For the sake of consistency, regular expressions obey a particular order of precedence: * has the highest precedence, concatenation is the next highest, and alternation is the lowest. Parentheses can be omitted if there is no ambiguity.
Write (a((bc)*)) with as few parentheses as possible.
Write a | b* c, using parentheses to mark the precedence of each operation.
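Java's regular expressions follow the same precedence rules, so the effect of the rules can be observed by comparing an expression with its differently parenthesized variants. The sketch below uses its own small expressions (ab*, a|bc) rather than the exercise expressions; the class name PrecedenceDemo is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
class PrecedenceDemo {
    // True if the whole string s matches the regular expression.
    static boolean accepts(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
        System.out.println(accepts("ab*", "abbb"));    // a followed by three b's
        System.out.println(accepts("ab*", "abab"));    // not of the form a(b*)
        System.out.println(accepts("(ab)*", "abab"));  // two copies of ab

        // Concatenation binds tighter than alternation: a|bc means a|(bc), not (a|b)c.
        System.out.println(accepts("a|bc", "a"));
        System.out.println(accepts("a|bc", "ac"));
        System.out.println(accepts("(a|b)c", "ac"));
    }
}
```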

11 Equivalences
Can you describe (a | b)* with another regular expression? What about (ε | a* | b*)*?
Given the regular expression a*b(a | b)*: Write 5 strings that belong to it. Can you describe the strings accepted by it in words?
Given the regular expression a* | (ab)*: Which of the following are accepted by it? a, b, aaaa, abba, ababab
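One way to test a guess about equivalence is to compare two expressions on every short string over the alphabet. The following sketch is an assumption-laden helper, not something from the slides: the class RegexCompare, the method firstDifference, and the comparison expressions (a*b*)* and (a|ab)* are all chosen here for illustration. Agreement up to some length is only evidence of equivalence, not a proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical brute-force comparison of two regular expressions.
class RegexCompare {
    // Returns the first string (shortest first) over the alphabet, of length
    // at most maxLen, on which the two expressions disagree, or null if none.
    static String firstDifference(String r1, String r2, String alphabet, int maxLen) {
        Pattern p1 = Pattern.compile(r1);
        Pattern p2 = Pattern.compile(r2);
        List<String> level = new ArrayList<>();
        level.add(""); // the only string of length 0
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : level) {
                if (p1.matcher(s).matches() != p2.matcher(s).matches()) {
                    return s;
                }
                for (char c : alphabet.toCharArray()) {
                    next.add(s + c); // strings one character longer
                }
            }
            level = next;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstDifference("(a|b)*", "(a*b*)*", "ab", 8));
        System.out.println(firstDifference("a*|(ab)*", "(a|ab)*", "ab", 6));
    }
}
```

The first call finds no disagreement on strings up to length 8, while the second reports a string that (a|ab)* accepts and a* | (ab)* does not.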

12 Examples
Let the alphabet be {0, 1}. Find regular expressions for the following languages:
The language of all strings of 0's and 1's that have even length and in which the 0's and 1's alternate
The language consisting of all strings of 0's and 1's with an even number of 1's
The language consisting of all strings of 0's and 1's that do not contain two consecutive 1's
The language that gives all binary numbers written in normal form (that is, without leading zeroes, and the empty string is not allowed)
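Candidate answers to exercises like these can be checked mechanically by comparing each expression with a plain Java predicate on every short binary string. The sketch below is one possible set of answers, not necessarily the ones intended in the lecture; the class BinaryLanguages and all of its members are hypothetical names. Agreement up to a fixed length is evidence that an answer is right, not a proof.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical checker; class and method names are illustrative only.
class BinaryLanguages {
    // Candidate answers using only concatenation, alternation, and star.
    static final String ALTERNATING_EVEN = "(01)*|(10)*";
    static final String EVEN_ONES = "0*(10*10*)*";
    static final String NO_DOUBLE_ONE = "0*(100*)*(1|)";
    static final String NORMAL_FORM = "0|1(0|1)*";

    // Compares the regex with a reference predicate on every binary string
    // of length 0..maxLen.
    static boolean agrees(String regex, Predicate<String> reference, int maxLen) {
        Pattern p = Pattern.compile(regex);
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) {
                    sb.append(((bits >> i) & 1) == 1 ? '1' : '0');
                }
                String s = sb.toString();
                if (p.matcher(s).matches() != reference.test(s)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Even length, and no two neighboring characters are equal.
    static boolean alternatingEven(String s) {
        if (s.length() % 2 != 0) return false;
        for (int i = 0; i + 1 < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i + 1)) return false;
        }
        return true;
    }

    static boolean evenOnes(String s) {
        return s.chars().filter(c -> c == '1').count() % 2 == 0;
    }

    static boolean noDoubleOne(String s) {
        return !s.contains("11");
    }

    // Either exactly "0", or a nonempty string starting with 1.
    static boolean normalForm(String s) {
        return s.equals("0") || s.startsWith("1");
    }

    public static void main(String[] args) {
        System.out.println(agrees(ALTERNATING_EVEN, BinaryLanguages::alternatingEven, 10));
        System.out.println(agrees(EVEN_ONES, BinaryLanguages::evenOnes, 10));
        System.out.println(agrees(NO_DOUBLE_ONE, BinaryLanguages::noDoubleOne, 10));
        System.out.println(agrees(NORMAL_FORM, BinaryLanguages::normalForm, 10));
    }
}
```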

13 Practical notation
Regular expressions are used in some programming languages (notably Perl) and in grep and other find-and-replace tools. The notation is generally extended to make it a little easier, as in the following:
[A-C] means any character in that range, so [A-C] means (A | B | C)
[0-9] means (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
[ABC] means (A | B | C)
ABC means the concatenation of A, B, and C
A dot stands for any single character: A.C could match AxC, A&C, ABC
^ at the start of a bracket expression means NOT, thus [^D-Z] means any character other than D through Z
Repetitions: R? means 0 or 1 repetitions of R, R* means 0 or more repetitions of R, and R+ means 1 or more repetitions of R
Notations vary and have considerable complexity.
Use this notation to describe the regular expression for legal Java identifiers.
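As a sketch of how this notation looks in practice, the following Java class tries out a few of the shorthands and one possible expression for Java identifiers. The identifier expression is a simplification made here, not the lecture's answer: it assumes ASCII letters only (real Java identifiers may contain other Unicode letters) and does not exclude reserved words such as class. The names NotationDemo and isIdentifier are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
class NotationDemo {
    // A letter, underscore, or dollar sign, followed by any number of
    // letters, digits, underscores, or dollar signs (ASCII-only assumption).
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The dot matches any single character (other than a line terminator by default).
        System.out.println(Pattern.matches("A.C", "A&C"));
        // A leading ^ inside brackets negates the character set.
        System.out.println(Pattern.matches("[^D-Z]", "B"));
        System.out.println(Pattern.matches("[^D-Z]", "E"));
        // ? means zero or one, + means one or more.
        System.out.println(Pattern.matches("colou?r", "color"));
        System.out.println(Pattern.matches("[0-9]+", ""));

        for (String s : new String[] {"count1", "_tmp", "$x", "1abc", "a-b"}) {
            System.out.println(s + " -> " + isIdentifier(s));
        }
    }
}
```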

14 Java regular expressions
Java has regular expressions, but they are a little bit annoying to use. They use two classes: Pattern and Matcher. If you want to make a regular expression for, say, ab*(cd)*:
Pattern pattern = Pattern.compile("ab*(cd)*");

15 Using Matcher
With a regular expression compiled, you can apply it to a String to get a matcher:
Matcher matcher = pattern.matcher(text);
You can see if text matches the whole regular expression:
if( matcher.matches() ) System.out.println("It matches!");
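Put together as a complete program, the Pattern and Matcher steps might look like the sketch below. The class name WholeMatchDemo, the helper matchesWhole, and the sample strings are hypothetical; only Pattern.compile, Pattern.matcher, and Matcher.matches come from the slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the slide's Pattern and Matcher calls.
class WholeMatchDemo {
    // Compile once and reuse; compiling is the expensive step.
    static final Pattern PATTERN = Pattern.compile("ab*(cd)*");

    // True only if the entire text matches the pattern.
    static boolean matchesWhole(String text) {
        Matcher matcher = PATTERN.matcher(text);
        return matcher.matches();
    }

    public static void main(String[] args) {
        for (String text : new String[] {"abbcdcd", "a", "abcdx", "cd"}) {
            if (matchesWhole(text)) {
                System.out.println(text + ": It matches!");
            } else {
                System.out.println(text + ": no match");
            }
        }
    }
}
```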

16 More Matcher
And what is more valuable, you can use a Matcher to find each matching substring in text. Note that end() returns the index just past the last matched character.
while( matcher.find() ){
    System.out.println("Found the pattern \"" + matcher.group() + "\" starting at " + matcher.start() + " and ending at " + matcher.end());
}
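A self-contained version of the find loop is sketched below. It collects each match together with its start and end offsets instead of printing directly; the class FindDemo, the method findAll, the "text@start-end" format, and the sample input are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of Matcher.find(); not part of the original slides.
class FindDemo {
    static final Pattern PATTERN = Pattern.compile("ab*(cd)*");

    // Returns every non-overlapping match as "text@start-end", where end is
    // the index just past the last matched character.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return found;
    }

    public static void main(String[] args) {
        for (String match : findAll("xxabbcd a abcdcd")) {
            System.out.println("Found " + match);
        }
    }
}
```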

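17 Escaping
Many characters have special meanings in regular expressions:
\d Any digit
\s Whitespace
^ Beginning of a line (in Java, the beginning of the whole input unless Pattern.MULTILINE is set)
$ End of a line (in Java, the end of the whole input unless Pattern.MULTILINE is set)
What if you want to match a $? Use a backslash to escape the $: \$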
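The sketch below counts matches for a few of these special characters. Because the expressions are written as Java String literals, each backslash in the regular expression appears doubled in the source code. The class EscapeDemo and the method count are hypothetical names; Pattern.MULTILINE is the standard flag that makes ^ and $ work line by line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
class EscapeDemo {
    // Counts the non-overlapping matches of regex in text under the given flags.
    static int count(String regex, int flags, String text) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("\\d", 0, "a1b22"));      // digits
        System.out.println(count("\\s", 0, "a b\tc"));     // whitespace characters
        System.out.println(count("\\$", 0, "$5 and $10")); // literal dollar signs
        // Without MULTILINE, ^ only matches at the start of the input.
        System.out.println(count("^x", 0, "x\nx"));
        System.out.println(count("^x", Pattern.MULTILINE, "x\nx"));
    }
}
```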

18 Escaping the escaping
Unfortunately, a backslash also has a special meaning inside Java String literals. Thus, you have to escape any backslash with another backslash.
What if you want to recognize money, such as $238.00? That is a dollar sign followed by one or more digits, followed by a period, followed by two digits:
Pattern money = Pattern.compile("\\$\\d+\\.\\d\\d");
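The money pattern can be exercised with a few inputs, as in this sketch. The class MoneyDemo and the method isMoney are hypothetical; the pattern itself is the one from the slide.

```java
import java.util.regex.Pattern;

// Hypothetical demo class built around the slide's money pattern.
class MoneyDemo {
    // In the regex itself this is \$\d+\.\d\d; each backslash is doubled
    // because it sits inside a Java String literal.
    static final Pattern MONEY = Pattern.compile("\\$\\d+\\.\\d\\d");

    // True if the whole string is a dollar amount with exactly two decimals.
    static boolean isMoney(String s) {
        return MONEY.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"$238.00", "$5.9", "238.00", "$238x00"}) {
            System.out.println(s + " -> " + isMoney(s));
        }
    }
}
```

The last input shows why the period is escaped: an unescaped dot would accept any character in that position.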

19 More information
For more information on regular expressions in Java, try:
It takes practice to get exactly the regular expression you want.
A great website for testing regular expressions in real time is RegExr:
Note that the expressions on RegExr still need to be escaped before you can use them in Java.

20 Limitations on regular expressions
Regular expressions are equivalent to DFAs: for every regular expression, there is at least one DFA that accepts exactly the same language, and vice versa. A DFA has only finitely many states, so it is impossible to do tasks that depend on distinguishing an unlimited number of states.
Regular expressions cannot:
Count without bound (for example, recognize n a's followed by exactly n b's for every n)
Detect palindromes
Match braces
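The counting limitation can be illustrated with the language of n a's followed by n b's. The closest simple regular expression, a*b*, accepts too much, while a few lines of ordinary code with integer counters (which can grow without bound, unlike DFA states) recognize the language exactly. This is a sketch written for illustration; the class CountingDemo and the method isAnBn are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original slides.
class CountingDemo {
    // Recognizes strings of the form a^n b^n using two counters.
    static boolean isAnBn(String s) {
        int i = 0;
        int as = 0;
        while (i < s.length() && s.charAt(i) == 'a') {
            as++;
            i++;
        }
        int bs = 0;
        while (i < s.length() && s.charAt(i) == 'b') {
            bs++;
            i++;
        }
        // Everything must be consumed and the counts must agree.
        return i == s.length() && as == bs;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aaabbb", "aabbb", "abab"}) {
            System.out.println(s + ": a*b* says " + Pattern.matches("a*b*", s)
                    + ", counter says " + isAnBn(s));
        }
    }
}
```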

21 Context-free grammars
For tasks beyond simple recognition and matching, a more powerful tool is needed. A context-free grammar (CFG) allows counting, recognizing palindromes, and matching braces. The syntax for Java and most other programming languages can be described with a CFG. A parser is a tool built from a CFG to recognize and work with such languages.
Natural (human) languages are even worse, requiring a context-sensitive grammar. Natural languages are hard to work with and break lots of rules.
If you want to know about CFGs and parsers, take Compilers.
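As a small illustration of a parser built from a CFG, the sketch below recognizes balanced braces using the grammar S -> { S } S | ε, with one recursive method for the single nonterminal S. The grammar, the class BraceParser, and its methods are assumptions made for this example and are not taken from the lecture.

```java
// Hypothetical recursive-descent recognizer for the grammar
//   S -> '{' S '}' S | ε
class BraceParser {
    private final String input;
    private int pos = 0; // index of the next unread character

    BraceParser(String input) {
        this.input = input;
    }

    // Tries to derive S starting at pos; advances pos past what it consumed.
    private boolean parseS() {
        if (pos < input.length() && input.charAt(pos) == '{') {
            pos++; // consume '{'
            if (!parseS()) {
                return false;
            }
            if (pos >= input.length() || input.charAt(pos) != '}') {
                return false; // missing the matching '}'
            }
            pos++; // consume '}'
            return parseS();
        }
        return true; // S -> ε
    }

    // True if the whole string can be derived from S.
    static boolean recognizes(String s) {
        BraceParser parser = new BraceParser(s);
        return parser.parseS() && parser.pos == s.length();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"{{}}{}", "{{}", "}{", ""}) {
            System.out.println("\"" + s + "\" -> " + recognizes(s));
        }
    }
}
```

The recursion depth tracks the nesting depth of the braces, which is exactly the unbounded information a DFA cannot keep.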

22 Upcoming

23 Next time… Review everything up to Exam 1

24 Reminders
Finish Assignment 7: due tonight before midnight
Work on Project 4: due next Friday
Review up to Exam 1

