Week 14 - Friday CS221.


Last time
- What did we talk about last time?
- String matching

Questions?

Project 4

Assignment 7

Regular Expressions Student Lecture

Regular Expressions

Regular expressions
- In theoretical CS, a language is a set of strings
- A notation called a regular expression lets us express some languages precisely and compactly
- Given an alphabet, we can define regular expressions recursively:
  - Base: The empty string ε and any individual character in the alphabet are regular expressions
  - Recursion: If r and s are regular expressions, then the following are too:
    - Concatenation: (rs)
    - Alternation: (r | s)
    - Kleene star: (r*)
  - Restriction: Nothing else is a regular expression

Examples
- Let our alphabet = {a, b, c}
- ε is a special symbol that means the empty string
- Write 5 strings that match each of these regular expressions:
  - a | (b | c)* | (ab)*
  - ab*(c | ε)
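One way to check answers to exercises like these is to let Java test membership. This is a minimal sketch, not part of the original slides: the class name RegexExamples and the helper accepts are made up for illustration, and ε is written in Java syntax by making the last symbol optional, so ab*(c | ε) becomes ab*c?.

```java
import java.util.regex.Pattern;

class RegexExamples {
    // True when the WHOLE string s belongs to the language of regex.
    static boolean accepts(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        String first = "a|(b|c)*|(ab)*";
        String second = "ab*c?"; // ab*(c | ε) in Java syntax

        // Hypothetical candidate strings chosen for this sketch.
        String[] candidates = {"", "a", "bcb", "abab", "ac", "abbbc"};
        for (String s : candidates) {
            System.out.println("\"" + s + "\"  first: " + accepts(first, s)
                    + "  second: " + accepts(second, s));
        }
    }
}
```

Pattern.matches requires the entire string to match, which is the sense of "match" used in the theoretical definition.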

Order of precedence
- For the sake of consistency, regular expressions obey a particular order of precedence
  - * is the highest precedence
  - Concatenation is the next highest
  - Alternation is the lowest
- Parentheses can be omitted if there is no ambiguity
- Write (a((bc)*)) with as few parentheses as possible
- Write a | b* c, using parentheses to mark the precedence of each operation
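The precedence rules can be sanity-checked by comparing two regular expressions on every short string over the alphabet. The sketch below is an illustration only: PrecedenceCheck, allStrings, and agree are hypothetical names, and agreement up to a bounded length is evidence of equivalence, not a proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PrecedenceCheck {
    // All strings over the alphabet with length 0..maxLen, shortest first.
    static List<String> allStrings(String alphabet, int maxLen) {
        List<String> out = new ArrayList<>();
        out.add("");
        int start = 0; // index of the first string of the previous length
        for (int len = 1; len <= maxLen; len++) {
            int end = out.size();
            for (int i = start; i < end; i++) {
                for (char c : alphabet.toCharArray()) {
                    out.add(out.get(i) + c);
                }
            }
            start = end;
        }
        return out;
    }

    // True when both regexes accept exactly the same strings up to maxLen.
    static boolean agree(String r1, String r2, String alphabet, int maxLen) {
        Pattern p1 = Pattern.compile(r1);
        Pattern p2 = Pattern.compile(r2);
        for (String s : allStrings(alphabet, maxLen)) {
            if (p1.matcher(s).matches() != p2.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Star binds tighter than concatenation, which binds tighter than |.
        System.out.println(agree("a|b*c", "a|((b*)c)", "abc", 4));
        // Grouping the alternation first gives a different language.
        System.out.println(agree("a|b*c", "(a|b)*c", "abc", 4));
    }
}
```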

Equivalences
- Can you describe (a | b)* with another regular expression?
  - What about (ε | a* | b*)*?
- Given the regular expression: a*b(a | b)*
  - Write 5 strings that belong to it
  - Can you describe the strings accepted by it in words?
- Given the regular expression: a* | (ab)*
  - Which of the following are accepted by it?
    - a
    - b
    - aaaa
    - abba
    - ababab

Examples
- Let the alphabet be {0, 1}
- Find regular expressions for the following languages:
  - The language of all strings of 0's and 1's that have even length and in which the 0's and 1's alternate
  - The language consisting of all strings of 0's and 1's with an even number of 1's
  - The language consisting of all strings of 0's and 1's that do not contain two consecutive 1's
  - The language that gives all binary numbers written in normal form (that is, without leading zeroes, and the empty string is not allowed)
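A proposed answer can be tested against the plain-English definition by brute force. The sketch below does this for the "no two consecutive 1's" language, using (0 | 10)*(1 | ε) as one candidate answer (written (0|10)*1? in Java syntax). The candidate and the names NoDoubleOnes, inLanguage, and firstMismatch are assumptions made for this example, not taken from the slides.

```java
import java.util.regex.Pattern;

class NoDoubleOnes {
    // Candidate answer (an assumption of this sketch): (0 | 10)*(1 | ε).
    static final Pattern CANDIDATE = Pattern.compile("(0|10)*1?");

    // Direct definition of the language: the string never contains "11".
    static boolean inLanguage(String s) {
        return !s.contains("11");
    }

    // Compares the regex with the definition on every binary string of
    // length 0..maxLen. Returns the first disagreement, or null if none.
    static String firstMismatch(int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) {
                    sb.append(((bits >> i) & 1) == 1 ? '1' : '0');
                }
                String s = sb.toString();
                if (CANDIDATE.matcher(s).matches() != inLanguage(s)) {
                    return s;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String bad = firstMismatch(10);
        System.out.println(bad == null ? "No mismatch found" : "Mismatch: " + bad);
    }
}
```

The same harness works for the other three languages by swapping in a different candidate pattern and a different inLanguage definition.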

Practical notation
- Regular expressions are used in some programming languages (notably Perl) and in grep and other find-and-replace tools
- The notation is generally extended to make it a little easier, as in the following:
  - [A-C] means any character in that range, so [A-C] means (A | B | C)
  - [0-9] means (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
  - [ABC] means (A | B | C)
  - ABC means the concatenation of A, B, and C
  - A dot stands for any single character: A.C could match AxC, A&C, ABC
  - ^ at the start of a bracketed set means NOT, thus [^D-Z] means any character except D through Z
- Repetitions:
  - R? means 0 or 1 repetitions of R
  - R* means 0 or more repetitions of R
  - R+ means 1 or more repetitions of R
- Notations vary and have considerable complexity
- Use this notation to describe the regular expression for legal Java identifiers
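As a rough illustration of the identifier exercise, here is one possible answer restricted to ASCII: a letter, underscore, or dollar sign, followed by any number of letters, digits, underscores, or dollar signs. This is a simplification and an assumption of the sketch. Real Java identifiers also allow many Unicode letters (see Character.isJavaIdentifierStart and Character.isJavaIdentifierPart) and exclude reserved words, neither of which is handled here. The class name IdentifierCheck is made up.

```java
import java.util.regex.Pattern;

class IdentifierCheck {
    // ASCII-only approximation. Inside brackets, $ is an ordinary character.
    static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    // True when the whole string looks like an (ASCII) identifier.
    // Reserved words such as "while" are NOT rejected by this sketch.
    static boolean looksLikeIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"x1", "_tmp", "$value", "1x", "a-b", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + looksLikeIdentifier(s));
        }
    }
}
```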

Java regular expressions
- Java has regular expressions, but they are a little bit annoying to use
- They use two classes:
  - Pattern
  - Matcher
- If you want to make a regular expression of, for example, ab*(cd)*:
    Pattern pattern = Pattern.compile("ab*(cd)*");

Using Matcher
- With a regular expression compiled, you can apply it to a String to get a matcher:
    Matcher matcher = pattern.matcher(text);
- You can see if text matches the whole regular expression:
    if( matcher.matches() )
        System.out.println("It matches!");

More Matcher
- And what is more valuable, you can use a Matcher to find each match of the pattern in text:
    while( matcher.find() ){
        System.out.println("Found the pattern \"" + matcher.group() + "\" starting at " + matcher.start() + " and ending at " + matcher.end());
    }
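The Pattern and Matcher fragments from the last three slides can be assembled into one complete program. This is a sketch under stated assumptions: the class name FindAllDemo, the helper findAll, and the sample text are invented for illustration. Only the pattern ab*(cd)* comes from the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Returns every non-overlapping match of regex in text, left to right.
    static List<String> findAll(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "xxabbcdcd--a--abcd"; // hypothetical sample input
        Matcher matcher = Pattern.compile("ab*(cd)*").matcher(text);

        // matches() asks whether the ENTIRE text fits the pattern.
        System.out.println("Whole text matches: " + matcher.matches());

        // reset() rewinds the matcher so find() starts from the beginning.
        matcher.reset();
        while (matcher.find()) {
            System.out.println("Found \"" + matcher.group() + "\" starting at "
                    + matcher.start() + " and ending at " + matcher.end());
        }
    }
}
```

Note that matcher.end() is the index just past the last matched character, so text.substring(matcher.start(), matcher.end()) gives the same string as matcher.group().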

Escaping
- Many characters have special meanings in regular expressions:
  - \d  Any digit
  - \s  Whitespace
  - ^   Beginning of a line
  - $   End of a line
- What if you want to match a $?
  - Use a backslash to escape the $: \$

Escaping the escaping
- Unfortunately, a backslash also has a special meaning inside Java String literals
- Thus, you have to escape any backslash with another backslash
- What if you want to recognize money: $238.00
  - A dollar sign followed by one or more digits followed by a period followed by two digits
    Pattern money = Pattern.compile("\\$\\d+\\.\\d\\d");
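To see the double escaping at work, the money pattern can be wrapped in a small test program. The pattern string is the one from the slide. The class name MoneyDemo, the helper isMoney, and the sample inputs are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

class MoneyDemo {
    // Java source "\\$\\d+\\.\\d\\d" is the regex \$\d+\.\d\d :
    // a literal $, one or more digits, a literal period, two digits.
    static final Pattern MONEY = Pattern.compile("\\$\\d+\\.\\d\\d");

    // True when the whole string is a dollar amount in that format.
    static boolean isMoney(String s) {
        return MONEY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"$238.00", "238.00", "$238.0", "$.50", "$5.999"};
        for (String s : samples) {
            System.out.println(s + " -> " + isMoney(s));
        }
    }
}
```

Without the backslash before the period, the dot would match any character, so a string like $238x00 would also be accepted.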

More information
- For more information on regular expressions in Java, try:
  https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
- It takes practice to get exactly the regular expression you want
- A great website for testing regular expressions in real time is RegExr:
  https://regexr.com/
- Note that the expressions on RegExr still need to be escaped before you can use them in Java

Limitations on regular expressions
- Regular expressions are equivalent to DFAs
  - For every regular expression, there is at least one DFA that accepts exactly the same language, and vice versa
- DFAs have only a finite number of states
  - It's impossible to do tasks that depend on distinguishing an unlimited number of states
- Regular expressions cannot:
  - Count without bound (for example, check that a string has equal numbers of a's and b's)
  - Detect palindromes
  - Match braces

Context-free grammars
- For tasks beyond simple recognition and matching, a more powerful tool is needed
- A context-free grammar (CFG) allows counting, recognizing palindromes, and matching braces
- The syntax for Java and most other programming languages can be described with a CFG
- A parser is a tool built from a CFG to recognize and work with such languages
- Natural (human) languages are even worse, requiring a context-sensitive grammar
  - Natural languages are hard to work with and break lots of rules
- If you want to know about CFGs and parsers, take Compilers
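To make the idea of a parser concrete, here is a minimal sketch of a recognizer for balanced braces, written by hand from the grammar S → { S } S | ε. Both the grammar and the class name BraceParser are assumptions chosen for this example, not taken from the slides. The recognizer accepts only strings made of braces, and very deep nesting could exhaust the call stack.

```java
class BraceParser {
    private final String input;
    private int pos = 0; // index of the next unread character

    private BraceParser(String input) {
        this.input = input;
    }

    // Implements the rule S -> '{' S '}' S | ε.
    private boolean parseS() {
        if (pos < input.length() && input.charAt(pos) == '{') {
            pos++; // consume '{'
            if (!parseS()) {
                return false; // nested S failed
            }
            if (pos >= input.length() || input.charAt(pos) != '}') {
                return false; // missing the matching '}'
            }
            pos++; // consume '}'
            return parseS(); // whatever follows the closed pair
        }
        return true; // ε production: match nothing
    }

    // True when s consists entirely of properly nested braces.
    static boolean isBalanced(String s) {
        BraceParser parser = new BraceParser(s);
        return parser.parseS() && parser.pos == s.length();
    }

    public static void main(String[] args) {
        String[] samples = {"", "{}", "{{}}{}", "{{}", "}{"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isBalanced(s));
        }
    }
}
```

The recursion is what supplies the unbounded memory a DFA lacks: each open brace waiting for its partner is remembered on the call stack.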

Upcoming

Next time…
- Review everything up to Exam 1

Reminders
- Finish Assignment 7
  - Due tonight before midnight
- Work on Project 4
  - Due next Friday
- Review up to Exam 1