Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.

Slides:



Advertisements
Similar presentations
CSE 311 Foundations of Computing I
Advertisements

4b Lexical analysis Finite Automata
CSC 361NFA vs. DFA1. CSC 361NFA vs. DFA2 NFAs vs. DFAs NFAs can be constructed from DFAs using transitions: Called NFA- Suppose M 1 accepts L 1, M 2 accepts.
Regular Expressions and DFAs COP 3402 (Summer 2014)
Finite Automata CPSC 388 Ellen Walker Hiram College.
CSE 311: Foundations of Computing Fall 2013 Lecture 23: Finite state machines and minimization.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 2 Mälardalen University 2005.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Compiler Construction
Introduction to Computability Theory
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
Courtesy Costas Busch - RPI1 Non Deterministic Automata.
Fall 2006Costas Busch - RPI1 Regular Expressions.
1 Single Final State for NFAs and DFAs. 2 Observation Any Finite Automaton (NFA or DFA) can be converted to an equivalent NFA with a single final state.
Fall 2006Costas Busch - RPI1 Non-Deterministic Finite Automata.
FSA Lecture 1 Finite State Machines. Creating a Automaton  Given a language L over an alphabet , design a deterministic finite automaton (DFA) M such.
1 Non-Deterministic Automata Regular Expressions.
CS Chapter 2. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
Costas Busch - LSU1 Non-Deterministic Finite Automata.
FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY
CSE 311: Foundations of Computing Fall 2014 Lecture 23: State Minimization, NFAs.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
REGULAR LANGUAGES.
1 Regular Expressions. 2 Regular expressions describe regular languages Example: describes the language.
Automating Construction of Lexers. Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code.
Lecture 05: Theory of Automata:08 Kleene’s Theorem and NFA.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
Regular Expressions and Languages A regular expression is a notation to represent languages, i.e. a set of strings, where the set is either finite or contains.
Lexer input and Output i d 3 = 0 LF w i d 3 = 0 LF w id3 = 0 while ( id3 < 10 ) id3 = 0 while ( id3 < 10 ) lexer Stream of Char-s ( lazy List[Char] ) class.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CMSC 330: Organization of Programming Languages Theory of Regular Expressions Finite Automata.
Exercise 1 Consider a language with the following tokens and token classes: ID ::= letter (letter|digit)* LT ::= " " shiftL ::= " >" dot ::= "." LP ::=
Exercise Solution for Exercise (a) {1,2} {3,4} a b {6} a {5,6,1} {6,2} {4} {3} {5,6} { } b a b a a b b a a b a,b b b a.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
UNIT - I Formal Language and Regular Expressions: Languages Definition regular expressions Regular sets identity rules. Finite Automata: DFA NFA NFA with.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2007.
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
using Deterministic Finite Automata & Nondeterministic Finite Automata
1 Topic 2: Lexing and Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
1 Language Recognition (11.4) Longin Jan Latecki Temple University Based on slides by Costas Busch from the courseCostas Busch
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen.
Exercise: (aa)* | (aaa)* Construct automaton and eliminate epsilons.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
1 Lexical Analysis Uses formalism of Regular Languages Uses formalism of Regular Languages Regular Expressions Regular Expressions Deterministic Finite.
1 Section 11.2 Finite Automata Can a machine(i.e., algorithm) recognize a regular language? Yes! Deterministic Finite Automata A deterministic finite automaton.
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
COMP3190: Principle of Programming Languages DFA and its equivalent, scanner.
1 Finite Automata. 2 Introductory Example An automaton that accepts all legal Pascal identifiers: Letter Digit Letter or Digit "yes" "no" 2.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Finite automate.
Lexical analysis Finite Automata
Non Deterministic Automata
Chapter 2 FINITE AUTOMATA.
Non-Deterministic Finite Automata
Non Deterministic Automata
CSE 311: Foundations of Computing
4b Lexical analysis Finite Automata
4b Lexical analysis Finite Automata
Instructor: Aaron Roth
CSE 311 Foundations of Computing I
Chapter 1 Regular Language
Lecture 5 Scanning.
Presentation transcript:

Automating Construction of Lexers

Example in javacc TOKEN: { ( | | "_")* > | ( )* > | } SKIP: { " " | "\n" | "\t" } --> get automatically generated code for lexer! But how does javacc do it?

A Recap: Simple RE to Programs Regular Expression a r1 r2 (r1|r2) r* Code if (current=a) next else error (code for r1) ; (code for r2) if (current in first(r1)) code for r1 else code for r2 while(current in first(r)) code for r

Regular Expression to Programs How can we write a lexer for (a*b | a) ? aaaab Vs aaaaa Regular Expression Finite state machine (FSA) Program

Finite Automaton (Finite State Machine) A = ( , Q, q 0, , F)  - alphabet Q - states (nodes in the graph) q 0 - initial state (with ‘->' sign in drawing)  - transitions (labeled edges in the graph) F - final states (double circles)

Numbers with Decimal Point digit digit*. digit digit* What if the decimal part is optional?

Automata Tutor A website for learning automata We have posted some exercises for you to try. Create an account for yourself Register to the course –Course Id: 23EPFL-CL –Password: GHL2AQ3I

Exercise Design a DFA which accepts all strings in {a, b}* that has an even length

Exercise Construct a DFA that recognizes all strings over {a, b} that contain "aba" as a substring

Exercise

Solution: the product automaton (0,0) (1,0) (1,1) (0,2) (1,3) (0,3) (0,1) (1,2)

Exercise Design a DFA which accepts all the numbers written in binary and divisible by 2. For example, your automaton should accept the words 0, 10, 100, 110…

Exercise Design a DFA which accepts all the numbers written in binary and divisible by 3. For example your automaton should accept the words 0, 11, 110, 1001, 1100 … Can you prove that the automaton accepts language ? Can you generalize this to any divisor ‘n’ and any base ‘b’ ? –Answers are in the next lecture slides

Kinds of Finite State Automata

Undefined Transitions Undefined transitions lead to a sink state from where no input can be accepted

Epsilon Transitions Epsilon transitions: traversing them does not consume anything (empty word) More generally, transitions labeled by a word: traversing such transition consumes that entire word at a time

Interpretation of Non-Determinism For a given word (string), a path in automaton lead to accepting, another to a rejecting state Does the automaton accept in such case? –yes, if there exists an accepting path in the automaton graph whose symbols give that word

Exercise Construct a NFA that recognizes all strings over {a,b} that contain "aba" as a substring

NFA Vs DFA For every NFA there exists an equivalent DFA that accepts the same set of strings But, NFAs could be exponentially smaller. That is, there are NFAs such that every DFA equivalent to it has exponentially more number of states

Exercise

Solution: DFA –Can you prove that every DFA for this language will have exponentially more states than the NFA ? –Hints: Why is every intermediate state necessary ? –Can you minimize the DFA any further ?

Regular Expressions and Automata Theorem: If L is a set of words, it is describable by a regular expression iff (if and only if) it is the set of words accepted by some finite automaton. Algorithms: regular expression  automaton (important!) automaton  regular expression (cool)

Recursive Constructions Union Concatenation

Recursive Constructions Star

Exercise: (aa)* | (aaa)* Construct an NFA for the regular expression

NFAs to DFAs (Determinisation) keep track of a set of all possible states in which the automaton could be view this finite set as one state of new automaton

NFA to DFA Conversion

{0,5,12,1,6} {2,7,3, 8} a {4,1,9, 10} a {11,6, 2,3} a {4,1,7, 8} {9,10, 2,3} {4,1,11,6} a a a a

NFA to DFA Example {0,5,1 2,1,6} {2,7,3, 8} a {4,1,9, 10} a {11,6, 2,3} a {4,1,7, 8} {9,10, 2,3} {4,1,11,6} a a a a

Remark: Relations and Functions Relation r  B x C r = {..., (b,c1), (b,c2),... } Corresponding function: f : B -> 2 C f = {... (b,{c1,c2})... } f(b) = { c | (b,c)  r } Given a state, next-state function returns the set of new states –for deterministic automaton, the set has exactly 1 element

Clarifications

Running NFA (without epsilons) in Scala def  (q : State, a : Char) : Set[States] = {... } def  '(S : Set[States], a : Char) : Set[States] = { for (q1 <- S, q2 <-  (q1,a)) yield q2 } def accepts(input : MyStream[Char]) : Boolean = { var S : Set[State] = Set(q0) // current set of states while (!input.EOF) { val a = input.current S =  '(S,a) // next set of states } !(S.intersect(finalStates).isEmpty) }

Running NFA in Scala

Minimizing DFAs

Minimizing DFAs: Procedure Maintain a partition A of states Every set in the partition has a different behavior i.e, they have a distinguishing string States within a partition may or may not be equivalent Initially, we have (F, Q - F)

Minimizing DFAs: Procedure [Cont.] Pick any partition P, choose some alphabet ‘a’. Split every partition (including P) by separating the states that has a transition to a state in P on ‘a’, and those that do not. Repeat until no partition can be split. That is, no choice of P and ‘a’ will split any partition

Minimizing DFAs: Procedure A: {0,2,3,4,6} {1,5} split based on {0,2,3,4,6} –A: {0,4,6} {2,3} {1,5} split based on {2,3} –A: {0,4,6} {2,3} {1} {5} split based on {1} –A: {0,6} {4} {2,3} {1} {5} split based on {4} –A: {0,6} {4} {2} {3} {1} {5}

Minimizing DFAs: Procedure The minimal DFA is unique (up to isomorphism) Implication of Myhill-Nerode theorem Food For Thought: Can we minimize NFA ?

Properties of Automatons

Exercise Design a DFA which accepts all the numbers written in binary and divisible by 6. For example your automaton should accept the words 0, 110 (6 decimal) and (18 decimal). –You can construct the product of the following automatons that accept numbers divisible by 2 and 3

Solution: Product Automaton (0,0) (1,1) (1,0) (0,2) (0,1) (1,2)

Exercise: first, nullable For each of the following languages find the first set. Determine if the language is nullable. –(a|b)* (b|d) ((c|a|d)* | a*) Answer: –First = { a, b, d } – not nullabe, the minimal strings belonging to the regex are ‘b’ and ‘d’