Announcements - P1 part 1 due Today - P1 part 2 due on Friday Feb 1st

Presentation transcript:

Announcements
- P1 part 1 due today
- P1 part 2 due on Friday, Feb 1st
- H1 posted. Due Feb 5th.

Nondeterministic Finite Automata CS 536

Previous Lecture
- Scanner: converts a sequence of characters to a sequence of tokens
- Scanner and parser: master-slave relationship
- Scanner implemented using FSMs
- FSM: DFA or NFA

This Lecture
- NFAs from a formal perspective
- Theorem: NFAs and DFAs are equivalent
- Regular languages and regular expressions

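NFAs, formally
M ≡ (Q, Σ, δ, q, F)
- Q: finite set of states
- Σ: the alphabet (characters)
- δ: transition function, δ : Q × Σ → 2^Q
- q: start state
- F: final states
[Transition table: s1 has entries {s1} and {s1, s2}; no entries survive for s2]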

NFA
To check whether a string is in L(M) of an NFA M, simulate the set of choices it could make.
[Figure: tree of possible runs on input 1 1 1, with nodes labeled s1, s2, and st]
The string is accepted if at least one sequence of transitions:
- Consumes all input (without getting stuck)
- Ends in one of the final states
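The simulation described above can be sketched in Python by tracking the set of states the NFA could be in after each character. The transition table here is an assumption made for illustration (s1 loops on 0 and 1 and may also move to s2 on 1; s2 is the only final state and has no outgoing edges); the names `DELTA`, `START`, `FINAL`, and `nfa_accepts` are hypothetical.

```python
# Hypothetical NFA over the alphabet {0, 1}; the table is an assumption,
# not necessarily the exact machine drawn on the slide.
DELTA = {
    ("s1", "0"): {"s1"},
    ("s1", "1"): {"s1", "s2"},
}
START = "s1"
FINAL = {"s2"}


def nfa_accepts(word: str) -> bool:
    """Return True if at least one run consumes all input and ends in a final state."""
    current = {START}  # every state the NFA could be in so far
    for ch in word:
        nxt = set()
        for state in current:
            # A missing entry means that particular run gets stuck and is dropped.
            nxt |= DELTA.get((state, ch), set())
        current = nxt
    return bool(current & FINAL)
```

Tracking a set of states instead of exploring each run separately avoids enumerating every branch of the choice tree.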

NFA and DFA are Equivalent
Two automata M and M’ are equivalent iff L(M) = L(M’)
Lemmas to be proven:
- Lemma 1: Given a DFA M, one can construct an NFA M’ that recognizes the same language as M, i.e., L(M’) = L(M)
- Lemma 2: Given an NFA M, one can construct a DFA M’ that recognizes the same language as M, i.e., L(M’) = L(M)

Proving lemma 2
Lemma 2: Given an NFA M, one can construct a DFA M’ that recognizes the same language as M, i.e., L(M’) = L(M)
Idea: we can only be in finitely many subsets of states at any one time
- 2^|Q| possible combinations of states
- Why?

Why 2^|Q| states?
Build a DFA that tracks the set of states the NFA is in!
A B C
0 0 0 = {}
0 0 1 = {C}
0 1 0 = {B}
0 1 1 = {B,C}
1 0 0 = {A}
1 0 1 = {A,C}
1 1 0 = {A,B}
1 1 1 = {A,B,C}
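The counting argument in the table can be checked with a short Python sketch: each subset corresponds to one assignment of a 0/1 bit per state, so there are 2^|Q| of them. The function name `all_subsets` is hypothetical.

```python
from itertools import product


def all_subsets(states):
    """Enumerate every subset of `states`, one per bit vector, in table order."""
    subsets = []
    for bits in product([0, 1], repeat=len(states)):
        # A bit of 1 means the state at that position is in the subset.
        subsets.append(frozenset(s for s, b in zip(states, bits) if b))
    return subsets
```

For `["A", "B", "C"]` this yields 8 subsets, in the same order as the rows of the table.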

Defn: let succ(s,c) be the set of choices the NFA could make in state s with character c
[Figure: NFA with states A, B, C, D and edges labeled x and x,y]
succ(A,x) = {A,B}
succ(A,y) = {A}
succ(B,x) = {C}
succ(B,y) = {C}
succ(C,x) = {D}
succ(C,y) = {D}

Build new DFA M’ where Q’ = 2^Q
[Figure: NFA with states A, B, C, D and edges labeled x and x,y]
succ(A,x) = {A,B}
succ(A,y) = {A}
succ(B,x) = {C}
succ(B,y) = {C}
succ(C,x) = {D}
succ(C,y) = {D}
To build the DFA: add an edge from state S on character c to state S’ if S’ is the union of the states that the states in S could transition to on input c
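The edge rule above can be sketched in Python using the succ table from the slide. Two details are assumptions, since the transcript does not state them: A is taken as the start state and D as the only accepting state. The sketch also builds only the subsets reachable from the start state, not all 2^|Q| of them. All names here (`SUCC`, `subset_construction`, `dfa_accepts`) are hypothetical.

```python
from collections import deque

# succ(s, c) as listed on the slide; D has no outgoing edges.
SUCC = {
    ("A", "x"): {"A", "B"},
    ("A", "y"): {"A"},
    ("B", "x"): {"C"},
    ("B", "y"): {"C"},
    ("C", "x"): {"D"},
    ("C", "y"): {"D"},
}
ALPHABET = ["x", "y"]
START = "A"        # assumption: not stated in the transcript
ACCEPTING = {"D"}  # assumption: not stated in the transcript


def subset_construction():
    """Build the reachable part of the DFA whose states are sets of NFA states."""
    start = frozenset({START})
    dfa_delta = {}
    seen = {start}
    queue = deque([start])
    while queue:
        S = queue.popleft()
        for c in ALPHABET:
            # S' is the union of succ(s, c) over all s in S.
            target = frozenset().union(*(SUCC.get((s, c), set()) for s in S))
            dfa_delta[(S, c)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    accepting = {S for S in seen if S & ACCEPTING}
    return start, dfa_delta, accepting


def dfa_accepts(word: str) -> bool:
    """Run the constructed DFA on `word`."""
    state, dfa_delta, accepting = subset_construction()
    for ch in word:
        state = dfa_delta[(state, ch)]
    return state in accepting
```

Under these assumptions the NFA accepts strings whose third-from-last character is x, and the reachable DFA has 8 states out of the 16 possible subsets.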

ɛ-transitions
- Useful for taking the union of two FSMs
- E.g.: x^n, where n is even or divisible by 3
- In the example, the left side accepts even n; the right side accepts n divisible by 3
[Figure: NFA with states A, B, C, D, E, F and edges labeled x]

Eliminating ɛ-transitions
We want to construct an ɛ-free FSM M’ that is equivalent to M
Definition: eclose(s) = set of all states reachable from s in zero or more epsilon transitions
M’ components:
- s is an accepting state of M’ iff eclose(s) contains an accepting state
- s --c--> t is a transition in M’ iff q --c--> t for some q in eclose(s)

Eliminating ɛ-transitions
We want to construct an ɛ-free NFA M’ that is equivalent to M
Definition (epsilon closure): eclose(s) = set of all states reachable from s using zero or more epsilon transitions
s   eclose(s)
A   {A, B, D}
B   {B}
C   {C}
D   {D}
E   {E}
F   {F}

Def: eclose(s) = set of all states reachable from s in zero or more epsilon transitions
- s is an accepting state of M’ iff eclose(s) contains an accepting state
- s --c--> t is a transition in M’ iff q --c--> t for some q in eclose(s)
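The two rules above can be sketched in Python. The machine is an assumed reading of the x^n example: epsilon edges from A to B and to D (consistent with eclose(A) = {A, B, D}), a two-state loop B, C for even n, and a three-state loop D, E, F for n divisible by 3, with B and D accepting. The exact edges are not recoverable from the transcript, so treat `EPS`, `STEP`, and `ACCEPTING` as hypothetical.

```python
STATES = ["A", "B", "C", "D", "E", "F"]
EPS = {"A": {"B", "D"}}  # assumed epsilon edges
STEP = {                 # assumed edges on character x
    ("B", "x"): {"C"}, ("C", "x"): {"B"},
    ("D", "x"): {"E"}, ("E", "x"): {"F"}, ("F", "x"): {"D"},
}
ACCEPTING = {"B", "D"}   # assumed accepting states


def eclose(s):
    """States reachable from s using zero or more epsilon transitions."""
    closure = {s}
    stack = [s]
    while stack:
        q = stack.pop()
        for t in EPS.get(q, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def eliminate_epsilons():
    """Return (transitions, accepting states) of the epsilon-free machine M'."""
    new_step = {}
    new_accepting = set()
    for s in STATES:
        closure = eclose(s)
        # s accepts in M' iff eclose(s) contains an accepting state.
        if closure & ACCEPTING:
            new_accepting.add(s)
        # s --c--> t in M' iff q --c--> t for some q in eclose(s).
        for (q, c), targets in STEP.items():
            if q in closure:
                new_step.setdefault((s, c), set()).update(targets)
    return new_step, new_accepting
```

With these assumed edges, A gains the transitions of B and D on x and becomes accepting, which matches x^0 being in the language.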

Recap
- NFAs and DFAs are equally powerful
  - any language definable as an NFA is definable as a DFA
- ɛ-transitions do not add expressiveness to NFAs
  - we showed a simple algorithm to remove epsilons

Regular Languages and Regular Expressions

Regular Language
Any language recognized by an FSM is a regular language
Examples:
- Single-line comments beginning with //
- Integer literals
- {ε, ab, abab, ababab, abababab, ...}
- C/C++ identifiers

Regular expressions
Pattern describing a language
- operands: single characters, epsilon
- operators, from low to high precedence:
  - alternation ("or"): a | b
  - catenation: a.b, ab, a^3 (which is aaa)
  - iteration: a* (0 or more a's), aka Kleene star

Why do we need them?
- Each token in a programming language can be defined by a regular language
- Regular expressions are inputs to a scanner generator
- Scanner-generator input: one regular expression for each token to be recognized by the scanner

Regexp, cont’d
Conventions:
- a+ is aa*
- letter is a|b|c|d|…|y|z|A|B|…|Z
- digit is 0|1|2|…|9
- not(x) is all characters except x
- . is any character
- parentheses for grouping, e.g., (ab)* matches ɛ, ab, abab, ababab, ...

Regexp, example
Hex strings:
- start with 0x or 0X
- followed by one or more hexadecimal digits
- optionally end with l or L
0(x|X)hexdigit+(L|l|ɛ)
where hexdigit = digit|a|b|c|d|e|f|A|…|F
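As a rough check of the pattern above, here is the same expression written in Python's `re` syntax, where character classes stand in for `hexdigit` and `?` stands in for the `|ɛ` alternative. The names `HEX_LITERAL` and `is_hex_literal` are hypothetical.

```python
import re

# 0(x|X)hexdigit+(L|l|ɛ) in Python regex syntax.
HEX_LITERAL = re.compile(r"0[xX][0-9a-fA-F]+[Ll]?")


def is_hex_literal(text: str) -> bool:
    """True if the whole string is a hex literal under the slide's definition."""
    return HEX_LITERAL.fullmatch(text) is not None
```

`fullmatch` is used so that the entire string must match, which mirrors how a token pattern describes a complete lexeme.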

Regexp, example
Single-line comments in Java/C/C++
//(not('\n'))*'\n'
Example: // this is a comment
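The comment pattern translates to Python's `re` syntax with a negated character class in place of `not('\n')`. This is an illustrative sketch; `LINE_COMMENT` and `is_line_comment` are hypothetical names.

```python
import re

# //(not('\n'))*'\n' in Python regex syntax: two slashes, any non-newline
# characters, then the terminating newline.
LINE_COMMENT = re.compile(r"//[^\n]*\n")


def is_line_comment(text: str) -> bool:
    """True if the whole string is one single-line comment, newline included."""
    return LINE_COMMENT.fullmatch(text) is not None
```

Note that, as written on the slide, the pattern requires the trailing newline, so a comment on the last line of a file with no final newline would not match.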

Regexp, example
C/C++ identifiers: sequence of letters/digits/underscores; cannot begin with a digit; cannot end with an underscore
Example: a, _bbb7, cs_536
Regular expression:
letter | (letter|_)(letter|digit|_)*(letter|digit)
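The identifier pattern can be tried out in Python's `re` syntax, following the slide's rules (including the rule that an identifier cannot end with an underscore). `IDENTIFIER` and `is_identifier` are hypothetical names, and `letter` is assumed to mean ASCII letters only.

```python
import re

# letter | (letter|_)(letter|digit|_)*(letter|digit) in Python regex syntax.
IDENTIFIER = re.compile(r"[A-Za-z]|[A-Za-z_][A-Za-z0-9_]*[A-Za-z0-9]")


def is_identifier(text: str) -> bool:
    """True if the whole string is an identifier under the slide's definition."""
    return IDENTIFIER.fullmatch(text) is not None
```

The first alternative covers one-letter names; the second needs at least two characters, which is why a lone underscore is rejected.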

Recap
- Regular Languages
  - Languages recognized/defined by FSMs
- Regular Expressions
  - Single-pattern representations of regular languages
  - Used for defining tokens in a scanner generator

Creating a Scanner
Scanner Generator:
- This lecture: token to Regexp
- Next lecture: Regexp to NFA
- This lecture: NFA to DFA
- Last lecture: DFA to code
Output: Scanner