Two issues in lexical analysis

Two issues in lexical analysis: (1) specifying tokens (with regular expressions); (2) identifying tokens specified by regular expressions.

How to recognize tokens specified by regular expressions? A recognizer for a language is a program that takes a string x as input and answers “yes” if x is a sentence of the language and “no” otherwise. In the context of lexical analysis, given a string and a regular expression, a recognizer for the language specified by the regular expression answers “yes” if the string is in the language and “no” otherwise. A regular expression can be compiled into a recognizer automatically by constructing a finite automaton, which can be deterministic or non-deterministic.
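
The yes/no behaviour of a recognizer can be sketched in a few lines of Python. This is only an illustration of the interface: the function name `recognizes` is made up for this sketch, the pattern (a|b)*abb is borrowed from a later example in these slides, and Python's `re` module is a backtracking matcher rather than the automaton-based recognizer the slides go on to build.

```python
import re


def recognizes(pattern: str, x: str) -> bool:
    """Answer "yes" (True) if x is in the language of pattern, "no" (False) otherwise."""
    # fullmatch requires the whole string to match, not just a prefix of it.
    return re.fullmatch(pattern, x) is not None


print(recognizes(r"(a|b)*abb", "ababb"))  # True
print(recognizes(r"(a|b)*abb", "abab"))   # False
```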

Non-deterministic finite automata (NFA). A non-deterministic finite automaton (NFA) is a mathematical model (a 5-tuple) that consists of: a set of states Q; a set of input symbols Σ (the input alphabet); a transition function that maps state-symbol pairs to sets of states; a state q0 that is distinguished as the start (initial) state; and a set of states F distinguished as accepting (final) states. An NFA accepts an input string x if and only if there is some path in the transition graph from the start state to some accepting state whose edge labels spell out x. Show an NFA example (page 116, Figure 3.21).

An NFA is non-deterministic in that (1) the same character can label two or more transitions out of one state, and (2) the empty string can label transitions. For example, here is an NFA that recognizes the language ???. An NFA can easily be implemented using a transition table:
State   a        b
0       {0, 1}   {0}
1       -        {2}
2       -        {3}
[Figure: transition graph of the NFA, with edges labeled a and b]
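
One way to hold such a transition table in a program is a dictionary of dictionaries. The sketch below encodes the table from this slide; the names `NFA_TABLE` and `move` are chosen for this example, and a missing entry stands for the "-" (no transition) in the table.

```python
# Transition table from the slide: outer keys are states, inner keys are input symbols.
# A missing entry (the "-" in the table) means there is no transition, i.e. the empty set.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}


def move(states, symbol):
    """Return the set of states reachable from any state in `states` on one `symbol` edge."""
    result = set()
    for s in states:
        result |= NFA_TABLE.get(s, {}).get(symbol, set())
    return result


print(move({0}, "a"))     # {0, 1}
print(move({0, 1}, "b"))  # {0, 2}
```

Because an entry is a set of states, the table can express the first kind of non-determinism directly: state 0 has two possible successors on input a.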

The algorithm that recognizes the language accepted by an NFA.
Input: an NFA (transition table) and a string x (terminated by eof).
Output: “yes” if accepted, “no” otherwise.
    S := e-closure({s0});
    a := nextchar;
    while a != eof do begin
        S := e-closure(move(S, a));
        a := nextchar;
    end
    if (intersect(S, F) != empty) then return “yes”
    else return “no”
Note: e-closure(S) is the set of states that can be reached from states in S through transitions labeled by the empty string (including the states in S themselves). move(S, a) is the set of states that can be reached from some state in S by one transition labeled a.
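
The pseudocode above can be sketched in Python as follows. Several details are assumptions of this sketch rather than part of the slides: the dictionary encoding of the transition table, the use of `None` as the label for empty-string transitions, the end of the Python string standing in for eof, and the choice of {3} as the accepting set for the example table.

```python
EPS = None  # label used in this sketch for empty-string transitions


def e_closure(states, table):
    """States reachable from `states` using only empty-string transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in table.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol, table):
    """States reachable from `states` by exactly one transition labeled `symbol`."""
    result = set()
    for s in states:
        result |= table.get(s, {}).get(symbol, set())
    return result


def nfa_accepts(table, start, accepting, x):
    """Simulate the NFA on x while tracking the set of currently possible states."""
    current = e_closure({start}, table)
    for a in x:  # running off the end of the string plays the role of eof
        current = e_closure(move(current, a, table), table)
    return bool(current & accepting)


# Transition table from the earlier slide; the accepting set {3} is assumed here.
TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}

print(nfa_accepts(TABLE, 0, {3}, "ababb"))  # True
print(nfa_accepts(TABLE, 0, {3}, "abab"))   # False
```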

Example: recognizing ababb with the previous NFA. Example 2: use the example in Fig. 3.27 for recognizing ababb. Space complexity O(|S|), time complexity O(|S|^2 |x|)?

Construct an NFA from a regular expression.
Input: a regular expression r over an alphabet Σ.
Output: an NFA N accepting L(r).
Algorithm (3.3, page 122):
For ε, construct the NFA: [figure: NFA for ε]
For a in Σ, construct the NFA: [figure: NFA with one transition labeled a]
Let N(s) and N(t) be NFAs for regular expressions s and t:
For s|t, construct the NFA N(s|t): [figure built from N(s) and N(t)]
For st, construct the NFA N(st): [figure built from N(s) and N(t)]
For s*, construct the NFA N(s*): [figure built from N(s)]
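
Since the figures are not reproduced in this transcript, here is a minimal Python sketch of the same style of construction, assuming the usual formulation in which every fragment has one start state and one accepting state. The names `Frag`, `symbol`, `union`, `concat`, `star` and `accepts` are invented for this sketch. One deliberate simplification: `concat` links N(s) to N(t) with an empty-string edge instead of merging the two states, which accepts the same language.

```python
import itertools

EPS = None                 # label used in this sketch for empty-string edges
_ids = itertools.count()   # source of fresh state numbers


class Frag:
    """NFA fragment with a single start state and a single accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges  # edges: (src, label, dst)


def symbol(a):
    """N(a) for a symbol a; symbol(EPS) gives the NFA for the empty string."""
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])


def union(n1, n2):
    """N(s|t): new start and accept states joined to both fragments by empty edges."""
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, n1.start), (s, EPS, n2.start),
             (n1.accept, EPS, f), (n2.accept, EPS, f)]
    return Frag(s, f, n1.edges + n2.edges + extra)


def concat(n1, n2):
    """N(st): run n1, then n2 (linked here by an empty edge rather than merged states)."""
    return Frag(n1.start, n2.accept, n1.edges + n2.edges + [(n1.accept, EPS, n2.start)])


def star(n):
    """N(s*): allow skipping n entirely or repeating it any number of times."""
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, n.start), (s, EPS, f),
             (n.accept, EPS, n.start), (n.accept, EPS, f)]
    return Frag(s, f, n.edges + extra)


def accepts(nfa, x):
    """Check the construction by simulating the fragment on x."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in nfa.edges:
                if src == q and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({nfa.start})
    for a in x:
        current = closure({dst for src, label, dst in nfa.edges
                           if src in current and label == a})
    return nfa.accept in current


# r = (a|b)*abb, built bottom-up from its subexpressions.
r = concat(concat(concat(star(union(symbol("a"), symbol("b"))),
                         symbol("a")), symbol("b")), symbol("b"))
print(accepts(r, "ababb"))  # True
print(accepts(r, "ab"))     # False
```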

Example: r = (a|b)*abb. Example: use Algorithm 3.3 to construct N(r) for r = (ab | a)*b* | b.

Using an NFA, we can recognize a token in O(|S|^2 |x|) time. We can improve the time complexity by using a deterministic finite automaton (DFA) instead of an NFA. An NFA is deterministic (a DFA) if (1) it has no transitions on the empty string, and (2) for each state s and input symbol a, there is at most one edge labeled a leaving s. What is the time complexity to recognize a token when a DFA is used?
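
For comparison with the NFA simulation, a DFA can be run by following a single transition per input character. The sketch below is an illustration under assumptions: the table `DFA` is a hand-written DFA for (a|b)*abb with state names A to D chosen for this example, not a table taken from the slides.

```python
# Hand-written DFA for (a|b)*abb; the state names A..D are illustrative only.
DFA = {
    "A": {"a": "B", "b": "A"},
    "B": {"a": "B", "b": "C"},
    "C": {"a": "B", "b": "D"},
    "D": {"a": "B", "b": "A"},
}


def dfa_accepts(table, start, accepting, x):
    """Follow exactly one transition per input character; there is never a set to track."""
    state = start
    for a in x:
        state = table[state].get(a)
        if state is None:  # no edge labeled a leaves the current state: reject
            return False
    return state in accepting


print(dfa_accepts(DFA, "A", {"D"}, "ababb"))  # True
print(dfa_accepts(DFA, "A", {"D"}, "abab"))   # False
```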

Algorithm to convert an NFA to a DFA that accepts the same language (Algorithm 3.2, page 118):
    initially, e-closure(s0) is the only state in Dstates and it is unmarked;
    while there is an unmarked state T in Dstates do begin
        mark T;
        for each input symbol a do begin
            U := e-closure(move(T, a));
            if (U is not in Dstates) then
                add U as an unmarked state to Dstates;
            Dtran[T, a] := U;
        end
    end;
Initial state = e-closure(s0), Final state = ?
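
A minimal Python sketch of this conversion follows, assuming the same dictionary encoding of an NFA transition table used for illustration elsewhere (with `None` as the empty-string label). DFA states are represented as frozensets of NFA states so they can be used as dictionary keys. The sketch returns only the initial state, Dstates and Dtran, and leaves the final states open, as the slide does.

```python
EPS = None  # label used in this sketch for empty-string transitions


def e_closure(states, table):
    """States reachable from `states` using only empty-string transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in table.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol, table):
    """States reachable from `states` by exactly one transition labeled `symbol`."""
    result = set()
    for s in states:
        result |= table.get(s, {}).get(symbol, set())
    return result


def subset_construction(table, start, alphabet):
    """Return (initial DFA state, Dstates, Dtran) for the given NFA."""
    start_set = frozenset(e_closure({start}, table))
    dstates = {start_set}
    unmarked = [start_set]
    dtran = {}
    while unmarked:
        T = unmarked.pop()  # taking T off the work list marks it
        for a in alphabet:
            U = frozenset(e_closure(move(T, a, table), table))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    return start_set, dstates, dtran


# NFA transition table from the earlier slide.
NFA_TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}

start_set, dstates, dtran = subset_construction(NFA_TABLE, 0, "ab")
for (T, a), U in sorted(dtran.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(T), a, "->", sorted(U))
```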

Example: page 120, Fig. 3.27. Question: for an NFA with |S| states, at most how many states can its corresponding DFA have?