
Overview of Previous Lesson(s)

Over View  Strategies that have been used to implement and optimize pattern matchers constructed from regular expressions.  The first algorithm is useful in a Lex compiler, because it constructs a DFA directly from a regular expression, without constructing an intermediate NFA. 3

Over View..  The second algorithm minimizes the number of states of any DFA, by combining states that have the same future behavior.  The algorithm itself is quite efficient, running in time O(n log n), where n is the number of states of the DFA.  The third algorithm produces more compact representations of transition tables than the standard, two-dimensional table. 4

Over View...  A state of NFA can be declared as important if it has a non-ɛ out- transition.  NFA has only one accepting state, but this state, having no out- transitions, is not an important state.  By concatenating a unique right endmarker # to a regular expression r, we give the accepting state for r a transition on #, making it an important state of the NFA for (r) #. 5

Over View...  firstpos(n) is the set of positions in the sub-tree rooted at n that correspond to the first symbol of at least one string in the language of the sub-expression rooted at n.  lastpos(n) is the set of positions in the sub-tree rooted at n that correspond to the last symbol of at least one string in the language of the sub expression rooted at n. 6

Over View...  followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a 1 a 2... a n in L((r)#) such that for some i, there is a way to explain the membership of x in L((r)#) by matching a i to position p of the syntax tree and a i+1 to position q 7

Over View...  nullable, firstpos, and lastpos can be computed by a straight forward recursion on the height of the tree. 8

Over View...  Two ways that a position of a regular expression can be made to follow another.  If n is a cat-node with left child C 1 and right child C 2 then for every position i in lastpos(C 1 ), all positions in firstpos(C 2 ) are in followpos(i).  If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i). 9

Over View...  Ex.DFA for the regular expression r = (a|b)*abb  Putting together all previous steps: Augmented Syntax Tree r = (a|b)*abb# Nullable is true for only star node firstpos & lastpos are showed in tree followpos are: 10

Over View...  Start state of D = A = firstpos(rootnode) = {1,2,3}  Then we compute Dtran[A, a] & Dtran[A, b]  Among the positions of A, 1 and 3 corresponds to a, while 2 corresponds to b.  Dtran[A, a] = followpos(1) U followpos(3) = { l, 2, 3, 4}  Dtran[A, b] = followpos(2) = {1, 2, 3}  State A is similar, and does not have to be added to Dstates.  B = {I, 2, 3, 4 }, is new, so we add it to Dstates.  Proceed to compute its transitions.. 11


Contents  Optimization of DFA-Based Pattern Matchers  Important States of an NFA  Functions Computed From the Syntax Tree  Computing nullable, firstpos, and lastpos  Computing followups  Converting a RE Directly to DFA  Minimizing the Number of States of DFA  Trading Time for Space in DFA Simulation  Two dimensional Table  Terminologies 13

Minimizing DFA States  Following FA accepts the language of regular expression (aa + b) * ab(bb) *  Final states are colored yellow while rejecting states are blue. 14

Minimizing DFA States..  Closer examination reveals that states s 2 and s 7 are really the same since they are both final states and both go to s 6 under the input b and both go to s 3 under an a.  So, why not merge them and form a smaller machine?  In the same manner, we could argue for merging states s 0 and s 5.  Merging states like this should produce a smaller automaton that accomplishes exactly the same task as our original one. 15

Minimizing DFA States...  From these observations, it seems that the key to making finite automata smaller is to recognize and merge equivalent states.  To do this, we must agree upon the definition of equivalent states. Two states in a finite automaton M are equivalent if and only if for every string x, if M is started in either state with x as input, it either accepts in both cases or rejects in both cases.  Another way to say this is that the machine does the same thing when started in either state 16

Minimizing DFA States...  Two questions remain.  First, how does one find equivalent states ?  Exactly how valuable is this information?  For a deterministic finite automaton M, the minimum number of states in any equivalent deterministic finite automaton is the same as the number of equivalence groups of M's states.  Equivalent states go to equivalent states under all inputs. 17

Minimizing DFA States...  Now we know that if we can find the equivalence states (or groups of equivalent states) for an automaton, then we can use these as the states of the smallest equivalent machine.  Ex Automaton 18

Minimizing DFA States...  Let us first divide the machine's states into two groups: Final and Non-Final states.  These groups are: Final states = A = {s 2, s 7 } Non Final States = B = {s 0, s 1, s 3, s 4, s 5, s 6 }  Note that these are equivalent under the empty string as input. 19

Minimizing DFA States...  Now we will find out if the states in these groups go to the same group under inputs a and b  The states of group A both go to states in group B under both inputs.  Things are different for the states of group B. 20

Minimizing DFA States...  The following table shows the result of applying the inputs to these states.  For example, the input a leads from s 1 to s 5 in group B and input b leads to to s 2 in group A.  Looking at the table we find that the input b helps us distinguish between two of the states (s1 and s6) and the rest of the states in the group since it leads to group A for these two instead of group B. 21

Minimizing DFA States...  The states in the set {s 0, s 3, s 4, s 5 } cannot be equivalent to those in the set {s 1, s 6 } and we must partition B into two groups.  Now we have the groups: A = {s 2, s 7 }, B = { s 0, s 3, s 4, s 5 }, C = { s 1, s 6 }  The next examination of where the inputs lead shows us that s 3 is not equivalent to the rest of group B.  We must partition again. 22

Minimizing DFA States...  Continuing this process until we cannot distinguish between the states in any group by employing our input tests, we end up with the groups: A = {s 2, s 7 }, B = {s 0, s 4, s 5 }, C = {s 1 }, D = {s 3 }, E = { s 6 } In view of the above theoretical definitions and results, it is easy to argue that all of the states in each group are equivalent because they all go to the same groups under the inputs a and b. 23

Minimizing DFA States...  Building the minimum state finite automaton is now rather straightforward.  We merely use our groups as states and provide the proper transitions. 24

Minimizing DFA States...  State Minimization Algorithm: 25

Trading Time for Space in DFA  The simplest and fastest way to represent the transition function of a DFA is a two-dimensional table indexed by states and characters.  Given a state and next input character, we access the array to find the next state and any special action we must take, e.g. returning a token to the parser.  Since a typical lexical analyzer has several hundred states in its DFA and involves the ASCII alphabet of 128 input characters, the array consumes less than a megabyte. 26

Trading Time for Space in DFA..  Compilers are also appearing in very small devices, where even a megabyte of storage may be too much.  For such situations, there are many methods that can be used to compact the transition table.  For instance, we can represent each state by a list of transitions - that is, character-state pairs - ended by a default state that is to be chosen for any input character not on the list. 27

Two dimensional Table  A more subtle data structure that allows us to combine the speed of array access with the compression of lists with defaults.  A structure of four arrays: 28

Two dimensional Table..  The base array is used to determine the base location of the entries for state s, which are located in the next and check arrays.  The default array is used to determine an alternative base location if the check array tells us the one given by base[s] is invalid.  To compute nextState(s,a) the transition for state s on input a, we examine the next and check entries in location l = base[s] +a  Here character a is treated as an integer.  Range 0 to

Two dimensional Table...  If check[l] = s then this entry is valid, and the next state for state s on input a is next[l]  If check[l] ≠ s then we determine another state t = default[s] and repeat the process as if t were the current state.  Function nextState 30

Terminologies  Tokens  The lexical analyzer scans the source program and produces as output a sequence of tokens, which are normally passed, one at a time to the parser.  Some tokens may consist only of a token name while others may also have an associated lexical value that gives information about the particular instance of the token that has been found on the input.  Lexemes  Each time the lexical analyzer returns a token to the parser, it has an associated lexeme - the sequence of input characters that the token represents. 31

Terminologies..  Patterns  Each token has a pattern that describes which sequences of characters can form the lexemes corresponding to that token.  The set of words, or strings of characters, that match a given pattern is called a language.  Buffering  Because it is often necessary to scan ahead on the input in order to see where the next lexeme ends, it is usually necessary for the lexical analyzer to buffer its input. 32

Terminologies...  Regular Expressions  These expressions are commonly used to describe patterns.  Regular expressions are built from single characters, using union, concatenation, and the Kleene closure, or any-number-of, operator.  Regular Definitions  Complex collections of languages, such as the patterns that describe the tokens of a programming language, are often defined by a regular definition, which is a sequence of statements that each define one variable to stand for some regular expression.  The regular expression for one variable can use previously defined variables in its regular expression. 33

Thank You