Chapter 2 Scanning – Part 2
Prof. Abdelaziz Khamis, June 13, 2016


2 Chapter 2 – Part 2: Topics
Implementation of a TINY Scanner
Nondeterministic Finite Automaton (NFA)
From Regular Expression to DFA
  • From Regular Expression to NFA
  • From NFA to DFA
  • Minimizing DFA

3 Implementation of a TINY Scanner
TINY Tokens

4 Implementation of a TINY Scanner (Continued)
DFA of the TINY Scanner

5 Implementation of a TINY Scanner (Continued)
What happened to reserved words?
  • Recognize them as identifiers first.
  • Then look them up in a table of reserved words.
  • Linear search (used in TINY) is slow. Binary search over an ordered list is better. A hash table is better still. A hash table in which every bucket holds at most one entry (a perfect hash function) is best.
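The three lookup strategies can be compared in a minimal sketch. This is Python, not the C code of scan.c; the function names and the convention of returning the upper-cased word as the token name are assumptions made for illustration. The word list is TINY's set of eight reserved words.

```python
from bisect import bisect_left

# TINY's reserved words, sorted so that binary search is valid.
RESERVED = sorted(["if", "then", "else", "end", "repeat", "until", "read", "write"])
RESERVED_SET = frozenset(RESERVED)  # hash-based table


def lookup_linear(lexeme):
    """Linear search: compare against each reserved word in turn."""
    for word in RESERVED:
        if word == lexeme:
            return word.upper()
    return "ID"  # not reserved, so it is an ordinary identifier


def lookup_binary(lexeme):
    """Binary search over the ordered list."""
    i = bisect_left(RESERVED, lexeme)
    if i < len(RESERVED) and RESERVED[i] == lexeme:
        return lexeme.upper()
    return "ID"


def lookup_hash(lexeme):
    """Hash lookup: expected constant time per query."""
    return lexeme.upper() if lexeme in RESERVED_SET else "ID"
```

All three return the same answer for any lexeme; they differ only in how many comparisons they need. With eight words the difference is small, which is one reason a linear search is tolerable in a teaching compiler.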

6 Implementation of a TINY Scanner (Continued)
The code that implements the TINY DFA is contained in the scan.h and scan.c files (see Appendix B, lines 550-793).
The implementation uses doubly nested case analysis.
The principal function of the TINY scanner is getToken (lines 674-793), which returns the next token in the source file, recognized according to the TINY DFA.
The tokens are defined as an enumerated type in globals.h (lines 174-186).
The states of the scanner are also defined as an enumerated type, but within the scanner itself (lines 612-614).
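Doubly nested case analysis means an outer case on the current state and an inner case on the current character. The following is a simplified Python sketch of that shape, not a transcription of getToken: the signature (text and position in, position out), the token names as strings, and the use of "" for end of input are all assumptions made to keep the example self-contained.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    INASSIGN = auto()
    INCOMMENT = auto()
    INNUM = auto()
    INID = auto()
    DONE = auto()


def get_token(text, pos):
    """Scan one token from text starting at pos; return (token, lexeme, new_pos)."""
    state, token, lexeme = State.START, None, ""
    while state is not State.DONE:
        c = text[pos] if pos < len(text) else ""  # "" marks end of input
        pos += 1
        save = True  # should c become part of the lexeme?
        if state is State.START:            # outer case: current state
            if c == "":                     # inner case: current character
                save, state, token = False, State.DONE, "ENDFILE"
            elif c.isdigit():
                state = State.INNUM
            elif c.isalpha():
                state = State.INID
            elif c == ":":
                state = State.INASSIGN
            elif c in " \t\n":
                save = False                # skip white space
            elif c == "{":
                save, state = False, State.INCOMMENT
            else:
                state = State.DONE
                token = c if c in "+-*/=<();" else "ERROR"
        elif state is State.INCOMMENT:
            save = False                    # comment text is never saved
            if c == "}":
                state = State.START
            elif c == "":
                state, token = State.DONE, "ENDFILE"
        elif state is State.INASSIGN:
            state = State.DONE
            if c == "=":
                token = "ASSIGN"
            else:                           # back up: c was only a lookahead
                pos, save, token = pos - 1, False, "ERROR"
        elif state is State.INNUM:
            if not c.isdigit():             # non-consuming move to DONE
                pos, save, state, token = pos - 1, False, State.DONE, "NUM"
        elif state is State.INID:
            if not c.isalpha():             # non-consuming move to DONE
                pos, save, state, token = pos - 1, False, State.DONE, "ID"
        if save:
            lexeme += c
    return token, lexeme, pos
```

Calling get_token repeatedly, feeding each returned position into the next call, walks through the input one token at a time. A reserved-word lookup would be applied to the lexeme whenever the result is "ID".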

7 Implementation of a TINY Scanner (Continued)
The only attribute of each token that is computed by the TINY scanner is the lexeme, or string value of the token recognized, and this is placed in tokenString.
The scanner makes use of three global variables, which are declared in globals.h and allocated and initialized in main.c: the file variables source and listing, and the integer variable lineno.
The procedure reservedLookup (lines 658-666) performs a lookup of reserved words (lines 649-656) after an identifier is recognized by the principal loop of the getToken procedure.

8 Implementation of a TINY Scanner (Continued)
A flag variable save is used to indicate whether a character is to be added to tokenString; this is necessary, since white space, comments, and non-consumed lookaheads should not be included.
Character input to the scanner is provided by the getNextChar function (lines 627-642), which fetches characters from lineBuf, a 256-character buffer internal to the scanner.
The recognition of numbers and identifiers in TINY requires that the transitions to the final state from the states INNUM and INID be non-consuming. This is implemented by the ungetNextChar procedure (lines 644-647).
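The pairing of a line-buffered read with a one-character backup can be sketched as follows. This is a Python illustration under stated assumptions, not the code of scan.c: the class name CharReader and its attribute names are hypothetical, "" stands for end of file, and Python's readline reads a whole line rather than filling a fixed 256-character buffer.

```python
import io


class CharReader:
    """Line-buffered character source with a one-character backup."""

    def __init__(self, source):
        self.source = source   # any text file-like object
        self.line_buf = ""     # the current line
        self.line_pos = 0      # index of the next character in line_buf
        self.lineno = 0        # number of lines read so far
        self.at_eof = False

    def get_next_char(self):
        """Return the next character, or "" at end of file."""
        if self.line_pos >= len(self.line_buf):
            self.line_buf = self.source.readline()  # refill the buffer
            if self.line_buf == "":
                self.at_eof = True
                return ""
            self.lineno += 1
            self.line_pos = 0
        c = self.line_buf[self.line_pos]
        self.line_pos += 1
        return c

    def unget_next_char(self):
        """Back up one character so a lookahead is delivered again."""
        if not self.at_eof:
            self.line_pos -= 1


# Usage: the digit scan stops at "+", which is then pushed back.
reader = CharReader(io.StringIO("12+x\n"))
digits = ""
c = reader.get_next_char()
while c.isdigit():
    digits += c
    c = reader.get_next_char()
reader.unget_next_char()   # "+" was only a lookahead; leave it for the next token
```

After the loop, digits holds "12" and the next call to get_next_char returns "+" again, which is the non-consuming behaviour the INNUM and INID transitions need.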

9 Nondeterministic Finite Automaton (NFA)
In a typical programming language there are many tokens, and each token will be recognized by its own DFA.
  • If each of these tokens begins with a different character, then it is easy to tie them together by uniting all of their start states into a single start state.
  • If some tokens begin with the same character, such as +, +=, and ++, then we cannot simply draw the following diagram, since it is not a DFA.

10 Nondeterministic Finite Automaton (NFA)
Instead, we must arrange it so that there is a unique transition to be made in each state, such as in the following diagram.
In principle, we should be able to combine all the tokens into one giant DFA. But how do we turn token descriptions, given as regular expressions, into such a DFA?
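For the three tokens +, += and ++, a machine with a unique transition per (state, character) pair can be written as a small table. This is a Python sketch; the state names, the token names, and the choice to return the longest match are assumptions for illustration.

```python
# Transition table: (state, character) -> next state.
# A missing entry means there is no move on that character.
TRANSITIONS = {
    ("start", "+"): "plus",        # shared first character, one transition only
    ("plus", "="): "plus_assign",
    ("plus", "+"): "increment",
}

# Accepting states and the (hypothetical) token each one reports.
ACCEPTING = {"plus": "PLUS", "plus_assign": "PLUS_ASSIGN", "increment": "INCREMENT"}


def match_longest(text, pos=0):
    """Return (token, end) for the longest token starting at pos, or (None, pos)."""
    state, best, i = "start", (None, pos), pos
    while i < len(text) and (state, text[i]) in TRANSITIONS:
        state = TRANSITIONS[(state, text[i])]
        i += 1
        if state in ACCEPTING:
            best = (ACCEPTING[state], i)  # remember the latest accepting point
    return best
```

The single transition on + out of the start state replaces the three competing transitions of the non-deterministic drawing; the decision between the tokens is deferred to the state reached after the first +.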

11 Nondeterministic Finite Automaton (NFA)
The simplest algorithm for translating a regular expression into a DFA proceeds via an intermediate construction, in which an NFA is derived from the regular expression, and then the NFA is used to construct an equivalent DFA.
Deterministic vs. Non-Deterministic
  • Deterministic: exactly one new state for a given (state, character) pair
  • Non-deterministic: zero or more new states for each (state, character) pair
Add ε-transitions (moves made without consuming a character).
Example 2.10, Page 58.

12 From Regular Expression to NFA
The construction of an NFA follows the structure of a regular expression.
The ε-transitions are used to “glue together” the machines of each piece of a regular expression to form a machine that corresponds to the whole expression.
  • Basic regular expressions (Page 64)
  • Concatenation (Page 65)
  • Choice among alternatives (Page 65)
  • Repetition (Page 66)
Examples: 2.12 and 2.13, page 67.
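The four cases above can be sketched as four small functions that glue machines together with ε-transitions. This is a Python illustration, not code from the textbook: the NFA class, the helper names, and the edge-table representation are assumptions, and None is used as the label of an ε-transition. Each call to basic creates fresh state numbers, so a sub-machine should be built once per use rather than shared.

```python
from itertools import count

EPS = None         # label used for an ε-transition
_fresh = count()   # source of unused state numbers


class NFA:
    """A machine with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def _merge(*machines):
    """Copy the edge tables of the given machines into one new table."""
    return {s: list(out) for m in machines for s, out in m.edges.items()}


def _add(edges, src, label, dst):
    edges.setdefault(src, []).append((label, dst))


def basic(a):
    """Machine for a single character a (or for ε when a is EPS)."""
    s, t = next(_fresh), next(_fresh)
    return NFA(s, t, {s: [(a, t)]})


def concat(m, n):
    """m followed by n: ε from m's accepting state to n's start state."""
    edges = _merge(m, n)
    _add(edges, m.accept, EPS, n.start)
    return NFA(m.start, n.accept, edges)


def choice(m, n):
    """m | n: new start and accepting states wired to both machines by ε."""
    s, t = next(_fresh), next(_fresh)
    edges = _merge(m, n)
    for sub in (m, n):
        _add(edges, s, EPS, sub.start)
        _add(edges, sub.accept, EPS, t)
    return NFA(s, t, edges)


def star(m):
    """m*: ε edges allow skipping m entirely or looping back through it."""
    s, t = next(_fresh), next(_fresh)
    edges = _merge(m)
    _add(edges, s, EPS, m.start)
    _add(edges, s, EPS, t)                # zero repetitions
    _add(edges, m.accept, EPS, m.start)   # another repetition
    _add(edges, m.accept, EPS, t)
    return NFA(s, t, edges)


def eps_closure(nfa, states):
    """All states reachable from the given states by ε-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        for label, t in nfa.edges.get(stack.pop(), []):
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(nfa, text):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = eps_closure(nfa, {nfa.start})
    for ch in text:
        moved = {t for s in current for label, t in nfa.edges.get(s, []) if label == ch}
        current = eps_closure(nfa, moved)
    return nfa.accept in current
```

For instance, choice(concat(basic("a"), basic("b")), basic("a")) builds a machine for the regular expression ab|a, and accepts can be used to check it against sample strings.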

13 From NFA to DFA
To convert an NFA (M) to its equivalent DFA (M′), we need to:
  • Eliminate ε-transitions. This involves the construction of ε-closures: the ε-closure of a single state and of a set of states (Page 69).
  • Eliminate multiple transitions on a single input character.
A conversion algorithm (subset construction), Page 70:
  • Compute the ε-closure of the start state of M. This becomes the start state of M′.
  • For this set, and for each subsequent set S, compute transitions on input characters a as follows: compute the set S′_a = { t | for some s in S there is a transition from s to t on a }. Then compute S″_a, the ε-closure of S′_a. This defines a new state in M′, together with a new transition S --a--> S″_a.
  • Continue with this process until no new states or transitions are created.
  • Mark as accepting those states that contain an accepting state of M.
Examples: 2.15, 2.16, and 2.17 (Pages: 70-71)
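The steps above can be sketched in Python. The representation is an assumption made for illustration: the NFA is an edge table mapping each state to a list of (label, target) pairs, None labels an ε-transition, and each DFA state is a frozenset of NFA states. A character with no move out of a set simply gets no entry in the DFA table.

```python
from collections import deque

EPS = None  # label used for an ε-transition


def eps_closure(edges, states):
    """All states reachable from the given states by ε-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        for label, t in edges.get(stack.pop(), []):
            if label is EPS and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def subset_construction(edges, start, accepting, alphabet):
    """Return (dfa_start, dfa_transitions, dfa_states, dfa_accepting)."""
    d_start = frozenset(eps_closure(edges, {start}))
    trans, seen, work = {}, {d_start}, deque([d_start])
    while work:                       # until no new states are created
        S = work.popleft()
        for a in alphabet:
            # S'_a: every state reachable from some s in S on character a
            moved = {t for s in S for label, t in edges.get(s, []) if label == a}
            if not moved:
                continue              # no a-transition out of S
            T = frozenset(eps_closure(edges, moved))   # S''_a
            trans[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    d_accepting = {S for S in seen if S & accepting}
    return d_start, trans, seen, d_accepting


# An NFA for a*: 1 is the start state and 4 the accepting state.
NFA_EDGES = {
    1: [(EPS, 2), (EPS, 4)],
    2: [("a", 3)],
    3: [(EPS, 2), (EPS, 4)],
}
d_start, d_trans, d_states, d_accepting = subset_construction(NFA_EDGES, 1, {4}, "a")
```

For this small NFA the start state of the DFA is {1, 2, 4}, its a-transition leads to {2, 3, 4}, and that set loops to itself on a. Both sets contain state 4, so both are accepting.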

14 Minimizing DFA
An algorithm to minimize the number of states in a DFA:
  • Create two states, one consisting of all the accepting states and the other consisting of all the non-accepting states.
  • Consider the transitions on each character a of the alphabet:
    If all accepting states have transitions on a to accepting states, then this defines an a-transition from the new accepting state to itself.
    If all accepting states have transitions on a to non-accepting states, then this defines an a-transition from the new accepting state to the new non-accepting state.
    If there are two accepting states s and t that have transitions on a that land in different sets, then the set of all accepting states must be split according to where their a-transitions land.
  • If any further sets are split, we must return and repeat the process from the beginning.
  • Continue this process of refining the partitions of states until no further splitting of sets occurs.
Examples: 2.18 and 2.19 (Page 74)
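The refinement process can be sketched in Python as repeated splitting of a partition. This is a sketch under stated assumptions rather than the textbook's procedure verbatim: it splits every set on all characters at once instead of one character at a time, which reaches the same final partition, and it treats a missing transition as a move to an implicit error state. The function name and the table layout are hypothetical.

```python
def minimize(states, alphabet, trans, accepting):
    """Group equivalent DFA states.

    states, accepting: sets of states; trans: {(state, char): state}.
    Returns a list of frozensets, one per state of the minimized DFA.
    """
    # Initial partition: accepting states versus non-accepting states.
    partition = [b for b in (frozenset(accepting), frozenset(states - accepting)) if b]
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # Signature: the set each character leads to (None = error state).
                sig = tuple(block_of.get(trans.get((s, a))) for a in alphabet)
                groups.setdefault(sig, set()).add(s)
            # States with different signatures must be separated.
            refined.extend(frozenset(g) for g in groups.values())
        if len(refined) == len(partition):
            return refined            # no set was split: the partition is stable
        partition = refined


# A three-state DFA for one or more a's; states 1 and 2 behave identically.
blocks = minimize({0, 1, 2}, "a", {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}, {1, 2})
```

Here blocks groups states 1 and 2 into a single state of the minimized DFA and leaves state 0 on its own. The transitions of the minimized machine follow by picking any member of each set and looking up where its transitions land.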

