
1 Chapter 2 Scanning

2 Dr. Manal Abdulaziz, CS463 Ch2. The Scanning Process
Lexical analysis, or scanning, reads the source program as a file of characters and divides it into tokens. Typical examples of tokens are keywords, identifiers, special symbols, and multicharacter symbols. Scanning is a special case of pattern specification and recognition, based on regular expressions and finite automata. A token, as a logical entity, must be clearly distinguished from its value. The string of characters represented by a token is called its string value or lexeme.

3 The Scanning Process
Some tokens have only one lexeme, while others may represent many lexemes. Any value associated with a token is called an attribute; the string value is one example of an attribute. For example, a num token may have the string value "123" and also carry the numeric value 123 as an attribute. The scanner collects all the attributes of a token into a single structured data type called a token record.
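
A token record can be sketched as a small data class. This is only an illustration: the names `TokenType`, `TokenRecord`, and `make_num`, and the particular token categories, are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class TokenType(Enum):
    # Hypothetical token categories chosen for illustration.
    IF = auto()
    ID = auto()
    NUM = auto()
    PLUS = auto()


@dataclass(frozen=True)
class TokenRecord:
    type: TokenType               # the token as a logical entity
    lexeme: str                   # the string value matched in the source
    value: Optional[int] = None   # numeric attribute, used only for NUM


def make_num(lexeme: str) -> TokenRecord:
    """Build a NUM record that carries both the lexeme and its numeric value."""
    return TokenRecord(TokenType.NUM, lexeme, int(lexeme))


num = make_num("123")
print(num.type.name, repr(num.lexeme), num.value)
```

A token such as IF has a single lexeme, so its record needs no extra attribute, while NUM keeps the lexeme and the computed value side by side.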

4 Regular Expressions
Regular expressions represent patterns of strings of characters. A regular expression r is defined by the set of strings that it matches. This set is called the language generated by the regular expression, written L(r). The set of legal symbols is called the alphabet Σ. Metacharacters, or metasymbols, are characters that have special meanings in a regular expression.

5 Defining the Regular Expression
Basic regular expressions:
–L(a) = {a} for a character a from Σ
–L(ε) = {ε}, which matches only the empty string
–L(Φ) = { }, which contains no string
Regular expression operations:
–Choice: r|s, with L(r|s) = L(r) ∪ L(s)
–Concatenation: rs
–Repetition: r*
Precedence of operations, from highest to lowest: repetition, concatenation, choice. For example, a|bc* is read as a|(b(c*)).

6 Extensions to Regular Expressions
–One or more repetitions: r+ = r r*
–Any character: .
–A range of characters: [ ], as in [a-z]
–Any character not in the given set: ~ (complement), or ^ inside a range
–Optional expressions: ?

7 Regular Expressions for Programming Languages
Programming language tokens fall into several categories:
1. Reserved words or keywords: if, do, while, …
2. Special symbols: arithmetic operators, assignment, … These can be single characters (+, -, …) or multiple characters (:=, ++, …).
3. Identifiers: sequences of letters and digits beginning with a letter.
4. Literals or constants: numeric constants, string literals, and characters.

8 Regular Expressions for Programming Languages
1. Numbers: sequences of digits, decimal numbers, or numbers with exponents
nat = [0-9]+
signedNat = (+|-)? nat
num = signedNat ("." nat)? (E signedNat)?
2. Reserved words and identifiers:
reserved = if | while | do | …
letter = [a-zA-Z]
digit = [0-9]
identifier = letter (letter | digit)*
3. Comments: { Pascal }, /* C */, -- Ada, …
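
The number and identifier definitions above can be tried out with Python's `re` module. The translation into `re` syntax is an assumption of this sketch (for instance, the quoted "." becomes an escaped `\.`), and the helper names `is_num` and `is_identifier` are made up for the example.

```python
import re

# Assumed translations of the slide's definitions into Python `re` syntax.
NAT = r"[0-9]+"
SIGNED_NAT = r"[+-]?" + NAT
NUM = re.compile(SIGNED_NAT + r"(\." + NAT + r")?(E" + SIGNED_NAT + r")?")
IDENTIFIER = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")


def is_num(s: str) -> bool:
    """True if the whole string matches the num pattern."""
    return NUM.fullmatch(s) is not None


def is_identifier(s: str) -> bool:
    """True if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(s) is not None


for sample in ["123", "-3.14E+10", "3.", "x1", "1x"]:
    print(sample, is_num(sample), is_identifier(sample))
```

Note that a reserved word such as `if` also matches the identifier pattern, which is why a separate disambiguating rule is needed.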

9 Ambiguity
To avoid ambiguity, a language definition must give disambiguating rules that determine which meaning is intended in each case.
–if, while, do: identifier or keyword? Calling these words reserved words means simply that they cannot be used as identifiers.
–:=, <>: one multicharacter symbol or two single-character symbols? The principle of longest substring says that when a string can be either a single token or a sequence of several tokens, the single-token interpretation is preferred.
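
Both disambiguating rules can be sketched in a few lines. The symbol list and the function name `next_token` are assumptions for the example; Python's `isalpha` and `isalnum` also accept non-ASCII letters, which is broader than the slide's letter and digit classes.

```python
RESERVED = {"if", "while", "do"}              # reserved words named on the slide
SYMBOLS = [":=", "<>", "<=", "<", "=", ":"]   # assumed symbol set for the demo


def next_token(text: str, pos: int):
    """Return (kind, lexeme) for the token starting at text[pos]."""
    ch = text[pos]
    if ch.isalpha():
        end = pos
        while end < len(text) and text[end].isalnum():
            end += 1                          # extend the lexeme as far as possible
        lexeme = text[pos:end]
        # Reserved-word rule: a reserved word is never an identifier.
        kind = "RESERVED" if lexeme in RESERVED else "ID"
        return kind, lexeme
    # Longest-substring rule: try longer symbols first so ":=" beats ":" then "=".
    for sym in sorted(SYMBOLS, key=len, reverse=True):
        if text.startswith(sym, pos):
            return "SYMBOL", sym
    raise ValueError(f"unexpected character {ch!r} at position {pos}")


print(next_token("iffy", 0))
print(next_token("x:=1", 1))
```

The same longest-substring idea makes `iffy` one identifier rather than the reserved word `if` followed by `fy`.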

10 Finite Automata
Finite automata, or finite-state machines, are a mathematical way of describing particular kinds of algorithms.
Example: identifier = letter (letter | digit)*
[Diagram: start state 1 moves to accepting state 2 on letter; state 2 loops back to itself on letter and on digit.]

11 Finite Automata
Circles 1 and 2 represent states, which are locations in the recognition process that record how much of the pattern has already been seen. The arrows represent transitions, which record a change from one state to another upon a match of the character or characters that label them. State 1 is the start state, the state at which the recognition process begins. The start state is indicated by an unlabeled arrow coming from nowhere. Accepting states represent the end of the recognition process, at which we can declare success; they are drawn as double-line circles in the diagram.

12 Deterministic Finite Automata
A DFA is an automaton in which the next state is uniquely determined by the current state and the current input character. A DFA M consists of an alphabet Σ, a set of states S, a transition function T: S × Σ → S, a start state s0 ∈ S, and a set of accepting states A ⊆ S. The language accepted by M, written L(M), is defined to be the set of strings of characters c1c2…cn with each ci ∈ Σ such that there exist states s1 = T(s0, c1), s2 = T(s1, c2), …, sn = T(sn-1, cn) with sn an element of A.

13 Examples
statement = letter (letter | digit)* | other
other = ~letter
other1 = ~(letter | digit)
[Diagram: start state 1 moves to state 2 (id) on letter and to state 3 (error) on other; state 2 loops on letter and digit and moves to state 3 on other1.]

14 Examples
–The set of strings that contain exactly one b
–The set of strings that contain at most one b
[Diagrams: one DFA for each set, with transitions labeled b and ~b.]
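
The two languages above fit the formal DFA definition directly: they can share one transition function and differ only in the set of accepting states. The sketch below is one possible encoding, not necessarily the one drawn on the slide. The state numbers count the b's seen so far, and state 2 is an explicit error state added here for illustration.

```python
def step(state: int, ch: str) -> int:
    """Transition function T: states are 0, 1, or 2 (two or more b's seen)."""
    if ch == "b":
        return min(state + 1, 2)
    return state            # any character other than b (~b) keeps the state


def run(s: str, accepting: set) -> bool:
    """Apply T to each character from start state 0 and test the final state."""
    state = 0
    for ch in s:
        state = step(state, ch)
    return state in accepting


def exactly_one_b(s: str) -> bool:
    return run(s, {1})


def at_most_one_b(s: str) -> bool:
    return run(s, {0, 1})


print(exactly_one_b("aba"), at_most_one_b("aba"), at_most_one_b("abba"))
```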

15 Examples
nat = [0-9]+
signedNat = (+|-)? nat
num = signedNat ("." nat)? (E signedNat)?
[Diagrams: DFA for nat, with transitions on digit; DFA for signedNat, with transitions on +, -, and digit.]

16 Examples
[Diagram: DFA for num, with transitions on digit, +, -, ., and E.]
[Diagram: DFA for Pascal comments, with transitions on {, other, and }.]

17 Lookahead and Backtracking
The scanner should recognize one token at a time and should return to its start state after each token is recognized; the rest of the input has to be treated as lookahead. A typical action on reaching an error state is either to back up in the input (backtracking) or to generate an error token.
[Diagram: start moves to id on letter; id loops on letter and digit and moves to finish on [other], returning id. The brackets mark other as a lookahead character that is not consumed.]
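
One way to picture lookahead is a scanner that reads one character too many and then backs up over it. The `CharStream` class and its `get` and `unget` methods are hypothetical helpers written for this sketch; they are not part of any library.

```python
class CharStream:
    """Hypothetical input wrapper that supports one-character backup."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def get(self):
        """Consume and return the next character, or None at end of input."""
        if self.pos >= len(self.text):
            return None
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def unget(self):
        """Back up over the most recently consumed character."""
        self.pos -= 1


def scan_id(stream: CharStream):
    """Return the next identifier lexeme, or None if none starts here."""
    ch = stream.get()
    if ch is None:
        return None
    if not ch.isalpha():
        stream.unget()          # not an identifier: leave the input untouched
        return None
    lexeme = ch
    while True:
        ch = stream.get()
        if ch is None:
            break
        if not ch.isalnum():
            stream.unget()      # [other] is lookahead: put it back
            break
        lexeme += ch
    return lexeme


stream = CharStream("xtemp=ytemp")
print(scan_id(stream), stream.get(), scan_id(stream))
```

Because the `=` is put back after the first identifier, the next call to the scanner starts from it rather than skipping it.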

18 Nondeterministic Automata
Each token will be recognized by its own DFA. If these tokens begin with different characters, it is easy to tie them together by uniting all of their start states into a single start state.
Example: :=, <=, =
[Diagram: three separate DFAs that recognize := (return assign), <= (return LE), and = (return EQ), and the combined DFA with a single start state and the same three results.]

19 Nondeterministic Automata
The second case is where several tokens begin with the same character.
Example: <=, <>, <
Following the previous rule does not yield a DFA.
[Diagram: after <, the character = returns LE, the character > returns NE, and [other] returns LT.]

20 Nondeterministic Automata
An NFA is quite similar to a DFA, except that the alphabet Σ is expanded to include ε, and the definition of the transition function T is expanded so that each character can lead to more than one state.
[Diagram: a new start state with ε-transitions to the machines for := (return assign), <= (return LE), and = (return EQ).]
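
The idea that one character can lead to more than one state can be sketched by letting the transition function return a set of states and by tracking every state the machine might be in. The example uses the tokens <=, <>, and < that share a first character; the state names are made up, and ε-transitions are left out to keep the sketch short.

```python
# NFA transitions: (state, character) -> set of possible next states.
# State names are hypothetical labels for this sketch.
NFA = {
    ("start", "<"): {"le1", "ne1", "lt"},   # one character, three possible moves
    ("le1", "="): {"le"},
    ("ne1", ">"): {"ne"},
}
ACCEPTING = {"le": "LE", "ne": "NE", "lt": "LT"}


def recognize(s: str):
    """Return the token name if the NFA accepts s, else None."""
    current = {"start"}
    for ch in s:
        nxt = set()
        for state in current:
            nxt |= NFA.get((state, ch), set())
        current = nxt
    names = sorted(ACCEPTING[q] for q in current if q in ACCEPTING)
    return names[0] if names else None


for text in ["<", "<=", "<>", "<<"]:
    print(text, recognize(text))
```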

21 Examples
[Diagrams: NFAs with ε-transitions for r = (a | ε)b* and r = (a | c)*b.]

22 Transition Table
It is possible to express the DFA as a data structure and then write generic code that takes its actions from the data structure. A simple data structure that is adequate for this purpose is a transition table: a two-dimensional array, indexed by state and input character, that holds the values of the transition function T.
[Table layout: rows are the states s, columns are the characters c of the alphabet, and each entry is the state T(s, c).]

23 Example
The equivalent transition table for the DFA of the regular expression r = letter (letter | digit)*, with one row per state and one column per input character class:
state   letter   digit   other
1       2
2       2        2       3
3
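
A table-driven recognizer for this DFA can be sketched as follows. The sketch assumes the table reads: state 1 goes to 2 on letter; state 2 goes to 2 on letter or digit and to 3 on other; state 3 is accepting. It also assumes the transition on other does not consume the character, and uses 0 as a marker for empty entries.

```python
LETTER, DIGIT, OTHER = 0, 1, 2      # column indices (input character classes)
ERROR = 0                           # assumed marker for an empty table entry

# TABLE[state][column] = next state; row 0 is unused so rows match state numbers.
TABLE = [
    [ERROR, ERROR, ERROR],   # (unused)
    [2,     ERROR, ERROR],   # state 1: start
    [2,     2,     3],       # state 2: inside an identifier
    [ERROR, ERROR, ERROR],   # state 3: accepting
]
ACCEPTING = {3}


def column(ch: str) -> int:
    """Classify a character into a table column."""
    if ch.isalpha():
        return LETTER
    if ch.isdigit():
        return DIGIT
    return OTHER


def scan(text: str):
    """Run the table on text; return the identifier lexeme or None on error."""
    state, pos = 1, 0
    while state not in ACCEPTING and state != ERROR:
        ch = text[pos] if pos < len(text) else "\0"   # end of input counts as other
        state = TABLE[state][column(ch)]
        if state not in ACCEPTING and state != ERROR:
            pos += 1          # consume only while still inside the token
    return text[:pos] if state in ACCEPTING else None


print(scan("x1+y"), scan("1x"))
```

The loop never mentions letters or digits directly, so a different DFA only needs a different table and classifier.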

24 From Regular Expression to DFA
The simplest algorithm for translating a regular expression into a DFA proceeds via an intermediate construction: an NFA is derived from the regular expression, and then the NFA is used to construct an equivalent DFA. There exist algorithms that translate a regular expression directly into a DFA, but they are more complex, and the intermediate construction is also of some interest.
Regular expression → NFA → DFA → Program

25 From Regular Expression to NFA
Thompson's construction uses ε-transitions to glue together the machines of each piece of a regular expression to form a machine that corresponds to the whole expression.
Basic regular expressions: a and ε
[Diagrams: a two-state machine with a single transition on a, and a two-state machine with a single ε-transition.]

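26 From Regular Expression to NFA
Concatenation: rs
[Diagram: the machine for r connected by an ε-transition to the machine for s.]
Choice: r|s
[Diagram: a new start state with ε-transitions into the machines for r and s, and ε-transitions from both machines to a new accepting state.]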

27 From Regular Expression to NFA
Repetition: r*
[Diagram: the machine for r placed between a new start state and a new accepting state, connected by four ε-transitions.]
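
The basic, concatenation, choice, and repetition cases can be sketched as a small builder. Each fragment is a (start, accept) pair of state numbers and `None` stands for ε. The `Builder` class and its method names are assumptions for the example, and the state numbering it produces need not match the numbering in the slides' diagrams.

```python
import itertools

EPS = None  # label used for ε-transitions


class Builder:
    """Builds NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.edges = []                    # (from_state, label, to_state)
        self._ids = itertools.count(1)

    def _new(self) -> int:
        return next(self._ids)

    def char(self, c):
        s, a = self._new(), self._new()
        self.edges.append((s, c, a))
        return s, a

    def concat(self, r, s):
        self.edges.append((r[1], EPS, s[0]))   # accept of r --ε--> start of s
        return r[0], s[1]

    def choice(self, r, s):
        start, accept = self._new(), self._new()
        for frag in (r, s):
            self.edges.append((start, EPS, frag[0]))
            self.edges.append((frag[1], EPS, accept))
        return start, accept

    def star(self, r):
        start, accept = self._new(), self._new()
        self.edges += [
            (start, EPS, r[0]),    # enter r
            (r[1], EPS, accept),   # leave r
            (r[1], EPS, r[0]),     # repeat r
            (start, EPS, accept),  # skip r entirely (zero repetitions)
        ]
        return start, accept


b = Builder()
nfa_ab_or_a = b.choice(b.concat(b.char("a"), b.char("b")), b.char("a"))
print(nfa_ab_or_a, len(b.edges))
```

Building ab|a this way uses eight states and five ε-transitions, since each basic machine and each choice adds two states.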

28 Examples
r = ab|a
[Diagram: NFA with states 1 to 8, transitions on a, b, and a, and five ε-transitions.]
r = letter (letter | digit)*
[Diagram: NFA with states 1 to 10, transitions on letter, letter, and digit, and nine ε-transitions.]

29 From NFA to DFA
The algorithm used to convert an NFA to its equivalent DFA is called the subset construction. The algorithm involves:
–Eliminating the ε-transitions, which involves constructing ε-closures.
–Eliminating multiple transitions from a state on a single input character.
The ε-closure is the set of all states reachable by ε-transitions from a state or set of states. Keeping track of the set of states that are reachable by matching a single character eliminates the multiple transitions.

30 Examples
r = a*
[Diagram: NFA with states 1 to 4, one transition on a, and four ε-transitions.]
ε-closures: ε-closure(1) = {1, 2, 4}, ε-closure(2) = {2}, ε-closure(3) = {2, 3, 4}, ε-closure(4) = {4}
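
The ε-closure computation can be sketched as a simple graph search. The ε-edges below are inferred from the closure sets listed for r = a* (1 to 2, 1 to 4, 3 to 2, 3 to 4), which is an assumption about the diagram; the function name `eps_closure` is made up for the example.

```python
# ε-transitions of the NFA for a*, inferred from the closure sets on the slide.
# The a-transition from 2 to 3 is not an ε-edge, so it does not appear here.
EPS_EDGES = {1: {2, 4}, 3: {2, 4}}


def eps_closure(states, eps_edges):
    """All states reachable from `states` using zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps_edges.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


for state in (1, 2, 3, 4):
    print(state, sorted(eps_closure({state}, EPS_EDGES)))
```

The closure of a set of states is the union of the closures of its members, which is why the function takes a set rather than a single state.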

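31 Examples
The subset construction builds a DFA M' from an NFA M:
1. Compute the ε-closure of the start state of M; this becomes the start state of M'.
2. For a set of states S and a character a, compute S'a = {t | some s in S has a transition from s to t on a}.
3. Compute the ε-closure of S'a; this is a new state of M', reached from S by a transition on a. Repeat until no new states or transitions appear.
4. Mark as accepting states of M' all states that contain an accepting state of M.
[Diagram: DFA for r = a*, with start state {1, 2, 4} moving to {2, 3, 4} on a, and {2, 3, 4} looping on a.]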

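32 Examples
r = ab|a
[Diagram: NFA with states 1 to 8.]
ε-closures: ε-closure(1) = {1, 2, 6}, ε-closure(2) = {2}, ε-closure(3) = {3, 4}, ε-closure(4) = {4}, ε-closure(5) = {5, 8}, ε-closure(6) = {6}, ε-closure(7) = {7, 8}, ε-closure(8) = {8}
Transitions: 2 to 3 on a, 6 to 7 on a, 4 to 5 on b
[Diagram: DFA with start state {1, 2, 6} moving to {3, 4, 7, 8} on a, and {3, 4, 7, 8} moving to {5, 8} on b.]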
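
The subset construction can be sketched on this example. The NFA edges below are read off the closure sets and transitions listed for r = ab|a, with state 8 assumed to be the accepting state; the function and variable names are made up for the sketch.

```python
EPS = None
# NFA for r = ab|a: (state, label) -> set of next states.
NFA = {
    (1, EPS): {2, 6},
    (2, "a"): {3}, (3, EPS): {4}, (4, "b"): {5}, (5, EPS): {8},
    (6, "a"): {7}, (7, EPS): {8},
}
NFA_START, NFA_ACCEPT = 1, 8       # assumed start and accepting states
ALPHABET = "ab"


def eps_closure(states):
    """States reachable from `states` by zero or more ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        for t in NFA.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction():
    start = eps_closure({NFA_START})
    dfa, worklist = {}, [start]    # dfa: (state_set, char) -> state_set
    seen = {start}
    while worklist:
        S = worklist.pop()
        for a in ALPHABET:
            moved = set()
            for s in S:
                moved |= NFA.get((s, a), set())
            if not moved:
                continue           # no transition on a: implicit error
            target = eps_closure(moved)
            dfa[(S, a)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    accepting = {S for S in seen if NFA_ACCEPT in S}
    return start, dfa, accepting


start, dfa, accepting = subset_construction()
for (S, a), target in dfa.items():
    print(sorted(S), a, sorted(target))
```

Each DFA state is a frozen set of NFA states, so it can be used directly as a dictionary key.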

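33 Examples
r = letter (letter | digit)*
[Diagram: NFA with states 1 to 10.]
ε-closures: ε-closure(1) = {1}, ε-closure(2) = {2, 3, 4, 5, 7, 10}, ε-closure(3) = {3, 4, 5, 7, 10}, ε-closure(4) = {4, 5, 7}, ε-closure(5) = {5}, ε-closure(6) = {4, 5, 6, 7, 9, 10}, ε-closure(7) = {7}, ε-closure(8) = {4, 5, 7, 8, 9, 10}, ε-closure(9) = {4, 5, 7, 9, 10}, ε-closure(10) = {10}
Transitions: 1 to 2 on letter, 5 to 6 on letter, 7 to 8 on digit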

34 Examples
DFA for r = letter (letter | digit)*
[Diagram: DFA with four states labeled A to D, built from the sets {1}, {2, 3, 4, 5, 7, 10}, {4, 5, 6, 7, 9, 10}, and {4, 5, 7, 8, 9, 10}, with transitions on letter and digit.]

35 Minimizing the Number of States in a DFA
The state minimization algorithm proceeds as follows:
1. Create two sets: one consisting of all the accepting states and the other consisting of all the nonaccepting states.
2. Consider the transitions on each character a of the alphabet: if all accepting states have transitions on a to accepting states, then this defines an a-transition from the new accepting state (the set of all the old accepting states) to itself.
3. If there are two accepting states s and t whose transitions on a land in different sets, then a distinguishes s from t, and the set must be split so that s and t fall into different sets.
4. Repeat steps 2 and 3 for every set of states, including the new sets produced by splitting.
5. Continue refining the partition of the states of the original DFA into sets until either all sets contain only one element or no further splitting occurs.

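36 Examples
r = letter (letter | digit)*
[Diagram: minimized DFA with states A and {B, C, D}; A moves to {B, C, D} on letter, and {B, C, D} loops on letter and digit.]
r = (a | ε)b*
[Diagram: DFA with states 1, 2, and 3 and transitions on a and b, and its minimized form with states {1} and {2, 3}.]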
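
The refinement process can be sketched as repeated splitting of a partition. The DFA for (a | ε)b* below is an assumption about the diagram: all three states accepting, with 1 to 2 on a, and 1, 2, and 3 each going to 3 on b. Missing transitions are treated as moves to an implicit error state, and the function name `minimize` is made up for the example.

```python
def minimize(states, alphabet, delta, accepting):
    """Partition refinement; delta maps (state, char) -> state, gaps are errors."""
    accepting = set(accepting)
    # Step 1: split into accepting and nonaccepting states (drop empty sets).
    partition = [b for b in (accepting, set(states) - accepting) if b]
    while True:
        # Map each state to the index of its current set.
        index = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # Which set each character leads to (None = implicit error state).
                signature = tuple(index.get(delta.get((s, a))) for a in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())   # states that disagree are split
        if len(refined) == len(partition):
            return refined                    # no further splitting occurred
        partition = refined


# Assumed DFA for (a | ε)b*: every state is accepting.
delta = {(1, "a"): 2, (1, "b"): 3, (2, "b"): 3, (3, "b"): 3}
result = minimize({1, 2, 3}, "ab", delta, {1, 2, 3})
print(sorted(sorted(block) for block in result))
```

Here the character a separates state 1 from states 2 and 3, because only state 1 has an a-transition, and no further split follows.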

