Presentation is loading. Please wait.

Presentation is loading. Please wait.

Scanner 2015.03.16. Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the.

Similar presentations


Presentation on theme: "Scanner 2015.03.16. Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the."— Presentation transcript:

1 Scanner 2015.03.16

2 Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the program well-formed (semantically) ? Build an IR version of the code for the rest of the compiler The front end deals with form (syntax) & meaning (semantics) Source code Front End Errors Machine code Back End IR

3 The Front End Implementation Strategy Source code Scanner IR Parser Errors tokens ScanningParsing Specify Syntaxregular expressions context-free grammars Implement Recognizer deterministic finite automaton push-down automaton Perform WorkActions on transitions in automaton

4 The Front End Why separate the scanner and the parser? Scanner classifies words Parser constructs grammatical derivations Parsing is harder and slower Separation simplifies the implementation Scanners are simple Scanner leads to a faster, smaller parser token is a pair stream of characters Scanner IR + annotations Parser Errors stream of tokens microsyntaxsyntax Scanner is only pass that touches every character of the input.

5 Scanner Generator Why study automatic scanner construction? Avoid writing scanners by hand Harness the theory from classes like COMP 481 Goals: To simplify specification & implementation of scanners To understand the underlying techniques and technologies Comp 412, Fall 20105 Scanner Generator specifications Scanner source codeparts of speech & words Specifications written as “regular expressions” Represent words as indices into a global table tables or code design time compile time

6 Strings and Languages Alphabet An alphabet  is a finite set of symbols (characters) String A string is a finite sequence of symbols from    s  denotes the length of string s   denotes the empty string, thus  = 0 Language A language is a countable set of strings over some fixed alphabet   Abstract Language Φ  {ε}

7 String Operations Concatenation ( 連接 ) The concatenation of two strings x and y is denoted by xy Identity ( 單位元素 ) The empty string is the identity under concatenation.  s = s  = s Exponentiation Define s 0 =  s i = s i-1 s for i > 0 By Define s 1 = s s 2 = ss

8 Language Operations Union L  M = { s  s  L or s  M } Concatenation L M = { xy  x  L and y  M} Exponentiation L 0 = {  } L i = L i-1 L Kleene closure ( 封閉包 ) L * = ∪ i=0,…,  L i Positive closure L + = ∪ i=1,…,  L i

9 Regular Expressions A convenient means of specifying certain simple sets of strings. We use regular expressions to define structures of tokens. Tokens are built from symbols of a finite vocabulary. Regular Sets The sets of strings defined by regular expressions.

10 Regular Expressions Basis symbols:  is a regular expression denoting language L(  ) = {  } a   is a regular expression denoting L(a) = {a} If r and s are regular expressions denoting languages L(r) and M(s) respectively, then r  s is a regular expression denoting L(r)  M(s) rs is a regular expression denoting L(r)M(s) r * is a regular expression denoting L(r) * (r) is a regular expression denoting L(r) A language defined by a regular expression is called a regular set.

11 Operator Precedence OperatorPrecedenceAssociative *highestleft concatenationSecondleft |lowestleft

12 Algebraic Laws for Regular Expressions LawDescription r | s = s | r| is commutative r | ( s | t ) = ( r | s ) | t| is associative r(st) = (rs)tconcatenation is associative r(s|t) = rs | rt (s|t)r = sr | tr concatenation distributes over | εr = rε = rε is the identity for concatenation r* = ( r |ε)*ε is guaranteed in a closure r** = r** is idempotent

13 Examples of Regular Expressions Identifiers : Letter  (a|b|c| … |z|A|B|C| … |Z) Digit  (0|1|2| … |9) Identifier  Letter ( Letter | Digit ) * Numbers : Integer  (+|-|  ) (0| (1|2|3| … |9)(Digit * ) ) Decimal  Integer. Digit * Real  ( Integer | Decimal ) E (+|-|  ) Digit * Complex  ( Real, Real ) Numbers can get much more complicated! 13 underlining indicates a letter in the input stream shorthand for (a|b|c| … |z|A|B|C| … |Z) ( (a|b|c| … |z|A|B|C| … |Z) | (0|1|2| … |9) ) * Using symbolic names does not imply recursion

14 Finite Automata Finite Automata are recognizers. FA simply say “Yes” or “No” about each possible input string. A FA can be used to recognize the tokens specified by a regular expression Use FA to design of a Lexical Analyzer Generator Two kind of the Finite Automata Nondeterministic finite automata (NFA) Deterministic finite automata (DFA) Both DFA and NFA are capable of recognizing the same languages.

15 NFA Definitions NFA = { S, , , s 0, F } A finite set of states S A set of input symbols Σ  input alphabet, ε is not in Σ A transition function    : S    S A special start state s 0 A set of final states F, F  S (accepting states)

16 Transition Graph for FA is a state is a transition is a the start state is a final state

17 Example 012 3 a bc c a This machine accepts abccabc, but it rejects abcab. This machine accepts (abc + ) +.

18 Transition Table 0 start a 1 3 2 bb a b STATEabε 0{0, 1}{0}- 1-{2}- 2-{3}- 3--- The mapping  of an NFA can be represented in a transition table  (0, a ) = {0,1}  (0, b ) = {0}  (1, b ) = {2}  (2, b ) = {3}

19 DFA DFA is a special case of an NFA There are no moves on input ε For each state s and input symbol a, there is exactly one edge out of s labeled a. Both DFA and NFA are capable of recognizing the same languages.

20 NFA vs DFA 0 start a 1 3 2 bb a b S = {0,1,2,3}  = { a, b } s 0 = 0 F = {3} 012 3 abb b a a a (a | b)*abb

21 Concept


Download ppt "Scanner 2015.03.16. Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language? Is the."

Similar presentations


Ads by Google