Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:

Similar presentations


Presentation on theme: "1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:"— Presentation transcript:

1 1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II: inside a compiler 4Syntax analysis 5Contextual analysis 6Runtime organization 7Code generation PART III: conclusion 8Interpretation 9Review Supplementary material: Theoretical foundations (Regular expressions)

2 2 Regular Expressions finite state machine is a good “visual” aid –but it is not very suitable as a specification (its textual description is too clumsy) regular expressions are a suitable specification –a more compact way to define a language that can be accepted by an FSM used to give the lexical description of a programming language –define each “token” (keywords, identifiers, literals, operators, punctuation, etc) –define white-space, comments, etc these are not tokens, but must be recognized and ignored

3 3 Example: Pascal identifier Lexical specification (in English): –a letter, followed by zero or more letters or digits Lexical specification (as a regular expression): –letter. (letter | digit)* |means "or".means "followed by“ (dot may be omitted) *means zero or more instances of ( )are used for grouping

4 4 Operands of a regular expression Operands are same as labels on the edges of an FSM –single characters, or –the special character  (the empty string) "letter" is a shorthand for –a | b | c |... | z | A | B | C |... | Z "digit“ is a shorthand for –0 | 1 | 2 | … | 9 sometimes we put the characters in quotes –necessary when denoting |. * ( )

5 5 Precedence of |. * operators. Consider regular expressions: –letter.letter | digit* –letter.(letter | digit)* Regular Expression Operator Analogous Arithmetic Operator Precedence |pluslowest.timesmiddle *exponentiationhighest

6 6 TEST YOURSELF Question 1: Describe (in English) the language defined by each of the following regular expressions: –letter (letter* | digit*) –(letter | _ ) (letter | digit | _ )* –digit* "." digit* –digit digit* "." digit digit*

7 7 TEST YOURSELF Question 2: Write a regular expression for each of these languages: –The set of all C ++ reserved words Examples: if, while, for, class, int, case, char, true, false –C ++ string literals that begin with ” and end with ” and don’t contain any other ” except possibly in the escape sequence \” Example: ”The escape sequence \” occurs in this string” –C ++ comments that begin with /* and end with */ and don’t contain any other */ within the string Example: /* This is a comment * still the same comment */

8 8 Example: Integer Literals An integer literal with an optional sign can be defined in English as: –“(nothing or + or -) followed by one or more digits” The corresponding regular expression is: –(+|-|  ) (digit.digit*) A new convenient operator ‘+’ –same precedence as ‘*’ –digit digit* is the same as –digit + which means "one or more digits"

9 9 Language Defined by a Regular Expression Recall: language = set of strings Language defined by an automaton –the set of strings accepted by the automaton Language defined by a regular expression –the set of strings that match the expression Regular Exp.Corresponding Set of Strings  {""} a{"a"} a.b.c{"abc"} a | b | c{"a", "b", "c"} (a | b | c)*{"", "a", "b", "c", "aa", "ab",..., "bccabb"...}

10 10 Concept of Reg Exp Generating a String Rewrite regular expression until have only a sequence of letters (string) left Example (0|1)* 2 (0|1)* (0|1) (0|1)* 2 (0|1)* 1 (0|1)* 2 (0|1)* 1 2 (0|1)* 1 2 (0|1) (0|1)* 1 2 (0|1) 1 2 0 Replacement Rules 1) r 1 | r 2 ––> r 1 2) r 1 | r 2 ––> r 2 3) r* ––> r r* 4) r* ––> 

11 11 Non–determinism in Generation Different rule applications may yield different final results Example 1 (0|1)* 2 (0|1)* (0|1) (0|1)* 2 (0|1)* 1 (0|1)* 2 (0|1)* 1 2 (0|1)* 1 2 (0|1) (0|1)* 1 2 (0|1) 1 2 0 Example 2 (0|1)* 2 (0|1)* (0|1) (0|1)* 2 (0|1)* 0 (0|1)* 2 (0|1)* 0 2 (0|1)* 0 2 (0|1) (0|1)* 0 2 (0|1) 0 2 1

12 12 Concept of Language Generated by Reg Exp Set of all strings generated by a regular expression is the language of the regular expression In general, language may be infinite String generated by regular expression language is often called a “token”

13 13 Examples of Languages and Reg Exp  = { 0, 1,. } –(0 | 1) + "." (0 | 1)* | (0 | 1)* "." (0 | 1) +  binary floating point numbers –(0 0)*  even-length all-zero strings –1* (0 1* 0 1*)*  binary strings with even number of zeros  = { a,b,c, 0, 1, 2 } –(a|b|c)(a|b|c|0|1|2)*  alphanumeric identifiers –(0|1|2) +  trinary numbers

14 14 Reg Exp Notational Shorthand R + one or more strings of R: R(R*) R?optional R: (R|  ) [abcd] one of listed characters: (a|b|c|d) [a-z] one character from this range: (a|b|c|d...|z) [^abc] anything but one of the listed chars [^a-z] any one character not from this range

15 15 Equivalence of FSM and Regular Expressions Theorem: –For each finite state machine M, we can construct a regular expression R such that M and R accept the same language. –[proof omitted] Theorem: –For each regular expression R, we can construct a finite state machine M such that R and M accept the same language. –[proof outline follows]

16 16 Regular Expressions to NFSM (1) For each kind of reg exp, define a NFSM –Notation: NFSM for reg exp M M For   For input a a

17 17 Regular Expressions to NFSM (2) For A. B A B  For A | B B A    

18 18 Regular Expressions to NFSM (3) For A* A   

19 19 Example of RegExp -> NFSM conversion Consider the regular expression (1|0)*1 The NFSM is 1 0 1          A B C D E F G H IJ

20 20 Converting NFSM to DFSM Simulate the NFSM Each state of DFSM – is a non-empty subset of states of the NFSM Start state of DFSM – is the set of NFSM states reachable from the NFSM start state using only  -moves Add a transition S a > S’ to DFSM iff –S’ is the set of NFSM states reachable from any state in S after consuming only the input a, considering  -moves as well

21 21 Remarks on converting NFSM to DFSM An NFSM may be in many states at any time How many different states ? If there are N states, the NFSM must be in some subset of those N states How many subsets are there? 2 N = finitely many For example, if N = 5 then 2 N = 32 subsets

22 22 NFSM -> DFSM Example 1 0 1          A B C D E F G H IJ ABCDHI FGHIABCD EJGHIABCD 0 1 0 1 0 1

23 23 TEST YOURSELF Question 3: First convert each of these regular expressions to a NFSM –(a | b |  ) (a | b) –(ab | ba)* (aa | bb) Question 4: Next convert each resulting NFSM to a DFSM


Download ppt "1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:"

Similar presentations


Ads by Google