Presentation is loading. Please wait.

Presentation is loading. Please wait.

Code Generator Translator Architecture Parser Tokenizer string of characters (source code) string of tokens abstract program string of integers (object.

Similar presentations


Presentation on theme: "Code Generator Translator Architecture Parser Tokenizer string of characters (source code) string of tokens abstract program string of integers (object."— Presentation transcript:

1 Code Generator Translator Architecture Parser Tokenizer string of characters (source code) string of tokens abstract program string of integers (object code)

2 A Tokenizing Machine Another great machine! Ready to Dispense? In (Chars) Out (Tokens)

3 Tokenizing Machine Continued… Ready to Dispense? InOut 3: Tokens come out here 2: Light comes on 1: Characters go in here

4 Tokenizing Machine Continued… Type BL_Tokenizing_Machine_Kernel is modeled by ( buffer : string of character ready_to_dispense : boolean ) constraint … Initial Value (empty_string, false)

5 Tokenizing Machine Continued… Operations m.Insert (ch) m.Dispense (token_text, token_kind) m.Is_Ready_To_Dispense () m.Flush_A_Token (token_text, token_kind) m.Size ()

6 A State-Transition View not ready to dispense ready to dispense Insert Flush_A_Token Insert Size Is_Ready_To_Dispense Dispense

7 Tokenizing BL Programs Token Types KEYWORD IDENTIFIER CONDITION WHITE_SPACE COMMENT ERROR

8 A Very Useful Extension procedure_body Get_Next_Token ( alters Character_IStream& str, produces Text& token_text, produces Integer& token_kind ) { } while (not self.Is_Ready_To_Dispense () and not str.At_EOS ()) { object Character ch; str.Read (ch); self.Insert (ch); } if (self.Is_Ready_To_Dispense ()) { self.Dispense (token_text, token_kind); } else { self.Flush_A_Token (token_text, token_kind); }

9 Another Useful Extension procedure_body Get_Next_Non_Separator_Token ( alters Character_IStream& str, produces Text& token_text, produces Integer& token_kind ) { } self.Get_Next_Token (str, token_text, token_kind); while ((token_kind == WHITE_SPACE) or (token_kind == COMMENT)) { self.Get_Next_Token (str, token_text, token_kind); }

10 How Does Insert Work? buffer: “PROGRAM” ready_to_dispense: false buffer: “PROGRAM ” ready_to_dispense: true #m: m: Here’s another character. ‘ ’

11 The Specification of Insert procedure Insert ( preserves Character ch ) is_abstract; /*! requires self.ready_to_dispense = false ensures self.buffer = #self.buffer * and self.ready_to_dispense = IS_COMPLETE_TOKEN_TEXT (#self.buffer, ch) !*/

12 An Important Math Operation math definition IS_COMPLETE_TOKEN_TEXT ( s: string of character c: character ): boolean is ( s is in OK_STRINGS and s * is not in OK_STRINGS ) or ( is in PREFIX (OK_STRINGS) and s * is not in PREFIX (OK_STRINGS) ) s is a complete “valid” token c can start a “valid” token, but s* doesn’t start a “valid” token

13 Other Math Definitions OK_STRINGS = {s: string of character (IS_KEYWORD (s))} union {s: string of character (IS_IDENTIFIER (s))} union {s: string of character (IS_CONDITION_NAME (s))} union {s: string of character (IS_WHITE_SPACE (s))} union {s: string of character (IS_COMMENT (s))} PREFIX (s_set) = {x: string of character (there exists y: string of character (x * y is in s_set))}

14 PREFIX Examples s_set = {“abc”} PREFIX (s_set) = {“”, “a”, “ab”, “abc”} s_set = {“abc”, “de”} PREFIX (s_set) = {“”, “a”, “ab”, “abc”, “d”, “de”}

15 Tokenizing Machine: Implementation Obvious Representation Text buffer_rep Boolean token_ready Insert (ch)? check if IS_COMPLETE_TOKEN_TEXT (self[buffer_rep], ch), and set self[token_ready] accordingly append ch at end of self[buffer_rep]

16 Tokenizing Machine: Implementation Continued… Dispense (token_text, token_kind)? set token_text to all but the last character of self[buffer_rep] set token_kind to the value of WHICH_KIND (token_text) set self[token_ready] to false

17 Tokenizing Machine: Implementation Continued… How do we “check if IS_COMPLETE_TOKEN_TEXT (self[buffer_rep], ch)”? How do we determine “WHICH_KIND (token_text)”? How do we do these things quickly?

18 Making Decisions Quickly Keep track of the “state” of the buffer by adding one field to the representation: Text buffer_rep Boolean token_ready Integer buffer_state

19 Possible Buffer States How many interestingly different buffer states do you think there may be? Let’s start enumerating them…

20 Buffer States Continued… Initial state (empty buffer) How many states after inserting the first character? ‘B’, ‘D’, ‘E’, ‘I’, ‘P’, ‘T’, ‘W’, ‘n’, ‘r’, ‘t’, identifier (any other letter) white_space (‘ ’, ‘\n’, ‘\t’) comment (‘#’) error (any other character)

21 Buffer States Continued… How many states after inserting the second character? “BE”, “DO”, “EL”, “EN”, “IF”, “IN”, “IS”, “PR”, “TH”, “WH”, “ne”, “ra”, “tr”, identifier (any other id character) white_space (‘ ’, ‘\n’, ‘\t’) comment (any other character but ‘\n’) error (any character that cannot start a new “good” token)

22 A State Transition Diagram: Transitions Out of ‘empty’ Only B D E I P T W n r t identifier white_space comment error empty ‘B’ ‘D’ ‘E’ ‘I’ ‘P’ ‘T’ ‘W’ ‘n’ ‘r’ ‘t’ any other letter ‘ ’,’\n’,’\t’ ‘#’ any other character

23 Structure of Body of Insert case_select (self[buffer_state]) { case empty: // case for buffer = empty_string case B: // case for buffer = “B” case D: // case for buffer = “D” case E: // case for buffer = “E”... case error: // case for buffer holding an error token }

24 A Simplified View Buffer States EMPTY_BS ID_OR_KEYWORD_OR_CONDITION_BS WHITE_SPACE_BS COMMENT_BS ERROR_BS

25 The State Transition Diagram EMPTY_BS ID_OR_KEYWORD_ OR_CONDITION_BS COMMENT_BS WHITE_SPACE_BS ERROR_BS ‘ ’, ‘\n’, ‘\t’ ‘#’ ‘a’..’z’, ‘A’..’Z’ any other character any character except ‘\n’ ‘ ’, ‘\n’, ‘\t’ ‘a’..’z’, ‘A’..’Z’, ‘0’..’9’, ‘-’ any character except ‘a’..’z’, ‘A’..’Z’, ‘ ’, ‘\n’, ‘\t’, ‘#’

26 Useful Private Functions Is_White_Space_Character (ch) Is_Digit_Character (ch) Is_Alphabetic_Character (ch) Is_Identifier_Character (ch) Can_Start_Token (ch) Id_Or_Keyword_Or_Condition (t) Buffer_Type (ch) Token_Kind (bs, str)


Download ppt "Code Generator Translator Architecture Parser Tokenizer string of characters (source code) string of tokens abstract program string of integers (object."

Similar presentations


Ads by Google