Lexing Discrete Mathematics and Its Applications Baojian Hua


1 Lexing
Discrete Mathematics and Its Applications
Baojian Hua
bjhua@ustc.edu.cn

2 Syntax Tree
A syntax tree is a systematic way to put a program into memory:
- a data type definition, plus a bunch of functions
- the programmer calls them explicitly, which is tedious and error-prone
But we write programs as ASCII text, so how can we construct the tree automatically?
- A technique called (automatic) parsing
- A program clever enough to do this automatically
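To make "a data type definition plus a bunch of functions the programmer calls explicitly" concrete, here is a minimal C sketch for the expression part of the grammar on slide 4, with the tree built entirely by hand. The names (struct exp, mkNum, mkId, mkBin, render) are hypothetical, not taken from the course code, and nodes are never freed in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tree type for: exp -> exp + exp | exp - exp | num | id | (exp)
   No node is needed for (exp): the shape of the tree records the grouping. */
enum expKind { EXP_NUM, EXP_ID, EXP_ADD, EXP_SUB };

struct exp {
    enum expKind kind;
    int num;                  /* used when kind == EXP_NUM */
    const char *id;           /* used when kind == EXP_ID */
    struct exp *left, *right; /* used when kind is EXP_ADD or EXP_SUB */
};

/* The "bunch of functions": one constructor per kind of node. */
static struct exp *mkNode(enum expKind kind) {
    struct exp *e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    return e;
}

struct exp *mkNum(int n) { struct exp *e = mkNode(EXP_NUM); e->num = n; return e; }
struct exp *mkId(const char *x) { struct exp *e = mkNode(EXP_ID); e->id = x; return e; }

struct exp *mkBin(enum expKind op, struct exp *l, struct exp *r) {
    assert(op == EXP_ADD || op == EXP_SUB);
    struct exp *e = mkNode(op);
    e->left = l;
    e->right = r;
    return e;
}

/* Append a fully parenthesised rendering of e to the string already in buf. */
static void show(const struct exp *e, char *buf, size_t cap) {
    size_t len = strlen(buf);
    if (e->kind == EXP_NUM) {
        snprintf(buf + len, cap - len, "%d", e->num);
    } else if (e->kind == EXP_ID) {
        snprintf(buf + len, cap - len, "%s", e->id);
    } else {
        snprintf(buf + len, cap - len, "(");
        show(e->left, buf, cap);
        len = strlen(buf);
        snprintf(buf + len, cap - len, "%s", e->kind == EXP_ADD ? " + " : " - ");
        show(e->right, buf, cap);
        len = strlen(buf);
        snprintf(buf + len, cap - len, ")");
    }
}

/* Render e into buf (capacity cap), replacing whatever buf held before. */
void render(const struct exp *e, char *buf, size_t cap) {
    assert(cap > 0);
    buf[0] = '\0';
    show(e, buf, cap);
}
```

Building xx0-(1+2) this way takes the nested call mkBin(EXP_SUB, mkId("xx0"), mkBin(EXP_ADD, mkNum(1), mkNum(2))), which renders as (xx0 - (1 + 2)). Writing such calls for a whole program is exactly the tedious, error-prone work that parsing automates.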

3 Roadmap
- Lexer: eats a sequence of ASCII characters, emits a sequence of tokens
- Parser: eats the sequence of tokens, emits abstract syntax trees
- Other parts: later in this course
stream of characters -> Lexer -> stream of tokens -> Parser -> abstract syntax -> other parts

4 What's a Program?
// Rethink the grammar we discussed earlier:
stm -> id = exp;
     | id = exp; stm
exp -> exp + exp
     | exp - exp
     | num
     | id
     | (exp)
// Nonterminals: stm exp
// Terminals: = ; + - ( ) num id

5 Program as We Saw
Sample program:
  xx0 = 3+4;
  yy1 = xx0-(1+2);
Draw the left-most derivation:
  stm => id = exp; stm
      => …
      => xx0 = 3+4; yy1 = xx0-(1+2);
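Worked at the level of terminals (xx0 and yy1 are both id; 3, 4, 1 and 2 are all num), the left-most derivation of the sample program under the grammar of slide 4 expands the left-most nonterminal at every step:

$$
\begin{aligned}
\mathit{stm} &\Rightarrow \mathtt{id} = \mathit{exp};\ \mathit{stm} \\
&\Rightarrow \mathtt{id} = \mathit{exp} + \mathit{exp};\ \mathit{stm} \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathit{exp};\ \mathit{stm} \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathit{stm} \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathit{exp}; \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathit{exp} - \mathit{exp}; \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathtt{id} - \mathit{exp}; \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathtt{id} - (\mathit{exp}); \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathtt{id} - (\mathit{exp} + \mathit{exp}); \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathtt{id} - (\mathtt{num} + \mathit{exp}); \\
&\Rightarrow \mathtt{id} = \mathtt{num} + \mathtt{num};\ \mathtt{id} = \mathtt{id} - (\mathtt{num} + \mathtt{num});
\end{aligned}
$$

Putting the actual lexemes back in place of id and num gives xx0 = 3+4; yy1 = xx0-(1+2); recording which lexeme stands for which terminal is precisely the lexer's job.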

6 Program as We Saw
The program
  xx0 = 3+4;
  yy1 = xx0-(1+2);
is essentially a sequence of ASCII characters:
  x x 0 \blank = \blank 3 + 4 ; \n y y 1 \blank = \blank x x 0 - ( 1 + 2 ) ; \n EOF
The lexer wants to transform it into a sequence of tokens:
  xx0 = 3 + 4 ; yy1 = xx0 - ( 1 + 2 ) ; EOF

7 Tokens
Tokens are the terminals that appear in a grammar:
- they are the output of a lexer
- they can be classified into several kinds: identifiers, numbers, key words, operators, …
How to represent tokens?
- As strings: OK, but not very flexible, since later we want to check each token's kind
- As inductive definitions!

8 Tokens
// Terminals: = ; + - ( ) num id
token -> = | ; | + | - | ( | )
       | Num (int_num)
       | Id (id_name)

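9 Token Interface
    #ifndef TOKEN_H
    #define TOKEN_H

    enum tokenKind {
        ASSIGN, SEMICOLON, ADD, SUB, LPAREN, RPAREN, NUM, ID,
        TOKEN_EOF        /* not EOF: that name is already a macro in <stdio.h> */
    };

    struct token {
        enum tokenKind kind;
        union {                  /* at most one field is meaningful */
            int num;             /* when kind == NUM */
            const char *id;      /* when kind == ID */
        } value;
    };

    struct token newToken (enum tokenKind kind, int num, const char *id);

    #endif

10 Token Implementation
    #include "token.h"

    struct token newToken (enum tokenKind kind, int num, const char *id)
    {
        struct token t;
        t.kind = kind;
        t.value.num = 0;         /* other kinds carry no value */
        switch (kind) {
        case NUM: t.value.num = num; break;
        case ID:  t.value.id = id; break;
        default:  break;
        }
        return t;
    }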
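Once tokens are built with newToken, a small printer helps when checking a lexer's output against a listing like the one on slide 11. This is a sketch under two assumptions: the value field is a union (only one field is ever meaningful), and the last kind is called TOKEN_EOF because EOF is already a macro in <stdio.h>. The function name tokenToString is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum tokenKind { ASSIGN, SEMICOLON, ADD, SUB, LPAREN, RPAREN, NUM, ID, TOKEN_EOF };

struct token {
    enum tokenKind kind;
    union {
        int num;        /* meaningful only when kind == NUM */
        const char *id; /* meaningful only when kind == ID */
    } value;
};

struct token newToken(enum tokenKind kind, int num, const char *id) {
    struct token t;
    t.kind = kind;
    t.value.num = 0; /* keep the union initialised for value-less kinds */
    if (kind == NUM)
        t.value.num = num;
    else if (kind == ID)
        t.value.id = id;
    return t;
}

/* Render t into buf the way slide 11 writes tokens: NUM(3), ID(xx0), ADD, ...
   Returns what snprintf returns (the length the full text needs). */
int tokenToString(struct token t, char *buf, size_t cap) {
    static const char *const names[] = {
        "ASSIGN", "SEMICOLON", "ADD", "SUB", "LPAREN", "RPAREN", "NUM", "ID", "EOF"
    };
    assert(t.kind <= TOKEN_EOF);
    switch (t.kind) {
    case NUM: return snprintf(buf, cap, "NUM(%d)", t.value.num);
    case ID:  return snprintf(buf, cap, "ID(%s)", t.value.id);
    default:  return snprintf(buf, cap, "%s", names[t.kind]);
    }
}
```

The names table must list the kinds in the same order as enum tokenKind; NUM and ID never reach it because they have their own cases.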

11 Split Input
Transform the characters of xx0 = 3+4; yy1 = xx0-(1+2);
  x x 0 \blank = \blank 3 + 4 ; \n y y 1 \blank = \blank x x 0 - ( 1 + 2 ) ; \n EOF
into data structures:
  ID(xx0) ASSIGN NUM(3) ADD NUM(4) SEMICOLON
  ID(yy1) ASSIGN ID(xx0) SUB LPAREN NUM(1) ADD NUM(2) RPAREN SEMICOLON
  EOF
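A minimal runnable sketch of the whole transformation, assuming the input is already in memory as a C string rather than read with fileGetChar; the names struct lexer, nextToken and makeToken are hypothetical. It follows the first-char matching idea: skip white space, then let the first character pick the token kind. Identifiers are copied to the heap and never freed, numbers are not overflow-checked, and an unexpected character simply aborts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum tokenKind { ASSIGN, SEMICOLON, ADD, SUB, LPAREN, RPAREN, NUM, ID, TOKEN_EOF };

struct token {
    enum tokenKind kind;
    union {
        int num;        /* meaningful only when kind == NUM */
        const char *id; /* meaningful only when kind == ID (heap copy) */
    } value;
};

/* Hypothetical lexer state: the whole input plus a cursor into it. */
struct lexer {
    const char *src;
    size_t pos;
};

static struct token makeToken(enum tokenKind kind) {
    struct token t;
    t.kind = kind;
    t.value.num = 0;
    return t;
}

struct token nextToken(struct lexer *lx) {
    const char *s = lx->src;
    /* Skip blanks, tabs and newlines between tokens. */
    while (isspace((unsigned char)s[lx->pos]))
        lx->pos++;

    char c = s[lx->pos];
    if (c == '\0')
        return makeToken(TOKEN_EOF); /* does not advance: EOF repeats */

    /* First-char matching: the first character decides the token kind. */
    if (isalpha((unsigned char)c) || c == '_') {
        size_t start = lx->pos;
        while (isalnum((unsigned char)s[lx->pos]) || s[lx->pos] == '_')
            lx->pos++;
        size_t len = lx->pos - start;
        char *name = malloc(len + 1);
        assert(name != NULL);
        memcpy(name, s + start, len);
        name[len] = '\0';
        struct token t = makeToken(ID);
        t.value.id = name;
        return t;
    }
    if (isdigit((unsigned char)c)) {
        int n = 0;
        while (isdigit((unsigned char)s[lx->pos]))
            n = n * 10 + (s[lx->pos++] - '0'); /* no overflow check here */
        struct token t = makeToken(NUM);
        t.value.num = n;
        return t;
    }
    lx->pos++; /* consume a one-character token */
    switch (c) {
    case '=': return makeToken(ASSIGN);
    case ';': return makeToken(SEMICOLON);
    case '+': return makeToken(ADD);
    case '-': return makeToken(SUB);
    case '(': return makeToken(LPAREN);
    case ')': return makeToken(RPAREN);
    default:  abort(); /* a real lexer would report the bad character */
    }
}
```

Calling nextToken repeatedly on "xx0 = 3+4;\nyy1 = xx0-(1+2);\n" yields the seventeen tokens listed above, ending with TOKEN_EOF. Because the input is a string, the lexer can inspect s[lx->pos] without consuming it, so no character ever has to be pushed back.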

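12 First-char Matching Algorithm
    #include <ctype.h>
    #include <stdio.h>

    FILE *in;    /* the source being lexed, e.g. fopen ("test.c", "r") */

    struct token getToken (void)
    {
        int c = fgetc (in);      /* int, not char, so EOF is distinguishable */
        while (isspace (c))      /* skip blanks and newlines between tokens */
            c = fgetc (in);
        if (isalpha (c) || c == '_') {
            // must be an identifier or a key word
            ungetc (c, in);      /* push back: getId () re-reads it */
            return getId ();
        }
        if (isdigit (c)) {
            // must be an integer
            ungetc (c, in);      /* push back: getNum () re-reads it */
            return getNum ();
        }
        switch (c) {
        case '=': return newToken (ASSIGN, 0, NULL);
        … // similar
        }
    }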
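Slide 12 calls getNum () without showing it. One way to write it, assuming (as a stand-in for fileGetChar (“test.c”)) that the lexer reads from a global FILE *in and that getToken has pushed the first digit back with ungetc: keep reading digits, then return the one extra character to the stream. The overflow check and the exit on failure are choices of this sketch, not of the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

enum tokenKind { ASSIGN, SEMICOLON, ADD, SUB, LPAREN, RPAREN, NUM, ID, TOKEN_EOF };

struct token {
    enum tokenKind kind;
    union {
        int num;        /* meaningful only when kind == NUM */
        const char *id; /* meaningful only when kind == ID */
    } value;
};

/* Assumption: the stream the lexer reads from. */
FILE *in;

/* Read a decimal integer literal; the next character must be a digit. */
struct token getNum(void) {
    struct token t;
    int n = 0;
    int c = fgetc(in);
    assert(isdigit(c));
    while (isdigit(c)) {
        int d = c - '0';
        if (n > (INT_MAX - d) / 10) { /* n * 10 + d would overflow an int */
            fprintf(stderr, "integer literal too large\n");
            exit(EXIT_FAILURE);
        }
        n = n * 10 + d;
        c = fgetc(in);
    }
    /* The loop read one character past the number; push it back so the
       next getToken () call sees it. */
    if (c != EOF)
        ungetc(c, in);
    t.kind = NUM;
    t.value.num = n;
    return t;
}
```

On the input 123+4, the first call returns NUM(123) and leaves + as the next character in the stream. The C standard guarantees only one character of push-back per stream, which is all this scheme needs.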

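13 First-char Matching Algorithm
    // Sub-functions are simple:
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>          /* strdup: POSIX, standard C since C23 */

    extern FILE *in;             /* the source being lexed */

    struct token getId (void)
    {
        char id[64];             /* assumption: longer names are truncated */
        size_t len = 0;
        int c = fgetc (in);
        while (isalnum (c) || c == '_') {
            if (len < sizeof id - 1)
                id[len++] = (char) c;
            c = fgetc (in);
        }
        if (c != EOF)
            ungetc (c, in);      /* not part of the name: give it back */
        id[len] = '\0';
        return newToken (ID, 0, strdup (id));  // or key words?
    }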
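The comment "or key words?" points at a common answer: scan a key word exactly like an identifier, then look the finished word up in a table. A minimal sketch follows; the grammar on slide 4 has no key words, so the IF and WHILE kinds and the classifyWord function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* IF and WHILE are hypothetical: the grammar on slide 4 has no key words. */
enum tokenKind { ASSIGN, SEMICOLON, ADD, SUB, LPAREN, RPAREN, NUM, ID, IF, WHILE, TOKEN_EOF };

struct token {
    enum tokenKind kind;
    union {
        int num;        /* meaningful only when kind == NUM */
        const char *id; /* meaningful only when kind == ID */
    } value;
};

/* Key-word table: spelling -> token kind. */
static const struct {
    const char *spelling;
    enum tokenKind kind;
} keywords[] = {
    { "if", IF },
    { "while", WHILE },
};

/* Classify a word that getId has already scanned in full:
   a key word if it is in the table, otherwise an identifier. */
struct token classifyWord(const char *word) {
    struct token t;
    assert(word != NULL);
    t.value.id = NULL;
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(word, keywords[i].spelling) == 0) {
            t.kind = keywords[i].kind;
            return t;
        }
    }
    t.kind = ID;
    t.value.id = word;
    return t;
}
```

Because the whole word is scanned before the lookup, iffy stays a single identifier instead of being split into the key word if followed by fy. For a larger key-word set, a sorted table searched with bsearch or a hash table would replace the linear scan.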

14 Summary
A lexer:
- reads a sequence of ASCII characters as input
- emits a sequence of tokens as output
A hand-written lexer is conceptually simple, but still tedious and error-prone.
Luckily, many tools generate lexers automatically: flex, sml-lex, ocamllex, JLex, JFlex, C#Flex, …

