Lexing Discrete Mathematics and Its Applications Baojian Hua

Syntax Tree
A systematic way to put a program into memory:
    a data type definition plus a bunch of functions
    the programmer calls them explicitly, which is tedious and error-prone
But we write programs in ASCII form, so how can we construct the tree automatically?
    with a technique called (automatic) parsing
    a program clever enough to do this automatically

Roadmap
Lexer: eats a sequence of ASCII characters, emits a sequence of tokens
Parser: eats a sequence of tokens, emits abstract syntax trees
Other parts: later in this course
stream of characters -> Lexer -> stream of tokens -> Parser -> abstract syntax -> other parts

What's a Program?
// Rethink the grammar we discussed earlier:
stm -> id = exp;
     | id = exp; stm
exp -> exp + exp
     | exp - exp
     | num
     | id
     | (exp)
// Nonterminals: stm exp
// Terminals: = ; + - ( ) num id

Program as We Saw
Sample program:
    xx0 = 3+4;
    yy1 = xx0-(1+2);
Draw the left-most derivation:
    stm => id = exp; stm => … => xx0 = 3+4; yy1 = xx0-(1+2);
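One way to write out the left-most derivation step by step, as a sketch at the token level: each step expands the left-most nonterminal using a rule of the grammar above. Here id stands for the lexemes xx0 and yy1, and num for 3, 4, 1 and 2; the choice of the `exp + exp` and `exp - exp` rules at each step is read off the sample program.

```latex
\begin{align*}
\mathit{stm}
  &\Rightarrow \mathit{id} = \mathit{exp};\ \mathit{stm} \\
  &\Rightarrow \mathit{id} = \mathit{exp} + \mathit{exp};\ \mathit{stm} \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{exp};\ \mathit{stm} \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{stm} \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{exp}; \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{exp} - \mathit{exp}; \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{id} - \mathit{exp}; \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{id} - (\mathit{exp}); \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{id} - (\mathit{exp} + \mathit{exp}); \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{id} - (\mathit{num} + \mathit{exp}); \\
  &\Rightarrow \mathit{id} = \mathit{num} + \mathit{num};\ \mathit{id} = \mathit{id} - (\mathit{num} + \mathit{num});
\end{align*}
```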

Program as We Saw
Essentially a sequence of ASCII characters:
    x x 0 \blank = \blank 3 + 4 ; \n y y 1 \blank = \blank x x 0 - ( 1 + 2 ) ; \n EOF
The lexer wants to transform it into:
    xx0 = 3 + 4 ; yy1 = xx0 - ( 1 + 2 ) ; EOF
Sample program:
    xx0 = 3+4;
    yy1 = xx0-(1+2);

Tokens
Terminals appear in any grammar
    they are the output of a lexer
    they can be classified into several kinds: identifiers, numbers, keywords, operators, …
How to represent tokens?
    As strings: OK, but not so flexible, since we later want to check a token's kind
    As inductive definitions!

Tokens
// Terminals: = ; + - ( ) num id
token -> = | ; | + | - | ( | ) | Num (int_num) | Id (id_name)

Token Interface
#ifndef TOKEN_H
#define TOKEN_H

struct token
{
  // Note: the name EOF clashes with the EOF macro if <stdio.h> is included
  enum tokenKind {ASSIGN, SEMICOLON, ADD, SUB, LPAREN, RPAREN, NUM, ID, EOF} kind;
  struct
  {
    int num;  // used when kind is NUM
    str id;   // used when kind is ID
  } value;
};

struct token newToken (enum tokenKind kind, int num, str id);

#endif

Token Implementation
struct token newToken (enum tokenKind kind, int num, str id)
{
  struct token t;
  t.kind = kind;
  // only NUM and ID tokens carry a value
  switch (kind) {
  case NUM:
    t.value.num = num;
    break;
  case ID:
    t.value.id = id;
    break;
  }
  return t;
}
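To illustrate why a kind tag is more flexible than a plain string, here is a minimal self-contained sketch in C of the same token type together with a printing function. Several details are assumptions of this sketch and not part of the slides: the `TOK_` prefix (chosen so that the EOF kind does not collide with the `EOF` macro of `<stdio.h>`), the fixed-size `char id[32]` buffer standing in for the `str` type, and the helper name `tokenToString`.

```c
#include <stdio.h>
#include <string.h>

/* Token kinds from the slides. The TOK_ prefix is an assumption of this
   sketch: a bare EOF constant would clash with the EOF macro of <stdio.h>. */
enum tokenKind {
  TOK_ASSIGN, TOK_SEMICOLON, TOK_ADD, TOK_SUB,
  TOK_LPAREN, TOK_RPAREN, TOK_NUM, TOK_ID, TOK_EOF
};

struct token {
  enum tokenKind kind;
  int num;       /* meaningful only when kind == TOK_NUM */
  char id[32];   /* meaningful only when kind == TOK_ID (size is an assumption) */
};

/* Build a token; num and id are ignored unless the kind needs them. */
struct token newToken(enum tokenKind kind, int num, const char *id)
{
  struct token t;
  t.kind = kind;
  t.num = 0;
  t.id[0] = '\0';
  if (kind == TOK_NUM) {
    t.num = num;
  } else if (kind == TOK_ID && id != NULL) {
    strncpy(t.id, id, sizeof t.id - 1);   /* overlong names are truncated */
    t.id[sizeof t.id - 1] = '\0';
  }
  return t;
}

/* Hypothetical helper: write a printable form such as "ID(xx0)" into buf. */
void tokenToString(struct token t, char *buf, size_t size)
{
  static const char *names[] = {
    "ASSIGN", "SEMICOLON", "ADD", "SUB", "LPAREN", "RPAREN", "NUM", "ID", "EOF"
  };
  switch (t.kind) {
  case TOK_NUM: snprintf(buf, size, "NUM(%d)", t.num); break;
  case TOK_ID:  snprintf(buf, size, "ID(%s)", t.id);   break;
  default:      snprintf(buf, size, "%s", names[t.kind]); break;
  }
}
```

Because the kind is an enum, later phases can dispatch on it with a `switch` instead of comparing strings, which is what "check its kind" amounts to in practice.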

Split Input
Transform:
    x x 0 \blank = \blank 3 + 4 ; \n y y 1 \blank = \blank x x 0 - ( 1 + 2 ) ; \n EOF
To data structures:
    ID(xx0) ASSIGN NUM(3) ADD NUM(4) SEMICOLON
    ID(yy1) ASSIGN ID(xx0) SUB LPAREN NUM(1) ADD NUM(2) RPAREN SEMICOLON
    EOF
Sample program:
    xx0 = 3+4;
    yy1 = xx0-(1+2);

First-char Matching Algorithm
struct token getToken ()
{
  char c = fileGetChar ("test.c");
  switch (c) {
  case 'a'-'z', 'A'-'Z', '_':  // pseudocode for a character range
    // must be an identifier or a keyword
    return getId ();
  case '0'-'9':                // pseudocode for a character range
    // must be an integer
    return getNum ();
  … // similar
  }
}

First-char Matching Algorithm
// Sub-functions are simple:
struct token getId ()
{
  struct token t;
  str id;
  char c = fileGetChar ("test.c");
  // pseudocode: loop while c is a letter, a digit, or '_'
  while (c == 'a'-'z', 'A'-'Z', '_', '0'-'9') {
    strAppendChar (id, c);
    c = fileGetChar ("test.c");
  }
  t = newToken (ID, 0, id);  // or a keyword?
  return t;
}
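The first-char matching idea can be sketched as a small self-contained lexer in C. This is an illustration under stated assumptions, not the course's implementation: it reads from an in-memory string instead of calling `fileGetChar`, it peeks at the first character without consuming it so that `getId` and `getNum` see the whole lexeme, and the names `struct lexer`, `TOK_ERROR`, `isIdStart` and `isIdChar` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* TOK_ prefix avoids the EOF macro of <stdio.h>; TOK_ERROR is an addition. */
enum tokenKind {
  TOK_ASSIGN, TOK_SEMICOLON, TOK_ADD, TOK_SUB, TOK_LPAREN, TOK_RPAREN,
  TOK_NUM, TOK_ID, TOK_EOF, TOK_ERROR
};

struct token {
  enum tokenKind kind;
  int num;        /* meaningful only for TOK_NUM */
  char id[32];    /* meaningful only for TOK_ID (size is an assumption) */
};

/* Lexer state: the input text and the index of the next unread character. */
struct lexer {
  const char *src;
  size_t pos;
};

static int isIdStart(int c) { return isalpha(c) || c == '_'; }
static int isIdChar(int c)  { return isalnum(c) || c == '_'; }

/* Scan an identifier starting at the current position. */
static struct token getId(struct lexer *lx)
{
  struct token t = { TOK_ID, 0, "" };
  size_t n = 0;
  while (isIdChar((unsigned char)lx->src[lx->pos])) {
    if (n < sizeof t.id - 1)
      t.id[n++] = lx->src[lx->pos];   /* overlong names are truncated */
    lx->pos++;
  }
  t.id[n] = '\0';
  return t;
}

/* Scan a decimal integer; overflow of very long literals is not handled. */
static struct token getNum(struct lexer *lx)
{
  struct token t = { TOK_NUM, 0, "" };
  while (isdigit((unsigned char)lx->src[lx->pos])) {
    t.num = t.num * 10 + (lx->src[lx->pos] - '0');
    lx->pos++;
  }
  return t;
}

/* Return the next token, choosing the sub-function by the first character. */
struct token getToken(struct lexer *lx)
{
  struct token t = { TOK_EOF, 0, "" };
  while (isspace((unsigned char)lx->src[lx->pos]))
    lx->pos++;                        /* skip blanks and newlines */
  unsigned char c = (unsigned char)lx->src[lx->pos];
  if (c == '\0') return t;            /* end of input */
  if (isIdStart(c)) return getId(lx);
  if (isdigit(c)) return getNum(lx);
  lx->pos++;                          /* single-character tokens */
  switch (c) {
  case '=': t.kind = TOK_ASSIGN;    break;
  case ';': t.kind = TOK_SEMICOLON; break;
  case '+': t.kind = TOK_ADD;       break;
  case '-': t.kind = TOK_SUB;       break;
  case '(': t.kind = TOK_LPAREN;    break;
  case ')': t.kind = TOK_RPAREN;    break;
  default:  t.kind = TOK_ERROR;     break;
  }
  return t;
}
```

Calling `getToken` repeatedly on a `struct lexer` initialized with `{ "xx0 = 3+4;\nyy1 = xx0-(1+2);\n", 0 }` yields the token kinds of the sample program in order, ending with `TOK_EOF`. Keywords are not handled here; one common approach is to scan them as identifiers and then look the name up in a keyword table.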

Summary
A lexer:
    reads a sequence of ASCII characters as input
    emits a sequence of tokens as output
A hand-written lexer is conceptually simple, but still tedious and error-prone
Luckily, there are many automatic tools that do this: flex, sml-lex, ocamllex, JLex, JFlex, C#Flex, …