Chapter 2 A Simple Compiler

Outline
  The Structure of a Micro Compiler
  A Micro Scanner
  The Syntax of Micro
  Recursive Descent Parsing
  Translating Micro

Micro
Micro is a very simple language:
  Only integers are supported.
  There are no declarations.
  Variable names consist of the characters A..Z and 0..9 and are at most 32 characters long.
  Comments begin with -- and end at the end of the line.
  There are three kinds of statements:
    assignments, e.g., a := b + c
    read(list of ids), e.g., read(a, b)
    write(list of exps), e.g., write(a+b)
  begin, end, read, and write are reserved words.
  Tokens may not extend to the following line.

The Structure of a Micro Compiler
  The compiler is one-pass; no explicit intermediate representations are used (see p. 9, Fig. 1.3).
  The interface:
    The parser is the main routine.
    The parser calls the scanner to get the next token.
    The parser calls semantic routines at appropriate times.
    Semantic routines produce output in assembly language.
    A simple symbol table is used by the semantic routines.

[Figure: The structure of a syntax-directed compiler. The source program (character stream) enters the Scanner, which passes tokens to the Parser; the Parser passes the syntactic structure to the Semantic Routines, which produce an intermediate representation for the Optimizer and Code Generator, yielding target machine code. Symbol and attribute tables are used by all phases of the compiler.]

A Micro Scanner
  The Micro scanner is a function of no arguments that returns token values.
  There are 14 tokens:
    typedef enum token_types {
        BEGIN, END, READ, WRITE, ID, INTLITERAL,
        LPAREN, RPAREN, SEMICOLON, COMMA, ASSIGNOP,
        PLUSOP, MINUSOP, SCANEOF
    } token;
    extern token scanner(void);

A Micro Scanner (Cont’d.)
  The scanner returns the longest string that constitutes a token. For example, in abcdef, the prefixes ab, abc, and abcdef are all valid tokens; the scanner returns the longest one (i.e., abcdef).

continue: skips the rest of the current loop iteration.
Library functions used: getchar(), isspace(), isalpha(), isalnum(), isdigit().
ungetc(): pushes one character back onto the input.

A Micro Scanner (Cont’d.)
  How to handle reserved words? Reserved words look like identifiers. Two approaches:
    Use a separate table of reserved words.
    Put all reserved words into the symbol table initially.
  Error recovery during scanning. Possible errors include:
    an identifier that is too long
    illegal characters
    strings that cross a line boundary
    a missing } for comments (in Ada/CS)

A Micro Scanner (Cont’d.)
  Provision for saving the characters of a token as they are scanned: token_buffer, buffer_char(), clear_buffer(), check_reserved()
  Handling end of file: feof(stdin)

Complete Scanner Function for Micro
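
The scanner described on the preceding slides can be sketched as follows. This is a minimal illustration, not the textbook's listing: it reads from an in-memory string instead of stdin (so getchar(), ungetc(), and feof() are replaced by a pointer), and set_source(), buffer_run(), and lexical_errors are hypothetical names introduced here. Reserved words are matched in lower case only, which is also an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum token_types {
    BEGIN, END, READ, WRITE, ID, INTLITERAL, LPAREN, RPAREN,
    SEMICOLON, COMMA, ASSIGNOP, PLUSOP, MINUSOP, SCANEOF
} token;

#define MAX_TOKEN_LEN 32                /* Micro names: at most 32 characters */

static const char *src = "";            /* assumption: in-memory input, not stdin */
char token_buffer[MAX_TOKEN_LEN + 1];   /* text of the last ID or INTLITERAL */
int lexical_errors = 0;

/* Hypothetical helper: choose the text to scan and reset the error count. */
void set_source(const char *text)
{
    assert(text != NULL);
    src = text;
    lexical_errors = 0;
}

/* Separate table of reserved words (the first approach on the slide). */
static token check_reserved(void)
{
    static const struct { const char *word; token tok; } reserved[] = {
        {"begin", BEGIN}, {"end", END}, {"read", READ}, {"write", WRITE}
    };
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(token_buffer, reserved[i].word) == 0)
            return reserved[i].tok;
    return ID;
}

/* Longest match: consume every following character that `accept` allows,
   saving at most MAX_TOKEN_LEN characters in token_buffer. */
static void buffer_run(unsigned char first, int (*accept)(int))
{
    size_t len = 0;
    int too_long = 0;
    token_buffer[len++] = (char)first;
    for (; accept((unsigned char)*src); src++) {
        if (len < MAX_TOKEN_LEN)
            token_buffer[len++] = *src;
        else
            too_long = 1;               /* keep scanning, drop the excess */
    }
    token_buffer[len] = '\0';
    lexical_errors += too_long;
}

token scanner(void)
{
    for (;;) {
        unsigned char c = (unsigned char)*src;
        if (c == '\0')
            return SCANEOF;             /* end of input */
        src++;
        if (isspace(c))
            continue;                   /* skip white space */
        if (isalpha(c)) {
            buffer_run(c, isalnum);
            return check_reserved();    /* reserved word or identifier */
        }
        if (isdigit(c)) {
            buffer_run(c, isdigit);
            return INTLITERAL;
        }
        switch (c) {
        case '(': return LPAREN;
        case ')': return RPAREN;
        case ';': return SEMICOLON;
        case ',': return COMMA;
        case '+': return PLUSOP;
        case ':':
            if (*src == '=') { src++; return ASSIGNOP; }
            break;                      /* a lone ':' is illegal */
        case '-':
            if (*src != '-')
                return MINUSOP;
            while (*src != '\0' && *src != '\n')
                src++;                  /* comment: skip to end of line */
            continue;
        }
        lexical_errors++;               /* illegal character: skip it */
    }
}
```

After set_source("a := b + 1;"), repeated calls to scanner() yield ID, ASSIGNOP, ID, PLUSOP, INTLITERAL, SEMICOLON, and then SCANEOF. Illegal characters and over-long names are counted rather than aborting the scan, which is one simple form of the error recovery mentioned earlier.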

The Syntax of Micro
  Micro's syntax is defined by a context-free grammar (CFG).
  A CFG is also called a BNF (Backus-Naur Form) grammar.
  A CFG consists of a set of production rules of the form
    A → B C D ... Z
  where A is the left-hand side (LHS) and B C D ... Z is the right-hand side (RHS).
  The LHS must be a single nonterminal.
  The RHS consists of 0 or more terminals or nonterminals.

The Syntax of Micro (Cont’d.)
  Two kinds of symbols:
    Nonterminals are delimited by < and > and represent syntactic structures.
    Terminals represent tokens.
  E.g., <program> → begin <statement list> end
  Start or goal symbol: the nonterminal from which every derivation begins.
  λ: the empty or null string

The Syntax of Micro (Cont’d.)
  E.g.,
    <statement list> → <statement> <statement tail>
    <statement tail> → λ
    <statement tail> → <statement> <statement tail>

The Syntax of Micro (Cont’d.)
  Extended BNF: some abbreviations
  1. Optional: [ ] means 0 or 1 occurrence.
       <stmt> → if <exp> then <stmt>
       <stmt> → if <exp> then <stmt> else <stmt>
     can be written as
       <stmt> → if <exp> then <stmt> [ else <stmt> ]
  2. Repetition: { } means 0 or more occurrences.
       <stmt list> → <stmt> <tail>
       <tail> → λ
       <tail> → <stmt> <tail>
     can be written as
       <stmt list> → <stmt> { <stmt> }

The Syntax of Micro (Cont’d.)
  Extended BNF: some abbreviations
  3. Alternative: | means "or".
       <stmt> → <assign>
       <stmt> → <if stmt>
     can be written as
       <stmt> → <assign> | <if stmt>
  Extended BNF is equivalent to BNF: either can be transformed into the other. Extended BNF is more compact and readable.

The Syntax of Micro (Cont’d.)

The derivation of begin ID := ID + (INTLITERAL - ID); end
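
A leftmost derivation of this program can be sketched as follows. The <program> and <expression> rules appear elsewhere in these slides; the rules <statement> → ID := <expression> ; and <primary> → ( <expression> ) | ID | INTLITERAL, and the <add op> alternatives, are assumed here to follow the usual extended CFG for Micro. Each step expands the leftmost nonterminal, and a { } group is either expanded once or dropped.

```
<program>
=> begin <statement list> end
=> begin <statement> { <statement> } end
=> begin <statement> end
=> begin ID := <expression> ; end
=> begin ID := <primary> { <add op> <primary> } ; end
=> begin ID := <primary> <add op> <primary> ; end
=> begin ID := ID <add op> <primary> ; end
=> begin ID := ID + <primary> ; end
=> begin ID := ID + ( <expression> ) ; end
=> begin ID := ID + ( <primary> <add op> <primary> ) ; end
=> begin ID := ID + ( INTLITERAL <add op> <primary> ) ; end
=> begin ID := ID + ( INTLITERAL - <primary> ) ; end
=> begin ID := ID + ( INTLITERAL - ID ) ; end
```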

The Syntax of Micro (Cont’d.)
  A CFG defines a language, which is a set of sequences of tokens.
  Syntax errors vs. semantic errors: a statement such as A := 'X' + True; is syntactically legal but semantically invalid.

The Syntax of Micro (Cont’d.)
  Associativity for operators of equal precedence:
    Left associative: a + b + c should be interpreted as ((a + b) + c).
    Right associative: a = b = c should be interpreted as (a = (b = c)).
  Operator precedence: a + b * c should be interpreted as (a + (b * c)).

The Syntax of Micro (Cont’d.)
  A CFG defines the syntax of a language.
  Each sentence may have several derivations, depending on the order in which nonterminals are expanded.
  Each program should have exactly one syntax tree; otherwise the grammar is ambiguous.

A grammar fragment defines such a precedence relationship

With parentheses, the desired grouping can be forced
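
How a grammar encodes precedence and associativity can be illustrated with a small evaluator. The fragment assumed here is <expression> → <term> { (+ | -) <term> }, <term> → <primary> { * <primary> }, <primary> → ( <expression> ) | INTLITERAL. Micro itself has no multiplication operator, so the * level and the names evaluate(), term(), and skip_spaces() are introduced purely for illustration, and the sketch does no error checking.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;               /* cursor into the expression text */

static int expression(void);

static void skip_spaces(void)
{
    while (*p == ' ')
        p++;
}

/* <primary> -> ( <expression> ) | INTLITERAL
   Parentheses restart at the lowest-precedence level, forcing the grouping. */
static int primary(void)
{
    int value = 0;
    skip_spaces();
    if (*p == '(') {
        p++;
        value = expression();
        skip_spaces();
        if (*p == ')')
            p++;
        return value;
    }
    while (isdigit((unsigned char)*p))
        value = value * 10 + (*p++ - '0');
    return value;
}

/* <term> -> <primary> { * <primary> }
   '*' is handled one level below '+', so it binds more tightly. */
static int term(void)
{
    int value = primary();
    for (skip_spaces(); *p == '*'; skip_spaces()) {
        p++;
        value *= primary();
    }
    return value;
}

/* <expression> -> <term> { (+ | -) <term> }
   The loop folds operands from left to right: left associativity. */
static int expression(void)
{
    int value = term();
    for (skip_spaces(); *p == '+' || *p == '-'; skip_spaces()) {
        char op = *p++;
        int rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

/* Evaluate a well-formed expression over non-negative integer literals. */
int evaluate(const char *text)
{
    assert(text != NULL);
    p = text;
    return expression();
}
```

With this structure, evaluate("2 + 3 * 4") groups as 2 + (3 * 4), evaluate("(2 + 3) * 4") uses parentheses to force the other grouping, and evaluate("8 - 3 - 2") groups as (8 - 3) - 2.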

Recursive Descent Parsing
  Parsing recognizes source statements as language constructs or builds the parse tree for the statements.
  There are many parsing techniques:
    bottom-up: operator-precedence parsing, shift-reduce, LR
    top-down: recursive-descent parsing and LL
  Recursive descent is one of the simplest parsing techniques.

Recursive Descent Parsing (Cont’d.)
  Basic idea:
    Each nonterminal has a parsing procedure.
    The symbols on the RHS of a production are handled as a sequence of matching steps.
    To match a nonterminal A, call the parsing procedure of A.
    To match a terminal symbol t, call match(t).
      match(t) calls the scanner to get the next token. If it is t, everything is correct. If it is not t, we have found a syntax error.
    If a nonterminal has several productions, choose an appropriate one based on the next input token.
    The parser is started by invoking system_goal().
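
The basic idea can be sketched as a small recognizer for Micro. Several things here are assumptions made to keep the example self-contained: the tokens come from a pre-scanned array ending in SCANEOF rather than from scanner(), system_goal() takes that array as a parameter and returns an error count, and the productions in the comments follow the usual extended CFG for Micro. Error handling is deliberately naive: errors are only counted.

```c
#include <assert.h>
#include <stddef.h>

typedef enum token_types {
    BEGIN, END, READ, WRITE, ID, INTLITERAL, LPAREN, RPAREN,
    SEMICOLON, COMMA, ASSIGNOP, PLUSOP, MINUSOP, SCANEOF
} token;

static const token *input;  /* assumption: tokens were scanned in advance */
static size_t pos;          /* index of the next unconsumed token */
static int syntax_errors;

/* Peek at the next token without consuming it. */
static token next_token(void) { return input[pos]; }

/* Consume the next token; count a syntax error if it is not t. */
static void match(token t)
{
    if (input[pos] != t)
        syntax_errors++;
    if (input[pos] != SCANEOF)
        pos++;              /* never move past the end marker */
}

static void expression(void);

/* <primary> -> ( <expression> ) | ID | INTLITERAL */
static void primary(void)
{
    switch (next_token()) {
    case LPAREN:     match(LPAREN); expression(); match(RPAREN); break;
    case ID:         match(ID); break;
    case INTLITERAL: match(INTLITERAL); break;
    default:         syntax_errors++; break;
    }
}

/* <expression> -> <primary> { <add op> <primary> } */
static void expression(void)
{
    primary();
    while (next_token() == PLUSOP || next_token() == MINUSOP) {
        match(next_token());            /* <add op> -> PLUSOP | MINUSOP */
        primary();
    }
}

/* <id list> -> ID { , ID } */
static void id_list(void)
{
    match(ID);
    while (next_token() == COMMA) { match(COMMA); match(ID); }
}

/* <expr list> -> <expression> { , <expression> } */
static void expr_list(void)
{
    expression();
    while (next_token() == COMMA) { match(COMMA); expression(); }
}

/* <statement> -> ID := <expression> ;
                | read ( <id list> ) ;
                | write ( <expr list> ) ;
   The next token selects the production. */
static void statement(void)
{
    switch (next_token()) {
    case ID:
        match(ID); match(ASSIGNOP); expression(); match(SEMICOLON);
        break;
    case READ:
        match(READ); match(LPAREN); id_list(); match(RPAREN); match(SEMICOLON);
        break;
    case WRITE:
        match(WRITE); match(LPAREN); expr_list(); match(RPAREN); match(SEMICOLON);
        break;
    default:
        syntax_errors++;
        break;
    }
}

/* <statement list> -> <statement> { <statement> } */
static void statement_list(void)
{
    statement();                        /* the first <statement> is required */
    while (next_token() == ID || next_token() == READ || next_token() == WRITE)
        statement();
}

/* <system goal> -> <program> SCANEOF;  <program> -> begin <statement list> end.
   Returns the number of syntax errors found (0 means the input parsed). */
int system_goal(const token *tokens)
{
    assert(tokens != NULL);
    input = tokens;
    pos = 0;
    syntax_errors = 0;
    match(BEGIN); statement_list(); match(END);
    match(SCANEOF);
    return syntax_errors;
}
```

Each procedure body mirrors the RHS of its production: nonterminals become calls, terminals become match() calls, and { } becomes a while loop guarded by next_token().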

System Programming (Chap 5)

System Programming (Chap 5)
  Each nonterminal symbol in the grammar is associated with a procedure.
    <read> ::= READ (<id-list>)
    <stmt> ::= <assign> | <read> | <write> | <for>
  Left recursion:
    <dec-list> ::= <dec> | <dec-list>;<dec>
  Modification, which removes the left recursion so that the procedure for <dec-list> does not call itself before consuming any input:
    <dec-list> ::= <dec> {;<dec>}
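
The rewritten <dec-list> rule maps directly onto a loop. In the sketch below, each <dec> is assumed to be a single letter and the input is a plain string; both are simplifications for illustration, and dec(), dec_list(), and cursor are hypothetical names. Coding the left-recursive form directly would make dec_list() call itself as its first action, without consuming input, and so never terminate.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *cursor;          /* hypothetical one-character tokens */

/* <dec>: assumed to be a single letter, for illustration only.
   Returns 1 and advances on success, 0 otherwise. */
static int dec(void)
{
    if (!isalpha((unsigned char)*cursor))
        return 0;
    cursor++;
    return 1;
}

/* <dec-list> ::= <dec> { ; <dec> }
   Returns the number of declarations parsed, or -1 on a syntax error. */
int dec_list(const char *text)
{
    int count = 0;
    assert(text != NULL);
    cursor = text;
    if (!dec())
        return -1;                  /* the first <dec> is required */
    count++;
    while (*cursor == ';') {        /* { ; <dec> } becomes a loop */
        cursor++;
        if (!dec())
            return -1;
        count++;
    }
    return *cursor == '\0' ? count : -1;
}
```

Here dec_list("a;b;c") parses three declarations, while dec_list("a;") is rejected because a <dec> must follow every semicolon.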

Recursive Descent Parsing (Cont’d.)

Handles the case where {<statement>} does not appear. next_token(): a function that returns the next token. It does not call scanner(void).

<statement> must appear once.

Translating Micro
  Target language: three-address code (quadruples) of the form OP A, B, C
  Note that we do not worry about registers at this time.
  Temporaries: sometimes we need to hold temporary values. E.g., A+B+C is translated as
    ADD A,B,TEMP&1
    ADD TEMP&1,C,TEMP&2
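
The use of temporaries can be sketched with two small helpers. The names get_temp() and generate() are modeled on the kind of subroutines these slides refer to, but their signatures here, the code buffer, and translate_sum() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 33                 /* 32 characters plus the terminator */

char code[256];                     /* generated code, one instruction per line */
static int max_temp = 0;            /* number of temporaries allocated so far */

/* Allocate the next temporary name: TEMP&1, TEMP&2, ... */
void get_temp(char name[MAX_NAME])
{
    snprintf(name, MAX_NAME, "TEMP&%d", ++max_temp);
}

/* Append one three-address instruction "OP A,B,C" to the code buffer. */
void generate(const char *op, const char *a, const char *b, const char *c)
{
    size_t used = strlen(code);
    snprintf(code + used, sizeof code - used, "%s %s,%s,%s\n", op, a, b, c);
}

/* Translate a left-associative sum of n operands, e.g. A+B+C.
   On return, result names the location that holds the final value. */
void translate_sum(const char *operands[], int n, char result[MAX_NAME])
{
    assert(n >= 1);
    snprintf(result, MAX_NAME, "%s", operands[0]);
    for (int i = 1; i < n; i++) {
        char temp[MAX_NAME];
        get_temp(temp);
        generate("ADD", result, operands[i], temp);
        snprintf(result, MAX_NAME, "%s", temp);   /* the temp feeds the next ADD */
    }
}
```

Calling translate_sum() on the operands A, B, and C reproduces the two ADD instructions shown on the slide, with the first temporary serving as the left operand of the second instruction.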

Translating Micro (Cont’d.)
  Action symbols
    The bulk of a translation is done by semantic routines.
    Action symbols can be added to a grammar to specify when semantic processing should take place.
    They can be placed anywhere in the RHS of a production.
    They are translated into procedure calls in the parsing procedures; e.g., #add corresponds to a semantic routine named add().
    They have no impact on the language recognized by a parser driven by a CFG.

Semantic Information
  Semantic routines need certain information to do their work. This information is stored in semantic records. Each kind of grammar symbol has a semantic record.

Old parsing procedure (p. 37)
<expression> → <primary> {<add op> <primary> #gen_infix}
New parsing procedure, which involves semantic routines
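
A parsing procedure extended with the #gen_infix action might look like the sketch below. The record types expr_rec and op_rec, the SUB opcode, the single-letter operands, and the translate() driver are all assumptions for illustration; only the placement of the gen_infix() call follows the production above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 33

/* Semantic records: the information semantic routines need per symbol. */
typedef struct { char name[MAX_NAME]; } expr_rec;  /* where the value lives */
typedef struct { char opcode[4]; } op_rec;         /* "ADD" or "SUB" */

static const char *in;              /* assumption: one character per token */
char output[256];                   /* generated three-address code */
static int max_temp;

/* Semantic routine for #gen_infix: emit one instruction and return
   the record describing the temporary that holds the result. */
static expr_rec gen_infix(expr_rec e1, op_rec op, expr_rec e2)
{
    expr_rec result;
    size_t used = strlen(output);
    snprintf(result.name, sizeof result.name, "TEMP&%d", ++max_temp);
    snprintf(output + used, sizeof output - used, "%s %s,%s,%s\n",
             op.opcode, e1.name, e2.name, result.name);
    return result;
}

/* <primary> -> ID   (single letters only in this sketch) */
static void primary(expr_rec *result)
{
    result->name[0] = isalpha((unsigned char)*in) ? *in++ : '?';
    result->name[1] = '\0';
}

/* <add op> -> + | - */
static void add_op(op_rec *op)
{
    strcpy(op->opcode, *in++ == '+' ? "ADD" : "SUB");
}

/* <expression> -> <primary> { <add op> <primary> #gen_infix } */
static void expression(expr_rec *result)
{
    expr_rec left_operand, right_operand;
    op_rec op;

    primary(&left_operand);
    while (*in == '+' || *in == '-') {
        add_op(&op);
        primary(&right_operand);
        /* The action symbol becomes a call at this point in the RHS. */
        left_operand = gen_infix(left_operand, op, right_operand);
    }
    *result = left_operand;
}

/* Translate one expression; returns the record naming its value. */
expr_rec translate(const char *text)
{
    expr_rec result;
    assert(text != NULL);
    in = text;
    output[0] = '\0';
    max_temp = 0;
    expression(&result);
    return result;
}
```

Because the result record of each gen_infix() call becomes the left operand of the next iteration, translate("A+B-C") emits an ADD into the first temporary followed by a SUB that reads it.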

Semantic Information (Cont’d.) Subroutines for symbol table and temporaries

Semantic Information (Cont’d.) Semantic routines