Parsing and Code Generation Set 24


Parser Construction
Most of the work involved in constructing a parser is carried out automatically by a program, referred to as a compiler-compiler program, such as Yacc. To create a parser for a computer language, one constructs a description of the language, which is then employed as input to the compiler-compiler. This description of the language is in the form of a grammar for the language.

Grammar 1
1. SENTENCE -> NOUNPHRASE VERB NOUNPHRASE
2. NOUNPHRASE -> the ADJECTIVE NOUN
3. NOUNPHRASE -> the NOUN
4. VERB -> pushed
5. VERB -> helped
6. ADJECTIVE -> pretty
7. ADJECTIVE -> poor
8. NOUN -> man
9. NOUN -> boy
10. NOUN -> cat

Grammar 1 is an example of a context-free grammar (the only kind we will deal with). The grammar consists of 10 productions, e.g. production 3 is NOUNPHRASE -> the NOUN. Here "NOUNPHRASE" is referred to as the lefthand side (lhs) and "the NOUN" is referred to as the righthand side (rhs).

The set of all lhs's constitutes the set of nonterminals of the grammar. In this case they are: {SENTENCE, NOUNPHRASE, VERB, ADJECTIVE, NOUN}. All the other symbols occurring in the grammar (i.e. in some rhs, but never as any lhs) are the terminals of the grammar. In this case: {the, pushed, helped, pretty, poor, …}.
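The split between nonterminals and terminals described above can be computed mechanically. The following is a minimal sketch, assuming the grammar is stored as a list of (lhs, rhs) pairs; the names GRAMMAR_1 and classify_symbols are illustrative and do not come from the slides.

```python
# Grammar 1 as (lhs, rhs) pairs, listed in the order the productions are numbered.
GRAMMAR_1 = [
    ("SENTENCE", ["NOUNPHRASE", "VERB", "NOUNPHRASE"]),
    ("NOUNPHRASE", ["the", "ADJECTIVE", "NOUN"]),
    ("NOUNPHRASE", ["the", "NOUN"]),
    ("VERB", ["pushed"]),
    ("VERB", ["helped"]),
    ("ADJECTIVE", ["pretty"]),
    ("ADJECTIVE", ["poor"]),
    ("NOUN", ["man"]),
    ("NOUN", ["boy"]),
    ("NOUN", ["cat"]),
]


def classify_symbols(grammar):
    """Return (nonterminals, terminals) for a list of (lhs, rhs) productions."""
    # Every symbol that appears as some lhs is a nonterminal.
    nonterminals = {lhs for lhs, _ in grammar}
    # Every other symbol that appears in some rhs is a terminal.
    rhs_symbols = {symbol for _, rhs in grammar for symbol in rhs}
    return nonterminals, rhs_symbols - nonterminals


nonterminals, terminals = classify_symbols(GRAMMAR_1)
print(sorted(nonterminals))
print(sorted(terminals))
```

The goal symbol is simply the lhs of the first pair, GRAMMAR_1[0][0].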

The lhs of the first production is called the goal symbol, in this case SENTENCE. A derivation of a string in the grammar is a list of strings starting with the goal symbol, in which each string, except the first, is obtained from the preceding one by replacing one of its nonterminal symbols with the rhs of a production that has that symbol as its lhs.

A string which has a derivation is said to be derivable. Derivable strings that consist entirely of terminal symbols are called sentences of the grammar. E.g. "the man helped the poor boy" is a sentence of Grammar 1. The set of all sentences of a grammar is called the language defined by the grammar.
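Because Grammar 1 has no recursive productions, its language is finite and can be enumerated outright. The sketch below does this by expanding every nonterminal in every possible way. The dictionary layout and the names expansions and LANGUAGE are assumptions made for illustration, and the approach would not terminate for a recursive grammar such as one for arithmetic.

```python
from itertools import product

# Grammar 1, grouped by lhs: each nonterminal maps to its alternative rhs's.
GRAMMAR_1 = {
    "SENTENCE": [["NOUNPHRASE", "VERB", "NOUNPHRASE"]],
    "NOUNPHRASE": [["the", "ADJECTIVE", "NOUN"], ["the", "NOUN"]],
    "VERB": [["pushed"], ["helped"]],
    "ADJECTIVE": [["pretty"], ["poor"]],
    "NOUN": [["man"], ["boy"], ["cat"]],
}


def expansions(symbol):
    """Return every terminal string (a tuple of words) derivable from symbol."""
    if symbol not in GRAMMAR_1:
        return [(symbol,)]  # a terminal derives only itself
    results = []
    for rhs in GRAMMAR_1[symbol]:
        # Expand each rhs symbol independently, then combine the choices.
        parts = [expansions(s) for s in rhs]
        for combo in product(*parts):
            results.append(tuple(word for part in combo for word in part))
    return results


# The language defined by the grammar: all sentences derivable from the goal symbol.
LANGUAGE = {" ".join(words) for words in expansions("SENTENCE")}

print("the man helped the poor boy" in LANGUAGE)
print("man the helped" in LANGUAGE)
```

There are 9 possible noun phrases and 2 verbs, so the language contains 9 x 2 x 9 = 162 sentences.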

Grammar 1 (Cont.1)
Derivation of the sentence: "the man helped the poor boy"
1. SENTENCE (goal symbol)
2. ==> NOUNPHRASE VERB NOUNPHRASE (by Rule 1)
3. ==> the NOUN VERB NOUNPHRASE (Rule 3)
4. ==> the man VERB NOUNPHRASE (Rule 8)
5. ==> the man helped NOUNPHRASE (Rule 5)
6. ==> the man helped the ADJECTIVE NOUN (Rule 2)
7. ==> the man helped the poor NOUN (Rule 7)
8. ==> the man helped the poor boy (Rule 9)
(This derivation shows that "the man helped the poor boy" is a sentence in the language defined by the grammar.)

Grammar 1 (Cont.2)
This derivation may also be represented diagrammatically by a syntax tree.
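One way to write down the syntax tree for the derivation of "the man helped the poor boy" is as nested data. The following is only a sketch: the tuple layout and the helper names render and leaves are assumptions made for illustration. Each interior node is a nonterminal whose children are the rhs of the production applied to it, and each leaf is a terminal.

```python
# A node is either a terminal (a plain string) or a pair (nonterminal, children).
TREE = ("SENTENCE", [
    ("NOUNPHRASE", ["the", ("NOUN", ["man"])]),
    ("VERB", ["helped"]),
    ("NOUNPHRASE", ["the", ("ADJECTIVE", ["poor"]), ("NOUN", ["boy"])]),
])


def render(node, depth=0):
    """Return the tree as indented lines, one symbol per line."""
    if isinstance(node, str):
        return ["  " * depth + node]
    label, children = node
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node):
    """Return the terminals of the tree, read from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


print("\n".join(render(TREE)))
print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right gives back the sentence that was derived.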

Typical format of a grammar for a programming language
PROGRAM -> PROGRAM STATEMENT
PROGRAM -> STATEMENT
STATEMENT -> ASSIGNMENT-STATEMENT
STATEMENT -> IF-STATEMENT
STATEMENT -> DO-STATEMENT
...
ASSIGNMENT-STATEMENT -> ...
IF-STATEMENT -> ...
DO-STATEMENT -> ...
...

Grammar 2
A simple grammar for arithmetic expressions
1. E -> E + T
2. E -> T
3. T -> T * a
4. T -> a

Grammar 2 (Cont.1)
Derivation of: a + a * a
1. E                 Goal symbol
2. ==> E + T         Rule 1
3. ==> E + T * a     Rule 3
4. ==> E + a * a     Rule 4
5. ==> T + a * a     Rule 2
6. ==> a + a * a     Rule 4

Grammar 2 (Cont.2)
Derivation of: a + a * a written in reverse:
1. a + a * a     Given sentential form
2. T + a * a     Rule 4 in reverse
3. E + a * a     Rule 2 in reverse
4. E + T * a     Rule 4 in reverse
5. E + T         Rule 3 in reverse
6. E             Rule 1 in reverse
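The two listings above can be reproduced by replaying the rule numbers, always rewriting the rightmost nonterminal. This is a minimal sketch under that assumption; the names PRODUCTIONS and rightmost_derivation are illustrative and not part of the slides.

```python
# Grammar 2, keyed by production number: (lhs, rhs).
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "a"]),
    4: ("T", ["a"]),
}
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS.values()}


def rightmost_derivation(goal, rule_numbers):
    """Apply each rule to the rightmost nonterminal and return every string produced."""
    form = [goal]
    steps = [" ".join(form)]
    for number in rule_numbers:
        lhs, rhs = PRODUCTIONS[number]
        positions = [i for i, symbol in enumerate(form) if symbol in NONTERMINALS]
        if not positions or form[positions[-1]] != lhs:
            raise ValueError(f"rule {number} does not apply to {' '.join(form)!r}")
        i = positions[-1]
        form[i:i + 1] = rhs  # substitute the rhs for the lhs
        steps.append(" ".join(form))
    return steps


top_down = rightmost_derivation("E", [1, 3, 4, 2, 4])
bottom_up = top_down[::-1]  # the same strings, read in reverse

print("\n".join(top_down))
print("\n".join(bottom_up))
```

Reversing the list is all it takes to turn the top-down derivation into the sequence of strings a bottom-up parser passes through.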

Syntax Analysis
One of the functions of syntax analysis (parsing) is to verify whether the source program is syntactically correct. The parser obtains a string of tokens from the scanner and verifies that the string can be generated by the grammar for the source language. (If not, the source program is not syntactically correct.)

There are many approaches to parser construction, including both top-down and bottom-up approaches. The LR parsing method discussed here uses a bottom-up approach (generating the derivation in reverse). It constructs a rightmost derivation, in which at any step a production can only be applied to the rightmost nonterminal in the string involved. The method makes use of a parsing machine, which is shown on the next slide and explained in the slides following.

DEFINITIONS. The circles with numbers in them are called states. The labelled directed lines that go from one state to another are said to represent transitions (as defined three slides on). Thus, e.g. the directed line labelled with a "T" from state 2 to state 6 is said to represent a transition with respect to T from state 2 to state 6.

The other directed lines, which point from a state to some text involving a production, are said to represent reductions. Thus, e.g. the line pointing down from state 4 to the text: E → T if {*, +, -|} is said to represent the reduction E → T (substitution of lhs E for rhs T), to be made if the next input symbol is: *, or +, or the end of the source code, as represented by the symbol -|.

LR Parsers -1
Print out the previous slide and the example of its use 5 slides on, and follow this explanation along with the example. An LR parser uses an input buffer holding the remaining source code to be read, a symbol stack, a state no. stack, and a parsing machine (represented by a table) to determine what action to take when the next symbol is read from the input buffer. The parser reads tokens (a_i) of the programming language grammar from the input buffer one at a time. Using the combination of the state no. on top of the state stack and the next input symbol a_i, the parser consults the parsing machine to determine whether it should perform a transition or a reduce action (as defined on the next two slides).

LR Parsers -2
When the action determined by the parsing machine is to make a transition to state s, the parser pushes the current input symbol a_i onto the symbol stack and the new state no. s onto the state stack. s is now the new top of the state no. stack, a_i is the top of the symbol stack, and a_(i+1) is the new next input symbol. The remaining input to be processed is: a_(i+1) ... a_n -| where the symbol -| represents the end of the source file.

LR Parsers -3
If the action determined by the action table entry is reduce using production A -> α, the parser performs the following sequence of actions, where r is the no. of symbols in α:
1. Pops r symbols from the symbol stack and r states from the state stack.
2. Consults the action table so as to make a transition with respect to the nonterminal symbol A (the lefthand side of the reduce production) and the state no. now at the top of the state no. stack. As with any transition, A is pushed onto the symbol stack and the new state no. onto the state stack.

LR Parsers -4
A reduce action leaves the remaining input unchanged (in contrast to transition actions). Some of the reduce actions cause associated code to be generated.
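The transition and reduce actions described on the LR Parsers slides can be sketched as a small table-driven loop. The table below is a hand-built SLR(1) table for the grammar E -> E + T | T, T -> T * a | a. Its state numbering is an assumption and need not match the numbering in the parsing machine diagram, and the names TABLE and lr_parse are illustrative only.

```python
# Productions numbered as in Grammar 2: (lhs, rhs).
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "a"]),
    4: ("T", ["a"]),
}
END = "-|"  # end-of-source marker

# (state, symbol) -> action. ("t", s) is a transition to state s,
# ("r", n) is a reduction by production n. State numbers are hypothetical.
TABLE = {
    (0, "a"): ("t", 3), (0, "E"): ("t", 1), (0, "T"): ("t", 4),
    (1, "+"): ("t", 2), (1, END): ("accept",),
    (2, "a"): ("t", 3), (2, "T"): ("t", 6),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, END): ("r", 4),
    (4, "*"): ("t", 5), (4, "+"): ("r", 2), (4, END): ("r", 2),
    (5, "a"): ("t", 7),
    (6, "*"): ("t", 5), (6, "+"): ("r", 1), (6, END): ("r", 1),
    (7, "+"): ("r", 3), (7, "*"): ("r", 3), (7, END): ("r", 3),
}


def lr_parse(tokens):
    """Parse tokens and return the production numbers in the order they were reduced."""
    symbols, states = [], [0]
    remaining = list(tokens) + [END]
    reductions = []
    while True:
        entry = TABLE.get((states[-1], remaining[0]))
        if entry is None:
            raise ValueError(f"unexpected {remaining[0]!r} in state {states[-1]}")
        if entry[0] == "accept":
            return reductions
        if entry[0] == "t":
            # Transition: consume the input symbol, push it and the new state.
            symbols.append(remaining.pop(0))
            states.append(entry[1])
        else:
            # Reduce by A -> alpha: pop r symbols and r states ...
            lhs, rhs = PRODUCTIONS[entry[1]]
            r = len(rhs)
            del symbols[-r:]
            del states[-r:]
            # ... then make the transition on A from the state now on top.
            # The remaining input is left unchanged.
            _, target = TABLE[(states[-1], lhs)]
            symbols.append(lhs)
            states.append(target)
            reductions.append(entry[1])


print(lr_parse("a + a * a".split()))
```

Read backwards, the returned list of production numbers is the rightmost derivation of the input, which is why the method is described as generating the derivation in reverse.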

Example
The following diagram graphically represents an action table generated for the grammar:
E -> E + T | T
T -> T * a | a
It is called a parsing machine.

Using the machine to parse a+a*a
Step  Symbol stack  State stack   Remaining input
1     empty         0             a + a * a -|
2     a             0 …           + a * a -|
3     T             0 …           + a * a -|
4     E             0 1           + a * a -|
5     E +           0 1 2         a * a -|
6     E + a         0 1 2 …       * a -|
7     E + T         0 1 2 6       * a -|
8     E + T *       0 1 2 6 …     a -|
9     E + T * a     0 1 2 6 … …   -|
10    E + T         0 1 2 6       -|
11    E             0 1           -|
12    ACCEPT                      -|

Exercise
1. Supply a rightmost top-down derivation for a + a * a.
2. Supply a rightmost top-down derivation for a + a * a, this time without using Symbol Stack.

Exercise (Cont.1)
Now look at the parse example two slides back. For each step of the parse, first cross out all the state numbers and also the end marker, then rewrite that step in this form, e.g.:
Step 7   Stack status: 0 E 1 + 2 T 6   Remaining input: * a -|
is rewritten as: E + T   * a
and concatenated together becomes: E + T * a

Exercise (Cont.2)
Do this to every step of the derivation, then cross out any duplicates of a given step. Now compare the result with the top-down derivation you obtained above. How are these two sets of results related to each other? Note that the parse provides a "bottom-up" derivation, which contains the same steps as the top-down derivation but in the reverse order.
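The procedure in this exercise can be checked mechanically. The sketch below assumes the parse of a + a * a proceeds through the symbol stacks and remaining inputs listed in STEPS, written with the state numbers and end marker already crossed out. The STEPS data and the function name bottom_up_forms are illustrative and not taken from the slides.

```python
# (symbol stack, remaining input) for each step of the parse before ACCEPT.
STEPS = [
    ("", "a + a * a"),
    ("a", "+ a * a"),
    ("T", "+ a * a"),
    ("E", "+ a * a"),
    ("E +", "a * a"),
    ("E + a", "* a"),
    ("E + T", "* a"),
    ("E + T *", "a"),
    ("E + T * a", ""),
    ("E + T", ""),
    ("E", ""),
]


def bottom_up_forms(steps):
    """Concatenate stack and remaining input for each step, crossing out duplicates."""
    forms = []
    for stack, remaining in steps:
        form = " ".join((stack + " " + remaining).split())
        # A transition only moves a symbol from the input to the stack,
        # so it repeats the previous string; keep one copy.
        if not forms or forms[-1] != form:
            forms.append(form)
    return forms


# The rightmost top-down derivation of a + a * a.
TOP_DOWN = ["E", "E + T", "E + T * a", "E + a * a", "T + a * a", "a + a * a"]

forms = bottom_up_forms(STEPS)
print("\n".join(forms))
print(forms == TOP_DOWN[::-1])
```

Only reduce steps change the concatenated string, so after the duplicates are removed one string remains per reduction, plus the original input.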