Parsing & Scanning Lecture 2 COMP 202 10/25/2004 Derek Ruths Office: DH Rm #3010.

Slides:



Advertisements
Similar presentations
JavaCUP JavaCUP (Construct Useful Parser) is a parser generator
Advertisements

Grammar vs Recursive Descent Parser
Compilation (Semester A, 2013/14) Lecture 6a: Syntax (Bottom–up parsing) Noam Rinetzky 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
SIGCSE 2005, St. Louis, MO Design Patterns for Recursive Descent Parsing Dung Nguyen, Mathias Ricken & Stephen Wong Rice University.
Abstract Syntax Mooly Sagiv html:// 1.
1 JavaCUP JavaCUP (Construct Useful Parser) is a parser generator Produce a parser written in java, itself is also written in Java; There are many parser.
Exercise: Balanced Parentheses
Translator Architecture Code Generator ParserTokenizer string of characters (source code) string of tokens abstract program string of integers (object.
Context-Free Grammars Lecture 7
Environments and Evaluation
Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter 2.2 (Partial) Hashlama 11:00-14:00.
CS 280 Data Structures Professor John Peterson. Lexer Project Questions? Must be in by Friday – solutions will be posted after class The next project.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial)
ParsingParsing. 2 Front-End: Parser  Checks the stream of words and their parts of speech for grammatical correctness scannerparser source code tokens.
Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.
Parsing & Scanning Lecture 1 COMP /25/2004 Derek Ruths Office: DH Rm #3010.
The Interpreter Pattern. Defining the Interpreter Intent Given a language, define a representation for its grammar along with an interpreter that uses.
Syntax & Semantic Introduction Organization of Language Description Abstract Syntax Formal Syntax The Way of Writing Grammars Formal Semantic.
1 Abstract Syntax Tree--motivation The parse tree –contains too much detail e.g. unnecessary terminals such as parentheses –depends heavily on the structure.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Language Translators - Lee McCluskey LANGUAGE TRANSLATORS: WEEK 21 LECTURE: Using JavaCup to create simple interpreters
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
COMP Parsing 2 of 4 Lecture 22. How do we write programs to do this? The process of getting from the input string to the parse tree consists of.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
D Goforth COSC Translating High Level Languages Note error in assignment 1: #4 - refer to Example grammar 3.4, p. 126.
1 Parsers and Grammar. 2 Categories of Grammar Rules  Declarations or definitions. AttributeDeclaration ::= [ final ] [ static ] [ access ] datatype.
Muhammad Idrees, Lecturer University of Lahore 1 Top-Down Parsing Top down parsing can be viewed as an attempt to find a leftmost derivation for an input.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
LECTURE 4 Syntax. SPECIFYING SYNTAX Programming languages must be very well defined – there’s no room for ambiguity. Language designers must use formal.
CSC 4181 Compiler Construction
Parser: CFG, BNF Backus-Naur Form is notational variant of Context Free Grammar. Invented to specify syntax of ALGOL in late 1950’s Uses ::= to indicate.
Comp 311 Principles of Programming Languages Lecture 2 Syntax Corky Cartwright August 26, 2009.
Introduction to Parsing
CS 3304 Comparative Languages
Chapter 3 – Describing Syntax
Parsing 2 of 4: Scanner and Parsing
Compiler Design (40-414) Main Text Book:
COMP261 Lecture 18 Parsing 3 of 4.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Programming Languages Translator
Textbook:Modern Compiler Design
Chapter 3 – Describing Syntax
CS 614: Theory and Construction of Compilers
Abstract Syntax Trees Lecture 14 Mon, Feb 28, 2005.
PROGRAMMING LANGUAGES
4 (c) parsing.
Parsing Techniques.
Basic Program Analysis: AST
Syntax Analysis Sections :.
Top-Down Parsing CS 671 January 29, 2008.
Parsing & Scanning Lecture 2
CSE401 Introduction to Compiler Construction
Lecture 7: Introduction to Parsing (Syntax Analysis)
Programming Languages
R.Rajkumar Asst.Professor CSE
Programming Language Syntax 5
Introduction to Parsing
Introduction to Parsing
Design Patterns for Recursive Descent Parsing
The Recursive Descent Algorithm
High-Level Programming Language
Compiler Construction
Predictive Parsing Program
COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 2, 09/04/2003 Prof. Roy Levow.
Faculty of Computer Science and Information System
Presentation transcript:

Parsing & Scanning Lecture 2 COMP /25/2004 Derek Ruths Office: DH Rm #3010

High-Level View Scanner Text, Source Code, Speech Parser Structured Representation (e.g. Abstract Syntax Tree)

Overview Lecture 1: Intro to scanning Intro to parsing Basics of building a scanner in Java Lab: Implementing a scanner and simple parser Lecture 2: Basic Parsing Theory Design of an Object-Oriented Parser

Parsing: Organizing Things Parser, Tokens x + 3 Abstract Syntax Tree Grammar

What is Grammar? Goal of Grammar: To derive syntactical structure from order and type of tokens. A: You are running.Q: Are you running? Two very similar statements with very different meanings 1) “you” and “are” switch places - token order 2) “.” instead of “?” - token type

Arrangements of Tokens “mmmm... 3 x +... yes” ~ Yoda “Syntax Error!” ~ Parser Not all combinations and orderings of token types are valid.

Syntax vs. Semantics Syntax: the structure of a set of tokens Semantics: the meaning of a set of tokens and a syntactical structure. x = “Hello”... y = 3 + x Syntax: VAR EQUALS NUM PLUS VAR Semantics: y = 3 + “Hello”Invalid semantics Local property Global property

Extension of Calc- Grammar (from lab) Expression: 3 + x + 2Production Rules 1. Expr → Fact 2. Expr → Fact Op Expr 3. Fact → NUM 4. Fact → ID 5. Op → PLUS 6. Op → MINUS How many tokens must be read to decide on the production rule to use? Look Ahead Terminals Non-Terminals LL(1) grammars have only one production. Derivation: Apply rules to transform Expr into a specific token sequence.

Designing Context- Free Grammars Design LL(1) grammars for these languages: A list of names A list of alternating words and numbers The Extended Calc-Grammar with assignments (e.g. “x = y + 2”)

Parsing: Organizing Things Parser, Tokens x + 3 Abstract Syntax Tree Grammar

Representing Syntax Structure Production RuleSyntax Tree Expr → Fact Op Expr Expr FactOpExpr NUM Fact → NUM PLUS Op → PLUS Fact Expr → Fact ID Fact → ID

Implementing a Parser the OO-Way Scanner Text Parser Structured Representation E :: F E1 E1 :: E1a | empty E1a :: + E F :: F1 | F2 F1 :: num F2 :: id

Why use OOP for Parsing? Easy to add a new token Easy to add a new production rule Small change in design Small change in codeLocal change in designLocal change in code

OOP Parsing: Key Ideas Intelligent Token/Terminal Ignorant Non-Terminal Decides production rule to use Only knows its production rules Everything is an object

Everything is an Object E :: F E’ E’ :: empty | + E F :: num | id

Intelligent Tokens Token is given a list of candidate productions Token choses a production and executes it Production creates the correct Non-Terminal But the NT might need to construct another NT! E :: F E1 E1 :: E1a | empty E1a :: + E F :: F1 | F2 F1 :: num F2 :: id

Non-Terminal Factories public class EFact extends ATVFactory { private FFact _fFact; private ITokVisitor _parseF; private E1Fact _e1Fact; private ITokVisitor _parseE1; public EFact(ITokenizer tkz, FFact fFact, E1Fact e1Fact) { _parseF = _fFact.makeVisitor(); _parseE1 = _e1Fact.makeVisitor(); } public ITokVisitor makeVisitor() { return new ITokVisitor() { public Object defaultCase(AToken host, Object param) { return new E((F) host.execute(_parseF, param), (E1) nextToken().execute(_parseE1, param)); } }; } public ITokVisitor makeChainedVisitor(final ITokVisitor successor) { return new ITokVisitor() { public Object defaultCase(AToken host, Object inp) { Object o = host.execute(_fFact.makeChainedVisitor(successor), inp); return (o instanceof F) ? new E((F) o, (E1) nextToken().execute(_parseE1, inp)) : o; } }; } } E :: F E1 E1 :: E1a | empty E1a :: + E F :: F1 | F2 F1 :: num F2 :: id

Intelligent Tokens Token is given a list of candidate productions Token choses a production and executes it Production creates the correct Non-Terminal How can we choose a production? But the NT might need to construct another NT! E :: F E1 E1 :: E1a | empty E1a :: + E F :: F1 | F2 F1 :: num F2 :: id

Choosing a Production: A New Visitor Pattern public abstract class AToken { public abstract Object execute(ITokViz algo, Object param); } public interface ITokViz { public Object defaultCase(AToken host, Object param); } public class NumToken extends AToken { public static interface INumVisitor extends ITokViz { public Object numCase(NumToken host, Object param); } public static abstract class AChainViz implements INumVisitor { private ITokViz _successor; protected AChainViz(ITokViz succ) { _successor = succ; } public Object defaultCase(AToken host, Object param) { return host.execute(_successor, param); } } public Object execute(ITokViz algo, Object param) { return (algo instanceof INumVisitor)? ((INumVisitor) algo).numCase(this, param): algo.defaultCase(this, param); } A list of visitors Handles variant hosts Easy to add tokens! Easy to add productions!

Tying it together: Chaining Productions public class FFact extends ATVFactory { private F1Fact _f1Fact; private F2Fact _f2Fact; public FFact(ITokenizer tkz, F1Fact f1Fact, F2Fact f2Fact) { super(tkz); _f1Fact = f1Fact; _f2Fact = f2Fact; } public ITokVisitor makeVisitor() { return _f1Fact.makeChainedVisitor(_f2Fact.makeVisitor()); } public ITokVisitor makeChainedVisitor(ITokVisitor successor) { return _f1Fact.makeChainedVisitor(_f2Fact.makeChainedVisitor(successor)); } } E :: F E1 E1 :: E1a | empty E1a :: + E F :: F1 | F2 F1 :: num F2 :: id Why will there only be one visitor per token type for any given chain of productions?

Ignorant Non- Terminals E → F E’ E :: F E’ E’ :: empty | + E F :: num | id Our LL(1) Grammar num id E’ empty +

OO Parsing Review Intelligent Tokens Each token provides a visitor type and a way to make that visitor belong to a list Ignorant Non-Terminals (NT) Only knows its production: its factory produces a visitor that represents its production Can’t build other NT types: each NT has a factory that provides a token visitor that can construct the NT.