UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Chapter 3: Lexical and Syntactic Analysis.

Presentation transcript:

UNIVERSITY OF SOUTH CAROLINA, Department of Computer Science and Engineering
CSCE 330 Programming Language Structures
Chapter 3: Lexical and Syntactic Analysis
Fall 2009, Marco Valtorta
"Syntactic sugar causes cancer of the semicolon." (A. Perlis)

Contents
- 3.1 Chomsky Hierarchy
- 3.2 Lexical Analysis
- 3.3 Syntactic Analysis

3.1 Chomsky Hierarchy
- Regular grammar (least powerful)
- Context-free grammar (BNF)
- Context-sensitive grammar
- Unrestricted grammar

Regular Grammar
- Simplest; least powerful
- Equivalent to:
  - Regular expression
  - Finite-state automaton
- Right regular grammar, with ω ∈ T* and B ∈ N:
    A → ω B
    A → ω

Example
    Integer → 0 Integer | 1 Integer | ... | 9 Integer
            | 0 | 1 | ... | 9
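The Integer grammar above generates the same language as the regular expression [0-9]+. As a rough sketch (the class and method names here are illustrative assumptions, not part of the course material), the recursive method below mirrors the two kinds of production directly, and a java.util.regex pattern is given for comparison:

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks whether a string can be derived from
//   Integer → d Integer | d      (d is any digit 0..9)
class IntegerGrammar {
    // Regular expression describing the same language.
    static final Pattern INTEGER = Pattern.compile("[0-9]+");

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Mirrors the grammar: consume one digit, then either stop
    // (Integer → d) or continue with the rest (Integer → d Integer).
    static boolean derives(String s, int pos) {
        if (pos >= s.length() || !isDigit(s.charAt(pos))) return false;
        if (pos == s.length() - 1) return true;   // Integer → d
        return derives(s, pos + 1);               // Integer → d Integer
    }

    static boolean matchesRegex(String s) {
        return INTEGER.matcher(s).matches();
    }
}
```

Both checks should agree on every input, which is one informal way to see that the grammar and the regular expression describe the same set of strings.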

Regular Grammars
- Left regular grammar: equivalent in power to a right regular grammar
- Used in construction of tokenizers (scanners, lexers)
- Less powerful than context-free grammars
- Not a regular language: { aⁿ bⁿ | n ≥ 1 }, i.e., a regular grammar cannot balance ( ), { }, begin end
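To see why { aⁿ bⁿ | n ≥ 1 } is out of reach for a finite automaton, note that a recognizer has to remember how many a's it has seen, and that number is unbounded. The sketch below (a hypothetical class, not from the slides) uses an int counter as a stand-in for the stack of a pushdown automaton:

```java
// Hypothetical recognizer for { a^n b^n | n >= 1 }.
// The counter plays the role of a pushdown stack; a finite-state
// machine has no such unbounded memory.
class AnBn {
    static boolean accepts(String s) {
        int i = 0;
        int count = 0;
        // "Push" once for every leading 'a'.
        while (i < s.length() && s.charAt(i) == 'a') { count++; i++; }
        if (count == 0) return false;            // need n >= 1
        // "Pop" once for every following 'b'.
        while (i < s.length() && s.charAt(i) == 'b') { count--; i++; }
        // Accept only if all input is consumed and the counts match.
        return i == s.length() && count == 0;
    }
}
```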

Context-free Grammars
- BNF is a stylized form of CFG
- Equivalent to a pushdown automaton
- For a wide class of unambiguous CFGs, there are table-driven, linear-time parsers

Context-Sensitive Grammars
- Production: α → β, where |α| ≤ |β| and α, β ∈ (N ∪ T)*
- i.e., the left-hand side can be composed of strings of terminals and nonterminals

Undecidable Properties of CSGs
Given a string ω and a grammar G:
- L(G) is non-empty: undecidable
- ω ∈ L(G): decidable for context-sensitive grammars, because productions never shorten the string, but undecidable for unrestricted grammars
Defn: Undecidable means that you cannot write a computer program that is guaranteed to halt and decide the question for every input.

Unrestricted Grammar
- Equivalent to:
  - Turing machine
  - von Neumann machine
  - C++, Java
- That is, can compute any computable function.

Lexical Analysis
- Purpose: transform program representation
- Input: printable ASCII characters
- Output: tokens
- Discard: whitespace, comments
Defn: A token is a logically cohesive sequence of characters representing a single symbol.

Example Tokens
- Identifiers
- Literals: 123, 5.67, 'x', true
- Keywords: bool char ...
- Operators: + - * / ...
- Punctuation: ; , ( ) { }

Other Sequences
- Whitespace: space, tab
- Comments: // any-char* end-of-line
- End-of-line
- End-of-file

Why a Separate Phase?
- Simpler, faster machine model than the parser
- 75% of time is spent in the lexer for a non-optimizing compiler
- Differences in character sets
- End-of-line convention differs

Regular Expressions

RegExpr     Meaning
x           a character x
\x          an escaped character, e.g., \n
{ name }    a reference to a name
M | N       M or N
M N         M followed by N
M*          zero or more occurrences of M
M+          one or more occurrences of M
M?          zero or one occurrence of M
[aeiou]     the set of vowels
[0-9]       the set of digits
.           any single character

Clite Lexical Syntax

Category     Definition
anyChar      [ -~]
Letter       [a-zA-Z]
Digit        [0-9]
Whitespace   [ \t]
Eol          \n
Eof          \004
Keyword      bool | char | else | false | float | if | int | main | true | while
Identifier   {Letter}({Letter} | {Digit})*
integerLit   {Digit}+
floatLit     {Digit}+\.{Digit}+
charLit      '{anyChar}'
Operator     = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
Separator    ; | . | { | } | ( | )
Comment      // ({anyChar} | {Whitespace})* {Eol}
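Several of the Clite categories translate almost character for character into java.util.regex patterns. The following is only a sketch under that assumption; the class CliteLexemes and the method classify are made-up names, and keywords are deliberately not separated from identifiers here (they match the Identifier pattern and would be filtered afterwards):

```java
import java.util.regex.Pattern;

// Hypothetical classifier for a single, already isolated lexeme.
class CliteLexemes {
    static final String LETTER = "[a-zA-Z]";
    static final String DIGIT = "[0-9]";

    // Identifier  {Letter}({Letter} | {Digit})*
    static final Pattern IDENTIFIER =
        Pattern.compile(LETTER + "(" + LETTER + "|" + DIGIT + ")*");
    // integerLit  {Digit}+
    static final Pattern INTEGER_LIT = Pattern.compile(DIGIT + "+");
    // floatLit    {Digit}+\.{Digit}+
    static final Pattern FLOAT_LIT = Pattern.compile(DIGIT + "+\\." + DIGIT + "+");
    // charLit     '{anyChar}'  with anyChar = [ -~]
    static final Pattern CHAR_LIT = Pattern.compile("'[ -~]'");

    static String classify(String lexeme) {
        if (IDENTIFIER.matcher(lexeme).matches()) return "Identifier";
        if (FLOAT_LIT.matcher(lexeme).matches()) return "floatLit";
        if (INTEGER_LIT.matcher(lexeme).matches()) return "integerLit";
        if (CHAR_LIT.matcher(lexeme).matches()) return "charLit";
        return "unknown";
    }
}
```

Note that matches() requires the whole lexeme to fit the pattern, so "5." is neither an integerLit nor a floatLit under these definitions.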

Generators
- Input: usually regular expressions
- Output: table (slow), code
- C/C++: Lex, Flex
- Java: JLex

Finite State Automata
- Set of states: represented as graph nodes
- Input alphabet + unique end symbol
- State transition function: arcs in the graph, labelled using the alphabet
- Unique start state
- One or more final states

Deterministic FSA
Defn: A finite state automaton is deterministic if for each state and each input symbol, there is at most one outgoing arc from the state labeled with the input symbol.

[Figure: A Finite State Automaton for Identifiers]

Definitions
- A configuration on an FSA consists of a state and the remaining input.
- A move consists of traversing the arc exiting the state that corresponds to the leftmost input symbol, thereby consuming it. If there is no such arc, then:
  - If no input remains and the state is final, then accept.
  - Otherwise, error.

An input is accepted if, starting with the start state, the automaton consumes all the input and halts in a final state.

Example
    (S, a2i$) ├ (I, 2i$) ├ (I, i$) ├ (I, $) ├ (F, )
Thus: (S, a2i$) ├* (F, )
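The sequence of moves in this example can be replayed in code. The sketch below assumes the automaton implied by the trace: start state S, state I reached on a letter and looping on letters and digits, and final state F reached from I on the end symbol $. The class name and the explicit ERROR state are illustrative additions:

```java
// Hypothetical deterministic automaton for identifiers, following the
// states S, I, F of the trace above; ERROR stands for "no such arc".
class IdentifierDfa {
    enum State { S, I, F, ERROR }

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // One move: traverse the arc leaving `state` that is labeled with `c`.
    static State move(State state, char c) {
        switch (state) {
            case S:
                return isLetter(c) ? State.I : State.ERROR;
            case I:
                if (isLetter(c) || isDigit(c)) return State.I;
                return c == '$' ? State.F : State.ERROR;
            default:
                return State.ERROR;   // no arcs leave F
        }
    }

    // Accept if all input is consumed and the automaton halts in F.
    static boolean accepts(String input) {
        State state = State.S;
        for (int i = 0; i < input.length() && state != State.ERROR; i++) {
            state = move(state, input.charAt(i));
        }
        return state == State.F;
    }
}
```

Calling accepts("a2i$") walks through S, I, I, I, F, matching the configurations listed in the example.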

Some Conventions
- An explicit terminator is used only for the program as a whole, not for each token.
- An unlabeled arc represents any other valid input symbol.
- Recognition of a token ends in a final state.
- Recognition of a non-token transitions back to the start state.
- Recognition of the end symbol (end of file) ends in a final state.
- The automaton must be deterministic.
  - Drop keywords; handle them separately.
  - Must consider all sequences with a common prefix together.

Lexer Code
- The parser calls the lexer whenever it needs a new token.
- The lexer must remember where it left off.
- Greedy consumption goes 1 character too far. Options:
  - peek function
  - pushback function
  - no symbol consumed by start state
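The "one character too far" problem shows up as soon as two tokens share a prefix, such as = and ==. One way to handle it, sketched here with the standard java.io.PushbackReader (the class PushbackDemo and its token strings are assumptions made for illustration), is to read the extra character and push it back when it does not belong to the current token:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-scanner: emits "==" and "=" as distinct tokens and
// every other non-blank character as a one-character token.
class PushbackDemo {
    static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        try (PushbackReader in = new PushbackReader(new StringReader(text))) {
            int c;
            while ((c = in.read()) != -1) {
                if (Character.isWhitespace(c)) continue;
                if (c == '=') {
                    int next = in.read();          // greedy: look one further
                    if (next == '=') {
                        tokens.add("==");
                    } else {
                        // Went one character too far: push it back so the
                        // next iteration sees it again.
                        if (next != -1) in.unread(next);
                        tokens.add("=");
                    }
                } else {
                    tokens.add(String.valueOf((char) c));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

The slides instead keep the look-ahead character in a variable ch that is never consumed by the start state; both approaches address the same issue.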

From Design to Code
    private char ch = ' ';

    public Token next() {
        do {
            switch (ch) {
                ...
            }
        } while (true);
    }

Remarks
- Loop only exited when a token is found.
- Loop exited via a return statement.
- Variable ch must be global.
- Initialized to a space character.
- Exact nature of a Token irrelevant to design.

Translation Rules
- Traversing an arc from A to B:
  - If labeled with x: test ch == x
  - If unlabeled: else/default part of if/switch. If it is the only arc, no test need be performed.
  - Get next character if A is not the start state
- A node with an arc to itself is a do-while.
  - The condition corresponds to whichever arc is labeled.
- Otherwise the move is translated to an if/switch:
  - Each arc is a separate case.
  - An unlabeled arc is the default case.
- A sequence of transitions becomes a sequence of translated statements.
- A complex diagram is translated by boxing its components so that each box is one node.
  - Translate each box using an outside-in strategy.

    private boolean isLetter(char c) {
        // Test the parameter c, not the lexer's look-ahead field ch.
        return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z';
    }

    private String concat(String set) {
        StringBuffer r = new StringBuffer("");
        do {
            r.append(ch);
            ch = nextChar();
        } while (set.indexOf(ch) >= 0);
        return r.toString();
    }

    public Token next() {
        do {
            if (isLetter(ch)) { // ident or keyword
                String spelling = concat(letters + digits);
                return Token.keyword(spelling);
            } else if (isDigit(ch)) { // int or float literal
                String number = concat(digits);
                if (ch != '.')
                    return Token.mkIntLiteral(number);
                number += concat(digits); // appends the '.' and the fraction digits
                return Token.mkFloatLiteral(number);
            } else switch (ch) {
                case ' ': case '\t': case '\r': case eolnCh:
                    ch = nextChar();
                    break;
                case eofCh:
                    return Token.eofTok;
                case '+':
                    ch = nextChar();
                    return Token.plusTok;
                …
                case '&':
                    check('&');
                    return Token.andTok;
                case '=':
                    return chkOpt('=', Token.assignTok, Token.eqeqTok);

Source:
    // a first program
    // with 2 comments
    int main ( ) {
        char c;
        int i;
        c = 'h';
        i = c + 3;
    } // main

Tokens:
    int
    main
    (
    )
    {
    char
    Identifier c
    ;

JLex: A Lexical Analyzer Generator for Java
    Definition of tokens (regular expressions) → JLex → Java file: scanner class (recognizes tokens)
We will look at an example JLex specification (adapted from the manual). Consult the manual for details on how to write your own JLex specifications.

The JLex tool
Layout of a JLex file:
    user code (added to start of generated file)
    %%
    options
    %{
    user code (added inside the scanner class declaration)
    %}
    macro definitions
    %%
    lexical declaration
- User code is copied directly into the output class.
- JLex directives allow you to include code in the lexical analysis class, change names of various components, switch on character counting, line counting, manage EOF, etc.
- Macro definitions give names to useful regexps.
- Regular expression rules define the tokens to be recognised and the actions to be taken.

java.io.StreamTokenizer
An alternative to JLex is to use the class StreamTokenizer from java.io. The class recognizes 4 types of lexical elements (tokens):
- number (a sequence of decimal digits, optionally starting with a minus sign and/or containing a decimal point)
- word (a sequence of letters and digits starting with a letter)
- line separator
- end of file
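The four token types correspond to the constants TT_NUMBER, TT_WORD, TT_EOL and TT_EOF of StreamTokenizer. A minimal sketch of a tokenizing loop follows; the wrapper class StreamTokenizerDemo and its string labels are illustrative assumptions, while the StreamTokenizer calls themselves are the standard java.io API:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that labels each token StreamTokenizer reports.
class StreamTokenizerDemo {
    static List<String> tokenize(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true);   // report line separators as tokens
        List<String> out = new ArrayList<>();
        try {
            while (true) {
                int type = st.nextToken();
                if (type == StreamTokenizer.TT_EOF) {
                    out.add("eof");
                    return out;
                } else if (type == StreamTokenizer.TT_EOL) {
                    out.add("eol");
                } else if (type == StreamTokenizer.TT_NUMBER) {
                    out.add("number:" + st.nval);   // nval is a double
                } else if (type == StreamTokenizer.TT_WORD) {
                    out.add("word:" + st.sval);
                } else {
                    // Any other character is returned as its own code.
                    out.add("char:" + (char) type);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because numbers are always delivered through the double field nval, StreamTokenizer does not by itself distinguish integer from floating-point literals; a Clite lexer would need extra logic for that.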

Parsing
- Some terminology
- Different types of parsing strategies
  - bottom up
  - top down
- Recursive descent parsing
  - What is it
  - How to implement one given an EBNF specification
  - (How to generate one using tools: later)
- (Bottom up parsing algorithms)

Parsing: Some Terminology
- Recognition: to answer the question "does the input conform to the syntax of the language?"
- Parsing: recognition + determination of phrase structure (for example by generating AST data structures)
- (Un)ambiguous grammar: a grammar is unambiguous if there is at most one way to parse any input (i.e., for a syntactically correct program there is precisely one parse tree)

Different kinds of Parsing Algorithms
Two big groups of algorithms can be distinguished:
- bottom up strategies
- top down strategies
Example: parsing of "Micro-English"
    Sentence ::= Subject Verb Object .
    Subject  ::= I | a Noun | the Noun
    Object   ::= me | a Noun | the Noun
    Noun     ::= cat | mat | rat
    Verb     ::= like | is | see | sees
Example sentences:
    The cat sees the rat.
    The rat sees me.
    I like a cat
    The rat like me.
    I see the rat.
    I sees a rat.

Top-down parsing
The parse tree is constructed starting at the top (root).
Input: The cat sees a rat.
[Figure: parse tree for "The cat sees a rat." built from the root down. Sentence expands to Subject Verb Object "."; Subject to "The" Noun (cat); Verb to "sees"; Object to "a" Noun (rat).]

Bottom up parsing
The parse tree "grows" from the bottom (leaves) up to the top (root).
Input: The cat sees a rat.
[Figure: parse tree for "The cat sees a rat." built from the leaves up. "cat" is reduced to Noun and "The" Noun to Subject; "sees" to Verb; "rat" to Noun and "a" Noun to Object; finally Subject Verb Object "." to Sentence.]

Top-Down vs. Bottom-Up parsing
LL analysis (top-down): Left-to-right scan, Leftmost derivation
- Look-ahead, derivation
- Scans the string left to right
- Builds a leftmost derivation
LR analysis (bottom-up): Left-to-right scan, Rightmost derivation
- Look-ahead, reduction
- Scans the string left to right
- Builds a rightmost derivation (in reverse)

Recursive Descent Parsing
Recursive descent parsing is a straightforward top-down parsing algorithm. We will now look at how to develop a recursive descent parser from an EBNF specification.
Idea: the parse tree structure corresponds to the "call graph" structure of parsing procedures that call each other recursively.

Recursive Descent Parsing
    Sentence ::= Subject Verb Object .
    Subject  ::= I | a Noun | the Noun
    Object   ::= me | a Noun | the Noun
    Noun     ::= cat | mat | rat
    Verb     ::= like | is | see | sees
Define a procedure parseN for each non-terminal N:
    private void parseSentence();
    private void parseSubject();
    private void parseObject();
    private void parseNoun();
    private void parseVerb();

Recursive Descent Parsing
    public class MicroEnglishParser {
        private TerminalSymbol currentTerminal;

        // Auxiliary methods will go here
        ...
        // Parsing methods will go here
        ...
    }

Recursive Descent Parsing: Auxiliary Methods
    public class MicroEnglishParser {
        private TerminalSymbol currentTerminal;

        private void accept(TerminalSymbol expected) {
            if (currentTerminal matches expected)
                currentTerminal = next input terminal;
            else
                report a syntax error
        }
        ...
    }

Recursive Descent Parsing: Parsing Methods
    Sentence ::= Subject Verb Object .

    private void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        accept('.');
    }

Recursive Descent Parsing: Parsing Methods
    Subject ::= I | a Noun | the Noun

    private void parseSubject() {
        if (currentTerminal matches 'I')
            accept('I');
        else if (currentTerminal matches 'a') {
            accept('a');
            parseNoun();
        }
        else if (currentTerminal matches 'the') {
            accept('the');
            parseNoun();
        }
        else
            report a syntax error
    }

Recursive Descent Parsing: Parsing Methods
    Noun ::= cat | mat | rat

    private void parseNoun() {
        if (currentTerminal matches 'cat')
            accept('cat');
        else if (currentTerminal matches 'mat')
            accept('mat');
        else if (currentTerminal matches 'rat')
            accept('rat');
        else
            report a syntax error
    }
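The pseudocode fragments for Micro-English can be assembled into a small runnable recognizer. What follows is one possible sketch, not the course's reference solution: terminals are plain strings, the tokenization (split on whitespace, with "." as its own terminal), the "<eof>" marker, the acceptOneOf helper, and the bodies of parseVerb and parseObject are all assumptions filled in by analogy with the methods shown. Terminals are matched case-sensitively, so sentences are written with a lowercase "the".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical runnable version of the Micro-English recognizer.
class MicroEnglishParser {
    private final List<String> input;
    private int pos = 0;
    private String currentTerminal;

    MicroEnglishParser(String sentence) {
        // Assumed tokenization: whitespace-separated words, "." on its own.
        input = new ArrayList<>(
            Arrays.asList(sentence.replace(".", " .").trim().split("\\s+")));
        input.add("<eof>");
        currentTerminal = input.get(0);
    }

    private void accept(String expected) {
        if (currentTerminal.equals(expected)) currentTerminal = input.get(++pos);
        else throw new IllegalStateException(
            "expected " + expected + " but found " + currentTerminal);
    }

    // Accept the current terminal if it is one of the listed alternatives.
    private void acceptOneOf(String... options) {
        for (String option : options) {
            if (currentTerminal.equals(option)) { accept(option); return; }
        }
        throw new IllegalStateException("unexpected " + currentTerminal);
    }

    // Sentence ::= Subject Verb Object .
    private void parseSentence() {
        parseSubject(); parseVerb(); parseObject(); accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (currentTerminal.equals("I")) accept("I");
        else { acceptOneOf("a", "the"); parseNoun(); }
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        if (currentTerminal.equals("me")) accept("me");
        else { acceptOneOf("a", "the"); parseNoun(); }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() { acceptOneOf("cat", "mat", "rat"); }

    // Verb ::= like | is | see | sees
    private void parseVerb() { acceptOneOf("like", "is", "see", "sees"); }

    static boolean recognizes(String sentence) {
        try {
            MicroEnglishParser p = new MicroEnglishParser(sentence);
            p.parseSentence();
            return p.currentTerminal.equals("<eof>");   // no trailing input
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Since the grammar says nothing about subject-verb agreement, recognizes("I sees a rat.") returns true: the sentence is syntactically valid Micro-English even though it is not good English.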

Algorithm to convert EBNF into a RD parser
The conversion of an EBNF specification into a Java implementation of a recursive descent parser is so "mechanical" that it can easily be automated! => JavaCC "Java Compiler Compiler"
We can describe the algorithm by a set of mechanical rewrite rules:
    N ::= X
becomes
    private void parseN() {
        parse X
    }

Algorithm to convert EBNF into a RD parser
    parse ε                            becomes   // a dummy statement
    parse N  (N is a non-terminal)     becomes   parseN();
    parse t  (t is a terminal)         becomes   accept(t);
    parse X Y                          becomes   parse X
                                                 parse Y

Algorithm to convert EBNF into a RD parser
parse X* becomes:
    while (currentToken.kind is in starters[X]) {
        parse X
    }
parse X|Y becomes:
    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            report syntax error
    }
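To make the X* and X|Y rules concrete, here is a sketch that applies them to a small grammar invented for this illustration (it does not appear in the slides): List ::= Item (',' Item)* and Item ::= 'x' | '(' List ')'. Tokens are single characters, and '$' is used as an assumed end-of-input marker.

```java
// Hypothetical parser for:
//   List ::= Item (',' Item)*
//   Item ::= 'x' | '(' List ')'
class ListParser {
    private final String input;
    private int pos = 0;

    ListParser(String input) { this.input = input; }

    // Current look-ahead character, or '$' at end of input.
    private char current() {
        return pos < input.length() ? input.charAt(pos) : '$';
    }

    private void accept(char expected) {
        if (current() == expected) pos++;
        else throw new IllegalStateException(
            "expected '" + expected + "' at position " + pos);
    }

    // Rule "parse X*": loop while the look-ahead is in starters[X].
    // Here X is (',' Item), whose only starter is ','.
    private void parseList() {
        parseItem();
        while (current() == ',') {
            accept(',');
            parseItem();
        }
    }

    // Rule "parse X|Y": switch on the look-ahead, one case per starter set.
    private void parseItem() {
        switch (current()) {
            case 'x':                       // starters['x'] = { 'x' }
                accept('x');
                break;
            case '(':                       // starters['(' List ')'] = { '(' }
                accept('(');
                parseList();
                accept(')');
                break;
            default:
                throw new IllegalStateException("unexpected '" + current() + "'");
        }
    }

    static boolean recognizes(String text) {
        try {
            ListParser p = new ListParser(text);
            p.parseList();
            return p.pos == text.length();  // all input consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The approach only works when the starter sets of the alternatives are disjoint, as they are here ('x' versus '('); otherwise one token of look-ahead cannot select the branch.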