Compiler Construction Parsing I Ran Shaham and Ohad Shacham School of Computer Science Tel-Aviv University.


2 Administration
Forum
Project teams
  Send me an email if you can’t find a team
  Send me your team if you found one and didn’t send an email
  Check the Excel file on the website
First PA is at:

3 Programming Assignment 1
Implement a scanner for IC
class Token
  At least: line, id, value
  Should extend java_cup.runtime.Symbol
Numeric token ids in sym.java
  Will later be generated by JavaCup
class Compiler
  Testbed: calls the scanner to print the list of tokens
[StateList] > { return appropriate symbol }

4 Programming Assignment 1
class LexicalError
  Caught by Compiler
Assume
  Class identifiers start with a capital letter
  Other identifiers start with a non-capital letter

5 sym.java
public class sym {
    public static final int EOF = 0;
    public static final int ID = 1;
    ...
}
Defines symbol constant ids
Used for communication between parser and scanner
Actual values don’t matter
Unique value for each token
Will be generated by Cup in PA2

6 Token class
import java_cup.runtime.Symbol;

public class Token extends Symbol {
    public int getId() {...}
    public Object getValue() {...}
    public int getLine() {...}
    ...
}

7 JFlex directives to use
%cup (integrate with Cup)
%line (count lines)
%type Token (pass type Token)
%class Lexer (generated scanner class)

8 %cup
%implements java_cup.runtime.Scanner
  Lex class implements java_cup.runtime.Scanner
%function next_token
  Returns the next token
%type java_cup.runtime.Symbol
  Return token class

9 Structure
[Diagram]
IC.lex → JFlex → Lexer.java
Lexer.java, sym.java, Token.java, LexicalError.java, Compiler.java → javac → Lexical analyzer
test.ic → Lexical analyzer → tokens

10 Directions
Download Java
Download JFlex
Download JavaCup
Put JFlex and JavaCup in classpath
Eclipse
  Use ant build.xml
  Import jflex and javacup
Apache Ant

11 Directions
Use skeleton from the website
Read Assignment
Use Forum

12 Tools
Ant
  Make environment
  A build.xml is included in the skeleton
  Download from:
  Use:
    ant – to compile
    ant scanner – to run JFlex

13 Tools
JFlex
  Lexical analyzer generator
  Download from:
  Manual:
  Add $MyJFlex/lib/JFlex.jar to your classpath
  Use:
    java JFlex.Main IC.lex
    ant scanner – for ant users

14 Tools
Cup
  Parser generator
  Download from:
  Manual:
  Put java-cup-11a.jar and java-cup-11a-runtime.jar in your classpath
  Use:
    java -jar java-cup-11a.jar
    ant libparser – for ant users

15 Compiler
[Diagram]
IC program (ic) → IC compiler → x86 executable (exe)
IC compiler stages: Lexical Analysis → Syntax Analysis (Parsing) → AST, Symbol Table, etc. → Intermediate Representation (IR) → Code Generation

16 Parsing
Input:
  Sequence of tokens
Output:
  Abstract syntax tree
Decide whether the program satisfies the syntactic structure

17 Parsing errors
Error detection
  Report the most relevant error message
  Correct line number
  Current vs. expected token
Error recovery
  Recover and continue to the next error
  Heuristics for good recovery to avoid many spurious errors
    Search for a semicolon and ignore the statement
    Ignore the next n errors
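The "search for a semicolon and ignore the statement" heuristic can be sketched as follows. This is a minimal illustration, not the course's parser: the class name PanicModeDemo, the string token kinds, and the toy statement shape `id = num ;` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of panic-mode recovery for a toy language
// in which every statement has the shape: id = num ;
class PanicModeDemo {
    static final String[] EXPECTED = {"id", "=", "num", ";"};

    // Returns one message per erroneous statement instead of stopping at the first error.
    static List<String> parse(List<String> tokens) {
        List<String> errors = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            boolean ok = true;
            for (String expected : EXPECTED) {
                String current = pos < tokens.size() ? tokens.get(pos) : "EOF";
                if (current.equals(expected)) {
                    pos++;
                } else {
                    // Error detection: report current vs. expected token.
                    errors.add("token " + pos + ": found " + current + ", expected " + expected);
                    ok = false;
                    break;
                }
            }
            if (!ok) {
                // Error recovery: discard tokens up to and including the next ";".
                while (pos < tokens.size() && !tokens.get(pos).equals(";")) pos++;
                if (pos < tokens.size()) pos++;
            }
        }
        return errors;
    }
}
```

Skipping to the statement boundary keeps one mistake from producing a cascade of spurious messages for the rest of the statement.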

18 Parsing
Context Free Grammars (CFG)
  Capture program structure (hierarchy)
  Employ formal theory results
  Automatically create “efficient” parsers
Grammar:
  S → if E then S else S
  S → print E
  E → num

19 From text to abstract syntax
program text: 5 + (7 * x)
  → Lexical Analyzer →
token stream: num + ( num * id )
  → Parser → valid / syntax error
Grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Parse tree: root E → E + E; the left E is num(5); the right E is ( E ), whose inner E → E * E covers num(7) and id(x)
Abstract syntax tree: + with children Num(5) and *; * has children Num(7) and id(x)
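The first stage of this pipeline, from program text to a token stream, can be sketched by hand. In the course the scanner is generated by JFlex, so the class below (TinyLexer, a hypothetical name) is only an illustration, and it handles just the terminals of the expression grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner for the expression grammar's terminals:
// + * ( ) id num. Tokens are rendered as strings such as "num(5)" or "id(x)".
class TinyLexer {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                    // skip blanks
            } else if (Character.isDigit(c)) {
                int start = i;                          // num: one or more digits
                while (i < text.length() && Character.isDigit(text.charAt(i))) i++;
                tokens.add("num(" + text.substring(start, i) + ")");
            } else if (Character.isLetter(c)) {
                int start = i;                          // id: letter, then letters or digits
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
                tokens.add("id(" + text.substring(start, i) + ")");
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));          // single-character tokens
                i++;
            } else {
                throw new IllegalArgumentException("illegal character '" + c + "' at offset " + i);
            }
        }
        return tokens;
    }
}
```

Calling TinyLexer.tokenize("5 + (7 * x)") yields the token list num(5), +, (, num(7), *, id(x), ).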

20 From text to abstract syntax
token stream: num + ( num * id )
  → Parser → valid / syntax error
Grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
[Figure: parse tree and abstract syntax tree for the token stream, as on the previous slide]
Note: a parse tree describes a run of the parser; an abstract syntax tree is the result of a successful run

21 Parsing terminology
Symbols:
  terminals (tokens): + * ( ) id num
  non-terminals: E
Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Convention: the non-terminal appearing in the first derivation rule is defined to be the initial non-terminal
Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Each step in a derivation applies one grammar rule (a production)
Parse tree: root E → E + E; the left E yields 1; the right E → E * E yields 2 and 3

22 Ambiguity
Grammar rules:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree: root E → E + E; the left E yields 1; the right E → E * E yields 2 * 3
Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree: root E → E * E; the left E → E + E yields 1 + 2; the right E yields 3
Definition: a grammar is ambiguous if there exists an input string that has two different parse trees (equivalently, two different leftmost derivations)
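Why the ambiguity matters can be seen by evaluating the two parse trees for 1 + 2 * 3. The sketch below builds both trees by hand; the names TwoTreesDemo, Node, num, add and mul are hypothetical and exist only for this illustration.

```java
// Hypothetical sketch: the two parse trees of 1 + 2 * 3 under the ambiguous
// grammar E -> E + E | E * E | num, built by hand and evaluated.
class TwoTreesDemo {
    interface Node { int eval(); }

    static Node num(int v) { return () -> v; }
    static Node add(Node l, Node r) { return () -> l.eval() + r.eval(); }
    static Node mul(Node l, Node r) { return () -> l.eval() * r.eval(); }

    // Tree with + at the root: 1 + (2 * 3)
    static int plusAtRoot() { return add(num(1), mul(num(2), num(3))).eval(); }

    // Tree with * at the root: (1 + 2) * 3
    static int timesAtRoot() { return mul(add(num(1), num(2)), num(3)).eval(); }
}
```

Here plusAtRoot() evaluates to 7 and timesAtRoot() to 9, so the choice of tree changes the meaning of the program.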

23 Grammar rewriting
Ambiguous grammar:
  E → id
  E → num
  E → E + E
  E → E * E
  E → ( E )
Unambiguous grammar:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → num
  F → ( E )
Derivation (some steps combined): E ⇒ E + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3
Parse tree: root E → E + T; the left E → T → F yields 1; the right T → T * F, where T → F yields 2 and F yields 3
Note the difference between a language and a grammar: a grammar represents a language; a language can be represented by many grammars.

24 Parsing methods – Top Down
LL(k)
  “L” – left-to-right scan of input
  “L” – leftmost derivation
  “k” – predict based on “k” look-ahead tokens
  Predict a production for a non-terminal and “k” tokens

25 Parsing methods – Bottom Up
LR(0), SLR(1), LR(1), LALR(1)
  “L” – left-to-right scan of input
  “R” – rightmost derivation (constructed in reverse)
  Decide on a production based on a RHS and a look-ahead

26 Top Down – parsing
Grammar:
  E → T + E
  E → i
  T → i
[Figure: the parse tree is grown from the start symbol E downward, expanding E into T + E step by step]

27 Top Down – parsing
Starts with the start symbol
Tries to transform it to the input
Also called predictive parsing
LL(1) example
Grammar:
  S → if E then S else S
  S → begin S L
  S → print E
  L → end
  L → ; S L
  E → num
Input: if 5 then print 8 else…
Token : rule : sentential form
  (start) : : S
  if : S → if E then S else S : if E then S else S
  5 : E → num : if 5 then S else S
  print : S → print E : if 5 then print E else S
  …
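A predictive parser for this grammar can be written as one method per non-terminal, each choosing its rule from the next token alone. The following is a minimal sketch under assumptions: the class name PredictiveParser is hypothetical, tokens are plain strings, and every number is represented by the token kind "num".

```java
import java.util.List;

// Hypothetical recursive-descent (LL(1)) recognizer for the grammar:
//   S -> if E then S else S | begin S L | print E
//   L -> end | ; S L
//   E -> num
class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    // One look-ahead token is enough to pick the rule for S.
    void S() {
        switch (peek()) {
            case "if":    eat("if"); E(); eat("then"); S(); eat("else"); S(); break;
            case "begin": eat("begin"); S(); L(); break;
            case "print": eat("print"); E(); break;
            default: throw new IllegalStateException("unexpected token " + peek());
        }
    }

    void L() {
        switch (peek()) {
            case "end": eat("end"); break;
            case ";":   eat(";"); S(); L(); break;
            default: throw new IllegalStateException("unexpected token " + peek());
        }
    }

    void E() { eat("num"); }

    // True if the whole token list is derived from S.
    static boolean accepts(List<String> tokens) {
        PredictiveParser p = new PredictiveParser(tokens);
        try {
            p.S();
            return p.peek().equals("EOF");
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For the slide's input, the token list if num then print num else print num is accepted; dropping the then makes the parser reject it at the first unexpected token.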

28 Top Down - problems
Left recursion
  A → Aa
  A → a
Non-termination
  A ⇒ Aa ⇒ Aaa ⇒ Aaaa ⇒ …
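The non-termination is easy to reproduce. In the sketch below (hypothetical class LeftRecursionDemo), naiveA transcribes A → Aa directly and is stopped only by an artificial depth guard, while matchesA uses the standard equivalent form A → a A', A' → a A' | ε, written as a loop.

```java
// Hypothetical sketch of the left-recursion problem for A -> A a | a.
class LeftRecursionDemo {
    // Direct transcription of A -> A a: the method calls itself before consuming
    // any input, so only the depth guard (added for this demo) stops it.
    static int naiveA(String in, int pos, int depth) {
        if (depth > 1000) {
            throw new IllegalStateException("left recursion: no input consumed at position " + pos);
        }
        int afterA = naiveA(in, pos, depth + 1);   // recurse first, same position
        // Never reached: the call above always runs into the depth guard.
        if (afterA < in.length() && in.charAt(afterA) == 'a') return afterA + 1;
        return -1;
    }

    // Same language without left recursion: one 'a', then zero or more further 'a's.
    static boolean matchesA(String in) {
        int pos = 0;
        if (pos >= in.length() || in.charAt(pos) != 'a') return false;   // A -> a A'
        pos++;
        while (pos < in.length() && in.charAt(pos) == 'a') pos++;        // A' -> a A' | ε
        return pos == in.length();
    }
}
```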

29 Top Down - problems
Two rules cannot start with the same token
  Can be solved by backtracking
  Reduce #backtracks
Example:
  E → T + E
  E → T
[Figure: both alternatives for E begin with T]

30 Top Down – solution
Two ways
  Eliminate left recursion
  Perform left factoring

31 Top Down – solution
Step I: left recursion removal
Before:
  E → E + T
  E → T
  T → T * F
  T → F
  F → id
  F → (E)
After (rewritten rules):
  E → T + E
  T → F * T

32 Top Down – solution
Step II: left factoring
Before:
  E → T + E
  E → T
  T → F * T
  T → F
  F → id
  F → (E)
After:
  E → T E’
  E’ → + E
  E’ → ε
  T → F T’
  T’ → * T
  T’ → ε
  F → id
  F → (E)
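After left factoring, each non-terminal can decide which rule to use from a single look-ahead token, so the grammar maps directly onto recursive descent. A minimal sketch follows; the class name ExprParser and the fully parenthesized output string are assumptions chosen to make the recognized structure visible.

```java
import java.util.List;

// Hypothetical recursive-descent parser for the left-factored grammar:
//   E -> T E'     E' -> + E | ε
//   T -> F T'     T' -> * T | ε
//   F -> id | ( E )
// It returns a fully parenthesized string that shows the structure it found.
class ExprParser {
    private final List<String> tokens;
    private int pos = 0;

    ExprParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    private void eat(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + peek());
        }
        pos++;
    }

    private String parseE() { return parseEPrime(parseT()); }            // E -> T E'

    private String parseEPrime(String left) {
        if (peek().equals("+")) {                                         // E' -> + E
            eat("+");
            return "(" + left + " + " + parseE() + ")";
        }
        return left;                                                      // E' -> ε
    }

    private String parseT() { return parseTPrime(parseF()); }            // T -> F T'

    private String parseTPrime(String left) {
        if (peek().equals("*")) {                                         // T' -> * T
            eat("*");
            return "(" + left + " * " + parseT() + ")";
        }
        return left;                                                      // T' -> ε
    }

    private String parseF() {
        if (peek().equals("id")) { eat("id"); return "id"; }              // F -> id
        eat("(");                                                         // F -> ( E )
        String inner = parseE();
        eat(")");
        return inner;
    }

    static String parse(List<String> tokens) {
        ExprParser p = new ExprParser(tokens);
        String result = p.parseE();
        if (!p.peek().equals("EOF")) {
            throw new IllegalStateException("unexpected token " + p.peek());
        }
        return result;
    }
}
```

For the tokens id + id * id the result is (id + (id * id)): multiplication binds tighter because T sits below E in the grammar.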

33 Top Down – left factoring
Non-terminal with two rules starting with the same prefix
Grammar:
  S → if E then S else S
  S → if E then S
Left-factored grammar:
  S → if E then S X
  X → ε
  X → else S

34 Bottom Up – parsing
No problem with left recursion
Widely used in practice
LR(0), SLR(1), LR(1), LALR(1)
  We will focus only on the theory of LR(0)
  JavaCup implements LALR(1)
Starts with the input
Attempts to rewrite it to the start symbol

35 Bottom Up – parsing
Grammar:
  E → E + (E)
  E → i
Input: 1 + (2) + (3)
Reductions, from the input up to the start symbol:
  1 + (2) + (3)
  E + (2) + (3)
  E + (E) + (3)
  E + (3)
  E + (E)
  E
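The reduction sequence can be reproduced with a small shift-reduce loop. This is a hand-written sketch for this one grammar, not the table-driven LR algorithm that JavaCup generates; the class name ShiftReduceDemo is hypothetical and every number is represented by the token i.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical shift-reduce recognizer for the grammar  E -> E + ( E ) | i.
// It shifts one token at a time and reduces whenever a rule's right-hand side
// is on top of the stack, recording the sentential form after each reduction.
class ShiftReduceDemo {
    static final List<String> HANDLE = Arrays.asList("E", "+", "(", "E", ")");

    static boolean reduce(List<String> stack) {
        int n = stack.size();
        if (n >= 1 && stack.get(n - 1).equals("i")) {                 // E -> i
            stack.set(n - 1, "E");
            return true;
        }
        if (n >= 5 && stack.subList(n - 5, n).equals(HANDLE)) {       // E -> E + ( E )
            stack.subList(n - 5, n).clear();
            stack.add("E");
            return true;
        }
        return false;
    }

    static List<String> parse(List<String> input) {
        List<String> stack = new ArrayList<>();
        List<String> trace = new ArrayList<>();
        for (int next = 0; next < input.size(); next++) {
            stack.add(input.get(next));                               // shift
            while (reduce(stack)) {
                List<String> form = new ArrayList<>(stack);           // stack + unread input
                form.addAll(input.subList(next + 1, input.size()));
                trace.add(String.join(" ", form));
            }
        }
        if (stack.size() != 1 || !stack.get(0).equals("E")) {
            throw new IllegalStateException("syntax error, stack = " + stack);
        }
        return trace;
    }
}
```

For the input i + ( i ) + ( i ) the recorded forms are E + ( i ) + ( i ), E + ( E ) + ( i ), E + ( i ), E + ( E ), E, which is the sequence on the slide with numbers written as i.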

36 Bottom Up - problems
Ambiguity
  E → E + E
  E → i
  1 + 2 + 3 → (1 + 2) + 3 ????
  1 + 2 + 3 → 1 + (2 + 3) ????
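How many ways the parser could group such an input can be enumerated directly. The sketch below (hypothetical class AmbiguityDemo) lists every parse tree that E → E + E allows for the operands first..last, printed as a fully parenthesized expression.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerate all parse trees of  first + ... + last
// under the ambiguous grammar  E -> E + E | i.
class AmbiguityDemo {
    static List<String> bracketings(int first, int last) {
        List<String> out = new ArrayList<>();
        if (first == last) {                        // E -> i
            out.add(Integer.toString(first));
            return out;
        }
        // E -> E + E: try every position for the top-level "+".
        for (int split = first; split < last; split++) {
            for (String left : bracketings(first, split)) {
                for (String right : bracketings(split + 1, last)) {
                    out.add("(" + left + " + " + right + ")");
                }
            }
        }
        return out;
    }
}
```

AmbiguityDemo.bracketings(1, 3) returns the two groupings (1 + (2 + 3)) and ((1 + 2) + 3); with four operands there are already five.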

37 Summary
Do PA1
Use forum
Next week
  Cup
  LR(0)