Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.

Slides:



Advertisements
Similar presentations
Lecture # 7 Chapter 4: Syntax Analysis. What is the job of Syntax Analysis? Syntax Analysis is also called Parsing or Hierarchical Analysis. A Parser.
Advertisements

CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Chapter 3 Describing Syntax and Semantics Sections 1-3.
Context-Free Grammars Lecture 7
Prof. Bodik CS 164 Lecture 61 Building a Parser II CS164 3:30-5:00 TT 10 Evans.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
Compiler Constreuction 1 Chapter 4 Syntax Analysis Topics to cover: Context-Free Grammars: Concepts and Notation Writing and rewriting a grammar Syntax.
Specifying Languages CS 480/680 – Comparative Languages.
COP4020 Programming Languages
Winter 2003/4Pls – syntax – Catriel Beeri1 SYNTAX Syntax: form, structure The syntax of a pl: The set of its well-formed programs The rules that define.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
EECS 6083 Intro to Parsing Context Free Grammars
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
ERROR HANDLING Lecture on 27/08/2013 PPT: 11CS10037 SAHIL ARORA.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
Chapter 4 Syntax Analysis. Syntax Error Handling Example: 1. program prmax(input,output) 2. var 3. x,y:integer; 4. function max(i:integer, j:integer)
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Context-Free Grammars and Parsing
PART I: overview material
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
SYNTAX ANALYSIS & ERROR RECOVERY By: Sarthak Swaroop.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Lesson 5 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Chapter 4. Syntax Analysis (1). 2 Application of a production  A  in a derivation step  i   i+1.
Joey Paquet, 2000, Lecture 5 Error Recovery Techniques in Top-Down Predictive Syntactic Analysis.
Bernd Fischer RW713: Compiler and Software Language Engineering.
1 Problems with Top Down Parsing  Left Recursion in CFG May Cause Parser to Loop Forever.  Indeed:  In the production A  A  we write the program procedure.
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
The Functions and Purposes of Translators Syntax (& Semantic) Analysis.
LESSON 04.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Compiler Principle and Technology Prof. Dongming LU Mar. 18th, 2015.
Syntax Analyzer (Parser)
Parser: CFG, BNF Backus-Naur Form is notational variant of Context Free Grammar. Invented to specify syntax of ALGOL in late 1950’s Uses ::= to indicate.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad 1.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Eliminating Left-Recursion Where some of a nonterminal’s productions are left-recursive, top-down parsing is not possible “Immediate” left-recursion can.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Compiler Design BMZ 1 Chapter4: Syntax Analysis. Compiler Design BMZ 2 Syntax Analysis Source Program Target Program Semantic Analyser Intermediate Code.
Last Chapter Review Source code characters combination lexemes tokens pattern Non-Formalization Description Formalization Description Regular Expression.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
Introduction to Parsing
Chapter 3 – Describing Syntax
lec02-parserCFG May 8, 2018 Syntax Analyzer
LESSON 16.
Parsing & Context-Free Grammars
CS510 Compiler Lecture 4.
Chapter 3 Context-Free Grammar and Parsing
Introduction to Parsing (adapted from CS 164 at Berkeley)
Chapter 3 – Describing Syntax
Syntax Analysis Chapter 4.
Syntax Analysis Sections :.
CSE 3302 Programming Languages
Syntax Analysis Sections :.
Parsing & Context-Free Grammars Hal Perkins Autumn 2011
Lecture 7: Introduction to Parsing (Syntax Analysis)
R.Rajkumar Asst.Professor CSE
Introduction to Parsing
BNF 9-Apr-19.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Predictive Parsing Program
lec02-parserCFG May 27, 2019 Syntax Analyzer
COMPILER CONSTRUCTION
Faculty of Computer Science and Information System
Presentation transcript:

Syntax Analysis Or Parsing

A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly or explicitly) a tree (called as a parse tree) to represent the structure. –The above tree is used later to guide translation.

Parsing During Compilation intermediate representation errors lexical analyzer parser rest of front end symbol table source program parse tree get next token token regular expressions Collecting token information Perform type checking Intermediate code generation uses a grammar to check structure of tokens produces a parse tree syntactic errors and recovery recognize correct syntax report errors

Parsing Responsibilities Syntax Error Identification / Handling Recall typical error types: 1.Lexical : Misspellings 2.Syntactic : Omission, wrong order of tokens 3.Semantic : Incompatible types, undefined IDs 4.Logical : Infinite loop / recursive call Majority of error processing occurs during syntax analysis NOTE: Not all errors are identifiable !! if x<1 thenn y = 5: if ((x 5))) if (x+5) then if (i<9) then... Should be <= not <

Error Detection Much responsibility on Parser –Many errors are syntactic in nature –Modern parsing method can detect the presence of syntactic errors in programs very efficiently –Detecting semantic or logical error is difficult Challenges for error handler in Parser –It should report error clearly and accurately –It should recover from error and continue.. –It should not significantly slow down the processing of correct programs Good news is –Common errors are simple and relatively easy to catch. Errors don’t occur that frequently!! 60% programs are syntactically and semantically correct 80% erroneous statements have only 1 error, 13% have 2 Most error are trivial : 90% single token error 60% punctuation, 20% operator, 15% keyword, 5% other error

Difficult to generate clear and accurate error messages. Example function foo () {... if (...) {... } else {... } Example int myVarr;... x = myVar;... Adequate Error Reporting is Not a Trivial Task Missing } here Not detected until here Misspelled ID here Not detected until here

Error Recovery After first error recovered –Compiler must go on! Restore to some state and process the rest of the input Error-Correcting Compilers –Issue an error message –Fix the problem –Produce an executable Example Error on line 23: “myVarr” undefined. “myVar” was used. May not be a good Idea!! –Guessing the programmers intention is not easy!

Error Recovery May Trigger More Errors! Inadequate recovery may introduce more errors –Those were not programmers errors Example: int myVar flag ;... x := flag;... while (flag==0)... Too many Error message may be obscuring –May bury the real message –Remedy: allow 1 message per token or per statement Quit after a maximum (e.g. 100) number of errors Declaration of flag is discarded Variable flag is undefined

Error Recovery Approaches: Panic Mode Discard tokens until we see a “synchronizing” token. The key... –Good set of synchronizing tokens –Knowing what to do then Advantage –Simple to implement –Does not go into infinite loop –Commonly used Disadvantage –May skip over large sections of source with some errors Example Skip to next occurrence of } end ; Resume by parsing the next statement

Error Recovery Approaches: Phrase-Level Recovery Compiler corrects the program by deleting or inserting tokens...so it can proceed to parse from where it was. The key... Don’t get into an infinite loop Example while (x==4) y:= a + b Insert do to fix the statement

Context Free Grammars (CFG) A context free grammar is a formal model that consists of: Terminals Keywords Token Classes Punctuation Non-terminals Any symbol appearing on the lefthand side of any rule Start Symbol Usually the non-terminal on the lefthand side of the first rule Rules (or “Productions”) BNF: Backus-Naur Form / Backus-Normal Form Stmt ::= if Expr then Stmt else Stmt

Rule Alternative Notations

Context Free Grammars : A First Look assign_stmt  id := expr ; expr  expr operator term expr  term term  id term  real term  integer operator  + operator  - Derivation: A sequence of grammar rule applications and substitutions that transform a starting non-term into a sequence of terminals / tokens.

Derivation Let’s derive: id := id + real – integer ; assign_stmt assign_stmt  id := expr ;  id := expr ;expr  expr operator term  id := expr operator term;expr  expr operator term  id := expr operator term operator term; expr  term  id := term operator term operator term; term  id  id := id operator term operator term; operator  +  id := id + term operator term; term  real  id := id + real operator term; operator  -  id := id + real - term; term  integer  id := id + real - integer; using production :

Example Grammar: Simple Arithmetic Expressions expr  expr op expr expr  ( expr ) expr  - expr expr  id op  + op  - op  * op  / op   9 Production rules Terminals: id + - * /  ( ) Nonterminals: expr, op Start symbol: expr

Notational Conventions Terminals –Lower-case letters early in the alphabet: a, b, c –Operator symbols: +, - –Punctuations symbols: parentheses, comma –Boldface strings: id or if Nonterminals: –Upper-case letters early in the alphabet: A, B, C –The letter S (start symbol) –Lower-case italic names: expr or stmt Upper-case letters late in the alphabet, such as X, Y, Z, represent either nonterminals or terminals. Lower-case letters late in the alphabet, such as u, v, …, z, represent strings of terminals.

Notational Conventions Lower-case Greek letters, such as , , , represent strings of grammar symbols. Thus A   indicates that there is a single nonterminal A on the left side of the production and a string of grammar symbols  to the right of the arrow. If A   1, A   2, …., A   k are all productions with A on the left, we may write A   1 |  2 | …. |  k Unless otherwise started, the left side of the first production is the start symbol. E  E A E | ( E ) | -E | id A  + | - | * | / | 

Derivations Doesn’t contain nonterminals

Derivation

Leftmost Derivation

Rightmost Derivation

Parse Tree

Ambiguous Grammar

More than one Parse Tree for some sentence. –The grammar for a programming language may be ambiguous –Need to modify it for parsing. Also: Grammar may be left recursive. Need to modify it for parsing.

Elimination of Ambiguity Ambiguous A Grammar is ambiguous if there are multiple parse trees for the same sentence. Disambiguation Express Preference for one parse tree over others –Add disambiguating rule into the grammar

Resolving Problems: Ambiguous Grammars Consider the following grammar segment: stmt  if expr then stmt | if expr then stmt else stmt | other (any other statement) If E1 then S1 else if E2 then S2 else S3 simple parse tree: stmt expr E1E1 E2E2 S3S3 S1S1 S2S2 then else if stmt

Example : What Happens with this string? If E 1 then if E 2 then S 1 else S 2 How is this parsed ? if E 1 then if E 2 then S 1 else S 2 if E 1 then if E 2 then S 1 else S 2 vs.

Parse Trees: If E1 then if E2 then S1 else S2 Form 1: stmt expr E1E1 S2S2 thenelseif expr E2E2 S1S1 then if stmt expr E1E1 thenif stmt expr E2E2 S2S2 S1S1 then else if stmt Form 2:

Removing Ambiguity Take Original Grammar: stmt  if expr then stmt | if expr then stmt else stmt | other (any other statement) Revise to remove ambiguity: stmt  matched_stmt | unmatched_stmt matched_stmt  if expr then matched_stmt else matched_stmt | other unmatched_stmt  if expr then stmt | if expr then matched_stmt else unmatched_stmt Rule: Match each else with the closest previous unmatched then.

Any Question?