Lexical and Syntactic Analysis

Presentation transcript:

Lexical and Syntactic Analysis
Here, we look at two of the tasks involved in the compilation process
– Given source code, we first need to break it into meaningful units (lexical analysis) and then parse these lexical components into their syntactic uses (syntactic analysis)
    the goal of lexical analysis is to identify each lexeme in the source code and assign it the proper token
    the goal of syntactic analysis is to parse the lexemes into a parse tree
– During both lexical and syntactic analysis, errors can be detected and reported
– Note that this two-step process omits the actual translation from source code to machine language; this is also required, but we will not consider it here

Lexical Analysis
Given source code that consists of
– reserved words, identifiers, punctuation, blank spaces, comments
Identify each item for what it is
– if it is a reserved word or punctuation, categorize the type
– if it is an identifier, add it to the symbol table (if not already present)
– if it is blank space or a comment, discard (ignore) it
How will we perform this operation?
– we can use a relatively simple state transition diagram to describe the various entities of interest and then implement this in a program
    the lexical analysis program's task is to parse the input and produce each item individually (the lexeme)
    each item should include its type (the token)

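Recognizing Names/Words/Numbers

/* Returns the token for the next lexeme in the input */
int lex( ) {
    getChar( );
    switch (charClass) {
        /* names and reserved words: a letter followed by letters or digits */
        case LETTER:
            addChar( );
            getChar( );
            while (charClass == LETTER || charClass == DIGIT) {
                addChar( );
                getChar( );
            }
            return lookup(lexeme);
        /* integer literals: one or more digits */
        case DIGIT:
            addChar( );
            getChar( );
            while (charClass == DIGIT) {
                addChar( );
                getChar( );
            }
            return INT_LIT;
        /* other character classes (punctuation, end of input) are not shown on this slide */
    } /* End of switch */
} /* End of function lex */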
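The lex( ) fragment above depends on helpers (getChar, addChar, lookup) that the slide does not show. As a rough, self-contained sketch of the same idea, the function below scans one lexeme at a time from a string. The token codes and the name next_token are assumptions made for this illustration only; reserved-word lookup and the symbol table are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch (not taken from the slides) */
enum { TOK_IDENT = 1, TOK_INT_LIT = 2, TOK_UNKNOWN = 3, TOK_EOF = 4 };

/* Scans one lexeme starting at *src, copies it (truncated if needed) into
   lexeme, advances *src past it, and returns the token code. */
int next_token(const char **src, char *lexeme, size_t cap) {
    assert(src != NULL && *src != NULL && lexeme != NULL && cap > 0);
    const char *p = *src;
    size_t len = 0;
    int token;

    while (isspace((unsigned char)*p)) p++;        /* discard blank space */

    if (*p == '\0') {
        token = TOK_EOF;
    } else if (isalpha((unsigned char)*p)) {       /* letter (letter | digit)* */
        while (isalnum((unsigned char)*p)) {
            if (len + 1 < cap) lexeme[len++] = *p;
            p++;
        }
        token = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {       /* digit+ */
        while (isdigit((unsigned char)*p)) {
            if (len + 1 < cap) lexeme[len++] = *p;
            p++;
        }
        token = TOK_INT_LIT;
    } else {                                       /* any other single character */
        if (len + 1 < cap) lexeme[len++] = *p;
        p++;
        token = TOK_UNKNOWN;
    }
    lexeme[len] = '\0';
    *src = p;
    return token;
}
```

Calling next_token repeatedly on "sum1 42 +" should yield an identifier, an integer literal, an unknown single character, and then the end-of-input token.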

Parsing
The process of
– generating a parse tree from a set of input that identifies the grammatical categories of each element of the input
– identifying if and where errors occur
Parsing is similar whether for a natural language or a programming language
– a good parser will continue parsing even after errors have been found
    this requires a recovery process
Two general forms of parsers
– Top-down (used in LL parser algorithms)
    start with LHS rules, map to RHS rules until terminal symbols have been identified, match these against the input
– Bottom-up (used in LR parser algorithms)
    start with RHS rules and input, collapse terminals and non-terminals into non-terminals until you have reached the starting non-terminal
– Parsing in general is an O(n^3) problem, where n is the number of items in the input
    if we cannot determine a single token for each lexeme, the problem becomes O(2^n)!
    by restricting our parser to the limited class of grammars used for the given language (such as LL or LR grammars), we can reduce the complexity to O(n)

Top-Down Parsing
Using a BNF of a language, we generate a recursive-descent parser
– each of the non-terminal grammatical categories in the BNF is converted into a function (e.g., <expr>, <term>, <factor>, etc.)
– when called, a function obtains the next lexeme using a function called lex( ), and matches it against terminal symbols and/or calls further functions
    this approach is known as an LL parser: left-to-right parse, using leftmost derivations
– it is simple to generate the recursive-descent parser
There are two restrictions that we must make on the grammar
– the grammar specifying the language cannot have left recursion
    if a rule has recursive parts, those parts must not be the first items on the RHS of a rule
– the grammar must pass the pairwise disjointness test
Algorithms exist to alter a grammar so that it passes both restrictions

Recursive Descent Parser Example
Recall our example expression grammar from chapter 3:
    <expr> → <term> {(+ | -) <term>}
    <term> → <factor> {(* | /) <factor>}
    <factor> → id | ( <expr> )

/* <expr> → <term> {(+ | -) <term>} */
void expr( ) {
    term( );
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex( );
        term( );
    }
}

/* <term> → <factor> {(* | /) <factor>} */
void term( ) {
    factor( );
    while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
        lex( );
        factor( );
    }
}

/* <factor> → id | ( <expr> ) */
void factor( ) {
    if (nextToken == ID_CODE)
        lex( );
    else if (nextToken == LEFT_PAREN_CODE) {
        lex( );
        expr( );
        if (nextToken == RIGHT_PAREN_CODE)
            lex( );
        else
            error( );
    }
    else
        error( );
}
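The three functions above rely on lex( ), nextToken, and error( ), which are defined elsewhere. The following is a minimal sketch of how they might be wired together into a complete recognizer over a string. The tokenizer, the END_CODE and BAD_CODE codes, the error counter, and the wrapper parse_expression are all assumptions added for illustration; only expr, term, and factor follow the slide.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, MULT_CODE, DIV_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

static const char *input;   /* remaining source text */
static int nextToken;       /* token code of the current lexeme */
static int errorCount;      /* number of syntax errors seen so far */

/* Hypothetical scanner: sets nextToken to the code of the next lexeme */
static void lex(void) {
    while (isspace((unsigned char)*input)) input++;
    if (*input == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input++) {
        case '+': nextToken = PLUS_CODE; break;
        case '-': nextToken = MINUS_CODE; break;
        case '*': nextToken = MULT_CODE; break;
        case '/': nextToken = DIV_CODE; break;
        case '(': nextToken = LEFT_PAREN_CODE; break;
        case ')': nextToken = RIGHT_PAREN_CODE; break;
        default:  nextToken = BAD_CODE; break;
    }
}

static void error(void) { errorCount++; }

static void expr(void);

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE) {
        lex();
    } else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE) lex();
        else error();
    } else {
        error();
    }
}

/* <term> -> <factor> {(* | /) <factor>} */
static void term(void) {
    factor();
    while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
        lex();
        factor();
    }
}

/* <expr> -> <term> {(+ | -) <term>} */
static void expr(void) {
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        term();
    }
}

/* Returns 1 if text is a complete, well-formed expression, 0 otherwise */
int parse_expression(const char *text) {
    assert(text != NULL);
    input = text;
    errorCount = 0;
    lex();
    expr();
    return errorCount == 0 && nextToken == END_CODE;
}
```

Under these assumptions, parse_expression("a + b * (c - d)") accepts, while "a + * b" and "(a" are rejected. Requiring END_CODE at the end also rejects trailing input such as "a b".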

If Statement Example
void ifstmt( ) {
    if (nextToken != IF_CODE)
        error( );
    else {
        lex( );
        if (nextToken != LEFT_PAREN_CODE)
            error( );
        else {
            lex( );                 /* consume the left parenthesis */
            boolexpr( );
            if (nextToken != RIGHT_PAREN_CODE)
                error( );
            else {
                lex( );             /* consume the right parenthesis */
                statement( );
                if (nextToken == ELSE_CODE) {
                    lex( );
                    statement( );
                }
            }
        }
    }
}

We expect an if statement to look like this:
    if (boolean expr) statement;
optionally followed by:
    else statement;
Otherwise, we return an error

LL Parser Restriction
Recall that one of our restrictions for the use of the LL parser was that the grammar pass the pairwise disjointness test
– The parser will need to be able to select the proper right-hand side rule to apply while parsing
    if the current rule being applied is for a <factor>, should we apply id or ( <expr> ) to it?
– For the parser to know which rule to apply, the first terminal symbol that each right-hand side can produce must differ
    consider a rule
    – <A> → a<B> | a<C>
    if the parser finds an "a" in the input, which rule should be applied?
    – should it call function B or C?

Pairwise Disjointness Test
The pairwise disjointness test examines a grammar to make sure that
– all right-hand sides for the same LHS are pairwise disjoint, that is, no two of them can begin with the same terminal symbol
    the book provides a formal definition that we will skip
– here are some examples
    A → aB | bAb | c
    – is pairwise disjoint
    A → aB | aAb
    – is not pairwise disjoint
    <var> → id | id[<expr>]
    – is not pairwise disjoint, but we can make it so:
    <var> → id <new>
    <new> → ε | [<expr>]
    – ε means the empty string
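For the simple examples above, where every right-hand side begins with a terminal, the test reduces to checking that no two alternatives share their first symbol. The sketch below makes that simplifying assumption explicit: each alternative is a string whose first character is a lowercase terminal. It does not compute FIRST sets for alternatives that begin with a non-terminal, and the function name pairwise_disjoint is made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if no two alternatives begin with the same terminal, else 0.
   Assumption: every alternative starts with a lowercase terminal symbol. */
int pairwise_disjoint(const char *const rhs[], size_t n) {
    int seen[256] = {0};                  /* terminals already used as a first symbol */
    for (size_t i = 0; i < n; i++) {
        unsigned char first = (unsigned char)rhs[i][0];
        assert(islower(first));           /* outside the scope of this sketch otherwise */
        if (seen[first]) return 0;        /* two alternatives share a first terminal */
        seen[first] = 1;
    }
    return 1;
}
```

With the alternatives {"aB", "bAb", "c"} the function returns 1, and with {"aB", "aAb"} it returns 0, matching the first two examples.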

Bottom-Up Parsing
Because of the two restrictions placed on grammars to qualify for the LL parser
– an alternative approach is the LR parser, which does bottom-up parsing
    LR: Left-to-right parsing, Rightmost derivation
The parser is implemented using a pushdown automaton
– a stack added to the state diagrams seen earlier
The parser has two basic processes
– shift: move items from the input onto the stack
– reduce: take consecutive stack items and reduce them; for instance, if we have a rule <A> → a<B> and we have a and <B> on the stack, reduce them to <A>
    while the parser is easy to implement, it relies on an LR parsing table, which is difficult to generate
– there are numerous algorithms to generate the parsing table; we will skip how to do that and assume we already have one available

Parser Algorithm
Given input S0, a1, …, an, $
– S0 is the start state
– a1, …, an are the lexemes that make up the program
– $ is a special end-of-input symbol
Let Sm be the state on top of the stack and ai the next input symbol
If action[Sm, ai] = Shift S, then push ai, S onto the stack and change state to S
If action[Sm, ai] = Reduce R, then use rule R in the grammar: pop the stack items that match the RHS of R, then push the LHS non-terminal A of R and change state to GOTO[S', A], where S' is the state left on top of the stack after popping
If action[Sm, ai] = Accept, then the parse is complete with no errors
If action[Sm, ai] = Error (or the entry in the table is blank), then call the error-handling and recovery routine
The parsing table stores the values of action[x, y] and GOTO[x, y]

Example
Grammar:
    1. E → E + T
    2. E → T
    3. T → T * F
    4. T → F
    5. F → (E)
    6. F → id

Parse of id+id*id$

    Stack            Input        Action
    0                id+id*id$    S5
    0id5             +id*id$      R6 (GOTO[0,F])
    0F3              +id*id$      R4 (GOTO[0,T])
    0T2              +id*id$      R2 (GOTO[0,E])
    0E1              +id*id$      S6
    0E1+6            id*id$       S5
    0E1+6id5         *id$         R6 (GOTO[6,F])
    0E1+6F3          *id$         R4 (GOTO[6,T])
    0E1+6T9          *id$         S7
    0E1+6T9*7        id$          S5
    0E1+6T9*7id5     $            R6 (GOTO[7,F])
    0E1+6T9*7F10     $            R3 (GOTO[6,T])
    0E1+6T9          $            R1 (GOTO[0,E])
    0E1              $            ACCEPT
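The trace above can be reproduced with a small table-driven driver. The slides do not show the parsing table itself, so the action and GOTO entries below are an assumption: they are the usual SLR table for this six-rule grammar, and they agree with every step of the trace. The encoding (positive = shift, negative = reduce, ACC = accept, 0 = error), the tokenizer, and the name lr_parse are choices made for this sketch; only state numbers are kept on the stack, since the grammar symbols are implied by the states.

```c
#include <assert.h>
#include <string.h>

enum { T_ID, T_PLUS, T_STAR, T_LPAREN, T_RPAREN, T_END, T_BAD };
enum { N_E, N_T, N_F };
#define ACC 100   /* accept; n > 0 = shift to state n; n < 0 = reduce by rule -n; 0 = error */

static const int action[12][6] = {
    /*          id   +   *   (   )   $  */
    /*  0 */ {   5,  0,  0,  4,  0,  0 },
    /*  1 */ {   0,  6,  0,  0,  0, ACC },
    /*  2 */ {   0, -2,  7,  0, -2, -2 },
    /*  3 */ {   0, -4, -4,  0, -4, -4 },
    /*  4 */ {   5,  0,  0,  4,  0,  0 },
    /*  5 */ {   0, -6, -6,  0, -6, -6 },
    /*  6 */ {   5,  0,  0,  4,  0,  0 },
    /*  7 */ {   5,  0,  0,  4,  0,  0 },
    /*  8 */ {   0,  6,  0,  0, 11,  0 },
    /*  9 */ {   0, -1,  7,  0, -1, -1 },
    /* 10 */ {   0, -3, -3,  0, -3, -3 },
    /* 11 */ {   0, -5, -5,  0, -5, -5 },
};

/* GOTO[state][non-terminal]; 0 marks an unused entry */
static const int goto_table[12][3] = {
    {1, 2, 3}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {8, 2, 3}, {0, 0, 0},
    {0, 9, 3}, {0, 0, 10}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0}, {0, 0, 0},
};

/* LHS non-terminal and RHS length of rules 1..6 (index 0 unused) */
static const int rule_lhs[7] = { 0, N_E, N_E, N_T, N_T, N_F, N_F };
static const int rule_len[7] = { 0, 3, 1, 3, 1, 3, 1 };

/* Hypothetical tokenizer: "id" is one token; '$' or end of string ends the input */
static int lr_next_token(const char **p) {
    while (**p == ' ') (*p)++;
    if (**p == '\0' || **p == '$') return T_END;
    if (strncmp(*p, "id", 2) == 0) { *p += 2; return T_ID; }
    switch (*(*p)++) {
        case '+': return T_PLUS;
        case '*': return T_STAR;
        case '(': return T_LPAREN;
        case ')': return T_RPAREN;
        default:  return T_BAD;
    }
}

/* Returns 1 if text is accepted by the grammar, 0 on a syntax error */
int lr_parse(const char *text) {
    int stack[256];
    int top = 0;
    assert(text != NULL);
    stack[0] = 0;                                   /* start state S0 */
    int tok = lr_next_token(&text);
    for (;;) {
        if (tok == T_BAD) return 0;
        int act = action[stack[top]][tok];
        if (act == ACC) return 1;
        if (act > 0) {                              /* shift */
            if (top + 1 >= 256) return 0;           /* stack full: give up */
            stack[++top] = act;
            tok = lr_next_token(&text);
        } else if (act < 0) {                       /* reduce by rule -act */
            int rule = -act;
            top -= rule_len[rule];                  /* pop the RHS */
            int next = goto_table[stack[top]][rule_lhs[rule]];
            stack[++top] = next;                    /* push GOTO[S', A] */
        } else {
            return 0;                               /* blank table entry */
        }
    }
}
```

Running lr_parse on "id+id*id$" passes through the same shift and reduce steps as the trace and accepts. An input such as "id+*id" hits a blank action entry and is rejected; error recovery is not attempted in this sketch.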