lec02-parserCFG July 30, 2018 Syntax Analyzer


Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program; this structure is usually a parse tree. The syntax analyzer is also known as the parser. The syntax of a programming language is described by a context-free grammar (CFG). The parser checks whether a given source program satisfies the rules implied by the context-free grammar; if it does, the parser creates the parse tree of that program, otherwise it reports error messages. A context-free grammar gives a precise syntactic specification of a programming language. The design of the grammar is an initial phase of the design of a compiler, and a grammar can be converted directly into a parser by some tools. CS 701 Language Processor

The Role of Parser
The parser works on a stream of tokens generated by the lexical analyzer; the smallest item is a token. It verifies that the token string can be generated by the grammar of the source language. [Diagram: source program → Lexical Analyzer → token / get next token → Parser → parse tree]

Parsers (cont.)
We categorize parsers into two groups: a Top-Down Parser creates the parse tree from top to bottom, starting from the root; a Bottom-Up Parser creates the parse tree from bottom to top, starting from the leaves. Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time). Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars: LL for top-down parsing, LR for bottom-up parsing.

Parsing During Compilation
The parser uses a grammar to check the structure of the tokens, produces a parse tree, and handles syntactic errors and recovery: it recognizes correct syntax and reports errors. Augmenting information on tokens in the source, type checking, and semantic analysis are also technically part of this stage. [Diagram: source program → lexical analyzer (driven by regular expressions) → token / get next token → parser → parse tree → rest of front end → intermediate representation; errors and the symbol table are shared by all phases]

Role Of Parser
The parser identifies the language constructs present in a given input program. If the parser determines the input to be valid, it outputs a representation of the input in the form of a parse tree. If the input is grammatically incorrect, the parser declares the detection of a syntax error in the input; in this case no parse tree can be produced.

Error Handling
Error handling is one of the most important features of any modern compiler. It is highly unlikely that a compiler without good error-handling capacity will be accepted by users, even if it can produce correct code for correct programs. The most important challenge here is to make a good guess at the possible mistakes a programmer can make, and to come up with strategies to point those errors out to the user in a clear and unambiguous manner.

Error Handling Contd.
The common errors occurring in programs can be classified into four categories.
- Lexical errors: mainly spelling mistakes and accidental insertion of foreign characters, e.g. '$' if the language does not allow it. They are mostly caught by the lexical analyzer.
- Syntactic errors: grammatical mistakes, such as unbalanced parentheses in arithmetic expressions. The parser should be able to catch these errors efficiently.
- Semantic errors: errors due to undefined variables, incompatible operands to an operator, etc. These can be caught by introducing some extra checks during parsing.
- Logical errors: errors such as infinite loops. There is no way to catch logical errors automatically; however, debugging tools may help the programmer identify them.

Error Recovery Strategies
An important challenge of the syntax analysis phase is to detect syntactic errors. However, nobody would like a compiler that stops after detecting the first error, because there may be many more. If all or most of these errors can be reported to the user in a single run, the user can correct all of them and resubmit for compilation. But in most cases, the presence of an error in the input stream leads the parser into an erroneous state from which it cannot proceed until a certain portion of its work is undone. The strategies involved in this process are broadly known as error recovery strategies.

Goal of the Error Handler
It should report the presence of errors cleanly and accurately. It should recover from each error quickly enough to be able to detect subsequent errors. It should not significantly slow down the processing of correct programs.

Error Recovery Strategies
Panic Mode: on detecting an error, the parser discards enough tokens to reach a reasonable state. It discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as a semicolon or end; a set of tokens marking the end of language constructs is defined to make a synchronizing set, e.g. { ';' , '}' }. Which tokens to use is a decision of the designer. Problems: skipping input may skip a declaration, causing more errors later, and errors in the skipped material are missed entirely. Advantages: it is simple, it is suited to one error per statement, and it cannot go into an infinite loop.
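The panic-mode loop described above can be sketched in a few lines. This is a minimal sketch assuming a flat token list, a hypothetical "BAD" marker standing in for any token that triggers a syntax error, and ';' / '}' as the synchronizing set:

```python
SYNC = {";", "}"}   # synchronizing tokens, chosen by the compiler designer

def parse_stmts(tokens):
    """Collect errors; on each error, discard tokens until a synchronizing token."""
    errors, i = [], 0
    while i < len(tokens):
        if tokens[i] == "BAD":                  # hypothetical stand-in for a syntax error
            errors.append(f"syntax error at token {i}")
            while i < len(tokens) and tokens[i] not in SYNC:
                i += 1                          # panic mode: skip input
        i += 1                                  # consume the sync token (or move on)
    return errors

errs = parse_stmts(["x", "=", "1", ";", "BAD", "y", "z", ";", "w", "=", "2", ";"])
assert errs == ["syntax error at token 4"]      # one error reported, parsing resumed at ';'
```

Note how the tokens "y" and "z" are silently skipped: this is exactly the "miss errors in skipped material" drawback listed above.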

Phrase Level
In this strategy the parser makes some local corrections on the remaining input when an error is detected, so that the resulting input stream gives a valid construct of the language. [Example: replace a ',' by ';', delete an extra ';', or insert a missing ';'.] However, inserting a new character must be done carefully so that extraneous errors are not introduced and the parsing algorithm can proceed without any problem. Local correction on input (e.g. ',' → ';') is also a decision of the designer; it is not suited to all situations, and it is used in conjunction with panic mode to allow less input to be skipped. Drawback: it has difficulty coping with situations in which the actual error occurred before the point of detection, as in For(i=0;j=0, i<10; j<10, i++, j++), where the compiler may flag the error far from its cause.

Error Productions
This involves modifying the grammar of the language to include error situations. Here the compiler designer has a very good idea of the possible types of errors, so the grammar can be modified appropriately. The erroneous input program is then valid for the new grammar augmented with error productions, so the parser can always proceed.

Global Correction
This is a rather theoretical approach. Given an incorrect input stream x for grammar G, find another stream y, acceptable by G, such that the number of tokens that must be modified to convert x into y is minimal. This approach is very costly and not very practical.

Motivating Grammars
Regular expressions are the basis of lexical analysis and represent regular languages. Context-free grammars are the basis of parsing; they represent language constructs and characterize context-free languages.

Context-Free Grammars
Inherently recursive structures of a programming language are defined by a context-free grammar. In a context-free grammar, we have: a finite set of terminals (in our case, the set of tokens); a finite set of non-terminals (syntactic variables); a finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string); and a start symbol (one of the non-terminals). Example:
E → E + E | E – E | E * E | E / E | - E
E → ( E )
E → id
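The four components of this CFG can be written down directly as a data structure. This is a sketch in Python; the dictionary layout is our own convention, not a standard API, with right-hand sides stored as lists of symbols:

```python
# The expression grammar above as (T, NT, S, productions).
grammar = {
    "terminals": {"+", "-", "*", "/", "(", ")", "id"},
    "nonterminals": {"E"},
    "start": "E",
    "productions": {"E": [["E", "+", "E"], ["E", "-", "E"], ["E", "*", "E"],
                          ["E", "/", "E"], ["-", "E"], ["(", "E", ")"], ["id"]]},
}

# Sanity check: every symbol on a right-hand side is a terminal or a non-terminal,
# and the start symbol is a non-terminal.
syms = grammar["terminals"] | grammar["nonterminals"]
assert all(s in syms for rhs in grammar["productions"]["E"] for s in rhs)
assert grammar["start"] in grammar["nonterminals"]
```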

CFG - Terminology
L(G) is the language of G (the language generated by G), which is a set of sentences. A sentence of L(G) is a string of terminal symbols of G. If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G. If G is a context-free grammar, L(G) is a context-free language. Two grammars are equivalent if they produce the same language. For S ⇒* α: if α contains non-terminals, it is called a sentential form of G; if α does not contain non-terminals, it is called a sentence of G.

Context-Free Grammars: Concepts & Terminology
Definition: a context-free grammar, CFG, is described by (T, NT, S, PR), where T are the terminals / tokens of the language; NT are the non-terminals, denoting sets of strings generated by the grammar and in the language; S is the start symbol, S ∈ NT, which defines all strings of the language; and PR are the production rules indicating how T and NT are combined to generate valid strings of the language, PR: NT → (T | NT)*. Like a regular expression / DFA / NFA, a context-free grammar is a mathematical model.

Example
Consider a grammar to generate arithmetic expressions consisting of numbers and the operator symbols +, -, *, /, and ↑. The rules of the grammar can be written as:
E → E + E | E – E | E * E | E / E | - E
E → ( E )
E → id
We can apply the rules to derive the expression 2*(3+5*4) as follows:
E ⇒ E*E ⇒ E*(E) ⇒ E*(E+E) ⇒ E*(E+E*E) ⇒ E*(E+E*4) ⇒ E*(E+5*4) ⇒ E*(3+5*4) ⇒ 2*(3+5*4)
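Each step of this derivation replaces exactly one occurrence of E by the right-hand side of a production. A small sketch can check this mechanically; the numeric alternatives 2, 3, 4, 5 are assumed stand-ins for id, since the slide derives a numeric expression:

```python
def is_one_step(s, t, rhss):
    """True if t results from s by rewriting exactly one 'E' with some RHS."""
    for i, ch in enumerate(s):
        if ch != "E":
            continue
        for r in rhss:
            if t == s[:i] + r + s[i + 1:]:
                return True
    return False

# Right-hand sides of E, with digits standing in for id (our simplification).
RHSS = ["E+E", "E-E", "E*E", "E/E", "-E", "(E)", "2", "3", "4", "5"]
steps = ["E", "E*E", "E*(E)", "E*(E+E)", "E*(E+E*E)",
         "E*(E+E*4)", "E*(E+5*4)", "E*(3+5*4)", "2*(3+5*4)"]

# Every consecutive pair in the derivation is a legal single rewrite.
assert all(is_one_step(a, b, RHSS) for a, b in zip(steps, steps[1:]))
```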

Example Grammar - Terminology
Terminals: a, b, c, +, -, punc, 0, 1, …, 9 (the blue strings). Non-terminals: A, B, C, S (the black strings). T or NT: X, Y, Z. Strings of terminals: u, v, …, z in T*. Strings of T / NT: α, β, γ in (T ∪ NT)*. Alternatives of production rules: A → α1; A → α2; …; A → αk is written A → α1 | α2 | … | αk. The first NT on the LHS of the first production rule is designated as the start symbol.
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑

Example
Expressions:
E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= num | id
Non-terminals: E, T, F. Start symbol: E. Terminals: + - * / id num. Example: x+2*y. Or equivalently: E ::= E + T; E ::= E - T; E ::= T; T ::= T * F; T ::= T / F; T ::= F; F ::= num; F ::= id.

Derivations
E ⇒ E+E: E+E derives from E, since we can replace E by E+E. To be able to do this, we must have the production rule E → E+E in our grammar. E ⇒ E+E ⇒ id+E ⇒ id+id: a sequence of replacements of non-terminal symbols is called a derivation of id+id from E. In general, a derivation step is αAβ ⇒ αγβ if there is a production rule A → γ in our grammar, where α and β are arbitrary strings of terminal and non-terminal symbols. In α1 ⇒ α2 ⇒ ... ⇒ αn we say that αn derives from α1 (or α1 derives αn). ⇒ means derives in one step; ⇒* means derives in zero or more steps; ⇒+ means derives in one or more steps.

Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id), or E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id). At each derivation step, we can choose any of the non-terminals in the sentential form for the replacement. If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation. If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.

Left-Most and Right-Most Derivations
Left-most derivation: E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id). Right-most derivation: E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id). We will see that top-down parsers try to find the left-most derivation of the given source program, while bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
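Leftmost and rightmost derivations differ only in which non-terminal is expanded at each step. A sketch for the grammar above, where E is the only non-terminal, so a plain string search can locate it:

```python
def expand(form, rhs, leftmost=True):
    """Replace the leftmost (or rightmost) occurrence of 'E' with rhs."""
    i = form.find("E") if leftmost else form.rfind("E")
    return form[:i] + rhs + form[i + 1:]

# The same sequence of productions, applied leftmost vs rightmost.
rules = ["-E", "(E)", "E+E", "id", "id"]

lm = rm = "E"
for rhs in rules:
    lm = expand(lm, rhs, leftmost=True)
    rm = expand(rm, rhs, leftmost=False)

assert lm == "-(id+id)"   # passes through -(id+E), as in the left-most derivation
assert rm == "-(id+id)"   # passes through -(E+id), as in the right-most derivation
```

Both derivations reach the same sentence; only the intermediate sentential forms differ.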

Parse Tree
The inner nodes of a parse tree are non-terminal symbols; the leaves of a parse tree are terminal symbols. A parse tree can be seen as a graphical representation of a derivation: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id). [Figure: the parse tree growing at each derivation step, ending with root E over '-' and a parenthesized subtree for id+id]
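A parse tree's frontier (its leaves read left to right) is exactly the derived sentence. A sketch using nested tuples as tree nodes, of the form (label, children...), where the tuple layout is our own convention:

```python
# Parse tree for -(id+id): inner nodes are non-terminals, leaves are terminals.
tree = ("E", "-", ("E", "(", ("E", ("E", "id"), "+", ("E", "id")), ")"))

def leaves(t):
    """Concatenate the terminal leaves left to right (the tree's frontier)."""
    if isinstance(t, str):
        return t
    return "".join(leaves(c) for c in t[1:])   # t[0] is the node label

assert leaves(tree) == "-(id+id)"
```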

Examples of LM / RM Derivations
E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
Exercise: give a leftmost derivation of id + id * id, and a rightmost derivation of id + id * id.

Derivations & Parse Tree
E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id. [Figure: the parse tree grows with each derivation step]

Parse Trees and Derivations
Consider the expression grammar E → E+E | E*E | (E) | -E | id. Leftmost derivation of id + id * id: E ⇒ E + E ⇒ id + E ⇒ id + E * E. [Figure: the parse trees corresponding to each step]

Parse Tree & Derivations - continued
... ⇒ id + id * E ⇒ id + id * id. [Figure: the completed parse tree]

Alternative Parse Tree & Derivation
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id. [Figure: the alternative parse tree] What's the issue here? Two distinct leftmost derivations of the same sentence!

Resolving Grammar Problems/Difficulties
Regular expressions are the basis of lexical analysis: they generate/represent regular languages, the smallest, most well-defined class of languages. Context-free grammars are the basis of parsing: CFGs represent context-free languages, which contain more powerful languages. [Figure: Venn diagram showing the regular languages inside the CFLs]

Regular Expressions vs. CFGs
Regular expressions are used for lexical syntax: CFGs would be overkill, since lexical rules are quite simple and straightforward, REs are concise and easy to understand, and a more efficient lexical analyzer can be constructed from them. Using REs for lexical analysis and CFGs for parsing promotes modularity, low coupling, and high cohesion.

Resolving Problems/Difficulties – (2)
Since the regular languages are a subset of the context-free languages, it is possible to go from a regular expression to a CFG via an NFA. Recall: (a | b)*abb. [Figure: NFA with states 0–3, start state 0, accepting state 3]

Resolving Problems/Difficulties – (3)
Construct the CFG as follows: each state i has a non-terminal Ai: A0, A1, A2, A3. If there is a transition from state i to state j on a, then add Ai → aAj; if from i to j on b, then Ai → bAj. If i is an accepting state, add Ai → ε (here A3 → ε). If i is the starting state, Ai is the start symbol (here A0). For the NFA of (a | b)*abb this gives A0 → aA0, A0 → aA1, A0 → bA0, A1 → bA2, A2 → bA3. So T = {a, b}, NT = {A0, A1, A2, A3}, S = A0, PR = { A0 → aA0 | aA1 | bA0 ; A1 → bA2 ; A2 → bA3 ; A3 → ε }.

How Does This CFG Derive Strings?
Compare the NFA for (a | b)*abb with the grammar A0 → aA0 | aA1 | bA0 ; A1 → bA2 ; A2 → bA3 ; A3 → ε. How is abaabb derived in each?
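The right-linear grammar built from the NFA can be run directly: each derivation step consumes one input symbol, mirroring one NFA transition. A recursive sketch, using the transition set listed above and "" for ε:

```python
# Productions of the grammar derived from the NFA for (a|b)*abb; "" is epsilon.
PRODS = {"A0": ["aA0", "aA1", "bA0"], "A1": ["bA2"], "A2": ["bA3"], "A3": [""]}

def derives(nt, s):
    """True if non-terminal nt derives the terminal string s."""
    for rhs in PRODS[nt]:
        if rhs == "":                       # epsilon production: only matches empty input
            if s == "":
                return True
        elif s[:1] == rhs[0]:               # consume one terminal, recurse on the rest
            if derives(rhs[1:], s[1:]):
                return True
    return False

assert derives("A0", "abaabb")              # ends in abb, so it is in (a|b)*abb
assert not derives("A0", "abab")            # does not end in abb
```

The successful derivation A0 ⇒ aA0 ⇒ abA0 ⇒ abaA0 ⇒ abaaA1 ⇒ abaabA2 ⇒ abaabbA3 ⇒ abaabb mirrors the NFA's accepting path.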

Ambiguity
A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. For id+id*id: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id, but also E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id. [Figure: the two distinct parse trees]
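The two parse trees are not just cosmetically different: interpreting them (with hypothetical numeric values standing in for the ids) gives different results, which is why ambiguity matters. A sketch with trees as nested (left, op, right) tuples:

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def eval_tree(t):
    """Evaluate a binary expression tree (left, op, right); ints are leaves."""
    if isinstance(t, int):
        return t
    left, op, right = t
    return OPS[op](eval_tree(left), eval_tree(right))

# The two parse trees of 2+3*4 (digits are hypothetical values for id).
tree1 = (2, "+", (3, "*", 4))    # from E => E+E first: 2+(3*4)
tree2 = ((2, "+", 3), "*", 4)    # from E => E*E first: (2+3)*4

assert eval_tree(tree1) == 14
assert eval_tree(tree2) == 20
```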

Ambiguity
stmt → if condition then stmt else stmt | if condition then stmt
For the input if a>b then if c>d then x = y else x = z, there are two parse trees: one in which the else belongs to the inner if (condition c>d), and one in which it belongs to the outer if (condition a>b). [Figure: the two parse trees]

Ambiguity (cont.)
For most parsers, the grammar must be unambiguous: an unambiguous grammar gives a unique selection of the parse tree for each sentence. We should eliminate the ambiguity in the grammar during the design phase of the compiler by writing an unambiguous grammar. We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) and disambiguate the grammar to restrict it to this choice.

Ambiguity (cont.)
stmt → if expr then stmt | if expr then stmt else stmt | otherstmts
For if E1 then if E2 then S1 else S2 there are two parse trees: (1) the else attaches to the outer if, and (2) the else attaches to the inner if. [Figure: parse trees 1 and 2]

Ambiguity (cont.)
We prefer the second parse tree (the else matches the closest if), so we have to disambiguate our grammar to reflect this choice. To remove the ambiguity, the general rule is: match each else with the closest previous unmatched then. The unambiguous grammar is:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt | if expr then matchedstmt else unmatchedstmt

Ambiguity – Operator Precedence
Ambiguous grammars (ambiguous because of their operators) can be disambiguated according to precedence and associativity rules. For E → E+E | E*E | E^E | id | (E), with precedence ^ (right-associative) above * (left-associative) above + (left-associative), the disambiguated grammar is:
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
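The disambiguated grammar can be turned into a small evaluator that bakes in the same precedence and associativity: E and T loop on their operator (left-associative), while F calls itself on the right of ^ (right-associative). A sketch using single-digit numbers in place of id (our own simplification):

```python
def parse(s):
    """Evaluate s following the disambiguated grammar; digits stand in for id."""
    toks, pos = list(s), [0]
    def peek(): return toks[pos[0]] if pos[0] < len(toks) else None
    def eat():  pos[0] += 1; return toks[pos[0] - 1]
    def E():                          # E -> T { + T }     left-associative
        v = T()
        while peek() == "+":
            eat(); v = v + T()
        return v
    def T():                          # T -> F { * F }     left-associative
        v = F()
        while peek() == "*":
            eat(); v = v * F()
        return v
    def F():                          # F -> G [ ^ F ]     right-associative
        v = G()
        if peek() == "^":
            eat(); v = v ** F()
        return v
    def G():                          # G -> digit | ( E )
        if peek() == "(":
            eat(); v = E(); eat()     # consume '(' E ')'
            return v
        return int(eat())
    return E()

assert parse("2+3*4") == 14           # * binds tighter than +
assert parse("2^3^2") == 512          # right-associative: 2^(3^2)
assert parse("(2+3)*4") == 20
```

Note the grammar is left-recursive (E → E+T), so the sketch replaces the left recursion with a loop, anticipating the left-recursion removal discussed below.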

Left Recursion
A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. Top-down parsing techniques cannot handle left-recursive grammars, so we have to convert a left-recursive grammar into an equivalent grammar which is not left-recursive. The left recursion may appear in a single step of the derivation (immediate left recursion), or may appear in more than one step of the derivation.

Removal of Immediate Left-Recursion (Moore's Proposal)
A → Aα | β, where β does not start with A
⇒ eliminate immediate left recursion:
A → βA'
A' → αA' | ε (an equivalent grammar)
In general: A → Aα1 | ... | Aαm | β1 | ... | βn, where β1 ... βn do not start with A
⇒ eliminate immediate left recursion:
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε (an equivalent grammar)

Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
⇒ eliminate immediate left recursion:
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)
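The transformation A → Aα | β ⇒ A → βA', A' → αA' | ε is mechanical. A sketch for a single non-terminal, with productions as lists of symbols, [] standing for ε, and our own dictionary layout:

```python
def remove_immediate_lr(nt, prods):
    """Split A -> A alpha | beta into A -> beta A' and A' -> alpha A' | epsilon."""
    alphas = [r[1:] for r in prods if r[:1] == [nt]]   # left-recursive alternatives
    betas  = [r     for r in prods if r[:1] != [nt]]   # the others
    if not alphas:
        return {nt: prods}                             # nothing to do
    prime = nt + "'"
    return {nt:    [b + [prime] for b in betas],
            prime: [a + [prime] for a in alphas] + [[]]}   # [] is epsilon

# E -> E+T | T becomes E -> T E' and E' -> +T E' | epsilon, as on the slide.
g = remove_immediate_lr("E", [["E", "+", "T"], ["T"]])
assert g == {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []]}
```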

Left-Recursion -- Problem
A grammar may not be immediately left-recursive, yet still be left-recursive; by eliminating only the immediate left recursion, we may not obtain a grammar free of left recursion. Example: S → Aa | b; A → Sc | d. This grammar is not immediately left-recursive, but it is still left-recursive: S ⇒ Aa ⇒ Sca, or A ⇒ Sc ⇒ Aac, causes a left recursion. So we have to eliminate all left recursion from the grammar.

Eliminate Left-Recursion -- Algorithm
- Arrange the non-terminals in some order: A1 ... An
- for i from 1 to n do {
    for j from 1 to i-1 do {
      replace each production Ai → Aj γ by Ai → α1 γ | ... | αk γ, where Aj → α1 | ... | αk are all the current Aj productions
    }
    eliminate the immediate left recursion among the Ai productions
  }
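The full algorithm combines the substitution step with the immediate-recursion step. A sketch reproducing the S, A example on the next slide, with productions as lists of symbols and [] standing for ε:

```python
def eliminate_left_recursion(order, g):
    """Remove all left recursion, processing non-terminals in the given order."""
    g = {a: [list(r) for r in rs] for a, rs in g.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai -> Aj gamma using all current Aj productions.
            new = []
            for r in g[ai]:
                if r[:1] == [aj]:
                    new += [aj_rhs + r[1:] for aj_rhs in g[aj]]
                else:
                    new.append(r)
            g[ai] = new
        # Eliminate immediate left recursion among the Ai productions.
        alphas = [r[1:] for r in g[ai] if r[:1] == [ai]]
        if alphas:
            betas = [r for r in g[ai] if r[:1] != [ai]]
            prime = ai + "'"
            g[ai] = [b + [prime] for b in betas]
            g[prime] = [a + [prime] for a in alphas] + [[]]   # [] is epsilon
    return g

g = eliminate_left_recursion(["S", "A"], {"S": [["A", "a"], ["b"]],
                                          "A": [["A", "c"], ["S", "d"], ["f"]]})
assert g["A"]  == [["b", "d", "A'"], ["f", "A'"]]
assert g["A'"] == [["c", "A'"], ["a", "d", "A'"], []]
```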

Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
Order of non-terminals: S, A.
For S: we do not enter the inner loop; there is no immediate left recursion in S.
For A: replace A → Sd with A → Aad | bd, so we have A → Ac | Aad | bd | f. Eliminate the immediate left recursion in A: A → bdA' | fA'; A' → cA' | adA' | ε.
So the resulting equivalent grammar, which is not left-recursive, is:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε

Eliminate Left-Recursion – Example 2
S → Aa | b
A → Ac | Sd | f
Order of non-terminals: A, S.
For A: we do not enter the inner loop. Eliminate the immediate left recursion in A: A → SdA' | fA'; A' → cA' | ε.
For S: replace S → Aa with S → SdA'a | fA'a, so we have S → SdA'a | fA'a | b. Eliminate the immediate left recursion in S: S → fA'aS' | bS'; S' → dA'aS' | ε.
So the resulting equivalent grammar, which is not left-recursive, is:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε

Left-Factoring
A predictive parser (a top-down parser without backtracking) insists that the grammar be left-factored: the grammar is transformed into a new equivalent grammar suitable for predictive parsing. Example: with stmt → if expr then stmt else stmt | if expr then stmt, when we see if we cannot know which production rule to choose to rewrite stmt in the derivation.

Left-Factoring (cont.)
In general, for A → αβ1 | αβ2, where α is non-empty and the first symbols of β1 and β2 (if they have one) are different: when processing α, we cannot know whether to expand A to αβ1 or to αβ2. But if we re-write the grammar as A → αA'; A' → β1 | β2, then we can immediately expand A to αA'.

Left-Factoring -- Algorithm
For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say A → αβ1 | ... | αβn | γ1 | ... | γm, convert it into A → αA' | γ1 | ... | γm; A' → β1 | ... | βn.
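A simplified sketch of one left-factoring pass: it groups alternatives by their first symbol only, so factoring a longer common prefix like cd takes repeated application. Names like A' and A'' are generated, and [] stands for ε:

```python
def left_factor(nt, prods):
    """One pass: factor together alternatives that share their first symbol."""
    groups = {}                                  # first symbol -> alternatives
    for r in prods:
        groups.setdefault(r[0] if r else "", []).append(r)
    out, new_rules, n = [], {}, 0
    for key, rs in groups.items():
        if len(rs) == 1 or key == "":
            out += rs                            # nothing to factor
        else:
            n += 1
            prime = nt + "'" * n                 # generated name: A', A'', ...
            out.append([key, prime])
            new_rules[prime] = [r[1:] for r in rs]   # [] is epsilon
    return {nt: out, **new_rules}

# First pass over A -> abB | aB | cdg | cdeB | cdfB (Example 1 below);
# a second pass on A'' would factor out the remaining d.
g = left_factor("A", [["a", "b", "B"], ["a", "B"],
                      ["c", "d", "g"], ["c", "d", "e", "B"], ["c", "d", "f", "B"]])
assert g["A"]   == [["a", "A'"], ["c", "A''"]]
assert g["A'"]  == [["b", "B"], ["B"]]
assert g["A''"] == [["d", "g"], ["d", "e", "B"], ["d", "f", "B"]]
```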

Left-Factoring – Example 1
A → abB | aB | cdg | cdeB | cdfB
⇒ A → aA' | cdg | cdeB | cdfB; A' → bB | B
⇒ A → aA' | cdA''; A' → bB | B; A'' → g | eB | fB

Left-Factoring – Example 2
A → ad | a | ab | abc | b
⇒ A → aA' | b; A' → d | ε | b | bc
⇒ A → aA' | b; A' → d | ε | bA''; A'' → ε | c