CS252: Systems Programming

Slides:



Advertisements
Similar presentations
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Advertisements

Exercise: Balanced Parentheses
Regular Expressions, Backus-Naur Form and Reverse Polish Notation.
Pushdown Automata Consists of –Pushdown stack (can have terminals and nonterminals) –Finite state automaton control Can do one of three actions (based.
Mooly Sagiv and Roman Manevich School of Computer Science
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
CS 330 Programming Languages 09 / 13 / 2007 Instructor: Michael Eckmann.
Context-Free Grammars Lecture 7
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
PZ03A Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ03A - Pushdown automata Programming Language Design.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CS Describing Syntax CS 3360 Spring 2012 Sec Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Grammars CPSC 5135.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
CS 153 A little bit about LR Parsing. Background We’ve seen three ways to write parsers:  By hand, typically recursive descent  Using parsing combinators.
Copyright © by Curt Hill Grammar Types The Chomsky Hierarchy BNF and Derivation Trees.
Copyright © Curt Hill Languages and Grammars This is not English Class. But there is a resemblance.
Parsing Introduction Syntactic Analysis I. Parsing Introduction 2 The Role of the Parser The Syntactic Analyzer, or Parser, is the heart of the front.
1 Using Yacc. 2 Introduction Grammar –CFG –Recursive Rules Shift/Reduce Parsing –See Figure 3-2. –LALR(1) –What Yacc Cannot Parse It cannot deal with.
Introduction to Compiling
Syntax and Grammars.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
PZ03A Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ03A - Pushdown automata Programming Language Design.
CS416 Compiler Design1. 2 Course Information Instructor : Dr. Ilyas Cicekli –Office: EA504, –Phone: , – Course Web.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Week 14 - Friday.  What did we talk about last time?  Simplifying FSAs  Quotient automata.
BNF A CFL Metalanguage Some Variations Particular View to SLK Copyright © 2015 – Curt Hill.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
Modeling Arithmetic, Computation, and Languages Mathematical Structures for Computer Science Chapter 8 Copyright © 2006 W.H. Freeman & Co.MSCS SlidesAlgebraic.
CS 3304 Comparative Languages
Regular Expressions, Backus-Naur Form and Reverse Polish Notation
Chapter 3 – Describing Syntax
Describing Syntax and Semantics
CS510 Compiler Lecture 4.
Writing a Simple DSL Compiler with Delphi
Lexical and Syntax Analysis
Chapter 2 :: Programming Language Syntax
Syntax Specification and Analysis
Table-driven parsing Parsing performed by a finite state machine.
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Automata and Languages What do these have in common?
Natural Language Processing - Formal Language -
Context-free Languages
PZ03A - Pushdown automata
CS416 Compiler Design lec00-outline September 19, 2018
Course 2 Introduction to Formal Languages and Automata Theory (part 2)
Compiler Design 4. Language Grammars
Lexical and Syntax Analysis
Introduction CI612 Compiler Design CI612 Compiler Design.
Programming Language Syntax 2
Jaya Krishna, M.Tech, Assistant Professor
Subject Name:Sysytem Software Subject Code: 10SCS52
CS 3304 Comparative Languages
Lecture 4: Lexical Analysis & Chomsky Hierarchy
CS 3304 Comparative Languages
CS416 Compiler Design lec00-outline February 23, 2019
Chapter 2 :: Programming Language Syntax
Chapter 3 Syntactic Analysis I.
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Chapter 2 :: Programming Language Syntax
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Sub: Theoretical Foundations of Computer Sciences
CMPE 152: Compiler Design December 4 Class Meeting
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
COMPILER CONSTRUCTION
Pushdown automata Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Faculty of Computer Science and Information System
Presentation transcript:

CS252: Systems Programming Ninghui Li Topic 5: Parsing Prepared by Evan Hanau ehanau@purdue.edu

An Introduction to Parsing with Yacc Context-Free Grammars Yacc Parsing An example Infix Calculator Program

Context-Free Grammar Background: The Context-Free Grammar By CS252 you are already somewhat familiar with Regular Expressions. Regular expressions can be used to describe regular languages, which belong to a larger classification of language types.

Context-Free Grammar In CS, we classify languages on the Chomsky Hierarchy. Type-0 Recursively Enumerable Type-1 Context-Sensitive Type-2 Context-Free Type-3 Regular Type-(i) is a superset of Type-(i+1)

Context-Free Grammar Languages generated by regular expressions belong to type 3. Note: Your specific regular expression engine (e.g. POSIX extended RE) is likely capable of more complex productions. In any case, we need more than regular expressions to parse computer programming languages and shell scripts.

Context-Free Grammar You can do a great deal with regular expressions. Exercise: Create a regular expression that matches on any English phrase that is a palindrome, for instance the string “some men interpret nine memos”.

Context-Free Grammar This is in fact not possible with regex (by its strict CS definition!). You would be limited to palindromes of a finite length only. RE cannot express “anbn”, a string with some number of a’s followed by equal number of b’s The expression a*b* does not require number of a’s equal that of b’s We must use a context-free grammar to describe palindromes and other constructs. More powerful than a regular expression, and useful when some notion of “what came before” is required.

Backus-Naur form BNF or Backus-Naur form is used in CS to describe context-free grammars. It is often used to describe the syntax of programming languages.consists of one or more of the following: <nonterminal> ::= __expression__ Where __expression__ consists of one or more terminals and nonterminals or nothing (epsilon).

US Post Address in Backus-Naur form (from wikipedia) <postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> "." <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL> <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL> <opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | "" <opt-apt-num> ::= <apt-num> | ""

Context-Free Grammar for Simple Expressions Let’s define a grammar for a primitive add or multiply expression: <expr> ::= <expr> * <expr> | <expr> + <expr> | number In this case, <expr> is a terminal and number, *, and + are the nonterminals.

Context-Free Grammar Clearly, there is some ambiguity here, because operator precedence (sometimes referred to as binding) is not defined. The grammar does not distinguish between 2+2*2+2 = 16 (incorrect under normal rules) or 2+2*2+2 = 8 (correct).

Context-Free Grammar One Solution: Define expressions of different levels: <expr> ::= <add_expr> <add_expr> ::= <add_expr> + <mul_expr> | <mul_expr> <mul_expr> ::= <mul_expr> * number | number Now, multiplication will bind tighter than addition (this may require a few sample expressions to wrap your head around!)

Context-Free Grammar Associativity follows from the above example (Hint: What side of the multiply and add operation did we have the “deeper” production on?)

CFG for palindrome <anbn> =  // empty string <palindrome> ::= letter |  // empty string <palindrome> ::= “a” <palindrome> “a” | “b” <palindrome> “b” …. Or, <palindrome> ::= letter <palindrome> letter However, we need to check the two letters are the same. CFG for anbn: <anbn> =  // empty string | “a” <anbn> “b”

Production rules (constraints) Chomsky Hierarchy (From Wikipedia Page) Grammar Languages Automaton Production rules (constraints) Type-0 Recursively enumerable Turing machine  (no restrictions) Type-1 Context-sensitive Linear-bounded non-deterministic Turing machine (equivalently, Right side no shorter than left) Type-2 Context-free Non-deterministic pushdown automaton Type-3 Regular Finite state automaton and

Chomsky Hierarchy Revisited. Type-0 Recursively Enumerable Type-1 Context-Sensitive Cannot encode all strings r1r2 such that r1 and r2 are two regular expressions that are equivalent Type-2 Context-Free (Pushdown Automaton, i.e, Finite State Automaton with a Stack) Can encode anbn, but not anbncn Type-3 Regular (Finite State Automaton) Can encode a*b*, but not anbn

Why is Context-Free Grammar Called Context Free? In a CFG, the left hand of each production is a single non-terminal, e.g., <palindrome> ::= “a” <palindrome> “a” This means that “a”, followed by a <palindrome>, and by “a” will always be considered as <palindrome>, no matter what is the context, hence context free. In a Context-Sensitive Grammar, left hand of production rules can include other things

An Example Context-Sensitive Grammar for anbncn 1. S  a B C 2. S  a S B C 3. B C  C B 4. a B  a b 5. b B  b b 6. b C  b c 7. c C  c c S→2 aSBC →1 aaBCBC →3 aaBBCC →4 aabBCC →5 aabbCC →6 aabbcC →7 aabbcc

Yacc & Parsing There are many ways to parse BNF grammars, most of which are discussed in a compilers course. Recall: A finite state automaton (FSA) is used for regular expressions. (CS182). For a context-free grammar, we use a pushdown automaton, which combines features of a FSA with a stack.

Yacc & Parsing Yacc generates what is known as a LALR parser, which is generated from the BNF grammar in your Yacc file. This parser is defined in the C source file that Yacc generates. We use Lex to make a lexer to generate our terminals, which are matched with regular expressions before being fed into the parser.

Yacc & Parsing Yacc is capable of generating a powerful parser that will handle many different grammars. Input Characters Rule-Based Behavior LEX Lexer terminals YACC Parser

Yacc & Parsing Recall that parsing combines a state machine with a stack. States go on a stack to keep track of where parsing is. Yacc uses a parse table which defines possible states. Yacc’s parser operates using two primary actions, shift and reduce. shift puts a state on the stack, reduce pops state(s) off the stack and reduces combinations of nonterminals and terminals to a single nonterminal. After a reduction to a rule, Yacc’s parser will optionally run some user-defined code.

Yacc & Parsing A very basic example: <rule> := “hello” “world” “\n” The parser would shift each word, successively pushing each state (.”hello”, .”world”, .”\n”) onto the stack. Then at the end of the rule, reduce everything to <rule> and pop the three states.

A Lex/Yacc Infix Calculator Yacc’s parser is powerful, but is not capable of parsing all grammars. Certain ambiguous grammars may produce what is known as a shift/reduce or reduce/reduce conflict. Yacc will, by default, shift instead of reduce.

Yacc & Parsing Consider the classic shift/reduce conflict example: <ifexp> ::= IF <expr> THEN <stmt> ELSE <stmt> | IF <expr> THEN <stmt> Yacc will have a shift/reduce conflict here, but will go with shift (the top option) by default. It’s greedy!

A Lex/Yacc Infix Calculator To demonstrate the utility of Lex and Yacc (or in our case, Flex and Bison) we provide an example infix calculator. Similar to several of the examples provided on the Lex and Yacc manpage at http://dinosaur.compilertools.net, but with added features

A Lex/Yacc Infix Calculator Make sure to read ALL source code comments, particularly those that describe source file organization. Lex definition file: calculator.l Yacc grammar file: calculator.y AST Classes: ast.cc Symbol table: symtab.cc

A Lex/Yacc Infix Calculator The example calculator application uses Lex and Yacc to parse mathematical expressions and produce an Abstract Syntax Tree, which is then used to evaluate those expressions. It allows the =, +, *, -, +, ^, () and unary minus operators, with appropriate levels of binding and precedence. Examine calculator.y, because it is heavily commented.

A Lex/Yacc Infix Calculator The symbol table (implemented here in simple O(n) access time) maps variables to values. Print the AST after every expression evaluation by running calculator with the –t flag, e.g. “calc –t”.

A Lex/Yacc Infix Calculator A calculator example. Type “2*2^3/3 and press enter: calc> 2*2^3/3 = 5.333333 3.000 / (/) \ (^) 2.000 (*)

Review What are required: Able to write simple Context Free Grammars, similar to those used in implementing FIZ Able to determine whether a string of tokens is accepted by a grammar Able to show how a string of tokens is parsed into some non-terminal (i.e., draw the parsing tree)