Compilation 0368-3133 Lecture 2: Lexical Analysis; Syntax Analysis (1): CFLs, CFGs, PDAs. Noam Rinetzky


What is a Compiler?
[Figure: a compiler translates source text (txt) in the source language into executable code (exe) in the target language.]

Compiler vs. Interpreter

Toy compiler

The Real Anatomy of a Compiler

[Figure: the compiler pipeline. Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions (SI) → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe). Lexical Analysis and Syntax Analysis are highlighted.]

Lexical Analysis
Lexical analyzers are also known as “scanners”

Lexical Analysis: from Text to Tokens
[Figure: the source text x = b*b – 4*a*c (txt) enters Lexical Analysis, which emits a token stream for the later phases: Syntax Analysis, Sem. Analysis, Inter. Rep., Code Gen.]

What does Lexical Analysis do?
Scans the input
Partitions the text into a stream of tokens*
–Numbers
–Identifiers
–Keywords
–Punctuation
Tokens are usually represented as (kind, value)
Tokens are defined using regular expressions
* A token is a “word” in the source language that is “meaningful” to the syntactic analysis

What does Lexical Analysis do?
Language: fully parenthesized expressions
Context free language:
Expr → Num | LP Expr Op Expr RP
Regular languages:
Num → Dig | Dig Num
Dig → ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
LP → ‘(’
RP → ‘)’
Op → ‘+’ | ‘*’
Input: ( ( ) * 19 )
Tokens: LP LP Num Op Num RP Op Num RP

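From scanning to parsing
[Figure: the program text ((23 + 7) * x) is fed to the Lexical Analyzer, which produces a token stream (drawn right to left on the slide: RP Id OP RP Num OP Num LP). The Parser either builds an Abstract Syntax Tree from the nodes Op(*), Op(+), Num(23), Num(7), Id(?) when the input is valid, or reports a syntax error.]
Grammar:
Expr → ... | Id
Id → ‘a’ | ... | ‘z’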

Some basic terminology
Lexeme (aka symbol) - a series of letters separated from the rest of the program according to a convention (space, semicolon, comma, etc.)
Pattern - a rule specifying a set of strings. Example: “an identifier is a string that starts with a letter and continues with letters and digits”
–(Usually) a regular expression
Token - a pair of (pattern, attributes)

Example
void match0(char *s) /* find a zero */
{
  if (!strncmp(s, "0.0", 3))
    return 0. ;
}
Token stream:
VOID ID(match0) LPAREN CHAR DEREF ID(s) RPAREN LBRACE IF LPAREN NOT ID(strncmp) LPAREN ID(s) COMMA STRING(0.0) COMMA NUM(3) RPAREN RPAREN RETURN REAL(0.0) SEMI RBRACE EOF

Regular languages
Formal languages
–Σ = finite set of letters
–Word = sequence of letters
–Language = set of words
Regular languages are defined equivalently by
–Regular expressions
–Finite-state automata

Example: Integers w/o Leading Zeros
Digit = 1|2|…|9
Digit0 = 0|Digit
Pos = Digit Digit0*
Integer = 0 | Pos | -Pos
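
The definitions above translate almost directly into a regular expression. The following is a minimal sketch using Python's `re` module; the helper name `is_integer` is an assumption made for illustration and does not come from the slides.

```python
import re

# Mirrors the slide: Digit = 1..9, Digit0 = 0|Digit,
# Pos = Digit Digit0*, Integer = 0 | Pos | -Pos.
DIGIT = r"[1-9]"
DIGIT0 = r"[0-9]"
POS = DIGIT + DIGIT0 + r"*"
INTEGER = re.compile(r"0|-?" + POS)


def is_integer(word: str) -> bool:
    """Return True if the whole word matches Integer (hypothetical helper)."""
    return INTEGER.fullmatch(word) is not None
```

With these definitions, `is_integer("42")` and `is_integer("-17")` hold, while `"007"` and `"-0"` are rejected because a leading zero is only allowed for the single word `0`.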

Challenge: Ambiguity
If = if
Id = Letter (Letter | Digit)*
“if” is a valid identifier… what should it be?
“iffy” is also a valid identifier
Solution
–Longest matching token
–Break ties using order of definitions… Keywords should appear before identifiers
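
The two disambiguation rules can be sketched as follows. This is an illustrative tokenizer that tries every rule at the current position instead of running an automaton; the rule list `RULES`, the `WS` rule, and the function `tokenize` are assumptions for the example, not part of the lecture.

```python
import re

# Rules in priority order: earlier rules win ties, so IF precedes ID.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z][a-z0-9]*")),
    ("NUM", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"[ \t\n]+")),
]


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest match, then rule order."""
    pos, tokens = 0, []
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and m.end() - pos > best_len:
                best_kind, best_len = kind, m.end() - pos
        if best_kind is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        if best_kind != "WS":  # whitespace is matched but not reported
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Here `if` becomes an `IF` token because both rules match two characters and `IF` is listed first, while `iffy` becomes an `ID` because the identifier rule matches a longer prefix.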

Building a Scanner – Take I
Input: String
Output: Sequence of tokens

Building a Scanner – Take I
Token nextToken()
{
  int c;                        /* int, so that EOF can be represented */
loop:
  c = getchar();
  switch (c) {
  case ' ': goto loop;          /* skip blanks */
  case ';': return SemiColumn;
  case '+':
    c = getchar();
    switch (c) {
    case '+': return PlusPlus;
    case '=': return PlusEqual;
    default:  ungetc(c, stdin); /* push the lookahead character back */
              return Plus;
    }
  case '<': …
  case 'w': …
  }
}

There must be a better way!

A better way
Automatically generate a scanner
Define tokens using regular expressions
Use finite-state automata for detection

Reg-exp vs. automata
Regular expressions are declarative
–Good for humans
–Not “executable”
Automata are operative
–Define an algorithm for deciding whether a given word is in a regular language
–Not a natural notation for humans

Overview
Define tokens using regular expressions
Construct a nondeterministic finite-state automaton (NFA) from each regular expression
Determinize the NFA into a deterministic finite-state automaton (DFA)
The DFA can be used directly to identify tokens

Automata theory: a bird’s-eye view

Deterministic Automata (DFA)
M = (Σ, Q, δ, q0, F)
–Σ – alphabet
–Q – finite set of states
–q0 ∊ Q – initial state
–F ⊆ Q – final states
–δ : Q × Σ → Q – transition function
For a word w, M reaches some state x
–M accepts w if x ∊ F

DFA in pictures
An automaton is defined by states and transitions
[Figure: a DFA over the letters a, b, c, with the start state, an accepting state, and a transition pointed out.]

Accepting Words
Words are read left-to-right
Missing transition = non-acceptance
–“Stuck state”
[Figure, four animation steps: the DFA reading the word cba.]

Rejecting Words
Words are read left-to-right
Missing transition means non-acceptance
[Figure, two animation steps: the DFA reading the word cbb.]
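
Running a DFA on a word, including the "stuck state" convention, can be sketched in a few lines. The automaton below is a hypothetical one chosen only so that `cba` is accepted and `cbb` gets stuck; it is not claimed to be the automaton drawn on the slides, and the names `dfa_accepts`, `DELTA`, `START`, and `FINALS` are assumptions.

```python
def dfa_accepts(delta, start, finals, word):
    """Run a DFA on word; a missing transition means the DFA is stuck."""
    state = start
    for letter in word:  # words are read left to right
        if (state, letter) not in delta:
            return False  # stuck state: the word is rejected
        state = delta[(state, letter)]
    return state in finals


# Hypothetical DFA over {a, b, c} that accepts exactly the word "cba".
DELTA = {("q0", "c"): "q1", ("q1", "b"): "q2", ("q2", "a"): "q3"}
START = "q0"
FINALS = {"q3"}
```

`dfa_accepts(DELTA, START, FINALS, "cba")` follows three transitions and ends in the accepting state, whereas on `cbb` the run stops at the second `b` because `q2` has no `b` transition.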

Non-deterministic Automata (NFA)
M = (Σ, Q, δ, q0, F)
–Σ – alphabet
–Q – finite set of states
–q0 ∊ Q – initial state
–F ⊆ Q – final states
–δ : Q × (Σ ∪ {ε}) → 2^Q – transition function (for a DFA: δ : Q × Σ → Q)
For a word w, M can reach a number of states X
–M accepts w if X ∩ F ≠ {}
–Possible: X = {}
–Possible: ε-transitions

NFA
Allow multiple transitions from a given state labeled by the same letter
[Figure: an NFA over the letters a, b, c.]

Accepting words
Maintain a set of states
Accept the word if an accepting state is reached
[Figure, four animation steps: the NFA reading the word cba.]

NFA+ε automata
ε transitions can “fire” without reading the input
[Figure: an NFA over the letters a, b, c with an ε transition.]

NFA+ε run example
An ε transition can take place non-deterministically, without reading the input
[Figure, six animation steps: the automaton reading the word cba; the word is accepted.]
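
"Maintain a set of states" and "ε transitions fire without reading the input" combine into the usual simulation loop. The sketch below assumes a dictionary encoding of δ in which the empty string stands for ε; the example automaton for a*b+ and all names (`EPS`, `eps_closure`, `nfa_accepts`, `NFA_DELTA`) are illustrative assumptions.

```python
EPS = ""  # assumed encoding of an epsilon label


def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon transitions."""
    todo, closure = list(states), set(states)
    while todo:
        q = todo.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                todo.append(r)
    return closure


def nfa_accepts(delta, start, finals, word):
    """Track the set X of reachable states; accept if X meets the final states."""
    current = eps_closure(delta, {start})
    for letter in word:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, letter), ()))
        current = eps_closure(delta, moved)  # may become empty: X = {}
    return bool(current & finals)


# Hypothetical NFA for a*b+: state 0 loops on a, epsilon to 1, b to 2, 2 loops on b.
NFA_DELTA = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {2}, (2, "b"): {2}}
NFA_START = 0
NFA_FINALS = {2}
```

On the word `aab` the reachable set stays `{0, 1}` while reading the two `a`s and becomes `{2}` after the `b`, so the word is accepted; on `ba` the set becomes empty after the `a`.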

From regular expressions to NFA
Step 1: assign expression names and obtain pure regular expressions R1 … Rm
Step 2: construct an NFA Mi for each regular expression Ri
Step 3: combine all Mi into a single NFA
Ambiguity resolution: prefer longest accepting word

From reg. exp. to automata
Theorem: there is an algorithm to build an NFA+ε automaton for any regular expression
Proof: by induction on the structure of the regular expression

Basic constructs
R = ε
R = ∅
R = a
[Figure: the automaton for each basic construct.]

Composition
R = R1 | R2 [Figure: M1 and M2 side by side, joined by ε transitions.]
R = R1R2 [Figure: M1 followed by M2, joined by an ε transition.]

Repetition
R = R1* [Figure: M1 wrapped with ε transitions.]
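
The inductive proof is constructive, so it can be turned into code: one function per case of the regular expression. The sketch below follows the textbook ε-based construction (often called Thompson's construction); the exact placement of ε edges on the slides is not recoverable from the transcript, so the wiring here is an assumption, as are all the names.

```python
import itertools

EPS = ""  # assumed encoding of an epsilon label
_fresh = itertools.count()  # generator of fresh state names


def lit(a):
    """R = a: one transition labeled a. A machine is (start, final, edges)."""
    s, f = next(_fresh), next(_fresh)
    return s, f, [(s, a, f)]


def alt(m1, m2):
    """R = R1 | R2: new start/final states joined to M1 and M2 by epsilon edges."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    s, f = next(_fresh), next(_fresh)
    extra = [(s, EPS, s1), (s, EPS, s2), (f1, EPS, f), (f2, EPS, f)]
    return s, f, e1 + e2 + extra


def cat(m1, m2):
    """R = R1R2: the final state of M1 is linked to the start of M2."""
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    return s1, f2, e1 + e2 + [(f1, EPS, s2)]


def star(m1):
    """R = R1*: skip M1 entirely, or run it any number of times."""
    s1, f1, e1 = m1
    s, f = next(_fresh), next(_fresh)
    extra = [(s, EPS, s1), (s, EPS, f), (f1, EPS, s1), (f1, EPS, f)]
    return s, f, e1 + extra


def accepts(machine, word):
    """Simulate the constructed NFA on word using sets of states."""
    start, final, edges = machine

    def closure(states):
        todo, seen = list(states), set(states)
        while todo:
            q = todo.pop()
            for p, label, r in edges:
                if p == q and label == EPS and r not in seen:
                    seen.add(r)
                    todo.append(r)
        return seen

    current = closure({start})
    for letter in word:
        current = closure({r for p, label, r in edges
                           if p in current and label == letter})
    return final in current


# a*b+ written with the primitives above, as a(ny number of) a's, then b b*.
A_STAR_B_PLUS = cat(star(lit("a")), cat(lit("b"), star(lit("b"))))
```

Each constructor adds at most two states and four ε edges, so the NFA is linear in the size of the regular expression.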

Naïve approach
Try each automaton separately
Given a word w:
–Try M1(w)
–Try M2(w)
–…
–Try Mn(w)
Requires resetting after every attempt

Actually, we combine automata
[Figure: the NFAs for the patterns a, abb, a*b+, and abab (states 1–13) combined into a single NFA by ε transitions from a new start state 0.]

Corresponding DFA
[Figure: the DFA obtained from the combined NFA; accepting states are labeled with the patterns abb, a*b+, and abab.]
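
A DFA corresponding to an NFA can be computed with the subset construction, in which each DFA state is a set of NFA states. The sketch below assumes the same dictionary encoding of δ as a mapping from (state, label) to a set of states, with the empty string for ε; the small a*b+ automaton and all names are illustrative assumptions, not the automaton in the figure.

```python
EPS = ""  # assumed encoding of an epsilon label


def closure(delta, states):
    """Epsilon closure, returned as a frozenset so it can serve as a DFA state."""
    todo, seen = list(states), set(states)
    while todo:
        q = todo.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return frozenset(seen)


def determinize(delta, start, finals, alphabet):
    """Subset construction: returns (dfa_delta, dfa_start, dfa_finals)."""
    d_start = closure(delta, {start})
    d_delta, d_finals = {}, set()
    todo, seen = [d_start], {d_start}
    while todo:
        subset = todo.pop()
        if subset & finals:  # accepting if it contains an accepting NFA state
            d_finals.add(subset)
        for a in alphabet:
            moved = set()
            for q in subset:
                moved |= set(delta.get((q, a), ()))
            if not moved:
                continue  # no transition: the DFA is stuck on this letter
            target = closure(delta, moved)
            d_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return d_delta, d_start, d_finals


# Hypothetical NFA for a*b+ used to exercise the construction.
NFA_DELTA = {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {2}, (2, "b"): {2}}
D_DELTA, D_START, D_FINALS = determinize(NFA_DELTA, 0, {2}, "ab")
```

For this input the result has two DFA states, `{0, 1}` and `{2}`. In general the number of subsets can be exponential in the number of NFA states.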

Scanning with DFA
Run until stuck
–Remember the last accepting state
Go back to the last accepting state
Return the token

Ambiguity resolution
Longest word
Tie-breaker based on order of rules when words have the same length

Examples
[Figure: the DFA for abb, a*b+, abab.]
abaa: gets stuck after aba in state 12, backs up to state (5 8 11); pattern is a*b+, token is ab
abba: stops after the second b in (6 8); token is abb because it comes first in the spec
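
The "run until stuck, then back up" loop can be sketched as follows. The DFA is hand-built here for the three patterns abb, a*b+, and abab only, with state names that record the prefix read so far; these names, the tables `DELTA` and `ACCEPT`, and the function `next_token` are assumptions for illustration and do not reproduce the state numbering of the figure.

```python
# Hand-built DFA for the rules abb, a*b+, abab (in that priority order).
DELTA = {
    ("S", "a"): "A", ("S", "b"): "B",
    ("A", "a"): "AA", ("A", "b"): "AB",
    ("AA", "a"): "AA", ("AA", "b"): "B",
    ("B", "b"): "B",
    ("AB", "a"): "ABA", ("AB", "b"): "ABB",
    ("ABB", "b"): "B",
    ("ABA", "b"): "ABAB",
}
# Accepting states and the rule they report; ABB matches both abb and a*b+,
# and abb wins because it comes first in the specification.
ACCEPT = {"B": "a*b+", "AB": "a*b+", "ABB": "abb", "ABAB": "abab"}


def next_token(text, pos):
    """Return (rule, lexeme, new_pos) for the longest token starting at pos."""
    state = "S"
    last_rule, last_end = None, pos
    cur = pos
    while cur < len(text) and (state, text[cur]) in DELTA:
        state = DELTA[(state, text[cur])]
        cur += 1
        if state in ACCEPT:  # remember the last accepting state
            last_rule, last_end = ACCEPT[state], cur
    if last_rule is None:
        raise ValueError(f"no token matches at position {pos}")
    return last_rule, text[pos:last_end], last_end  # back up to the last accept
```

On `abaa` the run reads `aba`, gets stuck on the last `a`, and backs up to return `ab` for a*b+; on `abba` it stops after the second `b` and returns `abb`.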

Summary of Construction
Describe tokens as regular expressions
–Decide attributes (values) to save for each token
Regular expressions are turned into a DFA
–Also records which attributes (values) to keep
The lexical analyzer simulates the run of the automaton with the given transition table on any input string

A Few Remarks
Turning an NFA into a DFA can be expensive
–Exponential in the worst case
–In practice, works fine
The construction is done once per language
–At compiler construction time
–Not at compilation time

Implementation

Implementation by Example
if                              { return IF; }
[a-z][a-z0-9]*                  { return ID; }
[0-9]+                          { return NUM; }
[0-9]"."[0-9]+|[0-9]*"."[0-9]+  { return REAL; }
(\-\-[a-z]*\n)|(" "|\n|\t)      { ; }
.                               { error(); }
Example lexemes listed beside the rules: if; xy, i, zs98; 3, 32; comm\n; \n, \t, " "
[Figure: the DFA for these rules, with states labeled IF, ID, NUM, REAL, w.s., and error.]

int edges[][256] = {
  /*               …, 0, 1, 2, 3, ..., -, e, f, g, h, i, j, ... */
  /* state 0 */  {0, …, 0, 0, …, 0, 0, 0, 0, 0, ..., 0, 0, 0, 0, 0, 0},
  /* state 1 */  {13, …, 7, 7, 7, 7, …, 9, 4, 4, 4, 4, 2, 4, …, 13, 13},
  /* state 2 */  {0, …, 4, 4, 4, 4, ..., 0, 4, 3, 4, 4, 4, 4, …, 0, 0},
  /* state 3 */  {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0},
  /* state 4 */  {0, …, 4, 4, 4, 4, …, 0, 4, 4, 4, 4, 4, 4, …, 0, 0},
  /* state 5 */  {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0},
  /* state 6 */  {0, …, 6, 6, 6, 6, …, 0, 0, 0, 0, 0, 0, 0, ..., 0, 0},
  /* state 7 */
  /* state … */  ...
  /* state 13 */ {0, …, 0, 0, 0, 0, …, 0, 0, 0, 0, 0, 0, 0, …, 0, 0}
};

Pseudo Code for Scanner
char *input = … ;
Token nextToken()
{
  int lastFinal = 0;                         /* 0 = no accepting state seen yet */
  int currentState = 1;                      /* state 1 is the start state */
  char *inputPositionAtLastFinal = input;
  char *currentPosition = input;
  while (!isDead(currentState)) {
    int nextState = edges[currentState][(unsigned char)*currentPosition];
    if (isFinal(nextState)) {                /* remember the longest match so far */
      lastFinal = nextState;
      inputPositionAtLastFinal = currentPosition;
    }
    currentState = nextState;
    currentPosition++;
  }
  input = inputPositionAtLastFinal + 1;      /* resume right after the last match */
  return action[lastFinal];
}

Example
Input: “if  --not-a-com” (2 blanks after if)

[Table with columns final, state, input: the scan starts with final = 0, state = 1 on “if  --not-a-com” and returns IF.]

[Table with columns final, state, input: the scan restarts with final = 0, state = 1 on the two blanks followed by “--not-a-com”; whitespace is found.]

[Table with columns final, state, input: the scan restarts with final = 0, state = 1 on “--not-a-com”, passes through final = 9, state = 10, and ends with final = 9, state = 0: error.]

[Table with columns final, state, input: the scan restarts with final = 0, state = 1 on “-not-a-com”: error.]

Concluding remarks
Efficient scanner
Minimization
Error handling
Automatic creation of lexical analyzers

Efficient Scanners
Efficient state representation
Input buffering
Using switch and gotos instead of tables

Minimization
Create a nondeterministic finite automaton (NFA) from every regular expression
Merge all the automata using epsilon moves (like the | construction)
Construct a deterministic finite automaton (DFA)
–State priority
Minimize the automaton
–Separate accepting states by token kinds

Example
if { return IF; }
[a-z][a-z0-9]* { return ID; }
[0-9]+ { return NUM; }
[Figures, four animation steps: the automata for these rules, with states labeled ID, IF, NUM, and error. Source: Modern Compiler Implementation in ML, Andrew Appel, (c) 1998, Figures 2.7, 2.8.]

Error Handling
Many errors cannot be identified at this stage
Example: “fi (a==f(x))”. Should “fi” be “if”? Or is it a routine name?
–We will discover this later in the analysis
–At this point, we just create an identifier token
Sometimes the lexeme does not match any pattern
–Easiest: eliminate letters until the beginning of a legitimate lexeme
–Alternatives: eliminate/add/replace one letter, swap two adjacent letters, etc.
Goal: allow the compilation to continue
Problem: errors that spread all over

Automatically generated scanners
Use of Program-Generating Tools
–Specification → Part of compiler
–Compiler-Compiler
[Figure: regular expressions go into JFlex, which generates a scanner; the scanner turns the input program into a stream of tokens.]

Use of Program-Generating Tools
Input: regular expressions and actions
–Action = Java code
Output: a scanner program that
–Produces a stream of tokens
–Invokes actions when a pattern is matched
[Figure: regular expressions go into JFlex, which generates a scanner; the scanner turns the input program into a stream of tokens.]

Line Counting Example
Create a program that counts the number of lines in a given input text file

Creating a Scanner using Flex
%option noyywrap
%{
#include <stdio.h>
int num_lines = 0;
%}
%%
\n      ++num_lines;
.       ;
%%
int main(void)
{
  yylex();
  printf("# of lines = %d\n", num_lines);
  return 0;
}
[Figure: the generated automaton, with states initial, newline, and other; the transition on \n leads to newline and the transition on any other character (^\n) leads to other.]

JFlex Spec File
User code: copied directly to the Java file
%%
JFlex directives: macros, state names
%%
Lexical analysis rules:
–Optional state, regular expression, action
–How to break the input into tokens
–Action when a token is matched
Possible source of javac errors down the road
Examples: DIGIT= [0-9] and LETTER= [a-zA-Z] (macros), YYINITIAL (state), {LETTER} ({LETTER}|{DIGIT})* (regular expression)

Creating a Scanner using JFlex
import java_cup.runtime.*;
%%
%cup
%{
  private int lineCounter = 0;
%}
%eofval{
  System.out.println("line number=" + lineCounter);
  return new Symbol(sym.EOF);
%eofval}
NEWLINE=\n
%%
{NEWLINE} { lineCounter++; }
[^{NEWLINE}] { }

Catching errors
What if the input doesn’t match any token definition?
Trick: add a “catch-all” rule that matches any character and reports an error
–Add it after all other rules

A JFlex specification of C Scanner
import java_cup.runtime.*;
%%
%cup
%{
  private int lineCounter = 0;
%}
Letter= [a-zA-Z_]
Digit= [0-9]
%%
"\t" { }
"\n" { lineCounter++; }
";" { return new Symbol(sym.SemiColumn); }
"++" { return new Symbol(sym.PlusPlus); }
"+=" { return new Symbol(sym.PlusEq); }
"+" { return new Symbol(sym.Plus); }
"while" { return new Symbol(sym.While); }
{Letter}({Letter}|{Digit})* { return new Symbol(sym.Id, yytext()); }
"<=" { return new Symbol(sym.LessOrEqual); }
"<" { return new Symbol(sym.LessThan); }

Missing
Creating a lexical analyzer by hand
Table compression
Symbol tables
Nested comments
Handling macros

Lexical Analysis: What
Input: program text (file)
Output: sequence of tokens

Lexical Analysis: How
Define tokens using regular expressions
Construct a nondeterministic finite-state automaton (NFA) from each regular expression
Determinize the NFA into a deterministic finite-state automaton (DFA)
The DFA can be used directly to identify tokens

Lexical Analysis: Why
Read input file
Identify language keywords and standard identifiers
Handle include files and macros
Count line numbers
Remove whitespace
Report illegal symbols
[Produce symbol table]

Syntax Analysis (1)
Context Free Languages
Context Free Grammars
Pushdown Automata

The Real Anatomy of a Compiler
[Figure: the compiler pipeline from source text (txt) to executable code (exe), shown again with Lexical Analysis and Syntax Analysis highlighted.]

Frontend: Scanning & Parsing
[Figure: the program text ((23 + 7) * x) is fed to the Lexical Analyzer, which produces a token stream (drawn right to left on the slide: RP Id OP RP Num OP Num LP). The Parser either builds an Abstract Syntax Tree from the nodes Op(*), Op(+), Num(23), Num(7), Id(b) when the input is valid, or reports a syntax error.]
Grammar:
E → ... | Id
Id → ‘a’ | ... | ‘z’

From scanning to parsing
[Figure: the same scanner and parser diagram as on the previous slide.]

Parsing
Construct a structured representation of the input
Challenges
–How do you describe the programming language?
–How do you check the validity of an input? Is a sequence of tokens a valid program in the language?
–How do you construct the structured representation?
–Where do you report an error?

Some foundations

Context free languages (CFLs)
L01 = { 0^n 1^n | n > 0 }
Lpalindrome = { pp’ | p ∊ Σ*, p’ = reverse(p) }
Lpalindrome# = { p#p’ | p ∊ Σ*, p’ = reverse(p), # ∉ Σ }

Context free grammars (CFG)
G = (V, T, P, S)
V – nonterminals (syntactic variables)
T – terminals (tokens)
P – derivation rules
–Each rule of the form V → (T ∪ V)*
S – start symbol

What can CFGs do?
Recognize CFLs
S → 0T1
T → 0T1 | ε
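
For this particular grammar, membership can be checked with one function per nonterminal, choosing the rule by looking at the next letter. This is only a sketch for the grammar above, not a general CFG recognizer; the function names `parse_T` and `parse_S` are assumptions.

```python
def parse_T(word, pos):
    """Match T -> 0 T 1 | epsilon at pos; return the position after T, or None."""
    if pos < len(word) and word[pos] == "0":  # choose T -> 0 T 1
        mid = parse_T(word, pos + 1)
        if mid is not None and mid < len(word) and word[mid] == "1":
            return mid + 1
        return None  # a 0 was read but no matching 1 follows
    return pos  # choose T -> epsilon


def parse_S(word):
    """Return True if word is derivable from S -> 0 T 1."""
    if not word or word[0] != "0":
        return False
    mid = parse_T(word, 1)
    # After T exactly one closing 1 must remain.
    return mid is not None and mid == len(word) - 1 and word[mid] == "1"
```

`parse_S` accepts exactly the words of L01, for example `01` and `000111`, and rejects `001`, `0101`, and the empty word.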

Recognizing CFLs
Context Free Grammars (CFG) ~ Nondeterministic pushdown automata (PDA)
(~ stands for: same language-defining power)

Pushdown Automata (PDA)
Nondeterministic PDAs define all CFLs
Deterministic PDAs model parsers
–Most programming languages have a deterministic PDA
–Efficient implementation

Intuition: PDA
An ε-NFA with the additional power to manipulate one stack
[Figure: a control unit (ε-NFA) attached to a stack holding X Y IF $, with X on top.]

Intuition: PDA
Think of an ε-NFA with the additional power that it can manipulate a stack
PDA moves are determined by:
–The current state (of its “ε-NFA”)
–The current input symbol (or ε)
–The current symbol on top of its stack

Intuition: PDA
[Figure: the control (ε-NFA) reads the input “if (oops) then stat:= blah else abort” at the current position while the stack holds X Y IF $, with X on top.]

Intuition: PDA
Moves:
–Change state
–Replace the top symbol by 0…n symbols
  0 symbols = “pop” (“reduce”)
  more than 0 symbols = sequence of “pushes” (“shift”)
Nondeterministic choice of next move

PDA Formalism
PDA = (Q, Σ, Γ, δ, q0, $, F):
–Q: finite set of states
–Σ: input symbols alphabet
–Γ: stack symbols alphabet
–δ: transition function
–q0: start state
–$: start symbol
–F: set of final states

The Transition Function
δ(q, a, X) = { (p1, σ1), …, (pn, σn) }
–Input: a triplet
  A state q ∊ Q
  An input symbol a ∊ Σ or ε
  A stack symbol X ∊ Γ
–Output: a set of 0 … k actions of the form (p, σ)
  A state p ∊ Q
  σ, a sequence X1 ⋯ Xn ∊ Γ* of stack symbols

Actions of the PDA
Say (p, σ) ∊ δ(q, a, X)
–If the PDA is in state q, X is the top stack symbol, and a is at the front of the input
–Then it can
  Change the state to p
  Remove a from the front of the input (but a may be ε)
  Replace X on the top of the stack by σ

Example: Deterministic PDA
Design a PDA to accept { 0^n 1^n | n ≥ 1 }
The states:
–q = start state; we have not seen a 1 so far
–p = we have seen at least one 1 and no 0s since
–f = final state; accept

Example: Stack Symbols
$ = start symbol
–Also marks the bottom of the stack
–Indicates when we have counted the same number of 1’s as 0’s
X = “counter”
–Used to count the number of 0s we saw

Example: Transitions
δ(q, 0, $) = {(q, X$)}
δ(q, 0, X) = {(q, XX)}
–These two rules cause one X to be pushed onto the stack for each 0 read from the input
δ(q, 1, X) = {(p, ε)}
–When we see a 1, go to state p and pop one X
δ(p, 1, X) = {(p, ε)}
–Pop one X per 1
δ(p, ε, $) = {(f, $)}
–Accept at bottom

Actions of the Example PDA
[Eight animation steps; each shows the current state and the stack, top symbol first.]
state  stack
q      $
q      X$
q      XX$
q      XXX$
p      XX$
p      X$
p      $
f      $
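
The example PDA can be run mechanically by searching over configurations (state, input position, stack). The sketch below is a small simulator written for illustration: it accepts by final state once the input is consumed, encodes ε as the empty string and each stack symbol as one character, and may not terminate for PDAs whose ε-moves grow the stack without bound. The names `pda_accepts` and `PDA_DELTA` are assumptions.

```python
EPS = ""  # assumed encoding of epsilon, for both input and pushed strings


def pda_accepts(delta, start, bottom, finals, word):
    """Search all reachable configurations; accept in a final state at end of input."""
    first = (start, 0, (bottom,))  # the stack is a tuple, top symbol first
    todo, seen = [first], {first}
    while todo:
        state, pos, stack = todo.pop()
        if pos == len(word) and state in finals:
            return True
        if not stack:
            continue  # no top symbol, so no move is possible
        top, rest = stack[0], stack[1:]
        moves = []
        for p, push in delta.get((state, EPS, top), ()):  # moves that read nothing
            moves.append((p, pos, tuple(push) + rest))
        if pos < len(word):
            for p, push in delta.get((state, word[pos], top), ()):
                moves.append((p, pos + 1, tuple(push) + rest))
        for cfg in moves:
            if cfg not in seen:
                seen.add(cfg)
                todo.append(cfg)
    return False


# The transitions listed on the slide, with "" standing for epsilon.
PDA_DELTA = {
    ("q", "0", "$"): {("q", "X$")},
    ("q", "0", "X"): {("q", "XX")},
    ("q", "1", "X"): {("p", "")},
    ("p", "1", "X"): {("p", "")},
    ("p", EPS, "$"): {("f", "$")},
}
```

Calling `pda_accepts(PDA_DELTA, "q", "$", {"f"}, "000111")` visits the same sequence of states and stacks as the trace above, and the word `01` is accepted as well.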

Example: Non Deterministic PDA
A PDA that accepts palindromes
–L = { pp’ ∊ Σ* | p’ = reverse(p) }
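
The nondeterminism is needed because the automaton cannot know where p ends and p’ begins; it has to guess the middle. The sketch below models that guess by trying every split point in turn, pushing the first part onto a stack and popping while reading the second part. It illustrates the idea rather than a formal PDA, and the function name is an assumption.

```python
def accepts_even_palindrome(word):
    """Accept words of the form p + reverse(p) by guessing the middle."""
    for middle in range(len(word) + 1):  # each value is one nondeterministic guess
        stack = []
        for letter in word[:middle]:  # phase 1: push p
            stack.append(letter)
        matched = True
        for letter in word[middle:]:  # phase 2: pop and compare against p'
            if not stack or stack.pop() != letter:
                matched = False
                break
        if matched and not stack:  # some guess succeeded: accept
            return True
    return False
```

The word is accepted if at least one guess works, which is exactly the acceptance condition of a nondeterministic automaton. Note that the empty word is in L (take p = ε), while odd-length palindromes such as `aba` are not.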
