1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.

Slides:



Advertisements
Similar presentations
Finite-State Machines with No Output Ying Lu
Advertisements

4b Lexical analysis Finite Automata
CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
C O N T E X T - F R E E LANGUAGES ( use a grammar to describe a language) 1.
Grammars, constituency and order A grammar describes the legal strings of a language in terms of constituency and order. For example, a grammar for a fragment.
6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)
1 Foundations of Software Design Lecture 24: Compilers, Lexers, and Parsers; Intro to Graphs Marti Hearst Fall 2002.
Context-Free Grammars Lecture 7
Prof. Bodik CS 164 Lecture 81 Grammars and ambiguity CS164 3:30-5:00 TT 10 Evans.
January 14, 2015CS21 Lecture 51 CS21 Decidability and Tractability Lecture 5 January 14, 2015.
1 Foundations of Software Design Lecture 22: Regular Expressions and Finite Automata Marti Hearst Fall 2002.
Specifying Languages CS 480/680 – Comparative Languages.
Lecture 9UofH - COSC Dr. Verma 1 COSC 3340: Introduction to Theory of Computation University of Houston Dr. Verma Lecture 9.
COP4020 Programming Languages
Context-Free Grammar CSCI-GA.2590 – Lecture 3 Ralph Grishman NYU.
(2.1) Grammars  Definitions  Grammars  Backus-Naur Form  Derivation – terminology – trees  Grammars and ambiguity  Simple example  Grammar hierarchies.
EECS 6083 Intro to Parsing Context Free Grammars
1 Syntax and Semantics The Purpose of Syntax Problem of Describing Syntax Formal Methods of Describing Syntax Derivations and Parse Trees Sebesta Chapter.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
Finite-State Machines with No Output Longin Jan Latecki Temple University Based on Slides by Elsa L Gunter, NJIT, and by Costas Busch Costas Busch.
Finite-State Machines with No Output
Context Free Grammars CIS 361. Introduction Finite Automata accept all regular languages and only regular languages Many simple languages are non regular:
Grammars CPSC 5135.
PART I: overview material
Lecture # 9 Chap 4: Ambiguous Grammar. 2 Chomsky Hierarchy: Language Classification A grammar G is said to be – Regular if it is right linear where each.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:
1 Syntax In Text: Chapter 3. 2 Chapter 3: Syntax and Semantics Outline Syntax: Recognizer vs. generator BNF EBNF.
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
Context Free Grammars CFGs –Add recursion to regular expressions Nested constructions –Notation expression  identifier | number | - expression | ( expression.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.
A Programming Languages Syntax Analysis (1)
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Syntax Analysis - Parsing Compiler Design Lecture (01/28/98) Computer Science Rensselaer Polytechnic.
Grammars Hopcroft, Motawi, Ullman, Chap 5. Grammars Describes underlying rules (syntax) of programming languages Compilers (parsers) are based on such.
11 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 7 School of Innovation, Design and Engineering Mälardalen University 2012.
Syntax Analysis – Part I EECS 483 – Lecture 4 University of Michigan Monday, September 17, 2006.
Syntax Analyzer (Parser)
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
1 Introduction to Parsing. 2 Outline l Regular languages revisited l Parser overview Context-free grammars (CFG ’ s) l Derivations.
CSCI 4325 / 6339 Theory of Computation Zhixiang Chen Department of Computer Science University of Texas-Pan American.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
Spring 16 CSCI 4430, A Milanova 1 Announcements HW1 will be out this evening Due Monday, 2/8 Submit in HW Server AND at start of class on 2/8 A review.
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
Syntax Analysis By Noor Dhia Syntax analysis:- Syntax analysis or parsing is the most important phase of a compiler. The syntax analyzer considers.
Chapter 3 – Describing Syntax CSCE 343. Syntax vs. Semantics Syntax: The form or structure of the expressions, statements, and program units. Semantics:
1 Context-Free Languages & Grammars (CFLs & CFGs) Reading: Chapter 5.
Department of Software & Media Technology
CS 3304 Comparative Languages
Chapter 3 – Describing Syntax
Introduction to Parsing
Context-Free Grammars: an overview
CS 404 Introduction to Compiler Design
CS510 Compiler Lecture 4.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
CSE 3302 Programming Languages
Department of Software & Media Technology
Lexical and Syntax Analysis
Some slides by Elsa L Gunter, NJIT, and by Costas Busch
Lecture 7: Introduction to Parsing (Syntax Analysis)
CHAPTER 2 Context-Free Languages
CS 3304 Comparative Languages
4b Lexical analysis Finite Automata
COMPILER CONSTRUCTION
Presentation transcript:

1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002

2 Adapted from lecture by Ras Bodik, Outline Finite State Automata Why regular expressions are not enough Context Free Grammars Next Time: Relationship to Programming Languages and Compilers

3 Adapted from Jurafsky & Martin 2000 Three Equivalent Representations Finite automata Regular expressions Regular languages Each can describe the others Theorem: For every regular expression, there is a deterministic finite-state automaton that defines the same language, and vice versa.

4 Adapted from lecture by Ras Bodik, Finite Automata A FA is similar to a compiler in that: –A compiler recognizes legal programs in some (source) language. –A finite-state machine recognizes legal strings in some language. Example: Pascal Identifiers –sequences of one or more letters or digits, starting with a letter: letter letter | digit S A

5 Adapted from lecture by Ras Bodik, Finite-Automata State Graphs The start state An accepting state A transition a A state

6 Adapted from lecture by Ras Bodik, Finite Automata Transition s 1  a s 2 Is read In state s 1 on input “a” go to state s 2 If end of input –If in accepting state => accept –Otherwise => reject If no transition possible (got stuck) => reject FSA = Finite State Automata

7 Adapted from lecture by Ras Bodik, Language defined by FSA The language defined by a FSA is the set of strings accepted by the FSA. –in the language of the FSM shown below: x, tmp2, XyZzy, position27. –not in the language of the FSM shown below: 123, a?, 13apples. letter letter | digit S A

8 Adapted from lecture by Ras Bodik, Example: Integer Literals FSA that accepts integer literals with an optional + or - sign: Note – two different edges from S to A \(+|-)?[0-9]+\ + digit S B A -

9 Example: FSA that accepts three letter English words that begin with p and end with d or t. Here I use the convenient notation of making the state name match the input that has to be on the edge leading to that state. p t a o u d i

10 Adapted from lecture by Ras Bodik, Practice Write an automaton that accepts Java identifiers –One or more letters, digits, or underscores, starting with a letter or an underscore. –Start with the regexp

11 Adapted from lecture by Ras Bodik, Formal Definition A finite automaton is a 5-tuple (, Q, , q, F) where: –An input alphabet  –A set of states Q –A start state q –A set of accepting states F  Q – is the state transition function: Q x   Q (i.e., encodes transitions state  input state)

12 Adapted from lecture by Ras Bodik, How to Implement an FSA A table-driven approach: table: –one row for each state in the machine, and –one column for each possible character. Table[j][k] –which state to go to from state j on character k, –an empty entry corresponds to the machine getting stuck.

13 Adapted from lecture by Ras Bodik, The table-driven program for a Deterministic FSA state = S // S is the start state repeat { k = next character from the input if (k == EOF) // the end of input if state is a final state then accept else reject state = T[state,k] if state = empty then reject // got stuck }

14 Adapted from lecture by Ras Bodik, Regular expressions are not enough. You can write an automaton that accepts the specific strings: –“a”, “(a)”, “((a))”, and “(((a)))” But you can’t write one for this (in the general case): –“a”, “(a)”, “((a))”, “(((a)))”, … “( k a) k ”

15 Adapted from lecture by Ras Bodik, Regular expressions are not enough. What programs are generated by? digit+ ( ( “+” | “-” | “*” | “/” ) digit+ )* [Note: the Perl-style replacement operators such as /1 are not part of the definition of regular expressions.] What important properties does this regular expression fail to express? –Regex’s are not good at showing Precedence Nesting Recursion

16 Context-Free Grammars

17 Adapted from lecture by Ras Bodik, Derivation 1.Begin with a string consisting of the start symbol “S” 2.Replace any non-terminal in the string by a the right-hand side of some production 3.Repeat (2) until there are no non-terminals in the string

18 Adapted from lecture by Ras Bodik, Example Define a language to recognize strings of balanced parentheses: The grammar: Recognizes these strings: () (()) (((()))) …

19 Adapted from lecture by Ras Bodik, Derivation Rules Is the same as

20 Adapted from lecture by Ras Bodik, Arithmetic Example Simple arithmetic expressions: Some valid strings in the language:

21 Adapted from lecture by Ras Bodik, Arithmetic Example Simple arithmetic expressions: Some valid strings in the language:

22 Adapted from lecture by Ras Bodik, Derivations and Parse Trees A derivation is a sequence of productions A derivation can be drawn as a tree –Start symbol is the tree’s root –For a production add children to node –Stop when you reach all non-terminals

23 Adapted from lecture by Ras Bodik, Right-most Derivation Example Input: id * id + id E E EE E+ id*

24 Adapted from lecture by Ras Bodik, Left-most and Right-most Derivations The example is a right- most derivation –At each step, replace the right-most non-terminal There is an equivalent notion of a left-most derivation

25 Adapted from lecture by Ras Bodik, A formal definition of CFGs A CFG consists of –A set of terminals T –A set of non-terminals N –A start symbol S (a non-terminal) –A set of productions:

26 Adapted from lecture by Ras Bodik, Derivations and Parse Trees Note that right-most and left-most derivations have the same parse tree The difference is the order in which branches are added

27 Adapted from lecture by Ras Bodik, Example E E EE E+ id* The program: x * y + z Input to parser: ID TIMES ID PLUS ID we’ll write tokens as follows: id * id + id Output of parser: the parse tree 

28 Adapted from lecture by Ras Bodik, Ambiguity Grammar String

29 Adapted from lecture by Ras Bodik, Ambiguity (Cont.) This string has two parse trees E E EE E * id+ E E EE E+ *

30 Adapted from lecture by Ras Bodik, Ambiguity (Cont.) A grammar is ambiguous if for some string (the following three conditions are equivalent) –it has more than one parse tree –if there is more than one right-most derivation –if there is more than one left-most derivation Ambiguity is BAD –Makes the meaning of some programs ill-defined

31 Adapted from lecture by Ras Bodik, Dealing with Ambiguity There are several ways to handle ambiguity Most direct method is to rewrite grammar unambiguously Enforces precedence of * over +

32 Adapted from lecture by Ras Bodik, Properties of CFGs Membership in a language is “yes” or “no” Form of the grammar is important –Different grammars can generate the same language Need an “implementation” of CFG’s, –i.e. the parser –we’ll create the parser using a parser generator available generators: yacc, javacc

33 CFGs vs. Regex’s CFGs are better at expressing recursive structure –They are often described using trees … –… which we know have a recursive structure Example: –if E then S1 else S2 –if E1 then if E2 then S1 else S2 else S3 –if E1 then (if E2 then S1 else S2) else S3

34 Regular Languages vs CFGs Every regex can be expressed as a CFG The converse is not true –BUT regex’s cover much of what you need When to use which? –According to Aho, Sethi, and Ullman ’86: –Regex’s are more concise and easier to understand –More efficient analyzers can be constructed from regex’s –It is often useful to separate the structure of a language into lexical and nonlexical parts Lexical processed with regex The structure processed with CFGs