
1 Finite State Machines
An automaton is a computation that determines whether a given string belongs to a specified language. A finite state machine (FSM) is an automaton that recognizes regular languages (the languages described by regular expressions).

2 Finite State Machines (cont’d)
In particular, finite automata can be used to describe the process of recognizing patterns in input strings, and so can be used to construct scanners.

3 Finite State Machines (cont’d)
The formal basis for lexical analysis is the finite state automaton (FSA):
– REs generate regular sets
– FSAs recognize regular sets
FSA, informal definition:
– A finite set of states
– Transitions between states
– An initial state (start)
– A set of final states (accepting states)

4 Finite Automata and Lexical Analysis
The tokens of a language are specified using regular expressions. A scanner is a big DFA, essentially the “aggregate” of the automata for the individual tokens.

5 Two Kinds of FSA
Non-deterministic finite automata (NFA)
– There may be multiple possible transitions, or some transitions that do not require an input (ε)
Deterministic finite automata (DFA)
– The transition from each state is uniquely determined by the current input character
– For each state, at most one edge labeled ‘a’ leaves the state
– No ε transitions

6 Implementing the Scanner
Three methods:
– Hand-coded approach: draw a DFSM, then implement it with a loop and a case statement
– Hybrid approach: define tokens using regular expressions, convert them to an NFSM, apply an algorithm to obtain a minimal DFSM, then hand-code the resulting DFSM
– Automated approach: use regular expressions as input to a lexical scanner generator (e.g., Lex)

7 Hand-coding
Branch depending on the first character:
– If digit, scan a numeric literal
– If letter, scan an identifier or keyword
– If operator, check the next character (++, etc.)
Return the token found.
Write aggressively efficient code: gotos, global variables.
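A minimal sketch of the hand-coded approach from slides 6 and 7, written in plain Python rather than with gotos and global variables. The token names (NUM, ID, KEYWORD, OP), the keyword set, and the operator set {+, ++, =, ==} are hypothetical choices made only for illustration; the point is the structure: one loop, a branch on the first character, and a one-character peek for two-character operators.

```python
KEYWORDS = {"if", "while", "int"}  # hypothetical keyword set


def scan(text: str) -> list[tuple[str, str]]:
    """Hand-coded scanner: branch on the first character of each token."""
    tokens: list[tuple[str, str]] = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():                       # skip white space between tokens
            i += 1
        elif c.isdigit():                     # digit: scan a numeric literal
            j = i
            while j < n and text[j].isdigit():
                j += 1
            tokens.append(("NUM", text[i:j]))
            i = j
        elif c.isalpha():                     # letter: identifier or keyword
            j = i
            while j < n and text[j].isalnum():
                j += 1
            word = text[i:j]
            tokens.append(("KEYWORD" if word in KEYWORDS else "ID", word))
            i = j
        elif c in "+=":                       # operator: check the next character
            if i + 1 < n and text[i + 1] == c:    # "++" or "=="
                tokens.append(("OP", c * 2))
                i += 2
            else:
                tokens.append(("OP", c))
                i += 1
        else:
            raise ValueError(f"unexpected character {c!r} at position {i}")
    return tokens


print(scan("while x1 == 10 x1++"))
```

Each branch corresponds to one bullet on slide 7; a generated scanner (the automated approach) would instead encode the same decisions in a transition table.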

8 NFAs & DFAs
Non-deterministic finite automata (NFAs) represent regular expressions easily, but are less precise: on a given input symbol there may be several possible moves. Deterministic finite automata (DFAs) require more complexity to represent regular expressions, but offer more precision: each input symbol determines at most one move.

9 Non-Deterministic Finite Automata
An NFA is a mathematical model that consists of:
– A set of states, S
– A set of input symbols, Σ (the input symbol alphabet)
– A transition function, move, that maps state-symbol pairs to sets of states: move(state, symbol) → set of states
– A state s0 that is distinguished as the start (or initial) state
– A set of states F, distinguished as accepting (or final) states
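The five components above map directly onto a small data structure. This is a sketch, not a standard library type: the class name `NFA`, the field names, and the choice to write ε as the one-character string "ε" are assumptions. The instance at the end encodes the (a|b)*abb example of slide 11.

```python
from dataclasses import dataclass

EPSILON = "ε"  # assumption: ε is written as the one-character string "ε"


@dataclass
class NFA:
    states: set[int]                              # S
    alphabet: set[str]                            # Σ
    transitions: dict[tuple[int, str], set[int]]  # move: (state, symbol) -> set of states
    start: int                                    # s0
    accepting: set[int]                           # F

    def __post_init__(self) -> None:
        # Check that the five components are consistent with each other.
        if self.start not in self.states:
            raise ValueError("s0 must be a member of S")
        if not self.accepting <= self.states:
            raise ValueError("F must be a subset of S")
        for (state, symbol), targets in self.transitions.items():
            if state not in self.states or not targets <= self.states:
                raise ValueError(f"move({state}, {symbol!r}) uses an unknown state")
            if symbol not in self.alphabet and symbol != EPSILON:
                raise ValueError(f"{symbol!r} is not in the alphabet")

    def move(self, state: int, symbol: str) -> set[int]:
        """move(state, symbol): states reachable in one step (possibly empty)."""
        return self.transitions.get((state, symbol), set())


# The (a|b)*abb example NFA from slide 11.
example = NFA(
    states={0, 1, 2, 3},
    alphabet={"a", "b"},
    transitions={(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}},
    start=0,
    accepting={3},
)
```

Because move returns a set, "no transition" is simply the empty set, which is how the undefined entries ("--") in the transition tables below can be read.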

10 Representing NFAs
Transition diagrams: numbered states (circles), arcs, final states, …
Transition tables: more suitable for representation within a computer.
We’ll see examples of both!

11 Example NFA
S = { 0, 1, 2, 3 }, s0 = 0, F = { 3 }, Σ = { a, b }
[Figure: transition diagram of this NFA]
What is the transition table?

state | a      | b
0     | {0, 1} | {0}
1     | --     | {2}
2     | --     | {3}

ε (null) moves are possible: an ε-edge from i to j switches state but does not use any input symbol.
What language is defined? (a|b)*abb

12 How Does An NFA Work?
[Figure: the NFA for (a|b)*abb]
Given an input string, we trace moves. If there is no more input and we are in a final state, ACCEPT.
EXAMPLE: Input: ababb
move(0, a) = 0, move(0, b) = 0, move(0, a) = 1, move(1, b) = 2, move(2, b) = 3 → ACCEPT!
-OR-
move(0, a) = 1, move(1, b) = 2, move(2, a) = ? (undefined) → REJECT!
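Tracing one path at a time, as above, can give opposite answers for the same input (the second trace rejects ababb although the first accepts it). A common remedy, sketched here for an NFA without ε-moves, is to track the set of all states the NFA could currently be in; the string is accepted if that set contains a final state once the input is exhausted. The table is the slide 11 example; the names `MOVE` and `nfa_accepts` are illustrative.

```python
# Transition table of the (a|b)*abb NFA; missing entries mean "no move".
MOVE: dict[tuple[int, str], set[int]] = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, FINAL = 0, {3}


def nfa_accepts(text: str) -> bool:
    """Simulate the NFA by following every possible path in parallel."""
    current = {START}
    for ch in text:
        # Union of move(s, ch) over every state s we might currently be in.
        current = set().union(*(MOVE.get((s, ch), set()) for s in current))
        if not current:          # every path has died: reject early
            return False
    return bool(current & FINAL)


print(nfa_accepts("ababb"), nfa_accepts("abab"))
```

On ababb the set of possible states evolves as {0}, {0, 1}, {0, 2}, {0, 1}, {0, 2}, {0, 3}; the final set contains 3, so the string is accepted regardless of which single path one happened to trace.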

13 How Does An NFA Work? (cont’d)
An NFA can be represented diagrammatically by a labeled directed graph, called a transition graph, in which the nodes are the states and the labeled edges represent the transition function. This graph looks like a transition diagram, but the same character can label two or more transitions out of one state, and edges can be labeled by the special symbol ε as well as by input symbols.

14 How Does An NFA Work? (cont’d)
[Figure: the transition graph for an NFA that recognizes the language (a | b)*abb]
The set of states of the NFA is { 0, 1, 2, 3 } and the input symbol alphabet is { a, b }. State 0 is distinguished as the start state, and the accepting state 3 is indicated by a double circle.

15 How Does An NFA Work? (cont’d)
When describing an NFA, we use the transition graph representation. In a computer, the transition function of an NFA can be implemented in several different ways.

16 How Does An NFA Work? (cont’d)
The easiest implementation is a transition table, in which there is a row for each state and a column for each input symbol (and for ε, if necessary). The entry for row i and symbol a is the set of states (or, more likely in practice, a pointer to the set of states) that can be reached by a transition from state i on input a. The transition table for the example NFA was shown earlier (slide 11).

17 The transition table representation has the advantage that it provides fast access to the transitions of a given state on a given character; its disadvantage is that it can take up a lot of space when the input alphabet is large and most transitions are to the empty set.
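To make the space trade-off concrete, the sketch below stores the (a|b)*abb NFA twice: as a dense table with one column per character of an assumed 128-symbol (7-bit ASCII) alphabet, and as a sparse dictionary that keeps only the non-empty entries. Both answer the same queries; the dense one is indexed directly, while the sparse one needs a hash lookup and a default for missing entries.

```python
NUM_STATES, ALPHABET_SIZE = 4, 128  # assumption: 7-bit ASCII input alphabet

# Dense table: one row per state, one column per character code.
dense: list[list[frozenset[int]]] = [
    [frozenset()] * ALPHABET_SIZE for _ in range(NUM_STATES)
]
dense[0][ord("a")] = frozenset({0, 1})
dense[0][ord("b")] = frozenset({0})
dense[1][ord("b")] = frozenset({2})
dense[2][ord("b")] = frozenset({3})

# Sparse table: only the non-empty entries are stored.
sparse: dict[tuple[int, str], frozenset[int]] = {
    (0, "a"): frozenset({0, 1}),
    (0, "b"): frozenset({0}),
    (1, "b"): frozenset({2}),
    (2, "b"): frozenset({3}),
}


def move_dense(state: int, ch: str) -> frozenset[int]:
    return dense[state][ord(ch)]                  # direct indexing (ASCII only)


def move_sparse(state: int, ch: str) -> frozenset[int]:
    return sparse.get((state, ch), frozenset())  # missing entry = empty set


cells = sum(len(row) for row in dense)
non_empty = sum(1 for row in dense for entry in row if entry)
print(f"dense cells: {cells}, non-empty: {non_empty}, sparse entries: {len(sparse)}")
# dense cells: 512, non-empty: 4, sparse entries: 4
```

Only 4 of the 512 dense cells hold anything, which is exactly the waste the slide describes when the alphabet is large and most transitions go to the empty set.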

18 Handling Undefined Transitions
[Figure: the (a|b)*abb NFA with an extra state 4 that loops on a, b and receives the previously undefined transitions]
We can handle undefined transitions by defining one more state, a “dead” state, and redirecting every previously undefined transition to this dead state.
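The dead-state construction can be sketched as a function that fills in every missing (state, symbol) entry. The function name `add_dead_state` is hypothetical; the dead state is given no accepting status, so any path that enters it ends in rejection.

```python
def add_dead_state(
    states: set[int],
    alphabet: set[str],
    move: dict[tuple[int, str], set[int]],
    dead: int,
) -> dict[tuple[int, str], set[int]]:
    """Return a copy of `move` in which every undefined (state, symbol) pair,
    including every pair for the dead state itself, leads to `dead`."""
    complete = dict(move)
    for state in states | {dead}:
        for symbol in alphabet:
            complete.setdefault((state, symbol), {dead})
    return complete


# (a|b)*abb NFA; state 4 is the added dead state, as on the slide.
MOVE = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
full = add_dead_state({0, 1, 2, 3}, {"a", "b"}, MOVE, dead=4)
print(full[(1, "a")], full[(4, "b")])
```

After completion, every one of the 5 × 2 (state, symbol) pairs has an entry, so the scanner never has to test for a missing transition.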

19 NFA, Regular Expressions & Compilation
Problems with NFAs for regular expressions:
1. Valid input might not be accepted (if the wrong path is followed)
2. An NFA may behave differently on the same input (depending on which transitions are chosen)
Relationship of NFAs to compilation:
1. A regular expression is “recognized” by an NFA
2. A regular expression is the “pattern” for a “token”
3. Tokens are the building blocks for lexical analysis
4. A lexical analyzer can be described by a collection of NFAs, one NFA for each token of the language

20 Second NFA Example
Given the regular expression (a (b*c)) | (a (b | c+)?), find a transition diagram NFA that recognizes it.
[Figure: NFA with ε-transitions from the start state into the two alternatives]
The string abbc can be accepted.

21 Alternative Solution Strategy
Build a separate NFA for each alternative: one for a (b*c) and one for a (b | c+)?.
[Figure: the two NFAs]

22 Using Null Transitions to “OR” NFAs
[Figure: a new start state 0 with ε-transitions to the start states of the two NFAs]
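The ε-"OR" construction can be sketched as follows. The state numbering is hypothetical and need not match the slide's figure: states 1 to 3 recognize a b*c, states 4 to 7 recognize a (b | c+)?, and a new start state 0 has ε-transitions into both. Simulation then needs the ε-closure (all states reachable through ε-moves alone), computed here with a worklist.

```python
EPS = "ε"

# Hypothetical NFA for (a (b*c)) | (a (b | c+)?).
MOVE: dict[tuple[int, str], set[int]] = {
    (0, EPS): {1, 4},                                             # the "OR"
    (1, "a"): {2}, (2, "b"): {2}, (2, "c"): {3},                  # a b* c
    (4, "a"): {5}, (5, "b"): {6}, (5, "c"): {7}, (7, "c"): {7},   # a (b | c+)?
}
START = 0
FINAL = {3, 5, 6, 7}  # 5 is final because (b | c+)? is optional


def eps_closure(states: set[int]) -> set[int]:
    """All states reachable from `states` using only ε-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in MOVE.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure


def accepts(text: str) -> bool:
    current = eps_closure({START})
    for ch in text:
        step = set().union(*(MOVE.get((s, ch), set()) for s in current))
        current = eps_closure(step)
    return bool(current & FINAL)


print(accepts("abbc"), accepts("abb"))
```

For abbc only the first alternative survives (0, 1, 2, 2, 2, 3); for abb neither does, since the first branch still needs a c and the second allows only one b.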

23 Other Concepts
Not all paths may result in acceptance.
[Figure: the NFA for (a|b)*abb]
aabb is accepted along the path: 0 -a-> 0 -a-> 1 -b-> 2 -b-> 3
BUT… it is not accepted along the valid path: 0 -a-> 0 -a-> 0 -b-> 0 -b-> 0
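The point of this slide can be checked by enumerating every complete path the NFA can follow on a string. This backtracking sketch uses the (a|b)*abb table; the function name `all_paths` is illustrative. Enumerating paths one by one can take time exponential in the input length, which is why the set-of-states simulation is preferred in practice.

```python
MOVE = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
FINAL = {3}


def all_paths(text: str, state: int = 0) -> list[list[int]]:
    """Every complete path (sequence of states) the NFA can follow on `text`.
    Paths that hit an undefined move die and are not returned."""
    if not text:
        return [[state]]
    paths = []
    for nxt in MOVE.get((state, text[0]), set()):
        for rest in all_paths(text[1:], nxt):
            paths.append([state] + rest)
    return sorted(paths)


for path in all_paths("aabb"):
    verdict = "accepted" if path[-1] in FINAL else "not accepted"
    print(" -> ".join(map(str, path)), verdict)
# 0 -> 0 -> 0 -> 0 -> 0 not accepted
# 0 -> 0 -> 1 -> 2 -> 3 accepted
```

The NFA accepts aabb because at least one complete path ends in a final state; the rejecting path does not change that verdict.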

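24 NFA Example
Recognizes: aa* | b | ab
[Figure: transition graph of the NFA]
Transition table (accepting states: 4 and 5):

state | ε       | a     | b
0     | 1, 2, 3 | --    | --
1     | --      | 4     | Error
2     | --      | Error | 5
3     | --      | 2     | Error
4     | --      | 4     | Error
5     | --      | Error | Error

We can represent an FA with either a graph or a transition table.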

25 Deterministic Finite Automata
A DFA is an NFA with a few restrictions:
– No epsilon (ε) transitions
– For every state s and every symbol x in Σ, there is at most one transition (s, x) out of s
ADVANTAGES:
– Easy to implement a DFA with an algorithm!
– Deterministic behavior
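The two restrictions can be checked mechanically on a transition table written in NFA style (each entry a set of target states). The function name `is_dfa` is illustrative; the DFA table used as the positive example is the standard four-state DFA for (a|b)*abb, assumed here for illustration.

```python
EPS = "ε"


def is_dfa(move: dict[tuple[int, str], set[int]]) -> bool:
    """True if the table has no ε-transitions and at most one target
    per (state, symbol) pair, i.e. it satisfies the DFA restrictions."""
    return all(symbol != EPS and len(targets) <= 1
               for (_, symbol), targets in move.items())


nfa_move = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
dfa_move = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {2},
            (2, "a"): {1}, (2, "b"): {3}, (3, "a"): {1}, (3, "b"): {0}}
print(is_dfa(nfa_move), is_dfa(dfa_move))  # False True
```

The NFA fails only because move(0, a) = {0, 1} has two targets; that single choice point is what the DFA removes.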

26 Simulating a DFA
INPUT: An input string x terminated by an end-of-file character eof (or any other delimiter), and a DFA d with start state s0 and a set of accepting states F.
OUTPUT: The answer “yes” if d accepts x, “no” otherwise.

27 Simulating a DFA
METHOD: Apply the algorithm to the input string x. The function move(s, c) gives the state to which there is a transition from state s on input character c. The function nextchar returns the next character of the input string x.

28 Simulating a DFA (cont’d)
s = s0;
c = nextchar;
while c ≠ eof do
    s = move(s, c);
    c = nextchar;
end;
if s is in F then return “yes”
else return “no”

29 The following transition graph is of a DFA accepting the same language, (a|b)*abb, as the NFA. With this DFA and the input string ababb, the algorithm follows the sequence of states 0, 1, 2, 1, 2, 3 and returns “yes”.
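A runnable version of the slide 28 algorithm, applied to the example above. The slides give the DFA only as a figure, so the transition table is an assumption: it is the standard four-state DFA for (a|b)*abb, chosen because it reproduces the state sequence 0, 1, 2, 1, 2, 3. Iterating over the string replaces nextchar and the eof test, and an undefined move is treated as rejection.

```python
# Assumed DFA for (a|b)*abb (consistent with the trace on slide 29).
MOVE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
S0, F = 0, {3}


def simulate(x: str) -> tuple[str, list[int]]:
    """Run the DFA on x; return "yes"/"no" and the sequence of states visited."""
    s, trace = S0, [S0]
    for c in x:                      # plays the role of c = nextchar until eof
        s = MOVE.get((s, c))         # s = move(s, c)
        if s is None:                # no transition for this character: reject
            return "no", trace
        trace.append(s)
    return ("yes" if s in F else "no"), trace


print(simulate("ababb"))  # ('yes', [0, 1, 2, 1, 2, 3])
```

Unlike the NFA simulation, each step looks up exactly one next state, so the running time is linear in the length of x with no sets involved.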

30 [Figure: the DFA accepting (a|b)*abb, traced on the string ababb, shown alongside the original NFA for comparison]

31 DFA Example
Recognizes: aa* | b | ab
[Figure: transition graph of the DFA]
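The hybrid approach of slide 6 converts an NFA into a DFA with an algorithm the slides do not spell out. The sketch below uses the standard subset construction (named here as a swapped-in technique) to derive a DFA for aa* | b | ab from the NFA of slide 24. Assumptions: the NFA table is read from slide 24, its accepting states are 4 and 5, and "Error" entries are treated as missing transitions.

```python
EPS = "ε"

# NFA for aa* | b | ab from slide 24 (accepting states 4 and 5 assumed).
NFA_MOVE: dict[tuple[int, str], set[int]] = {
    (0, EPS): {1, 2, 3},
    (1, "a"): {4}, (4, "a"): {4},   # aa*
    (2, "b"): {5},                  # b
    (3, "a"): {2},                  # ab (reuses state 2 for the final b)
}
NFA_START, NFA_FINAL, ALPHABET = 0, {4, 5}, ("a", "b")


def eps_closure(states: set[int]) -> frozenset[int]:
    """All NFA states reachable from `states` using only ε-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in NFA_MOVE.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return frozenset(closure)


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    dfa: dict[frozenset[int], dict[str, frozenset[int]]] = {}
    todo = [start]
    while todo:
        S = todo.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for sym in ALPHABET:
            moved = set().union(*(NFA_MOVE.get((s, sym), set()) for s in S))
            T = eps_closure(moved)
            if T:                        # empty set = no transition (dead)
                dfa[S][sym] = T
                todo.append(T)
    accepting = {S for S in dfa if S & NFA_FINAL}
    return start, dfa, accepting


DFA_START, DFA, DFA_ACCEPTING = subset_construction()


def dfa_accepts(text: str) -> bool:
    state = DFA_START
    for ch in text:
        state = DFA[state].get(ch)
        if state is None:                # undefined transition: reject
            return False
    return state in DFA_ACCEPTING
```

For this NFA the construction yields four DFA states, {0, 1, 2, 3}, {2, 4}, {4}, and {5}, of which the last three are accepting because each contains NFA state 4 or 5.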

32 Finite State Machines (cont’d)
The pattern for identifiers, as commonly defined in programming languages, is given by the following regular definition:
ID = Letter (Letter | Digit)*
This represents a string that begins with a letter and continues with any sequence of letters and/or digits. The process of recognizing such a string can be described by a diagram.

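33 Finite State Machines (cont’d)
[Figure: transition diagram for ID: start state 1 goes to state 2 on a Letter; state 2 loops back to itself on a Letter or a Digit]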

34 Finite State Machines (cont’d)
In the diagram, the circles numbered 1 and 2 represent states, which are locations in the process of recognition that record how much of the pattern has already been seen. The arrowed lines represent transitions that record a change from one state to another upon a match of the character or characters by which they are labeled.

35 Finite State Machines (cont’d)
In the sample diagram, state 1 is the start state, the state at which the recognition process begins. By convention, the start state is indicated by drawing an unlabeled arrowed line to it. In state 2, any number of letters and/or digits may be seen, and a match to any of these returns us to state 2.

36 Finite State Machines (cont’d)
The states that represent the end of the recognition process, in which we can declare success, are called accepting states and are indicated by drawing a double-line border around the state in the diagram. There may be more than one of these. In the sample diagram, state 2 is an accepting state, indicating that after a letter is seen, any subsequent sequence of letters and digits represents a legal identifier.

37 Example
The process of recognizing an actual character string as an identifier can now be indicated by listing the sequence of states and transitions in the diagram that are used in the recognition process. For example, the process of recognizing Xtemp as an identifier can be indicated as follows:
1 -X-> 2 -t-> 2 -e-> 2 -m-> 2 -p-> 2
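The two-state identifier diagram can be run directly, printing the same kind of state/transition listing as above. This is a sketch under the assumption that Letter means an ASCII letter and Digit an ASCII digit; the function name `recognize_id` is illustrative.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: Letter = ASCII letter
DIGITS = set(string.digits)


def recognize_id(text: str) -> tuple[bool, list[str]]:
    """Run the two-state identifier diagram on text.
    Returns (accepted?, trace of states and transitions taken)."""
    state, trace = 1, ["1"]
    for ch in text:
        if state == 1 and ch in LETTERS:
            state = 2                     # start state: only a letter is allowed
        elif state == 2 and (ch in LETTERS or ch in DIGITS):
            state = 2                     # state 2 loops on letters and digits
        else:
            return False, trace           # no transition drawn: an error
        trace.append(f"-{ch}-> {state}")
    return state == 2, trace              # state 2 is the accepting state


accepted, trace = recognize_id("Xtemp")
print(accepted, " ".join(trace))  # True 1 -X-> 2 -t-> 2 -e-> 2 -m-> 2 -p-> 2
```

The `else` branch is where the undrawn transitions of the next slides come into play: the diagram simply has no edge for that character.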

38 Where Are The Missing Transitions?
The answer is that they represent errors: in recognizing an identifier, we cannot accept any characters other than letters from the start state, and letters or digits after that. The convention is that these error transitions are not drawn in the diagram but are simply assumed to always exist. If we were to draw them, the diagram for an identifier would look as shown on the next slide.

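39 Where Are The Missing Transitions? (cont’d)
[Figure: start goes to in_id on a letter; in_id loops on letter and digit; start and in_id go to an error state on other; error loops on any character]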

40 Where Are The Missing Transitions? (cont’d)
In the figure, we have labeled the new state error (since it represents an erroneous occurrence), and we have labeled the error transitions other. By convention, other represents any character not appearing in any other transition from the state where it originates. Thus the definition of other coming from the start state is
other = ~letter (any character that is not a letter)

41 Where Are The Missing Transitions? (cont’d)
The definition of other coming from the state in_id is
other = ~(letter | digit) (any character that is neither a letter nor a digit)
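With the error state drawn in, the transition function becomes total: every state has a move on every character, and the other transitions are just the fall-through case of each branch. A sketch, assuming ASCII letters and digits; the state names follow the figure, while `step` and `is_identifier` are illustrative names.

```python
import string

LETTER = set(string.ascii_letters)   # assumption: ASCII letters
DIGIT = set(string.digits)


def step(state: str, ch: str) -> str:
    """Total transition function for the identifier diagram."""
    if state == "start":
        return "in_id" if ch in LETTER else "error"                  # other = ~letter
    if state == "in_id":
        return "in_id" if ch in LETTER or ch in DIGIT else "error"   # other = ~(letter | digit)
    return "error"                        # error loops on any character


def is_identifier(text: str) -> bool:
    state = "start"
    for ch in text:
        state = step(state, ch)
    return state == "in_id"


print(is_identifier("Xtemp"), is_identifier("1abc"))  # True False
```

Because error is absorbing, once a bad character is seen no later input can lead back to acceptance.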

42 Where Are The Missing Transitions? (cont’d)
[Figure: start goes to in_id on a letter; in_id loops on letter and digit; in_id goes to finish on [other], and finish returns ID]

43 Structure of a Scanner Automaton

44 How much should we match?
In general, find the longest match possible. E.g., on input 123.45, match this as num_const(123.45) rather than num_const(123), “.”, num_const(45).
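Longest match can be sketched by running a token DFA as far as it will go while remembering the last position at which it was in an accepting state, then backing up to that position. The number syntax digit+ ('.' digit+)? and the separate dot token are hypothetical choices for illustration; they are not defined on the slides.

```python
import string
from typing import Optional

DIGITS = set(string.digits)
NUM_ACCEPTING = {1, 3}


def num_step(state: int, ch: str) -> Optional[int]:
    """Hypothetical DFA for num_const = digit+ ('.' digit+)?
    States: 0 start, 1 integer part, 2 just saw '.', 3 fraction part."""
    if ch in DIGITS:
        return {0: 1, 1: 1, 2: 3, 3: 3}[state]
    if ch == "." and state == 1:
        return 2
    return None                          # no transition: stop scanning


def longest_num(text: str, pos: int) -> int:
    """End of the longest num_const starting at pos (pos itself if none)."""
    state, i, last_accept = 0, pos, pos
    while i < len(text):
        nxt = num_step(state, text[i])
        if nxt is None:
            break
        state, i = nxt, i + 1
        if state in NUM_ACCEPTING:
            last_accept = i              # remember the longest match so far
    return last_accept


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(text):
        end = longest_num(text, pos)
        if end > pos:
            tokens.append(("num_const", text[pos:end]))
            pos = end
        elif text[pos] == ".":
            tokens.append(("dot", "."))
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r}")
    return tokens


print(tokenize("123.45"))  # [('num_const', '123.45')]
```

On 123. the DFA reaches state 2 after the dot, which is not accepting, so the scanner backs up to the last accepting position and returns num_const(123) followed by a separate dot token.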