Department of Software & Media Technology

Scanning, or Lexical Analysis

Regular Grammars
- Non-terminals (arbitrary names)
- Terminals (characters)
- Productions limited to the following two forms:
    Non-terminal ::= terminal
    Non-terminal ::= terminal Non-terminal
- A character class (e.g. digit) is treated as a single terminal.
- Regular grammars cannot count: size limits on identifiers and literals cannot be expressed conveniently (a bound of n characters needs roughly n separate non-terminals), and proper nesting (e.g. balanced parentheses) cannot be expressed at all.

8 January 2004 Department of Software & Media Technology

Regular Grammars
Grammar for real literals with no exponent:
    digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    REALVAL ::= digit REALVAL1
    REALVAL1 ::= digit REALVAL1        (arbitrary size)
    REALVAL1 ::= . INTEGERVAL
    INTEGERVAL ::= digit INTEGERVAL    (arbitrary size)
    INTEGERVAL ::= digit
What is the start symbol? REALVAL: no other production refers to it.
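As a sketch of how this grammar can be read directly as a recognizer, the Python below turns each non-terminal into one function (a choice made for this example, not something the slides prescribe). The next input character, or the end of the input, decides which production applies.

```python
DIGITS = set("0123456789")

def realval(s: str, i: int = 0) -> bool:
    # REALVAL ::= digit REALVAL1
    return i < len(s) and s[i] in DIGITS and realval1(s, i + 1)

def realval1(s: str, i: int) -> bool:
    # REALVAL1 ::= digit REALVAL1
    if i < len(s) and s[i] in DIGITS:
        return realval1(s, i + 1)
    # REALVAL1 ::= . INTEGERVAL
    return i < len(s) and s[i] == "." and integerval(s, i + 1)

def integerval(s: str, i: int) -> bool:
    # Both INTEGERVAL productions start with a digit.
    if i >= len(s) or s[i] not in DIGITS:
        return False
    if i + 1 == len(s):
        return True                  # INTEGERVAL ::= digit (input exhausted)
    return integerval(s, i + 1)      # INTEGERVAL ::= digit INTEGERVAL

print(realval("3.14"))   # True
print(realval("42"))     # False: no fractional part
print(realval("1."))     # False: INTEGERVAL needs at least one digit
```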

Regular Expressions
A regular expression (RE) is defined over an alphabet (the terminal symbols) using three operations:
- Alternation: RE1 | RE2
- Concatenation: RE1 RE2
- Repetition: RE* (zero or more occurrences of RE)
REs describe exactly the languages generated by regular grammars, but they are more convenient for some applications.
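For comparison, here are the real literals from the previous grammar written as an RE that uses only the three basic operations, checked with Python's `re` module. The names `DIGIT`, `REAL` and `is_real` are illustrative. In `re`'s extended syntax the same pattern can be abbreviated to `[0-9]+\.[0-9]+`.

```python
import re

# digit as an alternation of the ten characters
DIGIT = "(0|1|2|3|4|5|6|7|8|9)"
# concatenation and repetition: digit digit* . digit digit*
REAL = DIGIT + DIGIT + "*" + r"\." + DIGIT + DIGIT + "*"

real_re = re.compile(REAL)

def is_real(text: str) -> bool:
    # fullmatch: the whole string must be a real literal, not just a prefix
    return real_re.fullmatch(text) is not None

print(is_real("3.14"))   # True
print(is_real("3."))     # False
```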

Finite State Machines or Finite Automata (FSM or FA)
- A language defined by a grammar is a (possibly infinite) set of strings.
- An automaton is a computation that determines whether a given string belongs to a specified language.
- A finite state machine (FSM) is an automaton that recognizes regular languages (the languages described by regular expressions).
- Simplest automaton: its only memory is a single number (the current state).

Specifying a Finite Automaton (FA)
- A set of labeled states, and directed arcs between states, each labeled with a character.
- One or more states may be terminal (accepting).
- One distinguished state is the start state.
- The automaton makes a transition from state S1 to state S2 if and only if there is an arc from S1 to S2 labeled with the next character in the input.
- A token is legal if the automaton stops in a terminal state after reading it.
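A minimal sketch of such a specification written as data, assuming a hypothetical automaton for the real literals of the earlier grammar (the state names `start`, `whole`, `dot` and `frac` are invented for this example). The simulation keeps only the current state, follows the arc labeled with each input character, and accepts if it stops in an accepting state.

```python
from typing import Dict, Set, Tuple

DIGITS = "0123456789"

# Hypothetical FA for real literals: digits '.' digits
START = "start"
ACCEPTING: Set[str] = {"frac"}
ARCS: Dict[Tuple[str, str], str] = {("whole", "."): "dot"}
for d in DIGITS:
    ARCS[("start", d)] = "whole"   # first digit of the whole part
    ARCS[("whole", d)] = "whole"   # more digits before the point
    ARCS[("dot", d)] = "frac"      # first digit after the point
    ARCS[("frac", d)] = "frac"     # more digits after the point

def accepts(text: str) -> bool:
    state = START                        # the machine's only memory
    for ch in text:
        nxt = ARCS.get((state, ch))      # arc labeled with the next character
        if nxt is None:
            return False                 # no such arc: the token is not legal
        state = nxt
    return state in ACCEPTING            # legal iff we stop in an accepting state
```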

FA from Grammar
- One state for each non-terminal, plus a final (accepting) state; the start state is the state of the start symbol.
- A rule of the form Nt1 ::= terminal generates a transition from state Nt1 to the final state, on an arc labeled by the terminal.
- A rule of the form Nt1 ::= terminal Nt2 generates a transition from state Nt1 to state Nt2, on an arc labeled by the terminal.
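The construction can be sketched for the real-literal grammar. Note that the two INTEGERVAL rules produce two arcs on digit (back to INTEGERVAL and to the final state), so this sketch runs the resulting automaton by tracking the set of states it could be in. The name `FINAL` and the tuple encoding of productions are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

FINAL = "FINAL"   # the extra accepting state (name chosen for this sketch)

# (Nt1, terminal, Nt2) encodes Nt1 ::= terminal Nt2; Nt2 = None encodes Nt1 ::= terminal
Production = Tuple[str, str, Optional[str]]
Arcs = Dict[Tuple[str, str], Set[str]]

REAL_GRAMMAR: List[Production] = [
    ("REALVAL", "digit", "REALVAL1"),
    ("REALVAL1", "digit", "REALVAL1"),
    ("REALVAL1", ".", "INTEGERVAL"),
    ("INTEGERVAL", "digit", "INTEGERVAL"),
    ("INTEGERVAL", "digit", None),
]

def fa_from_grammar(prods: List[Production]) -> Arcs:
    arcs: Arcs = defaultdict(set)
    for lhs, term, rhs in prods:
        # One arc per rule, labeled by the terminal; a rule with no
        # non-terminal on the right leads to the final state.
        arcs[(lhs, term)].add(rhs if rhs is not None else FINAL)
    return arcs

def symbol(ch: str) -> str:
    # The character class digit is treated as a single terminal.
    return "digit" if ch in "0123456789" else ch

def run(arcs: Arcs, start: str, text: str) -> bool:
    states = {start}                     # every state the FA could be in
    for ch in text:
        states = {t for s in states for t in arcs.get((s, symbol(ch)), set())}
    return FINAL in states

ARCS = fa_from_grammar(REAL_GRAMMAR)
print(run(ARCS, "REALVAL", "3.14"))   # True
print(run(ARCS, "REALVAL", "42"))     # False
```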

Graphic representation of FA
(Figure: automaton for an identifier, with arcs labeled letter, digit and underscore.)

FA from RE
- Each RE corresponds to a grammar, so every RE has a natural translation to an FSM.
- Alternation often leads to non-deterministic machines.

Deterministic Finite Automata (DFA)
- For every state S and every character C, there is at most one arc from S labeled with C.
- Easier to implement: no backtracking.
Conventions for DFA diagrams:
- Error transitions are not shown explicitly.
- Input symbols that lead to the same transition are grouped together (the group can even be given a name, such as digit).
- Still not displayed: stopping conditions and actions.
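The conventions can be mirrored in code: characters are first mapped to named classes, and any (state, class) pair missing from the table is an error transition. This sketch assumes an identifier is a letter followed by letters, digits or underscores, which is one reading of the identifier figure; the function and table names are hypothetical.

```python
from typing import Optional

def char_class(ch: str) -> str:
    # Group input symbols that lead to the same transition under one name.
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch in "0123456789":
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"

# Assumed identifier rule: letter (letter | digit | underscore)*
DELTA = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_id", "underscore"): "in_id",
}
ACCEPT = {"in_id"}

def step(state: str, ch: str) -> Optional[str]:
    # At most one arc per (state, class): a single lookup, no backtracking.
    return DELTA.get((state, char_class(ch)))   # None = unshown error transition

def is_identifier(text: str) -> bool:
    state: Optional[str] = "start"
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state in ACCEPT
```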

Non-Deterministic Finite Automata (NFA)
- A non-deterministic FA has at least one state with two arcs, to two distinct states, labeled with the same character.
- Example: from the start state, a digit can begin either an integer literal or a real literal.
- A direct implementation requires backtracking (or keeping track of every state the machine could be in).
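A minimal backtracking sketch of the integer-or-real example: on a digit the start state has two arcs, and the search tries each one, undoing the choice when it leads nowhere. The state names and the `INTEGER`/`REAL` token names are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical NFA: on a digit, start may go to the integer or the real branch.
NFA: Dict[Tuple[str, str], List[str]] = {
    ("start", "digit"): ["int", "real_whole"],
    ("int", "digit"): ["int"],
    ("real_whole", "digit"): ["real_whole"],
    ("real_whole", "."): ["real_dot"],
    ("real_dot", "digit"): ["real_frac"],
    ("real_frac", "digit"): ["real_frac"],
}
ACCEPTING = {"int": "INTEGER", "real_frac": "REAL"}

def symbol(ch: str) -> str:
    return "digit" if ch in "0123456789" else ch

def classify(text: str, state: str = "start", i: int = 0) -> Optional[str]:
    """Return the token kind if some sequence of choices accepts, else None."""
    if i == len(text):
        return ACCEPTING.get(state)
    for nxt in NFA.get((state, symbol(text[i])), []):
        result = classify(text, nxt, i + 1)
        if result is not None:
            return result          # this choice worked
    return None                    # every choice failed: backtrack

print(classify("42"))     # INTEGER
print(classify("3.14"))   # REAL (after the int branch fails at '.')
```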

Lookahead & Backtracking in NFA
(Figure: identifier automaton with states start, in_id and finish; arcs labeled letter, digit and [other]; reaching finish returns ID.)
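The bracketed [other] arc marks a character that is examined but not consumed: the scanner looks one character ahead and leaves it in the input for the next token. A hand-coded sketch with an explicit state variable, assuming ASCII letters and digits and an empty string to stand for end of input (the name `scan_id` is hypothetical):

```python
from typing import Optional, Tuple

def is_letter(ch: str) -> bool:
    return ch.isascii() and ch.isalpha()

def scan_id(src: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (lexeme, next position) for an ID starting at pos, else None."""
    state = "start"
    i = pos
    while True:
        ch = src[i] if i < len(src) else ""   # "" stands for end of input
        if state == "start":
            if is_letter(ch):
                state, i = "in_id", i + 1     # consume the letter
            else:
                return None                   # not an identifier
        else:  # state == "in_id"
            # the `ch and` guard matters: "" in "0123456789" is True
            if ch and (is_letter(ch) or ch in "0123456789"):
                i += 1                        # stay in in_id, consume
            else:
                # [other]: only peeked at, so it stays in the input
                return src[pos:i], i          # finish: return ID

print(scan_id("x1+y", 0))   # ('x1', 2): '+' is left for the next token
```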

Implementation of FA
(Figure: the same identifier automaton: states start, in_id and finish; arcs letter, digit and [other]; finish returns ID.)
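An alternative to hard-coding the states is a table-driven implementation: the automaton lives in a table and a small generic driver interprets it. In this sketch (table layout and names are assumptions) each entry gives the next state and whether the character is consumed, with False marking the bracketed [other] arc.

```python
from typing import Dict, Optional, Tuple

# (state, input class) -> (next state, consume the character?)
TABLE: Dict[Tuple[str, str], Tuple[str, bool]] = {
    ("start", "letter"): ("in_id", True),
    ("in_id", "letter"): ("in_id", True),
    ("in_id", "digit"): ("in_id", True),
    ("in_id", "other"): ("finish", False),   # [other]: look ahead only
}

def klass(ch: str) -> str:
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch != "" and ch in "0123456789":
        return "digit"
    return "other"                           # includes end of input ("")

def run_table(src: str, pos: int) -> Optional[Tuple[str, int]]:
    state, i = "start", pos
    while state != "finish":
        ch = src[i] if i < len(src) else ""
        entry = TABLE.get((state, klass(ch)))
        if entry is None:
            return None                      # error transition (not in the diagram)
        state, consume = entry
        if consume:
            i += 1
    return src[pos:i], i                     # accepting action: return ID
```

Changing the token definition then means editing `TABLE`; the driver loop stays the same.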

From RE to DFA & RE to NFA
(Figure: identifier automaton with states start, in_id and finish; arcs letter, digit and [other]; finish returns ID.)

NFA to DFA
- There is an algorithm (the subset construction) for converting a non-deterministic machine into a deterministic one.
- The result may have exponentially more states in the worst case.
- Intuitively: new states are needed to express uncertainty about the token (int or real?).
- Other algorithms exist for minimizing the number of states of an FSM, for showing that two FSMs are equivalent, etc.
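A sketch of the subset construction for an NFA without empty (epsilon) moves, like the ones defined above: each DFA state is the set of NFA states the NFA could be in. The example NFA is a hypothetical encoding of the int-or-real case, with digit treated as one input symbol.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

NFA = Dict[Tuple[str, str], Set[str]]
DState = FrozenSet[str]

def subset_construction(nfa: NFA, start: str, alphabet: List[str]
                        ) -> Tuple[DState, Dict[Tuple[DState, str], DState], Set[DState]]:
    start_set: DState = frozenset({start})
    dfa: Dict[Tuple[DState, str], DState] = {}
    seen: Set[DState] = {start_set}
    work = [start_set]
    while work:
        current = work.pop()
        for sym in alphabet:
            # All NFA states reachable on sym from any state in current.
            target = frozenset(t for s in current for t in nfa.get((s, sym), set()))
            if not target:
                continue                  # empty set = error, left implicit
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    return start_set, dfa, seen

# Hypothetical NFA for the int-or-real example
EXAMPLE: NFA = {
    ("start", "digit"): {"int", "real_whole"},
    ("int", "digit"): {"int"},
    ("real_whole", "digit"): {"real_whole"},
    ("real_whole", "."): {"real_dot"},
    ("real_dot", "digit"): {"real_frac"},
    ("real_frac", "digit"): {"real_frac"},
}
start, dfa, states = subset_construction(EXAMPLE, "start", ["digit", "."])
```

The result has four DFA states; the state {int, real_whole} is exactly the "int or real?" uncertainty, and it is accepting (as an integer) because it contains the NFA state int.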

Example DFA

Another view of the same DFA

Yet another view of the same DFA

State Minimization in DFA
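The slides do not say which minimization algorithm they use; this sketch uses Moore's partition refinement. Start with two blocks (accepting and non-accepting) and keep splitting a block whenever two of its states lead to different blocks on some input. Missing transitions count as an implicit error state, and the sketch assumes there is no explicit dead state. The example DFA is hypothetical: B and B2 behave identically.

```python
from typing import Dict, List, Optional, Set, Tuple

def minimize(states: Set[str], alphabet: List[str],
             delta: Dict[Tuple[str, str], str], accepting: Set[str]) -> List[Set[str]]:
    """Return the blocks of equivalent states."""
    partition = [b for b in (accepting & states, states - accepting) if b]
    while True:
        block_of = {s: i for i, b in enumerate(partition) for s in b}

        def signature(s: str) -> Tuple[Optional[int], ...]:
            # Block reached on each symbol; None stands for the error state.
            return tuple(block_of.get(delta.get((s, a))) for a in alphabet)

        refined: List[Set[str]] = []
        for block in partition:
            groups: Dict[Tuple[Optional[int], ...], Set[str]] = {}
            for s in block:
                groups.setdefault(signature(s), set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):   # no block was split: done
            return refined
        partition = refined

# Hypothetical DFA for digits '.' digits, with a redundant state B2
DELTA = {
    ("A", "digit"): "B", ("B", "digit"): "B2", ("B2", "digit"): "B",
    ("B", "."): "C", ("B2", "."): "C",
    ("C", "digit"): "D", ("D", "digit"): "D",
}
blocks = minimize({"A", "B", "B2", "C", "D"}, ["digit", "."], DELTA, {"D"})
```

Here B and B2 end up in the same block, so the minimal DFA has four states.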

TINY DFA

Lex for Scanner
- Lex conventions for REs
- Format of a Lex input file
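A Lex-generated scanner resolves ambiguity between rules in two ways: it takes the longest possible match, and when two rules match the same length, the rule listed first wins. A small Python sketch of that policy, using a hypothetical rule list in the spirit of a Lex rules section (this is not Lex itself, and the token names are invented):

```python
import re
from typing import List, Tuple

# Hypothetical rules in priority order: (token name, regular expression)
RULES = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("REAL", r"[0-9]+\.[0-9]+"),
    ("NUM", r"[0-9]+"),
    ("WS", r"[ \t\n]+"),
    ("OP", r"[-+*/=<();]"),
]
COMPILED = [(name, re.compile(rx)) for name, rx in RULES]

def tokenize(src: str) -> List[Tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(src):
        best_name, best_len = None, 0
        for name, rx in COMPILED:
            m = rx.match(src, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"illegal character {src[pos]!r} at {pos}")
        if best_name != "WS":            # like an action that returns no token
            tokens.append((best_name, src[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("if iffy 3.14 42"))
# [('IF', 'if'), ('ID', 'iffy'), ('REAL', '3.14'), ('NUM', '42')]
```

The keyword rule is listed before ID so that `if` becomes IF on a tie, while `iffy` is still an ID because the longer match wins.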