Finite-State Machines (FSMs)

Slides:



Advertisements
Similar presentations
Finite-State Machines with No Output Ying Lu
Advertisements

4b Lexical analysis Finite Automata
From Cooper & Torczon1 The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source language?
CSc 453 Lexical Analysis (Scanning)
CS5371 Theory of Computation
1 Foundations of Software Design Lecture 23: Finite Automata and Context-Free Grammars Marti Hearst Fall 2002.
Lexical Analysis The Scanner Scanner 1. Introduction A scanner, sometimes called a lexical analyzer A scanner : – gets a stream of characters (source.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
Topic #3: Lexical Analysis
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
Finite-State Machines with No Output Longin Jan Latecki Temple University Based on Slides by Elsa L Gunter, NJIT, and by Costas Busch Costas Busch.
Finite-State Machines with No Output
1 Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
1 Outline Informal sketch of lexical analysis –Identifies tokens in input string Issues in lexical analysis –Lookahead –Ambiguities Specifying lexers –Regular.
Lexical Analysis - An Introduction. The Front End The purpose of the front end is to deal with the input language Perform a membership test: code  source.
Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Lecture # 3 Chapter #3: Lexical Analysis. Role of Lexical Analyzer It is the first phase of compiler Its main task is to read the input characters and.
Lexical Analysis (I) Compiler Baojian Hua
Lexical Analyzer (Checker)
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II:
SCRIBE SUBMISSION GROUP 8 Date: 7/8/2013 By – IKHAR SUSHRUT MEGHSHYAM 11CS10017 Lexical Analyser Constructing Tokens State-Transition Diagram S-T Diagrams.
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
CS 536 Fall Scanner Construction  Given a single string, automata and regular expressions retuned a Boolean answer: a given string is/is not in.
1 Languages and Compilers (SProg og Oversættere) Lexical analysis.
Lexical Analysis: Finite Automata CS 471 September 5, 2007.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
CSc 453 Lexical Analysis (Scanning)
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Compiler Introduction 1 Kavita Patel. Outlines 2  1.1 What Do Compilers Do?  1.2 The Structure of a Compiler  1.3 Compilation Process  1.4 Phases.
Lexical Analysis (Scanning) Lexical Analysis (Scanning)
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
using Deterministic Finite Automata & Nondeterministic Finite Automata
Language Translation Part 2: Finite State Machines.
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
Chapter 5 Finite Automata Finite State Automata n Capable of recognizing numerous symbol patterns, the class of regular languages n Suitable for.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
1 Compiler Construction Vana Doufexi office CS dept.
Department of Software & Media Technology
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
Intro to compilers Based on end of Ch. 1 and start of Ch. 2 of textbook, plus a few additional references.
Finite automate.
CS510 Compiler Lecture 2.
Lecture 2 Lexical Analysis
Chapter 2 Scanning – Part 1 June 10, 2018 Prof. Abdelaziz Khamis.
CSc 453 Lexical Analysis (Scanning)
Finite-State Machines (FSMs)
Lexical analysis Finite Automata
Compilers Welcome to a journey to CS419 Lecture5: Lexical Analysis:
Nondeterministic Finite Automata (NFA)
CSc 453 Lexical Analysis (Scanning)
RegExps & DFAs CS 536.
CS 536 / Fall 2017 Introduction to programming languages and compilers
Two issues in lexical analysis
Recognizer for a Language
Department of Software & Media Technology
Some slides by Elsa L Gunter, NJIT, and by Costas Busch
Review: Compiler Phases:
Lexical Analysis Lecture 3-4 Prof. Necula CS 164 Lecture 3.
Finite Automata.
4b Lexical analysis Finite Automata
4b Lexical analysis Finite Automata
Lexical Analysis.
Compiler Structures 2. Lexical Analysis Objectives
CSC312 Automata Theory Transition Graphs Lecture # 9
Compiler Construction
Lecture 5 Scanning.
Announcements - P1 part 1 due Today - P1 part 2 due on Friday Feb 1st
CSc 453 Lexical Analysis (Scanning)
Presentation transcript:

Finite-State Machines (FSMs) CS 536

Assignments P1 assigned Part 1 due on the 25th of Jan

Last time A compiler is a For example, gcc: S is C, T is x86, H is C recognizer of language S (Source) a translator from S to T (Target) a program in language H (Host) For example, gcc: S is C, T is x86, H is C

Last time Why do we need a compiler? Processors can execute only binaries (machine-code/assembly programs) Writing assembly programs will make you want to kill yourself Write programs in a nice(ish) high-level language like C; compile to binaries

Last time front end = understand source code S IR = intermediate representation back end = map IR to T

Last time P2 P3 P1 Symbol table P4, P5 front end back end P6

Special linkage between scanner and parser in most compilers Source Program Sequence of characters Implementation: master/slave (or “coroutine”) syntax analyzer (parser) … next token, please source code a < = p lexical analyzer (scanner) lexical analyzer (scanner) Sequence of tokens syntax analyzer (parser) … Conceptual organization

The scanner a = 2 * b + abs(-71) Translates sequence of chars into a sequence of tokens (ignoring whitespace) Each time the scanner is called it should: find the longest prefix (lexeme) of the remaining input that corresponds to a token return that token a = 2 * b + abs(-71) ident (a) asgn int lit (2) times ident (b) plus ident (abs) lparens int lit (-71) rparens

How to create a scanner? For every possible lexeme that can occur in source program, return corresponding token Inefficient Error-prone

Scanner generator Generates a scanner Inputs: - one regular expression for each token - one regular expressions for each item to ignore (comments, whitespace, etc.) Output: scanner program How does a scanner generator work? - Finite-state machines (FSMs)

FSMs: Finite State Machines (A.k.a. finite automata, finite-state automata, etc.) Input: string (sequence of chars) Output: accept / reject i.e., input is legal in language Language defined by an FSM is the set of strings accepted by the FSM

Example 1 Language: single line comments with // Nodes are states Edges are transitions Start state has an arrow (only one start state) Final states are double circles (one or more)

Example 1 Language: single line comments with // “// this is a comment.” “/ / this is not.” “// \n” “Not // a comment”

Example 2 Language: Integer literals with an optional + or – (token: int-lit) e.g., -543, +15, 0007 digit 3 digit digit ‘+’ 1 2 ‘-’

L(M) = set of integer literals FSMs, formally M ≡ final states the alphabet (characters) transition function finite set of states start state L(M) = set of integer literals ‘+’ ‘-’ digit 1 2 3

FSM example, formally M ≡ L(M) = {ε, ab, abab, ababab, abababab, …. } What is L(M)? L(M) = {ε, ab, abab, ababab, abababab, …. } s1 s0 a b c anything else, machine is stuck

Coding an FSM curr_state = start_state done = false while (!done) ch = nextChar() next = table[curr_state][ch] if (next == stuck || ch == EOF) done = true else curr_state = next return final_states.contains(curr_state) && next!=stuck

FSM types: DFA & NFA Deterministic Nondeterministic no state has >1 outgoing edge with same label Nondeterministic states may have multiple outgoing edges with same label edges may be labelled with special symbol ɛ (empty string) ɛ-transitions can happen without reading input

NFA Example Language: Integer literals with an optional + or – (token: int-lit) e.g., -543, +15, 0007 digit digit 3 3 digit digit digit ‘+’, ‘-’ ‘+’ 1 2 1 2 ‘-’ ‘ε’ A string is accepted by an NFA if there exists a sequence of transitions leading to a final state

Why NFA? Simpler and more intuitive than DFA Language: sequence of 0s and 1s, ending with 00

Extra example A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit.

Extra Example - Part 1 A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit. digit, letter, ‘_’ 1 2 ‘_’, letter

Extra example A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit. What if you wanted to add the restriction that it can't end with an underscore?

Extra Example - Part 2 What if you wanted to add the restriction that it can't end with an underscore? digit, letter, ‘_’ 1 2 3 ‘_’, letter digit, letter letter

Recap The scanner reads a stream of characters and tokenizes it (i.e., finds tokens) Tokens are defined using regular expressions, scanners are implemented using FSMs FSMs can be non-deterministic Next time: understand connection between DFA and NFA, regular languages and regular expressions

automatatutor.com Loris D’Antoni Play with automata! automatatutor.com Loris D’Antoni