1 The scanning process
Main goal: recognize words (tokens).
Snapshot: at any point in time, the scanner has read some input and is on the way to identifying what kind of token has been read (e.g. identifier, operator, integer literal). Once the scanner identifies a token, it sends it off to the parser and starts over with the next word. Some tokens need additional data to be carried along with them; for example, an identifier token needs to have the identifier itself attached to it. Alternatively, instead of handing tokens to the parser one at a time, the scanner writes all the tokens to a file, which is then input to the parser.
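As an illustration of a token that carries data, here is a minimal C sketch. The token kinds (TK_IDENT, TK_INT_LIT, ...), the Token layout and the helpers make_ident/make_int are assumptions made up for this example, not taken from the slides; the fixed-size name buffer is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; only some of them carry an attribute. */
typedef enum { TK_IDENT, TK_INT_LIT, TK_ASSIGN, TK_LPAREN } TokenKind;

typedef struct {
    TokenKind kind;
    union {
        char name[32];   /* TK_IDENT: the identifier's spelling (truncated to 31 chars) */
        long int_value;  /* TK_INT_LIT: the literal's numeric value */
    } attr;              /* TK_ASSIGN, TK_LPAREN: no attribute needed */
} Token;

/* Build an identifier token, copying the lexeme so the token
   does not depend on the scanner's input buffer staying alive. */
Token make_ident(const char *lexeme, size_t len) {
    Token t;
    t.kind = TK_IDENT;
    if (len >= sizeof t.attr.name)
        len = sizeof t.attr.name - 1;
    memcpy(t.attr.name, lexeme, len);
    t.attr.name[len] = '\0';
    return t;
}

/* Build an integer-literal token with its value attached. */
Token make_int(long value) {
    Token t;
    t.kind = TK_INT_LIT;
    t.attr.int_value = value;
    return t;
}
```

For example, make_ident("count = 1", 5) yields an identifier token whose attribute is "count"; the parser receives both the kind and the spelling.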

2 The scanning process
A simple hand-written scanner would look a bit like this:
    …
    nextchar = getNextChar();
    switch (nextchar) {
    case '(':
        return LPAREN;                /* return LPAREN token */
    case '0': case '1': ... case '9':
        value = 0;
        while (nextchar is a digit) {
            value = 10 * value + (nextchar - '0');   /* concat the digits to build an integer */
            nextchar = getNextChar();
        }
        putBack(nextchar);            /* the non-digit belongs to the next token */
        make a new INTEGER token with value attached
        return INTEGER;
    ...
    }
    …
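The slide's pseudocode can be turned into runnable C roughly as follows. This is a sketch under assumptions: the input is an in-memory string, getNextChar/putBack become the hypothetical helpers get_next_char/put_back, whitespace is skipped (the slide does not mention it), there is no overflow check, and any character other than '(' or a digit yields TOK_ERROR.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token kinds: the two from the slide plus EOF/ERROR. */
typedef enum { TOK_LPAREN, TOK_INTEGER, TOK_EOF, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    long value;                 /* meaningful only for TOK_INTEGER */
} Token;

static const char *src;         /* input being scanned */
static size_t pos;              /* index of the next unread character */

void scanner_init(const char *s) { src = s; pos = 0; }

/* Return the next character, or EOF at the end of the string. */
static int get_next_char(void) { return src[pos] ? (unsigned char)src[pos++] : EOF; }

/* Undo the last get_next_char (nothing was consumed at EOF). */
static void put_back(int c) { if (c != EOF) pos--; }

Token next_token(void) {
    Token t = { TOK_ERROR, 0 };
    int c = get_next_char();
    while (c == ' ' || c == '\t' || c == '\n')   /* skip blanks (added assumption) */
        c = get_next_char();
    switch (c) {
    case EOF:
        t.kind = TOK_EOF;
        break;
    case '(':
        t.kind = TOK_LPAREN;                      /* return LPAREN token */
        break;
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9': {
        long v = 0;
        while (isdigit(c)) {                      /* c holds the first digit on entry */
            v = 10 * v + (c - '0');               /* concat the digits */
            c = get_next_char();
        }
        put_back(c);                              /* the non-digit starts the next token */
        t.kind = TOK_INTEGER;
        t.value = v;
        break;
    }
    default:
        break;                                    /* anything else: TOK_ERROR */
    }
    return t;
}
```

Scanning "(42 7" with repeated calls to next_token yields LPAREN, INTEGER 42, INTEGER 7 and then EOF. Note that the first digit is accumulated before the next character is read.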

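3 The scanning process
Not always as simple as it seems. Example from old versions of FORTRAN:
DO 5 I=1,10 vs. DO 5 I=1.10
Blanks were insignificant in FORTRAN, so the first statement starts a DO loop, while the second assigns 1.10 to a variable named DO5I. The scanner cannot tell whether the first token is the keyword DO or the identifier DO5I until it reaches the comma or the period.
Hand-written vs. automated scanner: instead of writing a scanner by hand, we can automate the process. Specify what needs to be recognized and what to do when something is recognized, and have a scanner generator create the scanner based on our specification.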
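A simplified, hypothetical C sketch of how a scanner might resolve the DO ambiguity: drop the blanks, then look past the '=' for a comma outside parentheses. It is not the complete FORTRAN rule (the name is_do_loop and the 128-character limit are assumptions); it only illustrates that the decision needs lookahead well beyond the first token.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if stmt is a DO loop, false if it is (for example)
   an assignment such as DO5I = 1.10. Simplified sketch only. */
bool is_do_loop(const char *stmt) {
    char buf[128];
    size_t n = 0;
    /* Blanks are insignificant: copy the statement without them
       (upper-cased, truncated to fit the buffer). */
    for (; *stmt && n < sizeof buf - 1; stmt++)
        if (*stmt != ' ')
            buf[n++] = (char)toupper((unsigned char)*stmt);
    buf[n] = '\0';

    if (strncmp(buf, "DO", 2) != 0)
        return false;
    const char *eq = strchr(buf, '=');
    if (eq == NULL)
        return false;

    /* A DO loop has a comma after '=' that is not inside parentheses. */
    int depth = 0;
    for (const char *p = eq + 1; *p; p++) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == ',' && depth == 0)
            return true;
    }
    return false;
}
```

With this check, "DO 5 I=1,10" is classified as a loop, while "DO 5 I=1.10" and "DO 5 I = A(1,2)" are assignments to DO5I.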

4 The scanning process
Specify what needs to be recognized. Some tokens are easy to identify: e.g. = is an assignment operator and ( is a parenthesis. Others are more complex. How would the scanner recognize an identifier? The set of possible identifiers is very large, or even infinite if there is no length restriction, so listing them all is not an option. SOLUTION: recognize a pattern! Example: an identifier is a sequence of letters or digits that starts with a letter. We need a way to describe this pattern to our scanner generator. Regular expressions come to the rescue!
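A hand-coded check for the identifier pattern fits in a few lines, which is the point: one pattern covers infinitely many identifiers. The function names are hypothetical; isalpha/isalnum from <ctype.h> classify letters and letters-or-digits according to the current locale (plain ASCII letters in the default "C" locale).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Length of the longest prefix of s that is a letter followed by
   letters or digits; 0 if s does not start with a letter. */
size_t match_identifier(const char *s) {
    if (!isalpha((unsigned char)s[0]))
        return 0;
    size_t n = 1;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* True if the whole string s is an identifier. */
bool is_identifier(const char *s) {
    size_t n = match_identifier(s);
    return n > 0 && s[n] == '\0';
}
```

A scanner would use the prefix version: match_identifier("x1 = 2") returns 2, so "x1" becomes the identifier token and scanning resumes at the blank.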

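5 The scanning process
Definition: regular expressions (REs) over an alphabet $\Sigma$:
- $\varepsilon$ is an RE denoting $\{\varepsilon\}$.
- If $a \in \Sigma$, then $a$ is an RE denoting $\{a\}$.
- If $r$ and $s$ are REs, then
  - $(r)$ is an RE denoting $L(r)$;
  - $r \mid s$ is an RE denoting $L(r) \cup L(s)$;
  - $rs$ is an RE denoting $L(r)L(s)$;
  - $r^{*}$ is an RE denoting the Kleene closure of $L(r)$, i.e. $L(r)^{*}$.
Property: the regular languages are closed under many operations, including union, concatenation and Kleene closure (and also intersection and complement). This allows us to build complex REs from simpler ones.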
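A short worked example of how the rules compose, assuming the alphabet $\Sigma = \{a, b, c\}$ (chosen only for illustration):

$$
\begin{aligned}
L(a \mid b) &= L(a) \cup L(b) = \{a, b\} \\
L\big((a \mid b)\,c\big) &= L(a \mid b)\,L(c) = \{ac,\ bc\} \\
L(a^{*}) &= L(a)^{*} = \{\varepsilon,\ a,\ aa,\ aaa,\ \dots\}
\end{aligned}
$$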

6 Regular Definitions
A regular expression that describes digits is: 0|1|2|3|4|5|6|7|8|9
For convenience, we'd like to give it a name and then use the name in building more complex regular expressions:
digit → 0|1|2|3|4|5|6|7|8|9
This is called a regular definition. Example:
letter → a|...|z|A|...|Z
ident → letter (letter | digit)*
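Substituting the definitions back in shows that a regular definition is only shorthand: ident expands to an ordinary RE (the "..." ranges are abbreviations, as on the slide):

$$
\begin{aligned}
\mathit{ident} &= \mathit{letter}\,(\mathit{letter} \mid \mathit{digit})^{*} \\
&= (a \mid \dots \mid z \mid A \mid \dots \mid Z)\,\big((a \mid \dots \mid z \mid A \mid \dots \mid Z) \mid (0 \mid 1 \mid \dots \mid 9)\big)^{*}
\end{aligned}
$$

So x25 is in $L(\mathit{ident})$ (x is a letter; 2 and 5 are digits), while 9x is not, because the first character must be a letter.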

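7 What’s next
Given an input string, we need a “machine” that has a regular expression hard-coded in it and can tell whether or not the input string matches the pattern described by that regular expression. A machine that reads the string one symbol at a time, moving among a fixed, finite set of states, and then decides whether the string belongs to the language is called a finite automaton. Finite automata recognize exactly the languages that regular expressions describe.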

8 The scanning process
Definition: a deterministic finite automaton (DFA) is a five-tuple $(\Sigma, S, \delta, s_0, F)$ where
- $\Sigma$ is the alphabet,
- $S$ is the set of states,
- $\delta : S \times \Sigma \to S$ is the transition function,
- $s_0 \in S$ is the starting state,
- $F \subseteq S$ is the set of final states.
Notation: use a transition diagram to describe a DFA: states are nodes, transitions are directed, labeled edges, some states are marked as final, and one state is marked as starting. If the automaton is in a final state when it reaches the end of the input, the input string belongs to the language.
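A table-driven DFA for ident → letter (letter | digit)* maps directly onto the five-tuple. In this sketch (names are hypothetical) characters are first grouped into three classes (letter, digit, other), which stand in for the alphabet, and an explicit dead state makes the transition function total.

```c
#include <ctype.h>
#include <stdbool.h>

/* Character classes (the alphabet, after grouping) and states S. */
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };
enum { S_START, S_IDENT, S_DEAD, NUM_STATES };

/* Transition function delta: S x classes -> S. */
static const int delta[NUM_STATES][NUM_CLASSES] = {
    /*             letter   digit    other  */
    [S_START] = { S_IDENT, S_DEAD,  S_DEAD },
    [S_IDENT] = { S_IDENT, S_IDENT, S_DEAD },
    [S_DEAD]  = { S_DEAD,  S_DEAD,  S_DEAD },
};

/* Final states F = { S_IDENT }. */
static const bool is_final[NUM_STATES] = { [S_IDENT] = true };

static int char_class(unsigned char c) {
    if (isalpha(c)) return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* Run the DFA from s_0 over the whole input; accept iff it ends in F. */
bool dfa_accepts(const char *s) {
    int state = S_START;
    for (; *s; s++)
        state = delta[state][char_class((unsigned char)*s)];
    return is_final[state];
}
```

Keeping the automaton in a table rather than in control flow is what lets a scanner generator emit the same driver loop for any DFA and only change the table.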

9 The scanning process
Goal: automate the process. Idea: start with an RE and build a DFA. How?
1. Build a nondeterministic finite automaton (NFA) from the RE (Thompson's construction).
2. Convert the NFA to a deterministic one (subset construction).
3. Minimize the DFA (Hopcroft's algorithm).
4. Implement the DFA.
Existing scanner generator: flex
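To make the NFA-to-DFA conversion concrete, here is a small C sketch of the subset construction. Assumptions: the NFA is a Thompson-style NFA for the example RE (a|b)*abb, hard-coded with states 0 to 10 rather than built from the RE; sets of NFA states are stored as bitmasks; minimization and code generation are not shown. All names are hypothetical.

```c
#include <stdbool.h>

/* Thompson-style NFA for (a|b)*abb: states 0..10, start 0, accept 10.
   A set of NFA states is a bitmask (bit s set iff state s is in the set). */
enum { NFA_STATES = 11, NFA_START = 0, NFA_ACCEPT = 10, MAX_DSTATES = 32 };

/* eps[s] = states reachable from s by one epsilon edge. */
static const unsigned eps[NFA_STATES] = {
    [0] = (1u << 1) | (1u << 7),
    [1] = (1u << 2) | (1u << 4),
    [3] = 1u << 6,
    [5] = 1u << 6,
    [6] = (1u << 1) | (1u << 7),
};

/* Labeled edges: 2 -a-> 3, 7 -a-> 8, 4 -b-> 5, 8 -b-> 9, 9 -b-> 10. */
static unsigned nfa_move(unsigned set, char c) {
    unsigned out = 0;
    if (c == 'a') {
        if (set & (1u << 2)) out |= 1u << 3;
        if (set & (1u << 7)) out |= 1u << 8;
    } else if (c == 'b') {
        if (set & (1u << 4)) out |= 1u << 5;
        if (set & (1u << 8)) out |= 1u << 9;
        if (set & (1u << 9)) out |= 1u << 10;
    }
    return out;
}

/* Epsilon-closure: keep adding epsilon successors until nothing changes. */
static unsigned eps_closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NFA_STATES; s++)
            if (set & (1u << s)) set |= eps[s];
    } while (set != prev);
    return set;
}

static unsigned dstates[MAX_DSTATES]; /* DFA state i = a set of NFA states */
static int dtran[MAX_DSTATES][2];     /* column 0 = 'a', 1 = 'b'; -1 = none */
static int ndstates;

static int find_or_add(unsigned set) {
    for (int i = 0; i < ndstates; i++)
        if (dstates[i] == set) return i;
    dstates[ndstates] = set;          /* MAX_DSTATES is ample for this NFA */
    return ndstates++;
}

/* DFA start = closure({start}); each new closure(move(T, c)) becomes a DFA
   state. The loop bound grows as states are discovered (a worklist). */
void subset_construction(void) {
    ndstates = 0;
    find_or_add(eps_closure(1u << NFA_START));
    for (int i = 0; i < ndstates; i++)
        for (int k = 0; k < 2; k++) {
            unsigned u = eps_closure(nfa_move(dstates[i], "ab"[k]));
            dtran[i][k] = u ? find_or_add(u) : -1;
        }
}

/* Run the constructed DFA; accept iff its state contains NFA_ACCEPT. */
bool accepts_abb(const char *s) {
    int d = 0;
    for (; *s; s++) {
        int k = *s == 'a' ? 0 : *s == 'b' ? 1 : -1;
        if (k < 0) return false;
        d = dtran[d][k];
        if (d < 0) return false;
    }
    return (dstates[d] & (1u << NFA_ACCEPT)) != 0;
}
```

Call subset_construction() once before accepts_abb(). For this NFA the construction yields five DFA states; minimization would then merge the start state with the state reached on b (they behave identically), leaving four.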