COMP 144 Programming Language Concepts, Spring 2002. Felix Hernandez-Campos. Lecture 3: Lexical Analysis.

Presentation transcript:

1 Lecture 3: Lexical Analysis
COMP 144 Programming Language Concepts, Spring 2002
Felix Hernandez-Campos, Jan 14
The University of North Carolina at Chapel Hill

2 Phases of Compilation

3 Specification of Programming Languages
PLs require precise definitions (i.e. no ambiguity)
– Language form (Syntax)
– Language meaning (Semantics)
Consequently, PLs are specified using formal notation:
– Formal syntax
» Tokens
» Grammar
– Formal semantics

4 Phases of Compilation

5 Scanner
Main task: identify tokens
– Basic building blocks of programs
– E.g. keywords, identifiers, numbers, punctuation marks
Desk calculator language example:
read A
sum := A + 3.45e-3
write sum
write sum / 2
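As a rough illustration of token identification, the following Python sketch splits a calculator-style program into (kind, lexeme) pairs. The token class names (NUMBER, ASSIGN, OP, ID, KEYWORD), the patterns, and the `tokenize` helper are assumptions made for this example, not definitions taken from the lecture.

```python
import re

# Token classes for a small calculator-like language. The class names and
# patterns are illustrative assumptions, not the course's definitions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("WORD",   r"[A-Za-z][A-Za-z0-9]*"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
KEYWORDS = {"read", "write"}


def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the input text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":          # whitespace separates tokens, nothing more
            continue
        if kind == "ERROR":         # any character no other class accepts
            raise ValueError(f"unexpected character {lexeme!r}")
        if kind == "WORD":          # keywords are told apart by table lookup
            kind = "KEYWORD" if lexeme in KEYWORDS else "ID"
        tokens.append((kind, lexeme))
    return tokens
```

Calling `tokenize("write sum / 2")` would yield one keyword, one identifier, one operator, and one number. Keywords are matched as ordinary words first and then reclassified, which is one common way to keep the patterns simple.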

6 Formal definition of tokens
A set of tokens is a set of strings over an alphabet
– {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}
A set of tokens is a regular set that can be defined by comprehension using a regular expression
For every regular set, there is a deterministic finite automaton (DFA) that can recognize it
– i.e. determine whether a string belongs to the set or not
– Scanners extract tokens from source code in the same way DFAs determine membership
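To make DFA membership concrete, here is a minimal Python sketch of a DFA that accepts unsigned integers and decimals such as 10 or 3.45. The state names ("start", "int", "dot", "frac") and the `accepts` helper are hypothetical labels chosen for this example.

```python
# Transition function of a small DFA, stored as a dictionary keyed by
# (state, character). State names are hypothetical labels.
DIGITS = "0123456789"
TRANSITIONS = {}
for d in DIGITS:
    TRANSITIONS[("start", d)] = "int"
    TRANSITIONS[("int", d)] = "int"
    TRANSITIONS[("dot", d)] = "frac"
    TRANSITIONS[("frac", d)] = "frac"
TRANSITIONS[("int", ".")] = "dot"
ACCEPTING = {"int", "frac"}


def accepts(string):
    """Run the DFA over the string; True if it ends in an accepting state."""
    state = "start"
    for ch in string:
        state = TRANSITIONS.get((state, ch))
        if state is None:           # no transition defined: reject
            return False
    return state in ACCEPTING
```

The automaton reads each character exactly once and keeps only its current state, which is why membership tests of this kind run in time linear in the length of the input.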

7 Regular Expressions
A regular expression (RE) is:
– A single character
– The empty string, ε
– The concatenation of two regular expressions
» Notation: RE1 RE2 (i.e. RE1 followed by RE2)
– The union of two regular expressions
» Notation: RE1 | RE2
– The closure of a regular expression
» Notation: RE*
» * is known as the Kleene star
» RE* represents the concatenation of 0 or more strings, each matching RE
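The three operators can be tried out with Python's `re` module, whose pattern syntax is a superset of the notation above. This is only a sketch: the `matches` helper is a name introduced here, and `re.fullmatch` is used so that the whole string must belong to the set.

```python
import re


def matches(pattern, string):
    """True if the entire string belongs to the set described by pattern."""
    return re.fullmatch(pattern, string) is not None


# Concatenation: "a" followed by "b"
CONCAT = "ab"
# Union: either "a" or "b"
UNION = "a|b"
# Closure (Kleene star): zero or more repetitions of "ab"
CLOSURE = "(ab)*"

examples = [
    (CONCAT, "ab"),
    (UNION, "b"),
    (CLOSURE, ""),        # zero repetitions: the empty string
    (CLOSURE, "ababab"),  # three repetitions
]
for pattern, string in examples:
    print(repr(pattern), repr(string), matches(pattern, string))
```

Note that the empty string is a member of every closure, since it corresponds to zero repetitions.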

8 Token Definition Example
Numeric literals in Pascal
– Definition of the token unsigned_number
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
unsigned_integer → digit digit*
unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | - | ε ) unsigned_integer ) | ε )
Recursion is not allowed!
Notice the use of parentheses to avoid ambiguity
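Because recursion is not allowed, each definition can be expanded by plain textual substitution into a single regular expression. The Python sketch below does exactly that; the constant names mirror the definitions above, while `is_unsigned_number` is a helper introduced for this example. An empty alternative such as `(x|)` plays the role of the empty string.

```python
import re

# Each name is expanded by substitution into the next definition.
DIGIT = r"[0-9]"
UNSIGNED_INTEGER = rf"{DIGIT}{DIGIT}*"
UNSIGNED_NUMBER = (
    rf"{UNSIGNED_INTEGER}"
    rf"(\.{UNSIGNED_INTEGER}|)"          # optional fraction part
    rf"(e(\+|-|){UNSIGNED_INTEGER}|)"    # optional exponent, optional sign
)


def is_unsigned_number(string):
    """True if the whole string is an unsigned_number literal."""
    return re.fullmatch(UNSIGNED_NUMBER, string) is not None


for literal in ["10", "3.45e-3", "1e+5", "3.", ".5", "1e"]:
    print(literal, is_unsigned_number(literal))
```

The parentheses in the pattern group the fraction and exponent parts in the same way as in the token definition, so that each `|` applies only within its own group.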

9 Scanning
Pascal scanner: pseudo-code

10 DFAs
Scanners are deterministic finite automata (DFAs)
– With some hacks

11 Difficulties
Keywords and variable names
Look-ahead
– Pascal’s ranges [1..10]
– FORTRAN’s example
DO 5 I=1,25 => Loop 25 times up to label 5
DO 5 I=1.25 => Assign 1.25 to DO5I
» NASA’s Mariner 1 (apocryphal?)
Pragmas: significant comments
– Compiler options
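The Pascal range case shows why one character of look-ahead is not always enough: after reading 1, a following "." may begin a fraction (1.25) or the range operator (1..10). A minimal Python sketch of the check, assuming a hypothetical `scan_number` helper that returns the lexeme and the position just past it:

```python
def scan_number(text, pos):
    """Return (lexeme, new_pos) for the numeric literal starting at pos."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    # Peek two characters ahead: a "." belongs to the number only when a
    # digit follows it. In "1..10" the "." starts the range operator "..".
    if end + 1 < len(text) and text[end] == "." and text[end + 1].isdigit():
        end += 1
        while end < len(text) and text[end].isdigit():
            end += 1
    return text[pos:end], end


print(scan_number("1..10", 0))
print(scan_number("1.25", 0))
```

The scanner commits to the fraction only after it has seen both the "." and the digit behind it, so the two dots of a range are left in the input for the next token.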

12 Outline of the Scanner

13 Scanner Generators
Scanner generators:
– E.g. lex, flex
– These programs take a table as their input and return a program (i.e. a scanner) that can extract tokens from a stream of characters

14 Table-driven scanner
Lexical errors
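A table-driven scanner keeps the automaton in a data structure and runs one generic loop over it. The Python sketch below is an assumption-laden illustration, not the table shown in the lecture: the states, the character classes, the token names, and the `LexicalError` exception are all invented for a tiny calculator-like language. It remembers the last accepting state reached (longest match) and reports a lexical error when no token can be formed.

```python
class LexicalError(Exception):
    """Raised when no token can be formed at the current position."""


def char_class(ch):
    """Map a character to the column used in the transition table."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return ch                       # punctuation stands for itself


# TABLE[state][char_class] -> next state (hypothetical states and classes).
TABLE = {
    "start":  {"digit": "int", "letter": "id", ":": "colon",
               "+": "op", "-": "op", "*": "op", "/": "op"},
    "int":    {"digit": "int"},
    "id":     {"letter": "id", "digit": "id"},
    "colon":  {"=": "assign"},
    "op":     {},
    "assign": {},
}
ACCEPT = {"int": "NUMBER", "id": "ID", "op": "OP", "assign": "ASSIGN"}


def scan(text):
    """Return a list of (kind, lexeme) pairs, or raise LexicalError."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, end, last_accept = "start", pos, None
        while end < len(text):
            nxt = TABLE[state].get(char_class(text[end]))
            if nxt is None:         # no transition: stop extending the token
                break
            state, end = nxt, end + 1
            if state in ACCEPT:     # remember the longest token seen so far
                last_accept = (ACCEPT[state], end)
        if last_accept is None:
            raise LexicalError(f"lexical error at position {pos}: {text[pos]!r}")
        kind, end = last_accept
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

Changing the language only requires editing `TABLE` and `ACCEPT`; the driver loop stays the same, which is the property scanner generators rely on.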

15 Scanners and String Processing
Scanning is a common task in programming
– String processing
– E.g. reading configuration files, processing log files, …
StringTokenizer and StreamTokenizer in Java
Regular expressions in Python and other scripting languages
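As an example of scanning outside a compiler, the following Python sketch reads a simple configuration file with one regular expression per line. The file format (`key = value` lines with `#` comments) and the `parse_config` helper are assumptions made for this illustration.

```python
import re

# One "key = value" setting per line; the format is a hypothetical example.
LINE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*?)\s*$")


def parse_config(text):
    """Return a dict of settings parsed from the configuration text."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]     # drop a trailing comment, if any
        if not line.strip():             # skip blank and comment-only lines
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"malformed line: {line!r}")
        settings[match.group(1)] = match.group(2)
    return settings


sample = "# server settings\nhost = example.org\nport=8080  # default\n"
print(parse_config(sample))
```

All values are kept as strings here; converting them to numbers or booleans would be a separate step after scanning.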

16 Reading Assignment
Scott’s Chapter 2:
– Introduction
– Section
– Section 2.2.1