Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 COMP 144 Programming Language Concepts Felix Hernandez-Campos Lecture 3: Lexical Analysis COMP 144 Programming Language Concepts Spring 2002 Felix Hernandez-Campos.

Similar presentations


Presentation on theme: "1 COMP 144 Programming Language Concepts Felix Hernandez-Campos Lecture 3: Lexical Analysis COMP 144 Programming Language Concepts Spring 2002 Felix Hernandez-Campos."— Presentation transcript:

1 1 COMP 144 Programming Language Concepts Felix Hernandez-Campos Lecture 3: Lexical Analysis COMP 144 Programming Language Concepts Spring 2002 Felix Hernandez-Campos Jan 14 The University of North Carolina at Chapel Hill

2 2 COMP 144 Programming Language Concepts Felix Hernandez-Campos Phases of Compilation

3 3 COMP 144 Programming Language Concepts Felix Hernandez-Campos Specification of Programming Languages PLs require precise definitions (i.e. no ambiguity)PLs require precise definitions (i.e. no ambiguity) –Language form (Syntax) –Language meaning (Semantics) Consequently, PLs are specified using formal notation:Consequently, PLs are specified using formal notation: –Formal syntax »Tokens »Grammar –Formal semantics

4 4 COMP 144 Programming Language Concepts Felix Hernandez-Campos Phases of Compilation

5 5 COMP 144 Programming Language Concepts Felix Hernandez-Campos Scanner Main task: identify tokensMain task: identify tokens –Basic building blocks of programs –E.g. keywords, identifiers, numbers, punctuation marks Desk calculator language example:Desk calculator language example: read A sum := A + 3.45e-3 write sum write sum / 2

6 6 COMP 144 Programming Language Concepts Felix Hernandez-Campos Formal definition of tokens A set of tokens is a set of strings over an alphabetA set of tokens is a set of strings over an alphabet –{read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …} A set of tokens is a regular set that can be defined by comprehension using a regular expressionA set of tokens is a regular set that can be defined by comprehension using a regular expression For every regular set, there is a deterministic finite automaton (DFA) that can recognize itFor every regular set, there is a deterministic finite automaton (DFA) that can recognize it –i.e. determine whether a string belongs to the set or not –Scanners extract tokens from source code in the same way DFAs determine membership

7 7 COMP 144 Programming Language Concepts Felix Hernandez-Campos Regular Expressions A regular expression (RE) is:A regular expression (RE) is: –A single character –The empty string,  –The concatenation of two regular expressions »Notation: RE 1 RE 2 (i.e. RE 1 followed by RE 2 ) –The union of two regular expressions »Notation: RE 1 | RE 2 –The closure of a regular expression »Notation: RE* »* is known as the Kleene star »* represents the concatenation of 0 or more strings

8 8 COMP 144 Programming Language Concepts Felix Hernandez-Campos Token Definition Example Numeric literals in PascalNumeric literals in Pascal –Definition of the token unsigned_number digit  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 unsigned_integer  digit digit* unsigned_number  unsigned_integer ( (. unsigned_integer ) |  ) ( ( e ( + | – |  ) unsigned_integer ) |  ) Recursion is not allowed!Recursion is not allowed! Notice the use of parentheses to avoid ambiguityNotice the use of parentheses to avoid ambiguity

9 9 COMP 144 Programming Language Concepts Felix Hernandez-Campos Scanning Pascal scannerPascal scannerPseudo-code

10 10 COMP 144 Programming Language Concepts Felix Hernandez-Campos DFAs Scanners areScanners aredeterministic finite automata (DFAs) –With some hacks

11 11 COMP 144 Programming Language Concepts Felix Hernandez-Campos Difficulties Keywords and variable namesKeywords and variable names Look-aheadLook-ahead –Pascal’s ranges [1..10] –FORTRAN’s example DO 5 I=1,25 => Loop 25 times up to label 5 DO 5 I=1.25 => Assign 1.25 to DO5I »NASA’s Mariner 1 (apocryphal?) Pragmas: significant commentsPragmas: significant comments –Compiler options

12 12 COMP 144 Programming Language Concepts Felix Hernandez-Campos Outline ofOutline of the Scanner

13 13 COMP 144 Programming Language Concepts Felix Hernandez-Campos Scanner Generators Scanners generators:Scanners generators: –E.g. lex, flex –These programs take a table as their input and return a program (i.e. a scanner) that can extract tokens from a stream of characters

14 14 COMP 144 Programming Language Concepts Felix Hernandez-Campos Table-drivenTable-drivenscanner Lexical errorsLexical errors

15 15 COMP 144 Programming Language Concepts Felix Hernandez-Campos Scanners and String Processing Scanning is a common task in programmingScanning is a common task in programming –String processing –E.g. reading configuration files, processing log files,… StringTokenizer and StreamTokenizer in JavaStringTokenizer and StreamTokenizer in Java –http://java.sun.com/products/jdk/1.2/docs/api/java/util/StringTokenizer.html http://java.sun.com/products/jdk/1.2/docs/api/java/util/StringTokenizer.html –http://java.sun.com/products/jdk/1.2/docs/api/java/io/StreamTokenizer.html http://java.sun.com/products/jdk/1.2/docs/api/java/io/StreamTokenizer.html Regular expressions in Python and other scripting languagesRegular expressions in Python and other scripting languages

16 16 COMP 144 Programming Language Concepts Felix Hernandez-Campos Reading Assignment Scott’s Chapter 2:Scott’s Chapter 2: –Introduction –Section 2.1.1 –Section 2.2.1


Download ppt "1 COMP 144 Programming Language Concepts Felix Hernandez-Campos Lecture 3: Lexical Analysis COMP 144 Programming Language Concepts Spring 2002 Felix Hernandez-Campos."

Similar presentations


Ads by Google