CPSC 325 - Compiler Tutorial 2 Scanner & Lex.


Tokens
Input token stream: each significant lexical chunk of the program is represented by a token.
- Operators & punctuation: { } ! + - = * ; : …
- Keywords: if while return goto
- Identifiers: id & actual name
- Constants: kind & value; int, floating-point, character, string, …

Token - example 1
Input text:
    if( x >= y ) y = 10;
Token stream:
    IF LP ID(x) GEQ ID(y) RP ID(y) ASSIGN INT(10) SEMI
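One way to hold such a token stream in C is a kind tag plus an optional attribute. The sketch below is only an illustration: the names TokenKind, Token and format_token are assumptions, not part of the tutorial, and the kinds are limited to those that appear in the example.

```c
#include <stdio.h>
#include <string.h>

/* Token kinds taken from the example stream above. */
typedef enum { TK_IF, TK_LP, TK_RP, TK_ID, TK_GEQ, TK_ASSIGN, TK_INT, TK_SEMI } TokenKind;

/* A token pairs a kind with an optional attribute:
   the name for identifiers, the value for integer constants. */
typedef struct {
    TokenKind kind;
    char name[32];   /* used only when kind == TK_ID  */
    int value;       /* used only when kind == TK_INT */
} Token;

/* Write the token in the slide's notation (e.g. ID(x), INT(10), GEQ) into buf. */
void format_token(const Token *t, char *buf, size_t size) {
    /* Same order as the TokenKind enumerators. */
    static const char *names[] = { "IF", "LP", "RP", "ID", "GEQ", "ASSIGN", "INT", "SEMI" };
    if (t->kind == TK_ID)
        snprintf(buf, size, "ID(%s)", t->name);
    else if (t->kind == TK_INT)
        snprintf(buf, size, "INT(%d)", t->value);
    else
        snprintf(buf, size, "%s", names[t->kind]);
}
```

Calling format_token on each element of an array of Token values and printing the results separated by spaces would reproduce the stream notation used on the slide.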

Parser
Tokens:
    IF LP ID(x) GEQ ID(y) RP ID(y) ASSIGN INT(10) SEMI
Resulting tree:
    IfStmt
      >=
        ID(x)
        ID(y)
      assign
        ID(y)
        INT(10)

Sample Grammar
    program    ::= statement | program statement
    statement  ::= assignStmt | ifStmt
    assignStmt ::= id = expr ;
    ifStmt     ::= if ( expr ) statement
    expr       ::= id | int | expr + expr
    id         ::= a | b | … | y | z
    int        ::= 1 | 2 | … | 9 | 0
Terminal symbols: a, b, 1, 2, 0. Non-terminal symbols: program, statement, id.
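To see how the productions fit together, the grammar can be turned into a small recognizer. This is a rough sketch under stated assumptions, not the tutorial's parser: it works directly on characters rather than on scanner tokens, it rewrites the left-recursive expr rule as "operand followed by zero or more + operand", and every function name (parse_program and the helpers) is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Cursor into the input; each parse_* function advances it on success. */
static const char *p;

static void skip_spaces(void) { while (*p == ' ') p++; }

/* Consume character c (after optional spaces) or fail. */
static bool expect(char c) {
    skip_spaces();
    if (*p != c) return false;
    p++;
    return true;
}

/* operand ::= id | int, where id is one lowercase letter and int is one digit. */
static bool parse_operand(void) {
    skip_spaces();
    if (islower((unsigned char)*p) || isdigit((unsigned char)*p)) { p++; return true; }
    return false;
}

/* expr ::= id | int | expr + expr, rewritten as operand ('+' operand)*
   to remove the left recursion. */
static bool parse_expr(void) {
    if (!parse_operand()) return false;
    for (;;) {
        skip_spaces();
        if (*p != '+') return true;
        p++;
        if (!parse_operand()) return false;
    }
}

static bool parse_statement(void) {
    skip_spaces();
    if (p[0] == 'i' && p[1] == 'f') {   /* ifStmt ::= if ( expr ) statement */
        p += 2;
        return expect('(') && parse_expr() && expect(')') && parse_statement();
    }
    /* assignStmt ::= id = expr ; */
    if (!islower((unsigned char)*p)) return false;
    p++;
    return expect('=') && parse_expr() && expect(';');
}

/* program ::= statement | program statement, i.e. one or more statements. */
bool parse_program(const char *src) {
    p = src;
    do {
        if (!parse_statement()) return false;
        skip_spaces();
    } while (*p != '\0');
    return true;
}
```

For instance, parse_program("if(x)y=1+2;") accepts, while parse_program("a=b+;") rejects because the + is not followed by an operand. Note that the grammar as given has no >= operator and only single-digit integers, so the earlier example statement is outside this language.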

Why Separate the Scanner and Parser?
- Simplicity & separation of concerns
  - Scanner hides details from the parser (comments, whitespace, input files, etc.)
  - Parser is easier to build; it has a simpler input stream
- Efficiency
  - Scanner can use a simpler, faster design (but it still often consumes a surprising amount of the compiler's total execution time)

Principle of Longest Match
In most languages, the scanner should pick the longest possible string to make up the next token if there is a choice.
Example:
    return apple != banana;
This should be recognized as 5 tokens:
    return ID(apple) NEQ ID(banana) SEMI
Not more (not parts of words or identifiers, and not ! and = as separate tokens).
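For operators, the longest-match rule can be illustrated by trying every candidate and keeping the longest one that fits. The following is a minimal sketch, assuming a small hypothetical operator table and a made-up function name, longest_operator; a generated scanner would reach the same result through its DFA rather than a table search.

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the longest operator that is a prefix of s,
   or 0 if no operator in the table matches. */
size_t longest_operator(const char *s) {
    /* Illustrative table: "!" is a prefix of "!=", "<" of "<=". */
    static const char *ops[] = { "!", "!=", "=", "<", "<=", ";" };
    size_t best = 0;
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i]);
        /* Keep this candidate only if it matches and beats the current best. */
        if (n > best && strncmp(s, ops[i], n) == 0)
            best = n;
    }
    return best;
}
```

With the input "!= banana;" the function returns 2, so != is taken as one NEQ token rather than as ! followed by =.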

Scanner DFA Example (1)
Start state: loops on white space or comments.
- end of input: go to state 1, accept EOF
- ( : go to state 2, accept LP
- ) : go to state 3, accept RP
- ; : go to state 4, accept SEMI

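Scanner DFA Example (2)
Start state: loops on white space or comments.
- ! : go to state 5
  - = : go to state 6, accept NEQ
  - other: go to state 7, accept NOT
- < : go to state 8
  - = : go to state 9, accept LEQ
  - other: go to state 10, accept LESS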

Scanner DFA Example (3)
Start state: loops on white space or comments.
- [0-9] : go to state 11
  - [0-9] : stay in state 11
  - other: go to state 12, accept INT

Scanner DFA Example (4)
Start state: loops on white space or comments.
- [a-zA-Z] : go to state 13
  - [a-zA-Z] : stay in state 13
  - other: go to state 14, accept ID or keyword
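The four DFA fragments can be coded by hand as one function that branches on the first character and then loops or peeks one character ahead. The sketch below is an illustration only: the names Kind and next_token are assumptions, comments are not skipped (only white space), any character the fragments do not cover is reported as T_ERROR, and the keyword list is the one from the Tokens slide.

```c
#include <ctype.h>
#include <string.h>

typedef enum { T_EOF, T_LP, T_RP, T_SEMI, T_NEQ, T_NOT, T_LEQ, T_LESS,
               T_INT, T_ID, T_KEYWORD, T_ERROR } Kind;

/* Scan one token starting at *pp, advance *pp past it, and copy its text
   into lexeme (capacity cap >= 1, always NUL-terminated). */
Kind next_token(const char **pp, char *lexeme, size_t cap) {
    static const char *keywords[] = { "if", "while", "return", "goto" };
    const char *p = *pp;
    Kind kind;

    while (isspace((unsigned char)*p)) p++;   /* start-state loop (white space only) */
    const char *start = p;

    if (*p == '\0') { kind = T_EOF; }
    else if (*p == '(') { p++; kind = T_LP; }
    else if (*p == ')') { p++; kind = T_RP; }
    else if (*p == ';') { p++; kind = T_SEMI; }
    else if (*p == '!') {                     /* "!" followed by "=" or "other" */
        p++;
        if (*p == '=') { p++; kind = T_NEQ; } else { kind = T_NOT; }
    } else if (*p == '<') {                   /* "<" followed by "=" or "other" */
        p++;
        if (*p == '=') { p++; kind = T_LEQ; } else { kind = T_LESS; }
    } else if (isdigit((unsigned char)*p)) {  /* loop on [0-9], stop at "other" */
        while (isdigit((unsigned char)*p)) p++;
        kind = T_INT;
    } else if (isalpha((unsigned char)*p)) {  /* loop on [a-zA-Z], stop at "other" */
        while (isalpha((unsigned char)*p)) p++;
        kind = T_ID;
    } else { p++; kind = T_ERROR; }

    /* Copy the matched text, truncating if the buffer is too small. */
    size_t len = (size_t)(p - start);
    if (len >= cap) len = cap - 1;
    memcpy(lexeme, start, len);
    lexeme[len] = '\0';

    /* "ID or keyword": decide by table lookup after the word is complete. */
    if (kind == T_ID)
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strcmp(lexeme, keywords[i]) == 0) kind = T_KEYWORD;

    *pp = p;
    return kind;
}
```

The "other" transitions correspond to the places where the code stops without consuming the current character, which is why the pointer is advanced only inside the matching branches. Calling next_token repeatedly on "return apple != banana;" yields a keyword, an identifier, NEQ, an identifier and SEMI before T_EOF.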

Lex/Flex
- Use Flex instead of Lex
- Use Bison instead of yacc
- When compiling, link against the Lex library (-ll; Flex installations usually provide it as -lfl)

    flex file.lex
    gcc -o object lex.yy.c -ll
    ./object

Lex - Structure
    Declarations/Definitions
    %%
    Rules/Productions
      - Lex expression
      - white space
      - C statement (optional)
    %%
    Additional Code/Subroutines

Lex - Basic operators
- *     zero or more occurrences
- .     "ANY" character (except newline)
- .*    matches any sequence
- |     separator between alternatives
- +     one or more occurrences (a+ is equivalent to aa*)
- ?     zero or one of something (b? is equivalent to b | empty)
- [ ]   choice, so [12345] is equivalent to (1|2|3|4|5)
        (Note: [*+] represents a choice between star and plus; inside brackets they lose their special meaning.)
- -     range inside [ ]: [a-zA-Z] matches a to z and A to Z, all the letters
- \     escape: \* matches *, and \. matches a period or decimal point
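These operators can be tried out without running Lex, because POSIX extended regular expressions give them the same meaning. The sketch below assumes a POSIX system that provides <regex.h>; matches_whole is a hypothetical helper, and POSIX regular expressions are not identical to Lex patterns in general (Lex has additional features such as quoted strings and named definitions).

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true if the whole of text matches pattern (POSIX extended syntax).
   Returns false if the pattern is too long or does not compile. */
bool matches_whole(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;

    /* Wrap the pattern in ^( )$ so that it must match the entire string. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || n >= (int)sizeof anchored)
        return false;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For example, matches_whole("a+", "aaa") and matches_whole("aa*", "aaa") both succeed, matches_whole("[*+]", "*") succeeds because the star is literal inside brackets, and matches_whole("\\.", "x") fails because the escaped period matches only a period.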