CS 536 Spring 2001 Learning the Tools: JLex Lecture 6.

JLex: a scanner generator
- jlex specification (xxx.jlex) -> JLex.Main (java) -> generated scanner (xxx.jlex.java)
- xxx.jlex.java -> javac -> Yylex.class
- input program (test.sim) -> P2.main (java), using Yylex.class -> output of P2.main

P2.java: how to create and call the scanner

    import java.io.FileReader;
    import java.io.IOException;
    import java_cup.runtime.Symbol;

    public class P2 {
        public static void main(String[] args) throws IOException {
            // Open the input program named on the command line.
            FileReader inFile = new FileReader(args[0]);
            Yylex scanner = new Yylex(inFile);

            // Ask the scanner for tokens until it reports end of file.
            Symbol token = scanner.next_token();
            while (token.sym != sym.EOF) {
                switch (token.sym) {
                case sym.INTLITERAL:
                    System.out.println("INTLITERAL ("
                        + ((IntLitTokenVal)token.value).intVal + ")");
                    break;
                …
                }
                token = scanner.next_token();
            }
        }
    }

JLex Structure

    user code
    %%
    JLex directives
    %%
    regular expression rules

JLex specification file (xxx.jlex)

User code
- copied to xxx.jlex.java
- use it to define auxiliary classes and methods
%%
JLex directives
- include macro definitions
- use them to specify what letters, digits, and whitespace are
%%
Regular expression rules
- specify how to divide up the input into tokens
- each regular expression is followed by an action
- actions print error messages or return token codes
- there is no need to put characters back into the input (JLex does this)

Regular expression rules

    regular-expression   { action }

- regular-expression: the pattern to be matched
- action: the code to be executed when the pattern is matched

When the next_token() method is called, it repeats the following steps until a return in an action is executed:
- Find the longest sequence of characters in the input (starting with the current character) that matches a pattern.
- Perform the associated action (and consume the matched lexeme).

Matching rules
- If several patterns match at the current position, the pattern that matches the longest sequence of characters is chosen.
- If several patterns match the same (longest) sequence of characters, the first such pattern in the specification is chosen, so the order of the patterns can be important!
- If an input character is not matched by any pattern, the scanner throws an exception. Make sure that there can be no unmatched characters; otherwise the scanner will "crash" on bad input.
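The matching rules above can be sketched in plain Java. This is only an illustration built on java.util.regex, not the code JLex generates; the class name LongestMatchDemo, the rule names, and the rule set are all made up for the example. For these simple patterns the greedy java.util.regex match is also the longest match, which is not true of every pattern.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Hypothetical rules, kept in specification order (LinkedHashMap preserves it).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ASSIGN", Pattern.compile("="));
        RULES.put("EQUALS", Pattern.compile("=="));
        RULES.put("WHILE", Pattern.compile("while"));
        RULES.put("ID", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        RULES.put("WS", Pattern.compile("[ \\t\\n]+"));
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly greater: on a tie, the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule.getKey();
                }
            }
            if (bestName == null) {
                // No rule matches: mirror the "scanner crashes" behaviour.
                throw new IllegalStateException("unmatched character at " + pos);
            }
            if (!bestName.equals("WS")) {
                tokens.add(bestName + "(" + input.substring(pos, pos + bestLen) + ")");
            }
            pos += bestLen; // consume the matched lexeme
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("while whilex == ="));
    }
}
```

On the input "while whilex == =", the word "while" is matched equally well by WHILE and ID, so the earlier rule (WHILE) wins; "whilex" is longer as an ID; and "==" is preferred over "=" because it is the longer match.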

Regular expressions
Similar to those discussed in class.
- Most characters match themselves: abc, ==, while
- Characters in quotes, including special characters, match themselves; the only exception is \"
  - "a|b" matches a|b, not a or b
  - "a\"\"\tb" matches a""\tb, not a"" followed by a tab and b

Regular-expression operators
The traditional ones, plus the ? operator:
- |   means "or"
- *   means zero or more instances of
- +   means one or more instances of
- ?   means zero or one instance of
- ()  are used for grouping

More operators
- ^ matches the beginning of a line
  - ^main matches the string "main" only when it appears at the beginning of a line
- $ matches the end of a line
  - main$ matches the string "main" only when it appears at the end of a line
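These operators can be tried out with Java's own java.util.regex, which gives |, *, +, ?, (), ^ and $ the same meaning (with ^ and $ acting as line anchors only under the MULTILINE flag). This is a rough sketch for experimenting, not JLex itself; OperatorDemo and its two helper methods are hypothetical names.

```java
import java.util.regex.Pattern;

class OperatorDemo {
    // True if the whole string matches the regular expression.
    static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    // True if the regex is found somewhere in the text,
    // with ^ and $ treated as beginning-of-line and end-of-line anchors.
    static boolean foundOnSomeLine(String regex, String text) {
        return Pattern.compile(regex, Pattern.MULTILINE).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(whole("a|b", "b"));       // "or"
        System.out.println(whole("(ab)*", ""));      // zero or more
        System.out.println(whole("(ab)+", ""));      // one or more
        System.out.println(whole("ab?", "a"));       // zero or one
        System.out.println(foundOnSomeLine("^main", "int x;\nmain()"));
        System.out.println(foundOnSomeLine("main$", "main()"));
    }
}
```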

Character classes
- [abc] matches one character (either a or b or c)
- [a-z] matches any character between a and z, inclusive
- [^abc] matches any character except a, b, or c
  - ^ has special meaning only in the first position inside [...]
- [\t\\] matches tab or \
- [a bc] is equivalent to a|" "|b|c
  - whitespace in character classes and strings matches itself
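The character-class examples above behave the same way in java.util.regex, so they can be checked with a few lines of Java. This is a sketch for experimentation only; CharClassDemo and matchesOneChar are hypothetical names, and note that each backslash must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the single character c is matched by the character class.
    static boolean matchesOneChar(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(matchesOneChar("[abc]", 'b'));
        System.out.println(matchesOneChar("[a-z]", 'A'));
        System.out.println(matchesOneChar("[^abc]", 'd'));
        // ^ is an ordinary character when it is not in the first position.
        System.out.println(matchesOneChar("[a^bc]", '^'));
        // The Java literal "[\\t\\\\]" is the class [\t\\]: tab or backslash.
        System.out.println(matchesOneChar("[\\t\\\\]", '\\'));
        // A space inside a character class matches itself.
        System.out.println(matchesOneChar("[a bc]", ' '));
    }
}
```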

TEST YOURSELF #1
- Question 1: The character class [a-zA-Z] matches any letter. Write a character class that matches any letter or any digit.
- Question 2: Write a pattern that matches any Pascal identifier (a sequence of one or more letters and/or digits, starting with a letter).
- Question 3: Write a pattern that matches any Java identifier (a sequence of one or more letters and/or digits and/or underscores, starting with a letter or underscore).
- Question 4: Write a pattern that matches any Java identifier that does not end with an underscore.

JLex directives
- Specified in the second part of xxx.jlex.
- Directives can also specify (see the manual for details):
  - the value to be returned on end-of-file,
  - that line counting should be turned on, and
  - that the scanner will be used with the parser generator Java CUP.
- Directives include macro definitions (very useful):
      name = regular-expression
  where name is any valid Java identifier, for example:
      DIGIT= [0-9]
      LETTER= [a-zA-Z]
      WHITESPACE= [ \t\n]
- To use a macro, put its name inside curly braces:
      {LETTER}({LETTER}|{DIGIT})*
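One way to picture a macro use is as textual substitution: each {NAME} stands for the regular expression given in its definition. The sketch below does that substitution in plain Java and then tests the expanded pattern with java.util.regex. It is an illustration of the idea under that assumption, not a description of how JLex processes macros internally; MacroDemo and expand are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MacroDemo {
    // The three macro definitions from the slide, as regex source text.
    static final Map<String, String> MACROS = new LinkedHashMap<>();
    static {
        MACROS.put("DIGIT", "[0-9]");
        MACROS.put("LETTER", "[a-zA-Z]");
        MACROS.put("WHITESPACE", "[ \\t\\n]");
    }

    // Replace every {NAME} in the pattern with the text of its definition.
    static String expand(String pattern) {
        Matcher m = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}").matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String definition = MACROS.get(m.group(1));
            if (definition == null) {
                throw new IllegalArgumentException("undefined macro: " + m.group(1));
            }
            // quoteReplacement keeps backslashes in the definition literal.
            m.appendReplacement(out, Matcher.quoteReplacement(definition));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String id = expand("{LETTER}({LETTER}|{DIGIT})*");
        System.out.println(id);
        System.out.println(Pattern.matches(id, "x9y"));
        System.out.println(Pattern.matches(id, "9x"));
    }
}
```

The expanded identifier pattern is [a-zA-Z]([a-zA-Z]|[0-9])*, which accepts "x9y" and rejects "9x" because an identifier must start with a letter.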

TEST YOURSELF #2
- Question: Define a macro named NOTSPECIAL that matches any character except a newline, double quote, or backslash.

Comments
- You can include comments in the first and second parts of your JLex specification.
- Do not put comments in the third part: JLex would treat them as part of a pattern.
- Use Java comments: // …

A Small Example

    %%
    DIGIT= [0-9]
    LETTER= [a-zA-Z]
    WHITESPACE= [ \t\n] // space, tab, newline

    // for compatibility with java CUP
    %implements java_cup.runtime.Scanner
    %function next_token
    %type java_cup.runtime.Symbol

    // Turn on line counting
    %line
    …

Continued

    …
    %%
    {LETTER}({LETTER}|{DIGIT})*  {System.out.println(yyline+1 + ": ID " + yytext());}
    {DIGIT}+                     {System.out.println(yyline+1 + ": INT");}
    "="                          {System.out.println(yyline+1 + ": ASSIGN");}
    "=="                         {System.out.println(yyline+1 + ": EQUALS");}
    {WHITESPACE}*                { }
    .                            {System.out.println(yyline+1 + ": bad char");}

Another example (a snippet from sim.jlex)

    {DIGIT}+      { int val = Integer.parseInt(yytext());
                    Symbol S = new Symbol(sym.INTLITERAL,
                                   new IntLitTokenVal(yyline+1, CharNum.num, val));
                    CharNum.num += yytext().length();
                    return S;
                  }
    {WHITESPACE}+ { CharNum.num += yytext().length(); }
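The bookkeeping in this snippet (record the current character number in the token, then advance it by the length of the lexeme) can be imitated in a small stand-alone Java program. The sketch below is hand-written and handles only one line of digits, spaces, and tabs; IntTok is a hypothetical stand-in for the course's IntLitTokenVal, and the 1-based starting value of the character counter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class CharNumDemo {
    // Hypothetical stand-in for IntLitTokenVal: start column plus integer value.
    static final class IntTok {
        final int charNum;
        final int value;
        IntTok(int charNum, int value) {
            this.charNum = charNum;
            this.value = value;
        }
    }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isBlank(char c) { return c == ' ' || c == '\t'; }

    static List<IntTok> scan(String line) {
        List<IntTok> tokens = new ArrayList<>();
        int charNum = 1; // assumption: character numbers start at 1
        int i = 0;
        while (i < line.length()) {
            int start = i;
            if (isDigit(line.charAt(i))) {
                // Like {DIGIT}+ : take the longest run of digits.
                while (i < line.length() && isDigit(line.charAt(i))) i++;
                String lexeme = line.substring(start, i);
                tokens.add(new IntTok(charNum, Integer.parseInt(lexeme)));
                charNum += lexeme.length();
            } else if (isBlank(line.charAt(i))) {
                // Like {WHITESPACE}+ : no token, but the counter still advances.
                while (i < line.length() && isBlank(line.charAt(i))) i++;
                charNum += i - start;
            } else {
                throw new IllegalArgumentException("unmatched character at " + charNum);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (IntTok t : scan("12  345 6")) {
            System.out.println(t.charNum + ": INTLITERAL (" + t.value + ")");
        }
    }
}
```

Skipping the update in the whitespace branch would make every later token report a character number that is too small, which is why the whitespace rule in the snippet has a non-empty action.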