LANGUAGE TRANSLATORS: WEEK 14 LECTURE: REGULAR EXPRESSIONS FINITE STATE MACHINES LEXICAL ANALYSERS INTRO TO GRAMMAR THEORY TUTORIAL: CAPTURING LANGUAGES.

Slides:



Advertisements
Similar presentations
Application: Yacc A parser generator A context-free grammar An LR parser Yacc Yacc input file:... definitions... %... production rules... %... user-defined.
Advertisements

COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Lex -- a Lexical Analyzer Generator (by M.E. Lesk and Eric. Schmidt) –Given tokens specified as regular expressions, Lex automatically generates a routine.
Lexical Analysis Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!
Lexical and Syntactic Analysis Here, we look at two of the tasks involved in the compilation process –Given source code, we need to first break it into.
Lexical Analysis III Recognizing Tokens Lecture 4 CS 4318/5331 Apan Qasem Texas State University Spring 2015.
1 The scanning process Main goal: recognize words/tokens Snapshot: At any point in time, the scanner has read some input and is on the way to identifying.
Bottom-Up Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter (modified)
Context-Free Grammars Lecture 7
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial)
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
COS 320 Compilers David Walker. Outline Last Week –Introduction to ML Today: –Lexical Analysis –Reading: Chapter 2 of Appel.
Languages and Machines Unit two: Regular languages and Finite State Automata.
1 Material taught in lecture Scanner specification language: regular expressions Scanner generation using automata theory + extra book-keeping.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
CS 536 Spring Learning the Tools: JLex Lecture 6.
1 Introduction to Parsing Lecture 5. 2 Outline Regular languages revisited Parser overview Context-free grammars (CFG’s) Derivations.
CPSC 388 – Compiler Design and Construction Scanners – Finite State Automata.
1 Lexical Analysis - An Introduction Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at.
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Automated Parser Generation (via CUP)CUP 1. High-level structure JFlexjavac Lexer spec Lexical analyzer text tokens.java CUPjavac Parser spec.javaParser.
LANGUAGE TRANSLATORS: WEEK 3 LECTURE: Grammar Theory Introduction to Parsing Parser - Generators TUTORIAL: Questions on grammar theory WEEKLY WORK: Read.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
COMP Parsing 2 of 4 Lecture 22. How do we write programs to do this? The process of getting from the input string to the parse tree consists of.
Grammars CPSC 5135.
PART I: overview material
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Lexical Analysis (I) Compiler Baojian Hua
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
Lexical and Syntax Analysis
CSE 5317/4305 L2: Lexical Analysis1 Lexical Analysis Leonidas Fegaras.
Syntax Analysis Mooly Sagiv Textbook:Modern Compiler Design Chapter 2.2 (Partial) 1.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
Introduction to Parsing
CPS 506 Comparative Programming Languages Syntax Specification.
Copyright © Curt Hill Finite State Automata Again This Time No Output.
A Programming Languages Syntax Analysis (1)
Introduction to Compiling
Compiler Principles Fall Compiler Principles Lecture 6: Parsing part 5 Roman Manevich Ben-Gurion University.
Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up.
Com Functional Programming Lexical Analysis Marian Gheorghe Lecture 15 Module homepage Mole & ©University of Sheffieldcom2010.
Compiler Introduction 1 Kavita Patel. Outlines 2  1.1 What Do Compilers Do?  1.2 The Structure of a Compiler  1.3 Compilation Process  1.4 Phases.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
Lecture 3: Parsing CS 540 George Mason University.
Compiler Principles Fall Compiler Principles Lecture 5: Parsing part 4 Roman Manevich Ben-Gurion University.
Introduction Finite Automata accept all regular languages and only regular languages Even very simple languages are non regular (  = {a,b}): - {a n b.
Chapter 4: Syntax analysis Syntax analysis is done by the parser. –Detects whether the program is written following the grammar rules and reports syntax.
using Deterministic Finite Automata & Nondeterministic Finite Automata
1 Topic 2: Lexing and Flexing COS 320 Compiling Techniques Princeton University Spring 2016 Lennart Beringer.
Language Translation Part 2: Finite State Machines.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Deterministic Finite Automata Nondeterministic Finite Automata.
Department of Software & Media Technology
CS510 Compiler Lecture 2.
Lecture 2 Lexical Analysis
Finite-State Machines (FSMs)
Finite-State Machines (FSMs)
Regular Languages.
Formal Language Theory
CS 363 Comparative Programming Languages
Department of Software & Media Technology
Lexical and Syntax Analysis
Chapter 7 Regular Grammars
Review: Compiler Phases:
Lecture 4: Lexical Analysis & Chomsky Hierarchy
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
Lecture 5 Scanning.
Presentation transcript:

LANGUAGE TRANSLATORS: WEEK 14 LECTURE: REGULAR EXPRESSIONS FINITE STATE MACHINES LEXICAL ANALYSERS INTRO TO GRAMMAR THEORY TUTORIAL: CAPTURING LANGUAGES USING REGULAR EXPRESSIONS

LEXICAL ANALYSIS n Is the first step in the translation/compilation process input language ====> output language n means putting the raw characters of the input into TOKENS.

LEXICAL ANALYSIS PHASE n The language of TOKENS e.g. Identifiers is always a regular language. n REGULAR EXPRESSIONS generate regular languages (as do Regular Grammars..) The tokens of languages are often specified by regular expressions. n Finite State Machines consume regular languages

REGULAR EXPRESSIONS n One line method of specifying a language n equivalent to `type 3’ or regular grammars n used to parameterize UNIX/LINUX file processing commands

REGULAR EXPRESSIONS - DEFINITION EXAMPLE DEFINITION a | b ‘|’ means choice a | b | c = [abc] ‘[..]’ is shorthand for multiple choice  ‘  ‘ means the empty word (abc)* ‘*’ means repetition 0,1 or more.. (abcd)+ ‘+’ means repetition 1 or more times

REGULAR EXPRESSIONS - EXAMPLES n [a - z A - Z][a - z A - Z 0 - 9]* defines the language of IDENTIFIERS in some programming languages n (xyz)* defines the language { , xyz, xyzxyz, xyzxyzxyz,..} n [abcd]+ defines the language {a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca,..} Putting choice and repetition together produces complicated regular languages

Finite State Machines n Can be defined by annotated nodes and arcs. n Can translate Reg. Exps into FSMs but must add ERROR STATES onto the FSMs

Regular Expression ==> NDFSM ab [ab] a* then NDFSM ==> FSM.. a b a b a

Example n Specify a language of alphabet { w,x,y,z} with the only restrictions being that u 1. no strings contain both x and y, and u 2. If there is a y and w in a string, then the first w ALWAYS occurs before the first y SOLUTION: Write down exs and counter exs Decide on any ambiguities 3.. Use Case Analysis to sub-divide the problem language = (a) strings of { w,x,z} UNION (b) strings of { w,y,z} with restriction 2. - Part (a): = [w x z]+ - Part (b): can assume y is always in a string = [y z]+ | z* w [wz]* y [x y z]* -. Put together answer = [w x z]+ | [y z]+ | z* w [wz]* y [x y z]*

A LEXICAL ANALYSER - GENERATOR (e.g. LEX, JLEX) - how they work n INPUT REGULAR EXPRESSIONS n TRANSLATE REGULAR EXPRESSION INTO NON-DETERMINISTIC FSM n TRANSLATE NON-DETERMINISTIC FSM INTO DETERMINISTIC FSM (which is easily described as a simple program)

EXAMPLE INPUT TO A LEXICAL ANALYSER - GENERATOR % ";" { return new Symbol(sym.SEMI); } "+" { return new Symbol(sym.PLUS); } "*" { return new Symbol(sym.TIMES); } "(" { return new Symbol(sym.LPAREN); } ")" { return new Symbol(sym.RPAREN); } [0-9]+ { return new Symbol(sym.NUMBER, new Integer(yytext())); } [ \t\r\n\f] { /* ignore white space. */ }. { System.err.println("Illegal character: "+yytext()); } example; if string (231+3)*3 was input to the generated lexical analyser the output would be: LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN TIMES (NUMBER,3)

Simple Lexical Analyser public class scanner { protected static int next_char; protected static void advance() throws java.io.IOException { next_char = System.in.read(); } public static void init() throws java.io.IOException { advance(); } public static Symbol next_token() throws java.io.IOException { for (;;) switch (next_char) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': /* parse a decimal integer */ int i_val = 0; do { i_val = i_val * 10 + (next_char - '0'); advance(); } while (next_char >= '0' && next_char <= '9'); return new Symbol(sym.INT, new Integer(i_val)); case 'p': advance(); return new Symbol(sym.PRINT); case 'r': advance(); return new Symbol(sym.REPEAT); case 'u': advance(); return new Symbol(sym.UNTIL); case '=': advance(); return new Symbol(sym.ASSIGNS); case ';': advance(); return new Symbol(sym.SEMI); case '+': advance(); return new Symbol(sym.PLUS); case '-': advance(); return new Symbol(sym.MINUS); case '(': advance(); return new Symbol(sym.LPAREN); case ')': advance(); return new Symbol(sym.RPAREN); case 'x': advance(); return new Symbol(sym.ID,"x"); case 'y': advance(); return new Symbol(sym.ID,"y"); case 'z': advance(); return new Symbol(sym.ID,"z"); case -1: return new Symbol(sym.EOF); default: advance(); break; } } };

Introduction to Grammar Theory n Grammars can be used to generate the syntax of all formal languages – the structural complexity of a language is determined by the simplest grammar that can generate it. n In order to create parsers, we are interested in “properties of grammars”. For example, the “first set” of a string w of terminals and non-terminals is the set of TERMINAL symbols (tokens) that may be at the front of ANY string derived from w using the grammar rules.

Summary: n Regular expressions are a quick and easy way to specify simple forms of language. They can be easily translated into FSMs (which have nice properties e.g. they have linear time complexity in their execution) n There are tools (JLEX) which input regular expressions and output a lexical analyser which recognises the language they define.