LANGUAGE TRANSLATORS: WEEK 14 LECTURE: REGULAR EXPRESSIONS FINITE STATE MACHINES LEXICAL ANALYSERS INTRO TO GRAMMAR THEORY TUTORIAL: CAPTURING LANGUAGES.


1 LANGUAGE TRANSLATORS: WEEK 14
LECTURE: REGULAR EXPRESSIONS, FINITE STATE MACHINES, LEXICAL ANALYSERS, INTRO TO GRAMMAR THEORY
TUTORIAL: CAPTURING LANGUAGES USING REGULAR EXPRESSIONS

2 LEXICAL ANALYSIS
- Is the first step in the translation/compilation process: input language ====> output language.
- Means grouping the raw characters of the input into TOKENS (e.g. identifiers, numbers, operators).

3 LEXICAL ANALYSIS PHASE
- The language of each kind of TOKEN, e.g. identifiers, is normally a regular language.
- REGULAR EXPRESSIONS generate regular languages (as do regular grammars), so the tokens of a language are often specified by regular expressions.
- Finite State Machines recognise regular languages.
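Here is a small illustration of token classes specified by regular expressions. The Java sketch below uses the standard `java.util.regex` package to decide which class a lexeme belongs to. The class names and the keyword list (`if`, `while`, `return`) are hypothetical. The IDENTIFIER pattern is the one given in these slides. The classes are tried in a fixed order so that a keyword is not reported as an identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class TokenClasses {
    // Hypothetical token classes, each specified by a regular expression.
    // LinkedHashMap keeps insertion order, so earlier classes take priority.
    static final Map<String, Pattern> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("KEYWORD", Pattern.compile("if|while|return"));
        CLASSES.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        CLASSES.put("NUMBER", Pattern.compile("[0-9]+"));
    }

    // Returns the first class whose regex matches the WHOLE lexeme, or "ERROR" if none does.
    static String classify(String lexeme) {
        for (Map.Entry<String, Pattern> e : CLASSES.entrySet()) {
            if (e.getValue().matcher(lexeme).matches()) {
                return e.getKey();
            }
        }
        return "ERROR";
    }

    public static void main(String[] args) {
        for (String lexeme : new String[] {"while", "x25", "231", "2x"}) {
            System.out.println(lexeme + " -> " + classify(lexeme));
        }
    }
}
```

This sketch prints `while -> KEYWORD`, `x25 -> IDENTIFIER`, `231 -> NUMBER` and `2x -> ERROR`.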

4 REGULAR EXPRESSIONS
- A one-line method of specifying a language
- Equivalent to 'type 3' (regular) grammars
- Used to parameterize UNIX/Linux file-processing commands (e.g. grep)

5 REGULAR EXPRESSIONS - DEFINITION
EXAMPLE              DEFINITION
a | b                '|' means choice
a | b | c = [abc]    '[..]' is shorthand for multiple choice
ε                    'ε' means the empty word
(abc)*               '*' means repetition 0, 1 or more times
(abcd)+              '+' means repetition 1 or more times
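Java's `java.util.regex` package writes these operators the same way, so the definitions can be checked directly. This is a minimal sketch that uses `Pattern.matches` as the membership test, because it asks whether the *whole* string matches. Java has no symbol for ε, so the empty word is simply the empty string `""`. Java regexes also offer features such as back-references, which go beyond regular languages. Only the operators from the table are used here.

```java
import java.util.regex.Pattern;

class RegexOperators {
    // True if the whole string s belongs to the language of the regular expression re.
    static boolean in(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        System.out.println(in("a|b", "b"));                       // choice: true
        System.out.println(in("a|b", "ab"));                      // choice picks ONE alternative: false
        System.out.println(in("[abc]", "c") && in("a|b|c", "c")); // [abc] is shorthand for a|b|c: true
        System.out.println(in("(abc)*", ""));                     // * allows zero repetitions (the empty word): true
        System.out.println(in("(abc)*", "abcabc"));               // two repetitions: true
        System.out.println(in("(abcd)+", ""));                    // + needs at least one repetition: false
        System.out.println(in("(abcd)+", "abcd"));                // one repetition: true
    }
}
```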

6 REGULAR EXPRESSIONS - EXAMPLES
- [a-zA-Z][a-zA-Z0-9]* defines the language of IDENTIFIERS in some programming languages
- (xyz)* defines the language {ε, xyz, xyzxyz, xyzxyzxyz, ...}
- [abcd]+ defines the language {a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca, ...}
Putting choice and repetition together produces complicated regular languages.
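The sets listed above can be reproduced mechanically. The method is to generate all strings over an alphabet in order of length and keep those that the regular expression matches. This brute-force sketch is for illustration only. The helper `members` and its parameters are invented here, and its running time grows exponentially with the maximum length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class EnumerateLanguage {
    // Lists the strings over `alphabet` (assumed sorted) of length 0..maxLen that belong to the
    // language of `regex`, shortest first and alphabetically within a length, up to `limit` strings.
    static List<String> members(String regex, String alphabet, int maxLen, int limit) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        List<String> layer = List.of(""); // all strings of the current length; starts with the empty word
        for (int len = 0; len <= maxLen && out.size() < limit; len++) {
            List<String> next = new ArrayList<>();
            for (String s : layer) {
                if (out.size() < limit && p.matcher(s).matches()) {
                    out.add(s);
                }
                if (len < maxLen) {
                    for (char c : alphabet.toCharArray()) {
                        next.add(s + c); // extend every string by one symbol
                    }
                }
            }
            layer = next;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(members("(xyz)*", "xyz", 9, 10));
        System.out.println(members("[abcd]+", "abcd", 2, 9));
    }
}
```

The first call lists the empty word, xyz, xyzxyz and xyzxyzxyz. The second call gives a, b, c, d, aa, ab, ac, ad, ba, which is the same order as the slide.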

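7 Finite State Machines
- Can be defined by annotated nodes (states) and arcs (transitions labelled with input symbols).
- Regular expressions can be translated into FSMs, but ERROR STATES must be added to the FSMs so that every state has an arc for every input symbol and invalid input is rejected.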
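Below is a sketch of the FSM for the identifier expression `[a-zA-Z][a-zA-Z0-9]*`, written in Java as a transition function. The state names are invented for illustration. `ERROR` is the added error state. Every (state, character) pair gets an arc, and once the machine enters `ERROR` it never leaves, so the input is rejected.

```java
class IdentifierFsm {
    // States of the FSM for [a-zA-Z][a-zA-Z0-9]*; ERROR is the added error (trap) state.
    enum State { START, IN_ID, ERROR }

    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Transition function: defined for every (state, character) pair, which is why ERROR is needed.
    static State step(State s, char c) {
        switch (s) {
            case START: return isLetter(c) ? State.IN_ID : State.ERROR;
            case IN_ID: return (isLetter(c) || isDigit(c)) ? State.IN_ID : State.ERROR;
            default:    return State.ERROR; // once in ERROR, stay there
        }
    }

    // Runs the FSM over the whole input; IN_ID is the only accepting state.
    static boolean accepts(String input) {
        State s = State.START;
        for (char c : input.toCharArray()) {
            s = step(s, c);
        }
        return s == State.IN_ID;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"x25", "25x", "", "a_b"}) {
            System.out.println("\"" + w + "\" " + accepts(w));
        }
    }
}
```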

8 Regular Expression ==> NDFSM
[Figure: NDFSMs for the regular expressions ab, [ab] and a*, with arcs labelled a and b]
Then NDFSM ==> FSM.
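The NDFSM ==> FSM step is usually done by the subset construction. Each deterministic state stands for the set of non-deterministic states the machine could be in. The sketch below applies it to a small hypothetical NDFSM for `[ab]*ab` (strings over {a, b} ending in ab). This NDFSM was chosen because its state 0 has two arcs labelled `a`. The sketch assumes an NDFSM without ε-arcs. An NDFSM built from a regular expression usually has them, and a full version would also take the ε-closure of each set. If a set of target states came out empty, that deterministic state would play the role of the error state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SubsetConstruction {
    // Hypothetical NDFSM for [ab]*ab: 0 --a,b--> 0, 0 --a--> 1, 1 --b--> 2; state 2 accepts.
    static final int NFA_START = 0;
    static final Set<Integer> NFA_ACCEPT = Set.of(2);
    static final char[] ALPHABET = {'a', 'b'};

    // The set of NDFSM states reachable from `state` on input c (this NDFSM has no epsilon-arcs).
    static Set<Integer> nfaMove(int state, char c) {
        if (state == 0 && c == 'a') return Set.of(0, 1);
        if (state == 0 && c == 'b') return Set.of(0);
        if (state == 1 && c == 'b') return Set.of(2);
        return Set.of();
    }

    // Deterministic state i is the set dfaStates.get(i); its arcs are dfaMoves.get(i).
    final List<Set<Integer>> dfaStates = new ArrayList<>();
    final List<Map<Character, Integer>> dfaMoves = new ArrayList<>();
    private final Map<Set<Integer>, Integer> index = new HashMap<>();
    private final Deque<Set<Integer>> work = new ArrayDeque<>();

    SubsetConstruction() {
        stateFor(Set.of(NFA_START)); // deterministic state 0 = {start}
        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();
            int from = index.get(current);
            for (char c : ALPHABET) {
                Set<Integer> target = new TreeSet<>();
                for (int q : current) target.addAll(nfaMove(q, c));
                dfaMoves.get(from).put(c, stateFor(target));
            }
        }
    }

    // Returns the number of the deterministic state for `set`, creating and queueing it if new.
    private int stateFor(Set<Integer> set) {
        Integer existing = index.get(set);
        if (existing != null) return existing;
        int id = dfaStates.size();
        index.put(set, id);
        dfaStates.add(set);
        dfaMoves.add(new HashMap<>());
        work.add(set);
        return id;
    }

    // Runs the deterministic machine; accepts if the final set contains an accepting NDFSM state.
    boolean accepts(String input) {
        int s = 0;
        for (char c : input.toCharArray()) {
            Integer next = dfaMoves.get(s).get(c);
            if (next == null) return false; // symbol not in the alphabet
            s = next;
        }
        for (int q : dfaStates.get(s)) {
            if (NFA_ACCEPT.contains(q)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SubsetConstruction dfa = new SubsetConstruction();
        System.out.println(dfa.dfaStates);
        for (String w : new String[] {"ab", "bab", "abb", ""}) {
            System.out.println("\"" + w + "\" " + dfa.accepts(w));
        }
    }
}
```

For this NDFSM the construction yields three deterministic states: {0}, {0,1} and {0,2}. Only {0,2} is accepting, because it contains NDFSM state 2.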

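9 Example
Specify a language over the alphabet {w, x, y, z} with the only restrictions being that:
  1. no string contains both x and y, and
  2. if a string contains both a y and a w, then the first w ALWAYS occurs before the first y.
SOLUTION:
1. Write down examples and counter-examples.
2. Decide on any ambiguities (here the empty string is excluded).
3. Use case analysis to sub-divide the problem: language = (a) strings of {w, x, z} UNION (b) strings of {w, y, z} with restriction 2.
- Part (a): = [w x z]+
- Part (b): we can assume y is always in the string (strings without y are covered by part (a)). Either there is no w, or the prefix before the first w contains only z, only w and z occur between the first w and the first y, and any of w, y, z may follow the first y (x is excluded by restriction 1): = [y z]+ | z* w [w z]* y [w y z]*
Put together: answer = [w x z]+ | [y z]+ | z* w [w z]* y [w y z]*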
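Answers like this are easy to get subtly wrong, so it is worth checking the final expression against the two restrictions by brute force. The sketch below enumerates every non-empty string over {w, x, y, z} up to a given length. It reports any string on which the regular expression and a direct test of the restrictions disagree. The helpers `satisfies` and `mismatches` are invented for illustration. Like the answer's `+`, the sketch treats the empty string as excluded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CheckRestrictions {
    // The answer, with [wyz]* as the tail after the first y (spaces removed for Java syntax).
    static final Pattern ANSWER = Pattern.compile("[wxz]+|[yz]+|z*w[wz]*y[wyz]*");

    // The two restrictions tested directly; the empty string is excluded.
    static boolean satisfies(String s) {
        if (s.isEmpty()) return false;
        boolean hasX = s.indexOf('x') >= 0, hasY = s.indexOf('y') >= 0, hasW = s.indexOf('w') >= 0;
        if (hasX && hasY) return false;                                    // restriction 1
        if (hasY && hasW && s.indexOf('w') > s.indexOf('y')) return false; // restriction 2
        return true;
    }

    // Every string over {w,x,y,z} of length 1..maxLen on which the regex and the restrictions disagree.
    static List<String> mismatches(int maxLen) {
        List<String> bad = new ArrayList<>();
        List<String> layer = List.of("");
        for (int len = 1; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : layer) {
                for (char c : "wxyz".toCharArray()) next.add(s + c);
            }
            for (String s : next) {
                if (ANSWER.matcher(s).matches() != satisfies(s)) bad.add(s);
            }
            layer = next;
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(mismatches(6));
        // A tail of [xyz]* instead of [wyz]* would accept "wyx", which breaks restriction 1:
        System.out.println(Pattern.matches("z*w[wz]*y[xyz]*", "wyx") + " " + satisfies("wyx"));
    }
}
```

`mismatches(6)` should come back empty. The last line shows why the tail after the first y must be `[w y z]*` rather than `[x y z]*`. The latter accepts `wyx`, which contains both x and y.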

10 A LEXICAL ANALYSER - GENERATOR (e.g. Lex, JLex) - how they work
- INPUT REGULAR EXPRESSIONS
- TRANSLATE THE REGULAR EXPRESSIONS INTO A NON-DETERMINISTIC FSM
- TRANSLATE THE NON-DETERMINISTIC FSM INTO A DETERMINISTIC FSM (which is easily described as a simple program)

11 EXAMPLE INPUT TO A LEXICAL ANALYSER - GENERATOR

    %%
    ";"           { return new Symbol(sym.SEMI); }
    "+"           { return new Symbol(sym.PLUS); }
    "*"           { return new Symbol(sym.TIMES); }
    "("           { return new Symbol(sym.LPAREN); }
    ")"           { return new Symbol(sym.RPAREN); }
    [0-9]+        { return new Symbol(sym.NUMBER, Integer.valueOf(yytext())); }
    [ \t\r\n\f]   { /* ignore white space. */ }
    .             { System.err.println("Illegal character: " + yytext()); }

Example: if the string (231+3)*3 is input to the generated lexical analyser, the output is:
LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN TIMES (NUMBER,3)
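To experiment without installing JLex, the effect of the specification above can be approximated with `java.util.regex` alone. This is a sketch, not what JLex generates. JLex builds a deterministic FSM and picks the longest match, breaking ties by rule order. This version instead tries the alternatives in rule order at each position. For these particular rules the two strategies give the same tokens. The class name `RegexLexer` is invented, and its output format is chosen to mirror the example output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLexer {
    // One named group per rule, in the same order as the specification.
    private static final Pattern RULES = Pattern.compile(
            "(?<SEMI>;)|(?<PLUS>\\+)|(?<TIMES>\\*)|(?<LPAREN>\\()|(?<RPAREN>\\))"
            + "|(?<NUMBER>[0-9]+)|(?<WS>[ \\t\\r\\n\\f])|(?<ILLEGAL>.)",
            Pattern.DOTALL);
    private static final String[] NAMES =
            {"SEMI", "PLUS", "TIMES", "LPAREN", "RPAREN", "NUMBER", "WS", "ILLEGAL"};

    // Returns the token stream for `input` in the format used on the slide.
    static String tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) { // not expected: ILLEGAL matches any single character
                throw new IllegalStateException("no rule matched at " + pos);
            }
            for (String name : NAMES) {
                String text = m.group(name);
                if (text == null) continue;
                if (name.equals("NUMBER")) out.add("(NUMBER," + text + ")");
                else if (name.equals("ILLEGAL")) System.err.println("Illegal character: " + text);
                else if (!name.equals("WS")) out.add(name); // white space is ignored
                break;
            }
            pos = m.end();
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(231+3)*3"));
        // prints: LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN TIMES (NUMBER,3)
    }
}
```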

12 Simple Lexical Analyser

    import java_cup.runtime.Symbol; // Symbol is from the CUP runtime; sym is the CUP-generated token class

    public class scanner {
      /* the next input character, or -1 at end of input */
      protected static int next_char;

      /* advance the input by one character */
      protected static void advance() throws java.io.IOException {
        next_char = System.in.read();
      }

      /* initialise the scanner by reading the first character */
      public static void init() throws java.io.IOException { advance(); }

      /* recognise and return the next token */
      public static Symbol next_token() throws java.io.IOException {
        for (;;)
          switch (next_char) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
              /* parse a decimal integer */
              int i_val = 0;
              do {
                i_val = i_val * 10 + (next_char - '0');
                advance();
              } while (next_char >= '0' && next_char <= '9');
              return new Symbol(sym.INT, Integer.valueOf(i_val));
            /* keywords are the single letters p (print), r (repeat) and u (until) */
            case 'p': advance(); return new Symbol(sym.PRINT);
            case 'r': advance(); return new Symbol(sym.REPEAT);
            case 'u': advance(); return new Symbol(sym.UNTIL);
            case '=': advance(); return new Symbol(sym.ASSIGNS);
            case ';': advance(); return new Symbol(sym.SEMI);
            case '+': advance(); return new Symbol(sym.PLUS);
            case '-': advance(); return new Symbol(sym.MINUS);
            case '(': advance(); return new Symbol(sym.LPAREN);
            case ')': advance(); return new Symbol(sym.RPAREN);
            /* the only identifiers are the single letters x, y and z */
            case 'x': advance(); return new Symbol(sym.ID, "x");
            case 'y': advance(); return new Symbol(sym.ID, "y");
            case 'z': advance(); return new Symbol(sym.ID, "z");
            case -1: return new Symbol(sym.EOF);
            /* skip white space and any other unexpected character */
            default: advance(); break;
          }
      }
    }

13 Introduction to Grammar Theory
- Grammars can be used to generate the syntax of formal languages. The structural complexity of a language is determined by the simplest (most restricted) type of grammar that can generate it.
- In order to create parsers, we are interested in "properties of grammars". For example, the "first set" of a string w of terminals and non-terminals is the set of TERMINAL symbols (tokens) that can appear at the front of some string derived from w using the grammar rules (plus ε if w can derive the empty string).
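To make the definition concrete, the sketch below computes FIRST sets for a small expression grammar by fixed-point iteration. It keeps adding FIRST of each right-hand side to FIRST of its left-hand side until nothing changes. The grammar is a hypothetical textbook example, not one from this lecture. As in the definition above, ε is included in a set when the string can derive the empty string.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSets {
    static final String EPS = "\u03b5"; // the empty string, epsilon

    // Hypothetical grammar: E -> T E'   E' -> + T E' | eps   T -> F T'   T' -> * F T' | eps   F -> ( E ) | num
    // Keys are the non-terminals; every other symbol is a terminal. An empty list is an epsilon-production.
    static final Map<String, List<List<String>>> G = new LinkedHashMap<>();
    static {
        G.put("E", List.of(List.of("T", "E'")));
        G.put("E'", List.of(List.of("+", "T", "E'"), List.of()));
        G.put("T", List.of(List.of("F", "T'")));
        G.put("T'", List.of(List.of("*", "F", "T'"), List.of()));
        G.put("F", List.of(List.of("(", "E", ")"), List.of("num")));
    }

    static final Map<String, Set<String>> FIRST = new HashMap<>();

    // FIRST of a string w of symbols, using the current FIRST sets of the non-terminals.
    static Set<String> firstOf(List<String> w) {
        Set<String> result = new TreeSet<>();
        for (String sym : w) {
            if (!G.containsKey(sym)) { // a terminal is always first
                result.add(sym);
                return result;
            }
            Set<String> f = FIRST.get(sym);
            for (String t : f) {
                if (!t.equals(EPS)) result.add(t);
            }
            if (!f.contains(EPS)) return result; // sym cannot vanish, so nothing after it can be first
        }
        result.add(EPS); // every symbol in w can derive epsilon (or w is empty)
        return result;
    }

    // Fixed-point iteration: add FIRST(rhs) to FIRST(A) for every rule A -> rhs until nothing changes.
    static void compute() {
        for (String a : G.keySet()) FIRST.put(a, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : G.entrySet()) {
                for (List<String> rhs : rule.getValue()) {
                    changed |= FIRST.get(rule.getKey()).addAll(firstOf(rhs));
                }
            }
        }
    }

    public static void main(String[] args) {
        compute();
        for (String a : G.keySet()) System.out.println("FIRST(" + a + ") = " + FIRST.get(a));
        System.out.println("FIRST(T' E') = " + firstOf(List.of("T'", "E'")));
    }
}
```

For this grammar the results are FIRST(E) = FIRST(T) = FIRST(F) = {(, num}, FIRST(E') = {+, ε} and FIRST(T') = {*, ε}. For the string T' E', `firstOf` gives {*, +, ε}, since both symbols can vanish.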

14 Summary:
- Regular expressions are a quick and easy way to specify simple forms of language. They can be easily translated into FSMs, which have useful properties: e.g. a deterministic FSM processes an input of length n in time linear in n.
- There are tools (e.g. JLex) which input regular expressions and output a lexical analyser that recognises the language they define.

