January 18, 2016, Azusa, CA. Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University.

1 CS400 Compiler Construction
Sheldon X. Liang, Ph.D., Computer Science at Azusa Pacific University
January 18, 2016, Azusa, CA
Azusa Pacific University, Azusa, CA 91702, Tel: (800) 825-5278
Department of Computer Science

2 Lexical Analysis & Lexical Analyzer Generators
Formalization
– Regular Expressions
– Finite Automata
– Conversion of RE to FA
– Lexer Design

3 Keep the following questions in mind
Regular Expressions
– a concise and flexible means for identifying strings of text
– written in a formal language
– interpreted by a RegEx processor
Why RegEx
– Precise definition of language
– Layered definition of language
– Lexical/Syntax/Semantic
Further use of RegEx
– Supportive foundation of Lexer
– Formal communication
– Common application

4 Why: Language Definition Problem
How to precisely define language
Layered structure of language definition
– Start with a set of letters in language
– Lexical structure: identifies "words" in language (each word is a sequence of letters)
– Syntactic structure: identifies "sentences" in language (each sentence is a sequence of words)
– Semantics: meaning of program (specifies what result should be for each input)
– Today's topic: lexical and syntactic structures

5 Specification of Patterns for Tokens: Regular Expressions
Basis symbols:
– ε is a regular expression denoting language {ε}
– a ∈ Σ is a regular expression denoting {a}
If r and s are regular expressions denoting languages L(r) and M(s) respectively, then
– r | s is a regular expression denoting L(r) ∪ M(s)
– rs is a regular expression denoting L(r)M(s)
– r* is a regular expression denoting L(r)*
– (r) is a regular expression denoting L(r)
A language defined by a regular expression is called a regular set
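
A short worked example may help make the rules above concrete. It assumes the alphabet Σ = {a, b} and the usual precedence convention (star binds tightest, then concatenation, then alternation); neither assumption is stated on the slide.

```latex
\begin{align*}
\Sigma &= \{a, b\} \\
L(a \mid b) &= L(a) \cup L(b) = \{a, b\} \\
L\bigl((a \mid b)(a \mid b)\bigr) &= \{aa, ab, ba, bb\} \\
L(a^{*}) &= \{\varepsilon, a, aa, aaa, \dots\} \\
L\bigl((a \mid b)^{*}\bigr) &= \text{all strings over } \{a, b\}, \text{ including } \varepsilon \\
L(a \mid a^{*}b) &= \{a\} \cup \{b, ab, aab, aaab, \dots\}
\end{align*}
```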

8 Specification of Patterns for Tokens: Regular Definitions
Regular definitions introduce a naming convention:
d_1 → r_1
d_2 → r_2
…
d_n → r_n
where each r_i is a regular expression over Σ ∪ {d_1, d_2, …, d_{i-1}}
Any d_j in r_i can be textually substituted in r_i to obtain an equivalent set of definitions

9 Specification of Patterns for Tokens: Regular Definitions
Example:
letter → A | B | … | Z | a | b | … | z
digit → 0 | 1 | … | 9
id → letter ( letter | digit )*
Regular definitions are not recursive:
digits → digit digits | digit    (wrong!)
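
The id definition above can be checked by hand with a few lines of C. This is a minimal sketch, not code from the deck: the function names `is_letter`, `is_digit` and `is_id` are hypothetical, and the letters are assumed to be ASCII only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* letter -> A | B | ... | Z | a | b | ... | z   (ASCII only, by assumption) */
static bool is_letter(unsigned char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* digit -> 0 | 1 | ... | 9 */
static bool is_digit(unsigned char c) {
    return c >= '0' && c <= '9';
}

/* id -> letter ( letter | digit )*
 * Hypothetical helper: true when the whole string s is an id. */
bool is_id(const char *s) {
    assert(s != NULL);
    if (!is_letter((unsigned char)s[0]))
        return false;               /* must start with a letter; rejects "" */
    for (size_t i = 1; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!is_letter(c) && !is_digit(c))
            return false;           /* only letters and digits may follow */
    }
    return true;
}
```

The loop plays the role of the star: it accepts zero or more further letters or digits, so no recursive definition is needed.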

10 Specification of Patterns for Tokens: Notational Shorthand
The following shorthands are often used:
r+ = rr*
r? = r | ε
[a-z] = a | b | c | … | z
Examples:
digit → [0-9]
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
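
The num definition maps almost directly onto a POSIX extended regular expression. The sketch below assumes a POSIX system that provides `<regex.h>` (it is not part of ISO C); the helper name `is_num` and the `^`/`$` anchors are additions for illustration and are not part of the slide.

```c
#include <assert.h>
#include <regex.h>     /* POSIX, not ISO C: assumed to be available */
#include <stdbool.h>
#include <stddef.h>

/* num -> digit+ (. digit+)? ( E (+|-)? digit+ )?
 * written as a POSIX extended regular expression. ^ and $ anchor the
 * pattern so that the whole string must be a num. */
static const char *NUM_PATTERN = "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$";

/* Hypothetical helper: true when s matches the num definition. */
bool is_num(const char *s) {
    regex_t re;
    assert(s != NULL);
    if (regcomp(&re, NUM_PATTERN, REG_EXTENDED | REG_NOSUB) != 0)
        return false;               /* pattern failed to compile */
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;                 /* 0 means the string matched */
}
```

With this pattern, strings such as "42", "3.14" and "6.02E23" are accepted, while "3." and ".5" are rejected because each optional part requires at least one digit.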

11 Regular Definitions and Grammars
Grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
Regular definitions:
if → if
then → then
else → else
relop → < | <= | <> | > | >= | =
id → letter ( letter | digit )*
num → digit+ (. digit+)? ( E (+ | -)? digit+ )?
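
Note that the lexemes if, then and else also match the id definition. One common way to resolve the overlap, assumed here rather than taken from the slide, is to scan a word as an id first and then look it up in a small keyword table. The enum `token_kind` and the function `classify_word` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token names are hypothetical; the slides do not define an enum. */
typedef enum { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID } token_kind;

/* Classify a lexeme that already matched letter ( letter | digit )*.
 * The keyword definitions if -> if, then -> then, else -> else are
 * given priority over id, since both patterns match the same lexeme. */
token_kind classify_word(const char *lexeme) {
    static const struct { const char *text; token_kind kind; } keywords[] = {
        { "if", TOK_IF }, { "then", TOK_THEN }, { "else", TOK_ELSE },
    };
    assert(lexeme != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].kind;    /* exact match: reserved word */
    }
    return TOK_ID;                      /* anything else is an identifier */
}
```

The parser for the grammar then sees TOK_IF, TOK_THEN and TOK_ELSE as distinct terminals, and every other word as id.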

12 Coding Regular Definitions in Transition Diagrams

13 Coding Regular Definitions in Transition Diagrams: Code
token nexttoken()
{
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();
            if (c==blank || c==tab || c==newline) {
                state = 0;
                lexeme_beginning++;
            }
            else if (c=='<') state = 1;
            else if (c=='=') state = 5;
            else if (c=='>') state = 6;
            else state = fail();
            break;
        case 1:
            …
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        …

int fail()
{
    forward = token_beginning;
    switch (start) {
    case 0: start = 9; break;
    case 9: start = 12; break;
    case 12: start = 20; break;
    case 20: start = 25; break;
    case 25: recover(); break;
    default: /* error */
    }
    return start;
}
fail() decides the next start state to check.
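
For a version that compiles on its own, the relop part of the diagram can be sketched as below. States 0, 1, 5 and 6 follow the slide code; states 2, 3, 4, 7 and 8 are assumptions, since the slide elides them, and they follow the usual textbook layout with a retract step on the "other" edges. The names `relop_kind` and `scan_relop` are hypothetical, whitespace skipping is left out, and a failed match returns `RELOP_NONE` instead of calling fail() to try the next diagram.

```c
#include <assert.h>
#include <stddef.h>

/* Token names are hypothetical; the slide code does not show them. */
typedef enum {
    RELOP_LT, RELOP_LE, RELOP_NE, RELOP_EQ, RELOP_GT, RELOP_GE, RELOP_NONE
} relop_kind;

/* Run the relop transition diagram on the start of s.
 * On return, *len holds the lexeme length (0 when nothing matched). */
relop_kind scan_relop(const char *s, size_t *len) {
    int state = 0;
    size_t forward = 0;             /* index of the next unread character */
    char c;
    assert(s != NULL && len != NULL);
    for (;;) {
        switch (state) {
        case 0:
            c = s[forward++];
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else { *len = 0; return RELOP_NONE; }
            break;
        case 1:
            c = s[forward++];
            if (c == '=') state = 2;
            else if (c == '>') state = 3;
            else state = 4;         /* any other character */
            break;
        case 2: *len = forward; return RELOP_LE;
        case 3: *len = forward; return RELOP_NE;
        case 4: *len = forward - 1; return RELOP_LT;   /* retract one char */
        case 5: *len = forward; return RELOP_EQ;
        case 6:
            c = s[forward++];
            state = (c == '=') ? 7 : 8;
            break;
        case 7: *len = forward; return RELOP_GE;
        case 8: *len = forward - 1; return RELOP_GT;   /* retract one char */
        default: *len = 0; return RELOP_NONE;          /* not reached */
        }
    }
}
```

The retract in states 4 and 8 gives back the lookahead character, so that for an input starting with "<x" the lexeme is just "<" and "x" is left for the next token.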

14 Common Applications of Regular Expressions
– Validate passwords and email addresses
– Extract specific sections from an HTML page
– Parse data files
– Replace values (strings)
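
As an illustration of the "parse data files" use, the sketch below pulls a key and a numeric value out of one line with POSIX capture groups. It assumes a POSIX system with `<regex.h>`; the line format "key = value", the pattern, and the names `parse_setting` and `copy_span` are all made up for this example.

```c
#include <assert.h>
#include <regex.h>     /* POSIX, not ISO C: assumed to be available */
#include <stdbool.h>
#include <string.h>

/* Copy the matched span m of src into dst, truncating to fit. */
static void copy_span(char *dst, size_t dst_size, const char *src, regmatch_t m) {
    size_t n = (size_t)(m.rm_eo - m.rm_so);
    if (dst_size == 0)
        return;
    if (n >= dst_size)
        n = dst_size - 1;
    memcpy(dst, src + m.rm_so, n);
    dst[n] = '\0';
}

/* Hypothetical helper: parse "key = value", where key is an identifier
 * and value is a run of digits. Returns true on a full-line match. */
bool parse_setting(const char *line, char *key, size_t key_size,
                   char *value, size_t value_size) {
    static const char *pattern =
        "^[ \t]*([A-Za-z][A-Za-z0-9]*)[ \t]*=[ \t]*([0-9]+)[ \t]*$";
    regex_t re;
    regmatch_t m[3];    /* m[0]: whole match, m[1]: key, m[2]: value */
    assert(line != NULL && key != NULL && value != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;               /* pattern failed to compile */
    int rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return false;               /* line does not have the expected shape */
    copy_span(key, key_size, line, m[1]);
    copy_span(value, value_size, line, m[2]);
    return true;
}
```

Called on the line "  width = 80", this fills key with "width" and value with "80"; a line such as "80 = width" is rejected because the key must start with a letter.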

15 Got it? Revisit the following questions
Regular Expressions
– a concise and flexible means for identifying strings of text
– written in a formal language
– interpreted by a RegEx processor
Why RegEx
– Precise definition of language
– Layered definition of language
– Lexical/Syntax/Semantic
Further use of RegEx
– Supportive foundation of Lexer
– Formal communication
– Common application

16 Thank you very much! Questions?
