
1 The scanning process
Main goal: recognize words/tokens.
Snapshot: at any point in time, the scanner has read some input and is on the way to identifying what kind of token has been read (e.g. identifier, operator, integer literal).
Once the scanner identifies a token, it sends it off to the parser and starts over with the next word.
Some tokens need additional data to be carried along with them. For example, an identifier token needs to have the identifier itself attached to it.
Alternatively, the scanner generates a file of tokens, which is then input to the parser.
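One way to picture a token that carries extra data is a small record with a kind and an optional value. The sketch below is only an illustration; the class name `Token` and the field names `kind` and `value` are assumptions, not something the slides define.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Token:
    """Hypothetical token record: a kind plus optional attached data."""
    kind: str                                  # e.g. "IDENT", "INTEGER", "LPAREN"
    value: Optional[Union[str, int]] = None    # the identifier text, integer value, ...


# An identifier token carries the identifier itself.
ident = Token("IDENT", "count")
# A parenthesis token needs no extra data.
lparen = Token("LPAREN")
```

The parser can then dispatch on `kind` and read `value` only for the token kinds that have one.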

2 The scanning process
A simple hand-written scanner would look a bit like this:

    …
    nextchar = getNextChar();
    switch (nextchar) {
        case '(':
            return LPAREN;            /* return LPAREN token */
        case '0': case '1': ... case '9':
            start building an integer from this first digit
            nextchar = getNextChar();
            while (nextchar is a digit) {
                concat the digit to the integer being built
                nextchar = getNextChar();
            }
            putBack(nextchar);        /* the non-digit belongs to the next token */
            make a new INTEGER token with the integer value attached
            return INTEGER;
        ...
    }
    …
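The pseudocode above can be turned into something runnable. The following is a minimal sketch, not the slides' own implementation: it handles only the two cases shown, represents tokens as `(kind, value)` pairs, and skips whitespace, which is an added assumption. Instead of `putBack`, it simply does not advance past the first non-digit.

```python
from typing import Iterator, Optional, Tuple

DIGITS = "0123456789"


def scan(text: str) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (kind, value) pairs for '(' and unsigned integer literals."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():              # assumption: whitespace separates tokens
            pos += 1
        elif ch == "(":
            pos += 1
            yield ("LPAREN", None)
        elif ch in DIGITS:
            value = 0
            # Consume digits; stop AT the first non-digit without consuming it,
            # which plays the role of putBack(nextchar) in the pseudocode.
            while pos < len(text) and text[pos] in DIGITS:
                value = value * 10 + int(text[pos])
                pos += 1
            yield ("INTEGER", value)
        else:
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
```

For example, `list(scan("(42 7("))` produces an LPAREN token, two INTEGER tokens with values 42 and 7, and another LPAREN token.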

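3 The scanning process
Not always as simple as it seems. Example from old versions of FORTRAN:
    DO 5 I=1,10   vs.   DO 5 I=1.10
Blanks were not significant in those versions, so the first line starts a DO loop while the second assigns 1.10 to a variable named DO5I; the scanner cannot tell which until it reaches the comma or the period.
Hand-written vs. automated scanner: instead of writing a scanner by hand, we can automate the process.
Specify what needs to be recognized and what to do when something is recognized.
Have a scanner generator create the scanner based on our specification.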
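The idea of "specify what to recognize and what to do" can be imitated in a few lines with Python's `re` module. This is a rough sketch of the concept, not how a real scanner generator works internally: the token names, the `SPEC` table and `make_scanner` are all hypothetical, and Python's alternation picks the first alternative that matches rather than the longest match that generators such as flex use.

```python
import re

# Hypothetical specification: (token name, pattern, action applied to the lexeme).
SPEC = [
    ("INTEGER", r"[0-9]+", int),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*", str),
    ("ASSIGN", r"=", None),
    ("LPAREN", r"\(", None),
    ("SKIP", r"\s+", None),          # whitespace is recognized but not reported
]


def make_scanner(spec):
    """Build a scan(text) function from a list of (name, pattern, action)."""
    master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat, _ in spec))
    actions = {name: action for name, _, action in spec}

    def scan(text):
        tokens, pos = [], 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise ValueError(f"no token matches at position {pos}")
            kind, pos = m.lastgroup, m.end()
            if kind == "SKIP":
                continue
            action = actions[kind]
            tokens.append((kind, action(m.group()) if action else None))
        return tokens

    return scan


scan_spec = make_scanner(SPEC)
```

Calling `scan_spec("x1 = (42")` yields an IDENT carrying `"x1"`, an ASSIGN, an LPAREN and an INTEGER carrying `42`; changing the language only requires editing `SPEC`.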

4 The scanning process
Specify what needs to be recognized.
Some tokens are easy to identify, e.g. = is an assignment operator, ( is a parenthesis.
Others are more complex. How would the scanner recognize an identifier? The set of possible identifiers is very large or even infinite (assuming no length restrictions).
SOLUTION: Recognize a pattern!
Example: An identifier is a sequence of letters or digits that starts with a letter.
We need a way to describe this pattern to our scanner generator. Regular expressions come to the rescue!

5 The scanning process
Definition: Regular expressions (over alphabet Σ)
    ε is an RE denoting {ε}
    If a ∈ Σ, then a is an RE denoting {a}
    If r and s are REs, then
        (r) is an RE denoting L(r)
        r|s is an RE denoting L(r) ∪ L(s)
        rs is an RE denoting L(r)L(s)
        r* is an RE denoting the Kleene closure of L(r)
Property: REs are closed under many operations. This allows us to build complex REs.
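To make the definition concrete, the languages L(r) can be modelled as Python sets of strings. Since a Kleene closure is usually infinite, the sketch below truncates every language at a maximum string length; that bound and the helper names `union`, `concat` and `star` are assumptions made for illustration only.

```python
from typing import Set

EPSILON = ""   # the empty string stands for ε


def union(a: Set[str], b: Set[str]) -> Set[str]:
    """L(r|s) = L(r) ∪ L(s)."""
    return a | b


def concat(a: Set[str], b: Set[str], max_len: int) -> Set[str]:
    """L(rs) = L(r)L(s), keeping only strings of length <= max_len."""
    return {x + y for x in a for y in b if len(x + y) <= max_len}


def star(a: Set[str], max_len: int) -> Set[str]:
    """Kleene closure of a, keeping only strings of length <= max_len."""
    result = {EPSILON}
    frontier = {EPSILON}
    while frontier:
        # Extend the newest strings by one more factor from a; keep only unseen ones.
        frontier = concat(frontier, a, max_len) - result
        result |= frontier
    return result


a, b = {"a"}, {"b"}
ab_star = star(union(a, b), 2)   # L((a|b)*) restricted to length <= 2
```

Here `ab_star` contains ε, the two one-letter strings, and the four two-letter strings over {a, b}.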

6 Regular Definitions
A regular expression that describes digits is:
    0|1|2|3|4|5|6|7|8|9
For convenience, we'd like to give it a name and then use the name in building more complex regular expressions:
    digit → 0|1|2|3|4|5|6|7|8|9
This is called a regular definition.
Example:
    letter → a|...|z|A|...|Z
    ident → letter (letter | digit)*
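Regular definitions map naturally onto named pattern strings that are spliced into larger patterns. The sketch below mirrors the `digit`, `letter` and `ident` definitions with Python's `re` module; the helper `is_identifier` is a hypothetical name introduced for the example.

```python
import re
import string

# digit → 0|1|...|9, written out as an explicit alternation.
digit = "(" + "|".join(string.digits) + ")"
# letter → a|...|z|A|...|Z
letter = "(" + "|".join(string.ascii_letters) + ")"
# ident → letter (letter | digit)*, built from the two names above.
ident = f"{letter}({letter}|{digit})*"

IDENT_RE = re.compile(ident)


def is_identifier(s: str) -> bool:
    """True if the whole string matches the ident definition."""
    return IDENT_RE.fullmatch(s) is not None
```

With these definitions, `is_identifier("x1")` holds while `is_identifier("1x")` does not, because an identifier must start with a letter.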

7 What’s next
Given an input string, we need a “machine” that has a regular expression hard-coded in it and can tell whether the input string matches the pattern described by the regular expression or not.
A machine that determines whether a given string belongs to a language is called a finite automaton.

8 The scanning process
Definition: Deterministic Finite Automaton
A five-tuple (Σ, S, δ, s₀, F) where
    Σ is the alphabet
    S is the set of states
    δ is the transition function (S × Σ → S)
    s₀ is the starting state
    F is the set of final states (F ⊆ S)
Notation: use a transition diagram to describe a DFA. States are nodes; transitions are directed, labeled edges; some states are marked as final; one state is marked as starting.
If the automaton stops at a final state on end of input, then the input string belongs to the language.
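A DFA can be simulated directly from its five-tuple. The sketch below encodes a two-state automaton for identifiers (a letter followed by letters or digits); the state names, the `char_class` helper and the choice to treat a missing transition as rejection are assumptions. Strictly, δ is a total function, and the missing entries stand in for an implicit dead state.

```python
import string

START, IN_IDENT = "s0", "s1"

# Transition function δ, keyed by (state, character class).
DELTA = {
    (START, "letter"): IN_IDENT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
}
FINAL = {IN_IDENT}


def char_class(ch: str) -> str:
    """Group single characters so the table does not need one entry per symbol."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"


def accepts(text: str) -> bool:
    """Run the DFA over text; accept if it ends in a final state."""
    state = START
    for ch in text:
        state = DELTA.get((state, char_class(ch)))
        if state is None:          # no transition: implicit dead state
            return False
    return state in FINAL
```

The empty string is rejected because the automaton is still in the non-final start state at end of input.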

9 The scanning process
Goal: automate the process.
Idea: start with an RE and build a DFA. How?
    We can build a non-deterministic finite automaton (Thompson's construction)
    Convert that to a deterministic one (Subset construction)
    Minimize the DFA (Hopcroft's algorithm)
    Implement it
Existing scanner generator: flex
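As an illustration of the middle step only, here is a compact sketch of the subset construction. It assumes the NFA is given as a dictionary from `(state, symbol)` to a set of states, with `None` as the symbol for ε-moves; the example NFA for `a*b` is written by hand in the style Thompson's construction would produce. Thompson's construction and minimization are not shown.

```python
# Hand-written NFA for a*b; None marks an ε-transition (assumed encoding).
NFA = {
    (0, None): {1, 3},
    (1, "a"): {2},
    (2, None): {1, 3},
    (3, "b"): {4},
}
NFA_START, NFA_FINALS = 0, {4}


def epsilon_closure(nfa, states):
    """All states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, None), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, finals, alphabet):
    """Return (start, delta, finals) of a DFA whose states are sets of NFA states."""
    d_start = epsilon_closure(nfa, {start})
    delta, seen, worklist = {}, {d_start}, [d_start]
    while worklist:
        current = worklist.pop()
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= nfa.get((s, sym), set())
            if not moved:
                continue               # no transition: implicit dead state
            target = epsilon_closure(nfa, moved)
            delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    d_finals = {d for d in seen if d & finals}
    return d_start, delta, d_finals


def dfa_accepts(d_start, delta, d_finals, text):
    """Simulate the constructed DFA on text."""
    state = d_start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in d_finals


D_START, D_DELTA, D_FINALS = subset_construction(NFA, NFA_START, NFA_FINALS, "ab")
```

Each DFA state is a frozen set of NFA states, so it can be used as a dictionary key; the resulting automaton accepts strings such as `b` and `aab` and rejects `a` and `abb`.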

