
1 ScanGen

2 ScanGen accepts descriptions of tokens written as regular expressions and produces tables for a finite automata driver. The program was written by Gary Sevitsky in spring 1979 and modified and enhanced by Robert Gray in spring 1980. Later changes were made by Charles Fischer in 1981 and 1982.

3 ScanGen
The user defines the input to ScanGen as a file with three sections:
- Options
- Character Classes
- Token Definitions, each of the form: TOKEN name {major,minor} = regular expression
A regular expression can include EXCEPT clauses and {TOSS} attributes.
Example of ScanGen input: textbook page 61 (extended Micro).

4 Options Section
The options section is optional. It begins with the keyword OPTIONS, followed by one or more option names (which are not reserved). The option names may appear in any order, separated by blanks or commas.

5 Class Definitions
The class definitions specify the character classes that make up the alphabet used by the regular expressions. Each character class is a set of ASCII characters, defined, as in the examples, by single characters within quotes or by ranges of characters.
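A character class definition amounts to a map from ASCII codes to class numbers. The sketch below builds such a map by hand for the letter, digit, and blank classes of Example 1. The names `class_map`, `set_range`, `init_classes`, and `classify`, and the class numbering, are assumptions made for illustration; they are not ScanGen's actual output format.

```c
#include <assert.h>

/* Hypothetical class codes for the classes of Example 1. */
enum char_class { C_OTHER = 0, C_LETTER, C_DIGIT, C_BLANK, NUM_CLASSES };

static unsigned char class_map[128];   /* ASCII code -> class code */

/* Assign every character in the inclusive range lo..hi to one class. */
static void set_range(int lo, int hi, enum char_class c)
{
    assert(0 <= lo && lo <= hi && hi < 128);
    for (int ch = lo; ch <= hi; ch++)
        class_map[ch] = (unsigned char)c;
}

/* letter = 'A'..'Z', 'a'..'z';  digit = '0'..'9';  blank = ' '; */
void init_classes(void)
{
    set_range(0, 127, C_OTHER);        /* anything not listed in a class */
    set_range('A', 'Z', C_LETTER);
    set_range('a', 'z', C_LETTER);
    set_range('0', '9', C_DIGIT);
    set_range(' ', ' ', C_BLANK);
}

/* Look up the class of a character; non-ASCII input maps to C_OTHER. */
enum char_class classify(int ch)
{
    return (ch >= 0 && ch < 128) ? (enum char_class)class_map[ch] : C_OTHER;
}
```

Call `init_classes()` once before the first `classify()` call. Indexing the automaton tables by class rather than by raw character keeps them small, since every character in a class behaves the same way.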

6 Regular Expression Definitions
The regular expression definitions specify the tokens. They are built from the character classes and the following operations:
- positive closure ("+")
- Kleene closure ("*")
- concatenation (".")
- union (",")
Precedence can be overridden with parentheses.

7 Token Number
Major token number
- Defines a token class.
Minor token number
- Specifies the member of that class.
- Defaults to "0" if not specified.
Token numbers
- Must be non-negative integers.
- The same token number may be used for different tokens.
Tokens that are to be deleted (comments, spaces, etc.)
- Are assigned a major token number of "0".

8 "NOT" Operation
- Used in the definitions of "StrConst" and "RunOnString".
- May only be used to complement a union of character classes.
- The complement is taken relative to the classes specified in the class definitions.
- The character class "EPSILON" is never included in a complement.

9 "TOSS" Feature
- Tells the scanner whether or not to append a character to the token string it is building.
- If a character is not to be appended, put "{TOSS}" after the name of its character class in the token definition.
- "{TOSS}" may appear only after the name of a character class or after "NOT(...)".
- Careless use of the TOSS feature can lead to a toss/save conflict.

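10 "TOSS" Feature
For example, a toss/save conflict would occur if "StrConst" were defined by:
  Quote{TOSS}. (NOT(Quote, Linefeed), Quote.Quote{TOSS})*. Quote{TOSS}
The conflict can be seen by comparing the scanner's actions on the strings 'a' and 'a''b': after the a, the scanner reads a Quote and cannot yet tell whether to toss it (closing quote) or save it (first of a doubled pair).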

11 Example 1
OPTIONS tables,list
CLASS
  letter = 'A'..'Z', 'a'..'z';
  digit = '0'..'9';
  blank = ' ';
DEFINITION
  TOKEN emptyspace {0} = blank+;
  TOKEN identifier {1} = letter.(letter, digit)*;
  TOKEN number {2} = digit+;

12 Example 2
OPTIONS List, tables
CLASS
  E = 'E', 'e';
  OtherLetter = 'A'..'D','F'..'Z','a'..'d','f'..'z';
  Digit = '0'..'9';
  Blank = ' ';
  Dot = '.';
  Plus = '+';
  Minus = '-';
  Quote = '''';
  Linefeed = 10;

13 Example 2
DEFINITION
  TOKEN EmptySpace {0} = (Blank, Linefeed)+;
  Letter = E, OtherLetter;
  TOKEN Identifier {1} = Letter.(Letter,Digit)* EXCEPT 'BEGIN' {4}, 'END' {5};
  TOKEN IntConst {2,1} = Digit+;
  TOKEN RealConst {2,2} = IntConst.Dot.IntConst. (EPSILON, E.(EPSILON, Plus, Minus).IntConst);
  TOKEN StrConst {2,3} = Quote{TOSS}. (NOT(Quote, Linefeed),Quote{TOSS}.Quote)*. Quote{TOSS};
  TOKEN RunOnString {3} = Quote{TOSS}. (NOT(Quote, Linefeed), Quote{TOSS}.Quote)*. Linefeed{TOSS};
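The StrConst and RunOnString definitions avoid a toss/save conflict because every Quote seen inside the string is tossed, whether it closes the string or starts a doubled pair; only the second Quote of a pair is saved. A hand-written sketch of the same behavior is shown below. The function name `scan_string`, its parameters, and the `NO_STRING` code are assumptions for illustration; ScanGen itself produces tables, not code like this.

```c
#include <assert.h>
#include <stddef.h>

/* Major token numbers from Example 2; NO_STRING is a made-up "no match" code. */
enum { STR_CONST = 2, RUN_ON_STRING = 3, NO_STRING = -1 };

/* Scan a string constant starting at src[*pos]. Saved characters go to text
   (at most cap-1 of them). On success *pos is advanced past the token. */
int scan_string(const char *src, size_t *pos, char *text, size_t cap)
{
    assert(src && pos && text && cap > 0);
    size_t i = *pos, len = 0;

    if (src[i] != '\'')
        return NO_STRING;
    i++;                                  /* opening Quote{TOSS} */

    for (;;) {
        char ch = src[i];
        if (ch == '\'') {
            i++;                          /* Quote{TOSS}: tossed in either case */
            if (src[i] == '\'') {         /* doubled quote: save the second one */
                if (len + 1 < cap) text[len++] = '\'';
                i++;
                continue;
            }
            text[len] = '\0';             /* it was the closing quote */
            *pos = i;
            return STR_CONST;
        }
        if (ch == '\n') {                 /* Linefeed{TOSS} ends a RunOnString */
            text[len] = '\0';
            *pos = i + 1;
            return RUN_ON_STRING;
        }
        if (ch == '\0') {                 /* end of input: no definition applies */
            text[len] = '\0';
            return NO_STRING;
        }
        if (len + 1 < cap) text[len++] = ch;   /* NOT(Quote, Linefeed): saved */
        i++;
    }
}
```

For the input 'a''b' the saved text is a'b, and for 'a' it is a. In both cases the Quote after the a is tossed, so the scanner never has to decide between tossing and saving the same character.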

14 ScanGen Driver
The driver provides the actual scanner routine, which is called by the parser:
  void scanner(codes *major, codes *minor, char *token_text)
It reads the input character stream, drives the finite automaton using the tables generated by ScanGen, and returns the token it finds.

15 ScanGen Tables
The finite automaton table has the form
  next_state[NUMSTATES][NUMCHARS]
In addition, an action table tells the driver when a complete token has been recognized and what to do with the "lookahead" character:
  action[NUMSTATES][NUMCHARS]

16 Action Table
The action table has 6 possible values:
  ERROR          scan error.
  MOVEAPPEND     current_token += ch and go on.
  MOVENOAPPEND   discard ch and go on.
  HALTAPPEND     current_token += ch, token found, return it.
  HALTNOAPPEND   discard ch, token found, return it.
  HALTREUSE      save ch for later reuse, token found, return it.
Driver program: textbook pages 65-66.
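A minimal sketch of a driver loop over the two tables, using hand-built tables for Example 1 (blanks, identifiers, numbers). This is not the textbook driver: the tables are indexed by character class rather than raw character, and `scan_token`, `char_class`, `major_of_state`, and the state names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum action_code { ERROR, MOVEAPPEND, MOVENOAPPEND,
                   HALTAPPEND, HALTNOAPPEND, HALTREUSE };
enum { C_OTHER, C_LETTER, C_DIGIT, C_BLANK, NUMCHARS };   /* character classes */
enum { S_START, S_ID, S_NUM, S_BLANK, NUMSTATES };

static const unsigned char next_state[NUMSTATES][NUMCHARS] = {
    /*            other    letter   digit    blank   */
    /* START */ { S_START, S_ID,    S_NUM,   S_BLANK },
    /* ID    */ { S_START, S_ID,    S_ID,    S_START },
    /* NUM   */ { S_START, S_START, S_NUM,   S_START },
    /* BLANK */ { S_START, S_START, S_START, S_BLANK },
};
static const unsigned char action[NUMSTATES][NUMCHARS] = {
    /* START */ { ERROR,     MOVEAPPEND, MOVEAPPEND, MOVEAPPEND },
    /* ID    */ { HALTREUSE, MOVEAPPEND, MOVEAPPEND, HALTREUSE  },
    /* NUM   */ { HALTREUSE, HALTREUSE,  MOVEAPPEND, HALTREUSE  },
    /* BLANK */ { HALTREUSE, HALTREUSE,  HALTREUSE,  MOVEAPPEND },
};
/* Assumed: major token number reported when the automaton halts in a state. */
static const int major_of_state[NUMSTATES] = { -1, 1, 2, 0 };

static int char_class(int ch)
{
    if ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')) return C_LETTER;
    if (ch >= '0' && ch <= '9') return C_DIGIT;
    if (ch == ' ') return C_BLANK;
    return C_OTHER;                        /* includes the terminating '\0' */
}

/* Scan one token from src at *pos; returns its major number, or -1 on ERROR. */
int scan_token(const char *src, size_t *pos, char *text, size_t cap)
{
    assert(src && pos && text && cap > 0);
    int state = S_START;
    size_t len = 0;
    for (;;) {
        int ch = (unsigned char)src[*pos];
        int cls = char_class(ch);
        int act = action[state][cls];
        if (act == ERROR) {                /* scan error: ch is not consumed */
            text[len] = '\0';
            return -1;
        }
        if ((act == MOVEAPPEND || act == HALTAPPEND) && len + 1 < cap)
            text[len++] = (char)ch;        /* current_token += ch */
        if (act != HALTREUSE) {            /* consume ch and take the transition */
            (*pos)++;
            state = next_state[state][cls];
        }
        if (act == HALTAPPEND || act == HALTNOAPPEND || act == HALTREUSE) {
            text[len] = '\0';              /* token found, return it */
            return major_of_state[state];
        }
    }
}
```

On the input "x1 42", successive calls return 1 with text x1, then 0 with a single blank, then 2 with text 42. A caller would discard tokens whose major number is 0 and stop when it reaches the end of the input.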

17 Output Tables
The output tables file consists of the following five sections:
- Section 1: Parameters for the Scanner
- Section 2: Character Class Mapping
- Section 3: Reserved Word to Token Mapping
- Section 4: Reserved Word List
- Section 5: Transition Table of the Minimal Deterministic Finite Automaton
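Sections 3 and 4 presumably carry the words listed in EXCEPT clauses together with their token numbers. The sketch below shows one way a scanner could use such a list: once an Identifier has been recognized, its text is compared with the reserved words of Example 2 and the token numbers are replaced on a match. The struct layout, the name `apply_except`, and the exact-case comparison are assumptions; the real file format is not shown in these slides.

```c
#include <assert.h>
#include <string.h>

/* One reserved word with the token numbers from its EXCEPT clause. */
struct reserved {
    const char *word;
    int major;
    int minor;      /* defaults to 0 when the clause gives no minor number */
};

/* EXCEPT 'BEGIN' {4}, 'END' {5} from Example 2. */
static const struct reserved reserved_words[] = {
    { "BEGIN", 4, 0 },
    { "END",   5, 0 },
};

/* If text is a reserved word, overwrite the token numbers; else leave them. */
void apply_except(const char *text, int *major, int *minor)
{
    assert(text && major && minor);
    size_t n = sizeof reserved_words / sizeof reserved_words[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(text, reserved_words[i].word) == 0) {
            *major = reserved_words[i].major;
            *minor = reserved_words[i].minor;
            return;
        }
    }
}
```

With this approach the automaton only needs to recognize identifiers; reserved words are separated out afterwards, which keeps the transition table small.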


19 Execute ScanGen
1. Download SCANGEN.zip and expand it into the cygwin /usr/src/scangen directory.
2. Run ./scangen.exe < adacs.scan
3. Type Tables when prompted for the output file:
  s c a n g e n -- automatic lexical analyzer generator version 2.0 (12/82)
  options used for this run: tables, optimize
  construction of finite automaton completed
  Output file `Tables': Tables

20 Compile and Test Scanner
1. Download scanner.example.rar and expand it into the cygwin /usr/src/scanner.example directory.
2. Copy the Tables file from /usr/src/scangen into /usr/src/scanner.example.
3. Compile with the makefile: make
4. Run a.exe: ./a
5. Enter the source file: test

