D Goforth COSC Translating High Level Languages
Stages of translation
  Lexical analysis
  Syntactic analysis
  Code generation
  Linking
  (all before execution)
Lexical analysis
  Translates the stream of characters into lexemes
  Lexemes belong to categories called tokens
  The token identity of each lexeme is used at the next stage, syntactic analysis
Examples: tokens and lexemes
  Some token categories contain only one lexeme: semi-colon  ;
  Some tokens categorize many lexemes: identifier  count, maxCost, …
Tokens and Lexemes
  yVal = x - min ( 100, 4xVal ));
  Lexical analysis
    identifies lexemes and their token type
    recognizes illegal lexemes (4xVal)
    does NOT identify the syntax error: ) )
  Labels in the example: identifier (e.g. yVal), equal_sign (=), left_paren ((), illegal lexeme (4xVal)
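The behaviour described on this slide can be sketched with a small regular-expression lexer. This is a minimal illustration, not the course's implementation: the token names other than identifier, equal_sign and left_paren, and the rule that a digit followed by letters is an illegal lexeme, are assumptions made for the example.

```python
import re

# Token categories and their patterns (names are assumptions for this sketch).
# Order matters: the digit-then-letter case is tried before numbers and identifiers.
TOKEN_SPEC = [
    ("illegal_lexeme", r"\d+[A-Za-z_]\w*"),
    ("int_literal",    r"\d+"),
    ("identifier",     r"[A-Za-z_]\w*"),
    ("equal_sign",     r"="),
    ("minus_op",       r"-"),
    ("left_paren",     r"\("),
    ("right_paren",    r"\)"),
    ("comma",          r","),
    ("semicolon",      r";"),
    ("skip",           r"\s+"),
    ("illegal_char",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
    return tokens


for token, lexeme in tokenize("yVal = x - min ( 100, 4xVal ));"):
    print(f"{token:15} {lexeme}")
```

The lexer flags 4xVal as an illegal lexeme but reports both right parentheses as ordinary right_paren tokens. Noticing that they are unbalanced is left to syntactic analysis.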
Syntax or Grammar of Language
  Rules for
    generating (used by the programmer) or
    recognizing (used by the syntactic analyzer in translation)
  a valid sequence of lexemes
Grammars
  4 categories of grammars (Chomsky)
  Two categories are important in computing:
    Regular grammars / regular expressions (pattern matching)
    Context-free grammars (programming languages)
Context-free grammar
  Meta-language for describing languages
  States rules, or productions, for what lexeme sequences are correct in the language
  Written in Backus-Naur Form (BNF)
Example of BNF rule
  PROBLEM: how to recognize all these as correct?
    y = x
    f = rVec.length + 1
    button[4].label = "Exit"
  RULE for defining assignment statement:
    <assign> -> <var> = <expression>
  Assumes other rules for <var>, <expression>
BNF rules
  Non-terminal and terminal symbols:
    Non-terminals are defined by at least one rule
    Terminals are tokens or lexemes
  <assign> -> <var> = <expression>
    non-terminals: <assign>, <var>, <expression>; terminal: =
Simple sample grammar (p. 113)
  <assign> -> <id> = <expr>
  <id>     -> A | B | C          // lexical
  <expr>   -> <id> + <expr>
            | <id> * <expr>
            | ( <expr> )
            | <id>
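Simple sample production
  Apply one rule at each step, to the leftmost non-terminal:
  <assign> => <id> = <expr>
           => B = <expr>
           => B = <id> * <expr>
           => B = A * <expr>
           => B = A * ( <expr> )
           => B = A * ( <id> + <expr> )
           => B = A * ( C + <expr> )
           => B = A * ( C + <id> )
           => B = A * ( C + C )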
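A leftmost derivation like this one can be replayed mechanically. The sketch below assumes the sample grammar is <assign> -> <id> = <expr>, <id> -> A | B | C, and <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>. The dictionary layout and the list of rule choices are illustrative and are not taken from the slides.

```python
# Each non-terminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>":     [["A"], ["B"], ["C"]],
    "<expr>":   [["<id>", "+", "<expr>"],
                 ["<id>", "*", "<expr>"],
                 ["(", "<expr>", ")"],
                 ["<id>"]],
}


def leftmost_derivation(start, choices):
    """Apply one rule per step to the leftmost non-terminal.

    choices[k] is the index of the alternative used at step k.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has a rule.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


# Rule choices that derive B = A * ( C + C ).
CHOICES = [0, 1, 1, 0, 2, 0, 2, 3, 2]
for step in leftmost_derivation("<assign>", CHOICES):
    print("=>", step)
```

Each printed line is one sentential form. The last one contains only terminals, so it is a sentence of the language.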
Sample parse tree
  [Figure: parse tree for the sentence B = A * ( C + C )]
  Leaves represent the sentence of lexemes
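To make the "leaves spell the sentence" point concrete, here is a small recursive-descent sketch that builds a parse tree as nested tuples. It assumes the sample grammar <assign> -> <id> = <expr>, <expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>, with <id> -> A | B | C. The function names and the tuple representation are choices made for this example.

```python
def _expect(tokens, lexeme):
    """Consume one specific lexeme or report a syntax error."""
    if not tokens or tokens[0] != lexeme:
        raise SyntaxError(f"expected {lexeme!r}")
    return tokens[1:]


def _id(tokens):
    if not tokens or tokens[0] not in ("A", "B", "C"):
        raise SyntaxError("expected an identifier")
    return ("<id>", tokens[0]), tokens[1:]


def _expr(tokens):
    if tokens and tokens[0] == "(":
        inner, rest = _expr(tokens[1:])
        rest = _expect(rest, ")")
        return ("<expr>", "(", inner, ")"), rest
    left, rest = _id(tokens)
    if rest and rest[0] in ("+", "*"):
        right, after = _expr(rest[1:])
        return ("<expr>", left, rest[0], right), after
    return ("<expr>", left), rest


def parse_assign(tokens):
    """Return the parse tree for a whole assignment statement."""
    tokens = list(tokens)
    target, rest = _id(tokens)
    rest = _expect(rest, "=")
    value, rest = _expr(rest)
    if rest:
        raise SyntaxError(f"unexpected {rest[0]!r}")
    return ("<assign>", target, "=", value)


def leaves(tree):
    """Collect the terminal lexemes at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


tree = parse_assign("B = A * ( C + C )".split())
print(tree)
print(" ".join(leaves(tree)))
```

Reading the leaves from left to right gives back the original sentence. A sentence with an extra right parenthesis is rejected here, at the syntactic stage.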
Ambiguous grammar
  Different parse trees for the same sentence
  Different translations for the same sentence
  Different machine code for the same source code!
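The consequence of ambiguity can be shown with a tiny experiment. The grammar used here, <expr> -> <expr> + <expr> | <expr> * <expr> | <id>, is a standard ambiguous example chosen for this sketch and does not appear on the slide. The variable values are also made up.

```python
def parse_trees(tokens):
    """Return every parse tree for an alternating id/operator token list
    under <expr> -> <expr> + <expr> | <expr> * <expr> | <id>."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens) - 1, 2):      # each operator position
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def tree_value(tree, env):
    """Evaluate a tree; the tree shape decides the order of operations."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    a, b = tree_value(left, env), tree_value(right, env)
    return a + b if op == "+" else a * b


env = {"A": 2, "B": 3, "C": 4}                  # hypothetical values
for tree in parse_trees(["A", "+", "B", "*", "C"]):
    print(tree, "->", tree_value(tree, env))
```

The same sentence A + B * C gets two trees, and the two trees compute different results (14 and 20), which is why a grammar used for translation must not be ambiguous.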
Grammars for ‘human’ conventions
  Putting features of languages into grammars:
    expressions of any length
    precedence - an extra non-terminal
    associativity - order in recursive rules
    nested if statements - "dangling else" problem: p. 119
Forms for grammars
  Backus-Naur form (BNF)
  Extended Backus-Naur form (EBNF) - shortens the set of rules
  Syntax graphs - easier to read when learning a language
EBNF
  [ ]  optional: zero or one occurrence
         <expr> -> <term> [ + <term> ]
  { }  repetition: zero or more occurrences
         <expr> -> <term> { + <term> }
  ( | )  'or' choice of alternative symbols
         <term> -> <factor> [ (*|/) <factor> ]
Syntax Graph - basic structures
  [Figure: syntax graphs for expr (terms joined by + or -) and term (factors joined by * or /)]
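BNF (p. 121), EBNF, Syntax Graph
  BNF
    <expr> -> <expr> + <term> | <expr> - <term> | <term>
    <term> -> <term> * <factor> | <term> / <factor> | <factor>
  EBNF, with optional parts
    <expr> -> [ <expr> (+|-) ] <term>
    <term> -> [ <term> (*|/) ] <factor>
  EBNF, with repetition
    <expr> -> <term> { (+|-) <term> }
    <term> -> <factor> { (*|/) <factor> }
  Syntax graph
    [Figure: expr as terms joined by + or -; term as factors joined by * or /]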
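The EBNF repetition form maps directly onto loops in a hand-written parser. The sketch below assumes the rules <expr> -> <term> { (+|-) <term> } and <term> -> <factor> { (*|/) <factor> }. Since <factor> is not defined on the slide, it is taken to be an unsigned integer literal here. All class and function names are illustrative.

```python
import re


def lex(text):
    """Split the input into integer literals and operator lexemes."""
    return re.findall(r"\d+|[-+*/()]", text)


class ExprParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # <expr> -> <term> { (+|-) <term> }   the braces become a while loop
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        # <term> -> <factor> { (*|/) <factor> }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        # Assumption for this sketch: a factor is an integer literal.
        token = self.advance()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)


def evaluate_expr(text):
    parser = ExprParser(lex(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return value


print(evaluate_expr("2 + 3 * 4"))
print(evaluate_expr("10 - 4 - 3"))
```

Because <term> sits below <expr>, multiplication binds tighter than addition, and because each loop folds operands in as it goes, operators of equal precedence associate to the left.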
Attribute grammars
  Problem: context-free grammars cannot describe some features needed in programming
  e.g.: rules for using data types
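As a rough illustration of what an attribute grammar adds, the sketch below attaches a type attribute to an assignment of the form target = operand + operand and checks it against the declared type of the target. The slide gives no details, so everything here is an assumption: the symbol table, the int/real typing rule, and the function names are made up for the example.

```python
# Hypothetical declarations; a context-free grammar has no way to consult these.
SYMBOL_TABLE = {"A": "int", "B": "int", "C": "real"}


def actual_type(operands):
    """Synthesized attribute: int only if every operand is int, else real."""
    types = [SYMBOL_TABLE[name] for name in operands]
    return "int" if all(t == "int" for t in types) else "real"


def assignment_is_type_correct(target, operands):
    """Predicate: the expression's type must equal the target's declared type."""
    expected = SYMBOL_TABLE[target]
    return expected == actual_type(operands)


print(assignment_is_type_correct("A", ["A", "B"]))   # int = int + int
print(assignment_is_type_correct("A", ["A", "C"]))   # int = int + real
```

Both statements have the same syntactic shape, so a context-free grammar accepts both. Only the attribute check tells them apart.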