PROGRAMMING LANGUAGES


Presentation on theme: "PROGRAMMING LANGUAGES"— Presentation transcript:

1 PROGRAMMING LANGUAGES
CS 222 LECTURE 06 PROGRAMMING LANGUAGES

2 Syntax and Semantics The syntax of a language defines its valid symbols and grammar. Syntax defines the structure of a program, i.e., the form that each program unit and each statement must use. The semantics defines the meaning of the grammar elements. Lexical structure is the form of the lowest-level syntactic units (words or tokens) of a grammar.

3 Syntax and Semantics Compared
Syntax: in Java, an assignment statement is:
    identifier = expression { operator expression } ;
Semantics: an assignment statement must use compatible types, e.g.
    int n1, n2;
    n1 = 20*1024;   // OK, int_var = int_expression
    n2 = 3.50;      // illegal, incompatible types
Lexical elements (lexemes): "n2" "=" "3.50" ";"
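The difference between a syntax check and a semantic check can be sketched in a few lines of Python. This is only an illustration, not how a Java compiler is built: the names DECLARED, literal_type and check_assignment are hypothetical, and the right-hand side is simplified to a single numeric literal.

```python
# Hypothetical sketch: both assignments are syntactically valid,
# but only one passes the semantic (type compatibility) check.
DECLARED = {"n1": "int", "n2": "int"}   # mirrors the declaration: int n1, n2;

def literal_type(text):
    """Classify a numeric literal as 'int' or 'double' (simplified rule)."""
    return "double" if "." in text or "e" in text.lower() else "int"

def check_assignment(name, literal):
    """Return True if a literal of this type may be assigned to the variable."""
    var_type = DECLARED[name]
    value_type = literal_type(literal)
    # Widening int -> double is allowed; double -> int needs a cast, so reject it.
    return var_type == value_type or (var_type == "double" and value_type == "int")

print(check_assignment("n1", "20480"))  # True
print(check_assignment("n2", "3.50"))   # False
```

A parser would accept both statements because they match the assignment rule; the rejection of the second one happens later, in semantic analysis.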

4 Parts of a Compiler / Interpreter:
How are they used?
[Diagram: Program Source Code → Tokenizer (Lexical Analysis) → Token stream → Parser (Syntax Analysis) → Parse tree → Semantic Analysis → Intermediate code → Optimization and Code Generation → Object code]

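5 Scanning and Parsing
[Diagram: source file → input stream → Scanner → tokens → Parser → parse tree]
Example: the input stream "sum = x1 + x2;" is split by the scanner into the tokens sum, =, x1, +, x2 and ; which the parser then assembles into a parse tree.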

6 Scanners
Recognize tokens whose patterns are described by regular expressions. Implemented as finite automata (finite state machines). Typically contain a loop that cycles through characters, building tokens and their associated values by repeated operations. The scanner may be integrated as a function in the parser: the parser calls the scanner to get the next token.
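The character loop described above might look roughly like the following Python generator. It is a minimal sketch under simplifying assumptions: only identifiers, unsigned integers and single-character symbols are handled, and the function name scan and the token kinds are made up for illustration.

```python
def scan(text):
    """Yield (kind, lexeme) pairs by walking the input one character at a time."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1                      # whitespace only separates tokens
        elif ch.isalpha() or ch == "_":
            start = i
            while i < n and (text[i].isalnum() or text[i] == "_"):
                i += 1                  # keep consuming identifier characters
            yield ("identifier", text[start:i])
        elif ch.isdigit():
            start = i
            while i < n and text[i].isdigit():
                i += 1                  # keep consuming digits
            yield ("number", text[start:i])
        else:
            yield ("symbol", ch)        # any other character is its own token
            i += 1

tokens = scan("sum = x1 + x2;")
first = next(tokens)   # a parser would call next() each time it needs a token
```

Writing the scanner as a generator matches the "parser calls the scanner" arrangement: no token is produced until the parser asks for it.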

7 Scanners

8 Lexical Structure Lexemes are the smallest lexical units of a language, grouped according to syntactic usage. Some types of lexemes in computer languages are:
identifiers: x, println, _INIT, ArrayList
numeric constants: 0, 10000, 2.98E+6
operators: =, +, -, ++, +=, *, /
separators: [ ] ; : . , ( )
string literals: "hello there"

9 Lexical Structure A token is a structure representing a lexeme that explicitly indicates its categorization for the purpose of parsing. Lexemes are recognized by the first phase of a translator, the scanner, which deals directly with the input. The scanner separates the input into tokens. Scanners are also called lexers.

10 Types of lexemes Common Lexemes (classes of tokens)
identifiers: x, println, _INIT, ArrayList
numeric constants: 0, 10000, 2.98E+6
assignment operators: =, +=, -=, *=, /=, %=
arithmetic operators: *, /, +, -, %
boolean operators: &&, ||, ^, !
separators: [ ] ; : . , ( )
string literals: "hello there"
reserved words: may be defined as a class of their own, or simply treated as identifiers at the lexical level

11 Tokens
Tokens are the strings of syntactic units. Example: what are the tokens in this statement?
    result = (sum - average)/count;
Lexeme      Token
result      identifier
=           assignment operator
(           expression delimiter
sum         identifier
-           arithmetic operator
average     identifier
)           expression delimiter
/           arithmetic operator
count       identifier
;           semi-colon (statement delimiter)
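The lexeme/token table above can be reproduced with a small regular-expression tokenizer. The sketch below uses Python's standard re module; the table TOKEN_SPEC and the function tokenize are illustrative names, and the patterns cover only the token classes needed for this one statement.

```python
import re

# (token class, pattern) pairs; order matters when patterns could overlap.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("numeric constant", r"\d+(?:\.\d+)?"),
    ("assignment operator", r"="),
    ("arithmetic operator", r"[-+*/%]"),
    ("expression delimiter", r"[()]"),
    ("statement delimiter", r";"),
]
# One capturing group per class, so m.lastindex identifies the class that matched.
MASTER = re.compile("|".join(f"({pattern})" for _, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token class) pairs for the input text."""
    pos, result = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        result.append((m.group(), TOKEN_SPEC[m.lastindex - 1][0]))
        pos = m.end()
    return result

for lexeme, kind in tokenize("result = (sum - average)/count;"):
    print(f"{lexeme:10} {kind}")
```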

12 Here is an FA that recognizes a subset of tokens in the Pascal language:

13 When scanning "temp := temp + 1"
The first token should be temp. From the start state, the scanner goes to state q1 and loops in q1 until it reads a character that cannot be part of an identifier, such as ":". It then stops, returns the first token "temp", and starts over to get the next token. Try scanning "x := (y+10) * z1;"
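The walk through the automaton can be imitated with a transition table. The FA diagram itself is not part of this transcript, so the states below are assumptions: only q1 (inside an identifier) comes from the text, while q2 to q5, the table TRANSITIONS and the function scan_fa are hypothetical names for a small Pascal-like subset.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return ch

# Assumed automaton: q1 = identifier, q2 = number, q3 = seen ':', q4 = ':=', q5 = symbol.
TRANSITIONS = {
    ("start", "letter"): "q1", ("q1", "letter"): "q1", ("q1", "digit"): "q1",
    ("start", "digit"): "q2", ("q2", "digit"): "q2",
    ("start", ":"): "q3", ("q3", "="): "q4",
}
for sym in "+-*/();":
    TRANSITIONS[("start", sym)] = "q5"
ACCEPTING = {"q1": "identifier", "q2": "number", "q4": "assign", "q5": "symbol"}

def scan_fa(text):
    """Run the automaton repeatedly, emitting one token per accepted run."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        state, start = "start", i
        # Follow transitions until no move exists for the next character.
        while i < len(text) and (state, char_class(text[i])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[i]))]
            i += 1
        if state not in ACCEPTING:
            raise ValueError(f"cannot scan input at position {start}")
        tokens.append((ACCEPTING[state], text[start:i]))
    return tokens

print(scan_fa("temp := temp + 1"))
```

Each token ends exactly where the slide says it does: the automaton stays in q1 while letters or digits arrive and returns "temp" as soon as no transition is available.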

14 Parsing Algorithms Broadly divided into LL and LR.
LL algorithms match input directly to left-side symbols, then choose a right-side production that matches the tokens. This is top-down parsing. LR algorithms try to match tokens to the right-side productions, then replace groups of tokens with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol. This is bottom-up parsing.

15 Parsing Algorithms Look ahead:
Algorithms must look at the next token(s) to decide between alternate productions for the current tokens. LL(1) means LL with 1 token of look-ahead. LL algorithms are simpler and easier to visualize. LR algorithms are more powerful: they can parse some grammars that LL cannot, such as left-recursive grammars.

16 Top-down Parsing Example (LL)
Grammar rule: [grammar table with rules numbered 1 to 10; not preserved in the transcript]
Input string: x – 2 * y
Tokens: id – number * id

17

18

19

20


22

23

24 Elimination of Left Recursion

25 Elimination of Left Recursion
Here is the grammar again:
    S ::= A | B
    A ::= ABc | AAdd | a | aa
    B ::= Bee | b
An equivalent right-recursive grammar:
    A ::= aA′ | aaA′
    A′ ::= BcA′ | AddA′ | ε
    B ::= bB′
    B′ ::= eeB′ | ε
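The rewrite shown above follows the usual rule for immediate left recursion: A ::= Aα | β becomes A ::= βA′ and A′ ::= αA′ | ε. A minimal Python sketch of that rule is given below. The function name eliminate_left_recursion is hypothetical, productions are written as lists of symbols, an empty list stands for ε, and the new nonterminal is spelled with an ASCII apostrophe.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Remove immediate left recursion from the productions of one nonterminal.

    productions is a list of right-hand sides, each a list of symbols.
    Returns a dict mapping nonterminals to their new right-hand sides.
    """
    # alpha parts: what follows the leading nonterminal in recursive productions
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # beta parts: productions that do not start with the nonterminal
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    fresh = nonterminal + "'"
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],  # [] is epsilon
    }

rules_a = eliminate_left_recursion(
    "A", [["A", "B", "c"], ["A", "A", "d", "d"], ["a"], ["a", "a"]])
rules_b = eliminate_left_recursion("B", [["B", "e", "e"], ["b"]])
print(rules_a)
print(rules_b)
```

This handles only immediate left recursion within a single nonterminal; indirect left recursion (A ::= B..., B ::= A...) needs an additional substitution step that is not shown here.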

26 Eliminating Left Recursion

27 LL Parsing Example
Let's try the input string: x – 2 * y
Tokens: id – number * id

28 Rule / Sentential Form / Input
[Table: the rule-number column and the input-position markers are not preserved in the transcript]
Input: x – 2 * y
Sentential forms, in order of derivation:
    Goal
    Expr
    Term Expr
    Factor Term Expr
    <id,x> Term Expr
    <id,x> ε Expr
    <id,x> +Term Expr
    <id,x> - Term Expr
    <id,x> - Factor Term Expr
    <id,x> - <number,2> Term Expr
    <id,x> - <number,2> * Factor Term Expr
    <id,x> - <number,2> * <id,y> Term Expr
    <id,x> - <number,2> * <id,y> ε Expr
    <id,x> - <number,2> * <id,y> ε
    <id,x> - <number,2> * <id,y>

