PROGRAMMING LANGUAGES

CS 222 LECTURE 06 PROGRAMMING LANGUAGES

Syntax and Semantics
The syntax of a language defines its valid symbols and grammar. Syntax defines the structure of a program, i.e., the form that each program unit and each statement must use. The semantics defines the meaning of the grammar elements. Lexical structure is the form of the lowest-level syntactic units (words or tokens) of a grammar.

Syntax and Semantics Compared
Syntax: in Java, an assignment statement is:
    identifier = expression { operator expression } ;
Semantics: an assignment statement must use compatible types, e.g.
    int n1, n2;
    n1 = 20*1024;   // OK, int_var = int_expression
    n2 = 3.50;      // illegal, incompatible types
Lexical elements (lexemes): "n2" "=" "3.50" ";"

Parts of a Compiler / Interpreter:
How are they used?
    Tokenizer (Lexical Analysis): program source code -> token stream
    Parser (Syntax Analysis): token stream -> parse tree
    Semantic Analysis: parse tree -> intermediate code
    Optimization and Code Generation: intermediate code -> object code

Scanning and Parsing
    source file -> input stream -> Scanner -> tokens -> Parser -> parse tree
Example: the statement "sum = x1 + x2;" is scanned into the tokens: sum  =  x1  +  x2  ;

Scanners
Recognize regular expressions.
Implemented as finite automata (finite state machines).
Typically contain a loop that cycles through characters, building tokens and their associated values by repeated operations.
The scanner may be integrated as a function in the parser: the parser calls the scanner to get the next token.


Lexical Structure
Lexemes are the smallest lexical units of a language, grouped according to syntactic usage. Some types of lexemes in computer languages are:
    identifiers: x, println, _INIT, ArrayList
    numeric constants: 0, 10000, 2.98E+6
    operators: =, +, -, ++, +=, *, /
    separators: [ ] ; : . , ( )
    string literals: "hello there"

Lexical Structure
A token is a structure representing a lexeme that explicitly indicates its category for the purpose of parsing. Lexemes are recognized by the first phase of a translator, the scanner, which deals directly with the input. The scanner separates the input into tokens. Scanners are also called lexers.

Types of lexemes
Common lexemes (classes of tokens):
    identifiers: x, println, _INIT, ArrayList
    numeric constants: 0, 10000, 2.98E+6
    assignment operators: =, +=, -=, *=, /=, %=
    arithmetic operators: *, /, +, -, %
    boolean operators: &&, ||, ^, !
    separators: [ ] ; : . , ( )
    string literals: "hello there"
    reserved words: may be defined as a class of their own, or simply treated as identifiers at the lexical level

Tokens
Tokens are the strings of syntactic units. Example: what are the tokens in this statement?
    result = (sum - average)/count;
Lexeme      Token
result      identifier
=           assignment operator
(           expression delimiter
sum         identifier
-           arithmetic operator
average     identifier
)           expression delimiter
/           arithmetic operator
count       identifier
;           semi-colon (statement delimiter)
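
The lexeme/token classification above can be sketched with regular expressions. This is only an illustration in Python: the category names come from the slide, but the patterns, the TOKEN_SPEC table, and the tokenize function are assumptions made for this sketch, not part of the lecture.

```python
import re

# Token classes named on the slide; the regular expressions are assumed.
TOKEN_SPEC = [
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("numeric constant", re.compile(r"\d+(?:\.\d+)?")),
    ("assignment operator", re.compile(r"=")),
    ("arithmetic operator", re.compile(r"[-+*/%]")),
    ("expression delimiter", re.compile(r"[()]")),
    ("statement delimiter", re.compile(r";")),
    ("whitespace", re.compile(r"\s+")),
]


def tokenize(text):
    """Return a list of (lexeme, token class) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Try each token class in turn at the current position.
        for category, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                if category != "whitespace":  # whitespace only separates lexemes
                    tokens.append((match.group(), category))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens


if __name__ == "__main__":
    for lexeme, category in tokenize("result = (sum - average)/count;"):
        print(f"{lexeme:10} {category}")
```

Running the script lists each lexeme of the example statement next to its token class, in the same order as the table.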

Here is an FA that recognizes a subset of tokens in the Pascal language:

When scanning "temp := temp + 1"
The first token should be temp. From the start state, the scanner goes to state q1 and loops in q1 until it reads ":". It then stops, returns the first token "temp", and starts to read the next token. Try scanning "x := (y+10) * z1;"
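
Since the automaton diagram is not available here, the following is a hypothetical table-driven scanner in Python for a small Pascal-like subset. Only the start state and q1 (the identifier loop) are taken from the walkthrough; states q2 to q5, the token names, and the set of single-character symbols are assumptions for illustration.

```python
def char_class(ch):
    """Map a character to the input class used in the transition table."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return ch


# Transition table of the (assumed) automaton: (state, input class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "q1",  # q1: inside an identifier
    ("q1", "letter"): "q1",
    ("q1", "digit"): "q1",
    ("start", "digit"): "q2",   # q2: inside a number
    ("q2", "digit"): "q2",
    ("start", ":"): "q3",       # q3: saw ':', waiting for '='
    ("q3", "="): "q4",          # q4: ':=' recognized
}
for symbol in "+-*/();":
    TRANSITIONS[("start", symbol)] = "q5"  # q5: single-character symbol

# Accepting states and the token class each one reports.
ACCEPTING = {"q1": "identifier", "q2": "number", "q4": "assign", "q5": "symbol"}


def scan(text):
    """Split text into (token class, lexeme) pairs by running the automaton."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():  # blanks separate tokens and are skipped
            i += 1
            continue
        state, start = "start", i
        # Follow transitions for as long as one exists (longest match).
        while i < len(text) and (state, char_class(text[i])) in TRANSITIONS:
            state = TRANSITIONS[(state, char_class(text[i]))]
            i += 1
        if state not in ACCEPTING:
            raise SyntaxError(f"cannot scan {text[start:i + 1]!r}")
        tokens.append((ACCEPTING[state], text[start:i]))
    return tokens


if __name__ == "__main__":
    print(scan("temp := temp + 1"))
    print(scan("x := (y+10) * z1;"))
```

The inner loop plays the role of "looping in q1": it keeps consuming characters while the table has a transition, then reports the token for the state it stopped in.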

Parsing Algorithms
Broadly divided into LL and LR.
LL algorithms match input directly to left-side symbols, then choose a right-side production that matches the tokens. This is top-down parsing.
LR algorithms try to match tokens to the right-side productions, then replace groups of tokens with the left-side nonterminal. They continue until the entire input has been "reduced" to the start symbol. This is bottom-up parsing.

Parsing Algorithms
Look ahead: algorithms must look at the next token(s) to decide between alternate productions for the current tokens.
LL(1) means LL with 1 token of look-ahead.
LL algorithms are simpler and easier to visualize.
LR algorithms are more powerful: they can parse some grammars that LL cannot, such as grammars with left recursion.

Top-down Parsing Example (LL)
Grammar rules: numbered 1 to 10 (grammar table not reproduced)
Input string: x - 2 * y
Tokens: id - number * id

Elimination of Left Recursion

Eliminating Left Recursion
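
As a reminder of the standard transformation: productions of the form A -> A a | b are rewritten as A -> b A' and A' -> a A' | epsilon. Below is a minimal Python sketch for immediate left recursion only. The function name, the list-of-symbols representation, and the Expr/Term example grammar are assumptions chosen for illustration, not taken from the slides.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Remove immediate left recursion from the productions of one nonterminal.

    Each production is a list of symbols; the empty list stands for epsilon.
    Returns a dict mapping nonterminals to their new production lists.
    """
    # Productions of the form A -> A alpha; keep only the alpha part.
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # All remaining productions A -> beta.
    others = [p for p in productions if not p or p[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}  # nothing to rewrite
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is epsilon
    }


if __name__ == "__main__":
    # Assumed example: Expr -> Expr + Term | Expr - Term | Term
    grammar = eliminate_left_recursion(
        "Expr", [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]]
    )
    for head, bodies in grammar.items():
        for body in bodies:
            print(head, "->", " ".join(body) or "epsilon")
```

The rewritten grammar generates the same strings, but every production now starts by consuming a token or a different nonterminal, so a top-down parser no longer recurses forever on the leftmost symbol.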

LL Parsing Example
Let us try the input string: x - 2 * y
Tokens: id - number * id

Input: x - 2 * y
Rule   Sentential Form
-      Goal
       Expr
       Term Expr'
       Factor Term' Expr'
       <id,x> Term' Expr'
       <id,x> Expr'
       <id,x> + Term Expr'
       <id,x> - Term Expr'
       <id,x> - Factor Term' Expr'
       <id,x> - <number,2> Term' Expr'
       <id,x> - <number,2> * Factor Term' Expr'
       <id,x> - <number,2> * <id,y> Term' Expr'
       <id,x> - <number,2> * <id,y> Expr'
       <id,x> - <number,2> * <id,y>
