Principles of Programming Languages

P. S. Suryateja Asst. Professor, CSE Dept Vishnu Institute of Technology

UNIT – 1 SYNTAX & SEMANTICS

General Problem of Describing Syntax
A language is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. The syntax rules specify which strings belong to the language. The lowest-level syntactic units are known as lexemes. The lexemes of a programming language include numeric literals, operators, and special words.

General Problem of Describing Syntax (cont...)
Lexemes are partitioned into groups like identifiers, keywords, literals etc. A token of a language is a category of its lexemes.

General Problem of Describing Syntax (cont...)
Consider the following statement:
index = 2 * count + 17;
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
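The lexeme/token pairs for the statement above can be produced by a small scanner. The following is a minimal sketch in C, not a production lexer: the names `TokenKind`, `Token` and `next_token` are hypothetical and introduced only for illustration, and only the token categories that appear in the example statement are handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Token categories from the example statement (names are illustrative). */
typedef enum {
    TOK_IDENTIFIER, TOK_EQUAL_SIGN, TOK_INT_LITERAL, TOK_MULT_OP,
    TOK_PLUS_OP, TOK_SEMICOLON, TOK_END, TOK_UNKNOWN
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

/* Scan one lexeme starting at *src and advance *src past it. */
Token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;          /* skip white space */

    Token t = { TOK_UNKNOWN, p, 1 };
    if (*p == '\0') {
        t.kind = TOK_END;
        t.length = 0;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        const char *q = p;
        while (isalnum((unsigned char)*q) || *q == '_') q++;
        t.kind = TOK_IDENTIFIER;
        t.length = (size_t)(q - p);
    } else if (isdigit((unsigned char)*p)) {
        const char *q = p;
        while (isdigit((unsigned char)*q)) q++;
        t.kind = TOK_INT_LITERAL;
        t.length = (size_t)(q - p);
    } else {
        switch (*p) {
        case '=': t.kind = TOK_EQUAL_SIGN; break;
        case '*': t.kind = TOK_MULT_OP;    break;
        case '+': t.kind = TOK_PLUS_OP;    break;
        case ';': t.kind = TOK_SEMICOLON;  break;
        default:  break;                   /* stays TOK_UNKNOWN */
        }
    }
    *src = p + t.length;
    return t;
}
```

Calling `next_token` repeatedly on a pointer to the statement text yields one token per call until `TOK_END` is returned, which mirrors the way a parser usually requests tokens one at a time.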

Language Recognizers
A language can be defined in two ways: by recognition and by generation. For a language L that uses an alphabet Σ of characters, we need to construct a mechanism R, called a recognition device. Given a string of characters from Σ, R indicates whether or not the string is in L. The syntax analysis part of a compiler is a recognizer for the language the compiler translates.

Language Generators
A generator is a device used to generate the sentences of a language. On its own, a generator is of limited usefulness as a language descriptor because the particular sentence it produces is unpredictable. A finite-state automaton (FSA) is an example of a language recognizer, and a context-free grammar (CFG) is an example of a language generator.

Formal Methods of Describing Syntax – Context-Free Grammars
Two of Chomsky's four classes of grammars, regular grammars and context-free grammars, are used to describe the syntax of programming languages. Regular grammars describe the forms of tokens. Context-free grammars describe the syntax of whole programming languages.

Formal Methods of Describing Syntax – Backus-Naur Form (BNF)
John Backus presented a paper describing ALGOL 58 that introduced a new formal notation for specifying programming language syntax. Peter Naur later modified the notation slightly for the description of ALGOL 60. This revised notation is called Backus-Naur Form (BNF).

BNF - Fundamentals
A meta-language is a language used to describe another language. BNF is a meta-language for programming languages. BNF uses abstractions for syntactic structures. Abstraction names are enclosed in angle brackets (< >). For example, the abstraction for an assignment statement can be named <assign>, and its definition is as follows:
<assign> -> <var> = <expression>
The text to the left of the arrow, called the left-hand side (LHS), is the abstraction being defined. The text to the right of the arrow, called the right-hand side (RHS), is the definition of the LHS and can contain a mixture of tokens, lexemes and other abstractions.

BNF – Fundamentals (cont...)
The LHS and RHS together are called a rule or production. An example sentence described by the <assign> rule:
total = s1 + s2
The abstractions in a BNF description, or grammar, are often called non-terminals, and the lexemes and tokens of the rules are called terminals. A BNF description, or grammar, is a collection of rules.

A Java if statement can be described with the following rules:
<if_stmt> -> if (<logic_expr>) <stmt>
<if_stmt> -> if (<logic_expr>) <stmt> else <stmt>
The two rules above can be combined as follows:
<if_stmt> -> if (<logic_expr>) <stmt>
           | if (<logic_expr>) <stmt> else <stmt>

BNF does not contain an ellipsis (...) to represent variable-length lists. Instead it uses recursion in the rules. A rule is said to be recursive if its LHS appears in its RHS, as shown below:
<iden_list> -> identifier
             | identifier, <iden_list>

Grammars and Derivations
A grammar is a generative device for defining languages. Sentences are generated through a sequence of applications of the rules, beginning with a special non-terminal called the start symbol. This sequence of rule applications is called a derivation. For a programming language, the start symbol often represents the entire program and is named <program>.

Grammars and Derivations (cont...)
Adopted from Concepts of Programming Languages - Sebesta

A derivation of a program is as follows:
Adopted from Concepts of Programming Languages - Sebesta

The symbol => is read as “derives”. Each of the strings in the derivation, including <program>, is called a sentential form. A derivation in which the leftmost non-terminal is always the one replaced is known as a leftmost derivation. A sentential form consisting of only terminals, or lexemes, is the generated sentence.

Parse Trees
Grammars naturally describe the hierarchical structure of sentences. These hierarchical structures are known as parse trees. Every internal node in a parse tree is a non-terminal symbol. Every leaf node is a terminal symbol. Every sub-tree describes one instance of an abstraction in the sentence.

Adopted from Concepts of Programming Languages - Sebesta
Parse Trees (cont...) Adopted from Concepts of Programming Languages - Sebesta

Ambiguity
A grammar is said to be ambiguous if it generates a sentence that has two or more distinct parse trees.
Adopted from Concepts of Programming Languages - Sebesta

Ambiguity (cont...)
Parse trees for the string A = B + C * A
Adopted from Concepts of Programming Languages - Sebesta

Operator Precedence
Operator precedence determines which of several operators in an expression is evaluated first. An ambiguous grammar makes this choice unclear, because different parse trees place the operators at different levels. The general rule is that an operator lower in the parse tree is evaluated first, so it has precedence over an operator higher in the tree.

Operator Precedence (cont...)
Parse trees for the string A = B + C * A
Adopted from Concepts of Programming Languages - Sebesta
In one parse tree * is lower and in the other + is lower. Which one should be chosen?

The correct ordering is specified by using separate non-terminals to represent the operands of operators that have different precedence. The previous grammar can be rewritten as the following unambiguous grammar:
Adopted from Concepts of Programming Languages - Sebesta
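The effect of using separate non-terminals for each precedence level can be sketched with a small evaluator. The grammar in the comment is an assumption chosen for illustration (single-digit operands, only + and *); it is not the grammar from the missing figure. Because `term` is called from `expr`, every * ends up lower in the tree than any + and is therefore evaluated first. The sketch assumes well-formed input and does no error reporting.

```c
#include <ctype.h>

/* Assumed illustrative grammar (single-digit operands):
 *   <expr>   -> <expr> + <term>   | <term>
 *   <term>   -> <term> * <factor> | <factor>
 *   <factor> -> digit
 * The left recursion is implemented with loops. */

static int factor(const char **p) {
    int value = 0;
    if (isdigit((unsigned char)**p)) {
        value = **p - '0';
        (*p)++;
    }
    return value;
}

static int term(const char **p) {
    int value = factor(p);
    while (**p == '*') {        /* * binds tighter: handled at the lower level */
        (*p)++;
        value *= factor(p);
    }
    return value;
}

static int expr(const char **p) {
    int value = term(p);
    while (**p == '+') {        /* + combines whole terms */
        (*p)++;
        value += term(p);
    }
    return value;
}

/* Evaluate an expression such as "2+3*4" (no spaces). */
int evaluate(const char *s) {
    return expr(&s);
}
```

With this structure, `evaluate("2+3*4")` multiplies before adding, regardless of the order in which the operators appear in the input.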

Associativity
Associativity is the semantic rule that specifies the order of evaluation when adjacent operators have the same precedence. If the LHS of a rule appears at the beginning of its RHS, the rule is said to be left recursive.
Adopted from Concepts of Programming Languages - Sebesta

Associativity (cont...)
If the LHS of a rule appears at the right end of its RHS, the rule is said to be right recursive. Left recursion specifies left associativity and right recursion specifies right associativity.
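The difference between the two kinds of recursion shows up as soon as the operator is not associative, as with subtraction. The sketch below is illustrative only: the two one-rule grammars in the comments and the function names are assumptions, operands are single digits, and the input is assumed to be well formed.

```c
/* Left recursive rule   <expr> -> <expr> - digit | digit
 * groups operands from the left: (9 - 4) - 3. */
int eval_left_assoc(const char *s) {
    int value = *s++ - '0';
    while (*s == '-') {
        s++;                       /* skip the operator */
        value -= *s++ - '0';       /* fold the next operand into the result */
    }
    return value;
}

/* Right recursive rule  <expr> -> digit - <expr> | digit
 * groups operands from the right: 9 - (4 - 3). */
int eval_right_assoc(const char *s) {
    int value = *s++ - '0';
    if (*s == '-')
        return value - eval_right_assoc(s + 1);
    return value;
}
```

For the input "9-4-3" the left-associative reading computes (9 - 4) - 3 and the right-associative reading computes 9 - (4 - 3), so the two functions return different results for the same string.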

Extended BNF (EBNF)
BNF has been extended to remove a few minor inconveniences. The extended version is known as Extended BNF, or simply EBNF. The extensions improve readability and writability but do not add descriptive power. Three extensions are commonly included in the various versions of EBNF. The first extension denotes an optional part of the RHS using square brackets.
Ex: <if_stmt> -> if (<expr>) <stmt> [ else <stmt> ]

Extended BNF (EBNF) (cont...)
The second extension is the use of braces in the RHS to indicate that the enclosed part can be repeated indefinitely or left out altogether.
Ex: <iden_list> -> <identifier> {, <identifier> }
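The recursive BNF rule for <iden_list> and the EBNF rule with braces describe the same language, and the contrast carries over directly to code: recursion in the rule becomes a recursive call, and braces become a loop. The following is a minimal sketch under simplifying assumptions (an identifier is a run of letters, no white space is allowed); all function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Consume one identifier (a run of letters); false if none is present. */
static bool identifier(const char **p) {
    if (!isalpha((unsigned char)**p)) return false;
    while (isalpha((unsigned char)**p)) (*p)++;
    return true;
}

/* BNF style: <iden_list> -> identifier | identifier, <iden_list> */
static bool iden_list_bnf(const char **p) {
    if (!identifier(p)) return false;
    if (**p == ',') {
        (*p)++;
        return iden_list_bnf(p);   /* the recursive occurrence of the LHS */
    }
    return true;
}

/* EBNF style: <iden_list> -> <identifier> {, <identifier> } */
static bool iden_list_ebnf(const char **p) {
    if (!identifier(p)) return false;
    while (**p == ',') {           /* the braces become a loop */
        (*p)++;
        if (!identifier(p)) return false;
    }
    return true;
}

/* A string is accepted only if the whole input is consumed. */
bool accepts_bnf(const char *s)  { return iden_list_bnf(&s)  && *s == '\0'; }
bool accepts_ebnf(const char *s) { return iden_list_ebnf(&s) && *s == '\0'; }
```

Both functions accept exactly the same strings; the EBNF form simply avoids one level of recursion per list element.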

The third extension deals with multiple-choice options using parentheses and the OR operator, |.
Ex: <term> -> <term> (* | / | %) <factor>
The brackets, braces and parentheses are known as metasymbols.

Attribute Grammars
An attribute grammar describes more of the structure of a programming language than a context-free grammar can. It is an extension of a CFG that allows certain language rules, such as type compatibility, to be described conveniently.

Attribute Grammars – Static Semantics
Some characteristics of programming languages, such as type compatibility, are difficult to specify in BNF. Others cannot be specified in BNF at all, for example the rule that all variables must be declared before they are used. These are examples of static semantic rules, so named because they can be checked at compile time. An attribute grammar is one mechanism for describing static semantics. It was designed by Knuth.

Attribute Grammars – Basic Concepts
Attribute grammars are CFGs along with attributes, attribute computation functions and predicate functions. Attributes are associated with grammar symbols (terminals and non-terminals) and are similar to variables. Attribute Computation Functions are associated with grammar rules. They are used to specify how attribute values are computed. Predicate functions, which state the static semantic rules, are associated with grammar rules.

Attribute Grammars – Definition
Associated with each grammar symbol X is a set of attributes A(X). The set A(X) consists of two disjoint sets S(X) and I(X), called synthesized attributes and inherited attributes respectively. Synthesized attributes are used to pass semantic information up the parse tree. Inherited attributes pass semantic information down and across the tree.

Attribute Grammars – Definition (cont...)
Associated with each grammar rule is a set of semantic functions. For a rule X0 -> X1 ... Xn, the synthesized attributes of X0 are computed with semantic functions of the form S(X0) = f(A(X1), ..., A(Xn)). So the value of a synthesized attribute on a node depends only on the values of the attributes of that node’s child nodes. Inherited attributes of symbols Xj, 1 <= j <= n, are computed with a semantic function of the form I(Xj) = f(A(X0), ..., A(Xn)). So the value of an inherited attribute on a node depends on the attribute values of that node’s parent node and those of its sibling nodes.

Attribute Grammars – Definition (cont...)
A predicate function has the form of a Boolean expression on the union of the attribute set {A(X0), ..., A(Xn)} and a set of literal attribute values. The only derivations allowed with an attribute grammar are those in which every predicate associated with every non-terminal is true.

Intrinsic Attributes
Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree (for example, the type of a variable taken from the symbol table). Given the intrinsic attribute values on a parse tree, the semantic functions can be used to compute the remaining attribute values.

Attribute Grammar – Example 1
Attribute grammar that describes the rule that the name at the end of an Ada procedure must match the procedure’s name. (This rule cannot be stated in BNF.)
Note: Numeric subscripts are used to distinguish the instances of an abstraction that occurs more than once in a rule.
Adopted from Concepts of Programming Languages - Sebesta

Attribute Grammar – Example 2
actual_type: Synthesized Attribute
expected_type: Inherited Attribute
Adopted from Concepts of Programming Languages - Sebesta
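The roles of the two attributes can be sketched in code. The grammar rule, the variable names A, B and C, their declared types, and the rule that a sum is int only when both operands are int are all assumptions made for illustration; they follow a common textbook formulation and are not taken from the missing figure. The function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { TYPE_INT, TYPE_REAL } Type;

/* Intrinsic attribute: the declared type of a variable, normally read from
 * the symbol table. Here A is int and every other variable is real
 * (an arbitrary choice for the example). */
Type lookup_type(char var) {
    return var == 'A' ? TYPE_INT : TYPE_REAL;
}

/* Synthesized attribute actual_type for <expr> -> <var> + <var>:
 * computed from the children, int only if both operands are int. */
Type actual_type_of_sum(char left, char right) {
    if (lookup_type(left) == TYPE_INT && lookup_type(right) == TYPE_INT)
        return TYPE_INT;
    return TYPE_REAL;
}

/* <assign> -> <var> = <var> + <var>
 * expected_type is inherited from the target variable on the left of =.
 * The predicate requires actual_type == expected_type. */
bool assignment_is_well_typed(char target, char left, char right) {
    Type expected_type = lookup_type(target);
    Type actual_type = actual_type_of_sum(left, right);
    return actual_type == expected_type;
}
```

Under these assumptions, `A = A + A` satisfies the predicate while `A = A + B` does not, because the sum is real but an int is expected.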

Attribute Grammar – Example 2 (cont...)

Dynamic Semantics
Dynamic semantics deals with the meaning of expressions, statements and program units. No universally accepted notation or approach has been devised for describing dynamic semantics.

Operational Semantics
Operational semantics specifies the meaning of a program in terms of its execution on a real or virtual machine. The change in the state of the machine caused by executing a statement defines the meaning of that statement. To use operational semantics for a high-level language, a virtual machine is needed. At the highest level, where the interest is in the final result of executing a complete program, the approach is known as natural operational semantics. At the lowest level, where the complete sequence of state changes is examined, it is known as structural operational semantics.
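A common way to illustrate the operational approach is to describe a high-level construct by rewriting it in terms of simpler statements whose effect on the machine state is obvious. The sketch below does this for a counting loop; the functions `sum_for` and `sum_goto` are hypothetical names, and the choice of a summing loop is only an example.

```c
/* High-level form: the construct whose meaning is being described. */
int sum_for(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Lower-level form: the same computation expressed with only
 * assignments, a conditional branch and unconditional jumps. Each
 * statement changes the state (i, sum) in one simple step. */
int sum_goto(int n) {
    int sum = 0;
    int i = 1;
loop:
    if (i > n) goto out;    /* loop test */
    sum += i;               /* loop body */
    i++;                    /* loop increment */
    goto loop;
out:
    return sum;
}
```

The second function plays the role of the virtual-machine description: a reader who understands assignments and jumps can work out what the for statement means by following the state changes.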

Operational Semantics - Ex

Operational Semantics - Evaluation
Advantages:
- May be simple for small examples
- Good if used informally
- Useful for implementation
Disadvantages:
- Very complex for large programs
- Lacks mathematical rigor
Uses:
- Vienna Definition Language (VDL) used to define PL/I
- Compiler work

Denotational Semantics
A formal method for specifying the meaning of programs. Denotational semantics is based on recursive function theory. Key idea is to define a function that maps a program (a syntactic object) to its meaning (a semantic object). The domain of the mapping function is called the syntactic domain and the range is called semantic domain. The method is named denotational because the mathematical objects denote the meaning of their corresponding entities.

Denotational vs. Operational
Denotational semantics is similar to operational semantics except:
- There is no virtual machine
- The language is mathematics (lambda calculus)
Difference between denotational and operational semantics:
- In operational semantics, the state changes are defined by coded algorithms for a virtual machine
- In denotational semantics, they are defined by rigorous mathematical functions

Denotational Semantics - Process
Define a mathematical object for each language entity. Define a function that maps instances of the language entities onto instances of the corresponding mathematical objects.

Denotational Semantics – Ex 1
Example: Representing binary strings as decimal numbers
a) Syntax
b) Mapping Function Mbin
Adopted from Concepts of Programming Languages - Sebesta
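A mapping function of this kind can be sketched as a recursive function that follows the structure of the binary string. The definition used here is an assumption based on the usual formulation (a single digit maps to 0 or 1, and a number followed by a digit d maps to twice the meaning of the number plus d); the names `m_bin` and `m_bin_prefix` are hypothetical, and the input is assumed to be a non-empty string of '0' and '1' characters.

```c
#include <string.h>

/* M_bin over the first len characters of bits (each '0' or '1'). */
static unsigned m_bin_prefix(const char *bits, size_t len) {
    unsigned last = (unsigned)(bits[len - 1] - '0');
    if (len == 1)
        return last;                                /* M_bin('0') = 0, M_bin('1') = 1 */
    return 2 * m_bin_prefix(bits, len - 1) + last;  /* M_bin(<bin_num> d) */
}

/* Meaning of a non-empty binary string as a non-negative integer. */
unsigned m_bin(const char *bits) {
    return m_bin_prefix(bits, strlen(bits));
}
```

The syntactic domain is the set of binary strings and the semantic domain is the set of non-negative integers; the function has one case per grammar rule.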

Denotational Semantics – The State of a Program
In denotational semantics, the state of a program is described by the values of all its variables. A state s is a set of ordered pairs:
s = {<i1, v1>, <i2, v2>, ..., <in, vn>}
where each i is the name of a variable and the associated v is the current value of that variable. The function VARMAP(ij, s) returns vj, the value of variable ij in state s. State changes are used to define the meanings of programs and program constructs.

Denotational Semantics – Expressions
Assumptions:
- The only operators are + and *
- An expression can have at most one operator
- The only operands are integer variables and integer constants
- There are no parentheses
Adopted from Concepts of Programming Languages - Sebesta

Denotational Semantics – Expressions (cont...)
If Z is the set of integers and error is an error value, then Z union {error} is the semantic domain. The mapping function Me is:
Adopted from Concepts of Programming Languages - Sebesta
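A mapping of expressions into Z union {error} can be sketched as follows. This is an illustration under the stated assumptions (at most one operator, which is + or *, and operands that are integer constants or variables); it is not the definition from the missing figure. The types `Binding`, `State`, `Meaning` and `Operand` and the functions `varmap`, `m_operand` and `m_binary` are hypothetical names, and an undefined variable is treated as the error value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int value; } Binding;       /* one <i, v> pair */
typedef struct { const Binding *pairs; size_t count; } State;  /* the state s */

typedef struct { bool is_error; int value; } Meaning;          /* Z union {error} */

typedef enum { OPERAND_CONST, OPERAND_VAR } OperandKind;
typedef struct { OperandKind kind; int constant; const char *name; } Operand;

/* VARMAP(name, s): the value bound to name, or error if name is undefined. */
Meaning varmap(const char *name, State s) {
    for (size_t j = 0; j < s.count; j++)
        if (strcmp(s.pairs[j].name, name) == 0)
            return (Meaning){ false, s.pairs[j].value };
    return (Meaning){ true, 0 };
}

/* Meaning of a single operand in state s. */
Meaning m_operand(Operand x, State s) {
    if (x.kind == OPERAND_CONST)
        return (Meaning){ false, x.constant };
    return varmap(x.name, s);
}

/* Meaning of "left op right" in state s, where op is '+' or '*'.
 * If either operand is error, the whole expression is error. */
Meaning m_binary(Operand left, char op, Operand right, State s) {
    Meaning a = m_operand(left, s);
    Meaning b = m_operand(right, s);
    if (a.is_error || b.is_error || (op != '+' && op != '*'))
        return (Meaning){ true, 0 };
    return (Meaning){ false, op == '+' ? a.value + b.value : a.value * b.value };
}
```

Note that the functions never modify the state: an expression is mapped to a value purely as a function of the expression and the state, which is the point of the denotational style.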

Denotational Semantics – Evaluation
Advantages:
- Compact and precise, with a solid mathematical foundation
- Can be used to prove the correctness of a program
- Can be an aid to language design
Disadvantages:
- Requires mathematical knowledge
- Hard for programmers to use
Uses:
- Semantics for ALGOL 60 and Pascal
- Used in compiler generation and optimization

Axiomatic Semantics
Axiomatic semantics is based on formal logic (first-order predicate calculus). It was originally devised for program verification. The approach is to define axioms or inference rules for each statement type in the language. The logical expressions used are called assertions or predicates, and they state relationships among program variables. An assertion before a statement is called a pre-condition. An assertion following a statement is called a post-condition.

Axiomatic Semantics – Weakest Preconditions
The weakest pre-condition is the least restrictive pre-condition that will guarantee the validity of the associated post-condition.
Ex: sum = 2 * x + 1 {sum > 1}
In the above example, {sum > 1} is the post-condition for the assignment statement. The weakest pre-condition for the assignment statement is {x > 0}, because 2 * x + 1 > 1 holds exactly when x > 0.

Axiomatic Semantics (cont...)
An inference rule is a method of inferring the truth of one assertion on the basis of the truth of other assertions. The general form of an inference rule is:
S1, S2, ..., Sn
---------------
       S
The above rule states that if S1, ..., Sn are true, then the truth of S can be inferred. The top part of a rule is called its antecedent and the bottom part is called its consequent. An axiom is a logical statement that is assumed to be true. Therefore, an axiom is an inference rule without an antecedent.

Axiomatic Semantics – Assignment Statements
The pre-condition and post-condition of an assignment statement together define its meaning precisely. If x = E is a general assignment statement and Q is its post-condition, then its pre-condition P is Q with every occurrence of x replaced by E:
P = Q_{x -> E}
Example:
a = b / 2 - 1 {a < 10}
The weakest pre-condition is computed by substituting b / 2 - 1 for a in the post-condition:
b / 2 - 1 < 10
b < 22
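The substitution step for the example can be checked mechanically. The sketch below encodes the post-condition, the substituted form and the simplified form b < 22 as C predicates and compares them over a range of values; the function names are hypothetical. One assumption to note: C integer division truncates toward zero, which differs from the real-number division implied by the algebra, but for this particular example the two forms still agree for every int value.

```c
#include <stdbool.h>

/* Post-condition Q: a < 10 */
static bool postcondition(int a) { return a < 10; }

/* Q with b / 2 - 1 substituted for a: the weakest pre-condition. */
static bool weakest_precondition(int b) { return postcondition(b / 2 - 1); }

/* The simplified form of the weakest pre-condition. */
static bool simplified_precondition(int b) { return b < 22; }

/* Check that the two forms agree for every b in [lo, hi]. */
bool preconditions_agree(int lo, int hi) {
    for (int b = lo; b <= hi; b++)
        if (weakest_precondition(b) != simplified_precondition(b))
            return false;
    return true;
}
```

A check of this kind is only a sanity test over a finite range, not a proof; the proof is the algebraic substitution itself.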

Axiomatic Semantics – Assignment Statements (cont...)
The notation for specifying the axiomatic semantics of a given statement is {P} S {Q}, where P is the pre-condition, Q is the post-condition and S is the statement. For an assignment statement, the notation is:
{Q_{x -> E}} x = E {Q}

Axiomatic Semantics - Evaluation
Advantages:
- Can be very abstract
- May be useful in proofs of correctness
- Solid theoretical foundations
Disadvantages:
- Predicate transformers are hard to define
- Hard to give complete meaning
- Does not suggest implementation
Uses:
- Semantics of Pascal
- Reasoning about correctness

Describing Semantics - Summary
Operational Semantics
- Informal descriptions
- Compiler work
Denotational Semantics
- Formal definitions
- Provably correct implementations
Axiomatic Semantics
- Reasoning about particular properties
- Proofs of correctness

UNIT – 1 LEXICAL ANALYSIS & PARSING

Reasons for Separating Lexical Analysis and Syntax Analysis
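Simplicity: Removing the low-level details of lexical analysis from the syntax analyzer makes the syntax analyzer both smaller and less complex.
Efficiency: Lexical analysis takes a significant share of compilation time, so it pays to optimize the lexical analyzer, whereas optimizing the syntax analyzer is not worthwhile. Separating the two makes this selective optimization possible.
Portability: The lexical analyzer reads input files and is therefore somewhat platform dependent, whereas the syntax analyzer can be made platform independent. Keeping them separate isolates the platform-dependent part.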

Lexical Analysis
A lexical analyzer is essentially a pattern matcher.
A lexical analyzer extracts lexemes from a given input string and produces the corresponding tokens. Lexical analyzers are now usually subprograms called by the syntax analyzer, and each call returns a single lexeme and its token. The lexical analysis process skips comments and white space because they are not relevant to the meaning of the program. The lexical analyzer inserts lexemes for user-defined names into the symbol table. The lexical analyzer also detects syntactic errors in tokens, such as ill-formed literals.

Lexical Analysis (cont...)
There are three approaches for building a lexical analyzer:
- Using a tool such as lex (on UNIX), which takes token patterns as input and generates a lexical analyzer.
- Designing a state transition diagram for the token patterns and writing a program that implements the diagram.
- Designing a state transition diagram for the token patterns and hand-constructing a table-driven implementation of the diagram.

A state transition diagram is a directed graph. The nodes are labelled with state names and arcs are labelled with input characters that cause transitions among states. State diagrams used for lexical analysis are representations of a class of mathematical machines called finite automata. Finite automata are used to recognize members of a class of languages called regular languages. The tokens of a programming language are a regular language, and a lexical analyzer is a finite automaton.
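A table-driven implementation of a state diagram can be sketched in a few lines. The automaton below is a hypothetical example, not the diagram from the slides: it distinguishes identifiers (a letter followed by letters or digits) from integer literals (one or more digits). The state names, character classes and the function `run_dfa` are assumptions introduced for illustration.

```c
#include <ctype.h>

typedef enum { START, IN_IDENT, IN_INT, REJECT, NUM_STATES } DfaState;
typedef enum { LETTER, DIGIT, OTHER, NUM_CLASSES } CharClass;

/* Map an input character to the class that labels the arcs. */
static CharClass classify(char c) {
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* transition[state][character class] gives the next state. */
static const DfaState transition[NUM_STATES][NUM_CLASSES] = {
    /*               LETTER    DIGIT     OTHER  */
    /* START    */ { IN_IDENT, IN_INT,   REJECT },
    /* IN_IDENT */ { IN_IDENT, IN_IDENT, REJECT },
    /* IN_INT   */ { REJECT,   IN_INT,   REJECT },
    /* REJECT   */ { REJECT,   REJECT,   REJECT },
};

/* Run the automaton over the whole string and return the final state:
 * IN_IDENT for an identifier, IN_INT for an integer literal,
 * START for the empty string, and REJECT for anything else. */
DfaState run_dfa(const char *s) {
    DfaState state = START;
    for (; *s != '\0'; s++)
        state = transition[state][classify(*s)];
    return state;
}
```

Each row of the table is a node of the diagram and each column is an arc label, so changing the token patterns means editing the table rather than the control flow.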

State diagram for arithmetic statements: Adopted from Concepts of Programming Languages - Sebesta

Parsing
Syntax analysis is often called parsing. Parsers for programming languages construct parse trees for given programs.
Goals of syntax analysis:
- To check whether the input program is syntactically correct.
- To produce a complete parse tree.

Parsing (cont...)
Parsers are categorized based on how they build parse trees:
- In top-down parsing, the tree is built from the root downward to the leaves.
- In bottom-up parsing, the tree is built from the leaves upward to the root.

Top-down Parsers
A top-down parser builds a parse tree in pre-order. A pre-order traversal begins with the root. Each node is visited before its branches are followed, and the branches from a particular node are followed in left-to-right order. This corresponds to a leftmost derivation. The parser’s task is to find the next sentential form in the leftmost derivation.

Top-down Parsers (cont...)
The general form of a left sentential form is xAα, where x is a string of terminals, A is the leftmost non-terminal and α is a mixed string of terminals and non-terminals. If the rules for non-terminal A are A -> bB, A -> cBb and A -> a, the parsing decision problem is to choose the correct production for A. In general, a top-down parser looks at the next token and, based on it, chooses the production to apply for A.
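For the three rules given for A, the decision can be made from a single token of lookahead because each RHS begins with a different terminal. A minimal sketch of that decision follows; the function name `choose_rhs_for_A` is hypothetical, and tokens are represented as single characters for simplicity.

```c
#include <stddef.h>

/* Rules for A: A -> bB, A -> cBb, A -> a.
 * Each RHS starts with a different terminal, so one token of lookahead
 * is enough to pick the rule. Returns the chosen RHS, or NULL when no
 * rule for A can start with next_token (a syntax error). */
const char *choose_rhs_for_A(char next_token) {
    switch (next_token) {
    case 'b': return "bB";
    case 'c': return "cBb";
    case 'a': return "a";
    default:  return NULL;
    }
}
```

When two rules for the same non-terminal can begin with the same token, a one-token decision like this is no longer possible and the grammar has to be restructured or more lookahead is needed.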

Top-down Parsers (cont...)
A recursive descent parser is a coded version of a syntax analyzer based directly on the BNF description of the syntax of the language. The alternative to a recursive descent parser is to use a parsing table to implement the BNF rules. Both of these are called LL algorithms. The first L in LL specifies a left-to-right scan of the input and the second L specifies that a leftmost derivation is generated.

Bottom-Up Parsers
A bottom-up parser constructs a parse tree by beginning at the leaves and moving towards the root. This parse order corresponds to the reverse of a rightmost derivation. Given a right sentential form α, the parser must determine which substring of α is the RHS of the rule that must be reduced to its LHS to obtain the previous sentential form in the rightmost derivation. There might be more than one RHS in the rules that matches a substring of the sentential form. The correct RHS is called the handle.

Bottom-Up Parsers (cont...)
Consider the following grammar:
S -> aAc
A -> aA | b
The rightmost derivation for the string aabc is:
S => aAc => aaAc => aabc
In this example, when we start with aabc, the handle is easy to find because only one RHS matches part of the sentential form, namely b from the rule A -> b. So b is reduced to A and we arrive at the sentential form aaAc. The most common bottom-up parsing algorithms are in the LR family, where L specifies a left-to-right scan of the input and R specifies that a rightmost derivation is generated (in reverse).
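The reductions for this grammar can be traced with a small shift-reduce recognizer. The sketch below is hand-written for this one grammar and is not an LR parser: instead of a parse table, it uses the observation that the first a on the stack belongs to S -> aAc, so aA is reduced to A only when another symbol lies beneath it. That handle rule, the stack limit and the function name `shift_reduce_accepts` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_STACK 128

/* Shift-reduce recognizer for the grammar
 *   S -> aAc
 *   A -> aA | b
 * Returns true if the input is a sentence of the grammar. */
bool shift_reduce_accepts(const char *input) {
    char stack[MAX_STACK];
    size_t top = 0;                       /* number of symbols on the stack */

    if (strlen(input) >= MAX_STACK) return false;

    for (const char *p = input; *p != '\0'; p++) {
        stack[top++] = *p;                /* shift the next input symbol */

        bool reduced = true;
        while (reduced) {                 /* reduce while a handle is on top */
            reduced = false;
            if (stack[top - 1] == 'b') {                      /* A -> b */
                stack[top - 1] = 'A';
                reduced = true;
            } else if (top > 2 && stack[top - 2] == 'a'
                               && stack[top - 1] == 'A') {    /* A -> aA */
                top--;
                stack[top - 1] = 'A';
                reduced = true;
            } else if (top == 3 && stack[0] == 'a' && stack[1] == 'A'
                                && stack[2] == 'c') {         /* S -> aAc */
                stack[0] = 'S';
                top = 1;
                reduced = true;
            }
        }
    }
    return top == 1 && stack[0] == 'S';
}
```

For the input aabc the stack passes through aab, aaA, aA, aAc and finally S, which is the rightmost derivation read in reverse. The condition `top > 2` is what keeps the parser from wrongly reducing the aA at the bottom of the stack, and it shows in miniature why choosing the handle is the hard part of bottom-up parsing.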

Complexity of Parsing
Parsing algorithms that work for any unambiguous grammar are complicated and inefficient. The complexity of such algorithms is O(n^3), which means the time they take is proportional to the cube of the length of the string to be parsed. This large amount of time is required because the algorithms frequently must back up and reparse part of the sentence when the parser has made a wrong choice. The algorithms used in commercial compilers work for only a subset of all grammars and have complexity O(n).

Recursive Descent Parser
A recursive descent parser is so named because it consists of a collection of subprograms, many of which are recursive, and it produces a parse tree in top-down order. A recursive descent parser has a subprogram for each non-terminal. It starts with the start symbol and applies rules, replacing the leftmost non-terminal each time, until it reaches the end-of-input symbol or detects an error.

Recursive Descent Parser - Example
Example grammar:
E -> iE’
E’ -> +iE’ | ε

Recursive Descent Parser – Example (cont...)
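#include <stdio.h>
#include <stdlib.h>

int l;                                   /* lookahead character */

/* Read the next non-blank input character into the lookahead. */
void advance(void) { do { l = getchar(); } while (l == ' '); }

/* Consume the lookahead if it equals t; otherwise report an error. */
void match(int t) {
    if (l == t) advance();
    else { printf("error\n"); exit(1); }
}

void E_prime(void);

/* E -> i E' */
void E(void) {
    match('i');
    E_prime();
}

/* E' -> + i E' | ε  (E' is written E_prime, since ' is not valid in a C name) */
void E_prime(void) {
    if (l == '+') {
        match('+');
        match('i');
        E_prime();
    }
    /* otherwise apply E' -> ε and consume nothing */
}

int main(void) {
    advance();                           /* load the first lookahead character */
    E();
    if (l == '$') printf("Parsing done!\n");
    else printf("error\n");
    return 0;
}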

Recursive Descent Parser – Example (cont...)
Example string: i + i $
Parse tree:
E
  i
  E’
    +
    i
    E’
      ε
