Syntax and Semantics Form and Meaning of Programming Languages Copyright © 2003-2015 by Curt Hill.


1 Syntax and Semantics Form and Meaning of Programming Languages Copyright © 2003-2015 by Curt Hill

2 Definitions
Syntax: form of the expressions, statements and units
Semantics: meaning of those expressions, statements and units
What is needed for this course and beyond is a way to describe both in a clear and unambiguous way

3 Some Terminology
Sentence
  – A string of characters using some alphabet
Language
  – A set of sentences
  – Possibly infinite
Lexeme
  – The most basic unit of the syntax
Token
  – A class of lexemes

4 Programming Languages
Here we also have characters and lexemes
A token is a class of lexemes
  – Any lexeme is interchangeable with another of its own class as far as syntax goes
  – The swap may change the meaning, but not the form
In English: nouns, verbs etc.
  – Nouns are interchangeable, even though the meaning changes
In programming languages: reserved words, punctuation, identifiers

5 Tokens and Lexemes
The lexeme is the word or item from the language itself
A token is the representation of the lexeme that is output by the scanner
Tokens are often records or objects
Tokens are often identified by an enumeration
This may be enhanced by other information, such as an identifier in a symbol table

6 Compilers
Typically have four phases
  – Lexical Analyzer
  – Parser
  – Code Generator
  – Optimizer
These may be condensed as well
Each does its part in the recognition and translation of a language

7 Lexical Analyzer
The front end of a compiler is a lexical analyzer or scanner
It reads in characters and outputs tokens
Some tokens are self-sufficient
  – Punctuation and reserved words
Other tokens may be tagged with an ID of the item, such as an identifier or literal
The syntax routines do not need the ID but the semantic routines do
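
A minimal sketch of such a scanner in Python may make the token/lexeme split concrete. Everything named here is an assumption for illustration only: the `Kind` enumeration, the `Token` record, the tiny reserved-word set and the regular expression are hypothetical, and only identifiers are tagged with a symbol-table ID.

```python
import re
from enum import Enum, auto
from typing import NamedTuple, Optional


class Kind(Enum):
    """Token classes (hypothetical; a real language has many more)."""
    RESERVED = auto()
    IDENT = auto()
    LITERAL = auto()
    PUNCT = auto()


class Token(NamedTuple):
    kind: Kind
    lexeme: str
    ident_id: Optional[int] = None  # symbol-table index, identifiers only


RESERVED_WORDS = {"if", "else", "while"}  # assumed reserved words

# Skip whitespace, then match a word, an integer, or any single other character.
LEXEME = re.compile(r"\s*(?:([A-Za-z_]\w*)|(\d+)|(\S))")


def scan(text):
    """Read characters, return (tokens, symbol_table)."""
    text = text.rstrip()
    symbols = {}
    tokens = []
    pos = 0
    while pos < len(text):
        match = LEXEME.match(text, pos)
        word, number, punct = match.groups()
        if word is not None:
            if word in RESERVED_WORDS:
                tokens.append(Token(Kind.RESERVED, word))
            else:
                # The same identifier always maps to the same table entry.
                ident_id = symbols.setdefault(word, len(symbols))
                tokens.append(Token(Kind.IDENT, word, ident_id))
        elif number is not None:
            tokens.append(Token(Kind.LITERAL, number))
        else:
            tokens.append(Token(Kind.PUNCT, punct))
        pos = match.end()
    return tokens, symbols


tokens, symbols = scan("while count < 10 count")
for token in tokens:
    print(token)
```

The parser would look only at `kind`; a semantic routine would follow `ident_id` into the symbol table.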

8 Formal methods of describing syntax
Two men worthy of note
  – Noam Chomsky
      Noted linguist and political activist
      Devised a hierarchy of languages
  – John Backus
      FORTRAN
      Algol 60
      Backus Normal (Naur) Form

9 Chomsky Grammars
All languages are defined by a grammar
A grammar contains four pieces
  – V - an alphabet
      The legal characters
  – T - set of terminal symbols
      Terminals may appear in the language, such as reserved words
      Non-terminals may not appear
      They are concepts or statements composed of terminals
  – P - a set of rewriting rules, these are called productions
  – Z - the distinguished symbol

10 More on Grammar
A language is all the legal strings generated by its grammar
Terminals are those things that actually exist in the language
Non-terminals are those things that only represent syntactic items
For a parse to be complete all non-terminals must be rewritten into terminals
Let's consider a simple example

11 Binary
The grammar is G = {V,T,P,Z}
The alphabet, terminals and non-terminals: V = {0,1,Z,A}
Terminals: T = {0,1}
Non-terminals must be Z and A
Distinguished symbol is Z
Productions are on next screen

12 Productions
P = {
  Z ::= A
  A ::= 1 A
  A ::= 0 A
  A ::= 0
  A ::= 1
}
A production allows us to rewrite from one form to another
A non-terminal is on the left
Terminals and non-terminals on the right

13 Derive 101
Start with distinguished symbol:   Z
Apply production Z ::= A:          A
Apply production A ::= 1 A:        1A
Apply production A ::= 0 A:        10A
Apply production A ::= 1:          101
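
The derivation of 101 can be replayed mechanically. The sketch below assumes the binary grammar from the Productions slide, stored as a Python dictionary; the function name `derive` and the representation of a step as a (non-terminal, right-hand side) pair are illustrative choices, not part of the slides.

```python
# Productions of the binary grammar, written without spaces: A ::= 1 A becomes "1A".
PRODUCTIONS = {
    "Z": ["A"],
    "A": ["1A", "0A", "0", "1"],
}


def derive(steps, start="Z"):
    """Apply each (non_terminal, rhs) step and return every sentential form."""
    form = start
    forms = [form]
    for lhs, rhs in steps:
        if rhs not in PRODUCTIONS.get(lhs, []):
            raise ValueError(f"no production {lhs} ::= {rhs}")
        if lhs not in form:
            raise ValueError(f"{lhs} does not occur in {form}")
        # Rewrite the leftmost occurrence of the non-terminal.
        form = form.replace(lhs, rhs, 1)
        forms.append(form)
    return forms


forms = derive([("Z", "A"), ("A", "1A"), ("A", "0A"), ("A", "1")])
print(" => ".join(forms))  # Z => A => 1A => 10A => 101
```

The derivation is complete when the last form contains no Z or A, that is, only terminals.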

14 Chomsky Hierarchy
Chomsky proposed a hierarchy of languages based on the strength of the rewriting rules
There are four
  – Type 0 through Type 3
Type 0 is strongest, 3 is weakest
In programming languages we are only interested in Types 3 and 2

15 Type 3 - regular languages
U ::= N or U ::= WN
U and W are non-terminals and N is a terminal
A non-terminal may only be replaced by a terminal, or by a non-terminal followed by a terminal
Often used for describing tokens
Regular expressions are of this type
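
As a small illustration of regular expressions describing the same kind of language, the strings produced by the binary grammar of the earlier Binary example (one or more 0s and 1s) can be recognized with Python's standard `re` module. The helper name `is_binary` is made up for this sketch.

```python
import re

# One or more binary digits: the language generated by Z ::= A, A ::= 1 A | 0 A | 0 | 1.
BINARY = re.compile(r"[01]+")


def is_binary(text):
    """True when the whole string is a sentence of the binary language."""
    return BINARY.fullmatch(text) is not None


print(is_binary("101"))  # True
print(is_binary("102"))  # False
print(is_binary(""))     # False
```

A scanner typically holds one such pattern per token class.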

16 Type 2 - context free languages
U ::= v
U is in the set of non-terminals and v is any string of terminals and non-terminals
A non-terminal may be replaced by any combination of terminals and non-terminals
  – The context of the non-terminal does not matter
Most programming languages are context-free or have a few minor exceptions

17 Language Hierarchies
Type 3: Regular
Type 2: Context Free
Type 1: Context Sensitive
Type 0: Unrestricted

18 BNF
John Backus described ALGOL 58 in 1959 with a notation similar to context-free grammars, independently of Chomsky
Peter Naur extended it slightly in describing ALGOL 60
Became known as BNF for Backus Normal Form or Backus Naur Form
A meta-language is a language that describes another language

19 BNF Again
There are several notations for BNF; the production rules given above are one
Like the Chomsky grammar there are non-terminals, terminals, productions and a start symbol
  – Each non-terminal represents some abstract concept in a language
  – There is often some notational way to distinguish a terminal from a non-terminal

20 Simplest notation
Form of productions: LHS → RHS
Where:
  – LHS is a non-terminal (context free and regular grammars)
  – RHS is any sequence of terminals and non-terminals, including empty
There can be many productions with exactly the same LHS, these are alternatives
If the RHS contains the LHS, the rule is recursive

21 Simple extensions
Sometimes there is an alternation symbol, often the vertical bar, that lets one production cover all the alternatives for the same LHS
Sometimes things enclosed in [ and ] are optional, they may be present zero or one times
Sometimes things enclosed in { and } may be present 1 or more times
  – Thus [{x}] allows zero or more x items

22 More
The extensions are often called EBNF
Syntax graphs are equivalent to EBNF
Syntax graphs tend to be easier to read

23 Simple Expressions
[Syntax graphs: an expression is built from terms joined by + or -; a term is built from factors joined by * or /; a factor is a constant, an ident, or an expression in parentheses]

24 BNF is generative
A derivation is sentence generation
Leftmost derivation
  – Only the leftmost non-terminal can be rewritten
  – This is usually the kind of derivation used by compilers
  – The previous derivation was leftmost
There are also rightmost derivations
The order of derivation does not affect the language defined

25 Example BNF productions
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const

26 Example Derivation
<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
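
A leftmost derivation like this one can be sketched in a few lines of Python. The grammar below is an assumption: the non-terminal names (`<program>`, `<stmts>`, `<stmt>`, `<var>`, `<expr>`, `<term>`) are not legible in the transcript and are filled in from the usual textbook form of this example. Each entry in `choices` picks which alternative to apply to the leftmost non-terminal.

```python
# Assumed grammar: each non-terminal maps to its list of alternatives.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(choices, start="<program>"):
    """Rewrite the leftmost non-terminal once per choice; return all forms."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that is still a non-terminal.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


# Alternatives chosen, in order, to generate: a = b + const
for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", step)
```

Picking the rightmost non-terminal instead would give a rightmost derivation of the same sentence.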

27 Parse trees
A multi-way tree where:
  – Each interior node is a non-terminal
  – Each leaf is a terminal
  – The start symbol is the root
  – Nested under each interior node is the RHS of the production, with the LHS being the node itself
This is a handy data structure for compilers and the like

28 Example Parse Tree
[Figure: parse tree rooted at program, with interior nodes stmts, stmt, var, expr and term, and leaves including =, a, b and const]
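
One possible shape for such a tree as a data structure, sketched in Python. The `Node` class is hypothetical, and the tree built here is the one for the sentence a = b + const under the assumed textbook grammar (non-terminals written in angle brackets); it is not copied from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: interior nodes hold non-terminals, leaves hold terminals."""
    symbol: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Terminals read left to right; together they spell the sentence."""
        if not self.children:
            return [self.symbol]
        return [leaf for child in self.children for leaf in child.leaves()]


# Each interior node's children are the RHS of the production applied to it.
tree = Node("<program>", [
    Node("<stmts>", [
        Node("<stmt>", [
            Node("<var>", [Node("a")]),
            Node("="),
            Node("<expr>", [
                Node("<term>", [Node("<var>", [Node("b")])]),
                Node("+"),
                Node("<term>", [Node("const")]),
            ]),
        ]),
    ]),
])

print(" ".join(tree.leaves()))  # a = b + const
```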

29 Ambiguity
A grammar is ambiguous when two parse trees can be derived from the same input sequence
Ambiguous grammars usually require some fix-up in the compiler to guarantee that only one tree will be chosen
Many IF grammars are ambiguous concerning which IF an else belongs to
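
Ambiguity can be checked by counting parse trees. The sketch below uses a deliberately ambiguous toy grammar, E ::= E - E | n, which is an assumption chosen for brevity and is not the IF grammar mentioned above. For the input n - n - n it finds two trees, corresponding to (n - n) - n and n - (n - n), which would give different results for subtraction.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(tokens):
    """Number of distinct parse trees for a tuple of tokens under E ::= E - E | n."""
    if tokens == ("n",):
        return 1
    # Try every '-' as the operator at the root of the tree.
    return sum(
        count_trees(tokens[:i]) * count_trees(tokens[i + 1:])
        for i, token in enumerate(tokens)
        if token == "-"
    )


print(count_trees(("n", "-", "n")))            # 1: unambiguous
print(count_trees(("n", "-", "n", "-", "n")))  # 2: ambiguous
```

Rewriting the grammar as E ::= E - n | n leaves only the left-associative tree, which is one common kind of fix-up.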

30 BNF Problems
BNF cannot capture important information
  – That a variable is defined
  – That an expression contains proper types
Some problems like type checking could be done but would bulk out the grammar so much as to make it unusable
Other problems like declare before use in C++ are impossible to catch in BNF
Many of these kinds of things are called Static Semantics

31 The Solution?
Attribute Grammars
  – An attempt to augment the syntax with static semantic information
  – Associate with each production (and with nodes of the parse tree) a function that would check the static semantic information
  – Check the attributes with a set of predicates

32 Attribute Grammars
A context free grammar
For each symbol there may be a set of attribute values
A set of functions that define these attribute values based on non-terminals

33 Example
Production                       Attribute
<exp> ::= <term>                 val(exp) = val(term)
<exp> ::= <exp> + <term>         val(exp) = val(exp) + val(term)
<term> ::= <term> * <factor>     val(term) = val(term) * val(factor)
<term> ::= <factor>              val(term) = val(factor)
<factor> ::= ident               val(factor) = val(ident)
<factor> ::= ( <exp> )           val(factor) = val(exp)
Consider: 2+4*(1+2)
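
The val attribute can be computed while parsing. The following Python sketch is one way to do it, under several assumptions: the expression is read as 2+4*(1+2), since the grammar has no rule for 4 directly followed by a parenthesis; numbers stand in for ident, with val(ident) taken to be the number itself; and the left-recursive productions are handled with loops, as a recursive-descent parser must. The function names are illustrative.

```python
import re


def evaluate(text):
    """Return val(exp) for an expression made of numbers, +, * and parentheses."""
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def exp():
        val = term()                 # val(exp) = val(term)
        while peek() == "+":
            advance()
            val = val + term()       # val(exp) = val(exp) + val(term)
        return val

    def term():
        val = factor()               # val(term) = val(factor)
        while peek() == "*":
            advance()
            val = val * factor()     # val(term) = val(term) * val(factor)
        return val

    def factor():
        token = advance()
        if token == "(":
            val = exp()              # val(factor) = val(exp)
            if advance() != ")":
                raise SyntaxError("expected )")
            return val
        if not token.isdigit():
            raise SyntaxError(f"unexpected {token}")
        return int(token)            # val(factor) = val(ident)

    result = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]}")
    return result


print(evaluate("2+4*(1+2)"))  # 14
```

The inner sum is evaluated first (3), then the product (12), then the outer sum, so multiplication binds tighter than addition purely because of how the grammar nests term inside exp.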

34 Second Example
Production                       Attribute
<decl> ::= <type> <list>
<type> ::= int                   type = int
<type> ::= float                 type = float
<list> ::= ident                 names(list) = ident
<list> ::= ident , <list>        names(list) = ident ∪ names(list)
We can now determine whether an item is defined or not from the types

35 Second example
Consider declarations
Production                       Attributes
<decl> ::= <type> <list>
<type> ::= int                   type = int
<type> ::= float                 type = float
<list> ::= ident                 names(list) = ident
<list> ::= ident , <list>        names(list) = ident ∪ names(list)
Now we can determine from the attributes whether an item is defined or not
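
A rough Python sketch of how the type and names attributes might be used to answer "is this item defined?". It assumes a declaration has the form of a type (int or float) followed by a comma-separated list of identifiers; the function names `names` and `declare` and the dictionary used as a symbol table are hypothetical.

```python
def names(idents):
    """names(list) = ident, or ident united with names of the rest of the list."""
    head, *tail = idents
    return {head} | (names(tail) if tail else set())


def declare(decl_text, table=None):
    """Record every declared name with its type; return the symbol table."""
    table = {} if table is None else table
    type_name, _, rest = decl_text.strip().partition(" ")
    if type_name not in ("int", "float"):
        raise ValueError(f"unknown type {type_name!r}")
    idents = [part.strip() for part in rest.split(",")]
    if not all(ident.isidentifier() for ident in idents):
        raise ValueError(f"bad identifier list {rest!r}")
    for name in names(idents):
        table[name] = type_name
    return table


table = declare("int a, b")
declare("float x", table)
print("a" in table, table.get("x"), "c" in table)  # True float False
```

A use of c later in the program could then be rejected because c is not in the table.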

36 YACC Uses
YACC (Yet Another Compiler Compiler) is a common UNIX tool for constructing compilers and many other programs
YACC uses an attribute grammar of sorts
  – Attached to each production is a function call
  – You get to write the function that does the checking at that point, including code generation

37 Conclusion and Summary
Syntax is about the form of languages
Semantics is about the meaning
BNF represents a context free grammar

