CSI 3120, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next.

Presentation transcript:

CSI 3120, Grammars, page 1 Language description methods Major topics in this part of the course: –Syntax and semantics –Grammars –Axiomatic semantics (next handout) –Syntactic analysis (the handout after that)

CSI 3120, Grammars, page 2 Syntax and semantics Points to discuss: –The form and meaning of programming languages –Types of processing –Types of languages

CSI 3120, Grammars, page 3 Syntax The syntax of a language determines how programs are built from elementary units (keywords, identifiers, numbers, brackets, and so on). A syntactically correct program may still not be acceptable, or it may work in a way that we do not want (or do not expect). Formal syntax is a system for describing the structure of programs exactly. –Such systems include grammars, BNF, syntactic diagrams (syntax graphs).

CSI 3120, Grammars, page 4 Grammars There are infinitely many different programs, but every program is finite and must be recognized in finite time. A grammar should allow a finite description of a usually infinite language.

CSI 3120, Grammars, page 5 Semantics Semantics of a language determines the meaning of elementary units and their combinations: –how does the meaning of a program derive from the meaning of its fragments? The effect of a compound statement (such as a loop, an "if") should depend only on the effect of the elementary statements (such as an assignment).

CSI 3120, Grammars, page 6 Methods of semantic description Operational semantics –Simple lower-level operations explain how higher-level statements are performed. Denotational semantics –A program computes a function, a mapping from data to results. Axiomatic semantics –A program establishes a relation between data and results.

CSI 3120, Grammars, page 7 Lexical analysis Lexical analysis pre-processes a file that contains a source program: –recognize units larger than single characters (keywords, predefined names, identifiers, numbers, brackets, operators, and so on). –remove white space. This helps make translators simpler, by keeping low-level details out.
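The two jobs named on this slide (grouping characters into larger units and dropping white space) can be sketched as a small scanner. This is only an illustration: the token categories, the keyword set and the function name `tokenize` are assumptions made here, not part of the course material.

```python
import re

# Hypothetical token categories, chosen for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=]"),
    ("BRACKET", r"[()]"),
    ("SPACE", r"\s+"),
]
KEYWORDS = {"if", "while"}  # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character string into a list of (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SPACE":
            continue  # white space is removed, as the slide describes
        if kind == "IDENT" and value in KEYWORDS:
            kind = "KEYWORD"  # keywords are identifiers with a reserved spelling
        tokens.append((kind, value))
    return tokens
```

A later phase (the parser) then works on tokens such as `("IDENT", "x1")` instead of single characters.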

CSI 3120, Grammars, page 8 Syntactic analysis Syntactic analysis, based on grammars, can mean two things: –Recognition (the program is / is not correct) –Parsing (a representation of the syntactic structure is built for correct programs). Syntactic analysis is the essential part of any implementation of a programming language. By the way, syntactic generation, also based on grammars, is the flip side of analysis: it runs from a syntactic structure to a source. Important in language technology, not much in programming languages.

CSI 3120, Grammars, page 9 What is a language? A language is a set of sentences. A sentence is a sequence of elementary pieces, built according to certain rules (usually grammar rules). In a natural language, sentences have the usual meaning. In programming languages, various syntactic units can be considered as "sentences". –For example, in a set of all expressions, each valid expression is a sentence. –In a set of all programs, each valid complete program is a sentence. And so on.

CSI 3120, Grammars, page 10 A hierarchy of formal languages Formal languages are classified by their complexity. A four-level hierarchy, from the simplest to the most complicated: regular < context-free < context-sensitive < recursively enumerable. Grammars too are classified in this way.

CSI 3120, Grammars, page 11 A hierarchy of formal languages (2) Programming languages usually have: –context-free syntax, –context-sensitive semantics. Context-freeness (important in syntactic analysis) means that a fragment we are analyzing does not depend on any other fragments of the program. –for example, an occurrence of a variable is not related to its declaration; –a message to a method is analyzed separately from the definition of this method.

CSI 3120, Grammars, page 12 Formal grammars Points to discuss: –Concepts of formal grammars –A sample grammar in BNF –Derivations and parse trees –Ambiguity in grammars

CSI 3120, Grammars, page 13 A formal grammar has four components Terminal symbols = language elements (for example, variable names in Java, or English words). Non-terminal symbols = auxiliary symbols, denoting classes of constructions (for example: loop_statement, Boolean_expression). The goal (start) symbol denotes any sentence. Productions = rewriting rules ("this structure has such and such components") used to recognize or generate sentences.

CSI 3120, Grammars, page 14 Two ways of rewriting From the start symbol, produce more and more specific approximations of a sentence, replacing non-terminals with their definitions; Reduce the sentence into more and more general forms, replacing definitions with non-terminals, and reach the goal symbol (the same!). Productions are what makes a grammar regular, context-free or context-sensitive.

CSI 3120, Grammars, page 15 Example: a grammar of expressions Seven terminal symbols: + - * ( ) x y Four non-terminal symbols: ‹expr› ‹term› ‹factor› ‹var› These names are what we choose: writing a grammar is not different from writing a program. The names are meant to help us read the grammar. Start/goal symbol: ‹expr›

CSI 3120, Grammars, page 16 Notation Angle brackets distinguish non-terminal from terminal symbols. (It's like distinguishing strings and keywords in Java: "class" and class are different.) LHS → RHS means: "the Left-Hand Side consists of things on the Right-Hand Side". The bar | separates alternative Right-Hand Sides with the same Left-Hand Side.

CSI 3120, Grammars, page 17 Productions of our grammar
‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› → ‹factor› | ‹term› * ‹factor›
‹factor› → ‹var› | ( ‹expr› )
‹var› → x | y
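To make the "finite description of an infinite language" idea concrete, the four productions can be written down as plain data and used to generate sentences. The sketch below is one possible encoding, not the course's: the dictionary layout and the function name `sentences` are assumptions, and a depth bound is added because the grammar is recursive and would otherwise generate forever.

```python
from itertools import product

# Each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"], ["expr", "-", "term"]],
    "term": [["factor"], ["term", "*", "factor"]],
    "factor": [["var"], ["(", "expr", ")"]],
    "var": [["x"], ["y"]],
}


def sentences(symbol, depth):
    """All terminal strings derivable from symbol by a tree of height <= depth."""
    if symbol not in GRAMMAR:  # terminal symbol: it stands for itself
        return {symbol}
    if depth == 0:  # no room left to expand a non-terminal
        return set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        parts = [sentences(s, depth - 1) for s in rhs]
        for combination in product(*parts):
            result.add(" ".join(combination))
    return result
```

With a small bound only `x` and `y` appear; raising the bound by one already yields sums, differences and products, and the set keeps growing with every further level.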

CSI 3120, Grammars, page 18 Top-down and bottom-up For any sentence, productions can be applied in two directions. Top-down: –Derive the sentence from the start symbol. –The sentence will then be an example of an expression. Bottom-up: –Fold the sentence into the initial (start) symbol.

CSI 3120, Grammars, page 19 Derivations Let us take the sequence of terminal symbols: ( x - y ) * x + y [It is an expression, but we must first show that it is.] Consider two derivations (the next two pages). On each line, the highlighted part is involved in rewriting into the next line, according to some grammar production.

CSI 3120, Grammars, page 20 A top-down derivation
‹expr›
‹expr› + ‹term›
‹term› + ‹term›
‹term› * ‹factor› + ‹term›
‹factor› * ‹factor› + ‹term›
( ‹expr› ) * ‹factor› + ‹term›
( ‹expr› - ‹term› ) * ‹factor› + ‹term›
( ‹term› - ‹term› ) * ‹factor› + ‹term›
( ‹factor› - ‹term› ) * ‹factor› + ‹term›
( ‹var› - ‹term› ) * ‹factor› + ‹term›
( x - ‹term› ) * ‹factor› + ‹term›
( x - ‹factor› ) * ‹factor› + ‹term›
( x - ‹var› ) * ‹factor› + ‹term›
( x - y ) * ‹factor› + ‹term›
( x - y ) * ‹var› + ‹term›
( x - y ) * x + ‹term›
( x - y ) * x + ‹factor›
( x - y ) * x + ‹var›
( x - y ) * x + y

CSI 3120, Grammars, page 21 A bottom-up derivation
( x - y ) * x + y
( ‹var› - y ) * x + y
( ‹factor› - y ) * x + y
( ‹term› - y ) * x + y
( ‹expr› - y ) * x + y
( ‹expr› - ‹var› ) * x + y
( ‹expr› - ‹factor› ) * x + y
( ‹expr› - ‹term› ) * x + y
( ‹expr› ) * x + y
‹factor› * x + y
‹term› * x + y
‹term› * ‹var› + y
‹term› * ‹factor› + y
‹term› + y
‹expr› + y
‹expr› + ‹var›
‹expr› + ‹factor›
‹expr› + ‹term›
‹expr›

CSI 3120, Grammars, page 22 And both side by side
Bottom-up                         Top-down
( x - y ) * x + y                 ‹expr›
( ‹var› - y ) * x + y             ‹expr› + ‹term›
( ‹factor› - y ) * x + y          ‹term› + ‹term›
( ‹term› - y ) * x + y            ‹term› * ‹factor› + ‹term›
( ‹expr› - y ) * x + y            ‹factor› * ‹factor› + ‹term›
( ‹expr› - ‹var› ) * x + y        ( ‹expr› ) * ‹factor› + ‹term›
( ‹expr› - ‹factor› ) * x + y     ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
( ‹expr› - ‹term› ) * x + y       ( ‹term› - ‹term› ) * ‹factor› + ‹term›
( ‹expr› ) * x + y                ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
‹factor› * x + y                  ( ‹var› - ‹term› ) * ‹factor› + ‹term›
‹term› * x + y                    ( x - ‹term› ) * ‹factor› + ‹term›
‹term› * ‹var› + y                ( x - ‹factor› ) * ‹factor› + ‹term›
‹term› * ‹factor› + y             ( x - ‹var› ) * ‹factor› + ‹term›
‹term› + y                        ( x - y ) * ‹factor› + ‹term›
‹expr› + y                        ( x - y ) * ‹var› + ‹term›
‹expr› + ‹var›                    ( x - y ) * x + ‹term›
‹expr› + ‹factor›                 ( x - y ) * x + ‹factor›
‹expr› + ‹term›                   ( x - y ) * x + ‹var›
‹expr›                            ( x - y ) * x + y

CSI 3120, Grammars, page 23 Is it really so easy? Both processes recognize the given sequence of symbols ( x - y ) * x + y as an expression that is well-formed according to our grammar. In both derivations, guessing is required: which production should we choose to apply next? Strategies of choice are at the heart of parsing algorithms. Ideally, we would always guess correctly. Less ideally, we may have to try a production, fail, and return to try another.
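One common way to avoid guessing for this particular grammar is recursive descent with one token of lookahead. The sketch below is an assumption about how that could look, not the method of the course (syntactic analysis is covered in a later handout). Note one deliberate change: the left-recursive alternatives such as ‹expr› + ‹term› are handled by loops ("a term, then any number of + term or - term"), because a naive recursive procedure for a left-recursive production would call itself forever.

```python
def recognize(tokens):
    """Return True if the token list is an ‹expr› of the expression grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        # ‹expr›: one ‹term› followed by any number of (+|-) ‹term›
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not term():
                return False
        return True

    def term():
        # ‹term›: one ‹factor› followed by any number of * ‹factor›
        nonlocal pos
        if not factor():
            return False
        while peek() == "*":
            pos += 1
            if not factor():
                return False
        return True

    def factor():
        # ‹factor›: a variable, or a parenthesized ‹expr›
        nonlocal pos
        if peek() in ("x", "y"):
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    # The whole input must be consumed, not just a prefix.
    return expr() and pos == len(tokens)
```

For example, `recognize("( x - y ) * x + y".split())` accepts the sentence used on the previous slides, while an incomplete input such as `x +` is rejected.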

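CSI 3120, Grammars, page 24 Parse trees The results of both derivations can be summarized in the same parse tree (or abstract syntax tree). Note that we do not show in this tree the order in which productions have been applied during derivations. [Figure: tree for ( x - y ) * x + y. The root + has the children * and y; * has the children ( ) and x; ( ) has the child -, whose children are x and y.]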
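A parser, as opposed to a recognizer, returns such a tree. The following sketch builds it as nested tuples `(operator, left, right)`; that representation, the function name `parse` and the use of `SyntaxError` are choices made here for illustration. Unlike the tree on the slide, it does not keep a separate node for the parentheses, since the nesting of the tuples already records the grouping.

```python
def parse(tokens):
    """Build a nested-tuple syntax tree for an expression over x, y, +, -, *, ( )."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        nonlocal pos
        node = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = (op, node, term())  # left-associative, as in ‹expr› + ‹term›
        return node

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def factor():
        nonlocal pos
        token = peek()
        if token in ("x", "y"):
            pos += 1
            return token
        if token == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            pos += 1
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

The tree carries no trace of the order in which productions were applied, which is the point made on this slide.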

CSI 3120, Grammars, page 25 Ambiguity A grammar is ambiguous when an expression defined by this grammar has more than one structurally different parse tree. For example, here is a grammar of arithmetic expressions: ‹E› → ‹E› + ‹E› | ‹E› * ‹E› | ‹N› where ‹N› denotes any unsigned integer. The expression 6 * 17 + 23 has two different derivation trees.
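Ambiguity can be checked mechanically for small inputs by counting parse trees. The sketch below does this for the one-level grammar of this slide only: every operator in the token list is tried as the root of the tree, and the counts for the two sides are multiplied. The function name `count_parse_trees` and the token-list input format are assumptions made for this illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under ‹E› → ‹E› + ‹E› | ‹E› * ‹E› | ‹N›."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of trees deriving tokens[lo:hi] from ‹E›.
        if hi - lo == 1:
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        for i in range(lo + 1, hi - 1):
            if tokens[i] in ("+", "*"):
                # tokens[i] is the operator at the root of the tree
                total += count(lo, i) * count(i + 1, hi)
        return total

    return count(0, len(tokens))
```

Any result greater than 1 means the sentence is ambiguous under this grammar; the count grows quickly as more operators are added.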

CSI 3120, Grammars, page 26 Two different parse trees... ‹E› → ‹E› + ‹E› | ‹E› * ‹E› | ‹N› [Figure: two parse trees for 6 * 17 + 23, one with * at the root and one with + at the root.]

CSI 3120, Grammars, page 27 ... and their meaning These trees represent two different ways of computing the value of the expression! Ambiguity should be avoided. [Figure: the two trees, which correspond to 6 * ( 17 + 23 ) and ( 6 * 17 ) + 23.]
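A short calculation shows why the two trees matter. The sketch assumes the expression is 6 * 17 + 23, as the bracketed forms on this slide suggest, and encodes each tree as a nested tuple `(operator, left, right)`; that encoding and the names `tree_a`, `tree_b` and `evaluate` are illustrative choices.

```python
def evaluate(tree):
    """Compute the value of a tree whose leaves are integers."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    left_value, right_value = evaluate(left), evaluate(right)
    return left_value + right_value if op == "+" else left_value * right_value


tree_a = ("*", 6, ("+", 17, 23))  # 6 * ( 17 + 23 )
tree_b = ("+", ("*", 6, 17), 23)  # ( 6 * 17 ) + 23

print(evaluate(tree_a))  # 240
print(evaluate(tree_b))  # 125
```

The same string of tokens therefore has two different values, depending on which tree the parser happens to build.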

CSI 3120, Grammars, page 28 ... and what to do with ambiguity In our previous example, we should have written the usual two-level definition instead of a definition with + and * at the same level. An expression ‹E› is a sum of terms ‹T›. A term is a product of numbers ‹N›.
‹E› → ‹T› | ‹E› + ‹T›
‹T› → ‹N› | ‹T› * ‹N›
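The two-level reading ("an expression is a sum of terms, a term is a product of numbers") translates almost word for word into code. This is a minimal sketch under the assumption that the input contains only unsigned integers, + and *, with no parentheses; the function name `evaluate_sum_of_products` is made up for the example.

```python
def evaluate_sum_of_products(text):
    """Evaluate an expression that is a sum of terms, each a product of numbers."""
    total = 0
    for term in text.split("+"):        # an expression is a sum of terms
        value = 1
        for number in term.split("*"):  # a term is a product of numbers
            value *= int(number)        # int() ignores surrounding blanks
        total += value
    return total
```

Because terms are completed before they are added, 6 * 17 + 23 can only be read as ( 6 * 17 ) + 23, which gives 125.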

CSI 3120, Grammars, page 29 Examples A long phrase...
the dog
the dog that chased the cat
the dog that chased the cat that caught the mouse
the dog that chased the cat that caught the mouse that chewed the shoe
the dog that chased the cat that caught the mouse that chewed the shoe that squashed the fruit
the dog that chased the cat that caught the mouse that chewed the shoe that squashed the fruit that stained the chair
and so on...

CSI 3120, Grammars, page 30 Examples ... a grammar of long phrases
‹phrase› → the ‹noun› | the ‹noun› that ‹verb› ‹phrase›
‹noun› → cat | chair | dog | fruit | mouse | shoe | ...
‹verb› → caught | chased | chewed | squashed | stained | ...
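A recognizer for these phrases fits in a few lines. The sketch assumes the phrase rule has the shape "the noun" or "the noun that verb phrase", which is what the examples on the previous slide suggest; the function name `is_phrase` is made up, and the word sets contain only the nouns and verbs listed here.

```python
NOUNS = {"cat", "chair", "dog", "fruit", "mouse", "shoe"}
VERBS = {"caught", "chased", "chewed", "squashed", "stained"}


def is_phrase(words):
    """True if words reads: the noun [that verb phrase]."""
    if len(words) < 2 or words[0] != "the" or words[1] not in NOUNS:
        return False
    if len(words) == 2:  # first alternative: the noun
        return True
    # second alternative: the noun that verb, then a shorter phrase
    return (
        len(words) >= 4
        and words[2] == "that"
        and words[3] in VERBS
        and is_phrase(words[4:])
    )
```

The recursion in `is_phrase` mirrors the recursion in the grammar: a finite rule accepts phrases of any length.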

CSI 3120, Grammars, page 31 Examples A clause... the dog that chased the cat that caught the mouse that chewed the shoe that squashed the fruit that stained the chair grabbed the sausage that tempted the wolf that fought the fox that scared the squirrel that bit the twig that cracked the nut that hit the boy that lifted the hat

CSI 3120, Grammars, page 32 Examples ... a grammar of clauses
‹clause› → ‹phrase› ‹verb› ‹phrase›
‹phrase› → the ‹noun› | the ‹noun› that ‹verb› ‹phrase›
‹noun› → boy | cat | ...
‹verb› → bit | caught | ...
(Add maybe 1500 rules and you will have a reasonable grammar of English.)

CSI 3120, Grammars, page 33 Examples Simple lists in Scheme... A list is either empty: () or it is a sequence of elements separated by blank spaces, all enclosed in parentheses: ( element ... element ). Each element is either a list or an atom. An atom is an identifier made of small letters. We assume that a scanner converts the input text into a sequence of tokens: atoms and parentheses. Example: ( ab ( xyz br ) () ( no ) yes )

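CSI 3120, Grammars, page 34 Examples ... a grammar of lists
‹list› → () | ( ‹elements› )
‹elements› → ‹element› | ‹element› ‹elements›
‹element› → ‹list› | ‹atom›
‹atom› → ‹letter› | ‹letter› ‹atom›
‹letter› → a | b | c | ... | z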
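The list language described on the previous slide can be recognized with a scanner followed by a small recursive procedure. This is a sketch under the assumptions stated there (atoms are runs of small letters, the scanner delivers atoms and parentheses as tokens); the function names `tokenize` and `is_list` are invented for the example.

```python
import re


def tokenize(text):
    """Split text into parentheses and atoms (runs of small letters)."""
    tokens = re.findall(r"[()]|[a-z]+", text)
    # Anything other than tokens and white space is an error.
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError("unexpected character in input")
    return tokens


def is_list(tokens):
    """True if the tokens form exactly one list."""
    pos = 0

    def parse_list():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            return False
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":  # an element that is itself a list
                if not parse_list():
                    return False
            else:  # an element that is an atom
                pos += 1
        if pos >= len(tokens):  # ran out of input before the closing )
            return False
        pos += 1
        return True

    return parse_list() and pos == len(tokens)
```

Nested lists are handled by the recursive call, which is why balanced parentheses need a context-free grammar rather than a regular expression.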

CSI 3120, Grammars, page 35 Examples A flower garden... We have four kinds of things in our garden: a wall, a large flower, a small flower, a house. Starting from the left, a garden has a wall, then at least one large flower, another wall, some small flowers (more than we have large ones) and finally a house.

CSI 3120, Grammars, page 36 Examples ... and a few examples... [Figure: four sequences of garden symbols, each followed by the question "a garden?"]

CSI 3120, Grammars, page 37 Examples ... a grammar of gardens [Figure: grammar productions written with the four garden symbols.]
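Since the pictorial symbols are not available in text form, the garden language can be illustrated with stand-in letters. Everything in this sketch is an assumption: W stands for a wall, F for a large flower, f for a small flower and H for the house; the grammar in the comment is one possible grammar for the description on page 35, not necessarily the one on the slide; and the check itself uses counting rather than a parser.

```python
import re

# One possible grammar (hypothetical, with letters in place of pictures):
#   garden -> W middle smalls H
#   middle -> F W f | F middle f      (one small flower for every large one)
#   smalls -> f | f smalls            (at least one extra small flower)


def is_garden(text):
    """True for W, n >= 1 large flowers, W, more than n small flowers, H."""
    match = re.fullmatch(r"W(F+)W(f+)H", text)
    if match is None:
        return False
    large, small = match.group(1), match.group(2)
    return len(small) > len(large)  # the part a regular expression cannot check
```

The regular expression only checks the overall shape; the comparison of the two counts is what makes the language context-free rather than regular.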