
1 Languages and Grammars
Copyright © 2003-2014 Curt Hill
This is not English Class. But there is a resemblance.

2 Introduction
We have already determined that some computations are impossible
–The halting problem was one, and there are others
What we want are models of computation that give us insight into what is and is not computable
Strangely enough, models of computation are closely related to the complexity of languages

3 Languages
Every natural language is spoken
Usually written as well
Such languages are extremely complicated
Every language has a syntax and semantics
Syntax – the form that the language must have
Semantics – the meaning
It may be difficult or impossible to assign meaning to a sentence that violates the syntax

4 Grammar
The grammar of a language describes the syntax of that language
Since natural languages are extremely complicated, we would expect their grammars to also be complicated
Perhaps you recall diagramming sentences from high school
–This confirms the syntax of a sentence

5 Natural Languages
These are extremely complicated
The grammar for such a language can fill volumes of books
See the text for some simple examples from English

6 Formal Languages
In contrast with natural languages are formal languages
–Artificial languages
These are typically not designed for person to person communication
–Rather person to machine or machine to machine
In comparison with natural languages they:
–Have very few rules
–Very few exceptions to these rules

7 Examples
The largest class of these is likely programming languages
There are others as well
Mathematical notation may be considered a formal language even though it is designed for a form of person to person communication

8 Noam Chomsky
Professor emeritus of linguistics at MIT
Developed a theory of generative grammars
This includes a language hierarchy
–AKA Chomsky-Schützenberger Hierarchy
Most of the theory of this section was developed by Chomsky

9 Phrase Structure Grammar
A grammar, G, is a four-tuple: G = (V, T, S, P)
V is the alphabet or vocabulary
T is a set of terminal elements
S is the start symbol or distinguished symbol
P is a set of productions
–Productions are rewrite rules
A grammar should be able to enumerate any legal sentence of the language

10 Formal Grammars
Each grammar consists of four things
V – a finite vocabulary that includes the non-terminals (aka variables)
T – a finite set of terminal symbols
–Words made up from an alphabet
S – the start symbol
–Must be an element of V
P – a set of productions

11 V
A set of elements or symbols
We may think about this as the character set
–Although that is a little misleading
–The alphabet of English is made up of letters, digits and punctuation
–But not every combination of letters is a word
Perhaps the better way to think about V is as words and stand-alone symbols
–There is usually a rule for construction

12 T and N
There is a set of terminal symbols, T, as well as a set of non-terminal symbols, N
–T is a subset of V
Terminals can exist in a legal instance of the language
Non-terminals are concepts that need to be instantiated, that is, converted into concrete terminals

13 Examples
In English any legal word is a terminal
A concept like “noun phrase” is a non-terminal
–This can be instantiated in a myriad of actual words
In C++ the reserved word “for” or an identifier would be a terminal
In C++ the concept “if statement” is a non-terminal

14 P
A set of productions
A production is a rewrite rule
Form:
–X → Y
This means that we can rewrite X as Y
–Since → is hard to type we often use ::=
Each production must have at least one non-terminal on the left
The complexity of these rules determines the type of language
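A production applied as a rewrite rule can be sketched in a few lines of Python. This is only an illustration: the function name apply_production and the choice of single-character symbols held in a plain string are assumptions, not part of the slides.

```python
def apply_production(form, lhs, rhs, start=0):
    """Rewrite the first occurrence of lhs at or after index start with rhs."""
    index = form.find(lhs, start)
    if index == -1:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    # Keep everything before and after the matched symbol unchanged.
    return form[:index] + rhs + form[index + len(lhs):]


# Rewrite X as Y inside a longer string of symbols.
print(apply_production("aXb", "X", "Y"))  # aYb
```

The rule only says that X may be rewritten as Y; which occurrence gets rewritten, and when, is left to whoever is doing the derivation.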

15 S
The start symbol or distinguished symbol
This is a non-terminal from which all derivations start
In English this is usually something like “sentence”
In most programming languages it is something like “program” or “unit”

16 Grammars
We should be able to produce two things from a grammar
–A generator
–A recognizer
A generator should produce any legal string in the language
A recognizer should determine if a string is legal or not
–This process is part of parsing

17 Language Recognizer
Automaton that reads in a purported construction in the language
It answers yes or no: is this indeed in the language?
Sometimes a reference recognizer is produced
A recognizer is not a compiler
–Only purpose is to classify

18 Language generators
Generates correct statements or correct programs
If given enough time (∞) should generate every correct statement in the language
Since it generates random correct statements it has some use in learning the syntax

19 Some Examples
Let's consider a simple grammar that generates any bit string
G = (V, T, S, P)
V = {Z, B, 0, 1}
T = {0, 1}
S = Z
P = {Z → B, B → BB, B → 0, B → 1}
Terminals are 0 and 1
Non-terminals are Z and B
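As a rough check on this grammar, the sketch below writes G as Python data and enumerates every terminal string up to a given length by breadth-first rewriting. The function name generate and the length cut-off are assumptions made for illustration. Pruning by length is safe here only because no production in P shortens a string.

```python
from collections import deque
from itertools import product

# The grammar from the slide: V is the vocabulary, T the terminals.
V = {"Z", "B", "0", "1"}
T = {"0", "1"}
S = "Z"
P = [("Z", "B"), ("B", "BB"), ("B", "0"), ("B", "1")]


def generate(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    seen = {S}
    queue = deque([S])
    language = set()
    while queue:
        form = queue.popleft()
        if all(symbol in T for symbol in form):
            language.add(form)  # only terminals left: a string of the language
            continue
        for lhs, rhs in P:
            # Apply the production at every position where lhs occurs.
            for i, symbol in enumerate(form):
                if symbol == lhs:
                    new_form = form[:i] + rhs + form[i + 1:]
                    if len(new_form) <= max_len and new_form not in seen:
                        seen.add(new_form)
                        queue.append(new_form)
    return language


# Every non-empty bit string of length 1 to 3.
expected = {"".join(bits) for n in range(1, 4) for bits in product("01", repeat=n)}
print(generate(3) == expected)  # True
```

Note that the empty string is not generated: every derivation from Z ends with at least one bit.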

20 Derivations
Is the above grammar able to generate all possible bit strings?
Let’s consider a few:
1 (start with Z)
–Z → B (B)
–B → 1 (1)
10 (start with Z)
–Z → B (B)
–B → BB (BB)
–B → 1 (1B)
–B → 0 (10)

21 One More
010 (start with Z)
–Z → B (B)
–B → BB (BB)
–B → 0 (0B)
–B → BB (0BB)
–B → 1 (01B)
–B → 0 (010)
Are you convinced?
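The derivation of 010 can be replayed mechanically. In this sketch each step rewrites the leftmost occurrence of its left-hand side, which matches the order used on the slide; the function name derive is an assumption for illustration.

```python
# The six production applications used to derive 010, in order.
steps = [("Z", "B"), ("B", "BB"), ("B", "0"), ("B", "BB"), ("B", "1"), ("B", "0")]


def derive(start, steps):
    """Apply each production to the leftmost match and record every form."""
    form = start
    trace = [form]
    for lhs, rhs in steps:
        if lhs not in form:
            raise ValueError(f"cannot apply {lhs} -> {rhs} to {form!r}")
        form = form.replace(lhs, rhs, 1)  # leftmost occurrence only
        trace.append(form)
    return trace


print(" => ".join(derive("Z", steps)))
# Z => B => BB => 0B => 0BB => 01B => 010
```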

22 Definitions
A string may be derived from the start symbol if it is a legal construct of the language
A string is a direct derivation from another if it needs only one production
A string is a derivation from another if it needs one or more production applications

23 A Language
Definition: The language of a grammar is the set of all possible strings that may be derived from the start symbol of the grammar
–The finished string must contain only terminals
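In symbols, this definition is commonly written as follows. The notation is an assumption here, since the slides do not introduce it: T* is the set of all strings over the terminals, and ⇒* means "derives in zero or more production applications".

```latex
L(G) = \{\, w \in T^{*} \mid S \Rightarrow^{*} w \,\}
```

For the bit-string grammar with start symbol Z, this set is every bit string of length one or more.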

24 Other Way
Let us now try one where we want a particular language and we have to come up with the grammar
Let's consider the set of bit strings that start with 00 and end with a sequence of 1s
As a regular expression: 00(0|1)*1+
–001, 00111111, 00011010011111
–0000001, among others
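The regular expression already gives a recognizer for this language. A minimal sketch using Python's re module follows; the function name recognize is an assumption for illustration.

```python
import re

# The regular expression from the slide, unchanged.
PATTERN = re.compile(r"00(0|1)*1+")


def recognize(candidate):
    """Answer yes or no: is the whole string in the language?"""
    return PATTERN.fullmatch(candidate) is not None


for example in ["001", "00111111", "00011010011111", "0000001"]:
    print(example, recognize(example))  # each line ends with True

print(recognize("0101"))  # False: does not start with 00
print(recognize("0010"))  # False: does not end with a 1
```

fullmatch is used so that the entire string must fit the pattern, not just a prefix or a substring of it.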

25 The Grammar
There is more than one way to do the previous
G = (V, T, S, P)
T = {0, 1}
S = S
What is P?

26 P
Must have at least one production starting with S:
–S → 00 B 1
B then looks like the bit string of before:
–B → 0
–B → 1
–B → BB
What other possibilities could we have?
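One way to explore that closing question is to compare what these productions generate with the regular expression 00(0|1)*1+ given for the language. The sketch below is a bounded check, not a proof, and its helper names (generate, regex_language) are assumptions. It enumerates both sets up to length 6.

```python
import re
from collections import deque
from itertools import product

T = {"0", "1"}
P = [("S", "00B1"), ("B", "0"), ("B", "1"), ("B", "BB")]
PATTERN = re.compile(r"00(0|1)*1+")


def generate(start, max_len):
    """Terminal strings of length <= max_len derivable from start."""
    seen, queue, language = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(symbol in T for symbol in form):
            language.add(form)
            continue
        for lhs, rhs in P:
            for i, symbol in enumerate(form):
                if symbol == lhs:
                    new_form = form[:i] + rhs + form[i + 1:]
                    # No production shortens a form, so long forms can be dropped.
                    if len(new_form) <= max_len and new_form not in seen:
                        seen.add(new_form)
                        queue.append(new_form)
    return language


def regex_language(max_len):
    """Bit strings of length <= max_len accepted by the regular expression."""
    candidates = ("".join(bits) for n in range(1, max_len + 1)
                  for bits in product("01", repeat=n))
    return {s for s in candidates if PATTERN.fullmatch(s)}


missing = regex_language(6) - generate("S", 6)
extra = generate("S", 6) - regex_language(6)
print(sorted(missing), sorted(extra))  # ['001'] []
```

With these productions B always yields at least one bit, so the shortest example, 001, is not derivable. Adding a production such as S → 001 is one possible way to cover it; that production is a suggestion here, not something taken from the slides.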

27 Audience Participation
What is the grammar for the bit strings that look like this:
0^h 1^j 0^k where h>0, j>0, k>0
This includes:
–010, 00000001110, 000011111111000 among others

28 One Last Thing (or not)
Finally let's look at an example programming language
A subset of C

29 C Subset as an Example
V – contains the non-terminals
–Statement
–Declaration
–For-statement
T – set of terminals
–Reserved words
–Punctuation
–Identifiers

30 C example again
S – Start symbol
–Independently compilable part
–Program
–Function
–Constant
P – set of productions
–Rewrite rules
–Start at the start symbol
–End at terminals

31 C For Production
For-statement → for ( expression; expression; expression ) statement
This contains the terminals:
–for ( ; )
Non-terminals
–Expression
–Statement

32 Productions Again
Each non-terminal should have one or more productions that define it
–Every non-terminal must have one or more productions
Multiple productions usually signify alternation
Recursion is allowed

33 Recursion
Productions may be recursive
Recall for-statement, here is Statement
Statement → expression ;
Statement → for-statement ;
Statement → if-statement ;
Statement → while-statement ;
Statement → compound-statement
Etc.
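The recursion between Statement and For-statement can be seen by expanding the productions to a chosen depth. The sketch below is heavily simplified and rests on several assumptions: it keeps only two alternatives for statement, leaves "expression" as an unexpanded placeholder, and follows standard C in not adding a semicolon of its own after a nested for-statement. The names PRODUCTIONS and expand are hypothetical.

```python
# Hypothetical, simplified productions; each alternative is a list of symbols.
PRODUCTIONS = {
    "statement": [
        ["expression", ";"],   # non-recursive alternative
        ["for-statement"],     # recursive alternative
    ],
    "for-statement": [
        ["for", "(", "expression", ";", "expression", ";", "expression", ")",
         "statement"],
    ],
}


def expand(symbol, depth):
    """Expand symbol into tokens, nesting for-statements depth levels deep."""
    if symbol not in PRODUCTIONS:
        return [symbol]  # a terminal, or the unexpanded placeholder
    if symbol == "statement":
        # Choose the recursive alternative while depth remains.
        body = PRODUCTIONS[symbol][1 if depth > 0 else 0]
        depth = max(depth - 1, 0)
    else:
        body = PRODUCTIONS[symbol][0]
    tokens = []
    for part in body:
        tokens.extend(expand(part, depth))
    return tokens


print(" ".join(expand("statement", 2)))
# for ( expression ; expression ; expression ) for ( expression ; expression ; expression ) expression ;
```

Because statement appears on the right-hand side of for-statement, and for-statement on the right-hand side of statement, a finite set of productions describes loops nested to any depth.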

34 Exercises
13.1a
–1, 5, 13

