COGN1001: Introduction to Cognitive Science Topics in Computer Science Formal Languages and Models of Computation Qiang HUO Department of Computer.

1 COGN1001: Introduction to Cognitive Science Topics in Computer Science Formal Languages and Models of Computation Qiang HUO Department of Computer Science The University of Hong Kong (

2 Outline What is a Formal Language? Phrase-Structure Grammars
Finite State Automata Formal languages and Models of Computation

3 Natural Language vs. Formal Language
Natural language: a written and/or spoken language of the world, such as Chinese, English, Japanese, German, French, or Spanish. It has both a syntax and a semantics. Formal language: a language specified by a well-defined set of rules of syntax. The study of formal languages is important to computer science. For example, we need to understand what kinds of statements are acceptable in the C programming language. Deciding this is the task of a compiler for the programming language.

4 Formal Language We will describe the sentences of a formal language using a grammar. How can we determine whether a combination of words is a valid sentence in a formal language? How can we generate the valid sentences of a formal language? We will only be interested in the syntax, not the semantics (meaning), of a language.

5 A sentence is made up of a noun-phrase followed by a verb-phrase;
a noun-phrase is made up of an article followed by an adjective followed by a noun, or a noun-phrase is made up of an article followed by a noun;
a verb-phrase is made up of a verb followed by an adverb, or a verb-phrase is made up of a verb;
an article is a, or an article is the;
an adjective is large, or an adjective is hungry;
a noun is rabbit, or a noun is mathematician;
a verb is eats, or a verb is hops;
an adverb is quickly, or an adverb is wildly.
If we define a subset of English using the list of rules shown here, which describe how a valid sentence can be produced, what does the language look like?

6 Example: a Subset of English
From the previous rules we can form valid sentences using a series of replacements until no more rules can be used. For instance, the valid sentence "the large rabbit hops quickly" can be obtained by the following sequence of replacements:
sentence
noun-phrase verb-phrase
article adjective noun verb-phrase
article adjective noun verb adverb
the adjective noun verb adverb
the large noun verb adverb
the large rabbit verb adverb
the large rabbit hops adverb
the large rabbit hops quickly
Some other valid sentences: "a hungry mathematician eats wildly", "the rabbit eats quickly".
An invalid sentence: "the quickly eats mathematician".

7 Some Terminology
A vocabulary (or alphabet) V is a finite, nonempty set of elements called symbols. A word (or sentence) over V is a string of finite length of elements of V. The empty string or null string, denoted by λ, is the string containing no symbols. The set of all words (or sentences) over V is denoted by V*. A language over V is a subset of V*. Example: In English, the alphabet V consists of English letters and other symbols. A word (or sentence) over V is a finite string of symbols. The set of meaningful words (or sentences) of English is a subset of V*.

8 How to specify a language?
We can list all the words (or sentences) in the language; or give some criteria that a word (or a sentence) must satisfy to be in the language; or specify the language through the use of a grammar, such as the set of rules we gave in the previous example of a subset of English.

9 Outline What is a Formal Language? Phrase-Structure Grammars
Finite State Automata Formal languages and Models of Computation

10 What is a Phrase-Structure Grammar?
A phrase-structure grammar is G = (V,T,S,P), where
V is a vocabulary;
T is a subset of V consisting of terminal elements (i.e., the elements of V which cannot be replaced by other symbols);
the elements of N = V–T are called nonterminal symbols (i.e., the elements of V which can be replaced by other symbols);
S is a start symbol from V (i.e., the element of V that we always begin with);
P is a set of productions. We denote by w0 → w1 the production that specifies that w0 can be replaced by w1. Every production in P must contain at least one nonterminal on its left side.

11 Example: a Phrase-Structure Grammar
G = (V,T,S,P), where V = { a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly; sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb }; T = { a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly }; V–T = { sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb }; S = sentence; Production rules: P

12 P = {
sentence → noun-phrase verb-phrase,
noun-phrase → article adjective noun,
noun-phrase → article noun,
verb-phrase → verb adverb,
verb-phrase → verb,
article → a,
article → the,
adjective → large,
adjective → hungry,
noun → rabbit,
noun → mathematician,
verb → eats,
verb → hops,
adverb → quickly,
adverb → wildly }
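Because none of these productions is recursive, the grammar generates only finitely many sentences, and they can all be listed. The following is a minimal sketch of that enumeration; the names `PRODUCTIONS`, `expand`, and `SENTENCES` are illustrative and do not come from the slides.

```python
from itertools import product

# Productions of the English-subset grammar, keyed by nonterminal.
# Each right-hand side is a list of symbols.
PRODUCTIONS = {
    "sentence": [["noun-phrase", "verb-phrase"]],
    "noun-phrase": [["article", "adjective", "noun"], ["article", "noun"]],
    "verb-phrase": [["verb", "adverb"], ["verb"]],
    "article": [["a"], ["the"]],
    "adjective": [["large"], ["hungry"]],
    "noun": [["rabbit"], ["mathematician"]],
    "verb": [["eats"], ["hops"]],
    "adverb": [["quickly"], ["wildly"]],
}

def expand(symbol):
    """Return every sequence of terminal words derivable from `symbol`."""
    if symbol not in PRODUCTIONS:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for rhs in PRODUCTIONS[symbol]:
        # Combine the expansions of the right-hand-side symbols, in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

SENTENCES = {" ".join(words) for words in expand("sentence")}
print(len(SENTENCES))
print("the large rabbit hops quickly" in SENTENCES)
```

This approach relies on the grammar being non-recursive; for a grammar with a rule such as S → 0S the recursion in `expand` would never end.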

13 Some Terminology
Let G = (V,T,S,P) be a phrase-structure grammar.
Let w0 = lz0r and w1 = lz1r be strings over V. If z0 → z1 is a production of G, we say that w1 is directly derivable from w0 and we write w0 ⇒ w1. Example: the adjective noun verb adverb ⇒ the large noun verb adverb, because adjective → large is a production. If w0, w1, …, wn, n ≥ 0, are strings over V such that w0 ⇒ w1, w1 ⇒ w2, …, wn-1 ⇒ wn, then we say that wn is derivable from w0, and we write w0 ⇒* wn. The sequence of steps used to obtain wn from w0 is called a derivation.

14 Example: sentence ⇒* the large rabbit hops quickly
via the following derivation:
sentence ⇒ noun-phrase verb-phrase,
noun-phrase verb-phrase ⇒ article adjective noun verb-phrase,
article adjective noun verb-phrase ⇒ article adjective noun verb adverb,
article adjective noun verb adverb ⇒ the adjective noun verb adverb,
the adjective noun verb adverb ⇒ the large noun verb adverb,
the large noun verb adverb ⇒ the large rabbit verb adverb,
the large rabbit verb adverb ⇒ the large rabbit hops adverb,
the large rabbit hops adverb ⇒ the large rabbit hops quickly.
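The definition of "directly derivable" can be checked mechanically: w1 is directly derivable from w0 when replacing one occurrence of some left-hand side z0 in w0 by its right-hand side z1 yields w1. The sketch below, with illustrative names (`RULES`, `directly_derives`, `DERIVATION`) that are not part of the slides, verifies every step of the derivation above.

```python
RULES = """
sentence -> noun-phrase verb-phrase
noun-phrase -> article adjective noun
noun-phrase -> article noun
verb-phrase -> verb adverb
verb-phrase -> verb
article -> a
article -> the
adjective -> large
adjective -> hungry
noun -> rabbit
noun -> mathematician
verb -> eats
verb -> hops
adverb -> quickly
adverb -> wildly
"""

# Each production is a pair (z0, z1) of symbol tuples.
PRODUCTIONS = [
    (tuple(lhs.split()), tuple(rhs.split()))
    for lhs, rhs in (line.split("->") for line in RULES.strip().splitlines())
]

def directly_derives(w0, w1, productions):
    """True if w1 results from replacing one occurrence of z0 in w0 by z1."""
    for z0, z1 in productions:
        for i in range(len(w0) - len(z0) + 1):
            if w0[i:i + len(z0)] == z0 and w0[:i] + z1 + w0[i + len(z0):] == w1:
                return True
    return False

DERIVATION = [
    "sentence",
    "noun-phrase verb-phrase",
    "article adjective noun verb-phrase",
    "article adjective noun verb adverb",
    "the adjective noun verb adverb",
    "the large noun verb adverb",
    "the large rabbit verb adverb",
    "the large rabbit hops adverb",
    "the large rabbit hops quickly",
]
steps = [tuple(s.split()) for s in DERIVATION]
is_valid = all(directly_derives(a, b, PRODUCTIONS) for a, b in zip(steps, steps[1:]))
print(is_valid)
```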

15 What is the language generated by a Phrase-Structure Grammar?
Let G = (V,T,S,P) be a phrase-structure grammar. The language generated by G (or the language of G), denoted by L(G), is the set of all strings of terminals that are derivable from the starting symbol S: L(G) = { w ∈ T* | S ⇒* w }.

16 Examples
Example: Suppose G = (V,T,S,P), where V = {a,b,A,B,S}, T = {a,b}, S is the start symbol, and P = { S → ABa, A → BB, B → ab, AB → b }. All the "sentences" (words) generated by this grammar are { abababa, ba }, since
S ⇒ ABa ⇒ BBBa ⇒* abababa
S ⇒ ABa ⇒ ba
Example: Let G be the grammar with V = {S,0,1}, T = {0,1}, starting symbol S, and production rules P = { S → 11S, S → 0 }. L(G) = { (11)^n 0 | n = 0,1,2, … }.
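For the first grammar, L(G) can be confirmed by exhaustively exploring every string reachable from S. The sketch below assumes single-character symbols and uses illustrative names (`generated_language`, `P`, `L`); it only terminates when finitely many strings are reachable, so it is not suitable for the second grammar, whose language is infinite.

```python
from collections import deque

def generated_language(productions, start, terminals):
    """Breadth-first search over all strings derivable from `start`.

    Terminates only if finitely many strings are reachable.
    """
    seen, queue, language = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            language.add(form)  # a string of terminals belongs to L(G)
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:  # try every occurrence of the left-hand side
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return language

P = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]
L = generated_language(P, "S", set("ab"))
print(sorted(L))
```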

17 How to construct a grammar that generates a given language?
Example: Find a phrase-structure grammar to generate the set { 0^n 1^n | n = 0,1,2, … }. Solution: G = (V,T,S,P), where V = { S, 0, 1 }, T = { 0,1 }, S is the start symbol, and P = { S → 0S1, S → λ }.

18 How to construct a grammar that generates a given language??
Example: Find a phrase-structure grammar to generate the set { 0^m 1^n | m,n = 0,1,2, … }.
Solution 1: G1 = (V,T,S,P), where V = {S,0,1}, T = {0,1}, S is the start symbol, and P = { S → 0S, S → S1, S → λ }.
Solution 2: G2 = (V,T,S,P), where V = {S,A,0,1}, T = {0,1}, S is the start symbol, and P = { S → 0S, S → 1A, S → 1, A → 1A, A → 1, S → λ }.
Two grammars can generate the same language!
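A bounded comparison gives some evidence (not a proof) that G1 and G2 generate the same language: enumerate all words of length at most k from each grammar and compare the two sets. The sketch below is illustrative; `language_up_to` is a hypothetical helper, λ is written as the empty string, and the pruning assumes that every production other than S → λ adds at least one terminal, which holds for both grammars here.

```python
from collections import deque

def language_up_to(productions, start, terminals, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)
            continue
        for lhs, rhs in productions:
            pos = form.find(lhs)
            while pos != -1:
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                # Discard strings that already contain too many terminals.
                n_terminals = sum(ch in terminals for ch in nxt)
                if n_terminals <= max_len and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return words

G1 = [("S", "0S"), ("S", "S1"), ("S", "")]
G2 = [("S", "0S"), ("S", "1A"), ("S", "1"), ("A", "1A"), ("A", "1"), ("S", "")]

words_g1 = language_up_to(G1, "S", set("01"), 6)
words_g2 = language_up_to(G2, "S", set("01"), 6)
print(words_g1 == words_g2)
```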

19 How to construct a grammar that generates a given language???
There are many techniques from the theory of computation which can be used to systematically construct a grammar for a given formal language, but this is beyond the scope of this course.

20 Types of Phrase-Structure Grammars (1)
Phrase-structure grammars can be classified according to the types of productions that are allowed. Such a classification scheme, introduced by Noam Chomsky, is as follows:
Type 0 grammar: has no restrictions on its productions.
Type 1, or context-sensitive, grammar: can have productions only of the form w1 → w2, where l(w1) ≤ l(w2) and l(w) denotes the length of w, or of the form w1 → λ.
Type 2, or context-free, grammar: can have productions only of the form A → w2, where A is a single nonterminal symbol.

21 Types of Phrase-Structure Grammars (2)
Type 3, or regular, grammar: can have productions only of the form A → aB, A → a, or S → λ, where A and B are nonterminal symbols, S is the start symbol, and a is a terminal symbol. A language generated by a type 1 grammar is called a context-sensitive language; a language generated by a type 2 grammar is called a context-free language; a language generated by a type 3 grammar is called a regular language.
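The four types differ only in the shapes of productions they allow, so a grammar can be classified by checking each production against the restrictions. The following sketch is one possible way to do that; `grammar_type` is a hypothetical helper, symbols are assumed to be single characters, and λ is written as the empty string.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the largest type number (0 to 3) whose restrictions
    are satisfied by every production."""

    def is_regular(lhs, rhs):
        if lhs == start and rhs == "":
            return True  # S -> lambda
        if len(lhs) != 1 or lhs not in nonterminals:
            return False
        if len(rhs) == 1:
            return rhs not in nonterminals  # A -> a
        # A -> aB
        return len(rhs) == 2 and rhs[0] not in nonterminals and rhs[1] in nonterminals

    def is_context_free(lhs, rhs):
        return len(lhs) == 1 and lhs in nonterminals  # A -> w2

    def is_context_sensitive(lhs, rhs):
        return len(lhs) <= len(rhs) or rhs == ""  # l(w1) <= l(w2), or w1 -> lambda

    checks = ((3, is_regular), (2, is_context_free), (1, is_context_sensitive))
    for type_number, allowed in checks:
        if all(allowed(lhs, rhs) for lhs, rhs in productions):
            return type_number
    return 0

regular = [("S", "0S"), ("S", "1A"), ("S", "1"), ("A", "1A"), ("A", "1"), ("S", "")]
context_free = [("S", "0S1"), ("S", "")]
print(grammar_type(regular, {"S", "A"}))
print(grammar_type(context_free, {"S"}))
```

The function reports the most restrictive type a grammar satisfies; a grammar of type 3 is also of types 2, 1, and 0 under the definitions above.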

22 Examples
{ 0^m 1^n | m,n = 0,1,2, … } is a regular language, since it can be generated by a regular grammar G with P = { S → 0S, S → 1A, S → 1, A → 1A, A → 1, S → λ }.
{ 0^n 1^n | n = 0,1,2, … } is a context-free language, since it can be generated by a context-free grammar G with P = { S → 0S1, S → λ }.
{ 0^n 1^n 2^n | n = 0,1,2, … } is a context-sensitive language, since it can be generated by a type 1 grammar G = (V,T,S,P) with V = {0,1,2,S,A,B}, T = {0,1,2}, starting symbol S, and productions P = { S → 0SAB, S → λ, BA → AB, 0A → 01, 1A → 11, 1B → 12, 2B → 22 }; but not by any type 2 grammar.

23 Example: a Phrase-Structure Grammar
G = (V,T,S,P), where V = { a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly; sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb }; T = { a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly }; V–T = { sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb }; S = sentence; Production rules: P

24 P = {
sentence → noun-phrase verb-phrase,
noun-phrase → article adjective noun,
noun-phrase → article noun,
verb-phrase → verb adverb,
verb-phrase → verb,
article → a,
article → the,
adjective → large,
adjective → hungry,
noun → rabbit,
noun → mathematician,
verb → eats,
verb → hops,
adverb → quickly,
adverb → wildly }

25 Example: Backus-Naur Form
What is the Backus-Naur Form of the grammar for a subset of English described before?
<sentence> ::= <noun phrase><verb phrase>
<noun phrase> ::= <article><adjective><noun> | <article><noun>
<verb phrase> ::= <verb><adverb> | <verb>
<article> ::= a | the
<adjective> ::= large | hungry
<noun> ::= rabbit | mathematician
<verb> ::= eats | hops
<adverb> ::= quickly | wildly

26 What is Backus-Naur Form (BNF)?
There is another notation that is used to specify a type 2 (context-free) grammar, called the Backus-Naur Form: all productions having the same nonterminal as their left-hand side are combined into one statement, with the different right-hand sides of these productions separated by a bar ( | ), with nonterminal symbols enclosed in angle brackets (<>), and with the symbol → replaced by ::=
Example: The Backus-Naur form for a grammar that produces signed integers is as follows:
<signed integer> ::= <sign><integer>
<sign> ::= + | -
<integer> ::= <digit> | <digit><integer>
<digit> ::= 0|1|2|3|4|5|6|7|8|9
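Each BNF rule for signed integers maps naturally onto one small function that tries to match that rule at a given position. The sketch below is a hand-written recognizer along those lines; the function names are illustrative, and each parsing function returns the index just past the matched text, or None when the rule does not match.

```python
DIGITS = set("0123456789")

def parse_sign(s, i):
    """<sign> ::= + | -"""
    return i + 1 if i < len(s) and s[i] in "+-" else None

def parse_integer(s, i):
    """<integer> ::= <digit> | <digit><integer> (longest match)."""
    if i >= len(s) or s[i] not in DIGITS:
        return None
    rest = parse_integer(s, i + 1)  # try the alternative <digit><integer> first
    return rest if rest is not None else i + 1

def is_signed_integer(s):
    """<signed integer> ::= <sign><integer>, matching the whole string."""
    i = parse_sign(s, 0)
    if i is None:
        return False
    return parse_integer(s, i) == len(s)

print(is_signed_integer("+42"))
print(is_signed_integer("42"))
```

Note that this grammar makes the sign mandatory, so an unsigned string such as 42 is rejected.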

27 What is a Derivation (or Parse) Tree?
A derivation in the language generated by a context-free grammar can be represented graphically using an ordered rooted tree, called a derivation (or parse) tree: the root represents the starting symbol, internal vertices represent nonterminals, leaves represent terminals, and the children of a vertex are the symbols on the right side of a production, in order from left to right, where the symbol represented by the parent is on the left-hand side.

28 Example Construct a derivation tree for the derivation of the sentence, the hungry rabbit eats quickly, discussed previously.
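One possible way to write such a tree down in code is as nested tuples, where the first element of each tuple is the nonterminal at that vertex and the remaining elements are its children in left-to-right order. This is only a sketch of the tree for the sentence named above; the names `TREE`, `leaves`, and `render` are illustrative.

```python
# Each internal vertex is (nonterminal, child, child, ...); leaves are terminals.
TREE = (
    "sentence",
    ("noun-phrase",
     ("article", "the"), ("adjective", "hungry"), ("noun", "rabbit")),
    ("verb-phrase",
     ("verb", "eats"), ("adverb", "quickly")),
)

def leaves(node):
    """Read the terminals at the leaves from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [leaf for child in children for leaf in leaves(child)]

def render(node, depth=0):
    """Return the tree as indented lines, one vertex per line."""
    if isinstance(node, str):
        return ["  " * depth + node]
    label, *children = node
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(TREE)))
print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right gives back the derived sentence.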

29 How to determine whether a string is in the language generated by a context-free grammar?
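Top-down parsing: begins with the starting symbol and proceeds by successively applying productions to see if the given string can be derived. Bottom-up parsing: works backwards, beginning with the given string and applying productions in reverse to see if the starting symbol can be reached.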

30 Example
Determine whether the word cbab belongs to L(G), where G = (V,T,S,P) with V = {a,b,c,A,B,C,S}, T = {a,b,c}, S is the starting symbol, and the productions are
S → AB, A → Ca, B → Ba, B → Cb, B → b, C → cb, C → b
Top-down parsing:
S ⇒ AB
S ⇒ AB ⇒ CaB
S ⇒ AB ⇒ CaB ⇒ cbaB
S ⇒ AB ⇒ CaB ⇒ cbaB ⇒ cbab
Bottom-up parsing:
Cab ⇒ cbab
Ab ⇒ Cab ⇒ cbab
AB ⇒ Ab ⇒ Cab ⇒ cbab
S ⇒ AB ⇒ Ab ⇒ Cab ⇒ cbab
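Top-down parsing for this grammar can be sketched as a breadth-first search that always expands the leftmost nonterminal. The code below is an illustration, not a general parser: `top_down_derivation` is a hypothetical name, and the length-based pruning is valid only because no production in this grammar shortens a string.

```python
from collections import deque

PRODUCTIONS = {
    "S": ["AB"],
    "A": ["Ca"],
    "B": ["Ba", "Cb", "b"],
    "C": ["cb", "b"],
}

def top_down_derivation(target, start="S"):
    """Search leftmost derivations from `start`; return one that
    produces `target`, or None if the search space is exhausted."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Locate the leftmost nonterminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if idx is None:
            continue
        # The terminals before it can no longer change, so they must match.
        if not target.startswith(form[:idx]):
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            nxt = form[:idx] + rhs + form[idx + 1:]
            if len(nxt) <= len(target) and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(top_down_derivation("cbab"))
```

The pruning also keeps the left-recursive production B → Ba from causing an endless search.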

31 Outline What is a Formal Language? Phrase-Structure Grammars
Finite State Automata Formal languages and Models of Computation

32 Finite State Machines with No Output
Finite-state machines with no output are also called finite-state automata. Finite-state automata do not generate output, but they have a set of special states, called final states. A finite-state automaton is often used for language recognition. This application plays a fundamental role in the design and construction of compilers for programming languages.

33 What is a Deterministic Finite-State Automaton?
A finite-state automaton M = (S,I,f,s0,F) consists of a finite set S of states, a finite input alphabet I, a transition function f that assigns a state to every pair of state and input, an initial state s0, and a subset F of S consisting of final states.

34 How to represent a Finite-State Automaton?
We can represent a finite-state automaton using either a state table or a state diagram. Final states are indicated in the state diagram by using double circles. What is the state table of the above finite-state automaton?

35 What is the language recognized by a given Finite-State Automaton?
An input string is recognized or accepted by an automaton M if the string takes the automaton to one of its final states. The language recognized by an automaton M, denoted by L(M), is the set of all strings that are recognized by M. The language recognized by the above finite-state automaton M is L(M) = { 0^n, 0^n 10x | n = 0,1,2, …, and x is any string }.
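Recognition by a deterministic automaton amounts to following one transition per input symbol and checking whether the last state is final. The figure from the slide is not available here, so the transition table below is an assumption: it is one deterministic automaton reconstructed from the stated language { 0^n, 0^n 10x }, and its state names are illustrative.

```python
# Hypothetical automaton for { 0^n, 0^n 10x }; not taken from the slide's figure.
TRANSITIONS = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s2", ("s1", "1"): "s3",
    ("s2", "0"): "s2", ("s2", "1"): "s2",
    ("s3", "0"): "s3", ("s3", "1"): "s3",  # s3 is a non-final trap state
}
START = "s0"
FINALS = {"s0", "s2"}

def accepts(transitions, start, finals, string):
    """Run the automaton on `string` and report whether it ends in a final state."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in finals

for word in ["", "000", "010", "0010111", "1", "01", "011"]:
    print(repr(word), accepts(TRANSITIONS, START, FINALS, word))
```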

36 Deterministic vs Nondeterministic Finite-State Automata
The finite-state automata discussed so far are deterministic, since for each pair of state and input value there is a unique next state given by the transition function. There is another important type of finite-state automaton in which there may be several possible next states for each pair of state and input value. Such machines are called nondeterministic. Nondeterministic finite-state automata are important in determining which languages can be recognized by a finite-state automaton.

37 What is a Nondeterministic Finite-State Automaton?
A nondeterministic finite-state automaton M = (S,I,f,s0,F) consists of a finite set S of states, a finite input alphabet I, a transition function f that assigns a set of states to each pair of state and input, an initial state s0, and a subset F of S consisting of final states.

38 How to represent a Nondeterministic Finite-State Automaton?
Using a state table: for each pair of state and input value we give a list of possible next states. Using a state diagram: include an edge from each state to all possible next states, labelling edges with the input(s) that lead to this transition.

39 What is the language recognized by a given Nondeterministic Finite-State Automaton?
What does it mean for a nondeterministic finite-state automaton to recognize a string x = x1x2 … xk? x1 takes the starting state s0 to a set S1 of states; x2 takes each of the states in S1 to a set of states. Let S2 be the union of these sets; Continue this process, including at a stage all states that can be obtained using a state obtained at the previous stage and the current input symbol; The string x is recognized or accepted if there is a final state in the set of all states that can be obtained from s0 using x. The language recognized by a nondeterministic finite-state automaton is the set of all strings recognized by this automaton.

40 Example
Determine the language recognized by the nondeterministic finite-state automaton M shown in the following figure. Solution: L(M) = { 0^n, 0^n 01, 0^n 11 | n = 0,1,2, … }.
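Recognition by a nondeterministic automaton can be simulated by tracking the set of all states reachable after each input symbol and accepting if the final set contains a final state. The figure for this example is not available, so the automaton below is an assumption: it is one nondeterministic automaton that recognizes the stated language, with illustrative state names.

```python
# Hypothetical NFA for { 0^n, 0^n 01, 0^n 11 }; not taken from the slide's figure.
# Missing (state, input) pairs mean "no next state".
NFA = {
    ("s0", "0"): {"s0", "s2"},
    ("s0", "1"): {"s1"},
    ("s1", "1"): {"s3"},
    ("s2", "1"): {"s3"},
}
START = "s0"
FINALS = {"s0", "s3"}

def nfa_accepts(transitions, start, finals, string):
    """Track every state reachable on the input read so far."""
    current = {start}
    for symbol in string:
        current = {t for s in current for t in transitions.get((s, symbol), set())}
    return bool(current & finals)

for word in ["", "000", "01", "0011", "1", "10", "0101"]:
    print(repr(word), nfa_accepts(NFA, START, FINALS, word))
```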

41 An Important Fact Theorem:
If the language L is recognized by a nondeterministic finite-state automaton M0, then L is also recognized by a deterministic finite-state automaton M1. Two finite-state automata are called equivalent if they recognize the same language.
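The slide states the theorem without proof. A standard way to obtain the deterministic automaton is the subset construction, in which each state of M1 is a set of states of M0; this technique is named here as an addition and is not described in the slides. The sketch below applies it to a small hypothetical automaton that recognizes the bit strings ending in 01.

```python
from collections import deque

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa, queue, seen = {}, deque([start_set]), {start_set}
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            nxt = frozenset(t for s in current for t in nfa.get((s, symbol), ()))
            dfa[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # A set of NFA states is final if it contains at least one final NFA state.
    dfa_finals = {state for state in seen if state & finals}
    return dfa, start_set, dfa_finals

def dfa_accepts(dfa, start, finals, string):
    state = start
    for symbol in string:
        state = dfa[(state, symbol)]
    return state in finals

# Hypothetical NFA: bit strings that end in "01".
NFA = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
DFA, START, FINALS = subset_construction(NFA, "q0", {"q2"}, "01")

print(dfa_accepts(DFA, START, FINALS, "1101"))
print(dfa_accepts(DFA, START, FINALS, "0110"))
```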

42 Outline What is a Formal Language? Phrase-Structure Grammars
Finite State Automata Formal languages and Models of Computation

43 Build an FSA from a Regular Grammar
Suppose that G = (V,T,S,P) is a regular grammar generating the set L(G), where each production is of the form S → λ, A → a, or A → aB, with a being a terminal symbol and A and B being nonterminal symbols. We can build a nondeterministic finite-state machine M = (S,I,f,s0,F) that recognizes L(G).

44 M = (S,I,f,s0,F)
S contains a state sA for each nonterminal symbol A of G, and an additional final state sF;
the start state s0 is the state formed from the start symbol S;
a transition from sA to sF on input a is included if A → a is a production;
a transition from sA to sB on input a is included if A → aB is a production;
s0 will also be a final state if S → λ is a production.
It can be shown that L(M) = L(G).

45 Example
Construct a nondeterministic finite-state automaton that recognizes the language generated by the regular grammar G = (V,T,S,P) where V = {0,1,A,S}, T = {0,1}, and the productions in P are S → 1A, S → 0, S → λ, A → 0A, A → 1A, and A → 1.
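The construction rules for building an automaton from a regular grammar translate almost line by line into code. The sketch below applies them to the grammar of this example; the function names and the state naming scheme ("s" followed by the nonterminal, plus "sF") are illustrative, symbols are single characters, and λ is written as the empty string.

```python
def nfa_from_regular_grammar(productions, start="S"):
    """Build (transitions, start state, final states) from productions
    of the form A -> aB, A -> a, or S -> lambda."""
    transitions = {}
    finals = {"sF"}
    for lhs, rhs in productions:
        if rhs == "":
            finals.add("s" + lhs)  # S -> lambda makes the start state final
        else:
            # A -> a goes to sF; A -> aB goes to sB.
            target = "sF" if len(rhs) == 1 else "s" + rhs[1]
            transitions.setdefault(("s" + lhs, rhs[0]), set()).add(target)
    return transitions, "s" + start, finals

def nfa_accepts(transitions, start, finals, string):
    current = {start}
    for symbol in string:
        current = {t for s in current for t in transitions.get((s, symbol), set())}
    return bool(current & finals)

P = [("S", "1A"), ("S", "0"), ("S", ""), ("A", "0A"), ("A", "1A"), ("A", "1")]
NFA, START, FINALS = nfa_from_regular_grammar(P)

for word in ["", "0", "11", "1011", "1", "10", "00"]:
    print(repr(word), nfa_accepts(NFA, START, FINALS, word))
```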

46 Construct a Regular Grammar from an FSA
Suppose that M = (S,I,f,s0,F) is a finite-state machine with the property that s0 is never the next state for a transition. A regular grammar G = (V,T,S,P) can be defined as follows:
V is formed by assigning a symbol to each state of S and each input symbol in I;
T is formed from the input symbols in I;
S is the symbol formed from the start state s0;
the set P of productions is formed from the transitions in M: As → a is included if the state s goes to a final state under input a, where As is the nonterminal symbol formed from s; As → aAt is included if the state s goes to t under input a; S → λ is included if and only if λ ∈ L(M).
It can be shown that L(G) = L(M).

47 Example
Find a regular grammar that generates the language recognized by the finite-state automaton shown in the following figure. Solution: G = (V,T,S,P) where V = {S,A,B,0,1}, the symbols S, A, and B correspond to the states s0, s1, and s2, respectively; T = {0,1}; S is the start symbol; and the productions are S → 0A, S → 1B, S → 1, S → λ, A → 0A, A → 1B, A → 1, B → 0A, B → 1B, B → 1.
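The rules for turning an automaton into a regular grammar can also be sketched in code. The figure for this example is not available, so the transition table below is an assumption inferred from the productions in the solution (s0 and s2 final, no transition back into s0); the function name and the `NAMES` mapping are illustrative, and λ is written as the empty string.

```python
def regular_grammar_from_dfa(transitions, start, finals, names):
    """Return productions (lhs, rhs) following the construction rules."""
    productions = []
    for (state, symbol), target in sorted(transitions.items()):
        productions.append((names[state], symbol + names[target]))  # As -> aAt
        if target in finals:
            productions.append((names[state], symbol))              # As -> a
    if start in finals:
        productions.append((names[start], ""))                      # S -> lambda
    return productions

# Assumed automaton, reconstructed from the listed productions.
TRANSITIONS = {
    ("s0", "0"): "s1", ("s0", "1"): "s2",
    ("s1", "0"): "s1", ("s1", "1"): "s2",
    ("s2", "0"): "s1", ("s2", "1"): "s2",
}
NAMES = {"s0": "S", "s1": "A", "s2": "B"}
P = regular_grammar_from_dfa(TRANSITIONS, "s0", {"s0", "s2"}, NAMES)

for lhs, rhs in P:
    print(lhs, "->", rhs if rhs else "lambda")
```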

48 More Powerful Types of Machines (1)
The main limitation of finite-state automata is their finite amount of memory. This prevents them from recognizing languages that are not regular, such as { 0^n 1^n | n = 0,1,2, … }. A more powerful model of computation, called a pushdown automaton, can be used to recognize the above language. Theorem: A set is recognized by a pushdown automaton if and only if it is the language generated by a context-free grammar. However, there are sets that cannot be expressed as the language generated by a context-free grammar. One such set is { 0^n 1^n 2^n | n = 0,1,2, … }.
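What a pushdown automaton adds to a finite-state automaton is a stack, which provides unbounded memory. The sketch below is a simplified illustration of that idea for { 0^n 1^n }, not a formal pushdown automaton: it pushes a marker for every 0, pops one for every 1, and accepts when the stack is empty at the end. The function name is illustrative.

```python
def accepts_0n1n(string):
    """Recognize { 0^n 1^n | n >= 0 } with a stack."""
    stack = []
    seen_one = False  # plays the role of a second control state
    for symbol in string:
        if symbol == "0":
            if seen_one:
                return False  # a 0 after a 1 is not allowed
            stack.append("0")
        elif symbol == "1":
            seen_one = True
            if not stack:
                return False  # more 1s than 0s
            stack.pop()
        else:
            return False  # symbol outside the alphabet {0, 1}
    return not stack  # every 0 must have been matched by a 1

for word in ["", "01", "000111", "001", "011", "10", "0101"]:
    print(repr(word), accepts_0n1n(word))
```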

49 More Powerful Types of Machines (2)
Actually, there exists an even more powerful type of machine than pushdown automata, called linear bounded automata, which can recognize context-sensitive languages such as the set { 0^n 1^n 2^n | n = 0,1,2, … }; but they cannot recognize all the languages generated by phrase-structure grammars. The most general model of a computing machine is the so-called Turing machine, which can recognize all languages generated by phrase-structure grammars and model all the computations that can be performed on a computing machine.

50 Future: Scientists vs Engineers
Scientists try to understand what is. Engineers try to create what has never been! The really great engineers have a strong background in science so that they thoroughly understand what is. These special people also have to have the imagination to create what has never been, and this is what really sets them apart! The methodology of engineering research: There exists some phenomenon of nature for which a model should be found; the mathematical analysis is just a tool that helps one to find this model; the results of any analysis should be confirmed by experiments. Future: What you make it to be!

51 Reference Sections 11.1, 11.3, 11.4 of the following book
Kenneth H. Rosen, Discrete Mathematics and Its Applications, Fifth Edition, McGraw-Hill International Editions, 2004; or The relevant sections of the above book in earlier editions.

