Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/03/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete.

Similar presentations


Presentation on theme: "1 Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/03/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete."— Presentation transcript:

1 1 Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/03/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A

2 2 Today Hw1 is due: 1% penalty per hour after 3pm. Probability Theory (from last time) Formal languages, formal grammars, and regular expression  J&M 2 Hw2

3 3 Coming up Next Monday –Finite-state automaton: J&M Chapter 2 Next Wed –Finite-state transducer: J&M Section 3.4 –Hw2 is due

4 4 Probability theory (from last time)

5 5 Common trick #1: Maximum likelihood estimation An example: toss a coin 3 times, and got two heads. What is the probability of getting a head with one toss? Maximum likelihood: (ML)  * = arg max  P(data |  ) In the example, –P(X=2) = 3 * p * p * (1-p)  when p=2/3, P(X=2) reaches the maximum.

6 6 Common trick #2: Chain rule

7 7 Common trick #3: joint prob  Marginal prob

8 8 Common trick #4: Bayes’ rule

9 9 Independent random variables Two random variables X and Y are independent iff the value of X has no influence on the value of Y and vice versa. P(X,Y) = P(X) P(Y) P(Y|X) = P(Y) P(X|Y) = P(X) Our previous coin examples: P(X, C) != P(X) P(C)

10 10 Conditional independence Once we know the value of Z, knowing the value of Y does not help us predict the value of X, and vice versa. P(X,Y | Z) = P(X|Z) P(Y|Z) P(X|Y,Z) = P(X | Z) P(Y|X, Z) = P(Y |Z)

11 11 Independence and conditional independence If A and B are independent, are they conditional independent? Example: –Burglar, Earthquake –Alarm

12 12 Common trick #5: Independence assumption

13 13 An example P(w 1 w 2 … w n ) = P(w 1 ) P(w 2 | w 1 ) P(w 3 | w 1 w 2 ) * … * P(w n | w 1 …, w n-1 ) ¼ P(w 1 ) P(w 2 | w 1 ) …. P(w n | w n-1 ) Why do we make independence assumption which we know are not true?

14 14 Summary of elementary probability theory Basic concepts: sample space, event space, random variable, random vector Joint / conditional /marginal probability Independence and conditional independence Five common tricks: –Max likelihood estimation –Chain rule –Calculating marginal probability from joint probability –Bayes’ rule –Independence assumption

15 15 Formal languages, formal grammars and regular languages

16 16 Unit #1 Formal grammar, language and regular expression Finite-state automaton (FSA) Finite-state transducer (FST) Morphological analysis using FST

17 17 Other units in ling570 Unit #2: LM and smoothing Unit #3: HMM and POS tagging Unit #4: Classification and sequence labeling Unit #5: Introduction to other common tasks (e.g., IR/IE/WSD)

18 18 Regular expression Two concepts: –Regular expression in formal language theory –Regular expression (or pattern) in pattern matching: it is a way of expressing a pattern for the purpose of matching a string Both concepts describe a set of strings. The two concepts are closely related, but the latter is often more expressive than the former.

19 19 Outline Formal language –Regular language Regular expression in formal language theory Formal grammars –Regular grammars Regular expression in pattern matching

20 20 Formal languages

21 21 Definition of formal language An alphabet is a finite set of symbols: –Ex: § = {a, b, c} A string is a finite sequence of symbols from a particular alphabet juxtaposed: –Ex: the string “baccab” –Ex: empty string ² A formal language is a set of strings defined over some alphabet. –Ex1: {aa, bb, cc, aaaa, abba, acca, baab, bbbb, ….} –Ex2: {a n b n | n > 0} –Ex3: the empty set Á

22 22 Definition of regular languages The class of regular languages over an alphabet § is formally defined as: –The empty set, Á, is a regular language – 8 a 2 § [ ², {a} is a regular language. –If L1 and L2 are regular languages, then so are: (a) L 1 ² L 2 = {xy | x 2 L 1 ; y 2 L 2 } (concatenation) (b) L 1  L 2 (union or disjunction) (c) L 1 * = {x 1 x 2 …, x n | x i 2 L 1, n 2 N} (Kleene closure) –There are no other regular languages

23 23 Kleene star Another way to define L*: L 2 = L ² L L n = L n-1 ² L L* = { ² }  L 1  L 2  … Examples: L = {a, bc} L 2 = {aa, abc, bca, bcbc} L* = {abcbca, ….} = { (a|bc)*}

24 24 Properties Regular languages are closed under –Concatenation –Union –Kleene closure Regular languages are also closed under: –Intersection: L 1 Å L 2 –Difference: L 1 – L 2 –Complementation: § * - L 1 –Reversal

25 25 Are the following languages regular? {a, aa, aaa, ….} Any finite set of strings {xy | x 2 § *, and y is the reverse of x} {xx | x 2 § *} {a n b n | n 2 N} {a n b n c n | n 2 N}

26 26 Regular expression

27 27 Definition of Regular expression (as in formal language theory) The set of regular expressions is defined as follows: (1) Every symbol of is a regular expression (2) ² is a regular expression (3) If r 1 and r 2 are regular expressions, so are (r 1 ), r 1 r 2, r 1 | r 2, r 1 * (4) Nothing else is a regular expression.

28 28 Examples ab*c a (0|1|2|..|9)* b (CV | CCV) + C?C?: C is a consonant, V is a vowel Other operations that we can use: a + = a a* a? = (a | ² )

29 29 Relation between regular language and Regex With every regular expression we can associate a regular language. Conversely, every regular language can be obtained from a regular expression. Examples: –Regex = ab*c –Regular language = {ac, abc, abbc, ….}

30 30 Formal grammar

31 31 Definition of formal grammar A formal grammar is a concise description of a formal language. It is a (N, §, P, S) tuple: A finite set N of nonterminal symbols A finite set Σ of terminal symbols that is disjoint from N A finite set P of production rules, each of the form: ( §  N)* N ( §  N)* ! ( §  N)* A distinguished symbol S 2 N that is the start symbol

32 32 Chomsky hierarchy The left-hand side of a rule must contain at least one non-terminal. ®, ¯, ° 2 (N  § )*, A,B 2 N, a 2 § Type 0: unrestricted grammar: no other constraints. Type 1: Context-sensitive grammar: The rules must be of the form: ® A ¯ ! ® ° ¯ Type 2: Context-free grammar (CFGs): The rules must be of the form: A ! ® Type 3: Regular grammar: The rules are of the forms: right regular grammar: A ! a, A ! aB, or A ! ² left regular grammar: A ! a, A ! Ba, or A ! ² Are there other kinds of grammars?

33 33 Strings generated from a grammar The rules are: S → x | y | z | S + S | S - S | S * S | S/S | (S) What strings can be generated? A grammar is ambiguous if there exists at least one string which has multiple parse trees. Is this grammar ambiguous?

34 34 Languages generated by grammars Given a grammar G, L(G) is the set of strings that can be generated from G. Ex: G = (N, §, P, S) N = {S}, § = {a, b, c} P = { S ! a S b, S ! c } What is L(G)?

35 35 The relation between regular grammars and regular languages The regular grammars describe exactly all regular languages. All the following are equivalent: –Regular languages –Regular grammars –Regular expression –Finite state automaton (FSA)

36 36 Relation between grammars and languages (from wikipedia page) Chomsky hierarchy GrammarsLanguagesMinimal automaton Type-0UnrestrictedRecursively enumerable Turing machine n/a(no common name)RecursiveDecider Type-1Context-sensitive Linear-bounded n/aIndexed Nested stack n/aTree-adjoiningMildly context- sensitive Thread Type-2Context-free Nondeterministic pushdown n/aDeterministic context-free Deterministic pushdown Type-3 Regular Finite state

37 37 How about human languages? Are they formal languages? –What is alphabet? –What is string? What type of formal languages are they?

38 38 Outline Formal language –Regular language Regular expression in formal language theory Formal grammar –Regular grammar Patterns in pattern matching  J&M 2.1

39 39 Patterns in Perl [ab] a|b. match any character ^ the starting position in a string $ the ending position in a string (..) defines a marked subexpression a* match “a” zero or more times a+ match “a” one or more time a? match “a” zero or one time a{n,m} “a” appears n to m times

40 40 Special symbols in the patterns \s match any whitespace char \d match any digit \w match any letter or digit \S match any non-whitespace char … \+, \-, \., \?, \*, …

41 41 Examples Integer: (\+|\-)?\d+ Real: (\+|\-)?\d+\.\d+ Scientific notation: (\+|\-)? \d+ (\.\d+)?e (\+|\-)?\d+ Any of the three: (\+|\-)? \d+ (\.\d+)? (e (\+|\-)?\d+)?

42 42 Patterns in Perl and Regex /^(.*)\1$/  { xx | x 2 § *} /^(.+)a(.+)\1\2$/  {xayxy | x, y 2 § *}  The extra power comes from the ability to refer to marked subexpression.

43 43 Outline Formal language –Regular language Regular expression in formal language theory Formal grammars –Regular grammars Regex patterns in pattern matching

44 44 Homework #2

45 45 Part I: probability Part II: formal grammar Part III: regular expression


Download ppt "1 Languages, grammars, and regular expressions LING 570 Fei Xia Week 2: 10/03/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete."

Similar presentations


Ads by Google