
MPS 2016 ELI SHAMIR, HEBREW UNIVERSITY JERUSALEM PARENTAL VIEW OF CONTEXT – FREE BIRTH AND EVOLUTION.


1 MPS 2016 ELI SHAMIR, HEBREW UNIVERSITY JERUSALEM PARENTAL VIEW OF CONTEXT – FREE BIRTH AND EVOLUTION

2 Formal Language Theory - Mid 50's
Confluence of several directions:
- Natural Languages [NLP], Syntax Specifications
- Early Prog. Languages, Syntax Specifications
- Automata & Machines, Formal Specifications
- Combinatorial Math., Sets of strings
- Biological: L Systems
- …

3 Formal Languages Generative Hierarchy (Chomsky+)
- Recursively enumerable
- Context-sensitive (LBA)
- Mildly context-sensitive
- Context-free*
- Regular = FA definable
Subsequently integrated into the space/time complexity hierarchy, the backbone of theoretical computer science.
* Several sub-models studied, related to compiler construction for programming languages.

4 Context-free [CF] central position due to: equivalence of several distinct models
- Algebraic equations [MPS]
  DUAL APPROACH IN ARGUMENT AND PROOFS
- Production rules and trees
- BNF (Backus NF), syntax of early prog. languages
- Categorial grammars
- Dependency structures
- Lambek algebraic calculus
- Pushdown Automata
- …
Rich algebraic, combinatorial, algorithmic properties and problems, significant applications.

5 1957-1963: Boston-Jerusalem Correspondence
Linguists: MIT: N. Chomsky; HUJI: Y. Bar-Hillel
Mathem.: Harvard, Paris: MPS (MARCO); HUJI: H. Gaifman, M. Perles, E. Shamir [Math PhD students]
Main articles and monographs, mainly on CF [listed next: 2-5, 19].
Up to 1969, many other researchers and groups in the USA, Europe and Japan joined. See publication lists [next few slides].
Inclusion as a basic topic in CS education.

6 Central Publications up to 1969
1. J. Hopcroft and J. Ullman, Formal Languages and their Relation to Automata, Addison-Wesley, 1969. [Extensive reference list]
2. Y. Bar-Hillel, H. Gaifman and E. Shamir, On categorial and phrase structure grammars, Bulletin of the Research Council of Israel, vol. 9F (1960), 1-16.
3. Y. Bar-Hillel, M. Perles and E. Shamir, On formal properties of simple phrase structure grammars, Z. Phonetik, Sprachwiss. Kommun., 14 (1961), 143-172.
   2 & 3 reproduced in Y. Bar-Hillel, Language and Information, Addison-Wesley, 1964. 3 appeared as a monograph in Russian, 1964.
4. N. Chomsky, On certain formal properties of grammars, Inf. and Control, 2:2 (1959), 113-124.
5. N. Chomsky and M. P. Schutzenberger, The algebraic theory of context-free languages, Computer Programming and Formal Systems, North Holland, 1963. [Appeared as a monograph]
6. J. Evey, The theory and application of pushdown store machines, Doctoral Thesis, Harvard University, 1963.
7. R. W. Floyd, The syntax of programming languages - a survey, Professional Group Electronic Computers [PGEC], 13:4 (1964), 346-353.

7
9. S. Ginsburg and H. G. Rice, Two families of languages related to ALGOL, JACM, 9:3, 350-371, 1962.
10. S. Ginsburg, The mathematical theory of context-free languages, 1966.
11. S. Greibach, A new normal form theorem for context-free grammars, JACM, 12:1, 42-52, 1965.
12. D. E. Knuth, A characterization of parenthesis languages, Inf. and Control, 11:3, 269-289, 1967.
13. P. S. Landweber, Three theorems on phrase structure grammars of type 1, Inf. and Control, 6:2, 131-136, 1963.
14. M. Nivat, Transduction des langages de Chomsky, PhD Thesis, Univ. de Paris, 1967. [Also in Annales de l'Institut Fourier, 18: 339-456, 1968].
15. R. J. Parikh, On context-free languages, JACM, 13, 570-581, 1966.
16. D. J. Rosenkrantz, Matrix equations and normal forms for context-free grammars, JACM, 14:3, 501-507, 1967.
17. J. Rhodes and E. Shamir, Complexity of grammars by group-theoretic methods, Journal of Combinatorial Theory, 222-239, 1968.
18. E. Shamir, A representation theorem for algebraic and context-free power series in noncommuting variables, Inf. and Control, 11, 239-254, 1967.
19. M. P. Schutzenberger [Several articles: 1960-1965]
20. D. H. Younger, Recognition and parsing of context-free languages in time n^3, Inf. and Control, 10:2, 189-208, 1967.

8 Chosen Books & Publications After 1970
1. J. Autebert, J. Berstel and L. Boasson, Context-free languages and pushdown automata. Chap. 3 in: Handbook of Formal Languages, Vol. 1, G. Rozenberg and A. Salomaa (eds.), Springer-Verlag, 1997. [Extensive reference list]
2. M. Droste, W. Kuich, H. Vogler (Eds.), Handbook of Weighted Automata, Springer, 2009.
3. S. Greibach, The hardest context-free language, SIAM J. on Computing 2 (1973), 304-310.
4. M. Harrison, Introduction to Formal Language Theory, Addison-Wesley, 1978.
5. L. Kallmeyer, Parsing Beyond Context-Free Grammars, Springer, 2010.
6. E. Shamir, Some inherently ambiguous context-free languages, Inf. and Control 18 (1971).
7. J. Berstel, Transductions and context-free languages, Teubner Verlag, 1979.
8. A. Salomaa, Formal Languages, Academic Press, 1973.
9. J. Sakarovitch, Pushdown automata with terminal languages, 421 in Publication RIMS, Kyoto University, 1981, pp. 15-29.
10. S. Eilenberg, Automata, Languages and Machines, Vol. A & B, Academic Press, 1973.
11. G. Rozenberg and A. Salomaa, The mathematical theory of L systems, Springer, 1976.
12. P. Flajolet, Analytic models and ambiguity of context-free languages, Theor. Comp. Sci. 49, 1987, 283-309.

9 Hindsight of Central CF Results
Chomsky-Schutzenberger Theorems and their impact:
- Each CFL L = h(Dyck ∩ R), where Dyck = {well-bracketed strings}, R = regular language, h = homomorphism
- A non-ambiguous L has an algebraic generating function
(Sh 1967): Each CFL maps into the non-deter. lifting of 1-sided Dyck, hence the lifting is a universal CFL, thus a "hardest CFL".
The map: a → φ(a) = […+…+], φ(a_1 a_2 … a_n) = φ(a_1) … φ(a_n) = […+…+] […+…+] … […+…+] (multinom product)
w ∈ L(G) iff opening the multinom product gives a term in DYCK.
(BGS 1960): The non-deter. lifting of CAT is also a universal (hardest) CFL.
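
The representation L = h(Dyck ∩ R) can be checked by hand on a very small case. The sketch below is only an illustration, not taken from the slides: it assumes the one-pair Dyck language over "(" and ")", picks the regular set R = (*)* and the homomorphism h with h("(") = a, h(")") = b, and enumerates short strings to show that the image is {a^n b^n}. The function names are hypothetical.

```python
import re
from itertools import product


def is_dyck(s, pairs=(("(", ")"), ("[", "]"))):
    """Return True if s is a well-bracketed string over the given bracket pairs."""
    closer_of = dict(pairs)
    closers = set(closer_of.values())
    stack = []  # closers we still expect, innermost last
    for ch in s:
        if ch in closer_of:
            stack.append(closer_of[ch])
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False
        else:
            return False  # symbol outside the bracket alphabet
    return not stack


def in_R(s):
    """Assumed regular language R = (*)*: all openers, then all closers."""
    return re.fullmatch(r"\(*\)*", s) is not None


def h(s):
    """Assumed homomorphism: '(' -> 'a', ')' -> 'b'."""
    return s.replace("(", "a").replace(")", "b")


def image_up_to(max_len):
    """Enumerate h(Dyck ∩ R) over all bracket strings of length <= max_len."""
    out = set()
    for n in range(max_len + 1):
        for tup in product("()", repeat=n):
            w = "".join(tup)
            if is_dyck(w) and in_R(w):
                out.add(h(w))
    return out


print(sorted(image_up_to(6), key=len))
```

Intersecting with R removes strings such as "()()", so only the nested strings survive and h turns them into a^n b^n.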

10 Hindsight (continued)
DYCK-j: All well-bracketed strings with j pairs.
CAT: Well-cancelled category strings: a ← a / b b, a / b ← a / b / c c, a ← a / b / c b / c
They are determ. CFLs; their non-det. liftings are "hardest CFLs".
Algebraic path: Gauss elim. -> Greib. NF -> SH. Thm. & Pushdown Automata.
Derivation path: triplets (p, A, q) [in BPS 1960] -> Pushdown Autom. -> Greib. Normal Form and SH. Thm.
Algorithm and Complexity: impact of the undecidability results (BPS 1960).
Membership and parsing: tabular dynamic prog. algorithms (CYK, Earley, …).
Time complexity reduced to multip. of Boolean matrices (L. Valiant, L. Lee).
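
As an illustration of the tabular dynamic programming approach to membership, here is a minimal sketch of the plain cubic-time CYK recognizer for a grammar in Chomsky normal form. It is not the Boolean matrix multiplication variant. The grammar for {a^n b^n, n >= 1} and all names below are assumptions chosen for the example.

```python
def cyk(word, unary, binary, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.

    unary:  list of (A, a) for rules A -> a (a is a terminal)
    binary: list of (A, B, C) for rules A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # this sketch ignores the empty word
    # table[i][l] = set of nonterminals deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {A for A, a in unary if a == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for A, B, C in binary:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# Assumed toy grammar for {a^n b^n : n >= 1}:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

for w in ["ab", "aabb", "aab", "abab"]:
    print(w, cyk(w, UNARY, BINARY))
```

The three nested loops over length, start position and split point give the n^3 factor that the matrix multiplication reduction improves on.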

11 Ambiguity - Complex Issues
- In (Linear) CFG, in Transductions, in Algeb. Equations
- Inherent ambiguity proofs using pumping in D-trees and by the generating function method (Ph. Flajolet)
- Effect of Transformations on ambiguity
- Effects on Parsing of product ambiguity degree
- Inherently 1 or infinite? Open question
- Eilenberg problem: decomposition of a bounded degree language into a union of degree 1 languages - open

12 Ambiguity in NLP: Ambiguity in natural languages can be resolved (or created) by cyclic rotation of the sentence. Bible, Book of Job, chapter 6, verse 14 (six Hebrew words). Translated: "a friend should extend # mercy to the sufferer $, even if he abandons God's fear." Anaphoric ambiguity: does the pronoun "he" refer to the sufferer or to the friend? A poetically beautiful answer: to both. Cyclically rotated sentences, starting at the symbols # and $, resolve the ambiguity one way or the other. Politically loaded example: the policeman shot # the boy $ with the gun.
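
The attachment ambiguity in the last example can be made concrete by counting derivation trees. The sketch below is an assumption-laden illustration: the toy CNF grammar, the lexicon and the function name are all hypothetical, and the # and $ rotation markers are left out. It counts the trees with a CYK-style table and does not model the cyclic rotation itself.

```python
from collections import defaultdict


def count_parses(words, lexicon, binary, start="S"):
    """Count derivation trees of a word sequence under a CNF grammar."""
    n = len(words)
    # count[i, j, A] = number of trees rooted at A that derive words[i:j]
    count = defaultdict(int)
    for i, w in enumerate(words):
        for A in lexicon.get(w, ()):
            count[i, i + 1, A] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for A, B, C in binary:
                    count[i, j, A] += count[i, k, B] * count[k, j, C]
    return count[0, n, start]


# Assumed toy grammar with both attachment rules for the prepositional phrase.
LEXICON = {
    "the": ["Det"],
    "policeman": ["N"], "boy": ["N"], "gun": ["N"],
    "shot": ["V"],
    "with": ["P"],
}
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "with the gun" modifies the shooting
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "with the gun" modifies the boy
    ("PP", "P", "NP"),
]

SENTENCE = "the policeman shot the boy with the gun".split()
print(count_parses(SENTENCE, LEXICON, BINARY))
```

Under this grammar the sentence has one tree per attachment of "with the gun", which is the two-way ambiguity the slide refers to.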

13 SRT: SPREAD-ROTATE TRANSFORMATION
Of a grammar G, its trees and derived strings.
SRT TREE: internal nodes labelled by products of grammars; root label = #G, leaves labels = H(i), linear grammars.
Thm (invariance claim): D-trees of #G are mapped 1-1 onto ∪ {D-trees of H(i)}, mod. cyclic rotations (of trees and derived strings).
Works perfectly for non-expansive CF grammars (quasi-rational), but also for mildly context-sensitive grammars with a CF skeleton (e.g. LIG grammars).
SRT: enhances parsing alg., property tests, and applications.
Cosmetics of the CFG model to enhance its NLP adequacy:
* Avoid expansive pumping B → BB
* BUT ADD GENER. POWER BY LOCAL STACKS (AS IN INDEXED GRAMMARS)

14 Top Trunk Rotation of MN to (M*N^)
[Figure: trees M and N with an EXIT node N^. Derived strings: m x1 x2 … n^ … y2 y1 is cyclically rotated to … y2 y1 m x1 x2 … n^. For trees: M* is obtained by a 180° cyclic rotation.]

15 SRT for grammars:
N grammar (top trunk) | M* grammar
B → B'C | B' → CB
B → DB' | B' → BD
B → B^, B^ → α | B^ = root(M)
All productions not involving [B] carry over from N to M*; those of M are unchanged.
Note: since M may contain symbols of [B], duplicate symbols of [B] are needed only for the new top trunk of M*.
The TTR rotation is invertible, one-one onto for the derivation trees, preserving weights and ambiguity degree in the 'cyclic rotated' sense.

16 Example (from [Sh., 1971]) (M)(N) = (u $J u ) (v J$ v), u, v ∈ {0,1}*, J u = reversal of u. It has unbounded "direct (product) ambiguity", which increases the time of the CYK algorithm to n^3. In one TTR step (see below) MN is rotated to (M*)(N^) = (v u $ J u v ) (J $), which has a linear grammar with 3 pump classes. All (product ambiguity) trees are rotated to (union ambiguity) trees for M*N^. Each derived terminal string is CYC-rotated as well!

17 MILD Context-Sensitive Models & SRT
Many models proposed, incl. 4 equivalent ones: Linear-Indexed [LIG], Tree-Adjoining [TAG], …
Should satisfy some formal requirements: proper extension of CFG, poly-time parsing algor., …
We define NE-LIG as follows:
- Has an NE-CFG skeleton with aux. symbols A, B, …
- Each pump-class [B] maintains a stack (pushdown) index; the stack is empty at entry & exit of several consecutive pump blocks.
- THUS, it can, with skeleton symbols as "states", simulate any PDM, any CFG.
The form of production rules is: B[index] → C B'[index'], Bˆ[ ] → D[ ] E[ ] (Push, Pop)

18 Glossary
- CFG/L: Context Free Grammars/Language
- LIG: Linear Indexed Grammar
- TAG: Tree Adjoining Grammar
- NLP: Natural Language Processing
- CYK: Cocke, Younger, Kasami
- CNF: Chomsky Normal Form
- GNF: Greibach Normal Form
- SRT: Spread Rotate Tree
- D-Tree: Derivation Tree
- EPOS: Epoch Semi-Order
- TTR: Top Trunk Rotation
- DP: Dynamic Programming
- NE: Non Expansive
- POS: Parts of Speech
- PDM: Pushdown Machine
- NT: Non terminals (symbols)

