
1 Starting With Complex Primitives Pays Off: Complicate Locally, Simplify Globally
ARAVIND K. JOSHI
Department of Computer and Information Science and Institute for Research in Cognitive Science

2 Outline
– Introduction
– Towards CLSG
– Syntactic description
– Semantic composition
– Statistical processing
– Psycholinguistic properties
– Applications to other domains: discourse structure; folded structure of biomolecular sequences
– Summary

3 Introduction
Formal systems to specify a grammar formalism: start with primitives (basic primitive structures or building blocks) that are as simple as possible, and then introduce various operations for constructing more complex structures. Such systems are string rewriting systems, requiring string adjacency of function and argument. Alternatively,

4 Introduction: CLSG
– Start with complex (more complicated) primitives which directly capture some crucial linguistic properties, and then introduce some general operations for composing them -- Complicate Locally, Simplify Globally (CLSG)
– CLSG systems are structure rewriting systems, requiring structure adjacency of function and argument
– The CLSG approach is characterized by localizing almost all complexity in the set of primitives -- a key property

5 Introduction: CLSG -- localization of complexity
– Specification of the set of complex primitives becomes the main task of a linguistic theory
– CLSG pushes non-local dependencies to become local, i.e., they arise in the primitive structures to start with

6 CLSG
The CLSG approach has led to several new insights into:
– Syntactic description
– Semantic composition
– Language generation
– Statistical processing
– Psycholinguistic properties
– Discourse structure

7 Context-free Grammars
The domain of locality is the one-level tree -- the primitive building block.
CFG, G:
S → NP VP
VP → V NP
VP → VP ADV
NP → DET N
DET → the
N → man | car
V → likes
ADV → passionately
[Slide shows derived trees built from these one-level trees, over the words "the", "man", "car", "likes", "passionately".]
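The one-level domain of locality can be sketched in code. This is a minimal illustration, not from the slides; the tuple encoding and the single lexical choice per category are my own assumptions.

```python
# A small sketch of the CFG above, assuming (label, children) tuples for
# trees. Each rule is a one-level tree: a parent with its immediate children.
RULES = {
    "S":  ("NP", "VP"),   # one-level tree: S over NP VP
    "VP": ("V", "NP"),
    "NP": ("DET", "N"),
}
LEXICON = {"DET": "the", "N": "man", "V": "likes"}  # one choice per category

def derive(symbol):
    """Expand a symbol top-down into a (label, children) derived tree."""
    if symbol in LEXICON:
        return (symbol, [(LEXICON[symbol], [])])
    return (symbol, [derive(s) for s in RULES[symbol]])

def yield_of(tree):
    """The terminal string (left-to-right leaves) of a derived tree."""
    label, children = tree
    return [label] if not children else sum((yield_of(c) for c in children), [])

# Note: the verb and both of its arguments never occur together in any
# single one-level tree -- the point the next slide makes.
```

Here `yield_of(derive("S"))` produces the string "the man likes the man" from purely local one-level expansions.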

8 Context-free Grammars: Domain of Locality and Lexicalization
– The arguments of the predicate are not in the same local domain
– They can be brought together in the same domain by introducing a rule S → NP V NP; however, the structure is then lost
– Further, the local domains of a CFG are not necessarily lexicalized

9 Towards CLSG: Lexicalization
– Lexical item → one or more elementary structures (trees, directed acyclic graphs), which are syntactically and semantically encapsulated
– Universal combining operations
– Grammar → Lexicon

10 Lexicalized Grammars
Context-free grammar (CFG), G:
S → NP VP (non-lexical)
VP → V NP (non-lexical)
VP → VP ADV (non-lexical)
NP → Harry (lexical)
NP → peanuts (lexical)
V → likes (lexical)
ADV → passionately (lexical)
[Slide shows the derived tree for "Harry likes peanuts passionately".]

11 Weak Lexicalization
Greibach Normal Form (GNF): CFG rules are of the form
A → a B1 B2 ... Bn
A → a
This lexicalization gives the same set of strings but not the same set of trees, i.e., not the same set of structural descriptions. Hence, it is a weak lexicalization.
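The weak-lexicalization point can be checked mechanically. A sketch, not from the slides: the brute-force generator and the particular grammars (S → SS | a and its right-linear GNF counterpart S → aS | a) are my own illustration.

```python
# Brute-force the terminal strings of a small CFG. Sentential forms are
# tuples of symbols; a symbol is a nonterminal iff it is a key in `rules`.
def language(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    seen, frontier, strings = set(), {(start,)}, set()
    while frontier:
        form = frontier.pop()
        i = next((j for j, s in enumerate(form) if s in rules), None)
        if i is None:                      # all terminals: a derived string
            strings.add("".join(form))
            continue
        for rhs in rules[form[i]]:         # expand the leftmost nonterminal
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                frontier.add(new)
    return strings

orig_g = {"S": [("S", "S"), ("a",)]}   # S -> S S | a  (trees branch both ways)
gnf_g  = {"S": [("a", "S"), ("a",)]}   # GNF: S -> a S | a  (right-linear trees)

# Same string set {"a", "aa", "aaa", ...}, but the GNF derivations are
# right-linear, so the structural descriptions differ: weak lexicalization.
```

The assertion of equal string sets is exactly what "weak" means here; the tree shapes the two grammars assign are not comparable.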

12 Strong Lexicalization
Same set of strings and same set of trees, or structural descriptions.
Tree substitution grammars (TSG):
– Increased domain of locality
– Substitution as the only combining operation

13 13 :: X  X  X  Substitution

14 Strong Lexicalization
Tree substitution grammar for G. CFG, G:
S → NP VP
VP → V NP
NP → Harry
NP → peanuts
V → likes
TSG, G':
α1: S(NP↓, VP(V(likes), NP↓))
α2: NP(Harry)
α3: NP(peanuts)
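Substitution itself is easy to sketch. This is my own minimal encoding, not from the slides: trees are (label, children) tuples, and a "!" suffix marks an open substitution node.

```python
SUBST = "!"   # marks an open substitution node, e.g. "NP!"

def substitute(tree, arg):
    """Substitute initial tree `arg` at the leftmost open substitution node
    matching arg's root category; returns (new_tree, done_flag)."""
    label, children = tree
    if label == arg[0] + SUBST and not children:
        return arg, True
    new_children, done = [], False
    for c in children:
        if not done:
            c, done = substitute(c, arg)
        new_children.append(c)
    return (label, new_children), done

def yield_of(tree):
    """The terminal string (left-to-right leaves) of a tree."""
    label, children = tree
    return [label] if not children else sum((yield_of(c) for c in children), [])

# alpha1, anchored on "likes", with two open NP substitution nodes:
alpha1  = ("S", [("NP!", []),
                 ("VP", [("V", [("likes", [])]), ("NP!", [])])])
harry   = ("NP", [("Harry", [])])
peanuts = ("NP", [("peanuts", [])])

t, _ = substitute(alpha1, harry)    # fills the subject NP
t, _ = substitute(t, peanuts)       # fills the object NP
```

After both substitutions, `yield_of(t)` is the string "Harry likes peanuts", with the same tree a CFG derivation would assign.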

15 Insufficiency of TSG
Formal insufficiency of TSG. CFG, G:
S → SS (non-lexical)
S → a (lexical)
TSG, G':
α1, α2: two-level trees for S → SS, each with one daughter expanded to a and the other an S substitution node
α3: S(a)

16 Insufficiency of TSG
[Slide shows a tree γ of G whose branches grow on both sides of the root.]
G' can generate all strings of G but not all trees of G: γ grows on both sides of the root, and such trees cannot be produced by substitution alone. Hence CFGs cannot, in general, be strongly lexicalized by TSGs, i.e., by substitution only.

17 Adjoining
An auxiliary tree β is rooted in X and has a foot node labeled X*. Tree β is adjoined to tree α at a node labeled X in α: the subtree of α at that node is excised, β is inserted in its place, and the excised subtree is attached at the foot node X*.
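Adjoining can be sketched in the same style of encoding. This is my own illustration, not from the slides: trees are (label, children) tuples, a "*" suffix marks the foot node, and a node is addressed by a path of child indices.

```python
FOOT = "*"   # marks the foot node of an auxiliary tree, e.g. "VP*"

def plug_foot(aux, subtree):
    """Replace the foot node of auxiliary tree `aux` by `subtree`."""
    label, children = aux
    if label.endswith(FOOT) and not children:
        return subtree
    return (label, [plug_foot(c, subtree) for c in children])

def adjoin(tree, aux, path):
    """Adjoin `aux` at the node addressed by `path` (child indices): the
    subtree there is excised and re-attached at aux's foot node."""
    if not path:
        return plug_foot(aux, tree)
    label, children = tree
    i, rest = path[0], path[1:]
    return (label, children[:i] + [adjoin(children[i], aux, rest)] + children[i + 1:])

def yield_of(tree):
    """The terminal string (left-to-right leaves) of a tree."""
    label, children = tree
    return [label] if not children else sum((yield_of(c) for c in children), [])

# "Harry likes peanuts" with an auxiliary tree for "passionately"
# adjoined at the VP node (child index 1 of the root):
derived = ("S", [("NP", [("Harry", [])]),
                 ("VP", [("V", [("likes", [])]), ("NP", [("peanuts", [])])])])
beta = ("VP", [("VP*", []), ("ADV", [("passionately", [])])])
t = adjoin(derived, beta, (1,))
```

The old VP subtree ends up under the foot node, so `yield_of(t)` is "Harry likes peanuts passionately".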

18 With Adjoining
G: S → SS, S → a
G': α3: S(a), plus auxiliary trees β1 and β2 for S → SS, each anchored on a and with foot node S*.
Adjoining β2 to α3 at the root S node, and then adjoining β1 to the S node of the derived tree, we obtain γ.
CFGs can be lexicalized by LTAGs. Adjoining is crucial for this lexicalization -- adjoining arises out of lexicalization.

19 Lexicalized LTAG
– Finite set of elementary trees anchored on lexical items -- extended projections of lexical anchors, encapsulating syntactic and semantic dependencies
– Elementary trees: initial and auxiliary
– Operations: substitution and adjoining
– Derivation:
  – Derivation tree: how the elementary trees are put together
  – Derived tree

20 Localization of Dependencies
– agreement: person, number, gender
– subcategorization: sleeps: null; eats: NP; gives: NP NP; thinks: S
– filler-gap: who did John ask Bill to invite e
– word order: within and across clauses, as in scrambling and clitic movement
– function-argument: all arguments of the lexical anchor are localized

21 Localization of Dependencies
– word clusters (flexible idioms), with their non-compositional aspects: take a walk, give a cold shoulder to
– word co-occurrences: lexical semantic aspects
– statistical dependencies among heads
– anaphoric dependencies

22 LTAG: Examples
[Slide shows two elementary trees for likes: the transitive tree, with NP substitution nodes for subject and object, and the object-extraction tree, in which the object position is an empty element e linked to a fronted wh-NP.]
Some other trees for likes: subject extraction, topicalization, subject relative, object relative, passive, etc.

23 LTAG: A Derivation
[Slide shows the elementary trees for who, does, Bill, think, Harry, and the object-extraction tree for likes.]

24 LTAG: A Derivation
who does Bill think Harry likes
[Slide shows the same elementary trees composed by substitution and adjoining to derive the sentence.]

25 LTAG: Derived Tree
who does Bill think Harry likes
[Slide shows the derived tree for the sentence.]

26 LTAG: Derivation Tree
who does Bill think Harry likes
[Slide shows the derivation tree: likes at the root, with who and Harry attached by substitution and think by adjoining; does and Bill attach to think.]
– Compositional semantics is defined on this derivation structure
– Related to dependency diagrams

27 Topology of Elementary Trees: Nested Dependencies
The topology of the elementary trees α and β determines the nature of the dependencies described by the TAG grammar G: here the trees yield nested dependencies of the form a a a ... b b b.
[Slide shows the elementary trees and a derived tree for nested a/b dependencies.]

28 Topology of Elementary Trees: Crossed Dependencies
The topology of elementary trees α and β determines the kinds of dependencies that can be characterized: here b is one level below a and to the right of the spine, yielding crossed dependencies.

29 Topology of Elementary Trees: Crossed Dependencies
[Slide shows the elementary trees, a derived tree, and the linear structure of the resulting crossed a/b dependencies.]

30 Examples: Nested Dependencies
Center embedding of relative clauses in English:
The rat(1) the cat(2) chased(2) ate(1) the cheese
Center embedding of complement clauses in German:
Hans(1) Peter(2) Marie(3) schwimmen(3) lassen(2) sah(1)
(Hans saw Peter make Marie swim)

31 Examples: Crossed Dependencies
Cross-serial complement clauses in Dutch:
Jan(1) Piet(2) Marie(3) zag(1) laten(2) zwemmen(3)
(Jan saw Piet make Marie swim)
It is possible to obtain a wide range of complex dependencies, i.e., complex combinations of nested and crossed dependencies. Such patterns arise in word order phenomena such as scrambling and clitic climbing, and also from scope ambiguities.
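The two index patterns above can be written down schematically (a trivial sketch of the dependency orders, with invented NP/V token names):

```python
def nested(n):
    """NP_1 ... NP_n V_n ... V_1 -- center embedding (English, German)."""
    return [f"NP{i}" for i in range(1, n + 1)] + [f"V{i}" for i in range(n, 0, -1)]

def crossed(n):
    """NP_1 ... NP_n V_1 ... V_n -- cross-serial (Dutch)."""
    return [f"NP{i}" for i in range(1, n + 1)] + [f"V{i}" for i in range(1, n + 1)]

# German: Hans(1) Peter(2) Marie(3) schwimmen(3) lassen(2) sah(1) -> nested(3)
# Dutch:  Jan(1)  Piet(2)  Marie(3) zag(1)  laten(2)  zwemmen(3)  -> crossed(3)
```

Nested dependencies match a pushdown (stack) discipline; the crossed pattern does not, which is why the Dutch data push beyond context-free grammars.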

32 LTAG: Some Important Properties
– Factoring recursion from the domain of dependencies (FRD) and extended domain of locality (EDL)
– All interesting properties of LTAG follow from FRD and EDL: mathematical, linguistic, and processing
– LTAGs belong to the class of so-called mildly context-sensitive grammars
– Automaton equivalent of TAG: the embedded pushdown automaton (EPDA)

33 Processing of Crossed and Nested Dependencies
Crossed dependencies (CD), Dutch: Jan(1) Piet(2) Marie(3) zag(1) laten(2) zwemmen(3) (Jan saw Piet make Marie swim)
Nested dependencies (ND), German: Hans(1) Peter(2) Marie(3) schwimmen(3) lassen(2) sah(1) (Hans saw Peter make Marie swim)
CDs are easier to process (by about one half) than NDs (Bach, Brown, and Marslen-Wilson 1986). With the principle of partial interpretation (PPI), the EPDA model correctly predicts the BBM results (Joshi 1990).

34 Some Important Properties of LTAG
– Extended domain of locality (EDL):
  – localizing dependencies
  – the set of elementary trees is the domain for specifying linguistic constraints
– Factoring recursion from the domain of dependencies (FRD)
– All interesting properties of LTAG follow from EDL and FRD: mathematical, linguistic, and processing
– LTAG belongs to the class of mildly context-sensitive grammars

35 A Different Perspective on LTAG
– Treat the elementary trees associated with a lexical item as super parts of speech (super-POS, or supertags)
– Local statistical techniques have been remarkably successful in disambiguating standard POS
– Applying these techniques to disambiguate supertags amounts to "almost parsing"

36 Supertag Disambiguation -- Supertagging
Given a corpus parsed by an LTAG grammar:
– we have statistics of supertags: unigram, bigram, trigram, etc.
– these statistics combine the lexical statistics as well as the statistics of the constructions in which the lexical items appear
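As an illustration of the simplest such statistic, here is a unigram baseline sketch. The toy corpus and the supertag names (alpha_NP, alpha_nx0Vnx1, beta_vxPnx) are invented for illustration, not taken from the slides.

```python
from collections import Counter, defaultdict

def train(tagged_corpus):
    """Count, for each word, how often each supertag occurs with it."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, supertag in sentence:
            counts[word][supertag] += 1
    return counts

def supertag(counts, sentence):
    """Unigram baseline: pick each word's most frequent supertag."""
    return [counts[w].most_common(1)[0][0] for w in sentence]

# Invented toy corpus: (word, supertag) pairs per sentence.
corpus = [
    [("price", "alpha_NP"), ("includes", "alpha_nx0Vnx1")],
    [("price", "alpha_NP"), ("includes", "alpha_nx0Vnx1")],
    [("includes", "beta_vxPnx")],
]
model = train(corpus)
```

Bigram and trigram models refine this by conditioning each word's supertag on the supertags chosen for its neighbors.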

37 Supertagging
the purchase price includes two ancillary companies
[Slide shows, below each word, the set of candidate supertags for that word.]
On average, a lexical item has about 8 to 10 supertags.

38 Supertagging
the purchase price includes two ancillary companies
[Slide highlights, in green, one supertag per word.]
– Select the correct supertag for each word -- shown in green
– The correct supertag for a word is the supertag that corresponds to that word in the correct parse of the sentence

39 Supertagging -- Performance
Performance of a trigram supertagger on the WSJ corpus (Srinivas 1997; Chen 2002):

Training corpus   Test corpus   Words correctly supertagged   % correct
Baseline          47,000        35,391                        75.3%
1 million words   47,000        43,334                        92.2%
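The percentages in the table follow directly from the counts (a quick arithmetic check, nothing more):

```python
def accuracy(correct, total):
    """Percent of test words assigned their correct supertag."""
    return round(100 * correct / total, 1)

baseline = accuracy(35_391, 47_000)   # baseline row: 75.3
trigram  = accuracy(43_334, 47_000)   # 1-million-word training: 92.2
```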

40 Abstract Character of Supertagging
Complex (richer) descriptions of primitives (anchors):
– contrary to the standard mathematical convention, in which descriptions of primitives are simple and complex descriptions are built from simple ones
– associate with each primitive all the information associated with it

41 Complex Descriptions of Primitives
Making the descriptions of primitives more complex:
– increases the local ambiguity, i.e., there are more descriptions for each primitive
– however, these richer descriptions of primitives locally constrain each other
– analogy to a jigsaw puzzle: the richer the description of each piece, the better

42 Complex Descriptions of Primitives
Making the descriptions of primitives more complex:
– allows statistics to be computed over these complex descriptions
– these statistics are more meaningful
– local statistical computations over these complex descriptions lead to robust and efficient processing

43 Flexible Composition: Adjoining as Wrapping
[Slide diagram: a tree α is split at a node X into two components -- the supertree of α at X and the subtree of α at X.]

44 Flexible Composition: Adjoining as Wrapping
[Slide diagram: the two components of α -- the supertree and the subtree at X -- are wrapped around β, i.e., α is wrapped around β.]

45 Flexible Composition: Wrapping as Substitutions and Adjunctions
[Slide shows the likes and think trees composed by substitution and adjoining.]
– We can also view this composition as α wrapped around β
– Non-directional composition

46 Adjoining as Wrapping: Wrapping as Substitutions and Adjunctions
[Slide shows β split into two components β1 and β2.]
– β1 and β2 are the two components of β
– β1 is attached (adjoined) to the root node S of α
– β2 is attached (substituted) at the foot node S of α

47 Multi-component LTAG (MC-LTAG)
– The components are used together in one composition step, with the individual components composed by either substitution or adjoining
– The representation can be used for both predicate-argument relationships and scope information
– The two pieces of information are together before the single composition step
– However, after the composition there may be intervening material between the components

48 Tree-Local Multi-component LTAG (MC-LTAG)
– How can the components of an MC-LTAG compose while preserving the locality of LTAG?
– Tree-local MC-LTAG: components of a set compose only with an elementary tree or an elementary component
– Flexible composition
– Tree-local MC-LTAGs are weakly equivalent to LTAGs
– However, tree-local MC-LTAGs provide structural descriptions not obtainable by LTAGs: increased strong generative power

49 Scope Ambiguities: Example
(every student hates some course)
[Slide shows multi-component trees for every and some -- each with a scope component rooted in S and an NP component -- together with the trees for hates, student, and course.]

50 Derivation with Scope Information: Example
(every student hates some course)
[Slide shows the same elementary trees in the course of composition.]

51 Derivation Tree with Scope Information: Example
(every student hates some course)
[Slide shows the derivation tree rooted in α(hates), with the scope components β(E) of every and β(S) of some adjoined at its root and α(student), α(course) substituted.]
– β(E) and β(S) are both adjoined at the root of α(hates)
– They can be adjoined in either order; β(E) will outscope β(S) if β(E) is adjoined before β(S)
– Scope information is thus represented in the LTAG system itself

52 Competence/Performance Distinction: A New Twist
For a property P of language, how does one decide whether P is a competence property or a performance property? The answer is not given a priori; it depends on the formal devices (grammars and corresponding machines) available for describing language.

53 Competence/Performance Distinction: A New Twist
– With MC-TAG and flexible composition, all word order patterns up to two levels of embedding can be described with the correct structural descriptions assigned, i.e., with correct semantics
– Examples: center embedding of complement clauses, clitic movement, scope ambiguities, etc.
– Beyond two levels of embedding, although all word order patterns can still be described, there is no guarantee that correct semantics can be assigned to all strings
– No corresponding result is known so far for center embedding of relative clauses, as in English

54 Summary
– Complex primitive structures (building blocks)
– CLSG: Complicate Locally, Simplify Globally
– CLSG makes non-local dependencies local, i.e., they are encapsulated in the primitive building blocks
– New insights into:
  – Syntactic description
  – Semantic composition
  – Statistical processing
  – Psycholinguistic properties
  – Applications to other domains

