“Mildly” Deterministic Language Processing
Jerry Ball
Human Effectiveness Directorate, Air Force Research Laboratory
November 2007
2 Constraints on Human Language Processing
Visual World Paradigm (Tanenhaus et al. 1995)
– Subjects are presented with a visual scene
– Subjects listen to auditory linguistic input describing the scene
Immediate determination of meaning
– Subjects look immediately at the referents of linguistic expressions, sometimes before the end of the expression
Incremental processing
Interactive processing (Trueswell et al. 1999)
– Ambiguous expressions are processed consistently with the scene: “the green…”, “put the arrow on the paper into the box”
3 Constraints on Human Language Processing
According to Crocker (1999), there are three basic mechanisms for dealing with ambiguity in natural language:
– Serial processing with backtracking or reanalysis
– Deterministic processing with lookahead (Marcus 1980)
– Parallel processing with alternative analyses carried forward in parallel (Gibson 1991; MacDonald, Pearlmutter & Seidenberg 1994; Trueswell & Tanenhaus 1994)
According to Lewis (2000), “…existing evidence is compatible only with probabilistic serial-reanalysis models, or ranked parallel models augmented with a reanalysis component.”
According to Gibson & Pearlmutter (2000), “noncompetitive ranked parallel models” are most consistent with the empirical evidence.
4 Constraints on Human Language Processing
Serial and deterministic, with reanalysis for pathological input
– Empirical evidence that we don’t carry forward all representations in parallel
– Garden path sentences: “The horse raced past the barn fell” (Bever 1970)
– Empirical evidence that we don’t retract previously built representations (Christianson et al. 2001): “While Mary dressed the baby sat up on the bed”
– In a post-test, a majority of subjects answered yes to the question “Did Mary dress the baby?”
– Processing doesn’t slow down with increasing length of non-pathological input
– We are typically aware of only a single interpretation
5 Constraints on Human Language Processing
Parallel and probabilistic, with reanalysis for pathological input
– Empirical evidence that we may carry forward multiple representations in parallel
– Garden path effects can be eliminated with sufficient context
– Empirical evidence that dispreferred representations can affect processing time (Gibson & Pearlmutter 2000)
It’s extremely difficult to empirically falsify either
– The observed effect could be caused by a parallel slowdown or by an occasional switch between serial alternatives
Don’t have all the answers, but maybe it’s both!
– A parallel, probabilistic substrate may make a “mildly” deterministic serial processing mechanism possible!
6 Why Should NLP Researchers Care?
No NLP system to date has the full capabilities of the Human Language Processor (HLP)
Constraints on the HLP provide insight into how to build NLP systems
– They focus NLP research in directions that are likely to be productive
– They narrow the search space for solutions
Adherence to well-established constraints on the HLP might actually facilitate development of NLP systems
We don’t know what is given up when mechanisms that are not cognitively plausible are adopted
7 Cognitively Implausible Mechanism
Serial processing with algorithmic backtracking
– Algorithmically simple, but computationally intractable for NLP, which is highly ambiguous
– The context that led to a dead end is retracted on backtracking: why give up the context? How do we know it’s a dead end?
Practical Consequences
– No hope for on-line, real-time processing in a large-coverage NLP system
– No hope for integration with a speech recognition system
– Performance degrades with length of input
– Can’t easily handle degraded or ungrammatical input
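The intractability of algorithmic backtracking can be illustrated with a toy search. This is a minimal sketch, not any particular parser: it assumes n words that each have two readings and a grammar that rejects an analysis only once the whole input has been seen, and `backtrack_parse` and its arguments are hypothetical names. In the worst case the search tries all 2^n complete analyses, throwing away the context of each dead end as it backs up.

```python
def backtrack_parse(choices, is_viable):
    """Chronological backtracking over per-word readings (toy sketch).

    choices:   one list of candidate readings per word.
    is_viable: predicate over a complete analysis; in this toy a dead end
               is only detected once the whole input has been consumed.
    Returns (first viable analysis or None, number of complete analyses tried).
    """
    partial = []  # the current serial analysis, built left to right
    tried = 0

    def search():
        nonlocal tried
        if len(partial) == len(choices):
            tried += 1
            return list(partial) if is_viable(partial) else None
        for reading in choices[len(partial)]:
            partial.append(reading)   # commit to a reading
            result = search()
            partial.pop()             # retract it (and the context it built) on failure
            if result is not None:
                return result
        return None

    analysis = search()
    return analysis, tried


# Worst case: only the last combination tried is viable.
for n in (5, 10, 15):
    analysis, tried = backtrack_parse([["A", "B"]] * n, lambda a: all(r == "B" for r in a))
    print(n, tried)  # 5 32, then 10 1024, then 15 32768
```

Each additional binary ambiguity doubles the worst-case work, which is the sense in which the approach is intractable for highly ambiguous input.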
8 Cognitively Implausible Mechanism
Multiple-pass or multi-stage parsing
– First pass assigns a part of speech to each word, but it can’t use the full context, and its errors get propagated
– Second pass builds structure, typically using only the part of speech of each word
– Third pass determines meaning
Practical Consequences
– Difficult to do on-line processing in real time
– Can’t easily integrate with speech recognition
– Performance degrades with length of input
– Limited context available to handle ambiguity at each stage
9 Outrageously Implausible Mechanism!
Parsing input from right to left (Microsoft NLP system)
– May have engineering advantages, but it presumes a staged approach to NLP and completely ignores cognitive plausibility
Practical Consequences
– Impossible to do on-line processing in real time: the parser must wait for the end of the input
– Nearly impossible to integrate with speech recognition
10 Cognitively Plausible Mechanism?
Deterministic processing with lookahead
– Many ambiguities are resolved by looking ahead a few words, but we don’t know how far to look ahead
– Cognitive plausibility is improved by limiting the amount of lookahead: 3-constituent lookahead (Marcus 1980); 1-word lookahead (Henderson 2004)
Practical Consequences
– Difficult to use with eager algorithms, for which there is good empirical evidence (immediate determination of meaning)
– The smaller the lookahead, the less deterministic the processing
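A toy version of the lookahead trade-off can be run on Bever’s garden path sentence. This is a hypothetical sketch, not Marcus’s or Henderson’s parser: the verb set, the function `resolve_reading`, and the rule that a later verb forces a reduced-relative reading are all assumptions, and lookahead is counted in words rather than constituents.

```python
# Hypothetical toy lexicon: the only words this sketch treats as verbs.
VERBS = {"raced", "fell"}


def resolve_reading(words, index, k):
    """Commit to a reading of the verb at `index` using at most k words of lookahead.

    Toy rule (an assumption): if another verb appears within the window,
    the first verb must be a reduced relative; otherwise commit to the
    preferred main-verb reading.
    """
    window = words[index + 1 : index + 1 + k]
    if any(w in VERBS for w in window):
        return "reduced relative"
    return "main verb"


sentence = "the horse raced past the barn fell".split()
for k in range(1, 5):
    print(k, resolve_reading(sentence, sentence.index("raced"), k))
# With 1 to 3 words of lookahead the toy commits to "main verb" (a garden path);
# with 4 words the window reaches "fell".
```

Any fixed k can be defeated by lengthening the intervening phrase (for example “past the old red barn”), which is the “don’t know how far to look ahead” problem in miniature.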
11 Cognitively Plausible Mechanism
Parallel processing with multiple analyses carried forward
– “Full parallelism – where every analysis is pursued – is not psychologically possible” (Crocker 1999)
– Cognitive plausibility is improved by limiting the number of analyses carried forward and ranking the alternatives (bounded ranked parallelism), and by not having analyses compete
Practical Consequences
– The longer the input, the less likely the correct representation is to remain in the parallel spotlight, which necessitates a reanalysis mechanism
– Impractical if multiple representations must be built at each choice point, as opposed to just being selected
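Bounded ranked parallelism behaves much like beam search, and the sketch below shows how the correct analysis can drop out of the spotlight. It is a minimal illustration under assumed numbers: the per-word label probabilities in `CANDIDATES` are hypothetical and treated as independent, and the `grammatical` check (exactly one main verb) is a toy constraint applied only at the end. With a beam of width 1 the dispreferred reduced-relative reading of “raced” is pruned, so no grammatical analysis survives and some reanalysis mechanism would be needed; with width 2 it survives.

```python
import heapq

# Hypothetical per-word label probabilities for "the horse raced past the barn fell".
CANDIDATES = [
    {"DET": 1.0},                  # the
    {"N": 1.0},                    # horse
    {"V-main": 0.9, "V-rr": 0.1},  # raced: main verb vs. reduced relative
    {"P": 1.0},                    # past
    {"DET": 1.0},                  # the
    {"N": 1.0},                    # barn
    {"V-main": 1.0},               # fell
]


def beam_parse(candidates, beam_width):
    """Carry forward at most `beam_width` ranked partial analyses, best first."""
    beam = [(1.0, ())]
    for options in candidates:
        extended = [
            (score * p, labels + (label,))
            for score, labels in beam
            for label, p in options.items()
        ]
        # Rank by probability and keep only the top analyses (the "spotlight").
        beam = heapq.nlargest(beam_width, extended, key=lambda item: item[0])
    return beam


def grammatical(labels):
    """Toy constraint (an assumption): a clause has exactly one main verb."""
    return labels.count("V-main") == 1


for width in (1, 2):
    survivors = [labels for _, labels in beam_parse(CANDIDATES, width) if grammatical(labels)]
    print(width, survivors)
```

The longer the input, the more choice points there are at which a low-ranked but correct analysis can be pruned in this way.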
12 Some Larger-Scale Models that Take Cognitive Plausibility Seriously
– Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language
– Shen, L. & Joshi, A. (2005). Incremental LTAG Parsing
– Kim, A., Srinivas, B. & Trueswell, J. (2002). A computational model of the grammatical aspects of word recognition as supertagging
– Brants, T. & Crocker, M. (2000). Probabilistic Parsing and Psychological Plausibility
– Vosse, T. & Kempen, G. (2000). Syntactic structure assembly in human parsing
– Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6
– Lewis, R. (1993). NL-SOAR
13 LTAG & Supertagging
Srinivas & Joshi (1999). Supertagging: An approach to almost parsing
Linguistic Theory: Lexicalized Tree Adjoining Grammar (LTAG)
– Complex trees associated with lexical items
Multi-Pass Processing Mechanism
– First Pass: a probabilistic mechanism selects the most coherent set of trees, aka “supertagging”
– Probabilities learned using machine learning techniques
– Left and right context used
[Figure: in “the old man”, the supertag for “old” is the auxiliary tree [N [adj old] N*]]
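The first-pass selection of the most coherent set of trees can be sketched as sequence decoding. This is a minimal illustration only: it assumes a bigram model over supertags decoded with the Viterbi algorithm, and the supertag names and all probabilities are invented for the example rather than taken from Srinivas & Joshi (1999). Here `adj_N*` stands for the auxiliary tree [N [adj old] N*], `noun_N` for a noun-headed initial tree, and `verb_tv` for a transitive-verb tree.

```python
# Hypothetical lexical probabilities P(supertag | word).
LEXICAL = {
    "the": {"det_NP": 1.0},
    "old": {"adj_N*": 0.7, "noun_N": 0.3},
    "man": {"noun_N": 0.8, "verb_tv": 0.2},
}
# Hypothetical transition probabilities P(supertag | previous supertag).
TRANSITION = {
    ("<s>", "det_NP"): 1.0,
    ("det_NP", "adj_N*"): 0.4,
    ("det_NP", "noun_N"): 0.5,
    ("adj_N*", "noun_N"): 0.9,
    ("noun_N", "noun_N"): 0.1,
    ("noun_N", "verb_tv"): 0.6,
}
FLOOR = 1e-3  # crude smoothing for unseen transitions


def viterbi_supertags(words):
    """Return the most probable supertag sequence under the toy bigram model."""
    # best[tag] = (probability, path) of the best path ending in `tag`
    best = {"<s>": (1.0, [])}
    for word in words:
        new_best = {}
        for tag, p_lex in LEXICAL[word].items():
            new_best[tag] = max(
                (
                    (score * TRANSITION.get((prev, tag), FLOOR) * p_lex, path + [tag])
                    for prev, (score, path) in best.items()
                ),
                key=lambda candidate: candidate[0],
            )
        best = new_best
    return max(best.values(), key=lambda candidate: candidate[0])[1]


print(viterbi_supertags(["the", "old", "man"]))  # ['det_NP', 'adj_N*', 'noun_N']
```

Because the whole sequence is scored, the supertag chosen for “old” depends on both its left context (“the”) and its right context (“man”).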
14 LTAG & Supertagging
– Second Pass: a parser integrates the selected trees using substitution and adjunction operations
[Figure: [NP [det the] N] + [N [adj old] N*] → adjunction → [NP [det the] [N [adj old] N]]; then + [N man] → substitution → [NP [det the] [N [adj old] [N man]]]]
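The two second-pass operations on this slide can be sketched on tiny trees. This is a hypothetical, simplified implementation: the `Node` class and the `find`, `substitute` and `adjoin` helpers are illustrative names, each operation simply targets the first matching node in pre-order, and, following the slide’s derivation, adjunction is allowed at the determiner tree’s N node before that node has been filled.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False  # open substitution site (N↓ in LTAG notation)
    foot: bool = False   # foot node of an auxiliary tree (N*)

    def __str__(self):
        if not self.children:
            return self.label
        return "[" + self.label + " " + " ".join(str(c) for c in self.children) + "]"


def find(tree, pred):
    """Return the first node in pre-order that satisfies `pred`, or None."""
    if pred(tree):
        return tree
    for child in tree.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None


def substitute(tree, initial):
    """Fill the first open substitution site whose label matches `initial`'s root."""
    tree = deepcopy(tree)
    site = find(tree, lambda n: n.subst and n.label == initial.label)
    if site is None:
        raise ValueError("no matching substitution site")
    site.children, site.subst = deepcopy(initial.children), False
    return tree


def adjoin(tree, auxiliary):
    """Splice `auxiliary` in at the first node labelled like its root."""
    tree = deepcopy(tree)
    target = find(tree, lambda n: n.label == auxiliary.label)
    if target is None:
        raise ValueError("no matching adjunction site")
    aux = deepcopy(auxiliary)
    foot = find(aux, lambda n: n.foot)
    # The foot node inherits the target's original subtree (and open site, if any).
    foot.children, foot.subst, foot.foot = target.children, target.subst, False
    # The target node becomes the auxiliary tree's root.
    target.children, target.subst = aux.children, aux.subst
    return tree


the_tree = Node("NP", [Node("det", [Node("the")]), Node("N", subst=True)])
old_tree = Node("N", [Node("adj", [Node("old")]), Node("N", foot=True)])
man_tree = Node("N", [Node("man")])

derived = substitute(adjoin(the_tree, old_tree), man_tree)
print(derived)  # [NP [det the] [N [adj old] [N man]]]
```

Once the supertags are fixed there is only one place each tree can attach here, which is the sense in which this second pass can be close to deterministic.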
15 Improving the Cognitive Plausibility of LTAG
Shen & Joshi (2005)
– Incremental processing (but still multi-pass for POS)
– Eager parser (immediate determination of meaning)
Kim, Srinivas & Trueswell (2002)
– Incremental processing (but still multi-pass for POS)
– Probabilistic constraint mechanism limited to left context
– “…much of the computation of linguistic analysis, which has traditionally been understood as the result of structure building operations, might instead be seen as lexical disambiguation”
– Substitution and adjunction are still needed to connect trees, but there is typically only one way to do so (i.e. deterministic): “stapling”
16 Double R Model
Encoding of Referential and Relational Meaning (Ball in press)
Construction-Driven Language Processing (Ball 2007)
– Activation, selection and integration of constructions corresponding to the linguistic input (lexicalized)
– Mildly deterministic, serial processing mechanism (integration) operating over a parallel, probabilistic (constraint-based) substrate (activation & selection)
Implemented in a computational cognitive model (Ball, Heiberg & Silber 2007) using the ACT-R cognitive architecture (Anderson et al. 2004)
17 Double R Model
For this presentation, focus on integration
Serial processing without backtracking!
If the current input is unexpected given the prior context, then accommodate the input
– Adjust the representation
– Coerce the input into the representation
The following example demonstrates the context accommodation mechanism
– “no airspeed or altitude restrictions”
18 “no”
[Figure: “no” projects an object referring expression (= nominal construction) and functions as its object specifier]
19 “no airspeed”
[Figure: “airspeed” is integrated as the object head (integration)]
Tree structures were created automatically from the output of the model with a new tool for dynamic visualization of ACT-R declarative memory (Heiberg, Harris & Ball 2007)
20 “no airspeed or altitude”
[Figure: “airspeed or altitude” becomes the object head (override)]
Accommodation of the conjunction via function overriding
21 “no airspeed or altitude restrictions”
[Figure: “airspeed or altitude” shifts to modifier and “restrictions” becomes the object head (shift)]
Appearance of parallel processing!
– “airspeed or altitude” = head vs. “airspeed or altitude” = modifier
Accommodation of the new head via function shift
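The accommodation steps in this example (integration, override, function shift) can be mimicked by a toy left-to-right integrator that keeps a single analysis and never backtracks. This is a hypothetical sketch, not the Double R model’s ACT-R implementation: the word classes and the `integrate_nominal` function are assumptions, and the nominal is reduced to a specifier, a list of modifiers and a head.

```python
SPECIFIERS = {"no", "the", "a"}  # hypothetical closed word classes
CONJUNCTIONS = {"or", "and"}


def integrate_nominal(words):
    """Integrate words serially into one nominal, accommodating as needed.

    Returns the final nominal and a snapshot (word, spec, mods, head) after each word.
    """
    nominal = {"spec": None, "mods": [], "head": None}
    pending_conj = None
    history = []
    for word in words:
        if word in SPECIFIERS and nominal["head"] is None:
            nominal["spec"] = word                  # "no" = object specifier
        elif word in CONJUNCTIONS and nominal["head"] is not None:
            pending_conj = word                     # wait for the second conjunct
        elif pending_conj is not None:
            # Override: the conjoined phrase replaces the current head.
            nominal["head"] = f"{nominal['head']} {pending_conj} {word}"
            pending_conj = None
        elif nominal["head"] is None:
            nominal["head"] = word                  # integrate as object head
        else:
            # Function shift: the old head becomes a modifier of the new head.
            nominal["mods"].append(nominal["head"])
            nominal["head"] = word
        history.append((word, nominal["spec"], list(nominal["mods"]), nominal["head"]))
    return nominal, history


final, steps = integrate_nominal("no airspeed or altitude restrictions".split())
for step in steps:
    print(step)
```

Only one structure exists at every step, yet the shift of “airspeed or altitude” from head to modifier makes it look as if both functions had been held in parallel.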
22 Types of Accommodation
Coercion
– “the running of the bull”: the head of the nominal, “running”, is construed objectively and its arguments are not expressed (“of the bull” functions as a modifier)
– “a Bin Laden supporter”: a proper noun functions as a modifier
– “you’re no Jack Kennedy”: a proper noun functions as head (following a specifier)
– “the newspaper boy porched the newspaper”: nonce expression (H. Clark 1983); “porched” is construed as a transitive action
23 Types of Accommodation
Override
– Single word vs. multi-word expression (MWE): “kicked…” (transitive verb) vs. “kicked the bucket” (idiomatic expression)
– “take…” (transitive verb) vs. “take a hike”, “take five”, “take time”, “take place”, “take out”, “take my wife, please”, “take a long walk off a short pier”, … many idiomatic expressions, too many to carry all forward in parallel
– Morphologically simple vs. complex: “car…” (noun) vs. “carport” (noun) vs. “carpet…” (noun) vs. “carpeting” (noun or verb)
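Override can be sketched as a single current reading that is replaced only when the words seen so far complete a stored multi-word expression, instead of carrying every idiom that begins with “take” forward in parallel. A minimal sketch: the lexicon entries and `interpret_incrementally` are hypothetical, and an MWE is recognized only as an exact match on the input from the first word.

```python
# Hypothetical lexicon: default single-word readings and a few MWEs.
SINGLE_WORD = {"kicked": "transitive verb", "take": "transitive verb"}
MWES = {
    ("kicked", "the", "bucket"): "idiomatic expression",
    ("take", "a", "hike"): "idiomatic expression",
    ("take", "five"): "idiomatic expression",
    ("take", "place"): "idiomatic expression",
}


def interpret_incrementally(words):
    """Keep one reading per step; override it when the input so far completes an MWE."""
    current = SINGLE_WORD.get(words[0], "unknown")
    readings = [current]
    for end in range(2, len(words) + 1):
        mwe = MWES.get(tuple(words[:end]))
        if mwe is not None:
            current = mwe  # override the single-word reading
        readings.append(current)
    return readings


print(interpret_incrementally(["kicked", "the", "bucket"]))
print(interpret_incrementally(["kicked", "the", "ball"]))
```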
24 Types of Accommodation
Function Shift
– “he gave it to me”: direct object (initial preference due to inanimacy)
– “he gave it the ball”: direct object (initial preference), shifted to indirect object
– “he gave her the ball”: indirect object (initial preference due to animacy)
– “he gave her to the groom”: indirect object (initial preference), shifted to direct object
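The four cases reduce to an animacy-based initial preference that the next word can revise. A toy sketch under stated assumptions: the `ANIMATE` set and `object_function` are hypothetical, and any next word other than “to” is assumed to begin a second NP.

```python
ANIMATE = {"her", "him", "me", "them"}  # hypothetical animacy lexicon


def object_function(first_object, next_word):
    """Initial and final function of the first NP after "gave".

    The initial choice uses animacy; the next word can force a shift.
    `next_word` is None when the input ends after the object.
    """
    initial = "indirect object" if first_object in ANIMATE else "direct object"
    if next_word == "to":
        final = "direct object"    # a to-phrase supplies the recipient
    elif next_word is not None:
        final = "indirect object"  # another NP follows and takes the direct object role
    else:
        final = initial
    return initial, final


for obj, nxt in [("it", "to"), ("it", "the"), ("her", "the"), ("her", "to")]:
    print(obj, nxt, object_function(obj, nxt))
```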
25 Types of Accommodation
Function Shift
– “he said that…”: in the context of “said”, “that” typically functions as a complementizer, as in “he said that she was happy”
– But subsequent context can cause a function shift from complementizer to nominal specifier (“he said that book was funny”) or to nominal head (“he said that”)
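The three outcomes for “that” can be expressed as a default plus two shifts triggered by the following context. A toy sketch only: `function_of_that` and its one-word noun lexicon are hypothetical, and a real model would use far richer cues than the next word.

```python
NOUNS = {"book"}  # hypothetical toy noun lexicon


def function_of_that(words_after_that):
    """Function of "that" after "said", revised as the next word arrives."""
    role = "complementizer"            # default in the context of "said"
    if not words_after_that:
        role = "nominal head"          # "he said that"
    elif words_after_that[0] in NOUNS:
        role = "nominal specifier"     # "he said that book was funny"
    return role


print(function_of_that("she was happy".split()))
print(function_of_that("book was funny".split()))
print(function_of_that([]))
```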
26 Types of Accommodation
Function Shift
– “pressure” vs. “pressure valve” vs. “pressure valve adjustment” vs. “pressure valve adjustment screw” vs. “pressure valve adjustment screw fastener” vs. “pressure valve adjustment screw fastener part” vs. “pressure valve adjustment screw fastener part number”
– Serial nouns (and verbs) incrementally shift from head to modifier function as each new head is processed
– Functions like lookahead, but isn’t limited
– Not clear if a bounded ranked parallel mechanism can handle this! There are 2^n possibilities if each of n words can be either head or modifier
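The contrast between incremental function shift and the 2^n space of labellings can be made concrete. A minimal sketch under simplifying assumptions: modifiers are kept as a flat list (the internal bracketing of the compound is ignored), and both function names are hypothetical. The incremental version does a constant amount of work per word, while enumerating every head/modifier assignment grows as 2^n.

```python
from itertools import product


def shift_serial_nouns(nouns):
    """Incremental function shift over a noun sequence.

    Each new noun becomes the head and the previous head shifts to a
    modifier, so exactly one structure exists after every word.
    """
    modifiers, head, history = [], None, []
    for noun in nouns:
        if head is not None:
            modifiers.append(head)  # function shift: head -> modifier
        head = noun
        history.append((tuple(modifiers), head))
    return history


def head_modifier_labellings(nouns):
    """All head/modifier labellings a non-incremental enumeration would face."""
    return list(product(("head", "modifier"), repeat=len(nouns)))


nouns = "pressure valve adjustment screw fastener part number".split()
steps = shift_serial_nouns(nouns)
print(len(steps), steps[-1][1])               # 7 number
print(len(head_modifier_labellings(nouns)))   # 128, i.e. 2**7
```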
27 Types of Accommodation
Modulated Projection
– “the rice” vs. “rice”
– “the” projects a nominal and functions as its specifier
– In the context of “the”, “rice” projects a head, which functions as the head of the nominal
– When there is no specifier, “rice” projects a nominal as well as a nominal head
[Figure: “the rice” = [Nominal [spec the] [head rice]] vs. “rice” = [Nominal [head rice]]]
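Modulated projection can be sketched as a noun whose contribution depends on whether a nominal is already open and waiting for a head. A hypothetical sketch: `DETERMINERS` and `project_nominal` are illustrative names, and the nominal is reduced to a category, a specifier and a head.

```python
DETERMINERS = {"the", "a", "no"}  # hypothetical specifier lexicon


def project_nominal(words):
    """Modulated projection: what a noun projects depends on the prior context."""
    nominal = None
    for word in words:
        if word in DETERMINERS:
            # The determiner projects the nominal and fills its specifier slot.
            nominal = {"category": "nominal", "spec": word, "head": None}
        elif nominal is not None and nominal["head"] is None:
            # In the context of a specifier, the noun projects only a head.
            nominal["head"] = word
        else:
            # No nominal awaiting a head: the noun projects a nominal as well as its head.
            nominal = {"category": "nominal", "spec": None, "head": word}
    return nominal


print(project_nominal(["the", "rice"]))
print(project_nominal(["rice"]))
```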
28 Summary of Context Accommodation
Context accommodation is part and parcel of the construction integration mechanism
– Not viewed as a repair mechanism (Lewis 1998)
The processor proceeds as though it were deterministic, but accommodates the input as needed
Gives the appearance of parallel processing in a serial, deterministic mechanism
29 Combining Serial, Deterministic and Parallel, Probabilistic Mechanisms
[Figure: approaches placed on a continuum from parallel probabilistic to serial deterministic processing, with a “mildly deterministic” range marked. Approaches shown: Parallel Distributed Processing (PDP), CFG (nondeterministic), PCFG, Lexicalized PCFG, Probabilistic LTAG, Double R. Component labels include tree supertagging / supertag stapling (Probabilistic LTAG), construction activation & selection / construction integration (Double R), lexical rule selection, rule selection, rule application, and rule selection & application.]
The parallel probabilistic substrate makes a mildly deterministic serial processing mechanism possible!
30 Questions?
Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6. Proceedings of the 8th International Conference on Cognitive Modeling.
Ball, J. (2007). Construction-Driven Language Processing. Proceedings of the 2nd European Cognitive Science Conference.
Heiberg, A., Harris, J. & Ball, J. (2007). Dynamic Visualization of ACT-R Declarative Memory Structure. Proceedings of the 8th International Conference on Cognitive Modeling.
Ball, J. (in press). A Bi-Polar Theory of Nominal and Clause Structure and Function. Annual Review of Cognitive Linguistics.
31 References
Crocker, M. (1999). Mechanisms for Sentence Processing. In Garrod & Pickering (eds.), Language Processing. London: Psychology Press.
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. & Qin, Y. (2004). An Integrated Theory of the Mind. Psychological Review, 111(4).
Christianson et al. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42.
Bever, T. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (ed.), Cognition and Language Development. New York: Wiley.
Gibson, E. & Pearlmutter, N. (2000). Distinguishing Serial and Parallel Parsing. Journal of Psycholinguistic Research, 29.
Clark, H. (1983). Making sense of nonce sense. In G. Flores d’Arcais & R. Jarvella (eds.), The Process of Language Understanding. New York: John Wiley.
Henderson, J. (2004). Lookahead in Deterministic Left-Corner Parsing. Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. Barcelona, Spain.
Brants, T. & Crocker, M. (2000). Probabilistic Parsing and Psychological Plausibility. Proceedings of COLING.
32 References
Lewis, R. (1993). An Architecturally-Based Theory of Human Sentence Comprehension. Unpublished doctoral dissertation, Carnegie-Mellon University.
Lewis, R. (1998). Reanalysis and Limited Repair Parsing: Leaping off the Garden Path. In Fodor, J. & Ferreira, F. (eds.), Reanalysis in Sentence Processing. Boston: Kluwer Academic.
Lewis, R. (2000). Falsifying serial and parallel parsing models: Empirical conundrums and an overlooked paradigm. Journal of Psycholinguistic Research, 29.
Marcus, M. (1980). A Theory of Syntactic Recognition for Natural Language. Cambridge, MA: The MIT Press.
Joshi, A. & Srinivas, B. (1994). Disambiguation of super parts of speech (Supertags): Almost parsing. Proceedings of the 1994 International Conference on Computational Linguistics (COLING).
Kim, A., Srinivas, B. & Trueswell, J. (2002). The convergence of lexicalist perspectives in psycholinguistics and computational linguistics. In Merlo, P. & Stevenson, S. (eds.), Sentence Processing and the Lexicon: Formal, Computational and Experimental Perspectives. Philadelphia, PA: Benjamins Publishing Co.
33 References
Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K. & Sedivy, J. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268.
Trueswell, J., Sekerina, I., Hill, N. & Logrip, M. (1999). The kindergarten path effect: studying on-line sentence processing in young children. Cognition, 73.
Srinivas, B. & Joshi, A. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 25.
Vosse, T. & Kempen, G. (2000). Syntactic structure assembly in human parsing. Cognition, 75.
Shen, L. & Joshi, A. (2005). Incremental LTAG Parsing. Proceedings of the Conference on Human Language Technology and Empirical Methods in NLP. NJ: ACL.