Arbib and Itti: CS 664 (University of Southern California, Spring 2002) Integrating Vision, Action and Language 1 Chomsky’s Minimalism A Performance Viewpoint.


1 Chomsky’s Minimalism: A Performance Viewpoint
[Diagram, Derivations and the Computational System. Labels: Lexical Items; Computational System; Build Constituents; Competing Derivations; Spell-Out; Interface Levels: PF (Phonological Form), LF (Logical Form); “Conic Sections”; “Dynamics of Planetary Motion”]
[Diagram labels: Cognitive Structures (Schema Assemblages); Semantic Structures (Hierarchical Constituents expressing objects, actions and relationships); “Phonological” Structures (Ordered Expressive Gestures); Production; Perception]

2 [Diagram labels: Minimal Subscene; Action-Object Frame; Verb-Argument Structure; Sentence; Semantics; Syntax]

3 Cognitive and Semantic Forms (CF & SF)
I will use the term SF for Semantic Form (not San Francisco!). The idea is that this occupies the same place as LF in the approach of many linguists, but emphasizes that Logic is more likely to be a useful descriptive tool than a strict match for neural representations.
[Diagram labels: PF; SF; CF; Language-Specific; “Almost” Language-Independent]
Grammaticization: From bag of tricks to systematic syntax
• Karine Megerdoomian: Unlocking the CF of verbs
• Unpacking syntactic categories: e.g., mass nouns versus count nouns
• From scene components to constituents
• Heine: in front of/behind

4 Visualize 3 “forces” acting to turn a cognitive form into a sentence
Michael Halliday: 3 dimensions of grammar
[Diagram labels: Interpersonal; Scene; Discourse (Text); Cognitive Form ------ Sentence]

5 Vision and Language
A “debate” with Derek Bickerton:
• Bickerton, D., 1995, Language and Human Behavior, Seattle: University of Washington Press.

6 Abstracting from a Particular View
Bickerton (1995, p. 22) claims:
• A sentence like "The cat sat on the mat" is far more abstract than the image of a particular cat sitting on a particular mat.
• An image does not bring in the sense of time distinguishing "The cat sat on the mat" from "The cat is sitting on the mat" or "The cat will sit on the mat".
• An image would not distinguish "The cat is sitting on the mat" from "The mat is underneath the cat".
Yes, we must reflect these distinctions in characterizing language.

7 Pictures in the Mind
Bickerton asserts that “it is not true that we build a picture of the world and dress it out in language. Rather, language builds us the picture of the world that we use in thinking and communicating.”
This idea that language builds our picture of the world – rather than contributing to its richness – is misguided, for it ignores the role of visual experience, then of episodic memory (linking episodes in temporal and other relationships), and then of expectations in building the rich perceptions and cognitions of which sentences are just a précis.
Bickerton’s approach does not help us understand how the ability to mean that a cat is on the mat could be acquired in the first place.
The language of the brain or schema network is vastly richer than a linear sequence of words.
This does not deny that language can express what pictures cannot – or vice versa!

8 Individuals
Perception is not invertible: even if we see an actual cat on an actual mat, we are unlikely to recall more than a few details. And what one sees is knowledge-based: e.g., a familiar cat vs. a generic cat, or recognizing a specific subspecies. There is an intimate relation between naming and correct categorization.
One important emergent in the transition from nonhuman to human may be the ability to recognize individuals rather than generic members of a class.
Contra this, note that many creatures can, e.g., distinguish their own offspring from other cubs. However, there is a distinction between forming a category with one element (and thus reacting differently to the dominant male or one’s own offspring) and recognizing an individual as a person with a history which (rather than any general category) is to be invoked in determining how one interacts with them.

9 From Concrete to Abstract 1
Bickerton (1995, pp. 22-24) argues that one cannot picture "My trust in you has been shattered forever by your unfaithfulness." because no picture could convey the uniquely hurtful sense of betrayal the act of infidelity provokes if you did not know what trust was, or what unfaithfulness was, or what it meant for trust to be shattered.
• “In the case of trust or unfaithfulness, there can be nothing beneath the linguistic concept except other linguistic representations, because abstract nouns have no perceptual attributes to be attached to them and therefore no possible representation outside those areas of the brain devoted to language.”
This is wrong on three levels.
(i) The words themselves (i.e., the sequences of letters on the page or spoken phonemes) do not convey “the uniquely hurtful sense of betrayal the act of infidelity provokes”. They do so only if they “hook into” an appropriate body of experience and association, which not all people will share – the word is the "tip of the schema iceberg". Words must link into the network, which itself links to non-verbal experience, both perceptual and behavioral:
• the schema encyclopedia

10 From Concrete to Abstract 2
My trust in you has been shattered forever by your unfaithfulness
(ii) An image (whether static like a picture, or extended in time like a video clip) may tap a similar network of experience:
• we see one person turning away with an expression of despair from the sight of another engaged in lovemaking.
The words and the images have complementary strengths – the words make explicit the key relationships, the image provides a host of details that could only be supplied (if indeed they were deemed relevant) by the piling on of more and more sentences.
(iii) Bickerton is creating a false dichotomy of vision vs. language. If one recalls a beautiful sunset, then it may be that “The sunset where we saw the green flash at Del Mar” will index the scene in our own thoughts or for communication with others, but one can only begin (palely at that) to recapture the beauty of the scene by forming an image of the setting and the colors of the sky.

11 From Concrete to Abstract 3
My trust in you has been shattered forever by your unfaithfulness
As an exercise, let us try to link the sentence back to the schema network anchored in our action and perception. We look at the definitions of the words and see how they are – eventually – rooted in behavior, noting the necessary role of metaphor in the use of “shattered”, and in the use of “your” to indicate both possession of an object and possession of a disposition.
• “My trust in you” expresses the objectification of the behavioral schema Trust(I, You), where Trust(A,B) means “For all C, B tells A that C is the case ⇒ A acts on the assumption that C is true”.
• I do not argue that our mental states need exploit representations expressing such a formalism – rather the above formula is a shorthand for a whole range of behaviors and expectations that constitute the mental state of “trusting”.
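The Trust schema above can be written out in predicate-logic notation. This is only a typeset restatement of the slide's formula: the predicate names Tells and ActsAsIf are hypothetical labels chosen here for illustration, and, as the slide stresses, the formula is a descriptive shorthand rather than a claim about mental representation.

```latex
\[
  \mathrm{Trust}(A,B) \;\equiv\;
  \forall C \,\bigl[\, \mathrm{Tells}(B, A, C) \;\Rightarrow\; \mathrm{ActsAsIf}(A, C) \,\bigr]
\]
% Tells(B, A, C):  B tells A that C is the case        (hypothetical predicate name)
% ActsAsIf(A, C):  A acts on the assumption that C is true (hypothetical predicate name)
```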

12 From Concrete to Abstract 4
My trust in you has been shattered forever by your unfaithfulness
• B is faithful to A is defined socially by a set of behaviors prescribed and proscribed for B by nature of his/her relationship to A. [So again – distinguish being faithful, a disposition, from reflecting on one’s fidelity, which is a mental state.] Infidelity is then detected by, perhaps, repeated failure in a prescribed behavior, or possibly even one example of proscribed behavior.
• Note the word “scribe” in both prescribe and proscribe – while the criteria may be tested behaviorally, the specification of the criteria is in the ideal case given by written laws. In general, these laws may be replaced by patterns of expected behavior.

13 From Concrete to Abstract 5
My trust in you has been shattered forever by your unfaithfulness
• That an object is broken is, in the grounding case, testable either perceptually – the recognizable structure has been disrupted – or behaviorally – the object does not behave in the expected way. An object is shattered if it is broken into many pieces – implying that repairing the damage (making the object functional again) will be difficult or impossible. Repairing is acting upon an object in such a way as to make it look or perform as it is expected to.
• "Shattered forever" then asserts that repair is impossible – there is no set of operations such that at any future time the object will function again. We thus introduce the element of time, and a hypothetical. For all times t > 0, and for all repair operations on object K, there is a function f of K, such that NOT B(f,t).
We see that logical form enters into the semantic extension of schemas from the here and now of action and perception. But note too that planning and expectations are implicit in behavior, and relate to the notion of an internal model of the world. Moreover, many instances of these are effective without explicit use of logic – logic seeks to express explicitly a pattern of relationship. Our notions of future time rest on extrapolation from our experience of past times in relation to the expectations of even earlier times.
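One possible typeset reading of the "shattered forever" condition is sketched below. The slide does not define its symbols, so several things here are assumptions: R(K) and F(K) are hypothetical names for the set of repair operations on K and the set of functions K is expected to perform, and B(f,t) is read as "after the repair, K performs f at time t".

```latex
\[
  \forall t > 0 \;\; \forall r \in R(K) \;\; \exists f \in F(K) : \;\; \neg\, B(f, t)
\]
% R(K): repair operations applicable to object K   (hypothetical name)
% F(K): functions K is expected to perform         (hypothetical name)
% B(f, t): K, after repair r, performs f at time t (assumed reading)
```

On this reading, whatever repair is attempted and however long one waits, at least one expected function of the object still fails.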

14 From Concrete to Abstract 6
My trust in you has been shattered forever by your unfaithfulness
Having said all this, note
• the many “inventions” required to go from simple wants and actions to a language+thought system rich enough to express the sentence.
• But... the formal aspects sketched above do not begin to exhaust the meaning of the sentence, and this can only be done by consideration of the embodied self.
• To say my "trust is shattered" also implies a state of emotional devastation that needs the empathy of another human to understand.
cf. Hadamard on scientists who say that they think in images rather than in words. Note the utility of "nonverbal inference" – e.g., my “Nixon is a male” example. This relates to our seeing language as part of a more general sensory-motor capability.
But this does not deny the role of language in teasing out our intuitions for critique, refinement, and transmission. (Look at how skills are transmitted. Words are often just signposts for a complex terrain.)

15 Back to the Mirror System
Let’s look at the “hints” of language structure in the monkey mirror system, then jump to the human brain and note the many problems that lie ahead.

16 Minimal subscene links action to actor and instrument, etc.
[Diagram labels: Attention; Object; Motion; Action ~ Minimal Subscene]
Action = Motion + Goal

17 A “Pre-Grammar” for F5 Canonical neurons
The neural activity of which F5 activity is a part might be thought of as coding a command (imperative). We view the firing of F5 canonical neurons as part of the code for
Command: Grasp-A(raisin)
as a special case of Grasp-A(Object), where Grasp-A is a specific kind of grasp, applicable to a raisin.
F5 activity is only part of the code. Neural activity must include many sublinguistic parameters to do with the specification of reach and grasp movements.
Hypothesis: F5 “knows” the general class of the action (the motor schema) while parietal cortex “knows” the parametric coding of the specific action.
What binds the specific raisin to the role of object?
Other parts of the brain (e.g., basal ganglia) then determine whether and when that command will be executed.

18 A “Pre-Grammar” for F5 Mirror neurons
The neural activity of which F5 activity is a part might be thought of as coding, inter alia, a declarative “observation”. We say that the firing of F5 mirror neurons is part of the code for
Declarative: Grasp-A(Leo, raisin)
as a special case of Grasp-A(Agent, Object), where Grasp-A is a specific kind of grasp, which is seen to be applied to the raisin by the agent.
What binds the specific raisin to the role of object as object of observed action rather than goal of intended action? What binds Leo to the role of agent?
• Neural activity for observation need not include many of the sublinguistic parameters to do with the specification of reach and grasp
• What is really coded?
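The Command and Declarative notations from the two “Pre-Grammar” slides can be sketched as a small data structure. This is only an observer's descriptive formalism, with no claim that F5 encodes anything like it; the class name GraspFrame and its field names are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GraspFrame:
    """Hypothetical action frame; names are illustrative, not from the slides."""
    grasp_type: str               # motor schema, e.g. "Grasp-A"
    obj: str                      # filler of the Object role
    agent: Optional[str] = None   # filler of the Agent role; None for a command

    @property
    def mood(self) -> str:
        # No explicit agent: canonical-neuron style command.
        # Explicit agent: mirror-neuron style declarative observation.
        return "Command" if self.agent is None else "Declarative"

    def render(self) -> str:
        args = self.obj if self.agent is None else f"{self.agent}, {self.obj}"
        return f"{self.mood}: {self.grasp_type}({args})"


command = GraspFrame("Grasp-A", "raisin")
observation = GraspFrame("Grasp-A", "raisin", agent="Leo")
print(command.render())      # Command: Grasp-A(raisin)
print(observation.render())  # Declarative: Grasp-A(Leo, raisin)
```

The sketch deliberately omits the sublinguistic reach and grasp parameters, and it leaves the binding questions raised on the slides unanswered: the role fillers are simply handed to the constructor.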

19 A “theory of other minds” question
How general is the notion of agent employed by the monkey? Does the monkey recognize Leo as a distinct individual, and if so what does this recognition entail?
Repeated warning: We must always take care to ensure that descriptive categories are not automatically ascribed to the “mental strategies” of the subject.
Thus: The monkey may have machinery for recognizing that a particular agent is performing a particular grasp on a particular object, yet have no general notion of agent, grasp, or object, nor any means to transfer categorization from the grasping system to other systems in the brain. But on the other hand, he might! How do we decide?
(Factor this into the Great Move discussion. Contrast neural systems linked by dedicated pathways, whose data are in a pre-determined code, with pathways that can handle “arbitrary” data. Link to the CCP story in the Saltworks paper.)

20 Beware the Hurford Trap
Just because we (the human observer) can use a formalism to describe some behavior of an animal does not mean that the formalism corresponds to a representation that plays a causal role within the animal’s brain’s mediation of that behavior.
Example: Planets do not use differential calculus to “figure out” their trajectories!

