MIDST and IMAGES-M Masao Yokota Fukuoka Institute of Technology.

Presentation transcript:

1 MIDST and IMAGES-M Masao Yokota Fukuoka Institute of Technology

2 Background & motivation
Intelligent systems should be more human-friendly, considering:
- Floods of multimedia information
- Increase of highly matured societies
- Development of robots for practical use
- Others
Solution: the Integrated Multimedia Understanding System IMAGES-M

3 Speech Processing Unit (SPU), Action Data Processing Unit (APU), Text Processing Unit (TPU), Picture Processing Unit (PPU), Sensory Data Processing Unit (SDPU), Knowledge Base (KB), Inference Engine (IE)

4 Demonstration of IMAGES-M: collaboration of TPU and PPU
(Phase 1) Text-to-picture translation. Input: text. Output: pictorial interpretation.
(Phase 2) Q-A about the picture by text. Input: query text. Output: answer text.

5 Input text (Japanese/English/Chinese): The lamp above the chair is small. The red pot is 1m to the left of the chair. The big blue box is 3m to the right of the chair.
Output: picture

6 Input: picture
Output text: The octagon is to the upper right of the triangle. The octagon is above the quadrangle. The triangle is to the lower left of the octagon.

7 Input sentence: Taro ga kubi wo furu (=Taro shakes his head). Output animation:

8 Cross-reference between picture and text

9 Integrated Multimedia Understanding based on Lmd
Media: picture, animation, text, action, speech, sensory data, ...
All media are mapped to a common intermediate representation.
Descriptive power and computability of the meta language Lmd for intermediate representation

10 Mental Image Directed Semantic Theory (MIDST), proposed by M. Yokota
Information processing by intelligent entities = mental image processing
Mental images:
- Sensory images = sensations coded by sensors
- Conceptual images = sensory images processed by brains (e.g. word concepts)

11 Multimedia Description Language Lmd, based on Mental Image Directed Semantic Theory (MIDST)
Syntax: many-sorted predicate logic with a special predicate constant L called “Atomic Locus”
Semantics: interpretation in association with an omnisensual mental image model, the so-called “Loci in attribute spaces”

12 Omnisensual Mental Image Model
Coded sensations are modeled as loci in attribute spaces (e.g. LOCATION, SHAPE, COLOR).
Sensation (= sensory event) = spatio-temporal distribution of stimuli.

13 Atomic Locus L(x,y,p,q,a,g,k)
[Figure: a locus of y from value p to value q in attribute space a over the time interval [ti, tj], drawn once for g = Gt and once for g = Gs]
Gt: temporal event
Gs: spatial event
“Matter ‘x’ causes Attribute ‘a’ of Matter ‘y’ to keep or change its value temporally (g = Gt) or spatially (g = Gs) over a time interval, where the values ‘p’ and ‘q’ are relative to Standard ‘k’.”

14 Terms of Atomic Locus L(x,y,p,q,a,g,k)
Term: type name. Semantic role
- Term 1 (x): Matter. Event Causer (EC)
- Term 2 (y): Matter. Attribute Carrier (AC)
- Term 3 (p): Attribute Value. Beginning of Locus
- Term 4 (q): Attribute Value. Ending of Locus
- Term 5 (a): Attribute. Domain of Attribute Value
- Term 6 (g): Event Type. Relation between AC and FAO
- Term 7 (k): Standard. Unit, origin, scale, etc. for values

15 Event types
(S1) The bus runs from Tokyo to Osaka. (temporal event)
(∃x,y,k) L(x,y,Tokyo,Osaka,A12,Gt,k) ∧ bus(y)
(S2) The road runs from Tokyo to Osaka. (spatial event)
(∃x,y,k) L(x,y,Tokyo,Osaka,A12,Gs,k) ∧ road(y)
A12: Physical Location
[Figure: the relation between FAO and AC on the way from Tokyo to Osaka, for the temporal event and for the spatial event]
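The seven-term structure and the S1/S2 contrast can be sketched as a small record type. This is only an illustration in Python, not the representation used inside IMAGES-M; the class name `AtomicLocus`, the helper `differing_terms`, and the use of plain strings for every term are assumptions made for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AtomicLocus:
    """One atomic locus L(x, y, p, q, a, g, k); all terms are plain strings here."""
    x: str  # matter: event causer (EC)
    y: str  # matter: attribute carrier (AC)
    p: str  # attribute value at the beginning of the locus
    q: str  # attribute value at the ending of the locus
    a: str  # attribute, e.g. "A12" for physical location
    g: str  # event type: "Gt" (temporal) or "Gs" (spatial)
    k: str  # standard (unit, origin, scale) for the values


def differing_terms(first: AtomicLocus, second: AtomicLocus) -> list:
    """Return the names of the terms in which two loci differ."""
    return [f.name for f in fields(first)
            if getattr(first, f.name) != getattr(second, f.name)]


# (S1) The bus runs from Tokyo to Osaka. (temporal event)
s1 = AtomicLocus("x", "bus", "Tokyo", "Osaka", "A12", "Gt", "k")
# (S2) The road runs from Tokyo to Osaka. (spatial event)
s2 = AtomicLocus("x", "road", "Tokyo", "Osaka", "A12", "Gs", "k")

print(differing_terms(s1, s2))  # ['y', 'g']
```

The two sentences share the same beginning, ending and attribute; only the attribute carrier and the event type differ.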

16 Table 1 Attributes (table body not included in the transcript)

17 Table 2 Standards
Category of standard: remarks
- Rigid Standard: objective standards such as those denoted by measuring units (meter, gram, etc.).
- Species Standard: the attribute value ordinary for a species. A short train is ordinarily longer than a long pencil.
- Proportional Standard: ‘oblong’ means that the width of a physical object is greater than its height.
- Individual Standard: much money for one person can be too little for another.
- Purposive Standard: a room large enough for a person to sleep in may be too small for jogging.
- Declarative Standard: the origin of an order such as ‘next’ must be declared explicitly, as in ‘next to him’.

18 Tempo-logical connectives
χ1 Κi χ2 ⇔ (χ1 Κ χ2) ∧ τi(χ1, χ2)
Κi: tempo-logical connective
χj: locus
Κ: binary logical connective (i.e., ∧, ∨, ⊃, ≡)
∧: ‘AND’
τi: temporal relation between loci such as ‘before’, ‘during’, etc.

19 Definition of τi
The durations of χ1 and χ2 are [t11, t12] and [t21, t22], respectively.

20 Conceptualization of sensory events
[Figure: sensory events 1 to N in which x and y move together in A12 (Location) over time, followed by conceptualization and formalization]
... L(x,x,p,q,A12,Gt,k) Π L(x,y,p,q,A12,Gt,k) ∧ x≠y ∧ p≠q ...

21 SAND and CAND: image of ‘x fetches y’
(∃x,y,p1,p2,k) L(x,x,p1,p2,A12,Gt,k) • (L(x,x,p2,p1,A12,Gt,k) Π L(x,y,p2,p1,A12,Gt,k)) ∧ x≠y ∧ p1≠p2
Π: Simultaneous AND (SAND)
•: Consecutive AND (CAND)
[Figure: in A12, x moves from p1 to p2 during [t1, t2], then x and y move from p2 to p1 during [t2, t3]]
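The timing behind SAND and CAND in the ‘fetch’ image can be sketched with explicit intervals. This is a minimal sketch under two assumptions that the slides do not spell out: SAND is read as "both loci occupy exactly the same time interval", and CAND as "the second locus begins exactly when the first ends". The names `TimedLocus`, `sand` and `cand` are hypothetical.

```python
from typing import NamedTuple


class TimedLocus(NamedTuple):
    formula: str   # the atomic locus, kept as text for readability
    start: float   # beginning of its duration
    end: float     # end of its duration


def sand(first: TimedLocus, second: TimedLocus) -> bool:
    """Simultaneous AND (assumed reading): identical durations."""
    return (first.start, first.end) == (second.start, second.end)


def cand(first: TimedLocus, second: TimedLocus) -> bool:
    """Consecutive AND (assumed reading): second starts when first ends."""
    return first.end == second.start


t1, t2, t3 = 0.0, 1.0, 2.0
go = TimedLocus("L(x,x,p1,p2,A12,Gt,k)", t1, t2)         # x goes from p1 to p2
come_back = TimedLocus("L(x,x,p2,p1,A12,Gt,k)", t2, t3)  # x returns to p1
bring = TimedLocus("L(x,y,p2,p1,A12,Gt,k)", t2, t3)      # x moves y to p1

# 'x fetches y' = go CAND (come_back SAND bring)
is_fetch = cand(go, come_back) and sand(come_back, bring)
print(is_fetch)  # True
```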

22 A13: Direction

23 Description of Discrete Spatial Relations
The square is between the circle and the triangle. The circle, square and triangle are in a line.
(∃u,x,y,z)((z,u,x,y,A12,Gs) • (z,u,y,z,A12,Gs)) Π (z,u, , ,A13,Gs) ∧ isr(u) ∧ C(x) ∧ S(y) ∧ T(z)
isr: imaginary space region

24 Description of spatial events associated with temporal loci in attribute spaces
The road runs 10km straight east from A to B, and after a while, at C it meets the street with the sidewalk.
(∃x,y,z,p,q)(L(_,x,A,B,A12,Gs,_) Π L(_,x,0,10km,A17,Gs,_) Π L(_,x,Point,Line,A15,Gs,_) Π L(_,x,East,East,A13,Gs,_)) • εs • (L(_,x,p,C,A12,Gs,_) Π L(_,y,q,C,A12,Gs,_) Π L(_,z,y,y,A12,Gs,_)) ∧ road(x) ∧ street(y) ∧ sidewalk(z) ∧ p≠q

25 Event Patterns about Location (A12): return, meet, separate, carry, start, stop

26 Event Patterns about Color(A32)

27 Word meaning description
Mw = [Cp : Up] (Cp: Concept Part, Up: Unification Part)
Mw(red) = [(image: the color of X is red) : ARG(Gov,X)] The ‘governor’ is X.
Mw(box) = [(image: the shape of Y is like this) : ___ ] Up is ‘empty’.
Example: ‘red box’, where X is unified with Y.

28 Mutual projection between surface and conceptual structures using word meaning descriptions and surface dependency structures
Surface structure: The robot carries the book.
Surface dependency structure: carries (Dep1: robot (the), Dep2: book (the))
Conceptual structure: (∃x,y,p1,p2,k) L(x,x,p1,p2,A12,Gt,k) Π L(x,y,p1,p2,A12,Gt,k) ∧ robot(x) ∧ book(y) ∧ x≠y ∧ p1≠p2

29 Example(1): ‘carry (verb)’
Dep1 CARRY Dep2.
Mw(carry) = [(∃x,y,p1,p2,k) L(x,x,p1,p2,A12,Gt,k) Π L(x,y,p1,p2,A12,Gt,k) ∧ x≠y ∧ p1≠p2 : ARG(Dep.1,x); ARG(Dep.2,y);]
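How the unification part ties dependents to variables can be sketched as follows. This is a toy illustration, not the IMAGES-M procedure: the dictionary layout of `CARRY`, the function `conceptual_structure`, and the treatment of formulas as plain strings are all assumptions made for this sketch.

```python
# Word meaning of 'carry': concept part (loci and constraints) plus
# unification part (which dependent fills which variable).
CARRY = {
    "loci": ["L(x,x,p1,p2,A12,Gt,k)", "L(x,y,p1,p2,A12,Gt,k)"],
    "constraints": ["x≠y", "p1≠p2"],
    "unification": {"Dep1": "x", "Dep2": "y"},  # ARG(Dep.1,x); ARG(Dep.2,y)
}


def conceptual_structure(meaning: dict, dependents: dict) -> str:
    """Combine a verb meaning with the nouns found at its dependency slots."""
    loci = " Π ".join(meaning["loci"])  # the loci hold simultaneously (SAND)
    sorts = [f"{dependents[slot]}({variable})"
             for slot, variable in meaning["unification"].items()]
    return " ∧ ".join([loci] + sorts + meaning["constraints"])


# Surface dependency structure of 'The robot carries the book.'
formula = conceptual_structure(CARRY, {"Dep1": "robot", "Dep2": "book"})
print(formula)
```

The printed formula has the same shape as the conceptual structure given for ‘The robot carries the book’, without the quantifier prefix.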

30 Example(2): ‘desk (noun)’
Mw(desk) = [(λx) desk(x) : __ ;], where
(λx) desk(x) ⇔ (λx)(… L*(_,x,/,/,A29,Gt,_) ∧ … ∧ L*(_,x,/,/,A39,Gt,_) ∧ …)
‘At any time, a desk has no taste (A29), ..., no vitality (A39), ...’

31 Fundamental Semantic Processing on texts by IMAGES-M
Detection of:
- Semantic anomalies
- Semantic ambiguities
- Paraphrase relations

32 Postulates about the world
X ∧ Y* .⊃. X Π Y, where Y* denotes that Y holds true over any time interval.
L(x,y,p,q,a,g,k) Π L(z,y,r,s,a,g,k) .⊃. p=r ∧ q=s

33 Detection of Semantic Anomalies by using postulates
(Postulate 1) L(x,y,p1,q1,a,g,k) Π L(z,y,p2,q2,a,g,k) .⊃. p1=p2 ∧ q1=q2
‘A matter never has different values of an attribute at the same time.’

34 Example(1) Tom stays with the guest from Spain.
[Figure: two dependency structures, D1 and D2]
M(stay) = [(∃x,y,p1,p2,k) L(x,y,p1,p2,A12,Gt,k) ∧ x≠y ∧ p1=p2 : ... ]
M(from) = [(∃x,y,p1,p2,k) L(x,y,p1,p2,A12,Gt,k) ∧ p1≠p2 : ... ]
D2 violates Postulate 1.

35 Example(2) I drank the coffee on the desk, which was sweet.
[Figure: two dependency structures, D1 and D2]
D1 violates Postulate 1:
L(x,y,sweet,sweet,A29,Gt,k) ∧ desk(y) ⊃ L(x,y,sweet,sweet,A29,Gt,k) Π L(z,y,/,/,A29,Gt,k) ⊃ ‘sweet’ = /
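The Postulate 1 check behind the ‘sweet desk’ anomaly can be sketched as a comparison of two loci. This is a simplified sketch, not the inference engine of IMAGES-M: it assumes the two loci are already known to hold simultaneously, and the names `Locus`, `violates_postulate_1`, `desk1` and `coffee1` are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    x: str  # event causer
    y: str  # attribute carrier
    p: str  # beginning value
    q: str  # ending value
    a: str  # attribute
    g: str  # event type
    k: str  # standard


def violates_postulate_1(first: Locus, second: Locus) -> bool:
    """True if two simultaneous loci give one carrier two different values.

    Simultaneity is assumed by the caller; only the terms are compared.
    """
    same_slot = (first.y, first.a, first.g, first.k) == \
                (second.y, second.a, second.g, second.k)
    return same_slot and (first.p, first.q) != (second.p, second.q)


# Word meaning of 'desk': at any time a desk has no taste (A29), written '/'.
desk_has_no_taste = Locus("z", "desk1", "/", "/", "A29", "Gt", "k")

# Reading in which 'which was sweet' modifies the desk.
desk_is_sweet = Locus("x", "desk1", "sweet", "sweet", "A29", "Gt", "k")
# Reading in which it modifies the coffee.
coffee_is_sweet = Locus("x", "coffee1", "sweet", "sweet", "A29", "Gt", "k")

print(violates_postulate_1(desk_is_sweet, desk_has_no_taste))    # True
print(violates_postulate_1(coffee_is_sweet, desk_has_no_taste))  # False
```

Only the reading that attaches ‘sweet’ to the desk is rejected, which is how the anomalous dependency structure is filtered out.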

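36 Detection of Semantic Ambiguities
Tom follows Jim with the stick.
[Figure: two dependency structures D1 and D2 and their pictorial interpretations Pr(D1) and Pr(D2), which place the stick (s) differently relative to Jim (J) and Tom (T)]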

37 Paraphrasing based on understanding
(Input) The girl fetches the book from the village to the town.
(Output) The girl goes to the village from the town, and then carries the book from the village to the town.
(∃x1,x2,p1,p2,k) L(x1,x1,p1,p2,A12,Gt,k) • (L(x1,x1,p2,p1,A12,Gt,k) Π L(x1,x2,p2,p1,A12,Gt,k)) ∧ girl(x1) ∧ book(x2) ∧ town(p1) ∧ village(p2)

38 Why is cross-media translation (CMT) important?
Problem: I have one chair, one flower-pot, one box, one lamp and one cat in my room. The chair is 1m to the right of the flower-pot. The flower-pot is 4m to the left of the box. The red lamp hangs above the chair. The black cat lies under the chair.

39 Systematic CMT
Explicit algorithms for:
(C1) translating source representations into target ones, for contents describable by both the source and the target media.
(C2) filtering out contents that are describable by the source medium but not by the target one.
(C3) supplementing default contents, that is, contents that must be described in the target representation but are not explicitly described in the source representation.
(C4) replacing default contents with definite ones given in the following contexts.

40 Realization of systematic CMT
Algorithms for:
(C1) translating source representations into target ones, for contents describable by both the source and the target media → APRs
(C2) filtering out contents that are describable by the source medium but not by the target one → APRs
(C3) supplementing default contents, that is, contents that must be described in the target representation but are not explicitly described in the source representation → a default rule over X, Y and Z (its connectives are lost in the transcript)
(C4) replacing default contents with definite ones given in the following contexts → only requires memorizing the processing history

41 Formalization of cross-media translation
Y(Smt) = ψ(X(Sms))
In the case of text-to-picture CMT:
Sms = all the attributes in the previous table.
Smt = the visual attributes marked by * in the previous table.
ψ is defined by a set of APRs shown in the next table.

42 CMT between Text and Picture
Text = the omnisensual world specified by Sms
Text meaning representation = X(Sms)
Picture meaning representation = Y(Smt)
Picture = the visual world specified by Smt
ψ = APRs and default reasoning

43 Attribute Paraphrasing Rules (APRs)
Table 4 Attribute paraphrasing rules for text-to-picture translation
APR: correspondence of attributes (Text : Picture); value conversion schema (Text → Picture); interpretation of the schema
- APR-01: A12 : A12; p → p’; ‘position’ into 2D coordinates (within the display area).
- APR-02: {A12, A13, A17} : A12; {p, d, l} → p’+l’d’; {‘position’, ‘direction’, ‘distance’} into 2D coordinates.
- APR-03: {A11, A10} : A11; {s, v} → v’s’; {‘shape’, ‘volume’} into a set of outlines of the object.
- APR-04: (correspondence missing in the transcript); c → c’; ‘color’ into 3D coordinates of the color solid.
- APR-05: {A12, A44} : A12; {pa, m} → {pa’, pb’}; {‘position’, ‘topology’} into a pair of 2D coordinates.
For example, APR-02 is for such a sentence as “The box is 3 meters to the left of the chair.”
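The value conversion of APR-02, p’ + l’d’, amounts to adding a scaled direction vector to a reference position. The sketch below illustrates that arithmetic only; the display scale of 50 pixels per meter, the y-up coordinate convention, the direction table and the function name `apr_02` are assumptions, not values from IMAGES-M.

```python
# Unit vectors for direction words (assumed convention: x grows to the right,
# y grows upward).
DIRECTIONS = {
    "left": (-1, 0),
    "right": (1, 0),
    "up": (0, 1),
    "down": (0, -1),
}


def apr_02(reference_xy, direction, distance_m, pixels_per_meter=50):
    """Convert {position, direction, distance} into 2D coordinates: p' + l'd'."""
    dx, dy = DIRECTIONS[direction]           # d': unit direction vector
    length = distance_m * pixels_per_meter   # l': distance in display units
    return (reference_xy[0] + length * dx, reference_xy[1] + length * dy)


# "The box is 3 meters to the left of the chair."
chair = (200, 100)               # p': chair position on the display (assumed)
box = apr_02(chair, "left", 3)
print(box)  # (50, 100)
```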

44 S1 = There is a hard cubic object.
P1 = [picture]
- shape=cube → C1
- hardness=indescribable → C2
- color=default → C3
- volume=default → C3
S2 = The object is large and red.
P2 = [picture]
- color=red → C4
- volume=large → C4
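The four steps C1 to C4 can be traced on the S1/S2 example with a small sketch. It is an illustration under assumptions, not the IMAGES-M algorithm: the set of visual attributes, the use of the string "default" for default contents, and the function name `translate` are hypothetical.

```python
# Attributes the picture medium can express, each with a default content.
VISUAL_DEFAULTS = {"shape": "default", "color": "default", "volume": "default"}


def translate(text_attributes, previous_picture=None):
    """Translate attribute/value pairs from text into a picture description.

    Returns the picture and a log of which step (C1..C4) handled each attribute.
    """
    picture = dict(previous_picture or {})
    log = []
    for attribute, value in text_attributes.items():
        if attribute not in VISUAL_DEFAULTS:
            log.append((attribute, "C2"))  # not describable in pictures: filter
            continue
        # C4 replaces a default left by an earlier sentence; otherwise C1.
        step = "C4" if picture.get(attribute) == "default" else "C1"
        picture[attribute] = value
        log.append((attribute, step))
    for attribute, default in VISUAL_DEFAULTS.items():
        if attribute not in picture:       # C3: supplement default contents
            picture[attribute] = default
            log.append((attribute, "C3"))
    return picture, log


# S1 = There is a hard cubic object.
p1, log1 = translate({"hardness": "hard", "shape": "cube"})
# S2 = The object is large and red.
p2, log2 = translate({"volume": "large", "color": "red"}, p1)

print(log1)  # [('hardness', 'C2'), ('shape', 'C1'), ('color', 'C3'), ('volume', 'C3')]
print(log2)  # [('volume', 'C4'), ('color', 'C4')]
```

Keeping the earlier picture as input to the next call is what lets C4 work with nothing more than the processing history.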

45 Discussions and conclusions
- Cross-references between texts in several languages (Japanese, Chinese, Albanian and English) and pictorial patterns such as maps were successfully implemented on our intelligent system IMAGES-M.
- To the best of our knowledge, no other system can perform cross-media reference as seamlessly as ours.

46 Future work
- Automatic acquisition of word meanings from sensory data.
- Human-robot communication by natural language in real environments, etc.

