CONFUCIUS: an Intelligent MultiMedia storytelling interpretation & presentation system Minhua Eunice Ma Supervisor: Prof. Paul Mc Kevitt School of Computing.


1 CONFUCIUS: an Intelligent MultiMedia storytelling interpretation & presentation system Minhua Eunice Ma Supervisor: Prof. Paul Mc Kevitt School of Computing and Intelligent Systems Faculty of Informatics University of Ulster, Magee

2 Objectives of CONFUCIUS
- To interpret natural language stories and movie/drama scripts, extracting conceptual semantics from the natural language input
- To generate 3D animation and virtual worlds automatically, with speech and non-speech audio
- To integrate the above components into an intelligent multimedia storytelling system for presenting multimodal stories

3 CONFUCIUS' context diagram. Input: a story in natural language, or a movie/drama script entered through a tailored menu for script input, supplied by the storywriter/playwright. Output: 3D animation, speech (dialogue), and non-speech audio, presented to the user/story listener.

4 Literature review

5 Previous systems
- Schank's CD theory (1972): primitives & scripts; SAM & PAM
- Automatic text-to-graphics systems: WordsEye (Coyne & Sproat, 2001); 'Micons' and CD-based language animation (Narayanan et al., 1995); Spoken Image (Ó Nualláin & Smith, 1994) and its successor SONAS (Kelleher et al., 2000)

6
- Multimodal interactive storytelling: AesopWorld; KidsRoom; Larsen & Petersen's Interactive Storytelling; Oz; computer games
- Embodied intelligent agents, which diverge in how the agents' behavior is produced: BEAT (Cassell et al., 2000); Gandalf; PPP persona

7 Architecture of CONFUCIUS. Natural language stories and scripts from the script writer pass through a script parser and Natural Language Processing, which draw on language knowledge (mapping lexicon, grammar, etc.) to produce semantic representations. Animation generation uses visual knowledge (a 3D graphic library of prefabricated objects, a knowledge base built with 3D authoring tools), while Text-To-Speech and sound effects supply the audio. A synchronising & fusion module combines these media into a 3D world with audio in VRML.

8 Multimodal semantic representation. At the top level is a high-level multimodal semantic representation (XML/frame-based), which is media-independent. An intermediate level connects it to media-dependent representations for each modality: the language modality, the visual modality (visual media-dependent representation), and the non-speech audio modality (audio media-dependent representation).
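The frame-based, media-independent level on this slide can be illustrated with a minimal Python sketch. All field names and the audio file name below are assumptions for illustration, not the actual CONFUCIUS representation (the system itself uses an XML/frame-based format):

```python
# Hypothetical frame for "A ball is bouncing": a media-independent core
# with pointers to media-dependent realisations for each modality.
frame = {
    "predicate": "bounce",
    "agent": {"object": "ball", "category": "physical-object"},
    "tense": "present-progressive",
    "realisations": {
        "visual": "bounce(ball)",          # visual media-dependent entry
        "audio": "bounce.wav",             # non-speech audio (name assumed)
        "language": "A ball is bouncing",  # language modality
    },
}

def modality(frame, name):
    """Return the media-dependent realisation for one modality."""
    return frame["realisations"][name]

print(modality(frame, "visual"))  # bounce(ball)
```

The point of the layering is that one media-independent frame can drive several renderers without duplicating the story semantics.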

9 Knowledge base of CONFUCIUS
- Language knowledge: semantic knowledge, i.e. lexicons (e.g. WordNet); syntactic knowledge, i.e. grammars; statistical models of language; associations between words
- Visual knowledge: object model (nouns) with functional information and internal coordinate axes (for spatial reasoning); event model (event verbs, describing the motion of objects); associations between objects
- World knowledge: spatial & qualitative reasoning knowledge
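A visual-knowledge object entry could be sketched as below. The schema (field names, the "door" example, the axis vectors) is an assumption for illustration, not the actual CONFUCIUS knowledge base format:

```python
# Illustrative object-model entry: functional information, internal
# coordinate axes for spatial reasoning, and associations between objects.
door = {
    "noun": "door",
    "functional": ["open", "close"],                  # what it can do
    "axes": {"front": (0, 0, 1), "up": (0, 1, 0)},    # intrinsic axes
    "associations": ["doorway", "handle", "hinge"],
}

def front_axis(obj):
    """Spatial reasoning needs an object's intrinsic front direction,
    e.g. to realise a predicate like faceTo(obj1, obj2)."""
    return obj["axes"]["front"]

print(front_axis(door))  # (0, 0, 1)
```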

10 Categories of events
- Atomic events
  - Change of physical location, i.e. position and orientation, e.g. "bounce", "turn"
  - Change of intrinsic attributes such as shape, size, color, and texture, e.g. "bend", and even visibility, e.g. "disappear", "fade" (in/out)
- Non-atomic events
  - Non-character events
    - Two or more individual objects fuse together, e.g. "melt" (in)
    - One object divides into two or more individual parts, e.g. "break" (into pieces)
    - Change of sub-components (their position, size, color), e.g. "blossom"
    - Environment events (weather verbs), e.g. "snow", "rain"
  - Character events
    - Action verbs: intransitive verbs; transitive verbs
    - Non-action verbs (stative, emotion, possession, mental activities, cognition & perception)
    - Idioms & metaphor verbs
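The taxonomy above can be written down as a lookup table. The verb assignments follow the slide's own examples; the table itself is an illustrative sketch, not the system's lexicon:

```python
# Event taxonomy as a verb -> category table (categories from the slide).
EVENT_CATEGORY = {
    "bounce": "atomic/location",     "turn": "atomic/location",
    "bend": "atomic/attribute",      "disappear": "atomic/attribute",
    "melt": "non-character/fuse",    "break": "non-character/divide",
    "blossom": "non-character/sub-component",
    "snow": "environment",           "rain": "environment",
    "walk": "character/action/intransitive",
    "give": "character/action/transitive",
}

def categorise(verb):
    """Look up a verb's event category; unknown verbs get 'unknown'."""
    return EVENT_CATEGORY.get(verb, "unknown")

print(categorise("melt"))  # non-character/fuse
```

In a full system this decision would come from the lexicon and the event model rather than a flat table, but the branching structure is the same.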

11 Categories of action verbs
- Intransitive verbs
  - Biped kinematics, e.g. "walk", "swim", and other motion models such as "fly"
  - Facial expressions, e.g. "laugh", "anger"
  - Lip movement, e.g. "speak", "say"
- Transitive verbs
  - Single object, e.g. "throw", "push", "kick"
  - Multiple objects: direct and indirect objects, e.g. "give", "pass", "show"; indirect object & the instrument, e.g. "cut", "hammer"
  - Some involve the speech modality

12 Basic predicate-arguments
1) move(obj, xInc, yInc, zInc)
2) moveTo(obj, loc)
3) moveToward(obj, loc, displacement)
4) rotate(obj, xAngle, yAngle, zAngle)
5) faceTo(obj1, obj2)
6) alignMiddle(obj1, obj2, axis)
7) alignMax(obj1, obj2, axis)
8) alignMin(obj1, obj2, axis)
9) alignTouch(obj1, obj2, axis)
10) touch(obj1, obj2, axis)
11) scale(obj, rate)
12) squash(obj, rate, axis)
13) group(x, [y|_], newObj)
14) ungroup(xyList, x, yList)

Hierarchical structure of predicates:
- 3rd level: touch()
- 2nd level: moveToward(), alignMiddle(), alignTouch(), alignMax(), alignMin(), faceTo()
- Atomic level: move(), moveTo(), rotate(), scale(), squash()
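The hierarchy can be illustrated by defining a 2nd-level predicate in terms of the atomic level. This is a minimal Python sketch over plain coordinate tuples; the actual predicates operate on VRML scene-graph objects:

```python
# Atomic level: translate a position by per-axis increments,
# as in move(obj, xInc, yInc, zInc).
def move(pos, x_inc, y_inc, z_inc):
    return (pos[0] + x_inc, pos[1] + y_inc, pos[2] + z_inc)

# 2nd level: moveToward(obj, loc, displacement) expressed as a
# composition of the atomic move(), stepping a fraction of the way.
def move_toward(pos, loc, displacement):
    dx, dy, dz = (l - p for l, p in zip(loc, pos))
    return move(pos, dx * displacement, dy * displacement, dz * displacement)

print(move_toward((0, 0, 0), (10, 0, 0), 0.5))  # (5.0, 0.0, 0.0)
```

Higher levels compose in the same way, e.g. a 3rd-level touch() could combine alignment and movement predicates.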

13 Visual definition & word sense. A verb maps to word senses, and each word sense maps to visual definition entries: polysemy arises on the word-to-sense side, synonymy on the sense side. A word sense is the minimal complete unit of meaning in the language modality; a visual definition entry is the minimal complete unit of meaning in the visual modality. The mapping from one word sense to visual definition entries is one-to-many. Example: "close" (a door) may be visualised as (1) a normal door (rotation on the y axis), (2) a sliding door (moving on the x axis), or (3) a rolling shutter door (a combination of rotation on the x axis and moving on the y axis).
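The one-to-many mapping can be sketched with the slide's "close (a door)" example. The data structure and the subtype names are illustrative assumptions, not the actual CONFUCIUS lexicon format:

```python
# One word sense ("close" applied to a door) maps to many visual
# definitions, selected by the subtype of the object involved.
VISUAL_DEFS = {
    ("close", "door"): {
        "normal door": "rotation on y axis",
        "sliding door": "moving on x axis",
        "rolling shutter door": "rotation on x axis + moving on y axis",
    },
}

def visual_definition(verb, obj, subtype):
    """Select the visual definition for one word sense and object subtype."""
    return VISUAL_DEFS[(verb, obj)][subtype]

print(visual_definition("close", "door", "sliding door"))  # moving on x axis
```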

14 Implementation: semantics → VRML. Example: "A ball is bouncing"

(a) Visual definition of "bounce":

bounce(ball) :- [moveTo(ball, [0,0,0]), moveTo(ball, [0,20,0])].

(b) VRML code of a static ball:

DEF ball Transform {
  translation 0 0 0
  children [
    Shape {
      appearance Appearance { material Material {} }
      geometry Sphere { radius 5 }
    }
  ]
}

(c) Output: VRML code of a bouncing ball:

DEF ball Transform {
  translation 0 0 0
  children [
    DEF ball-TIMER TimeSensor { loop TRUE cycleInterval 0.5 },
    DEF ball-POS-INTERP PositionInterpolator {
      key [ 0, 0.5, 1 ]
      keyValue [ 0 0 0, 0 20 0, 0 0 0 ]
    },
    Shape {
      appearance Appearance { material Material {} }
      geometry Sphere { radius 5 }
    }
  ]
}
ROUTE ball-TIMER.fraction_changed TO ball-POS-INTERP.set_fraction
ROUTE ball-POS-INTERP.value_changed TO ball.set_translation
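The animation-generation step on this slide, taking a visual definition's key positions and emitting VRML with a TimeSensor and PositionInterpolator, could be sketched as a small generator. This is an illustrative Python sketch (the actual system modifies VRML from Java); the function name and signature are assumptions:

```python
# Emit a VRML Transform animated through a list of key positions,
# in the style of the bouncing-ball output on slide 14.
def animate(name, radius, key_values, cycle=0.5):
    n = len(key_values)
    keys = ", ".join("%g" % (i / (n - 1)) for i in range(n))
    kv = ", ".join("%g %g %g" % v for v in key_values)
    return (
        f"DEF {name} Transform {{\n"
        f"  translation 0 0 0\n"
        f"  children [\n"
        f"    DEF {name}-TIMER TimeSensor {{ loop TRUE cycleInterval {cycle} }},\n"
        f"    DEF {name}-POS-INTERP PositionInterpolator {{\n"
        f"      key [ {keys} ]\n"
        f"      keyValue [ {kv} ]\n"
        f"    }},\n"
        f"    Shape {{\n"
        f"      appearance Appearance {{ material Material {{}} }}\n"
        f"      geometry Sphere {{ radius {radius} }}\n"
        f"    }}\n"
        f"  ]\n"
        f"}}\n"
        f"ROUTE {name}-TIMER.fraction_changed TO {name}-POS-INTERP.set_fraction\n"
        f"ROUTE {name}-POS-INTERP.value_changed TO {name}.set_translation"
    )

vrml = animate("ball", 5, [(0, 0, 0), (0, 20, 0), (0, 0, 0)])
print(vrml)
```

With the bounce key positions [0,0,0], [0,20,0], [0,0,0] this reproduces the structure of the output code above; other visual definitions would supply different key positions.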

15 Comparison of intelligent multimedia systems

16 Software analysis
- Java programming language: parsing the intermediate representation; changing VRML code to create/modify animation; integrating modules
- Natural language processing tools: GATE (pre-processing); PC-PARSE (morphological and syntactic analysis); WordNet (lexicon, semantic inference)
- 3D graphic modelling: existing 3D models on the Internet; 3D Studio Max (props & stage); VRML (Virtual Reality Modelling Language) 97, H-Anim 2001 spec.
- The actors, using embodied agents: Microsoft Agent (the narrator and minor actors); Character Studio, Internet Character Animator (protagonists)

17 Reuse of NLP toolkits
- GATE 2.0: pre-processing
- PC-PARSER (with lexicon & morphological rules, and features): morphological parsing, part-of-speech tagging, syntactic parsing
- WordNet 1.6: semantic inference
- Further requirements: coreference resolution, temporal reasoning

18 Contribution & prospective applications
Contribution:
- multimodal semantic representation of natural language
- automatic animation generation
- multimodal fusion and coordination
Prospective practical applications:
- children's education
- multimedia presentation
- movie/drama production
- script writing
- computer games
- virtual reality

19 Conclusion. The objectives of CONFUCIUS address challenging problems in language visualisation:
- formalising the meaning of action verbs and states
- mapping language primitives to visual primitives
- building a reusable 'common sense' knowledge base for other systems
- sophisticated spatial and temporal reasoning
- representing stories in temporal multimedia, which requires significant coordination

20 Project schedule

