1 CONFUCIUS: An Intelligent MultiMedia Storytelling Interpretation and Presentation System
Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems, Faculty of Engineering, University of Ulster, Magee

2 Faculty Research Student Conference, Jordanstown, 15 Jan 2004
Outline
- Related research
- Overview of CONFUCIUS
- Automatic generation of 3D animation
- Semantic representation
- Natural language processing
- Current state of implementation
- Relation to other work
- Conclusion & future work

3 Related research
- 3D visualisation
  - Virtual humans & embodied agents: Jack, Improv, BEAT
  - Multimodal interactive storytelling: AesopWorld, KidsRoom, Larsen & Petersen's Interactive Storytelling, computer games
  - Automatic text-to-graphics systems: WordsEye, CD-based language animation
- Related research in NLP: lexical semantics
  - Levin's verb classes
  - Jackendoff's Lexical Conceptual Structure
  - Schank's scripts

4 Objectives of CONFUCIUS
- To interpret natural language sentences/stories and extract conceptual semantics from them
- To generate 3D animation and virtual worlds automatically from natural language
- To integrate 3D animation with speech and non-speech audio, forming an intelligent multimedia storytelling system
[I/O diagram: a storywriter/playwright supplies a story in natural language or a movie/drama script (via a tailored menu for script input); CONFUCIUS produces 3D animation, speech (dialogue) and non-speech audio for the user/story listener.]

5 Architecture of CONFUCIUS
[Architecture diagram: natural language stories and scripts from the script writer pass through the script parser and the Natural Language Processing module; the resulting semantic representations drive animation generation, Text-To-Speech and sound effects, which are synchronised and fused into a 3D world with audio in VRML. Knowledge sources: language knowledge (LCS lexicon, grammar, semantic representations) mapped to visual knowledge (a 3D graphic library of prefabricated objects and character models, built with 3D authoring tools and existing 3D models).]
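The module boundaries above can be sketched as plain Java interfaces. This is a minimal sketch only: all type and method names below (PipelineSketch, ScriptParser, LanguageProcessor and so on) are illustrative assumptions, not the actual classes used in CONFUCIUS.

    // Minimal sketch (assumption): the CONFUCIUS pipeline as Java interfaces.
    public class PipelineSketch {
        interface ScriptParser       { String toStory(String movieOrDramaScript); }  // script input -> plain story
        interface LanguageProcessor  { String interpret(String story); }             // NLP -> semantic representation (LVSR)
        interface AnimationGenerator { String toVrml(String semantics); }            // semantics -> VRML 97 scene text
        interface Fuser              { String fuse(String vrml, String speech); }    // synchronise & fuse 3D world with audio

        public static void main(String[] args) {
            // Trivial stand-in implementations, just to show how the modules chain together.
            LanguageProcessor nlp = story -> "[EVENT ...] for: " + story;
            AnimationGenerator gen = sem -> "#VRML V2.0 utf8  # generated from " + sem;
            Fuser fuser = (vrml, speech) -> vrml + "\n# + AudioClip for: " + speech;

            String story = "John pushed the door.";
            System.out.println(fuser.fuse(gen.toVrml(nlp.interpret(story)), "(no dialogue)"));
        }
    }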

6 Software & standards
- Java
  - parsing semantic representations
  - changing VRML code to add/modify animation
  - integrating modules
- Natural language processing tools
  - Connexor Machinese FDG parser (morphological and syntactic parsing)
  - WordNet (lexicon, semantic inference)
- 3D graphic modelling
  - Existing 3D models (virtual humans/objects) from the Internet
  - Authoring tools: Character Studio (humanoid characters), 3D Studio Max (props & stage), Microsoft Agent (narrator)
  - Modelling languages & standards: VRML 97 for modelling the geometry of objects, props and the environment; the H-Anim specification for humanoid modelling
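Since Java is listed as the glue for changing VRML code, the following minimal sketch shows one plausible way such glue could look: appending a keyframe animation and its ROUTE statements to a VRML 97 file as plain text. The file name scene.wrl, the chosen joint and the key values are assumptions for illustration; only standard VRML 97 and H-Anim node and field names are used.

    // Minimal sketch (assumption): patching a VRML 97 world by appending an interpolator
    // and ROUTE statements as plain text, in the spirit of "changing VRML code to
    // add/modify animation". File name, joint and key values are illustrative.
    import java.nio.file.*;

    public class VrmlPatcher {
        public static void main(String[] args) throws Exception {
            Path world = Path.of("scene.wrl");   // hypothetical world file
            String animation = String.join("\n",
                "DEF Timer TimeSensor { cycleInterval 2.0 }",
                "DEF ArmRot OrientationInterpolator {",
                "  key [ 0.0 0.5 1.0 ]",
                "  keyValue [ 1 0 0 0, 1 0 0 -1.2, 1 0 0 0 ]",
                "}",
                "ROUTE Timer.fraction_changed TO ArmRot.set_fraction",
                "ROUTE ArmRot.value_changed TO hanim_l_shoulder.set_rotation");
            Files.writeString(world, "\n" + animation + "\n",
                              StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }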

7 Agents and avatars: how much autonomy?
[Spectrum diagram of virtual humans: autonomy & intelligence ranges from high (autonomous agents) through interface agents to low (user-controlled avatars); characters in non-interactive storytelling fall between agents and avatars.]
- Autonomous agents have higher requirements for sensing, memory, reasoning, planning, behaviour control & emotion (a sense-emotion-control-action structure)
- "User-controlled" avatars require fewer autonomous actions, though basic naïve physics such as collision detection and reaction is still required
- A virtual character in non-interactive storytelling sits between agents and avatars: its behaviours, emotions and responses to the changing environment are described in the story input

8 Graphics library
- Objects/props: simple geometry files
- Characters: geometry & joint hierarchy files (H-Anim)
- Motions: an animation library of keyframes, instantiated on characters
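A minimal sketch, under stated assumptions, of how the three parts of the graphics library could be represented in Java: props as simple geometry files, characters as H-Anim files with a joint hierarchy, and motions as keyframe sets instantiated on a character. All names, file names and values below are illustrative.

    import java.util.List;
    import java.util.Map;

    public class GraphicsLibrarySketch {
        record Prop(String name, String geometryFile) {}                              // objects/props: simple geometry files
        record CharacterModel(String name, String hAnimFile, List<String> joints) {}  // characters: geometry & joint hierarchy (H-Anim)
        record Motion(String verb, Map<String, float[]> jointKeyframes) {}            // motions: keyframes per joint

        public static void main(String[] args) {
            Prop table = new Prop("table", "table.wrl");
            CharacterModel john = new CharacterModel("john", "john_loa1.wrl",
                                                     List.of("l_shoulder", "l_elbow", "r_hip", "r_knee"));
            Motion walk = new Motion("walk", Map.of("r_hip", new float[]{0f, 0.6f, 0f}));
            // Instantiation: bind a library motion to a library character in a scene with a prop.
            System.out.println("Instantiate '" + walk.verb() + "' on " + john.name() + " near the " + table.name());
        }
    }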

9 Level of Articulation (LOA) of H-Anim
[Figure: joints and segments of LOA1; example Site nodes on the hands for pushing and holding objects.]
- CONFUCIUS adopts LOA1 for human animation
- The animation engine adds ROUTEs dynamically, based on H-Anim's joints & animation keyframes
- CONFUCIUS' human animation can be adapted for other LOAs
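The point that the animation engine "adds ROUTEs dynamically" can be illustrated with a small sketch that, given keyframes for some LOA1 joints, emits the interpolators and ROUTE statements a VRML browser needs. The keyframe values are invented; the joint DEF names follow the usual H-Anim hanim_ prefix convention.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class RouteGeneratorSketch {
        public static void main(String[] args) {
            // Hypothetical keyframes: joint name -> axis-angle keyValues at keys 0, 0.5, 1
            Map<String, String> keyframes = new LinkedHashMap<>();
            keyframes.put("l_shoulder", "1 0 0 0, 1 0 0 -1.2, 1 0 0 0");
            keyframes.put("r_hip",      "1 0 0 0, 1 0 0 0.6, 1 0 0 0");

            StringBuilder vrml = new StringBuilder("DEF Timer TimeSensor { cycleInterval 2.0 }\n");
            for (Map.Entry<String, String> e : keyframes.entrySet()) {
                String interp = e.getKey() + "_rot";
                vrml.append("DEF ").append(interp)
                    .append(" OrientationInterpolator { key [ 0 0.5 1 ] keyValue [ ")
                    .append(e.getValue()).append(" ] }\n")
                    .append("ROUTE Timer.fraction_changed TO ").append(interp).append(".set_fraction\n")
                    .append("ROUTE ").append(interp).append(".value_changed TO hanim_")
                    .append(e.getKey()).append(".set_rotation\n");
            }
            System.out.print(vrml);
        }
    }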

10 Semantic representations

11 Lexical Visual Semantic Representation (LVSR)
- A semantic representation sitting between language syntax and 3D models
- Based on Jackendoff's LCS, adapted to the task of language visualisation and enhanced with Schank's scripts
- Ontological categories: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, PROPERTY
  - OBJ: props/places (e.g. buildings)
  - HUMAN: human beings and other articulated, animated characters (e.g. animals), as long as their skeleton hierarchy is defined
  - EVENT: actions, movements and manners
  - STATE: static existence
  - PROPERTY: attributes of OBJ/HUMAN
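As a worked example of the LVSR categories, the sketch below builds a term for "John ran into the kitchen" in a Jackendoff-style notation. The Term record and the exact bracketing are assumptions for illustration, not CONFUCIUS' concrete data structures.

    import java.util.List;

    public class LvsrSketch {
        record Term(String category, String head, List<Term> args) {  // category: EVENT, HUMAN, OBJ, PATH, PLACE, ...
            @Override public String toString() {
                if (args.isEmpty()) return "[" + category + " " + head + "]";
                return "[" + category + " " + head + " ("
                     + String.join(", ", args.stream().map(Term::toString).toList()) + ")]";
            }
        }
        static Term t(String category, String head, Term... args) { return new Term(category, head, List.of(args)); }

        public static void main(String[] args) {
            // "John ran into the kitchen"
            Term event = t("EVENT", "run",
                           t("HUMAN", "John"),
                           t("PATH", "to", t("PLACE", "in", t("OBJ", "kitchen"))));
            System.out.println(event);
            // prints: [EVENT run ([HUMAN John], [PATH to ([PLACE in ([OBJ kitchen])])])]
        }
    }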

12 PATH & PLACE predicates
- Interpret the spatial movement of OBJs/HUMANs
- 62 common English prepositions
- 7 PATH predicates & 11 PLACE predicates
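The slide does not list the predicates themselves, so the sketch below only illustrates the idea: a handful of prepositions mapped onto PATH/PLACE predicates in Jackendoff-style notation. The specific predicate names and mappings are assumptions; CONFUCIUS' actual inventory covers 62 prepositions with 7 PATH and 11 PLACE predicates.

    import java.util.Map;

    public class PrepositionMapSketch {
        static final Map<String, String> PREPOSITIONS = Map.of(
            "in",     "PLACE in(OBJ)",
            "on",     "PLACE on(OBJ)",
            "under",  "PLACE under(OBJ)",
            "into",   "PATH to(in(OBJ))",      // a PATH ending at a PLACE
            "onto",   "PATH to(on(OBJ))",
            "from",   "PATH from(at(OBJ))",
            "toward", "PATH toward(OBJ)",
            "across", "PATH via(on(OBJ))");

        public static void main(String[] args) {
            PREPOSITIONS.forEach((prep, pred) -> System.out.println(prep + " -> " + pred));
        }
    }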

13 NLP in CONFUCIUS
[Pipeline diagram. Stages: pre-processing, part-of-speech tagging, morphological parsing, syntactic parsing, coreference resolution, disambiguation, semantic inference, and temporal reasoning (lexical and post-lexical temporal relations). Resources: the Connexor FDG parser, WordNet, the LCS database and FEATURES.]
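The ordering of these stages can be shown as a toy pipeline. Each lambda below is only a stand-in that tags the text with the name of the stage, since the real components (the Connexor FDG parser, WordNet, the LCS database) are external tools; nothing here reflects their actual APIs.

    import java.util.List;
    import java.util.function.UnaryOperator;

    public class NlpPipelineSketch {
        public static void main(String[] args) {
            List<UnaryOperator<String>> stages = List.of(
                text -> text.trim(),                          // pre-processing
                text -> text + " [POS + morphology]",         // stand-in for FDG tagging/morphological parsing
                text -> text + " [dependency parse]",         // stand-in for FDG syntactic parsing
                text -> text + " [senses disambiguated]",     // stand-in for WordNet/LCS disambiguation & inference
                text -> text + " [coreference resolved]",     // stand-in for coreference resolution
                text -> text + " [temporal relations]");      // stand-in for temporal reasoning

            String result = " John pushed the door. ";
            for (UnaryOperator<String> stage : stages) result = stage.apply(result);
            System.out.println(result);
        }
    }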

14 Visual valency & verb ontology
2.2.1. Human action verbs
  2.2.1.1. One visual valency (the single role is a human; (partial) movement)
    2.2.1.1.1. Biped kinematics: arm actions (wave, scratch), leg actions (walk, jump, kick), torso actions (bow), combined actions (climb)
    2.2.1.1.2. Facial expressions & lip movement, e.g. laugh, fear, say, sing, order
  2.2.1.2. Two visual valency (at least one role is a human)
    2.2.1.2.1. One human and one object (vt., or vi. + instrument), e.g. throw, push, kick, open, eat, drink, bake, trolley
    2.2.1.2.2. Two humans, e.g. fight, chase, guide
  2.2.1.3. Visual valency ≥ 3 (at least one role is a human)
    2.2.1.3.1. Two humans and one object (incl. ditransitive verbs), e.g. give, show
    2.2.1.3.2. One human and 2+ objects (vt. + object + implicit instrument/goal/theme), e.g. cut, write, butter, pocket, dig, cook
  2.2.1.4. Verbs without distinct visualisation out of context: verbs of trying, helping, letting, creating/destroying
  2.2.1.5. High-level behaviours (routine events) and political/social activities, e.g. interview, eat out (go to a restaurant), go shopping
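A small sketch of how visual valency could be recorded per verb, using verbs from the ontology above. The VerbEntry record and the lexicon layout are assumptions for illustration.

    import java.util.Map;

    public class VisualValencySketch {
        record VerbEntry(int valency, String roles) {}

        static final Map<String, VerbEntry> LEXICON = Map.of(
            "walk",  new VerbEntry(1, "one human (biped kinematics)"),
            "laugh", new VerbEntry(1, "one human (facial expression)"),
            "push",  new VerbEntry(2, "one human + one object"),
            "chase", new VerbEntry(2, "two humans"),
            "give",  new VerbEntry(3, "two humans + one object (ditransitive)"),
            "write", new VerbEntry(3, "one human + object + implicit instrument"));

        public static void main(String[] args) {
            LEXICON.forEach((verb, e) ->
                System.out.println(verb + ": visual valency " + e.valency() + " (" + e.roles() + ")"));
        }
    }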

15 Level-of-Detail (LOD): basic-level verbs & troponyms
[Verb hierarchy diagram under EVENT:
  event-level verbs: go, run, cause, …
  manner-level verbs: walk, climb, jump
  troponym-level verbs: limp, stride, swagger, trot, …, skip, bounce, hop, jog, romp]
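One way to exploit this hierarchy is to decompose a troponym into a manner-level verb plus manner parameters, so a single basic animation can be reused at different levels of detail. Which base verb each troponym refines and the parameter values below are assumptions for illustration only.

    import java.util.Map;

    public class VerbLodSketch {
        record Manner(String baseVerb, double speed, double strideLength) {}   // speed in m/s, stride in m (illustrative)

        static final Map<String, Manner> TROPONYMS = Map.of(
            "stride",  new Manner("walk", 1.6, 1.2),
            "limp",    new Manner("walk", 0.6, 0.5),
            "swagger", new Manner("walk", 1.0, 0.8),
            "hop",     new Manner("jump", 0.8, 0.3));

        public static void main(String[] args) {
            Manner m = TROPONYMS.get("stride");
            System.out.println("stride -> " + m.baseVerb()
                + " (speed " + m.speed() + " m/s, stride " + m.strideLength() + " m)");
        }
    }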

16 Current status of implementation
- Collision detection example (contact verbs: hit, collide, scratch, touch)
  - "The car collided with a wall."
  - Uses ParallelGraphics' VRML extension for object-to-object collision
  - Non-speech sound effects
- H-Anim examples
  - 3-visual-valency verbs: "John put a cup of coffee on the table." (H-Anim Site nodes as locative tags of objects, e.g. an on_table tag for the table object)
  - 2-visual-valency verbs: "John pushed the door." "John ate the bread." "Nancy sat on the chair."
  - 1-visual-valency verbs: The waiter came to me: "Can I help you, Sir?" (speech modality & lip synchronization; camera direction from the avatar's point of view)
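The Site-node locative tags used for "John put a cup of coffee on the table" can be sketched as a lookup from object to named sites; apart from the on_table tag mentioned on the slide, the site names and all coordinates below are assumptions for illustration.

    import java.util.Map;

    public class SiteTagSketch {
        record Site(double x, double y, double z) {}

        static final Map<String, Map<String, Site>> OBJECT_SITES = Map.of(
            "table", Map.of("on_table", new Site(0.0, 0.75, 0.0)),     // tag from the slide
            "cup",   Map.of("handle",   new Site(0.04, 0.05, 0.0)));   // hypothetical extra tag

        public static void main(String[] args) {
            Site target = OBJECT_SITES.get("table").get("on_table");
            System.out.printf("Place the cup at (%.2f, %.2f, %.2f) and target John's reach animation there.%n",
                              target.x(), target.y(), target.z());
        }
    }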

17 Relation to other work
- Domain-independent, general-purpose humanoid character animation
- CONFUCIUS' character animation focuses on the language-to-humanoid-animation process rather than on human modelling & motion alone
- An implementable semantic representation, LVSR, connecting linguistic semantics to visual semantics and suitable for action execution (animation)
- Categorization and visualisation of eventive verbs based on visual valency
- A reusable common-sense knowledge base to elicit implied actions, instruments, goals and themes underspecified in the language input

18 Conclusion & future work
Prospective applications
- Children's education
- Multimedia presentation
- Movie/drama production
- Computer games
- Virtual reality
Conclusions
- Humanoid animation explores problems in language visualization & automatic animation production
- Formalizes the meaning of action verbs and spatial prepositions
- Maps language primitives to visual primitives
- Provides a reusable common-sense knowledge base for other systems
Further work
- Discourse-level interpretation
- Action composition for simultaneous activities
- Verbs concerning multiple characters' synchronization & coordination (e.g. introduce)

