
1 REVEAL THIS and the COSMOROE cross-media relations framework
Katerina Pastra
Language Technology Applications Department
Institute for Language and Speech Processing (ILSP), “Athena” Research Centre, Athens, Greece

2 ILSP/LTA Goals
Basic and applied research in the field of Natural Language Processing, focusing on the design of computational models for natural language recognition and understanding, with application to three interwoven tracks:
- information processing, extraction & retrieval
- multilingual information processing (multilingual applications & translation systems)
- multimedia information processing (fusion of language with other modalities)

3 Areas of activity
- language-aware (EL mainly, but also EN) information retrieval and information extraction, text mining, knowledge management, etc.
- technology amenable to multilingual and multimedia information processing (machine translation, cross-media indexing and retrieval, multimedia summarization)
- methods and techniques from machine learning, statistical and rule-based processing, as well as hybrid combinations

4 Research and Development directions
- Aiming at enhancing the capacity to process multilingual multimedia content
- Enabling fusion of unimodal (text, speech, image) processing results, in order to better understand the workings of language, information access and communication phenomena
- Preparing for the important role of language technologies in the forthcoming full-fledged convergence of information and edutainment channels (TV, radio, web)

5 Research and Development projects (1): Multilingual Information Processing
- Use parallel corpora for automatically acquiring bilingual lexica in EN–EL
- Employ contextual information for lexical transfer selection
- Use the annotated parallel corpus and the automatically extracted lexica to build a statistical machine translation infrastructure
Projects: TRAID (translation memory / Machine Translation toolkit)

6 Research and Development projects (2): Improving machine-assisted subtitling in a universal access framework
- Investigate the cognitive models underlying human subtitling and implement the appropriate computational architectures
- Integrate image processing to improve video segmentation and recognise subtitle units
- Investigate the extent to which existing subtitle generation methods are portable and can be parameterised across special classes of viewers, e.g. children
Projects: MUSA /IST

7 Research and Development projects (3): Multimedia indexing and retrieval
- Augment the content of multimedia documents with high-level semantic indexical information (e.g. names of entities, terms, topics, facts)
- Develop cross-media and cross-language representations to enable linking of topically relevant video programmes, web texts and images
- Build high-level functionalities like semantic search, retrieval, filtering, categorization, translation, summarization
Projects: CIMWOS /IST, REVEAL THIS /IST

8 Research Focus on Multimedia
- Multimedia discourse relations (the COSMOROE framework); applications: cross-media indexing and retrieval, segmentation of audiovisual data, multimedia summarization
- Sensorimotor & Symbolic Integration Resources: ongoing work on building an extensible computational resource that associates symbolic representations (words/concepts) with corresponding sensorimotor representations, enriched with patterns of combinations among these representations for forming conceptual structures at different levels of abstraction; the focus is on human action and interaction in everyday life. Going bottom-up in the resource (from sensorimotor representations to concepts), one gets a hierarchical composition of human behaviour, while going top-down (from concepts to sensorimotor representations), one gets intentionally-laden interpretations of those structures.

9 Cross-Media Decision Mechanisms
Mechanisms that decide on the relation that holds between medium-specific pieces of information:
- across documents (Boll et al. 1999)
- within documents (Pastra 2006)
The mechanisms decide whether medium-specific pieces of information within the same multimedia document are associated (multimedia integration), complementary, or semantically compatible/incompatible, i.e. whether the relation is one of equivalence, complementarity, or independence.

10 Cross-media Relation Examples
- Equivalence: “the yellow taxi-boats…”
- Essential complementarity: “…[pollution has taken its toll] on that…”
- Non-essential complementarity: “…we are heading to Patmos…”
- Independence: “…I have finally found a place that’s not overrun by tourists…”

11 Cross-media relations
- Equivalence: info expressed by different media refers to the same entity (object, state, event or property)
- Complementarity: info in one medium is an (essential or non-essential) complement of the info expressed in another. Essential complementarity is usually indicated through association signals (e.g. indexicals); in non-essential complementarity, info in one medium is a modifier/adjunct of info expressed in another
- Independence: each medium carries an independent (but coherent) part of the multimedia message; incoherence is due to errors in medium-specific processing or to artistic/editorial reasons

12 Application example: a cross-media indexer’s decisions
- Equivalence: “the yellow taxi-boat…”
- Essential complementarity: “…[pollution has taken its toll] on this…”
- Non-essential complementarity: “…we are heading to Patmos…”
- Independence: “…I have finally found a place that’s not overrun by tourists…”
(The slide’s diagram marks each example with the index-combination operator it triggers, e.g. “and”, “or/and”, “choice”.)
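The indexer’s decisions on this slide can be sketched as a small rule table: the relation detected between a text phrase and a visual element determines how their index terms are combined. The function and rules below are illustrative assumptions for exposition, not the REVEAL THIS implementation.

```python
# Illustrative sketch: how a cross-media indexer might combine index
# terms from speech/text and video, depending on the COSMOROE relation
# detected between them. Rule names and behaviour are assumptions.

def combine_index_terms(relation, text_terms, visual_terms):
    """Return the index entries produced for one text-visual pair."""
    if relation == "equivalence":
        # Both media refer to the same entity: merge into one entry.
        return [sorted(set(text_terms) | set(visual_terms))]
    if relation in ("essential_complementarity", "non_essential_complementarity"):
        # One medium completes the other: index the conjunction.
        return [sorted(text_terms + visual_terms)]
    if relation == "independence":
        # Each medium carries its own part of the message: separate entries.
        return [sorted(text_terms), sorted(visual_terms)]
    raise ValueError(f"unknown relation: {relation}")

entries = combine_index_terms("equivalence", ["taxi-boat", "yellow"], ["taxi-boat"])
print(entries)  # [['taxi-boat', 'yellow']]
```

The design point is only that the relation label, not the media themselves, drives the indexing decision.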

13 Cross-Media Interaction Relations
- Intelligent multimedia systems (IMMS) need mechanisms for analysing and generating semantic links between different modalities (Andre and Rist 94, Feiner and McKeown 93, Green 02, Gut et al. 02, Martin and Kipp 04, etc.)
- Focus: either image-language or gesture-language
- Semiotics: seminal analyses by Barthes 84 (image-text) and Kendon 04 (gesture-language)
- Automation of relation identification is restricted to equivalence/association relations (cf. e.g. Barnard et al. 03), mainly between images and text
- Criticism: beyond different wording, different perspectives, and different (or lack of clear) criteria, all attempts to define cross-media relations incorporate a qualitative notion of “contribution” of each medium to the message; some of them employ Rhetorical Structure Theory (Mann & Thompson 87)

14 The case against RST for describing multimedia discourse (1)
The nucleus vs. satellite distinction (and the related notion of “contribution”) is inappropriate because:
- it relies on a single, unique message-reading directionality: language manifests itself linearly in time and space, vs. dynamic multimedia that are parallel in space and time (cf. AV data), vs. static multimedia that are perceived linearly but not in a strictly pre-determined, unique order (cf. illustrated documents)
- its identification usually relies on lexical cues and syntactic patterns: such subtle cues are abundant in language to denote relations between text segments, but only very few denote relations between language and other modalities
- it presumes that segments are comparable in size: interacting modality units are not comparable (e.g. sentence vs. image region, word vs. sequence of frames, etc.)

15 Example 1: “getting around the island…”
RST Relation | Nucleus | Satellite
Purpose | I drove a moped | for getting around the island
Means | I got around the island | by driving a moped

16 Example 2: “I got around the island by driving a moped”

17 The case against RST for describing multimedia discourse (2)
- No compliance with media characteristics:
image characteristics: specificity, lack of subtle focus/salience indicators and of explicit abstraction mechanisms;
language characteristics: abstraction, meta-language functions, lack of direct access to sensorimotor entities.
Cf. the RST relation “Elaboration”: the satellite presents additional detail about the content of the nucleus (e.g. a member of a set, an instance of an abstraction, an attribute of an object, something specific in a generalisation). But images always present more details…
- Lack of descriptive power and computational applicability:
the mutual exclusiveness of RST relation categories is inappropriate for capturing intentionality (Moore & Pollack 92);
fuzzy definitions of relations make the manual annotation of data for training systems to identify the relations automatically problematic (low inter-annotator agreement; cf. Carlson et al. 03)

18 Refining the relation set

19 Token-token equivalence
Semantic equivalence in which one modality refers to exactly the same entity that the other also refers to.
Example: “…helmet for safety...”

20 Type-token equivalence
Semantic equivalence in which one modality refers to a class of entities and the other to one or more representative members of the class.
Example: “…the ever-increasing population of Athens has the city bursting at the seams and has created a vast concrete sprawl of housing...”

21 Metonymy
The two referents come from the same domain and have the same array of associations; there is no transfer of qualities from one to the other. The two modalities refer to different entities, but the user intends them to be considered semantically equivalent.
Example: “The city, of course, is Athens, and it is here that I will begin my exploration of modern Greece.”

22 Metaphor
The two modalities refer to different entities of different domains; the user intends the two modalities to be considered semantically equivalent, and there is a transfer of qualities.
Example: “It’s very serene…”

23 Equivalence Signal
Equivalence signals present in discourse indicate that one modality is essentially complemented by the other.
Example: “Do you see the black at the top of the ceiling there?”

24 Defining Apposition
One modality provides extra information to another, information that identifies or describes something/someone and which, being vital for the clear comprehension of the message, is defining.
Example: “the president…”
Non-Defining Apposition
One modality provides extra information to another, information that identifies or describes something/someone and which is not vital for the clear comprehension of the message.
Example: “the deceased handcuffed”. Apart from a type-token equivalence relation (“deceased” vs. the image of the victim; only part of the image/body is shown here), one may identify an apposition relation too: e.g. the tattoo on the hand of the man is extra, descriptive information, complementary to the textual discourse, but not necessarily vital for comprehension.
Note: apposition is different from the type-token equivalence relation! (e.g. Bush is an instance of a president not generally, but in a certain time and space)
By nature, images will usually give such information; some applications rely on the identification of extra information that originally seemed unimportant (and was therefore not present in the textual discourse) but was later considered significant, e.g. crime scene investigation applications.

25 Exophora
A pragmatic “anaphora” case.
Example: “…The city is a jumble of the ancient and the modern...”

26 Adjunct
Non-essential complementarity: one modality functions as an adjunct to the other (place-position, place-direction, time, manner).
Example: “…we are heading to Patmos…” (OCR: “Acropolis”)

27 Symbiosis
Each modality expresses different pieces of information, the conjunction (in time) of which serves phatic communication (visual fillers, cf. speech fillers).
Example: “Athens has been described as the last city in the West and the first city in the East. It's a place that is rich and spectacular in its history. Many empires have held it in their sway: Romans, Venetians, Turks and Byzantines, and the result is a cosmopolitan city of three and a half million people.”

28 Meta-information
One modality expresses information that comments on aspects of what the other expresses, going beyond the message communicated to creation-related comments (who created the message, when, why, how; cf. typical archival metadata).
Example: “an aerial view of Athens….”
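The refined relation set of slides 19–28 can be summarised as a small taxonomy. The grouping below is this sketch’s reading of the slide definitions, not an official COSMOROE specification, so treat the family assignments as assumptions.

```python
# COSMOROE cross-media relation taxonomy, reconstructed from the slide
# definitions; the grouping is an assumption made for this sketch.
COSMOROE_RELATIONS = {
    "equivalence": [
        "token-token equivalence",  # same entity in both modalities
        "type-token equivalence",   # class vs. member(s) of the class
        "metonymy",                 # same domain, no transfer of qualities
        "metaphor",                 # different domains, transfer of qualities
    ],
    "essential complementarity": [
        "equivalence signal",       # discourse signals ("...the black ... there")
        "defining apposition",      # extra info vital for comprehension
        "exophora",                 # pragmatic "anaphora" to the visual scene
    ],
    "non-essential complementarity": [
        "non-defining apposition",  # extra info, not vital
        "adjunct",                  # place, direction, time, manner
    ],
    "independence": [
        "symbiosis",                # phatic conjunction of different info
        "meta-information",         # creation-related comments (metadata)
    ],
}

def top_level(fine_relation):
    """Map a fine-grained relation to its top-level family, or None."""
    for family, members in COSMOROE_RELATIONS.items():
        if fine_relation in members:
            return family
    return None

print(top_level("metaphor"))  # equivalence
```

A mapping like this is what lets an application reason at either granularity, e.g. an indexer may only need the four families while an annotation tool exposes the fine-grained labels.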

29 Annotating Corpora with COSMOROE (2)
- Tool: ANVIL (Kipp 00)
- Levels of association (local context, different granularity)
- Annotation levels:
  - Audiovisual Topic
  - Transcript (manual SR, subtitles, manual OCR)
  - Body movement, with indication of body part (hands, head, legs, whole body) and type (deictic, iconic, emblem, beat, metaphoric)
  - Images: Frame-Sequence (foreground, background, both); Keyframe-region (bounding box, free-text label, moving vs. static object indication, corresponding Frame-Sequence)
  - Relations binding AnchorText entities with movement(s), image(s), etc.

30

31 Annotation Objectives
- To test the theory for coverage and applicability
- To answer questions on the semantics of multimedia discourse, e.g.:
  - In which cases is “what one sees” not “what one hears” in discourse?
  - Which concepts are usually visualised in accompanying images or expressed through gestures in discourse, and what is their level of abstraction?
  - How is the interaction between modalities signalled?
  - Which concepts are usually complemented with visual or gestural adjuncts? Could it be that one may predict the selectional restrictions for the arguments of a predicate when knowing its visual/gestural complements (and vice versa)?
  - How is exophora realised? Could one use anaphora resolution mechanisms to resolve exophora?
- To use machine learning (ML) for automating relation identification for different applications
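As a toy illustration of the last objective, relation identification can be framed as supervised classification over cross-media features. The features (temporal overlap, lexical match with the visual label, presence of a deictic signal), the training values and the nearest-prototype method below are all placeholders, not the project’s actual ML setup.

```python
import math

# Toy nearest-prototype classifier for cross-media relation
# identification. Feature vectors and labels are invented examples.

TRAINING = [
    # (temporal_overlap, label_word_match, deictic_signal) -> relation
    ((0.9, 1.0, 0.0), "equivalence"),
    ((0.8, 0.9, 0.0), "equivalence"),
    ((0.7, 0.2, 1.0), "essential complementarity"),
    ((0.6, 0.1, 1.0), "essential complementarity"),
    ((0.1, 0.0, 0.0), "independence"),
    ((0.2, 0.1, 0.0), "independence"),
]

def prototypes(samples):
    """Average the feature vectors of each relation class."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {l: tuple(v / counts[l] for v in acc) for l, acc in sums.items()}

def classify(vec, protos):
    """Return the class whose prototype is nearest in Euclidean distance."""
    return min(protos, key=lambda l: math.dist(vec, protos[l]))

protos = prototypes(TRAINING)
print(classify((0.85, 0.95, 0.0), protos))  # equivalence
```

Annotated COSMOROE corpora would supply the real training pairs; the point of the sketch is only the pipeline shape: features per text-visual pair in, a relation label out.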

32 Future Work
- First phase: 5 hours Greek, 5 hours English (to be reached by July 07)
- Investigation of the phenomenon of “entailment” in multimedia discourse in the above dataset (internal collaboration with Stelios Piperidis, ILSP)
- Cognitive experimentation on coherence relations in multimedia discourse, with the notion of degree of fit between modalities as an indicator of coherence (collaboration with Trinity College Dublin, Carl Vogel)
- Machine learning for automatic identification of relations for indexing of audiovisual files in an extended dataset

