MediaView -- Towards a Semantic Multimedia Database Model Qing Li Dept of Computer Science City University of Hong Kong.

1 MediaView -- Towards a Semantic Multimedia Database Model Qing Li Dept of Computer Science City University of Hong Kong

2 Outline Motivation & Introduction Modeling Constructs Logical Implementation Real-World Applications Conclusion

3 State-of-the-art Multimedia Systems and Applications an explosive growth in recent years demand on managing multimedia using databases Database techniques for multimedia data modeling indexing query processing presentation & synchronization

4 Semantic Gap Semantics-intensive multimedia systems & applications require the semantic meaning of the data; non-semantic multimedia data models model only raw data and primitive properties (size, format, etc.). The mismatch between the two is the semantic gap.

5 Semantic modeling of multimedia -- Why hard? Context-dependency: semantics is not a static, intrinsic property. The semantics of an object often depends on: the application/user who manipulates the object; the role that the object plays; other objects in the same context. Example: "Van Gogh's paintings", "flower".

6 Why hard? (cont.) Modality-independency: media objects of different modalities may suggest similar or related semantic meanings. Example: Harry Potter has never been the star of a Quidditch team, scoring points while riding a broom far above the ground. He knows no spells, has never helped to hatch a dragon, and has never worn a cloak of invisibility. Query: Results: image, video, text

7 MediaView – A Semantic Bridge An object-oriented view mechanism that bridges the semantic gap between multimedia systems and databases. Core concept – media view (MV): a customized context for semantic interpretation of media objects (text docs, images, video, etc.). Media views collectively constitute the conceptual infrastructure of a multimedia system & application

8 Architecture MediaView Mechanism

9 Fundamentals of MediaView Basic concepts – class vs. MV View operators – basic functions of MV View algebra – derivations of MV Comparison – other dynamic data models

10 Basic Concepts Definition 1: Let C be the set of base classes. A base class Ci ∈ C has a unique class name, a type description, and a set of objects associated with it. The type of Ci is referred to as type(Ci), which defines a set of properties as the common interface of all the instances of Ci. The set of properties is referred to as properties(Ci), and each property in it can be a value of a simple type, an instance of a certain class, or a method. The set of objects associated with Ci is defined as extent(Ci) = {o | o ∈ Ci}.

11 Basic Concepts Definition 2: A media view MVi is a virtual class that has a unique view name, a type description, and a set of objects associated with it. The type of MVi is referred to as type(MVi), which defines a set of properties, properties(MVi), as the common interface of all its instances. Similarly, a property can be a value of a simple type, an instance of a media view, or a method. The set of objects associated with MVi is defined as extent(MVi) = {o | o ∈ MVi}.

12 Basic Concepts So, a media view MVi can be represented as a triple: MVi = ⟨Mi, Pi, Ri⟩, where: Mi - a set of objects that are included into MVi as its members. Each object o ∈ Mi belongs to a certain source class, and different members of MVi may belong to different source classes. Pi - a set of properties (attributes and methods) applied either to MVi itself (Piv) or to all the members (Pim). Ri - a set of relationships; each r ∈ Ri is in the form ⟨oj, ok, t⟩, which denotes a relationship of type t between members oj and ok in MVi; Ri itself may form a graph.

13 Basic Concepts Definition 3: A base class Ci is defined as a subclass of another base class Cj if and only if the following two conditions hold: (1) properties(Cj) ⊆ properties(Ci), and (2) extent(Ci) ⊆ extent(Cj). If Ci is a subclass of Cj, we also say that there is an is-a relationship from Ci to Cj. A base schema (BS) is a directed acyclic graph G = (V, E), where V is a finite set of vertices and E is a finite set of edges as a binary relation defined on V×V. Each element in V corresponds to a base class Ci. Each edge of the form e = ⟨Ci, Cj⟩ ∈ E represents an is-a relationship from Ci to Cj (i.e. Ci is a subclass of Cj).

14 Basic Concepts Definition 4: A media view MVi is a subview of another media view MVj (or there is an is-a relationship from MVi to MVj) if and only if properties(MVj) ⊆ properties(MVi) and extent(MVi) ⊆ extent(MVj). A view schema (VS) is a directed acyclic graph G = (V, E), where a vertex in V corresponds to a media view MVi, and an edge e = ⟨MVi, MVj⟩ ∈ E represents an is-a relationship from MVi to MVj (i.e. MVi is a subview of MVj).
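
The subview condition of Definition 4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `MediaView` dataclass and the `is_subview` helper are hypothetical names, properties are reduced to plain name sets, and members are plain object identifiers. The slides do not prescribe an implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaView:
    """Illustrative stand-in for a media view (members, properties, relationships)."""
    name: str
    members: frozenset = frozenset()        # object ids, possibly from different source classes
    properties: frozenset = frozenset()     # property names (simplifying assumption)
    relationships: frozenset = frozenset()  # (oj, ok, t) tuples, unused in this sketch


def is_subview(mv_i: MediaView, mv_j: MediaView) -> bool:
    """Definition 4: MVi is a subview of MVj iff
    properties(MVj) is a subset of properties(MVi) and
    extent(MVi) is a subset of extent(MVj)."""
    return mv_j.properties <= mv_i.properties and mv_i.members <= mv_j.members


painting = MediaView("painting", frozenset({"img1", "img2", "doc7"}), frozenset({"title"}))
van_gogh = MediaView("van_gogh_painting", frozenset({"img1", "doc7"}), frozenset({"title", "year"}))

print(is_subview(van_gogh, painting))  # True
print(is_subview(painting, van_gogh))  # False
```

The more specific view carries more properties but fewer members, which is why the two subset tests point in opposite directions.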

15 Basic Concepts An example …

16 Basic Concepts Semantics-based data reorganization via media views

17 Basic Concepts Definition 5: The semantic graph (SG) is an undirected graph G = (V, E), where V is a finite set of vertices and E is a finite set of edges. Each element Vi ∈ V corresponds to a multimedia object Oi in the database. E is a ternary relation defined on V×V×N. Each e = ⟨Vi, Vj, n⟩ ∈ E represents a semantic link of degree n between objects Oi and Oj, where n is the number of media views to which both objects belong. We define n as the correlation factor between Oi and Oj.

18 Basic Concepts Definition 6: The correlation matrix M = [Mij] is the adjacency matrix of the semantic graph. Specifically, each element Mij contains the correlation factor between Oi and Oj, with all the diagonal elements set to zero.
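
As an illustration of Definitions 5 and 6, the following Python sketch builds the correlation matrix by counting, for each pair of objects, the media views that contain both. The function name `correlation_matrix`, the object ids, and the view names are made up for the example; it also assumes every view member appears in the `objects` list.

```python
from itertools import combinations


def correlation_matrix(objects, views):
    """Return M where M[i][j] is the number of media views containing both
    objects[i] and objects[j]; diagonal entries stay zero (Definition 6)."""
    index = {obj: i for i, obj in enumerate(objects)}
    n = len(objects)
    matrix = [[0] * n for _ in range(n)]
    for members in views.values():
        # Every pair of objects sharing this view gains one unit of correlation.
        for a, b in combinations(sorted(members), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1  # the semantic graph is undirected
    return matrix


objects = ["o1", "o2", "o3"]
views = {
    "flower": {"o1", "o2"},
    "van_gogh": {"o1", "o2", "o3"},
}
M = correlation_matrix(objects, views)
print(M)  # [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```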

19 Basic Concepts Semantic Graph Model

20 View Operators A set of operators that take media views and view instances as operands. Our intention is not to come up with a complete set of operators, but to focus on those that are indispensable in supporting queries and navigation over multimedia objects.

21 View Operators (type-level)
V-overlap. Syntax: <boolean> := v-overlap(<mv1>, <mv2>). Semantics: true if and only if (∃o ∈ O)(o ∈ extent(<mv1>) and o ∈ extent(<mv2>)).
Cross. Syntax: {<object>} := cross(<mv1>, <mv2>). Semantics: {<object>} := {o ∈ O | o ∈ extent(<mv1>) and o ∈ extent(<mv2>)}.
Sum. Syntax: {<object>} := sum(<mv1>, <mv2>). Semantics: {<object>} := {o ∈ O | o ∈ extent(<mv1>) or o ∈ extent(<mv2>)}.
Subtract. Syntax: {<object>} := subtract(<mv1>, <mv2>). Semantics: {<object>} := {o ∈ O | o ∈ extent(<mv1>) and o ∉ extent(<mv2>)}.
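
The four type-level operators map directly onto set operations over view extents. A minimal Python sketch, assuming each extent is a plain set of object ids (the function names mirror the slide; `sum_` carries a trailing underscore only to avoid shadowing Python's built-in `sum`):

```python
def v_overlap(extent_a: set, extent_b: set) -> bool:
    """True iff at least one object belongs to both extents."""
    return not extent_a.isdisjoint(extent_b)


def cross(extent_a: set, extent_b: set) -> set:
    """Objects that belong to both extents."""
    return extent_a & extent_b


def sum_(extent_a: set, extent_b: set) -> set:
    """Objects that belong to either extent."""
    return extent_a | extent_b


def subtract(extent_a: set, extent_b: set) -> set:
    """Objects in the first extent but not in the second."""
    return extent_a - extent_b


a = {"img1", "img2", "vid1"}
b = {"img2", "doc1"}

print(v_overlap(a, b))         # True
print(sorted(cross(a, b)))     # ['img2']
print(sorted(sum_(a, b)))      # ['doc1', 'img1', 'img2', 'vid1']
print(sorted(subtract(a, b)))  # ['img1', 'vid1']
```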

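22 View Operators (instance-level)
Class. Syntax: <mv> := class(<view instance>). Semantics: <view instance> is an instance of <mv>.
Components. Syntax: {<object>} := components(<view instance>). Semantics: {<object>} := {o ∈ O | o is a component (direct or indirect) of <view instance>}.
I-overlap. Syntax: <boolean> := i-overlap(<view instance1>, <view instance2>). Semantics: true if and only if (∃o ∈ O)(o ∈ components(<view instance1>) and o ∈ components(<view instance2>)).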

23 View Algebra Functions -- derivation of new MVs from existing MVs Heuristic Enumeration 1. Blind enumeration 2. Content-based enumeration 3. Semantics-based enumeration

24 View Algebra Definition 7. The n-level correlation matrix M(n) is derived from correlation matrix M by the following formula: where n is a positive integer and k (0

25 View Algebra Algebra Operators: select … from src-MV where …; project … from src-MV; intersect (src-MV1, src-MV2); union (src-MV1, src-MV2); difference (src-MV1, src-MV2)

26 Comparison (vs. class)
Aspect | media view | object class
membership | heterogeneous objects | uniform objects
member acquisition | dynamic inclusion/exclusion of existing objects of other classes | creating new objects
mapping | one object can belong to multiple media views | one object has exactly one class
relationship | inter-member semantic relationship | N/A

27 Comparison (vs. traditional object view)
Aspect | media view | object view
membership | heterogeneous objects | uniform objects
relationship | inter-member semantic relationship | N/A
member properties | instance-level properties (user-defined) | inherited or derived properties (for view instances)
global properties | MV-level properties (user-defined) | N/A

28 Logical Implementation MediaView Construction MediaView Customization MediaView Evolution

29 MediaViews Construction Work with CBIR systems to acquire the knowledge from queries Learn from previously performed queries A multi-system approach to support multi-modality of media objects Organize the semantics by following WordNet

30 Why WordNet? Queries may vary greatly because users are free to choose their own query keywords. We need an approach to organize this knowledge into a logical structure. A simple context: a concept in WordNet. Common media views: correspond to simple contexts. We provide all common media views, based on which users can build complex ones.

31 Navigating the Multimedia Database Navigating via semantic relationships of WordNet
Semantic Relationship | Examples
Synonymy (similar) | pipe, tube
Antonymy (opposite) | fast, slow
Hyponymy (subordinate) | tree, plant
Meronymy (part) | chimney, house
Troponymy (manner) | march, walk
Entailment | drive, ride

32 Navigating the Multimedia Database

33 MediaViews Construction

34 Multi-dimensional Semantic Space IS-A relationship in thesaurus For example, Season has a 4-dimension semantic space [ spring, summer, autumn, winter ]

35 Encoding with Probabilistic Tree A Probabilistic Tree specifies the probability of one media object semantically matching a certain concept in thesaurus.

36 Encoding with Probabilistic Tree Procedure: Step i: Following the thesaurus, trace from the target concept C1 to the root concept Root. Assume the path is ⟨C1, C2, …, Cn⟩ with Cn = Root. Start from CC = Cn and initially set P = 1. Step ii: Suppose CC = Ci, and the next concept Ci-1 is one of the k sub-concepts of Ci. If CC is encoded in the Probabilistic Tree of this media object, P is updated by one formula; if not, by another (neither formula is preserved in this transcript). Then move CC to Ci-1. Step iii: If CC has not reached C1, repeat Step ii. Otherwise, P is the probability of the media object matching concept C1.
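
The procedure above can be sketched in Python, with one important caveat: the two update formulas in Step ii are not preserved in the transcript, so both are assumptions here. The sketch multiplies P by the encoded branch probability when the current concept is encoded, and by a uniform 1/k otherwise. The function name `match_probability`, the dictionary layout, and the toy thesaurus are all hypothetical.

```python
def match_probability(target, parent_of, children_of, encoded):
    """Walk from the root down to `target`, multiplying branch probabilities.

    parent_of:   concept -> parent concept (the root maps to None)
    children_of: concept -> list of its sub-concepts
    encoded:     concept -> {sub-concept: probability}, for the concepts
                 encoded in this media object's probabilistic tree
    """
    # Step i: trace target -> root, giving the path C1, ..., Cn.
    path = [target]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])

    p = 1.0
    # Steps ii-iii: descend from Cn (the root) towards C1.
    for i in range(len(path) - 1, 0, -1):
        current, nxt = path[i], path[i - 1]
        if current in encoded:
            # ASSUMPTION: use the encoded branch probability (0 if absent).
            p *= encoded[current].get(nxt, 0.0)
        else:
            # ASSUMPTION: spread evenly over the k sub-concepts.
            p *= 1.0 / len(children_of[current])
    return p


children_of = {
    "time": ["season", "day"],
    "season": ["spring", "summer", "autumn", "winter"],
}
parent_of = {"time": None, "season": "time", "day": "time",
             "spring": "season", "summer": "season",
             "autumn": "season", "winter": "season"}
encoded = {"season": {"spring": 0.7, "summer": 0.3}}

print(round(match_probability("spring", parent_of, children_of, encoded), 2))  # 0.35
```

In this toy thesaurus, "time" is not encoded, so the step from "time" to "season" contributes 1/2, and the encoded step from "season" to "spring" contributes 0.7.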

37 Evolution through Feedback A progressive approach MediaView is accumulated along with the processes of user interactions Two phases of feedback System-feedback User-feedback

38 Evolution through Feedback

39 Procedure: 1. Record each feedback performed by users. 2. For each CBIR system i involved, calculate its accuracy rate of retrieval, that is, divide the number of correct results (according to user feedback) by the total number of retrieved results. 3. Reset the value kept for each system to its accuracy rate. 4. Wait for the next session of user feedback.
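
Step 2 can be illustrated with a short Python sketch that interprets the accuracy rate as the fraction of a system's retrieved results that users marked correct. The log format `(system_id, retrieved_count, correct_count)` and the system names are assumptions made for the example.

```python
def accuracy_rates(feedback_log):
    """Aggregate feedback per CBIR system and return correct / retrieved.

    feedback_log: iterable of (system_id, retrieved_count, correct_count).
    """
    totals = {}
    for system_id, retrieved, correct in feedback_log:
        r, c = totals.get(system_id, (0, 0))
        totals[system_id] = (r + retrieved, c + correct)
    # A system with no retrieved results gets a rate of 0.0 (a choice made here).
    return {s: (c / r if r else 0.0) for s, (r, c) in totals.items()}


log = [("cbir_a", 10, 8), ("cbir_b", 20, 5), ("cbir_a", 10, 6)]
print(accuracy_rates(log))  # {'cbir_a': 0.7, 'cbir_b': 0.25}
```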

40 Fuzzy Logic based Evolution Approach Due to the uncertainty of semantics, we cannot make an absolute assertion that a media object is relevant or irrelevant to a context. A media object in a database may be retrieved as a relevant result to a context several times: the more times a media object is retrieved, the more confidently it can be considered relevant to the context.

41 Fuzzy Logic based Evolution Approach For a media object e and a context c, two quantities are used (their symbols are not preserved in this transcript): the accumulation of historical feedback information (from both system and users), and the adjustment of that accumulation after each feedback session

42 Inverse Propagation of Feedback The drawback of calculating the probability in a top-down fashion: e.g. deciding whether a media object matches season cannot leverage the fact that the media object was a match for spring. Solution: propagate the confidence value of a media object being relevant to a concept bottom-up along the hierarchical structure

43 Inverse Propagation of Feedback Procedure: 1. Wait for a feedback session. 2. For each positive feedback, namely one stating that a concept C is relevant to a media object, follow the thesaurus and trace from C to the root concept Root. Assume the path is ⟨C1, C2, …, Cn⟩. 3. Append each Ci as positive feedback to that media object as well, where i = 1 to n.
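
A minimal Python sketch of this bottom-up propagation, assuming the thesaurus is available as a child-to-parent mapping and positive feedback is stored as a set of concepts per media object (both representations, and the names `ancestors_inclusive` and `apply_positive_feedback`, are hypothetical):

```python
def ancestors_inclusive(concept, parent_of):
    """Return the concept followed by all its ancestors up to the root."""
    path = [concept]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path


def apply_positive_feedback(positive, media_object, concept, parent_of):
    """Record the concept and every concept above it as positive feedback."""
    for c in ancestors_inclusive(concept, parent_of):
        positive.setdefault(media_object, set()).add(c)
    return positive


parent_of = {"spring": "season", "season": "time", "time": None}
positive = {}
apply_positive_feedback(positive, "img1", "spring", parent_of)
print(sorted(positive["img1"]))  # ['season', 'spring', 'time']
```

With this in place, a later query for season can benefit from the earlier feedback that the object matched spring.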

44 MediaView Customization Two level MediaView Framework

45 MediaView Customization Dynamically construct complex-context-based media views based on simple ones An example complex context: the Grand Hall in City University Several user-level operators are devised to support more complex/advanced contexts, besides the basic operators

46 User-level Operators
INHERIT_MV(N: mv-name, NS: set-of-mv-refs, VP: set-of-property-ref, MP: set-of-property-ref): mv-ref
UNION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
INTERSECTION_MV(N: mv-name, NS: set-of-mv-refs): mv-ref
DIFFERENCE_MV(N1: mv-ref, N2: mv-ref): mv-ref

47 Build a MediaView in Run-time Example: find out info about "Van Gogh" Who is "Van Gogh"? What is his work? Know more about his whole life. Know more about his country. See his famous painting "sunflower"

48 Build a MediaView in Run-time
Who is "Van Gogh"? INHERIT_MV( V. Gogh, { },name= Van Gogh,);
What is his work? INTERSECTION_MV( work, {, vg});
Know more about his whole life. INTERSECTION_MV( life, {, vg});
Know more about his country. INTERSECTION_MV( country, {, vg});
See his famous painting "sunflower". Set sunflower = INTERSECTION_MV( sunflower, {, }); Set vg_sunflower = INTERSECTION_MV( vg_sunflower, {vg_work, sunflower});

49 Authoring Scenario Creates a new media view named after the subject All multimedia materials used in the document would be put into this MediaView for further reference. To collect the most relevant materials for authoring, the user performs the MediaView building process. Import suitable media objects by browsing media views Reference the manner and style of authoring, to find other media views with similar topics. Drag & Drop learning-from-references

50 Interface of Our Authoring System

51 System Features A Dynamic Environment Helps a user select materials from the database to incorporate into the document Query other similar media views for referencing the manner and/or style of authoring

52 Real-World Applications A Multimedia Recipe Database Modeling basis Personalized (context-aware) manipulation Cross-media indexing and retrieval system Novel way of annotating and retrieving media objects Lead to new indexing strategies

53 A Personalized Recipe Database System People cannot live without food. Existing recipe websites provide huge numbers of recipes from around the world, but they: fail to support analyzing and comparing recipes (what are the important cooking principles & skills; what makes two dishes taste so different, etc.); are unable to help users find similar recipes in a comprehensive manner (only keyword-based search on recipe names); fail to adapt recipes to real-world situations (e.g. lack of ingredients or user preference)

54 A Personalized Recipe Database System -- Our Contributions Propose a recipe model which encompasses static attributes as well as dynamic behaviours (e.g. cooking procedures and constraints) Present a novel perspective of evaluating the quality of a recipe by constructing and analysing its cooking graph (capture both action flows and data/ingredient flows) Provide a promising way to address the problem of recipe adaptation heuristically (with flexible and feasible solutions)

55 Recipe on the Web

56 Sample Recipe -- The Cooking Procedure of Triple Cheese Pasta Primavera
Step number | Recipe cooking procedure in steps
1 | Dice bell peppers. Slice squash and mushrooms.
2 | Cook pasta according to package directions in unsalted water.
3 | Meanwhile, in a large skillet melt butter. Add bell peppers; cook and stir occasionally until barely tender, about three minutes.
4 | Add squash and mushrooms; cook and stir occasionally until barely tender, about four minutes.
5 | Drain pasta; toss with vegetables in skillet.
6 | In the saucepot in which the spaghetti was cooked, combine ricotta, mozzarella, milk, Parmesan, Italian seasoning, salt and black pepper. Over a medium-low heat cook and stir cheese mixture just until hot, about 1 minute.
7 | Add reserved pasta and vegetables; toss to coat; remove to a serving platter.

57 Sample Recipe Parsing the Cooking Procedure of Triple Cheese Pasta Primavera

58 Recipe Model A recipe R is modeled and represented by a tuple of three elements: R = ⟨M, RP, SP⟩, where (a) M = {Mi | i = 1..m} – a set of ingredients. An ingredient Mi is either a basic ingredient or a set of ingredients: Mi = ⟨MID, MP⟩, where MID is a unique identity and MP is a set of member-level properties (and functions) such as the name, quantity and image. An ingredient Mi belongs to one of three classes: Main, Minor and Seasoning; (b) RP is a set of recipe-level properties (and functions) applied on R itself, such as the main cooking style, region, nutrition and images of the dish of the recipe;

59 Recipe Model (c) SP = (V, E, Cons, Ingr) is a labeled directed Cooking Graph. V = {vi | i = 1..n} is a set of nodes; each vi is a cooking action. Cooking action constraints: Cons(vi) – the associated constraint conditions that should be satisfied when the action of vi takes place, e.g. conditions on temperature and duration. E is a set of directed edges on V representing the temporal execution flow of the cooking actions, named action flows; an edge ⟨vi, vj⟩ means vj should take place after vi. Cooking transition constraints: Cons(vi, vj) – the conditions that should be satisfied for the flow to take place. Ingr(vi) – ingredients that should be added into vi. O(vi) – the output ingredients of vi. These inputs and outputs for the nodes are called ingredient flows.

60 Cooking Graph The Cooking Graph of Triple Cheese Pasta Primavera

61 Basic Properties Definition 1. (Reachability) A cooking graph is defined as reachable if each of its nodes is reachable; a node is reachable if it is on a directed path from a starting node to the end node. Definition 2. (Consistency) A cooking graph is defined to be consistent if the conditions for each node/edge are consistent (i.e. there exists an assignment to variables that makes the conditions true).
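
Reachability (Definition 1) can be checked with two breadth-first searches: one forward from the starting nodes and one backward from the end node. A node lies on some start-to-end path exactly when both searches visit it. The Python sketch below assumes the graph is given as a node list plus directed edge pairs and that the start and end nodes are supplied explicitly; the action names are illustrative and are not taken from the slide's cooking graph.

```python
from collections import deque


def _reach(sources, adjacency):
    """Return every node reachable from `sources` via `adjacency`."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_reachable_graph(nodes, edges, starts, end):
    """True iff every node lies on a directed path from a start node to `end`."""
    forward, backward = {}, {}
    for u, v in edges:
        forward.setdefault(u, []).append(v)
        backward.setdefault(v, []).append(u)
    from_start = _reach(starts, forward)  # nodes some start can reach
    to_end = _reach([end], backward)      # nodes that can reach the end
    return all(n in from_start and n in to_end for n in nodes)


nodes = ["dice", "cook_pasta", "saute", "toss", "serve"]
edges = [("dice", "saute"), ("cook_pasta", "toss"),
         ("saute", "toss"), ("toss", "serve")]
print(is_reachable_graph(nodes, edges, ["dice", "cook_pasta"], "serve"))  # True

# A dangling action that never leads to the end node breaks reachability.
print(is_reachable_graph(nodes + ["garnish"], edges + [("dice", "garnish")],
                         ["dice", "cook_pasta"], "serve"))  # False
```

Consistency (Definition 2) is not covered here, since it requires a constraint solver over the node and edge conditions.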

62 Constraints and Rules Definition 3. (Constraint) A constraint is a predicate followed by one or more terms, enclosed in parentheses and separated by commas; a term is either a constant, variable or function expression. Constraints specify all kinds of conditions or restrictions in the recipe model; Three categories: intra-recipe constraints, inter-recipe constraints and outer-recipe constraints. Incompatible(Spinach, Tofu) says spinach and tofu are incompatible and should not be cooked together.

63 Constraints and Rules Definition 4. (Rule) A rule is a logical implication of the form If Ф Then Ψ (or, Ф → Ψ), where Ф and Ψ are sentences. Rules are used to: validate the correctness of a recipe through a reasoning and recognition process; handle complex situations, such as making a necessary adjustment or compensation once an improper cooking action occurs; describe cooking skills that have been widely accepted and commonly used. Over_Put(salt) → Add(vinegar|water) says that if too much salt has been put into a dish, then neutralize the salty taste by adding either vinegar or water.

64 Recipe Cooking Graph Mining Pattern: some subgraphs occur in one or more cooking graphs and have a certain influence on the cooking effects (e.g. taste, appearance). Find patterns for a set of recipes: what's usually done and what's usually put in during the cooking procedure (one action, a series of actions, an ingredient, a set of ingredients, actions combined with ingredients). Cooking graphs of different recipes may share the same pattern. Distinct subgraphs that determine the cooking effect (e.g. taste) should be identified

65 Sample Patterns

66 Sample Cooking Style
Cooking Style | Pattern with Dominating Action
Soft Deep-frying | Coating + Passing Oil + deep-fry
Dry Deep-frying | Marinating + Coating + deep-fry
Cooked-frying | Passing Oil/Blanching/Steaming + stir-fry (+ Thickening)
Slip-frying | (Marinating + Coating) + Passing Oil + stir-fry + Thickening
Soft Stirring | Blanching/Steaming + stir + Thickening
Braising | Passing Oil/Blanching/Steaming + simmer in sauce (+ Thickening)
Simmering | Blanching + simmer in water/broth
Generally describe how a recipe is cooked in a Pattern Combination or in Graph Abstraction.

67 User Adaptation Usually a user wants to make a dish that has the same cooking result (e.g. taste, appearance) as the recipe exhibits. Unfortunately, the user is very likely to get a slightly or even totally different dish if he/she modifies the cooking procedure. Objective reasons: e.g. lack of some ingredients. Subjective reasons: e.g. wrong cooking actions through carelessness or personal preference.

68 User Adaptation When the user makes an adaptation, the system will check if the modified cooking graph is feasible. If not, a set of feasible templates are provided. The remaining subgraph is replaced by the user selected one. Property check (Reachability, Consistency) Template Selection and Instantiation

69 Prototype System Global System vs. User Space

70 Prototype System – Recipe Browser

71 Prototype System – Cooking Pattern Miner

72 Prototype System – Similarity Calculator

73 Summary Proposed a data model to represent a recipe. Advocated cooking graph mining to find frequently used patterns (actions, ingredients). Attempted to solve the recipe adaptation problem by using patterns as templates. Developed a prototype system, RecipeView. Further work includes: discovering patterns of cooking graphs; refining and strengthening the algorithm of recipe adaptation

74 Application Scenario

75 Advantages (vs. traditional retrieval techniques) Easy-to-compose query By browsing (to get seed objects of arbitrary modalities) By subject (simply keyword) at various abstraction level Multi-modal results a collection of images, text docs, videos, etc vs. a single type of media Semantically relevant results natural outcome of exploring previously learnt knowledge vs. a set of specifically chosen features

76 Advantages (cont'd) Hill-climbing Effect – retrieval performance grows as more user interactions are conducted. Materialized knowledge. Retrieval process exploration encourages learning. User interactions

77 Conclusion MediaView – a semantic multimedia database modeling mechanism to bridge the semantic gap between conventional database and semantics-intensive multimedia applications A set of user-level operators to accommodate the specialization/generalization relationships among the media views

78 Conclusion MediaView promises more effective access to the content of media databases. Users could get the right stuff and tailor it to the context of their application easily. By providing the most relevant content from pre-learnt semantic links between media and context, high-performance database browsing and multimedia authoring tools can enable more comprehensive applications for the user

79 Conclusion Users could customize specific media view according to their tasks, by using user-level operators The effectiveness of using MediaView in the experimental problem domains Multimedia recipe database Cross-media indexing and retrieval

80 Further Issues The development and transition of MediaView to a fully-fledged multimedia database system supporting declarative queries. Intensive and extensive performance studies. Advanced semantic relations (e.g. temporal and spatial ones) can also be incorporated in combining individual media views

81 Thank you! Q & A
