
1 ©2004, Philippe Cudré-Mauroux Semantic Interoperability for Global Information Systems Microsoft Research Asia 08.20.04 Philippe Cudré-Mauroux Distributed Information Systems Laboratory (LSIR) Swiss Federal Institute of Technology, Lausanne (EPFL)

2 Outline
I. Classical Information Integration (overview)
  – Global Schema
  – Multidatabase Language Approach
  – Federated Databases
II. Information Integration in the Large
  – Context: The Semantic Web
  – Shared ontologies
  – The Chatty Web
III. State of the Art in Ontology Alignment (overview)
IV. Semantic Integration in a Large-Scale Image Sharing Scenario

3 I. Classical Information Integration
Goal: providing uniform access to multiple heterogeneous information sources
More than data exchange (e.g., ASCII, EDI, XML)
Old, difficult problem with well-known (partial) solutions

4 Global Schema Integration
Merge multiple databases into one global database
Performed by a human expert
Time-consuming and error-prone
Local autonomy lost
Static solution
Example:
  Global: Book(ISBN, Title, Price, Author), Author(Name, ISBN)
  S1: Book(ISBN, Title), Author(Name, ISBN)
  S2: Livre(ISBN, Prix, Titre), Auteur(Prenom, Nom, ISBN)

5 Multidatabase Language Approach
No attempt at integrating schemas
Language (e.g., MSQL) used to integrate information sources at run-time
Simple example:
  Use S1, S2
  Select Titre
  From S1.Book, S2.Livre
  Where S1.Book.ISBN = S2.Livre.ISBN
Not transparent
Heavy burden on (expert) users
Global queries subject to local changes
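
The run-time integration idea on this slide can be approximated with SQLite, which lets one connection attach several independent databases and join across them. This is only a sketch: SQLite's ATTACH stands in for MSQL's `Use S1, S2`, and the sample rows are made up for illustration.

```python
import sqlite3


def multidatabase_titles():
    """Join two independent databases at query time, with no merged schema."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS S1")
    conn.execute("ATTACH DATABASE ':memory:' AS S2")
    conn.execute("CREATE TABLE S1.Book (ISBN TEXT, Title TEXT)")
    conn.execute("CREATE TABLE S2.Livre (ISBN TEXT, Prix REAL, Titre TEXT)")
    # Hypothetical sample data, not taken from the slides.
    conn.executemany(
        "INSERT INTO S1.Book VALUES (?, ?)",
        [("111", "Compilers"), ("222", "Databases")],
    )
    conn.executemany(
        "INSERT INTO S2.Livre VALUES (?, ?, ?)",
        [("111", 30.0, "Compilateurs"), ("333", 25.0, "Algorithmes")],
    )
    # Same shape as the slide's query: the user names both sources explicitly.
    rows = conn.execute(
        "SELECT Titre FROM S1.Book, S2.Livre "
        "WHERE S1.Book.ISBN = S2.Livre.ISBN"
    ).fetchall()
    conn.close()
    return [titre for (titre,) in rows]
```

The query has to spell out both source names and both local column names, which is the lack of transparency the slide points at: renaming `Titre` in S2 would break it.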

6 Federated Databases
Idea: each information source exports a schema specifying shared relations
Tight coupling:
  – Global schema integration on all export schemas (cf. global schema integration)
Loose coupling:
  – Dynamic add / drop, e.g., by creating views (logical relations)

7 GAV (Global as View)
Global (mediated) schema is expressed as a view on local schemas
Mediated schema: Book(ISBN, Title, Author) […]
Sources: S1, S2, S3, with S1: Book(ISBN, Title), Author(Name, ISBN)
  Create VIEW Book As
  Select B.ISBN, B.Title, A.Name As Author
  From S1.Book B, S1.Author A
  Where B.ISBN = A.ISBN
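
A minimal sketch of the GAV idea, assuming SQLite as the mediator and tables named `S1_Book` and `S1_Author` as stand-ins for the relations of source S1 (the prefix, the helper name and the sample rows are illustrative, not from the slides). The mediated relation `Book` is a view, so a query against it is unfolded into a query over the local tables.

```python
import sqlite3


def gav_titles_by(author):
    """Query the mediated Book relation, defined as a view over source S1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE S1_Book (ISBN TEXT, Title TEXT);
        CREATE TABLE S1_Author (Name TEXT, ISBN TEXT);
        INSERT INTO S1_Book VALUES ('111', 'Compilers'), ('222', 'Databases');
        INSERT INTO S1_Author VALUES ('Aho', '111'), ('Date', '222');
        -- GAV: the global relation is a view on the local relations.
        CREATE VIEW Book AS
            SELECT b.ISBN AS ISBN, b.Title AS Title, a.Name AS Author
            FROM S1_Book AS b JOIN S1_Author AS a ON b.ISBN = a.ISBN;
        """
    )
    rows = conn.execute(
        "SELECT Title FROM Book WHERE Author = ? ORDER BY Title", (author,)
    ).fetchall()
    conn.close()
    return [title for (title,) in rows]
```

Users only see `Book(ISBN, Title, Author)`. Adding a source means rewriting the view definition, which is one reason the approach adapts poorly to change.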

8 LAV (Local as View)
Local schemas are expressed as views on the global schema
Mediated schema: Book(ISBN, Title, Author) […]
Sources: S1, S2, S3, with S1: Book(ISBN, Title), Author(Name, ISBN)
  Create VIEW S1.Book As
  Select ISBN, Title
  From Book
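
With LAV, the mediator has to work out which source views can answer a query on the global schema. The sketch below is a deliberately simplified illustration that only handles projection views like the one on the slide: a source qualifies if it exports every requested attribute. The `SourceView` class, the function name and the `S2.Catalog` source are assumptions made for the example. Real LAV reformulation (answering queries using views) also has to combine several views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceView:
    """A local relation described as a projection of one mediated relation."""
    name: str
    mediated_relation: str
    attributes: frozenset


def sources_answering(views, relation, wanted):
    """Return names of sources whose view exports all wanted attributes."""
    wanted = frozenset(wanted)
    return [
        v.name
        for v in views
        if v.mediated_relation == relation and wanted <= v.attributes
    ]


VIEWS = [
    # From the slide: S1.Book is the projection of Book on (ISBN, Title).
    SourceView("S1.Book", "Book", frozenset({"ISBN", "Title"})),
    # Hypothetical second source, added only for illustration.
    SourceView("S2.Catalog", "Book", frozenset({"ISBN", "Author"})),
]
```

Here a request for `Title` is routed to `S1.Book`, while a request for both `Title` and `Author` matches no single source and would need a join of the two views on `ISBN`.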

9 LAV / GAV (cont.)
Transparent access to heterogeneous databases in the federation
Local autonomy is (usually) preserved
Query processing through query reformulation
Requires global agreement on the mediated schema (tight semantic coupling)
Does not scale well

10 II. Information Integration in the Large
Goal: providing uniform access to many heterogeneous information sources
Traditional approaches are inadequate
  – Lack of adaptability
  – Lack of transparency
  – Lack of scalability
Hot research area

11 Some Applications
Agent communication
Web services integration
Information retrieval from heterogeneous databases
Catalog matching
P2P information sharing
Personal information delivery
Vertical information publishing

12 General Context: The Semantic Web
Providing machine-processable data to the Web

13 RDF/RDFS 2’ Overview
RDF triple: Subject, Property, Object
RDF Schemas
  – Classes of resources
  – Classes of properties
  – Constraints on the subject (domain) or object (range)
  – Subclassing
Extensible!
  – Full-fledged ontological language: OWL
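
To make the triple model and two of the RDFS constructs listed above concrete, here is a small sketch that stores triples as Python tuples and derives class memberships from `rdfs:domain` and `rdfs:subClassOf`. It covers only these two entailment rules, and the `ex:` resources are invented for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"


def infer_types(triples):
    """Return (resource, class) pairs entailed by domain and subclass rules."""
    triples = set(triples)
    types = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    domains = {(s, o) for s, p, o in triples if p == DOMAIN}
    subclasses = {(s, o) for s, p, o in triples if p == SUBCLASS_OF}

    # Domain rule: using property p on subject s places s in the domain of p.
    for s, p, _ in triples:
        for prop, cls in domains:
            if p == prop:
                types.add((s, cls))

    # Subclass rule, applied until no new membership is derived.
    changed = True
    while changed:
        changed = False
        for resource, cls in list(types):
            for sub, sup in subclasses:
                if cls == sub and (resource, sup) not in types:
                    types.add((resource, sup))
                    changed = True
    return types


# Hypothetical example graph.
TRIPLES = [
    ("ex:photo1", "ex:takenBy", "ex:alice"),
    ("ex:takenBy", DOMAIN, "ex:Photo"),
    ("ex:Photo", SUBCLASS_OF, "ex:Image"),
]
```

Nothing states directly that `ex:photo1` is an image. The membership follows from the schema, which is the kind of machine-processable meaning the slide refers to.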

14 Example: CreativeCommons
<rdf:RDF xmlns="http://web.resource.org/cc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> Compilers in the Key of C A lovely classical work on compiling code. Yo-Yo Dyne Gnomophone 1842 audio/mpeg

15 Semantic Interoperability in The Semantic Web
Common ontologies provide for shared context
  – Requires global agreement!
    Intractable standardization effort!
    Back to stage 1…
Two plausible solutions:
  – Agreed-upon corpuses of basic concepts
    IEEE SUMO
    Stanford TAP
    …
  – Local federation of ontologies fostering global interoperability
    EPFL Chatty Web
    U. Washington Piazza
    …
Complementary approaches

16 The Chatty Web
Local translations enabling global agreements
Analyzing transitive closures of local mappings
[Figure: a query on "organism" posted at EPFL travels through a network of peers: a lab in Trondheim, the EMBLChange site at Cambridge, the SwissProt site at Geneva, and a lab at MIT. EMBLChange peers use attributes such as species, …; SwissProt peers use authors, titles, organism, …; other peers use authors, …. Mappings shown: organism ≠ authors, organism ≅ species, species ≅ organism.]
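
One way to read "analyzing transitive closures of local mappings" is to translate an attribute along a cycle of peers and check whether it comes back unchanged. The sketch below does only that. The peer and attribute names come from the slide, but the mapping tables, the function names and the all-or-nothing consistency test are assumptions. The Chatty Web work itself treats such evidence probabilistically.

```python
def follow_cycle(mappings, cycle, attribute):
    """Translate an attribute hop by hop along a list of peers.

    Returns the attribute obtained at the last peer, or None if some
    mapping on the way has no translation for it.
    """
    current = attribute
    for src, dst in zip(cycle, cycle[1:]):
        current = mappings[(src, dst)].get(current)
        if current is None:
            return None
    return current


def cycle_is_consistent(mappings, cycle, attribute):
    """A cycle agrees on an attribute if it maps back to itself."""
    return follow_cycle(mappings, cycle, attribute) == attribute


CYCLE = ["EPFL", "SwissProt", "EMBLChange", "EPFL"]

# Hypothetical pairwise mappings (source attribute -> target attribute).
GOOD = {
    ("EPFL", "SwissProt"): {"organism": "organism"},
    ("SwissProt", "EMBLChange"): {"organism": "species"},
    ("EMBLChange", "EPFL"): {"species": "organism"},
}
# Same network with one faulty link: species is mapped onto authors.
FAULTY = dict(GOOD)
FAULTY[("EMBLChange", "EPFL")] = {"species": "authors"}
```

With `GOOD`, "organism" returns as "organism". With `FAULTY` it returns as "authors", which signals that at least one mapping on the cycle is wrong without saying which one.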

17 On Translation Links (ontology mappings)

18 III. State of the Art on Ontology Alignment
Problem: given two ontologies, each describing a set of discrete entities, find the relationships holding between the entities
Alignments can then be used to foster interoperability locally
Difficult problem (fully automatic solutions?)
Active area of research

19 Local Ontology Alignment Techniques
1. Terminological methods
  – String-based
  – Language-based
    Intrinsic
    Extrinsic
    Multilingual
2. Structural methods
  – Internal
  – External
3. Others
  – Extensional
  – Semantic
  – User feedback

20 1. Terminological Methods
String-based: compare labels of entities
  – (Sub-)string equality
  – Edit distances
  – Token-based distances (e.g., TF/IDF on substrings)
Language-based
  – Intrinsic
    Terminological matching with morphological / syntactic analysis (allomorphies)
  – Extrinsic
    Use of external resources (e.g., WordNet synsets)
  – Multilingual methods
    Match terms in different languages
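
As an illustration of the string-based family, the sketch below computes the Levenshtein edit distance between two labels and turns it into a similarity in [0, 1]. Normalising by the longer label and lower-casing first are choices made for this example, not something the slides prescribe.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical labels."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

On the schemas used earlier, `Title` and `Titre` are one substitution apart, while `Price` and `Prix` need two edits. A purely string-based matcher would rank both pairs as plausible candidates but could not tell a true cognate from a coincidence, hence the language-based methods.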

21 2. Structural Methods
Internal (constraint-based):
  – Data-based domain comparison
  – Multiplicities / properties comparison
  – Similarity between collections
External
  – Mereologic structures
  – Taxonomic structures
  – Relations between similar entities

22 3. Other
Extensional methods
  – Extension ≡ set of instances of a class
  – Jaccard similarity: |A ∩ B| / |A ∪ B| for two extensions A and B
  – Similarity-based extension comparison
Semantic methods
  – Based on model-theoretic semantics
  – SAT problem (e.g., subsumption)
User feedback
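
For the extensional approach, a short sketch of the Jaccard similarity between two class extensions, i.e., the size of their intersection divided by the size of their union. Identifying instances by ISBN and returning 0.0 when both extensions are empty are assumptions made for the example.

```python
def jaccard(extension_a, extension_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of instances."""
    a, b = set(extension_a), set(extension_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty extensions
    return len(a & b) / len(union)


# Hypothetical extensions: instances of Book in S1 and of Livre in S2,
# identified by ISBN.
BOOK_S1 = {"111", "222", "333"}
LIVRE_S2 = {"222", "333", "444"}
```

The two classes share two of four distinct instances, so their similarity is 0.5. This only works when instances can be identified across sources. Otherwise a similarity-based comparison of extensions is needed, as the slide notes.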

23 A Handful of Systems
APrompt (Stanford) [T,I,S,U]
Cupid (Microsoft Research) [T,I,S]
Bibster (U. Karlsruhe) [T,I,S]
Glue (U. Washington) [E]
S-Match (U. Trento) [T,S,M]
…
→ Typically: a mix of techniques
Legend: [Terminological, Internal structure, external Structure, Extensional, seMantic, User]

24 IV. Semantic Integration in a Large-Scale Image Sharing Scenario
Problem: retrieve a specific image from a large collection of shared images
So far: most applications mix content-based (CB) and text analysis
  – CB image analysis provides a low-level, objective representation of an image
    Good for comparing image features
    Not so good w.r.t. end-user needs expressed in natural language (N.L.)
  – Surrounding text / filenames might sometimes be a high-level, subjective view of the image
    Incomplete, out-of-context description
    Good w.r.t. N.L. (cf. Google images)

25 Potential Opportunity
Emerging applications make use of high-level, local and semi-contextualized image metadata
  – Structured metadata (Photoshop Album, XML)
  – Ontological metadata (RDF, Adobe XMP)
  – Type-based metadata (Microsoft WinFS)
Paradigm shift from the old metadata standards (e.g., keywords, EXIF)
  – Extensible formats
Personal conjecture:
  – Metadata will be prominent in a few years
  – Huge opportunity for image retrieval

26 Structured Metadata
Ex.: Photoshop Album
Hierarchy of tags
Stored in a relational, proprietary, local database
Non-exportable

27 Ontological Metadata (1)
Ex.: Extensible Metadata Platform (XMP)
Subset of RDF/S
Metadata might be embedded into the file
Supported by a wide range of Adobe applications
  – Adobe® Acrobat®
  – Adobe FrameMaker®
  – Adobe GoLive®
  – Adobe Illustrator®
  – Adobe InCopy®
  – Adobe InDesign®
  – Adobe LiveMotion™
  – Adobe Photoshop®
  – Adobe Document Server
  – Adobe Graphics Server
  – Version Cue™

28 Ontological Metadata (2)
Ex.: Photoshop XMP schema

29 Type-Based Metadata (1)
New file system for Longhorn (NTFS +++)
No more hierarchies (i.e., folders) but metadata
Items – Attributes – Relationships – Schemas – Sub-Schemas (extensions)
  – Déjà vu?

30 Type-Based Metadata (2)
Ex.: image schema in WinFS

31 Observations
So far, all applications using these metadata are local
  – It is a typical semantic interoperability problem!
Efficient, distributed WinFS is not for tomorrow…
Image metadata will always be incomplete and subjective
#images >> #peers >> #schemas
All these formats can be formally described by a subset of Description Logics
  – Use them all and in the darkness bind them!

32 Outline of my Project
Objective: large-scale image retrieval framework taking advantage of metadata
Outline
  – Import images
  – Import metadata
  – Extract low-level features (thanks to Lei :-)
  – Store everything in a common, scalable representation
  – Export data in a shared repository
    SQL server
    P2P network (SP2 ?)
  – Infer metadata / schema mappings locally
  – Cross-validate mappings
  – Cluster peers / images vis-à-vis their subjective views

33 Specificities
Different metadata models
Incompleteness of metadata (e.g., WinFS dangling links)
Metadata sparseness
Few (but widely used) core classes
Many custom extensions
Many resources
Low-level representation of the resources
Embedded user feedback
→ Unique application

34 Finding Mapping Candidates (sketch) [T,I,U]
U-Inference based on mutual information (scalable!)
[Figure: peers exchanging schemas, metadata, low-level features and user feedback]
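
The slide does not spell out how mutual information is used, so the following is only one plausible reading: given two attributes that annotate the same set of images at different peers, a high mutual information between their values suggests the attributes are candidates for a mapping. The function and the sample annotations are assumptions for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between paired observations."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty sequences of equal length")
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical annotations of the same four images by two peers.
PEER_A_PLACE = ["beach", "beach", "city", "city"]
PEER_B_LIEU = ["plage", "plage", "ville", "ville"]
PEER_B_FORMAT = ["jpg", "png", "jpg", "png"]
```

Here `PEER_A_PLACE` and `PEER_B_LIEU` share 1 bit of information even though no label matches as a string, while `PEER_A_PLACE` and `PEER_B_FORMAT` share none. Only value co-occurrence counts are needed, which fits the slide's remark about scalability.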

35 Cross-Validating Mappings (sketch) [S,M]
Cross-validation based on graph partitioning, semantic gossiping or SAT techniques
Ref.: Jiying Wang, Ji-Rong Wen, Frederick H. Lochovsky, Wei-Ying Ma. Instance-based Schema Matching for Web Databases by Domain-specific Query Probing. VLDB 2004

36 Conclusions
Leveraging local metadata produced by end-users
  – Complex problem!
    Good heuristics could take years to be developed…
Local communications / computations
  – Scalability
Hopefully, better results than keywords / low-level analyses even for simple heuristics
  – Take advantage of context
    Images given local semantics
Analyze the dynamics of the overall system
Objectivity vs. subjectivity of interpretation becomes a measurable quality

37 References
RDF/S: http://www.w3.org/
XMP: http://www.adobe.com/products/xmp/
WinFS: http://msdn.microsoft.com/Longhorn
Chatty Web: http://lsirpeople.epfl.ch/cudre/
Ontology Alignment: knowledgeweb D.2.2.3.
  – (send me an email to f-pcudre)

