Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

2 Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth Semagix, Inc. and LSDIS Lab, University of Georgia

3 Some actual caveats that have been expressed by researchers  “As a constituent technology, ontology work of this sort is defensible. As the basis for programmatic research and implementation, it is a speculative and immature technology of uncertain promise.”  “Users will be able to use programs that can understand semantics of the data to help them answer complex questions … This sort of hyperbole is characteristic of much of the genre of semantic web conjectures, papers, and proposals thus far. It is reminiscent of the AI hype of a decade ago and practical systems based on these ideas are no more in evidence now than they were then.”  “Such research is fashionable at the moment, due in part to support from defense agencies, in part because the Web offers the first distributed environment that makes even the dream seem tractable.”  “It (proposed research in Semantic Web) pre-supposes the availability of semantic information extracted from the base documents -an unsolved problem of many years, …”  “Google has shown that huge improvements in search technology can be made without understanding semantics. Perhaps after a certain point, semantics are needed for further improvements, but a better argument is needed.” Oblivious to recent progress? Lack of pragmatic approach? Narrow vision of semantics and approaches to semantics?

4 Paradigm shift over time: Syntax -> Semantics Increasing sophistication in applying semantics  Relevant Information (Semantic Search & Browsing)  Semantic Information Interoperability and Integration  Semantic Correlation/Association, Analysis, Early Warning

5 Empirical observations based on real-world efforts (presented in the Data Engineering paper)  Applications validate the importance of ontology in the current semantic approaches.  Ontology population is critical.  Two of the most fundamental “semantic” techniques are named entity and semantic ambiguity resolution.  Semi-formal ontologies that may be based on limited expressive power are most practical and useful. Formal or semi-formal ontologies represented in very expressive languages (compared to moderately expressive ones) have, in practice, yielded little value in real-world applications.

6 Empirical observations based on real-world efforts (presented in the Data Engineering paper) … continued  Large scale (also high quality) metadata extraction and semantic annotation is possible.  Support for heterogeneous data is key – it is too hard to deploy separate products within a single enterprise to deal with structured and unstructured data/content management.  Semantic query processing with the ability to query both ontology and metadata to retrieve heterogeneous content is highly valuable.  A vast majority of the Semantic (Web) applications that have been developed or envisioned rely on three crucial capabilities namely ontology creation, semantic annotation and querying/inferencing.

7 Ontology at the heart of the Semantic Web Ontology provides underpinning for semantic techniques in information systems.  A model/representation of the real world (relevant set of interconnected concepts, entities, attributes, relationships, domain vocabulary and factual knowledge).  Basis of capturing agreement, and of applying knowledge  Enabler for improved information systems functionalities and the Semantic Web Ontology = Schema (Description) + Knowledge Base (Description Base), i.e., both T-nodes and A-nodes
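
The equation Ontology = Schema + Knowledge Base can be pictured as a small data structure. What follows is a minimal sketch, not the Semagix implementation: the class names `Schema`, `KnowledgeBase` and `Ontology`, the "located in" relationship signature, and the sample entities `ExampleCorp` and `Exampleville` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Description part: classes and relationship types (hypothetical layout)."""
    classes: set = field(default_factory=set)
    # relationship name -> (domain class, range class)
    relationships: dict = field(default_factory=dict)


@dataclass
class KnowledgeBase:
    """Description base: entity instances and asserted facts."""
    entities: dict = field(default_factory=dict)    # entity name -> class name
    assertions: list = field(default_factory=list)  # (subject, relationship, object)


@dataclass
class Ontology:
    schema: Schema
    kb: KnowledgeBase

    def add_entity(self, name, cls):
        # Instances may only be created for classes the schema declares.
        if cls not in self.schema.classes:
            raise ValueError(f"unknown class: {cls}")
        self.kb.entities[name] = cls

    def assert_fact(self, subj, rel, obj):
        # Facts must use a declared relationship with matching domain and range.
        if rel not in self.schema.relationships:
            raise ValueError(f"unknown relationship: {rel}")
        domain, range_ = self.schema.relationships[rel]
        if self.kb.entities.get(subj) != domain or self.kb.entities.get(obj) != range_:
            raise ValueError(f"{subj} -{rel}-> {obj} violates the schema")
        self.kb.assertions.append((subj, rel, obj))


# Made-up sample data: one relationship type, two instances, one assertion.
schema = Schema(classes={"Company", "City"},
                relationships={"located in": ("Company", "City")})
onto = Ontology(schema, KnowledgeBase())
onto.add_entity("ExampleCorp", "Company")
onto.add_entity("Exampleville", "City")
onto.assert_fact("ExampleCorp", "located in", "Exampleville")
```

The schema stays small (a handful of classes and relationship types) while the knowledge base is the part that grows with population, which mirrors the proportions reported later in the deck.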

8 Broad Scope of Semantic (Web) Technology [Figure: scope of semantic technology along three axes. Scope of Agreement: Task/App, Domain, Industry, Common Sense; Gen. Purpose, Broad Based. Degree of Agreement: Informal, Semi-Formal, Formal. Agreement About: Data/Info., Function, Execution, QoS. Regions marked: Current Semantic Web Focus; Semantic Web Processes; Lots of Useful Semantic Technology (interoperability, integration). Other dimensions: how agreements are reached, …] Cf: Guarino, Gruber

9 Types of Ontologies (or things close to ontology)  Upper ontologies: modeling of time, space, process, etc  Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), WordNet  Domain-specific or Industry specific ontologies  News: politics, sports, business, entertainment  Financial Market  Terrorism  (GO (a nomenclature), UMLS inspired ontology, …)  Application Specific and Task specific ontologies  Anti-money laundering  Equity Research

10 Ontology-driven Information Systems are becoming reality Software and practical tools to support key capabilities and requirements for such a system are now available:  Ontology creation and maintenance  Knowledge-based (and other techniques) supporting Automatic Classification  Ontology-driven Semantic Metadata Extraction/Annotation  Semantic applications utilizing semantic metadata and ontology Achieved in the context of successful technology transfer from academic research (LSDIS lab, UGA’s SCORE technology) into commercial product (Semagix’s Freedom)

11 Practical Experiences on Ontology Management today  What types of ontologies are needed and developed for semantic applications today?  Is there a typical ontology?  How are such ontologies built?  Who builds them? How long does it take? How are ontologies maintained?  People (expertise), time, money  How large do ontologies become (scalability)?  How are ontologies used and what are the computational issues?

12 Semagix Freedom Architecture (a platform for building ontology-driven information system) © Semagix, Inc.

13 Practical Ontology Development Observation by Semagix  Ontologies Semagix has designed:  Few classes to many tens of classes and relationships (types); very small number of designers/knowledge experts; descriptional component (schema) designed with GUI  Hundreds of thousands to several million entities and relationships (instances/assertions/description base)  Few to tens of knowledge sources; populated mostly automatically by knowledge extractors  Primary scientific challenges faced: entity ambiguity resolution and data cleanup  Total effort: few person weeks

14 SWETO Semantic Web Technology Evaluation Ontology “semantic test-bed” “ontology with large instances dataset” Public (non-commercial) use ontology built by the LSDIS lab in an NSF-funded project using the commercial product Semagix Freedom

15 Creating SWETO  Data Sources Selection  semi-structured format  with interconnected instances  with entities having rich metadata  public and open sources preferred

16 Big Picture

17 Statistics
Subset of classes in the ontology | # Instances
Cities, countries, and states | 2,902
Airports | 1,515
Companies, and banks | 30,948
Terrorist attacks, and organizations | 1,511
Persons and researchers | 307,417
Scientific publications | 463,270
Journals, conferences, and books | 4,256
TOTAL (as of January 2004) | 811,819

18 Statistics … relationships (subset)
Relationship | # Explicit relations
located in | 30,809
responsible for (event) | 1,425
Listed author in | 1,045,719
(paper) published in | 467,367

19 Browsing

20 Entity Disambiguation
Disambiguation type | # Times used
Automatic (Freedom) | 248,151
Manual | 210
Unresolved (Removed) | 591

21 SWETO on the Web http://lsdis.cs.uga.edu/Projects/SemDis/Sweto/  Ontology (schema) in OWL  Visualization of the ontology  Quality, Size  Sources  API  Access via Web Service (to be released) Check it out!

22 Metadata extraction from heterogeneous content/data [Figure: METADATA EXTRACTORS drawing on WWW, Enterprise Repositories, Digital Maps, Nexis, UPI, AP, Feeds/Documents, Digital Audios, Data Stores, Digital Videos, Digital Images, ...] Create/extract as much (semantic) metadata automatically as possible, from:  Any format (HTML, XML, RDB, text, docs)  Many media  Push, pull  Proprietary, Deep Web, Open Source

23 Automatic Semantic Annotation of Text: Entity and Relationship Extraction KB, statistical and linguistic techniques

24 Automatic Semantic Annotation Limited tagging (mostly syntactic) COMTEX Tagging Content ‘Enhancement’ Rich Semantic Metatagging Value-added Semagix Semantic Tagging Value-added relevant metatags added by Semagix to existing COMTEX tags: Private companies Type of company Industry affiliation Sector Exchange Company Execs Competitors © Semagix, Inc.

25 Solution: In-memory semantic querying (semantic querying in RAM) Complex queries involving Ontology and Metadata Incremental indexing Distributed indexing High performance: 10M queries/hr; less than 10ms for typical search queries 2 orders of magnitude faster than RDBMS for complex analytical queries Knowledge APIs provide a Java, JSP or an HTTP-based interface for querying the Ontology and Metadata Semantic Query Processing and Analytics
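
To make "complex queries involving Ontology and Metadata" and "incremental indexing" concrete, here is a minimal in-memory sketch. It is not the Freedom Knowledge API: the dictionaries, the function names `add_document` and `docs_about_sector`, and all company and document names are hypothetical.

```python
from collections import defaultdict

# Ontology part (hypothetical): which sector each company entity belongs to.
sector_of = {
    "AcmeBank": "Finance",
    "BetaFunds": "Finance",
    "GammaAir": "Aviation",
}

# Metadata part (hypothetical): entities annotated on each document.
doc_entities = {
    "doc1": {"AcmeBank"},
    "doc2": {"GammaAir"},
    "doc3": {"BetaFunds", "GammaAir"},
}

# Inverted index held in memory: entity -> documents annotated with it.
entity_index = defaultdict(set)
for doc_id, entities in doc_entities.items():
    for entity in entities:
        entity_index[entity].add(doc_id)


def add_document(doc_id, entities):
    """Incremental indexing: add one document without rebuilding the index."""
    doc_entities[doc_id] = set(entities)
    for entity in entities:
        entity_index[entity].add(doc_id)


def docs_about_sector(sector):
    """Expand the sector to its member entities via the ontology, then
    collect the documents annotated with any of those entities."""
    members = [e for e, s in sector_of.items() if s == sector]
    hits = set()
    for entity in members:
        hits |= entity_index.get(entity, set())
    return sorted(hits)


finance_docs = docs_about_sector("Finance")        # query before the update
add_document("doc4", {"AcmeBank"})                 # new content arrives
finance_docs_after = docs_about_sector("Finance")  # same query, fresh result
```

The query never mentions a company by name; the ontology supplies the expansion and the metadata index supplies the documents, which is the combination the slide describes.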

26 What can current semantic technology do? (sample application)  Semi-automated (mostly automated) annotation of resources from various heterogeneous sources (unstructured %, semi-structured, structured data; media content) *  Creation of large knowledge bases (ontology population) from the trusted sources *  Semantic Search and Browsing @, &  Unified access to multiple sources (semantic integration) *, #  Inferencing #  Relationship/knowledge discovery among the annotated resources and entities; analytics *, %  Both implicit ^ and explicit * relationships Legend: * Commercial: Semagix; @ Commercial: Taalee Semantic Search (Sheth); # Commercial: Network Inference; % Near-commercial: IBM/SemTAP; & IBM: McCool, Guha; ^ LSDIS-UGA Research

27 BLENDED BROWSING & QUERYING INTERFACE ATTRIBUTE & KEYWORD QUERYING uniform view of worldwide distributed assets of similar type SEMANTIC BROWSING Targeted e-shopping/e-commerce assets access VideoAnywhere and Taalee Semantic Search Engine

29 Focused relevant content organized by topic (semantic categorization) Automatic Content Aggregation from multiple content providers and feeds Related relevant content not explicitly asked for (semantic associations) Competitive research inferred automatically Automatic 3 rd party content integration Equity Research Dashboard with Blended Semantic Querying and Browsing

30 Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)

31 Application to semantic analysis/intelligence  Documentary content and factual evidence are integrated semantically via semantic metadata Intelligence sub-domain ontology Group Alias Person Country Bank Account in Has alias Has email Involved in Occurred at Works for/ leads Location Time Email Add Event Occurred at Originated in Is funded by/works with Watch-list Appears on Watch-list Appears on Has position Role Classification Metadata: Cocaine seizure investigation Semantic Metadata extracted from the article: Person is “Giulio Tremonti” Position of “Giulio Tremonti” is “Economics Minister” “Giulio Tremonti” appears on Watchlist “PEP” Group is Political party “Integrali” “Integrali” is the “Italian Government” “Italian Government” is based in “Rome” Corroborating Evidence Evidence © Semagix, Inc.
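
The extracted semantic metadata on this slide can be read as subject-relationship-object statements. The sketch below stores them that way and looks them up; the values are transcribed from the slide, while the predicate spellings and the helper names `objects` and `facts_about` are assumptions for illustration.

```python
# Statements transcribed from the slide; predicate wording is an assumption.
triples = [
    ("Giulio Tremonti", "has position", "Economics Minister"),
    ("Giulio Tremonti", "appears on watch-list", "PEP"),
    ("Integrali", "is a", "Political party"),
    ("Integrali", "is", "Italian Government"),
    ("Italian Government", "based in", "Rome"),
]


def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def facts_about(subject):
    """Return all (predicate, object) pairs asserted about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


watchlists = objects("Giulio Tremonti", "appears on watch-list")
person_facts = facts_about("Giulio Tremonti")
```

Once metadata is in this form, a document about the person and a watch-list entry for the same person meet on the shared entity, which is how documentary content and factual evidence end up integrated.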

32 Semantic Associations: Beyond simple relationships  Mechanisms for querying about and retrieving complex relationships between entities. [Diagram: entities A, B, C connected by properties x, y, z, y’, z’, u, v] 1. A is related to B by x.y.z 2. A is related to C by i. x.y’.z’ ii. u.v (undirected path) 3. A is “related similarly” to B as it is to C (y’  y and z’  z  x.y.z  x.y’.z’) So are B and C related?
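
One way to picture the first two kinds of association is as a bounded path search over a labeled graph. The sketch below is a plain depth-first enumeration, not the LSDIS query operators: A, B, C and the property labels follow the slide, while the intermediate nodes n1..n4 and the function name `associations` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical graph. Only A, B, C and the property labels come from the slide.
edges = [
    ("A", "x", "n1"), ("n1", "y", "n2"), ("n2", "z", "B"),
    ("n1", "y'", "n3"), ("n3", "z'", "C"),
    ("A", "u", "n4"), ("C", "v", "n4"),
]

# Treat every edge as traversable in both directions (undirected paths).
adj = defaultdict(list)
for subj, label, obj in edges:
    adj[subj].append((label, obj))
    adj[obj].append((label, subj))


def associations(start, goal, max_len=3):
    """Return the property sequences of all simple undirected paths
    from start to goal that use at most max_len properties."""
    found = []

    def walk(node, visited, labels):
        if node == goal:
            found.append(".".join(labels))
            return
        if len(labels) == max_len:
            return
        for label, nxt in adj[node]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, labels + [label])

    walk(start, {start}, [])
    return sorted(found)


a_to_b = associations("A", "B")
a_to_c = associations("A", "C")
b_to_c = associations("B", "C", max_len=4)
```

In this made-up graph B and C turn out to be connected through the shared node n1, which is one reading of the slide's closing question. The length bound matters in practice: without it the number of candidate paths grows quickly on a populated ontology.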

33 Context: Why, What, How?  Context => Relevance; Reduction in computation space  Context captures the users’ interest to provide the user with the relevant knowledge within numerous relationships between the entities  By defining regions (or sub-graphs) of the ontology we are capturing the areas of interest of the user

34 Context Weight - Example Region 1: Financial Domain, weight=0.50 Region 2: Terrorist Domain, weight=0.75 [Diagram entities: e1:Person, e2:Financial Organization, e3:Organization, e4:Terrorist Organization, e5:Person, e6:Financial Organization, e7:Terrorist Organization, e8:Terrorist Attack, e9:Location; relationships: friend Of, member Of, located In, supports, has Account, works For, involved In, at location]
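
The region weights can be used to rank candidate associations by how much of each path falls inside the user's regions of interest. The scoring rule below (average region weight over the entities on a path, with zero for entities outside every region) is an assumption for illustration, not the formula used in the original work; the two sample paths are also made up, since the edges of the diagram are not recoverable from the transcript.

```python
# Region weights as given on the slide.
REGION_WEIGHT = {"Financial Domain": 0.50, "Terrorist Domain": 0.75}

# Assumed mapping from ontology class to region.
CLASS_REGION = {
    "Financial Organization": "Financial Domain",
    "Terrorist Organization": "Terrorist Domain",
    "Terrorist Attack": "Terrorist Domain",
}

# Entity classes as listed on the slide.
ENTITY_CLASS = {
    "e1": "Person", "e2": "Financial Organization", "e3": "Organization",
    "e4": "Terrorist Organization", "e5": "Person",
    "e6": "Financial Organization", "e7": "Terrorist Organization",
    "e8": "Terrorist Attack", "e9": "Location",
}


def entity_weight(entity):
    """Weight of the region containing the entity, or 0.0 if it is in none."""
    region = CLASS_REGION.get(ENTITY_CLASS[entity])
    return REGION_WEIGHT.get(region, 0.0)


def path_score(path):
    """Assumed scoring rule: mean region weight over the entities on the path."""
    return sum(entity_weight(e) for e in path) / len(path)


# Hypothetical candidate paths between entities.
paths = [["e1", "e2", "e3"], ["e5", "e4", "e8"]]
ranked = sorted(paths, key=path_score, reverse=True)
```

Under this rule the path through the terrorist organization and attack outranks the one that touches a single financial organization, because more of it lies in the higher-weighted region.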

35 Anti Money Laundering – Know Your Customer Risk Profiles are developed for individuals or companies. If a risk profile changes based on new information, the individual’s Risk Profile and the Branch Aggregate Risk Profile are automatically updated
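
The automatic update can be sketched as an aggregate that is recomputed whenever an individual score changes. Everything here is hypothetical: the `Branch` class, the numeric score scale, and the choice of a plain average as the aggregation rule are assumptions, since the slide does not say how the branch aggregate is derived.

```python
from statistics import mean


class Branch:
    """Holds per-customer risk scores and derives a branch-level aggregate."""

    def __init__(self, name):
        self.name = name
        self.risk_by_customer = {}

    def update_risk(self, customer, score):
        # New information replaces the customer's previous risk score.
        self.risk_by_customer[customer] = score

    @property
    def aggregate_risk(self):
        # Assumed rule: the aggregate is the average of individual scores,
        # computed on access so it always reflects the latest updates.
        scores = list(self.risk_by_customer.values())
        return mean(scores) if scores else 0.0


branch = Branch("Example Branch")
branch.update_risk("customer-1", 20)
branch.update_risk("customer-2", 40)
before = branch.aggregate_risk

branch.update_risk("customer-2", 90)  # new information raises one profile
after = branch.aggregate_risk
```

Deriving the aggregate on read, rather than storing it, is one simple way to guarantee that the branch profile never lags behind an individual change.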

36 View Risk Scores for a specific company or customer

37 Semantic Querying, Browsing, Integration to find potential antifungal drug targets Databases for different organisms Is this or similar gene in other organism? (most Antifungals are associated with Sterol mechanism ) Services: BLAST, Co-expression analysis, Phylogeny; If annotated, directly access DB, else use BLAST to normalize FGDB

38 Future: Geospatial Semantic Analytics: Thematic + Space + Time

39 Future: 3D Semantic Geo-visualization of Movements in Space and Time

40 Conclusion  Great progress from work in semantic information interoperability/integration of early 90s until now, re-energized by the vision of Semantic Web, related standards and technological advances  Technology beyond proof of concept  But lots of difficult research and engineering challenges ahead  More: (Technology) http://www.semagix.com/downloads/downloads.shtml (Research) http://lsdis.cs.uga.edu/proj/SemDis/  Demos available

