Semantic Web in Action Ontology-driven information search, integration and analysis NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004 Amit Sheth.

Amit Sheth, CTO, Semagix Inc
Presentation transcript:

Semantic Web in Action: Ontology-driven information search, integration and analysis. NASA Virtual Iron Bird Workshop, NASA Ames, March 31, 2004. Amit Sheth, Semagix, Inc. and LSDIS Lab, University of Georgia

Some actual caveats that have been expressed by researchers:
- “As a constituent technology, ontology work of this sort is defensible. As the basis for programmatic research and implementation, it is a speculative and immature technology of uncertain promise.”
- “Users will be able to use programs that can understand semantics of the data to help them answer complex questions … This sort of hyperbole is characteristic of much of the genre of semantic web conjectures, papers, and proposals thus far. It is reminiscent of the AI hype of a decade ago and practical systems based on these ideas are no more in evidence now than they were then.”
- “Such research is fashionable at the moment, due in part to support from defense agencies, in part because the Web offers the first distributed environment that makes even the dream seem tractable.”
- “It (proposed research in Semantic Web) pre-supposes the availability of semantic information extracted from the base documents - an unsolved problem of many years, …”
- “Google has shown that huge improvements in search technology can be made without understanding semantics. Perhaps after a certain point, semantics are needed for further improvements, but a better argument is needed.”
Oblivious to recent progress? Lack of a pragmatic approach? A narrow vision of semantics and of approaches to semantics?

Paradigm shift over time: Syntax -> Semantics
Increasing sophistication in applying semantics:
- Relevant Information (Semantic Search & Browsing)
- Semantic Information Interoperability and Integration
- Semantic Correlation/Association, Analysis, Early Warning

Empirical observations based on real-world efforts (presented in the Data Engineering paper)
- Applications validate the importance of ontology in current semantic approaches.
- Ontology population is critical.
- Two of the most fundamental “semantic” techniques are named entity identification and semantic ambiguity resolution.
- Semi-formal ontologies, which may be based on languages of limited expressive power, are the most practical and useful. Formal or semi-formal ontologies represented in very expressive languages (compared to moderately expressive ones) have, in practice, yielded little value in real-world applications.

Empirical observations based on real-world efforts (presented in the Data Engineering paper), continued
- Large-scale (and high-quality) metadata extraction and semantic annotation is possible.
- Support for heterogeneous data is key: it is too hard to deploy separate products within a single enterprise to deal with structured and unstructured data/content management.
- Semantic query processing with the ability to query both ontology and metadata to retrieve heterogeneous content is highly valuable.
- The vast majority of the Semantic (Web) applications that have been developed or envisioned rely on three crucial capabilities, namely ontology creation, semantic annotation and querying/inferencing.

Ontology at the heart of the Semantic Web
Ontology provides the underpinning for semantic techniques in information systems.
- A model/representation of the real world (relevant set of interconnected concepts, entities, attributes, relationships, domain vocabulary and factual knowledge).
- Basis for capturing agreement and for applying knowledge.
- Enabler for improved information systems functionalities and the Semantic Web.
Ontology = Schema (Description) + Knowledge Base (Description Base), i.e., both T-nodes and A-nodes
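The "Ontology = Schema + Knowledge Base" equation can be made concrete with a small sketch. It assumes a deliberately simple model (classes with a single parent, relationship types with a domain and range, and typed assertions); the class names `Schema` and `Ontology` and the example entities are hypothetical and do not reflect how Semagix Freedom stores ontologies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Schema:
    # Descriptional component (schema): class -> parent class (None for a root class)
    parents: dict[str, str | None] = field(default_factory=dict)
    # Relationship types: name -> (domain class, range class)
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)

    def is_a(self, cls: str | None, ancestor: str) -> bool:
        # Walk up the subclass chain until we reach the ancestor or a root
        while cls is not None:
            if cls == ancestor:
                return True
            cls = self.parents.get(cls)
        return False


@dataclass
class Ontology:
    schema: Schema
    # Knowledge base (description base): entity -> class, plus typed assertions
    types: dict[str, str] = field(default_factory=dict)
    facts: list[tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, name: str, cls: str) -> None:
        if cls not in self.schema.parents:
            raise ValueError(f"unknown class {cls!r}")
        self.types[name] = cls

    def assert_fact(self, subj: str, rel: str, obj: str) -> None:
        # Accept an assertion only if its endpoints conform to the relation's domain/range
        domain, rng = self.schema.relations[rel]
        if not (self.schema.is_a(self.types[subj], domain)
                and self.schema.is_a(self.types[obj], rng)):
            raise ValueError(f"{subj} {rel} {obj} violates the schema")
        self.facts.append((subj, rel, obj))


# Hypothetical example: a tiny schema and two instances
schema = Schema(
    parents={"Organization": None, "Company": "Organization", "Person": None},
    relations={"worksFor": ("Person", "Organization")},
)
onto = Ontology(schema)
onto.add_entity("Alice", "Person")
onto.add_entity("Acme", "Company")
onto.assert_fact("Alice", "worksFor", "Acme")  # accepted: a Company is an Organization
```

The schema stays small and hand-designed while the knowledge base grows with populated instances, which matches the split between the descriptional component and the description base described on the slide.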

Broad Scope of Semantic (Web) Technology
[Figure: three dimensions of agreement. Scope of Agreement: Task/App, Domain, Industry, Gen. Purpose/Broad Based, Common Sense. Degree of Agreement: Informal, Semi-Formal, Formal. Agreement About: Data/Info., Function, Execution, QoS. Regions marked: Current Semantic Web Focus; Semantic Web Processes; Lots of Useful Semantic Technology (interoperability, integration).]
Other dimensions: how agreements are reached, …
Cf: Guarino, Gruber

Types of Ontologies (or things close to ontology)
- Upper ontologies: modeling of time, space, process, etc.
- Broad-based or general-purpose ontologies/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), WordNet
- Domain-specific or industry-specific ontologies
  - News: politics, sports, business, entertainment
  - Financial Market
  - Terrorism
  - (GO (a nomenclature), UMLS-inspired ontology, …)
- Application-specific and task-specific ontologies
  - Anti-money laundering
  - Equity Research

Ontology-driven Information Systems are becoming reality
Software and practical tools to support key capabilities and requirements for such a system are now available:
- Ontology creation and maintenance
- Knowledge-based (and other) techniques supporting automatic classification
- Ontology-driven semantic metadata extraction/annotation
- Semantic applications utilizing semantic metadata and ontology
Achieved in the context of successful technology transfer from academic research (LSDIS lab, UGA’s SCORE technology) into a commercial product (Semagix’s Freedom)

Practical Experiences on Ontology Management today
- What types of ontologies are needed and developed for semantic applications today?
- Is there a typical ontology?
- How are such ontologies built?
- Who builds them? How long does it take? How are ontologies maintained?
  - People (expertise), time, money
- How large do ontologies become (scalability)?
- How are ontologies used, and what are the computational issues?

Semagix Freedom Architecture (a platform for building ontology-driven information system) © Semagix, Inc.

Practical Ontology Development: Observations by Semagix
Ontologies Semagix has designed:
- A few classes to many tens of classes and relationships (types); a very small number of designers/knowledge experts; descriptional component (schema) designed with a GUI
- Hundreds of thousands to several million entities and relationships (instances/assertions/description base)
- A few to tens of knowledge sources; populated mostly automatically by knowledge extractors
- Primary scientific challenges faced: entity ambiguity resolution and data cleanup
- Total effort: a few person-weeks

SWETO: Semantic Web Technology Evaluation Ontology
A “semantic test-bed”: an “ontology with a large instance dataset”. A public (non-commercial use) ontology built by the LSDIS lab in an NSF-funded project using the commercial product Semagix Freedom.

Creating SWETO
- Data source selection:
  - semi-structured format
  - with interconnected instances
  - with entities having rich metadata
  - public and open sources preferred

Big Picture

Statistics
Subset of classes in the ontology            # Instances
Cities, countries, and states                      2,902
Airports                                           1,515
Companies and banks                               30,948
Terrorist attacks and organizations                1,511
Persons and researchers                          307,417
Scientific publications                          463,270
Journals, conferences, and books                   4,256
TOTAL (as of January 2004)                       811,819
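As a quick consistency check on the statistics slide, the listed per-class counts sum exactly to the stated total, so the TOTAL row is the sum of the rows shown. The dictionary simply copies the slide's numbers; no other data is assumed.

```python
# Instance counts for the subset of SWETO classes on the statistics slide (January 2004)
instance_counts = {
    "Cities, countries, and states": 2_902,
    "Airports": 1_515,
    "Companies and banks": 30_948,
    "Terrorist attacks and organizations": 1_511,
    "Persons and researchers": 307_417,
    "Scientific publications": 463_270,
    "Journals, conferences, and books": 4_256,
}
STATED_TOTAL = 811_819

computed_total = sum(instance_counts.values())
print(f"computed {computed_total:,} vs stated {STATED_TOTAL:,}")
```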

Statistics: relationships (subset)
Relationship                 # Explicit relations
located in                                 30,809
responsible for (event)                     1,425
listed author in                        1,045,719
(paper) published in                      467,367

Browsing

Entity Disambiguation
Disambiguation type          # Times used
Automatic (Freedom)               248,151
Manual                                210
Unresolved (Removed)                  591
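The three outcomes in the disambiguation table (automatic, manual, unresolved) suggest a simple decision rule. The sketch below is an assumption, not Semagix Freedom's algorithm: it scores each candidate entity by the fraction of shared attributes whose values agree, links automatically only when one candidate clearly wins, routes ties between strong candidates to a human, and leaves weak matches unresolved. The thresholds `accept` and `margin` and the example records are hypothetical.

```python
from __future__ import annotations


def attribute_agreement(a: dict[str, str], b: dict[str, str]) -> float:
    """Fraction of attributes present in both records whose values agree (case-insensitive)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    agree = sum(1 for k in shared if a[k].strip().lower() == b[k].strip().lower())
    return agree / len(shared)


def disambiguate(ref: dict[str, str], candidates: list[dict[str, str]],
                 accept: float = 0.8, margin: float = 0.2) -> tuple[str, int | None]:
    """Classify a reference as ("automatic", index), ("manual", None) or ("unresolved", None)."""
    if not candidates:
        return ("unresolved", None)
    scored = sorted(((attribute_agreement(ref, c), i) for i, c in enumerate(candidates)),
                    reverse=True)
    best, best_idx = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best >= accept and best - runner_up >= margin:
        return ("automatic", best_idx)  # one clear winner: link automatically
    if best >= accept:
        return ("manual", None)         # several strong candidates: ask a human
    return ("unresolved", None)         # nothing matches well enough


# Hypothetical knowledge-base entries sharing the same name
kb = [
    {"name": "J. Smith", "affiliation": "UGA", "email": "js@uga.edu"},
    {"name": "J. Smith", "affiliation": "MIT", "email": "jsmith@mit.edu"},
]
print(disambiguate({"name": "J. Smith", "affiliation": "uga", "email": "js@uga.edu"}, kb))
```

The margin test is what separates the large automatic bucket from the small manual one: a name alone is ambiguous, while a name plus corroborating attributes usually is not.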

SWETO on the Web
- Ontology (schema) in OWL
- Visualization of the ontology
- Quality, size
- Sources
- API
- Access via Web Service (to be released)
Check it out! s/Sweto/

Metadata extraction from heterogeneous content/data
[Figure: METADATA EXTRACTORS applied to the WWW and enterprise repositories: digital maps, Nexis, UPI, AP, feeds/documents, digital audio, data stores, digital videos, digital images, ...]
Create/extract as much (semantic) metadata automatically as possible, from:
- Any format (HTML, XML, RDB, text, docs)
- Many media
- Push, pull
- Proprietary, Deep Web, Open Source

Automatic Semantic Annotation of Text: Entity and Relationship Extraction
KB, statistical and linguistic techniques
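Of the three families of techniques named above, the knowledge-base (KB) one is the easiest to sketch: spot known entity names and aliases in the text, map them to canonical entities, and propose relationships that the KB already asserts between the spotted entities. The lexicon, facts and sentence below are hypothetical, and this covers only the KB-driven part; statistical and linguistic techniques are not shown.

```python
import re

# Hypothetical lexicon: surface form (lower-cased, including aliases) -> (canonical entity, class)
LEXICON = {
    "acme corporation": ("Acme Corporation", "Company"),
    "acme": ("Acme Corporation", "Company"),
    "springfield": ("Springfield", "City"),
}
# Hypothetical knowledge-base facts used to propose relationships
FACTS = {("Acme Corporation", "headquarteredIn", "Springfield")}


def annotate(text):
    """Return (spans, relations): entity spans found in text and KB facts linking them."""
    lowered = text.lower()
    found = []
    # Try longer surface forms first so multi-word names win over their substrings
    for form in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(form) + r"\b", lowered):
            if any(m.start() < end and start < m.end() for start, end, *_ in found):
                continue  # overlaps a longer match that was already accepted
            entity, cls = LEXICON[form]
            found.append((m.start(), m.end(), entity, cls))
    found.sort()
    entities = {entity for _, _, entity, _ in found}
    relations = sorted(f for f in FACTS if f[0] in entities and f[2] in entities)
    return found, relations


spans, rels = annotate("Acme Corporation (Acme) expanded its Springfield plant.")
```

Resolving the alias "Acme" to the same canonical entity as "Acme Corporation" is what turns string matches into semantic annotations that can later be queried together with the ontology.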

Automatic Semantic Annotation
[Figure: progression from limited tagging (mostly syntactic, COMTEX tagging) through content ‘enhancement’ to rich semantic metatagging (value-added Semagix semantic tagging).]
Value-added relevant metatags added by Semagix to existing COMTEX tags:
- Private companies
- Type of company
- Industry affiliation
- Sector
- Exchange
- Company execs
- Competitors
© Semagix, Inc.
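The "content enhancement" step can be pictured as a lookup that keeps the existing (syntactic) tags and adds value-added tags drawn from a knowledge base. This is a minimal sketch under assumed data: the tag layout, the `COMPANY_KB` entries and the field names are hypothetical and are not the COMTEX or Semagix formats.

```python
# Hypothetical knowledge base keyed by canonical company name
COMPANY_KB = {
    "Acme Corporation": {
        "companyType": "Private",
        "industry": "Industrial Machinery",
        "sector": "Industrials",
        "executives": ["Jane Doe"],
        "competitors": ["Globex"],
    },
}


def enhance_tags(existing_tags):
    """Return the original tags plus a 'semantic' section with value-added KB tags."""
    semantic = {}
    for company in existing_tags.get("companies", []):
        facts = COMPANY_KB.get(company)
        if facts is not None:
            semantic[company] = facts  # unknown companies keep only their original tag
    # Build a new dict so the caller's tags are left untouched
    return {**existing_tags, "semantic": semantic}


tags = {"companies": ["Acme Corporation", "Unknown Ltd"], "topic": "earnings"}
enhanced = enhance_tags(tags)
```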

Semantic Query Processing and Analytics
Solution: in-memory semantic querying (semantic querying in RAM)
- Complex queries involving ontology and metadata
- Incremental indexing
- Distributed indexing
- High performance: 10M queries/hr; less than 10 ms for typical search queries; 2 orders of magnitude faster than RDBMS for complex analytical queries
- Knowledge APIs provide a Java, JSP or HTTP-based interface for querying the ontology and metadata
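The combination of in-memory indexes, incremental indexing, and queries that span both ontology and metadata can be illustrated with a toy triple index. This is a hedged sketch, not the Freedom query engine or its Knowledge APIs: the `InMemoryIndex` class, the predicate names `inSector` and `mentions`, and the query function are hypothetical.

```python
from collections import defaultdict


class InMemoryIndex:
    """Toy in-memory triple index kept in two hash maps."""

    def __init__(self):
        self.by_pred_obj = defaultdict(set)   # (predicate, object) -> subjects
        self.by_subj_pred = defaultdict(set)  # (subject, predicate) -> objects

    def add(self, s, p, o):
        # Incremental indexing: each new triple updates both indexes immediately
        self.by_pred_obj[(p, o)].add(s)
        self.by_subj_pred[(s, p)].add(o)

    def subjects(self, p, o):
        return set(self.by_pred_obj.get((p, o), ()))

    def objects(self, s, p):
        return set(self.by_subj_pred.get((s, p), ()))


def documents_about_sector(index, sector):
    # Ontology part: companies in the sector; metadata part: documents annotated with them
    companies = index.subjects("inSector", sector)
    return {doc for c in companies for doc in index.subjects("mentions", c)}


idx = InMemoryIndex()
idx.add("Acme", "inSector", "Industrials")   # ontology assertion
idx.add("Globex", "inSector", "Energy")
idx.add("doc1", "mentions", "Acme")          # semantic metadata on a document
idx.add("doc2", "mentions", "Globex")
```

Because both indexes are updated on every `add`, a document annotated after the first query is visible to the next one without a rebuild.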

What can current semantic technology do? (sample applications)
- Semi-automated (mostly automated) annotation of resources from various heterogeneous sources (unstructured%, semi-structured, structured data; media content)*
- Creation of large knowledge bases (ontology population) from trusted sources*
- Semantic search and browsing
- Unified access to multiple sources (semantic integration)*, #
- Inferencing#
- Relationship/knowledge discovery among the annotated resources and entities; analytics*, %
  - Both implicit^ and explicit* relationships
* Commercial: Taalee Semantic Search (Sheth); # Commercial: Network Inference; % Near-commercial: IBM/SemTAP (IBM: McCool, Guha); ^ LSDIS-UGA research

VideoAnywhere and Taalee Semantic Search Engine
[Figure: blended browsing & querying interface combining attribute & keyword querying (a uniform view of worldwide distributed assets of similar type) with semantic browsing (targeted e-shopping/e-commerce assets access).]

Equity Research Dashboard with Blended Semantic Querying and Browsing
- Focused relevant content organized by topic (semantic categorization)
- Automatic content aggregation from multiple content providers and feeds
- Related relevant content not explicitly asked for (semantic associations)
- Competitive research inferred automatically
- Automatic 3rd-party content integration

Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)

Application to semantic analysis/intelligence
- Documentary content and factual evidence are integrated semantically via semantic metadata
[Figure: intelligence sub-domain ontology with classes Group, Alias, Person, Country, Bank Account, Location, Time, Event, Watch-list and Role, linked by relationships such as in, has alias, has, involved in, occurred at, works for/leads, originated in, is funded by/works with, appears on, and has position.]
Classification metadata: Cocaine seizure investigation
Semantic metadata extracted from the article:
- Person is “Giulio Tremonti”
- Position of “Giulio Tremonti” is “Economics Minister”
- “Giulio Tremonti” appears on Watchlist “PEP”
- Group is Political party “Integrali”
- “Integrali” is the “Italian Government”
- “Italian Government” is based in “Rome”
[Figure labels: Corroborating Evidence; Evidence]
© Semagix, Inc.
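One small piece of this workflow, checking extracted person metadata against watch-lists after resolving aliases, can be sketched as follows. The watch-list contents, alias table and names are hypothetical placeholders; a real system would draw both from the populated ontology (the "has alias" and "appears on" relationships in the figure).

```python
# Hypothetical watch-lists and alias table (illustrative names only)
WATCHLISTS = {"PEP": {"Person A"}, "Sanctions": {"Person B"}}
ALIASES = {"P. A.": "Person A"}


def watchlist_hits(extracted_people):
    """Return (name as extracted, canonical entity, watch-list) for every match."""
    hits = []
    for name in extracted_people:
        canonical = ALIASES.get(name, name)  # follow a "has alias" link first
        for list_name, members in sorted(WATCHLISTS.items()):
            if canonical in members:
                hits.append((name, canonical, list_name))
    return hits


print(watchlist_hits(["P. A.", "Person C"]))
```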

Semantic Associations: Beyond simple relationships
Mechanisms for querying about and retrieving complex relationships between entities.
[Figure: entities A, B and C connected by relationship edges labelled x, y, z, y′, z′, u and v.]
1. A is related to B by x.y.z
2. A is related to C by
   i. x.y′.z′
   ii. u.v (undirected path)
3. A is “related similarly” to B as it is to C (y′ is similar to y and z′ is similar to z, so the path x.y.z is similar to x.y′.z′)
So are B and C related?
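Finding associations like x.y.z or u.v amounts to enumerating relationship paths between two entities. The sketch below assumes every relationship may be traversed in both directions (so undirected paths such as u.v are found) and bounds the path length; the node names n1 to n4 and the labels y2/z2 (standing in for y′/z′) are hypothetical encodings of the slide's figure.

```python
from collections import defaultdict


def build_graph(triples):
    # Treat every relationship as traversable in both directions (undirected association)
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
        graph[o].append((p, s))
    return graph


def associations(graph, start, end, max_len=4):
    """Enumerate simple paths from start to end as lists of relationship labels."""
    results = []

    def dfs(node, visited, labels):
        if node == end and labels:
            results.append(list(labels))
            return
        if len(labels) == max_len:
            return
        for label, nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                labels.append(label)
                dfs(nxt, visited, labels)
                labels.pop()          # backtrack
                visited.remove(nxt)

    dfs(start, {start}, [])
    return results


triples = [
    ("A", "x", "n1"), ("n1", "y", "n2"), ("n2", "z", "B"),   # A -> B by x.y.z
    ("n1", "y2", "n3"), ("n3", "z2", "C"),                   # A -> C by x.y2.z2
    ("A", "u", "n4"), ("C", "v", "n4"),                      # A - C by u.v (undirected)
]
g = build_graph(triples)
print(associations(g, "A", "C"))
```

Deciding whether x.y.z and x.y2.z2 count as "similar" paths would additionally need a comparison of the labels (for example via the ontology's relationship hierarchy), which this sketch does not attempt.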

Context: Why, What, How?
- Context => Relevance; reduction in computation space
- Context captures the user’s interest, providing the user with the relevant knowledge among the numerous relationships between entities
- By defining regions (or sub-graphs) of the ontology, we capture the user’s areas of interest

Context Weight: Example
- Region 1: Financial Domain, weight = 0.50
- Region 2: Terrorist Domain, weight = 0.75
[Figure: example graph with entities e1: Person, e2: Financial Organization, e3: Organization, e4: Terrorist Organization, e5: Person, e6: Financial Organization, e7: Terrorist Organization, e8: Terrorist Attack and e9: Location, connected by relationships such as friend of, member of, located in, supports, has account, works for, involved in and at location.]
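One way to use the region weights is to rank candidate associations by how much of each path falls inside weighted regions. The slide gives only the weights; the scoring rule here (mean region weight of the entities on a path, with 0 for entities outside every region), the assignment of entities to regions by class, and the two example paths are assumptions made for illustration.

```python
# Region weights from the slide; the entity-to-region assignment is a hypothetical reading
REGION_WEIGHTS = {"Financial": 0.50, "Terrorist": 0.75}
ENTITY_REGION = {
    "e2": "Financial", "e6": "Financial",                      # financial organizations
    "e4": "Terrorist", "e7": "Terrorist", "e8": "Terrorist",   # terrorist orgs and attack
}


def context_score(path):
    """Mean region weight of the entities on a path (0 for entities outside every region)."""
    if not path:
        return 0.0
    weights = [REGION_WEIGHTS.get(ENTITY_REGION.get(e), 0.0) for e in path]
    return sum(weights) / len(weights)


def rank_by_context(paths):
    # Higher-scoring paths are treated as more relevant to the user's context
    return sorted(paths, key=context_score, reverse=True)


# Two hypothetical associations starting at e1 (a Person)
financial_path = ["e1", "e2", "e6"]
terrorist_path = ["e1", "e4", "e8"]
ranked = rank_by_context([financial_path, terrorist_path])
```

With these weights the path through the terrorist region scores 0.5 and the financial one about 0.33, so the former is ranked first; a different aggregation (sum, max, length penalty) would be an equally valid design choice.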

Anti-Money Laundering: Know Your Customer
Risk profiles are developed for individuals or companies. If a risk profile changes based on new information, the individual's risk profile and the branch aggregate risk profile are automatically updated.
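The automatic propagation from an individual's risk profile to the branch aggregate can be sketched as an update method that recomputes the affected branch on every change. The slide does not say how the aggregate is computed; taking the mean of customer scores, and the `RiskRegistry` name and scores used here, are assumptions.

```python
from collections import defaultdict


class RiskRegistry:
    """Hypothetical sketch: branch aggregate = mean of its customers' risk scores."""

    def __init__(self):
        self.scores = {}
        self.branch_of = {}
        self.members = defaultdict(set)
        self.branch_aggregate = {}

    def update(self, customer, branch, score):
        # If the customer moved branches, recompute the old branch too
        old_branch = self.branch_of.get(customer)
        if old_branch is not None and old_branch != branch:
            self.members[old_branch].discard(customer)
            self._recompute(old_branch)
        self.scores[customer] = score
        self.branch_of[customer] = branch
        self.members[branch].add(customer)
        self._recompute(branch)

    def _recompute(self, branch):
        members = self.members[branch]
        if members:
            self.branch_aggregate[branch] = sum(self.scores[c] for c in members) / len(members)
        else:
            self.branch_aggregate.pop(branch, None)


registry = RiskRegistry()
registry.update("c1", "B1", 0.2)
registry.update("c2", "B1", 0.6)
registry.update("c1", "B1", 0.8)  # new information raises c1's risk; B1 is updated too
```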

View Risk Scores for a specific company or customer

Semantic querying, browsing and integration to find potential antifungal drug targets
[Figure: databases for different organisms.]
- Is this or a similar gene present in another organism? (Most antifungals are associated with the sterol mechanism.)
- Services: BLAST, co-expression analysis, phylogeny
- If annotated, directly access the DB; otherwise use BLAST to normalize
FGDB

Future: Geospatial Semantic Analytics: Thematic + Space + Time

Future: 3D Semantic Geo-visualization of Movements in Space and Time

Conclusion
- Great progress from the work on semantic information interoperability/integration in the early 90s until now, re-energized by the vision of the Semantic Web, related standards and technological advances
- Technology is beyond proof of concept
- But lots of difficult research and engineering challenges lie ahead
- More: (Technology) (Research)
- Demos available