An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Presentation transcript:

An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation

From Data to Knowledge: Leveraging Ontology, Epistemology, and Logic
  Definitions
  A picture of the landscape of interest
  A workbench with toolkits (for “enhancing human cognition and generating new knowledge from [the] wealth of heterogeneous digital data”)
  Intellectual merit
  Broader impact

Definitions: “From Data to Knowledge”
  Progression of terms: symbols, data, conceptualized data, knowledge
    Symbols: characters and character-string instances
    Data: symbols as values in attribute-value pairs
    Conceptualized data: data in the framework of a conceptual model
    Knowledge: conceptualized data with a degree of certainty or community agreement
  From Data to Knowledge
    Recognize symbols
    Classify symbols with respect to meta-data attributes
    Embed attribute-value pairs into a conceptual framework of concepts, relationships, and constraints
    Present for community approval or integrate into community-approved conceptualizations

Examples: From Data to Knowledge
  Car Ads
    Symbols: $, 12k, ford, 4-Door
    Data: price(12k), mileage(12k), make(ford)
    Conceptualized data:
      Car(C123) has Price($12,000)
      Car(C123) has Mileage(12,000)
      Car(C123) has Make(Ford)
      BodyType isa Feature
      Car(C123) has Feature(Sedan)
    Knowledge
      Community agreement that the ontology is “correct”
      Community agreement that the facts in the ontology are “correct”
  Appointments
  Biology

Examples: From Data to Knowledge Appointments Biology

Examples: From Data to Knowledge Biology
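The Car Ads progression (symbols, then data, then conceptualized data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expressions, the `BODY_TYPE_FEATURE` mapping, the function names, and the sample ad text are all hypothetical and are not taken from the slides.

```python
import re

# Illustrative recognizers (assumptions, not from the slides): one pattern per attribute.
RECOGNIZERS = {
    "price": re.compile(r"\$\s?(\d+)k\b", re.IGNORECASE),
    "mileage": re.compile(r"\b(\d+)k\s+miles\b", re.IGNORECASE),
    "make": re.compile(r"\b(ford|nissan|honda)\b", re.IGNORECASE),
    "body_type": re.compile(r"\b(4-door|2-door)\b", re.IGNORECASE),
}

# Assumed mapping from a body-type symbol to a Feature value.
BODY_TYPE_FEATURE = {"4-door": "Sedan"}


def recognize_symbols(text):
    """Step 1: split raw text into character-string symbols."""
    return re.findall(r"\$|[\w-]+", text)


def classify(text):
    """Step 2: turn symbols into attribute-value pairs (data)."""
    data = {}
    for attribute, pattern in RECOGNIZERS.items():
        match = pattern.search(text)
        if match:
            data[attribute] = match.group(1).lower()
    return data


def conceptualize(car_id, data):
    """Step 3: embed attribute-value pairs in a small conceptual model."""
    car = f"Car({car_id})"
    facts = []
    if "price" in data:
        facts.append((car, "has", f"Price(${int(data['price']) * 1000:,})"))
    if "mileage" in data:
        facts.append((car, "has", f"Mileage({int(data['mileage']) * 1000:,})"))
    if "make" in data:
        facts.append((car, "has", f"Make({data['make'].capitalize()})"))
    if data.get("body_type") in BODY_TYPE_FEATURE:
        facts.append((car, "has", f"Feature({BODY_TYPE_FEATURE[data['body_type']]})"))
    return facts


ad = "ford 4-Door, $12k, 12k miles"
for fact in conceptualize("C123", classify(ad)):
    print(*fact)
```

The fourth step, turning conceptualized data into knowledge, is deliberately absent: community agreement is a social process, not something this code can compute.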

Definitions: “Ontology,” “Epistemology,” and “Logic”
  Ontology
    Existence: answers “What exists?”
    For us, it answers: what concepts, relationships, and constraints exist and how they are interrelated.
  Epistemology
    The nature of knowledge: answers “What is knowledge?”, “How is knowledge acquired?”, “What do people know?”
    For us, it answers: what is knowledge (conceptualized data with community agreement), how data becomes conceptualized and how conceptualized data becomes knowledge, and how someone’s conceptualized data corresponds with community-agreed-upon conceptualized data.
  Logic
    Principles of valid inference: answers “What can be inferred?”
    For us, it answers: what can be inferred (in a formal sense) from conceptualized data.

Examples: “For-Us Answers”
  Ontology: What exists?
    In Car Ads: Car, Make, Model, Car has Make, Engine isa Feature
    In Appointments: Service Provider, Date, Appointment with Doctor
    In Biology: Protein Activity, Molecular Weight, Chromosome Location is aggregate of Chromosome Number and Start and End and Orientation
  Epistemology
    What is knowledge?
      A fact-filled Biology ontology
      Chromosome Number (21) starts at Start (29,350,518) and ends at End (29,367,889) with Orientation (minus)
    How is it acquired?
      Creation of a fact-filled Biology ontology obtained from a reliable source
      Provenance: Was the source from which the Biology ontology was created reliable?
    What do people know?
      Does my knowledge that I have an appointment with Dr. Jones on Thursday align with the appointment ontology as established by the doctor’s office?
      I view the world with my car ads ontology: how does it align with the community standard ontology?
  Logic: Principles of valid inference
    In Car Ads: find red Nissans newer than 2002 with less than 100k miles
    In Appointments: can reason that a dermatologist is a medical service provider
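The two Logic examples, a query over recorded facts and an inference over an isa hierarchy, can be illustrated with a minimal Python sketch. The car records, the `ISA` table (including the intermediate "Doctor" concept), and the function names are hypothetical stand-ins for a populated ontology; a real system would more likely use a reasoner or a query language such as SPARQL.

```python
# Hypothetical populated car-ads ontology: one record of facts per car.
CARS = [
    {"id": "C1", "make": "Nissan", "color": "red", "year": 2005, "mileage": 80_000},
    {"id": "C2", "make": "Nissan", "color": "red", "year": 2001, "mileage": 60_000},
    {"id": "C3", "make": "Ford", "color": "red", "year": 2006, "mileage": 40_000},
    {"id": "C4", "make": "Nissan", "color": "red", "year": 2004, "mileage": 120_000},
]


def red_nissans_after_2002_under_100k(cars):
    """Query over recorded facts: red Nissans newer than 2002 with under 100k miles."""
    return [
        car["id"]
        for car in cars
        if car["make"] == "Nissan"
        and car["color"] == "red"
        and car["year"] > 2002
        and car["mileage"] < 100_000
    ]


# Hypothetical isa hierarchy for the Appointments ontology (child -> parent).
ISA = {
    "Dermatologist": "Doctor",
    "Doctor": "Medical Service Provider",
    "Medical Service Provider": "Service Provider",
}


def is_a(concept, ancestor):
    """Inference over implied facts: follow isa links upward until a match or the root."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


print(red_nissans_after_2002_under_100k(CARS))
print(is_a("Dermatologist", "Medical Service Provider"))
```

The second function returns a fact that is never stored directly: only the individual isa links are recorded, and the dermatologist conclusion follows by transitivity.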

Landscape of Data and Knowledge
  Includes:
  The creation of ontologies with community agreement
    Declaration of conceptual models (via ontology editor, forms, …)
    Recognition of meta-data in semi-structured text
  The conversion of heterogeneous digital data into knowledge under an ontology
    Ontology-based/layout-based information extraction/annotation
    Data integration within an ontological context
  The ability to match isolated ontologies with community ontologies
    (Semi-)automatic schema matching
    Traceability from symbols in a page of text to symbols as ontological components of knowledge
  The ability to reason over ontologies to retrieve information both given and implied
    Ontologies as first-order logic theories (potentially modal logics too)
    Query (through both formal query languages and informal search) over populated ontologies for facts (both recorded and implied)

A Workbench for Knowledge Engineering
  Unified framework with a toolkit supporting:
    Ontology creation
    Data to knowledge conversion
    Knowledge solidification
    Community usability
  Usable by knowledge workers of varying degrees of sophistication

Ontology Creation
  Objective: Determine what concepts, relationships, and constraints exist and how they are interrelated
  Contributing Solutions (what we have done or have in progress)
    TANGO (creation, augmentation, adjustment)
    Forms to conceptual models (CT’s work)
    Table interpretation through forms to conceptual models (CT’s work)
  Open Problems (what we need, and believe we can do)
    Reverse engineer XML documents to an XML schema and then to C-XML (built on RA’s work)
    Extract a specific ontology from a more general ontology (like YD’s MS work)
    Merge ontologies (built on ZL’s work + LX’s work)
    Convert regular patterns in documents to conceptual models
      Named regular expressions over patterns (based on 598R work)
      Generation of layout patterns → converted to named patterns (based on YD’s work)
    … more ??

Data to Knowledge Conversion
  Objective: Find ways to capture facts ontologically.
  Contributing solutions
    Ontology-based information extraction
    Semantic annotation (YD’s work)
    Synergistic ontology-based/layout-based extractors (YD’s work)
    Data frames as data-to-knowledge converters
  Open problems
    User-directed annotation (like YD’s ASpaces work)
    User-directed conversion tools
      Named regular-expression extractors wrt RDF, Named Graphs, OWL ontologies, OSM ontologies (like 598R work)
      Generation of named regular-expression extractors from marked source documents (598R++ work)
    Storage structures?
    … more ??
  A Semantic-Web page consists of
    the human-readable page (ordinary HTML, XML, …)
    one or more annotation attachments
      a reference to the ontology used for annotation
      RDF triples of extracted information
      pointers into the original source for every item
        highlighting possibilities for extracted data
        hover possibilities to connect to the ontology
      directly query to annotation attachment
        SPARQL
        SerFR
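One way to picture an annotation attachment is as a set of triples, each carrying a pointer back into the source page. The Python sketch below is a hypothetical illustration only: the `Annotation` class, the `EXTRACTORS` table of named regular expressions, the predicate names, and the sample page text are assumptions, and the sketch does not reproduce the 598R extractors or any RDF serialization.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One extracted triple plus a pointer (character offsets) into the source page."""
    subject: str
    predicate: str
    obj: str
    start: int
    end: int


# Hypothetical named regular expressions: the name doubles as the predicate.
EXTRACTORS = {
    "hasPrice": re.compile(r"\$\d[\d,]*"),
    "hasYear": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def annotate(page_text, subject):
    """Run every named extractor over the page and record where each value was found."""
    annotations = []
    for predicate, pattern in EXTRACTORS.items():
        for match in pattern.finditer(page_text):
            annotations.append(
                Annotation(subject, predicate, match.group(0), match.start(), match.end())
            )
    return annotations


page = "2004 Nissan Altima, red, $8,500"
for a in annotate(page, "Car(C123)"):
    # The offsets let a viewer highlight the extracted value in the original page.
    print(a.subject, a.predicate, a.obj, page[a.start:a.end])
```

Keeping the offsets alongside each triple is what makes highlighting and hover behavior possible without modifying the human-readable page itself.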

Knowledge Solidification
  Objective: Obtain community agreement for fact-filled ontologies.
  Contributing solutions
    Provide for recording provenance for individual facts (HC’s work for genealogical data)
    TANGO: assume published tables have community agreement and therefore fact-filled ontologies grown from tables have community agreement.
    Generally, assume published semi-structured data has community agreement and therefore ontologies and facts extracted from this semi-structured data have community agreement (CT’s work)
  Open problems
    How do we solidify knowledge captured only as conceptualized data (i.e., data extracted with respect to somebody’s “homegrown” ontology)? (Do we need to worry about this?)
    Can we link identical facts in different sites? (begun with HC’s work)
    Can we (should we) find ways to attach provenance to the ontology itself (not just to the facts)?
    Tool for community development of ontologies
    … more ??
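Recording provenance per fact and linking identical facts across sites can be sketched as a grouping step. The following Python is a hypothetical illustration: the fact tuples, the example.org and example.net URLs, and the function name are invented for the example and do not describe HC’s system.

```python
from collections import defaultdict


def link_identical_facts(facts_with_sources):
    """Group the sources (provenance) under each distinct fact."""
    linked = defaultdict(set)
    for fact, source in facts_with_sources:
        linked[fact].add(source)
    return dict(linked)


# Hypothetical extracted facts, each paired with the page it came from.
extracted = [
    (("Person(P1)", "has", "BirthYear(1850)"), "https://example.org/site-a/p1"),
    (("Person(P1)", "has", "BirthYear(1850)"), "https://example.net/site-b/records/7"),
    (("Person(P1)", "has", "BirthPlace(Provo)"), "https://example.org/site-a/p1"),
]

for fact, sources in link_identical_facts(extracted).items():
    # A fact reported by several independent sources is a candidate for stronger agreement.
    print(fact, len(sources))
```

This sketch treats facts as identical only when their tuples match exactly; deciding that two differently worded facts are the same is the harder part of the open problem.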

Community Usability
  Objective: Provide (easy) access to knowledge: both ontological knowledge and facts.
  Contributing solutions
    Ordinary query processing, including servicing requests via free-form queries and service requests (MA’s work)
    Information harvesting (CT’s work)
    Form/Table query processing (built on CT’s work and RPI’s query-by-table)
    Information linkage (HC’s work, SI’s work)
  Open problems
    Agents (e.g., in ASpaces, YD’s work)
    Learning and self-adjustment of individual knowledge (How does my knowledge align with community knowledge?), for the sake of
      Gaining encyclopedic knowledge
      Discovering gaps in knowledge
      Discovering potential adjustments and augmentations to community knowledge and solidifying community knowledge
      Seeing knowledge objects from a different point of view
    Orchestrating ontology-based services (MA’s future work)
    Practicalities?
    … more ??
  Ease of Use
    Free-form queries (+ linguistics)
    Form-based queries (graphical?)
  Scalability
    Semantic indexes
    Caching (on the scale of Google)
  System Development
    Demos
    Open source tools
  How do we sell the idea?

Intellectual Merit
  Achievable intellectual objectives of this research:
    Provides an answer to the question about how to turn syntactic symbols into semantic knowledge
      Shows how to create a web of data
      Shows how to establish a workbench with toolkits to convert heterogeneous digital data into knowledge under the auspices of an ontology
    Explores the synergistic interplay among ontology, epistemology, and logic for the advancement of knowledge
      New ways to think about
        What knowledge is
        How knowledge is acquired
        What individuals know
        Community knowledge
      Query and reasoning over fact-filled ontologies

Broader Impact
  Worthwhile implications of this research:
    Harvests and makes available facts from the wealth of available heterogeneous digital data
    Harnesses and manages community knowledge with the objective of enhancing human cognition
    Makes facts on the web (rather than pages) easily searchable by the general public
    Makes fact creation and maintenance easily attainable by fact providers
    Facilitates community agreement of ontologically specified knowledge
    Provides a practical set of tools for knowledge management
    Involves students, researchers, and knowledge workers from various disciplines in a community-wide effort to convert data into knowledge