An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

1 An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation

2 From Data to Knowledge: Leveraging Ontology, Epistemology, and Logic
  Definitions
  A picture of the landscape of interest
  A workbench with toolkits (for “enhancing human cognition and generating new knowledge from [the] wealth of heterogeneous digital data”)
  Intellectual merit
  Broader impact

3 Definitions: “From Data to Knowledge”
  Progression of terms: symbols, data, conceptualized data, knowledge
    Symbols: characters and character-string instances
    Data: symbols as values in attribute-value pairs
    Conceptualized data: data in the framework of a conceptual model
    Knowledge: conceptualized data with a degree of certainty or community agreement
  From Data to Knowledge
    Recognize symbols
    Classify symbols with respect to meta-data attributes
    Embed attribute-value pairs into a conceptual framework of concepts, relationships, and constraints
    Present for community approval or integrate into community-approved conceptualizations

4 Examples: From Data to Knowledge
  Car Ads
    Symbols: $, 12k, ford, 4-Door
    Data: price(12k), mileage(12k), make(ford)
    Conceptualized data:
      Car(C123) has Price($12,000)
      Car(C123) has Mileage(12,000)
      Car(C123) has Make(Ford)
      BodyType isa Feature
      Car(C123) has Feature(Sedan)
    Knowledge
      Community agreement that the ontology is “correct”
      Community agreement that the facts in the ontology are “correct”
  Appointments
  Biology
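A minimal Python sketch of the progression above (symbols, then data, then conceptualized data) for the car-ad example. The ad text, the `MAKES` and `BODY_TYPES` lookup tables, the context rules (a `$` before a number suggests a price, `miles` after it suggests a mileage), and all function names are illustrative assumptions, not the authors' extraction tools. The final step, community agreement, is a social process and is not modeled here.

```python
import re

# Hypothetical ad text assembled from the symbols on the slide.
AD_TEXT = "ford 4-Door $12k, 12k miles"

# Hypothetical lookup tables standing in for real recognizers.
MAKES = {"ford", "nissan"}
BODY_TYPES = {"4-door": "Sedan"}


def recognize_symbols(text):
    """Step 1: split raw text into symbol instances (character strings)."""
    return re.findall(r"\$|[\w-]+", text)


def classify(symbols):
    """Step 2: turn symbols into attribute-value pairs using local context."""
    data = []
    for i, sym in enumerate(symbols):
        low = sym.lower()
        prev = symbols[i - 1] if i > 0 else ""
        nxt = symbols[i + 1].lower() if i + 1 < len(symbols) else ""
        if low in MAKES:
            data.append(("make", low))
        elif low in BODY_TYPES:
            data.append(("body_type", low))
        elif re.fullmatch(r"\d+k", low) and prev == "$":
            data.append(("price", low))      # "12k" right after "$"
        elif re.fullmatch(r"\d+k", low) and nxt == "miles":
            data.append(("mileage", low))    # "12k" right before "miles"
    return data


def conceptualize(car_id, data):
    """Step 3: embed attribute-value pairs as facts about one Car object."""
    facts = set()
    for attr, value in data:
        if attr == "make":
            facts.add((car_id, "has Make", value.capitalize()))
        elif attr == "price":
            facts.add((car_id, "has Price", int(value[:-1]) * 1000))
        elif attr == "mileage":
            facts.add((car_id, "has Mileage", int(value[:-1]) * 1000))
        elif attr == "body_type":
            # BodyType isa Feature, so a body type is recorded as a Feature.
            facts.add((car_id, "has Feature", BODY_TYPES[value]))
    return facts


symbols = recognize_symbols(AD_TEXT)
facts = conceptualize("C123", classify(symbols))
```

Note that the same symbol, `12k`, becomes two different data items depending on its neighbors, which is the point of the classification step.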

5 Examples: From Data to Knowledge Appointments Biology

6 Examples: From Data to Knowledge Biology

7 Definitions: “Ontology,” “Epistemology,” and “Logic”
  Ontology
    Existence: answers “What exists?”
    For us, it answers: what concepts, relationships, and constraints exist and how they are interrelated.
  Epistemology
    The nature of knowledge: answers “What is knowledge?”, “How is knowledge acquired?”, “What do people know?”
    For us, it answers: what is knowledge (conceptualized data with community agreement), how data becomes conceptualized and how conceptualized data becomes knowledge, and how someone’s conceptualized data corresponds with community-agreed-upon conceptualized data.
  Logic
    Principles of valid inference: answers “What can be inferred?”
    For us, it answers: what can be inferred (in a formal sense) from conceptualized data.

8 Examples: “For-Us Answers”
  Ontology: What exists?
    In Car Ads: Car, Make, Model, Car has Make, Engine isa Feature
    In Appointments: Service Provider, Date, Appointment with Doctor
    In Biology: Protein Activity, Molecular Weight, Chromosome Location is aggregate of ChromosomeNumber and Start and End and Orientation
  Epistemology
    What is knowledge?
      A fact-filled Biology ontology
      Chromosome Number (21) starts at Start (29,350,518) and ends at End (29,367,889) with Orientation (minus)
    How is it acquired?
      Creation of a fact-filled Biology ontology obtained from a reliable source
      Provenance: Was the source from which the Biology ontology was created reliable?
    What do people know?
      Does my knowledge that I have an appointment with Dr. Jones on Thursday align with the appointment ontology as established by the doctor’s office?
      I view the world with my car ads ontology; how does it align with the community standard ontology?
  Logic: Principles of valid inference
    Find red Nissans later than a 2002 with less than 100k miles
    In Appointments: can reason that a dermatologist is a medical service provider
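The two logic examples above can be sketched in Python as a filter over recorded facts and a walk up an isa hierarchy. The car records, the `Car` fields, the intermediate concepts in `ISA`, and the reading of "later than a 2002" as `year > 2002` are all assumptions made for illustration; the slides do not specify a query language or a reasoner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    make: str
    color: str
    year: int
    miles: int


# Hypothetical populated car-ads ontology.
CARS = [
    Car("Nissan", "red", 2004, 85_000),
    Car("Nissan", "red", 2001, 60_000),    # too old
    Car("Nissan", "blue", 2005, 40_000),   # wrong color
    Car("Ford", "red", 2006, 30_000),      # wrong make
    Car("Nissan", "red", 2003, 120_000),   # too many miles
]


def red_nissans(cars):
    """Query over recorded facts: red Nissans newer than 2002, under 100k miles."""
    return [
        c for c in cars
        if c.make == "Nissan" and c.color == "red"
        and c.year > 2002 and c.miles < 100_000
    ]


# Hypothetical isa links; each concept has at most one parent here.
ISA = {
    "Dermatologist": "Doctor",
    "Doctor": "Medical Service Provider",
    "Medical Service Provider": "Service Provider",
}


def is_a(concept, ancestor):
    """Implied fact: follow isa links upward until the ancestor is found."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


matches = red_nissans(CARS)
implied = is_a("Dermatologist", "Medical Service Provider")
```

The first function retrieves facts that are given; the second derives a fact that is only implied, since no link from Dermatologist to Medical Service Provider is stored directly.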

9 Landscape of Data and Knowledge
  Includes:
    The creation of ontologies with community agreement
      Declaration of conceptual models (via ontology editor, forms, …)
      Recognition of meta-data in semi-structured text
    The conversion of heterogeneous digital data into knowledge under an ontology
      Ontology-based/layout-based information extraction/annotation
      Data integration within an ontological context
    The ability to match isolated ontologies with community ontologies
      (Semi-)automatic schema matching
      Traceability from symbols in a page of text to symbols as ontological components of knowledge
    The ability to reason over ontologies to retrieve information both given and implied
      Ontologies as first-order logic theories (potentially modal logics too)
      Query (through both formal query languages and informal search) over populated ontologies for facts (both recorded and implied)

10 A Workbench for Knowledge Engineering
  Unified framework with a toolkit supporting:
    Ontology creation
    Data to knowledge conversion
    Knowledge solidification
    Community usability
  Usable by knowledge workers of varying degrees of sophistication

11 Ontology Creation
  Objective: Determine what concepts, relationships, and constraints exist and how they are interrelated
  Contributing Solutions (what we have done or have in progress)
    TANGO (creation, augmentation, adjustment)
    Forms to conceptual models (CT’s work)
    Table interpretation through forms to conceptual models (CT’s work)
  Open Problems (what we need, and believe we can do)
    Reverse engineer XML documents to an XML schema and then to C-XML (built on RA’s work)
    Extract a specific ontology from a more general ontology (like YD’s MS work)
    Merge ontologies (built on ZL’s work + LX’s work)
    Convert regular patterns in documents to conceptual models
      Named regular expressions over patterns (based on 598R work)
      Generation of layout patterns converted to named patterns (based on YD’s work)
    … more ??

12 Data to Knowledge Conversion
  Objective: Find ways to capture facts ontologically.
  Contributing solutions
    Ontology-based information extraction
    Semantic annotation (YD’s work)
    Synergistic ontology-based/layout-based extractors (YD’s work)
    Data frames as data-to-knowledge converters
  Open problems
    User-directed annotation (like YD’s ASpaces work)
    User-directed conversion tools
      Named regular-expression extractors wrt RDF, Named Graphs, OWL ontologies, OSM ontologies (like 598R work)
      Generation of named regular-expression extractors from marked source documents (598R++ work)
    Storage structures?
    … more ??
  A Semantic-Web page consists of
    the human-readable page (ordinary HTML, XML, …)
    one or more annotation attachments
      a reference to the ontology used for annotation
      RDF triples of extracted information
      pointers into the original source for every item
    highlighting possibilities for extracted data
    hover possibilities to connect to the ontology directly
    query to annotation attachment: SPARQL, SerFR
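One way to picture a named regular-expression extractor that yields triples plus pointers back into the source is sketched below in Python, using only the standard `re` module. The sample page text, the pattern, the `has<Name>` predicate naming, and the dictionary layout of an annotation are assumptions for illustration; they are not the 598R extractors or a real RDF serialization.

```python
import re

# Hypothetical human-readable source text.
PAGE = "2004 Nissan Altima, red, 85,000 miles, $7,900"

# Each named group plays the role of an ontology concept.
EXTRACTOR = re.compile(
    r"(?P<Year>\d{4})\s+(?P<Make>Nissan|Ford)\s+(?P<Model>\w+)"
    r".*?(?P<Mileage>\d{1,3}(?:,\d{3})*)\s+miles"
    r".*?\$(?P<Price>\d{1,3}(?:,\d{3})*)"
)


def annotate(subject, text):
    """Return triple-like facts, each with a character span into the source."""
    match = EXTRACTOR.search(text)
    if match is None:
        return []
    annotations = []
    for name, value in match.groupdict().items():
        annotations.append({
            "triple": (subject, f"has{name}", value),
            # Pointer into the original source for this extracted item.
            "source_span": match.span(name),
        })
    return annotations


annotations = annotate("Car1", PAGE)
```

Because every item keeps its `source_span`, a viewer could highlight the extracted text in the original page, which corresponds to the traceability and highlighting items listed for the annotation attachment.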

13 Knowledge Solidification
  Objective: Obtain community agreement for fact-filled ontologies.
  Contributing solutions
    Provide for recording provenance for individual facts (HC’s work for genealogical data)
    TANGO: assume published tables have community agreement and therefore fact-filled ontologies grown from tables have community agreement.
    Generally, assume published semi-structured data has community agreement and therefore ontologies and facts extracted from this semi-structured data have community agreement (CT’s work)
  Open problems
    How do we solidify knowledge captured only as conceptualized data (i.e., data extracted with respect to somebody’s “homegrown” ontology)? (Do we need to worry about this?)
    Can we link identical facts in different sites? (begun with HC’s work)
    Can we (should we) find ways to attach provenance to the ontology itself (not just to the facts)?
    Tool for community development of ontologies
    … more ??
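A small Python sketch of recording provenance per fact and linking identical facts found at different sites. The `Fact`, `Provenance`, and `FactStore` names, the example.org URLs, the genealogical sample values, and the idea of treating a fact as corroborated once two distinct sources report it are hypothetical choices made for illustration; the slides do not say how agreement should be measured.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str


@dataclass(frozen=True)
class Provenance:
    source_url: str     # where the fact was found
    extracted_by: str   # which tool or person extracted it


class FactStore:
    """Keeps every source that reported each fact."""

    def __init__(self):
        self._sources = defaultdict(set)

    def record(self, fact, provenance):
        # Identical facts from different sites collapse onto one key,
        # which links them while keeping each site's provenance.
        self._sources[fact].add(provenance)

    def provenance_of(self, fact):
        return set(self._sources.get(fact, set()))

    def corroborated(self, min_sources=2):
        """Facts reported by at least `min_sources` distinct source URLs."""
        return {
            fact for fact, provs in self._sources.items()
            if len({p.source_url for p in provs}) >= min_sources
        }


# Hypothetical genealogical facts from two hypothetical sites.
born = Fact("person:42", "bornIn", "1850")
died = Fact("person:42", "diedIn", "1921")

store = FactStore()
store.record(born, Provenance("https://example.org/site-a", "extractor-1"))
store.record(born, Provenance("https://example.org/site-b", "extractor-2"))
store.record(died, Provenance("https://example.org/site-a", "extractor-1"))
```

Here `store.corroborated()` contains only the birth fact, since the death fact has a single source.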

14 Community Usability
  Objective: Provide (easy) access to knowledge, both ontological knowledge and facts.
  Contributing solutions
    Ordinary query processing, including servicing requests via free-form queries and service requests (MA’s work)
    Information harvesting (CT’s work)
    Form/Table query processing (built on CT’s work and RPI’s query-by-table)
    Information linkage (HC’s work, SI’s work)
  Open problems
    Agents (e.g., in ASpaces; YD’s work)
    Learning and self-adjustment of individual knowledge (How does my knowledge align with community knowledge?), for the sake of
      Gaining encyclopedic knowledge
      Discovering gaps in knowledge
      Discovering potential adjustments and augmentations to community knowledge and solidifying community knowledge
      Seeing knowledge objects from a different point of view
    Orchestrating ontology-based services (MA’s future work)
    … more ??
  Practicalities?
    Ease of Use
      Free-form queries (+ linguistics)
      Form-based queries (graphical?)
    Scalability
      Semantic indexes
      Caching (on the scale of Google)
    System Development
      Demos
      Open source tools
    How do we sell the idea?

15 Intellectual Merit
  Achievable intellectual objectives of this research:
    Provides an answer to the question about how to turn syntactic symbols into semantic knowledge
    Shows how to create a web of data
    Shows how to establish a workbench with toolkits to convert heterogeneous digital data into knowledge under the auspices of an ontology
    Explores the synergistic interplay among ontology, epistemology, and logic for the advancement of knowledge
      New ways to think about
        What knowledge is
        How knowledge is acquired
        What individuals know
        Community knowledge
      Query and reasoning over fact-filled ontologies

16 Broader Impact
  Worthwhile implications of this research:
    Harvests and makes available facts from the wealth of available heterogeneous digital data
    Harnesses and manages community knowledge with the objective of enhancing human cognition
      Makes facts on the web (rather than pages) easily searchable by the general public
      Makes fact creation and maintenance easily attainable by fact providers
      Facilitates community agreement of ontologically specified knowledge
    Provides a practical set of tools for knowledge management
    Involves students, researchers, and knowledge workers from various disciplines in a community-wide effort to convert data into knowledge

