Ontologies & Machine Learning


Presentation on theme: "Ontologies & Machine Learning"— Presentation transcript:

1 Ontologies & Machine Learning
Marko Grobelnik, Blaz Fortuna
Jozef Stefan Institute, Slovenia

2 Aim of the talk
The main goal of this talk is to show knowledge modeling in relation to machine learning in two ways:
…top-down modeling with “deep ontologies”, using the Cyc system as the example
…bottom-up modeling of “light ontologies”, using the OntoGen ontology learning system as the example

3 What areas of research are we trying to target?
Text-Mining, Link-Analysis and other analytic techniques deal mainly with extracting and aggregating information from raw data
…they maximize the quality of the extracted information
The Semantic Web deals mainly with the integration and representation of the given data
…it maximizes the reusability of the given information
The two areas are complementary, and both are necessary for operational information engineering

4 Ontologies

5 What is an Ontology?
Ontologies are the main formal objects within the Semantic Web and, more recently, within text analytics
Ontologies have their origin in philosophy, but within computer science an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them
…their main aim is to describe and represent an area of knowledge in a formal way

6 What is an Ontology?
An ontology is a formal, explicit specification of a shared conceptualisation (Frank van Harmelen, 2003):
Formal: machine processable
Explicit specification: concepts, properties, relations, functions
Shared: consensual knowledge
Conceptualisation: abstract model of some domain

7 Which elements represent an ontology?
An ontology typically consists of the following elements:
Instances: the basic or “ground level” objects
Classes: sets, collections, or types of objects
Attributes: properties, features, characteristics, or parameters that objects can have and share
Relations: ways that objects can be related to one another
Analogies between ontologies and relational databases:
Instances correspond to records
Classes correspond to tables
Attributes correspond to record fields
Relations correspond to relations between the tables
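The four elements listed above can be sketched as plain data structures. This is only an illustration of the vocabulary, not the data model of any particular ontology language; the class names (`OntologyClass`, `Instance`, `Relation`), the helper `missing_attributes`, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A class: a set or type of objects (the analogue of a table)."""
    name: str
    attributes: list = field(default_factory=list)  # analogue of record fields


@dataclass
class Instance:
    """An instance: a ground-level object (the analogue of a record)."""
    name: str
    cls: OntologyClass
    values: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A relation: a named link between two instances."""
    name: str
    source: Instance
    target: Instance


def missing_attributes(instance):
    """Return the attributes of the instance's class that have no value yet."""
    return [a for a in instance.cls.attributes if a not in instance.values]


# Hypothetical sample data, invented for this sketch.
person = OntologyClass("Person", ["name"])
organization = OntologyClass("Organization", ["name", "country"])
alice = Instance("alice", person, {"name": "Alice"})
lab = Instance("example_lab", organization, {"name": "Example Lab"})
works_at = Relation("worksAt", alice, lab)

print(missing_attributes(lab))
```

The relational analogy shows in the shape of the code: a class fixes which fields exist, an instance fills them in, and a relation plays the role of a foreign-key link between two records.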

8 Levels of Semantic-Web formalisms
The W3C “Semantic Web Layer Cake” shows representation levels and related technologies
[Figure: the layer cake. Labels: infrastructure (character-level encoding, addressing the information); different levels of semantic abstraction; higher level of representation and reasoning (RIF, OWL).]

9 Top-down modeling of knowledge Cyc system

10 Cyc …a little bit of historical context
Older AI-ers know about Cyc:
…one of the boldest attempts in AI history to encode common sense knowledge in one KB
The project started in 1984 at Stanford as the US response to Japan’s project on “5th Generation Computer Systems”
In 1994 the company Cycorp was established (in Austin, TX)
In 2005 the Cyc KB was opened and made available for research as OpenCyc and ResearchCyc
In 2006 Cyc-Europe was established (in Ljubljana, Slovenia)
By 2006, ~$80M had been spent on the construction of the KB

11 The Cyc Ontology
Represented in: First Order Logic, Higher Order Logic, Context Logic, Micro-theories
Cyc contains: 15,000 predicates, 300,000 concepts, 3,200,000 assertions
[Figure: pyramid of the Cyc ontology. Upper levels: Thing, Intangible, Individual, Temporal, Spatial, Partially Tangible, Paths, Sets, Relations, Logic, Math, Time, Agents, Space. Below them: general knowledge about various domains (physical objects, living things, human beings, organizations, artifacts, events, geography, politics, business and commerce, language, everyday living, and others). Base: specific data, facts, and observations.]
Cycorp © 2006

12 …part of Cyc Ontology on Human Beings

13 Structure of Cyc Ontology
Knowledge Base Layers, from most general to most specific:
Upper Ontology: Abstract Concepts
Core Theories: Space, Time, Causality, …
Domain-Specific Theories
Facts (Database): Instances
The Knowledge Base (KB) itself comprises a massive taxonomy of concepts and specifically-defined relationships that describe how those concepts are related. This figure represents the context of the knowledge arranged by degrees of generality, with a small layer of abstract generalizations at the top and a large layer of real-world facts at the bottom.

14 Structure of Cyc Ontology
Upper Ontology: Abstract Concepts
EVENT ⊆ TEMPORAL-THING ⊆ INDIVIDUAL ⊆ THING
The Upper Ontology doesn’t say much about the world at all. It represents very general relations between very general concepts. For example, it contains assertions to the effect that every event is a temporal thing, every temporal thing is an individual, and every individual is a thing. “Thing” is Cyc’s most general concept. Everything whatsoever is an instance of “thing.”
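The chain of generalizations on this slide can be followed mechanically. The sketch below is a minimal illustration in Python, not Cyc's inference engine; the dictionary `GENERALIZES_TO` and the function names are assumptions made for the example, and each concept is given only one direct generalization to keep it short.

```python
# Direct generalization links taken from the slide:
# every event is a temporal thing, every temporal thing is an individual,
# and every individual is a thing.
GENERALIZES_TO = {
    "Event": "TemporalThing",
    "TemporalThing": "Individual",
    "Individual": "Thing",
}


def generalizations(concept, links=GENERALIZES_TO):
    """Return every concept reachable by following the links upward."""
    chain = []
    while concept in links:
        concept = links[concept]
        chain.append(concept)
    return chain


def is_specialization_of(concept, ancestor, links=GENERALIZES_TO):
    """True if `ancestor` is `concept` itself or one of its generalizations."""
    return concept == ancestor or ancestor in generalizations(concept, links)


print(generalizations("Event"))
```

Because the relation is transitive, "every event is a thing" never has to be stored explicitly; it follows from the three direct links.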

15 Structure of Cyc Ontology
Core Theories: Space, Time, Causality, …
Example: for all events a and b, a causes b implies a precedes b
The KB contains several core theories that represent general facts about space, time, and causality. These are the theories that are essential to almost all common-sense reasoning.

16 Structure of Cyc Ontology
Domain-Specific Theories
Example: for any mammal m and any anthrax bacteria a, m’s being exposed to a causes m to be infected by a.
Domain-Specific Theories are more specific than core theories. These theories apply to special areas of interest like military movement, the propagation of diseases, finance, chemistry, etc. These are the theories that make Cyc particularly useful, but are not necessary for common sense reasoning.

17 Structure of Cyc Ontology
Facts (Database): Instances
Example: John is a person infected by anthrax.
The final layer contains what is sometimes called “ground-level facts.” These are statements about particular individuals in the world. For example, “John has anthrax” is a specific statement about one person. Generalizations would not go here; they would go in a layer above. Anything you can imagine as a headline in a newspaper would probably go here.

18 Cyc KB Extended w/Domain Knowledge
General Knowledge about Terrorism:
Terrorist groups are capable of directing assassinations:
  (implies (isa ?GROUP TerroristGroup) (behaviorCapable ?GROUP AssassinatingSomeone directingAgent))
If a terrorist group considers an agent an enemy, that agent is vulnerable to an attack by that group:
  (implies (and (isa ?GROUP TerroristGroup) (considersAsEnemy ?GROUP ?TARGET)) (vulnerableTo ?GROUP ?TARGET TerroristAttack))
[Figure: the Cyc ontology pyramid of slide 11, with its lower layers extended by general knowledge about terrorism and by specific data, facts, and observations about terrorist groups and activities.]

19 Cyc KB Extended w/Domain Knowledge
Specific Facts about Al Qaida:
(basedInRegion AlQaida Afghanistan)
  Al-Qaida is based in Afghanistan.
(hasBeliefSystems AlQaida IslamicFundamentalistBeliefs)
  Al-Qaida has Islamic fundamentalist beliefs.
(hasLeaders AlQaida OsamaBinLaden)
  Al-Qaida is led by Osama bin Laden.
(affiliatedWith AlQaida AlQudsMosqueOrganization)
  Al-Qaida is affiliated with the Al Quds Mosque.
(affiliatedWith AlQaida SudaneseIntelligenceService)
  Al-Qaida is affiliated with the Sudanese Intelligence Service.
(sponsors AlQaida HarakatUlAnsar)
  Al-Qaida sponsors Harakat ul-Ansar.
(sponsors AlQaida LaskarJihad)
  Al-Qaida sponsors Laskar Jihad.
…
(performedBy EmbassyBombingInNairobi AlQaida)
  Al-Qaida bombed the Embassy in Nairobi.
(performedBy EmbassyBombingInTanzania AlQaida)
  Al-Qaida bombed the Embassy in Tanzania.
[Figure: the Cyc ontology pyramid of slide 11, with general knowledge about terrorism and specific data, facts, and observations about terrorist groups and activities in its lower layers.]

20 An example of Psychoanalyst’s Cyc taxonomic context
#$Psychoanalyst (lexical representation: “psychoanalyst”, “psychoanalysts”)
specialization-of #$MedicalCareProfessional
| specialization-of #$HealthProfessional
| specialization-of #$Professional-Adult
| specialization-of #$Professional
specialization-of #$Psychologist
| specialization-of #$Scientist
| specialization-of #$Researcher
| | specialization-of #$PersonWithOccupation
| | | specialization-of #$Person
| | | | specialization-of #$HomoSapiens
| | | | | instance-of #$BiologicalSpecies
| | | | | | specialization-of #$BiologicalTaxon
| | | | | instance-of #$SomeSampleKindsOfMammal-Biology-Topic

21 Example Vocabulary: Senses of ‘In’ relation (1/3)
Can the inner object leave by passing between members of the outer group?
Yes -- Try #$in-Among

22 Example Vocabulary: Senses of ‘In’ relation (2/3)
Does part of the inner object stick out of the container?
None of it. -- Try #$in-ContCompletely
Yes -- Try #$in-ContPartially
If the container were turned around could the contained object fall out?
No -- Try #$in-ContClosed
Yes -- Try #$in-ContOpen

23 Example Vocabulary: Senses of ‘In’ relation (3/3)
Is it attached to the inside of the outer object?
Yes -- Try #$connectedToInside
Can it be removed by pulling, if enough force is used, without damaging either object?
No -- Try #$in-Snugly or #$screwedIn
Does the inner object stick into the outer object?
Yes -- Try #$sticksInto

24 Cyc’s front-end: “Cyc Analytic Environment” – querying (1/2)
[Figure: screenshot with three parts: the text query; the query (semi-)automatically translated into First Order Logic; the answers to the query.]

25 Cyc’s front-end: “Cyc Analytic Environment” – justification (2/2)
[Figure: screenshot with three parts: the query and answer; the justification; the sources for reasoning and justification.]

26 Document Tagging

27 Document Tagging

28 Annotating the document with CycKB

29 Probabilistic Concept Tagging
“The plants that produced the cranes that NASA deployed in space in the 1990s are in Canada.”
The plants (#$FactoryBuildingComplex #$Plant)
that produced (#$Production-Generic)
the cranes (#$Crane-MotorizedDevice #$Crane-Bird)
that NASA (#$NASA)
deployed (#$DeployingMaterial)
in space (#$OuterSpace 0.51 #$SpaceInAHOC #$ReservedSpaceRegion #$Area 0)
in the 1990s ((#$DecadeFn 199))
are in Canada (#$Canada 1).

30 Knowledge Template Induction

31
Example-Based Machine Translation (EBMT) is an approach to building translation rules between two natural languages by providing a learning system with pairs of sentences in the two languages. We can apply EBMT to the acquisition of IE rules by providing the system with pairs in which the first element is a sentence in natural language and the second element is an appropriate internal representation in CycL.
Train
[Figure: link-grammar parse of the training sentence.]
Royal Dutch Shell Plc halted output of 455,000 barrels a day in Nigeria.
(#$and
  (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn (#$ExportRateOfByFn #$Petroleum-CrudeOil) #$Nigeria))
  (#$doneBy (#$TheFn #$DecreaseEvent) #$RoyalDutchShell)
  (#$quantityChangeAmount (#$TheFn #$DecreaseEvent) (#$BarrelsPerDay )))
Template
[Figure: link-grammar parse of the template sentence.]
[Agent] halted output of [Quantity] in [Locn].
(#$and
  (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn (#$ExportRateOfByFn #$Petroleum-CrudeOil) [Locn]))
  (#$doneBy (#$TheFn #$DecreaseEvent) [Agent])
  (#$quantityChangeAmount (#$TheFn #$DecreaseEvent) [Quantity]))
Use
Petróleos de Venezuela S.A. halted output of barrels a week in Maracaibo.
(#$and
  (#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn (#$ExportRateOfByFn #$Petroleum-CrudeOil) #$CityOfMaracaiboVenezuela))
  (#$doneBy (#$TheFn #$DecreaseEvent) #$PetroleosdeVenezuelaSA)
  (#$quantityChangeAmount (#$TheFn #$DecreaseEvent) (#$BarrelsPerDay )))
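The "use" step of such a template can be sketched as slot matching followed by substitution. This is a simplified illustration, not the system shown on the slide: it matches the sentence with a regular expression instead of a link-grammar parse, and it fills the CycL template with quoted surface strings instead of resolving them to Cyc constants. The names `SENTENCE_PATTERN`, `CYCL_TEMPLATE`, and `apply_template` are hypothetical.

```python
import re

# Sentence side of the template: "[Agent] halted output of [Quantity] in [Locn] ."
SENTENCE_PATTERN = re.compile(
    r"^(?P<Agent>.+?) halted output of (?P<Quantity>.+?) in (?P<Locn>.+?)\s*\.$"
)

# CycL side of the template, with the same three slots.
CYCL_TEMPLATE = (
    "(#$and "
    "(#$isa (#$TheFn #$DecreaseEvent) (#$DecreaseInValueReturnedByFn "
    "(#$ExportRateOfByFn #$Petroleum-CrudeOil) [Locn])) "
    "(#$doneBy (#$TheFn #$DecreaseEvent) [Agent]) "
    "(#$quantityChangeAmount (#$TheFn #$DecreaseEvent) [Quantity]))"
)


def apply_template(sentence):
    """Fill the CycL template from a matching sentence, or return None."""
    match = SENTENCE_PATTERN.match(sentence.strip())
    if match is None:
        return None
    filled = CYCL_TEMPLATE
    for slot, text in match.groupdict().items():
        # Surface strings are quoted; mapping them to Cyc constants
        # is a separate step that this sketch leaves out.
        filled = filled.replace(f"[{slot}]", f'"{text}"')
    return filled


print(apply_template(
    "Royal Dutch Shell Plc halted output of 455,000 barrels a day in Nigeria ."
))
```

A sentence that does not fit the pattern returns `None`, which mirrors the idea that a template only fires on sentences with the same structure as its training example.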

32 Learning Facts by Search

33 Learning Facts by Search
Query
“What are symptoms of Whooping Cough?”
(symptomOfAilment WhoopingCough ?SYMP)
NL Generation
Partial English sentences:
“A symptom of whooping cough is ___”
“Whooping cough can cause ___”
“A symptom of Pertussis Bordetella is ___”
“Symptoms (such as ____) of whooping cough”
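The generation step above can be sketched as filling phrase patterns with the known names of the query's bound argument. The sketch is illustrative only: the pattern table, the lexicon, and the function `partial_sentences` are assumptions for this example, not Cyc's NL generation component.

```python
# Hypothetical phrase patterns per predicate; "{name}" is the bound argument
# and "___" marks the gap that a web search is expected to fill.
PATTERNS = {
    "symptomOfAilment": [
        "A symptom of {name} is ___",
        "{Name} can cause ___",
        "Symptoms (such as ___) of {name}",
    ],
}

# Hypothetical lexicon: English names for a concept.
LEXICON = {
    "WhoopingCough": ["whooping cough", "pertussis"],
}


def partial_sentences(predicate, concept):
    """Generate search strings for a query such as (predicate concept ?X)."""
    sentences = []
    for name in LEXICON.get(concept, []):
        for pattern in PATTERNS.get(predicate, []):
            sentences.append(
                pattern.format(name=name, Name=name.capitalize())
            )
    return sentences


for sentence in partial_sentences("symptomOfAilment", "WhoopingCough"):
    print(sentence)
```

Using several names and several patterns per predicate widens the net: each generated string is a separate search, and any of them may surface a sentence that completes the gap.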

34 Parsing Results
Looking for something that matches the argument constraints on the predicate…
“… symptoms of pertussis such as fever and a dry cough …”
Parse back into existing CycL concepts:
(symptomOfAilment WhoopingCough Fever)
(symptomOfAilment WhoopingCough Coughing-AilmentCondition)

35 KB Consistency Check
Throw out provably wrong answers
  Explicitly: perform one step of inference to throw out facts inconsistent with the KB
  Implicitly: don’t even look at things that don’t match argument constraints
Skip already known (provably right) knowledge
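The three checks above can be sketched as a filter over candidate facts. This is a toy version under stated assumptions: facts are `(predicate, subject, object)` tuples, the argument constraint is a single required type for the object, and a precomputed set of contradicted facts stands in for the one step of inference. The function name and all sample data are hypothetical.

```python
def filter_candidates(candidates, known, object_types, required_type, contradicted):
    """Keep only candidate facts that are well-typed, new, and not contradicted."""
    novel = []
    for fact in candidates:
        _predicate, _subject, obj = fact
        # Implicit check: ignore anything that violates the argument constraint.
        if object_types.get(obj) != required_type:
            continue
        # Skip knowledge that is already in the KB.
        if fact in known:
            continue
        # Explicit check: a set lookup stands in for one step of inference.
        if fact in contradicted:
            continue
        novel.append(fact)
    return novel


# Hypothetical data, invented for this sketch.
candidates = [
    ("symptomOf", "AilmentA", "SymptomX"),
    ("symptomOf", "AilmentA", "SymptomY"),
    ("symptomOf", "AilmentA", "SymptomZ"),
    ("symptomOf", "AilmentA", "CityQ"),
]
known = {("symptomOf", "AilmentA", "SymptomX")}
contradicted = {("symptomOf", "AilmentA", "SymptomZ")}
object_types = {
    "SymptomX": "Symptom",
    "SymptomY": "Symptom",
    "SymptomZ": "Symptom",
    "CityQ": "City",
}

print(filter_candidates(candidates, known, object_types, "Symptom", contradicted))
```

The type check runs first because it is the cheapest test and discards most noise before any lookup against the KB is needed.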

36 Initial Results
Total queries: 348
Web searches: 4290
  Initial: 817
  Verification: 3474
Sentences found: 1014
Rejected results: 954
  Inconsistent with KB: 4
  Already known to the KB: 384
  Rejected using Google: 566
Novel formulæ: 61
[Figure: pipeline diagram. Labels: 348 queries; 817 searches; 1016 sentences found; 388 sentences rejected; 3474 searches; 566 rejected; 61 sentences asserted. Example candidates: (symptomOfAilment WhoopingCough Coughing-AilmentCondition), (symptomOfAilment WhoopingCough Blindness), (symptomOfAilment WhoopingCough Fever).]

37 Microtheory (context) Suggestion

38 Automatic Ontology Placement
Cyc’s knowledge is contextualized into internally consistent Microtheories (MTs). New knowledge is inserted into that hierarchy manually by ontologists. An Mt Suggestor recommends appropriate placement of knowledge into the appropriate micro-theories (contexts)

39 MT Suggestor Approach
Problem is similar to hierarchical text classification
  Much less data per instance
  Very rich (deep) structure
Approaches:
  Generative Bayes model
  Multiclass SVM classification
Inputs:
  Each assertion is broken into atomic terms
  Each unique term is given an index
  Each assertion is a list of term indices (as few as 3 for binary assertions, as many as 180 for complex rules)
  Training examples are indexed identically
SVM classification: outputs the index of the best Mt
Bayes model: outputs probability of fit for each Mt
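The input encoding described above can be sketched in a few lines. This is an assumption about how the steps might look in Python, not the MT Suggestor's code: terms are taken to be the whitespace-separated tokens of an assertion once parentheses are dropped, and the function names are hypothetical.

```python
import re


def atomic_terms(assertion):
    """Break an assertion into atomic terms (tokens without parentheses)."""
    return re.findall(r"[^\s()]+", assertion)


def build_term_index(assertions):
    """Give each unique term an index, in order of first appearance."""
    index = {}
    for assertion in assertions:
        for term in atomic_terms(assertion):
            index.setdefault(term, len(index))
    return index


def encode(assertion, index):
    """Represent an assertion as a list of term indices; unknown terms are skipped."""
    return [index[t] for t in atomic_terms(assertion) if t in index]


training = [
    "(symptomOfAilment WhoopingCough Fever)",
    "(symptomOfAilment WhoopingCough Coughing-AilmentCondition)",
]
index = build_term_index(training)
print([encode(a, index) for a in training])
```

A binary assertion yields three indices (the predicate and its two arguments), which matches the lower bound mentioned on the slide; the resulting lists are what a Bayes model or SVM would consume as features.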

40 Results
89,000 assertions, 64,000 distinct terms, 28 Mts
10-fold cross-validation
Bayes: precision 0.85, recall 0.95, F1 score 0.9
Multiclass SVM: 0.98
These are all micro-averaged numbers
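As a consistency check on the Bayes row, the F1 score is the harmonic mean of precision and recall. The small function below is a generic formula, not code from the evaluation; the name `f1_score` is chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Bayes row from the slide: precision 0.85, recall 0.95.
print(round(f1_score(0.85, 0.95), 2))  # prints 0.9
```

The exact value is about 0.897, which rounds to the 0.9 reported for the Bayes model.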

41 Induction of new rules with ILP

42 Learning Higher-Order Knowledge
Learning Rules with Inductive Logic Programming
  Integrating the ALEPH ILP system into Cyc
Verification (asking or experimenting)
  Asking a human directly
  Natural language processing of text
  Probabilistic analysis
[Figure: the Cyc Ontology & Knowledge Base reasons “All the mothers I know about are female… Maybe all mothers are female?” and arrives at the rule “Mothers are female.”]
A preliminary version of rule induction is working; it still needs to be improved and its use automated
Fill gaps
New knowledge forces strengthening
A lot of common sense knowledge still isn’t captured!

43 Performing Induction in Cyc
Integrate Cyc and Aleph
  FOL-ify CycL and export to Aleph
  Produce ILP learning bias from background knowledge
    Based on semantic content of predicate knowledge
  CycL-ify, review, and assert ILP-produced rules
[Figure: pipeline. Labels: Facts & Background Knowledge; First-orderized Facts; Perform Induction; Induced Rules; Evaluate Results; Good Rules.]
[Figure: screenshot of the rule-evaluation game. Cyc says: “I have an idea. Because I think maybe all bacterial diseases that affect the lungs cause coughing. If I’m right, it would mean that…” followed by statements to judge: “Coughing is a symptom of whooping cough.”, “Coughing is a symptom of Eastern Equine Encephalitis.”, “Coughing is a symptom of bacterial anthrax.” Answer options: True, False, Don’t Know, Doesn’t make sense.]

44 Sample Rules Produced
(implies
  (and
    (cyclistPrimaryProject ?KE ?PROJECT)
    (projectTasks ?PROJECT ?TASK)
    (requestedEffortPercent ?TASK ?KE ?X))
  (assignedEffortPercent ?TASK ?KE ?X))
(implies
  (projectManagers ?PROJECT ?AGENT)
  (projectParticipants ?PROJECT ?AGENT))
(implies
  (and
    (primarySupervisor ?AGENT ?AGENT-1)
    (requestedEffortPercent ?TASK ?AGENT ?X)
    (projectManagers ?PROJECT ?AGENT-1)
    (projectTasks ?PROJECT ?TASK))
  (assignedEffortPercent ?TASK ?AGENT ?X))

45 Sample Rules Produced
If someone’s time has been requested for a task by that person’s primary project, the time will be assigned.
People participate in the projects they manage. (One hopes!)
People are assigned to tasks requested of them by projects managed by that person’s direct supervisor.
These are only patterns, not always guaranteed to be true, but they’re useful and common-sensical.

46 Bottom-up modeling of knowledge OntoGen system

47 Underlying concepts
Semi-Automatic
  Text-mining methods provide suggestions and insights into the domain
  The user can interact with the parameters of the text-mining methods
  All the final decisions are taken by the user
Data-Driven
  Most of the aid provided by the system is based on some underlying data provided by the user
  Instances are described by features extracted from the data (e.g. bag-of-words vectors)

48 Main Features
Interactive user interface
  The user can interact in real time with the integrated machine learning and text mining methods
Concept discovery methods:
  Unsupervised
    k-means clustering
    Latent Semantic Indexing (LSI)
  Supervised
    Active learning
  Concept visualization
Methods that help in understanding the discovered concepts:
  Keyword extraction
    TFIDF and SVM-normal based keyword extraction
  Concept visualization
    LSI and multi-dimensional scaling based visualization
    Also available as a separate tool named Document Atlas
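TFIDF-based keyword extraction, one of the methods listed above, can be sketched without any library. This is a generic textbook variant (term count times the log of inverse document frequency, whitespace tokenization), not OntoGen's implementation; the function name and the toy documents are invented for the example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_n=3):
    """Return the top_n highest-scoring TFIDF terms for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    keywords = []
    for doc in tokenized:
        term_freq = Counter(doc)
        scores = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_n])
    return keywords


# Toy documents, invented for this sketch.
docs = [
    "oil output nigeria oil",
    "oil merger takeover",
    "merger court suit",
]
print(tfidf_keywords(docs, top_n=2))
```

Terms that occur in many documents get a low weight even when they are frequent, so the extracted keywords tend to be the words that set a concept apart from its siblings.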

49 Ontology management
[Figure: screenshot. Labels: ontology visualization; concept hierarchy; selected concept; list of suggested sub-concepts.]

50 Concept’s instance management
[Figure: screenshot. Labels: concept management; selected concept; concept’s details; keywords; concept’s instance management; selected instance.]

51 Active Learning for concept learning
[Figure: flow from Query through SVM to New Concept.]
SVM hyperplane distance based active learning algorithm
The first few labelled documents are bootstrapped from a query search
Instances for the final concept are selected using the final SVM model
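The selection rule behind hyperplane-distance active learning can be sketched as follows: among the unlabelled instances, ask the user about the one closest to the current decision boundary. The sketch assumes a linear model with a weight vector `w` and bias `b` that has already been trained elsewhere; the function names and numbers are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score of instance x under the linear model (w, b)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def most_uncertain(w, b, unlabeled):
    """Index of the unlabelled instance closest to the hyperplane.

    The distance is |w.x + b| / ||w||; since ||w|| is the same for every
    instance, comparing |w.x + b| gives the same ordering.
    """
    return min(
        range(len(unlabeled)),
        key=lambda i: abs(decision_value(w, b, unlabeled[i])),
    )


# Hypothetical model and instances.
w, b = [1.0, -1.0], 0.0
unlabeled = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.0]]
print(most_uncertain(w, b, unlabeled))  # prints 1
```

After the user labels the chosen instance, the model is retrained and the selection repeats, so each question goes to the document the current model is least sure about.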

52 Multiple views of the same data
Countries view: “Lloyd’s CEO questioned in recovery suit in U.S. Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …”
Topics view: “UK takeovers and mergers. The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …”
Reuters news articles are used in the example above with two different sets of categories: topics, or the list of countries that appear in the news articles. Each set of categories offers a different view on the data. An SVM-based method detects the importance of keywords for each view.

53 Concept’s instances visualization
Instances are visualized as points on a 2D map. The distance between two instances on the map corresponds to their similarity. Characteristic keywords are shown for all parts of the map. The user can select groups of instances on the map to create sub-concepts.

54 Ontology population
[Figure: screenshot. Labels: new documents; selected document; classification of selected document.]
The system uses a one-vs-all linear SVM trained on the created ontology to classify new instances into concepts. Users can finalize the classifications using an interactive user interface.
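The one-vs-all decision can be sketched as scoring a new instance against one linear model per concept and taking the highest score. The sketch assumes the per-concept weights and biases were trained beforehand; the concept names, weights, and the function `classify` are hypothetical.

```python
def classify(models, x):
    """Score x against each concept's linear model and pick the best concept.

    `models` maps a concept name to a (weights, bias) pair.
    Returns the winning concept and the full score table.
    """
    scores = {
        concept: sum(wi * xi for wi, xi in zip(weights, x)) + bias
        for concept, (weights, bias) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical models for two concepts over a two-feature space.
models = {
    "concept_a": ([1.0, 0.0], -0.5),
    "concept_b": ([0.0, 1.0], -0.5),
}
best, scores = classify(models, [0.2, 0.9])
print(best)
```

Returning the full score table alongside the winner fits the interactive setting: the user can see how close the runner-up concepts were before confirming or correcting the suggestion.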

