November 17, 2005 1 Dr. Douglas B. Lenat, 3721 Executive Center Drive, Suite 100, Austin, TX 78731 Phone: (512) 342-4001 Fax: (512)


1 November 17, 2005. Dr. Douglas B. Lenat, 3721 Executive Center Drive, Suite 100, Austin, TX 78731. Email: Lenat@cyc.com Phone: (512) 342-4001 Fax: (512) 342-4040. CYC: Lessons Learned in Large-Scale Ontological Engineering. 2 July 2005

2 What Led to Cyc? 1. Programs need general world knowledge, and common sense, to break the brittleness bottleneck (NL understanding, speech understanding, robotics, learning, expert systems, search, …). 2. We know enough to do this; it is more an engineering task than a scientific research task. 3. The time was right (1984).

3 ELIZA (DOCTOR), 1965, Joe Weizenbaum, MIT. Carl Rogers-like reflection. Patient: "I swear a lot." Program: "How do you feel about the fact that you swear a lot?" Count on patient-to-doctor respect/awe/… Patient: "My dog's mother died recently." Program: "Tell me more about your mother." (blah blah blah mother blah blah blah)

4 40 years after ELIZA: ALICE
JUDGE11[9:26:0] I WILL ASK YOU SOME SIMPLE QUESTIONS AND GIVE SIMPLE COMMANDS
PROGRAM[9:26:01] Do you think your plan will succeed?
JUDGE11[9:26:16] WHAT IS THE COLOR OF A BLUE CAR
PROGRAM[9:26:16] WHAT COLOR IS a blue car
JUDGE11[9:26:35] WHERE IS SUES NOSE WHEN SUE IS IN HER HOUSE
PROGRAM[9:26:35] Where it belongs. Try searching the World Wide Web.


6 Is Natural Language a Good Representation?
+ There's a lot of it; everyone understands it.
+ We can index and search it, rapidly, using keywords: Boolean combinations of keywords; synonyms, hyponyms, hypernyms, … of keywords.
- There are a lot of different languages.
- Meanings vary (era, place, age group, …).
- Often the analyst's query requires finding and combining n pieces of data (arithmetic, logic).
- It can be inefficient.

7 Carol and Sam begat Sara and Fred. Fred and Jane begat Ethan, Elaine, and Edward. John and Sara begat Steven, Mary, and Seth. Ann and Andy begat Sue and Bob. But then Sara cleaved not to John and with Bob begat Joan. Is Edward an ancestor or descendant of Sue? [Diagram: family tree of the people named above.]
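The family puzzle above becomes mechanical once the sentences are turned into explicit parent links. The following is a minimal Python sketch of that idea, not anything taken from Cyc; the names `BEGAT`, `parent_table`, `ancestors`, and `relation` are made up for illustration.

```python
# "Begat" facts transcribed from the slide: (parents) -> children.
BEGAT = [
    (("Carol", "Sam"), ["Sara", "Fred"]),
    (("Fred", "Jane"), ["Ethan", "Elaine", "Edward"]),
    (("John", "Sara"), ["Steven", "Mary", "Seth"]),
    (("Ann", "Andy"), ["Sue", "Bob"]),
    (("Sara", "Bob"), ["Joan"]),
]


def parent_table():
    """Map each child to the set of its parents."""
    table = {}
    for parents, children in BEGAT:
        for child in children:
            table.setdefault(child, set()).update(parents)
    return table


def ancestors(person):
    """Collect every ancestor of `person` by walking parent links upward."""
    table = parent_table()
    found = set()
    stack = list(table.get(person, ()))
    while stack:
        candidate = stack.pop()
        if candidate not in found:
            found.add(candidate)
            stack.extend(table.get(candidate, ()))
    return found


def relation(a, b):
    """Say whether `a` is an ancestor of `b`, a descendant of `b`, or neither."""
    if a in ancestors(b):
        return "ancestor"
    if b in ancestors(a):
        return "descendant"
    return "neither"
```

With these facts, `relation("Edward", "Sue")` returns `"neither"`: the two belong to families that are joined only through Joan, who descends from both.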

8 Five friends get together to play 5 doubles matches, with a different group of 4 players each time. The sums of the ages of the players for the different matches are 124, 128, 130, 136 and 142 years. What is the age of the youngest player?
v+w+x+y = 124
v+w+x+z = 128
v+w+y+z = 130
v+x+y+z = 136
w+x+y+z = 142
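Once the puzzle is written as equations, the arithmetic is short: each player sits out exactly one match, so every age appears in four of the five sums. A small Python sketch of that step follows; the function name `solve_ages` is illustrative.

```python
MATCH_SUMS = [124, 128, 130, 136, 142]


def solve_ages(match_sums):
    """Recover the five ages from the five four-player sums.

    Adding all sums counts every age (len - 1) times, so the combined age
    of all players is sum / (len - 1). Each individual age is then the
    total minus the sum of the match that player sat out.
    """
    total, remainder = divmod(sum(match_sums), len(match_sums) - 1)
    if remainder:
        raise ValueError("sums are inconsistent with whole-number ages")
    return sorted(total - s for s in match_sums)


ages = solve_ages(MATCH_SUMS)
youngest = ages[0]
```

Here the sums add up to 660, the five ages total 165, and the ages are 23, 29, 35, 37 and 41, so the youngest player is 23.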

9 Natural Language Understanding requires having lots of knowledge. 1. The pen is in the box. / The box is in the pen. 2. The police watched the demonstrators… …because they feared violence. / …because they advocated violence. 3. Every American has a mother. / Every American has a president.

10 Natural Language Understanding requires having lots of knowledge. 4. Mary and Sue are sisters. / Mary and Sue are mothers. 5. The White House announced today that... 6. John saw his brother skiing on TV. The fool… …didn't have a coat on! / …didn't recognize him!

11 An example: an analyst's query, posed as part of HPKB (1996), that Cyc answered. Logically and arithmetically combining n pieces of info: information from multiple sources; knowledge about the domain in general; commonsense knowledge about the real world.


17 Ontology holds the key to doing this! BUT there are so many ways to cut corners and unwittingly fool oneself! Logically and arithmetically combining n pieces of info (information from multiple sources; knowledge about the domain in general; commonsense knowledge about the real world): the original dream of Arpanet, EDI, EDR, the Semantic Web, …

18 Non-ontology-based methods for DB integration are quadratic. Query: How different in age were Uday and Qusay Hussein? [Diagram: sources OFAC, DB8, USGS, NARCL, FBI Most Wanted, CATS, CDE, DB4. DB4 (fields SuspN, YOB) lists Qusay Hussein and Uday Hussein, with the value 1964. DB8 (fields Prenom, Surnom, ann) lists Qusai Hussein and Odai Hussein, with the value 30. The dates Dec. 31, 1996 and Sept. 9, 2003 also appear.]

19 Ontology-Based Methods of DB Integration Can Scale Linearly (…and, by the way, enable DB population/enrichment). [Diagram labels: "you!", HAL, CYC. CONCEPTS: #$QusayHusseinAl-Takriti, #$UdaiHusseinAl-Takriti. RULES: (age ?PERSON (YearsDuration ?AGE)), (birthDate ?PERSON ?BIRTH-DATE). Sources: OFAC, DB8, USGS, NARCL, FBI Most Wanted, CATS, CDE, DB4. DB4 (fields SuspN, YOB) lists Qusay Hussein and Uday Hussein, with the values 1964 and 1966. DB8 (fields Prenom, Surnom, ann) lists Qusai Hussein and Odai Hussein, with the values 30 and 32. The dates Dec. 31, 1996 and Sept. 9, 2003 also appear.]

20 A Solution that Scales Linearly (…and, by the way, enables DB population/enrichment). [Diagram: the same sources and tables as on slide 19. DB4 (fields SuspN, YOB) lists Qusay Hussein and Uday Hussein, with the values 1964 and 1966. DB8 (fields Prenom, Surnom, ann) lists Qusai Hussein and Odai Hussein, with the values 30 and 32. The dates Dec. 31, 1996 and Sept. 9, 2003 also appear.]
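The linear-scaling argument can be sketched as one decoder per source, each translating its own fields into a shared vocabulary, after which the query runs only against that vocabulary. The Python below is a rough illustration and rests on several assumptions about the diagram that the transcript does not confirm: that 1964 is Uday's year of birth in DB4, that 30 is Qusai's age in DB8, and that DB8's ages are as of 1996. The helper names are hypothetical; only the two Cyc term names come from the slides.

```python
# Spelling variants mapped to the Cyc terms named on slide 19.
CANONICAL = {
    "Qusay Hussein": "QusayHusseinAl-Takriti",
    "Qusai Hussein": "QusayHusseinAl-Takriti",
    "Uday Hussein": "UdaiHusseinAl-Takriti",
    "Odai Hussein": "UdaiHusseinAl-Takriti",
}

# ASSUMED reading of the diagram (not confirmed by the transcript).
DB4_ROWS = [{"SuspN": "Uday Hussein", "YOB": 1964}]
DB8_ROWS = [{"Prenom": "Qusai", "Surnom": "Hussein", "ann": 30}]
DB8_AS_OF_YEAR = 1996  # ASSUMPTION: DB8 ages are as of the end of 1996


def decode_db4(row):
    """DB4 stores a year of birth directly."""
    return CANONICAL[row["SuspN"]], row["YOB"]


def decode_db8(row):
    """DB8 stores an age in years; convert it to an approximate birth year."""
    name = f'{row["Prenom"]} {row["Surnom"]}'
    return CANONICAL[name], DB8_AS_OF_YEAR - row["ann"]


def birth_years():
    """Merge both sources into one person -> birth year table."""
    years = {}
    for row in DB4_ROWS:
        person, year = decode_db4(row)
        years[person] = year
    for row in DB8_ROWS:
        person, year = decode_db8(row)
        years[person] = year
    return years


def age_difference_years(person_a, person_b):
    years = birth_years()
    return abs(years[person_a] - years[person_b])
```

Under those assumptions the difference comes out as 2 years. The design point is that adding a ninth source means writing one more decoder, whereas pairwise translation among n sources needs on the order of n(n-1)/2 translators.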

21 What major US cities are particularly vulnerable to an anthrax attack? The answer is logically implied by data dispersed through several sources: USGS GNIS DB, AMVA KB, RAND R, UN FAO DB, DTRA CATS DB.

22 What major US cities are particularly vulnerable to an anthrax attack?
"major US city": ?C is a U.S. city with >1M population: (> (NumberOfInhabitantsFn ?C) 10^6)
"particularly vulnerable to an anthrax attack":
– the current ambient temperature at ?C is above freezing, and
– ?C has more than 100 people for each hospital bed, and
– the number of anthrax host animals near ?C exceeds 100k
Note: don't add #pullets and #chickens.
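The definitions above amount to a conjunction of four tests over facts that live in different sources. A minimal Python sketch follows, with invented city records standing in for the real databases; every name and number in it is hypothetical. It also assumes that the source's chicken count already includes pullets, which is one reading of the slide's warning not to add the two.

```python
from dataclasses import dataclass


@dataclass
class CityFacts:
    name: str
    population: int
    temperature_c: float   # current ambient temperature
    hospital_beds: int
    chickens: int          # ASSUMED to already include pullets
    pullets: int


def host_animals(city):
    """Count host animals without double-counting pullets as chickens."""
    return city.chickens


def is_major_city(city):
    return city.population > 10**6


def is_particularly_vulnerable(city):
    return (
        city.temperature_c > 0.0
        and city.population / city.hospital_beds > 100
        and host_animals(city) > 100_000
    )


def vulnerable_major_cities(cities):
    return [c.name for c in cities if is_major_city(c) and is_particularly_vulnerable(c)]


# Invented example data, for illustration only.
CITIES = [
    CityFacts("Exampleville", 1_500_000, 12.0, 10_000, 150_000, 40_000),
    CityFacts("Coldburg", 2_000_000, -5.0, 10_000, 200_000, 50_000),
    CityFacts("Smalltown", 500_000, 20.0, 2_000, 300_000, 90_000),
]
```

Only `Exampleville` passes all four tests here: `Coldburg` is below freezing and `Smalltown` is under a million inhabitants.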

23 The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS). USGS GNIS DB:
 state | name              | type  | county     | state_fips
-------+-------------------+-------+------------+------------
 TX    | Dallas            | ppl   | Dallas     | 48
 MN    | Hennepin County   | civil | Hennepin   | 27
 CA    | Sacramento County | civil | Sacramento | 6
 AZ    | Phoenix           | ppl   | Maricopa   | 4

 primary_lat | primary_long | elevation | population | status
-------------+--------------+-----------+------------+--------------------
 32.78333    | -96.8        | 463       | 1022830    | BGN 1978 1959
 45.01667    | -93.45       | 0         | 1032431    |
 38.46667    | -121.31667   | 0         | 1041219    |
 33.44833    | -112.07333   | 1072      | 1048949    | BGN 1931 1900 1897

24 The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS). So how do we explain to our system that:
– row 1 of that table is about the city of Dallas, TX
– the population field of that table contains the number of inhabitants of the city that that row is about
– here is exactly how to access tuples of that database
– that access will be fast, accurate, recent, complete

25 The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS). Explaining that the population field of that table contains the number of inhabitants of the city that that row is about: We provide the field encodings and decodings, some of which correspond to explicit fields like population, two-letter state codes, etc.:
(fieldDecoding Usgs-Gnis-LS ?x (TheFieldCalled population)
  (numberOfInhabitants (TheReferentOfTheRow Usgs-Gnis) ?x))

26 The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS). Explaining that row 1 of that table is about the city of Dallas, TX: We provide the field encodings and decodings. Some correspond to explicit fields like population, and some correspond to entities whose existence is merely implied by the existence of that row in that table. In this case, the first row implies the existence of (and describes some specifics of) the geographic entity that is the real-world city of Dallas, Texas, which is represented in Cyc's KB by the term #$CityOfDallasTexas. There is a logical field name for that entity, (TheReferentOfTheRow Usgs-Gnis), even though it is only talked about by the explicit fields.
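The effect of a field decoding can be pictured as turning a database row into assertions about the entity the row stands for. The Python below is a loose sketch of that idea, not Cyc's SKSI machinery: the lookup table `REFERENTS`, the dictionary `FIELD_DECODINGS`, and both functions are hypothetical, while the row values and the terms `numberOfInhabitants` and `CityOfDallasTexas` come from the slides.

```python
# First row of the GNIS table from slide 23 (only the fields used here).
ROW = {"state": "TX", "name": "Dallas", "type": "ppl", "population": 1022830}

# Hypothetical stand-in for (TheReferentOfTheRow Usgs-Gnis): which KB term
# a row is implicitly about.
REFERENTS = {("TX", "Dallas", "ppl"): "CityOfDallasTexas"}

# Hypothetical stand-in for fieldDecoding: physical field -> KB predicate.
FIELD_DECODINGS = {"population": "numberOfInhabitants"}


def referent_of_row(row):
    """Find the KB term for the entity this row describes."""
    return REFERENTS[(row["state"], row["name"], row["type"])]


def decode_row(row):
    """Produce (predicate, subject, value) assertions from one row."""
    subject = referent_of_row(row)
    return [
        (predicate, subject, row[field])
        for field, predicate in FIELD_DECODINGS.items()
        if row.get(field) is not None
    ]
```

Decoding `ROW` yields the single assertion `("numberOfInhabitants", "CityOfDallasTexas", 1022830)`. Note that the subject never appears as a column; it is implied by the row as a whole.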

27 The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS). Explaining how to access tuples of that database: We provide all the information needed for a JDBC connection script. We assert, in the context (MappingMtFn Usgs-KS), all of these:
(passwordForSKS Usgs-KS "geografy")
(portNumberForSKS Usgs-KS 4032)
(serverOfSKS Usgs-KS "sksi.cyc.com")
(sqlProgramForSKS Usgs-KS PostgreSQL)
(structuredKnowledgeSourceName Usgs-KS "usgs")
(subProtocolForSKS Usgs-KS "postgresql")
(userNameForSKS "sksi")
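To show how such assertions could feed a connection script, here is a small Python sketch that assembles a PostgreSQL-style JDBC URL from them. It assumes that the value of `structuredKnowledgeSourceName` doubles as the database name, which the slide does not state; the dictionary keys and the function `jdbc_url` are illustrative.

```python
# Values copied from the assertions on the slide (password omitted).
SKS_SETTINGS = {
    "sub_protocol": "postgresql",
    "server": "sksi.cyc.com",
    "port": 4032,
    "source_name": "usgs",  # ASSUMPTION: also used as the database name
    "user": "sksi",
}


def jdbc_url(settings):
    """Build a URL of the form jdbc:<subprotocol>://<host>:<port>/<database>."""
    return "jdbc:{sub_protocol}://{server}:{port}/{source_name}".format(**settings)
```

With these settings `jdbc_url(SKS_SETTINGS)` gives `jdbc:postgresql://sksi.cyc.com:4032/usgs`.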

28 The Geographic Names Information System (GNIS) DB, maintained by the US Geological Survey (USGS). Explaining that access will be fast, accurate, recent, complete: We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc. We assert, in the context (MappingMtFn Usgs-KS):
(schemaCompleteExtentKnownForValueTypeInArg Usgs-Gnis-LS USCity numberOfInhabitants 1)

29 USGS GNIS DB. Cyc automatically gathers statistics like these, and uses them to order search:
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state"))
  TheEmptySet 60.0)
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "primary_long")
          (PhysicalFieldFn Usgs-Gnis-PS "primary_lat")
          (PhysicalFieldFn Usgs-Gnis-PS "name"))
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "county")
          (PhysicalFieldFn Usgs-Gnis-PS "state"))
  530.36)


35 Semantic Knowledge Source Integration (SKSI) summary (structured sources):
– Some of the knowledge needed will generally be in the Cyc KB already.
– Some will reside in already-mapped sources: data bases, web pages, simulators, etc.
– For each needed new source, explain the meaning of its schema elements to Cyc: write Cyc assertions to convey the meaning of each field, each polymorphism, each idiosyncratic entry code, plus meta-information: when this was created/updated, level of granularity, its sources, its degree of completeness, what it can do quickly, what it can do (slowly), how to access it, etc.

36 What Led to Cyc? 1. Programs need general world knowledge, and common sense, to break the brittleness bottleneck (NL understanding, speech understanding, robotics, learning, expert systems, search, …). 2. We know enough to do this; it is more an engineering task than a scientific research task. 3. The time was right (1984).

37 How general knowledge helps search: find information by inference (+KB). Query: "Someone smiling". Caption: "A man helping his daughter take her first step".

38 How general knowledge helps search: find information by inference (+KB). Query: "Show me pictures of strong and adventurous people". Caption: "A man climbing a rock face".

39 How general knowledge helps search: find information by inference (+KB). Text document. Query: "Outdoor explosions in terrorist events in Lebanon between 1990 and 2001". Document: "1993 pipe bombing on the patio of the Beirut Olive Garden".

40 How general knowledge helps search: find information by inference (+KB + domain knowledge). Text document. Query: "Threats to low-flying US airliners in Lebanon". Document: "Hezballah buys ten SA-7s."
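The search examples above all follow one pattern: the caption or document does not contain the query's words, but the query follows from it once background rules are applied. A toy forward-chaining sketch in Python is given below. The rules and concept names are invented for illustration and are far cruder than anything in the Cyc KB.

```python
# Each rule: if all premises hold, the conclusion holds. All names are invented.
RULES = [
    ({"ParentHelpingChild", "ChildTakingFirstStep"}, "ParentFeelsProud"),
    ({"ParentFeelsProud"}, "PersonSmiling"),
    ({"RockClimbing"}, "StrongPerson"),
    ({"RockClimbing"}, "AdventurousPerson"),
]


def forward_chain(facts, rules=RULES):
    """Apply rules repeatedly until no new conclusions appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def caption_matches(query_concepts, caption_facts):
    """A caption matches if every queried concept can be inferred from it."""
    return set(query_concepts) <= forward_chain(caption_facts)


FIRST_STEP_CAPTION = {"ParentHelpingChild", "ChildTakingFirstStep"}
CLIMBING_CAPTION = {"RockClimbing"}
```

Keyword matching would find nothing in common between "Someone smiling" and the first-step caption; with the two rules above, `caption_matches({"PersonSmiling"}, FIRST_STEP_CAPTION)` succeeds.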

41 Find and clean (consistency-check) information by inference (+KB). If Pat and Jan are married, their date of marriage should be the same; their address is likely to be the same; their genders are likely to differ; and so on.
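The slide distinguishes a constraint that should hold (same marriage date) from ones that are only likely (same address, differing genders). A small Python sketch of a checker that keeps that distinction is shown below; the record layout and the function `check_spouses` are hypothetical.

```python
def check_spouses(a, b):
    """Compare two records of people asserted to be married to each other.

    Returns (errors, warnings): errors for violated hard constraints,
    warnings for violated expectations that are merely likely.
    """
    errors, warnings = [], []
    if a["marriage_date"] != b["marriage_date"]:
        errors.append("marriage dates differ")
    if a["address"] != b["address"]:
        warnings.append("addresses differ")
    if a["gender"] == b["gender"]:
        warnings.append("genders are the same")
    return errors, warnings


# Invented records for illustration.
PAT = {"marriage_date": "1990-06-02", "address": "12 Elm St", "gender": "F"}
JAN = {"marriage_date": "1990-06-20", "address": "12 Elm St", "gender": "M"}
```

For these two records the checker reports one error (the dates disagree, which suggests a data-entry slip in one source) and no warnings.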

42 What Led to Cyc? 1. Programs need general world knowledge, and common sense, to break the brittleness bottleneck (NL understanding, speech understanding, robotics, learning, expert systems, search, …). 2. We know enough to do this; it is more an engineering task than a scientific research task. 3. The time was right (1984).

43 Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world:
– The typical bird has 1 beak, 1 heart, lots of feathers, …
– Hearts are internal organs; feathers are external protrusions
– Most vehicles are steered by an awake, sane, adult, … human
– Tangible objects can't be in 2 (disjoint) places at once
– Badly injuring a child is much worse than killing a dog
– Causes temporally precede (i.e., start before) their effects
– A stabbing requires 2 cotemporal and proximate actors
– etc.

44 Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world. - Each of these represented in formal logic. - Info. about a set of hundreds of thousands of terms. - Language-independent. [Diagram: words (EnglishWord-Pen, EnglishWord-Plume, FrenchWord-Plume, ArabicWordForWritingPen, …) and concepts (WritingPen, BirdFeather, Penitentiary, Corral, Authoring, …).]

45 Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world. - Each of these represented in formal logic. - Info. about a set of hundreds of thousands of terms. An inference engine that draws the same sorts of inferences from those that people would. Interfaces so the system can communicate with people, data bases, spreadsheets, websites, etc.

46 [Architecture diagram. Components: Cyc Ontology & Knowledge Base; Cyc Reasoning Modules; Interface to External Data Sources; Cyc API; Knowledge Entry Tools; User Interface (with Natural Language Dialog). External Data Sources: Data Bases, Web Pages, Text Sources, Other KBs. Also shown: Other Applications, Knowledge Authors, Knowledge Users.]

47 Painful Evolution of our Representation from Frames & Slots to Contextualized HOL.
Layers of the KB: Upper Ontology; Core Theories; Domain-Specific Theories; very specific information (some indirect, via SKSI).
Examples, from most general to most specific:
EVENT, TEMPORAL-THING, PARTIALLY-TANGIBLE-THING
$\forall a, b:\ a \in \mathrm{EVENT} \land b \in \mathrm{EVENT} \land \mathrm{causes}(a, b) \Rightarrow \mathrm{precedes}(a, b)$
$\forall m, a:\ m \in \mathrm{MAMMAL} \land a \in \mathrm{ANTHRAX} \Rightarrow \mathrm{causes}(\text{exposed-to}(m, a), \text{infected-by}(m, a))$
(ist FtLaudHolyCrossERCase#403921 (caused CutaneousAnthrax (SkinLesions Ahmed_al-Haznawit)))
First Order Predicate Calculus: unambiguous; enables mechanical reasoning.
Every American has a president: $\exists y.\ \forall x.\ \mathrm{Amer}(x) \Rightarrow \mathrm{president}(x, y)$
Every American has a mother: $\forall x.\ \exists y.\ \mathrm{Amer}(x) \Rightarrow \mathrm{mother}(x, y)$
Higher Order Logic (nth-order predicate calculus): contexts, predicates as variables, nested modals, reflection, …
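The president/mother pair differs only in the order of the two quantifiers, which is easy to check on a small finite domain. The following Python sketch does so with invented people; the relation layout `(x, y)` meaning "y is the president (or mother) of x" is an assumption made for the example.

```python
AMERICANS = ["Ann", "Bob", "Cy"]
CANDIDATES = ["Dee", "Eve", "Pat"]

# (x, y) means: y stands in the relation to x. Invented data.
MOTHER = {("Ann", "Dee"), ("Bob", "Eve"), ("Cy", "Eve")}
PRESIDENT = {("Ann", "Pat"), ("Bob", "Pat"), ("Cy", "Pat")}


def forall_exists(domain, candidates, relation):
    """For every x there is some y (possibly a different y for each x)."""
    return all(any((x, y) in relation for y in candidates) for x in domain)


def exists_forall(domain, candidates, relation):
    """There is one single y that works for every x."""
    return any(all((x, y) in relation for x in domain) for y in candidates)
```

Both relations satisfy the "for all, there exists" reading, but only `PRESIDENT` satisfies the "there exists, for all" reading: no single person is everyone's mother. An English sentence leaves that choice to the reader's common sense, while the logical form makes it explicit.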

48 Cyc is not monolithic: the Knowledge Base is divided into thousands of contexts by granularity, topic, culture, geospatial place, time, ... Cyc is not committed to any one reasoning mechanism: the inference engine is a community of 720 agents that attack every problem and, recursively, every subproblem (subgoal). One of these 720 is a general theorem prover; the others have special-purpose data structures/algorithms to handle the most important, most common cases, very fast.

49 Cyc is not monotonic: 98% of its content is marked as merely being usually true. So reasoning in Cyc is default (gather up all the pro/con arguments, and compare them). Cyc is not committed to its own reasoning mechanisms: think of reasoning modules 721, 722, 723… as being all manner of external databases, simulators, translators…
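Default reasoning by argumentation can be pictured as collecting the arguments for and against a conclusion and then weighing them. The Python below is a deliberately simple sketch of that pattern and makes no claim about how Cyc actually compares arguments. It assumes just two strengths, "monotonic" (always true) and default (usually true), with monotonic arguments outranking default ones; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    supports: bool    # True = argues for the conclusion, False = against
    monotonic: bool   # True = always true, False = merely usually true
    reason: str


def decide(arguments):
    """Weigh pro and con arguments for a single conclusion."""
    strict_pro = any(a.supports and a.monotonic for a in arguments)
    strict_con = any(not a.supports and a.monotonic for a in arguments)
    if strict_pro and strict_con:
        return "contradiction"
    if strict_pro:
        return "true"
    if strict_con:
        return "false"
    default_pro = any(a.supports for a in arguments)
    default_con = any(not a.supports for a in arguments)
    if default_pro and not default_con:
        return "true (by default)"
    if default_con and not default_pro:
        return "false (by default)"
    return "unknown"


# Invented example: does this particular bird fly?
FLIES_ARGUMENTS = [
    Argument(True, False, "birds usually fly"),
    Argument(False, True, "this bird is a penguin, and penguins do not fly"),
]
```

For the example arguments the answer is `"false"`: the usually-true rule about birds yields to the stronger statement about penguins. With only the first argument present, the answer would be `"true (by default)"`.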

50 Cyc Knowledge Base: General Knowledge about Various Domains. Cyc contains: 15,000 Predicates; 300,000 Concepts; 3,200,000 Assertions. Represented in: First Order Logic, Higher Order Logic, Context Logic, Micro-theories. Below the general knowledge: specific data, facts, and observations.
[Diagram of KB topic areas: Thing; Intangible Thing; Individual; Temporal Thing; Spatial Thing; Partially Tangible Thing; Paths; Sets Relations; Logic Math; Human Artifacts; Social Relations, Culture; Human Anatomy & Physiology; Emotion Perception Belief; Human Behavior & Actions; Products Devices; Conceptual Works; Vehicles Buildings Weapons; Mechanical & Electrical Devices; Software Literature Works of Art; Language; Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations Governments Geo-Politics; Business, Military Organizations; Law; Business & Commerce; Politics Warfare; Professions Occupations; Purchasing Shopping; Travel Communication; Transportation & Logistics; Social Activities; Everyday Living; Sports Recreation Entertainment; Artifacts; Movement; State Change Dynamics; Materials Parts Statics; Physical Agents; Borders Geometry; Events Scripts; Spatial Paths; Actors Actions; Plans Goals; Time; Agents; Space; Physical Objects; Human Beings; Organization; Human Activities; Living Things; Social Behavior; Life Forms; Animals; Plants; Ecology; Natural Geography; Earth & Solar System; Political Geography; Weather.]

51 Cyc KB extended with domain knowledge about terrorism: General Knowledge about Terrorism. Cyc contains: 15,000 Predicates; 300,000 Concepts; 3,200,000 Assertions. Represented in: First Order Logic, Higher Order Logic, Context Logic, Micro-theories. Below the general knowledge: specific data, facts, and observations about terrorist groups and activities.
[Diagram: the same map of KB topic areas as on slide 50, from Thing, Intangible Thing and Individual down to Political Geography and Weather.]

52 Building Cyc qua Engineering Task. [Chart. Axes: amount known, rate of learning. Curves: learning by discovery, learning via natural language. Marked: frontier of human knowledge.]

53 Building Cyc qua Engineering Task. [Chart. Axes: amount known, rate of learning. Curves: learning by discovery, learning via natural language. Marked: frontier of human knowledge; CYC.]

54 Building Cyc qua Engineering Task. CYC: 750 person-years, 21 realtime years, $75 million. Codify & enter each piece of knowledge, by hand. [Chart. Axes: amount known, rate of learning. Curves: learning by discovery, learning via natural language. Marked: frontier of human knowledge; CYC; years 1984, 2004, 2005.]

55 Guiding Principle: We have to get it to work, not appear to work.
– Don't defer hard problems (time/space/emotions…)
– No NIH! Harness every good idea that others have
– Take an engineering approach, not a scientific research one: instead of one TOE (elegant full solution), find a set of partial solutions that together cover the most common cases
– Pursue applications that require large amounts of real-world knowledge (they need Cyc and also will drive it)

56 Eschew the 5 pitfalls (ways to cut ontological corners and end up with something that only appears to work). Ignorance-based:
– Have a small theory size (#terms, #instances, #rules)
– Static KB (can be massively tuned, optimized, cached, etc. ahead of time)
– Simple assertions (e.g., SAT constraints; propositional calculus; Horn; …)
– One global context (no contradictions, limited domain, simplified world)
– Don't do all the bookkeeping and forward inference required for justification maintenance (or, equivalently, don't ever have truth maintenance turned on)

57 Eschew the 5 pitfalls (ways to cut ontological corners and end up with something that only appears to work). Ignorance-based:
– Have a small theory size (#terms, #instances, #rules)
– Static KB (can be massively tuned, optimized, cached, etc. ahead of time)
– Simple assertions (e.g., SAT constraints; propositional calculus; Horn; …)
– One global context (no contradictions, limited domain, simplified world)
– Don't do all the bookkeeping and forward inference required for justification maintenance (or, equivalently, don't ever have truth maintenance turned on)
As with pharmaceuticals, what is toxic in one dosage is beneficial in a lesser dosage. E.g., contexts lead to locally-consistent, locally-small theories (faster inference/KE). E.g., often some (sub)problems can be represented/solved in a simpler representation.

58 Choosing what to add to Cyc. Bottom-up: Look at a sentence, see what knowledge the writer assumed the reader already had about the world. Generalize that piece of knowledge. Top-down: Articulate the scope of a (sub)topic, and articulate queries that should be answerable. Get the missing knowledge by introspecting or just asking Cyc.

59 The Cyc Knowledge Base: Real World Domain Knowledge. Cyc contains: 15,000 Predicates; 300,000 Concepts; 3,200,000 Assertions. Represented in: First Order Logic, Higher Order Logic, Context Logic, Microtheories. Below the general knowledge: specific cases, facts, details, …
[Diagram: the same map of KB topic areas as on slide 50, from Thing, Intangible Thing and Individual down to Political Geography and Weather.]


61 Cyc KB Whitman's Sampler: Temporal Relations; Senses of "x is a physical part of y"; Senses of "x is physically in y"; Events and their performers (role types); Organizations; Propositional Attitudes; Biology; Materials; Devices; Weather; Information-bearing objects.

62 Temporal Relations: 37 Relations Between Temporal Things. #$temporalBoundsIntersect, #$temporallyIntersects, #$startsAfterStartingOf, #$endsAfterEndingOf, #$startingDate, #$temporallyContains, #$temporallyCooriginating, #$temporalBoundsContain, #$temporalBoundsIdentical, #$startsDuring, #$overlapsStart, #$startingPoint, #$simultaneousWith, #$after

63 Temporal Relations. Some of these relations are very general, such as #$temporallyIntersects. Such relations are particularly useful when they are known not to hold between a pair of individuals:
(#$not (#$temporallyIntersects ?X ?Y))
That implies all of these:
(#$not (#$spouse PERSON-X PERSON-Y))
(#$not (#$consultant AGENT-X AGENT-Y))
(#$not (#$accountHolder ACCOUNT-X AGENT-Y))
(#$not (#$residesInRegion AGENT-X REGION-Y))
(#$not (#$officiator EVENT-X PERSON-Y))
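The pruning power of a general relation can be sketched with plain intervals: if two individuals' time spans never overlap, every relation that requires overlap is ruled out at once. The Python below is a rough illustration in which spans are `(start_year, end_year)` pairs; the predicate names are the ones listed on the slide, while the function names and the example spans are invented.

```python
# Relations from the slide that cannot hold between things that never coexist.
REQUIRES_TEMPORAL_OVERLAP = [
    "spouse",
    "consultant",
    "accountHolder",
    "residesInRegion",
    "officiator",
]


def temporally_intersects(span_x, span_y):
    """True if two closed intervals (start, end) share at least one point."""
    return span_x[0] <= span_y[1] and span_y[0] <= span_x[1]


def ruled_out_relations(span_x, span_y):
    """List the relations that can be rejected without any further lookup."""
    if temporally_intersects(span_x, span_y):
        return []
    return list(REQUIRES_TEMPORAL_OVERLAP)


# Invented lifespans for illustration.
PERSON_X = (1700, 1760)
PERSON_Y = (1900, 1980)
```

For the two example lifespans all five relations are rejected with a single interval comparison, which is much cheaper than checking each relation separately.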

64 Senses of "Part": #$parts, #$intangibleParts, #$subInformation, #$subEvents, #$physicalDecompositions, #$physicalPortions, #$physicalParts, #$externalParts, #$internalParts, #$anatomicalParts, #$constituents, #$functionalPart

65 Senses of "In". Can the inner object leave by passing between members of the outer group? – Yes -- Try #$in-Among

66 Senses of "In". Does part of the inner object stick out of the container? – None of it -- Try #$in-ContCompletely – Yes -- Try #$in-ContPartially. If the container were turned around, could the contained object fall out? – No -- Try #$in-ContClosed – Yes -- Try #$in-ContOpen

67 Senses of "In". Is it attached to the inside of the outer object? – Yes -- Try #$connectedToInside. Can it be removed, if enough force is used, without damaging either object? – Yes -- Try #$in-Snugly or #$screwedIn. Does the inner object stick into the outer object? – Yes -- Try #$sticksInto
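The questions on the three "Senses of In" slides read like a small decision aid for a knowledge enterer. A Python sketch of that aid follows. It treats each question as independent, which is an assumption, since the slides may intend some of them to be nested; the function and parameter names are hypothetical, and the returned predicate names are the ones on the slides (without the `#$` prefix).

```python
def suggest_in_predicates(
    *,
    can_leave_between_members=False,
    sticks_out=None,            # None = question not answered
    falls_out_if_turned=None,   # None = question not answered
    attached_to_inside=False,
    removable_without_damage=False,
    sticks_into=False,
):
    """Map answers to the slides' questions onto candidate 'in' predicates."""
    suggestions = []
    if can_leave_between_members:
        suggestions.append("in-Among")
    if sticks_out is False:
        suggestions.append("in-ContCompletely")
    elif sticks_out is True:
        suggestions.append("in-ContPartially")
    if falls_out_if_turned is False:
        suggestions.append("in-ContClosed")
    elif falls_out_if_turned is True:
        suggestions.append("in-ContOpen")
    if attached_to_inside:
        suggestions.append("connectedToInside")
    if removable_without_damage:
        suggestions.extend(["in-Snugly", "screwedIn"])
    if sticks_into:
        suggestions.append("sticksInto")
    return suggestions
```

For example, a coin sealed in a jar (nothing sticks out, cannot fall out) gets `["in-ContCompletely", "in-ContClosed"]`, while a person standing in a crowd gets `["in-Among"]`.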

68 Event Types: #$PhysicalStateChangeEvent, #$TemperatureChangingProcess, #$BiologicalDevelopmentEvent, #$ShapeChangeEvent, #$MovementEvent, #$ChangingDeviceState, #$GivingSomething, #$DiscoveryEvent, #$Cracking, #$Carving, #$Buying, #$Thinking, #$Mixing, #$Singing, #$CuttingNails, #$PumpingFluid. 11,000 more.

69 A few event types pertaining to Vehicular Transportation: #$TransportationEvent, #$ControllingATransportationDevice, #$TransportWithMotorizedLandVehicle, (#$SteeringFn #$RoadVehicle), #$TransporterCrashEvent, #$VehicleAccident, #$CarAccident, #$Colliding, #$IncurringDamage, #$TippingOver, #$Navigating, #$EnteringAVehicle

70 Relations Between an Event and its Participants: #$performedBy, #$causes-EventEvent, #$objectPlaced, #$objectOfStateChange, #$outputsCreated, #$inputsDestroyed, #$assistingAgent, #$beneficiary, #$fromLocation, #$toLocation, #$deviceUsed, #$driverActor, #$damages, #$vehicle, #$providerOfMotiveForce, #$transportees. Over 400 more.

71 Here are some slot: value pairs for Attack874:
isa: TerroristAttack.
performedBy: JihadGroup.
deviceUsed: Bomb8388.
eventOccursAt: CityOfLondonEngland.
victim: Person9399.
victim: Person52666.
assistingAgent: AlQaeda.
objectsDestroyed: Structure2990.
objectsDestroyed: Vehicle523452.
These ActorSlots express each type of relation between an Event and its actors and subevents.
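Slot: value pairs like these are shorthand for binary assertions whose first argument is the frame's subject. A short Python sketch of that correspondence is given below; the dictionary layout and the function `frame_to_assertions` are illustrative, while the slot names and values are copied from the slide.

```python
# The frame from the slide; a slot may carry several values.
ATTACK874 = {
    "isa": ["TerroristAttack"],
    "performedBy": ["JihadGroup"],
    "deviceUsed": ["Bomb8388"],
    "eventOccursAt": ["CityOfLondonEngland"],
    "victim": ["Person9399", "Person52666"],
    "assistingAgent": ["AlQaeda"],
    "objectsDestroyed": ["Structure2990", "Vehicle523452"],
}


def frame_to_assertions(subject, frame):
    """Expand a frame into (predicate, subject, value) triples."""
    return [
        (slot, subject, value)
        for slot, values in frame.items()
        for value in values
    ]


ASSERTIONS = frame_to_assertions("Attack874", ATTACK874)
```

The frame expands into nine triples, for instance `("victim", "Attack874", "Person9399")`, each of which would be written in CycL style as `(victim Attack874 Person9399)`.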

72 Organization Slots: #$governingBody, #$parentCompany, #$subOrgs-Command, #$subOrgs-Permanent, #$subOrgs-Temporary, #$physicalQuarters, #$hasHQinCountry, #$officeInCountry, #$memberTypes, #$organizationHead, #$PolicyFn, #$mainProductType, plus those predicates that make sense for each generalization of Organization (e.g., #$startingTime, #$alsoKnownAs).

73 Emotion. Types of Emotions: #$Adulation, #$Abhorrence, #$Relaxed-Feeling, #$Gratitude, #$Anticipation-Feeling; over 120 of these. Predicates For Defining and Attributing Emotions: #$contraryFeelings, #$appropriateEmotion, #$actionExpressesFeeling, #$feelsTowardsObject, #$feelsTowardsPersonType

74 Propositional Attitudes: Relations Between Agents and Propositions. #$goals, #$intends, #$desires, #$hopes, #$expects, #$beliefs, #$opinions, #$knows, #$rememberedProp, #$perceivesThat, #$seesThat, #$tastesThat

75 Materials: Common Substances; Attributes of Materials; States Of Matter; Solutions; Electrical Conductivity; Thermal Conductivity; Structural Attributes; Tangible Attributes

76 Materials: Common Substances; Attributes of Materials; States Of Matter (SolidStateOfMatter, LiquidStateOfMatter, GaseousStateOfMatter); Solutions; Electrical Conductivity; Thermal Conductivity; Structural Attributes; Tangible Attributes (SolidTangibleThing, LiquidTangibleThing, GaseousTangibleThing)

77 Devices. Over 4000 Specializations of #$PhysicalDevice: #$ClothesWasher, #$NuclearAircraftCarrier. Vocabulary for Describing Device Functions: #$primaryFunction-DeviceType. Device-Specific Predicates: #$gunCaliber, #$speedOf. Device States (40+): #$DeviceOn, #$CockedState

78 Vehicular Transport Devices. Over 800 Specializations of #$RoadVehicle: #$AcuraCar, #$SportUtilityVehicle, #$Humvee. Over 100 Specializations of #$AutoPart: #$AutomobileTire, #$ShockAbsorber, #$Windshield. Five Facets of #$RoadVehicle: #$RoadVehicleByChassisType, #$RoadVehicleTypeByBodyStyle, #$RoadVehicleTypeByModel, #$RoadVehicleTypeByPowerSource, #$RoadVehicleTypeByUse. Specialized Predicates: #$highwayFuelConsumption, #$vehicleLoadClass, #$trafficableForVehicle, #$vehicle

79 Weather. Weather Attributes: #$ClearWeather, #$Visibility, (#$LowAmountFn #$Raininess). Weather Objects: #$CloudInSky, #$SnowMob. Weather Events: #$TornadoAsEvent, #$SnowProcess

80 Information-Bearing Things: books, web-page copies, radio broadcasts, utterances, intell cables, TV series, …

81 What is Moby Dick? "'Tis Moby Dick!" (#$thereExists ?SEE (#$and (#$isa ?SEE Seeing) (#$objectPerceived ?SEE #$MobyDick) (#$perceiver ?SEE #$CaptainAhab))) Candidate senses: AbstractInformationStructure (AIS), PropositionalInformationThing (PIT), InformationBearingThing (IBT).

82 November 17, 2005 82 What is Moby Dick? PropositionalInformationThing (PIT) InformationBearingThing (IBT) ConceptualWork (CW) AbstractInformationStructure (AIS) textOfIBT instantiationOfCW InfoStructureOfCW #$infoStructureRepresents ContainsInfo-Propositional-CW PITOfIBTFn

83 November 17, 2005 83 Bridging the Knowledge Gap Upper ontology: Water is wet. Intermediate ontology: Vehicles slow down in bad weather. Lower ontology (task-specific knowledge): Humvees lose 18% traction in 4-inch-deep mud.

84 November 17, 2005 84 KR Lessons Learned 1. We started with a straightforward Frames & Slots representation (in 1972), improving it over the years as -- but only as -- we needed to. Fred Albertson isA: Person ownsA: Dog worksFor: UT.

85 November 17, 2005 85 KR Lessons Learned We started with a straightforward Frames & Slots representation. But Frames & Slots are inadequate to naturally express: disjunction (Fred owns a dog or a parakeet.) negation (Fred does not own a dog.) modals (Fred believes Israel wants Egypt to expect…) meta-assertions (That rule is 50 years old but reliable.) nested quantification ( w)( x)( y)( z)… "Every American has a president." versus "Every American has a mother."
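The president/mother contrast on this slide is a difference in quantifier scope, which a flat frame with slots cannot mark. One way to write the two readings in predicate calculus is sketched below; the predicate names are illustrative assumptions, not Cyc constants.

```latex
% "Every American has a president": one president shared by all (exists outside forall)
\exists p\; \forall a\; \bigl(\mathrm{American}(a) \rightarrow \mathrm{presidentOf}(a, p)\bigr)

% "Every American has a mother": a possibly different mother for each (forall outside exists)
\forall a\; \bigl(\mathrm{American}(a) \rightarrow \exists m\; \mathrm{motherOf}(a, m)\bigr)
```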

86 November 17, 2005 86 KR Lessons Learned 2. On the one hand, we must move from Frames & Slots to Logic. But on the other hand: Theorem-proving is too slow! Solution: Do it, and to recoup efficiency, separate: The Epistemological Problem (what should the system know?) The Heuristic Problem (how can it reason efficiently with and about what it knows?) I.e., represent each assertion in (at least) 2 ways: one standard logical (predicate calculus) form (EL), and one (or more) efficient special-purpose representations (HL)
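The EL/HL split can be illustrated with a toy sketch: the same assertions are kept once in a uniform logical form and once in a special-purpose index that answers one kind of query quickly. This is only an illustration under assumptions; the function names and the sample terms are hypothetical and do not reflect Cyc's actual HL modules.

```python
from collections import defaultdict

# EL: assertions kept in a uniform, predicate-calculus-like form.
EL_ASSERTIONS = [
    ("genls", "Dog", "Mammal"),
    ("genls", "Mammal", "Animal"),
    ("isa", "Fido", "Dog"),
]


def build_genls_index(assertions):
    """HL (toy): precompute the transitive closure of genls so that
    subclass questions become a set lookup instead of a proof search."""
    direct = defaultdict(set)
    for pred, sub, sup in assertions:
        if pred == "genls":
            direct[sub].add(sup)

    closure = {}

    def ancestors(term):
        if term not in closure:
            closure[term] = set()  # placeholder keeps recursion finite on cyclic input
            result = set()
            for parent in direct.get(term, ()):
                result.add(parent)
                result |= ancestors(parent)
            closure[term] = result
        return closure[term]

    for term in list(direct):
        ancestors(term)
    return closure


def isa(instance, collection, assertions, genls_index):
    """Answer an isa query using the EL facts plus the HL index."""
    for pred, inst, col in assertions:
        if pred == "isa" and inst == instance:
            if col == collection or collection in genls_index.get(col, set()):
                return True
    return False


GENLS_INDEX = build_genls_index(EL_ASSERTIONS)
print(isa("Fido", "Animal", EL_ASSERTIONS, GENLS_INDEX))
```

The EL list stays the single place where knowledge is entered; the index is derived from it and can be rebuilt whenever the assertions change.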

87 November 17, 2005 87 Lessons Learned Bridging the knowledge gap: do the intermediate theories. Rather than struggling to reason in NL sentences, use a more formal representation language. Make this as simple as possible (but, year by year, we had to make it ever more expressive). Similarly, represent only, but all, useful distinctions. This sounds trivial but leads to huge ontologies of objects, predicates, scripts, … Distinguish the EL and HL. Rather than striving in vain for a single fast inference engine, use a suite of 720 heuristic modules that each handle some commonly occurring problems very fast. Probabilities are great iff known; often only relative likelihood is known. Most knowledge is default; reason by argumentation. Rather than striving in vain for a monolithic consistent KB, divide the KB up into many locally consistent contexts.

88 November 17, 2005 88 Contexts (Microtheories) Global Consistency: Can't Live With It, Can't Give It Up! What's the real source of the problem? Each rule is rich: it is a simplified statement that obscures a plethora of unstated assumptions and details. As long as the rules are all in one coherent small context, they are likely to make the same simplifying assumptions, and hence are likely to work together consistently.

89 November 17, 2005 89 "If it's raining, carry an umbrella": the performer is a human being, the performer is sane, the performer can carry an umbrella; thus: the performer is not a baby, not unconscious, not dead, the performer is going to go outdoors now/soon, their actions permit them a free hand (e.g., not wheelbarrowing), their actions wouldn't be unduly hampered by it (e.g., marathon-running), the wind outside is not too fierce (e.g., hurricane strength), the time period of the action is after the invention of the umbrella, the culture is one that uses umbrellas as a rain- (not just sun-)protection device, the performer has easy access to an umbrella; thus: not too destitute, not someone who lives where it practically never rains, not at the office/theater/… caught without an umbrella, the performer is going to be unsheltered for some period of time, the more waterproof their clothing, the gentler the rain, and the warmer the air, the longer that time period, the performer will not be wet anyway (e.g., swimming), the rain is annoying -- but merely annoying. Thus: not ammonia rain on Venus, radioactive post-apocalyptic rain, biblical (Noah's-ark-sized, or frogs/blood as rained on Pharaoh), the performer is not a hydrophobic person, gingerbread man, etc., and not a hydrophilic person, someone dying of thirst, etc.

90 November 17, 2005 90 Each assertion should be situated in a context: in a region of context-space. We identified 12 dimensions of mt-space. We developed a vocabulary of predicates and terms to describe points and regions along each of those 12 dimensions; and we have been situating assertions more and more precisely, and we have been working out calculi for inferring contexts. –E.g., if P is true in C1, and P=>Q is true in C2, in what context C3 can Q be validly concluded? Anthropacity Time GeoLocation TypeOfPlace TypeOfTime Culture Sophistication/Security Topic Granularity Modality/Disposition/Epistemology Argument-Preference Justification

91 November 17, 2005 91 Mathematical Factoring of Context-space Dimensions UnitedStatesIn1985Context: Ronald Reagan is president. PennsylvaniaIn1985Context: Dick Thornburgh is governor. LehighCountyInFebruary1985Context: Dick Thornburgh is governor and Ronald Reagan is president. This inference depends on the time, space, and respective granularities of the contexts. There are at least 900,000 doctors. Dick Thornburgh is governor and there are at least 900,000 doctors.
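The president/governor example can be sketched as a lookup over contexts factored into a place dimension and a time dimension. This is a simplified illustration, not Cyc's context calculus: the `Context` class, the containment table, and the assumption that each fact holds throughout its whole context are all hypothetical. Granularity is ignored here, and facts such as counts ("at least 900,000 doctors") do not narrow to a sub-region this way.

```python
from dataclasses import dataclass

# Hypothetical toy containment data; not taken from the Cyc KB.
PART_OF = {"LehighCounty": "Pennsylvania", "Pennsylvania": "UnitedStates"}


@dataclass(frozen=True)
class Context:
    place: str
    start: tuple  # (year, month), inclusive
    end: tuple    # (year, month), inclusive


def place_within(inner, outer):
    """True if inner equals outer or lies inside it via PART_OF links."""
    while inner is not None:
        if inner == outer:
            return True
        inner = PART_OF.get(inner)
    return False


def subsumes(broad, narrow):
    """True if narrow lies inside broad on both the place and time dimensions."""
    return (place_within(narrow.place, broad.place)
            and broad.start <= narrow.start
            and narrow.end <= broad.end)


def facts_holding_in(target, kb):
    """Collect facts asserted in any context that subsumes the target context."""
    return [fact for ctx, fact in kb if subsumes(ctx, target)]


US_1985 = Context("UnitedStates", (1985, 1), (1985, 12))
PA_1985 = Context("Pennsylvania", (1985, 1), (1985, 12))
LEHIGH_FEB_1985 = Context("LehighCounty", (1985, 2), (1985, 2))

KB = [
    (US_1985, "Ronald Reagan is president"),
    (PA_1985, "Dick Thornburgh is governor"),
]

print(facts_holding_in(LEHIGH_FEB_1985, KB))
```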

92 November 17, 2005 92 Time Indices and Granularities Doug is talking, at 10:30 to 11:30, on 11/17/05. Therefore: Doug is talking, at 10:50 to 11:05, on 11/17/05. But not: Doug is talking, at 10:55:11 to 10:55:13, on 11/17/05.

93 November 17, 2005 93 Time Indices and Granularities P = Doug is talking. t = that one-hour interval. Doug is talking, at 10:30 to 11:30, on 11/17/05, with temporal granularity calendar minute. So: talking during that 15-minute interval? Yes. Talking during that 2-second interval? Unknown.
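The granularity rule on this slide can be sketched as a three-valued check: a sub-interval inherits the assertion only if it is at least as long as the stated granularity. The function name and the reading of "granularity" as a minimum duration are simplifying assumptions made for illustration.

```python
from datetime import datetime, timedelta


def holds_during(asserted_start, asserted_end, granularity, query_start, query_end):
    """Return True if the assertion licenses the query interval, else None (unknown).

    Assumption: a query interval inherits the assertion when it lies inside the
    asserted interval and is no shorter than the granularity.
    """
    inside = asserted_start <= query_start and query_end <= asserted_end
    if inside and (query_end - query_start) >= granularity:
        return True
    return None


# "Doug is talking, at 10:30 to 11:30, on 11/17/05", granularity: calendar minute.
START = datetime(2005, 11, 17, 10, 30)
END = datetime(2005, 11, 17, 11, 30)
MINUTE = timedelta(minutes=1)

fifteen_minutes = holds_during(START, END, MINUTE,
                               datetime(2005, 11, 17, 10, 50),
                               datetime(2005, 11, 17, 11, 5))
two_seconds = holds_during(START, END, MINUTE,
                           datetime(2005, 11, 17, 10, 55, 11),
                           datetime(2005, 11, 17, 10, 55, 13))
print(fifteen_minutes, two_seconds)
```

Returning `None` rather than `False` keeps "unknown" distinct from "known to be false": Doug may have paused for those two seconds, or not.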

94 November 17, 2005 94 Summary (1): Technology Cyc is a power source, not a single application. Like oil, electricity, telephony, computers,… Cyc can spawn and sustain a new industry. It can cost-effectively underlie almost all apps. (Provide a common-sense layer to reduce brittleness when faced with unexpected inputs/situations.) To apply Cyc, we extend its ontology, its KB, and possibly its suite of specialized reasoning modules.

95 November 17, 2005 95 20 Motivating Applications (1984)

96 November 17, 2005 96 5 More Recent Application Ideas

97 November 17, 2005 97 Recent/Current Government Apps Dept. of Defense (mostly DARPA, ONR) –CoABS, HPKB, CPoF, DAML, ACIP –RKF (OE-ing by non-logicians via clarification dialogue) –BUTLER: Knowledge-based machine learning –ResearchCyc: Clean, document, speed up, interface, etc. –ONR: Level 2 and 3 Information Fusion (sense-making) Other US Government Agencies (NSF, ARDA, NIST) –NIST ATP: Jumpstarting a Natl. Knowledge Infrastructure –AQUAINT, NIMD, Topsail, Eagle, KSP-ATD,… –Building a comprehensive terrorism KB for the US –Automated generation of plausible terrorism threat scenarios –Modeling intelligence analysts (script learning/recognition) –Semantic knowledge source integration –Efficient Inference in Large Knowledge Bases

98 November 17, 2005 98 Recent/Current Commercial Apps using Cyc as the basis for a medical ontology –aligning Cyc with Snomed/UMLS/Mesh/... multiple-thesaurus manager (align n 300k-term lists) spider the entire Web (indexing it in terms of Cyc concepts) identify inter-sentential references in NPR transcripts improved web (and website) search query/follow-ups vulnerability assessment (reason about a scanned network) semantic matching for a better customer experience

99 November 17, 2005 99 Summary (2): Cycorp 50 employees (almost all MTSs) Revenue about $7M/year (some commercial licenses and app.s, but >50% US Government R&D contracts) Employee-owned (VC-free and debt-free) $75M development effort (750 PYs over 21 years) –Mostly spent on building up its ontology and KB –To a lesser extent, its reasoning modules and interfaces –Focus: automatically growing Cyc via learning –Focus: enabling Cyc users to directly extend it –Focus: making inference orders of magnitude faster

100 November 17, 2005 100 Summary (3): The Message: What Needs to be Shared? bits/bytes/streams/network… alphabet, special characters,… words, morphological variants,… syntactic meta-level markups (HTML) semantic meta-level markups (SGML, XML) content (logical representation of doc/page/...) context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source's purpose, etc.)

101 November 17, 2005 101 Summary (3): The Message: What Needs to be Shared? bits/bytes/streams/network… alphabet, special characters,… words, morphological variants,… syntactic meta-level markups (HTML) semantic meta-level markups (SGML, XML) content (logical representation of doc/page/...) context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source's purpose, etc.) Tiny vocabulary (# distinctions) of standard relations: rdf:type, subclass, label, domain, range, comment,… Beyond which diversity is tolerated. Which means divergence is inevitable. "What do you mean we have no standard, we have lots of standards!"

102 November 17, 2005 102 Summary (3): The Message: What Needs to be Shared? bits/bytes/streams/network… alphabet, special characters,… words, morphological variants,… syntactic meta-level markups (HTML) semantic meta-level markups (SGML, XML) content (logical representation of doc/page/...) context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source's purpose, etc.) Tiny vocabulary (# distinctions) of standard relations: rdf:type, subclass, label, domain, range, comment,… Beyond which diversity is tolerated. Which means divergence is inevitable. "What do you mean we have no standard, we have lots of standards!" DAML+OIL adds a few more distinctions: inverses, unambiguous properties, unique properties, lists, restrictions, cardinalities, pairwise disjoint lists, datatypes, … To do the logical/arithmetic combination across information sources, we need tens of thousands of relations, not tens.

103 November 17, 2005 103 From the User's POV The user has a question they want answered. The data needed to answer it is available to them, but not in one single, obvious, reliable place. The answers follow logically (and/or arithmetically) from m elements in n sources. Don't want to have to know, ahead of time, what sources to go to, how to access them, how to combine the intermediate results. Do want to be able to limit, ahead of time, the uncertainty, recency, granularity, ideology… (and/or see such meta-level info for each answer). Which first-run movies star a teenager born in Texas and are showing today at a theater < 10 minutes' drive from this building?

104 November 17, 2005 104 From the User's POV The user has a question they want answered. The data needed to answer it is available to them, but not in one single, obvious, reliable place. The answers follow logically (and/or arithmetically) from m elements in n sources. Don't want to have to know, ahead of time, what sources to go to, how to access them, how to combine the intermediate results. Do want to be able to limit, ahead of time, the uncertainty, recency, granularity, ideology… (and/or see such meta-level info for each answer).

105 November 17, 2005 105 From the User's POV The user has a question they want answered. The data needed to answer it is available to them, but not in one single, obvious, reliable place. Do want the answer to be found automatically, not a bunch of relevant pages for them to peruse. Don't want to have to know, ahead of time, what sources to go to, how to access them, how to combine the intermediate results. Do want to be able to limit, ahead of time, the uncertainty, recency, granularity, ideology… (and/or see such meta-level info for each answer). Which first-run movies star a teenager born in Texas and are showing today at a theater < 10 minutes' drive from this building?


107 November 17, 2005 107 End of The Message. End of The Summary. Delve into a typical domain – answering intelligence analysts' queries – where Cyc can really help, because that domain thwarts all five ontological corner-cutting solutions (+ digressions for OpenCyc, ResearchCyc,…)

108 November 17, 2005 108 Eschew the 5 pitfalls (ways to cut ontological corners and end up with something that only appears to work) Ignorance-based: Have a small theory size (#terms, #instances, #rules) Static KB (can be massively tuned, optimized, cached, etc. ahead of time) Simple assertions (e.g., SAT constraints; propositional calculus; Horn;…) One global context (no contradictions, limited domain, simplified world) Don't do all the bookkeeping and forward inference required for justification maintenance (or, equivalently, don't ever have truth maintenance turned on) As with pharmaceuticals, what is toxic in one dosage is beneficial in a lesser dosage. E.g., contexts lead to locally-consistent locally-small theories (faster inference/KE) E.g., often some (sub)problems can be represented/solved in a simpler repr.

109 November 17, 2005 109 "What sequences of events could lead to the destruction of Hoover Dam?" "Were there any attacks on targets of symbolic value to Muslims since 1987 on a Christian holy day?" Cycorp Tools For: Ontology-Building, -Browsing, -Editing, & Fact/Rule Entry Domain Experts Scenario Generation Explanation Generation Query Formulation Scenario Generator Explanation Generator Query Formulator Others/GOTS Analysis and Collaboration Components AKB: The Analyst's Knowledge Base Relational DB projection of the AKB CT Analyst Terrorism Knowledge General Knowledge

110 1. Fusion of available structured terrorism knowledge sources: a tiny fraction of the Comprehensive AKB. 2. Terrorism domain experts met to develop a schema for the missing knowledge. 3. They and others are working remotely, collaboratively, to flesh out the missing 95% of the AKB. 4. Cyc uses general and domain knowledge to convert the simple English phrases into formal logic. 5. The Comprehensive AKB: first useful state will contain over 4M facts and rules of thumb, about half of which is pre-existing general knowledge already in Cyc. TKS6 MIPT TKS3 MATRIX TTT TKS7 TKS8 TKS2 Terrorism Knowledge Preexisting Structured Relevant Knowledge 1.92M 80k

111 November 17, 2005 111 Templatized Terrorism Analysis Queries (1) List the [ORGANIZATIONS] at which [AGENT] was [STATUS] and when. (1a) List the schools at which [Mohammed Atta] was [enrolled] and when. (1b) List the companies at which [Mark Fulton] was [employed] and when. (3) What percentage of [ATTACK-TYPE] are [ATTACK-TYPE]? (3a) What percentage of [terrorist attacks] are [poisonings]? (3b) What percentage of [bombings] are [suicide bombings]? (4) Between what times was the [AGENT] a/an [ROLE-PREDICATE] in what types of acts and where? (4a) Between what times was the [Aum Supreme Truth] a [performer] in what types of acts and where? (4b) Between what times was the [Ulster Volunteer Force] an [assisting agent] in what types of acts and where?

112 November 17, 2005 112 Templatized Terrorism Analysis Queries (13) List all [AGENT-TYPE] in [LOCATION] that have used [DEVICE-TYPE] and list the specific types of (devices) that each has used. (13a) List all [revolt organizations] in [Northern Ireland] that have used [pipe bombs] and list the specific types of pipe bombs that each has used. (13b) List all [right wing terrorist groups] in [North America] that have used [package bombs] and list the specific types of package bombs that each has used. (22) List the [AGENT-TYPE] who have [RELATION] [TYPE] to [AGENT] and what those supplies were. (22a) List the [Terrorist groups] who have [given] [supplies] to [Hamas] and what those supplies were. (22b) List the [state sponsored terrorist agents] who have [provided] [support] to [Osama Bin Laden] and what those supplies were.
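The bracketed slots in these templates can be filled mechanically. The sketch below shows only that surface step, using a hypothetical helper; it does not attempt the translation into a formal query, and it fills slots positionally because a template such as (3) repeats the same slot name.

```python
import re

# A slot is an upper-case name in square brackets, e.g. [ATTACK-TYPE].
SLOT = re.compile(r"\[([A-Z-]+)\]")


def fill_template(template, values):
    """Replace each [SLOT], in order, with the next value, keeping the brackets."""
    remaining = iter(values)

    def substitute(match):
        return "[" + next(remaining) + "]"

    return SLOT.sub(substitute, template)


TEMPLATE_3 = "What percentage of [ATTACK-TYPE] are [ATTACK-TYPE]?"
print(fill_template(TEMPLATE_3, ["bombings", "suicide bombings"]))
```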

113 November 17, 2005 113 CIA Intelligence Report Seeking Information: Ahmad Said July 26, 2004 Ahmad Said, an expert on remote-controlled bombs with a degree in chemical engineering, was seen travelling to Lebanon early this month. Said claimed to be a member of the Lebanese Hizballah from the mid 1980s until late July 1999. It is currently believed that Said assisted in the July 22nd car bombing in Beirut that damaged police barracks and destroyed several retail stores. Lebanese Hizballah's spokesman, Emad Mugniyeh, issued a statement on July 26th to the Al Aman newspaper denying the group's involvement in the attack.

114 November 17, 2005 114 Deeper Analytical Question Answering What factors argue the conclusion that ? For: - ETA often executes attacks near national elections - ETA has performed multi-target coordinated attacks - Over the past 30 years, ETA performed 75% of all terrorist attacks in Spain - Over the past 30 years, 98% of all terrorist attacks in Spain were performed by Spain-based groups, and ETA is a Spain-based group. Against: - ETA warns (a few minutes ahead of time) of attacks that would result in a high number of civilian casualties, to prevent them. There was no such warning prior to this attack. - ETA generally takes responsibility for its attacks, and it did not do so this time. - ETA has never been known to falsely deny responsibility for an attack, and it did deny responsibility for this attack.

115 November 17, 2005 115 Automatic Link Detection

116 November 17, 2005 116 Automatic Link Detection

117 Intelligent Fusion: Disparate Data USS Lake Champlain is scheduled to return to its homeport (NavBase San Diego) 1300 4 September. Hurricane Howard predicted to make landfall at Tijuana, Mexico approx. 0100 5 September. 0600 4 September: satellite imagery reveals 126 boats berthed at Silver Gate Yacht Club. 0600 4 September: Silver Gate Yacht Club harbormaster manifest only lists 124 craft. 1135 4 September: Coast Guard reports two cigarette boats, traveling together at 54 knots, on a trajectory consistent with a path from the Silver Gate Yacht Club to the entrance of the San Diego Naval Base. Monitoring of cell phone activity of a suspected Red Dawn terrorist cell member in Syria has identified four calls, each of 30 seconds' duration, placed to that suspect from Shelter Island between 2300 September 3 and 1100 September 4.

118 November 17, 2005 118 Automatic Generation of Plausible (Counter)Terrorism Scenarios Start from seed, if given one. Generate chains of action and plausible reaction. Each step should be both plausible and interesting. End at target, if given one (meet in the middle). Grow whole populations of such paths, not just one. Employ heuristics to evaluate each node's promise: plausibility x interestingness
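The "population of paths" idea, ranked by plausibility x interestingness, resembles a beam search. The sketch below is one possible reading under assumptions: the step dictionaries, the `successors` callback, and the numeric scores are invented for illustration and are not Cyc's scenario generator.

```python
import heapq


def promise(step):
    """Heuristic score named on the slide: plausibility x interestingness."""
    return step["plausibility"] * step["interestingness"]


def grow_paths(paths, successors, beam_width):
    """Extend every path by one step and keep the beam_width most promising.

    A path is a list of steps; it is ranked by the promise of its last step.
    """
    extended = [path + [nxt] for path in paths for nxt in successors(path[-1])]
    return heapq.nlargest(beam_width, extended, key=lambda p: promise(p[-1]))


# Hypothetical toy data: one seed step and three candidate reactions to it.
SEED = {"name": "seed", "plausibility": 1.0, "interestingness": 1.0}
REACTIONS = [
    {"name": "a", "plausibility": 0.9, "interestingness": 0.2},
    {"name": "b", "plausibility": 0.5, "interestingness": 0.8},
    {"name": "c", "plausibility": 0.1, "interestingness": 0.9},
]


def toy_successors(step):
    return REACTIONS if step["name"] == "seed" else []


population = grow_paths([[SEED]], toy_successors, beam_width=2)
print([[step["name"] for step in path] for path in population])
```

Multiplying the two scores means a step must do reasonably well on both to survive: a highly plausible but dull step, or a striking but implausible one, ranks low.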

119 November 17, 2005 119 Each step can be a… Political event (e.g., an election) Diplomatic event (communique) Military event (buildup along border) Terrorist event (suicide bombing) Economic event (loan; arms sale) Infrastructure event (power outage) Act of Nature (illness; hurricane) Often a step is just a response, by 1 or more agents, to the prior step (or, if going right to left, it is an enabler/cause of the already-known successor step) Generate chains of action and plausible reaction

120 November 17, 2005 120 Each step can be a… Political event (e.g., an election) Diplomatic event (communique) Military event (buildup along border) Terrorist event (suicide bombing) Economic event (loan; arms sale) Infrastructure event (power outage) Act of Nature (illness; hurricane) Hoover dam is blown up Generate chains of action and plausible reaction

121 November 17, 2005 121 Generate chains of action and plausible reaction Hoover Dam is blown up. Destroy 3.24M tons of concrete. Detonate a crude 100 kton nuclear bomb, 1 km away. Buy it for $1M from Pakistan. Pakistan has such devices and is financially hurting. Al Qaida has high net worth (assets) and the will to do it. Al Qaida does a sudden, atypical liquidation of $1M of its assets. Something that we can look for.


123 November 17, 2005 123 Auto. Scen.Gen.: Lessons Learned Forward generation is too explosive Backward generation is too sterile Instead, use a sort of cardiac rhythm – Take a large step backward (ABDUCTION) – Work forward a little from it (DEDUCTION) – Repeat.

124 November 17, 2005 124 Targeted Fact Gathering: Web Search Suggested Fact: (foundingDate AbuSayyaf ?X) Search Strings: "Abu Sayyaf was founded in ___", "Al Harakat Islamiya, established in ___", "ASG was established in ___" Local storage: Abu Sayyaf was founded in the early 1990s. Parse: (foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))
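The last step on this slide, turning a retrieved sentence into a candidate assertion, can be sketched with a single pattern. This is a deliberately narrow illustration: the regular expression handles only the "early/mid/late NNN0s" wording, and `MiddlePartFn` and `LatePartFn` are assumed names chosen by analogy with the `EarlyPartFn` shown on the slide.

```python
import re

# Matches e.g. "was founded in the early 1990s"; captures the part and the decade prefix.
PATTERN = re.compile(r"was (?:founded|established) in the (early|mid|late) (\d{3})0s")

# Only EarlyPartFn appears in the source; the other two names are assumptions.
PART_FN = {"early": "EarlyPartFn", "mid": "MiddlePartFn", "late": "LatePartFn"}


def parse_founding(term, sentence):
    """Return a CycL-style foundingDate string for the term, or None if no match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    part, decade = match.groups()
    return f"(foundingDate {term} ({PART_FN[part]} (DecadeFn {decade})))"


print(parse_founding("AbuSayyaf", "Abu Sayyaf was founded in the early 1990s"))
```

A result produced this way is only a suggested fact; nothing in the sketch checks the retrieved sentence against other sources.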

125 November 17, 2005 125 Targeted Fact Gathering: Web Search Suggested Fact: (maritalStatus YassirArafat ?X) All Possible Facts (PersonTypeByMaritalStatus): (maritalStatus YassirArafat Single) (maritalStatus YassirArafat Married) (maritalStatus YassirArafat Divorced) … (maritalStatus YassirArafat Cohabitating-Unmarried) Search Strings: "Yasser Arafat's fiancée", "Yasser Arafat's wife", "Yasser Arafat's ex-wife", "Yasser Arafat divorced" Result: (maritalStatus YassirArafat Married)

126 November 17, 2005 126 Harnessing Lots of Users useful distinguishing facts Identify underpopulated common sense predicates Use semantic constraints + shallow parsing to identify possible fact completions Present multiple choice questions to novices to complete facts 150-400 commonsense GAFs/hour Hat worn on: Head Neck Foot Leg

127 November 17, 2005 127 OpenCyc Open Source release of: [most of] the Cyc Ontology + Simple Relns. + Inference Engine ResearchCyc Almost All of Cyc (for free for R&D purposes)

128 November 17, 2005 128 The OpenCyc Release Runs on Windows, Linux OpenCyc Knowledge Base –LGPL license –47,000 terms –306,000 facts Cyc Inference Engine –Free license for binary runtime engine Application Programming Interface –Java, SubL, Python Extensive documentation –Ontological Engineer's Handbook –Online Cyc 101 course

129 November 17, 2005 129 Why Do We Release All This? Advance the starting line for AI Enable a large number of users to in effect help us to grow the Cyc Knowledge Base Help Cyc become a critical component –in the Semantic Web –in more and more applications –using OpenCyc hopefully leads to using ResearchCyc for free, eventually licensed

130 November 17, 2005 130 OpenCyc is Upward-Compatible with ResearchCyc ResearchCyc contains OpenCyc Natural Language Processing subsystem Many more facts/rules per term –The extent of non-structural predicates

131 November 17, 2005 131 60,000 OpenCyc Users/Contributors, 50 Active ResearchCyc User Groups: Xerox PARC Daxtron Labs Lockheed Martin ATLD Government Government-related Commercial Houston VA Medical Center Air Force Rome Labs Institute for the Study Of Accelerating Change U of Maryland Language Computer Corporation NTT Communications Science Laboratories (Japan) Northwestern U Stanford NLP Dept. ANSER, Inc. LBJ School of Public Affairs Fraunhofer Institute U of Illinois Urbana-Champaign New Mexico Highlands Univ. Harvard U Linkoping U (Sweden) Radboud U (Netherlands) Tokyo Inst. of Technology Terra Incognita University Microfabrica, Inc. U of Stuttgart NPOs MIT Media Lab Witan International U of Pennsylvania SRI 21st Century Technologies U of Minnesota Stones Throw Technologies ISI Trimtab Consulting U of Hawaii Rensselaer AI and Reasoning Lab TNO-DMV (Netherlands) Sapio Systems (Denmark) U of Toronto Knowledge Media Institute, Open University Austin Info Systems


134 November 17, 2005 134 5 Factors slowing IC inference Problem (F1) Constant stream of new assertions, new data to assimilate. –elaboration tolerance vs. tuned, optimized, compiled representations. (F2) Theory Size: Huge vocab. and #instances (people, specific reports,…) (F3) Sophisticated assertions and constraints strain even FOPC –More repr. language features (e.g., quantification) => slower inference (F4) Assertions are often true in one context and false in another –Contextualized data and queries => exponentially larger search space (F5) Truth maintenance must be on, to assimilate new data properly, and to provide the symbolic justifications behind its conclusions. –Each new datum can trigger an avalanche of TMS reactions in the KB –There can be multiple answers, each with multiple justifications

135 November 17, 2005 135 5 Factors slowing IC inference (F1) Constant stream of new assertions, new data to assimilate. –elaboration tolerance vs. tuned, optimized, compiled representations. (F2) Theory Size: Huge vocab. and #instances (people, specific reports,…) (F3) Sophisticated assertions and constraints strain even FOPC –More repr. language features (e.g., quantification) => slower inference (F4) Assertions are often true in one context and false in another –Contextualized data and queries => exponentially larger search space (F5) Truth maintenance must be on, to assimilate new data properly, and to provide the symbolic justifications behind its conclusions. –Each new datum can trigger an avalanche of TMS reactions in the KB –There can be multiple answers, each with multiple justifications Problem

136 November 17, 2005 136 Slow Queries Queries that take a long time (okay, but faster is better) – Generate scenarios resulting in destruction of NY Stock Exchange: still running after 2 months – Answer query Q modulo a small number of plausible unknown clauses Queries that take a long time and shouldn't – (capableOf ArnoldSchwarzenegger RunningForPresidentOfUS) Takes 40 minutes to return False. Why: Wasting time seeing if Arnold is an x where x can't be President (e.g., Cow) – (hasBeliefSystems AdolfHitler AntiSemitism) In the context of World History 1944, takes 16 minutes to return True. Why: Lots of ways this might not be true


140 November 17, 2005 140 Effic. Reasoning Hypotheses Hypothesis 1: There is no silver bullet, no one magic key waiting to be discovered which will unlock efficient pathfinding on huge knowledge-spaces. –Rather, such inference will only be improved incrementally, by bringing to bear a large number of efficient partial solutions.

141 November 17, 2005 141 Effic. Reasoning Hypotheses Hypothesis 2: These special-case solutions are not random, but factor into a handful of different categories. –A 2-day workshop meeting could productively be held for each such category –Important interstitial work to be done, collaboratively, before and after the meetings.

142 November 17, 2005 142 6 categories (workshop topics) Reasoners that exploit limitations in the expressivity of the repr. language they operate over –Description Logic, 1st order, etc. –What simplifications enable what speedups? –At what risk? Domain-specific (incl. Context-specific) reasoners Statistical/Bayesian Reasoners Unsound (but presumably useful) reasoners Meta-reasoners (tacticians) and Meta² (strategists) Parallel Processing, HW Acceleration, Other

143 November 17, 2005 143 6 categories (workshop topics) Reasoners that exploit limitations in the expressivity of the repr. language they operate over –Description Logic, 1st order, etc. –What simplifications enable what speedups? –At what risk? Domain-specific (incl. Context-specific) reasoners –What sorts of domain knowledge do they utilize? –How do they use that to speed up inference? –Contexts, dimensions of context-space, algorithms for exploiting that structure of the KB to do faster reasoning

144 November 17, 2005 144 6 categories (workshop topics) Statistical/Bayesian Reasoners –How can these cooperate with, help, and be helped by non-statistical reasoners (acting as independent agents)? –How can statistical and symbolic inference be more tightly integrated in a single reasoner (cf. Koller)? Unsound (but presumably useful) reasoners –Abduction, induction, analogy, abstraction (ignoring details which hopefully won't matter), scen. generation –How can these cooperate with, help, and be helped…? –How can unsound and sound inference be more tightly integrated in a single reasoning engine?

145 November 17, 2005 145 6 categories (workshop topics) Meta-reasoners (tacticians) and Meta² (strategists) –Do/Improve object-level meta-level reasoning –Types of meta-… (prior & tacit; trails; reflection;…) Other –Parallel processing –Hardware acceleration (special purpose chips etc.) –New types of reasoning modules and strategies, that don't fit in any above group, that folks are working on. –What specific gaps are there (useful, doable, efficient reasoners no one has even started to research yet)?

146 November 17, 2005 146 Background & Lit. Review Instantiation-based reasoning systems Lifted DPLL procedures (Davis-Putnam-Logemann-Loveland) Completion/Boolean Ring based methods ContractNet TeamWork Scatter-gather algorithms Auto. theory decomposition by static analysis Explanation-based learning/partial evaluation mechanisms that learn generalized proof schemata

147 November 17, 2005 147 Effic. Reasoning Hypotheses 1. No silver bullet 2. 6 types of powerful partial solutions already exist –Reasoners that exploit limitations in the expressivity of the representation language they operate over –Domain-specific (incl. Context-specific) reasoners –Statistical/Bayesian Reasoners –Unsound (but presumably useful) reasoners –Meta-reasoners (tacticians) and Meta² (strategists) –Other, HW accel., parallel processing 3. They can cooperate / synergize (neutral harness)

148 November 17, 2005 148 Effic. Reasoning Hypotheses Hypothesis 3: They can cooperate / synergize. –Explicitly characterize, for each agent (reasoner): A trigger -- in effect specifying its area of competence. A procedure for estimating its cost, its chance to succeed, etc. –Cyc's immense KB and EL/HL architecture makes it an efficient reasoning module magnet or universal recipient

149 November 17, 2005 149 Effic. Reasoning Hypotheses Hypothesis 3: They can cooperate / synergize. More than that, we can and will harness ~10 of them. –Explicitly characterize, for each agent (reasoner): A trigger -- in effect specifying its area of competence; A procedure for estimating its cost, its chance to succeed, etc. –Cyc's immense KB and EL/HL architecture make it an efficient reasoning module magnet or universal recipient. Use Cyc [and ARDA-related assertions/queries in it] as a testbed for –operationally publishing the results of each workshop –experiments on comparative and collaborative power /SOW Hold 3 workshops, on the 6 topics, in 2006. Participation by all the leading experts. Pre: readings. Post: actually harness them
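One way to picture the per-reasoner characterization in Hypothesis 3 (a trigger plus a procedure for estimating cost and chance of success) is the sketch below. It is purely illustrative: `ReasonerProfile`, `Estimate`, `rank_reasoners`, the dictionary-shaped problems, and the "success probability per unit cost" ordering are all assumptions, not Cyc's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Problem = Dict[str, str]  # hypothetical problem description, e.g. {"kind": "arithmetic"}


@dataclass
class Estimate:
    cost: float       # estimated resource cost, in arbitrary units
    p_success: float  # estimated chance of success, between 0 and 1


@dataclass
class ReasonerProfile:
    name: str
    trigger: Callable[[Problem], bool]       # area of competence
    estimate: Callable[[Problem], Estimate]  # cost / chance-of-success procedure


def rank_reasoners(problem: Problem, profiles: List[ReasonerProfile]) -> List[str]:
    """Names of applicable reasoners, best expected payoff per unit cost first."""
    scored = [(p.estimate(problem), p) for p in profiles if p.trigger(problem)]
    # Assumed ordering rule: success probability divided by cost.
    scored.sort(key=lambda pair: pair[0].p_success / max(pair[0].cost, 1e-9),
                reverse=True)
    return [profile.name for _, profile in scored]


# Three made-up reasoners with made-up estimates.
PROFILES = [
    ReasonerProfile("arithmetic", lambda q: q["kind"] == "arithmetic",
                    lambda q: Estimate(cost=1.0, p_success=0.99)),
    ReasonerProfile("bayes_net", lambda q: q["kind"] == "probabilistic",
                    lambda q: Estimate(cost=10.0, p_success=0.8)),
    ReasonerProfile("general_prover", lambda q: True,
                    lambda q: Estimate(cost=100.0, p_success=0.6)),
]

if __name__ == "__main__":
    print(rank_reasoners({"kind": "arithmetic"}, PROFILES))
```

The specialized module is ranked ahead of the general prover whenever its trigger fires, which is the behavior the slide's "area of competence" wording suggests.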

150 Efficient Pathfinding in Very Large Data Spaces GOALS Develop an ontology and a standard for specifying the applicability, % success, estimated resource cost, etc., of bringing various reasoning modules to bear on a problem. Build an Integration Framework, a Harness, that enables several of the world's leading reasoning systems to cooperatively solve problems [using the above ontology and standard to act as agents, broadcast subproblems, etc.]. Actually hook them up to this Harness and run them on test problems from NIMD, AQUAINT, etc. Overcome the 5 problems that make IC reasoning hard: (1) New assertions constantly (can't just compile the KB) (2) Each is true in some contexts (in 2003; believed by x) (3) Many are complex (x believes that y believes that…) (4) Huge vocabulary size and number of instances (5) Justifications / sources matter (truth maintenance must be on) Workshop Highlights 4Q 05 Pre-start invitations and Steering Comm. planning 1Q 06 Project starts. 1st workshop: gaining efficiency by limiting representation language expressivity 2Q 06 Interstitial work on ontology and standard; building the initial Framework/harness; try out 2 agents; 2nd workshop: gaining efficiency by limiting the domain, the type of problem to be solved, etc.
3Q 06 3rd workshop: Integrating Bayesian probability and statistical reasoning with symbolic theorem-proving 1Q 07 4th workshop: meta-reasoning (tactics & strategy) 5th workshop: unsound reasoning (e.g., analogy) 4Q 06 6th workshop; Final Report; Hand-off to I.C./Ops Champions for tech transfer/operationalization APPROACH Identify the most important ways in which automated reasoners gain efficiency: limit domain, limit expressiveness, integrate probabilistic and symbolic reasoning, meta-reasoning, and unsound reasoning (e.g., analogy). Hold a workshop on each topic (16 invitees; 15 said Yes). After/between the workshops, get these system builders to publish their reasoner to the growing Framework/harness so each can bid for, work on, and broadcast subproblems. Workshop PIs: Doug Lenat, Cycorp; Michael Genesereth, Stanford. Workshop Steering Committee: R.V. Guha, Google; Chris Welty & Andrew Tomkins, IBM; Andrei Voronkov, Manchester; + I.C./Ops. Champions
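The "bid for, work on, and broadcast subproblems" cycle in the APPROACH could look roughly like the toy loop below. This is a sketch under stated assumptions, not the framework the project proposed to build: `Harness`, `SplitAgent`, `ToolAgent`, `Result`, the numeric bids, and the string-encoded problems are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Result:
    answer: Optional[str] = None                          # set when solved outright
    subproblems: List[str] = field(default_factory=list)  # broadcast back to the harness


class SplitAgent:
    """Hypothetical agent that only decomposes conjunctions."""
    name = "splitter"

    def bid(self, problem: str) -> float:
        return 1.0 if " AND " in problem else 0.0

    def work(self, problem: str) -> Result:
        return Result(subproblems=problem.split(" AND "))


class ToolAgent:
    """Hypothetical agent that handles '<name> n1 n2 ...' problems."""

    def __init__(self, name: str, func: Callable[[List[int]], int]) -> None:
        self.name = name
        self.func = func

    def bid(self, problem: str) -> float:
        words = problem.split()
        competent = bool(words) and words[0] == self.name and "AND" not in words
        return 0.9 if competent else 0.0

    def work(self, problem: str) -> Result:
        numbers = [int(w) for w in problem.split()[1:]]
        return Result(answer=str(self.func(numbers)))


class Harness:
    def __init__(self, agents: list) -> None:
        self.agents = agents

    def solve(self, problem: str, max_steps: int = 100) -> Dict[str, Tuple[str, str]]:
        """Map each solved (sub)problem to (winning agent name, answer)."""
        queue = deque([problem])
        answers: Dict[str, Tuple[str, str]] = {}
        for _ in range(max_steps):
            if not queue:
                break
            current = queue.popleft()
            # Every agent bids; the highest positive bid wins the (sub)problem.
            bids = [(agent.bid(current), agent) for agent in self.agents]
            best_bid, winner = max(bids, key=lambda b: b[0], default=(0.0, None))
            if winner is None or best_bid <= 0:
                continue  # nobody claims competence; leave it unsolved
            result = winner.work(current)
            if result.answer is not None:
                answers[current] = (winner.name, result.answer)
            queue.extend(result.subproblems)
        return answers


if __name__ == "__main__":
    harness = Harness([SplitAgent(), ToolAgent("sum", sum), ToolAgent("max", max)])
    print(harness.solve("sum 2 3 AND max 4 9 1"))
```

A realistic harness would also have to carry the context and justification information listed among the five IC problems; the sketch ignores those entirely.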

151 November 17, 2005 151 2 July 2005 The pursuit of Artificial Intelligence -- from robotics to natural language processing to automated learning -- has been held back by the "brittleness bottleneck" caused by the need for common sense. For 21 years, we've been priming the pump, building up a formalized corpus of such knowledge, Cyc. Along the way, we've had to revise our preconceptions and theories, to expand our representation language and arsenal of inference methods, and to find approximate yet adequate engineering solutions to problems that philosophers have grappled with for millennia, such as ontologizing aspects of substances versus individual objects, time, space, causality, belief, social interactions, and so on. The process of ontological engineering itself had to grow and evolve throughout this enterprise, for example in how Cyc represents and reasons with contradictions and context. In this talk I will try to cover both the large-scale picture of what we've built and why, and the detailed picture of how it's built, and the lessons learned along the way in how and how not to do large-scale OE. I will report on our recent efforts to make Cyc more accessible to the broader community through OpenCyc and ResearchCyc, which raises issues of how multiple individuals and groups can share and integrate their extensions (and settle their differences). Finally, I will discuss an exciting new effort we have just had funded, to gather automated reasoning researchers together for a series of workshops in 2006 on speeding up inference in large knowledge bases by orders of magnitude. CYC: Lessons Learned in Large-Scale Ontological Engineering

