Reading to Learn Q4 Review Peter Clark John Thompson Phil Harrison Bill Murray.

1 Reading to Learn Q4 Review Peter Clark John Thompson Phil Harrison Bill Murray

2 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

3 SRI-Boeing’s Reading to Learn Seedling Goal: –study issues in learning through reading by working with a reduced version of the problem, namely working with controlled, rather than unrestricted, natural language. The NLP task is factored into two: full NL → CL, CL → logic Rationale: –by sidestepping some of the shallow linguistic issues of full NLP, we can focus on deeper issues –methods for full NL → CL can be studied separately from this project

4 SRI-Boeing’s Reading to Learn Seedling Approach: –Rewrite 5 pages of chemistry text into our controlled language, CPL –Extend and use our CPL interpreter to generate logic –Integrate this new knowledge with an existing chemistry knowledge base (from the Halo Pilot), which has the new knowledge surgically deleted from it –Report on the problems encountered and solutions developed

5 This Seedling in Mobius Knowledge Integration Introspection Natural Language Processing Test Generation This seedling

6 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

7 Recap: October 2005 Tutorial on the 5 pages of chemistry text –Acid-base reactions, proton transfer Where is that knowledge in the text? –Wanted: Clear, declarative statements –Got: obscure/missing/complex/indirect Where is that knowledge in the Halo KB? –Wanted: Modular, constructed from general pieces –Got: buried in procedures and code Very hard to ablate or extend –Suggestions for a better KB structure

8 (every Compute-Conjugate-Acid has (input ((a Chemical with (plays ((a Base-Role)))))) (parent_formula ((the term of (the nested-atomic-chemical-formula of (the has-basic-structural-unit of (the input of Self)))))) (target-unit ((if (the parent_formula of Self) then (:set (#'(LAMBDA () (GET-CONJUGATE-ACID-ATOMIC-FORMULA-BACK (KM0 '(|the| |parent_formula| |of| |Self|))))))))) (output ((if (oneof (the input of Self) where (It isa H2O-Substance)) then (a H3O-Plus-Substance) else ((forall (allof2 (the target-unit of Self) where ((not (It2 = (the parent_formula of Self))))) (the output of (a Identify-Chemical with (input ((a Chemical with (has-basic-structural-unit ((the output of (a Identify-Chemical-Entity with (input ((a Chemical-Entity with (nested-atomic-chemical-formula ((a Chemical-Formula with (term (It))))))))))))))))))))))) ? “An acid = a base + a proton”

9 (every Acid-Role has (intensity ( (a Intensity-Value with (value ( (:pair ;; Case statement for Acids. (if ((the played-by of Self) isa Ionic-Compound-Substance) then (if (((the played-by of Self) isa HCl-Substance) or ((the played-by of Self) isa HBr-Substance) or ((the played-by of Self) isa HI-Substance) or ((the played-by of Self) isa HClO3-Substance) or ((the played-by of Self) isa HClO4-Substance) or ((the played-by of Self) isa H2SO4-Substance) or ((the played-by of Self) isa HNO3-Substance)) then *strong else (if (((the played-by of Self) isa H3PO4-Substance) or ((the played-by of Self) isa HF-Substance)or ((the played-by of Self) isa HC2H3O2-Substance) or ((the played-by of Self) isa H2CO3-Substance)or Relative strengths of different acids

10 Recap: March 2006 Two CPL versions: (i) close to text (ii) close to inference –Predictable performance Discussion of “bridging the gap” Manually bridging the “gap”: IF there is an equation of a reaction AND a first chemical entity has a chemical formula AND a second chemical entity has a second chemical formula AND the first chemical formula is part of the left side of the equation ….. THEN the direction of the reaction is right AND the equilibrium side of the reaction is right.

11 Inference-Supporting CPL: Predictable Performance Task: Halo KB / CPL. Conjugate pairs: Giant KM procedure for formula manipulation / Lookup table. Relative strengths: Qualitative absolute strengths (strong/weak/negligible) + qualitative comparison / Relative strength assertions. Labelling acid/bases in a reaction: Giant KM procedure for reaction manipulation / if-then rule using conjugate pairs. Computing direction of the reaction: KM rule / if-then rule. Comparison notes: More general, ≈, ≈ (equivalent)

12 Questions and Tasks from Last Time Analysis of “the gap” –What is the nature of the gap? –Can we characterize it? –Can we quantify it? AP chemistry vs. grade-school biology –How does the gap look in different texts? Domains? –What are the fundamental problems? –How severe are they? –How might they be overcome? Case Studies –Given text/naïve CPL formulation A Inference-capable target B –What knowledge is needed to get from A to B? –How much can be pump-primed, how much bootstrapped?

13 I: Understanding Language Knowledge Integration Introspection Natural Language Processing Test Generation This seedling

14 Natural and Controlled Languages Where is Reading to Learn/Mobius’s Achilles’ heel? –Schubert: “Dealing with real natural language” –Not (just) the grammatical complexity –It is the imprecision, messiness, incompleteness, and erroneous nature of real language Two styles of CPL usage: (i) As a declarative rule language (ii) As grammatically simpler real language Worked with both within this Seedling (i) does inference, but is far from original text (ii) is close to the text, but barely supports inference

15 (i) CPL as a declarative rule language “IF a first chemical is stronger than a second chemical AND the second chemical is stronger than a third chemical THEN the first chemical is stronger than the third chemical.” “IF there is an equation of a reaction AND a first chemical entity has a chemical formula AND a second chemical entity has a second chemical formula AND the first chemical formula is part of the left side of the equation AND the second chemical formula is part of the right side of the equation AND the first chemical entity is playing a base role AND the second chemical entity is playing a base role AND the first chemical entity is stronger than the second chemical entity THEN the direction of the reaction is right AND the equilibrium side of the reaction is right.”
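
The two rules quoted on this slide can be sketched in Python. This is only an illustration of what the rules conclude, not the CPL interpreter or KM; the function names and the sample base-strength facts (NH3 over H2O, H2O over Cl-) are assumptions chosen for the example.

```python
# Hypothetical facts: (stronger base, weaker base)
STRONGER_BASE = {("NH3", "H2O"), ("H2O", "Cl-")}

def transitive_closure(stronger):
    """Rule 1: IF a is stronger than b AND b is stronger than c THEN a is stronger than c."""
    closure = set(stronger)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def reaction_direction(left_base, right_base, stronger):
    """Rule 2: IF the base on the left side is stronger than the base on the
    right side THEN the direction of the reaction is right."""
    if (left_base, right_base) in transitive_closure(stronger):
        return "right"
    return None  # the rule does not fire, so nothing is concluded

# HCl + H2O -> H3O+ + Cl-: the bases are H2O (left) and Cl- (right)
direction = reaction_direction("H2O", "Cl-", STRONGER_BASE)
```

Note that the sketch concludes nothing when the rule does not fire; deciding the opposite direction would need a second rule, as in the rule-language CPL.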

16 (ii) CPL as grammatically simpler real language Acids have a sour taste. Acids cause some dyes to change color. Bases have a bitter taste. Bases have a slippery feel. All acids contain hydrogen. 37 percent of the mass of concentrated hydrochloric acid is HCl. The concentration of HCl in concentrated hydrochloric acid is 12 M. HCl reacts with NH3 without an aqueous solution. The reaction transfers a proton from an HCl molecule to an NH3 molecule. The "HX" in Equation 16.6 donates a proton. The donating leaves behind an X-minus ion. The X-minus ion plays a Bronsted-Lowry base in the reverse reaction. The H2O molecule in Equation 16.6 accepts a proton. The accepting produces an H3O-plus ion.

17 Two Paths from Language to Logic… Declarative CPL rules Inference-supporting Representation “The Knowledge Gap” Real Text Real(istic) CPL Text Literal/messy logic representation

18 “Israel’s Problem” Assume a perfect algorithm for English to (literal-like) logic. Are you done? Real(istic) CPL Text Inference-supporting Representation “The Knowledge Gap” Real Text Literal/messy logic representation Declarative CPL rules

19 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

20 An Analysis of the Gap What is the nature of the gap? Can we characterize it? Can we quantify it? How does the gap look in different texts? Domains? What are the fundamental problems? How severe are they? How might they be overcome?

21 Analysis Looked at these phenomena in two sets of text –5 target pages of AP chemistry –5 pages of grade-school level biology from the Web, about the heart and its function Categorization of main causes Loose quantification of their frequency

22 9 Fundamental Causes of the Gap 1.Many idiomatic words/phrases, each requiring a theory 2.Some knowledge is taught by example 3.Much important knowledge is conveyed by diagrams and tables 4.Generic sentences are ubiquitous 5.Some text teaches problem-solving knowledge 6.Discourse context is important (need sentence context) 7.Many sentences pose major representational challenges 8.Math/Algebraic models are extremely challenging 9.Text is full of ambiguity, metaphor and metonymy/loosespeak

23 1. Idiomatic/special-purpose words/phrases Many words/phrases require special interpretation –Breadth requirement is very challenging! 70% in chem, 40% in bio –Chemistry “The reaction favors transfer of…” “From the earliest days of experimental chemistry…” “The ion, however, more closely represents reality” “When we closely examine the reaction…” “According to their definition…” –Biology “This is important for the cells to do their work.” “On its way back to the heart…” “The right-side pumps stale blood…” “to smaller and smaller branched tubes…”

24 2. Examples Examples play a key role in human teaching How important are these for a machine? –Consolidation, verification, disambiguation? 35% chem, <5% bio

25 3. Diagrams and Tables “Teaching” how to compute conjugate acid/base pairs Relative strengths of acids 10% in chemistry but key ones!!! Incidental in bio. Show-stopper for some needed knowledge

26 4. Generics Reference to a collection rather than individual object Ubiquitous! 90% chemistry, 95% biology –Chemistry “Acids cause certain dyes to change color” “Acids have a sour taste” “A substance that is …. is called amphoteric” –Biology “The blood leaving the aorta is full of oxygen” “Veins have thin walls” “The heart pumps blood to your lungs”

27 Why are generics hard? Quantification “Acids contain hydrogen.” Fuzzy quantifiers “An H3O+ ion sometimes reacts with three H2O molecules” Presuppositions “HCl dissolves in water.” “Acids cause some dyes to change color” “Acid irritates the skin” Need background knowledge! “IF an acid touches some skin THEN that skin is irritated” or more generally “IF acid + skin are related in a way where irritation may plausibly occur… THEN it will occur.”

28 5. Needing/Teaching Problem-Solving Knowledge Problem-solving knowledge –Chemistry (20%) biology (<5%) Worse, is often not even explicit in the text, e.g.:

29 6. Discourse Context Can we take sentences in isolation? (“bag of lines”) Obstacles: –Pronoun resolution (30% chem, 50% bio) –Context: unqualified compound nouns (most) “Every [Bronsted-Lowry] acid has a conjugate [BL] base” “The [human] heart…The [human] arteries…” –Other dependencies (15% chem, <5% bio) “Therefore, HX is the Bronsted-Lowry acid” “The other conjugate acids are HS -, PH 3 and CO 3 2- ”

30 Discourse Context (Biology) Sentences stand on their own more often, e.g.,

31 7. Major Representational Challenges Hard to quantify: ~70% chem, ~40% bio Potentiality: –“an acid is a substance (molecule or ion) that can donate a proton to another substance. Likewise, a base is a substance that can accept a proton.” Conveying a proof: Imprecision and comparatives: –“About 37% by mass” –“Interacts strongly” –“The aorta is the largest artery in the body”

32 8. Math/Algebraic models ~65% chemistry use or manipulate formulae “NaOH dissociates into Na+ and OH- ions.” “An H+ ion is simply a proton with no electrons” “HX and X- differ only in the presence of a proton” Challenges –Relating the symbol system to the real world –Defining and applying operations on the symbol system –Relating those operations to the real world

33 Math/Algebraic models (cont) Minimal in grade-school biology –nearest is rates and measures “the heart contracts 70 times a minute” “The plasma is 95% water and the other 5% of dissolved substances” “In an adult’s body there is 10.6 pints of blood”

34 9. Loosespeak (metaphor, metonymy, etc) –Where a “literal interpretation” is incorrect Ignoring overgenerality –In these texts, 30% chem, 10% bio Probably both higher in general –Chemistry The molecule, substance, symbol distinction – Huge! Accounts for ~50% of the complexity of Halo KB. In other texts (not this one) metaphor also used often basic-unit “HC2H3O2(aq) + … C2H3O2-” formula

35 Loosespeak (metaphor, metonymy, etc) Biology: metaphor more common “Your heart’s job is to pump blood” “Blood delivers oxygen…On the return trip, the blood picks up waste products”

36 Loosespeak (metaphor, metonymy, etc) Analysis by Univ Texas at Austin (chemistry) –Loosespeak is everywhere!

37 Relative Frequency of Phenomena

38 Relative frequency of phenomena [Charts: AP Chemistry (5 pages); Grade-school biology (5 pages)]

39 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

40 Case Study 1 of the Gap AP Level Chemistry

41 Some acids are better proton donors than other acids. Some bases are better proton acceptors than other bases. The conjugate base of a strong proton donor is a weak proton acceptor. The conjugate acid of a strong proton acceptor is a weak proton donor. A stronger acid has a weaker conjugate base. A stronger base has a weaker conjugate acid. A stronger acid is a better proton donor. A stronger base is a better proton acceptor. Original English CPL (like) How do we bridge the gap?

42 From Original English to CPL - 1 Resolve “others” to mean “other acids/bases” Use “likewise” to guide a parallel construction Need to represent “some,” “other,” “better” Assumes a scale of ability to donate/accept

43 From Original English to CPL – 2a Need to interpret “If we do X, we find that Y” as a mental exercise that draws a conclusion Need to have a concept of an ordering based on some ability (to donate a proton) Resolve “their ability” back to types of acids Resolve quantification – one proton per instance of acid molecule

44 From Original English to CPL – 2b Here “a substance” means an acid molecule Need to handle jumps between substance-level and molecule-level references Need to interpret “the more readily an A does B, the less readily a C does D” Need a model of two qualitative scales of ability, with an inverse relationship Resolve “its conjugate base” back to the acid

45 From Original English to CPL – 3 “Similarly” is a cue for a parallel construction Other issues are the same as in the previous sentence (inverse qualitative scales)

46 From Original English to CPL – 4a “In other words” is a cue for another view of the same knowledge in the previous sentence “the more readily an acid gives up a proton” = “the stronger an acid” Related qualitative scales again “the stronger an acid” is special syntax

47 From Original English to CPL – 4b Semicolon here denotes parallel constructions This is also another view of the same knowledge in the previous sentence “the more readily a base accepts a proton” = “the stronger a base” Inverse qualitative scales again

48 Overall Interpretation (sketch) Acid readily gives up a proton Acid strength Conjugate base readily accepts a proton Conjugate base strength “In other words”: “Similarly”: replace acid with base, replace conjugate base with conjugate acid inverse parallel

49 From Original English to Inference-Supporting Logic: Knowledge Requirements Discourse Knowledge: –Pragmatic knowledge for pronoun resolution –Ability to recognize and match parallel constructions E.g., with cue words Both within and across sentences –Ability to recognize a mental exercise (“if we do …”) Domain Knowledge: –Models of qualitative scales and relationships between two scales –Knowledge to handle substance/molecule metonymy –Models of abilities & give/receive –World knowledge to help resolve quantification e.g., one proton per molecule makes most sense
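
The "two qualitative scales with an inverse relationship" requirement can be illustrated with a small Python sketch. It assumes that an ordering of acids is already known and derives the ordering of their conjugate bases by reversing it; the variable names and the three sample acids are assumptions for illustration, not part of the Halo KB or CPL.

```python
# Assumed input: acids ordered from strongest to weakest proton donor
ACIDS_STRONGEST_FIRST = ["HCl", "HF", "H2O"]

# Assumed conjugate pairs (acid -> conjugate base)
CONJUGATE_BASE = {"HCl": "Cl-", "HF": "F-", "H2O": "OH-"}

def conjugate_bases_strongest_first(acids, conjugate_base):
    """'A stronger acid has a weaker conjugate base': the base scale is the
    acid scale read in the opposite direction."""
    return [conjugate_base[acid] for acid in reversed(acids)]

BASES_STRONGEST_FIRST = conjugate_bases_strongest_first(
    ACIDS_STRONGEST_FIRST, CONJUGATE_BASE
)
```

The "Similarly" sentence on the earlier slides corresponds to running the same reversal from a base ordering to a conjugate-acid ordering.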

50 Case Study 2 of the Gap Grade-School Biology

51 Grade-school Biology Searched the Web, found 4 simple texts about the human heart and its function They are much simpler than our college chemistry text, but still exhibit lots of interpretation issues Only a few sentences from each text happened to be in pure CPL syntax By the time science is taught in school, the students are beyond the Dick & Jane reading level

52 Grade-school Biology Syntax - 1 Pronouns are everywhere –“Your heart is divided into two sides.” [anyone’s heart] Dependent clauses are common –“As blood begins to circulate, it leaves the heart …” –“… fresh oxygen that we have inhaled …” Conjunctions appear between various expressions –“… the vessels and the muscles that help and control …” –“Lizards don’t have hair or feathers … and can’t sweat …” Comparatives are common –“The tubes that more gently drain back to the heart …” Approximations are common –“… some 70 or so times a minute at rest …”

53 Grade-school Biology Syntax - 2 Negatives are sometimes used –“They do not work on their own, but together as a team.” Phrases often modify other terms –“The blood leaving the aorta is full of oxygen.” –“On its way back to the heart, the blood travels …” Infinitives are sometimes used –“This is important for the cells … to do their work.” Parenthetical expressions are sometimes inserted –“… the carbon dioxide (a waste product) is removed...” –“… times a minute – more if you are exercising – and …”

54 Grade-school Biology Syntax - 3 Rhetorical questions to the reader –“Did you know that your heart is the strongest muscle?” Modals are sometimes used –“… so that your body can get rid of them.” –“… your blood vessels could circle the globe 2 ½ times!” Phrases about what something is called –“… a colorless liquid called plasma.” Omitted words –“… the other two [cavities] are called ventricles …” Adverbs, complex phrases, and other minor issues

55 Grade-school Biology Semantics Analyzed sample grade-school biology texts about the heart and circulation What commonsense knowledge is needed to correctly understand the text? –What pump-primed models would be needed? –What underlying knowledge could come from bootstrapping? As from tuple extraction from general texts

56 Rhetorical question – skip “Did you know that” “your heart” = a person’s heart (anatomy context) “strongest muscle” [in same body] (anatomy context) Build in pragmatics of reading for an anatomy context Knowledge: basic anatomy (bootstrapped)

57 “divided into” = partitioned (word sense for anatomy) “two sides” = two compartments (anatomy/container) Knowledge: Container/compartments model (pump-primed)

58 “right side” = [of the heart] (model of left/right parts) “pumps blood” = continuous process (anatomy) “to your lungs” could mean it fills up the lungs! what is “it”? – right side, or blood, or lungs? “picks up” = metaphor for absorbs (anatomy context) Knowledge: Containers, pumps, liquids (pump-primed)

59 “left side” = [of the heart] “oxygen-soaked blood” – but a liquid is already wet – Would like a model of blood cells, soaked in oxygen (fluid) – Not provided here, so just assume blood absorbed oxygen – Resolves previous sentence: pronoun “it” = blood Knowledge: model of left/right parts (pump-primed) “out” - liquid flow in & out of containers (pump-primed)

60 “They” = the two sides of the heart (difficult) Rely on discourse pragmatics Knowledge: “work on their own” vs. “together as a team” Doing something alone vs. cooperating in an effort

61 “The body’s blood” = all its blood as a single blob Knowledge: “circulated through” - model of closed fluid circulation “1,000 times per day”- model of repeated events per time period

62 “five and six thousand” = 5 ≤ x ≤ 6,000? Use pragmatics to get: 5,000 ≤ x ≤ 6,000 “pumped each day” -- by which side? Or both sides? Could pose question: How much blood does a body contain? – 5 to 6 quarts (inference needed) Knowledge: Fluid flow, iteration, time periods
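
The literal and pragmatic readings of "five and six thousand" can be contrasted in a short Python sketch. The heuristic shown (share the scale word of the upper bound with an unscaled lower bound) is an assumption made for illustration, and the function names and the small vocabulary are hypothetical.

```python
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
SCALES = {"hundred": 100, "thousand": 1000, "million": 1_000_000}

def parse_quantity(words):
    """Return (value, scale) for phrases such as 'six thousand' or 'five'."""
    value, scale = 0, 1
    for word in words.split():
        if word in UNITS:
            value = UNITS[word]
        elif word in SCALES:
            scale = SCALES[word]
        else:
            raise ValueError(f"unknown number word: {word}")
    return value, scale

def read_range(phrase):
    """Return (literal, pragmatic) readings of '<low> and <high>'."""
    low_text, high_text = phrase.split(" and ")
    low, low_scale = parse_quantity(low_text)
    high, high_scale = parse_quantity(high_text)
    literal = (low * low_scale, high * high_scale)
    if low_scale == 1 and high_scale > 1:
        # Pragmatic reading: the scale word applies to both bounds
        pragmatic = (low * high_scale, high * high_scale)
    else:
        pragmatic = literal
    return literal, pragmatic

LITERAL, PRAGMATIC = read_range("five and six thousand")
```

Here `LITERAL` is the reading 5 ≤ x ≤ 6,000 and `PRAGMATIC` is the intended 5,000 ≤ x ≤ 6,000.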

63 “your fist” -- interesting object, involves a pose Knowledge: “about the same size as” – model of comparative sizes

64 Summary of Biology Semantics Pragmatics for an anatomy context Pump-primed models: –Container & compartments & left/right parts –Continuously repeated biological events –Pumps & liquids & closed circulation –Working together vs. alone –Body parts in poses & comparative sizes Bootstrapped models: –Basic anatomy Some difficult pronoun resolutions

65 Grade-School Biology Conclusions Lots of pump-primed knowledge needed Bootstrapped knowledge can help Even grade-school texts have significant challenges Pragmatics need to be built in to NLP engine Is still substantially easier than AP chemistry!

66 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

67 Dimensions of Difficulty [Chart: Complexity of Knowledge vs. Educational Level of Text (Elementary, Grade-school, College), with Grade-school biology and AP Chemistry plotted]

68 Two Dimensions of Difficulty Dimension 1: Domain Chemistry (hardest) –Algebraic manipulation, chaining, procedures –Not so much “common sense” Physics –Map situations onto a few equations Biology (easiest) –Memorize and compare structures and functions

69 Two Dimensions of Difficulty Dimension 2: Educational Level College level (hardest): –Sophisticated writing styles –Often includes mathematical abstractions –Attempts to challenge the student –Problem-solving Grade-school level (easier): –Simpler sentence structures –Teaches common world knowledge –No/little mathematics –Learning basic facts

70 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

71 II: Integrating Knowledge Knowledge Integration Introspection Natural Language Processing Test Generation This seedling

72 Knowledge Integration: Principles for an Extensible KB The Halo KB was not easily extensible What should it have looked like?

73 Five Principles for an Extensible KB 1. Need Metonymy-Tolerant Repns The precision that logic requires of our written representations is a fundamental barrier to robustness IF “the acid on the left” is stronger than “the acid on the right” THEN the reaction direction is “to the right” “the acid denoted by the formula on the left side of the equation of the reaction” Alternative: –Preserve metonymy in the KB –Have it resolved at reasoning time

74 1. Metonymy-Tolerant Repns (cont) (every Compare-Relative-Strengths-of-Acids has (output ((if (((the1 of (the value of (the intensity of (the Acid-Role plays of (the first of (the input of Self)))))) = *strong) and ((the1 of (the value of (the intensity of (the Acid-Role plays of (the second of (the input of Self)))))) /= *strong)) then (the first of (the input of Self))))) If we had a metonymy-tolerant reasoner, we could instead write… (every Compare-Relative-Strengths-of-Acids has (output ((if ((the intensity of (the first of (the Chemicals)) = *strong) and ((the intensity of (the second of (the Chemicals)) /= *strong) then (the strongest of (the Chemicals)) = (the first of (the Chemicals)))))

75 1. Metonymy-tolerance: Need Background Knowledge! Mixing chemical, molecular, and formula views Need background knowledge to untangle the mess basic-unit “HC2H3O2(aq) + … C2H3O2-” formula Note the fluidity of reference in written English!!!
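
One way to picture "preserve metonymy in the KB, resolve it at reasoning time" is a lookup that follows links between the formula, molecule, and substance views until it reaches an object that actually carries the requested property. The Python sketch below is an assumption-laden illustration: the `LINKS` table, the `kind:name` identifiers, and the function names are hypothetical and are not KM or Halo KB structures.

```python
# Hypothetical background knowledge linking the three "views" of a chemical
LINKS = {
    "formula:HCl": "molecule:HCl",    # a formula denotes a molecule
    "molecule:HCl": "substance:HCl",  # a molecule is the basic unit of a substance
    "formula:HF": "molecule:HF",
    "molecule:HF": "substance:HF",
}

# Intensity is only asserted at the substance level
INTENSITY = {"substance:HCl": "strong", "substance:HF": "weak"}

def coerce(ref, has_property):
    """Follow metonymy links from ref until has_property(ref) holds."""
    seen = set()
    while ref is not None and ref not in seen:
        if has_property(ref):
            return ref
        seen.add(ref)
        ref = LINKS.get(ref)
    return None

def intensity(ref):
    """Intensity of whatever ref loosely refers to, or None if unknown."""
    target = coerce(ref, lambda r: r in INTENSITY)
    return INTENSITY.get(target)

def stronger_acid(first, second):
    """Mirrors the simplified rule: first is *strong and second is not."""
    if intensity(first) == "strong" and intensity(second) != "strong":
        return first
    return None
```

With this arrangement the rule can be written against loose references, for example `stronger_acid("formula:HCl", "molecule:HF")`, and the coercion happens when the rule is evaluated.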

76 2. Need to Separate Declarative and Procedural Knowledge Procedural (Conjugate-Acid calculation): input: a Base-Chemical; output: convert Chemical → Molecule → Formula, append “H”, then → Molecule’ → Acid-Chemical Declarative: Acid-Chemical = Base-Chemical + H, plus a constraint reasoner to solve constraints

77 2. Need to Separate Declarative and Procedural Knowledge (cont) “Every acid has a conjugate base, formed by removing a proton from the acid.... Similarly, every base has associated with it a conjugate acid, formed by adding a proton to the base.” Acid-Chemical = Base-Chemical + H The English text often doesn’t help…
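
The benefit of the declarative form is that the single relation Acid-Chemical = Base-Chemical + H can be solved in either direction. The Python sketch below illustrates this under simplifying assumptions: a chemical is represented only by its atom counts and net charge, and `species` and `solve_conjugate_pair` are hypothetical names rather than anything in the Halo KB.

```python
from collections import Counter

def species(atoms, charge=0):
    """A chemical as (atom counts, net charge); a deliberately crude model."""
    return (Counter(atoms), charge)

PROTON = species({"H": 1}, +1)

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def subtract(a, b):
    atoms = Counter(a[0])
    atoms.subtract(b[0])
    if any(count < 0 for count in atoms.values()):
        raise ValueError("no proton available to remove")
    return (+atoms, a[1] - b[1])  # unary plus drops zero counts

def solve_conjugate_pair(acid=None, base=None):
    """Solve the constraint acid = base + proton for the missing side."""
    if acid is None and base is not None:
        return add(base, PROTON), base
    if base is None and acid is not None:
        return acid, subtract(acid, PROTON)
    raise ValueError("give exactly one of acid or base")

WATER = species({"H": 2, "O": 1})
HYDRONIUM = species({"H": 3, "O": 1}, +1)
```

The same function answers both "what is the conjugate acid of water?" and "what is the conjugate base of hydronium?", whereas the procedural version on the slide only runs from base to acid.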

78 3. Syntactic Organization Matters! Elaboration tolerance: –Add/modify knowledge (semantics) by (only) adding formulae (syntactics) (every Acid-Role has (intensity ( (a Intensity-Value with (value ( (:pair ;; Case statement for Acids. (if ((the played-by of Self) isa Ionic-Compound-Substance) then (if (((the played-by of Self) isa HCl-Substance) or ((the played-by of Self) isa HBr-Substance) or ((the played-by of Self) isa HI-Substance) or ((the played-by of Self) isa HClO3-Substance) or ((the played-by of Self) isa HClO4-Substance) or ((the played-by of Self) isa H2SO4-Substance) or ((the played-by of Self) isa HNO3-Substance)) then *strong else Not elaboration-tolerant

79 3. Syntactic Organization Matters! Better…. intensity(HCl-Substance, *strong) intensity(HBr-Substance, *strong) intensity(HI-Substance, *strong) intensity(HClO3-Substance, *strong) intensity(HClO4-Substance, *strong) intensity(H2SO4-Substance, *strong) intensity(HNO3-Substance, *strong) … intensity(HF-Substance, *weak) intensity(HC2H3O2-Substance, *weak) intensity(H2CO3-Substance, *weak) … Elaboration-tolerant
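
The contrast between the nested case statement and the flat assertions can be sketched in Python: when strengths are stored as independent facts, adding knowledge means adding a row and never editing a rule. The fact list below copies a few entries from the slide; the lookup function and the added HCN row are assumptions for illustration.

```python
# A few of the intensity assertions from the slide, one fact per row
INTENSITY_FACTS = [
    ("HCl-Substance", "strong"),
    ("HNO3-Substance", "strong"),
    ("HF-Substance", "weak"),
    ("H2CO3-Substance", "weak"),
]

def intensity_of(substance, facts):
    """Generic lookup; it never needs editing when new facts arrive."""
    for name, value in facts:
        if name == substance:
            return value
    return None  # unknown, rather than silently falling into an else branch

# Elaboration: new knowledge is a new row (hypothetical example)
EXTENDED_FACTS = INTENSITY_FACTS + [("HCN-Substance", "weak")]
```

In the case-statement version, the same extension would require locating the right branch of the nested `if` and modifying it.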

81 4. Use a linguistically motivated ontology Key: mapping from English words/phrases to knowledge-base concepts Good: Words and concepts match easily: –HCl-Substance ↔ “HCl” Less good: Linguistic concepts are missing –*strong/*weak/*negligible ↔ “HCl is stronger than H2O” Even worse: Different conceptual view in the KB –Direction of equilibrium: Attached to reaction, not eqn, in KB

82 5. Need Error-Tolerant Reasoning KM can go belly-up with a contradiction Rather need to detect and correct contradictions –Detect: explore (ruminate), not just myopic backchaining richer background knowledge –Correct: reasoner supports suspension of assumptions/rules (TMS?) search mechanism to control this
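
The detect-and-correct idea can be sketched in Python as a loop that looks for two assertions giving the same slot different values and suspends one that is marked as retractable. This is far simpler than a real truth maintenance system and is not how KM works; the data layout, the function names, and the sample assertions are assumptions for illustration.

```python
def find_contradiction(assertions):
    """Return the names of two assertions that give one slot different values."""
    seen = {}
    for name, (slot, value) in assertions.items():
        if slot in seen and seen[slot][1] != value:
            return seen[slot][0], name
        seen.setdefault(slot, (name, value))
    return None

def repair(assertions, retractable):
    """Suspend retractable assertions until no contradiction remains."""
    active = dict(assertions)
    suspended = []
    while (clash := find_contradiction(active)) is not None:
        candidates = [name for name in clash if name in retractable]
        if not candidates:
            raise ValueError(f"contradiction between firm assertions: {clash}")
        victim = candidates[-1]
        suspended.append(victim)
        del active[victim]
    return active, suspended

# Hypothetical example: a newly read sentence conflicts with the existing KB
ASSERTIONS = {
    "kb1": (("HF", "intensity"), "weak"),
    "text1": (("HF", "intensity"), "strong"),
    "kb2": (("HCl", "intensity"), "strong"),
}
ACTIVE, SUSPENDED = repair(ASSERTIONS, retractable={"text1"})
```

Choosing which assumption to suspend is the search problem the slide mentions; here the choice is hard-coded by the `retractable` set.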

83 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

84 Knowledge Mining Schubert’s Conjecture: There is a largely untapped source of general knowledge in texts, lying at a level beneath the explicit assertional content, and which can be harnessed. “The camouflaged helicopter landed near the embassy.” → helicopters can land → helicopters can be camouflaged Our attempt: “lightweight” LFs generated from Reuters LF forms: (S subject verb object (prep noun) (prep noun) …) (NN noun … noun) (AN adj noun)

85 Knowledge Mining HUTCHINSON SEES HIGHER PAYOUT. HONG KONG. Mar 2. Li said Hong Kong’s property market remains strong while its economy is performing better than forecast. Hong Kong Electric reorganized and will spin off its non-electricity related activities. Hongkong Electric shareholders will receive one share in the new subsidiary for every owned share in the sold company. Li said the decision to spin off … Newswire Article Shareholders may receive shares. Companies may be sold. Shares may be owned. Implicit, tacit knowledge

86 Knowledge Mining – our attempt ;; Atoms can combine (S "atom" "combine") ;; For example, combustion reactions are redox reactions because elemental oxygen is converted to compounds of oxygen (Section 3.2). (S "reaction" "be" "reaction") (S-ADJ "oxygen" "converted" ("to" "compound")) (AN "elemental" "oxygen") ;; Plan: Metals react with acids to form salts and gas. (S "metal" "react" (PP "with" "acid")) ;; Extensive oxidation can lead to the failure of metal machinery parts or the deterioration of metal structures. (S "oxidation" "lead" (PP "to" "failure")) (S "oxidation" "lead" (PP "to" "deterioration")) (AN "extensive" "oxidation") Fragment of the raw data (Brown & Lemay)
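
The step from mined tuples to general statements such as "helicopters can land" can be sketched in Python. The tuples are written as Python tuples in the spirit of the S and AN forms above; the helicopter tuples are a guess at what the LF for that sentence might look like, pluralization is naive (append "s"), and `generalize` is a hypothetical name.

```python
def generalize(lf):
    """Turn one lightweight LF into a possibility statement, or None."""
    kind = lf[0]
    if kind == "S":
        subject, verb = lf[1], lf[2]
        return f"{subject}s can {verb}"
    if kind == "AN":
        adjective, noun = lf[1], lf[2]
        return f"{noun}s can be {adjective}"
    return None  # other forms (e.g. NN) are ignored in this sketch

MINED = [
    ("S", "helicopter", "land", ("PP", "near", "embassy")),
    ("AN", "camouflaged", "helicopter"),
    ("S", "atom", "combine"),
    ("NN", "metal", "structure"),
]

STATEMENTS = [s for s in map(generalize, MINED) if s is not None]
```

A fuller version would also need to handle mass nouns, irregular plurals, and prepositional phrases, and would likely count how often each tuple occurs before trusting it.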

87

88 Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary

89 Summary: Overall Findings and Products CPL: two formulations –"naive CPL": 275 sentences –rule-language CPL: ~15 complex rules CPL language interpretation algorithm Understanding Language –Characterization and quantification of the main challenges –Detailed case studies on the five pages Integrating Knowledge –Characterization of the main challenges –Set of principles for overcoming them –Study and algorithms for some of them Bridging the Gap: Useful conceptual framework Text Mining –2 tuple databases: 15k chemistry, 25k biology

90 Summary: Recommendations for Mobius Significant work needed on –math/symbol manipulation –handling generics –idiomatic words/phrases –Loosespeak Cycle, not just bottom-up/top-down! Discourse structure needs to be taken seriously –Not just individual sentences Need some radical KB changes –extensible units of knowledge, not intertwined structures –Error-tolerant/Robust reasoning

