Slide 1 Project Halo: Towards a Digital Aristotle, April 30th, 2003, Noah S. Friedland, PhD


1 Slide 1 Project Halo: Towards a Digital Aristotle, April 30th, 2003, Noah S. Friedland, PhD

2 Slide 2 Overview
Project Halo is a staged research and development effort towards a digital Aristotle: an application capable of providing user-appropriate answers and justifications for questions in an ever-growing number of domains

3 Slide 3 Step One: The Halo Pilot
Goals:
- To investigate the state of the art in knowledge representation and reasoning (KRR), especially “deep reasoning”
- To identify leaders in the field and to get to know them well
- To come up to speed quickly on the algorithmic and technical issues critical for good program management
- To establish and “test run” an evaluation methodology
- To determine a roadmap for possible future research
Complete scientific transparency within the program and ultimately with the entire scientific community
Limited/tight timeframe (6 months)

4 Slide 4 Domain/Syllabus Selection
- 50 pages from the AP chemistry syllabus (stoichiometry, reactions in aqueous solutions, acid-base equilibria)
- Small and self-contained enough to be doable in a short period of time
- Large enough to create many novel questions
- Complex “deep” combinations of rules
- Standardized exam with well-understood scores (AP1-AP5)
- Chemistry is an exact science, more “monotonic”
- No undue reliance on graphics (no free-body diagrams)
- Availability of experts for exam generation and grading

5 Slide 5 Team Selection
- Selective Call for Proposals (CFP)
- Solid track record of relevant .gov and industrial funding
- Significant number of man-years invested in existing relevant technology
- World-class team
- Responsiveness of proposal to the CFP
- Bid within guidelines and expectations
- Ability to work within the Project Halo contractual environment
- Funded teams: Cycorp, SRI and Ontoprise

6 Slide 6 Climbing a Steep Hill
- Vulcan had little background in question answering prior to Project Halo
- Hundreds of hours were dedicated to three rounds of training:
  - General primer in AI
  - Algorithmic training from each team
  - Tools and admin training from each team
- At the end, Vulcan was capable of encoding questions using each team’s formal language

7 Slide 7 The Challenge
- Each team had four months to develop its chemistry question-answering application
- At the end of this time, the systems were sequestered and the exam was released
- Each team had two weeks to create formal encodings of the 100 questions (169 total subparts) in three sections (MC, DA, FF)
- Formal encodings were evaluated by committee for fidelity to the original English
- Encoded questions were run in batch on the sequestered systems, generating answers and justifications in English
- These answers were distributed to three subject-matter experts (SMEs) for grading

8 Slide 8 The Systems: No NLP In The Pilot
[Diagram. Recoverable labels: English, NLP, FL (formal language), QA System, English Answer & Justification]

9 Slide 9 Metrics
- “Coverage”: the ability of the system to answer novel questions from the entire specified syllabus
  - What percentage of the question types was the system capable of reliably answering?
- “Justification”: the ability to provide concise, user- and domain-appropriate explanations
  - What percentage of the answer justifications was acceptable to domain evaluators?
- “Query encoding”: the ability to robustly represent queries
  - Were the encoded questions faithful to the original English?
  - How sensitive were the systems to these encodings?
- “Brittleness”: the ability to describe, measure and defeat major sources of brittleness
  - What were the major causes of failure?
  - How can these be remedied?

10 Slide 10 Examples of Question Encodings
MC2. When lithium metal is reacted with nitrogen gas, under proper conditions, the product is:
(a) no reaction occurs
(b) LiN
(c) Li2N
(d) Li3N
(e) LiN3

11 Slide 11 F-logic Encoding
Encoded question:
  m1:Reaction[hasReactants->>{"Li","N"};enforced->>TRUE].
  answer("A") >X] and not equal(X,"LiN") and not equal(X,"Li2N") and not equal(X,"Li3N") and not equal(X,"LiN3").
  answer("B") >"LiN"].
  answer("C") >"Li2N"].
  answer("D") >"Li3N"].
  answer("E") >"LiN3"].
  FORALL X <- answer(X)

12 Slide 12 KM Encoding (every QF2 has (context ((a Reaction with (raw-material ((a Chemical with (has-basic-structural-unit (((a Metal) & (an instance of (the output of (a Compute-Element-from-Name with (input ("Lithium"))))))))) (a Chemical with (has-basic-structural-unit (((a Molecular-Compound with (has-chemical-formula ((a Chemical-Formula with (term ((:seq (:pair 2 N))))))))))) (state ((a State-Value with (value (*gas)))))))))))

13 Slide 13 KM Encoding (Cont) (output ((forall (the atomic-chemical-formula of (the has-basic-structural-unit of (the result of (the context of Self)))) (if ((the elements of (the term of It)) = (:set (:pair 1 Li) (:pair 1 N))) then "(b) LiN" else (if ((the elements of (the term of It)) = (:set (:pair 2 Li) (:pair 1 N))) then "(c) Li2N" else (if ((the elements of (the term of It)) = (:set (:pair 3 Li) (:pair 1 N))) then "(d) Li3N" else (if ((the elements of (the term of It)) = (:set (:pair 1 Li) (:pair 3 N))) then "(e) LiN3" else "(a) no reaction occurs")))) (comm [QF2-output-1] Self)))))

14 Slide 14 CYCL Encoding (implies (and (chemicalReactants-TypeType ?REACTION Nitrogen) (chemicalReactants-TypeType ?REACTION (ElementalSubstanceFn Lithium)) (ionicDecomposition ?LI2N LithiumIon 2 NitrideIon 1) (ionicDecomposition ?LI3N LithiumIon 3 NitrideIon 1) (ionicDecomposition ?LIN LithiumIon 1 NitrideIon 1) (ionicDecomposition ?LIN3 LithiumIon 1 NitrideIon 3)) (thereExists ?COMPOUND (thereExists ?LI-NUM (thereExists ?N-NUM (thereExists ?LI-CHARGE (thereExists ?N-CHARGE (and

15 Slide 15 CYCL Encoding (Cont.) (relationAllInstance chargeOfObject LithiumIon (ElectronicCharge ?LI-CHARGE)) (relationAllInstance chargeOfObject NitrideIon (ElectronicCharge ?N-CHARGE)) (ionicDecomposition ?COMPOUND LithiumIon ?LI-NUM NitrideIon ?N-NUM) (evaluate 0 (PlusFn (TimesFn ?LI-CHARGE ?LI-NUM) (TimesFn ?N-CHARGE ?N-NUM))) (goodChoiceAmongSentences ?ANSWER (TheList (not (thereExists ?REACTION-2 (and (chemicalReactants-TypeType ?REACTION-2 (GaseousFn Nitrogen)) (chemicalReactants-TypeType ?REACTION-2 (ElementalSubstanceFn Lithium))))) (equals ?COMPOUND ?LIN) (equals ?COMPOUND ?LI2N) (equals ?COMPOUND ?LI3N) (equals ?COMPOUND ?LIN3))))))))))
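The CycL encoding above selects the compound whose ion counts sum to zero net charge, via the `(evaluate 0 (PlusFn (TimesFn ...) (TimesFn ...)))` clause. The following is a minimal Python sketch of that one check, not a rendering of the Cycorp system. The ion charges (+1 for the lithium ion, -3 for the nitride ion) are standard chemistry values rather than figures from the slides, and the names `CANDIDATES`, `net_charge` and `neutral_candidates` are hypothetical. As in the encoding, LiN3 is treated as one lithium ion with three nitride ions.

```python
# Assumed ion charges (standard values, not stated on the slides).
LI_CHARGE = 1    # lithium ion, Li+
N_CHARGE = -3    # nitride ion, N3-

# Candidate products from question MC2 as (lithium ion count, nitride ion count),
# mirroring the ionicDecomposition facts in the CycL encoding.
CANDIDATES = {
    "LiN": (1, 1),
    "Li2N": (2, 1),
    "Li3N": (3, 1),
    "LiN3": (1, 3),
}


def net_charge(li_num: int, n_num: int) -> int:
    """Total charge of a compound built from li_num Li+ and n_num N3- ions."""
    return LI_CHARGE * li_num + N_CHARGE * n_num


def neutral_candidates() -> list[str]:
    """Return the candidates whose net charge is zero."""
    return [
        name
        for name, (li_num, n_num) in CANDIDATES.items()
        if net_charge(li_num, n_num) == 0
    ]


if __name__ == "__main__":
    print(neutral_candidates())
```

Only Li3N balances to zero under these assumptions, which corresponds to choice (d). The sketch covers the neutrality constraint only; it does not model the "no reaction occurs" alternative that the encoding also lists.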

16 Slide 16 Evaluating Encodings
- High-fidelity encodings do not add or delete relevant chemical knowledge from the original English.
- The encoding committee reviewed all encodings to verify that they were all “high fidelity.”
- A second criterion was “automatability”: the likelihood that encodings could be produced automatically from English, given today’s state of the art.

17 Slide 17 Challenge Results
- All three teams produced challenge results
- All SMEs graded all the results
- Each question part got separate grades for answers and justifications
- The possible grades for each question part were 0, 0.5 and 1 for answers, and likewise for justifications
- Graders were given guidelines to be as “AP-like” as possible
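The grading scheme above (grades of 0, 0.5 or 1, given separately for the answer and the justification of each question part by each grader) can be sketched in Python as follows. The slides do not say how grades from the three SMEs were combined; averaging across graders is an assumption made here for illustration only, and `ALLOWED_GRADES` and `part_score` are hypothetical names.

```python
from statistics import mean

# The only grades a grader may assign, per the slide.
ALLOWED_GRADES = {0.0, 0.5, 1.0}


def part_score(answer_grades, justification_grades):
    """Combine the graders' marks for one question part.

    Returns (mean answer grade, mean justification grade).
    Averaging across graders is an assumption, not the documented method.
    """
    for grade in list(answer_grades) + list(justification_grades):
        if grade not in ALLOWED_GRADES:
            raise ValueError(f"grade {grade!r} is not one of 0, 0.5, 1")
    return mean(answer_grades), mean(justification_grades)


if __name__ == "__main__":
    # Three graders, one question part (made-up example grades).
    print(part_score([1, 0.5, 1], [0.5, 0.5, 0]))
```

Keeping the answer and justification scores separate matches the slide, which reports them as distinct grades rather than a single combined mark.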

18 Slide 18 Results: MC Section
Features 50 multiple-choice questions (MC1-MC50).
MC3: Sodium azide is used in air bags to rapidly produce gas to inflate the bag. The products of the decomposition reaction are:
(a) Na and water;
(b) Ammonia and sodium metal;
(c) N2 and O2;
(d) Sodium and nitrogen gas;
(e) Sodium oxide and nitrogen gas.

19 Slide 19 MC Results

20 Slide 20 Results: DA Section
Features 25 multi-part questions (DA1-DA25)
DA1. Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
(a) C4H10 + O2 → CO2 + H2O
(b) KClO3 → KCl + O2
(c) CH3CH2OH + O2 → CO2 + H2O
(d) P4 + O2 → P2O5
(e) N2O5 + H2O → HNO3
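As an illustration of what "balanced" means in DA1, the Python sketch below counts atoms on each side of a reaction and compares the totals. It is a hand-written check, not how any of the three Halo systems worked, and the function names are hypothetical. The parser only handles simple formulas without parentheses or charges, which is enough for the reactions listed above.

```python
import re
from collections import Counter


def atom_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C4H10' (no parentheses)."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(digits) if digits else 1
    return counts


def side_counts(terms) -> Counter:
    """Total atoms on one side; terms is a list of (coefficient, formula)."""
    total = Counter()
    for coefficient, formula in terms:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products) -> bool:
    """True when every element has the same atom count on both sides."""
    return side_counts(reactants) == side_counts(products)


if __name__ == "__main__":
    # DA1(a) as written on the slide (all coefficients 1) is not balanced.
    print(is_balanced([(1, "C4H10"), (1, "O2")], [(1, "CO2"), (1, "H2O")]))
    # With coefficients 2, 13, 8, 10 the atom counts agree.
    print(is_balanced([(2, "C4H10"), (13, "O2")], [(8, "CO2"), (10, "H2O")]))
```

The same check can be applied to the other parts, for example `is_balanced([(2, "KClO3")], [(2, "KCl"), (3, "O2")])` for part (b). Classifying a reaction as combustion, decomposition or combination is outside the scope of this sketch.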

21 Slide 21 DA Results

22 Slide 22 Results: FF Section
Features 25 multi-part questions (FF1-FF25)
More qualitative, less computational
FF2. Pure water is a poor conductor of electricity, yet ordinary tap water is a good conductor. Account for this difference.

23 Slide 23 FF Results

24 Slide 24 Total Results

25 Slide 25 Grader Comments
- Organization and brevity were the two major remarks
- Some of the justifications were over 16 pages long
- Many of the arguments were used repetitively
- Proofs took a long time to “get to the point”
- In some multiple-choice cases, proofs involved invalidating all wrong answers rather than proving the right one
- Generalized proofs relied on instance-based solutions; lack of meta-reasoning capability
- Gaps in the knowledge were evident, e.g. many of the teams had issues with net ionic equations

26 Slide 26 Brittleness Analysis: SRI

27 Slide 27 Brittleness Analysis: Cycorp

28 Slide 28 Brittleness Analysis: Ontoprise

29 Slide 29 Website Mockup: Main Page

30 Slide 30 Website Mockup: Failure Explanation

31 Slide 31 Performance Analysis
SRI: 32.28 mins
CYCORP: 1512 mins
ONTOPRISE: 33.61 mins

32 Slide 32 Projections for the Next Iteration (3 Months)
Same domain and scope:
- AP-5 for multiple choice (~85%)
- AP-4 for non-multiple choice (DA & FF) (~65%)

33 Slide 33 Observations
- Per-page encoding costs O($10K) for 50 pages
- Encoding took highly expert teams 2 weeks of effort
- SRI relied most heavily on professional chemists and was the most thorough on the assembly process
- The Ontoprise platform was the fastest and most reliable (<2 hours). F-Logic was the most concise formal language
- SRI was >5 hours and Cycorp >12 hours
- Cycorp’s generative explanations were the most ambitious. Needed more domain expert feedback
- Previously stated metrics, like the number of concepts and relations, do not provide insight into coverage

34 Slide 34 Next Steps: Phase II
- Building tools to allow domain experts to encode robust knowledge
- Building tools to allow students to pose questions/problems
- Currently in pre-CFP design
- Required skills:
  - Knowledge engines
  - Knowledge acquisition (against documents)
  - HCI and human factors

35 Slide 35 Inquiries
Check out the new website: www.projecthalo.com
Contact me at: noahf@vulcan.com

