CHEMISTRY STUDIO: AN INTELLIGENT TUTORING SYSTEM Ankit Kumar, Abhishek Kar, Ashish Gupta, Akshay Mittal Mentors: Dr. Sumit Gulwani (MSR, Redmond) Dr. Ashish Tiwari (SRI Intl.) Dr. Amey Karkare (IIT Kanpur)
Introduction Aim to build an intelligent tutoring system for the Periodic Table domain (Chemistry). Targeted at solving problems by emulating the thought processes and lines of reasoning employed by students. Much more than a problem solver: it aids learning by generating hints and intelligent problems.
System Overview System divided into two components. Natural Language Component: translates natural language input to an intermediate logical representation. Problem Solving Component: solves problems and generates hints and new problems of graded difficulty. More info: Problem Solving team
Natural Language Component [Pipeline diagram; labels: Input Problem, Lexer, Tokens, Option Parsing, Terms in logic, Parser Tier 1, Domain information, Parser Tier 2, Full logical representation]
An Example - Lexer Which element in group 2 has the maximum metallic property? i) Be ii) Mg iii) Ca iv) Sr. Tokens: Group, 2, Max, MetallicProperty
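The lexer step above can be sketched as a greedy longest-match lookup. This is only an illustration: the `LEXICON` table and the `lex` function are hypothetical names, and the slides do not say how the actual lexer (or its option parsing) is implemented.

```python
import re

# Hypothetical phrase lexicon (assumption): surface phrase -> logical token.
LEXICON = {
    "group": "Group",
    "maximum": "Max",
    "metallic property": "MetallicProperty",
    "metallic character": "MetallicProperty",
}


def lex(question):
    """Greedy longest-match lexer; words with no lexicon entry are skipped."""
    words = re.findall(r"[a-z]+|\d+", question.lower())
    max_len = max(len(phrase.split()) for phrase in LEXICON)
    tokens, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                tokens.append(LEXICON[phrase])
                i += n
                break
            if n == 1 and phrase.isdigit():
                tokens.append(phrase)  # numbers pass through as tokens
                i += 1
                break
        else:
            i += 1  # no match at this position: skip the word
    return tokens


print(lex("Which element in group 2 has the maximum metallic property?"))
```

Both wordings on the slide ("metallic property" and "metallic character") map to the same token in this sketch, which is why they are listed as separate lexicon entries.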
Parser – Tier 1 Input tokens: Group, 2, Max, MetallicProperty. [Tree diagram; node labels in order: Same, Group, 2, Hole, $1, Max, Hole, MetallicProperty]
Parsing Tier 2 [Tree diagrams; node labels in order: Max, Hole, Same, Group, 2, Hole, Max, MetallicProperty, Same, Group, 2, $1, MetallicProperty, $1]
Introduction of Variables Implicit introduction of free variables is needed to formulate a valid logical formula. Example: Alkali metals belong to Group 1. Intelligently guess when a variable is required. Two situations: (1) A hole of type elem is present and is not satisfied by any token in the unused list (even after replication). (2) A hole of type elem is present, no tokens are left in the unused list, and no replicated original token satisfies it. In both cases: introduce a new variable!
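A minimal sketch of the variable-introduction rule above, under simplifying assumptions: holes and tokens carry a single type label, and the function name `fill_holes` and its list-based interface are hypothetical, not taken from the system.

```python
from itertools import count


def fill_holes(holes, unused, replicable):
    """Fill typed holes in order.

    holes: list of hole types, e.g. ["elem", "int"].
    unused: list of (token, type) pairs not yet placed in the tree.
    replicable: list of (token, type) pairs already used that may be copied.
    A fresh variable ($1, $2, ...) is introduced for an 'elem' hole that
    neither list can satisfy; it then becomes replicable itself.
    """
    fresh = count(1)
    unused = list(unused)
    replicable = list(replicable)
    fillers = []
    for hole_type in holes:
        match = next((t for t in unused if t[1] == hole_type), None)
        if match is not None:
            unused.remove(match)
            fillers.append(match[0])
            continue
        match = next((t for t in replicable if t[1] == hole_type), None)
        if match is not None:
            fillers.append(match[0])
        elif hole_type == "elem":
            var = f"${next(fresh)}"
            replicable.append((var, "elem"))
            fillers.append(var)
        else:
            fillers.append(None)  # hole stays unfilled
    return fillers


# "Alkali metals belong to Group 1": two elem holes (one under AlkaliMetal,
# one under Group) and one int hole; only the token "1" is available.
print(fill_holes(["elem", "elem", "int"], [("1", "int")], []))
```

In this example no token of type elem exists, so a single variable is introduced and then reused for the second hole.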
Handling Quantifiers Universal Quantifiers: General scheme - Existential Quantifiers: General scheme - Assumptions: Quantification over a single variable No nesting of quantifiers
Universal Quantification Problems: finding the position of the implication; finding the antecedent and consequent. Example: Alkali metals show metallic character. Solution: ForAll($1, Implies(AlkaliMetal($1), Metallic($1))). Position of implication ≈ position of verb. Deciding the antecedent and consequent is more complicated.
ForAll Resolution Algorithm Active vs. Passive Voice (Stanford CoreNLP). Active: Alkali metals show metallic character. Passive: Metallic character is shown by alkali metals. Both have the same translation!
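One way to picture the active/passive point is the sketch below. It assumes, as a simplification not stated on the slides, that the grammatical agent supplies the antecedent: the phrase before the verb in active voice, the phrase after it in passive voice. The `passive` flag is assumed to come from an external parser such as Stanford CoreNLP, and `forall_formula` is a hypothetical name.

```python
def forall_formula(before_verb, after_verb, passive):
    """Build ForAll($1, Implies(antecedent($1), consequent($1))).

    Assumption: the agent of the sentence is the antecedent, so the two
    predicates swap roles when the sentence is in passive voice.
    """
    if passive:
        antecedent, consequent = after_verb, before_verb
    else:
        antecedent, consequent = before_verb, after_verb
    return f"ForAll($1, Implies({antecedent}($1), {consequent}($1)))"


# "Alkali metals show metallic character"
active = forall_formula("AlkaliMetal", "Metallic", passive=False)
# "Metallic character is shown by alkali metals"
passive = forall_formula("Metallic", "AlkaliMetal", passive=True)
print(active)
print(active == passive)
```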
Assertion Based Questions Assert facts, pose questions, and span multiple sentences. Example: An element A forms a covalent bond with oxygen. It has high electronegativity and belongs to group 13. What is its atomic number? Problem: anaphora resolution! Solution: use Stanford CoreNLP to get the coreference graph.
Assertion Based Questions Method for translating assertion based questions: (1) Construct the logical formula corresponding to each sentence independently. (2) Use the coreference graph to find variables referring to the same entity. (3) Construct the formula A₁(x) ∧ A₂(x) ∧ … ∧ Aₙ(x), where Aᵢ(x) is the logical formula of the i-th sentence. (4) Quantify over the free variable(s). Questions typically ask about a single entity, so existential quantification suffices.
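The translation method above can be sketched with formulas as nested tuples. The predicate names (`CovalentBondWith`, `HighElectronegativity`) and the helper names are hypothetical, and the coreference clusters are written by hand here rather than obtained from Stanford CoreNLP.

```python
def rename(term, mapping):
    """Recursively replace variable names according to mapping."""
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return mapping.get(term, term)


def combine(formulas, clusters):
    """Conjoin per-sentence formulas and existentially quantify.

    formulas: one nested-tuple formula per sentence.
    clusters: list of sets of variable names that corefer.
    """
    mapping = {}
    for k, cluster in enumerate(clusters, start=1):
        for var in cluster:
            mapping[var] = f"${k}"
    body = rename(formulas[0], mapping)
    for formula in formulas[1:]:
        body = ("And", body, rename(formula, mapping))
    for k in range(len(clusters), 0, -1):
        body = ("Exists", f"${k}", body)
    return body


# "An element A forms covalent bond with oxygen. It has high
# electronegativity and belongs to group 13."
sentences = [
    ("CovalentBondWith", "a", "Oxygen"),
    ("And", ("HighElectronegativity", "it"), ("Same", ("Group", "it"), "13")),
]
result = combine(sentences, [{"a", "it"}])
print(result)
```

The question sentence ("What is its atomic number?") is left out of this sketch; only the asserted facts are combined.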
Negations Non-: Which of the following non-metals is a gas at STP? Couple non with the predicate immediately next to it: And(IsGasAtSTP($1), Not(Metallic($1))). Not: Not all alkali metals form basic oxides. Negate the statement to the right of not: Not(ForAll($1, Implies(AlkaliMetal($1), BasicOxide($1)))).
Negations No: No halogen is metallic in nature. Natural interpretation of no as “there does not exist”: Not(Exists($1, And(Halogen($1), Metallic($1))))
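To check that the "no" translation behaves as expected, the sketch below evaluates it over a toy three-element domain and compares it with the equivalent universal form. The evaluator, the tuple encoding, and the tiny fact table are illustrative assumptions and not part of the system described.

```python
# Toy domain with hand-picked facts (illustrative only).
ELEMENTS = {
    "F": {"Halogen": True, "Metallic": False},
    "Cl": {"Halogen": True, "Metallic": False},
    "Na": {"Halogen": False, "Metallic": True},
}


def evaluate(formula, env=None):
    """Evaluate a nested-tuple formula over ELEMENTS."""
    env = env or {}
    op, *args = formula
    if op == "Not":
        return not evaluate(args[0], env)
    if op == "And":
        return all(evaluate(a, env) for a in args)
    if op == "Implies":
        return (not evaluate(args[0], env)) or evaluate(args[1], env)
    if op == "Exists":
        var, body = args
        return any(evaluate(body, {**env, var: e}) for e in ELEMENTS)
    if op == "ForAll":
        var, body = args
        return all(evaluate(body, {**env, var: e}) for e in ELEMENTS)
    # Atomic predicate such as ("Halogen", "$1").
    return ELEMENTS[env[args[0]]][op]


# "No halogen is metallic in nature."
no_form = ("Not", ("Exists", "$1", ("And", ("Halogen", "$1"), ("Metallic", "$1"))))
# Logically equivalent universal form.
forall_form = ("ForAll", "$1", ("Implies", ("Halogen", "$1"), ("Not", ("Metallic", "$1"))))
print(evaluate(no_form), evaluate(forall_form))
```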
Ranking Algorithm Need to rank the different representation trees generated. Heuristics: greater cover; greater confidence (higher confidence for filling a hole with a token closer to its parent in the English sentence). Penalize when: tokens are replicated (larger tokens, more penalization); handcrafted tokens (And, Or, Implies) are inserted; tokens are left unused (greater proportion of unused tokens, more penalization).
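The heuristics above could be combined into a single score along the following lines. The `Candidate` fields, the linear form of the score, and all weights are assumptions chosen for illustration; the slides do not give the actual scoring function.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    formula: str
    total_tokens: int          # tokens produced by the lexer
    unused_tokens: int         # tokens left out of the tree
    replicated_sizes: tuple    # size of each replicated token
    inserted_connectives: int  # handcrafted And/Or/Implies nodes
    hole_distances: tuple      # word distance from each filler to its parent


def score(c, w_unused=2.0, w_repl=0.5, w_insert=1.0, w_dist=0.1):
    """Higher is better: reward cover, penalize the listed situations."""
    unused_ratio = c.unused_tokens / c.total_tokens
    cover = 1.0 - unused_ratio
    penalty = (
        w_unused * unused_ratio
        + w_repl * sum(c.replicated_sizes)
        + w_insert * c.inserted_connectives
        + w_dist * sum(c.hole_distances)
    )
    return cover - penalty


def rank(candidates):
    """Sort candidate trees from best to worst score."""
    return sorted(candidates, key=score, reverse=True)


good = Candidate("A", 4, 0, (), 0, (1, 1))
poor = Candidate("B", 4, 2, (2,), 1, (3,))
print([c.formula for c in rank([poor, good])])
```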
Evaluation Currently able to solve 70 of the 126 problems (about 56%) collected from the Tata McGraw Hill textbook for Grade XI. More problems can be solved by modeling more chemistry-specific predicates; this just corresponds to adding domain knowledge to our system. Another evaluation metric could be the ratio of the number of rules encoded to the size of the corpus of problems solved. We encode 173 predicates/entities/functions in our algorithm (of which 118 are names of elements).
Conclusions While contemporary works focus on analyzing languages by learning, we hypothesize that for a simpler structured domain like Chemistry, a much simpler type-theoretic approach armed with some heuristics observed from the domain can achieve similar, if not better, success. During the later phase of the project, we tried to use some techniques of learning to improve upon our system and were successful in doing so. In conclusion, we feel that a combination of such a type-theoretic approach and the standard machine learning techniques can achieve good success for a well structured domain like Chemistry.
Future Work Disambiguate At, As, In (names of elements). Recognize 1 = 1st = first (Stanford CoreNLP NER tool). Treat And(And(x,y),z) = And(And(x,z),y). Model electronic configuration. Better modelling of conjunctions, e.g. “Alkali metals belong to group 1 and are metallic in nature”.
Stanford CoreNLP Collection of commonly used NLP tools: POS tagging, parsing, coreference analysis, NER. Problem: integrating the Java package with C#. The command line interface is slow because it needs to load large data models (17 secs per question!). Solution: query the online demo and get an XML response. http://nlp.stanford.edu:8080/corenlp/
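Once an XML response is available, the coreference chains can be pulled out with a standard XML parser. The sketch below parses a hand-written sample that is only assumed to resemble CoreNLP's XML output (nested `coreference` elements containing `mention` entries with `sentence` and `head` fields); the exact schema should be checked against the CoreNLP version in use, and no network call is made here.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (assumption), not a real response from the demo.
SAMPLE = """<root><document><coreference>
  <coreference>
    <mention representative="true">
      <sentence>1</sentence><start>1</start><end>4</end><head>2</head>
    </mention>
    <mention>
      <sentence>2</sentence><start>1</start><end>2</end><head>1</head>
    </mention>
    <mention>
      <sentence>3</sentence><start>3</start><end>4</end><head>3</head>
    </mention>
  </coreference>
</coreference></document></root>"""


def coref_clusters(xml_text):
    """Return one list of (sentence, head) pairs per coreference chain."""
    root = ET.fromstring(xml_text)
    clusters = []
    for chain in root.findall("./document/coreference/coreference"):
        mentions = [
            (int(m.findtext("sentence")), int(m.findtext("head")))
            for m in chain.findall("mention")
        ]
        clusters.append(mentions)
    return clusters


print(coref_clusters(SAMPLE))
```

Each cluster lists the sentence index and head-token index of every mention, which is enough to decide which per-sentence variables refer to the same entity.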