# CHEMISTRY STUDIO: AN INTELLIGENT TUTORING SYSTEM Ankit Kumar, Abhishek Kar, Ashish Gupta, Akshay Mittal Mentors: Dr. Sumit Gulwani (MSR, Redmond) Dr. Ashish.

CHEMISTRY STUDIO: AN INTELLIGENT TUTORING SYSTEM Ankit Kumar, Abhishek Kar, Ashish Gupta, Akshay Mittal Mentors: Dr. Sumit Gulwani (MSR, Redmond) Dr. Ashish Tiwari (SRI Intl.) Dr. Amey Karkare (IIT Kanpur)

Introduction  Aim to build an intelligent tutoring system targeted at the domain of Periodic Table (Chemistry)  Targeted at solving problems by emulating thought processes/lines of reasoning employed by students  Much more than a problem solver – aid learning by generating hints and intelligent problems

System Overview
The system is divided into two components:
- Natural Language Component
  - Translates natural language input to an intermediate logical representation
- Problem Solving Component
  - Solves problems, generates hints and new problems of graded difficulty
  - More info: Problem Solving team

Natural Language Component
[Pipeline diagram: Input Problem → Lexer → Tokens → Parser Tier 1 → Terms in logic → Parser Tier 2 → Full logical representation. The diagram also contains boxes labelled Option Parsing and Domain information.]

An Example - Lexer
- Input: Which element in group 2 has the maximum metallic property? i) Be ii) Mg iii) Ca iv) Sr
- The lexer consumes the question "Which element in Group 2 has the maximum metallic character?" from left to right
- Tokens produced: Group, 2, Max, MetallicProperty
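
The lexing step on this example can be sketched roughly as follows. This is only an illustration under stated assumptions: the phrase table `LEXICON` and the function `lex` are hypothetical names, not the system's actual vocabulary or code. The sketch greedily matches the longest known phrase at each position and skips words it does not recognise.

```python
import re

# Hypothetical phrase table (an assumption for illustration):
# lower-cased English phrases mapped to logical tokens.
LEXICON = {
    ("group",): "Group",
    ("maximum",): "Max",
    ("metallic", "character"): "MetallicProperty",
    ("metallic", "property"): "MetallicProperty",
}
MAX_PHRASE_LEN = max(len(key) for key in LEXICON)


def lex(question):
    """Turn an English question into a flat list of logical tokens."""
    words = re.findall(r"[a-z]+|\d+", question.lower())
    tokens = []
    i = 0
    while i < len(words):
        if words[i].isdigit():
            # Numbers are kept as tokens of their own.
            tokens.append(words[i])
            i += 1
            continue
        # Try the longest phrase first, then shorter ones.
        for n in range(MAX_PHRASE_LEN, 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in LEXICON:
                tokens.append(LEXICON[phrase])
                i += len(phrase)
                break
        else:
            i += 1  # unknown word: skip it
    return tokens


print(lex("Which element in Group 2 has the maximum metallic character?"))
```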

Parser – Tier 1
- Input tokens: Group, 2, Max, MetallicProperty
- [Tree diagram; node labels: Same, Group, 2, Hole, \$1, Max, Hole, MetallicProperty]

Parsing Tier 2
[Tree diagrams; node labels in order: Max, Hole, Same, Group, 2, Hole, Max, MetallicProperty, Same, Group, 2, \$1, MetallicProperty, \$1]

Introduction of Variables
- Free variables are introduced implicitly when they are needed to formulate a valid logical formula.
- Example: Alkali metals belong to Group 1
- Intelligently guess when a variable is required
- Two situations:
  - A hole (of type elem) is present and is not satisfied by any token in the unused list (even after replication)
  - A hole (of type elem) is present, no tokens are left in the unused list, and no replicated original token satisfies it
- In both cases: introduce a new variable!
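
The two situations above amount to one rule: if nothing of type elem is available for the hole, create a fresh variable. A minimal sketch, assuming tokens are `(name, type)` pairs; the function name `fill_elem_hole` and the token types used here are hypothetical.

```python
from itertools import count


def fill_elem_hole(unused_tokens, replicable_tokens, fresh_ids):
    """Pick a filler for a hole of type 'elem'.

    unused_tokens / replicable_tokens: lists of (name, type) pairs.
    fresh_ids: iterator yielding unused variable numbers.
    """
    # First try tokens not yet used, then original tokens that could be replicated.
    for pool in (unused_tokens, replicable_tokens):
        for name, token_type in pool:
            if token_type == "elem":
                return name
    # Nothing of the right type exists: introduce a new free variable.
    return f"${next(fresh_ids)}"


# "Alkali metals belong to Group 1": no token names a concrete element,
# so the hole is filled with a fresh variable.
ids = count(1)
print(fill_elem_hole([("1", "int")], [("AlkaliMetal", "pred")], ids))
```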

Handling Quantifiers
- Universal quantifiers
  - General scheme: ForAll(\$1, Implies(A(\$1), B(\$1)))
- Existential quantifiers
  - General scheme: Exists(\$1, And(A(\$1), B(\$1)))
- Assumptions:
  - Quantification over a single variable
  - No nesting of quantifiers

Universal Quantification
- Problems:
  - Finding the position of the implication
  - Finding the antecedent and the consequent
- Example: Alkali metals show metallic character
- Solution: ForAll(\$1, AlkaliMetal(\$1) ⇒ Metallic(\$1))
- Position of the implication ≈ position of the verb
- Deciding the antecedent and the consequent is more complicated

ForAll Resolution Algorithm
- Active vs. passive voice (Stanford CoreNLP)
  - Alkali metals show metallic character
  - Metallic character is shown by alkali metals
- Both have the same translation!
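
One way to make both sentences yield the same formula is to swap the roles of the two phrases when the sentence is passive. The sketch below assumes the voice has already been determined (for example from a parser's output) and is passed in as a flag; the rule "the phrase before the verb is the antecedent in active voice" is an assumption made for illustration, and `forall_formula` is a hypothetical name.

```python
def forall_formula(before_verb, after_verb, passive):
    """Build a universally quantified implication from two predicates.

    before_verb / after_verb: predicate names for the phrases on either
    side of the verb. passive: True if the sentence is in passive voice.
    """
    if passive:
        # In passive voice the grammatical subject is the property,
        # so the roles are swapped.
        antecedent, consequent = after_verb, before_verb
    else:
        antecedent, consequent = before_verb, after_verb
    return f"ForAll($1, Implies({antecedent}($1), {consequent}($1)))"


# "Alkali metals show metallic character"
active = forall_formula("AlkaliMetal", "Metallic", passive=False)
# "Metallic character is shown by alkali metals"
passive = forall_formula("Metallic", "AlkaliMetal", passive=True)
print(active)
print(active == passive)
```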

Assertion Based Questions  Assert facts  Pose questions  Span multiple sentences  Example - An element A forms covalent bond with oxygen. It has high electronegativity and belongs to group 13. What is its atomic number?  Problem – Anaphora Resolution!  Solution – Use Stanford CoreNLP to get coreference graph

Assertion Based Questions  Method for translating assertion based questions  Construct logical formula corresponding to sentence independently  Use coreference graph to find variables referring to the same entity  Construct the formula – A 1 (x) ∧ A 2 (x)… ∧ A n (x), where A i (x) = logical formula of i th sentence  Quantify over the free variable(s).  Typically ask about a single entity. Existential quantification suffices

Negations  Non-:  Which of the following non-metals is a gas at STP?  Couple non with the predicate immediately next to it  And(IsGasAtSTP(\$1), Not(Metallic(\$1)))  Not:  Not all alkali metals form basic oxides.  Negation of statement to the right of not  Not(ForAll(\$1, Implies(AlkaliMetal(\$1), BasicOxide(\$1))))

Negations  No:  No halogen is metallic in nature.  Natural interpretation of no as “there does not exist”  Not(Exists(\$1, And(Halogen(\$1),Metallic(\$1))))

Ranking Algorithm
- Need to rank the different representation trees generated
- Heuristics:
  - Greater cover → greater confidence
  - Higher confidence for filling a hole with a token closer to its parent in the English sentence
- Penalize when:
  - Tokens are replicated: larger tokens → more penalization
  - Handcrafted tokens are inserted: And, Or, Implies
  - Tokens are left unused: greater proportion of unused tokens → more penalization
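
The heuristics above could be combined into a single score along the following lines. This is a sketch under stated assumptions: the `Candidate` fields, the weights (0.2, 0.3, 1.0), and the `1 / (1 + distance)` closeness measure are all invented for illustration and are not taken from the system.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One candidate representation tree with the features used for ranking."""
    label: str
    tokens_total: int            # tokens produced by the lexer
    tokens_used: int             # tokens that appear in the tree
    hole_distances: list = field(default_factory=list)    # word distance filler-to-parent
    replicated_sizes: list = field(default_factory=list)  # size of each replicated token
    inserted_connectives: int = 0                          # And / Or / Implies added by hand


def score(c):
    cover = c.tokens_used / c.tokens_total
    # Fillers close to their parent in the sentence earn more confidence.
    closeness = 0.0
    if c.hole_distances:
        closeness = sum(1.0 / (1 + d) for d in c.hole_distances) / len(c.hole_distances)
    penalty = (
        0.2 * sum(c.replicated_sizes)   # larger replicated tokens cost more
        + 0.3 * c.inserted_connectives  # handcrafted connectives
        + 1.0 * (1 - cover)             # proportion of unused tokens
    )
    return cover + closeness - penalty


def rank(candidates):
    """Best candidate first."""
    return sorted(candidates, key=score, reverse=True)


tight = Candidate("tight", tokens_total=4, tokens_used=4, hole_distances=[1, 1])
loose = Candidate("loose", tokens_total=4, tokens_used=2, hole_distances=[3, 4],
                  replicated_sizes=[2], inserted_connectives=1)
print([c.label for c in rank([loose, tight])])
```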

Evaluation  Currently able to solve 70 out of the 126 problems collected from Tata McGraw Hill textbook for Grade XI  More problems can be solved by modeling of more chemistry-speciﬁc predicates.  This just corresponds to adding domain knowledge to our system  Another evaluation metrics could be the ratio of the number of rules encoded to the corpus size of problems solved.  We encode 173 predicates/entities/functions in our algorithm (out of which 118 are names of elements).

Conclusions  While contemporary works focus on analyzing languages by learning, we hypothesize that for a simpler structured domain like Chemistry, a much simpler type-theoretic approach armed with some heuristics observed from the domain can achieve similar, if not better, success.  During the later phase of the project, we tried to use some techniques of learning to improve upon our system and were successful in doing so.  In conclusion, we feel that a combination of such a type- theoretic approach and the standard machine learning techniques can achieve good success for a well structured domain like Chemistry.

Future Work  Disambiguate – At, As, In (names of elements)  1 = 1 st = first (Stanford CoreNLP NER Tool)  And(And(x,y),z) = And(And(x,z),y)  Model electronic configuration  Better modelling of conjunctions – “Alkali metals belong to group 1 and are metallic in nature”

Stanford CoreNLP  Collection of commonly used NLP tools – POS tagging, parsing, coreference analysis, NER  Problem – Integrating Java package with C#  Command line interface slow – needs to large load data models (17 secs per question!)  Solution - Query online demo  Get XML response  http://nlp.stanford.edu:8080/corenlp/ http://nlp.stanford.edu:8080/corenlp/

Thank You
