Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining. Zachary A. Pardos, PSLC Summer School 2011.


1 Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining. Zachary A. Pardos, PSLC Summer School 2011. Slide footer: Bayesian Knowledge Tracing & Other Models, PSLC Summer School 2011, Zach Pardos.

2 Outline of Talk
Introduction to Knowledge Tracing
– History
– Intuition
– Model
– Demo
– Variations (and other models)
– Evaluations (Baker work / KDD)
Random Forests
– Description
– Evaluations (KDD)
Time left?
– Vote on next topic

3 Intro to Knowledge Tracing: History
– Introduced in 1995 (Corbett & Anderson, UMUAI)
– Based on the ACT-R theory of skill knowledge (Anderson 1993)
– Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)

4 Intro to Knowledge Tracing: Intuition
– Based on the idea that practice on a skill leads to mastery of that skill
– Has four parameters used to describe student performance
– Relies on a KC model
– Tracks student knowledge over time

5 Intro to Knowledge Tracing: the prediction task
Given a student's response sequence 1 to n, predict response n+1.
For some Skill K: chronological response sequence for student Y (0 = incorrect response, 1 = correct response), positions 1 … n, followed by the response to predict at n+1.

6 Intro to Knowledge Tracing: track knowledge over time (model of learning)

7 Intro to Knowledge Tracing: Knowledge Tracing (KT) can be represented as a simple HMM
Node representations (K is latent, Q is observed):
– K = Knowledge node
– Q = Question node
Node states:
– K = Two state (0 or 1)
– Q = Two state (0 or 1)

8 Intro to Knowledge Tracing: four parameters of the KT model
– P(L0) = Probability of initial knowledge
– P(T) = Probability of learning
– P(G) = Probability of guess
– P(S) = Probability of slip
Probability of forgetting is assumed to be zero (fixed).

9 Intro to Knowledge Tracing: formulas for inference and prediction
Derivation: Reye, JAIED 2004. The formulas use Bayes' theorem to make inferences about the latent variable.
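The slide's formulas did not survive in this transcript. As a hedged sketch, the standard knowledge tracing inference and prediction steps (posterior from Bayes' theorem, then a learning transition with no forgetting) can be written as below. The function names are illustrative, not from the talk; the parameter values in the usage lines are the ones shown on the "Influence of parameter values" slide.

```python
def predict_correct(p_know, guess, slip):
    """P(next response is correct) given the current P(knowledge)."""
    return p_know * (1.0 - slip) + (1.0 - p_know) * guess


def update_knowledge(p_know, correct, learn, guess, slip):
    """Bayes update on one observed response, then apply the learning transition.

    Forgetting is assumed to be zero, as in the four-parameter model.
    """
    if correct:
        numerator = p_know * (1.0 - slip)
        denominator = numerator + (1.0 - p_know) * guess
    else:
        numerator = p_know * slip
        denominator = numerator + (1.0 - p_know) * (1.0 - guess)
    posterior = numerator / denominator
    return posterior + (1.0 - posterior) * learn


def trace(responses, prior, learn, guess, slip):
    """Return the per-opportunity predictions and the final P(knowledge)."""
    p_know = prior
    predictions = []
    for response in responses:
        predictions.append(predict_correct(p_know, guess, slip))
        p_know = update_knowledge(p_know, response == 1, learn, guess, slip)
    return predictions, p_know


# Hypothetical one-response sequence, for illustration only.
predictions, final_knowledge = trace([1], prior=0.50, learn=0.20, guess=0.14, slip=0.09)
```

Each prediction is made before the corresponding response is observed, so the list lines up with the "predict the next response" framing of the earlier slide.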

10 Intro to Knowledge Tracing: Model Training Step
– The values of the parameters P(T), P(G), P(S) and P(L0) are used to predict student responses
– Ad-hoc values could be used but will likely not be the best fitting
– Goal: find a set of values for the parameters that minimizes prediction error
[Figure: example response sequences (e.g. 0 0 1 1 1) for Student A, Student B and Student C used as training data]

11 Intro to Knowledge Tracing: Model Tracing Step (model prediction), Skill: Subtraction
[Figure: the student's last three responses to Subtraction questions (in the unit), followed by test set questions. Rows: latent (knowledge), P(K), and observable (responses), P(Q). Values shown: 10%, 45%, 75%, 79%, 83%, 71%, 74%.]

12 Intro to Knowledge Tracing: influence of parameter values
Estimate of knowledge for a student with response sequence: [sequence shown in figure]
P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09
Student reached 95% probability of knowledge after the 4th opportunity.

13 Intro to Knowledge Tracing: influence of parameter values
Estimate of knowledge for a student with response sequence: [sequence shown in figure]
First parameter set: P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09
Second parameter set: P(L0): 0.50, P(T): 0.20, P(G): 0.64, P(S): 0.03
Student reached 95% probability of knowledge after the 8th opportunity.

14 Intro to Knowledge Tracing: (Demo)

15 Intro to Knowledge Tracing: Variations on Knowledge Tracing (and other models)

16 Prior Individualization Approach
Do all students enter a lesson with the same background knowledge?
Node representations:
– K = Knowledge node
– Q = Question node
– S = Student node (observed)
Node states:
– K = Two state (0 or 1)
– Q = Two state (0 or 1)
– S = Multi state (1 to N)
The prior is conditioned on the student node: P(L0|S).

17 Prior Individualization Approach: Conditional Probability Table of Student node and Individualized Prior node
CPT of Student node: S value (1, 2, 3, …, N) against P(S=value); the entry shown for S=1 is 1/N.
– CPT of the observed student node is fixed
– Possible to have an S value for every student ID
– Raises an initialization issue (where do these prior values come from?)
– S value can represent a cluster or type of student instead of an ID

18 Prior Individualization Approach: Conditional Probability Table of Student node and Individualized Prior node
CPT of Individualized Prior node: S value against P(L0|S); the entry shown for S=N is 0.92.
– Individualized L0 values need to be seeded
– This CPT can be fixed or the values can be learned
– Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy
– This model, which only individualizes L0, is the Prior Per Student (PPS) model

19 Prior Individualization Approach: Conditional Probability Table of Student node and Individualized Prior node
CPT of Individualized Prior node: S value against P(L0|S).
Bootstrapping the prior:
– If a student answers incorrectly on the first question, she gets a low prior
– If a student answers correctly on the first question, she gets a higher prior

20 Prior Individualization Approach: what values to use for the two priors?
CPT of Individualized Prior node: S value against P(L0|S).

21 Prior Individualization Approach: what values to use for the two priors?
1. Use ad-hoc values

22 Prior Individualization Approach: what values to use for the two priors?
1. Use ad-hoc values
2. Learn the values (CPT entries fit with EM)

23 Prior Individualization Approach: what values to use for the two priors?
1. Use ad-hoc values
2. Learn the values
3. Link with the guess/slip CPT (S=0: P(L0|S) = Slip; S=1: P(L0|S) = 1-Guess)

24 Prior Individualization Approach: what values to use for the two priors?
1. Use ad-hoc values
2. Learn the values
3. Link with the guess/slip CPT (S=0: P(L0|S) = Slip; S=1: P(L0|S) = 1-Guess)
With ASSISTments, PPS (ad-hoc) achieved an R² of [value missing in transcript] (0.176 with KT) (Pardos & Heffernan, UMAP 2010).
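A minimal sketch of how the first-response bootstrapping could assign the two priors, covering option 1 (ad-hoc values) and option 3 (linking with the guess/slip CPT). The function name, the strategy labels, and the ad-hoc defaults of 0.10 and 0.85 are assumptions made for illustration; the talk does not state which ad-hoc values were used. Option 2 (learning the values with EM) is not shown.

```python
def pps_prior(first_response, strategy, guess=None, slip=None,
              adhoc_low=0.10, adhoc_high=0.85):
    """Return an individualized prior P(L0|S) from a student's first response.

    first_response: 0 (incorrect) or 1 (correct).
    strategy: "adhoc" uses fixed low/high values (hypothetical defaults);
              "linked" uses Slip for S=0 and 1-Guess for S=1.
    """
    if first_response not in (0, 1):
        raise ValueError("first_response must be 0 or 1")
    if strategy == "adhoc":
        return adhoc_high if first_response == 1 else adhoc_low
    if strategy == "linked":
        if guess is None or slip is None:
            raise ValueError("linked strategy needs guess and slip")
        return 1.0 - guess if first_response == 1 else slip
    raise ValueError("unknown strategy: " + strategy)


low_prior = pps_prior(0, "linked", guess=0.14, slip=0.09)
high_prior = pps_prior(1, "linked", guess=0.14, slip=0.09)
```

Either way the student node only has two effective states, so the individualized prior CPT has two rows regardless of how many students are in the data.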

25 Intro to Knowledge Tracing: Variations on Knowledge Tracing (and other models)

26 BKT-BF (Baker et al., 2010)
Learns values for the KT parameters P(L0), P(T), P(G) and P(S) by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error.
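The brute-force idea can be sketched as an exhaustive search over a parameter grid, scoring each candidate by the squared error of its next-response predictions. This is an illustrative sketch, not the BKT-BF implementation: the function names and toy sequences are made up, and the step is 0.1 rather than 0.01 so the example runs quickly (a 0.01 grid over four parameters is roughly 10^8 combinations).

```python
from itertools import product


def squared_error(sequences, prior, learn, guess, slip):
    """Sum of squared errors of BKT next-response predictions over all sequences."""
    total = 0.0
    for seq in sequences:
        p_know = prior
        for response in seq:
            p_correct = p_know * (1.0 - slip) + (1.0 - p_know) * guess
            total += (response - p_correct) ** 2
            # Bayes update on the observed response, then the learning transition.
            if response == 1:
                posterior = p_know * (1.0 - slip) / p_correct
            else:
                posterior = p_know * slip / (1.0 - p_correct)
            p_know = posterior + (1.0 - posterior) * learn
    return total


def grid_search(sequences, step=0.1):
    """Try every parameter combination on an open (0, 1) grid; keep the best."""
    n_steps = int(round(1.0 / step))
    grid = [i * step for i in range(1, n_steps)]
    best_error, best_params = None, None
    for prior, learn, guess, slip in product(grid, repeat=4):
        error = squared_error(sequences, prior, learn, guess, slip)
        if best_error is None or error < best_error:
            best_error = error
            best_params = {"prior": prior, "learn": learn,
                           "guess": guess, "slip": slip}
    return best_params, best_error


# Hypothetical response sequences for one skill.
toy_sequences = [[0, 0, 1, 1, 1], [0, 1, 1, 1], [1, 1, 0, 1, 1]]
best_params, best_error = grid_search(toy_sequences, step=0.1)
```

Keeping the grid strictly inside (0, 1) avoids division by zero in the Bayes update.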

27 BKT-EM (Chang et al., 2006)
Learns values for the KT parameters P(L0), P(T), P(G) and P(S) with Expectation Maximization (EM). Maximizes the log likelihood fit to the data.

28 BKT-CGS (Baker, Corbett, & Aleven, 2008)
Guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor.

29 BKT-CSlip (Baker, Corbett, & Aleven, 2008)
Uses the student's averaged contextual Slip parameter learned across all incorrect actions.

30 BKT-LessData (Nooraei et al., 2011)
Limits each student's response sequence to the most recent 15 responses (max) during EM training.

31 BKT-PPS (Pardos & Heffernan, 2010)
Prior per student (PPS) model, which individualizes the prior parameter through an observed student node, P(L0|S). Students are assigned a prior based on their response to the first question.

32 CFAR (Yu et al., 2010)
Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted.
Example: student responses for Skill X: [sequence shown in figure] _ ; the predicted next response would be 0.50.
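As a small sketch of the CFAR computation: the prediction is the fraction of correct first attempts the student has made so far on the skill. The example sequence and the fallback value for a student with no history are assumptions, since the slide's sequence is not preserved.

```python
def cfar(prior_responses, default=0.5):
    """Percent correct on the current skill up to the question being predicted.

    prior_responses: list of 0/1 first-attempt outcomes, oldest first.
    default: value returned when there is no history (an assumption,
             not specified in the talk).
    """
    if not prior_responses:
        return default
    return sum(prior_responses) / len(prior_responses)


# Hypothetical history with two correct out of four responses.
prediction = cfar([0, 1, 0, 1])
```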

33 Tabling (Wang et al., 2011)
Uses the student's response sequence (max length 3) to predict the next response by looking up the average next response among students with the same sequence in the training set.
Example: a training set with Student A, Student B and Student C [sequences shown in figure] and a test set student [sequence shown in figure] _ ; the predicted next response would be 0.66.
Max table length set to 3: table size was 15.
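A hedged sketch of the table lookup follows. It assumes the key is the student's most recent responses, up to `max_len` of them, which is one reading of "response sequence (max length 3)"; the original method may key the table differently. The training sequences are invented so that two of three students with history 0, 1, 1 answer the next question correctly. A table size of 15 is consistent with counting all binary sequences of length 0 to 3 (1 + 2 + 4 + 8).

```python
from collections import defaultdict


def build_table(training_sequences, max_len=3):
    """Map each recent-response history to the average next response."""
    sums = defaultdict(lambda: [0, 0])  # history -> [sum of next responses, count]
    for seq in training_sequences:
        for i in range(len(seq)):
            history = tuple(seq[max(0, i - max_len):i])
            sums[history][0] += seq[i]
            sums[history][1] += 1
    return {history: total / count for history, (total, count) in sums.items()}


def predict_next(table, history, max_len=3, default=0.5):
    """Look up the prediction for a history; fall back to a default if unseen."""
    return table.get(tuple(history[-max_len:]), default)


# Hypothetical training set (Students A, B, C).
training = [[0, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 1]]
table = build_table(training)
prediction = predict_next(table, [0, 1, 1])
```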

34 PFA (Pavlik et al., 2009)
Performance Factors Analysis (PFA) is a logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the count of the student's prior failures and successes on the current skill.
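For a single skill, the PFA prediction can be sketched as a logistic function of a skill intercept plus weighted counts of prior successes and failures. The coefficient names and values below are placeholders for illustration; in practice they are fit by logistic regression on the training data.

```python
import math


def pfa_predict(successes, failures, beta, gamma, rho):
    """P(correct) from prior success and failure counts on one skill.

    beta: skill intercept; gamma: weight per prior success;
    rho: weight per prior failure (all hypothetical values here).
    """
    logit = beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for one skill.
p_start = pfa_predict(successes=0, failures=0, beta=0.0, gamma=0.2, rho=-0.1)
p_later = pfa_predict(successes=5, failures=1, beta=0.0, gamma=0.2, rho=-0.1)
```

With a positive success weight, each additional prior success raises the predicted probability of a correct response.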

35 Study: Methodology, Dataset
Cognitive Tutor for Genetics
– 76 CMU undergraduate students
– 9 skills (no multi-skill steps)
– 23,706 problem solving attempts
– 11,582 problem steps in the tutor
– 152 average problem steps completed per student (SD=50)
– Pre- and post-tests were administered with this assignment

36 Study: Methodology, model in-tutor prediction
Predictions were made by the 9 models using a 5-fold cross-validation by student.
[Table: one row per student response (Student 1, Skill A, Resp 1 … Resp N; Student 1, Skill B, Resp 1 … Resp N), with one column per model prediction (BKT-BF, BKT-EM, …) and a column for the actual response]
Accuracy was calculated with A' for each student. Those values were then averaged across students to report the model's A' (higher is better).
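A sketch of the per-student A' evaluation, under the common interpretation of A' as the probability that a randomly chosen correct response received a higher prediction than a randomly chosen incorrect one (ties count one half). Skipping students whose responses are all correct or all incorrect is an assumption; the talk does not say how such students were handled. The student data is invented.

```python
def a_prime(predictions, actuals):
    """A' for one student; None when only one response class is present."""
    positives = [p for p, a in zip(predictions, actuals) if a == 1]
    negatives = [p for p, a in zip(predictions, actuals) if a == 0]
    if not positives or not negatives:
        return None
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_a_prime(per_student):
    """Average A' across students, skipping students where A' is undefined."""
    scores = [a_prime(preds, acts) for preds, acts in per_student.values()]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)


# Hypothetical predictions and actual responses for two students.
per_student = {
    "student_1": ([0.9, 0.8, 0.3], [1, 1, 0]),
    "student_2": ([0.2, 0.8], [1, 0]),
}
model_a_prime = mean_a_prime(per_student)
```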

37 Study: Results, in-tutor model prediction
Models in the order listed (A' values not preserved in the transcript): BKT-PPS, BKT-BF, BKT-EM, BKT-LessData, PFA, Tabling, BKT-CSlip, CFAR, BKT-CGS.
A' results averaged across students.

38 Study: Results, in-tutor model prediction
Models in the order listed (A' values not preserved in the transcript): BKT-PPS, BKT-BF, BKT-EM, BKT-LessData, PFA, Tabling, BKT-CSlip, CFAR, BKT-CGS.
A' results averaged across students.
– No significant differences within these BKT models
– Significant differences between these BKT models and PFA

39 Study: Methodology, ensemble in-tutor prediction
5 ensemble methods were used, trained with the same 5 cross-validation folds.
Ensemble methods were trained using the 9 model predictions as the features and the actual response as the label.
[Table: one row per student response (Student 1, Skill A, Resp 1 … Resp N; Student 1, Skill B, Resp 1 … Resp N); the model prediction columns (BKT-BF, BKT-EM, …) are the features and the actual response is the label]

40 Study: Methodology, ensemble in-tutor prediction
Ensemble methods used:
1. Linear regression with no feature selection (predictions bounded between 0 and 1)
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression

41 Study: Results, in-tutor ensemble prediction
Models in the order listed (A' values not preserved in the transcript):
– Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip
– Ensemble: LinReg with BKT-PPS & BKT-EM
– Ensemble: LinReg without feature selection
– Ensemble: LinReg with feature selection (stepwise)
– Ensemble: Logistic without feature selection
– Tabling
A' results averaged across students. No significant difference between ensembles.

42 Study: Results, in-tutor ensemble & model prediction
Models in the order listed (A' values not preserved in the transcript):
– BKT-PPS
– Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip
– Ensemble: LinReg with BKT-PPS & BKT-EM
– BKT-BF
– BKT-EM
– Ensemble: LinReg without feature selection
– Ensemble: LinReg with feature selection (stepwise)
– Ensemble: Logistic without feature selection
– BKT-LessData
– PFA
– Tabling
– BKT-CSlip
– CFAR
– BKT-CGS
A' results averaged across students.

43 Study: Results, in-tutor ensemble & model prediction
Models in the order listed (A' values not preserved in the transcript):
– Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip
– Ensemble: LinReg without feature selection
– Ensemble: LinReg with feature selection (stepwise)
– Ensemble: Logistic regression without feature selection
– Ensemble: LinReg with BKT-PPS & BKT-EM
– BKT-EM
– BKT-BF
– BKT-PPS
– PFA
– BKT-LessData
– CFAR
– Tabling
– Contextual Slip
– BKT-CGS
A' results calculated across all actions.

44 Random Forests: in the KDD Cup
Motivation for trying a non-KT approach:
– The Bayesian method only uses KC, opportunity count and student as features. Much information is left unutilized. Another machine learning method is required.
Strategy:
– Engineer additional features from the dataset and use Random Forests to train a model

45 Random Forests
Strategy:
– Create rich feature datasets that include features created from features not included in the test set

46 Random Forests
– Created by Leo Breiman
– The method trains T separate decision tree classifiers (50-800)
– Each decision tree selects a random 1/P portion of the available features (1/3)
– The tree is grown until there are at least M observations in the leaf (1-100)
– When classifying unseen data, each tree votes on the class. The popular vote wins; for regression, the votes are averaged.

47 Random Forests: Feature Importance
Features extracted from the training set:
Student progress features (avg. importance: 1.67)
– Number of data points [today, since the start of unit]
– Number of correct responses out of the last [3, 5, 10]
– Z-score sum for step duration, hint requests, incorrects
– Skill-specific version of all these features
Percent correct features (avg. importance: 1.60)
– % correct of unit, section, problem and step and total for each skill and also for each student (10 features)
Student Modeling Approach features (avg. importance: 1.32)
– The predicted probability of correct for the test row
– The number of data points used in training the parameters
– The final EM log likelihood fit of the parameters / data points

48 Random Forests
– Features of the user were more important in Bridge to Algebra than in Algebra
– Student progress features / gaming the system (Baker et al., UMUAI 2008) were important in both datasets

49 Random Forests: results by feature set (columns: Rank, Feature set, RMSE, Coverage; RMSE and coverage values not preserved in the transcript)
Algebra: 1. All features; 2. Percent correct; 3. All features (fill)
Bridge to Algebra: 1. All features; 2. All features (fill); 3. Percent correct

50 Random Forests: results by feature set (columns: Rank, Feature set, RMSE, Coverage; RMSE and coverage values not preserved in the transcript)
Algebra: 1. All features; 2. Percent correct; 3. All features (fill)
Bridge to Algebra: 1. All features; 2. All features (fill); 3. Percent correct
Best Bridge to Algebra RMSE on the Leaderboard was [value missing]. Random Forest RMSE of [value missing] here is exceptional.

51 Random Forests: results by feature set (columns: Rank, Feature set, RMSE, Coverage; RMSE and coverage values not preserved in the transcript)
Algebra: 1. All features; 2. Percent correct; 3. All features (fill)
Bridge to Algebra: 1. All features; 2. All features (fill); 3. Percent correct
Skill data for a student was not always available for each test row. Because of this, many skill-related feature sets only had 92% coverage.

52 Conclusion from KDD
– Combining user features with skill features was very powerful in both modeling and classification approaches
– Model tracing based predictions performed formidably against pure machine learning techniques
– Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs. This method could significantly boost accuracy in other EDM datasets.

53 Random Forests: Hardware/Software
Software
– MATLAB used for all analysis: Bayes Net Toolbox for Bayesian Network models, Statistics Toolbox for the Random Forests classifier
– Perl used for pre-processing
Hardware
– Two Rocks clusters used for skill model training, 178 CPUs in total. Training of KT models took ~48 hours when utilizing all CPUs.
– Two systems with 32 GB of RAM for Random Forests. RF models took ~16 hours to train with 800 trees.

54 Time left? Choose the next topic
– KT: 1-35
– Prediction
– Evaluation
– Sig tests
– Regression/sig tests

55 Intro to Knowledge Tracing: Individualize Everything?

56 Fully Individualized Model (Pardos & Heffernan, JMLR 2011)

57 Fully Individualized Model (Pardos & Heffernan, JMLR 2011)
S identifies the student.

58 Fully Individualized Model (Pardos & Heffernan, JMLR 2011)
T contains the CPT lookup table of individual student learn rates.

59 Fully Individualized Model (Pardos & Heffernan, JMLR 2011)
P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner].

60 SSI model results (Pardos & Heffernan, JMLR 2011)
Table columns: Dataset, New RMSE, Prev RMSE, Improvement. Rows: Algebra, Bridge to Algebra, Average (values not preserved in the transcript).
The average improvement is the difference between 1st and 3rd place. It is also the difference between 3rd and 4th place.
The differences between PPS and SSI are significant in each dataset at the P < 0.01 level (t-test of squared errors).

