Choosing Sample Size for Knowledge Tracing Models DERRICK COETZEE
Motivation ◦BKT parameters are inferred from data ◦But the best solution for a given data set may not quite match the parameters that actually generated it (sampling error) ◦Example data: 0,0,0,0,0 0,1,1,0,1 0,1,0,0,0 0,0,1,1,0 (5 students, 5 problems each: 25 bits of data) ◦Parameters: prior, learning, guess, slip (4 parameters, 3 decimal digits each: 39.9 bits of data) ◦With fewer bits of data than bits of parameters, it is not even possible for all parameter sets to be represented!
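The bit counts on this slide can be checked with a short calculation. This is a minimal sketch, assuming each response carries one binary observation and each parameter is written to three decimal digits; the variable names are illustrative only.

```python
import math

students, problems = 5, 5
data_bits = students * problems  # one correct/incorrect bit per response

n_params, digits = 4, 3  # prior, learning, guess, slip at 3 decimal digits each
param_bits = n_params * digits * math.log2(10)  # bits needed to write the parameters

print(data_bits)             # 25
print(round(param_bits, 1))  # 39.9
```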
Questions ◦So how much data is needed for accurate estimates? ◦And do the parameter values affect how much you need? ◦Can we give confidence intervals for parameters?
Normal distribution over samples ◦Mean is almost always near true generating value ◦Standard deviation can be used to describe variation of estimates ◦Can use 68–95–99.7 rule for confidence intervals
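If the estimates are roughly normally distributed, an interval follows directly from their mean and standard deviation. The sketch below assumes a list of estimates of one parameter obtained from repeated samples; `normal_interval` and the numbers in `learn_estimates` are hypothetical and not taken from the talk.

```python
import statistics

def normal_interval(estimates, k=2):
    """Return (low, high) = mean -/+ k sample standard deviations."""
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return mean - k * sd, mean + k * sd

# Hypothetical learn-rate estimates from five simulated data sets.
learn_estimates = [0.08, 0.10, 0.09, 0.11, 0.07]

# k=1, 2, 3 correspond to roughly 68%, 95%, 99.7% coverage under normality.
low, high = normal_interval(learn_estimates, k=2)
print(low, high)
```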
Variation does depend on parameter values ◦Each parameter behaves differently ◦Best estimates for parameters near zero/one, worst in the range between them
There are interactions between parameter values ◦Can’t just precompute a table of stddevs for each parameter ◦Complex relationship, analytical approach probably infeasible ◦But at least there is continuity with small rates of change
Sample size recommendations ◦Stddev of estimates is proportional to 1/sqrt(n) ◦Must increase sample size by a factor of 4 to reduce error by a factor of 2 ◦Small data sets (<1000 students) will not give even one significant figure in all parameters ◦Be skeptical of systems fitted on small classes!
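The 1/sqrt(n) relationship can be turned into a small planning calculation. This is a sketch under the assumption that a reference stddev has already been measured at some sample size; the function names and the reference value 0.04 are hypothetical.

```python
import math

def scaled_stddev(sd_ref, n_ref, n_new):
    """Project a stddev measured at n_ref students to n_new students."""
    return sd_ref * math.sqrt(n_ref / n_new)

def students_needed(sd_ref, n_ref, sd_target):
    """Smallest sample size whose projected stddev is at most sd_target."""
    return math.ceil(n_ref * (sd_ref / sd_target) ** 2)

# Hypothetical reference: stddev 0.04 measured with 1000 students.
print(scaled_stddev(0.04, 1000, 4000))    # 4x the students halves the stddev
print(students_needed(0.04, 1000, 0.02))  # students needed to halve the stddev
```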
No interaction between sample size and parameters ◦Change sample size without changing parameters → predictable variation in error ◦Gives an approach to estimate error on real-world data sets: ◦Take samples with replacement, infer parameters for each, compute stddev ◦Scale using 1/sqrt(n) to estimate stddevs at other sample sizes
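The resampling procedure above can be sketched as follows. To keep the example short, the estimator is a stand-in (overall fraction of correct responses), not a BKT fit; in practice `fit` would be replaced by an EM fit returning one parameter. All names and the synthetic data are hypothetical.

```python
import math
import random
import statistics

def bootstrap_stddev(students, fit, n_resamples=200, seed=0):
    """Stddev of fit() over resamples of students drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(students, k=len(students))
        estimates.append(fit(resample))
    return statistics.stdev(estimates)

def fraction_correct(students):
    """Stand-in estimator (NOT a BKT fit): overall fraction of correct responses."""
    total = sum(len(s) for s in students)
    return sum(sum(s) for s in students) / total

# Hypothetical data: 50 students, 5 binary responses each.
data_rng = random.Random(1)
data = [[int(data_rng.random() < 0.6) for _ in range(5)] for _ in range(50)]

sd_50 = bootstrap_stddev(data, fraction_correct)
sd_200 = sd_50 * math.sqrt(50 / 200)  # projected stddev at 200 students
print(sd_50, sd_200)
```

Resampling whole students, rather than individual responses, keeps each student's response sequence intact, which matters for sequence models such as BKT.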
Knowledge Tracing for Interacting Student Pairs DERRICK COETZEE
Motivation ◦Standard Bayesian knowledge tracing uses fixed learning rate parameter to capture all learning
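For reference, the standard BKT update with a single fixed learning rate can be sketched as below. This is the textbook form without forgetting, written as an illustration rather than the code used for the talk; the function name and the starting value 0.2 are assumptions, while the learn, guess, and slip values match those used in the simulations later in the deck.

```python
def bkt_update(p_know, correct, learn, guess, slip):
    """One BKT step: condition on the response, then apply the learning transition."""
    if correct:
        num = p_know * (1 - slip)
        den = num + (1 - p_know) * guess
    else:
        num = p_know * slip
        den = num + (1 - p_know) * (1 - guess)
    posterior = num / den
    # The same fixed learn probability is applied after every item.
    return posterior + (1 - posterior) * learn

p = 0.2  # hypothetical prior probability of knowing the skill
for response in [0, 1, 1]:
    p = bkt_update(p, response, learn=0.09, guess=0.14, slip=0.09)
print(p)
```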
Motivation ◦One way to improve: use information on course materials viewed
Motivation ◦What about peer interaction (e.g. forums/chat)? ◦Not fixed/static like instructional materials ◦The level of knowledge of the other student is important ◦Use our BKT model of the other student’s knowledge!
Pair interaction scenario ◦Simple case of student interaction ◦Two students are paired and always interact between each item (no interactions with others) ◦Cycle: Do exercise → Learn independently → Interact with partner → Do exercise → Learn independently → ...
Pair interaction scenario ◦Model independent learning and interaction stages ◦New parameters: teach, mislead

Knows   Other student knows   Probability knows after interaction
No      No                    0
Yes     Yes                   1
No      Yes                   teach
Yes     No                    1−mislead
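The interaction table can be written as a single update on knowledge probabilities. This sketch assumes the two students' knowledge states are treated as independent when combining the four cases, which is a simplification introduced here and not stated in the slides; `interact` is a hypothetical name.

```python
def interact(p_self, p_other, teach, mislead=0.0):
    """P(student knows after interaction), summing over the four table rows."""
    return (
        p_self * p_other                           # both know: stays 1
        + (1 - p_self) * p_other * teach           # only partner knows: teach
        + p_self * (1 - p_other) * (1 - mislead)   # only self knows: 1 - mislead
        # neither knows: contributes 0
    )

print(interact(0.0, 1.0, teach=0.9))               # certain partner teaches with prob. teach
print(interact(1.0, 0.0, teach=0.9, mislead=0.2))  # certain self is misled with prob. mislead
```

With teach and mislead both zero the function returns `p_self` unchanged, so the interaction stage drops out.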
Results: Preliminary simulations ◦5-parameter system (prior, learn, guess, slip, teach) ◦forget, mislead parameters fixed at zero ◦Generate synthetic data, run EM starting from the generating values ◦Same behavior as classic system when teach = 0 ◦Unstable when teach > 0 ◦Converges to trivial solution prior=learn=teach=1, slip=proportion of incorrect responses ◦Occurs for both small and large teach parameters
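Synthetic data for the 5-parameter system might be generated along these lines. This is a sketch of the exercise / independent learning / interaction cycle with forget and mislead fixed at zero, not the generator used for the reported results; `simulate_pair`, the prior of 0.2, and the choice of 5 items are hypothetical.

```python
import random

def simulate_pair(n_items, prior, learn, guess, slip, teach, rng):
    """Simulate binary responses for one student pair (forget = mislead = 0)."""
    knows = [rng.random() < prior for _ in range(2)]
    responses = ([], [])
    for _ in range(n_items):
        # Do exercise: correct with prob. 1 - slip if known, guess otherwise.
        for i in range(2):
            p_correct = 1 - slip if knows[i] else guess
            responses[i].append(int(rng.random() < p_correct))
        # Learn independently.
        knows = [k or rng.random() < learn for k in knows]
        # Interact with partner, using both states from before the interaction.
        before = list(knows)
        for i in range(2):
            if not before[i] and before[1 - i] and rng.random() < teach:
                knows[i] = True
    return responses

rng = random.Random(0)
pairs = [
    simulate_pair(5, prior=0.2, learn=0.09, guess=0.14, slip=0.09, teach=0.9, rng=rng)
    for _ in range(100)
]
print(pairs[0])
```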
Results: Preliminary simulations ◦4-parameter system (learn, guess, slip, teach) ◦forget, mislead, prior fixed at zero ◦For small teach values (e.g. 0.05), teach converges to zero ◦Yields nontrivial solutions for large teach values, but other parameters absorb some of the teach: ◦learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 100 students → learn=0.1586, guess=0.1648, slip=0.0856, teach= ◦learn=0.0900, guess=0.1400, slip=0.0900, teach=0.9000, 1000 students → learn=0.1643, guess=0.1940, slip=0.1102, teach=0.7225
Results: Preliminary simulations ◦4-parameter system (learn, guess, slip, teach) with students and high teach ◦prior=0.0000, learn=0.0900, guess=0.1400, slip=0.0900, teach= → prior=0.2184, learn=0.0841, guess=0.1239, slip=0.2658, teach= ◦prior and slip have high error, but learn, guess, and teach estimates are good ◦teach accuracy increases dramatically with sample size
Possible solutions ◦Answer items between independent learning and interaction (more observed data) ◦Mentor/mentee model: knowledge flows in only one direction ◦Eliminate different parameters, or combine parameters to create lower-dimensional space
Future work ◦Determine whether interaction model produces better predictions on synthetic data ◦Gather real-world pair interaction data using MOOCchat tool ◦Determine whether pair interaction produces better predictions ◦Typical values, appropriate interpretations for teach and mislead parameters? ◦Generalize to more complex interactions