
1 Ground Truth for Behavior Detection Week 3 Video 1

2 Welcome to Week 3
• Over the last two weeks, we’ve discussed prediction models
• This week, we focus on a type of prediction model called behavior detectors

3 Behavior Detectors
• Automated models that can infer from log files whether a student is behaving in a certain way
• We discussed examples of this (the off-task behavior and gaming detectors) in the San Pedro et al. case study in week 1

4 Behaviors people have detected

5 Disengaged Behaviors
• Gaming the System (Baker et al., 2004, 2008, 2010; Cheng & Vassileva, 2005; Walonoski & Heffernan, 2006; Beal, Qu, & Lee, 2007)
• Off-Task Behavior (Baker, 2007; Cetintas et al., 2010)
• Carelessness (San Pedro et al., 2011; Hershkovitz et al., 2011)
• WTF Behavior (Rowe et al., 2009; Wixon et al., 2012)

6 Meta-Cognitive Behaviors
• Help Avoidance (Aleven et al., 2004, 2006)
• Unscaffolded Self-Explanation (Shih et al., 2008; Baker, Gowda, & Corbett, 2011)
• Exploration Behaviors (Amershi & Conati, 2009)

7 If you’re not interested in behavior detectors…
• Feel free to take the week off and rejoin us in a week
• Although there may still be stuff that’s interesting to you this week

8 Ground Truth
• The first big issue with developing behavior detectors is…
• Where do you get the prediction labels?

9 Behavior Labels are Noisy
• No perfect way to get indicators of student behavior
• It’s not truth; it’s ground truth
• (Truthiness?)

10 Behavior Labels are Noisy
• Another way to think of it:
• In some areas, there are “gold-standard” measures (as good as gold)
• With behavior detection, we have to work with “bronze-standard” measures (gold’s less expensive cousin)

11 Is this a problem?
• Not really
• It does limit how good we can realistically expect our models to be
• If your training labels have inter-rater agreement of Kappa = 0.62, you probably should not expect (or want) your detector to have Kappa = 0.75
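To ground what a number like Kappa = 0.62 means, here is a minimal sketch of Cohen’s Kappa computed over two coders’ labels. This is my illustration, not the lecture’s material, and the coder data is made up:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa: agreement between two coders, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of observations labeled identically
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance both coders pick the same label at random,
    # given each coder's own label frequencies
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes from two observers ("G" = gaming, "N" = not gaming)
coder1 = ["G", "N", "N", "G", "N", "N", "G", "N"]
coder2 = ["G", "N", "G", "G", "N", "N", "N", "N"]
print(cohen_kappa(coder1, coder2))  # about 0.47 for this toy example
```

Kappa is 0 when agreement is no better than chance and 1 when agreement is perfect, which is why it is used here instead of raw percent agreement.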

12 Ultimately
• “Anything worth doing is worth doing badly.” – Herb Simon
• [Picture taken from CMU tribute page]

13 Sources of Ground Truth for Behavior
• Self-report
• Field observations
• Text replays
• Video coding

14 Self-report
• Fairly common for constructs like affect and self-efficacy
• Not as common for labeling behavior

15 Are you gaming the system right now?
• Yes
• No

16 Are you gaming the system right now?
• Yes
• No
• (This item was recommended to me by an IRB compliance officer in 2003)

17 Field Observations
• One or more observers watch students and take systematic notes on student behavior
• Takes some training to do right (Ocumpaugh et al., 2012)
• http://www.columbia.edu/~rsb2162/bromp.html
• Free Android app developed by my group (Baker et al., 2011)
• http://www.columbia.edu/~rsb2162/HART8.6.zip

18 Text replays
• Pretty-prints of student interaction behavior from the logs
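As a rough sketch of what such a pretty-print might look like (the log schema and clip below are hypothetical, not from the lecture’s systems):

```python
# Hypothetical log schema: each row is (seconds_elapsed, action, problem_step, outcome)
clip = [
    (0.0, "ATTEMPT", "percent-of-90", "WRONG"),
    (1.2, "ATTEMPT", "percent-of-90", "WRONG"),
    (2.1, "HELP",    "percent-of-90", "HINT-1"),
    (2.9, "HELP",    "percent-of-90", "HINT-2"),
    (3.4, "ATTEMPT", "percent-of-90", "RIGHT"),
]

def text_replay(clip):
    """Pretty-print a clip of logged actions so a coder can label it quickly."""
    lines = []
    for t, action, step, outcome in clip:
        lines.append(f"{t:6.1f}s  {action:<8} {step:<16} {outcome}")
    return "\n".join(lines)

print(text_replay(clip))
# A coder reads the printout and assigns a label, e.g. "gaming" / "not gaming"
```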

19 Examples

20–22 [Screenshots of example text replays]

23 Major Advantages
• Blazing fast to conduct
• 8 to 40 seconds per observation

24 Notes
• Decent inter-rater reliability is possible
• Agrees with other measures of the same constructs
• Can be used to train behavior detectors (see the sketch below)
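As a hedged sketch of that last point, assuming hypothetical clip-level features and a standard classifier (neither is specified in the lecture):

```python
# Minimal sketch: training a gaming detector from text-replay labels.
# Features and model choice are illustrative assumptions, not the lecture's method.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# One row per labeled clip: [attempts_per_second, fraction_wrong, hints_used]
X = [
    [0.8, 0.9, 3],   # rapid wrong answers plus heavy hint use
    [0.1, 0.2, 0],
    [0.7, 0.8, 2],
    [0.2, 0.1, 1],
    [0.9, 1.0, 4],
    [0.1, 0.0, 0],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = coder labeled the clip as "gaming the system"

model = LogisticRegression()
# In practice you would cross-validate at the student level, not the clip level
scores = cross_val_score(model, X, y, cv=3)
print(scores.mean())
```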

25 Major Limitations
• Limited range of constructs you can code:
  • Gaming the System – yes
  • Collaboration in online chat – yes (Prata et al., 2008)
  • Frustration, Boredom – sometimes
  • Off-Task Behavior outside of software – no
  • Collaborative Behavior outside of software – no

26 Major Limitations
• Lower precision (because of the lower bandwidth of observation)

27 Video Coding
• Can be videos of live behavior in classrooms or screen-replay videos
• Slowest method
• Replicable and precise
• Some challenges in camera positioning
• If you don’t get this right, it can be difficult to code your data

28 Inter-Rater Reliability
• Good to check for inter-rater reliability when using expert coding
• Researchers typically shoot for Kappa = 0.6 or higher for the expert coding
• But ignore magic numbers… any real signal is something you can try to re-capture
• I’d rather have 1,000 data points with Kappa = 0.5 than 100 data points with Kappa = 0.7

29 Once you have ground truth…
• You can build your detector!
• Once you’ve synchronized your ground truth to the log data…

30 Next Lecture
• Data Synchronization and Grain-Sizes

