Big Data, Education, and Society


1 Big Data, Education, and Society
March 28, 2018

2 Assignment 2
Any questions on assignment 2?
Remember, it is due tomorrow – post to the forum on your same thread as last week

3 Validating model generalizability
Will your predictive/inferential model work in the situation you want to use it in?

4 Different than statistical significance
You can have a hugely statistically significant result
But if it's drawn from a different population or context than the one you want to apply it in, it may be inapplicable

5 Over-fitting
Your model fits to noise rather than signal
Your model fits to features of your current data set rather than the broader set of contexts where you want to apply it
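
The gap between fitting noise and fitting signal can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the data are synthetic, the single feature is deliberately unrelated to the label, and the "model" is a one-nearest-neighbour lookup that simply memorizes the training rows.

```python
import random

rng = random.Random(0)

def make_rows(n):
    # Synthetic data: the feature is random and carries no information about the label.
    return [(rng.random(), rng.randint(0, 1)) for _ in range(n)]

train = make_rows(50)
test = make_rows(200)

def predict(x):
    # One-nearest-neighbour: copy the label of the closest training feature.
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

train_acc = accuracy(train)  # each training row is its own nearest neighbour
test_acc = accuracy(test)    # new rows: there is nothing real to generalize

print("training accuracy:", train_acc)
print("test accuracy:", test_acc)
```

Because there is no signal to learn, the perfect training accuracy comes entirely from memorized noise, and accuracy on new rows should hover around chance.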

6 Training-test split
Building your model on some data, testing on other data
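
A minimal sketch of a training-test split, using only the Python standard library. The function name `split_train_test`, the 25% test fraction, and the toy rows are assumptions chosen for illustration, not part of the lecture.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))  # stand-in for 20 data points
train, test = split_train_test(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The model would be built on `train` only and evaluated on `test` only, so no row is used for both.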

7 Cross-validation
Repeatedly building your model on some data, testing on other data
4-fold
A, B, C -> D
A, B, D -> C
A, C, D -> B
B, C, D -> A
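
The 4-fold rotation can be sketched as a short generator. The function name `k_fold_splits` is an assumption for illustration; the fold labels A to D follow the slide.

```python
def k_fold_splits(folds):
    """Yield (training_folds, test_fold) pairs, holding out each fold exactly once."""
    for i, held_out in enumerate(folds):
        training = folds[:i] + folds[i + 1:]
        yield training, held_out

splits = list(k_fold_splits(["A", "B", "C", "D"]))
for training, held_out in splits:
    print(", ".join(training), "->", held_out)
```

This produces the same four train/test pairings as the slide, starting with fold A held out rather than fold D.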

8 Common mistake 8 years ago
Multiple data points for the same student
Divide those data points into different folds
Same student is in both training and test set
Why is this a problem?

9 Common mistake 8 years ago
Multiple data points for the same student
Divide those data points into different folds
Same student is in both training and test set
Why is this a problem?
Usually addressed now through student-level cross-validation
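
One way to sketch student-level cross-validation is to assign whole students, rather than individual data points, to folds. When the same student sits in both training and test sets, the model can be rewarded for recognizing that student, which tends to overstate how well it will work for new students. The record layout (`student`, `obs`) and the function name below are assumptions for illustration.

```python
import random
from collections import defaultdict

def student_level_folds(records, k=4, seed=0):
    """Assign whole students (not individual rows) to k folds."""
    students = sorted({r["student"] for r in records})
    random.Random(seed).shuffle(students)
    fold_of = {s: i % k for i, s in enumerate(students)}
    folds = defaultdict(list)
    for r in records:
        # Every row follows its student into that student's fold.
        folds[fold_of[r["student"]]].append(r)
    return [folds[i] for i in range(k)]

# Hypothetical data: 8 students with 3 data points each.
records = [{"student": f"s{n}", "obs": j} for n in range(8) for j in range(3)]
folds = student_level_folds(records)

for i, fold in enumerate(folds):
    print("fold", i, sorted({r["student"] for r in fold}))
```

Each student's rows land in exactly one fold, so no student can appear in both the training and the test data of any split.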

10 Cross-group validation (Ocumpaugh et al., 2014)
Train on N groups, test on 1 group
Example: Train on Urban and Suburban students, test on Rural students
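
Cross-group validation can be sketched as holding out one whole group at a time. This is an illustrative sketch, not the procedure from Ocumpaugh et al. (2014); the field name `locale`, the function name, and the toy records are assumptions.

```python
def cross_group_splits(records, group_key="locale"):
    """Yield (held_out_group, train_rows, test_rows), holding out each group once."""
    groups = sorted({r[group_key] for r in records})
    for held_out in groups:
        train = [r for r in records if r[group_key] != held_out]
        test = [r for r in records if r[group_key] == held_out]
        yield held_out, train, test

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"locale": "urban", "student": "s1"},
    {"locale": "urban", "student": "s2"},
    {"locale": "suburban", "student": "s3"},
    {"locale": "rural", "student": "s4"},
    {"locale": "rural", "student": "s5"},
]

splits = {held_out: (train, test)
          for held_out, train, test in cross_group_splits(records)}
rural_train, rural_test = splits["rural"]
print(sorted({r["locale"] for r in rural_train}), "->", "rural")
```

In the rural split the model never sees a rural student during training, which is what makes the test a check on transfer to an unseen group.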

11 All-group validation
Train on all groups, test on held-out set from all groups
Check performance on each group
Example: Train on Urban, Suburban, Rural
Test on new Urban, Test on new Suburban, Test on new Rural
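
The "check performance on each group" step can be sketched as follows. The held-out rows and their predictions are made up for illustration, and accuracy is used only as a stand-in for whatever metric the model is actually evaluated with.

```python
from collections import defaultdict

def accuracy_by_group(rows):
    """rows: (group, actual_label, predicted_label) tuples from a held-out test set."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, actual, predicted in rows:
        totals[group] += 1
        hits[group] += int(actual == predicted)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical held-out predictions from a model trained on all three groups.
held_out = [
    ("urban", 1, 1), ("urban", 0, 0),
    ("suburban", 1, 1), ("suburban", 0, 1),
    ("rural", 1, 0), ("rural", 0, 1),
]

per_group = accuracy_by_group(held_out)
overall = sum(actual == predicted for _, actual, predicted in held_out) / len(held_out)

print("overall:", overall)
for group, acc in per_group.items():
    print(group, acc)
```

In this toy data a single overall number hides the fact that the model is right for every urban student and wrong for every rural one, which is the reason for reporting each group separately.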

12 Why…
Cross-group instead of all-group?
All-group instead of cross-group?

13 What are some groups…
It might make sense to split by during validation?

14 Of course…
Testing across all these groups requires having enough data for all of them!
Or indeed, any data at all

15 The perniciousness of convenience samples
Much easier to collect data for suburban middle-class students than for other groups in the USA

16 Questions? Comments?

17 Contextual cross-validation
Easy example is lessons in tutors or levels in games (Baker et al., 2008; Karumbaiah et al., under review)

18 Contextual cross-validation
Easy example is lessons in tutors or levels in games (Baker et al., 2008; Karumbaiah et al., under review)
What are some other examples of contexts to validate across?

19 Far generalizability
Generalizability across learning systems – Paquette’s work last week

20 Important Consideration
Where do you want to be able to use your model?
New students?
New schools?
New populations?
New software content?

21 Common Practice
Different model for every school or university
Test for overall performance
No attention to how well it captures performance on subgroups within school or university
Is this good enough?
If not, what is a rational and affordable alternative?

22 Politics can get in the way
Limitations on demographic data
Limitations on IEP data
Limitations on data, period

23 Questions? Comments?

24 Debate
Imagine that we are developing a model to predict whether a teenager will engage in school violence

25 Debate
Is it best to have a model that is:
Moderately accurate for all students
Very accurate for most groups of students, but very inaccurate for one group (10%) of students
Very accurate for 100% of students, but has different models for different students (i.e., the same behavior is punished for some students but not others)

26 Questions? Comments?

27 Upcoming office hours
April 4, 9:30am-10:30am, or by appointment

