Special Topics in Educational Data Mining HUDK5199 Spring term, 2013 March 7, 2013.


Today’s Class Regression in Prediction

There is something you want to predict (“the label”) The thing you want to predict is numerical
– Number of hints student requests
– How long student takes to answer
– What will the student’s test score be

Regression in Prediction A model that predicts a number is called a regressor in data mining The overall task is called regression

Regression Associated with each label is a set of “features”, which you may be able to use to predict the label
[Example data table: columns Skill, pknow, time, totalactions, numhints; rows with skills such as ENTERINGGIVEN, USEDIFFNUM, REMOVECOEFF, …]

Regression The basic idea of regression is to determine which features, in which combination, can predict the label’s value
[Same example data table: columns Skill, pknow, time, totalactions, numhints]

Linear Regression The most classic form of regression is linear regression There are courses called “regression” at a lot of universities that don’t go beyond linear regression

Linear Regression The most classic form of regression is linear regression
Numhints = 0.12*Pknow + …*Time – 0.11*Totalactions
[Example row to predict: Skill = COMPUTESLOPE, with pknow, time, and totalactions given; numhints = ?]

Linear Regression Linear regression only fits linear functions (except when you apply transforms to the input variables, which most statistics and data mining packages can do for you…)

Non-linear inputs What kinds of functions could you fit with transformed inputs?
Y = X²
Y = X³
Y = sqrt(X)
Y = 1/X
Y = sin(X)
Y = ln(X)
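As a concrete sketch of the transform trick (a made-up example, not from the slides): ordinary least squares can fit Y = 3X² + 1 exactly once X² is supplied as an input column, because the model is still linear in its coefficients.

```python
import numpy as np

# Made-up example: the true relationship is non-linear (y = 3x^2 + 1),
# but linear regression can fit it once we transform the input.
x = np.linspace(1, 10, 50)
y = 3 * x**2 + 1

# Design matrix with transformed inputs; the model is still linear
# *in the coefficients*.
X = np.column_stack([np.ones_like(x), x**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coef)  # ≈ [1.0, 3.0]: intercept, then the x^2 coefficient
```

Most statistics and data mining packages apply such transforms for you; this just shows there is no magic involved.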

Linear Regression However… It is blazing fast It is often more accurate than more complex models, particularly once you cross-validate – Caruana & Niculescu-Mizil (2006) It is feasible to understand your model (with the caveat that each coefficient must be interpreted in the context of the features already in the model)

Example of Caveat Let’s study a classic example

Example of Caveat Let’s study a classic example Drinking too much prune nog at a party, and having to make an emergency trip to the Little Researcher’s Room

Data

Some people are resistant to the deleterious effects of prunes and can safely enjoy high quantities of prune nog!

Learned Function Probability of “emergency” = 0.25 * (Drinks of nog in last 3 hours) – … * (Drinks of nog in last 3 hours)² But does that actually mean that (Drinks of nog in last 3 hours)² is associated with fewer “emergencies”?

Learned Function Probability of “emergency” = 0.25 * (Drinks of nog in last 3 hours) – … * (Drinks of nog in last 3 hours)² But does that actually mean that (Drinks of nog in last 3 hours)² is associated with fewer “emergencies”? No!

Example of Caveat (Drinks of nog in last 3 hours)² is actually positively correlated with emergencies! – r = 0.59

Example of Caveat The relationship is only in the negative direction when (Drinks of nog last 3 hours) is already in the model…
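A quick synthetic demonstration of this caveat (all numbers invented for illustration): the squared term correlates positively with the outcome by itself, yet its coefficient comes out negative once the linear term shares the model.

```python
import numpy as np

# Synthetic data (invented for illustration): emergencies rise with
# drinks but with a diminishing effect, so the true model has a small
# negative coefficient on the squared term.
rng = np.random.default_rng(0)
drinks = rng.uniform(0, 10, 200)
p_emergency = 0.25 * drinks - 0.01 * drinks**2 + rng.normal(0, 0.1, 200)

# On its own, the squared term is *positively* correlated with the outcome
r = np.corrcoef(drinks**2, p_emergency)[0, 1]

# With plain drinks already in the model, its coefficient is *negative*
X = np.column_stack([np.ones_like(drinks), drinks, drinks**2])
coef, *_ = np.linalg.lstsq(X, p_emergency, rcond=None)

print(r > 0, coef[2] < 0)  # → True True
```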

Example of Caveat So be careful when interpreting linear regression models (or almost any other type of model)

Comments? Questions?

Regression Trees

Regression Trees (non-linear; RepTree)
If X > 3: Y = 2
Else, if X < -7: Y = 4
Else: Y = 3

Linear Regression Trees (linear; M5’)
If X > 3: Y = 2A + 3B
Else, if X < -7: Y = 2A – 3B
Else: Y = 2A + 0.5B + C
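The two tree slides above translate directly into code (rule values copied from the slides; A, B, C stand in for arbitrary feature values): a regression tree is just nested if/else with a prediction at each leaf.

```python
def reptree_predict(x):
    # RepTree: a constant prediction at each leaf
    if x > 3:
        return 2
    elif x < -7:
        return 4
    return 3

def m5_predict(x, a, b, c):
    # M5': a linear model at each leaf instead of a constant
    if x > 3:
        return 2*a + 3*b
    elif x < -7:
        return 2*a - 3*b
    return 2*a + 0.5*b + c

print(reptree_predict(0), m5_predict(0, 1, 2, 3))  # → 3 6.0
```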

Create a Linear Regression Tree to Predict Emergencies

Model Selection in Linear Regression Greedy M5’ None

Neural Networks Another popular form of regression is neural networks (also called Multilayer Perceptron) This image courtesy of Andrew W. Moore, Google

Neural Networks Neural networks can fit more complex functions than linear regression It is usually nearly impossible to understand what the heck is going on inside one
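For intuition about why, here is a toy forward pass (the weights are arbitrary, purely illustrative): a multilayer perceptron regressor nests a linear map inside a non-linearity inside another linear map, and interpreting the learned weights means unwinding that nesting.

```python
import numpy as np

# Toy one-hidden-layer perceptron regressor with hand-picked weights.
def mlp_predict(x, W1, b1, w2, b2):
    hidden = np.tanh(W1 @ x + b1)   # non-linear hidden layer
    return float(w2 @ hidden + b2)  # linear output for regression

W1 = np.array([[0.5, -1.0], [1.2, 0.3]])  # arbitrary hidden-layer weights
b1 = np.array([0.1, -0.2])
w2 = np.array([2.0, -1.5])                # arbitrary output weights
y = mlp_predict(np.array([1.0, 2.0]), W1, b1, w2, 0.4)
print(round(y, 2))  # → -2.75
```

No single weight here means anything on its own, which is exactly the interpretability problem the slide describes.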

Soller & Stevens (2007)

Neural Network at the MOMA

In fact The difficulty of interpreting non-linear models is so well known, that they put up a sign about it on the Belt Parkway

And of course… There are lots of fancy regressors in data mining packages like RapidMiner
Support Vector Machines
Poisson Regression
LOESS Regression (“locally weighted scatterplot smoothing”)
Regularization-based Regression (forces parameters towards zero)
– LASSO Regression (“least absolute shrinkage and selection operator”)
– Ridge Regression
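As a sketch of what the regularization-based variants do (illustrative data and a hand-rolled closed-form ridge, not RapidMiner's implementation): the penalty term shrinks the coefficients toward zero.

```python
import numpy as np

# Closed-form ridge regression: w = (X'X + lambda*I)^(-1) X'y.
# The lambda*I penalty term forces the weights toward zero.
def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 100)

w_small = ridge(X, y, 0.01)   # barely regularized: near the true weights
w_big = ridge(X, y, 1000.0)   # heavily regularized: forced toward zero
print(np.abs(w_big).sum() < np.abs(w_small).sum())  # → True
```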

Assignment 5 Let’s discuss your solutions to assignment 5

How can you tell if a regression model is any good?

Correlation / r²
RMSE / MAD
What are the advantages/disadvantages of each?
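A minimal sketch of computing these metrics on made-up predictions:

```python
import numpy as np

# Made-up actual and predicted values for four students.
actual = np.array([3.0, 5.0, 2.0, 8.0])
pred = np.array([2.5, 5.5, 2.0, 7.0])

r = np.corrcoef(actual, pred)[0, 1]           # correlation
r2 = r**2                                     # variance explained
rmse = np.sqrt(np.mean((actual - pred)**2))   # Root Mean Squared Error
mad = np.mean(np.abs(actual - pred))          # Mean Absolute Deviation

# RMSE punishes large errors more than MAD does (note the 1.0 error here)
print(round(rmse, 3), round(mad, 3))  # → 0.612 0.5
```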

Cross-validation concerns The same as for classifiers

Statistical Significance Testing F test/t test But make sure to take non-independence into account! – Using a student term (but note, your regressor itself should not predict using student as a variable… unless you want it to only work in your original population)

As before… You want to make sure to account for the non- independence between students when you test significance An F test is fine, just include a student term (but note, your regressor itself should not predict using student as a variable… unless you want it to only work in your original population)

Alternatives Bayesian Information Criterion Akaike Information Criterion Both make a trade-off between goodness of fit and flexibility of fit (number of parameters) Said to be statistically equivalent to cross-validation – May be preferable for some audiences
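Both criteria can be sketched with their standard Gaussian-error formulas (an assumption on my part; the slides do not give formulas): each adds a parameter-count penalty to a goodness-of-fit term.

```python
import numpy as np

# Standard Gaussian-error forms: AIC = n*ln(RSS/n) + 2k and
# BIC = n*ln(RSS/n) + k*ln(n), where k is the number of parameters.
def aic_bic(actual, pred, k):
    actual, pred = np.asarray(actual), np.asarray(pred)
    n = len(actual)
    rss = float(np.sum((actual - pred)**2))
    gof = n * np.log(rss / n)   # goodness-of-fit term
    return gof + 2 * k, gof + k * np.log(n)

actual = [3.0, 5.0, 2.0, 8.0, 6.0, 1.0, 4.0, 7.0]
pred = [2.5, 5.5, 2.0, 7.0, 6.5, 1.5, 4.0, 6.5]

aic2, bic2 = aic_bic(actual, pred, k=2)
aic5, bic5 = aic_bic(actual, pred, k=5)
# The same fit with more parameters scores worse (higher) on both criteria
print(aic5 > aic2, bic5 > bic2)  # → True True
```

For n > 7 (as here, n = 8), ln(n) > 2, so BIC penalizes extra parameters harder than AIC does.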

Questions? Comments?

Asgn. 7

Next Class Wednesday, March 13 Imputation in Prediction Readings Schafer, J.L., Graham, J.W. (2002) Missing Data: Our View of the State of the Art. Psychological Methods, 7(2). Assignments Due: None

The End