Educational Data Mining Overview
John Stamper
PSLC Summer School 2011, 7/25/2011

Presentation transcript:

Welcome to the EDM track!

Educational Data Mining
“Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.”

Classes of EDM Method (Baker & Yacef, 2009)
Prediction
Clustering
Relationship Mining
Discovery with Models
Distillation of Data for Human Judgment

Prediction
Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables)
Does a student know a skill?
Which students are off-task?
Which students will fail the class?
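
As one illustration of a prediction model, here is a minimal sketch of logistic regression fit by gradient descent. The two predictor variables (error rate, hint requests per step), the data, and the function names are hypothetical and chosen only for illustration; the slides do not prescribe a particular algorithm.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability of the predicted variable being 1; weights[0] is the bias."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grads[0] += err
            for j, v in enumerate(x):
                grads[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

# Hypothetical data: [error rate, hint requests per step] -> 1 if the student failed
X = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.2], [0.7, 0.9], [0.8, 0.6], [0.6, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w = fit_logistic(X, y)
at_risk = predict_proba(w, [0.75, 0.8])    # resembles the failing students
on_track = predict_proba(w, [0.1, 0.05])   # resembles the passing students
```

The fitted model returns a probability, so the choice of cut-off (0.5 or otherwise) is left to the analyst.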

Clustering
Find points that naturally group together, splitting full data set into set of clusters
Usually used when nothing is known about the structure of the data
– What behaviors are prominent in domain?
– What are the main groups of students?
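
One way to find such groups is k-means, sketched below. This is an illustrative choice, not a method named on the slide; the student data and the hand-picked starting centroids are hypothetical, and in practice features would usually be rescaled first so that one variable does not dominate the distance.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical students: (average seconds per step, hint requests per step)
students = [(5, 0.1), (6, 0.2), (4, 0.15), (30, 0.8), (28, 0.7), (32, 0.9)]
centroids, clusters = kmeans(students, centroids=[students[0], students[3]])
```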

Relationship Mining
Discover relationships between variables in a data set with many variables
– Association rule mining
– Correlation mining
– Sequential pattern mining
– Causal data mining
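
For association rule mining, the two basic quantities are the support of an itemset and the confidence of a rule. The sketch below computes both; the session labels (hint_abuse, low_gain, and so on) are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical behavior labels observed in five student sessions
sessions = [
    {"hint_abuse", "low_gain"},
    {"hint_abuse", "low_gain"},
    {"hint_abuse"},
    {"off_task", "low_gain"},
    {"steady_work"},
]
rule_support = support(sessions, {"hint_abuse", "low_gain"})          # 2 of 5 sessions
rule_confidence = confidence(sessions, {"hint_abuse"}, {"low_gain"})  # 2 of the 3 hint_abuse sessions
```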

Discovery with Models
Pre-existing model (developed with EDM prediction methods… or clustering… or knowledge engineering)
Applied to data and used as a component in another analysis

Distillation of Data for Human Judgment
Making complex data understandable by humans to leverage their judgment
Text replays are a simple example of this

Knowledge Engineering
Creating a model by hand rather than automatically fitting a model
In one comparison, leads to worse fit to gold-standard labels of construct of interest than data mining (Roll et al., 2005), but similar qualitative performance

EDM track schedule
Tuesday 10am
– Educational Data Mining with DataShop (Stamper)
Tuesday 11am
– Item Response Theory and Learning Factor Analysis (Koedinger)
Tuesday 2:15pm
– Principal Component Analysis, Additive Factor Model (Gordon)
Tuesday 3:15pm (optional)
– Hands-on Activity: Data Annotation for Classification
– Hands-on Activity: Learning Curves and Logistic Regression in R

EDM track schedule
Wednesday 11am
– Bayesian Knowledge Tracing; Prediction Models
Wednesday 11:45am (optional)
– Hands-on activity: Prediction modeling
Wednesday 3:15pm
– Machine Learning and SimStudent (Matsuda)

Comments? Questions?

EDM Tools

PSLC DataShop
Many large-scale datasets
Tools for
– exploratory data analysis
– learning curves
– domain model testing
Detail tomorrow morning

Microsoft Excel
Excellent tool for exploratory data analysis, and for setting up simple models

Pivot Tables

Pivot Tables
Who has used pivot tables before?

Pivot Tables
What do they allow you to do?

Pivot Tables
Facilitate aggregating data for comparison or use in further analyses
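
The same kind of aggregation can be sketched outside Excel. The example below builds a small pivot (rows: student, columns: skill, cells: mean correctness) from a log; the column names and data are hypothetical.

```python
from collections import defaultdict

def pivot_mean(rows, row_key, col_key, value_key):
    """Return {row: {col: mean of value}} over a list of dict records."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for r in rows:
        sums[r[row_key]][r[col_key]] += r[value_key]
        counts[r[row_key]][r[col_key]] += 1
    return {
        rk: {ck: sums[rk][ck] / counts[rk][ck] for ck in cols}
        for rk, cols in sums.items()
    }

# Hypothetical tutor log: one record per attempt
log = [
    {"student": "s1", "skill": "fractions", "correct": 0},
    {"student": "s1", "skill": "fractions", "correct": 1},
    {"student": "s1", "skill": "decimals", "correct": 1},
    {"student": "s2", "skill": "fractions", "correct": 1},
]
table = pivot_mean(log, "student", "skill", "correct")
```

Here `table["s1"]["fractions"]` is the mean of that student's two attempts on the skill.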

Excel Add-ins
Data Analysis
– Statistical measures
– T-tests, ANOVA, etc.
Equation Solver
– Allows you to fit mathematical models in Excel
– Simple regression models

Suite of visualizations
Scatterplots (with or without lines)
Bar graphs

Free data mining packages
Weka
RapidMiner

Weka vs. RapidMiner
Weka is easier to use than RapidMiner
RapidMiner is significantly more powerful and flexible (from the GUI; both are powerful and flexible if accessed via API)

In particular…
It is impossible to do key types of model validation for EDM within Weka’s GUI
RapidMiner can be kludged into doing so (more on this in the hands-on session Wed)
No tool is really tailored to the needs of EDM researchers at the current time…

SPSS
SPSS is a statistical package, and therefore can do a wide variety of statistical tests
It can also do some forms of data mining, like factor analysis (a relative of clustering)

SPSS
The difference between statistical packages (like SPSS) and data mining packages (like RapidMiner and Weka) is:
– Statistics packages are focused on finding models and relationships that are statistically significant (e.g. the data would be seen less than 5% of the time if the model were not true)
– Data mining packages set a lower bar: are the models accurate and generalizable?
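
Generalizability is usually checked by testing a model on data it was not fit to, for example with cross-validation. The sketch below builds folds that hold out whole students, so the test rows always come from students the model has not seen. Holding out at the student level is an assumption made here for illustration; the slide does not say which form of validation is meant, and the function and field names are hypothetical.

```python
def student_level_folds(rows, k):
    """Split rows into k (train, test) pairs; no student appears in both halves."""
    students = sorted({r["student"] for r in rows})
    folds = []
    for i in range(k):
        held_out = set(students[i::k])  # every k-th student goes to this fold's test set
        train = [r for r in rows if r["student"] not in held_out]
        test = [r for r in rows if r["student"] in held_out]
        folds.append((train, test))
    return folds

# Hypothetical log: four students, two actions each
rows = [{"student": f"s{n}", "correct": n % 2} for n in range(4) for _ in range(2)]
folds = student_level_folds(rows, k=2)
```

A model would then be fit on each `train` list and scored on the matching `test` list, with the scores averaged across folds.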

R
R is an open-source competitor to SPSS
More powerful and flexible than SPSS
But much harder to use
– I find it easy to accidentally do very, very incorrect things in R

Matlab
A powerful tool for building complex mathematical models
Beck and Chang’s Bayes Net Toolkit – Student Modeling is built in Matlab

Comments? Questions?

Pre-processing
Where does EDM data come from?

Wherever you get your data from
You’ll need to process it into a form that software can easily analyze, and which builds successful models

Common approach
Flat data file
– Even if you store your data in databases, most data mining techniques require a flat data file
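
A minimal sketch of producing a flat file: nested per-student records are flattened to one row per action and written as CSV. The record layout and column names are hypothetical; real data would come from a database or log export.

```python
import csv
import io

def flatten(students):
    """Yield one flat row per action from nested per-student records."""
    for s in students:
        for a in s["actions"]:
            yield {
                "student": s["id"],
                "skill": a["skill"],
                "correct": a["correct"],
                "seconds": a["seconds"],
            }

def to_csv(rows, fieldnames):
    """Serialize rows to CSV text (written to a string here instead of a file)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical nested data, as it might come out of a database query
nested = [
    {"id": "s1", "actions": [{"skill": "add", "correct": 1, "seconds": 4.2},
                             {"skill": "sub", "correct": 0, "seconds": 9.0}]},
    {"id": "s2", "actions": [{"skill": "add", "correct": 1, "seconds": 3.1}]},
]
flat_csv = to_csv(flatten(nested), ["student", "skill", "correct", "seconds"])
```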

Some useful features to distill for educational software
Type of interface widget
“Pknow”: The probability that the student knew the skill before answering (using Bayesian Knowledge-Tracing or PFA or your favorite approach)
Assessment of the progress the student is making towards the correct answer (how many fewer constraints violated)
Whether this action is the first time a student attempts a given problem step
“Optoprac”: How many problem steps involving this skill that the student has encountered
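
Three of these features can be distilled in one pass over a student's log, as sketched below with the standard Bayesian Knowledge Tracing update. The parameter values, field names, the choice to update the estimate only on first attempts, and counting the current step in optoprac are all assumptions made for illustration.

```python
def distill(actions, p_init=0.3, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """actions: one student's log in time order, each a dict with
    'step', 'skill', and 'correct' (bool). Returns one feature dict per action."""
    p_know = {}        # skill -> current estimate that the skill is known
    steps_seen = {}    # skill -> set of problem steps encountered so far
    attempted = set()  # problem steps already attempted
    features = []
    for a in actions:
        skill, step = a["skill"], a["step"]
        prior = p_know.get(skill, p_init)
        first = step not in attempted
        attempted.add(step)
        steps_seen.setdefault(skill, set()).add(step)
        features.append({
            "pknow": prior,                      # estimate before this answer
            "first_attempt": first,
            "optoprac": len(steps_seen[skill]),  # includes the current step (assumption)
        })
        if first:  # assumption: only first attempts update the estimate
            if a["correct"]:
                post = prior * (1 - p_slip) / (prior * (1 - p_slip) + (1 - prior) * p_guess)
            else:
                post = prior * p_slip / (prior * p_slip + (1 - prior) * (1 - p_guess))
            p_know[skill] = post + (1 - post) * p_learn
    return features

# Hypothetical log for one student
log = [
    {"step": "p1s1", "skill": "add", "correct": True},
    {"step": "p1s2", "skill": "add", "correct": False},
    {"step": "p1s2", "skill": "add", "correct": True},
]
feats = distill(log)
```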

Some useful features to distill for educational software
“timeSD”: time taken in terms of standard deviations above (+) or below (-) average for this skill across all actions and students
“time3SD”: sum of timeSD for the last 3 actions (or 5, or 4, etc.)
Action type counts or percents
– Total number of actions so far
– Total number of actions on this skill, divided by optoprac
– Number of actions in last N actions
– Could be assessment of action (wrong, right), or type of action (help request, making hypothesis, plotting point)
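
A sketch of the two timing features follows. It assumes rows are in time order, uses the population standard deviation, and counts the current action among "the last 3"; none of these details are stated on the slide, and the field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def add_time_features(rows, window=3):
    """Add timeSD and time3SD to each row (rows assumed to be in time order)."""
    by_skill = defaultdict(list)
    for r in rows:
        by_skill[r["skill"]].append(r["seconds"])
    # Per-skill mean and SD across all actions and students
    stats = {s: (mean(v), pstdev(v)) for s, v in by_skill.items()}
    recent = defaultdict(list)  # student -> timeSD values so far
    out = []
    for r in rows:
        m, sd = stats[r["skill"]]
        time_sd = (r["seconds"] - m) / sd if sd > 0 else 0.0
        recent[r["student"]].append(time_sd)
        out.append({**r,
                    "timeSD": time_sd,
                    "time3SD": sum(recent[r["student"]][-window:])})
    return out

# Hypothetical log: three actions on one skill
rows = [
    {"student": "s1", "skill": "add", "seconds": 10.0},
    {"student": "s1", "skill": "add", "seconds": 30.0},
    {"student": "s2", "skill": "add", "seconds": 20.0},
]
timed = add_time_features(rows)
```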

Any other recommendations?

Code Available
Ryan Baker has code available for EDM
– Distilling DataShop data
– Bayesian Knowledge Tracing

Comments? Questions?

Time to work on projects