Feature Engineering Studio Special Session January 26, 2015.


Assignment One: Problem Proposal (due next Monday)
Be ready to talk for 3 minutes on:
- A data set
  - Say where it came from and how big it is
  - You need to already have this data set, or be able to acquire it in the next two weeks
- A prediction (or other statistical) model you will build on this data set
- What variable will you predict?
- What kind of variables will you use to predict it?
- Why is this worth doing?

Example (Pardos et al., 2014): Data set
- ASSISTments system, formative assessment and learning software for math used by 60k students a year (Razzaq et al., 2007)
- 810,000 data points from 229 students studied
- Student actions in the software have been overlaid with synchronized field codes of student affect (boredom, frustration, etc.)
  - 3075 field codes
  - Each field code connects to 20 seconds of log file actions
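One way to picture how a field code "connects to 20 seconds of log file actions" is the minimal sketch below. It is not the procedure from Pardos et al. (2014): the record types, the field names, and the choice to end the window at the moment of observation are all assumptions made for illustration.

```python
from dataclasses import dataclass

# From the slide: each field code connects to 20 seconds of log file actions.
CLIP_SECONDS = 20


@dataclass
class LogAction:
    """Hypothetical log record; the field names are illustrative."""
    student_id: str
    timestamp: float  # seconds since the session started (assumed unit)
    correct: bool


@dataclass
class FieldCode:
    """Hypothetical field observation of student affect."""
    student_id: str
    timestamp: float  # when the observer recorded the code
    affect: str       # e.g. "bored", "frustrated"


def clip_for(code, actions, clip_seconds=CLIP_SECONDS):
    """Return the student's actions in the window ending at the field code.

    Assumption: the window covers the clip_seconds before the observation.
    The slide does not say how the window is positioned.
    """
    start = code.timestamp - clip_seconds
    return [
        a for a in actions
        if a.student_id == code.student_id
        and start < a.timestamp <= code.timestamp
    ]


# Usage with made-up data
actions = [
    LogAction("s1", 5, True),
    LogAction("s1", 15, False),
    LogAction("s1", 30, True),
    LogAction("s2", 28, True),
]
code = FieldCode("s1", 32, "bored")
clip = clip_for(code, actions)
```

Each (field code, clip) pair then becomes one labeled example for the detector.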

Example (Pardos et al., 2014): Prediction
- We will predict whether a student is bored at a specific time
  - So that we can replicate the human judgments without needing a field observer
- We will predict this from what was going on in the log files at the time the field observation was made
  - We know every student action’s correctness, timing, relevant skill, and probability they knew the skill
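As a rough illustration of distilling features from the log actions around one observation, the sketch below summarizes the four kinds of information named above (correctness, timing, skill, probability of knowing the skill). The dictionary keys and the particular summaries are assumptions chosen for the example, not the feature set used by Pardos et al. (2014).

```python
from statistics import mean


def distill_features(clip):
    """Summarize one clip of log actions as a flat feature dictionary.

    clip: list of dicts with hypothetical keys
      'correct' (bool), 'seconds' (float), 'skill' (str), 'p_know' (float)
    Returns None for an empty clip, since nothing can be summarized.
    """
    if not clip:
        return None
    return {
        "pct_correct": mean(1.0 if a["correct"] else 0.0 for a in clip),
        "mean_seconds": mean(a["seconds"] for a in clip),
        "max_seconds": max(a["seconds"] for a in clip),
        "n_skills": len({a["skill"] for a in clip}),
        "mean_p_know": mean(a["p_know"] for a in clip),
    }


# Usage with made-up data: one clip plus its field code yields one training row
clip = [
    {"correct": True, "seconds": 4.0, "skill": "fractions", "p_know": 0.25},
    {"correct": False, "seconds": 8.0, "skill": "decimals", "p_know": 0.75},
]
row = distill_features(clip)
row["bored"] = True  # label taken from the field observer's code
```

Rows like this, one per field code, are what a classifier would be trained on.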

Example (Pardos et al., 2014): Why this is worth doing
- Boredom is known to predict student learning (Craig et al., 2004; Rodrigo et al., 2009; Pekrun et al., 2010)
- Building a detector will help us study boredom more thoroughly
- It will also enable us to intervene on boredom in real time

Important Considerations
- Is the problem genuinely important? (usable or publishable)
- Is there a good measure of ground truth? (the variable you want to predict)
- Do we have rich enough data to distill meaningful features?
- Is there enough data to be able to take advantage of data mining?

What concerns you?
- Data set
- What variable will you predict?
- What kind of variables will you use to predict it?

Data Set
- Who here has a data set, but has concerns about it?
- Who here doesn’t have a data set?

Online Learning
- ASSISTments (Neil Heffernan)
- Genetics Tutor (Albert Corbett)
- Impulse (Elizabeth Rowe)
- Inq-ITS (Janice Gobert)
- Physics Playground/Newton’s Playground (Val Shute)
- Refraction (Taylor Martin)
- Mathemantics (Herb Ginsburg)
- Vialogues (Gary Natriello, Hui Soo Chae)
- Project LISTEN (Jack Mostow)
- TC3-Sim (Robert Sottilare)

Online Learning
- Big Data and Education (me)
- Data, Analytics, and Learning (me)
- SQL-Tutor (Tanja Mitrovic)
- Project ARIES (Art Graesser)
- ALEKS (Xiangen Hu)
- Ecolab (Genaro Rebolledo-Mendez)
- Fractions Tutor (Vincent Aleven)
- Help Tutor (Ido Roll)
- InventionLab (Ido Roll)
- BlueJ (Matt Jadud)
- Aplusix (Jean-Francois Nicaud)
- Second Life (Bruce Homer)

Online Learning
- International use of Scatterplot Tutor (me)
- Zombie Division (Jake Habgood)
- Virtual Performance Assessments (Jody Clarke-Midura)
- EcoMUVE (Shari Metcalfe)
- Reasoning Mind (George Khachatryan)
- Chemistry Virtual Laboratory (David Yaron)
- Tuunu data (Fewof Mopfsan)

Potential Data Sources
- Grade data (Alex Bowers)
- Course-taking and dropout data (Cristobal Romero)
- BROMP data (me)
- Center for the Science of Learning Data (Krishna Srinivasan)

Procedure
- Pick a data set
- If I have it on hand, we talk right away
- If not, I broker a conversation

What variable will you predict?
- Something already directly labeled
  - Student was bored at 2:10:13 pm
- Something indirectly labeled
  - Student had 15% overall learning gain
- Something you can label with text replays
  - Student gamed the system while using the learning system

Let’s discuss specific data sets you guys are interested in

What kind of variables will you use as predictors?
- You don’t need to have specific ideas at this stage
- The main question is: do you have the right kind of data to be able to do this at all?

Questions? Concerns?