Utilizing “big Data” analytics for student success


1 Utilizing “big Data” analytics for student success
IACRAO, October 2016
Brent Drake, PhD
Chief Data Officer, Office of Institutional Research, Assessment, and Effectiveness

2 Student facing analytics history
Course Signals
- Course early warning system created in 2007
- Logistic regression predicting risk of failing a course, based on:
  - Course performance
  - Interaction with Blackboard (learning management system)
  - Prior academic history: high school GPA, standardized test scores
  - Demographics: residency, age, credits attempted
- Scoring model is run on demand by instructors
- Predicted risk score generates a stoplight signal on the student’s course home page
- Made hay with it, sold it to Ellucian, and promoted it, but:
  - Low adoption on campus because faculty have to set up a course in a specific way
  - Wanted to broaden our data sources
  - Wanted to build an infrastructure for big data analysis where we could collaborate with faculty
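To make the scoring step concrete, here is a minimal sketch of how a logistic regression risk score could be mapped to a stoplight signal. The coefficients, feature names, and color thresholds below are hypothetical placeholders chosen for illustration; the slides do not give the actual Course Signals model.

```python
import math

# Hypothetical coefficients for illustration only; the slides do not
# publish the real model weights or the exact feature set.
WEIGHTS = {
    "course_grade_pct": -0.06,     # current course performance (0-100)
    "lms_logins_per_week": -0.15,  # interaction with the LMS
    "high_school_gpa": -0.50,      # prior academic history
}
INTERCEPT = 6.0


def risk_of_failing(student):
    """Logistic regression: probability of failing, between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * student[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def stoplight(risk, yellow_at=0.3, red_at=0.6):
    """Map a risk score to a signal color (thresholds are assumptions)."""
    if risk >= red_at:
        return "red"
    if risk >= yellow_at:
        return "yellow"
    return "green"


strong = {"course_grade_pct": 90, "lms_logins_per_week": 10, "high_school_gpa": 3.8}
weak = {"course_grade_pct": 50, "lms_logins_per_week": 1, "high_school_gpa": 2.5}
print(stoplight(risk_of_failing(strong)))  # green
print(stoplight(risk_of_failing(weak)))    # red
```

In practice the coefficients would be estimated from historical course outcomes rather than set by hand.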

3 Institutional data analytics platform
Partner with EMC and Pivotal to build “big data” environment
- Hardware/software environment
  - Hadoop server
  - HAWQ: high performance parallel computing cluster
  - Greenplum Database: massively parallel database
- Machine learning toolset
  - PostgreSQL
  - MADlib
  - PL/Python
  - PL/R
  - PL/pgSQL
- Create data lake of student success relevant data
  - 360 degree view of students’ activity and profile
  - Presently has around 56 billion rows and over 20 terabytes of data

4 Institutional data analytics platform
Initial data sources
Additional behavior markers:
- Class activity (LMS)
- Use of services (card transactions)
- Physical location (network log)
IDAP:
- Banner
- Blackboard (expanded to gradebook)
- Card service transactions (location/facilities used)
- Network activity (login geolocation)

5 Boosted decision trees
When we talk about machine learning algorithms, what are we talking about?
- Start with training data of N examples and M features
- Randomly draw K datasets with replacement from the training data, each sample the same size as the original training set
- Construct a decision tree for each sample
- Take the majority vote
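The procedure on this slide (bootstrap samples, one tree per sample, majority vote) is usually called bootstrap aggregation, or bagging. The sketch below illustrates it under simplifying assumptions: each "tree" is a one-split decision stump, and the toy data, feature meanings, and labels are hypothetical. It is not the model used in the presentation.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree: the feature/threshold pair with fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_bagged_trees(rows, labels, k, seed=0):
    """Draw k bootstrap samples (with replacement, same size as the
    training set) and fit one tree per sample."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees


def predict(trees, row):
    """Majority vote across all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Hypothetical toy data: [failed credits, weekly LMS logins] -> outcome
ROWS = [[0, 9], [1, 4], [2, 7], [3, 5], [6, 8], [7, 3], [8, 6], [9, 2]]
LABELS = ["on_time"] * 4 + ["late"] * 4

trees = fit_bagged_trees(ROWS, LABELS, k=25)
print(predict(trees, [0, 5]), predict(trees, [12, 5]))
```

Real implementations grow deeper trees and often also sample a random subset of the M features at each split, which is the random forest variant of this idea.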

6 Model performance metrics
Precision, recall, and F-score: what are they?
An example: of 100 students, 30 graduated late and 70 graduated on time. Suppose we predict 44 late graduates, of which 24 were correct (actually late) and 20 were incorrect (actually on time):
- Precision: 24/44 ≈ 55%
- Recall: 24/30 = 80%
- F-score: 2 * Precision * Recall / (Precision + Recall) ≈ 65%
- Support: late = 30, on time = 70
What is ‘good’? If we randomly guess classifications in proportion to their prevalence (e.g., if 10% of students fail, randomly predict 10% of the students as failing), precision and recall will both equal that proportion on average.
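The worked example can be checked with a few lines of Python. This is a minimal sketch; the function and argument names are illustrative, and the counts are the ones from the slide.

```python
def precision_recall_f(true_positives, predicted_positives, actual_positives):
    """Compute precision, recall, and F-score from raw counts."""
    precision = true_positives / predicted_positives  # correct / all predicted late
    recall = true_positives / actual_positives        # correct / all actually late
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# 44 students predicted late, 24 of them correctly, 30 actually late
precision, recall, f_score = precision_recall_f(24, 44, 30)
print(f"precision={precision:.3f} recall={recall:.3f} f_score={f_score:.3f}")
# prints: precision=0.545 recall=0.800 f_score=0.649
```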

7 Model results
Our models predict well
- Models all have recall of at least 64% on the classification of interest.
- We have tuned the models to achieve the highest possible recall on students who are potentially “at-risk”.
- Baseline precision and recall if we randomly selected a student without knowing anything about them:
  - Late graduate: .34
  - First GPA below 2.5: .28
  - Course GPA <= 2.0: .10

At-Risk First GPA        Precision  Recall  F-Score
<= 2.5                   0.43       0.64    0.51
> 2.5                    0.87       0.73    0.79

Graduation Class Label   Precision  Recall  F-Score
Dropped                  0.86       0.25    0.39
Late                     0.41       0.71    0.52
Normal                   0.63       0.51    0.57

Course GPA               Precision  Recall  F-Score
<= 2.0                   0.76       0.70    0.73
> 2.0                    0.94       0.96    0.95
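One way to read these numbers against the random-guess baseline is to divide each model's precision on the at-risk class by that class's baseline. The sketch below does this with the figures reported on this slide; the "lift" framing and the variable names are an illustrative assumption, not part of the original presentation.

```python
# (model precision on the at-risk class, random-guess baseline) from the slide
RESULTS = {
    "Late graduate": (0.41, 0.34),
    "First GPA <= 2.5": (0.43, 0.28),
    "Course GPA <= 2.0": (0.76, 0.10),
}

# Lift: how many times more precise the model is than random guessing
lift = {name: model / baseline for name, (model, baseline) in RESULTS.items()}

for name, value in lift.items():
    print(f"{name}: {value:.1f}x the baseline precision")
```

A lift near 1 means the model is barely better than guessing for that label, while larger values indicate a more useful signal.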

8 https://purdue.edu/forecast

17 Working on adding specific course

18 Questions
Brent M. Drake, PhD, Purdue University, bmdrake@purdue.edu
We have obviously made a large commitment and investment in investigating and acting on these warning triggers for students, but going back to our initial efforts, there are many things campuses can do that do not require as large an investment. Late registration is likely an issue on campus, as are students who fall between a 2.0 and 2.5 GPA (the “murky middle”) and high-risk populations: underrepresented groups, low-income students, and first-generation students. Campuses can build interventions around these groups now without making this kind of investment.

