Predicting Students Drop Out: a Casestudy Gerben Dekker, Mykola Pechenizkiy and Jan Vleeshouwers.


1 Predicting Students Drop Out: a Casestudy Gerben Dekker, Mykola Pechenizkiy and Jan Vleeshouwers

2 The Case Study
Educational Data Mining in a practical setting
Directed to a student advice procedure
Eindhoven University of Technology, Electrical Engineering department

3 The Case Study: advice procedure
[Figure: timeline of the advice procedure, September through January, showing pre-university student information, two exam periods with a holiday in between, exam results, talks with students etc., the advice to students and its deadline, and a 30% / 70% split.]
July 2009

4 Outline
CRISP-DM Framework
−Understanding of context
−Data understanding
−Data preparation
−Modeling
−Evaluation
−Deployment
Conclusions and further work

5 CRISP-DM Framework
−Understanding of context
−Data understanding
−Data preparation
−Modeling
−Evaluation
−Deployment

6 Understanding of context
Situation at Electrical Engineering, Eindhoven University of Technology:
−40% dropout rate, small inflow
−Decision to drop out preferably before end of January
−Study advice by student counselor
Objective for the department:
−More robust and objective advice

7 Understanding of context
In data mining terms:
−Build a model for the academic success of a student
−Based on the currently available information
−Only information until December of the year of enrollment
Objective for research: try out the applicability of EDM in this context:
−Enough data (amount)?
−Enough data (type)?

8 Data understanding
Data source: institution's database
−Pre-university data
−University data
Resulting data:
−Data from 648 students, from 2001-2009

9 Data preparation (pre-university data)
Standard preparatory education:
−# courses
−Type of courses taken
−Average grades for total, science, and math
Non-standard previous education:
−Type
−Grade

10 Data preparation (university data)
Courses, grades, # attempts
Many transformations needed:
−Reorganizations
−Partial exams
Example: Calculus
−2000-2001: 1 examination
−2001-2006: 2 partial examinations
−2007-2008: 5 partial examinations, or 1 examination

11 Modeling (general)
Classification task:
−Two-class classification
−Criterion: finish all courses of first year in three years
Several mining techniques applied:
−Decision trees (+ensembles), Bayesian classifiers, association rules
Separate university/pre-university data first

12 Modeling (pre-university data)
Baseline model:
−One rule classifier
−68% accuracy using Science_mean
No significant improvement using other classification techniques
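The one rule baseline named on this slide can be illustrated with a minimal sketch. It assumes attributes are already discretized into categories, and the toy rows, the `n_courses` attribute and the labels are hypothetical, not taken from the study's data. For each attribute it predicts the majority class per attribute value and keeps the attribute with the highest training accuracy.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (best_attribute, rule, training_accuracy) for a one rule classifier.

    rows: list of dicts mapping attribute name -> categorical value.
    labels: list of class labels, same length as rows.
    """
    best = None
    for attr in rows[0]:
        # Count class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (attr, rule, accuracy)
    return best


# Hypothetical toy data (not from the study); 1 and 0 are arbitrary class labels.
rows = [
    {"science_mean": "good", "n_courses": "many"},
    {"science_mean": "good", "n_courses": "few"},
    {"science_mean": "poor", "n_courses": "many"},
    {"science_mean": "poor", "n_courses": "few"},
    {"science_mean": "good", "n_courses": "many"},
    {"science_mean": "poor", "n_courses": "many"},
]
labels = [1, 1, 0, 0, 1, 1]

attr, rule, acc = one_r(rows, labels)
print(attr, rule, round(acc, 2))
```

On this toy data the selected attribute is `science_mean`, because its per-value majority vote misclassifies only one of the six rows.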

13 Modeling (university data)
Baseline model:
−One rule classifier
−75% accuracy using Linear algebra AB
Significant improvements using other models (80%)
Decision trees slightly better than other models

14 Modeling (total set)
Accuracies of 80%, using attributes from both subsets
Improvements using cost matrices:
−Shape misclassification
−Small trade-offs between accuracy and misclassification:
  Accuracy 79%, 52% of errors FP
  Accuracy 76%, 41% of errors FP
Similarities between models:
−Linear Algebra AB always root node
−Science Mean always high in tree
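The two quantities traded off on this slide, accuracy and the share of errors that are false positives, can be computed from a confusion matrix as sketched below. The counts are made up per 100 students so that the proportions match the first trade-off point; they are not the study's actual confusion matrix. The second function is a generic cost-sensitive decision rule, shown as one common way a cost matrix shifts predictions; the slides do not say which class is "positive" or which costs were used, so those are assumptions.

```python
def error_profile(tp, fp, fn, tn):
    """Return (accuracy, share of all errors that are false positives)."""
    total = tp + fp + fn + tn
    errors = fp + fn
    accuracy = (tp + tn) / total
    fp_share = fp / errors if errors else 0.0
    return accuracy, fp_share


def min_cost_class(p_positive, cost_fp, cost_fn):
    """Pick the class with the lower expected misclassification cost.

    Predicting positive risks a false positive with probability (1 - p_positive);
    predicting negative risks a false negative with probability p_positive.
    """
    expected_cost_positive = (1 - p_positive) * cost_fp
    expected_cost_negative = p_positive * cost_fn
    return 1 if expected_cost_positive <= expected_cost_negative else 0


# Hypothetical counts per 100 students, chosen to match the slide's proportions.
accuracy, fp_share = error_profile(tp=50, fp=11, fn=10, tn=29)
print(round(accuracy, 2), round(fp_share, 2))

# Raising the (assumed) cost of a false positive makes the rule more reluctant
# to predict the positive class at the same predicted probability.
print(min_cost_class(0.5, cost_fp=1, cost_fn=1))
print(min_cost_class(0.5, cost_fp=2, cost_fn=1))
```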

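15 Modeling (decision tree)
[Figure: decision tree, 79% accuracy. Root node LinAlgAB: < 5.5 leads to leaf 1, > 5.5 leads to CalcA. CalcA: < 5.15 leads to leaf 1, > 5.15 leads to VWO_Sc_mean. VWO_Sc_mean splits on {good, excellent} versus {n/a, poor, avg, above avg}, with leaves 1 and 0.]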
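The split structure of this tree can be written as a small routing function. This is only a sketch: the transcript does not make clear which class label sits on each VWO_Sc_mean branch, so the function returns the leaf a student reaches rather than a class. Sending values exactly equal to a threshold down the upper branch is an assumption, and the function and parameter names are hypothetical.

```python
def leaf_for(lin_alg_ab, calc_a, vwo_sc_mean):
    """Return a description of the leaf a student reaches in the slide's tree.

    lin_alg_ab, calc_a: numeric grades; vwo_sc_mean: categorical science mean.
    Assumption: a grade exactly on a threshold follows the '>' branch.
    """
    if lin_alg_ab < 5.5:
        return "LinAlgAB < 5.5"
    if calc_a < 5.15:
        return "CalcA < 5.15"
    if vwo_sc_mean in {"good", "excellent"}:
        return "VWO_Sc_mean in {good, excellent}"
    return "VWO_Sc_mean in {n/a, poor, avg, above avg}"


print(leaf_for(4.0, 8.0, "excellent"))
print(leaf_for(6.0, 5.0, "good"))
print(leaf_for(6.0, 7.0, "good"))
print(leaf_for(6.0, 7.0, "n/a"))
```

Mapping each leaf to a class label is left to the caller, for example with a dictionary keyed by the returned leaf description.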

16 Evaluation
Detailed manual analysis by student counselor
Review the classification measure:
−25% of False Negatives should be true negatives
−How to classify skilled people who leave?
Improve data transformations

17 Deployment
Objectives
More robust and objective advice:
−80% accuracy is possible, clear directions for improvements
Try out the applicability of EDM in this context:
−Enough data (amount)? Yes, and more is not easily obtainable
−Enough data (type)? More types would probably be very useful, but costly
Deployment possible after improvements

18 Conclusions and further work
EDM can help in a study advice process:
−80% accuracy is possible, clear directions for improvements
EDM can work using small datasets and a limited number of data categories
Further work:
−Improve data transformations
−Improve classification measure: better two-class, move to three-class
−Review use of additional data

19 Questions?

