Published by Jayden Score. Modified over 3 years ago.
1
Predicting Students Drop Out: A Case Study
Gerben Dekker, Mykola Pechenizkiy and Jan Vleeshouwers
2
The Case Study
Educational Data Mining in a practical setting
Directed to a student advice procedure
Eindhoven University of Technology, Electrical Engineering department
3
The Case Study: advice procedure
[Timeline figure, September to January. Inputs: pre-university student information, exam results. Labels: EXAMS, HOLIDAY, EXAMS, exam results, ADVICE, STUDENTS 30% / 70%, DEADLINE, "Talks with students etc."]
PAGE 3, July 2009
4
Outline
CRISP-DM Framework:
− Understanding of context
− Data understanding
− Data preparation
− Modeling
− Evaluation
− Deployment
Conclusions and further work
5
CRISP-DM Framework
− Understanding of context
− Data understanding
− Data preparation
− Modeling
− Evaluation
− Deployment
6
Understanding of context
Situation at Electrical Engineering, Eindhoven University of Technology:
− 40% dropout rate, small inflow
− Decision to drop out preferably made before the end of January
− Study advice given by a student counselor
Objective for the department:
− More robust and objective advice
7
Understanding of context
In data mining terms:
− Build a model for the academic success of a student
− Based on the currently available information
− Only information until December of the year of enrollment
Objective for research:
Try out the applicability of EDM in this context:
− Enough data (amount)?
− Enough data (type)?
8
Data understanding
Data source: the institution's database
− Pre-university data
− University data
Resulting data: data from 648 students, from 2001-2009
9
Data preparation (pre-university data)
Standard preparatory education:
− Number of courses
− Type of courses taken
− Average grades for total, science, and math
Non-standard previous education:
− Type
− Grade
10
Data preparation (university data)
Courses, grades, number of attempts
Many transformations needed:
− Reorganizations
− Partial exams
Example: Calculus
− 2000-2001: 1 examination
− 2001-2006: 2 partial examinations
− 2007-2008: 5 partial examinations, or 1 examination
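The Calculus example shows why grades from different years cannot be compared directly: one year yields a single exam grade, another yields two or five partial grades. A minimal sketch of one possible harmonization step follows. The slides do not state the actual rule, so the unweighted mean and the function name `course_grade` are assumptions made for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence


def course_grade(partials: Sequence[Optional[float]]) -> Optional[float]:
    """Collapse one student's exam record for a course into a single grade.

    ASSUMPTION (not from the slides): the course grade is the unweighted
    mean of the partial exam grades, and a missing partial (None) makes
    the course grade unavailable.
    """
    if not partials or any(p is None for p in partials):
        return None
    return round(mean(partials), 2)


# Hypothetical records for the three exam formats named on the slide.
single_exam = course_grade([6.0])                        # 1 examination
two_partials = course_grade([5.0, 7.0])                  # 2 partial examinations
five_partials = course_grade([6.0, 7.0, 5.0, 8.0, 4.0])  # 5 partial examinations
```

With a step like this, every cohort ends up with one comparable value per course, whatever exam format was in force that year.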
11
Modeling (general)
Classification task:
− Two-class classification
− Criterion: finish all first-year courses within three years
Several mining techniques applied:
− Decision trees (plus ensembles), Bayesian classifiers, association rules
University and pre-university data modeled separately first
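The class label follows from the stated criterion: a student counts as successful when all first-year courses are finished within three years. The sketch below shows how such a label could be derived. The data layout, the function name `first_year_completed`, and the reading of "three years" as the enrollment year plus the two following academic years are all assumptions, not details given in the slides.

```python
from typing import Iterable, Mapping, Optional


def first_year_completed(
    passed_in_year: Mapping[str, Optional[int]],
    first_year_courses: Iterable[str],
    enrollment_year: int,
    limit_years: int = 3,
) -> bool:
    """Return True if every first-year course was passed within the limit.

    ASSUMPTIONS (hypothetical layout): passed_in_year maps a course name to
    the start year of the academic year in which it was passed, or None if
    it was never passed. Academic years enrollment_year through
    enrollment_year + limit_years - 1 count as "within the limit".
    """
    deadline = enrollment_year + limit_years  # first year that is too late
    for course in first_year_courses:
        year = passed_in_year.get(course)
        if year is None or year >= deadline:
            return False
    return True


# Hypothetical student who enrolled in 2001 and passed both courses by 2003.
courses = ["Calculus", "LinAlgAB"]
label = first_year_completed({"Calculus": 2001, "LinAlgAB": 2003}, courses, 2001)
```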
12
Modeling (pre-university data)
Baseline model: One Rule classifier
− 68% accuracy using Science_mean
No significant improvement using other classification techniques
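A One Rule classifier picks the single attribute whose values best predict the class and maps each value to its majority class. The sketch below illustrates the idea on already discretized attributes. It is a simplified illustration, not the tool or settings used in the study, and the toy rows and attribute names are made up.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (best_attribute, rule, training_accuracy).

    rule maps each value of the chosen attribute to the majority class
    observed for that value in the training rows.
    """
    best = None
    for attr in attributes:
        # Count class frequencies per attribute value.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c[rule[value]] for value, c in counts.items())
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (attr, rule, accuracy)
    return best


# Made-up toy data, NOT the study's records.
rows = [
    {"science_mean": "poor", "n_courses": "many", "success": 0},
    {"science_mean": "poor", "n_courses": "few", "success": 0},
    {"science_mean": "good", "n_courses": "many", "success": 1},
    {"science_mean": "good", "n_courses": "few", "success": 1},
    {"science_mean": "avg", "n_courses": "many", "success": 0},
    {"science_mean": "avg", "n_courses": "few", "success": 1},
]
best_attr, best_rule, best_acc = one_r(rows, ["science_mean", "n_courses"], "success")
```

Because the rule uses only one attribute, it makes a natural baseline: any richer model has to beat it to justify the extra complexity.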
13
Modeling (university data)
Baseline model: One Rule classifier
− 75% accuracy using Linear Algebra AB
Significant improvements using other models (80%)
Decision trees slightly better than other models
14
Modeling (total set)
Accuracies of 80%, using attributes from both subsets
Improvements using cost matrices:
− Shape the misclassification
− Small trade-offs between accuracy and misclassification:
  Accuracy 79%, 52% of errors are false positives (FP)
  Accuracy 76%, 41% of errors are FP
Similarities between models:
− Linear Algebra AB always the root node
− Science Mean always high in the tree
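The trade-off between accuracy and the share of false positives among the errors can be made concrete with a small calculation. The sketch below uses made-up confusion-matrix counts, not the study's results, and the cost values are hypothetical. Which class counts as "positive" is not stated in the slides, so the code treats FP and FN only as the two error types.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def fp_share_of_errors(fp: int, fn: int) -> float:
    """Fraction of the misclassifications that are false positives."""
    errors = fp + fn
    return fp / errors if errors else 0.0


def total_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Misclassification cost under a cost matrix with zero cost for hits."""
    return fp * cost_fp + fn * cost_fn


# Two hypothetical models evaluated on 100 students each (made-up counts).
model_a = {"tp": 50, "fp": 12, "tn": 30, "fn": 8}   # higher accuracy
model_b = {"tp": 44, "fp": 8, "tn": 34, "fn": 14}   # fewer false positives

acc_a, acc_b = accuracy(**model_a), accuracy(**model_b)
share_a = fp_share_of_errors(model_a["fp"], model_a["fn"])
share_b = fp_share_of_errors(model_b["fp"], model_b["fn"])

# Hypothetical cost matrix: a false positive is three times as costly.
cost_a = total_cost(model_a["fp"], model_a["fn"], cost_fp=3, cost_fn=1)
cost_b = total_cost(model_b["fp"], model_b["fn"], cost_fp=3, cost_fn=1)
```

With these numbers the second model is slightly less accurate but cheaper under the cost matrix, which is the kind of small trade-off the slide describes.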
15
Modeling (decision tree)
LinAlgAB
  < 5.5: 1
  > 5.5: CalcA
    < 5.15: 1
    > 5.15: VWO_Sc_mean
      branches: {good, excellent} and {n/a, poor, avg, above avg}; leaf labels as extracted: 1, 0
79% accuracy
16
Evaluation
Detailed manual analysis by the student counselor:
Review the classification measure:
− 25% of false negatives should be true negatives
− How to classify skilled people who leave?
Improve data transformations
17
Deployment
Objectives
More robust and objective advice:
− 80% accuracy is possible, with clear directions for improvement
Try out the applicability of EDM in this context:
− Enough data (amount)?
  Yes, and more is not easily obtainable
− Enough data (type)?
  Would probably be very useful, but costly
Deployment possible after improvements
18
Conclusions and further work
EDM can help in a study advice process:
− 80% accuracy is possible, with clear directions for improvement
− EDM can work using small datasets and a limited number of data categories
Further work:
− Improve data transformations
− Improve the classification measure: better two-class, move to three-class
− Review the use of additional data
19
Questions?