Analysis on Accelerated Learning Cohorts


Analysis on 2013-2018 Accelerated Learning Cohorts
Benjamin Brown, Grace Rusth
The Office of Educational Partnerships and Outreach, Oregon Institute of Technology
Contact Benjamin: benjaminbrown.cpe@gmail.com

Project Goals

Data Analysis
- Generate functional statistics to assist Oregon Tech's Strategic Enrollment Management division in targeted recruitment of current high school non-degree-seeking students

Machine Learning Algorithm
- Find a usable machine learning prediction model that is reasonably accurate (greater than 75% prediction accuracy)
- Emphasize accuracy in predicting who will matriculate over who will not matriculate

Overview

Data Analysis
- Data gathered for the 2013-2018 cohorts
- 22716 samples with 12 provided features and 1 generated feature
- The dataset includes students who have started the dual credit programs in the last 4 years but have not yet graduated
- All students in the 2013-2015 cohorts should have graduated
- Ran statistical analysis on the data provided by Oregon Tech's Office of Institutional Research using Excel functions, charts, and graphs to aid in explanation

Machine Learning Algorithm
- Programmed in Python using the scipy, numpy, pandas, sklearn, graphviz, and matplotlib modules
- Ran five different machine learning algorithms to compare accuracies and determine the best model for prediction

10-Fold CV Score Comparison

Model                            Mean     Standard Deviation
Logistic Regression              0.7705   0.01706
Linear Discriminant Analysis     0.7404   0.01956
KNN (k = 5)                      0.7564   0.01475
Support Vector Classification    0.7803   0.01984
Binary Decision Tree             0.794    0.01602
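A comparison like the one above could be produced roughly as follows. This is a minimal sketch, not the authors' script: the data is a synthetic stand-in generated with `make_classification`, and the hyperparameters (other than k = 5 for KNN and the tree depth of 5 mentioned in the Methods slide) are library defaults chosen as assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in data: the real cohort dataset is not available here.
X, y = make_classification(
    n_samples=500, n_features=12, n_redundant=0, random_state=0
)

# The five model families named on the slide; settings are assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "KNN (k = 5)": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Classification": SVC(),
    "Binary Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}


def compare_models(models, X, y, folds=10):
    """Return {name: (mean, std)} of k-fold cross-validated accuracy."""
    summary = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
        summary[name] = (scores.mean(), scores.std())
    return summary


summary = compare_models(models, X, y)
for name, (mean, std) in summary.items():
    print(f"{name:<30} mean={mean:.4f}  std={std:.5f}")
```

The printed numbers come from the synthetic data and will not match the slide. Only the procedure (ten folds, mean and standard deviation of accuracy per model) is meant to correspond.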

Methods

Data Analysis
- By subject comparisons
- By school comparisons
- 2013-2015 vs 2016-2018 cohort comparisons
- Full dataset comparisons

Binary Decision Tree
- A tree depth of 5 is optimal for this dataset to avoid overfitting
- The final model predicts 2016-2018 cohort matriculations from a decision tree trained on the 2013-2015 cohort
- Validated by splitting the 2013-2015 cohort before training the model

Assumptions
- Matriculation can be predicted
- All of the included variables (the 12 given variables, including Term, Prefix, Credits, Student Type, Gender, High School, and Metro Area) can be used to assist in predicting matriculation
- The 2013-2015 cohorts have all had ample time to graduate
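The decision tree workflow described in the Methods slide might look something like the sketch below. It assumes synthetic stand-in data, a 75/25 hold-out split, and hypothetical variable names (`X_early`, `X_late`). None of these come from the original project. Only the depth limit of 5 and the train-on-earlier, predict-on-later structure are taken from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the cohort data: one synthetic dataset, sliced so
# the first 1000 rows play the "2013-2015" role and the rest play "2016-2018".
X, y = make_classification(
    n_samples=1300, n_features=12, n_redundant=0, random_state=0
)
X_early, y_early = X[:1000], y[:1000]
X_late = X[1000:]  # outcomes for the later cohorts are not yet known

# Split the earlier cohorts BEFORE training so validation rows stay unseen.
# The 25% validation share is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X_early, y_early, test_size=0.25, stratify=y_early, random_state=0
)

# Limit depth to 5, the value the slide reports as avoiding overfitting.
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
tree.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, tree.predict(X_val))
late_predictions = tree.predict(X_late)

print(f"validation accuracy: {val_accuracy:.4f}")
print(f"predictions for later cohorts: {len(late_predictions)}")
```

Stratifying the split keeps the Yes/No ratio similar in the training and validation parts, which helps when one class is much rarer than the other.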

Machine Learning Results

Decision Tree Predictions with Validation Set

Accuracy score: 0.9156

Confusion matrix (Actual = Rows, Predicted = Columns):

              Predicted No   Predicted Yes
Actual No     14634          947
Actual Yes    425            246

Classification report:

       Precision   Recall   F1-score   Total
No     0.97        0.94     0.96       15581
Yes    0.21        0.37     0.26       671

10-Fold Cross Validation Accuracy

Min: 0.7644
Max: 0.8173
Mean: 0.794
Standard Deviation: 0.01602

Definitions
- Precision: correct / column total (correctly predicted Yes or No / total predicted Yes or No)
- Recall: correct / row total (correctly predicted Yes or No / actual Yes or No)
- F1: harmonic mean of precision and recall (2*precision*recall/(precision+recall))
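As a worked check, the reported accuracy, precision, recall, and F1 values can be recomputed from the confusion matrix using the definitions above. The sketch below uses only the four counts from the slide. The dictionary layout and function name are illustrative choices.

```python
# Confusion matrix from the slide: outer key = actual class (row),
# inner key = predicted class (column).
confusion = {
    "No": {"No": 14634, "Yes": 947},
    "Yes": {"No": 425, "Yes": 246},
}


def class_metrics(confusion, label):
    """Return (precision, recall, f1) for one class label."""
    correct = confusion[label][label]
    predicted_total = sum(row[label] for row in confusion.values())  # column
    actual_total = sum(confusion[label].values())  # row
    precision = correct / predicted_total
    recall = correct / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in confusion) / total

print(f"accuracy: {accuracy:.4f}")
for label in confusion:
    precision, recall, f1 = class_metrics(confusion, label)
    print(f"{label:<3} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The recomputed values agree with the slide after rounding. Because actual Yes cases make up only 671 of the 16252 validation rows, the per-class precision and recall for Yes say more about the stated goal of predicting who will matriculate than the overall accuracy does.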

Final Decision Tree

Results

Data Analysis
- Schools geographically close to Oregon Tech's main campus (within ~106 miles) have higher matriculation rates
- Students who take more specialized classes (CST/EE/MFG/etc. rather than MATH/ENG/WRI/etc.) are more likely to matriculate to Oregon Tech
- Students who matriculate take more credits than those who do not (about 4 more credits, on average)

Binary Decision Tree
- Determined that matriculation can be predicted with acceptable accuracy using decision trees
- Garnered interest from administrators in further applications of machine learning algorithms within Oregon Tech

References and Acknowledgements
- Idea originated through collaboration between Grace Rusth and Benjamin Brown
- Data retrieved by Oregon Tech's Office of Institutional Research
- Machine learning taught by Dr. Rosanna Overholser, Assistant Professor, Oregon Tech
- Advice and support from Joseph Reid, Associate Professor, Oregon Tech