Our Data Science Roadmap

[Roadmap diagram]
Stages: Raw data collected; Data is processed; Data is cleaned; Exploratory data analysis (EDA); Machine learning algorithms and statistical models; Build data products; Communication, visualization, report findings; Make decisions
Tools and methods: R/RStudio+; Spark ML; Big data methods (MapReduce)
CSE4/587 B. Ramamurthy 11/10/2018

Topics for Final Exam
Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer: Ch. 2, 3 up to p. 57; Ch. 5
Text processing, MR, and graph processing, including shortest path and PageRank
Lab 2: MR usage details
Naïve Bayes and Bayesian classification (class notes)
Study Field Cady's text, Chapters 6, 7, and 8: focus on Bayes, logistic regression, and evaluation
Apache Spark: RDD paper by Zaharia et al.; motivation for Spark; Spark APIs
Lab 3 details

Confusion Matrix
A confusion matrix is used to evaluate and compare the performance of prediction classifiers.
Only the binary confusion matrix is covered here.
The next slide shows an easy way to remember the various metrics, and the slide after that shows a sample computation. Let's explore.

                   Classified Positive   Classified Negative
Actual Positive    TP                    FN                    Sensitivity = TP/(TP+FN)
Actual Negative    FP                    TN                    Specificity = TN/(FP+TN)

Misclassification Rate = (FN+FP)/Total
Precision = TP/(TP+FP)
Accuracy = (TP+TN)/Total

Total = 200

                   Classified Positive   Classified Negative
Actual Positive    60                    10                    Sensitivity = TP/(TP+FN) = 60/70
Actual Negative    5                     125                   Specificity = TN/(FP+TN) = 125/130

Misclassification Rate = (FN+FP)/Total = 15/200
Precision = TP/(TP+FP) = 60/65
Accuracy = (TP+TN)/Total = 185/200
Prevalence = 70/200 = 35%
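
The sample computation can be checked with a short Python sketch. This is only an illustration, not part of the course material: the class name BinaryConfusionMatrix and its method names are hypothetical, and prevalence is taken to be actual positives divided by the total, which matches the 70/200 shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusionMatrix:
    """Hypothetical helper holding the four cells of a binary confusion matrix."""
    tp: int  # actual positive, classified positive
    fn: int  # actual positive, classified negative
    fp: int  # actual negative, classified positive
    tn: int  # actual negative, classified negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.fp + self.tn)

    def misclassification_rate(self) -> float:
        return (self.fn + self.fp) / self.total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def prevalence(self) -> float:
        # Assumption: prevalence = actual positives / total
        return (self.tp + self.fn) / self.total


# Counts taken from the sample slide (Total = 200)
cm = BinaryConfusionMatrix(tp=60, fn=10, fp=5, tn=125)

metrics = {
    "Sensitivity": cm.sensitivity(),
    "Specificity": cm.specificity(),
    "Misclassification rate": cm.misclassification_rate(),
    "Precision": cm.precision(),
    "Accuracy": cm.accuracy(),
    "Prevalence": cm.prevalence(),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification rate always sum to 1, which is a quick sanity check when doing the computation by hand.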

Final exam format
6 questions (15-20 points each)
Closed book and closed notes
Classification 1: Naïve Bayes
Classification 2: Logistic regression
Spark: interpret given code
MapReduce synthesis: solve a graph algorithms problem; write pseudocode
MapReduce analysis: PageRank; simulate
Evaluate performance of classification: (binary) confusion matrix
