Our Data Science Roadmap

Presentation transcript:

Our Data Science Roadmap
Raw data collected
Exploratory data analysis (EDA)
R/RStudio+
Machine learning algorithms; statistical models
Spark ML
Build data products
Communication
Visualization
Report findings
Make decisions
Data is processed
Data is cleaned
Big data methods
MapReduce
CSE4/587 B. Ramamurthy 11/10/2018

Topics for Final Exam
Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer
  Ch. 2, 3 up to p. 57
  Ch. 5
  Text processing, MR, and graph processing, including shortest path and PageRank
Lab 2 MR usage details
Naïve Bayes and Bayesian classification (class notes)
Study Field Cady's text, Chapters 6, 7, and 8: focus on Bayes, logistic regression, and evaluation
Apache Spark
  RDD paper by Zaharia et al.
  Motivation for Spark
  Spark APIs
Lab 3 details

Topics for Final Exam
Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer
  Ch. 2, 3 up to p. 57
  Ch. 5
  Text processing, MR, and graph processing, including shortest path and PageRank
Lab 2 MR usage details
Naïve Bayes and Bayesian classification (class notes)
Apache Spark
  RDD paper by Zaharia et al.
  Motivation for Spark
  Spark APIs
Lab 3 details

Confusion Matrix
Evaluating and comparing the performance of prediction classifiers.
Confusion matrix: only the binary confusion matrix is covered.
The next slide shows an easy way to remember the various metrics.
The slide after that shows a sample computation.
Let's explore.

                   Classified Positive   Classified Negative
Actual Positive    TP                    FN                    Sensitivity = TP/(TP+FN)
Actual Negative    FP                    TN                    Specificity = TN/(FP+TN)

Misclassification Rate = (FN+FP)/Total
Precision = TP/(TP+FP)
Accuracy = (TP+TN)/Total
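The four cells of the table can be tallied directly from paired actual and predicted labels. The following is a minimal sketch, not part of the original slides; the function name `confusion_counts`, the `positive` parameter, and the sample label lists are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally the four cells of a binary confusion matrix.

    Returns (TP, FN, FP, TN), in the same order as the table rows.
    """
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1   # actual positive, classified positive
        elif a == positive:
            fn += 1   # actual positive, classified negative
        elif p == positive:
            fp += 1   # actual negative, classified positive
        else:
            tn += 1   # actual negative, classified negative
    return tp, fn, fp, tn


# Hypothetical labels, invented for this example (1 = positive, 0 = negative).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # prints: 3 1 1 3
```

Returning the counts in table order (TP, FN, FP, TN) keeps the code aligned with the layout above, so each metric formula can be applied to the tuple without remapping.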

Total = 200
                   Classified Positive   Classified Negative
Actual Positive    60                    10                    Sensitivity = TP/(TP+FN) = 60/70
Actual Negative    5                     125                   Specificity = TN/(FP+TN) = 125/130

Misclassification Rate = (FN+FP)/Total = 15/200
Precision = TP/(TP+FP) = 60/65
Accuracy = (TP+TN)/Total = 185/200
Prevalence = 70/200 = 35%
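The sample computation can be checked with a few lines of Python. This is a minimal sketch using the counts from the slide (TP = 60, FN = 10, FP = 5, TN = 125); the function name `binary_metrics` and the dictionary keys are assumptions chosen for illustration, not names from the course material.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute the slide's metrics from the four confusion-matrix cells."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (fp + tn),           # true negative rate
        "misclassification": (fn + fp) / total,  # error rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,         # share of actual positives
    }


# Counts taken from the worked example on the slide.
metrics = binary_metrics(tp=60, fn=10, fp=5, tn=125)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy and misclassification rate always sum to 1 (here 185/200 + 15/200), which gives a quick sanity check on any hand computation.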

Final exam format
6 questions (15-20 points each)
Closed book and closed notes
Classification 1: Naïve Bayes
Classification 2: Logistic regression
Spark: given code, interpret
MapReduce synthesis: graph algorithms problem solving; write pseudocode
MapReduce analysis: PageRank; simulate
Evaluate performance of classification: (binary) confusion matrix