B. Ramamurthy. Data Analytics (Data Science): EDA, data intuition/understanding, big-data analytics, stats, algs, discoveries/intelligence, statistical inference.

B.Ramamurthy

[Diagram: Data Analytics (Data Science). Labels: EDA; data intuition/understanding; big-data analytics; stats; algs; statistical inference; discoveries/intelligence; decisions/answers/results]

• Pipelines to prepare data
• Three types:
  1. Data preparation algorithms, such as sorting and workflows
  2. Optimization algorithms, such as stochastic gradient descent and least squares…
  3. Machine learning algorithms…

• Comes from Artificial Intelligence
• No underlying generative process
• Built to predict or classify something
• Three basic algorithms: linear regression, k-NN, k-means
• We already looked at linear regression as a case study for R/RStudio
• We will start with k-means…

• K-means is unsupervised: no prior knowledge of the “right answer”
• The goal of the algorithm is to determine the definition of the right answer by finding clusters in the data
• Kinds of data: satisfaction survey data, incident report data, …
• Assume data with attributes {age, gender, income, state, household size}; your goal is to segment the users.
• K-means is one of the simplest clustering algorithms.
• Let's understand k-means using an example.

• Attributes: {age, income range, education, skills, social, paid work}
• Let's take just the age: {23, 25, 24, 23, 21, 31, 32, 30, 31, 30, 37, 35, 38, 37, 39, 42, 43, 45, 43, 45}
• Cluster this data using k-means
• Let's assume K = 3, i.e., 3 groups
• Give me a guess of the centroids. Let's assume initial centroid values of {21, 30, 40}
• First let's calculate by hand and then use RStudio
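The hand calculation for the age data can be mirrored in a few lines of code. What follows is a minimal sketch in Python (not the RStudio session the slides refer to), assuming plain absolute difference as the distance and ties going to the lower-numbered centroid. The names `AGES` and `kmeans_1d` are made up for this illustration.

```python
# Ages from the slide and the assumed initial centroids {21, 30, 40}.
AGES = [23, 25, 24, 23, 21, 31, 32, 30, 31, 30,
        37, 35, 38, 37, 39, 42, 43, 45, 43, 45]


def kmeans_1d(points, centroids, max_iter=100):
    """One-dimensional k-means: assign each point to its nearest centroid,
    recompute each centroid as the mean of its cluster, repeat until stable."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by absolute difference.
        # min() keeps the first index on a tie (e.g. 35 vs. 30 and 40).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans_1d(AGES, [21, 30, 40])
print(centroids)  # [23.2, 31.5, 41.0]
print(clusters)
```

With these starting centroids the first pass already produces the final grouping (5, 6, and 9 ages per cluster); the second pass only confirms that nothing changes.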

• Supervised ML
• You know the “right answers”, or at least have data that is “labeled”: the training set
• A set of objects has already been classified or labeled (training set)
• Another set of objects is yet to be labeled or classified (test set)
• Your goal is to automate the process of labeling the test set.
• The intuition behind k-NN is to consider the most similar items, with similarity defined by their attributes, look at their existing labels, and assign the object a label.

Let's look at an example:

Age   Loan (×1000)   Default
25    40             N
35    60             N
45    80             N
20                   N
35    120            N
52    18             Y
23    95             Y
40    62             Y
60    100            Y
48    220            Y
33    150            Y
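The k-NN intuition can be tried on the example data above. This is a rough Python sketch under several assumptions: Euclidean distance on the raw (age, loan in thousands) values with no feature scaling, a majority vote among the k nearest rows, and k = 3. The query applicant (age 48, loan 142) is hypothetical and not from the slides, as are the names `TRAINING` and `knn_predict`.

```python
from collections import Counter
import math

# Complete rows from the example data: (age, loan in thousands, default label).
# The row with the missing loan value is left out rather than guessed.
TRAINING = [
    (25, 40, "N"), (35, 60, "N"), (45, 80, "N"), (35, 120, "N"),
    (52, 18, "Y"), (23, 95, "Y"), (40, 62, "Y"), (60, 100, "Y"),
    (48, 220, "Y"), (33, 150, "Y"),
]


def knn_predict(training, query, k=3):
    """Label `query` (age, loan) by majority vote of its k nearest rows."""
    # Sort labeled rows by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda row: math.dist(row[:2], query))
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical unlabeled applicant: age 48, loan of 142 (thousands).
print(knn_predict(TRAINING, (48, 142), k=3))  # Y
```

Because the loan values span a much wider range than the ages, the unscaled distance is dominated by the loan amount; normalizing both attributes first is a common alternative and can change which rows count as nearest.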