Basic machine learning background with Python scikit-learn

Basic machine learning background with Python scikit-learn Usman Roshan

Python
An object-oriented, interpreter-based programming language. Basic essentials:
- Data types: numbers, strings, lists, and dictionaries (hash-tables)
- For loops and conditionals
- Functions
- Lists and dictionaries are references (like pointers in C): functions can modify mutable arguments in place, while rebinding a parameter does not affect the caller

Python
Simple Python programs, for example:
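A minimal sketch illustrating the essentials from the previous slide (the variable names and values are made up):

```python
# Basic data types
x = 3.14                       # number
name = "scikit-learn"          # string
points = [1.0, 2.5, 4.0]       # list
counts = {"cat": 3, "dog": 5}  # dictionary (hash-table)

# For loop with a conditional
for p in points:
    if p > 2:
        print(p, "is large")

# Functions: lists are references, so the caller sees this change
def append_zero(lst):
    lst.append(0.0)

append_zero(points)
print(points)  # [1.0, 2.5, 4.0, 0.0]
```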

Data science
Simple definition: reasoning and making decisions from data. Draws on machine learning, statistics, algorithms, programming, and big data.
Basic machine learning problems:
- Classification: linear and non-linear methods
- Feature selection
- Clustering (unsupervised learning)
- Visualization with PCA

Python scikit-learn
Popular machine learning toolkit in Python: http://scikit-learn.org/stable/
Requirements: Anaconda, available from https://www.continuum.io/downloads (includes numpy, scipy, and scikit-learn; the former two are required by scikit-learn)

Data
We think of data as vectors in a fixed-dimensional space. For example:
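A dataset of n points in d dimensions is naturally stored as an n-by-d numpy array, one row per point and one column per feature (the values below are made up):

```python
import numpy as np

# Four points in a 2-dimensional space
X = np.array([[1.0, 2.0],
              [0.5, 1.8],
              [3.2, 0.1],
              [2.9, 0.4]])
print(X.shape)  # (4, 2)
```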

Classification
Widely used task: given data, determine the class it belongs to. The class leads to a decision or outcome. Used in many different areas:
- DNA and protein sequence classification
- Insurance
- Weather
- Experimental physics: Higgs boson detection

Linear models
We think of a linear model as a hyperplane in space. The margin is the minimum distance from the hyperplane to the closest points (misclassified points have negative distance). The support vector machine is the hyperplane with the largest margin.
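In symbols: for a hyperplane w·x + b = 0 and a point xi with label yi in {-1, +1}, the signed distance is yi(w·xi + b)/||w||, and the margin is the minimum of these distances over all points.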

Support vector machine: optimally separating hyperplane
In practice we allow for error terms in case there is no separating hyperplane.

Optimally separating hyperplane with errors
[Figure: hyperplane with normal vector w separating two classes, with slack for points on the wrong side of the margin]
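With slack variables ξi as the error terms, the standard soft-margin formulation is:

minimize over w, b, ξ:  (1/2)||w||^2 + C Σi ξi
subject to:  yi(w·xi + b) ≥ 1 - ξi  and  ξi ≥ 0 for all i

Larger C penalizes errors more heavily; smaller C favors a larger margin.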

SVM on simple data
Run SVM on the example data shown earlier. The solid line is the SVM hyperplane and the dashed lines indicate the margin.

SVM in scikit-learn
Datasets are taken from the UCI machine learning repository. Learn an SVM model on training data. Which parameter settings?
- C: tradeoff between error and model complexity (margin)
- max_iter: maximum number of iterations of the underlying optimization algorithm
Then predict on test data.
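A minimal sketch of this workflow, using scikit-learn's built-in copy of the UCI breast cancer dataset rather than a separately downloaded file; the parameter values are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C trades off training error against margin; max_iter caps the solver's iterations
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```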

SVM in scikit-learn
Analysis of the SVM program on breast cancer data

Non-linear classification
In practice some datasets may not be linearly separable. This may not be a big deal because the test error matters more than the training error.

Non-linear classification
- Neural networks: create a new representation of the data in which it is linearly separable; large networks lead to deep learning
- Decision trees: use several linear hyperplanes arranged in a tree
- Ensembles of decision trees, such as random forests and boosting, are state of the art

Decision tree
(From Alpaydin, 2010)

Combining classifiers by bagging
A single decision tree can overfit the data and generalize poorly (high error on test data). We can mitigate this with bagging:
1. Randomly sample the training data by bootstrapping
2. Determine classifier Ci on the sampled data
3. Go to step 1 and repeat m times
4. For the final classifier, output the majority vote
Tree bagging is exactly this: compute decision trees on bootstrapped datasets and return the majority vote (see the sketch below).
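A sketch of the procedure using scikit-learn's BaggingClassifier, whose default base estimator is a decision tree; the dataset and n_estimators value are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 50 bootstrap samples, one decision tree per sample, majority vote at prediction time
bag = BaggingClassifier(n_estimators=50, random_state=0)
print("bagging accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```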

Variance reduction by voting
What is the variance of the output of k classifiers? As shown below, averaging independent classifiers divides the variance by k, so we want classifiers to be as independent as possible. Given independent binary classifiers, each with accuracy greater than ½, the majority-vote accuracy increases as we increase the number of classifiers (Hansen and Salamon, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990).
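In symbols: if the outputs fi(x) of the k classifiers are independent, each with variance σ^2, then

Var((1/k) Σi fi(x)) = (1/k^2) Σi Var(fi(x)) = σ^2 / k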

Random forest
In addition to sampling datapoints (feature vectors), we also sample features to increase independence among the classifiers. Compute many decision trees and output the majority vote. Random forests can also rank features. An alternative to bagging is to select datapoints with probabilities that change as the algorithm runs; this is called boosting.

Decision tree and random forest in scikit-learn
Learn a decision tree and a random forest on training data. Which parameter settings?
- Decision tree: depth of the tree
- Random forest: number of trees, percentage of columns
Then predict on test data.

Decision tree and random forest in scikit-learn
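A minimal sketch of such a comparison; the dataset and parameter values are illustrative (max_depth controls tree depth, n_estimators the number of trees, max_features the fraction of columns considered at each split):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, max_features=0.5).fit(X_train, y_train)

print("decision tree:", tree.score(X_test, y_test))
print("random forest:", forest.score(X_test, y_test))
```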

Data projection
What are the mean and variance here?

Data projection Which line maximizes variance?

Principal component analysis
Find the vector w of length 1 that maximizes the variance of the projected data

PCA optimization problem
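In symbols, with Σ the covariance matrix of the centered data:

maximize over w:  w^T Σ w  subject to  ||w|| = 1

The maximizer is the eigenvector of Σ with the largest eigenvalue.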

Dimensionality reduction and visualization with PCA
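A sketch of how such a plot can be produced; the plotting details are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)

# Project the data onto the top two principal components and color by class label
X2 = PCA(n_components=2).fit_transform(X)
plt.scatter(X2[:, 0], X2[:, 1], c=y)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```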

Dimensionality reduction and visualization with PCA
[Figure: PCA plot of breast cancer data, the output of the program on the previous slide]

Unsupervised learning: clustering
K-means: popular, fast program for clustering data. Objective: find k clusters that minimize the sum of squared Euclidean distances of the points in each cluster to their centers (means).

K-means algorithm for two clusters
Input: points x1, ..., xn
Algorithm:
1. Initialize: assign each xi to C1 or C2 with equal probability and compute the means m1 and m2
2. Recompute clusters: assign xi to C1 if ||xi-m1|| < ||xi-m2||, otherwise assign it to C2
3. Recompute the means m1 and m2
4. Compute the objective of the new clustering; if it differs from the previous objective by less than a small threshold ε, stop, otherwise go to step 2

K-means in scikit-learn
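A minimal sketch, assuming the same breast cancer data and k = 2 to match the two-cluster algorithm above:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)

# k = 2 clusters; n_init restarts from several random initializations
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])  # cluster assignments of the first ten points
print(km.inertia_)      # k-means objective: sum of squared distances to the centers
```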

K-means PCA plot in scikit-learn

K-means PCA plot in scikit-learn
[Figures: PCA plot of breast cancer data colored by k-means labels, and the same plot colored by the true labels]

Conclusion
We saw basic data science and machine learning tasks in Python scikit-learn. Can we handle very large datasets in Python scikit-learn? Yes:
- For space, use numpy arrays, which store a char in one byte and a float or int in four bytes; otherwise more space is used because Python represents everything as objects
- For speed, use stochastic gradient descent and mini-batch k-means in scikit-learn (sketched below)
- Deep learning: Keras and Theano
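A minimal sketch of those scalable variants; the data here is randomly generated and the parameter values are illustrative:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.cluster import MiniBatchKMeans

# float32 numpy array: 4 bytes per value instead of a full Python object per number
X = np.random.rand(100000, 20).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int32)

# Linear SVM trained by stochastic gradient descent
clf = SGDClassifier(loss="hinge").fit(X, y)

# Mini-batch k-means processes the data in small batches
km = MiniBatchKMeans(n_clusters=2, batch_size=1000, n_init=3).fit(X)
```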