1. scikit-learn: Machine Learning in Python
Vandana Bachani, Spring 2012
2. Outline
- What is scikit-learn?
- How can it be useful to the lab?
- There are other packages too!
- Features
- Usage
- Conclusion
3. What is scikit-learn?
- scikit-learn is a Python module that integrates classic machine learning algorithms into the tightly knit world of scientific Python packages (NumPy, SciPy, matplotlib).
- A comprehensive package for all machine learning needs.
- Faster.
- Accuracy? If you have the right data, it is quite reliable.
Ref:
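A minimal sketch of what the NumPy integration means in practice, assuming a recent scikit-learn release: an estimator is fitted directly on NumPy arrays and returns its predictions as a NumPy array, with no conversion layer. The toy data (y = 2x + 1) is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data following y = 2x + 1, stored as plain NumPy arrays
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (n_samples, n_features)
y = 2 * X.ravel() + 1                        # shape (n_samples,)

model = LinearRegression()
model.fit(X, y)                              # estimators accept NumPy arrays directly

pred = model.predict(np.array([[4.0]]))      # predictions come back as a NumPy array
print(type(pred), pred, model.coef_, model.intercept_)
```

Every estimator follows the same fit/predict pattern, which is what lets the algorithms in the following slides be swapped for one another.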
5. How can it be useful to the lab?
Our daily jobs:
- Regression/prediction
- Text classification
- Text feature extraction
- Text feature selection, using chi-square and other metrics
- Cross-validation (k-fold)
- Clustering (k-means, etc.)
Maybe in the future:
- Image classification
All in one package!
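A sketch of how several of these daily jobs chain together: text feature extraction, chi-square feature selection, text classification, and k-fold cross-validation. It assumes a recent scikit-learn release (the `sklearn.model_selection` module replaced the `sklearn.cross_validation` module that existed in 2012). The six-message corpus, the labels, `k=5`, and the choice of naive Bayes as the classifier are all hypothetical illustrations, not taken from the slides.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: three "spam" and three "work" messages
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project meeting notes",
    "monday project review notes",
]
labels = ["spam", "spam", "spam", "work", "work", "work"]

# Word counts -> keep the 5 terms with the highest chi-square score -> classifier
pipeline = make_pipeline(
    CountVectorizer(),
    SelectKBest(chi2, k=5),
    MultinomialNB(),
)

# 3-fold cross-validation; shuffle because the corpus is sorted by class.
# The whole pipeline is refitted on each training fold, so feature
# selection never sees the held-out fold.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, docs, labels, cv=cv)
print("fold accuracies:", scores)

# Fit on all data and classify a new message
pipeline.fit(docs, labels)
print(pipeline.predict(["buy cheap pills now"]))
```

Wrapping the steps in a pipeline is the design choice that matters here: running chi-square selection on the full corpus before cross-validation would leak label information into the evaluation.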
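For the clustering item, a minimal k-means sketch on synthetic 2-D data. `make_blobs`, the three clusters, and the other parameter values are illustrative assumptions, not from the slides. `n_init` is passed explicitly because its default changed in recent scikit-learn releases.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 150 points drawn around 3 centers (illustrative only)
X, true_labels = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Run k-means 10 times from different initial centroids and keep the best run
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)  # one cluster index (0, 1 or 2) per point

print("centers:\n", km.cluster_centers_)
print("inertia (within-cluster sum of squares):", km.inertia_)
```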
6. There are other packages too!
- NLTK: Machine Learning + Text Processing + …; mature (a book exists!); documentation not so great; lacks functionality (w.r.t. ML), old school; complicated to use.
- Orange: Machine Learning + visualizations; naïve and sophisticated; good documentation, sufficient code examples; lacks a lot of functionality (unsupervised learning); easy to use.
- scikit-learn: Machine Learning + Machine Learning; new, still developing; documentation very good, but incomplete; almost complete w.r.t. machine learning, plus additional utilities; easy and intuitive to use.
Other points: good metrics support; REST API; no API support.
7. Features: Linear Models
Regression (predicting continuous values)
- Example: prices of houses (Boston house dataset)
- Linear, Ridge, Lasso (for sparse coefficients; useful in the field of compressed sensing), LARS (for very high-dimensional data), Bayesian
Classification
- Logistic Regression, Stochastic Gradient Descent
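A sketch comparing the listed regressors, assuming a recent scikit-learn release. The Boston house dataset loader (`load_boston`) was removed in scikit-learn 1.2, so this substitutes synthetic data from `make_regression` in which only 5 of 20 features are informative. The `alpha` values and `n_nonzero_coefs=5` are illustrative, untuned choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge, Lars, Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a house-price table: 20 features, only 5 informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),            # L2 penalty shrinks all coefficients
    "lasso": Lasso(alpha=1.0),            # L1 penalty drives some coefficients to exactly 0
    "lars": Lars(n_nonzero_coefs=5),      # stops after 5 features enter the model
    "bayesian": BayesianRidge(),          # estimates the regularization from the data
}

for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)      # R^2 on held-out data
    nonzero = np.count_nonzero(model.coef_)
    print(f"{name:9s} R^2={r2:.3f}  nonzero coefficients={nonzero}")
```

The nonzero-coefficient count is the point of comparison: Lasso and LARS should return sparser models than plain least squares, which is why the slide ties Lasso to sparse coefficients.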
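A sketch of the two classifiers, using the Iris dataset bundled with scikit-learn (not mentioned in the slides; chosen only because it needs no download). `SGDClassifier` uses the hinge loss by default, i.e. a linear SVM trained by stochastic gradient descent; passing `loss="log_loss"` (spelled `"log"` in older releases) trains logistic regression with SGD instead.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Logistic regression; max_iter raised so the solver converges without warnings
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# SGD is sensitive to feature scale, so standardize the features first
sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
sgd.fit(X_train, y_train)

print("logistic regression accuracy:", log_reg.score(X_test, y_test))
print("SGD (hinge loss) accuracy:  ", sgd.score(X_test, y_test))
```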