Machine Learning in Python Vandana Bachani Spring 2012
◦ What is scikit-learn?
◦ How can it be useful to the lab?
◦ There are other packages too!
◦ Features
◦ Usage
◦ Conclusion
scikit-learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).
◦ A comprehensive package for all machine learning needs.
◦ Faster
◦ Accuracy? If you have the right data, it is pretty reliable.
Ref:
Our daily jobs:
◦ Regression/Prediction
◦ Text Classification
◦ Text Feature Extraction
◦ Text Feature Selection Using Chi-Square and other metrics
◦ Cross-Validation K-Fold
◦ Clustering (K-Means, etc.)
Maybe in future:
◦ Image Classification
All in one package!
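A minimal sketch of how several of these jobs might chain together in scikit-learn: TF-IDF text feature extraction, chi-square feature selection, a Naive Bayes text classifier scored with K-fold cross-validation, and K-Means clustering of the same documents. The toy corpus, the labels, the choice of `MultinomialNB`, and all parameter values are assumptions made up for illustration, and the import paths assume a recent scikit-learn release (module names were different in 2012).

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical): label 1 = sports, label 0 = cooking.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the team after the match",
    "a great goal decided the football final",
    "the keeper saved a penalty in the match",
    "fans cheered as the team scored again",
    "bake the bread in a hot oven",
    "add butter and sugar to the flour",
    "simmer the sauce and season with salt",
    "knead the dough and let the bread rise",
    "whisk the eggs with sugar and butter",
    "roast the vegetables in the oven with salt",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Feature extraction -> chi-square feature selection -> classifier.
pipeline = make_pipeline(
    TfidfVectorizer(),
    SelectKBest(chi2, k=10),  # keep the 10 terms most associated with the labels
    MultinomialNB(),
)

# K-fold cross-validation: the whole pipeline is refit on each training fold.
cv = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, docs, labels, cv=cv)
print("fold accuracies:", scores)

# Unsupervised alternative: cluster the same documents with K-Means.
tfidf = TfidfVectorizer().fit_transform(docs)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
print("cluster assignments:", clusters)
```

Putting the selector inside the pipeline means the chi-square scores are computed only from each training fold, so the held-out fold does not leak into feature selection.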
NLTK | Orange | scikit-learn
Machine Learning + Text Processing + … | Machine Learning + visualizations | Machine Learning + Machine Learning
Mature (book exists!) | Naïve and sophisticated | New, still developing
Documentation: not so great | Good. Sufficient code examples. | Documentation: very good, but incomplete
Lacks functionality (w.r.t. ML), old school | Lacks a lot of functionality (unsupervised learning) | Almost complete w.r.t. machine learning + additional utilities. Good metrics support.
Complicated to use | Easy to use | Easy and intuitive to use
REST API | No API support
Linear Models
Regression (Predicting Continuous Values)
Example: Prices of houses (Boston house dataset)
◦ Linear, Ridge, Lasso (for sparse coefficients, useful in the field of compressed sensing), LARS (very high-dimensional data), Bayesian
Classification
◦ Logistic Regression, Stochastic Gradient Descent
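A short sketch of the linear models listed above, under a few assumptions: the Boston house-price loader is no longer shipped with recent scikit-learn releases, so synthetic data from `make_regression` and `make_classification` stands in for it, and the `alpha` values and dataset sizes are arbitrary illustrative choices. It fits Linear, Ridge, and Lasso regressors, counts how many coefficients Lasso drives to exactly zero, and then fits the two classifiers.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import (
    Lasso,
    LinearRegression,
    LogisticRegression,
    Ridge,
    SGDClassifier,
)
from sklearn.model_selection import train_test_split

# --- Regression: predict a continuous target ---
# Synthetic stand-in data: 20 features, only 5 of which carry signal.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

regressors = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),   # L2 penalty shrinks coefficients
    "lasso": Lasso(alpha=2.0),   # L1 penalty can set coefficients to zero
}
r2 = {}
for name, model in regressors.items():
    model.fit(X_train, y_train)
    r2[name] = model.score(X_test, y_test)  # R^2 on held-out data
    print(name, "R^2 =", round(r2[name], 3))

# Sparsity: how many of the 20 Lasso coefficients are exactly zero.
n_zero = int((regressors["lasso"].coef_ == 0).sum())
print("lasso coefficients set to zero:", n_zero)

# --- Classification: predict a discrete label ---
Xc, yc = make_classification(n_samples=200, n_features=20, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, random_state=0)

classifiers = {
    "logistic": LogisticRegression(max_iter=1000),
    "sgd": SGDClassifier(random_state=0),  # linear model trained by SGD
}
acc = {}
for name, model in classifiers.items():
    model.fit(Xc_train, yc_train)
    acc[name] = model.score(Xc_test, yc_test)  # accuracy on held-out data
    print(name, "accuracy =", round(acc[name], 3))
```

All of these estimators share the same `fit` / `score` interface, which is why swapping one linear model for another takes a single line.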