Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout Saptak Sen, Microsoft Bill Ramos, Advaiya.

1 Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout Saptak Sen, Microsoft Bill Ramos, Advaiya

2 Agenda
- Overview of predictive analytics and data mining
- How Microsoft supports predictive analytics
- How Mahout fits into the picture
- Demos

3 Data Mining

4
- Recommendation engines
- Advertising analysis
- Weather forecasting for business planning
- Social network analysis
- IT infrastructure and web app optimization
- Legal discovery and document archiving
- Pricing analysis
- Fraud detection
- Churn analysis
- Equipment monitoring
- Location-based tracking and services
- Personalized insurance

Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later.*
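The closing point of this slide is that predictive analytics estimates a likelihood. The following is a minimal sketch of one way to produce such a likelihood for the churn-analysis use case: a logistic regression fitted by batch gradient descent in plain Python. The feature names, toy data, learning rate, and epoch count are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example: p(churn) minus the true label
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def churn_probability(weights, bias, x):
    """Estimated likelihood that a customer with features x churns."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical features: [support_calls, months_since_last_purchase]; 1 = churned
rows = [[0, 1], [1, 2], [0, 0], [4, 6], [5, 8], [3, 7]]
labels = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(rows, labels)
print(round(churn_probability(w, b, [4, 7]), 3))
```

The output is a probability rather than a yes/no label, which is what lets a business rank customers by churn risk.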

5
- Rich data mining algorithms for clustering, classification, forecasting through time series analysis, and more
- Rich developer experience
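To make "clustering" concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) on two-dimensional points. It is one common clustering method, swapped in for illustration; it is not Microsoft's clustering implementation. The customer data and the naive "first k points" initialisation are assumptions made to keep the example small and reproducible.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Plain k-means on 2-D points; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []

    def nearest(p):
        # Index of the centroid with the smallest Euclidean distance to p
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical customers: (annual spend in $k, visits per month)
customers = [(1.0, 1.0), (10.0, 10.0), (1.5, 2.0), (9.0, 11.0), (0.5, 1.2), (10.5, 9.5)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # -> [0, 1, 0, 1, 0, 1]
```

Production tools typically use smarter initialisation and can choose the number of clusters automatically; this sketch fixes both to stay short.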

7
- Ease of use through Excel
- Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more
- Scalable through integration with SQL Server Analysis Services (SSAS)
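Forecasting is listed among the algorithms above. A minimal sketch of the idea is simple exponential smoothing, which produces a one-step-ahead forecast from a series. This is a deliberately simple stand-in and not the time series algorithm that Excel or SSAS uses; the sales figures and the smoothing factor `alpha` are hypothetical.

```python
def ses_forecast(series, alpha=0.5):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    level_t = alpha * y_t + (1 - alpha) * level_(t-1), seeded with the first value.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        # Blend the newest observation with the running level
        level = alpha * y + (1 - alpha) * level
    return level


monthly_sales = [100, 110, 120]  # hypothetical monthly sales
print(ses_forecast(monthly_sales, alpha=0.5))  # -> 112.5
```

A larger `alpha` makes the forecast react faster to recent values; `alpha = 1` simply repeats the last observation.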

8
Menu                              | Data mining algorithm
Analyze Key Influencers           | Naïve Bayes
Detect Categories                 | Clustering
Fill From Example                 | Logistic Regression
Forecast                          | Time Series
Highlight Exceptions              | Clustering
Scenario Analysis – Goal Seek     | Logistic Regression
Scenario Analysis – What If       | Logistic Regression
Prediction Calculator             | Logistic Regression
Shopping Basket Analysis          | Association Rules
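Shopping basket analysis rests on association rules scored by support (how often items appear together) and confidence (how often the right-hand item appears given the left-hand one). The code below is a minimal sketch that mines only pairwise rules from in-memory baskets. It is far simpler than a full association-rules algorithm, and the baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules of the form lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first, ties broken alphabetically for stable output
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


baskets = [  # hypothetical transactions
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]
for lhs, rhs, support, confidence in pairwise_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```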

9 [Diagram: Excel data mining architecture. Layers: Batch Layer, Speed Layer, Serving Layer. Components: flat files (.txt, .dat, .xlsx, etc.), Windows Azure HDInsight, Microsoft Excel with the Excel Data Mining Add-in.]

10 Mahout

11
- Scalable machine learning algorithms on the Hadoop platform
- Algorithms for clustering, classification, and batch-based collaborative filtering using the MapReduce paradigm
- Supports a wide range of use cases, from email spam filtering to fraud detection to recommendations for books or movies

[Diagram of algorithm categories: Clustering, Recommenders, Vector Similarity, Pattern Mining, Classification, Regression, Genetic, Dimension Reduction, Matrices, Collocations]
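Mahout's recommenders include item-based collaborative filtering, which scores an unseen item by how similar it is to items the user already rated. The following is a minimal in-memory sketch of that idea using cosine similarity. It is not Mahout's API and does not run on Hadoop; the ratings and the `recommend` helper are hypothetical.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Turn (user, item, preference) triples into {item: {user: preference}}."""
    vectors = defaultdict(dict)
    for user, item, pref in ratings:
        vectors[item][user] = pref
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=2):
    """Predict scores for items the user has not rated, best first."""
    vectors = item_vectors(ratings)
    rated = {item: pref for u, item, pref in ratings if u == user}
    scores = {}
    for candidate in vectors:
        if candidate in rated:
            continue
        # Similarity-weighted average of the user's own ratings
        sims = [(cosine(vectors[candidate], vectors[item]), pref) for item, pref in rated.items()]
        total = sum(abs(s) for s, _ in sims)
        if total > 0:
            scores[candidate] = sum(s * p for s, p in sims) / total
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


ratings = [  # hypothetical (userID, itemID, preference)
    (1, 101, 5), (1, 102, 3),
    (2, 101, 4), (2, 102, 3), (2, 103, 5),
    (3, 101, 1), (3, 104, 5),
]
print(recommend(ratings, user=1))
```

Mahout offers several similarity measures besides cosine (for example log-likelihood and Tanimoto); the choice matters more than this toy data suggests.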

12 [Diagram: running Mahout on Windows Azure HDInsight. Flat files (.txt, .dat, .xlsx, etc.) are converted to Mahout input; the Mahout job is run from the Hadoop Command Window to produce an output file. Also shown: HDInsight consoles; Batch Layer, Speed Layer, Serving Layer.]
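The "convert to Mahout input" step usually means reshaping a flat file into the comma-separated `userID,itemID,preference` lines that Mahout's recommenders read, with numeric IDs. The sketch below assumes that target format and a hypothetical tab-separated source file with string user and item names. It returns the ID mappings so the job's output can be translated back. Uploading the file to the cluster and launching the job are not shown.

```python
import csv
import io


def to_mahout_prefs(flat_text, delimiter="\t"):
    """Convert 'user<TAB>item<TAB>rating' rows into 'userID,itemID,preference'
    lines with numeric IDs (assumed Mahout recommender input format)."""
    user_ids, item_ids = {}, {}
    lines = []
    for row in csv.reader(io.StringIO(flat_text), delimiter=delimiter):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        user, item, rating = (field.strip() for field in row[:3])
        # Assign sequential numeric IDs the first time a name is seen
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        lines.append(f"{uid},{iid},{float(rating)}")
    return "".join(line + "\n" for line in lines), user_ids, item_ids


flat = "alice\tbook-a\t5\nbob\tbook-a\t3\nalice\tbook-b\t4\n"  # hypothetical input
prefs, users, items = to_mahout_prefs(flat)
print(prefs, end="")
```

Keeping the `users` and `items` dictionaries is what lets numeric recommendations in the output file be mapped back to readable names.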

14 Questions?

