
1

2 Table of Contents
- Introduction
- Why Data Analytics
- Data Analytics Terminology
- Predictive Analytics
- Data Analytics Challenges
- Data Analytics Platform
- Data Analytics Tools
- Hadoop
- Data Analytics Application
- Recommendations

3 Introduction
- What is data?
- What is big data?
- Analysis vs. analytics

4 WHAT IS DATA?
A collection of facts and statistics.

5 WHAT IS DATA? (contd.)
CLASSIFICATION OF DATA
- Structured: a high degree of organization, such as a relational database
- Unstructured: information that is difficult to organize using traditional mechanisms, e.g. posts, messages and emails on Facebook, WhatsApp and Gmail
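To make the structured/unstructured split concrete, here is a minimal Python sketch using only the standard library. The ticket table, its columns and the sample messages are hypothetical. The same question (total fares for trips from Delhi) is a single SQL query over structured rows, but over free text it needs ad hoc parsing, here a regular expression that assumes fares are written as "Rs <amount>".

```python
import re
import sqlite3

# Structured data: rows with a fixed schema (hypothetical ticket table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (passenger TEXT, origin TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [("A", "Delhi", 450.0), ("B", "Mumbai", 300.0), ("C", "Delhi", 150.0)],
)
# A declarative query answers the question directly.
(delhi_total,) = conn.execute(
    "SELECT SUM(fare) FROM tickets WHERE origin = ?", ("Delhi",)
).fetchone()

# Unstructured data: free-text messages with no schema (hypothetical examples).
messages = [
    "Booked Delhi to Agra, paid Rs 450",
    "mumbai trip cost me Rs 300!",
    "Delhi again... Rs 150 this time",
]


def extract_fare(text):
    """Pull the first 'Rs <amount>' figure out of a message, or None if absent."""
    match = re.search(r"Rs\s*(\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None


# The same question now needs parsing plus a fragile keyword heuristic.
delhi_text_total = 0.0
for text in messages:
    fare = extract_fare(text)
    if fare is not None and "delhi" in text.lower():
        delhi_text_total += fare

print(delhi_total, delhi_text_total)  # 600.0 600.0
```

Both totals agree here only because the messages happen to be easy to parse; a message that wrote the fare as "450 rupees" would silently be missed, which is exactly the difficulty with unstructured data.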

6 WHAT IS BIG DATA?
- Complex and dynamic
- The 3 Vs: volume, velocity and variety
- "90% of the world's data was produced in the last two years" (IBM)

7 ANALYTICS vs. ANALYSIS
ANALYTICS: extensive use of mathematics and statistics, applying descriptive techniques and predictive models to gain valuable knowledge
ANALYSIS | ANALYTICS
Why did something happen? | What is likely to happen?

8 WHY DATA ANALYTICS?
- Moves an organization from a reactive strategy to a proactive one
- Example: data analytics helped determine the outcome of a US presidential election

9 DATA ANALYTICS IN THE REAL WORLD
Walmart uses predictive analytics to better identify customer preferences on a regional basis and stock its branch locations accordingly.

10 REAL WORLD APPLICATIONS (contd.)
A medical diagnostics company analyzed gene data and developed the first non-invasive test for predicting coronary artery disease:
- Researchers analyzed over 100 million gene samples
- They identified the 23 primary predictive genes for coronary artery disease
- The resulting test, known as the “Corus CAD Test,” was recognized as one of the “Top Ten Medical Breakthroughs of 2010” by TIME Magazine

11 Data Analytics Terminology
- Data mining
- Data warehousing
- OLAP
- Big data analytics
- Business analytics
- Descriptive analytics
- Predictive analytics

12 PREDICTIVE ANALYTICS
- Extracting information from existing data sets in order to determine patterns and predict future outcomes and trends
- Predictive analytics is an enabler of big data
- Made practical by faster, cheaper computers and easier-to-use software

13 PREDICTIVE ANALYTICS (contd.)

14 What Is Machine Learning?
A type of artificial intelligence that gives computers the ability to learn without being explicitly programmed.
Some applications of ML:
- Spam filtering
- Topic spotting
- Weather prediction
- Medical diagnosis
- Fraud detection

15 Types of Machine Learning
Supervised learning:
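Supervised learning trains on examples whose correct label is already known. As a minimal sketch, here is a multinomial Naive Bayes spam filter in plain Python (Naive Bayes and spam filtering both appear in this deck). The four training messages are made up, and whitespace tokenization with add-one (Laplace) smoothing are simplifying assumptions, not a production design.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (text, label) pairs with known labels (the 'supervision')."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of training documents
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log prior + log likelihood."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the probability.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch with the team on monday", "ham"),
]
model = train(training)
print(predict(model, "free prize"))              # spam
print(predict(model, "team meeting on monday"))  # ham
```

Working in log space avoids numeric underflow when a message contains many words, since the per-word probabilities are multiplied together.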

16 Types of Machine Learning
Unsupervised learning:
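Unsupervised learning looks for structure in data that has no labels. A minimal sketch of k-means clustering (one of the algorithms on slide 17) in plain Python: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until the centroids stop changing. The sample points, initializing from randomly chosen data points and the iteration cap are assumptions for illustration; `math.dist` needs Python 3.8 or later.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # start from k data points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))
print(sorted(centroids))
```

No labels are given, yet the two visually separate groups come out as the two clusters; the result can depend on initialization, which is why the seed is fixed here.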

17 Some Algorithms Used for ML
- Linear regression
- Decision tree
- Naïve Bayes
- K-means algorithm
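Linear regression fits a straight line y = a·x + b to observed (x, y) pairs. A minimal ordinary-least-squares sketch in plain Python for a single input variable, using the closed-form slope a = Sxy / Sxx and intercept b = mean(y) - a·mean(x). The data points are hypothetical and chosen to lie exactly on y = 2x + 1 so the fitted values are easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sum of squared x-deviations; the slope is their ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_y(slope, intercept, x):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)               # 2.0 1.0
print(predict_y(slope, intercept, 5))  # 11.0
```

With real, noisy data the points will not sit on the line, and the same formulas return the line that minimizes the sum of squared vertical errors.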

18 SOME DATA ANALYTICS TOOLS

19 R
- R is a programming language
- Open-source environment
- High availability
- An interpreted language
- Good data-handling capability
- Advanced graphical capabilities
- R supports procedural and object-oriented programming
- Get better results faster

20 SAS
- SAS is commercial software developed by SAS Institute
- It is expensive
- Easy to learn
- Good data-handling capability
- SAS releases updates in a controlled environment
- SAS provides dedicated customer support

21 DATA ANALYTICS IN CANADIAN RAILWAYS

22 IBM PUREDATA ANALYTICS TOOLS
- Fast and easy setup
- Petascale user data capacity
- Better access to information
- Customized analytics
- Integrated third-party software
- 3x faster scan rate
- 128 GB/s scan rate per rack
- 50% greater data capacity per rack

23 DATA ANALYTICS PLATFORMS

24 DATA ANALYTICS PLATFORMS (contd.)
Cloudera
- Cloudera Inc. was founded in 2008 by big data experts from Facebook, Google, Oracle and Yahoo
- First company to develop and distribute Apache Hadoop-based software
- Uses the Cloudera management suite to automate the installation process
- Uses the HDFS component for file system access
- Centralized metadata architecture

25 DATA ANALYTICS PLATFORMS (contd.)
Hortonworks
- Founded in 2011, Hortonworks quickly emerged as one of the leading Hadoop vendors
- A completely open-source platform based on Apache Hadoop for analyzing, storing and managing big data
- Goes beyond MapReduce alone by allowing additional data-processing frameworks to be included
- Uses the HDFS component for file system access
- Centralized metadata architecture

26 HADOOP
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.

27 HDFS
A file system specially designed for storing huge data sets on a cluster of commodity hardware, with a streaming (write-once, read-many) access pattern.

28 MAPREDUCE
- Apache Hadoop MapReduce is a framework for processing large data sets in parallel across a Hadoop cluster
- Data analysis follows a two-step process: map, then reduce
- MapReduce is a programming model that Google has used successfully in processing its “big data” sets (~20 petabytes per day)
- Users specify the computation in terms of a map function and a reduce function
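The map and reduce steps can be illustrated with the classic word-count example. This is a single-process Python sketch of the programming model only: it does not use any Hadoop API (real Hadoop jobs typically implement Mapper and Reducer classes in Java), and the input documents are made up. The `shuffle` function stands in for the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group every emitted value by its key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data on rails", "big data analytics", "rails data"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(word_counts)  # {'big': 2, 'data': 3, 'on': 1, 'rails': 2, 'analytics': 1}
```

Each map call sees only one document and each reduce call sees only one key, so both phases can be spread across many machines; that independence is what lets the model scale.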

29

30 EXISTING CHALLENGES IN THE INDIAN RAIL SYSTEM
- Delays
- Signaling problems
- Broken-down trains
- Congestion
- Quality of service (QoS)
- One solution to these problems is the analysis of big data through predictive maintenance
- Big data in the rail industry can be used in predictive analysis to predict faults before they happen, thus improving services
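As a very small illustration of flagging a fault before it happens, the sketch below raises an early warning when the rolling mean of a sensor reading crosses a threshold. This is a simple rule-based stand-in, not the deck's method or a trained predictive model; the sensor (a hypothetical bearing temperature), the window size of 3 and the 70-degree threshold are all assumptions.

```python
from collections import deque


def early_warnings(readings, window=3, threshold=70.0):
    """Return indices where the mean of the last `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` readings
    alerts = []
    for i, value in enumerate(readings):
        recent.append(value)
        # Averaging over a window smooths out single noisy spikes.
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(i)
    return alerts


# Hypothetical bearing temperatures (degrees C) drifting upward before a failure.
bearing_temps = [61, 62, 60, 63, 66, 69, 72, 75, 79]
print(early_warnings(bearing_temps))  # [7, 8]
```

A real predictive-maintenance system would learn such thresholds, or a full failure model, from historical fault records rather than fixing them by hand.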

31 PREDICTIVE MAINTENANCE: BIG DATA ON RAILS

32 PREDICTIVE MAINTENANCE (contd.)
- Choose the right system or subsystem for prediction:
  - The prediction possibility zone
  - The prediction effectiveness zone
- Identify the required data sets as early as possible
- Identify the value-add of predictive maintenance (PM) for maintenance strategies
- Complement your data science team with rail expertise
- Look for the right skills when hiring data scientists

33 CHOOSING THE RIGHT SYSTEM OR SUBSYSTEM FOR PREDICTION
- The prediction possibility zone
- The prediction effectiveness zone

34 APPLICATION OF DATA ANALYTICS IN INDIAN RAILWAYS

35 Automatic vehicle location

36 PASSENGER INFORMATION SYSTEM

37 AUTOMATED FARE COLLECTION
- Using ticket vending machines
- Using a smart card that provides access to all types of transit services across multiple operating agencies
- AFC analytics shows how passengers are using the system, identifies trends and helps improve services

38 AUTOMATED PASSENGER COUNTING
- Number of passengers boarding and alighting each vehicle at a particular station
- The rate of increase in passengers over the years can be predicted from the recorded data
- Peak hours in a day and peak months in a year can be identified
- These data can be used to provide better services and project evolving ridership trends
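A minimal sketch of two automated-passenger-counting analyses, assuming hypothetical boarding records of the form (hour of day, passengers boarding): total the boardings per hour to find the peak hour, and compute a compound annual growth rate (CAGR) between two yearly ridership totals to project the next year. CAGR is one simple growth model swapped in here for illustration; all figures are made up.

```python
from collections import Counter


def peak_hour(boardings):
    """boardings: iterable of (hour, count) records; returns the hour with the most boardings."""
    totals = Counter()
    for hour, count in boardings:
        totals[hour] += count
    return max(totals, key=totals.get)


def annual_growth_rate(first_total, last_total, years):
    """Compound annual growth rate between two yearly ridership totals `years` apart."""
    return (last_total / first_total) ** (1 / years) - 1


def project(total, rate, years_ahead):
    """Project ridership forward, assuming the same compound growth rate continues."""
    return total * (1 + rate) ** years_ahead


# Hypothetical records from several counters: (hour of day, passengers boarding).
boardings = [(8, 120), (9, 150), (18, 170), (8, 90), (18, 60), (13, 40)]
print(peak_hour(boardings))  # 18

# Hypothetical yearly totals: 1,000,000 riders, then 1,210,000 two years later.
rate = annual_growth_rate(1_000_000, 1_210_000, years=2)
print(round(rate, 3))  # 0.1
print(round(project(1_210_000, rate, 1)))  # 1331000
```

Grouping the same records by month instead of by hour would identify the peak months in the same way.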

39

