Big Data Products: Where do I start? Divya Jain, Oct 10th, 2014


1 Big Data Products: Where do I start? Divya Jain, Oct 10th, 2014

2 Agenda: Example WebApp; Problem Definition; Big Data Stack; Q&A

3 DinnerLite!! Awesome Healthy Dinner Recipes

4 DinnerLite!! Awesome Healthy Dinner Recipes. BIG DATA PROBLEM: (1) No usage analytics or stats; (2) No personalized view.

5 DinnerLite!! Understanding the problem: Audience, Frequency, Output
    (1) Usage Statistics: audience Internal; frequency Batch Mode; output Graphs/Reports
    (2) Personalized View: audience External; frequency Real-Time; output Recommendations

6 Building the Stack. Diagram labels: DinnerLite!!, Usage Logs, ?, Internal, External.

7 What solutions are present? Source: http://sranka.files.wordpress.com/2014/01/bigdata.jpg

8 What are Big Data Technologies? Stages: Ingestion, Platform, Processing, Consumption.

9 DinnerLite!! Ingestion. Stages: Ingestion, Platform, Processing, Consumption.
    Sqoop: imports data from relational databases
    Kafka: distributed publish-subscribe messaging system
    Storm / Spark Streaming: distributed realtime

10 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    (1) Usage Statistics: Kafka, Storm
    (2) Personalized View: Kafka, Spark Streaming

11 Building the Stack. Diagram labels: DinnerLite!!, Usage Logs, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, ?, Internal, External.
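
The pub-sub layer on this slide feeds the same usage logs to a batch consumer and a real-time consumer. A minimal in-memory sketch of that idea follows, assuming an append-only log where each consumer group keeps its own read offset. The `Topic` class, the group names, and the sample events are hypothetical and only illustrate the pattern; this is not the Kafka client API.

```python
from collections import defaultdict


class Topic:
    """Append-only event log; each consumer group tracks its own offset."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # consumer group -> next index to read

    def publish(self, event):
        self.log.append(event)

    def poll(self, group, max_events=None):
        """Return unread events for `group` and advance that group's offset."""
        start = self.offsets[group]
        end = len(self.log)
        if max_events is not None:
            end = min(end, start + max_events)
        self.offsets[group] = end
        return self.log[start:end]


# Hypothetical usage events from the web app.
usage_logs = Topic()
for user, recipe in [("ann", "kale-salad"),
                     ("bob", "lentil-soup"),
                     ("ann", "lentil-soup")]:
    usage_logs.publish({"user": user, "recipe": recipe})

# The real-time consumer takes events in small steps as they arrive.
first = usage_logs.poll("realtime", max_events=1)
# The batch consumer reads everything accumulated so far in one pass.
batch = usage_logs.poll("batch")
# Each group's offset is independent, so the real-time group still has two left.
rest = usage_logs.poll("realtime")
```

Because offsets are kept per group, adding a second consumer does not change what the first one sees, which is the property that lets one log serve both the internal and external paths.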

12 DinnerLite!! Platform. Stages: Ingestion, Platform, Processing, Consumption.
    HDFS: distributed file systems
    MapReduce / Spark: distributed processing
    Hive / Shark: analysis and summarization

13 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    (1) Usage Statistics: HDFS, MapReduce, Hive
    (2) Personalized View: HDFS, Spark, Shark
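
For the usage-statistics path, the MapReduce model can be sketched in plain Python: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The log format (`timestamp user recipe`) and the sample lines below are assumptions made for illustration; a real job would run distributed over files in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit (recipe, 1) for each well-formed 'timestamp user recipe' line."""
    parts = line.split()
    if len(parts) == 3:
        _, _, recipe = parts
        yield recipe, 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the per-view counts for one recipe."""
    return key, sum(values)


# Hypothetical usage-log lines; the last one is malformed and gets skipped.
log_lines = [
    "2014-10-10T18:00 ann kale-salad",
    "2014-10-10T18:05 bob lentil-soup",
    "2014-10-10T18:07 ann lentil-soup",
    "malformed line",
]

mapped = chain.from_iterable(map_phase(line) for line in log_lines)
view_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Hive expresses the same aggregation declaratively; assuming a table named `usage_logs` with a `recipe` column, it would be roughly `SELECT recipe, COUNT(*) FROM usage_logs GROUP BY recipe`.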

14 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    Zookeeper: distributed configuration management
    Oozie: distributed workflow management
    Yarn: distributed resource management
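
Workflow management comes down to running jobs in an order that respects their dependencies. The sketch below shows only that core idea using Python's standard `graphlib` module (Python 3.9+); the job names are hypothetical, and this is not how Oozie workflows are actually defined.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
workflow = {
    "ingest_logs": set(),
    "count_views": {"ingest_logs"},
    "train_recommender": {"ingest_logs"},
    "load_mysql": {"count_views"},
    "load_hbase": {"train_recommender"},
}

# static_order() yields every job after all of its prerequisites.
run_order = list(TopologicalSorter(workflow).static_order())
```

Jobs with no dependency between them, such as `count_views` and `train_recommender`, may appear in either order, which is also where a scheduler is free to run them in parallel.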

15 Building the Stack. Diagram labels: DinnerLite!!, Usage Logs, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, Hadoop / Spark Cluster (HDFS, MapReduce, Spark, Pig/Hive, Shark, Oozie, Yarn), Internal, External.

16 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    R / Matlab / Scikit: data mining and data analysis tools
    Mahout / Weka / MLlib: data analysis and machine learning libraries

17 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    (1) Usage Statistics: R, Mahout
    (2) Personalized View: Scikit, MLlib
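
The personalized view needs some recommendation logic. One simple approach, shown here as a rough sketch and not as the method the deck uses, is item co-occurrence: recommend recipes that were often viewed by the same users who viewed what this user has seen. The user names, recipes, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(views_by_user):
    """Count how often each ordered pair of recipes shares a viewer."""
    co = defaultdict(Counter)
    for recipes in views_by_user.values():
        for a, b in permutations(set(recipes), 2):
            co[a][b] += 1
    return co


def recommend(user, views_by_user, co, top_n=2):
    """Rank unseen recipes by total co-occurrence with the user's history."""
    seen = set(views_by_user.get(user, ()))
    scores = Counter()
    for recipe in seen:
        for other, count in co[recipe].items():
            if other not in seen:
                scores[other] += count
    # Sort by score descending, then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [recipe for recipe, _ in ranked[:top_n]]


# Hypothetical viewing history extracted from the usage logs.
views_by_user = {
    "ann": ["kale-salad", "lentil-soup"],
    "bob": ["lentil-soup", "veggie-stir-fry"],
    "cat": ["kale-salad", "lentil-soup", "veggie-stir-fry"],
    "dan": ["kale-salad"],
}

co = build_cooccurrence(views_by_user)
dan_recs = recommend("dan", views_by_user, co)
```

Libraries such as Mahout and MLlib provide scalable recommenders for real workloads; the point here is only the shape of the computation, which consumes the same usage logs as the statistics path.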

18 Building the Stack. Diagram labels: DinnerLite!!, Usage Logs, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, Hadoop / Spark Cluster (HDFS, MapReduce, Spark, Pig/Hive, Shark, Oozie, Yarn, Mahout, MLlib), Internal, External.

19 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    MySQL / Postgres: traditional relational databases
    OpenTSDB: time-series databases
    Cassandra / HBase: NoSQL, distributed, key-value, column stores

20 DinnerLite!! Stages: Ingestion, Platform, Processing, Consumption.
    (1) Usage Statistics: MySQL, OpenTSDB
    (2) Personalized View: HBase, Memcache
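
The slide pairs HBase with Memcache for the personalized view but does not say how they fit together. One common arrangement, assumed here, is a cache-aside read path: check the cache first, fall back to the store on a miss, and invalidate the cached entry on writes. Plain dictionaries stand in for both systems, and the class and method names are hypothetical.

```python
class RecommendationService:
    """Cache-aside reads over a key-value store of per-user recommendations."""

    def __init__(self, store):
        self.store = store    # stand-in for an HBase table: user -> recipes
        self.cache = {}       # stand-in for Memcache
        self.store_reads = 0  # counts how often the slower store is hit

    def get(self, user):
        if user in self.cache:
            return self.cache[user]
        self.store_reads += 1
        recs = self.store.get(user, [])
        self.cache[user] = recs
        return recs

    def update(self, user, recs):
        """Write to the store and drop the stale cache entry."""
        self.store[user] = recs
        self.cache.pop(user, None)


service = RecommendationService({"ann": ["veggie-stir-fry"]})
first = service.get("ann")    # miss: read from the store, then cached
second = service.get("ann")   # hit: served from the cache
service.update("ann", ["kale-salad"])
third = service.get("ann")    # miss again after invalidation
```

Invalidating on write keeps the sketch simple; a real deployment would also need expiry times and a policy for concurrent updates.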

21 Building the Stack. Diagram labels: DinnerLite!!, Usage Logs, Pub-sub messaging (Kafka), Batch Consumer, Real-Time Consumer, Hadoop / Spark Cluster (HDFS, MapReduce, Spark, Pig/Hive, Shark, Oozie, Yarn, Mahout, MLlib), MySQL, HBase, Internal, External.

22 DinnerLite!! Awesome Healthy Dinner Recipes

23 Thank You! Sounds Interesting? Box is Hiring!! Source: https://www.sac.edu/StudentServices/InternationalStudents/Calendar%20of%20Events/questions-and-answers.jpg

24 Extra slides

25 What is Big Data? “In God we trust. All others must bring data.” – W. Edwards Deming. BIG DATA: Volume, Velocity, Variability, Veracity.

26 Why does it matter? "Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom." - Clifford Stoll. Diagram labels: Big Data, Information, Knowledge, Better Products.

