Apache Hadoop and the Emergence of the Enterprise Data Hub. Eli Collins, Chief Technologist. ©2014 Cloudera, Inc. All rights reserved.

Presentation transcript:

1 Apache Hadoop and the Emergence of the Enterprise Data Hub. Eli Collins, Chief Technologist.

2 The Enterprise Data Warehouse
Diagram labels: Data Sources, Flat Files, Operational Store, Staging, EDW, Metadata, Summary, Facts & Dimensions, Archive, Data marts, Reporting, Analysis, Mining.

3 The Enterprise Data Hub
Data shown: images, logs, binary, DB dumps.
1. Inexpensive storage
2. Flexible storage
3. Co-located compute
4. Multiple compute engines: MR, Pig/Hive, SQL, Spark, SAS, R, Search, Graph..

4 So it's Like a Data Warehouse?

5 An Analogy

6 What changed? The need? Convenience? Cost?

7 Take and share good photos

8 Data Warehouse vs. Data Hub
Panels: Enterprise Data Warehouse, Enterprise Data Hub.

9 An Operating System
Diagram labels: APP, APP LIB, 3rd PARTY APP, SCHEDULER, FILE SYSTEM, MGT SERVICES.

10 An Enterprise Data Hub
Diagram labels: BATCH PROCESSING, ANALYTIC SQL, SEARCH ENGINE, MACHINE LEARNING, STREAM PROCESSING, 3RD PARTY APPS, WORKLOAD MANAGEMENT, STORAGE FOR ANY TYPE OF DATA (Filesystem, Online NoSQL), UNIFIED, ELASTIC, RESILIENT, SECURE, DATA MANAGEMENT, SYSTEM MANAGEMENT.

11 Data Warehousing with an EDH
Diagram labels: Data Sources, Flat Files, Operational Store, EDH, EDW, Reporting, Analysis, Mining.
1. Stage, transform, archive
2. Reporting, Mining, Analysis
3. Exploratory, Discovery, Search, ML..
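The "stage, transform, archive" step on this slide, feeding the "reporting, mining, analysis" step, could be sketched roughly as below. This is an illustrative toy in plain Python, not Cloudera's implementation: the function names (`stage`, `transform`, `load_summary`), the two-column `region,amount` layout, and the sample data are all assumptions made up for the example. In a real deployment the staged data would live in the hub's storage and the summary would be loaded into the warehouse.

```python
import csv
from collections import defaultdict


def stage(raw_text):
    """Stage: keep the raw flat-file lines unchanged (hypothetical hub landing area)."""
    return raw_text.splitlines()


def transform(staged_lines):
    """Transform: parse staged lines into typed (region, amount) rows.

    Malformed lines are skipped here; the untouched staged copy still holds them.
    """
    rows = []
    for fields in csv.reader(staged_lines):
        if len(fields) != 2:
            continue  # wrong number of columns
        region, amount = fields
        try:
            rows.append((region.strip(), float(amount)))
        except ValueError:
            continue  # amount is not numeric
    return rows


def load_summary(rows):
    """Aggregate rows into a per-region total, the kind of summary a warehouse reports on."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


# Assumed sample input: two bad records mixed in with three good ones.
raw = "east,10.5\nwest,4\nbad line\neast,2.5\nwest,oops\n"
staged = stage(raw)                           # raw copy kept as the archive
summary = load_summary(transform(staged))     # cleaned, aggregated view
print(summary)
```

Keeping the staged lines untouched while only the cleaned summary moves on is the point of the sketch: the raw copy stays available for the exploratory work listed as step 3.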


14 Data Warehousing in Cloudera's EDH
