Zhangxi Lin, The Rawls College,


Data Lake & Data Hub
Zhangxi Lin, The Rawls College, 2016-03-31

Data Warehouse
Data warehouses are popular for business intelligence tasks, but they are being replaced by less-structured data lakes, which allow more flexibility. The limitation of a data warehouse is that it stores data from various sources in specific static structures and categories, and these dictate, at the very point of entry, the kind of analysis that is possible on that data. This was sufficient during the early stages of business intelligence, when analysis was done primarily on proprietary databases and the scope was restricted to canned reports and dashboards with limited, pre-defined interaction paths. The approach has started to fall apart in the world of big data discovery, where it is very difficult to ascertain upfront all the intelligence and insights one could derive from the variety of sources that keep cropping up on a regular basis: proprietary databases, files, third-party tools, social media, and the web.
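To make "structure dictated at the point of entry" concrete, here is a minimal sketch of schema-on-write loading. The `SalesRow` schema, the `load_into_warehouse` helper, and the sample record are all hypothetical and only illustrate the idea; they are not taken from the slides or from any particular warehouse product.

```python
from dataclasses import dataclass


# Hypothetical fixed warehouse schema: only these columns exist.
@dataclass(frozen=True)
class SalesRow:
    order_id: int
    region: str
    amount: float


def load_into_warehouse(record: dict) -> SalesRow:
    """Coerce a source record into the fixed schema at load time.

    Any field not named in SalesRow is discarded here, so later
    analysis cannot use it.
    """
    return SalesRow(
        order_id=int(record["order_id"]),
        region=str(record["region"]),
        amount=float(record["amount"]),
    )


source_record = {
    "order_id": "1001",
    "region": "EMEA",
    "amount": "19.90",
    "referrer": "social-media",  # not in the schema, lost on load
}

row = load_into_warehouse(source_record)
print(row)
print(hasattr(row, "referrer"))
```

In this sketch the `referrer` field never reaches the stored row, so a question such as "which orders came from social media?" cannot be answered later without changing the schema and reloading the data.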

Data Lake
A data lake is a large-scale storage repository and processing engine. It provides "massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs." The term was coined by James Dixon, Pentaho chief technology officer. Dixon initially used the term to contrast with "data mart", which is a smaller repository of interesting attributes extracted from the raw data. One example of a data lake platform is Apache Hadoop, which stores data in the Hadoop Distributed File System (HDFS).
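The idea of keeping raw data and deciding on structure only when it is read can be sketched as follows. This is an illustration under stated assumptions: the in-memory list `raw_lake` stands in for files in a lake, and `read_with_schema` is a hypothetical helper, not part of Hadoop or any other product.

```python
import json

# Raw events kept exactly as they arrived (JSON lines in a list,
# standing in for files stored in the lake).
raw_lake = [
    '{"order_id": 1001, "region": "EMEA", "amount": 19.9, "referrer": "social-media"}',
    '{"order_id": 1002, "region": "APAC", "amount": 5.0}',
    '{"sensor": "t-17", "temp_c": 21.4}',
]


def read_with_schema(lines, fields):
    """Yield records that carry every requested field, projected to those fields."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# Two different "schemas" applied to the same raw data at read time.
orders = list(read_with_schema(raw_lake, ["order_id", "amount"]))
referrers = list(read_with_schema(raw_lake, ["order_id", "referrer"]))

print(orders)
print(referrers)
```

Because nothing is discarded on ingest, a new question only needs a new read-time projection. The trade-off is that each consumer has to do this interpretation work itself.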

Top Five Differences between Data Lakes and Data Warehouses
Data lakes:
Retain all data
Support all data types
Support all users
Adapt easily to changes
Provide faster insights

Data Hub
A data hub is a collection of data from multiple sources, organized for distribution, sharing, and often subsetting. Generally this data distribution takes the form of a hub-and-spoke architecture. A data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data. A data hub differs from a data lake by homogenizing data and possibly serving it in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data, such as de-duplication, quality, security, and a standardized set of query services. A data lake tends to store data in one place for availability, and allows or requires the consumer to process or add value to the data.
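The hub behaviours named above (homogenizing, de-duplicating, and serving in more than one format) can be sketched in a few lines. Everything here is an assumption made for illustration: the two feeds, their field names, and the helper functions are hypothetical, and a real hub would also handle quality rules, security, and query services.

```python
import csv
import io
import json

# Hypothetical feeds that use different field names for the same entity.
crm_feed = [{"Email": "Ann@Example.com", "Name": "Ann Lee"}]
web_feed = [
    {"email": "ann@example.com ", "full_name": "Ann Lee"},
    {"email": "bo@example.com", "full_name": "Bo Chen"},
]


def homogenize(record: dict) -> dict:
    """Map source-specific field names onto one standard shape."""
    email = record.get("email") or record.get("Email") or ""
    name = record.get("full_name") or record.get("Name") or ""
    return {"email": email.strip().lower(), "name": name.strip()}


def build_hub(*feeds):
    """Homogenize every record and keep the first one seen per email."""
    hub = {}
    for feed in feeds:
        for record in feed:
            clean = homogenize(record)
            hub.setdefault(clean["email"], clean)
    return list(hub.values())


def serve_json(records) -> str:
    """Serve hub records as a JSON array."""
    return json.dumps(records)


def serve_csv(records) -> str:
    """Serve the same hub records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["email", "name"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


hub_records = build_hub(crm_feed, web_feed)
print(serve_json(hub_records))
print(serve_csv(hub_records))
```

The two records for Ann collapse into one after normalization, and consumers can choose whichever output format suits them without touching the source feeds.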

Turn ‘Data Lake’ into an Enterprise Data Hub
Hadoop can be used as a "data lake": a scalable data repository built on the cheap-and-deep storage economics of HDFS (Hadoop Distributed File System), capturing data from anywhere, in any format, for future analysis. As Hadoop deployments shift from proof-of-concept sandbox experiments to enterprise-grade, mission-critical production solutions, they take on new workloads, and those workloads need all the power and flexibility of the Hadoop ecosystem components. Customers with existing investments in non-HDFS data lakes are just as eager to take on new analytic and processing workloads as everyone else. For them, setting up an alternative HDFS-based Hadoop cluster on Direct Attached Storage (DAS) would mean copying data from the existing NAS-based data lake into a separate Hadoop installation. Copying is expensive; copying terabytes or petabytes is prohibitively so.
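A rough back-of-the-envelope calculation shows why bulk copying becomes a problem at this scale. The figures below are assumptions chosen for illustration, not numbers from the slides: a dedicated 10 Gbit/s link, 70% effective throughput, and decimal units (1 TB = 10^12 bytes).

```python
def copy_time_hours(data_bytes: float, link_gbit_per_s: float,
                    efficiency: float = 0.7) -> float:
    """Hours needed to move data_bytes over a link at a fraction of line rate."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return data_bytes / bytes_per_second / 3600


TB = 1e12  # decimal terabyte
PB = 1e15  # decimal petabyte

for label, size in [("10 TB", 10 * TB), ("1 PB", 1 * PB)]:
    hours = copy_time_hours(size, link_gbit_per_s=10)
    print(f"{label}: {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions, 10 TB moves in a few hours, while 1 PB takes roughly 13 days of sustained transfer. That estimate leaves out the duplicated storage and the effort of keeping the two copies in sync.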

Enterprise Data Hub