1 6/29/2015 XLDB ‘09 Luke Lonergan



“Big” numbers for GP today
- Query rate: 70K/day
- Dataset size: 6.5 PB
- Analysis rate: +100 GB/s
- Net loading rate: +3 GB/s
- Transaction rate: 100,000/s
- Power rate: 56 TB/kW, 1.6 GB/s/kW
- Number of data/compute nodes: 100s
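To give a feel for the scale of these figures, here is a rough back-of-envelope sketch in Python. It assumes decimal units (1 PB = 1,000,000 GB), treats the "+" rates as lower bounds, and assumes all the numbers describe one and the same system, which the slide does not state. The function names are made up for this illustration.

```python
# Figures quoted on the slide. Decimal units are an assumption.
DATASET_PB = 6.5
ANALYSIS_RATE_GB_S = 100.0  # slide says "+100GB/s"; used here as a lower bound
LOAD_RATE_GB_S = 3.0        # slide says "+3GB/s"; used here as a lower bound
TB_PER_KW = 56.0

GB_PER_PB = 1_000_000
TB_PER_PB = 1_000


def full_pass_hours(dataset_pb: float, rate_gb_s: float) -> float:
    """Hours needed to stream the whole dataset once at the given rate."""
    seconds = dataset_pb * GB_PER_PB / rate_gb_s
    return seconds / 3600


def implied_power_kw(dataset_pb: float, tb_per_kw: float) -> float:
    """Power implied if the whole dataset sat at the quoted storage density."""
    return dataset_pb * TB_PER_PB / tb_per_kw


if __name__ == "__main__":
    scan = full_pass_hours(DATASET_PB, ANALYSIS_RATE_GB_S)
    load = full_pass_hours(DATASET_PB, LOAD_RATE_GB_S)
    power = implied_power_kw(DATASET_PB, TB_PER_KW)
    print(f"one full analysis pass: {scan:.1f} hours")
    print(f"one full load:          {load / 24:.1f} days")
    print(f"implied power draw:     {power:.0f} kW")
```

Under these assumptions a single pass over 6.5 PB takes roughly 18 hours at 100 GB/s, while loading the same volume at 3 GB/s takes on the order of 25 days.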

Things I’ve Heard
- Tiered computing
  - Organizational / Political / Geographic boundaries require it
- Metadata computing for HEP
  - “10TB sounds small but it’s not easy”
- Processing for Radio Astronomy, HEP
  - Data intensive computing
  - Requires an efficient pipeline from raw to consumables

Thoughts
- A lot of plumbing! Moving data around, pipeline processing
  - Core engine should do this so the plumbing isn’t done over and over
- Need for specialized access methods and storage classes
- “Computing in data” is key to success

GP Basic Features
- Access Methods
  - Compression, Column Store, Heap Store, External Tables, Indexes (GIST, GIN, Rtree, Bitmap, B-Tree, …)
  - Network Ingest / Export directly into parallel pipeline
  - Logical Partitioning by Range, List
- Parallel Programming Languages
  - SQL 2003 with Analytics
  - Map Reduce in Perl, Python, C, SQL, …
  - PL/R, python, perl, C, pgSQL, SQL, …
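The “Logical Partitioning by Range, List” item can be illustrated with a small Python sketch: rows are routed to a partition by key, and a query with a key filter only needs to touch the partitions that overlap it. This is a generic sketch of the technique, not GP's implementation or DDL; every name and partition layout here is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class RangePartition:
    name: str
    start: date  # inclusive lower bound
    end: date    # exclusive upper bound


def route_by_range(parts: Sequence[RangePartition], key: date) -> Optional[str]:
    """Return the partition whose [start, end) interval contains the key."""
    for p in parts:
        if p.start <= key < p.end:
            return p.name
    return None  # no partition accepts this key


def prune_by_range(parts: Sequence[RangePartition], lo: date, hi: date) -> List[str]:
    """Return only the partitions that overlap the query interval [lo, hi)."""
    return [p.name for p in parts if p.start < hi and lo < p.end]


def route_by_list(parts: Dict[str, FrozenSet[str]], key: str) -> Optional[str]:
    """Return the partition whose explicit value list contains the key."""
    for name, values in parts.items():
        if key in values:
            return name
    return None


if __name__ == "__main__":
    monthly = [
        RangePartition("sales_2009_01", date(2009, 1, 1), date(2009, 2, 1)),
        RangePartition("sales_2009_02", date(2009, 2, 1), date(2009, 3, 1)),
        RangePartition("sales_2009_03", date(2009, 3, 1), date(2009, 4, 1)),
    ]
    regions = {
        "sales_emea": frozenset({"UK", "DE", "FR"}),
        "sales_amer": frozenset({"US", "CA"}),
    }
    print(route_by_range(monthly, date(2009, 2, 14)))
    print(prune_by_range(monthly, date(2009, 2, 10), date(2009, 3, 5)))
    print(route_by_list(regions, "DE"))
```

In this toy layout, a query over mid-February to early March skips the January partition entirely, which is the main benefit of partitioning for scans.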

From Enterprise Data Clouds
Elastic / adaptive infrastructure for data warehousing and analytics
- IT Operations deploy pools of low-cost commodity infrastructure
  - Physical servers, virtual infrastructure, or onramp to public cloud
- DBAs and Analysts provision sandboxes and warehouses in minutes
  - Assemble the data they need (common, private, etc) for agile analytics
[Diagram labels: IT Operations; Infrastructure (Free, Free, Free); Warehouses (Consumer Division, Packaged Goods, Finance); DBA; Analyst]
Proprietary & Confidential

Use Case: Big Telco Data Mart Consolidation
Goals:
- Reduce maintenance and support costs from proliferation of data mart platforms
- Reduce risks and exposure due to data in shadow IT systems
- Break down silo walls - provide a unified way to find and access all data
Approach:
- Embrace data: encourage ‘physical consolidation’ in advance of data model unification
- Provide ‘self serve’ model to bring shadow IT into the light
- Allow unified data access and pragmatic ‘logical’ data model unification incrementally
[Diagram labels: Data Sources; US-West, 100 nodes]

Use Case: Big Ad Network Project Sandboxes
Goals:
- Remove IT barriers to analyst productivity and value creation
- Dramatically reduce IT resource constraints and delays, i.e. realize ideas sooner
- Combine centralized ‘EDW’ data with freshly discovered feeds and other useful sources
Approach:
- Self-serve creation of project warehouses in minutes, with elastic expansion as needed
- Load new data feeds without requiring formal modeling
- Bring together any data within the EDC, even if globally distributed, and analyze it
[Diagram labels: US-West, 200 nodes; Europe, 100 nodes; Asia, 200 nodes; US-East, 100 nodes; Free (x3); Analyst’s New Warehouse; Analyst’s Private Data Feed (x2); EDC Self-Serve Dashboard]

GP is Software: Develop Now
Download at:
- Gpn.greenplum.com
- Get the VMware image or use it on OSX, Linux, Solaris

Think Big. Think Fast.