1 6/29/2015 XLDB ‘09 Luke Lonergan


1 XLDB ‘09
Luke Lonergan
llonergan@greenplum.com

2 “Big” numbers for GP today
70K/day – Query Rate
6.5PB – Dataset Size
+100GB/s – Analysis Rate
+3GB/s – Net Loading Rate
100,000/s – Transaction Rate
56 TB / kW, 1.6 GB/s/kW – Power Rate
100s – Number of Data/Compute nodes
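The power-rate figures can be turned into a rough power budget. The sketch below is only back-of-the-envelope arithmetic: it assumes, purely for illustration, that the dataset size, analysis rate, and power rates all describe one deployment (the slide does not say so) and that units are decimal (1 PB = 1000 TB). The function and constant names are made up for this example.

```python
# Figures quoted on the slide.
TB_PER_KW = 56.0        # storage density: terabytes held per kilowatt
GBPS_PER_KW = 1.6       # throughput: GB/s delivered per kilowatt
DATASET_PB = 6.5        # dataset size
ANALYSIS_GBPS = 100.0   # analysis rate (the slide says "+100GB/s", so a lower bound)


def kw_for_storage(dataset_tb, tb_per_kw=TB_PER_KW):
    """Kilowatts implied by holding dataset_tb at the quoted density."""
    return dataset_tb / tb_per_kw


def kw_for_throughput(gbps, gbps_per_kw=GBPS_PER_KW):
    """Kilowatts implied by sustaining gbps at the quoted throughput rate."""
    return gbps / gbps_per_kw


# Assumption: decimal units, 1 PB = 1000 TB.
storage_kw = kw_for_storage(DATASET_PB * 1000)
scan_kw = kw_for_throughput(ANALYSIS_GBPS)

print(f"storage-bound estimate:    {storage_kw:.1f} kW")
print(f"throughput-bound estimate: {scan_kw:.1f} kW")
```

Under these assumptions the two estimates land within a factor of two of each other, roughly 116 kW and 62.5 kW.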

3 Things I’ve Heard
Tiered computing
– Organizational / Political / Geographic boundaries require it
Metadata computing for HEP
– “10TB sounds small but it’s not easy”
Processing for Radio Astronomy, HEP
– Data intensive computing
– Requires an efficient pipeline from raw to consumables

4 Thoughts
A lot of plumbing! Moving data around, pipeline processing
– Core engine should do this so the plumbing isn’t done over and over
Need for specialized access methods and storage classes
“Computing in data” is key to success

5 GP Basic Features
Access Methods
– Compression, Column Store, Heap Store, External Tables, Indexes (GiST, GIN, R-tree, Bitmap, B-Tree, …)
– Network Ingest / Export directly into parallel pipeline
– Logical Partitioning by Range, List
Parallel Programming Languages
– SQL 2003 with Analytics
– Map Reduce in Perl, Python, C, SQL, …
– PL/R, Python, Perl, C, pgSQL, SQL, …
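To make the "Map Reduce in Python" bullet concrete, here is a minimal sketch of the map/reduce pattern in plain Python. It is not Greenplum's MapReduce interface: the function names, the `run_map_reduce` driver, and the sample data are all hypothetical, and the "partitions" are ordinary lists standing in for data spread across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: collapse all values seen for one key into a total."""
    return key, sum(values)


def run_map_reduce(partitions, mapper, reducer):
    """Apply mapper to every record, group the pairs by key, then reduce.

    Each partition is mapped independently, which is the property a
    parallel engine would exploit; here everything runs in one process.
    """
    mapped = (mapper(record) for part in partitions for record in part)
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped):
        groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Hypothetical input, split into two partitions.
partitions = [
    ["big data big queries"],
    ["fast queries", "big loads"],
]
counts = run_map_reduce(partitions, map_words, reduce_counts)
print(counts)
```

The point of the pattern is that the mapper and reducer know nothing about where the data lives, so the engine is free to run them next to the data.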

6 From Enterprise Data Clouds
Proprietary & Confidential
Elastic / adaptive infrastructure for data warehousing and analytics
– IT Operations deploy pools of low-cost commodity infrastructure
    Physical servers, virtual infrastructure, or onramp to public cloud
– DBAs and Analysts provision sandboxes and warehouses in minutes
    Assemble the data they need (common, private, etc) for agile analytics
[Figure labels: DBA, Analyst; Consumer Division, Packaged Goods, Finance; Infrastructure; Warehouses; IT Operations]

7 Use Case: Big Telco Data Mart Consolidation
Goals:
– Reduce maintenance and support costs from proliferation of data mart platforms
– Reduce risks and exposure due to data in shadow IT systems
– Break down silo walls - provide a unified way to find and access all data
Approach:
– Embrace data – encourage ‘physical consolidation’ in advance of data model unification
– Provide ‘self serve’ model to bring shadow IT into the light
– Allow unified data access and pragmatic ‘logical’ data model unification incrementally
[Figure labels: Data Sources; US-West, 100 nodes]

8 Use Case: Big Ad Network Project Sandboxes
Goals:
– Remove IT barriers to analyst productivity and value creation
– Dramatically reduce IT resource constraints and delays – i.e. realize ideas sooner
– Combine centralized ‘EDW’ data with freshly discovered feeds and other useful sources
Approach:
– Self-serve creation of project warehouses in minutes – and elastically expand as needed
– Load new data feeds without requiring formal modeling
– Bring together any data within the EDC – even if globally distributed – and analyze
[Figure labels: US-West, 200 nodes; Europe, 100 nodes; Asia, 200 nodes; US-East, 100 nodes; Analyst’s New Warehouse; Analyst’s Private Data Feed; EDC Self-Serve Dashboard]
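The self-serve model above (create a project warehouse from a shared pool, then expand it elastically) can be sketched as simple bookkeeping over a pool of nodes. This is an illustrative toy, not the EDC dashboard or any Greenplum API: the `NodePool` class, its methods, and the warehouse name are hypothetical, and the sizes 40 and 8 are made-up requests. Only the 200-node US-West pool size comes from the slide.

```python
class NodePool:
    """A regional pool of identical nodes that warehouses draw from."""

    def __init__(self, region, total_nodes):
        self.region = region
        self.total_nodes = total_nodes
        self.warehouses = {}  # warehouse name -> nodes currently assigned

    @property
    def free_nodes(self):
        """Nodes not yet assigned to any warehouse."""
        return self.total_nodes - sum(self.warehouses.values())

    def _claim(self, nodes):
        """Validate a request against the free capacity of the pool."""
        if nodes <= 0:
            raise ValueError("node count must be positive")
        if nodes > self.free_nodes:
            raise ValueError(
                f"{self.region}: requested {nodes}, only {self.free_nodes} free"
            )

    def provision(self, name, nodes):
        """Create a new warehouse with the given number of nodes."""
        if name in self.warehouses:
            raise ValueError(f"warehouse {name!r} already exists")
        self._claim(nodes)
        self.warehouses[name] = nodes

    def expand(self, name, extra_nodes):
        """Grow an existing warehouse by extra_nodes."""
        if name not in self.warehouses:
            raise KeyError(name)
        self._claim(extra_nodes)
        self.warehouses[name] += extra_nodes


# Hypothetical session: an analyst creates a sandbox, then grows it.
pool = NodePool("US-West", 200)
pool.provision("analyst_sandbox", 40)
pool.expand("analyst_sandbox", 8)
print(pool.warehouses, "free:", pool.free_nodes)
```

A real system would also have to move or rebalance data when a warehouse grows; the sketch tracks capacity only.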

9 GP is Software – Develop Now
Download at:
– Gpn.greenplum.com
– Get the VMware image or use it on OSX, Linux, Solaris

10 Think Big. Think Fast.

