SMP MPP with PDW ** Workload requirements usually drive the architecture decision.

Presentation transcript:

SMP MPP with PDW ** Workload requirements usually drive the architecture decision

Appliance architecture: Control Rack and Data Rack
Networks: Corporate Network, Private Network, Infiniband, Dual Fiber Channel
Control Nodes (dual, active/passive): 1. PDW Engine 2. Admin Console 3. Metadata 4. Workspace
Compute Nodes (plus a Passive Compute Node): 1. User Data 2. DMS
ETL Load Interface: 1. SSIS Instance 2. Loader Tool 3. File Staging
Corporate Backup Solution: 1. File store for backups 2. Std. SQL Backups 3. Full and differential
Data Center Access: 1. Active Directory/DNS 2. HPC 3. Setup/patching
Client Drivers: DataDirect Drivers (ADO.NET, OLE-DB, ODBC); Reporting Services, Analysis Services, Integration Services R2
Expand by adding data rack(s) when capacity or performance requirements change

Server models (Enterprise Class DBMS, TempDB Workspace, Dual Multi-Core Processors)
HP DL360 G6, 1U: CPU Intel Nehalem, 8 cores, hyper-threaded; RAM 72 GB; 6 x 300 GB 10K SAS
Dell R610, 1U: CPU Intel Nehalem, 8 cores, hyper-threaded; RAM 96 GB; 4 x 300 GB 10K SAS
Models listed as of SQL Server 2008 R2 PDW MTP2 release
** Server models could change before RTM **

Storage configurations
First configuration: 450 GB 15K SAS, 36 TB; 1 TB 7.2K SATA, 80 TB
Second configuration: 450 GB 15K SAS, 45 TB; 1 TB 7.2K SAS, 100 TB
Both: dual 4Gb FC storage processors; Data & Log Drives (RAID 10); Hot Spare
Models listed as of SQL Server 2008 R2 PDW MTP2 release
** Storage models and drives could change before RTM **

Star schema example
Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
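The star schema on this slide can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column names come from the slide, but the SQL types, the sample rows, and the function names are assumptions made for illustration, and SQLite stands in for the appliance's SQL Server.

```python
import sqlite3

# Column names follow the slide; types and sample rows are assumptions.
DDL = """
CREATE TABLE time_dim (date_dim_id INTEGER PRIMARY KEY, calendar_year INTEGER,
                       calendar_qtr INTEGER, calendar_mo INTEGER, calendar_day INTEGER);
CREATE TABLE store_dim (store_dim_id INTEGER PRIMARY KEY, store_name TEXT,
                        store_mgr TEXT, store_size INTEGER);
CREATE TABLE product_dim (prod_dim_id INTEGER PRIMARY KEY, prod_category TEXT,
                          prod_sub_cat TEXT, prod_desc TEXT);
CREATE TABLE mktg_campaign_dim (mktg_camp_id INTEGER PRIMARY KEY, camp_name TEXT,
                                camp_mgr TEXT, camp_start TEXT, camp_end TEXT);
CREATE TABLE sales_facts (
    date_dim_id INTEGER REFERENCES time_dim(date_dim_id),
    store_dim_id INTEGER REFERENCES store_dim(store_dim_id),
    prod_dim_id INTEGER REFERENCES product_dim(prod_dim_id),
    mktg_camp_id INTEGER REFERENCES mktg_campaign_dim(mktg_camp_id),
    qty_sold INTEGER, dollars_sold REAL);
"""


def build_star_schema():
    """Create the five tables in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO store_dim VALUES (?, ?, ?, ?)",
        [(1, "Downtown", "A. Manager", 1200), (2, "Airport", "B. Manager", 800)],
    )
    conn.executemany(
        "INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 1, 1, 2, 20.0), (1, 1, 2, 1, 1, 5.0), (1, 2, 1, 1, 3, 30.0)],
    )
    return conn


def dollars_by_store(conn):
    """Typical star-join: the fact table joined to one dimension, then aggregated."""
    return conn.execute(
        """SELECT s.store_name, SUM(f.dollars_sold)
           FROM sales_facts f
           JOIN store_dim s ON s.store_dim_id = f.store_dim_id
           GROUP BY s.store_name
           ORDER BY s.store_name"""
    ).fetchall()
```

Calling dollars_by_store(build_star_schema()) returns one total per store, which is the kind of fact-to-dimension join the later slides distribute across compute nodes.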

Smaller Dimension Tables are Replicated on Every Compute Node
Each compute node holds its own full copy of TD (Time Dim), PD (Product Dim), SD (Store Dim), and MD (Mktg Campaign Dim).
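A minimal sketch of why replication helps, assuming four compute nodes modelled as plain Python lists and dicts. The function names and sample data are hypothetical; the point is only that a node with a full dimension copy can resolve a join without contacting any other node.

```python
import copy


def replicate(dimension_rows, node_count):
    """Give every compute node its own full copy of a small dimension table."""
    return [copy.deepcopy(dimension_rows) for _ in range(node_count)]


def local_join(fact_rows, local_dimension):
    """Join a node's fact rows to its local dimension copy.

    No other node is consulted, because the whole dimension is already here.
    """
    return [
        (local_dimension[store_id]["store_name"], qty_sold)
        for store_id, qty_sold in fact_rows
    ]


# Made-up Store Dim content, keyed by Store Dim ID.
store_dim = {1: {"store_name": "Downtown"}, 2: {"store_name": "Airport"}}
copies = replicate(store_dim, node_count=4)
```

For example, local_join([(2, 5)], copies[3]) resolves the store name on node 4 using only that node's copy.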

Larger Fact Table is Hash Distributed Across All Compute Nodes
The Sales Facts table is split into SF-1, SF-2, SF-3, and SF-4, one piece per compute node, alongside each node's replicated copies of TD, PD, SD, and MD.
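Hash distribution can be sketched as follows. The slide does not say which hash function PDW uses, so MD5 from Python's hashlib is an assumed stand-in chosen only because it is stable across runs; node_for and distribute are hypothetical names.

```python
import hashlib


def node_for(key, node_count):
    """Map a distribution-column value to a compute node index (0-based).

    MD5 is a stand-in; the appliance's real hash function is not given here.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(rows, key_column, node_count):
    """Split fact rows into one list per compute node by hashing key_column."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[node_for(row[key_column], node_count)].append(row)
    return nodes
```

Because the node is a pure function of the key, every row with the same distribution-column value lands on the same node, which is what lets joins and aggregations on that column run locally.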

Compute Node: SQL Server 2008 SP2, TempDB, User Database
Storage Node: Storage Processor, Data LUNs, Tx Logs, Hot Spare
Distributed table hashed within node into physical tables: SF-1 is stored as SF1-a, SF1-b, SF1-c, SF1-d, SF1-e, SF1-f, SF1-g, SF1-h
Each distribution lands on a specific LUN
Replicated tables (Repl) striped across LUNs
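The two-level placement (node first, then one of eight distributions inside the node) might look like the sketch below. The eight distributions per node and the SF1-a to SF1-h naming come from the slide; the SHA-1 hash, the four-node default in the example, and the way one hash value is split into node and distribution are assumptions.

```python
import hashlib
import string

DISTRIBUTIONS_PER_NODE = 8  # SF1-a through SF1-h on the slide


def placement(key, node_count):
    """Return a label such as 'SF3-f' for a distribution-column value.

    One hash value is split into (node, distribution) with divmod; the real
    appliance may derive the two levels differently.
    """
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % (node_count * DISTRIBUTIONS_PER_NODE)
    node, distribution = divmod(bucket, DISTRIBUTIONS_PER_NODE)
    return f"SF{node + 1}-{string.ascii_lowercase[distribution]}"
```

With four compute nodes this yields 32 possible physical tables, and a given key always maps to the same one.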

Query processing
Client: Query Tool
Control Node: PDW Engine; SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB); Data Movement Service (DMS)
PDW Engine steps: 1. Parse SQL 2. Validate & Authorize 3. Build MPP Plan 4. Execute Plan 5. Return Data to Client
Compute Nodes: SQL Server (User Data, TempDB); Data Movement Service (DMS)
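The control-node steps can be illustrated with a toy scatter/gather aggregation. This is a sketch under strong assumptions: SQL parsing is skipped, the "plan" is just a per-node partial sum plus a final merge, and every name here is hypothetical rather than part of PDW.

```python
def validate_and_authorize(user, allowed_users):
    """Stand-in for the 'Validate & Authorize' step."""
    if user not in allowed_users:
        raise PermissionError(f"{user} may not query this appliance")


def build_mpp_plan(measure):
    """Stand-in for 'Build MPP Plan': a step for each node and a merge step."""
    return {
        "node_step": lambda rows: sum(row[measure] for row in rows),
        "merge_step": sum,
    }


def run_query(user, measure, node_data, allowed_users):
    """Validate, plan, run the node step on every node's rows, then merge."""
    validate_and_authorize(user, allowed_users)
    plan = build_mpp_plan(measure)
    # 'Execute Plan': each compute node works only on its own slice of data.
    partials = [plan["node_step"](rows) for rows in node_data]
    # 'Return Data to Client': the control node combines the partial results.
    return plan["merge_step"](partials)
```

A SUM is used because partial sums can be merged by adding them; other aggregates would need a different node step and merge step.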

Data loading
Control Rack: Control Node (PDW Engine, Load Manager, DMS Manager, SQL Server); Landing Zone (Load Client, Load File/SSIS, DMS)
Data Rack: Compute Nodes (DMS with Converter, Sender, Receiver, and Writer; SQL Server); Storage Nodes
Interconnect: Infiniband
Load steps:
1. DWLoader is invoked, or SSIS loads through the SSIS API.
2. Load Manager creates staging tables.
3. DMS reads the load data and buffers records to send to compute nodes round-robin.
4. Each row is converted for bulk insert and its distribution column is hashed.
5. The hashed row is sent to the appropriate node receiver for loading.
6. The received row is pushed onto a writer thread.
7. The row is bulk inserted into the staging table.
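The load path above can be sketched in two stages: round-robin fan-out from the landing zone, then hash-based routing to the node that owns each row. The MD5 hash, the in-memory lists standing in for DMS buffers and staging tables, and the function names are all assumptions for illustration.

```python
import hashlib
from itertools import cycle


def hash_node(value, node_count):
    """Pick the owning node for a distribution-column value (MD5 is a stand-in)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def load(records, distribution_column, node_count):
    """Return (converter_buffers, staging_tables), one list per compute node."""
    # Stage 1: records are handed to compute nodes round-robin, so the
    # conversion work is spread evenly regardless of the data values.
    converters = [[] for _ in range(node_count)]
    for record, target in zip(records, cycle(range(node_count))):
        converters[target].append(record)

    # Stage 2: each node hashes the distribution column of the rows it
    # converted and sends every row to the receiver on the owning node,
    # where it is appended (bulk inserted) into that node's staging table.
    staging = [[] for _ in range(node_count)]
    for rows in converters:
        for row in rows:
            staging[hash_node(row[distribution_column], node_count)].append(row)
    return converters, staging
```

The round-robin stage balances CPU work, while the hash stage decides where each row is finally stored.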

Load within each distribution (shown for Node1 Dist A, Dist B, through Dist H)
1. Load Data: Bulk Insert into the Partitioned Staging Table (CIDX); sort each batch in memory or TempDB.
2. Insert-Select from the Partitioned Staging Table (CIDX) into the Partitioned Final Table (CIDX); sort each partition in memory or TempDB.
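A rough sketch of the two sorts, assuming rows are dicts, the clustered index (CIDX) key is a single column, and a caller-supplied function assigns each row to a partition. Python's in-memory sorted() stands in for the "in memory or TempDB" sort, and the function names are hypothetical.

```python
from collections import defaultdict


def bulk_insert(batches, key):
    """Append batches to a staging list, sorting each batch on its own."""
    staging = []
    for batch in batches:
        staging.extend(sorted(batch, key=lambda row: row[key]))
    return staging


def insert_select(staging, key, partition_of):
    """Move staging rows into the final table, sorting within each partition."""
    partitions = defaultdict(list)
    for row in staging:
        partitions[partition_of(row)].append(row)
    return {
        partition: sorted(rows, key=lambda row: row[key])
        for partition, rows in partitions.items()
    }
```

After bulk_insert the staging data is ordered only within each batch; insert_select then produces a final table that is fully ordered within every partition.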

Credit card processing EDW Workload
