Choosing the Best Distribution Key
Eddie BK Tan, Technical Account Manager, IBM Asia-Pacific, Netezza™
© 2012 IBM Corporation


1 © 2012 IBM Corporation
Choosing the Best Distribution Key
Eddie BK Tan, Technical Account Manager, IBM Asia-Pacific, Netezza™
tanbke@sg.ibm.com, +65 9877 0221

2 © 2012 IBM Corporation Information Management, A Killer Appl
Dimensions of Parallelism & Data Distribution
• Parallelism (many lanes on a highway) does the work in the shortest possible time, because the work is shared equally by many processes.
• Data distribution (equal number of cars in each lane) is the foundation for the parallelism process. It ensures that each parallel process does equal work.
Optimal Parallelism = Even Distribution = Best Performance
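
The link between the distribution key and equal work per process can be illustrated with a small simulation. This is only a sketch under stated assumptions: the MD5-based `slice_of_key` function is a stand-in and is not Netezza's actual hash, and the slice count, row count, and sample keys are made up for illustration.

```python
import hashlib

N_SLICES = 8      # hypothetical number of data slices
N_ROWS = 80_000   # hypothetical table size

def slice_of_key(key, n_slices=N_SLICES):
    """Map a key value to a slice. Stand-in hash, not Netezza's own."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_slices

def rows_per_slice(keys, n_slices=N_SLICES):
    """Count how many rows land on each slice for the given key values."""
    counts = [0] * n_slices
    for key in keys:
        counts[slice_of_key(key, n_slices)] += 1
    return counts

def skew_ratio(counts):
    """Busiest slice divided by the average slice; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))

# A high-cardinality key: every row has its own value.
even_counts = rows_per_slice(range(N_ROWS))

# A low-cardinality key: only three distinct values across all rows.
skewed_counts = rows_per_slice(i % 3 for i in range(N_ROWS))

print("unique key  :", even_counts, round(skew_ratio(even_counts), 2))
print("3-value key :", skewed_counts, round(skew_ratio(skewed_counts), 2))
```

A ratio near 1.0 means every slice holds about the same number of rows. The three-value key can reach at most three of the eight slices, so the remaining slices hold nothing and the busiest slice sets the pace for the whole query.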

3 The Two Factors of Parallelism
• Processor Balancing
• Data Balancing
Unbalanced System = BOTTLENECKS
Balanced System = PERFORMANCE

4 Processor Balancing: The Query Optimizer
[Diagram labels: Query, Optimizer, Units of Parallelism]

5 Bad Query Optimization
[Diagram labels: Query, Optimizer, Units of Parallelism. Steps are stacked unevenly on a few units (up to Step 6 on one of them); the other units are crossed out and marked Dormant.]
Bad Optimization:
• Does not optimize across “units of parallelism”
• Does not balance workload across those “units of parallelism” that it chooses

6 Good Query Optimization
[Diagram labels: Query, Optimizer, Units of Parallelism. Every unit runs the same two steps, Step 1 and Step 2.]
Good Optimization:
• Optimizes across “units of parallelism”
• Balances the workload across “units of parallelism”

7 Good Optimization Is:
Important for simple queries...
“What is the inventory of candy canes?”
Customer Impact:
• Fast, consistent query runtimes
• Accommodates multiple users easily
[Diagram labels: Good Optimizer, Bad Optimizer]

8 Good Optimization Is:
Important for simple queries...
“What is the inventory of candy canes?”
... but crucial for complex queries
“What is the inventory of candy canes in each of the stores in Southern California?”
Customer Impact:
• Good Optimizer: high-value answers returned quickly
• Bad Optimizer: no answer

9 The Value of Queries
Query Complexity: Simple
Query: “What is the inventory of candy canes?”
Resulting Decision: Reduce price on excess inventory to avoid post-holiday overstock.
Business Impact: Margin erosion

10 The Value of Queries
Query Complexity: Simple
Query: “What is the inventory of candy canes?”
Resulting Decision: Reduce price on excess inventory to avoid post-holiday overstock.
Business Impact: Margin erosion

Query Complexity: Complex
Query: “What is the inventory of candy canes in each individual store?”
Resulting Decision: Re-route overstocks to understocked stores to exploit neighborhood demand differences.
Business Impact: Profit maximization

Complex queries have tremendous value for your customer. But few customers know the difference between simple and complex queries.

11 Netezza Performance – the Big Three
• DISTRIBUTE FOR COLLOCATION
  – Look at the joins
• ORDER DATA
  – Look at the WHERE clauses
• OPTIMIZE TABLE STRUCTURE
  – Performance, space, maintainability

12 Network Cost – Rough Estimates
• Collocated Tables: N/A
• Table Redistribute: 23 MB / SPU / sec.
• Broadcasted Table: 80 MB / sec. dbos
TwinFin 12 = 2.1 GB / sec.
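
These rough rates can be turned into back-of-the-envelope transfer times. The sketch below uses the two rates quoted on the slide; the SPU count of 92 is an assumption (the slide does not state it), picked because 23 MB × 92 comes to about 2.1 GB / sec., which may be where the TwinFin 12 figure comes from. The 10,000 MB table is hypothetical.

```python
# Rates quoted on the slide.
REDISTRIBUTE_MB_PER_SPU_PER_SEC = 23
BROADCAST_MB_PER_SEC = 80

# Assumption: 92 SPUs working in parallel. Not stated on the slide.
ASSUMED_SPU_COUNT = 92

def aggregate_redistribute_rate(spu_count=ASSUMED_SPU_COUNT):
    """Total redistribute throughput in MB/sec across all SPUs."""
    return REDISTRIBUTE_MB_PER_SPU_PER_SEC * spu_count

def redistribute_seconds(table_mb, spu_count=ASSUMED_SPU_COUNT):
    """Rough time to redistribute a table of the given size."""
    return table_mb / aggregate_redistribute_rate(spu_count)

def broadcast_seconds(table_mb):
    """Rough time to broadcast a table of the given size."""
    return table_mb / BROADCAST_MB_PER_SEC

HYPOTHETICAL_TABLE_MB = 10_000  # made-up table size for illustration

print(f"aggregate redistribute rate: {aggregate_redistribute_rate()} MB/sec")
print(f"redistribute: {redistribute_seconds(HYPOTHETICAL_TABLE_MB):.1f} sec")
print(f"broadcast:    {broadcast_seconds(HYPOTHETICAL_TABLE_MB):.1f} sec")
```

Under these assumptions a large table is much cheaper to redistribute than to broadcast, while a collocated join avoids both costs entirely.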

13 SKEW Considerations
• Collocated Tables: none, except for whatever you started out with
• Table Redistribute: possible
• Broadcasted Table: none, since all dataslices get an identical copy of the data
  – But you don’t want to broadcast large volumes of data

14 Choosing Distribution Key
Tables:
• CUSTOMER (100 mil. rows): Cust Id (PK), Cust Name. Distribution key: Id or Name?
• ACCOUNT (300 mil. rows): Account Id (PK), Cust Id (FK), Product Code (FK), Last Trans Date, Amount Balance. Distribution key: Acct or Cust Id? Or Product Code?
• TRANSACTION (1.5 billion rows per month, 18 billion rows per year): Transaction Id (PK), Account Id (FK), Transaction Type, Transaction Amount. Distribution key: Trans or Acct Id?
• PRODUCT (300 rows): Product Code (PK), Product Desc
Relationships: CUSTOMER is holder of ACCOUNT; ACCOUNT has activity of TRANSACTION; ACCOUNT has PRODUCT
What if there are 10 queries having…
WHERE C.Cust_Id = A.Cust_Id and A.Account_Id = T.Account_Id
and there are 10,000 such queries executed daily?
Massive Processing Skew, because 18 billion TRANSACTION rows need to be moved around to join the 300-million-row ACCOUNT table.
The solution is:
• Add Cust_Id to the TRANSACTION table
• And change the join-column to Cust_Id
WHERE C.Cust_Id = A.Cust_Id and A.Cust_Id = T.Cust_Id
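
The effect of carrying Cust_Id on TRANSACTION can be sketched with toy data. Everything here is an assumption made for illustration: the MD5-based `slice_of` is a stand-in for the real hash, ACCOUNT is assumed to be distributed on Cust_Id, and the row counts only mimic the slide's 1:3 customer-to-account ratio. The sketch counts how many TRANSACTION rows sit on a different slice from the ACCOUNT row they join to.

```python
import hashlib

N_SLICES = 8  # hypothetical number of data slices

def slice_of(key, n_slices=N_SLICES):
    """Map a distribution-key value to a slice. Stand-in hash only."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_slices

# Toy data: 100 customers, 3 accounts each, 5 transactions per account.
accounts = [
    {"account_id": f"A{c}-{k}", "cust_id": f"C{c}"}
    for c in range(100) for k in range(3)
]
# Each transaction carries Account_Id plus the added Cust_Id column.
transactions = [
    {"account_id": a["account_id"], "cust_id": a["cust_id"]}
    for a in accounts for _ in range(5)
]
account_by_id = {a["account_id"]: a for a in accounts}

def rows_to_move(txn_dist_col):
    """TRANSACTION rows not on the slice that holds their ACCOUNT row,
    assuming ACCOUNT is distributed on Cust_Id."""
    moved = 0
    for t in transactions:
        account = account_by_id[t["account_id"]]
        if slice_of(t[txn_dist_col]) != slice_of(account["cust_id"]):
            moved += 1
    return moved

moved_on_account_id = rows_to_move("account_id")
moved_on_cust_id = rows_to_move("cust_id")

# Join cardinality: Cust_Id alone versus Cust_Id plus Account_Id.
matches_cust_only = sum(
    1 for t in transactions for a in accounts
    if a["cust_id"] == t["cust_id"]
)
matches_cust_and_account = sum(
    1 for t in transactions for a in accounts
    if a["cust_id"] == t["cust_id"] and a["account_id"] == t["account_id"]
)

print("rows to move, TRANSACTION on Account_Id:", moved_on_account_id)
print("rows to move, TRANSACTION on Cust_Id:   ", moved_on_cust_id)
print("join rows, Cust_Id only:                ", matches_cust_only)
print("join rows, Cust_Id and Account_Id:      ", matches_cust_and_account)
```

With both tables keyed on Cust_Id no row has to move. One caveat the toy data makes visible: when a customer holds several accounts, matching ACCOUNT to TRANSACTION on Cust_Id alone pairs each transaction with every account of that customer, so a query that needs per-account results would likely keep the Account_Id predicate alongside the Cust_Id one.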

15 And finally…
“A good question lights a thousand fires, a good answer merely permits savages to sleep.”
Mike Corbett (21st Century Games Player)

16 In closing…
We don’t have enough time to dive into the deeper portions of this pond, but feel free to contact me:
Eddie BK Tan, PMP
Technical Account Manager, Netezza Asia Pacific
IBM Software | Information Mgmt.
The IBM Place, 9 Changi Business Park Central 1, Singapore 486048
Tel +65 9877 0221
tanbke@sg.ibm.com

