Slide 1: Parallel Querying with Non-Dedicated Nodes
Vijayshankar Raman, Wei Han, Inderpal Narang
IBM Almaden Research Center, IBM Research, © 2005 IBM Corporation (Aug 30, 2005)
Slide 2: Properties of a Relational Database
Ease of schema evolution
Declarative querying
Transparent scalability: this one does not quite work
Slide 3: Today, Partitioning Is the Basis for Parallelism
Static partitioning (on the base tables)
Dynamic partitioning via exchange operators
Claim: partitioning does not handle non-dedicated nodes well.
(Figure: each node holds one partition of each table, e.g. L1, O1, Sa on one node; L2, O2, Sb on another; L3, O3, Sc on a third.) A sketch of what an exchange operator does follows.
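For context, a minimal sketch (not from the talk) of the core of an exchange operator: tuples are routed to nodes by hashing the join key, which is why every node must run the same plan and stay available for the whole query. The table, key, and three-node setup here are illustrative.

# Minimal sketch of exchange-style dynamic hash partitioning (illustrative only).
# Each producer routes tuples to consumer nodes by hashing the join key.

def exchange_partition(tuples, key_index, num_nodes):
    """Split a stream of tuples into one bucket per consumer node."""
    buckets = [[] for _ in range(num_nodes)]
    for t in tuples:
        node = hash(t[key_index]) % num_nodes
        buckets[node].append(t)
    return buckets

# Example: route Orders rows by their order key to 3 nodes.
orders = [(1, "5-LOW"), (2, "1-URGENT"), (3, "3-MEDIUM")]
print(exchange_partition(orders, key_index=0, num_nodes=3))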
Slide 4: Problems of Partitioning
Hard to scale incrementally
–Data must be re-partitioned
–Disk and CPU must be scaled together
DBA must ensure partition-CPU affinity
Homogeneity assumptions
–Same plan runs on each node
–Identical software needed on all nodes
Susceptible to load variations, node failures, stalls, …
–Response time is dictated by the speed of the slowest processor
–Bad for transient compute resources, e.g. we want the ability to interrupt query work for higher-priority local work
Slide 5: GOAL: A More Graceful Scale-Out Solution
Sacrifice partitioning for scalability
–Avoid initial partitioning
–No exchange
New means for work allocation in the absence of partitioning
–Handles heterogeneity and load variations better
Two design features
–Data In The Network (DITN): shared files on high-speed networks (e.g. SAN)
–Intra-fragment parallelism: send SQL fragments to heterogeneous join processors; each performs the same join, over a different subset of the cross-product space
Easy fault tolerance
Can use heterogeneous nodes: whatever is available at that time
Slide 6: Outline
Motivation
DITN design
Experimental results
Summary
Slide 7: DITN Architecture
1. Find idle coprocessors P1, P2, P3, P4, P5, P6
2. Prepare O, L, C
3. Logically divide O x L x C into work units Wi
4. In parallel, run SQL queries for the Wi at the Pi
5. Property: SPJAG(O x L x C) = AG(∪i SPJAG(Wi)), i.e. running the same select-project-join-aggregate query over each work unit and re-aggregating the results gives the full answer (a coordinator sketch follows)
Restrictions (we will return to these at the end):
–Pi cannot use indexes at the information integrator
–Isolation issues
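A minimal sketch of the coordinator side of steps 3-5, under assumptions of mine that the slide does not spell out: tables are addressed as row ranges of shared files (the orders_extent/lineitem_extent notation is hypothetical), each fragment computes a partial GROUP BY count, and the final AG step re-aggregates the partials. TPC-H column names are used for concreteness.

# Sketch of the DITN coordinator (steps 3-5). The "extent" notation for addressing
# a row range of a shared temp file is hypothetical; column names are TPC-H.

def make_work_units(n_orders, n_lineitem, cuts_o, cuts_l):
    """Step 3: logically divide O x L into a grid of work units (row-range pairs)."""
    def ranges(n, k):
        step = (n + k - 1) // k
        return [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    return [(ro, rl) for ro in ranges(n_orders, cuts_o)
                     for rl in ranges(n_lineitem, cuts_l)]

def fragment_sql(o_range, l_range):
    """Step 4: the same SPJAG query on every coprocessor, over a different work unit."""
    return (f"SELECT o_orderpriority, COUNT(*) AS cnt "
            f"FROM orders_extent({o_range[0]},{o_range[1]}) o, "
            f"lineitem_extent({l_range[0]},{l_range[1]}) l "
            f"WHERE o.o_orderkey = l.l_orderkey GROUP BY o_orderpriority")

def merge_partials(partials):
    """Step 5: AG over the union of the partial SPJAG results."""
    totals = {}
    for rows in partials:
        for priority, cnt in rows:
            totals[priority] = totals.get(priority, 0) + cnt
    return totals

work_units = make_work_units(1_500_000, 6_000_000, cuts_o=2, cuts_l=5)
print(len(work_units), "work units;", fragment_sql(*work_units[0]))
print(merge_partials([[("1-URGENT", 10)], [("1-URGENT", 7), ("5-LOW", 3)]]))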
Slide 8: Why Data In The Network
Observation: network bandwidth >> query operator bandwidth
–Network bandwidth is in Gbps (SAN/LAN); scans run at 10-100 Mbps; sorts at about 10 Mbps
–The interconnect transfers data faster than query operators can process it
But exploiting this fast interconnect via SQL is tricky
–E.g. an ODBC scan is about 10x slower than a local scan
Instead, keep temp files in a shared storage system (e.g. SAN-FS)
–Allows exploitation of the full network bandwidth
Immediate benefits
–Fast data transfer
–The DBMS doesn't have to worry about disks, I/O parallelism, parallel scans, etc.
–Independent scaling of CPU and I/O
Slide 9: Work Allocation without Partitioning
For each join, we now have to join the off-diagonal rectangles as well.
Minimize response time = max(RT of each work unit) = max over i,j of JoinCost(|Li|, |Oj|)
How do we optimize the work allocation?
–Roughly: cut the join hyper-rectangle into n pieces so as to minimize the maximum perimeter
–Simplification: assume the join is cut into a grid
Choices: the number of cuts on each table, the size of each cut, and the allocation of work units to processors. (The objective is restated below.)
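Stated as an optimization problem (a restatement of the slide for the two-table case, assuming one work unit per processor):

\min_{\{L_i\},\,\{O_j\}} \;\; \max_{1 \le i \le a,\ 1 \le j \le b} \operatorname{JoinCost}\bigl(|L_i|,\,|O_j|\bigr)
\qquad \text{subject to} \qquad \bigcup_{i=1}^{a} L_i = L, \quad \bigcup_{j=1}^{b} O_j = O, \quad a\,b = p .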
Slide 10: Allocation to Homogeneous Processors
Theorem: For a monotonic JoinCost, RT is minimized when each cut (on a table) is of the same size.
So allocation is done into rectangles of size |T1|/p1 x |T2|/p2 x ... x |Tn|/pn.
Theorem: For a symmetric JoinCost, RT is minimized when |T1|/p1 = |T2|/p2 = ... = |Tn|/pn.
E.g., with 10 processors, cut Lineitem into 5 parts and Orders into 2 (see the sketch below).
Note: cutting each table into the same number of partitions (as is usually done) is sub-optimal.
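A sketch of that choice, assuming a stand-in JoinCost that simply sums the two input sizes (roughly the shape of a sort-merge cost; the paper's actual cost model may differ) and TPC-H scale-factor-1 cardinalities:

# Sketch: pick the grid (p1, p2) with p1 * p2 = p that minimizes the per-work-unit
# cost when each table is cut into equal pieces. join_cost is a stand-in cost model.

def join_cost(n1, n2):
    return n1 + n2                               # placeholder for a sort-merge-like cost

def best_grid(sizes, p):
    n1, n2 = sizes
    best = None
    for p1 in range(1, p + 1):
        if p % p1:
            continue
        p2 = p // p1
        cost = join_cost(n1 / p1, n2 / p2)       # equal cuts => every work unit costs the same
        if best is None or cost < best[0]:
            best = (cost, p1, p2)
    return best

# TPC-H scale factor 1: |Lineitem| ~ 6M rows, |Orders| ~ 1.5M rows, 10 processors.
print(best_grid((6_000_000, 1_500_000), 10))     # picks 5 cuts on Lineitem, 2 on Orders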
Slide 11: Allocation to Heterogeneous Co-Processors
Response time of the query: RT = max(RT of each work unit)
Choose the size of each work unit, and the allocation of work units to co-processors, so as to minimize RT.
This is like a bin-packing problem:
–Solve for the number of cuts on each table, assuming homogeneity
–Then solve a linear program to find the optimal size of each cut
–Some approximations are needed to avoid an integer program (see the paper)
(The shape of the program is sketched below.)
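The shape of the resulting program, as I read the slide (the exact linearization of JoinCost that keeps it a linear program is described in the paper): with co-processor speeds s_k, work unit (i, j) assigned to co-processor k(i, j), and cut sizes x_i of L and y_j of O, minimize the makespan T:

\min\; T \quad \text{s.t.} \quad \frac{\operatorname{JoinCost}(x_i,\,y_j)}{s_{k(i,j)}} \le T \;\; \forall i,j, \qquad \sum_i x_i = |L|, \quad \sum_j y_j = |O|, \quad x_i,\, y_j \ge 0 .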
Slide 12: Failure/Stall Resiliency by Work-Unit Reassignment
Without tuple shipping between plans, failure handling is easy.
If co-processors A, B, C have finished by time X, and co-processor D has not finished by time X(1+f):
–Take D's work unit and assign it to the fastest among A, B, C, say A
–When either D or A returns, close the cursor on the other
This generalizes to a work-stealing scheme:
–E.g. with 10 coprocessors, assign each 1/20th of the cross-product space
–When a coprocessor returns with a result, assign it more work
Tradeoff: finer work allocation gives more flexible work stealing, BUT more redundant work.
(A sketch of the reassignment rule follows.)
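A sketch of the reassignment rule, with assumptions of mine: co-processors are simulated by a thread pool, run_work_unit stands in for shipping one SQL fragment, and "close the cursor on the other" becomes simply ignoring the duplicate answer. The slack factor f and the run times are illustrative.

import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_work_unit(node, unit, run_time):
    """Stand-in for executing one SQL fragment for `unit` on co-processor `node`."""
    time.sleep(run_time)
    return node, unit

def run_with_reassignment(assignments, f=0.5):
    """assignments: list of (node, unit, simulated_run_time); returns {unit: answering node}."""
    answers = {}
    with ThreadPoolExecutor(max_workers=2 * len(assignments)) as pool:
        futs = {pool.submit(run_work_unit, n, u, t): (n, u) for n, u, t in assignments}
        start = time.time()
        wait(futs, return_when=FIRST_COMPLETED)        # some co-processor answers at time X
        x = time.time() - start
        done, pending = wait(futs, timeout=x * f)      # stragglers get until about X * (1 + f)
        for fut in done:
            node, unit = fut.result()
            answers.setdefault(unit, node)
        helper = next(iter(done)).result()[0]          # a finished node ("say A" on the slide)
        # Re-run each straggler's work unit on the helper; the first copy to answer wins,
        # and the late copy is simply ignored when it eventually returns.
        retries = [pool.submit(run_work_unit, helper, futs[fut][1], 0.1) for fut in pending]
        for fut in retries + list(pending):
            node, unit = fut.result()
            answers.setdefault(unit, node)
    return answers

# Example: co-processor D stalls, so its work unit is re-run on a fast node.
print(run_with_reassignment([("A", "W1", 0.1), ("B", "W2", 0.1),
                             ("C", "W3", 0.1), ("D", "W4", 5.0)]))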
Slide 13: Analysis: What Do We Lose by Not Partitioning?
Say we join L x O x C (TPC-H) with 12 processors, 12 = p1 * p2 * p3.
RT without partitioning ~ JoinCost(|L|/p1, |O|/p2, |C|/p3)
RT with partitioning ~ JoinCost(|L|/(p1 p2 p3), |O|/(p1 p2 p3), |C|/(p1 p2 p3))
At p1 = 6, p2 = 2, p3 = 1, the loss in CPU speedup is JoinCost(|L|/6, |O|/2, |C|) ~ 2 x JoinCost(|L|/12, |O|/12, |C|/12).
Note: I/O speedup is unaffected.
We can close the gap with partitioning further by sorting the largest tables of the join, e.g. L and O, on their join column.
–Now the loss is JoinCost(|L|/12, |O|/12, |C|) / JoinCost(|L|/12, |O|/12, |C|/12)
–This still avoids exchange, so we can use heterogeneous, non-dedicated nodes, but it causes problems with isolation
Optimization: selective clustering
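One way to see where a factor of about 2 can come from (this reading is mine; the slide does not show the cost model): if JoinCost is dominated by the work on the largest input, Lineitem, then

\frac{\operatorname{JoinCost}(|L|/6,\ |O|/2,\ |C|)}{\operatorname{JoinCost}(|L|/12,\ |O|/12,\ |C|/12)} \;\approx\; \frac{|L|/6}{|L|/12} \;=\; 2 ,

while a model in which the smaller inputs contribute significantly would give a somewhat larger factor; the exact ratio depends on the JoinCost model used in the paper.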
Slide 14: Lightweight Join Processor
Work allocation via query fragments means co-processors can be heterogeneous.
They need not run a full DBMS; a join processor is enough (e.g. a screen saver that does join processing).
We use a trimmed-down version of Apache Derby that can:
–Parse CSV files
–Evaluate predicates, projections, sort-merge joins, aggregates, and group by
(A toy sketch of such a join processor follows.)
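This is not the Derby code used in the talk; it is just a toy Python sketch of the capabilities the slide lists (CSV scan with predicate and projection, a single-key sort-merge join, and a group-by count). File and column names are illustrative.

import csv
from collections import defaultdict

def scan(path, predicate=lambda row: True, project=None):
    """Scan a CSV file, applying a predicate and an optional projection (column list)."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            if predicate(row):
                yield {c: row[c] for c in project} if project else row

def sort_merge_join(left, right, lkey, rkey):
    """Toy sort-merge join on a single equality key."""
    l = sorted(left, key=lambda row: row[lkey])
    r = sorted(right, key=lambda row: row[rkey])
    i = j = 0
    while i < len(l) and j < len(r):
        if l[i][lkey] < r[j][rkey]:
            i += 1
        elif l[i][lkey] > r[j][rkey]:
            j += 1
        else:
            k = j
            while k < len(r) and r[k][rkey] == l[i][lkey]:
                yield {**l[i], **r[k]}
                k += 1
            i += 1

def group_count(rows, col):
    counts = defaultdict(int)
    for row in rows:
        counts[row[col]] += 1
    return dict(counts)

# Example usage over one work unit's CSV extents (file names illustrative):
# joined = sort_merge_join(scan("orders.csv", project=["o_orderkey", "o_orderpriority"]),
#                          scan("lineitem.csv", project=["l_orderkey"]),
#                          "o_orderkey", "l_orderkey")
# print(group_count(joined, "o_orderpriority"))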
Slide 15: Outline
Motivation
DITN design
Experimental results
Summary
Slide 16: Performance Degradation Due to Not Partitioning
(Chart: scale-up for the joins O x L, S x O x L, and S x O x L x C x N x R.)
At 10 nodes on S x O x L x C x N x R, DITN is about 2.1x slower than PBP (work allocation: L/5, O/2, S, C, N, R).
DITN2PART has very little slowdown
–But it needs total clustering
The slowdown oscillates due to the discreteness of work allocation.
Slide 17: Failure/Stall Resiliency by Work-Unit Reassignment
Orders x Lineitem, group by o_orderpriority, on 5 co-processors.
A high load is imposed on one co-processor as soon as the query begins.
At 60% load (50% wait), DITN times out and switches to an alternative node.
(Chart series include DITN2PART and PBP.)
Slide 18: Importance of Asymmetric Allocation
Initially 2 fast nodes; then 4 slow nodes are added.
With symmetric allocation, adding slow nodes can slow down the system.
Contrast between DITN-symmetric and DITN-asymmetric.
Slide 19: Danger of Tying a Partition to a CPU
Repeated execution of the O x L join.
A 75% CPU load is imposed on one of the 5 co-processors during the 3rd iteration.
PBP continues to use this slow node throughout.
DITN switches to another node after two iterations.
Slide 20: Related Work
Parallel query processing: Gamma, XPRS, many commercial systems
–Mostly shared-nothing
–Shared-disk: IBM Sysplex, where queries are done via tuple shipping between co-processors
–Oracle: shared disk, but hash joins are done via partitioning (static/dynamic)
Mariposa: similar query-fragment-level work allocation
Load balancing: Exchange, Flux, River, skew avoidance in hash joins
Fault-tolerant exchange (Flux)
Polar*, OGSA-DQP
Distributed eddies
Query execution on P2P systems
Slide 21: Summary and Future Work
Partitioning-based parallelism does not handle non-dedicated nodes.
Proposal: avoid partitioning
–Share data via the storage system
–Intra-fragment parallelism instead of exchange
–Careful work allocation to optimize response time
Promising initial results: only about a 2x slowdown with 10 nodes.
Open questions:
–Index scans: we want shared reads without latching
–Isolation: DITN uses uncommitted read; DITN2PART is read-only
–Scaling to large numbers of nodes
–Multi-query optimization to reuse shared temp tables
Slide 22: Backup Slides