Performance-Robust Parallel I/O

1 Virtual Streams: Performance-Robust Parallel I/O
Z. Morley Mao, Noah Treuhaft
CS258, 5/17/99, Professor Culler

2 Introduction
- Clusters exhibit performance heterogeneity: static and dynamic, due to both hardware and software
- Consistent peak performance demands adaptive software
- Building performance-robust parallel software means keeping heterogeneity in mind
- This work explores:
  - what adaptivity is appropriate for I/O-bound parallel programs
  - how to provide that adaptivity

3 Heterogeneity demands adaptivity
[Figure: cluster nodes, each with a process and a disk]
- Physical I/O streams are simple to build and use
- But their performance is highly variable: different drive models, bad blocks, multizone behavior, file layout, competing programs, host bottlenecks
- I/O-bound parallel programs run at the rate of the slowest disk
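A minimal worked example of why the slowest disk sets the pace. It assumes (hypothetically) that each process statically reads an equal share of the data from its own disk, and compares that with the ideal case in which the same data is spread across all disks in proportion to their rates, which is what equalizing throughput aims for. The disk rates are made-up numbers, not measurements from this work.

```python
def static_time(total_mb, rates_mb_s):
    """Completion time when each process reads an equal share from its own disk."""
    share = total_mb / len(rates_mb_s)
    # The parallel program finishes only when the slowest disk finishes its share.
    return max(share / r for r in rates_mb_s)


def balanced_time(total_mb, rates_mb_s):
    """Ideal completion time if data is spread in proportion to each disk's rate."""
    return total_mb / sum(rates_mb_s)


# Hypothetical rates: one of four disks runs at half speed.
rates = [10.0, 10.0, 10.0, 5.0]
print(static_time(400, rates))    # 100 MB / 5 MB/s = 20.0 s
print(balanced_time(400, rates))  # 400 MB / 35 MB/s, about 11.43 s
```

One half-speed disk nearly doubles the static completion time, even though total disk bandwidth drops by only an eighth.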

4 Virtual Streams
- Performance-robust programs want virtual streams that:
  - eliminate dependence on individual disk behavior
  - continually equalize the throughput delivered to processes
[Figure: Process / Virtual Streams Layer / Disk]

5 Graduated Declustering (GD): a Virtual Streams implementation
- Data is replicated (mirrored) for availability
- Use the replicas to provide performance availability, too
- A fast network makes remote disk access comparable to local access
- Distributed algorithm for adaptivity:
  - the client provides information about its progress
  - the server reacts by scheduling requests to even out progress
[Figure: client A and client B (processes using the GD client library) reading from GD servers A and B]
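The slide describes the original GD loop only at a high level. A minimal sketch of the server half, assuming each request carries the client's self-reported progress (fraction of its data consumed) and the server always serves the pending client that is furthest behind. The class name and fields are hypothetical, not taken from the GD implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressGDServer:
    """Hypothetical progress-based GD server: serves the least-advanced client first."""
    # pending[client] = block ids this client has asked this server for
    pending: dict = field(default_factory=dict)
    # progress[client] = latest progress value the client reported
    progress: dict = field(default_factory=dict)

    def request(self, client, block, reported_progress):
        # Each request piggybacks the client's progress (explicit control information).
        self.pending.setdefault(client, []).append(block)
        self.progress[client] = reported_progress

    def next_block(self):
        # Serve the waiting client that is furthest behind, to even out progress.
        waiting = [c for c, blocks in self.pending.items() if blocks]
        if not waiting:
            return None
        client = min(waiting, key=lambda c: self.progress[c])
        return client, self.pending[client].pop(0)
```

Because every request carries progress information, this style needs explicit control traffic between client and server.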

6 GD in action
- Local decisions yield global behavior
[Figure: data flows from Server0, Server1, Server2, Server3 to Client0, Client1, Client2, Client3, before and after a perturbation; bandwidth labels B, B/2, B/4, 3B/8, 5B/8, 7B/8]
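The bandwidth labels in this figure can be checked with a small worked example. We assume (our reading of the figure, not stated on the slide) that Client i reads mirrored data from Server i and Server i+1 (mod 4), that Server1 drops to B/2 after the perturbation, and that it splits what remains evenly between its two clients. Requiring every client to receive the same total leaves one free parameter (`x0`), and the solution yields flows of B/4, 3B/8, B/2 and 5B/8 with each client receiving 7B/8. Bandwidths are in units of B; `ring_flows` is a hypothetical helper.

```python
from fractions import Fraction


def ring_flows(server_bw, x0):
    """Per-flow bandwidths when client i reads from servers i and (i + 1) % n.

    x[i]: bandwidth server i delivers to client i
    y[i]: bandwidth server (i + 1) % n delivers to client i
    Every client receives the same total, target = sum(server_bw) / n.
    x0 is the single free parameter of this cyclic system.
    """
    bw = [Fraction(b) for b in server_bw]
    n = len(bw)
    target = sum(bw) / n
    x = [Fraction(x0)]
    for i in range(1, n):
        # Server i's bandwidth is shared by client i and client i-1; client i-1
        # still needs target - x[i-1] from server i.
        x.append(bw[i] - (target - x[i - 1]))
    y = [target - xi for xi in x]
    return x, y, target


# Assumed perturbation: Server1 slows to B/2 and splits it evenly (B/4 each).
bw = [1, Fraction(1, 2), 1, 1]
x0 = Fraction(7, 8) - Fraction(1, 4)   # Client0 needs 7B/8 and gets B/4 from Server1
x, y, target = ring_flows(bw, x0)
```

With all four servers at full speed and `x0 = 1/2`, every flow is B/2 and every client receives B, matching the "before" state.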

7 Evaluation of original GD implementation: progress-based
[Figure: evaluation results, annotated "Seek overhead due to reading from all replicas"]

8 Deficiency of original GD implementation: seek overhead
- Under the assumption of sequential data access:
  - seeks occur even when there is no perturbation
  - seeks are becoming more significant as disk transfer rates increase
- Need a new algorithm that:
  - reads mostly from a single disk under no perturbation
  - dynamically adjusts to perturbation when necessary
  - achieves both performance adaptivity and minimal overhead
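Why seeks matter more as transfer rates grow follows from simple arithmetic: if every chunk read from a replica is preceded by a seek, delivered bandwidth is chunk size / (seek time + chunk size / transfer rate). The 0.25 MB chunk and 10 ms seek time below are illustrative assumptions, not numbers from this work.

```python
def effective_bw(chunk_mb, seek_s, rate_mb_s):
    """Delivered bandwidth (MB/s) when every chunk read is preceded by one seek."""
    transfer_s = chunk_mb / rate_mb_s
    return chunk_mb / (seek_s + transfer_s)


# Illustrative parameters only: 0.25 MB chunks, 10 ms per seek.
for rate in (10.0, 40.0):
    bw = effective_bw(chunk_mb=0.25, seek_s=0.010, rate_mb_s=rate)
    print(f"{rate:4.0f} MB/s disk -> {bw:5.2f} MB/s delivered ({bw / rate:.0%} of peak)")
```

The seek time stays fixed while the transfer time shrinks, so the 10 MB/s disk keeps about 71% of its peak but the 40 MB/s disk keeps only about 38%.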

9 Proposed solution: response-rate-based GD
- The number of requests a client sends to a server is based on that server's response rate
- Servers use request queue lengths to make scheduling decisions
- Uses only implicit information (“historyless”): no bandwidth information is transmitted between server and client
- Advantage: each client has a primary server
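A minimal sketch of the two local rules on this slide, under our own assumptions: a server keeps one queue per client and serves the client with the longest queue, and a client starts with several requests at its primary server and one at a secondary replica, then sends each new request to whichever server just replied. The class names, the `primary_window` parameter, and the 3:1 split are hypothetical. No bandwidth figures are exchanged; the only signals are queue lengths and reply timing.

```python
from collections import deque


class ResponseRateServer:
    """Hypothetical response-rate GD server: schedules on queue lengths only."""

    def __init__(self):
        self.queues = {}  # client id -> deque of pending block ids

    def enqueue(self, client, block):
        self.queues.setdefault(client, deque()).append(block)

    def next_request(self):
        # Serve the client with the longest queue (ties go to the earliest-seen client).
        busy = {c: q for c, q in self.queues.items() if q}
        if not busy:
            return None
        client = max(busy, key=lambda c: len(busy[c]))
        return client, busy[client].popleft()


class ResponseRateClient:
    """Hypothetical self-clocked client: each reply triggers the next request to that server."""

    def __init__(self, cid, blocks, primary, secondary, primary_window=3):
        self.cid, self.blocks = cid, deque(blocks)
        # Most initial requests go to the primary server; one goes to the secondary replica.
        for _ in range(primary_window):
            self.send(primary)
        self.send(secondary)

    def send(self, server):
        if self.blocks:
            server.enqueue(self.cid, self.blocks.popleft())

    def on_response(self, server):
        # A server that replies sooner is asked again sooner, so the number of
        # requests it receives follows its response rate.
        self.send(server)
```

This shows only the local rules; whether they converge to equal bandwidth depends on details the slides do not give.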

10 Evaluation of response-rate-based GD
[Figure: graph of bandwidth vs. number of disk nodes perturbed, annotated "Reduced seek overhead"]

11 Historyless vs. History-based adaptiveness
- History-based (progress-based):
  - adjustment to perturbation occurs gradually over time
  - close to perfect knowledge, if the information is not outdated
  - extra overhead in sending control information
- Historyless (response-rate-based):
  - primary server designation
  - possible to increase sensitivity to real perturbation by creating “artificial” perturbation
  - considers the varying performance of data consumers
  - takes longer to converge

12 Stability and Convergence
- How long does it take for the system to converge?
  - linear in the number of nodes
  - depends on when the last perturbation occurred
  - influenced by the style of communication (implicit vs. explicit)

13 Server request handoff
- If a server finishes all its requests, it contacts other servers holding the same replicas to help serve their clients (work stealing)
- Server request handoff keeps all disks busy when possible
- Design decision: how many requests to hand off?
  - depends on the bandwidth history of both servers and on the size of the request queue
  - benefit vs. cost tradeoff
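A minimal sketch of one possible handoff policy, under our own assumptions: the idle server takes requests from the tail of a busy peer's queue (the requests that peer would serve last), and the number moved is proportional to the idle server's share of the two servers' recent bandwidth. Both the proportional rule and the function names are hypothetical; the slides leave the policy open.

```python
from collections import deque


def handoff_count(queue_len, idle_bw, busy_bw):
    """Hypothetical rule: hand off a share of the queue proportional to the
    idle server's share of the two servers' recent bandwidth."""
    if queue_len == 0 or idle_bw + busy_bw == 0:
        return 0
    return int(queue_len * idle_bw / (idle_bw + busy_bw))


def steal(busy_queue, idle_queue, idle_bw, busy_bw):
    """Move requests from the tail of a busy server's queue to an idle server
    holding the same replica. Returns the number of requests moved."""
    n = handoff_count(len(busy_queue), idle_bw, busy_bw)
    moved = [busy_queue.pop() for _ in range(n)]  # tail first
    idle_queue.extend(reversed(moved))            # restore original block order
    return n
```

Taking from the tail leaves the busy server's next few requests untouched; a real policy might also skip handoff for short queues, where moving requests costs more than it saves.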

14 Writes
- Identical to reads, except:
  - create incomplete replicas with “holes”
  - track “holes” in metadata
  - afterward, do “hole-filling”, both for availability and for performance robustness
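A minimal sketch of hole tracking, assuming each write lands on only some replicas so each replica may be missing some blocks. The sketch records the missing blocks as a per-replica set (the "metadata") and later fills them by copying from a peer replica. `Replica` and `fill_holes` are hypothetical names.

```python
class Replica:
    """Hypothetical replica that records which blocks have not been written yet."""

    def __init__(self, num_blocks):
        self.data = [None] * num_blocks
        self.holes = set(range(num_blocks))  # metadata: blocks missing here

    def write(self, block, value):
        self.data[block] = value
        self.holes.discard(block)


def fill_holes(target, source):
    """Copy every block that is a hole in target but present in source.
    Returns the block numbers that were filled, in order."""
    filled = sorted(target.holes - source.holes)
    for block in filled:
        target.write(block, source.data[block])
    return filled
```

Running `fill_holes` in both directions brings two replicas to the same state; any block still in both hole sets was never written at all.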

15 Conclusions
What did we achieve?
- A new load-balancing algorithm: response-rate-based
- Deliver equal bandwidth to parallel-program processes in the face of performance heterogeneity
- Demonstrated the stability of the system
- Reduced seek overhead
- Server request handoff
- Writes
- Created a useful abstraction for streaming I/O in clusters

16 Future Work
- Hot file replication
- Get peak bandwidth after perturbation ceases
- Achieve orderly replies
- Multiple-disk abstraction