Embedded System Lab. 최길모 Kilmo Choi. Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines.


1 Embedded System Lab. 최길모 Kilmo Choi, rlfah926@naver.com
Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines
Devesh Tiwari, Sudharshan S. Vazhkudai, Youngjae Kim, Xiaosong Ma, Simona Boboila, and Peter J. Desnoyers

2 Contents
- Background
- Problems and Challenges
- Active Flash Approach for In-situ
- Active Computation Feasibility
- Evaluation
- ActiveFlash Prototype based on OpenSSD Platform
- Conclusion

3 Background

4 Background
- Scientific discovery is a two-step process: (1) scientific simulation, (2) data analysis and visualization

5 Background
- Large-scale leadership computing applications produce big data
  - GTC produces ~30 TB of output data per hour at scale
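To put the ~30 TB per hour figure in storage terms, it helps to convert it to an average rate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify, and the function name is hypothetical. Real simulation output is bursty, so peak rates would be higher than this average.

```python
TB = 10**12  # assumption: decimal terabytes; the slide does not say TB vs TiB
GB = 10**9


def sustained_rate_gb_per_s(tb_per_hour: float) -> float:
    """Convert an hourly output volume in TB to an average rate in GB/s."""
    bytes_per_second = tb_per_hour * TB / 3600
    return bytes_per_second / GB


rate = sustained_rate_gb_per_s(30)
print(f"{rate:.2f} GB/s")  # 8.33 GB/s
```

Averaged over the hour, the storage system would have to absorb a bit over 8 GB every second, before any analysis reads the data back.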

6 Problems and Challenges
- The offline approach suffers from both performance and energy inefficiencies
  - Redundant I/O (simulations write, analyses read)
  - Excessive data movement
  - Extra energy cost
- Energy efficiency will become the primary metric for system design: compute power is expected to increase 1000x in the next decade with only a 10x increase in the power envelope, which implies roughly a 100x gain in performance per watt
- Using simulation nodes for data analysis is not acceptable

7 Active Flash Approach for In-situ
- SSDs are now being adopted in supercomputers (e.g. Tsubame, Gordon)
  - Higher I/O throughput and storage capability
- SSD controllers are becoming increasingly powerful
  - Multi-core low-power processors
- Idle cycles at SSD controllers
- In-situ analysis
  - Analysis on in-transit output data, before it is written to the parallel file system (PFS)
  - Eliminates redundant I/O, but uses expensive compute nodes

8 Active Flash Approach for In-situ
- Active Flash
  - In-situ analysis on SSDs
  - Exploits idle cycles of the SSD controller for computation
  - Reduces transfer costs
  - High performance and energy savings

9 Active Flash Approach for In-situ
- Three approaches to data analysis
  - Offline
  - Active Flash
  - Analysis node

10 Active Computation Feasibility
- Modeling SSD deployment
  - Multiple constraints
- Capacity
  - Enough SSDs to sustain the output burst
- Performance
  - High I/O bandwidth to SSD space
  - Fast restart from application checkpoints
- Write durability
  - SSD write endurance limits

11 Active Computation Feasibility
- Staging ratio
  - How many simulation nodes share one common SSD?
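One way to see how the capacity, performance, and write-durability constraints bound the staging ratio is a small sizing sketch. This is not the model from the talk: the class names, fields, formulas, and every number below are illustrative assumptions, and the binding constraint would differ with other inputs.

```python
import math
from dataclasses import dataclass


@dataclass
class SSD:
    capacity_gb: float    # usable capacity per device, GB (hypothetical)
    write_bw_gbps: float  # sustained write bandwidth, GB/s (hypothetical)
    endurance_tb: float   # total data writable before wear-out, TB (hypothetical)


@dataclass
class Workload:
    nodes: int                # number of simulation nodes
    burst_gb_per_node: float  # data each node emits per output burst, GB
    drain_window_s: float     # time allowed to absorb one burst, s
    bursts_per_day: float
    lifetime_days: float      # required service life of the SSDs


def max_staging_ratio(ssd: SSD, wl: Workload) -> int:
    """Largest number of nodes that can share one SSD under all three limits."""
    # Capacity: one burst from every sharing node must fit on the device.
    by_capacity = ssd.capacity_gb / wl.burst_gb_per_node
    # Performance: the burst must be written within the drain window.
    by_bandwidth = ssd.write_bw_gbps * wl.drain_window_s / wl.burst_gb_per_node
    # Write durability: lifetime writes must stay within the endurance budget.
    lifetime_gb_per_node = wl.burst_gb_per_node * wl.bursts_per_day * wl.lifetime_days
    by_endurance = ssd.endurance_tb * 1000 / lifetime_gb_per_node
    return math.floor(min(by_capacity, by_bandwidth, by_endurance))


def ssds_needed(wl: Workload, staging_ratio: int) -> int:
    """Number of SSDs required when staging_ratio nodes share each device."""
    if staging_ratio < 1:
        raise ValueError("one SSD cannot serve even a single node")
    return math.ceil(wl.nodes / staging_ratio)


ssd = SSD(capacity_gb=256, write_bw_gbps=0.4, endurance_tb=1000)
wl = Workload(nodes=18000, burst_gb_per_node=2, drain_window_s=60,
              bursts_per_day=24, lifetime_days=1825)
ratio = max_staging_ratio(ssd, wl)
print(ratio, ssds_needed(wl, ratio))  # 11 1637
```

With these made-up inputs, write endurance is the tightest limit (about 11 nodes per SSD), ahead of bandwidth (12) and capacity (128).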

12 Active Computation Feasibility
- Modeling active computation feasibility
  - Less compute-intensive kernels are better suited for active computation (e.g. regex matching)
  - Depends on multiple factors: simulation data production rate, staging ratio, I/O bandwidth, etc.
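The dependence on data production rate, staging ratio, and I/O bandwidth can be illustrated with a simple time-budget check: a kernel fits on the SSD controller if ingesting and analyzing one burst finishes before the next burst arrives. This is a simplified sketch, not the feasibility model from the talk; the function name, the serial ingest-then-analyze assumption, and all throughput numbers are hypothetical.

```python
def active_flash_feasible(
    staging_ratio: int,
    burst_gb_per_node: float,
    interval_s: float,
    ssd_write_bw_gbps: float,
    kernel_throughput_gbps: float,
) -> bool:
    """Return True if one SSD can ingest and analyze a burst within the interval.

    Assumes (hypothetically) that the controller analyzes only after the
    burst has been written, so the two times simply add up.
    """
    data_gb = staging_ratio * burst_gb_per_node       # data landing on one SSD
    ingest_s = data_gb / ssd_write_bw_gbps            # time to absorb the burst
    analysis_s = data_gb / kernel_throughput_gbps     # time to run the kernel
    return ingest_s + analysis_s <= interval_s


# A light kernel (illustrative 0.05 GB/s on the controller) fits in an hour:
print(active_flash_feasible(10, 2, 3600, 0.4, 0.05))   # True
# A kernel ten times more compute-intensive does not:
print(active_flash_feasible(10, 2, 3600, 0.4, 0.005))  # False
```

In this toy model, raising the staging ratio or the data production rate shrinks the slack, while a faster kernel or a longer output interval widens it.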

13 Evaluation
- Cray XT5 Jaguar supercomputer
- Samsung PM830 SSD
- Intel Core i7 processors

14 Evaluation
- Feasibility of the analysis node approach
  - Most data analysis kernels can be placed on SSD controllers without degrading simulation performance
  - Additional SSDs are not required to support in-situ data analysis on SSDs
  - The analysis node approach is feasible at higher staging ratios, but at additional infrastructure cost

15 Evaluation
- Energy and cost saving analysis
  - Staging ratio = 10
  - Active Flash and offline approach: y1; analysis node: y2
  - The offline model consumes more energy due to the I/O wait time

16 Conclusion
- Extant approaches to scientific data analysis (e.g. offline and analysis nodes) are stymied by several inefficiencies in data movement and energy consumption that result in sub-optimal performance
- Active Flash is better than either approach on all of the aforementioned metrics

