Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen Oct. 15. Seminar Data-Intensive Scalable Computing Laboratory (DISCL) Locality-driven High-level.

1 Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Data-Intensive Scalable Computing Laboratory (DISCL)
U-REaSON Seminar, Oct. 15, 2013

2 Introduction  Scientific simulations nowadays generate a few terabytes (TB) of data in a single run and the data sizes are expected to reach petabytes (PB) in the near future.  VPIC, Vector Particle in Cell, Plasma physics, 26 bytes per particle, 30TB  Accessing and analyzing the data reveals poor I/O performance due to the logical-physical mismatching.

3 Introduction  Scientific Datasets and Scientific I/O Libraries  PnetCDF, HDF5, ADIOS PnetCDF MPI-IO Parallel File Systems  Scientific I/O libraries allow users to specify array-based logical input  Logical-physical mismatching

4 Motivation
I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
Method             Process collaboration   Call collaboration
Independent I/O    No                      No
Collective I/O     Yes                     No
Nonblocking I/O    Yes                     Yes

5 Motivation
Contention on storage servers without awareness of locality
Figure: calls 0, 1, ..., i are each executed with two-phase collective I/O; call i uses aggregators ag i0 to ag i3 (ag 00 to ag 03 for call 0, ag 10 to ag 13 for call 1).
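The contention in this figure can be modeled with a small sketch. It assumes, hypothetically, that each collective call's aggregate byte range is split evenly into contiguous file domains, one per aggregator (a simplification of two-phase I/O; real MPI-IO implementations may align domains to stripe boundaries), and that the file is striped round-robin over Lustre OSTs. `file_domains`, `osts_of`, and `shared_osts` are illustrative names.

```python
def file_domains(lo, hi, naggr):
    """Split the byte range [lo, hi) into naggr contiguous, near-equal domains."""
    bounds = [lo + (hi - lo) * k // naggr for k in range(naggr + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(naggr)]


def osts_of(lo, hi, stripe_size, stripe_count):
    """OST indices holding bytes [lo, hi) under round-robin striping."""
    if hi <= lo:
        return set()
    first, last = lo // stripe_size, (hi - 1) // stripe_size
    return {s % stripe_count for s in range(first, last + 1)}


def shared_osts(calls, naggr, stripe_size, stripe_count):
    """Map each OST to the (call, aggregator) pairs whose domains touch it."""
    users = {}
    for c, (lo, hi) in enumerate(calls):
        for a, (dlo, dhi) in enumerate(file_domains(lo, hi, naggr)):
            for ost in sorted(osts_of(dlo, dhi, stripe_size, stripe_count)):
                users.setdefault(ost, []).append((c, a))
    return users


MiB = 1 << 20
# Two calls whose byte ranges overlap in [2 MiB, 4 MiB); 4 aggregators each.
calls = [(0, 4 * MiB), (2 * MiB, 6 * MiB)]
for ost, who in sorted(shared_osts(calls, 4, MiB, 4).items()):
    print(f"OST {ost}: (call, aggregator) = {who}")
```

With 1 MiB stripes over 4 OSTs, every OST is hit by one aggregator from each call, and the overlapping 2 MiB is read twice, once per call.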

6 Performance with Overlapping Calls
Conclusion: overlap should be removed.

7 Idea: High-level I/O Aggregation
Figure: logical input decomposition guided by the physical layout. Call 0 requests start {0,0,0}, length {100,200,200} and is decomposed into start {0,0,0}, length {100,200,100} and start {0,0,100}, length {100,200,100}. Call 1 requests start {10,20,100}, length {10,300,400} and is decomposed into start {10,20,100}, length {10,150,400} and start {10,170,100}, length {10,150,400}. The four pieces are labeled sub 0 to sub 3.
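The overlap between the two calls in this figure can be computed dimension by dimension: two hyperslabs overlap only if their intervals overlap in every dimension. A minimal sketch using the figure's start/length convention; `intersect` is a hypothetical helper.

```python
def intersect(start_a, len_a, start_b, len_b):
    """Overlap of two hyperslabs given as per-dimension start/length,
    returned as (start, length), or None if they are disjoint."""
    start, length = [], []
    for sa, la, sb, lb in zip(start_a, len_a, start_b, len_b):
        lo, hi = max(sa, sb), min(sa + la, sb + lb)
        if hi <= lo:  # disjoint in this dimension, so disjoint overall
            return None
        start.append(lo)
        length.append(hi - lo)
    return tuple(start), tuple(length)


# Call 0 and Call 1 from the figure.
print(intersect((0, 0, 0), (100, 200, 200), (10, 20, 100), (10, 300, 400)))
# ((10, 20, 100), (10, 180, 100))
```

The shared region holds 10 x 180 x 100 = 180,000 elements, which would be fetched twice if both calls were issued as-is.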

8 Idea: High level I/O Aggregation Basic Idea  Figure out the overlapping among requests  Eliminate the overlapping before doing I/O Challenges  How to decompose the requests  How to aggregate the sub-arrays at a high level

9 HiLa: High-level I/O Aggregation
A way to figure out the physical layout:
- Sub-correlation function
- Sub-correlation set
- Lustre striping: stripe size t, stripe count l
- Dataset: dimension d, subset size m
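The slides do not spell out the sub-correlation function. One plausible reading, stated here as an assumption, is that it maps a subset of the dataset to the storage targets (OSTs) holding its bytes. Under Lustre's round-robin striping with stripe size t and stripe count l, and a d-dimensional row-major array with extents N_0, ..., N_{d-1}, element size s, and starting byte offset b (s, b, N_k, and C(S) are symbols introduced for this sketch), the byte offset of an element, the OST holding a byte, and a hypothetical sub-correlation set C(S) would be:

```latex
o(i_0,\dots,i_{d-1}) = b + s \sum_{k=0}^{d-1} i_k \prod_{j=k+1}^{d-1} N_j,
\qquad
\mathrm{ost}(o) = \left\lfloor \frac{o}{t} \right\rfloor \bmod l,
\qquad
C(S) = \{\, \mathrm{ost}(o) : o \in \mathrm{bytes}(S) \,\}
```

Here ost(o) indexes OSTs in the file's own stripe order, and bytes(S) is every byte of every element of subset S.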

10 HiLa Algorithm: Prior Step
Prior step: calculate the sub-correlation set (a one-time analysis)
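Because the layout depends only on the dataset's shape and the striping parameters, such sets can be computed once and reused for every request. A minimal sketch, assuming (hypothetically) that subsets are slabs of `rows_per_subset` rows along the first dimension, so each slab is one contiguous byte range in a row-major file, and that a subset's set is the OSTs it touches; the slab split, `rows_per_subset`, and `sub_correlation_sets` are illustrative choices, not the slides' definitions.

```python
from math import prod


def sub_correlation_sets(shape, elem_size, rows_per_subset,
                         stripe_size, stripe_count, header=0):
    """One-time table: subset index -> set of OST indices it touches.

    Hypothetical model: subsets are slabs of rows_per_subset rows along
    dimension 0 of a row-major array that starts at byte `header`,
    so each slab occupies one contiguous byte range.
    """
    row_bytes = prod(shape[1:]) * elem_size
    table = {}
    for j, r0 in enumerate(range(0, shape[0], rows_per_subset)):
        r1 = min(r0 + rows_per_subset, shape[0])
        lo, hi = header + r0 * row_bytes, header + r1 * row_bytes
        # Round-robin striping: stripe k of the file lives on OST k mod l.
        first, last = lo // stripe_size, (hi - 1) // stripe_size
        table[j] = {s % stripe_count for s in range(first, last + 1)}
    return table


MiB = 1 << 20
# 1024 x 256 x 256 values of 4 bytes (256 MiB), 8 slabs of 32 MiB, 4 OSTs.
print(sub_correlation_sets((1024, 256, 256), 4, 128, 32 * MiB, 4))
print(sub_correlation_sets((1024, 256, 256), 4, 128, 16 * MiB, 4))
```

With 32 MiB stripes each 32 MiB slab lands on exactly one OST; halving the stripe size spreads every slab over two OSTs.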

11 HiLa Algorithm: Decomposition
Main steps: request decomposition and aggregation
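The two main steps can be sketched together: decompose every request at subset boundaries, then gather the pieces from all calls per subset and drop any piece that another piece in the same subset already covers. This is a hypothetical simplification (subsets are slabs along dimension 0, and only fully contained duplicates are removed; partial overlaps would need box subtraction); `decompose`, `contains`, `aggregate`, and `rows_per_subset` are illustrative names.

```python
from collections import defaultdict


def decompose(start, length, rows_per_subset):
    """Split a hyperslab at subset boundaries along dimension 0,
    yielding (subset_index, start, length) pieces."""
    lo, hi = start[0], start[0] + length[0]
    while lo < hi:
        j = lo // rows_per_subset
        cut = min(hi, (j + 1) * rows_per_subset)
        yield j, (lo,) + tuple(start[1:]), (cut - lo,) + tuple(length[1:])
        lo = cut


def contains(outer, inner):
    """True if hyperslab `inner` lies entirely inside hyperslab `outer`."""
    (s_out, n_out), (s_in, n_in) = outer, inner
    return all(a <= b and b + lb <= a + la
               for a, la, b, lb in zip(s_out, n_out, s_in, n_in))


def aggregate(requests, rows_per_subset):
    """Decompose all calls' requests, group the pieces by subset, and drop
    pieces already covered by another piece of the same subset."""
    groups = defaultdict(list)
    for start, length in requests:
        for j, s, n in decompose(start, length, rows_per_subset):
            groups[j].append((s, n))
    batches = {}
    for j in sorted(groups):
        kept = []
        for piece in groups[j]:
            if any(contains(k, piece) for k in kept):
                continue  # exact or contained duplicate: skip it
            kept = [k for k in kept if not contains(piece, k)]
            kept.append(piece)
        batches[j] = kept
    return batches


# Call 0 reads rows 0-99; call 1 reads rows 60-79 (same columns).
print(aggregate([((0, 0), (100, 64)), ((60, 0), (20, 64))], rows_per_subset=50))
```

Here the second call's rows 60 to 79 fall entirely inside the first call's piece of subset 1, so that subset is read only once.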

12 Improvement with HiLa
Performance improved with HiLa

13 Improvement with HiLa
FASM improved with HiLa

14 Conclusion and Future Work Conclusion  The mismatching between logical access and physical layout can lead to poor performance.  We propose the locality-driven high-level aggregation approach (HiLa) to facilitate the existing I/O methods by eliminating the overlapping among sub-array requests. Future Work  Apply to write operations  Integrate with file systems.

15 Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks! Q&A
http://discl.cs.ttu.edu
