High Performance I/O and Data Management System Group Seminar Xiaosong Ma Department of Computer Science North Carolina State University September 12,


1 High Performance I/O and Data Management
System Group Seminar
Xiaosong Ma
Department of Computer Science
North Carolina State University
September 12, 2003

2 Roadmap
Introduction
Research area description
Past research
Future research directions

3 About Myself
Xiaosong Ma
–Pronunciation: Shiao-song
–Homepage through the faculty directory
Brief bio
–B.S., Peking University, China
–Ph.D., UIUC
Hobbies
–Traveling
–Food
–Photography, movies, tennis …

4 High-Performance Computing
Enabled by increasing computational power
–Scientific computation
–Parallel data mining
–Web data processing
High-performance computing in daily life
–Weather forecast
–Web crawling and web search
–Games, movie graphics, virtual reality

5 Past Research
I/O performance optimization for parallel applications
–High-level buffering and prefetching techniques
–Hiding the I/O cost
–Utilizing idle resources to maximize inter-task parallelism
–Lightweight database support for visualization applications
–Making optimizations portable and adaptive

6 Parallel I/O in Scientific Simulations
Write-intensive
Collective and periodic
Bottleneck-prone
“Poor stepchild”
Traditional collective I/O focused on data transfer
[Figure: timeline alternating computation and I/O phases]

7 Active Buffering
Hides periodic I/O costs behind computation phases [IPDPS ’02, ICS ’02, IPDPS ’03]
Organizes idle memory resources into buffer hierarchy
Controlled by state machines
–Flexible regarding buffer space availability
–Adapts to applications’ output pattern
–Flexible software architecture
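The idea of hiding output cost behind computation can be illustrated with a minimal Python sketch. This is not the Panda or ROMIO implementation: the class name `ActiveBuffer`, the `sink` callable, and the fixed `capacity` are assumptions made for illustration, and the slide's state machines and buffer hierarchy are reduced here to one bounded queue drained by one background thread. When the buffer is full, the sketch falls back to a blocking write, which is one plausible reading of "flexible regarding buffer space availability".

```python
import queue
import threading


class ActiveBuffer:
    """Toy output buffer (hypothetical): a background thread drains chunks to a sink."""

    def __init__(self, sink, capacity=4):
        self._sink = sink                           # callable that performs the real write
        self._queue = queue.Queue(maxsize=capacity)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, chunk):
        """Buffer the chunk if space is free; otherwise write it synchronously."""
        try:
            self._queue.put_nowait(chunk)
            return "buffered"
        except queue.Full:
            # Overflow: pay the I/O cost now. Note that this simple fallback
            # does not preserve ordering relative to chunks still buffered.
            self._sink(chunk)
            return "direct"

    def _drain(self):
        # Runs in the background while the caller continues computing.
        while True:
            chunk = self._queue.get()
            if chunk is None:                       # sentinel from close()
                break
            self._sink(chunk)

    def close(self):
        """Flush everything still buffered and stop the background thread."""
        self._queue.put(None)
        self._worker.join()
```

A caller would invoke `write()` at the end of each computation phase and `close()` once at the end of the run; the compute loop only blocks when the buffer has no room left.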

8 AB vs. Asynchronous I/O

9 Deployment of Active Buffering
Panda Parallel I/O Library
–University of Illinois
–Client-server architecture
ROMIO Parallel I/O Library
–Argonne National Lab
–Popular MPI-IO implementation, included in MPICH
–Server-less architecture
–ABT (Active Buffering with Threads)

10 Sample Execution with ABT
[Figure: timeline showing data reorganization and buffering, computation phases 1 to 4, and I/O phases 1 to 3]

11 I/O in Visualization
Periodic reads
Dual modes of operation
–Interactive
–Batch-mode
Harder to overlap I/O with computation
[Figure: timeline alternating computation and I/O phases]

12 Lightweight Data Management
Processes a large number of datasets
–Scientific data are structured
–Conventional DBMS rarely used in parallel scientific codes
GODIVA framework [ICDE ’04]
–General Object Data Interface for Visualization Applications
–In-memory database managing data buffer locations
–Relational database-like interfaces
–Developer-controllable prefetching and caching
–Developer-supplied read functions

13 GODIVA Architecture

14 Sample Record Instance
Sample query
–Where is the temperature array holding block_0003 at time-step 0.000075 in a fluid record?
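A query of this kind amounts to a keyed lookup that returns the location of an in-memory buffer rather than a copy of the data. The Python sketch below shows one way such an index could look. It is not the GODIVA interface: `RecordIndex`, `insert`, `locate`, and the key names `block` and `time_step` are hypothetical, and the temperature values are placeholders.

```python
from array import array


class RecordIndex:
    """Toy in-memory index (hypothetical) mapping record keys to field buffers."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def _key(record_type, keys):
        # Sort the key fields so lookups do not depend on argument order.
        return (record_type, tuple(sorted(keys.items())))

    def insert(self, record_type, keys, fields):
        """Register the buffers of one record under its key fields."""
        self._records[self._key(record_type, keys)] = fields

    def locate(self, record_type, field, **keys):
        """Return the buffer object itself, not a copy of its contents."""
        return self._records[self._key(record_type, keys)][field]


# Usage: register one fluid record, then ask where its temperature array lives.
index = RecordIndex()
temperature = array("d", [300.0, 301.5])            # placeholder values
index.insert("fluid",
             {"block": "block_0003", "time_step": 0.000075},
             {"temperature": temperature})
found = index.locate("fluid", "temperature",
                     block="block_0003", time_step=0.000075)
```

Returning the buffer object itself mirrors the slide's description of a database that manages data buffer locations, so the visualization code can read the array in place.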

15 Prefetching and Caching
process unit
–readUnit
–addUnit and waitUnit
–finishUnit and deleteUnit

    // add all units.
    addUnit("fluid_file1", read_file);
    addUnit("fluid_file2", read_file);

    // process array records in fluid_file1
    waitUnit("fluid_file1");
    do_visualization_computation("fluid_file1");
    deleteUnit("fluid_file1");

    // process array records in fluid_file2
    waitUnit("fluid_file2");
    do_visualization_computation("fluid_file2");
    deleteUnit("fluid_file2");
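The add/wait/delete pattern on this slide can be approximated in a few lines of Python. The sketch below is an illustration under stated assumptions, not GODIVA's implementation: the class `UnitPrefetcher`, its snake_case method names, and the single background reader thread are all hypothetical, and `fake_read` stands in for a developer-supplied read function.

```python
from concurrent.futures import ThreadPoolExecutor


class UnitPrefetcher:
    """Toy unit prefetcher (hypothetical): reads run on one background thread."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)   # background I/O thread
        self._units = {}                                 # unit name -> Future

    def add_unit(self, name, read_fn):
        """Queue a unit for prefetching; read_fn(name) runs in the background."""
        self._units[name] = self._pool.submit(read_fn, name)

    def wait_unit(self, name):
        """Block until the unit has been read, then return its data."""
        return self._units[name].result()

    def delete_unit(self, name):
        """Drop the cached unit so its memory can be reclaimed."""
        del self._units[name]

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_read(name):
    # Stand-in for a developer-supplied read function.
    return {"unit": name, "values": [1, 2, 3]}


store = UnitPrefetcher()
store.add_unit("fluid_file1", fake_read)
store.add_unit("fluid_file2", fake_read)     # read while fluid_file1 is processed
first = store.wait_unit("fluid_file1")
store.delete_unit("fluid_file1")
second = store.wait_unit("fluid_file2")
store.delete_unit("fluid_file2")
store.shutdown()
```

Because both units are queued up front, the read of the second unit can proceed while the caller is still computing on the first, which is the overlap the slide's code aims for.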

16 Voyager on a Single-processor Workstation

17 Voyager on a Dual-processor Cluster Node

18 Future work: I/O Performance Prediction
Objective: to predict the I/O time for high-performance applications
Challenge: lack of information in the Grid environment
–Knowledge of applications or systems not available
–Hard to simulate real applications in real environments
–Hard to predict scalability
–How do we parameterize an application?

19 Future work: Sci. Data Management
Objective: to manage data in scientific applications effectively and efficiently
Challenge: two research worlds not well connected
–Conventional databases not suitable for HPC
–Scientific databases designed for specific applications
–General approach?
Need to handle storage and I/O for different types of datasets and their distribution

20 Summary
Wide area of potential research
–Parallel computing
–Databases
–Operating systems/storage systems
Many open problems and new challenges

21 References
[ICDE ’04] Xiaosong Ma, Marianne Winslett, John Norris, Xiangmin Jiao and Robert Fiedler, GODIVA: Lightweight Data Management for Scientific Visualization, the 20th International Conference on Data Engineering, 2004
[PhD Thesis] Xiaosong Ma, Hiding Periodic I/O Costs for Parallel Applications, PhD thesis, University of Illinois, 2003
[IPDPS ’03] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, Improving MPI-IO Output Performance with Active Buffering Plus Threads, 2003 International Parallel and Distributed Processing Symposium
[PDSECA ’03] Xiaosong Ma, Xiangmin Jiao, Michael Campbell and Marianne Winslett, Flexible and Efficient Parallel I/O for Large-Scale Multi-component Simulations, the 4th Workshop on Parallel and Distributed Scientific and Engineering Computing with Applications
[ICS ’02] Jonghyun Lee, Xiaosong Ma, Marianne Winslett and Shengke Yu, Active Buffering Plus Compressed Migration: An Integrated Solution to Parallel Simulations' Data Transport Needs, the 16th ACM International Conference on Supercomputing
[IPDPS ’02] Xiaosong Ma, Marianne Winslett, Jonghyun Lee and Shengke Yu, Faster Collective Output through Active Buffering, 2002 International Parallel and Distributed Processing Symposium

