Locality-driven High-level I/O Aggregation

Presentation transcript:

Locality-driven High-level I/O Aggregation for Processing Scientific Datasets Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen Oct. 07, 2013 Data-Intensive Scalable Computing Laboratory (DISCL)

Outline: Introduction; Motivation; HiLa: High-Level I/O Aggregation; Evaluation; Conclusion and Future Work

Introduction: Scientific simulations nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future (e.g., GCRM: 100 million columns, 128 levels per column, 50 km). Accessing and analyzing these data often suffers from poor I/O performance due to the mismatch between the logical view and the physical layout.

Introduction: Scientific Datasets and Scientific I/O Libraries (PnetCDF, HDF5, ADIOS). The I/O stack layers a scientific I/O library such as PnetCDF on top of MPI-IO, which in turn runs on a parallel file system. These libraries let users specify array-based logical input, which hides the physical layout and gives rise to the logical-physical mismatch.
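For concreteness, a minimal sketch of such an array-based request through PnetCDF; the file name, variable name, and extents here are made up for illustration and are not taken from the slides:

```c
/* Minimal sketch of an array-based (subarray) read through PnetCDF.
 * The file name, variable name, and extents are hypothetical. */
#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

void read_logical_subarray(MPI_Comm comm)
{
    int ncid, varid;
    MPI_Offset start[3] = {0, 0, 0};        /* logical corner of the request */
    MPI_Offset count[3] = {100, 200, 200};  /* logical extent of the request */
    float *buf = malloc(100UL * 200 * 200 * sizeof(float));

    ncmpi_open(comm, "gcrm.nc", NC_NOWRITE, MPI_INFO_NULL, &ncid);
    ncmpi_inq_varid(ncid, "temperature", &varid);
    /* The caller thinks in logical array coordinates; the library and
     * MPI-IO translate them into file offsets. How those offsets map to
     * storage servers (the physical layout) stays hidden from the user. */
    ncmpi_get_vara_float_all(ncid, varid, start, count, buf);
    ncmpi_close(ncid);
    free(buf);
}
```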

Motivation: I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5). Independent I/O: process collaboration no, call collaboration no. Collective I/O: process collaboration yes, call collaboration no. Nonblocking I/O: process collaboration yes, call collaboration yes.
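As a hedged illustration, here is what the three modes look like in PnetCDF (other libraries expose analogous calls); the handles, buffers, and extents are placeholders assumed to be set up as in the previous sketch:

```c
/* Sketch of the three PnetCDF access modes. ncid/varid and the request
 * extents (start*, count*, buf*) are placeholders prepared elsewhere. */
#include <mpi.h>
#include <pnetcdf.h>

void access_modes(int ncid, int varid,
                  MPI_Offset start0[3], MPI_Offset count0[3], float *buf0,
                  MPI_Offset start1[3], MPI_Offset count1[3], float *buf1)
{
    /* 1. Independent I/O: each process issues its own request; neither
     *    processes nor calls are coordinated. */
    ncmpi_begin_indep_data(ncid);
    ncmpi_get_vara_float(ncid, varid, start0, count0, buf0);
    ncmpi_end_indep_data(ncid);

    /* 2. Collective I/O: all processes call together, so requests from
     *    different processes can be merged (two-phase I/O), but each
     *    call is still handled in isolation. */
    ncmpi_get_vara_float_all(ncid, varid, start0, count0, buf0);

    /* 3. Nonblocking I/O: several calls from one process are posted and
     *    later flushed together, so requests across calls (and across
     *    processes) can also be combined. */
    int req[2], st[2];
    ncmpi_iget_vara_float(ncid, varid, start0, count0, buf0, &req[0]);
    ncmpi_iget_vara_float(ncid, varid, start1, count1, buf1, &req[1]);
    ncmpi_wait_all(ncid, 2, req, st);
}
```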

Motivation: Contention on the Storage Server without Locality Awareness. [Diagram: two-phase collective I/O splits each call Call0 ... Calli into per-aggregator chunks ag00 ... agi3; without locality awareness these chunks contend on the same storage servers.]
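As a rough illustration of why the contention arises, here is a sketch of the even file-domain split that two-phase collective I/O typically uses; the names are ours and real implementations such as ROMIO are more elaborate:

```c
/* Rough illustration of two-phase collective I/O's file-domain split.
 * The aggregate byte range touched by one collective call is divided
 * evenly among aggregators, with no knowledge of which storage server
 * (OST) holds each stripe, so aggregators from the same call (and from
 * concurrent calls) can land on the same server and contend. */
#include <mpi.h>

void assign_file_domains(MPI_Offset min_off, MPI_Offset max_off,
                         int naggr, MPI_Offset fd_start[], MPI_Offset fd_end[])
{
    MPI_Offset range    = max_off - min_off;
    MPI_Offset per_aggr = (range + naggr - 1) / naggr;   /* even split */

    for (int a = 0; a < naggr; a++) {
        fd_start[a] = min_off + a * per_aggr;
        fd_end[a]   = fd_start[a] + per_aggr;             /* exclusive end */
        if (fd_end[a] > max_off)
            fd_end[a] = max_off;
    }
}
```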

Motivation: Performance with Overlapping Calls. [Chart omitted.] Conclusion: the overlap should be removed.

Idea: High-Level I/O Aggregation. Example of logical input decomposition onto the physical layout: Call0 (start {0,0,0}, length {100,200,200}) is decomposed into sub0 (start {0,0,0}, length {100,200,100}) and sub1 (start {0,0,100}, length {100,200,100}); Call1 (start {10,20,100}, length {10,300,400}) is decomposed into sub2 (start {10,20,100}, length {10,150,400}) and sub3 (start {10,170,100}, length {10,150,400}).

Idea: High-Level I/O Aggregation. Basic idea: figure out the overlap among requests and eliminate it before doing I/O. Challenges: how to decompose the requests, and how to aggregate the sub-arrays at a high level.
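A minimal sketch of the overlap-detection half of that idea, for d-dimensional sub-array requests given as start/length pairs; the struct and function names are ours, not the paper's:

```c
/* Overlap test between two d-dimensional sub-array requests given as
 * start/length pairs; the request representation is our own sketch. */
#include <stdbool.h>

#define MAX_DIM 8

typedef struct {
    int  d;                  /* number of dimensions          */
    long start[MAX_DIM];     /* logical corner of the request */
    long length[MAX_DIM];    /* extent along each dimension   */
} subreq_t;

/* Two boxes overlap iff their intervals overlap in every dimension.
 * If they do, *out holds the shared region, which only needs to be
 * read once and then copied to both requesters. */
static bool overlap(const subreq_t *a, const subreq_t *b, subreq_t *out)
{
    out->d = a->d;
    for (int i = 0; i < a->d; i++) {
        long lo   = a->start[i] > b->start[i] ? a->start[i] : b->start[i];
        long a_hi = a->start[i] + a->length[i];
        long b_hi = b->start[i] + b->length[i];
        long hi   = a_hi < b_hi ? a_hi : b_hi;
        if (hi <= lo)
            return false;    /* disjoint in this dimension: no overlap */
        out->start[i]  = lo;
        out->length[i] = hi - lo;
    }
    return true;
}
```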

HiLa: High-Level I/O Aggregation. The way to figure out the physical layout is the sub-correlation function and the sub-correlation set, derived from the Lustre striping parameters (stripe size s, stripe count l) and the dataset parameters (dimension d, subset size m).
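With round-robin Lustre striping, the logical-to-physical mapping is computable from a flattened file offset alone; the sketch below shows the generic striping formula, which is only an illustration and not necessarily the paper's exact sub-correlation definition:

```c
/* For a file striped round-robin over l OSTs with stripe size s bytes,
 * the OST holding a given byte offset is a simple function of the offset.
 * HiLa's sub-correlation set groups sub-arrays whose flattened offsets
 * map to the same OST; this is the generic Lustre mapping, shown only
 * as an illustration. */
static int ost_of_offset(long long offset,
                         long long s /* stripe size  */,
                         int       l /* stripe count */)
{
    return (int)((offset / s) % l);
}
```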

HiLa Algorithm, Prior Step: calculate the sub-correlation set; this is a one-time analysis.

HiLa Algorithm, Main Steps: request decomposition and aggregation (see the sketch below).
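To make the flow concrete, here is a simplified, self-contained sketch of the two main steps as we read them from the slides, using a flattened 1-D view of the requests; all names are illustrative, and the real HiLa implementation works on multi-dimensional sub-arrays inside the I/O library:

```c
/* Illustrative 1-D sketch of HiLa's main steps: decompose each request at
 * stripe boundaries, group the pieces by the OST they map to (the
 * sub-correlation set), merge overlapping pieces within a group so every
 * byte is read only once, then issue one aggregated read per OST. */
#include <stdlib.h>

#define MAX_PIECES 1024

typedef struct { long long lo, hi; } range_t;   /* [lo, hi) in file bytes */

static int cmp_lo(const void *a, const void *b)
{
    const range_t *x = a, *y = b;
    return (x->lo > y->lo) - (x->lo < y->lo);
}

void hila_read(range_t reqs[], int nreq,
               long long s /* stripe size */, int l /* stripe count */,
               void (*issue_read)(int ost, range_t r))
{
    for (int ost = 0; ost < l; ost++) {
        range_t pieces[MAX_PIECES];
        int n = 0;

        /* Step 1: decomposition. Split every request at stripe boundaries
         * and keep the pieces that land on this OST. */
        for (int r = 0; r < nreq; r++) {
            for (long long p = reqs[r].lo - reqs[r].lo % s;
                 p < reqs[r].hi && n < MAX_PIECES; p += s) {
                if ((p / s) % l != ost)
                    continue;
                range_t piece = { p > reqs[r].lo ? p : reqs[r].lo,
                                  p + s < reqs[r].hi ? p + s : reqs[r].hi };
                pieces[n++] = piece;
            }
        }
        if (n == 0)
            continue;

        /* Step 2: aggregation. Sort the pieces and merge overlapping or
         * adjacent ones so each byte on this OST is read exactly once. */
        qsort(pieces, n, sizeof(range_t), cmp_lo);
        range_t cur = pieces[0];
        for (int k = 1; k < n; k++) {
            if (pieces[k].lo <= cur.hi) {
                if (pieces[k].hi > cur.hi)
                    cur.hi = pieces[k].hi;
            } else {
                issue_read(ost, cur);
                cur = pieces[k];
            }
        }
        issue_read(ost, cur);
    }
}
```

The data returned for a merged region would then be scattered back to every original request that touched it (not shown above), which is exactly how the overlap among calls is eliminated.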

Evaluation: Performance Improved with HiLa. [Chart omitted.]

Evaluation: FASM Improved with HiLa. [Chart omitted.]

Conclusion and Future Work: The mismatch between logical access and physical layout can lead to poor I/O performance. We propose the locality-driven high-level aggregation approach (HiLa), which complements existing I/O methods by eliminating the overlap among sub-array requests. Future work: apply HiLa to write operations and integrate it with file systems.

Thanks, Q&A. Locality-driven High-level I/O Aggregation for Processing Scientific Datasets. http://discl.cs.ttu.edu