
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Data-Intensive Scalable Computing Laboratory (DISCL)
Oct. 15 Seminar

Introduction
- Scientific simulations nowadays generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
- Example: VPIC (Vector Particle-In-Cell), a plasma physics code, produces 26 bytes per particle, about 30 TB per run.
- Accessing and analyzing the data reveals poor I/O performance due to the logical-physical mismatch.

Introduction
- Scientific datasets and scientific I/O libraries: PnetCDF, HDF5, ADIOS.
- Software stack: PnetCDF on top of MPI-IO on top of parallel file systems.
- Scientific I/O libraries allow users to specify array-based logical input.
- The logical view does not match the physical layout (logical-physical mismatch).

Motivation
I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
- Independent I/O: process collaboration: No; call collaboration: No
- Collective I/O: process collaboration: Yes; call collaboration: No
- Nonblocking I/O: process collaboration: Yes; call collaboration: Yes
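
To make the contrast concrete, here is a minimal PnetCDF sketch of the three methods; the file name data.nc and variable temp are hypothetical, and error checking is omitted:

```c
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv) {
    int ncid, varid, req, st;
    MPI_Offset start[3] = {0, 0, 0}, count[3] = {10, 20, 20};
    static float buf[10 * 20 * 20];

    MPI_Init(&argc, &argv);
    ncmpi_open(MPI_COMM_WORLD, "data.nc", NC_NOWRITE, MPI_INFO_NULL, &ncid);
    ncmpi_inq_varid(ncid, "temp", &varid);

    /* Independent I/O: no collaboration among processes or calls. */
    ncmpi_begin_indep_data(ncid);
    ncmpi_get_vara_float(ncid, varid, start, count, buf);
    ncmpi_end_indep_data(ncid);

    /* Collective I/O: processes collaborate within a single call
       (two-phase I/O), but separate calls stay uncoordinated. */
    ncmpi_get_vara_float_all(ncid, varid, start, count, buf);

    /* Nonblocking I/O: several posted calls are coordinated together
       when the wait is issued. */
    ncmpi_iget_vara_float(ncid, varid, start, count, buf, &req);
    ncmpi_wait_all(ncid, 1, &req, &st);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```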

Motivation
Contention on storage servers without awareness of locality.
[Diagram: calls 0 through i each perform two-phase collective I/O through aggregators ag i0-ag i3; the aggregators of different calls contend on the same storage servers.]

Performance with Overlapping Calls
Conclusion: the overlap among calls should be eliminated.

Idea: High-level I/O Aggregation
Logical input decomposition, following the physical layout:
- Call 0: start{0,0,0} length{100,200,200} decomposes into sub0: start{0,0,0} length{100,200,100} and sub1: start{0,0,100} length{100,200,100}
- Call 1: start{10,20,100} length{10,300,400} decomposes into sub2: start{10,20,100} length{10,150,400} and sub3: start{10,170,100} length{10,150,400}
Each sub-array corresponds to a contiguous region of the physical layout.
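
A minimal sketch of the splitting above, using a helper of our own naming (not the paper's code) that cuts a request at a single physical boundary plane:

```c
typedef struct { long start[3], len[3]; } SubArray;

/* Split a request at the plane `plane` in dimension `dim`; returns the
   number of resulting sub-arrays (1 if no boundary is crossed). */
int split_at(SubArray req, int dim, long plane, SubArray out[2]) {
    long end = req.start[dim] + req.len[dim];
    if (plane <= req.start[dim] || plane >= end) {
        out[0] = req;                       /* no boundary crossed */
        return 1;
    }
    out[0] = req;
    out[0].len[dim] = plane - req.start[dim];
    out[1] = req;
    out[1].start[dim] = plane;
    out[1].len[dim] = end - plane;
    return 2;
}
```

With dim = 2 and plane = 100, Call 0's request start{0,0,0} length{100,200,200} yields exactly sub0 and sub1 above.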

Idea: High-level I/O Aggregation
Basic idea:
- Figure out the overlap among requests.
- Eliminate the overlap before doing I/O.
Challenges:
- How to decompose the requests.
- How to aggregate the sub-arrays at a high level.
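
For the first step, overlap between two multidimensional requests can be tested interval by interval; a short sketch under our own naming:

```c
#include <stdbool.h>

#define NDIMS 3

/* Two hyper-rectangles overlap iff their intervals intersect in every
   dimension; the intersection, if any, is itself a sub-array. */
bool overlap(const long a_start[], const long a_len[],
             const long b_start[], const long b_len[],
             long out_start[], long out_len[]) {
    for (int d = 0; d < NDIMS; d++) {
        long lo = a_start[d] > b_start[d] ? a_start[d] : b_start[d];
        long a_hi = a_start[d] + a_len[d];
        long b_hi = b_start[d] + b_len[d];
        long hi = a_hi < b_hi ? a_hi : b_hi;
        if (lo >= hi)
            return false;                   /* disjoint in dimension d */
        out_start[d] = lo;
        out_len[d] = hi - lo;
    }
    return true;
}
```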

HiLa: High-Level I/O Aggregation
A way to figure out the physical layout:
- Sub-correlation function
- Sub-correlation set
- Lustre striping: stripe size t; stripe count l
- Dataset: dimension d; subset size m

HiLa Algorithm: Prior Step
Prior step: calculate the sub-correlation set (a one-time analysis).
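
A sketch of this one-time analysis, assuming standard Lustre round-robin striping (byte offset x lives on OST (x / t) % l); the function names are ours:

```c
/* Map a byte offset to its OST under round-robin striping with stripe
   size t and stripe count l. */
long ost_of_offset(long x, long t, long l) {
    return (x / t) % l;
}

/* For a dataset cut into equal subsets of m bytes each, record which
   OST each subset starts on; subsets sharing an OST belong to the same
   sub-correlation set. */
void build_subcorrelation_sets(long n_subsets, long m, long t, long l,
                               long set_of[]) {
    for (long i = 0; i < n_subsets; i++)
        set_of[i] = ost_of_offset(i * m, t, l);
}
```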

HiLa Algorithm: Decomposition
Main steps: request decomposition and aggregation.
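
A hypothetical end-to-end sketch of these two steps; all names are ours, issue_read() stands in for the underlying library call, and only exact-duplicate elimination is shown (the real algorithm also subtracts partial overlaps):

```c
#include <string.h>

#define MAX_SUBS 64

typedef struct { long start[3], len[3], ost; } Sub;

static void issue_read(const Sub *s) { (void)s; /* e.g., an MPI-IO read */ }

/* Aggregate the sub-arrays produced by decomposition (assumes
   n <= MAX_SUBS): schedule each one once, skipping exact duplicates. */
void hila_execute(const Sub subs[], int n) {
    Sub sched[MAX_SUBS];
    int m = 0;
    for (int i = 0; i < n; i++) {
        int dup = 0;
        for (int j = 0; j < m; j++)
            if (memcmp(&subs[i], &sched[j], sizeof(Sub)) == 0)
                dup = 1;                    /* already scheduled: skip */
        if (!dup)
            sched[m++] = subs[i];
    }
    /* A real implementation would further group sched[] by the ost
       field so that each aggregator serves one storage target. */
    for (int j = 0; j < m; j++)
        issue_read(&sched[j]);
}
```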

Improvement with HiLa
[Chart: I/O performance improved with HiLa]

Improvement with HiLa
[Chart: FASM improved with HiLa]

Conclusion and Future Work
Conclusion
- The mismatch between logical access and physical layout can lead to poor performance.
- We propose the locality-driven high-level aggregation approach (HiLa) to facilitate existing I/O methods by eliminating the overlap among sub-array requests.
Future work
- Apply HiLa to write operations.
- Integrate HiLa with file systems.

Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks! Q&A