
Active Storage and Its Applications Jarek Nieplocha, Juan Piernas-Canovas Pacific Northwest National Laboratory 2007 Scientific Data Management All Hands Meeting Snoqualmie, WA

2 Outline
- Description of the Active Storage concept
- New implementation of Active Storage
- Programming framework
- Examples and applications

3 Active Storage in Parallel Filesystems
- Active Storage exploits the old concept of moving computation to the data source to avoid data transfer penalties: applications use compute resources on the storage nodes.
- Storage nodes are full-fledged computers with plenty of CPU power available, standard OSes, and standard processors.
[Diagram: traditional approach vs. Active Storage. In both, compute nodes and I/O nodes (FS) are connected by a network. In the traditional approach, X is shipped to a compute node, Y = foo(X) is computed there, and Y is written back; with Active Storage, Y = foo(X) runs directly on the I/O nodes.]

4 Example: BLAS DSCAL on disk, Y = α·Y
Experiment:
– Traditional: the input file is read from the filesystem and the output file is written to the same filesystem. The input file has 120,586,240 doubles.
– Active Storage: each server receives the scale factor, reads its array of doubles locally from its disk, and stores the resulting array on the same disk. Each server processes 120,586,240/N doubles, where N is the number of servers.
The speedup is attributed to using multiple OSTs and to avoiding data movement between client and servers (no network bottleneck). A sketch of such a processing component follows below.
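The slide does not show the processing component itself, so the following is a minimal sketch of what a DSCAL-style PC could look like, assuming, purely for illustration, that the framework passes the scale factor and the local input/output file paths on the command line (this is not the actual Active Storage interface). Each storage server would run one instance against the doubles stored on its own disk.

```c
/* Minimal DSCAL-style processing component (illustrative sketch).
 * Reads doubles from a local file, scales them by alpha, and writes
 * the result to a new file on the same local disk: Y = alpha * Y. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s alpha infile outfile\n", argv[0]);
        return 1;
    }
    double alpha = atof(argv[1]);
    FILE *in  = fopen(argv[2], "rb");
    FILE *out = fopen(argv[3], "wb");
    if (!in || !out) { perror("fopen"); return 1; }

    double buf[4096];
    size_t n;
    while ((n = fread(buf, sizeof(double), 4096, in)) > 0) {
        for (size_t i = 0; i < n; i++)
            buf[i] *= alpha;                     /* Y = alpha * Y */
        fwrite(buf, sizeof(double), n, out);     /* stays on the local disk */
    }
    fclose(in);
    fclose(out);
    return 0;
}
```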

5 Related Work
- The Active Disk/Storage concept was introduced a decade ago to use processing resources 'near' the disk: on the disk controller, or on processors connected to the disks, in order to reduce network bandwidth/latency limitations.
- References: DiskOS stream-based model (ASPLOS'98: Acharya, Uysal, Saltz); Active Storage for Large-Scale Data Mining and Multimedia (VLDB'98: Riedel, Gibson, Faloutsos).
- Research showed the Active Disk idea to be interesting, but difficult to take advantage of in practice: processors in disk controllers were not designed for the purpose, and vendors have not been providing an SDK.

6 Lustre Architecture
[Diagram: O(10000) clients connect over the network to O(1000) OSTs, which handle file I/O and locking, and to O(10) MDSes, which handle directory metadata and concurrency, recovery, file status, and file creation.]

7 Active Storage in Kernel Space
When the client writes to file A:
- ASOBD makes a copy of the data and sends it to ASDEV.
- The processing component (PC) reads from and writes to the char device (a sketch of this loop follows below).
- The original data ends up in file A, the processed data in file B.
[Diagram: the kernel-space Active Storage module sits in the OST stack (NAL, OST, OBDfilter, ldiskfs, disk). ASOBD and ASDEV live in kernel space and expose a char device to the user-space processing component, which produces B from A.]
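The transcript does not preserve the processing component's code, so here is a minimal sketch of the read/process/write loop described above. The char-device path (/dev/asdev0), the record size, and the placeholder transformation are all assumptions made for illustration, not the real ASDEV interface.

```c
/* Sketch of the user-space PC loop in the kernel-space design:
 * pull copied data from the Active Storage char device, process it,
 * and write the result back through the same device. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define RECORD_SIZE 4096   /* assumed transfer granularity */

static void process(unsigned char *buf, ssize_t n)
{
    /* placeholder transformation; a real PC would run foo() here */
    for (ssize_t i = 0; i < n; i++)
        buf[i] ^= 0xff;
}

int main(void)
{
    int fd = open("/dev/asdev0", O_RDWR);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    unsigned char buf[RECORD_SIZE];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {   /* data copied by ASOBD */
        process(buf, n);
        if (write(fd, buf, n) != n) {                /* processed data -> file B */
            perror("write");
            break;
        }
    }
    close(fd);
    return 0;
}
```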

8 Active Storage Application: High-Throughput Proteomics
- 9.4 Tesla high-throughput mass spectrometer: 1 experiment per hour, 5000 spectra per experiment, 4 MB per spectrum. Per instrument: 20 GB per hour, 480 GB per day.
- Application problem: given two float inputs, a target mass and a tolerance, find all the possible protein sequences whose mass fits into the specified range (see the sketch below).
- Active Storage solution: each OST receives its part of the input, the float pair sent by the client, and stores the resulting processing output in its Lustre OBD (object-based disk).
- Next-generation technology will increase data rates by 200x.
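To make the application problem concrete, here is an illustrative sketch of the kind of bounded enumeration involved: it lists amino-acid sequences whose total mass falls within target ± tolerance. The five residues and their approximate masses are a deliberately tiny subset used only for illustration; the real application (AMINOGEN), its full residue table, and its file formats are not shown on the slides.

```c
/* Enumerate short amino-acid sequences whose peptide mass lies in
 * [target - tol, target + tol], pruning once the mass overshoots. */
#include <stdio.h>

#define NRES   5
#define MAXLEN 16

static const char   names[NRES]  = { 'G', 'A', 'S', 'V', 'L' };
static const double masses[NRES] = { 57.02, 71.04, 87.03, 99.07, 113.08 }; /* approx. residue masses */
static const double WATER = 18.01;   /* approx. mass of H2O added per peptide */

static double target, tol;
static char seq[MAXLEN + 1];

static void search(int depth, double mass)
{
    double total = mass + WATER;
    if (total >= target - tol && total <= target + tol) {
        seq[depth] = '\0';
        printf("%s  (mass %.2f)\n", seq, total);
    }
    if (depth == MAXLEN || total > target + tol)
        return;                           /* prune: mass only grows */
    for (int i = 0; i < NRES; i++) {
        seq[depth] = names[i];
        search(depth + 1, mass + masses[i]);
    }
}

int main(void)
{
    target = 300.0;                       /* example target mass */
    tol    = 2.0;                         /* example tolerance */
    search(0, 0.0);
    return 0;
}
```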

9 SC'2004 StorCloud Most Innovative Use Award
- Sustained 4 GB/s of Active Storage write processing with the proteomics application.
- 320 TB of Lustre storage: 40 Lustre OSSes running Active Storage, each with 4 logical disks (160 OSTs in total) and 2 Xeon processors; 1 MDS; 1 client creating files.
[Diagram: the client system and the Lustre MDS connect over a gigabit network to Lustre OSS 0 through Lustre OSS 39, each serving its Lustre OSTs.]

10 Active Storage in User Space
- Problems with the kernel-space implementation: portability, maintenance, extra memory copies.
- We developed a user-space implementation:
– Most file systems allow the storage nodes to also act as clients.
– Most file systems allow files to be created with a given layout (see the layout sketch below).
– Our framework launches Processing Components on the storage nodes that hold the files to be processed; the Processing Components read from and write to local files.
- Highly portable implementation:
– Used with Lustre 1.6 and PVFS2 2.7.
– A bug in Lustre 1.4 (and SFS) causes frequent kernel crashes when mounting the file system on the storage nodes.
– Initial discussions held with IBM on a GPFS port.
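As an illustration of the "create files with a given layout" step, the sketch below uses Lustre's liblustreapi, which a user-space client could rely on. The header name and exact prototype vary across Lustre versions, and the file name and OST index are made up for the example, so treat this as an assumption-laden sketch rather than the framework's actual code.

```c
/* Create an output file with an explicit layout before processing starts.
 * Placing a non-striped file on a chosen OST lets the framework run the
 * Processing Component on the storage node that owns the data.
 * Link with -llustreapi. */
#include <stdio.h>
#include <lustre/liblustreapi.h>   /* lustre/lustreapi.h on newer versions */

int main(void)
{
    /* 1 MB stripes, start at OST index 3, stripe count 1 (non-striped),
     * default striping pattern. */
    int rc = llapi_file_create("/lustre/doubles.15.out",
                               1 << 20,  /* stripe_size              */
                               3,        /* stripe_offset: first OST */
                               1,        /* stripe_count             */
                               0);       /* stripe_pattern: default  */
    if (rc != 0) {
        fprintf(stderr, "llapi_file_create failed: %d\n", rc);
        return 1;
    }
    printf("created /lustre/doubles.15.out on OST 3\n");
    return 0;
}
```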

11 Active Storage in User Space
[Diagram: compute nodes (parallel filesystem clients) and storage nodes 0..N-1 plus a metadata server (parallel filesystem components, which are also clients of the filesystem) share a network interconnect. asmaster runs on a compute node and talks to the Active Storage Runtime Framework (ASRF) on each storage node, where a Processing Component handles the local data I/O traffic.]

12 Performance Evaluation: AMINOGEN Bioinformatics Application
- Input file: ASCII file of mass and tolerance pairs, one per line; total size = 44 bytes.
- Output file: binary file containing amino acid sequences; total size = 14.2 GB.
[Chart: overall execution time.]

13 Enhanced Implementation of Active Storage for Striped Files
- Striped files: broadly used for performance, but not supported by earlier Active Storage work.
- Enhanced implementation: uses the striping data from the filesystem; adds a new component, the AS Mapper; and makes the Processing Component locality-aware, so it computes on local chunks only (see the mapping sketch below).
- Climate application with netCDF: computes statistics of key variables from a Global Cloud Resolving simulation (U. Colorado); eliminated >95% of the network traffic.
[Diagram: the Processing Component's read and write calls are intercepted by LIBAS, which maps the contiguous file view onto the locally stored chunks and performs the actual local I/O through GLIBC read/write, under the Active Storage Runtime Framework.]
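The AS Mapper itself is not listed in the transcript; the sketch below only illustrates the round-robin arithmetic such a mapper can use for a plainly striped file (an assumption about the layout, not the actual Mapper code). Given a byte offset, it computes which OST owns the enclosing chunk, which is what lets a Processing Component restrict itself to local chunks.

```c
/* Round-robin striping: chunk index = offset / stripe_size,
 * owning OST = chunk index mod stripe_count. */
#include <stdio.h>

typedef struct {
    long long stripe_size;   /* bytes per stripe, e.g. 1 MB */
    int       stripe_count;  /* number of OSTs the file is striped over */
} layout_t;

static int ost_of_offset(const layout_t *l, long long offset)
{
    long long chunk = offset / l->stripe_size;    /* global chunk index */
    return (int)(chunk % l->stripe_count);        /* round-robin owner  */
}

int main(void)
{
    layout_t l = { 1 << 20, 4 };   /* example: 1 MB stripes over 4 OSTs */
    for (long long off = 0; off < 8 << 20; off += 1 << 20)
        printf("offset %lld MB -> OST %d\n", off >> 20, ost_of_offset(&l, off));
    return 0;
}
```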

14 Examples and Applications Juan Piernas-Canovas

15 Active Storage in the DSCAL Example
[Diagram: compute nodes (parallel filesystem clients, all mounting /lustre) and the MDS & MGS connect over the network interconnect to the storage nodes OST31 and OST43 (parallel filesystem components). asmaster runs on a compute node, and a dscal Processing Component runs next to each file: doubles.20 and doubles.20.out on OST31, doubles.15 and doubles.15.out on OST43, so the data I/O traffic stays local.]

16 Non-Striped Files
[Slide shows the asmaster configuration (dscal.xml) for non-striped files, with the input files specified as /lustre/doubles.*.]
- /lustre/doubles.15 resides on OST43.
- /lustre/doubles.15.out is created as a new file, also on OST43.

17 Climate Application
- Collaboration with the SciDAC GCRM SAP (Karen).
- Problem: compute averages for variables generated by a scientific simulation and stored in striped output files (geodesic grid, netCDF data format); see the netCDF sketch below.
- Objective: optimize performance by exploiting data locality in the AS Processing Components to minimize network traffic.
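As a concrete illustration of the kind of per-variable statistics involved, the sketch below computes the mean of one variable with the standard netCDF C API. The file path /lustre/data.37 and the variable name ta are taken from the netCDF slide later in the deck; the sketch reads the whole variable rather than only local chunks, so it stands in for the analysis kernel, not for the locality-aware Active Storage version.

```c
/* Compute the mean of variable "ta" in a netCDF file. Link with -lnetcdf. */
#include <stdio.h>
#include <stdlib.h>
#include <netcdf.h>

#define CHECK(e) do { int _s = (e); if (_s != NC_NOERR) { \
    fprintf(stderr, "netCDF error: %s\n", nc_strerror(_s)); exit(1); } } while (0)

int main(void)
{
    int ncid, varid, ndims;
    CHECK(nc_open("/lustre/data.37", NC_NOWRITE, &ncid));
    CHECK(nc_inq_varid(ncid, "ta", &varid));
    CHECK(nc_inq_varndims(ncid, varid, &ndims));

    int dimids[NC_MAX_VAR_DIMS];
    CHECK(nc_inq_vardimid(ncid, varid, dimids));

    size_t nvals = 1, len;
    for (int i = 0; i < ndims; i++) {
        CHECK(nc_inq_dimlen(ncid, dimids[i], &len));
        nvals *= len;
    }

    double *vals = malloc(nvals * sizeof(double));
    if (!vals) { perror("malloc"); return 1; }
    CHECK(nc_get_var_double(ncid, varid, vals));   /* read the whole variable */

    double sum = 0.0;
    for (size_t i = 0; i < nvals; i++)
        sum += vals[i];
    printf("mean(ta) = %g over %zu values\n", sum / nvals, nvals);

    free(vals);
    CHECK(nc_close(ncid));
    return 0;
}
```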

18 Non-Striped Files
[Slide shows the same asmaster configuration (dscal.xml) applied to another file.]
- /lustre/doubles.20 resides on OST31.
- /lustre/doubles.20.out is created as a new file, also on OST31.
- Execution: /lustre/asd/asmaster /lustre/dscal.xml

19 Processing Patterns
In user space, it is easy to support different processing patterns:
[Diagram: two examples. In the 1W→0 pattern, the Processing Component consumes the client data stream and produces no output file; in the 1W→#W pattern, it produces several output files.]

20 No Output File (Pattern 1W→0)
[Slide shows the corresponding asmaster configuration, referencing /lustre/doubles.* and /lustre/dscal1.]
- /lustre/doubles.15 resides on OST43; no output file is produced.

21 Several Output Files (Pattern 1W→#W)
[Slide shows the corresponding asmaster configuration, referencing /lustre/doubles.* and an additional @.err output.]
- /lustre/doubles.15 resides on OST43.
- /lustre/doubles.15.out is created as a new file on OST43.
- /lustre/doubles.15.err is created as a new file on OST43.

22 Transparent Access to Striped Files
[Slide shows the asmaster configuration for striped files, referencing /lustre/doubles.*.]
- Transparent access to the chunks of the input file.
- Transparent access to the chunks of the output file.
- The new output file has the same striping as the input file.

23 Mapper and Striped netCDF Files
[Diagram: a striped netCDF file (header plus variable data) is spread across storage nodes 0..N-1 behind the network interconnect and metadata server. The Mapper resolves which nodes hold the variable's chunks, here returning (0, 2), and asmaster starts Processing Components only on the ASRFs of those nodes, keeping the data I/O traffic local.]

24 Processing of netCDF
[Slide shows the asmaster configuration for the netCDF case, referencing /lustre/data.*, the variable name ta, and the ${CHUNKNUM} and ${CHUNKSIZE} placeholders.]
- Striping information is taken from /lustre/data.37.
- Variable name in the netCDF file: ta.
- Non-striped output file: /lustre/data.37.out-ost43.

25 PVFS2 Support
[Slide shows the asmaster configuration for PVFS2, referencing /lustre/doubles.*, a pvfs filesystem type, and the /pvfs2 mount point.]

26 Local File System with Virtual Striping
[Slide shows the asmaster configuration for a local file system (localfs), referencing /lustre/doubles.*.]
- Virtual striping: stripe size 1 MB, stripe count 8.

27 Further Information
- Technical paper: J. Piernas, J. Nieplocha, E. Felix, "Evaluation of Active Storage Strategies for the Lustre Parallel Filesystem", Proc. SC'07.
- Website:
- Upcoming release in December 2007: support for Lustre 1.6, PVFS2, and Linux local file systems.
- Source code available now upon request. Just send us an email!
- Contacts: Jarek Nieplocha, Juan Piernas-Canovas.

Active Storage and Its Applications Jarek Nieplocha, Juan Piernas-Canovas Pacific Northwest National Laboratory Questions?