Embedded System Lab. 최길모 (Kilmo Choi). Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines.


Active Flash: Towards Energy-Efficient, In-Situ Data Analytics on Extreme-Scale Machines. Devesh Tiwari, Sudharshan S. Vazhkudai, Youngjae Kim, Xiaosong Ma, Simona Boboila, and Peter J. Desnoyers

Contents: Background; Problems and Challenges; Active Flash Approach for In-situ; Active Computation Feasibility; Evaluation; Active Flash Prototype based on the OpenSSD Platform; Conclusion

Background

Background
Scientific discovery is a two-step process: scientific simulation, followed by data analysis and visualization.

Background
Large-scale leadership computing applications produce big data
 GTC produces ~30 TB of output data per hour at scale

Problems and Challenges
The offline approach suffers from both performance and energy inefficiencies
 Redundant I/O (simulations write, analyses read)
 Excessive data movement
 Extra energy cost
Energy efficiency will become the primary metric for system design, as compute power is expected to increase by 1000x in the next decade with only a 10x increase in the power envelope
Using simulation nodes for data analysis is not acceptable

Active Flash Approach for In-situ
SSDs are now being adopted in supercomputers (e.g., TSUBAME, Gordon)
 Higher I/O throughput and storage capability
SSD controllers are becoming increasingly powerful
 Multi-core, low-power processors
Idle cycles at SSD controllers
In-situ analysis
 Analysis on in-transit output data, before it is written to the PFS
 Eliminates redundant I/O, but uses expensive compute nodes

Active Flash Approach for In-situ
Active Flash
 In-situ analysis on SSDs
 Exploits idle cycles of the SSD controller for computation
 Reduces transfer costs  high performance and energy savings

Active Flash Approach for In-situ
Three approaches to data analysis
 Offline
 Active Flash
 Analysis node

Active Computation Feasibility
Modeling SSD deployment under multiple constraints
 Capacity: enough SSDs to absorb the output burst
 Performance: high I/O bandwidth to SSD space; fast restart from application checkpoints
 Write durability: SSD write endurance limits

Active Computation Feasibility
Staging ratio: how many simulation nodes share one common SSD?
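The staging ratio and the deployment constraints above can be sketched as a simple bounding model. This is a hypothetical illustration with assumed parameter names and numbers, not the paper's exact formulation: the ratio is limited by whichever SSD constraint binds first.

```python
# Hypothetical sketch (assumed parameters, not the paper's model): the staging
# ratio is bounded by whichever SSD constraint binds first -- write bandwidth,
# burst-absorbing capacity, or write endurance. Rates in MB/s, sizes in GB.

def max_staging_ratio(node_output_mbps, ssd_write_mbps,
                      node_burst_gb, ssd_capacity_gb,
                      node_writes_gb_per_day, ssd_endurance_gb_per_day):
    """Largest number of simulation nodes one SSD can serve."""
    by_bandwidth = ssd_write_mbps // node_output_mbps        # sustain output rate
    by_capacity = ssd_capacity_gb // node_burst_gb           # absorb one burst
    by_endurance = ssd_endurance_gb_per_day // node_writes_gb_per_day
    return int(min(by_bandwidth, by_capacity, by_endurance))

# Example with illustrative numbers: endurance and bandwidth both cap at 10.
print(max_staging_ratio(25, 250, 16, 256, 40, 400))  # -> 10
```

The same shape of calculation answers the slide's question directly: one SSD is shared by as many nodes as its tightest constraint allows.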

Active Computation Feasibility
Modeling active computation feasibility
 Relatively less compute-intensive kernels are better suited for active computation (e.g., regex matching)
 Feasibility depends on multiple factors: simulation data production rate, staging ratio, I/O bandwidth, etc.
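A minimal sketch of such a feasibility check, under assumed names and numbers rather than the paper's model: a kernel fits on the SSD controller if it can process staged data at least as fast as the kernel's share of simulation nodes produces it.

```python
# Illustrative feasibility check (assumed parameters, not the paper's model):
# an analysis kernel is feasible on the SSD controller if controller throughput
# keeps up with the aggregate data production of the nodes sharing that SSD.

def kernel_feasible(staging_ratio, node_output_mbps, kernel_throughput_mbps):
    """True if one SSD controller keeps up with its share of simulation output."""
    aggregate_input_mbps = staging_ratio * node_output_mbps
    return kernel_throughput_mbps >= aggregate_input_mbps

# A lightweight kernel (e.g., pattern matching) vs. a compute-heavy one:
print(kernel_feasible(10, 25, 400))  # light kernel keeps up
print(kernel_feasible(10, 25, 120))  # heavy kernel falls behind
```

This captures the slide's point: the less compute-intensive the kernel, the higher its effective throughput on the low-power controller, and the more likely it clears the production rate.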

Evaluation
 Cray XT5 Jaguar supercomputer
 Samsung PM830 SSD
 Intel Core i7 processors

Evaluation
Feasibility of the approaches
 Most data analysis kernels can be placed on SSD controllers without degrading simulation performance
 Additional SSDs are not required to support in-situ data analysis on SSDs
 The analysis-node approach is feasible at higher staging ratios, but at additional infrastructure cost

Evaluation
Energy and cost saving analysis
 Staging ratio = 10
 Cost: Active Flash and the offline approach = y1; analysis node = y2
 The offline model consumes more energy due to I/O wait time
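A rough back-of-the-envelope model of why the offline approach consumes more energy (all parameter names and numbers here are illustrative assumptions, not the paper's measurements): offline pays for moving the data to the PFS and back plus node idle power during I/O wait, while Active Flash processes the data in place on a low-power SSD controller.

```python
# Illustrative energy comparison (assumed parameters, not the paper's numbers):
# offline analysis writes data to the PFS, reads it back, and burns node idle
# power while waiting on I/O; Active Flash analyzes data in place on the SSD.

def offline_energy_j(data_gb, pfs_bw_gbps, node_idle_w, io_energy_j_per_gb):
    io_time_s = 2 * data_gb / pfs_bw_gbps           # write once, read once
    return 2 * data_gb * io_energy_j_per_gb + io_time_s * node_idle_w

def active_flash_energy_j(data_gb, controller_w, kernel_mbps):
    compute_time_s = data_gb * 1000 / kernel_mbps   # process on the controller
    return compute_time_s * controller_w

print(offline_energy_j(1000, 5, 200, 50))    # data movement + I/O wait energy
print(active_flash_energy_j(1000, 2, 400))   # in-situ on a low-power controller
```

Even with generous assumptions for the offline path, the redundant I/O and idle-wait terms dominate, which is the slide's observation in miniature.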

Conclusion
Extant approaches to scientific data analysis (e.g., offline and analysis nodes) are stymied by inefficiencies in data movement and energy consumption that result in sub-optimal performance
Active Flash is better than either approach on all of the aforementioned metrics