UNCLASSIFIED: LA-UR 11-11726 Data Infrastructure for Massive Scientific Visualization and Analysis. James Ahrens & Christopher Mitchell, Los Alamos National Laboratory.

Presentation transcript:

UNCLASSIFIED: LA-UR 11-11726
Data Infrastructure for Massive Scientific Visualization and Analysis
James Ahrens & Christopher Mitchell, Los Alamos National Laboratory

Large Data in High Performance Computing
HPC users are simulating the world to better understand it and the products we use.
– Climate, High Energy Physics, Defense Problems, etc.
Current simulations are detailed enough to produce hundreds of TB, if not PB, of data.
Raw data is useless without analysis and/or visualization.

A Scalable Open-source Visualization and Analysis Tool - ParaView
ParaView
– Integrated analysis and visualization platform for high-performance computing
– Data Parallelism: works well due to the massive size of each dataset
– Data Streaming: incrementally process what is requested
– Multi-Resolution: immediate reduced-resolution results; full resolution streams in over time
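The multi-resolution idea on this slide can be illustrated with a minimal sketch. It is not ParaView's API or its actual streaming implementation: the function name `stream_resolutions`, the stride-halving scheme, and the 1D sample data are all assumptions made for illustration. The sketch returns a coarse subsample first and then progressively finer ones until the full dataset has been delivered.

```python
from typing import Iterator, List, Sequence, Tuple


def stream_resolutions(
    data: Sequence[float], coarsest_stride: int = 8
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (stride, samples) pairs, from coarsest to full resolution.

    Hypothetical helper: each pass keeps every `stride`-th sample and the
    stride is halved until it reaches 1 (the full-resolution data).
    """
    stride = coarsest_stride
    while stride >= 1:
        # A viewer could render this level immediately, then refine it
        # when the next (finer) level arrives.
        yield stride, list(data[::stride])
        stride //= 2


# Illustrative 1D "field" with 16 samples.
field = [float(i) for i in range(16)]
levels = list(stream_resolutions(field))

for stride, samples in levels:
    print(f"stride={stride}: {len(samples)} samples")
```

A consumer can stop iterating as soon as the resolution is good enough, which is the point of streaming: only what is requested gets processed.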

Example: Plasma Data in ParaView
GB/time step for one vector field

Integration with Data Intensive Technologies
Project initiated to rethink I/O in ParaView and storage for HPC analytics.
Currently, a parallel file system is used to access data via a storage network.
Data is too expensive to ship from storage to processor.
– No longer "interactive" to the user
Rewrote the ParaView reader to interface with the Hadoop Distributed File System.
Process data in situ, on the nodes where it is stored.
The key is the scheduling algorithm.
– Match nodes that hold the data with ParaView processes
– No network transfers!
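The slides do not describe the scheduling algorithm itself, so the following is only a sketch of one simple way to match data-holding nodes with reader processes. It assumes a greedy, load-balanced assignment; the function `assign_blocks`, the block names, and the host names are hypothetical, and no real HDFS or ParaView call is used. Each block is given to the least-loaded process that runs on a host holding a replica, with a fallback to a remote read only when no such process exists.

```python
from collections import defaultdict
from typing import Dict, List


def assign_blocks(
    block_hosts: Dict[str, List[str]], process_host: Dict[int, str]
) -> Dict[str, int]:
    """Greedily map each data block to a process co-located with a replica.

    block_hosts:  block id -> hosts that store a replica of that block
    process_host: process rank -> host the process runs on
    """
    procs_on_host: Dict[str, List[int]] = defaultdict(list)
    for rank, host in process_host.items():
        procs_on_host[host].append(rank)

    load = {rank: 0 for rank in process_host}  # blocks assigned per process
    assignment: Dict[str, int] = {}

    for block, hosts in sorted(block_hosts.items()):
        # Processes that can read this block from local disk.
        local = [r for h in hosts for r in procs_on_host.get(h, [])]
        # Fall back to any process (a remote read) if none is local.
        candidates = local or list(load)
        chosen = min(candidates, key=lambda r: (load[r], r))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


# Hypothetical cluster: three processes, one per node.
process_host = {0: "nodeA", 1: "nodeB", 2: "nodeC"}
block_hosts = {
    "b0": ["nodeA", "nodeB"],
    "b1": ["nodeA"],
    "b2": ["nodeB", "nodeC"],
    "b3": ["nodeC"],
}
assignment = assign_blocks(block_hosts, process_host)
print(assignment)  # {'b0': 0, 'b1': 0, 'b2': 1, 'b3': 2}
```

In this example every block is read by a process on a node that stores it, so no block crosses the network. A greedy pass does not guarantee an optimal balance in general; it is used here only because it is short and easy to follow.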

Integration with Data Intensive Technologies
Seeing a nearly 3x improvement in read times compared to existing solutions.
– What previously took ~1.5 seconds to load can be done in less than 0.5 seconds per time step!
Lower standard deviation on results.
– More consistent read times
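The quoted figures can be checked with a short calculation. The per-time-step read times (about 1.5 s before, under 0.5 s after) come from the slide; the helper names and the 100-step run length are assumptions chosen only to show how the saving accumulates.

```python
def speedup(baseline_s: float, improved_s: float) -> float:
    """Ratio of the old read time to the new read time."""
    if improved_s <= 0:
        raise ValueError("improved_s must be positive")
    return baseline_s / improved_s


def time_saved(baseline_s: float, improved_s: float, steps: int) -> float:
    """Total seconds saved when loading `steps` time steps."""
    return steps * (baseline_s - improved_s)


# Read times per time step quoted on the slide (seconds).
ratio = speedup(1.5, 0.5)

# Assumed run length of 100 time steps, for illustration only.
saved = time_saved(1.5, 0.5, steps=100)

print(f"speedup: {ratio:.1f}x, saved over 100 steps: {saved:.0f} s")
```

Because the new read time is reported as "less than 0.5 seconds", 0.5 s is an upper bound, and the ratio computed from it is a lower bound on the speedup for those figures.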