Scientific Data Management Center (SDM-ISIC). Arie Shoshani, Computing Sciences Directorate, Lawrence Berkeley National Laboratory.

Presentation transcript:

1 Arie Shoshani, LBNL SDM center
Scientific Data Management Center (SDM-ISIC)
Arie Shoshani
Computing Sciences Directorate
Lawrence Berkeley National Laboratory

2 Arie Shoshani, LBNL SDM center
Participants
Center Director: Arie Shoshani
DOE Laboratories:
- ANL: Bill Gropp (coordinating PI), Rob Ross
- LBNL: Ekow Otoo, Arie Shoshani (coordinating PI)
- LLNL: Terence Critchlow (coordinating PI)
- ORNL: Randy Burris, Thomas Potok (coordinating PI)
Universities:
- Georgia Institute of Technology: Ling Liu, Calton Pu (coordinating PI)
- North Carolina State University: Mladen Vouk (coordinating PI)
- Northwestern University: Alok Choudhary (coordinating PI), Wei-Keng Liao
- UC San Diego (Supercomputer Center): Amarnath Gupta, Reagan Moore (coordinating PI)

3 Arie Shoshani, LBNL SDM center
Original Goals and Framework
Coordinated framework for the unification, development, deployment, and reuse of scientific data management software
Framework
4 areas:
- Very large databases
- Distributed databases
- Heterogeneous databases
- Data mining (+ agent technology)
4 tier levels:
- Storage level
- File level
- Dataset level
- Federated data level

4 Arie Shoshani, LBNL SDM center
Master Diagram
Areas:
1) Storage and retrieval of very large datasets
2) Access optimization of distributed data
3) Data mining and discovery of access patterns
4) Distributed, heterogeneous data access
5) Agent technology
Levels:
a) Storage Level
b) File Level
c) Dataset Level
d) Dataset Federation Level
Projects shown in the diagram:
- Parallel I/O: improving parallel access from clusters (ANL, NWU)
- MPI I/O: implementation based on file-level hints (ANL, NWU)
- Multi-tier metadata system for querying heterogeneous data sources (LLNL, Georgia Tech)
- Knowledge-based federation of heterogeneous databases (SDSC)
- Low level API for grid I/O (ANL)
- Optimization of low-level data storage, retrieval and transport (ORNL)
- [Grid Enabling Technology]
- Analysis of application-level query patterns (LLNL, NWU)
- Optimizing shared access to tertiary storage (LBNL, ORNL)
- High-dimensional indexing techniques (LBNL)
- Enabling communication among tools and data (ORNL, NCSU)
- Multi-agent high-dimensional cluster analysis (ORNL)
- Adaptive file caching in a distributed system (LBNL)
- Dimension reduction and sampling (LLNL, LBNL)

5 Arie Shoshani, LBNL SDM center
Scientific Data Management ISIC
Diagram: Scientific Simulations & experiments; Tapes (Petabytes) and Disks (Terabytes); Data Manipulation; Scientific Analysis & Discovery
Data Manipulation:
- Getting files from tape archive
- Extracting subset of data from files
- Reformatting data
- Getting data from heterogeneous, distributed systems
- Moving data over the network
Current: Data Manipulation ~80% time, Scientific Analysis & Discovery ~20% time
Goal, using SDM-ISIC technology: Data Manipulation ~20% time, Scientific Analysis & Discovery ~80% time
Applications: Climate Modeling, Astrophysics, Genomics and Proteomics, High Energy Physics
SDM-ISIC Technology:
- Optimizing shared access from mass storage systems
- Metadata and knowledge-based federations
- API for Grid I/O
- High-dimensional cluster analysis
- High-dimensional indexing
- Adaptive file caching
- Agents
- …
DOE Labs: ANL, LBNL, LLNL, ORNL
Universities: GTech, NCSU, NWU, SDSC
Goals: optimize and simplify
- access to very large datasets
- access to distributed data
- access of heterogeneous data
- data mining of very large datasets

6 Arie Shoshani, LBNL SDM center
Benefits to Applications
- Efficiency. Example: removing I/O bottlenecks by matching storage structures to the application
- Effectiveness. Example: by making access to data from tertiary storage or various sites on the data grid “transparent”, more effective data exploration is possible
- New algorithms. Example: by developing a more effective high-dimensional clustering technique for large datasets, discovery of new correlations is possible
- Enabling ad-hoc exploration of data. Example: by enabling a “run and render” capability to visualize simulation output while the code is running, it is possible to monitor and steer a long-running simulation

7 Arie Shoshani, LBNL SDM center
Current Projects
1) High-Dimensional Clustering
Target applications: Astrophysics, Climate Modeling
LLNL, ORNL
Scientific problem targeted: To understand the mechanism(s) behind core-collapse supernovae it is crucial to explore and quantify:
- The correlations between the neutrino flux and stellar core convection
- The correlations between convection and spatial dimensionality
- The correlations between convection and rotation
Contact: Anthony Mezzacappa, ORNL
Scientific problem targeted: Separating volcano and ENSO (El Niño Southern Oscillation) signals from the rest of the climate data to study variability in temperature
Contact: Ben Santer, PCMDI, LLNL

8 Arie Shoshani, LBNL SDM center
Current Projects
2) Efficient Parallel I/O to Disk Storage
Target application: Astrophysics
ANL, NWU, LLNL
Scientific problem targeted: Astrophysics simulation code (FLASH): early production runs spent as much as half of the time writing checkpoint and visualization data
Contact: Mike Zingale, U of Chicago
Scientific problem targeted: Improving parallel I/O efficiency for tiled displays, a popular medium for collaborative viewing of high-resolution visualization (astrophysics data)
Contact: Mike Papka, ANL
Scientific problem targeted: Query pattern analysis for astrophysics star data: devising a disk layout for the data such that overall data access time across multiple applications and users is reduced
Contact: LLNL

9 Arie Shoshani, LBNL SDM center
Current Projects
3) Providing transparent access to grid data
Target application: High Energy Physics
LBNL, ORNL
Scientific problem targeted: Given a logical request (expressed on event attributes), get relevant data from grid sites and tertiary storage to application code without human intervention
Contact: Doug Olson, LBNL
Contact: Stephen Gowdy, SLAC
Contact: Jackie Chan, Sandia Livermore (combustion)
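
The idea of a logical request on event attributes can be illustrated with a small Python sketch. This is not the center's software, only an illustration under stated assumptions: the catalog rows, the attribute names (energy, n_tracks), the file names, and the site labels are all hypothetical. It shows a range query on event attributes being resolved into a per-file fetch plan, so the application names events by their properties rather than by the files that hold them.

```python
# Hypothetical event catalog: one row per event, recording its attributes
# and where the file that holds it lives. All values are made up.
CATALOG = [
    {"event_id": 1, "energy": 12.5, "n_tracks": 4, "site": "siteA", "file": "run01.dat"},
    {"event_id": 2, "energy": 48.0, "n_tracks": 9, "site": "siteA", "file": "run01.dat"},
    {"event_id": 3, "energy": 51.2, "n_tracks": 7, "site": "tape", "file": "run02.dat"},
    {"event_id": 4, "energy": 75.9, "n_tracks": 2, "site": "siteB", "file": "run03.dat"},
]


def select_events(catalog, ranges):
    """Return events whose attributes lie inside every (low, high) range."""
    return [
        event
        for event in catalog
        if all(low <= event[attr] <= high for attr, (low, high) in ranges.items())
    ]


def plan_file_requests(events):
    """Group matching event ids by (site, file) so each file is fetched once."""
    plan = {}
    for event in events:
        plan.setdefault((event["site"], event["file"]), []).append(event["event_id"])
    return plan


# Logical request: 40 <= energy <= 80 and at least 5 tracks.
request = {"energy": (40.0, 80.0), "n_tracks": (5, float("inf"))}
plan = plan_file_requests(select_events(CATALOG, request))
for (site, file_name), event_ids in plan.items():
    print(f"fetch {file_name} from {site} for events {event_ids}")
```

A real system would presumably replace the in-memory list with an index over the event attributes and hand the resulting plan to a storage manager that stages files from tape or remote sites; neither part is shown here.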

10 Arie Shoshani, LBNL SDM center
Current Projects
4) Heterogeneous Data Federation
Target application: Biology
LLNL, SDSC, GTU, NCSU, ORNL
Scientific problem targeted: Developing our infrastructure in support of cancer researchers at LLNL, who expect to use it to help identify genes that respond to low doses of radiation. This problem is difficult because the information required by the scientists is spread across many independent, web-based data sources, each using its own interfaces and data formats
Contact: Matt Coleman, LLNL
