
SDM center Questions – Dave Nelson
What kind of processing / queries / searches do biologists do over microarray data?
– Range query on a spot?
– Range query on multiple spots?
– Correlation between spots?
10,000 arrays x 40,000 spots x 10 values/spot = 4 billion values
– Organize with 40,000 indexes over 10,000 values?
– Organize the 4 billion values into one index?
Explain the problem of chip variation
– How to pick the winner of 10^4 tests?
How do you represent pathways in a data structure?
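The slide contrasts one index per spot with a single index over all values. A minimal sketch of the first option, assuming in-memory Python lists and a single value per spot per array (the slide's 10 values per spot would add another key). Single-spot range queries use binary search, and multi-spot range queries intersect the per-spot results. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def build_spot_index(arrays):
    """One sorted index per spot.

    arrays[a][s] is the (single, for simplicity) value of spot s on array a.
    Returns a list with one (sorted_values, array_ids) pair per spot.
    """
    n_spots = len(arrays[0])
    index = []
    for s in range(n_spots):
        pairs = sorted((arr[s], a) for a, arr in enumerate(arrays))
        index.append(([v for v, _ in pairs], [a for _, a in pairs]))
    return index

def range_query(index, spot, lo, hi):
    """Ids of arrays whose value for `spot` lies in [lo, hi] (binary search)."""
    values, ids = index[spot]
    start, end = bisect_left(values, lo), bisect_right(values, hi)
    return sorted(ids[start:end])

def multi_spot_query(index, conditions):
    """Arrays satisfying every per-spot range in conditions = {spot: (lo, hi)}."""
    result = None
    for spot, (lo, hi) in conditions.items():
        hits = set(range_query(index, spot, lo, hi))
        result = hits if result is None else result & hits
    return sorted(result) if result else []

arrays = [[1.0, 5.0, 2.0], [3.0, 4.0, 9.0], [2.5, 6.0, 1.0]]
idx = build_spot_index(arrays)
print(range_query(idx, 0, 2.0, 3.0))                          # [1, 2]
print(multi_spot_query(idx, {0: (2.0, 3.0), 1: (4.5, 6.5)}))  # [2]
```

Under the one-value assumption, the slide's scale would give 40,000 sorted lists of 10,000 entries each. The alternative (one index over all 4 billion values) would instead need a composite (spot, value) key.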

SDM center Question – P1
Who will do “workflow”?
– NCSU, SDSC, GATech?
– Why not use NCSU?
– What about WFS from W3C?
Why have Xwrap composition?
– Can workflow be used instead?
How would dynamic workflow fit the services flow model?

SDM center Question – P1
Is there a chance that Matt will actually use it for real work within a year? (Same question to P2, P3, P4.)

SDM center Question – P2
Astrophysics
– 3D hydro run: 20 TB = 5 variables x 1024^3 x 1000 time steps x 4 bytes
– Can you apply data reduction techniques to that?
– How long will it take?
– Reduced data: 100:1 factor => 200 GB
– Can you visualize that?
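A quick check of the size arithmetic above, assuming 4-byte values and reading "TB" and "GB" loosely as binary units (1024^3 grid cells make the binary units come out exactly). The helper name is hypothetical.

```python
GIB = 2 ** 30
TIB = 2 ** 40

def run_size_bytes(n_vars, grid_edge, n_steps, bytes_per_value):
    """Raw output size: variables x grid_edge^3 cells x time steps x bytes per value."""
    return n_vars * grid_edge ** 3 * n_steps * bytes_per_value

raw = run_size_bytes(n_vars=5, grid_edge=1024, n_steps=1000, bytes_per_value=4)
reduced = raw // 100  # the assumed 100:1 reduction factor

print(f"raw:     {raw / TIB:.1f} TiB ({raw / 1e12:.1f} TB decimal)")
print(f"reduced: {reduced / GIB:.0f} GiB")
```

The raw run is about 19.5 TiB (21.5 TB in decimal units), and the 100:1 reduction gives exactly 200 GiB, consistent with the rounded figures.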

SDM center Questions – P2
Monitoring (Ghaleb)
– How can you minimize monitoring interference?
– Relationship to Grid monitoring activities
  – Grid Forum WG: defining schemas now

SDM center Questions – P3
How do you migrate a terabyte of data?
– e.g., files of a gigabyte each (about 1,000 files)
– At 10 MB/s: 10^5 seconds ≈ 30 hours. OK?
– At 1 MB/s: 10^6 seconds ≈ 300 hours. OK?
– Is using HRMs and DRMs an OK solution?
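The transfer-time estimates can be reproduced with a one-line model, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a rate sustained end to end. The helper name is hypothetical.

```python
TB = 10 ** 12   # decimal units
MB = 10 ** 6

def transfer_seconds(n_bytes, rate_bytes_per_s):
    """Transfer time at a sustained rate; ignores latency, tape mounts, and retries."""
    return n_bytes / rate_bytes_per_s

for rate_mb in (10, 1):
    secs = transfer_seconds(TB, rate_mb * MB)
    print(f"{rate_mb:>2} MB/s: {secs:.0e} s = {secs / 3600:.0f} h = {secs / 86400:.1f} days")
```

This gives roughly 28 hours (1.2 days) at 10 MB/s and 278 hours (11.6 days) at 1 MB/s. The per-file overhead of staging about 1,000 one-gigabyte files from tape is not modeled.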

SDM center Questions – P3
How do you filter NetCDF files from HPSS (Randy)?
– Can you select one variable (temp) over the Pacific Ocean for 10 years?
– With HSI, can you pick the “temp” values out of 30 variables?
– Would it make sense for HRM to call the NetCDF library instead? (discussion with Dean Williams)

SDM center Questions – P4
PVFS, MPI-IO, ROMIO
– When to use what?
– Give examples
– Can they be layered? How?
– When will you layer them?
How would you use these over GridFTP?

SDM center Questions – P4
Suppose you got a hint to organize NetCDF files as one variable at a time.
– How would you use that?
  – Reorganize the dataset over all time steps? How?
  – Is using tiled data sufficiently effective that no reorganization is needed?
AMR (Wei-Kang)
– Tree structure: how is it stored?
– How does it relate to Colella’s work?
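One way to frame the tiling question is a toy cost model comparing a row-major 2-D layout with a layout of contiguous square tiles. This is a minimal sketch assuming array dimensions that are multiples of the tile size and that every touched tile is read whole. It does not model any particular file format, and the function names are hypothetical.

```python
def row_major_cost(shape, block):
    """(contiguous runs, elements read) to fetch block (r0, c0, h, w) from a row-major array."""
    _, cols = shape
    _, _, h, w = block
    runs = 1 if (w == cols or h == 1) else h   # full-width rows are adjacent on disk
    return runs, h * w

def tiled_cost(block, tile):
    """(tiles read, elements read) when the array is stored as contiguous tile x tile tiles."""
    r0, c0, h, w = block
    tile_rows = (r0 + h - 1) // tile - r0 // tile + 1
    tile_cols = (c0 + w - 1) // tile - c0 // tile + 1
    n_tiles = tile_rows * tile_cols
    return n_tiles, n_tiles * tile * tile

shape = (1024, 1024)
for block in [(0, 0, 64, 64), (0, 0, 1024, 1)]:
    print(block, "row-major:", row_major_cost(shape, block), "tiled:", tiled_cost(block, 64))
```

A 64 x 64 box needs 64 separate runs in row-major order but only one 64 x 64 tile. A single full column needs 16 tiles and reads 64 times more data than requested. Tiling therefore helps box-shaped selections but can inflate reads for thin slices.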

SDM center Question – P4
If security is not important
– Can SRMs use your specialized FTP?
– What needs to be installed on both ends? (Can it be done dynamically?)
– If you move data very effectively:
  – Why bother if the network is the bottleneck?
  – Is it OK to hog the network?
What about doing simple concurrent GridFTP?

SDM center Organizational issues
Next meeting
– September (conflicts?)
– San Diego
– Registration fee: $100
Conference calls
– Format OK? Useful?
– Attendance?
– Reports on web?
Wedge white paper
– Based on “study” – National Academy of Sciences
– Something we are interested in seeing funded
– Not an institutional view

SDM center General
Intellectual property
– DOE position: layered collection of licenses
– Do we have to have an open source license?
– Leave it to each project
– Get DOE policy statement to all – ask Fred Johnson

SDM center General
Public relations
– Presence in conferences and meetings
  – Tutorials in next all-hands
  – On-line tutorials
  – Tutorials in conferences – Arie
  – Acknowledgement in papers
  – Consider workshop in addition to all-hands
– Web site
– Conf calls – lab notebook
– Services, products?

SDM center General
Supercomputing
– Integrate projects – one story for each of our projects
– Assuming centralized booth
– Common view – a poster at least
– Arie to ask for space!
What’s the product of SDM center?
– Services (backed up by products)
– Based on WSDL/SOAP?
– Also “components”

SDM center Wedge white paper
Data preservation across DOE
Moving terabytes for security reasons
– Evacuation / natural disaster
– Move data to an emergency team
Automatic replication
– Resilience in case of a cyber-attack
– Metadata – replicated / distributed
– Look at Garcia-Molina
Identify scientific experts
– Automatic collection of metadata, etc.
– Data mining for discovery
Sensors in sensitive locations
– Event monitoring of sensor databases
– Data mining from image data
– Soft sensors
  – Collected from multiple distributed sensors

SDM center P3 - Next
Getting ORNL involved with BNL and NERSC
– Plan to get HRM to handle up to a TB
– Plan to get tape cartridges up to 3-5 TB
– Does PVFS work on Solaris?
– Or run HRM on Linux?

SDM center P3 - Scenarios
Bulk data transfer scenario
– HRMs used to replicate STAR data
  – Demo: show performance
  – Develop: dynamic log visualization
– Experimentation
  – Parallel striping
  – Partial file transfers
  – Filtering of files (NetCDF, HDF)
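Parallel striping and partial file transfers both come down to assigning byte ranges of a file to separate streams. A minimal sketch of the splitting step, assuming near-equal contiguous stripes. `stripe_ranges` is a hypothetical helper, not the interface of HRM or GridFTP.

```python
def stripe_ranges(file_size, n_streams, start=0):
    """Split the byte range [start, file_size) into n_streams contiguous pieces.

    Returns (offset, length) pairs whose lengths differ by at most one byte.
    A nonzero `start` models a partial-file transfer that skips a prefix.
    """
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    if not 0 <= start <= file_size:
        raise ValueError("start must lie within the file")
    base, extra = divmod(file_size - start, n_streams)
    ranges, offset = [], start
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)   # spread the remainder over the first stripes
        ranges.append((offset, length))
        offset += length
    return ranges

print(stripe_ranges(10, 3))             # [(0, 4), (4, 3), (7, 3)]
print(stripe_ranges(100, 4, start=60))  # [(60, 10), (70, 10), (80, 10), (90, 10)]
```

Each (offset, length) pair could then be handed to one transfer stream. Whether more streams help depends on whether the network, rather than a single stream, is the bottleneck.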

SDM center P3 - Scenarios
HENP data analysis scenario
– STAR analysis framework at BNL & NERSC
– Replicate “medium” level access files at ORNL
– Use bitmap index
– Long term – use PAM
– Demo: select files from the most accessible location
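The slide only says "use bitmap index". Below is a minimal sketch of a binned, equality-encoded bitmap index, using Python integers as bitsets (bit r = row r). A multi-attribute selection is a bitwise AND of per-attribute masks. Production bitmap indexes typically also compress the bitmaps, which this sketch omits. The attribute names and function names are hypothetical.

```python
from bisect import bisect_right

def build_bitmap_index(values, bin_edges):
    """One bitmap per bin [bin_edges[i], bin_edges[i+1]); bit r is set if row r falls in it."""
    n_bins = len(bin_edges) - 1
    bitmaps = [0] * n_bins
    for row, v in enumerate(values):
        i = bisect_right(bin_edges, v) - 1
        if 0 <= i < n_bins:              # values outside the edges are not indexed
            bitmaps[i] |= 1 << row
    return bitmaps

def bins_mask(bitmaps, first_bin, last_bin):
    """OR together bins first_bin..last_bin (a range condition on one attribute)."""
    mask = 0
    for i in range(first_bin, last_bin + 1):
        mask |= bitmaps[i]
    return mask

def rows(mask):
    """Row ids whose bit is set in mask."""
    return [r for r in range(mask.bit_length()) if (mask >> r) & 1]

energy = build_bitmap_index([5, 15, 25, 12, 28, 3], [0, 10, 20, 30])
ntracks = build_bitmap_index([100, 40, 80, 90, 20, 60], [0, 50, 100, 150])
# Events with energy >= 10 AND 50 <= ntracks < 150:
print(rows(bins_mask(energy, 1, 2) & bins_mask(ntracks, 1, 2)))  # [2, 3]
```

The selection touches only the bitmaps of the requested bins, so the raw event data is read only for the rows (or files) that qualify.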