
LSST Data Management: Making Peta-scale Data Accessible
Jeff Kantor, LSST Data Management Systems Manager, LSST Corporation
Institute for Astronomy, University of Hawaii, Honolulu, Hawaii
June 19, 2008

LSST Data Management System
– Long-Haul Communications (Chile – U.S. and within the U.S.): 2.5 Gbps average, 10 Gbps peak
– Archive Center (NCSA, Champaign, IL): 100 to 250 TFLOPS, 75 PB
– Data Access Centers (U.S. (2) and Chile (1)): 45 TFLOPS, 87 PB
– Mountain Summit / Base Facility (Cerro Pachon and La Serena, Chile): 10 x 10 Gbps fiber optics, 25 TFLOPS, 150 TB
(1 TFLOPS = 10^12 floating point operations/second; 1 PB = 2^50 bytes, or ~10^15 bytes)
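A quick back-of-the-envelope sketch of what these figures imply, using only the numbers on the slide (2.5 Gbps average link rate, 75 PB archive, 1 PB = 2^50 bytes); the daily-volume and transfer-time results are illustrations, not LSST requirements.

```python
# Back-of-the-envelope check of the link and archive figures quoted above.
# Assumes decimal gigabits for the link and 1 PB = 2**50 bytes (slide convention).

GBPS_AVG = 2.5          # average Chile-U.S. link rate, gigabits/second
PB = 2 ** 50            # bytes per petabyte (slide convention)
ARCHIVE_PB = 75         # archive center storage, petabytes

bytes_per_sec = GBPS_AVG * 1e9 / 8            # ~312.5 MB/s sustained
tb_per_day = bytes_per_sec * 86400 / 1e12     # ~27 TB/day over the link

archive_bytes = ARCHIVE_PB * PB
years_to_ship_archive = archive_bytes / bytes_per_sec / (86400 * 365)

print(f"{tb_per_day:.0f} TB/day at {GBPS_AVG} Gbps sustained")
print(f"{years_to_ship_archive:.1f} years to move the full 75 PB archive at that rate")
```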

LSST Data Products

Nightly processing cadence:
– Image category (files): raw science image, calibrated science image, subtracted science image, noise image, sky image, data quality analysis
– Catalog category (database): source catalog (from difference images), object catalog (from difference images), orbit catalog, data quality analysis
– Alert category (database): transient alert, moving object alert, data quality analysis

Data Release (annual) processing cadence:
– Image category (files): stacked science image, template image, calibration image, RGB JPEG images, data quality analysis
– Catalog category (database): source catalog (from calibrated science images), object catalog (optimally measured properties), data quality analysis
– Alert category (database): alert statistics & summaries, data quality analysis
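For readers who prefer it in machine-readable form, the same product table can be expressed as a nested dictionary keyed by cadence and category; this is purely an illustrative restatement of the slide, not an LSST data structure.

```python
# Illustrative only: the data-product table above as a nested dict,
# keyed by processing cadence and product category.
LSST_DATA_PRODUCTS = {
    "nightly": {
        "image (files)": [
            "raw science image", "calibrated science image",
            "subtracted science image", "noise image", "sky image",
            "data quality analysis",
        ],
        "catalog (database)": [
            "source catalog (from difference images)",
            "object catalog (from difference images)",
            "orbit catalog", "data quality analysis",
        ],
        "alert (database)": [
            "transient alert", "moving object alert", "data quality analysis",
        ],
    },
    "data release (annual)": {
        "image (files)": [
            "stacked science image", "template image", "calibration image",
            "RGB JPEG images", "data quality analysis",
        ],
        "catalog (database)": [
            "source catalog (from calibrated science images)",
            "object catalog (optimally measured properties)",
            "data quality analysis",
        ],
        "alert (database)": [
            "alert statistics & summaries", "data quality analysis",
        ],
    },
}
```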

Database Volumes

Detailed analysis done based on existing surveys and SRD requirements. Expecting:
– 6 petabytes of data, 14 petabytes of data + indexes
– all tables: ~16 trillion rows (16 x 10^12)
– largest table: 3 trillion rows (3 x 10^12)
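A little arithmetic on these figures gives a feel for the scale: the implied average row size, the index overhead, and the share of rows held by the largest table. Decimal petabytes are assumed here for simplicity.

```python
# Rough arithmetic on the database volume figures above (decimal PB assumed).
DATA_PB = 6            # table data
TOTAL_PB = 14          # data + indexes
ROWS = 16e12           # all tables
LARGEST_TABLE_ROWS = 3e12

avg_row_bytes = DATA_PB * 1e15 / ROWS              # ~375 bytes per row
index_overhead = (TOTAL_PB - DATA_PB) / DATA_PB    # ~1.3 bytes of index per data byte

print(f"average row size ~{avg_row_bytes:.0f} bytes")
print(f"index overhead ~{index_overhead:.2f}x the raw table data")
print(f"largest table holds {LARGEST_TABLE_ROWS / ROWS:.0%} of all rows")
```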

The DM reference design uses layers for scalability, reliability, evolution

Application Layer (scientific layer): Data Products, Pipelines, Application Framework
– Pipelines constructed from reusable, standard “parts”, i.e. the Application Framework
– Data Product representations standardized; metadata extendable without schema change
– Object-oriented Python and C++ custom software

Middleware Layer: Data Access, Distributed Processing, User Interface, System Administration, Operations, Security
– Portability to clusters, grid, and other platforms
– Provides standard services so applications behave consistently (e.g. recording provenance)
– Kept “thin” for performance and scalability
– Open source and off-the-shelf software, custom integration

Infrastructure Layer: Computing, Communications, Physical Plant, Storage
– Distributed platform; different parts specialized for real-time alerting vs. peta-scale data access
– Off-the-shelf, commercial hardware & software, custom integration
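To make the “pipelines built from reusable parts, with consistent middleware services underneath” idea concrete, here is a minimal sketch of a stage-based pipeline in Python. The class and function names (Stage, Pipeline, record_provenance) are hypothetical and are not the actual LSST Application Framework API; they only illustrate the composition pattern the slide describes.

```python
# Hypothetical sketch of a stage-based pipeline: reusable "parts" (stages)
# composed into a pipeline, with a middleware hook that records provenance
# consistently for every stage. Names are illustrative, not the LSST API.

class Stage:
    """A reusable pipeline part: transforms an input payload into an output."""
    name = "stage"

    def process(self, data):
        raise NotImplementedError


class CalibrateImage(Stage):
    name = "calibrate"

    def process(self, data):
        data["calibrated"] = True          # stand-in for instrument-signature removal
        return data


class DetectSources(Stage):
    name = "detect"

    def process(self, data):
        data["sources"] = ["src-1", "src-2"]   # stand-in for real detections
        return data


def record_provenance(stage, data):
    """Stand-in for a middleware service that logs what ran on which data."""
    print(f"provenance: ran {stage.name} on visit {data.get('visit')}")


class Pipeline:
    def __init__(self, stages):
        self.stages = stages

    def run(self, data):
        for stage in self.stages:
            data = stage.process(data)
            record_provenance(stage, data)   # consistent middleware behavior
        return data


if __name__ == "__main__":
    result = Pipeline([CalibrateImage(), DetectSources()]).run({"visit": 42})
    print(result)
```

A user-supplied stage would plug in the same way as the two built-in ones here, which is the point of standardizing the “parts”.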

LSST DM Middleware makes it easy to answer these questions:
– There are 75 PB of data; how do I get the data I need as fast as I need it?
– I want to run an analysis code on MY [laptop, workstation, cluster, grid]; how do I do that?
– I want to run an analysis code on YOUR [laptop, workstation, cluster, grid]; how do I do that?
– My multi-core nodes are only getting 10% performance and I don’t know how to code for GPUs; how can I get better performance in my pipeline?
– I want to reuse LSST pipeline software and add some of my own; how can I do that?
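As one concrete, purely hypothetical illustration of the first question, a small catalog query against a MySQL-backed Object table (MySQL is the engine used in the Data Challenges described later) might look like the following. The table and column names are invented for illustration and are not the LSST schema; the mysql-connector-python package is assumed to be available.

```python
# Hypothetical example only: selecting a small box of objects from a
# MySQL-backed Object catalog. Table/column names are invented, not the
# actual LSST schema; mysql-connector-python is assumed to be installed.
import mysql.connector

conn = mysql.connector.connect(
    host="dac.example.org", user="lsst_user", password="...", database="lsst_dr1"
)
cur = conn.cursor()

# Crude coordinate-box pre-filter around (ra0, dec0); a real data access
# service would use an indexed spatial scheme rather than raw ranges.
ra0, dec0, radius = 150.0, -30.0, 0.1
cur.execute(
    "SELECT objectId, ra, decl, rMag FROM Object "
    "WHERE ra BETWEEN %s AND %s AND decl BETWEEN %s AND %s",
    (ra0 - radius, ra0 + radius, dec0 - radius, dec0 + radius),
)
for object_id, ra, decl, r_mag in cur.fetchall():
    print(object_id, ra, decl, r_mag)

cur.close()
conn.close()
```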

Facilities and Data Flows

[Diagram] The slide shows the facilities and the data moving between them:
– Mountain Site: LSST Camera Subsystem (Instrument Subsystem), LSST OCS (Observatory Control System), Data Management Subsystem Interface (data acquisition), high-speed storage
– Base Facility: high-speed storage, pipeline server
– Archive Center: high-speed storage, pipeline server, VO server (data access server)
– Data Center(s): high-speed storage, VO server (data access server), serving Tier 1 and other end users
– Data flowing between sites includes raw data, metadata, crosstalk-corrected raw data, sky template and catalog data, alerts, DQA results, and data products
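One way to read the diagram is as a small directed graph of facilities and the data that moves between them. The sketch below encodes that reading; the exact edge directions are an interpretation of the diagram, offered for illustration only.

```python
# Illustrative encoding of the facilities/data-flow diagram as a directed graph:
# each edge lists the kinds of data the slide shows moving between two sites.
DATA_FLOWS = {
    ("Mountain Site", "Base Facility"): [
        "crosstalk-corrected raw data", "raw data", "metadata",
    ],
    ("Base Facility", "Archive Center"): [
        "raw data", "metadata", "alerts", "data quality analysis",
    ],
    ("Archive Center", "Base Facility"): [
        "sky template", "catalog data",
    ],
    ("Archive Center", "Data Access Center"): [
        "data products",
    ],
    ("Data Access Center", "end users (Tier 1 and others)"): [
        "data products", "raw data", "metadata",
    ],
}

for (src, dst), payloads in DATA_FLOWS.items():
    print(f"{src} -> {dst}: {', '.join(payloads)}")
```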

Computing needs show moderate growth

[Chart] Projected computing capacity over time for the Archive Center, Base facility, and Data Access Center, with an Archive Center trend line.

Long-haul communications are feasible (Cerro Pachon – La Serena – U.S.)
– Over 2 terabytes/second of dark fiber capacity available
– The only new fiber required is Cerro Pachon to La Serena (~100 km)
– 2.4 gigabits/second needed from La Serena to Champaign, IL
– Quotes from carriers include a 10 gigabit/second burst for failure recovery
– Specified availability is 98%
– Clear channel, protected circuits
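The 10 Gbps burst for failure recovery can be sanity-checked with simple arithmetic: while the link is down, data accumulate at the nominal 2.4 Gbps, and once service resumes the backlog drains at the difference between the burst and nominal rates. The numbers below are illustrative only and use just the rates quoted on the slide.

```python
# Back-of-the-envelope recovery check for the quoted link rates.
NOMINAL_GBPS = 2.4    # sustained rate needed, La Serena -> Champaign
BURST_GBPS = 10.0     # quoted burst rate for failure recovery
AVAILABILITY = 0.98   # specified link availability

def catchup_hours(outage_hours):
    """Hours of burst transfer needed to clear the backlog from an outage,
    assuming data keep arriving at the nominal rate during catch-up."""
    backlog_gbit_per_outage_hr = NOMINAL_GBPS * 3600
    drain_gbit_per_hr = (BURST_GBPS - NOMINAL_GBPS) * 3600
    return outage_hours * backlog_gbit_per_outage_hr / drain_gbit_per_hr

print(f"allowed downtime at 98% availability: ~{(1 - AVAILABILITY) * 8760:.0f} hours/year")
print(f"a 24-hour outage clears in ~{catchup_hours(24):.1f} hours at burst rate")
```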

LSST Timeline (FY-07 through FY-17)

[Timeline chart] Major milestones shown: NSF D&D funding; MREFC proposal submission; NSF CoDR; MREFC readiness; NSF PDR; NSB; NSF CDR; NSF MREFC funding; NSF + privately supported construction (8.5 years); commissioning; operations; DOE R&D funding; DOE CD-0 (Q1-06); DOE MIE funding; DOE CD-1, CD-2, CD-3; sensor procurement starts; camera fabrication (5 years); DOE CD-4; camera delivered to Chile; camera ready to install; camera I&C; telescope first light; system first light; ORR; DOE operations funding.

Validating the design – Data Challenges

Data Challenge #1 (Jan – Oct 2006) goals:
– Validate infrastructure and middleware scalability to 5% of LSST required rates

Data Challenge #2 (Jan – Jan 2008) goals:
– Validate nightly pipeline algorithms
– Create the Application Framework and Middleware; validate them by creating functioning pipelines with them
– Validate infrastructure and middleware scalability to 10% of LSST required rates

Data Challenge #3 (Mar – Jun 2009) goals:
– Validate deep detection, calibration, and SDQA pipelines
– Expand Middleware for control & management and inter-slice communications
– Validate infrastructure and middleware reliability
– Validate infrastructure and middleware scalability to 15% of LSST required rates

Data Challenge #4 (Jul – Jun 2010) goals:
– Validate open interfaces and data access
– Validate infrastructure and middleware scalability to 20% of LSST required rates

Validating the design – Data Challenge work products to date

Data Challenge #1 (Jan – Oct 2006):
– TeraGrid nodes used to simulate data transfer among the Mountain (Purdue), Base (SDSC), and Archive Center (NCSA) using the Storage Resource Broker (SRB)
– IA-64 Itanium 2 clusters at SDSC and NCSA; 32-bit Xeon cluster at Purdue
– MPI-based Pipeline Harness developed in C and Python
– Simulated nightly processing application pipelines developed (CPU, I/O, RAM loads)
– Initial database schema designed and MySQL database configured
– Data ingest service developed
– Initial development environment configured and used throughout

Data Challenge #2 (Jan – Jan 2008):
– Dedicated 58-CPU cluster acquired and configured at NCSA
– Application Framework and Middleware API developed and tested
– Image Processing, Detection, and Association pipelines developed
– Moving object pipeline (jointly developed with Pan-STARRS) ported to the DM environment, modularized, and re-architected for nightly mode (nightMOPS)
– Major schema upgrade and implementation in MySQL with CORAL
– Acquired 2.5 TB of precursor data (CFHTLS-Deep, TALCS) for testing
– Complete development environment configured, standardized, and used throughout
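The slide mentions an MPI-based Pipeline Harness written in C and Python. A minimal sketch of that general pattern using mpi4py is shown below: one rank distributes per-CCD work items to the others, each rank runs a stand-in processing step, and the results are gathered back. It is a generic illustration of the approach under those assumptions, not the actual DC1/DC2 harness code.

```python
# Minimal sketch of an MPI "pipeline harness" pattern using mpi4py.
# Generic illustration only; not the actual DC1/DC2 harness.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    ccds = list(range(16))                          # pretend CCD ids for one visit
    chunks = [ccds[i::size] for i in range(size)]   # one work list per rank
else:
    chunks = None

my_ccds = comm.scatter(chunks, root=0)

def process_ccd(ccd_id):
    """Stand-in for image processing / detection on one CCD."""
    return {"ccd": ccd_id, "sources_found": 100 + ccd_id}

results = [process_ccd(c) for c in my_ccds]
all_results = comm.gather(results, root=0)

if rank == 0:
    flat = [r for per_rank in all_results for r in per_rank]
    print(f"processed {len(flat)} CCDs across {size} ranks")
```

Run under an MPI launcher, e.g. `mpiexec -n 4 python harness_sketch.py` (the script name is arbitrary).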

Data Challenges 1 & 2 were very successful

Data Challenge #1 (Jan – Oct 2006) execution results:
– Sustained data transfers at more than 15% of the LSST required transfer rate
– 192 CCD runs processed with simulated pipelines across 16 nodes / 32 Itanium CPUs, with per-CCD latency and throughput exceeding 42% of the LSST per-node image processing rate
– 6.1 megabytes/second source data ingest (>100% of the LSST required ingest rate at the Base Facility)

Data Challenge #2 (Jan – Jan 2008) execution results:
– Visits (0.1 gigabytes per CCD) processed through all pipelines (image processing & detection, association, nightMOPS) across 58 Xeon CPUs with latency and throughput of approximately 257 seconds (25% of the LSST per-node processing rate)
– Fast nodes only (48 Xeon CPUs): processed in approximately 180 seconds (30% of the LSST per-node processing rate)
– Data transfer and ingest rates same as DC1

LSST Data Management Resources

Base-year (2006) cost for developing the LSST DM system and reducing/releasing the data:
– $5.5M R&D
– $106M MREFC
– $17M/yr operations
– Covers software, support, and the mountain, base, archive center, and science center facilities

Includes Data Access user resources:
– Two DACs at U.S. locations
– One EPO DAC at another U.S. location (added recently)
– One DAC in Chile

Total scientific data access user resources available across DACs:
– 16 Gbps network bandwidth
– 12 petabytes of end-user storage
– 25 TFLOPS computing

Philosophy & Terminology

Access to LSST data should be completely open to anyone, anywhere:
– All data in the LSST public archive should be accessible to everyone worldwide; we should not restrict any of this data to “special” users
– Library analogy: anyone can check out any book

Access to LSST data processing resources must be managed:
– Computers, bandwidth, and storage cost real money to purchase and to operate; we cannot size the system to allow everyone unlimited computing resources
– Library analogy: we limit how many books various people can check out at one time so as to share resources equitably

Throughout the following, “access” will mean access to resources, not permission to view the data.

Data Access Policy Considerations

The vast quantity of LSST data makes it necessary to use computing located at a copy of the archive:
– Compute power to access and work with the data is a limited resource

LSSTC must equitably and efficiently manage the allocation of finite resources:
– Declaring “open season” on the data will lead to inefficient use
– Granting different levels of access to various uses will ensure increased scientific return

The data have value:
– Building and operating the system will require significant expenditures
– Setting a value on the data product is an important ingredient of any cost-sharing negotiation

Service Levels

Current LSST plans are for resources to be apportioned across four service levels:
– All users will automatically be granted access at the lowest level
– Access to higher levels will be granted according to merit by a proposal process under observatory management
– The review process includes scientific collaborations and other astronomy and physics community representatives
– Higher levels are targeted to different uses

Foreign investigators will be granted resources beyond the base level in proportion to their country’s or institution’s participation in sharing costs. Additional access to resources may similarly be obtained by any individual or group.

Service Levels defined in the MREFC Proposal

Level 4 – typical/general users, no special access required:
– 6 Gbps bandwidth
– 1 PB data storage
– 1 TFLOP total

Level 3 – power-user individuals, requires approval:
– 2 Gbps bandwidth
– 100 TB storage
– 1 TFLOP at each DAC

Level 2 – power-user institutions, requires approval:
– 2 Gbps bandwidth
– 900 TB storage (100 TB/yr)
– 5 TFLOPS at each DAC (1 TFLOP/yr for 5 years)

Level 1 – most demanding applications, requires approval:
– 6 Gbps bandwidth
– 10 PB storage (1 PB/yr)
– 25 TFLOPS (5 TFLOPS/yr for 5 years)
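For reference, the four levels above can be expressed as a small lookup table; the values are copied from the slide and the structure is illustrative only.

```python
# The four MREFC service levels above as a simple lookup table.
# Values copied from the slide; the structure is illustrative only.
SERVICE_LEVELS = {
    1: {"who": "most demanding applications (approval required)",
        "bandwidth_gbps": 6, "storage": "10 PB (1 PB/yr)",
        "compute": "25 TFLOPS (5 TFLOPS/yr for 5 years)"},
    2: {"who": "power-user institutions (approval required)",
        "bandwidth_gbps": 2, "storage": "900 TB (100 TB/yr)",
        "compute": "5 TFLOPS at each DAC (1 TFLOP/yr for 5 years)"},
    3: {"who": "power-user individuals (approval required)",
        "bandwidth_gbps": 2, "storage": "100 TB",
        "compute": "1 TFLOP at each DAC"},
    4: {"who": "typical/general users (no special access required)",
        "bandwidth_gbps": 6, "storage": "1 PB",
        "compute": "1 TFLOP total"},
}

for level, spec in sorted(SERVICE_LEVELS.items()):
    print(f"Level {level}: {spec['who']} — {spec['bandwidth_gbps']} Gbps, "
          f"{spec['storage']}, {spec['compute']}")
```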