LSST Data Management and Access
Jacek Becla, LSST Data Access & Database Technology Group Leader
SLAC Annual Program Review, July 8, 2008

Databases and SLAC
* Over a decade of experience
* SLAC's core competency (1 of 4): ultra-large database management for users and collaborations distributed worldwide

LSST Petascale Architecture & Analyses
* O(100) PB system
  – 55 PB pixel data
  – 20+ PB derived products
  – Virtual data
* All data public, accessed by
  – Professional astronomers
  – Amateur astronomers
  – General public
Challenge & Opportunity

Cutting-Edge Features for Cutting-Edge Science
* Queryable, shareable user annotations
* Complete provenance tracking
* Flexible, extensible schema
* Support for uncertainty / fuzzy joins
* Seamless integration of catalogs with pixel data
* Scalable, fast, fault-tolerant and cost-effective
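To illustrate what "support for uncertainty / fuzzy joins" can mean for a catalog, the sketch below cross-matches two lists of detections whose positions each carry a 1-sigma uncertainty, and pairs them only when their angular separation is within `n_sigma` times the combined uncertainty. The names `Detection` and `fuzzy_join`, the default `n_sigma=3.0`, and the choice of arcseconds as the uncertainty unit are all hypothetical assumptions for illustration, not part of the LSST design. The brute-force nested loop is used only for clarity; at survey scale the candidate pairs would first be narrowed by spatial partitioning.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    id: str
    ra: float     # right ascension, degrees
    dec: float    # declination, degrees
    sigma: float  # hypothetical 1-sigma positional uncertainty, arcseconds


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula), returned in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h)))) * 3600.0


def fuzzy_join(left, right, n_sigma=3.0):
    """Return (left_id, right_id, separation in units of combined sigma)
    for every pair whose separation is within n_sigma combined sigmas."""
    matches = []
    for a in left:
        for b in right:
            combined = math.hypot(a.sigma, b.sigma)  # add uncertainties in quadrature
            sep = angular_sep_arcsec(a.ra, a.dec, b.ra, b.dec)
            if sep <= n_sigma * combined:
                matches.append((a.id, b.id, sep / combined))
    return matches
```

Returning the normalized separation alongside each pair keeps the match "fuzzy": downstream queries can rank or re-threshold matches instead of treating the join as a hard yes/no.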

Query Complexity: Pushing the Limits
* Example queries
  – Near-neighbor searches for arbitrary regions
  – Complex time series analysis, e.g. find all pairs of objects with similar time series
* Simplified interfaces
  – Common languages (C++, Python)
  – Likely through common tools (R, IDL, MATLAB)
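A near-neighbor search is one of the example queries above. The following is a minimal sketch under stated assumptions, not the LSST query engine: it converts each (RA, Dec) position to a unit vector and buckets the vectors into a 3D grid whose cell size equals the chord length corresponding to the search radius. Two objects within that radius can then only sit in the same or adjacent cells, so each object is compared against 27 cells instead of the whole catalog. The function names and the dict-of-positions input format are hypothetical. Working with unit vectors sidesteps the RA = 0/360 wrap-around and the poles.

```python
import math
from collections import defaultdict
from itertools import product


def to_unit_vector(ra_deg, dec_deg):
    """Convert sky coordinates (degrees) to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def near_neighbor_pairs(objects, radius_arcsec):
    """objects: dict of id -> (ra, dec) in degrees (hypothetical input format).
    Returns the sorted list of (id1, id2) pairs separated by <= radius_arcsec."""
    theta = math.radians(radius_arcsec / 3600.0)
    chord = 2.0 * math.sin(theta / 2.0)  # straight-line distance for angle theta
    cells = defaultdict(list)
    vecs = {}
    for oid, (ra, dec) in objects.items():
        v = to_unit_vector(ra, dec)
        vecs[oid] = v
        cells[tuple(math.floor(c / chord) for c in v)].append(oid)

    pairs = set()
    for key, members in cells.items():
        # Points within `chord` of each other differ by at most one cell per axis.
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for a in members:
                for b in cells.get(neighbor, ()):  # .get does not insert new cells
                    if a < b and math.dist(vecs[a], vecs[b]) <= chord:
                        pairs.add((a, b))
    return sorted(pairs)
```

The same bucketing idea generalizes to the "pairs of objects with similar time series" query by replacing sky positions with feature vectors, although high-dimensional features would call for a different index.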

How?
* Use a shared-nothing, massively parallel (MPP), columnar-like data management system (DMS)
* Run on cloud / commodity hardware
* Push computation to the data
* Natively support arrays and operations on arrays
* Compress aggressively (lossless)
* Share scans among concurrent queries
* Build provenance, lightweight uncertainty and other features into the DMS
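The "push computation to the data" and "share scans" points can be illustrated with a small in-memory sketch. It models each node's local partition as a Python list of rows and uses hypothetical names (`scan_partition`, `run_shared_scan`) and a hypothetical query format of (row filter, column to average). Each row is read once and evaluated against every pending query, which is the core idea of a shared scan. Each partition returns only small partial aggregates (sum, count) rather than raw rows, as a shared-nothing system would. Here the partitions are processed sequentially; in a real MPP system they would run in parallel on the nodes that own the data.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
Query = Tuple[Callable[[Row], bool], str]  # (row filter, column to average)


def scan_partition(rows: List[Row], queries: Dict[str, Query]) -> Dict[str, Tuple[float, int]]:
    """Runs where the data lives: one pass over the rows serves all queries."""
    partial = {name: (0.0, 0) for name in queries}
    for row in rows:                                   # each row is read exactly once...
        for name, (keep, col) in queries.items():      # ...and shared by every query
            if keep(row):
                s, n = partial[name]
                partial[name] = (s + row[col], n + 1)
    return partial


def run_shared_scan(partitions: List[List[Row]],
                    queries: Dict[str, Query]) -> Dict[str, Optional[float]]:
    """Coordinator: ships queries to each partition, merges the partial aggregates."""
    merged = {name: (0.0, 0) for name in queries}
    for rows in partitions:  # sequential here; parallel across nodes in a real system
        for name, (s, n) in scan_partition(rows, queries).items():
            ms, mn = merged[name]
            merged[name] = (ms + s, mn + n)
    # Average per query, or None when no row matched.
    return {name: (s / n if n else None) for name, (s, n) in merged.items()}
```

Sharing the scan means the I/O cost of reading a large table is paid once per pass, no matter how many concurrent queries are attached to it.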

1st Extremely Large Databases (XLDB) Workshop
* October 2007 at SLAC
* Participation
  – Data-intensive science and industry, database researchers and vendors
* Goals
  – Identify trends, bridge gaps
* Very successful
  – Collaboration between science and database research strongly encouraged

SciDB Mini-Workshop
* March 2008 in Asilomar
* Participation
  – Database researchers and representatives of data-intensive science (HEP, astronomy, biology, remote sensing, fusion)
* Goals
  – Discuss common scientific database requirements
  – Stimulate database research
* Very successful
  – Agreed to explore building a new open-source, science-oriented DBMS, led by Michael Stonebraker and David DeWitt

Why "SciDB"?
* Requirements are novel and unlikely to be met by existing vendors
  – Arrays, spatial/temporal support, provenance, uncertainty, versioning
* Large scale and complexity rule out a roll-your-own approach
* Overlap with other fields is increasing, including:
  – Science: astronomy, biology, photon science, physics, geoscience (geology, oceanography, atmospheric science, environmental science)
  – Commercial applications (R&D and non-R&D): remote sensing, resource extraction (oil, gas, minerals), medical imaging, pharmaceuticals, internet

Open-Source Science DBMS: Making It Real (1)
* Science partners
  – Contribute some resources; provide requirements, use cases and tests
* CS database brain trust
  – Design and direct the building of the system, provide some resources
* Industrial partners
  – Provide funding/resources, share experience
* Company
  – Manage the open-source project, contribute engineering, provide support, services and PR

Open-Source Science DBMS: Making It Real (2)
* Science partners
  – Initial partners: LSST/SLAC, PNNL, LLNL, Fermilab, UCSB
  – Expecting to reach more labs/projects via the SciDB Science Board
  – Use cases, requirements (all)
  – 1 FTE (LSST), office space (SLAC)
  – Continuing to look for support from other labs, DOE and NSF
* CS database brain trust
  – Assembled
* Industrial partners
  – Initial partners: eBay, Microsoft, Vertica
  – Strong interest at Amazon and Facebook
* Company
  – New startup or Vertica initiative
eBay's use cases and requirements are very similar to LSST's

Open-Source Science DBMS: Making It Real (3)
* Funding available
  – eBay, Microsoft, VCs
* Design in progress
  – Array data model
  – All requested features feasible
* Beta expected 4Q'09
* XLDB2 planned for this fall (Sept 29/30)
LSST timescales:
  – R&D ends 4Q'10
  – Construction begins 1Q'11
  – First light in late 2014 or 2015
  – Data taking for 10 years

Summary
* SLAC leads the design of the O(100) PB LSST database and data access system
* An open-source, science-oriented DBMS is becoming a reality
  – Led by some of the most influential database researchers
  – Designed by highly experienced database engineers
  – In collaboration with major industrial partners
* The LSST DM system will enable unprecedented analyses in an intuitive, cost-effective way
  – It will likely have a large positive impact on complex scientific analytics and beyond

