1 Online Science: The World-Wide Telescope as a Prototype for the New Computational Science
Jim Gray, Microsoft Research
Alex Szalay, Johns Hopkins University
2 The Evolution of Science
Observational Science
–Scientist gathers data by direct observation
–Scientist analyzes data
Analytical Science
–Scientist builds analytical model
–Makes predictions
Computational Science
–Simulates analytical model
–Validates model and makes predictions
Data Exploration Science
Data captured by instruments, or data generated by simulator
–Processed by software
–Placed in a database / files
–Scientist analyzes database / files
3 Information Avalanche
In science, industry, government, ...
–better observational instruments and
–better simulations
are producing a data avalanche
Examples
–BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information)
–CERN: LHC will generate 1 GB/s, ~10 PB/y
–VLBA (NRAO) generates 1 GB/s today
–Pixar: 100 TB/movie
New emphasis on informatics:
–Capturing, Organizing, Summarizing, Analyzing, Visualizing
[Image labels: BaBar, Stanford; Space Telescope; P&E Gene Sequencer. Image courtesy C. Meneveau & A. JHU]
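The rates quoted on this slide can be converted to annual volumes with simple arithmetic. The following is a minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and a 365-day year. The function names and the "duty cycle" interpretation are assumptions of this example and are not stated on the slide.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365
GB_PER_PB = 1_000_000  # decimal units (assumption)
TB_PER_PB = 1_000      # decimal units (assumption)


def annual_pb_from_gb_per_s(rate_gb_s, duty_cycle=1.0):
    """Annual volume in PB for a sustained rate in GB/s.

    duty_cycle is the fraction of the year the instrument is
    actually producing data (hypothetical parameter).
    """
    seconds = SECONDS_PER_DAY * DAYS_PER_YEAR * duty_cycle
    return rate_gb_s * seconds / GB_PER_PB


def annual_pb_from_tb_per_day(rate_tb_day):
    """Annual volume in PB for a growth rate in TB/day."""
    return rate_tb_day * DAYS_PER_YEAR / TB_PER_PB


# 1 GB/s running continuously for a whole year.
lhc_continuous = annual_pb_from_gb_per_s(1.0)

# Fraction of the year at 1 GB/s that would yield the quoted ~10 PB/y.
implied_duty = 10.0 / lhc_continuous

# BaBar growing at 1 TB/day.
babar = annual_pb_from_tb_per_day(1.0)

print(f"1 GB/s continuous: {lhc_continuous:.1f} PB/y")
print(f"duty cycle implied by 10 PB/y: {implied_duty:.0%}")
print(f"1 TB/day: {babar:.3f} PB/y")
```

Under these assumptions, 1 GB/s sustained all year is about 31.5 PB, so the quoted ~10 PB/y is consistent with the instrument taking data roughly a third of the time.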
4 World Wide Telescope Virtual Observatory
Premise: most data is (or could be) online
The Internet is the world's best telescope:
–It has data on every part of the sky
–In every measured spectral band: optical, x-ray, radio, ...
–As deep as the best instruments (2 years ago)
–It is up when you are up. The seeing is always great (no working at night, no clouds, no moons, no ...)
–It's a smart telescope: links objects and data to literature on them
5 Why Astronomy Data?
It has no commercial value
–No privacy concerns
–Can freely share results with others
–Great for experimenting with algorithms
It is real and well documented
–High-dimensional data (with confidence intervals)
–Spatial data
–Temporal data
Many different instruments from many different places and many different times
Federation is a goal
There is a lot of it (petabytes)
Great sandbox for data mining algorithms
–Can share across companies
–University researchers
Great way to teach both Astronomy and Computational Science
[Image panels: IRAS 100, ROSAT ~keV, DSS Optical, 2MASS 2, IRAS 25, NVSS 20cm, WENSS 92cm, GB 6cm]
6 SkyServer.SDSS.org
A modern Astronomy archive
–Raw pixel data lives in file servers
–Catalog data (derived objects) lives in database
–Online query to any and all
Also used for education
–150 hours of online Astronomy
–Implicitly teaches data analysis
Interesting things
–Spatial data search
–Client query interface via Java applet
–Query interface via Emacs
–Popular
–Cloned by other surveys (a template design)
–Web services are core of it
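As an illustration of what "spatial data search" over catalog objects can look like, here is a minimal sketch of a cone search: return every object within a given angular radius of a sky position. This is a brute-force scan using the haversine formula, not SkyServer's actual indexing scheme, which the slide does not describe. The function names, record layout, and catalog entries are made up for the example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees; RA wraparound at 0/360 is handled
    by the trigonometry.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return catalog objects within radius_deg of (ra, dec)."""
    return [
        obj for obj in catalog
        if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius_deg
    ]


# Toy catalog (invented positions, degrees).
catalog = [
    {"id": "a", "ra": 180.00, "dec": 0.00},
    {"id": "b", "ra": 180.01, "dec": 0.01},
    {"id": "c", "ra": 185.00, "dec": 2.00},
    {"id": "d", "ra": 359.99, "dec": 0.00},
]

hits = cone_search(catalog, ra=180.0, dec=0.0, radius_deg=0.05)
print([obj["id"] for obj in hits])
```

A linear scan like this is only practical for small tables; a production archive would put a spatial index in front of the same distance test.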
7 Federation: SkyQuery.Net
Combine 4 archives initially
Just added 6 more
Send query to portal; portal joins data from archives.
Problem: want to do multi-step data analysis (not just a single query).
Solution: allow personal databases on portal
Problem: some queries are monsters
Solution: batch schedule on portal server; deposits answer in personal database.
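The two problem/solution pairs on this slide can be sketched together: long queries go into a batch queue, and each result lands in a per-user database where later analysis steps can query it. This is a hypothetical sketch using an in-memory SQLite database per user. The `Portal` class, its methods, the table layout, and the sample rows are all invented for illustration and do not reflect how the SkyQuery portal is implemented.

```python
import sqlite3
from collections import deque


class Portal:
    """Toy portal: batch queue plus one personal database per user."""

    def __init__(self):
        self.personal_dbs = {}      # user name -> sqlite3 connection
        self.batch_queue = deque()  # pending (user, table, run_query) jobs

    def personal_db(self, user):
        """Return the user's personal database, creating it on first use."""
        if user not in self.personal_dbs:
            self.personal_dbs[user] = sqlite3.connect(":memory:")
        return self.personal_dbs[user]

    def submit(self, user, table, run_query):
        """Queue a long-running query; run_query() returns result rows."""
        self.batch_queue.append((user, table, run_query))

    def run_batch(self):
        """Run queued jobs in order, depositing each answer in the
        submitting user's personal database."""
        while self.batch_queue:
            user, table, run_query = self.batch_queue.popleft()
            rows = run_query()
            db = self.personal_db(user)
            # Table names come from trusted code in this sketch.
            db.execute(
                f'CREATE TABLE IF NOT EXISTS "{table}" '
                "(obj_id TEXT, ra_deg REAL, dec_deg REAL)"
            )
            db.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?)', rows)
            db.commit()


portal = Portal()
# Stand-in for a "monster" federated query (invented rows).
portal.submit("alice", "bright_objects",
              lambda: [("obj1", 180.0, 0.5), ("obj2", 181.5, -1.2)])
portal.run_batch()

# Second analysis step runs against the stored intermediate result.
db = portal.personal_db("alice")
northern = db.execute(
    'SELECT obj_id FROM "bright_objects" WHERE dec_deg >= 0'
).fetchall()
print(northern)
```

Keeping intermediate results next to the portal is what makes the multi-step analysis possible: the second query reads the stored table instead of re-running the federated join.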
8 SkyQuery Structure
Each SkyNode publishes
–Schema Web Service
–Database Web Service
Portal
–Plans query (2 phase)
–Integrates answers
–Is itself a web service
[Diagram labels: 2MASS, INT, SDSS, FIRST, SkyQuery Portal, Image Cutout]
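One plausible reading of "Plans query (2 phase)" is sketched below: in phase 1 the portal asks each node how many objects fall in the query region, and in phase 2 it visits the nodes from smallest to largest count, cross-matching by position and integrating the answers. The slide does not spell out the algorithm, so the ordering rule, the `SkyNode` class, the rectangular region filter (which ignores RA wraparound), and the sample objects are all assumptions of this example.

```python
import math


class SkyNode:
    """Toy archive node holding (obj_id, ra_deg, dec_deg) tuples."""

    def __init__(self, name, objects):
        self.name = name
        self.objects = objects

    def select(self, region):
        ra_min, ra_max, dec_min, dec_max = region
        return [o for o in self.objects
                if ra_min <= o[1] <= ra_max and dec_min <= o[2] <= dec_max]

    def count(self, region):
        return len(self.select(region))


def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def plan(nodes, region):
    """Phase 1: count query to every node, smallest result first."""
    return sorted(nodes, key=lambda n: n.count(region))


def execute(ordered_nodes, region, tol_deg):
    """Phase 2: chain the cross-match through the ordered nodes."""
    first = ordered_nodes[0]
    matches = [{first.name: o} for o in first.select(region)]
    for node in ordered_nodes[1:]:
        candidates = node.select(region)
        extended = []
        for m in matches:
            # Match against the position from the first node in the chain.
            _, ra, dec = m[first.name]
            for c in candidates:
                if separation_deg(ra, dec, c[1], c[2]) <= tol_deg:
                    extended.append({**m, node.name: c})
        matches = extended
    return matches


# Invented objects; archive names are taken from the slide.
sdss = SkyNode("SDSS", [("s1", 180.0000, 0.0000),
                        ("s2", 180.1000, 0.1000),
                        ("s3", 180.2000, 0.2000)])
twomass = SkyNode("2MASS", [("t1", 180.0001, 0.0001),
                            ("t2", 180.1001, 0.0999)])
first_survey = SkyNode("FIRST", [("f1", 180.0002, 0.0000)])

region = (179.5, 180.5, -0.5, 0.5)
ordered = plan([sdss, twomass, first_survey], region)
result = execute(ordered, region, tol_deg=0.001)
print([n.name for n in ordered])
print(result)
```

Starting from the node with the fewest candidates keeps the intermediate match list small before it is compared against the larger archives.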
9 Information Avalanche: science, business, personal
Astronomy data
SkyServer: demo pixel space, record space, set space
Personal SkyServer download
Mention data mining.
World-Wide Telescope
Federated web services demo
Other web services
Interop with Linux/Python/…
Other stuff
Portal with batch job scheduler