1 Experience Building the World Wide Telescope, aka the Virtual Observatory
Jim Gray, Alex Szalay
2 The Evolution of Science
Observational Science
–Scientist gathers data by direct observation
–Scientist analyzes data
Analytical Science
–Scientist builds analytical model
–Makes predictions
Computational Science
–Simulates analytical model
–Validates model and makes predictions
Data Exploration Science
–Data captured by instruments, or generated by simulator
–Processed by software
–Placed in a database / files
–Scientist analyzes database / files
3 Information Avalanche
In science, industry, government, ...
–Better observational instruments and better simulations are producing a data avalanche
Examples
–BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information)
–CERN: LHC will generate 1 GB/s, ~10 PB/y
–VLBA (NRAO) generates 1 GB/s today
–Pixar: 100 TB/movie
New emphasis on informatics:
–Capturing, organizing, summarizing, analyzing, visualizing
Images: courtesy C. Meneveau & A. JHU; BaBar, Stanford; Space Telescope; P&E Gene Sequencer
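Note on the LHC figures: a rate of 1 GB/s sustained for a full year would come to roughly 31.6 PB, so the quoted ~10 PB/y is consistent with data being taken for about a third of the year. That duty cycle is an assumption made here for illustration; the slide does not state it. A minimal sketch of the arithmetic, using decimal units (1 PB = 10^6 GB):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def petabytes_per_year(rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Convert a sustained rate in GB/s to PB/year (1 PB = 1e6 GB).

    duty_cycle is the fraction of the year spent taking data; the value
    used below (1/3) is an illustrative assumption, not from the slides.
    """
    return rate_gb_per_s * SECONDS_PER_YEAR * duty_cycle / 1e6


continuous = petabytes_per_year(1.0)                 # running all year
one_third = petabytes_per_year(1.0, duty_cycle=1 / 3)  # assumed duty cycle

print(f"1 GB/s continuous:     {continuous:.1f} PB/y")
print(f"1 GB/s, 1/3 duty cycle: {one_third:.1f} PB/y")
```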
4 World Wide Telescope Virtual Observatory
Premise: most data is (or could be) online
The Internet is the world's best telescope:
–It has data on every part of the sky
–In every measured spectral band: optical, x-ray, radio, ...
–As deep as the best instruments (2 years ago)
–It is up when you are up. The seeing is always great (no working at night, no clouds, no moons, no ...)
–It's a smart telescope: links objects and data to the literature on them
5 The WWT Components
Data Sources
–Literature
–Archives
Unified Definitions
–Units
–Semantics/Concepts/Metrics, Representations
–Provenance
Object model
Classes and methods
Portals
6 Data Sources
Literature online and cross-indexed
–Simbad, ADS, NED, ...
Many curated archives online
–FIRST, DPOSS, 2MASS, USNO, IRAS, SDSS, VizieR, ...
–Typically files with English metadata and some programs
Groups, researchers, amateurs publish
–Datasets online in various formats
–Documentation varies
–Publications are ephemeral
–Unknown provenance
7 Unified Definitions
Universal Content Definitions (UCDs)
–Collated all table heads from all the literature
–100,000 terms reduced to ~1,500
–Rough consensus that this is the right thing
–Refinement in progress as people use UCDs
Defines
–Units: gram, radian, second, ...
–Semantic concepts / metrics: std error, chi² fit, magnitude, passband, velocity, ...
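Note: the idea of reducing many archive-specific table heads to a small shared vocabulary can be sketched as a lookup table. The column names and the UCD-style strings below are illustrative assumptions chosen for this example; they are not taken from the slides.

```python
# Hypothetical mapping from archive-specific column heads to shared
# UCD-style terms. Many different heads collapse onto one concept.
COLUMN_TO_UCD = {
    "ra": "pos.eq.ra",
    "raj2000": "pos.eq.ra",
    "alpha": "pos.eq.ra",
    "dec": "pos.eq.dec",
    "dej2000": "pos.eq.dec",
    "delta": "pos.eq.dec",
    "mag": "phot.mag",
    "vmag": "phot.mag",
}


def annotate(columns):
    """Pair each column head with its shared term, or None if unknown."""
    return [(c, COLUMN_TO_UCD.get(c.strip().lower())) for c in columns]


def find_columns(columns, ucd):
    """Return the column heads in a table that carry the given concept."""
    return [c for c, term in annotate(columns) if term == ucd]


table_heads = ["RAJ2000", "DEJ2000", "Vmag", "Name"]
print(annotate(table_heads))
print(find_columns(table_heads, "pos.eq.ra"))
```

With such a mapping a tool can ask for "the right ascension column" of any table without knowing what each archive happened to call it.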
8 Provenance
Most data will be derived.
To do science, need to trace derived data back to source.
So programs and inputs must be registered.
Must be able to re-run them.
Example: Space Telescope Calibrated Data
–Run on demand
–Can specify software version (to get old answers)
Scientific data provenance and curation are largely unsolved problems (some ideas but no science).
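Note: one way to picture "programs and inputs must be registered" and "can specify software version" is a record that stores the program name, its version, the inputs, and the parameters, so the derivation can be re-run later. The sketch below is a toy under stated assumptions: the names (`ProvenanceRecord`, `register`, `run`, `rerun`) and the two "calibrate" versions are hypothetical, and inputs are held in memory rather than in an archive.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    program: str
    version: str
    inputs: tuple       # the source values (a real system would store identifiers)
    parameters: tuple   # sorted (name, value) pairs

    def fingerprint(self) -> str:
        """Stable hash of the record, usable as a key for the derived data."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


PROGRAMS = {}  # (name, version) -> callable


def register(name, version, func):
    PROGRAMS[(name, version)] = func


def run(name, version, inputs, **params):
    """Run a registered program and return (result, provenance record)."""
    record = ProvenanceRecord(name, version, tuple(inputs), tuple(sorted(params.items())))
    return PROGRAMS[(name, version)](list(inputs), **params), record


def rerun(record):
    """Reproduce a derived result from its provenance record alone."""
    func = PROGRAMS[(record.program, record.version)]
    return func(list(record.inputs), **dict(record.parameters))


def calibrate_v1(raw, gain):
    return [x * gain for x in raw]


def calibrate_v2(raw, gain):
    return [(x - 1) * gain for x in raw]  # newer version also removes a bias of 1


register("calibrate", "1.0", calibrate_v1)
register("calibrate", "2.0", calibrate_v2)

old_result, old_record = run("calibrate", "1.0", [10, 20], gain=2)
new_result, new_record = run("calibrate", "2.0", [10, 20], gain=2)
print(old_result, new_result, rerun(old_record) == old_result)
```

Keeping old program versions registered is what lets a reader "get old answers" after the calibration software changes.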
9 Object Model
General acceptance of XML
Recent acceptance of XML Schema (XSD over DTD)
Wait-and-see about SOAP/WSDL/...
–"Web Services are just CORBA with angle brackets."
–"FTP is good enough for me."
Personal opinion:
–Web Services are much more than CORBA + <>
–Huge focus on interop
–Huge focus on integrated tools
But the community says "Show me!"
–Many technologists sold, but not the astronomers
10 Classes and Methods
First class: VOTable
–Represents an answer set in XML
–Defined by an XML Schema (XSD)
–Metadata (in terms of UCDs)
–Data representation (numbers and text)
First method: Cone Search
–Get objects in this cone
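Note: "get objects in this cone" amounts to selecting every object whose angular distance from a given sky position is within a given radius. A minimal in-memory sketch follows; the catalog layout (dicts with `ra` and `dec` in degrees) and the function names are assumptions for illustration, and a real service would return the answer set as a VOTable rather than a Python list.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return the catalog objects within radius_deg of (ra, dec)."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj["ra"], obj["dec"]) <= radius_deg]


# Hypothetical three-object catalog.
catalog = [
    {"id": "a", "ra": 10.00, "dec": 20.0},
    {"id": "b", "ra": 10.05, "dec": 20.0},
    {"id": "c", "ra": 50.00, "dec": -30.0},
]

hits = cone_search(catalog, ra=10.0, dec=20.0, radius_deg=0.1)
print([obj["id"] for obj in hits])
```

This linear scan is only meant to show the semantics; an archive would use a spatial index to avoid touching every row.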
11 Other Classes
Space-Time class
–http://hea-www.harvard.edu/~arots/nvometa/STCdoc.pdf
Image class (returns pixels)
–SdssCutout
–Simple Image Access Protocol
–HyperAtlas
Spectral
–Simple Spectral Access Protocol
–500K spectra available at
Query Services
–ADQL and SkyNode
Registry
–see below
12 The Registry
UDDI seemed inappropriate
–Complex
–Irrelevant questions
–Relevant questions missing
Evolved Dublin Core
–Represents datasets, services, portals
–Needs to be machine-readable
–Federation (DNS model)
–Push & pull: register, then harvest
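Note: "push & pull: register, then harvest" can be pictured as publishers pushing records into a local registry, and other registries periodically pulling whatever changed since their last visit. The sketch below is an in-memory toy; the class names, record fields, identifiers, and URLs are all hypothetical, and it does not reproduce any actual registry interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    identifier: str
    title: str
    kind: str        # "dataset", "service", or "portal"
    access_url: str


class Registry:
    def __init__(self, name):
        self.name = name
        self._records = {}   # identifier -> (sequence number, record)
        self._sequence = 0

    def register(self, record):
        """Push: a publisher adds or updates a record."""
        self._sequence += 1
        self._records[record.identifier] = (self._sequence, record)
        return self._sequence

    def changes_since(self, sequence):
        """Records added or updated after the given sequence number."""
        changed = [(s, r) for s, r in self._records.values() if s > sequence]
        return sorted(changed, key=lambda pair: pair[0])

    def harvest(self, other, since=0):
        """Pull: copy new records from another registry; return a bookmark."""
        latest = since
        for seq, record in other.changes_since(since):
            self.register(record)
            latest = max(latest, seq)
        return latest

    def identifiers(self):
        return sorted(self._records)


local = Registry("local")
full = Registry("full")

local.register(ResourceRecord("example:sdss", "SDSS catalog", "dataset",
                              "http://example.org/sdss"))
bookmark = full.harvest(local)                    # first pull gets everything

local.register(ResourceRecord("example:cone", "Cone search", "service",
                              "http://example.org/cone"))
bookmark = full.harvest(local, since=bookmark)    # later pull gets only the new record

print(full.identifiers())
```

The bookmark keeps each harvest incremental, which is what makes a federation of many registries affordable.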
13 SkyQuery: A Prototype WWT
Started with SDSS data and schema
Imported about 9 other datasets into that spine schema
Unified them with a portal
Implicit spatial join among the datasets
All built on Web Services
–Pure XML
–Pure SOAP
–Used .NET toolkit
14 Demo
SkyServer:
–Navigator showing cutout web service
–List: showing many calls and variant use
SkyQuery:
–Show integration of various archives
–Explain spatial join xMatch operator
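Note: a spatial join pairs objects from two archives that lie at (nearly) the same sky position. The sketch below is a plain nearest-neighbour match within a tolerance, written only to illustrate the idea; it is not the xMatch algorithm used by SkyQuery, and the catalogs, field names, and tolerance are assumptions.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a sky position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def separation_arcsec(a, b):
    """Angular separation in arcseconds, computed from the chord length."""
    ax, ay, az = unit_vector(a["ra"], a["dec"])
    bx, by, bz = unit_vector(b["ra"], b["dec"])
    chord = math.sqrt((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2)
    return math.degrees(2 * math.asin(min(1.0, chord / 2))) * 3600


def cross_match(left, right, tolerance_arcsec):
    """Pair each left object with its nearest right object, if close enough."""
    pairs = []
    for a in left:
        best = min(right, key=lambda b: separation_arcsec(a, b), default=None)
        if best is not None and separation_arcsec(a, best) <= tolerance_arcsec:
            pairs.append((a["id"], best["id"]))
    return pairs


# Hypothetical optical and radio catalogs.
optical = [{"id": "o1", "ra": 180.0, "dec": 0.0},
           {"id": "o2", "ra": 181.0, "dec": 1.0}]
radio = [{"id": "r1", "ra": 180.0002, "dec": 0.0001},
         {"id": "r2", "ra": 200.0, "dec": -5.0}]

matches = cross_match(optical, radio, tolerance_arcsec=2.0)
print(matches)
```

The nested scan costs one comparison per pair of objects, so it only suits small examples; joining whole archives needs spatial indexing on each side.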
15 MyDB
Portal allows federation of data, but ...
–Intermediate results may be large
–Intermediate results feed into the next analysis step
–Sending them back and forth to the client is costly and sometimes infeasible
Solution: create a working DB for the client at the portal: MyDB
16 MyDB
Anyone can create a personal DB at the SkyServer portal
–It is about 100 MB
–It is private
Simple queries done immediately
Complex queries done by batch scheduler
All queries can create/read/write MyDB tables
Very popular with serious users
MyDB will be sharable by a group
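Note: the MyDB pattern is to keep intermediate results in a table next to the data, so the next analysis step reads that table instead of shipping rows to the client and back. The sketch below imitates this with an in-memory SQLite database standing in for the server side; the table names, columns, and values are hypothetical, and in the real portal the archive and the user's MyDB are separate databases.

```python
import sqlite3

# One in-memory database stands in for both the archive and the user's MyDB.
mydb = sqlite3.connect(":memory:")

mydb.execute(
    "CREATE TABLE photo_obj (obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
)
mydb.executemany(
    "INSERT INTO photo_obj VALUES (?, ?, ?, ?)",
    [(1, 180.0, 0.0, 17.2), (2, 180.1, 0.1, 21.5), (3, 180.2, -0.1, 16.8)],
)

# Step 1: store the intermediate result as a table on the server side.
mydb.execute(
    "CREATE TABLE bright AS "
    "SELECT obj_id, ra_deg, dec_deg, r_mag FROM photo_obj WHERE r_mag < 18"
)

# Step 2: the next analysis step reads the stored table; only the small
# summary (two numbers) travels back to the client.
count, mean_mag = mydb.execute("SELECT COUNT(*), AVG(r_mag) FROM bright").fetchone()
print(count, mean_mag)
```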
17 Open SkyQuery
SkyQuery is being adopted by AstroGrid as the reference implementation for OGSA-DAI (Open Grid Services Architecture, Data Access and Integration).
SkyNode: basic archive object
SkyQuery Language (VoQL) is evolving.
18 The WWT Components Outline
Data Sources
–Literature
–Archives
Unified Definitions
–Units
–Semantics/Concepts/Metrics, Representations
–Provenance
Object model
Classes and methods
Portals
WWT is a poster child for the Data Grid.
What we learned
Astro is a community of 10,000
–Homogeneous & cooperative
–If you can't do it for Astro, do not bother with 3M bio-info
Agreement
–Takes time
–Takes endless meetings
Big problems are non-technical
–Legacy is a big problem
Plumbing and tools are there, but ...
–What is the object model?
–What do you want to save?
–How to document provenance?