Information At Your Fingertips: Web Services. Jim Gray & Tom Barclay (Microsoft Research), Alex Szalay (Johns Hopkins University).

Presentation transcript:

1 Information At Your Fingertips Web Services Jim Gray & Tom Barclay Microsoft Research Alex Szalay Johns Hopkins University

2 Communications Excitement!!
                 Point-to-Point          Broadcast
Immediate        conversation, money     lecture, concert
Time Shifted     mail                    book, newspaper
Labels: NetWork + DB; DataBase
It's ALL going electronic.
Immediate is being stored for analysis (so ALL database).
Analysis & Automatic Processing are being added.
Slide borrowed from Craig Mundie

3 Information Excitement!
All information will be online (somewhere): text, speech, sound, vision, graphics, spatial, time…
You might record everything
–read: 10 MB/day, 400 GB/lifetime (5 disks today)
–hear: 400 MB/day, 16 TB/lifetime (2 disks/year today)
–see: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (150 disks/year maybe someday)
Information at Your Fingertips
–Make it easy to capture & present
–Make it easy to store & organize & access
–Make it easy to analyze & summarize
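The lifetime totals on this slide follow from multiplying each daily rate by a lifetime in days. A minimal sketch in Python, assuming decimal units and a lifetime of about 40,000 days (roughly 110 years); that lifetime is inferred from the slide's ratios (400 GB divided by 10 MB/day) and is not stated on the slide:

```python
# Assumption: ~40,000 days of recording, inferred from 400 GB / (10 MB/day).
LIFETIME_DAYS = 40_000

MB = 10**6
GB = 10**9
TB = 10**12

# Daily rates quoted on the slide, in bytes per day.
daily_rates = {
    "read": 10 * MB,
    "hear": 400 * MB,
    "see": 40 * GB,
}


def lifetime_bytes(bytes_per_day, days=LIFETIME_DAYS):
    """Total bytes accumulated at a constant daily rate."""
    return bytes_per_day * days


lifetime = {name: lifetime_bytes(rate) for name, rate in daily_rates.items()}

for name, total in lifetime.items():
    print(f"{name}: {total / TB:g} TB per lifetime")
```

Under these assumptions the totals come out to 0.4 TB, 16 TB, and 1600 TB (1.6 PB), matching the slide.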

4 How much information is there?
Soon everything can be recorded and indexed.
Most bytes will never be seen by humans.
Data summarization, trend detection, and anomaly detection are key technologies.
See Mike Lesk: How much information is there:
See Lyman & Varian: How much information
[Figure: scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers for A Photo, A Book, A Movie, All LoC books (words), All Books MultiMedia, and Everything Recorded]
Small prefixes: 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli

5 How do we get information today?
Human searches web (with an index)
Human browses pages

6 How do we get information tomorrow?
Agents gather and digest it for us.
Q: How?
A: Microsoft: Dot Net
–Discovery: UDDI, WSDL
–Explore: SOAP
[Diagram labels: My Agents, Digital Dashboard, Web Services, SOAP, WSDL]

7 How do you publish information?
Get the data.
Conceptualize the data schema.
Provide methods that return data subsets.
–Challenge: how much processing on your server?
Publish the schema and methods.
We are exploring these issues.
[Diagram labels: f, g, x, y…]

8 TerraServer Example
What is TerraServer?
–3 TB Internet Map DB available since June 1998
–USGS photo and topo maps of the US
–Integrated with Home Advisor
–Shows off SQL Server availability & scalability
–Designed for basic computer systems and low speed communications
What is TerraService?
–A .NET web service
–Makes TerraServer data available to other apps

9 TerraServer Background
Large database on the Web (3 TB)
Operational since June 1998
Public access to USGS topo maps (DRG) and aerial images (DOQ)
Designed for basic computer systems and low speed communications
Operated by Microsoft Corporation
Hardware provided by Compaq Computer
Data provided by US Geological Survey

10 Application Goals
Available: always, 24x7x % of the time
Programmable: .NET applications can integrate TerraServer data into their apps
BIG: 1 TB of data including catalog, temporary space, etc. (slide annotation: 3 TB)
PUBLIC: available on the world wide web
INTERESTING: to a wide audience
ACCESSIBLE: using standard browsers (IE, Netscape)
REAL: a LOB application (users can buy imagery)
FREE: cannot require an NDA or money from a user for access
FAST: usable at low (56 kbps) and high (T-1+) speeds
EASY: we do not want a large group to develop, deploy, or maintain the application

11 Demo
Show: photo, topo, gazetteer, demographics

12 Hardware
[Diagram labels: SQL\Inst1, SQL\Inst2, SQL\Inst3, Spare; volume labels E through S]
One SQL database per rack
Each rack contains 4.5 TB
261 total drives / 13.7 TB total
Meta Data: stored on 101 GB “Fast, Small Disks” (18 x 18.2 GB)
Imagery Data: stored on GB “Slow, Big Disks” (15 x 73.8 GB)
To Add GB Disks in Feb 2001 to create 18 TB SAN
8 Compaq DL360 “Photon” Web Servers
Fiber SAN Switches
4 Compaq ProLiant 8500 Db Servers

13 TerraServer Experience
Successful Web Site
–Met all 8 goals: interesting, big, real, public, fast, easy, accessible, and free
–High Availability: Windows Data Center & Compaq SAN Technology
–Top 1000 Web Site: continues to be popular
New Feature Requests
–Programmable access to meta-data
–User selectable image sizes, i.e. “a map server”
–Permission to use TerraServer data within server applications

14 What is a Web Service?
Web Service: a programmable application component accessible via standard Web protocols
SOAP: Web Service consumers can send and receive messages using XML
Contract Language: Web Services are defined in terms of the formats and ordering of messages
Discovery: you can ask a site for a description of the Web Services it offers
UDDI (Universal Description, Discovery, and Integration): provides a directory of services on the Internet
Open Internet Protocols: all these capabilities are built using open Internet protocols (XML & HTTP)
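To make "send and receive messages using XML" concrete, here is a minimal sketch in Python that builds a SOAP 1.1 request envelope with the standard library. The envelope namespace is the SOAP 1.1 one; the service namespace and the parameter names (placeName, maxItems) are hypothetical placeholders, not taken from the talk or from any real service contract. Only the method name GetPlaceList appears in the talk.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, for illustration only.
SERVICE_NS = "urn:example:terraservice"


def build_soap_request(method, params):
    """Return a SOAP 1.1 envelope (bytes) that calls `method` with simple parameters."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Parameter names below are assumptions made for the example.
request_xml = build_soap_request("GetPlaceList", {"placeName": "Seattle", "maxItems": 5})
print(request_xml.decode("utf-8"))
```

In a real client this envelope would be sent in an HTTP POST to the service endpoint, and the XML response would be parsed the same way. The contract (WSDL) is what tells the client which method and parameter names are valid.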

15 .NET TerraService Architecture
[Diagram labels: Existing DB Server (SQL TB Db, 705 m Rows); ADO.NET; TerraServer Web Service; OLEDB; Map Server Http Handler; Map UI Web Forms; Standard Browsers; Smart Clients; Windows Forms; .NET Framework; SOAP/XML; HTML; Image/jpeg]

16 TerraServer Web Services
Terra-Tile-Service
–Query Gazetteer
–Retrieve imagery meta-data
–Retrieve imagery
–Simple Projection conversions
Landmark-Service
–Geo-coded places, e.g. Schools, Golf Courses, Hospitals, etc.
–Place Polygons, e.g. Zip Codes, Cities, etc.
–Allows “overlay” information for Terra-Tile-Service applications
Clients can present TerraServer imagery in new ways.

17 Web Service Methods
Place Search
–GetPlaceFacts
–GetPlaceList
–GetPlaceListInRect
–CountPlacesInRect
Projection
–ConvertLonLatPtToUtmPt
–ConvertUtmPtToLonLatPt
–ConvertLonLatToNearestPlace
–GetTheme
–GetLatLonMetrics
Tile
–GetAreaFromPt
–GetAreaFromRect
–GetAreaFromTileId
–GetTileMetaFromLonLatPt
–GetTileMetaFromTileId
–GetTile (Image)
Landmark
–GetLandmarkTypes
–CountOfLandmarkPointsByRect
–GetLandmarkPointsByRect
–CountOfLandmarkShapesByRect
–GetLandmarkShapesByRect

18 Soil Viewer Uses TerraService

19 Custom End Product
Web Soil Data Viewer
XML Soil Report
Soil Interpretation Map

20 What Tom Showed You
Converted a Web Server
–HTML get post
–Server returns pictures to people
to a Web Service
–SOAP service
–returns XML self-describing data
–Application integrates data (Agriculture and Geo data)

21 Rosetta Stone
Distributed computing + basic services            Dot Net
Yellow Pages                                      UDDI – Universal Description, Discovery, and Integration
?                                                 Schema, XLANG
RPC – remote procedure call (CORBA, DCOM, RMI)    SOAP – Simple Object Access Protocol
IDL – interface definition language               WSDL – Web Services Description Language
XDR – eXternal Data Representation                XML – eXtensible Markup Language

22 Sky Server
–Like TerraServer: pictures of the sky.
–But also LOTS of data on each object: luminosity (multi-spectra), morphology, spectrum
So, it is a data mining application (a data mining web service).
Cross-correlation is challenging because
–Multi-resolution
–Data is dirty/fuzzy (error bars, cosmic rays, airplanes…)
–Time varying
15M Photo Objects, ~400 attributes
50K Spectro Objects, ~100 attributes + 30 lines

23 Astronomy Data
In the “old days” astronomers took photos.
Starting in the 1960s they began to digitize.
New instruments are digital (100s of GB/nite).
Detectors are following Moore’s law.
Data avalanche: double every year.
[Figure: total area of 3m+ telescopes in the world in m², and total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.]
Courtesy of Alex Szalay

24 Astronomy Data
Astronomers have a few Petabytes now.
–1 pixel (byte) / sq arc second ~ 4TB
–Multi-spectral, temporal, … → 1PB
They mine it looking for
–new (kinds of) objects, or more of interesting ones (quasars),
–density variations in 400-D space,
–correlations in 400-D space.
Data doubles every year.
Data is public after a year.
So, 50% of the data is public.
Some have private access to 5% more data.
So: 50% vs 55% access for everyone
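The 50% figure follows from exponential growth: if the total volume doubles each year, the data that existed one year ago is half of what exists now. A minimal sketch in Python, assuming smooth exponential growth (the function name and parameters are illustrative, not from the talk):

```python
def public_fraction(growth_per_year=2.0, embargo_years=1.0):
    """Fraction of all data collected so far that is older than the embargo.

    Assumes total volume grows as V(t) = V0 * growth_per_year**t, so the data
    older than the embargo is V(t - embargo) = V(t) / growth_per_year**embargo.
    """
    return 1.0 / growth_per_year ** embargo_years


print(public_fraction())          # doubling yearly, one-year embargo
print(public_fraction(2.0, 2.0))  # same growth, two-year embargo
```

With yearly doubling and a one-year embargo the function returns 0.5. A longer embargo shrinks the public share quickly, which is why the release delay matters so much under this growth rate.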

25 Astronomy Data
But… how do I get at that 50% of the data?
Astronomers have a culture of publishing.
–FITS files and many tools.
–Encouraged by NASA.
Publishing data “details” is difficult.
Astronomers want to do it, but it is VERY hard.
(What programs were used? What were the processing steps? How were errors treated?…)

26 Virtual Observatory
Premise: most data is (or could be) online.
So, the Internet is the world’s best telescope:
–It has data on every part of the sky
–In every measured spectral band: optical, x-ray, radio…
–As deep as the best instruments (1 year ago).
–It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no…).
–It’s a smart telescope: links objects and data to literature on them.

27 Virtual Observatory: The Age of Mega-Surveys
Large number of new surveys
–multi-TB in size, 100 million objects or more
–individual archives planned, or under way
–Data publication an integral part of the survey
–Software bill a major cost in the survey
Multi-wavelength view of the sky
–more than 13 wavelength coverage in 5 years
Impressive early discoveries
–finding exotic objects by unusual colors: L,T dwarfs, high-z quasars
–finding objects by time variability: gravitational micro-lensing
Surveys: MACHO, 2MASS, DENIS, SDSS, PRIME, DPOSS, GSC-II, COBE, MAP, NVSS, FIRST, GALEX, ROSAT, OGLE...
Slide courtesy of Alex Szalay, modified by Jim

28 Virtual Observatory: Federating the Archives
The next generation mega-surveys are different
–top-down design
–large sky coverage
–sound statistical plans
–well controlled/documented data processing
Each survey has a publication plan
Data mining will lead to stunning new discoveries
Federating these archives → Virtual Observatory
Slide courtesy of Alex Szalay

29 The Multiwavelength Crab Nebula Nova first sighted 1054 A.D. by Chinese Astronomers Now: Crab Nebula X-ray, optical, infrared, and radio Slide courtesy of Robert CalTech. Crab star 1053 AD

30 Exploring Parameter Space
Given an arbitrary parameter space:
–Data Clusters
–Points between Data Clusters
–Isolated Data Clusters
–Isolated Data Groups
–Holes in Data Clusters
–Isolated Points
Nichol et al
Slide courtesy of Robert CalTech.

31 Virtual Observatory and Education
In the beginning science was empirical.
Then theoretical branches evolved.
Now we have computational branches.
–The computational branch has been simulation
–It is becoming data analysis/visualization
The Virtual Observatory can be used to
–Teach astronomy: make it interactive, demonstrate ideas and phenomena
–Teach computational science skills and the process of scientific discovery

32 Sloan Digital Sky Survey
For the last 12 years, a group of astronomers has been building a telescope (with 90M$ from the Sloan Foundation, NSF, and a dozen universities).
Now data is arriving:
–250GB/nite (20 nights per year).
–100 M stars, 100 M galaxies, 1 M spectra.
Public data at
–5% of the survey, 600 sq degrees, 15 M objects, 60GB.
–This data includes most of the known high z quasars.
–It has a lot of science left in it, but… that is just the start.

33 Demo of Sky Server Alex built SkyServer (based on TerraServer design). Demo: famous places navigator data shopping cart spectrum SQL? ?

34 Virtual Observatory Challenges
Size: multi-Petabyte
40,000 square degrees is 2 Trillion pixels
–One band (at 1 sq arcsec): 4 Terabytes
–Multi-wavelength: Terabytes
–Time dimension: >> 10 Petabytes
–Need auto parallelism tools
Unsolved Meta-Data problem
–Hard to publish data & programs
–Hard to find/understand data & programs
Current tools inadequate
–new analysis & visualization tools
Transition to the new astronomy
–Sociological issues
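The size estimate on this slide can be reproduced with a short calculation. The sketch below, in Python, assumes 0.5 arcsecond pixels stored at 2 bytes each. Neither value is stated on the slide; they are simply one combination that yields about 2 trillion pixels and about 4 TB for 40,000 square degrees.

```python
ARCSEC_PER_DEGREE = 3600


def square_arcsec(square_degrees):
    """Convert an area in square degrees to square arcseconds."""
    return square_degrees * ARCSEC_PER_DEGREE ** 2


def survey_bytes(square_degrees, pixel_arcsec, bytes_per_pixel):
    """Bytes needed to image an area once, in one band, at the given pixel scale."""
    pixels = square_arcsec(square_degrees) / pixel_arcsec ** 2
    return pixels * bytes_per_pixel


sky_sq_arcsec = square_arcsec(40_000)

# Assumed values (not from the slide): 0.5 arcsec pixels, 2 bytes per pixel.
pixels_one_band = sky_sq_arcsec / 0.5 ** 2
one_band = survey_bytes(40_000, pixel_arcsec=0.5, bytes_per_pixel=2)

print(f"{sky_sq_arcsec:.3e} square arcseconds")
print(f"{pixels_one_band:.3e} pixels")
print(f"{one_band / 1e12:.2f} TB per band")
```

Multiplying the per-band figure by the number of bands and the number of repeat observations gives the multi-wavelength and time-dimension totals, which is how the estimate grows from terabytes to petabytes.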

35 3-steps to Virtual Observatory
Get SDSS and Palomar online
–Alex Szalay, Jan Vandenberg, Ani Thakar…
–Roy Williams, Robert Brunner, Julian Bunn
Do queries and crossID matches with CalTech and SDSS to expose
–Schema, Units,…
–Dataset problems
–the typical use scenarios.
Implement WebServices at CalTech and SDSS

36 The Challenges
How to federate the Archives to make a VO?
The hope: XML is the answer.
The reality: XML is syntax and tools. FITS on XML will be good, but… explaining the data will still be very difficult.
Define Astronomy Objects and Methods.
–Based on UDDI, WSDL, SOAP.
–Each archive is a service shows the idea.
–Working with Caltech (Brunner, Williams, Djorgovski, Bunn)
–But, how does data mining work?

37 SkyServer as a WebService
WSDL+SOAP just add details
Archive ss = new VOService(SkyServer);
Attributes A[] = ss.GetObjects(ra,dec,radius)
…
?? What are the objects (attributes…)?
?? What are the methods (GetObjects()...)?
?? What query language? SQL, XQuery…?
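The GetObjects(ra, dec, radius) call on this slide is a cone search: return every object within a given angular radius of a sky position. The Python sketch below shows one way such a method could behave, using an in-memory list in place of a real archive. The class names, fields, and sample coordinates are all hypothetical; the slide itself leaves the object model and query language as open questions.

```python
import math
from dataclasses import dataclass


@dataclass
class SkyObject:
    """Hypothetical catalog entry; a real archive would carry many more attributes."""
    obj_id: int
    ra: float   # right ascension, degrees
    dec: float  # declination, degrees


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


class InMemoryArchive:
    """Stand-in for a remote archive service; scans a list instead of a database."""

    def __init__(self, objects):
        self._objects = list(objects)

    def get_objects(self, ra, dec, radius):
        """Return objects within `radius` degrees of (ra, dec)."""
        return [o for o in self._objects
                if angular_separation(ra, dec, o.ra, o.dec) <= radius]


# Made-up sample data for illustration.
archive = InMemoryArchive([
    SkyObject(1, 185.00, 0.00),
    SkyObject(2, 185.05, 0.02),
    SkyObject(3, 190.00, 5.00),
])
nearby = archive.get_objects(ra=185.0, dec=0.0, radius=0.1)
print([o.obj_id for o in nearby])
```

A linear scan is only workable for small catalogs. With millions of objects, the service would need a spatial index or a database query behind the same method signature, which is part of the "how much processing on your server" question raised earlier in the talk.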

38 Summary
All information at your fingertips.
How do we publish information so that our agents can digest it?
Example: TerraServer -> TerraService
The Virtual Observatory Concept
–The Internet is the world's best telescope
For astronomy
For teaching astronomy and
For teaching computational science

40 TerraServer: 1st Generation Web Application
[Diagram labels: OS Services; Browsers; HTML “projects” UI to many client types; Servers; Data, Hosts; UI Logic; Biz Logic]

41 2nd Generation Web Application
[Diagram labels: “Stateful”; “Stateless” & “Geo-Scalable”; OS Services; App Logic Tier; Rich Client; UI Logic; Servers; Data, Hosts; Richer Browsers]
DHTML for better interactivity.

42 3rd Generation Web App
Open Internet Communications Protocols (HTTP, SMTP, XML, SOAP)
Applications Leverage Globally-Available Federated Web Services
Applications Become Programmable Web Services
[Diagram labels: Standard Browsers; Smarter Clients; Smarter Devices; OS Services; Application Tier Logic; App Logic & Web Service; Public Web Services; Building Block Services; Internal Services; Other Services; Servers; Data, Hosts; XML; HTML]

43 Production Application TerraServer Web Services