Microsoft Research: SkyServer
Jim Gray, Distinguished Engineer, Microsoft Research San Francisco
Microsoft Research Organization
Goal: advance the state of the art. More than 700 staff in 55 areas, with labs in the US, Europe, and Asia, and internationally recognized teams. A university organizational model: an open research environment, close ties to universities, and close working relations with development.
My Research Goal
Information at your fingertips: bring all scientific literature and data online, with a focus on large-database issues and scalable servers.
[Figure: facts flow in from experiments and instruments, simulations, literature, and other archives; questions go in and answers come out.]
World Wide Telescope
Premise: most astronomy data is online, so the Internet is the world's best telescope. It has data on every part of the sky, in every measured spectral band, as deep as the best instruments. It is up when you are up. The seeing is always great (no working at night, no clouds, no moons, no ...). It's a smart telescope: it links data with the literature.
SkyServer.SDSS.org
Built with Johns Hopkins University. A modern archive: raw data in file servers, and catalog data (derived objects) in a database of 10 billion records, 2 TB. Also used for education: 150 hours of online astronomy. Interesting features: it is based on web services, supports spatial data search, and has been cloned by other surveys (a design template).
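The slide lists spatial data search among SkyServer's features but does not say how it works. As a minimal sketch, assuming a small in-memory catalog of hypothetical (id, ra, dec) rows in degrees, the brute-force cone search below finds objects within an angular radius by comparing Cartesian unit vectors. This is a generic technique, not SkyServer's own indexing scheme; a 2 TB archive would need a spatial index in front of it.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical catalog row: (object_id, ra_deg, dec_deg).
Row = Tuple[int, float, float]


def unit_vector(ra_deg: float, dec_deg: float) -> Tuple[float, float, float]:
    """Convert equatorial coordinates in degrees to a Cartesian unit vector."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cone_search(rows: Iterable[Row], ra_deg: float, dec_deg: float,
                radius_deg: float) -> List[int]:
    """Return ids of objects within radius_deg of (ra_deg, dec_deg).

    Two points are within angle r of each other exactly when the dot product
    of their unit vectors is at least cos(r), so each row needs one dot product.
    """
    cx, cy, cz = unit_vector(ra_deg, dec_deg)
    min_dot = math.cos(math.radians(radius_deg))
    hits: List[int] = []
    for obj_id, ra, dec in rows:
        x, y, z = unit_vector(ra, dec)
        if cx * x + cy * y + cz * z >= min_dot:
            hits.append(obj_id)
    return hits
```

For example, `cone_search([(1, 180.0, 0.0), (2, 180.5, 0.0), (3, 190.0, 0.0)], 180.0, 0.0, 1.0)` returns `[1, 2]`. Working with unit vectors also avoids special cases at the RA 0/360 seam and near the poles.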
Service-Oriented Architecture: Data Federations of Web Services
Massive datasets live near their owners: near the instrument's software pipeline and applications, and near the data knowledge and curation. Each archive publishes a web service, with a schema that documents the data and methods on objects (queries). This gives uniform access to multiple archives through a common global schema, and scientists get personalized extracts.
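The idea of a common global schema with personalized extracts can be illustrated with a small sketch, assuming two hypothetical archives (`archive_a`, `archive_b`) whose local column names differ. Each archive's rows are renamed into one shared set of columns and then filtered into a per-scientist extract. The archive names, column names, and mapping tables are invented for illustration and are not taken from any real archive schema.

```python
from typing import Dict, List

Row = Dict[str, object]

# The shared column names every archive's rows are translated into.
GLOBAL_COLUMNS = ("obj_id", "ra", "dec", "mag")

# Hypothetical local-to-global column mappings for two invented archives.
ARCHIVE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "archive_a": {"objID": "obj_id", "ra": "ra", "dec": "dec", "rmag": "mag"},
    "archive_b": {"designation": "obj_id", "ra_deg": "ra", "dec_deg": "dec", "jmag": "mag"},
}


def to_global(archive: str, row: Row) -> Row:
    """Rename one archive-local row into the common global schema."""
    mapping = ARCHIVE_MAPPINGS[archive]
    out = {mapping[col]: val for col, val in row.items() if col in mapping}
    missing = [col for col in GLOBAL_COLUMNS if col not in out]
    if missing:
        raise ValueError(f"{archive} row lacks global columns: {missing}")
    return out


def personalized_extract(rows_by_archive: Dict[str, List[Row]], max_mag: float) -> List[Row]:
    """Merge rows from several archives, keeping objects no fainter than max_mag."""
    extract: List[Row] = []
    for archive, rows in rows_by_archive.items():
        for row in rows:
            record = to_global(archive, row)
            if float(record["mag"]) <= max_mag:  # smaller magnitude = brighter
                record["source"] = archive
                extract.append(record)
    return extract
```

Keeping the mapping tables outside the query code means adding an archive only requires publishing one more mapping, which mirrors the slide's point that each archive documents its own data through its schema.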
SkyQuery Structure
[Diagram: the SkyQuery portal connected to the 2MASS, INT, SDSS, and FIRST archives and to an image cutout service.]
Each SkyNode publishes a schema web service and a data query web service. The portal plans the query (in two phases), integrates the answers, and is itself a web service.
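One plausible reading of a two-phase query plan is sketched below, assuming hypothetical in-memory `SkyNode` objects: in phase 1 the portal asks each node how many objects it would contribute, and in phase 2 it visits the nodes from smallest to largest, passing the shrinking candidate set along so each join stays small. The slide does not say how SkyQuery actually orders nodes, and a real cross-match is positional; matching on a shared integer key here is a simplification.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


class SkyNode:
    """Hypothetical stand-in for one archive's query web service."""

    def __init__(self, name: str, objects: Dict[int, Record]):
        self.name = name
        self.objects = objects  # match key -> this archive's attributes

    def count(self) -> int:
        """Phase 1: report how many objects this node would contribute."""
        return len(self.objects)

    def tagged(self, key: int) -> Record:
        """Prefix this node's attribute names with the node name."""
        return {f"{self.name}.{col}": val for col, val in self.objects[key].items()}

    def join(self, candidates: Dict[int, Record]) -> Dict[int, Record]:
        """Phase 2: keep only candidates this node also holds, adding its attributes."""
        return {key: {**rec, **self.tagged(key)}
                for key, rec in candidates.items() if key in self.objects}


def plan_and_run(nodes: List[SkyNode]) -> Tuple[List[str], Dict[int, Record]]:
    """Order nodes by their phase-1 counts, then chain phase-2 joins smallest first."""
    if not nodes:
        raise ValueError("need at least one SkyNode")
    plan = sorted(nodes, key=lambda node: node.count())
    first = plan[0]
    result = {key: first.tagged(key) for key in first.objects}
    for node in plan[1:]:
        result = node.join(result)
    return [node.name for node in plan], result
```

With three nodes holding 4, 2, and 3 objects, the plan visits them in the order 2, 3, 4, so the chain starts from the smallest candidate set and the portal integrates a single combined answer.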
Federation: SkyQuery.Net
Combines 15 archives. You send a query to the portal, and the portal joins data from the archives.
Problem: users want to do multi-step data analysis, not just a single query. Solution: allow personal databases on the portal.
Problem: some queries are monsters. Solution: a batch scheduler on the portal server, which deposits the answer in the user's personal database.
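The personal-database and batch-scheduler flow can be sketched with Python's built-in `sqlite3` module, assuming a single in-memory database stands in for the federated archives and each user gets a private in-memory database on the portal. The `BatchScheduler` class and its methods are hypothetical; the sketch only shows the sequence described above: a long query is queued, run later, and its answer deposited as a table the user can keep querying.

```python
import sqlite3
from collections import deque
from typing import Deque, Dict, Tuple


class BatchScheduler:
    """Hypothetical portal-side queue that runs 'monster' queries in batch."""

    def __init__(self, archive: sqlite3.Connection):
        self.archive = archive  # stand-in for the federated archive data
        self.queue: Deque[Tuple[str, str, str]] = deque()  # (user, sql, output table)
        self.personal: Dict[str, sqlite3.Connection] = {}  # user -> personal database

    def submit(self, user: str, sql: str, out_table: str) -> None:
        """Queue a query so it runs later instead of blocking the web request."""
        if not out_table.isidentifier():
            raise ValueError("output table name must be a plain identifier")
        self.queue.append((user, sql, out_table))

    def personal_db(self, user: str) -> sqlite3.Connection:
        """Create the user's personal database on first use."""
        if user not in self.personal:
            self.personal[user] = sqlite3.connect(":memory:")
        return self.personal[user]

    def run_next(self) -> int:
        """Run the oldest queued query, deposit its answer, and return the row count."""
        user, sql, out_table = self.queue.popleft()
        cursor = self.archive.execute(sql)
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
        db = self.personal_db(user)
        column_list = ", ".join('"' + col + '"' for col in columns)
        placeholders = ", ".join("?" for _ in columns)
        db.execute(f'CREATE TABLE "{out_table}" ({column_list})')
        db.executemany(f'INSERT INTO "{out_table}" VALUES ({placeholders})', rows)
        db.commit()
        return len(rows)
```

A later analysis step can query the deposited table through `personal_db(user)` directly, without re-running the expensive federated query.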
Current Status: CERN-Pasadena
Multi-stream TCP/IP: 7.1 Gbps (~900 MBps), a new speed record (http://ultralight.caltech.edu/lsr-winhec/).
Single-stream TCP/IP: 6.5 Gbps (~800 MBps).
File transfer speed: ~450 MBps.
[Chart: record transfer speeds in Mbps (0 to 7,000), by year, 2000 to 2005.]
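The slide quotes each rate in both Gbps and MBps. Assuming decimal prefixes and 8 bits per byte, and ignoring protocol overhead, the conversion is a one-liner: 7.1 Gbps works out to 887.5 MBps and 6.5 Gbps to 812.5 MBps, consistent with the rounded ~900 and ~800 MBps figures, while the ~450 MBps file-transfer speed is about 3.6 Gbps. The function names are illustrative.

```python
def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Gigabits per second -> megabytes per second (decimal units, 8 bits/byte)."""
    return gbps * 1000 / 8


def mbytes_per_s_to_gbps(mbytes_per_s: float) -> float:
    """Megabytes per second -> gigabits per second (decimal units, 8 bits/byte)."""
    return mbytes_per_s * 8 / 1000


if __name__ == "__main__":
    for label, gbps in [("multi-stream TCP/IP", 7.1), ("single-stream TCP/IP", 6.5)]:
        print(f"{label}: {gbps} Gbps ~ {gbps_to_mbytes_per_s(gbps):.1f} MBps")
    print(f"file transfer: 450 MBps ~ {mbytes_per_s_to_gbps(450):.1f} Gbps")
```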
Challenge: Move Data from CERN to Remote Centers at 1 GBps
Disk-to-disk at gigabyte-per-second data rates: 80 TB/day, 30 petabytes by 2008, 1 exabyte by 2014.
[Diagram: tiered data distribution from the experiment (~PBps) through the CERN filter (~5 GBps) to Tier 1 centers (IN2P3, RAL, INFN, FNAL), then to Tier 2 institutes, Tier 3, and Tier 4 workstations with a physics data cache, over links of roughly 0.1 to 1 GBps; OC192 = 9.9 Gbps.]
Graphics courtesy of Harvey Newman, Caltech.
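The slide's volumes and rates can be cross-checked with simple arithmetic, assuming decimal units and a link that runs flat out around the clock: a sustained 1 GBps moves 86.4 TB per day, in line with the ~80 TB/day quoted, and an OC192 link at 9.9 Gbps carries at most about 1.24 GBps. The helper names below are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def tb_per_day(gbytes_per_s: float) -> float:
    """Terabytes moved per day at a sustained rate in GB/s (decimal units)."""
    return gbytes_per_s * SECONDS_PER_DAY / 1_000


def days_to_move(petabytes: float, gbytes_per_s: float) -> float:
    """Days needed to move a volume in PB at a sustained rate in GB/s."""
    return petabytes * 1_000_000 / gbytes_per_s / SECONDS_PER_DAY


def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Link rate in gigabits/s -> gigabytes/s, ignoring framing overhead."""
    return gbps / 8


if __name__ == "__main__":
    print(f"1 GBps sustained: {tb_per_day(1.0):.1f} TB/day")
    print(f"30 PB at 1 GBps: {days_to_move(30, 1.0):.0f} days")
    print(f"1 EB at 1 GBps: {days_to_move(1_000, 1.0) / 365:.1f} years")
    print(f"OC192 (9.9 Gbps): {gbps_to_gbytes_per_s(9.9):.2f} GBps")
```

At 1 GBps, 30 PB takes roughly 347 days and 1 EB roughly 32 years of continuous transfer.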
Summary
Microsoft Research is active inside and outside Microsoft. The World Wide Telescope is coming: it exemplifies service-oriented architecture, is built with web services and databases, and has interesting spatial database algorithms. 10 Gbps networking is coming, x64 is coming, and we are investing to make them real. Details on my website: http://research.microsoft.com/~Gray