What I have been Doing Peta Bumps 10k$ TB Scaleable Computing Sloan Digital Sky Survey.

Presentation transcript:

What I have been Doing Peta Bumps 10k$ TB Scaleable Computing Sloan Digital Sky Survey

Sense of scale: How fat is your pipe? Fattest pipe on MS campus is the WAN!
20 MBps: disk / ATM / OC3
90 MBps: PCI
94 MBps: coast to coast
300 MBps: OC48 = G2, or memcpy()

Redmond/Seattle, WA; San Francisco, CA; New York; Arlington, VA: 5626 km, 10 hops
Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA

The Path DC -> SEA
C:\tracert -d
Tracing route to over a maximum of 30 hops

Hop  RTT 1    RTT 2    RTT 3    Device                  Link           Location
     (source)                   DELL 4400 Win2K WKS     Alteon GbE     Arlington Virginia, ISI
1    16 ms    <10 ms   <10 ms   Juniper M40             GbE            Arlington Virginia, ISI Interface ISIe
2    <10 ms   <10 ms   <10 ms   Cisco GSR               OC48           Arlington Virginia, Qwest DC Edge
3    <10 ms   <10 ms   <10 ms   Cisco GSR               OC48           Arlington Virginia, Qwest DC Core
4    <10 ms   <10 ms   16 ms    Cisco GSR               OC48           New York, New York, Qwest NYC Core
5    62 ms    63 ms    62 ms    Cisco GSR               OC48           San Francisco, CA, Qwest SF Core
6    78 ms    78 ms    78 ms    Cisco GSR               OC48           Seattle, Washington, Qwest Sea Core
7    78 ms    78 ms    94 ms    Juniper M40             OC48           Seattle, Washington, Qwest Sea Edge
8    78 ms    79 ms    78 ms    Juniper M40             OC48           Seattle, Washington, PNW Gigapop
9    78 ms    78 ms    94 ms    Cisco GSR               OC48           Redmond Washington, Microsoft
10   ? ms     78 ms    94 ms    Compaq SP750 Win2K WKS  SysKonnect GbE Redmond Washington, Microsoft

Single-stream TCP/IP throughput: 750 Mbps over 5000 km (957 Mbps multi-stream)
~ 4e15 bit meters per second = 4 Peta bmps ("peta bumps"); 5 Peta bmps multi-stream
Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA

PetaBumps
751 Mbps for 300 seconds (~28 GB): single-thread, single-stream TCP/IP, desktop-to-desktop, out-of-the-box performance*
5626 km x 751 Mbps = ~4.2e15 bit meter / second ~ 4.2 Peta bmps
Multi-stream is 952 Mbps ~ 5.2 Peta bmps
4470 byte MTUs were enabled on all routers.
20 MB window size
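The single-stream arithmetic on this slide can be checked with a short Python sketch. The constants are the figures quoted in the slides. The 78 ms round-trip time is read off the tracert listing and is treated here as an assumption about the path during the run. The function names are illustrative and are not taken from speedy.c.

```python
# Figures quoted on the slides. RTT_S is an assumption taken from the
# tracert listing (78 ms to the far end), not a measured value for the run.
DISTANCE_M = 5626e3      # 5626 km path length, in meters
RATE_BPS = 751e6         # 751 Mbps single-stream throughput
DURATION_S = 300         # length of the run, in seconds
RTT_S = 0.078            # assumed round-trip time
WINDOW_BYTES = 20e6      # 20 MB TCP window


def bit_meters_per_second(rate_bps, distance_m):
    """The "bmps" metric: throughput multiplied by distance covered."""
    return rate_bps * distance_m


def bytes_moved(rate_bps, duration_s):
    """Total payload moved at a constant bit rate."""
    return rate_bps * duration_s / 8


def bandwidth_delay_product_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to keep a path of this RTT full."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    bmps = bit_meters_per_second(RATE_BPS, DISTANCE_M)
    print(f"{bmps:.2e} bit meters/second")                  # about 4.2e15
    print(f"{bytes_moved(RATE_BPS, DURATION_S) / 1e9:.1f} GB moved")
    bdp = bandwidth_delay_product_bytes(RATE_BPS, RTT_S)
    print(f"{bdp / 1e6:.1f} MB in flight; window is {WINDOW_BYTES / 1e6:.0f} MB")
```

Under these assumptions the bandwidth-delay product is roughly 7 MB, so the 20 MB window on the slide is comfortably larger than what a single stream needs to keep the path full.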

Pointers
The single-stream submission: Windows2000_I2_land_Speed_Contest_Entry_(Single_Stream_mail).htm
The multi-stream submission: Windows2000_I2_land_Speed_Contest_Entry_(Multi_Stream_mail).htm
The code: speedy.h, speedy.c
And a PowerPoint presentation about it: Windows2000_WAN_Speed_Record.ppt

What I have been Doing Peta Bumps 10k$ TB Scaleable Computing Sloan Digital Sky Survey

TPC-C: high performance clusters
Standard transaction processing benchmark.
Mix of 5 simple transaction types.
Database scales with workload.
Measures balanced system.

Scalability Successes
Single Site Clusters
–Billions of transactions per day
–Tera-Ops & Peta-Bytes (10 k node clusters)
–Micro-dollar/transaction
Hardware + Software advances
–TPC & Sort examples (2x/year)
–Many other examples

Outa gas?
Progress since Jan 99: Running out of gas?
50% better peak perf (not 2x)
2x better Price/Performance
At a cost ceiling: systems cost 7M$-13M$
June 98 result: hero effort (off-scale good!) (Compaq/Alpha/Oracle, 96 cpu, 8 node cluster, 102,542, 5/5/98)

Back on Schedule!
First proof point of commoditized scale-out
1.7x better performance
3x better price/performance
4M$ vs 7M$-13M$
Much more to do, but… great start!
2/17/00: back on schedule!!

Year 2000 Sort Results (Daytona and Indy)
Penny: 4.5 GB (45 M records) in 886 seconds on a $1010 Win2K/Intel system. HMsort: doc (74KB), pdf (32KB). Brad Helmkamp, Keith McCready, Stenograph LLC.
Minute: 7.6 GB in 60 seconds. Ordinal Nsort, SGI 32 cpu Origin, IRIX.
Minute: 21.8 GB (218 M records) in sec. NOW+HPVMsort, 64 nodes, WinNT. pdf (170KB). Luis Rivera, Xianan Zhang, Andrew Chien, UCSD.
TeraByte: 49 minutes. David Cossock, Sam Fineberg, Pankaj Mehra, John Peck. 68x2 Compaq Tandem, Sandia Labs.
TeraByte: 1057 seconds. SPsort, 1952 SP cluster, 2168 disks. Jim Wyllie. SPsort.pdf (80KB).
Datamation: 1 M records in .998 seconds (doc 703KB or pdf 50KB). Mitsubishi DIAPRISM Hardware Sorter with HP 4 x 550MHz Xeon PC server + 32 SCSI disks, Windows NT4. Shinsuke Azuma, Takao Sakuma, Tetsuya Takeo, Takaaki Ando, Kenji Shirai, Mitsubishi Electric Corp.

What's a Balanced System? (Diagram: System Bus, PCI Bus)

Rules of Thumb in Data Engineering
Moore's law -> an address bit per 18 months.
Storage grows 100x/decade (except 1000x last decade!)
Disk data of 10 years ago now fits in RAM (iso-price).
Device bandwidth grows 10x/decade, so need parallelism.
RAM:disk:tape price is 1:10:30 going to 1:10:10
Amdahl's speedup law: maximum speedup is (S+P)/S
Amdahl's IO law: bit of IO per instruction/second (TBps/10 TOP! 50,000 disks/10 teraOP: 100 M$)
Amdahl's memory law: byte per instruction/second (going to 10) (1 TB RAM per TOP: 1 TeraDollars)
PetaOps anyone?
Gilder's law: aggregate bandwidth doubles every 8 months.
5 Minute rule: cache disk data that is reused in 5 minutes.
Web rule: cache everything!
MS_TR_99_100_Rules_of_Thumb_in_Data_Engineering.doc
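The three Amdahl rules can be turned into a few lines of Python as a rough check on the slide's numbers. This is a sketch under stated assumptions: the speedup law is written in its usual form, with S the serial work and P the parallelizable work, and the 25 MBps per-disk rate is an assumption borrowed from the disk bandwidth figures elsewhere in the talk.

```python
def amdahl_speedup(serial, parallel, n):
    """Speedup on n processors when only the parallel part is divided up."""
    return (serial + parallel) / (serial + parallel / n)


def amdahl_max_speedup(serial, parallel):
    """Limit of the speedup as n grows without bound: (S+P)/S."""
    return (serial + parallel) / serial


def balanced_io_bytes_per_second(instructions_per_second):
    """Amdahl's IO law: one bit of IO per instruction per second."""
    return instructions_per_second / 8


def balanced_memory_bytes(instructions_per_second, bytes_per_ips=1):
    """Amdahl's memory law: one byte of RAM per instruction per second."""
    return instructions_per_second * bytes_per_ips


if __name__ == "__main__":
    print(amdahl_speedup(1, 9, 10), amdahl_max_speedup(1, 9))

    ten_teraops = 10e12
    io = balanced_io_bytes_per_second(ten_teraops)   # 1.25e12 bytes/second
    disk_bytes_per_second = 25e6                     # assumed per-disk rate
    print(io / disk_bytes_per_second, "disks to feed 10 teraOPS")
    print(balanced_memory_bytes(1e12) / 1e12, "TB of RAM per teraOP")
```

With 10% serial work, ten processors give a speedup of about 5.3 and no number of processors gives more than 10. At the assumed 25 MBps per disk, a balanced 10 teraOPS system needs 50,000 disks, which matches the slide.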

Cheap Storage
Disks are getting cheap: 7 k$/TB disks (25 40 GB disks, 230$ each)

Cheap Storage or Balanced System
Low cost storage (2 x 1.5k$ servers): 10K$ TB
–2x (1K$ system + 8x70GB disks + 100MbEthernet)
Balanced server (9k$/.5 TB)
–2x800 MHz (2k$)
–256 MB (500$)
–8 x 73 GB drives (4K$)
–Gbps Ethernet + switch (1.5k$)
–18k$ TB, 36K$/RAIDED TB
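The balanced-server price per terabyte follows from the slide's headline figures, as the Python sketch below shows. It treats "RAIDED" as plain mirroring (every byte stored twice). That is an assumption, though it is consistent with the slide's doubling from 18k$ to 36k$ per TB. The function name is illustrative.

```python
def cost_per_tb(system_cost_dollars, capacity_tb, copies=1):
    """Dollars per usable TB when every byte is stored `copies` times."""
    return system_cost_dollars * copies / capacity_tb


if __name__ == "__main__":
    # Balanced server from the slide: 9k$ for about 0.5 TB of raw disk.
    print(8 * 73, "GB raw capacity from 8 x 73 GB drives")
    print(cost_per_tb(9_000, 0.5), "$ per TB, unprotected")
    print(cost_per_tb(9_000, 0.5, copies=2), "$ per TB, mirrored (assumed RAID 1)")
```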

160 GB, 2k$ (now), 300 GB by year end.
4x40 GB IDE (2 hot pluggable)
–(1,100$)
SCSI-IDE bridge
–200$
Box
–500 MHz cpu
–256 MB SRAM
–Fan, power, Enet
–700$
Or 8 disks/box: 600 GB for ~3K$ (or 300 GB RAID)

Hot Swap Drives for Archive or Data Interchange 25 MBps write (so can write N x 74 GB in 3 hours) 74 GB/overnite = ~N x $/nite

Doing Studies of IO bandwidth
SCSI & IDE bandwidth
–~15-30 MBps sequential
–SCSI 10krpm ~ $
–IDE 7.2krpm ~ $
Get 2 disks for the price of 1
–More bandwidth for reads
–RAID
–10K$ raid TB by 2001

What I have been Doing Peta Bumps 10k$ TB Scaleable Computing Sloan Digital Sky Survey

The Sloan Digital Sky Survey
A project run by the Astrophysical Research Consortium (ARC)
Goal: To create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M
Data Size: 40 TB raw, 1 TB processed
The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study
SLOAN Foundation, NSF, DOE, NASA

Scientific Motivation Create the ultimate map of the Universe: The Cosmic Genome Project! Study the distribution of galaxies: What is the origin of fluctuations? What is the topology of the distribution? Measure the global properties of the Universe: How much dark matter is there? Local census of the galaxy population: How did galaxies form? Find the most distant objects in the Universe: What are the highest quasar redshifts?

First Light Images
Telescope: First light May 9th 1998
Equatorial scans

The First Stripes
Camera: 5 color imaging of >100 square degrees
Multiple scans across the same fields
Photometric limits as expected

SDSS Data Flow

SDSS Data Products
All raw data saved in a tape vault at Fermilab
Object catalog: 400 GB, parameters of >10^8 objects
Redshift Catalog: 1 GB, parameters of 10^6 objects
Atlas Images: 1.5 TB, 5 color cutouts of >10^8 objects
Spectra: 60 GB, in a one-dimensional form
Derived Catalogs: 20 GB (clusters, QSO absorption lines)
4x4 Pixel All-Sky Map: 60 GB, heavily compressed

Distributed Implementation (diagram labels: User Interface, Analysis Engine, Master, SX Engine, Objectivity Federation, and repeated Slave nodes, each labeled Objectivity and RAID)

What We Have Been Doing
Helping move the data to SQL
–Database design
–Data loading
Experimenting with queries on a 4 M object DB
–20 questions like "find gravitational lens candidates"
–Queries use parallelism, most run in a few seconds (auto parallel).
–Some run in hours (neighbors within 1 arcsec)
–EASY to ask questions.
Helping with an outreach website: SkyServer
Personal goal: Try datamining techniques to "re-discover" Astronomy
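A brute-force Python sketch of the "neighbors within 1 arcsec" question shows why that kind of query can take hours. It is not the project's SQL query. The `(id, ra, dec)` tuple layout and the function names are assumptions made for illustration, and the separation uses the standard haversine formula on the sphere.

```python
import math

ARCSEC_DEG = 1.0 / 3600.0   # one arcsecond, in degrees


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (degrees in and out)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def neighbors_within(objects, radius_deg=ARCSEC_DEG):
    """Return id pairs closer than radius_deg by comparing every pair once."""
    pairs = []
    for i in range(len(objects)):
        id_i, ra_i, dec_i = objects[i]
        for j in range(i + 1, len(objects)):
            id_j, ra_j, dec_j = objects[j]
            if angular_separation_deg(ra_i, dec_i, ra_j, dec_j) <= radius_deg:
                pairs.append((id_i, id_j))
    return pairs


if __name__ == "__main__":
    # Hypothetical objects: "a" and "b" are 0.5 arcsec apart, "c" is a degree away.
    sample = [("a", 10.0, 20.0), ("b", 10.0, 20.0 + 0.5 / 3600), ("c", 11.0, 20.0)]
    print(neighbors_within(sample))
```

The double loop compares every pair, so a 4 M object table implies about 8e12 comparisons. A real implementation would restrict the candidates with a spatial index or sky zones rather than compare all pairs.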

What I have been Doing Peta Bumps 10k$ TB Scaleable Computing Sloan Digital Sky Survey