1 THsort PennySort Award Ceremony, Beijing, China, 19 October 2002. Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge.


1 1 THsort PennySort Award Ceremony, Beijing, China, 19 October 2002. Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang. Trophy presentation by Jim Gray

2 2 Outline Penny Sort history and Award The need for long-range research Some long-range systems research goals. What I have been doing.

3 3 Benchmark History Wisconsin Bitton Boral DeWitt Turbyfill IBM TP 1-7 CA and Tony Lukes Debit Credit Gray Datamation Anon et al TPC-A MCC Boral &... TPC-B TPC-C TPC-W ? Teradata Bollinger &... TPC-D Sort PennySort MinuteSort

4 4 A Short History of Sort April Fools 1985: Datamation Sort –Sort 1M 100 B records –An IO benchmark: 15-min to 1 hr! 1993:{Minute | Penny}x{Daytona | Indy} 1998: TeraByte Sort Web site:

5 5 Ground Rules How much can you sort for a penny (in a minute). –Hardware and Software cost –Depreciated over 3 years –1M$ system gets about 1 second, –1K$ system gets about 1,000 seconds. –Time (seconds) = 946,080 / SystemPrice ($) Input and output are disk resident Input is –100-byte records (random data) –key is first 10 bytes. Must create output file and fill with sorted version of input file. Daytona (product) and Indy (special) categories
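The time budget in the ground rules follows from spreading the system price over three years of seconds. A minimal sketch of that arithmetic, assuming 365-day years (which is what the 946,080 constant implies); the function name is hypothetical and not part of the benchmark rules:

```python
# Sketch of the PennySort time budget, assuming 3-year depreciation
# and 365-day years, as implied by the 946,080 constant on the slide.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60      # 31,536,000
DEPRECIATION_YEARS = 3
CENTS_PER_DOLLAR = 100


def penny_time_budget_seconds(system_price_dollars, budget_cents=1):
    """Seconds of machine time that budget_cents buys on the given system."""
    lifetime_seconds = DEPRECIATION_YEARS * SECONDS_PER_YEAR   # 94,608,000
    price_cents = system_price_dollars * CENTS_PER_DOLLAR
    return budget_cents * lifetime_seconds / price_cents


if __name__ == "__main__":
    for price in (1_000_000, 1_000, 857, 672):
        print(f"${price:>9,}: {penny_time_budget_seconds(price):10.2f} s per penny")
```

Under these assumptions a 1M$ system gets just under 1 second and a 1K$ system gets about 946 seconds, matching the rounded figures on the slide.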

6 6 PennySort Hardware –266 MHz Intel PPro –64 MB SDRAM (10ns) –Dual Fujitsu DMA 3.2GB EIDE disks Software –NT workstation 4.3 –NT 5 sort Performance –sort 15 M 100-byte records (~1.5 GB) –Disk to disk –elapsed time 820 sec, cpu time = 404 sec

7 PennySort Daytona & Indy: 2.58 GB in 917 sec HMsort: Brad Helmkamp, Keith McCready, Stenograph LLC Intel 400Mhz 2 IDE disks

8 TB Sort Chris Nyberg Nsort SGI 32x Origin Minutes

9 Terabyte Sort Daytona: David Cossock, Sam Fineberg, Pankaj Mehra, John Peck Tandem/Sandia TSort: 68 CPU ServerNet 47 minutes Indy: IBM SPsort 408 nodes, 1952 cpu 2168 disks 17.6 minutes = 1057sec (all for 1/3 of 94M$, slice price is 64k$ for 4cpu, 2GB ram, 6 9GB disks + interconnect)

10 10 SP sort 2 – 4 GBps!

11 Sort Records 2002 (Penny, Minute, and TeraByte categories; Daytona and Indy entries for each)
Penny, Daytona: 9.8 GB in 1098 seconds, 105 million records, on an $857 Linux/Intel system. THsort, report as doc (128KB) or pdf (33KB). Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang, High Performance Institute, Dept. of Computer Science and Technology, Tsinghua University, Beijing, China.
Penny, Indy: 11.6 GB in 1380 seconds, 125 million records, on a $672 Linux/Intel system. DMsort, pdf (660KB), ps (950KB). Aaron Darling, Alex Mohr, U. Wisconsin, Madison.
Minute, Daytona: 12 GB in 60 seconds. Ordinal Nsort, SGI 32 cpu Origin, IRIX.
Minute, Indy: 21.8 GB in sec, 218 million records. NOW+HPVMsort, 64 nodes, WinNT, pdf. Luis Rivera, Andrew Chien, UCSD.
TeraByte, Daytona: 49 minutes. David Cossock, Sam Fineberg, Pankaj Mehra, John Peck, 68x2 Compaq & Sandia Labs.
TeraByte, Indy: 1057 seconds. SPsort, 1952 SP cluster, 2168 disks. Jim Wyllie. PDF SPsort.pdf (80KB).

12 12 The THsort Team (and friend)

13 13 Partly hardware Partly software Partly economics 2x/year! THsort ~ 1TB/$

14 14 Progress on Sorting Speedup comes from Moore's law: 40%/year. Processor/Disk/Network arrays: 60%/year (this is a software speedup). THsort ~1TB/$

15 15 Musings: PennySort=TBsort Sorts 1TB in 1Minute 2 pass so 3TB of disk = 10 disks if 330GB/disk = 5 GBps (if each disk is 50Mbps) So, 600 seconds (3TB/5GBps) So, node costs 1.5k$ Costs 100x that today, maybe in 4 years?
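The last two steps of the musing can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes, 1 GBps = 10^9 bytes/s), the slide's 5 GBps aggregate bandwidth taken as given, and the 946,080 penny-seconds constant from the ground rules; the function names are hypothetical.

```python
# Sketch of the "PennySort = TBsort" arithmetic from the slide.
# Assumptions: decimal units, and the 5 GBps aggregate figure taken as given.
PENNY_DOLLAR_SECONDS = 946_080   # from the ground rules: seconds = 946,080 / price($)

TB = 10**12
GB = 10**9


def transfer_seconds(total_bytes, aggregate_bytes_per_second):
    """Time to move total_bytes at the given aggregate disk bandwidth."""
    return total_bytes / aggregate_bytes_per_second


def max_price_for_penny(elapsed_seconds):
    """Largest system price (in $) whose penny budget covers elapsed_seconds."""
    return PENNY_DOLLAR_SECONDS / elapsed_seconds


if __name__ == "__main__":
    elapsed = transfer_seconds(3 * TB, 5 * GB)     # 3 TB of disk traffic
    print(f"elapsed: {elapsed:.0f} s")
    print(f"price ceiling: ${max_price_for_penny(elapsed):,.0f}")
```

With these inputs the elapsed time is 600 seconds and the price ceiling is about $1,577, which is consistent with the slide's "node costs 1.5k$".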

16 16 Outline Penny Sort history and Award The need for long-range research Some long-range systems research goals. What I have been doing.

17 17 Properties of a Research Goal Simple to state. Not obvious how to do it. Clear benefit. Can be broken into smaller steps –So that you can see intermediate progress. Progress and solution is testable.

18 18 I was motivated by a simple goal 1. Devise an architecture that scales up: Grow the system without limits *. This is impossible (without limits?), but... This meant automatic parallelism, automatic management, distributed, fault tolerant, high performance Benefits: –long term vision guides research problems –simple to state, so attracts colleagues and support –Can tell your friends & family what it is that you do. scaleup: 1,000,000 : 1

19 19 Three Seminal Papers Babbage: Computers Bush: Automatic Information storage & access Turing: Intelligent Machines Note: –Previous Turing lectures described several theory problems. –Problems here are systems problems. –Some include an "and prove it" clause. –They are enabling technologies, not applications. –Newell's Intelligent Universe (ubiquitous computing) is missing because I could not find simple-to-state problems.

20 20 Charles Babbage ( ) Babbage's computing goals have been realized –But we still need better algorithms & faster machines What happens when –Computers are free and infinitely powerful? –Bandwidth and storage are free and infinite? Remaining limits: –Content: the core asset of cyberspace –Software: Bugs, >100$ per line of code (!) –Operations: > 1,000 $/node/year

21 21 ops/s/$ Had Three Growth Curves Mechanical Relay 7-year doubling Tube, transistor, year doubling Microprocessor 1.0 year doubling Combination of Hans Moravec + Larry Roberts + Gordon Bell WordSize*ops/s/sysprice

22 22 Trouble-Free Appliances Appliance just works. TV, PDA, desktop,... State replicated in safe place (somewhere else) If hardware fails, or is lost or stolen, replacement arrives next day (plug&play). If software faults, software and state refresh from server. If you buy a new appliance, it plugs in and refreshes from the server (as though the old one failed) Most vendors are building towards this vision. Browsers come close to working this way.

23 23 Trouble-Free Systems Manager –Sets goals –Sets policy –Sets budget –System does the rest. Everyone is a CIO (Chief Information Officer) 9. Build a system –used by millions of people each day –Administered and managed by a ½ time person. On hardware fault, order replacement part On overload, order additional equipment Upgrade hardware and software automatically.

24 24 Trustworthy Systems Build a system used by millions of people that 10. Only services authorized users Service cannot be denied (can't destroy data or power). Information cannot be stolen. 11. Is always available: (out less than 1 second per 100 years = 8 9s of availability) 1950s 90% availability, Today 99% uptime for web sites, 99.99% for well managed sites (50 minutes/year) 3 extra 9s in 45 years. Goal: 5 more 9s: 1 second per century. –And prove it.
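The availability figures on this slide are easier to compare when "nines" are converted to downtime. A rough sketch, assuming a 365.25-day year and treating each added 9 as a tenfold cut in unavailability; the function names are hypothetical and the results are order-of-magnitude checks, not the slide's own calculation:

```python
# Sketch: convert "number of nines" of availability into expected downtime.
# Assumption: a 365.25-day year; each extra nine divides downtime by 10.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # 31,557,600


def downtime_seconds_per_year(nines):
    """Expected downtime per year for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)        # e.g. 4 nines -> 0.0001
    return SECONDS_PER_YEAR * unavailability


def downtime_seconds_per_century(nines):
    """Same quantity accumulated over 100 years."""
    return 100 * downtime_seconds_per_year(nines)


if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        per_year = downtime_seconds_per_year(n)
        print(f"{n} nines: {per_year / 60:12.4f} min/year, "
              f"{downtime_seconds_per_century(n):14.2f} s/century")
```

Under these assumptions, four nines (99.99%) works out to roughly 53 minutes of downtime per year, close to the slide's 50 minutes, and nine nines comes to a few seconds per century.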

25 $ line of code? 1 bug per thousand lines? 20 $ to design and write it. 30 $ to test and document it. 50 $ to maintain it. 100$ total The only thing in Cyber Space that is getting MORE expensive & LESS reliable Application generators: Web sites, Databases,... Semi-custom apps: SAP, PeopleSoft,.. Scripting & Objects JavaScript & DOM Solution so far: Write fewer lines High level languages Non Procedural 10x not 1,000x better Very domain specific

26 26 Automatic Programming Do What I Mean (not 100$ Line of code!, no programming bugs) The holy grail of programming languages & systems 12. Devise a specification language or UI 1. That is easy for people to express designs (1,000x easier), 2. That computers can compile, and 3. That can describe all applications (is complete). System should reason about application –Ask about exception cases. –Ask about incomplete specification. –But not be onerous. This already exists in domain-specific areas. (i.e. 2 out of 3 already exist) An imitation game for a programming staff.

27 27 Outline Penny Sort history and Award The need for long-range research Some long-range systems research goals. What I have been doing.

28 28 What I Have Been Doing Traveling & Talking Helping Alex Build the SkyServer Loading data Helping build the Virtual Observatory Doing spatial geometry in SQL (no kidding)! Learning about web services (and implementing some)

