Gordon Bell Bay Area Research Center Microsoft Corporation


1 Gordon Bell Bay Area Research Center Microsoft Corporation
More Wheels of Reincarnation, or a New PC+, www+ Era? Infinite processing, memory, and bandwidth @ zero cost

2 The Highly Probable Future c2025 83 items from J. Coates, Futurist, Vol. 84, 1994
8.4 B English-speaking, personally tagged & identified, prosthetic-assisted and/or mutant, tense people who have access to & control of their medical records
Everything will be smart and responsive to its environment. Sensing of everything… a challenge for science & engineering!
Fast broadband network
Smart appliances & AI
Tele-all: shop, vote, meet, work, etc.
Robots do everything, but there may be conflict with labor…
A “managed” physical and man-made world
Reliable weather reports
“Many natural disasters, e.g. floods, earthquakes, will be mitigated, controlled or prevented”
Nobel prize to an “economist” for the “value of information”
No surprises. We can see 10 years out, but not 20!

3 IP On Everything

4 poochi


6 The only thing that matters at the end of the day is that it’s a great building.

7 PC At An Inflection Point
Non-PC devices and Internet PCs

8 The Dawn of the PC-Plus Era, Not the Post-PC Era… devices aggregate via PCs!!! Consumer PCs; Mobile Companions; TV/AV; Household Management; Communications; Automation & Security

9 Copyright 1999 Microsoft Corporation
PCTV a.k.a. MilliBill: Using PCs to drive large screens, e.g. TV sets, plasma panels

10 Another big bang? Internet to TV and audio: The Net, PC meet the TV
“milliBill” settop box; analog/digital cable distribution; home CATV; Ethernet home network. Basic ideas: 1. PC records or plays through video cable channels. 2. PC “broadcasts” art images, webcams, presentations, videos, DVDs, etc. 3. Ethernet, not cable? Video capture; PC broadcasts are mixed into home CATV in analog and/or MPEG digital

11 Images from: a gallery that sells art online


13 The Next Convergence: POTS connects to the Web, a.k.a. Phone-Web Gateways
(Diagram: Web Server; PSTN; The Web; Voice-to-Web Bridge; Database)

14 PC will prevail for the next decade as the dominant platform… it’s COTS or COTS’ AND www!
Moore’s Law increases performance and, alternatively, reduces prices
PC server clusters with a low-cost OS beat proprietary switches, SMPs, and DSMs
Home entertainment & control…
Very large disks (1 TB by 2005) to “store everything” personal
Screens to enhance use
Lack of last-mile bandwidth to move pictures and data, and to interact, favors home mainframes a.k.a. PCs
C = Commercial; C’ = Consumer

15 My betting record: no losses… so far (>5-year-old bets)
Not TMC & MPP domination by 1995 … c1990 with Danny Hillis
Video on demand will not exist by 1995
AT&T acquisition of NCR will not be successful
Not 10K by 1/2001
Not 1 B Internet users by 1/2001 or 1/2002
Cars won’t drive themselves by 2005
PCs continue with 2-digit growth through 2002

16 Outline Future predictions… 2020 and the world
Caveat: how far out can we see? WWW is just >5 years old
Background: Bell-Gray c1995, our bet on SNAP
My own history of supercomputing… from last Salishan
The hardware scene in 5-10 years? Processing and Moore’s Law; networking; disks
Challenges: OSS; communities with databases & high-speed nets; ASP workbenches
If simulation is the third mode after theory and experiment, what is the fourth? Connection with the experimental world for data, then control… a biologist’s workbench where work is being done.

17 SNAP … c1995: Scalable Network And Platforms, A View of Computing. We all missed the impact of WWW! This talk / essay portrays our view of computer-server architecture trends. (It is silent on the client cellphones, toasters, and gameboys.) This is an early draft. We are sending a copy to you in hopes that you’ll read and comment on it. We would like to publish it in several forms: a 2-hour video lecture, a kickoff article in a ComputerWorld issue that Gordon is editing, and a monograph, enlarged, to be published within a year. January 1, 1995. Gordon Bell, Jim Gray

18 How Will Future Computers Be Built?
Thesis: SNAP, Scalable Networks and Platforms. Upsize from desktop to world-scale computer based on a few standard components. Because: Moore’s law (exponential progress); standardization & commoditization; stratification and competition. When: sooner than you think! Massive standardization gives massive use; economic forces are enormous.

19 Performance versus time for various microprocessors
Moore's law has implications for speed because transistors are smaller and faster. Increases in microprocessor cache size and parallelism come from having more transistors. The result has been a quadrupling of speed every 3 years. This is fortunate, since Amdahl posited that one megabyte of memory is needed for every million instructions per second the processor runs. Speed has increased at 60% per year since the late 1980s to keep up with larger memory chips.
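The two rules of thumb above can be checked against each other; a minimal sketch (the 100 MIPS baseline is an arbitrary illustration, not a figure from the slide):

```python
# Cross-checking the slide's two rules of thumb: 60%/year speed growth
# (which compounds to ~4x every 3 years) and Amdahl's 1 MB-per-MIPS balance.

def speed_after(years, base_mips=100.0, annual_growth=0.60):
    """Processor speed in MIPS after compounding 60%/year growth."""
    return base_mips * (1 + annual_growth) ** years

def balanced_memory_mb(mips, mb_per_mips=1.0):
    """Amdahl's balance rule of thumb: one megabyte of memory per MIPS."""
    return mips * mb_per_mips

mips = speed_after(3)                   # three years of 60%/year growth
print(round(mips / 100, 2))             # 4.1 -- i.e. ~4x, "quadrupling every 3 years"
print(round(balanced_memory_mb(mips)))  # 410 -- a balanced machine then wants ~410 MB
```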

20 Volume drives simple, cost to standard platforms
Stand-alone desktop PCs. This illustrates the power of volume production. If you assume that power increases linearly with the number of processing elements, then several different platforms can supply power. The most cost-effective is a gang of PCs; Microsoft's scalable video-on-demand server, Tiger, demonstrates this. Using workstations is more expensive, and a higher-speed interconnect adds a marginal amount. The various multiprocessors are more expensive than LAN-connected workstations; multis cost about the same as massively parallel computers. The second most cost-effective platforms are the small multiprocessors that use the PC's microprocessor. Thus, smaller, high-volume platforms beat larger, more specialized multiprocessors.

21 The economics of operating systems and databases

22 The Virtuous Economic Cycle drives the PC industry… & Beowulf
(Cycle diagram: competition, volume, standards, utility/value, innovation; DOJ)

23 The UNIX Trap: creating the myth of “open systems”
“Standard” has meant different! VendorIX platforms have created the “downsizing” market that provides an apparent cost reduction
Hardware platform vendors lock in users with servers running proprietary UNIX dialects and unique chips, to maintain margins for chip and UNIX development
VendorIX R&D costs ≈ $1.4-2 billion
Implied selling price ≈ $10 billion for $1.4 billion, or a sales tax on 1 million UNIX units of $10,000
Users held hostage with client-server, database, and apps
An implicit or unconscious cartel has formed that maintains the industry status quo

24 The UNIX Cartel and Tax: it’s not competitive and it introduces higher downstream costs
≈10,000 companies maintain dialects
R&D costs ≈ $1.4-2 billion
Implied selling price ≈ $10 billion for $1.4 billion, or a sales tax on 1 million UNIX units of $10,000
Cost could be reduced to ≈ $400 million for ONE UNIX; the sales price for 1 million units would be $2,000. NT sales price is $650; OS/2 needs to sell for $1.2b/6m
Furthermore: the downstream effect on database vendors is 40% R&D efficiency, causing an implied database tax of 2.5x the sales price! The downstream effect on apps vendors is similar
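The "UNIX tax" arithmetic can be reconstructed from the slide's own figures; a sketch, where the $10 billion total is inferred from the "1 million units of $10,000" line (the slide's stated total is garbled):

```python
# The cartel-tax arithmetic as stated on the slide; the $10B total is the
# product of the slide's unit count and per-unit "tax".
units = 1_000_000
tax_per_unit = 10_000
implied_revenue = units * tax_per_unit   # $10B of sales to recover...
fragmented_rd = 1.4e9                    # ...~$1.4B of duplicated VendorIX R&D
print(implied_revenue / 1e9)             # 10.0 (billions of dollars)

one_unix_rd = 400e6                      # slide: reduced to ~$400M for ONE UNIX
print(one_unix_rd / units)               # 400.0 dollars/unit, vs. NT at $650
```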

25 SNAP Architecture
With this introduction about technology, computing styles, and the chaos and hype around standards and openness, we can look at the Network & Nodes architecture I posit.

26 Computing SNAP Environment circa ­ 2000
(Diagram: a space-, time- (bandwidth-), and generation-scalable environment: legacy mainframe & minicomputer servers & terminals; portables; mobile nets; NT, Windows & UNIX person servers*; centralized & departmental scalable uni- & mP servers* (NT & UNIX); multicomputers built from multiple simple servers; TC=TV+PC home (CATV or ATM or satellite); NFS, database, compute, print, & communication servers; wide-area global ATM network; local & global data comm world; ATM† & Local Area Networks for terminals, PCs, workstations, & servers; universal high-speed data service using ATM or ??. *Platforms: x86, PowerPC, etc. † also Mb/s pt-to-pt Ethernet)
First, the network is uniform and ubiquitous. It links to and is compatible with mobile networks. The many kinds of networks need to become one: distributed and point-to-point Local Area Networks, private and public Wide Area Networks, proprietary terminal and cluster interconnects and protocols, and Plain Old Telephone Service, including the telephony switching fabric. If this isn't enough, then the cable networks using broadband, broadcast technology would also adopt ATM and inter-operate with the phone and data networks. However, let me not predicate the network on having to carry switched television, although in principle it could and may. Replacing the cable and broadcast television network is a question of plant and equipment investment, and government regulation.

27 Computing SNAP built entirely from PCs
(Diagram: a space-, time- (bandwidth-), and generation-scalable environment built entirely from PCs: legacy mainframe & minicomputer servers & terminals; portables; mobile nets; wide-area global network; wide & local area networks for terminals, PCs, workstations, & servers; person servers (PCs); scalable computers built from PCs; centralized & departmental servers built from PCs; TC=TV+PC home (CATV or ATM or satellite))
Here's a much more radical scenario, but one that seems very likely to me. There will be very little difference between servers and the person servers, or what we mostly associate with clients. This will come because economy of scale is replaced by economy of volume. The largest computer is no longer cost-effective. Scalable computing technology dictates using the highest-volume, most cost-effective nodes. This means we build everything, including mainframes and multiprocessor servers, from PCs!

28 GB with NT, Compaq, & HP cluster

29 In a decade we can/will have:
more powerful personal computers: processing x; multiprocessors-on-a-chip
4x resolution (2K x 2K) displays to impact paper; large, wall-sized and watch-sized displays
low-cost storage of one terabyte for personal use
adequate networking? PCs now operate at 1 Gbps
ubiquitous access = today’s fast LANs; competitive wireless networking
one-chip, networked platforms, e.g. light bulbs, cameras everywhere, & managed by PCs!
some well-defined platforms that compete with the PC for mind (time) and market share: watch, pocket, body implant, home
inevitable, continued cyberization… the challenge: interfacing platforms and people.

30 High Performance Computing
A 60+ year view


32 Star Bridge

33 Linux super howls

34 Dead Supercomputer Society

35 Dead Supercomputer Society
ACRI Alliant American Supercomputer Ametek Applied Dynamics Astronautics BBN CDC Convex Cray Computer Cray Research Culler-Harris Culler Scientific Cydrome Dana/Ardent/Stellar/Stardent Denelcor Elexsi ETA Systems Evans and Sutherland Computer Floating Point Systems Galaxy YH-1 Goodyear Aerospace MPP Gould NPL Guiltech Intel Scientific Computers International Parallel Machines Kendall Square Research Key Computer Laboratories MasPar Meiko Multiflow Myrias Numerix Prisma Tera Thinking Machines Saxpy Scientific Computer Systems (SCS) Soviet Supercomputers Supertek Supercomputer Systems Suprenum Vitesse Electronics

36 Steve Squires & Cray

37 Bell Prize and Future Peak Tflops (t)
(Chart: Bell Prize and projected peak Tflops over time; machines include NEC, CM2, XMP, nCube; *IBM Petaflops study target)

38 Top 10 tpc-c: the top two Compaq systems are 1.1x & 1.5x faster than IBM SPs, at 1/3 the price of IBM and 1/5 the price of Sun

39 Courtesy of Dr. Thomas Sterling, Caltech

40 Courtesy of Dr. Thomas Sterling, Caltech

41 Contributions of Beowulf
An experiment in parallel computing systems
Established a vision: low-cost high-end computing
Demonstrated effectiveness of PC clusters for some (not all) classes of applications
Provided networking software
Provided cluster management tools
Conveyed findings to a broad community via tutorials and the book
GB: Provided a design standard to rally the community! Standards beget books, trained people, software… a virtuous cycle
Courtesy of Dr. Thomas Sterling, Caltech

42 High performance architectures timeline
(Timeline: vacuum tubes, transistors, MSI (minis), micros, RISC, nMicros; “IBM PC”. Processor overlap, lookahead; “killer micros”. Cray era: Cray 1, X, Y, C, T; functional pipelines, Vector, SMP. SMP mainframes, “multis”, DSM?? (Mmax., KSR, SGI). Clusters: Tandem, VAX, IBM, UNIX. MPP if n> : nCube, Intel, IBM. Local NOW and global networks, n>10,000: Grid)

43 High performance architectures timeline
(Timeline, as above, annotated with programming eras: vacuum tubes, transistors, MSI (minis), micros, RISC, nMicros; “IBM PC”. Sequential programming (single execution stream, e.g. Fortran); processor overlap, lookahead; “killer micros”. Cray era: Cray 1, X, Y, C, T; functional pipelines, Vector, SMP. SMP mainframes, “multis”, DSM?? (Mmax., KSR, DASH, SGI). SIMD; Vector // parallelization. THE NEW BEGINNING: parallel programs a.k.a. cluster computing; multicomputers; MPP era. Clusters: Tandem, VAX, IBM, UNIX. MPP if n> : nCube, Intel, IBM. Local NOW, Beowulf, and global networks, n>10, Grid)

44 High performance architecture/program timeline
(Timeline: vacuum tubes, transistors, MSI (minis), micros, RISC, nMicros. Sequential programming (single execution stream); SIMD; Vector // parallelization; parallel programs a.k.a. cluster computing; multicomputers; MPP era; ultracomputers 10x in size & price! 10x MPP “in situ” resources; 100x in //sm; NOW; VLSCC; geographically dispersed Grid)

45 -------- Connectivity--------
(Chart: computer types vs. connectivity, spanning WAN/LAN, SAN, DSM, SM: networked supers… GRID; VPPuni; NEC mP; NEC super; Cray X…T (all mPv); clusters of micros and vector machines; Legion, Condor, Beowulf, NT clusters, T3E, SP2 (mP), NOW, SGI DSM clusters & mainframes; multis, WSs, PCs)

46 Technical computer types: Pick of: 4 nodes, 2-3 interconnects
(Chart: a pick of 4 nodes and 2-3 interconnects across SAN, DSM, SMP: Fujitsu, Hitachi, NEC; NEC super; Cray ???; micros and vector machines; IBM ?PC?; SGI cluster; Beowulf/NT; SGI DSM; T3; HP?; HP; IBM; Intel; Sun; plain old PCs)

47 Technical computer types
(Chart: technical computer types across WAN/LAN, SAN, DSM, SM. New world: clustered computing (multiple program streams); old world: one program stream. Networked supers… GRID; VPPuni; NEC mP; T series; NEC super; Cray X…T (all mPv); micros; vector; Legion, Condor, Beowulf, SP2 (mP), NOW, SGI DSM clusters & mainframes; multis, WSs, PCs)

48 Technical computer types
(Chart: the same taxonomy annotated with programming models: MPI, Linda, PVM, Cactus, ??? for distributed-function computing; vectorize; parallelize. Networked supers… GRID; VPPuni; NEC mP; T series; NEC super; Cray X…T (all mPv); micros; vector; Legion, Condor, Beowulf, SP2 (mP), NOW, SGI DSM clusters & mainframes; multis, WSs, PCs)

49 Gaussian Parallelism

50 Beyond Moore’s Law …>10 yrs
Just FCB (faster, cheaper, better)… COTS will soon mean consumer off the shelf
Moore’s Law and technology progress are likely to continue for another decade: processing & memory, storage, LANs, & WANs are all still evolving
Systems-on-a-chip of interesting sizes will emerge to create ~0-cost systems
No DNA, molecular, or quantum computers, or new stores
Any displacement technology is unlikely… Carver Mead’s Law c1980: a technology takes 11 years to get established
On the other hand, we are on Internet time!

51 High Performance Computing
Supers as we knew them are Japanese… we have to stay the course. We actually may win!
PC will continue to erode capacity need
Scalability & COTS are in… but you have to roll your own or else pay VendorIX taxes
Beowulf is $14K/TB (6 x 4 x 40 GB)
IBM 4000R, 1 rack: 2x42 500 MHz processors, 84 GB, 84 disks, $420K… still cheaper than the “big buys”
$10-20K/node for special purpose vs $2K for a Mac
EMC, IBM at $1 million/TB, vs $14K
We should back radical experiments!
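The storage-price claims on this slide can be checked directly; a sketch using the slide's own figures (the price-gap ratio is my derivation):

```python
# Beowulf storage cost vs. "big iron" storage cost, from the slide's numbers.
beowulf_gb = 6 * 4 * 40             # 6 nodes x 4 drives x 40 GB
beowulf_price = 14_000              # "$14K/TB"
big_iron_per_tb = 1_000_000         # "EMC, IBM at $1 million/TB"
per_tb = beowulf_price / (beowulf_gb / 1000)
print(beowulf_gb)                   # 960 -- just under a terabyte
print(round(per_tb))                # 14583 dollars/TB, i.e. roughly $14K/TB
print(round(big_iron_per_tb / per_tb))  # 69 -- a ~70x price gap
```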

52 We get more of everything

53 Computer ops/sec x word length / $

54 Growth of microprocessor performance
(Chart: growth of microprocessor performance, Mflop/s vs. year, 1980-1998, log scale 0.01 to 10000. Supers: Cray 1S, Cray X-MP, Cray Y-MP, Cray 2, Cray C90, Cray T90. Micros: 8087, 80287, 6881, 80387, R2000, i860, RS6000/540, RS6000/590, Alpha.)

55 Albert Yu predictions ‘96
(Table: projected multipliers for clock (MHz), millions of transistors, Mops, and die size (sq. in.) over time)

56 Processor Limit: DRAM Gap
(Chart: processor-memory performance gap, 1980-2000, log scale 1 to 1000. CPU “Moore’s Law” line grows 60%/yr; DRAM grows 7%/yr; the gap grows ~50%/year. Y-axis is performance, x-axis is time.)
Latency cliché: note that x86 didn’t have cache on chip until Alpha full cache miss / instructions executed: ns/1.7 ns = 108 clks, x 4, or 432 instructions. Caches in Pentium Pro: 64% area, 88% transistors. *Taken from Patterson-Keeton talk to SIGMOD
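The "gap grows ~50%/year" annotation follows from the two growth rates on the chart, and the 432-instruction figure from the quoted Alpha numbers; a sketch (the memory latency itself is garbled on the slide, but 108 clocks at a 1.7 ns cycle implies roughly 184 ns):

```python
# Processor-memory gap growth and Alpha cache-miss cost, from the slide's figures.
cpu_growth, dram_growth = 1.60, 1.07        # 60%/yr vs. 7%/yr
gap_growth_pct = (cpu_growth / dram_growth - 1) * 100
print(round(gap_growth_pct))                # 50 -- percent per year

miss_clocks, issue_width = 108, 4
print(miss_clocks * issue_width)            # 432 -- instruction slots lost per full miss
print(round(miss_clocks * 1.7))             # 184 -- implied memory latency in ns
```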

57 Sony PlayStation export limits

58 Things get cheaper

59 Exponential change of 10X per decade causes real turmoil!
(Chart: system price in $K, log scale 0.01 to 100000, vs. time, for timeshared systems and single-user systems, with memory sizes from 16 KB to 8 MB)

60 VAX Planning Model 1975: I didn’t believe it
The model was very good: 1978 timeshared $250K VAXen cost about $8K in 1997! Costs declined >20%/year. Users got lots more memory than I predicted. Single-user systems didn’t come down as fast, unless you consider PDAs. VAX ran out of address bits!

61 System-on-a-chip alternatives
FPGA: sea of uncommitted gate arrays (Xilinx, Altera)
Compile-a-system: a unique processor for every app (Tensilica)
Systolic array: many pipelined or parallel processors
DSP / VLIW: special-purpose processors (TI)
ASICs: general-purpose cores, specialized by I/O, etc. (Intel, Lucent, IBM)
Universal Micro: multiprocessor array, programmable I/O (Cradle)

62 Cradle: Universal Microsystem trading Verilog & hardware for C/C++
UMS : VLSI = microprocessor : special systems = software : hardware
A single part for all apps, configured at run time via FPGA & ROM
5 quad mPs at 3 Gflops/quad = 15 Gflops
Single shared memory space, caches
Programmable periphery including: 1 GB/s; 2.5 Gips; PCI, 100baseT, FireWire
~$4/Gflops; 150 mW/Gflops

63 UMS Architecture Memory bandwidth scales with processing
Must allow mix and match of applications. Design reuse is important, thus scalability is a must. Resources must be balanced. Cradle is developing such an architecture, with multiple processors (MSPs) attached to private memories that communicate with external devices through a DRAM controller and programmable I/O. The architecture is regular and modular: processing with memory, and a high-speed bus. Memory bandwidth scales with processing. Scalable processing, software, and I/O. Each app runs on its own pool of processors. Enables durable, portable intellectual property.

64 Free 32 bit processor core

65 Linus’s Law: Linux everywhere
Software is, or should be, free
All source code is “open”
Everyone is a tester
Everything proceeds a lot faster when everyone works on one code
Anyone can support and market the code for any price
Zero-cost software attracts users!
All the developers write lots of code

66 ISTORE Hardware Vision
System-on-a-chip enables computer + memory without significantly increasing the size of the disk
5-7 year target, the MicroDrive (1.7” x 1.4” x 0.2”):
1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
2006: 9 GB, 50 MB/s? (1.6x/yr capacity, 1.4x/yr BW)
Integrated IRAM processor, 2x height, connected via a crossbar switch growing like Moore’s law: 16 Mbytes; 1.6 Gflops; 6.4 Gops
10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tf
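The 2006 MicroDrive targets follow from compounding the 1999 figures at the stated growth rates; a quick check using only the slide's numbers:

```python
# Compounding the 1999 MicroDrive figures at 1.6x/yr (capacity) and 1.4x/yr (BW).
years = 2006 - 1999
capacity_gb = 0.340 * 1.6 ** years    # 340 MB starting point
bandwidth_mbps = 5 * 1.4 ** years     # 5 MB/s starting point
print(round(capacity_gb, 1))          # 9.1 -- matches the "9 GB" target
print(round(bandwidth_mbps))          # 53 -- close to the "50 MB/s" target
```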

67 The Disk Farm? or a System On a Card?
14”: the 500 GB disc card, an array of discs. Can be used as 100 discs, 1 striped disc, 50 FT discs, etc. LOTS of accesses/second and of bandwidth. A few disks are replaced by 10s of Gbytes of RAM and a processor to run apps!!


69 Disk vs Tape At 10K$/TB disks are competitive with nearline tape.
Disk: 40 GB; 20 MBps; 5 ms seek time; 3 ms rotate latency; $7/GB for the drive, $3/GB for controllers/cabinet; 4 TB/rack; 1 hour scan.
Tape: 40 GB; 10 MBps; 10 sec pick time; seconds of seek time; $2/GB for media, $8/GB for drive+library; 10 TB/rack; 1 week scan.
Guesstimates. CERN: 200 TB, 3480 tapes, 2 col = 50 GB, rack = 1 TB = 20 drives.
The price advantage of tape is narrowing, and the performance advantage of disk is growing.
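The "1 hour scan" vs. "1 week scan" contrast is simple arithmetic; a sketch where the capacities and transfer rates are from the slide, but the drive counts per rack are my assumptions about the configurations:

```python
# Scan-time arithmetic for a disk rack vs. a tape library.
disk_rack_bytes = 4e12     # 4 TB/rack of 40 GB drives...
disk_drives = 100          # ...so ~100 drives, all scanning in parallel (assumed)
disk_bw = 20e6             # 20 MBps each
print(disk_rack_bytes / (disk_drives * disk_bw) / 3600)   # ~0.56 hours

tape_rack_bytes = 10e12    # 10 TB/rack, but a library funnels through
tape_drives = 1            # roughly one drive at a time (assumed)
tape_bw = 10e6             # 10 MBps
print(tape_rack_bytes / (tape_drives * tape_bw) / 86400)  # ~11.6 days, i.e. >1 week
```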

70 1988 Federal Plan for Internet

71 The virtuous cycle of bandwidth supply and demand
(Cycle: increased demand; increase capacity (circuits & bw); standards; create new service; lower response time. Services: Telnet & FTP, WWW, audio, video, voice!)

72 Information Sciences Institute Microsoft QWest
744 Mbps over 5000 km to transmit 14 GB: ~4e15 bit-meters per second, 4 Peta bmps (“peta bumps”), single-stream tcp/ip throughput.
Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA
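The "peta bumps" figure is just bandwidth times distance; the arithmetic from the slide's numbers:

```python
# Bandwidth-distance product for the 744 Mbps, 5000 km single-stream record.
bandwidth_bps = 744e6            # 744 Mbps single-stream tcp/ip
distance_m = 5000 * 1000         # 5000 km in meters
bit_meters = bandwidth_bps * distance_m
print(bit_meters)                # 3.72e+15 -- i.e. ~4e15, "4 Peta bmps"
```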

73 Map of Gray Bell Prize results
(Map: Redmond/Seattle, WA; New York; Arlington, VA; San Francisco, CA. Single-thread, single-stream tcp/ip via 7 hops desktop-to-desktop… Win 2K out-of-the-box performance*. 5626 km, 10 hops.)

74 Ubiquitous 10 GBps SANs in 5 years
1 Gbps Ethernet is a reality now; also FiberChannel, Myrinet, GigaNet, ServerNet, ATM,…
10 Gbps x4 WDM deployed now (OC192); 3 Tbps WDM working in the lab
In 5 years, expect 10x. Wow!!
(Chart: link speeds of 1 GBps, 120 MBps (1 Gbps), 80 MBps, 40 MBps, 20 MBps, 5 MBps)

75 The Promise of SAN/VIA:10x in 2 years
Yesterday: 10 MBps (100 Mbps Ethernet); ~20 MBps tcp/ip saturates 2 CPUs; round-trip latency ~250 µs.
Now: wires are 10x faster (Myrinet, Gbps Ethernet, ServerNet,…); fast user-level communication: tcp/ip ~100 MBps at 10% CPU; round-trip latency is ~15 µs; 1.6 Gbps demoed on a WAN.

76 How much does wire-time cost? $/Mbyte? Odlyzko, 1998 & Jim Gray
Cost ($) and wire time per Mbyte:
Gbps Ethernet: .2µ, 10 ms
100 Mbps Ethernet: .3µ, 100 ms
OC12 (650 Mbps): … ms
DSL: … sec
POTS: … sec
Wireless: … sec
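The wire-time column follows from link speed alone (8 Mbit per Mbyte); a sketch where the Ethernet rates are from the slide, but the DSL and POTS rates are my assumptions, since those values did not survive transcription:

```python
# Time on the wire for one megabyte at several link speeds.
links_bps = {
    "Gbps Ethernet": 1e9,
    "100 Mbps Ethernet": 100e6,
    "OC12": 650e6,
    "DSL (1 Mbps assumed)": 1e6,
    "POTS (56 kbps assumed)": 56e3,
}
seconds_per_mb = {name: 8e6 / bps for name, bps in links_bps.items()}
for name, s in seconds_per_mb.items():
    print(f"{name}: {s:.3g} s")
# Gbps Ethernet -> 0.008 s (~10 ms on the slide); 100 Mbps -> 0.08 s (~100 ms)
```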

77 Modern scalable switches … also hide a supercomputer
Scale from <1 to 120 Tbps. 1 Gbps Ethernet switches scale to 10s of Gbps, scaling upward. SP2 scales from 1.2

78 Where are the challenges?
Continued development based on clusters… scalar processors need to compete with vectors. The U.S. has cast its lot with COTS!
Explore radical alternatives.
WWW is here. Now exploit it in every respect.
Exploit OSS… though it may not be new!
Telepresence & interactive communities!!!
Grid as a prelude to Application Service Providers
Prototype biologist and chemist workbenches: Cell laboratory, U. of WA; Sloan sky survey

79 1st, 2nd, 3rd, or New Paradigm for science?

80 Labscape

81 Labscape

82 Labscape sensors Location tracking of people/samples
multiple resolutions; passive and active tags
Manual tasks (e.g., use of reagents, tools)
Audio/video records, vision and indexing
Networked instruments (e.g., pipettes, refrigerators, etc.)

83 What am I willing to predict?
Processing & data can be anywhere… Maui… in winter. BW is the limiter! Japan… if supers are so super; else use PCs. In the disks.
Application Service Providers: can we separate our data from ourselves and businesses? (the yin-yang of personal versus central services)
The GRID, e.g. biologist & chemist workbenches, iff the IP doesn’t get in the way
Collaboration a la astrophysics (high-energy physics, math, earth science, and any pure science, if pure science continues!)
OSS is the big bang for supercomputing??

84 The End

