Crays, Clusters, Centers and Grids


1 Crays, Clusters, Centers and Grids
Gordon Bell Bay Area Research Center Microsoft Corporation

2 Summary
Sequential & data parallelism using shared memory, Fortran computers: 60-90
Search for parallelism to exploit micros: 85-95
Users adapted to clusters aka multi-computers via the lcd (lowest common denominator) program model, MPI: >95
Beowulf standardized clusters of standard hardware and software: >1998
"Do-it-yourself" Beowulfs impede new structures and threaten centers: >2000
High speed nets kicking in to enable the Grid.

3 Outline
Retracing scientific computing evolution: Cray, DARPA SCI & "killer micros", clusters kick in
Current taxonomy: cluster flavors
Déjà vu rise of commodity computing: Beowulfs are a replay of VAXen c1980
Centers
Role of Grid and peer-to-peer
Will commodities drive out new ideas?

4 High performance architecture/program timeline
[Timeline chart: hardware generations run from vacuum tubes, transistors, MSI (minis), and micros to RISC and nano-micros; programming runs from sequential (single execution stream), SIMD/vector, and parallelization to parallel programs aka cluster computing on multicomputers; the MPP era brings ultracomputers (10X in size & price; 10x MPP "in situ" resources; 100x in //sm), NOW, and geographically dispersed cluster computing, i.e. the Grid.]

5 DARPA Scalable Computing Initiative c1985-1995; ASCI
Motivated by the Japanese 5th Generation
Realization that "killer micros" were coming
Custom VLSI and its potential
Lots of ideas to build various high performance computers
Threat and potential sale to military

6 Steve Squires & G Bell at our "Cray" at the start of DARPA's SCI.

7 Dead Supercomputer Society

8 Dead Supercomputer Society
ACRI Alliant American Supercomputer Ametek Applied Dynamics Astronautics BBN CDC Convex Cray Computer Cray Research Culler-Harris Culler Scientific Cydrome Dana/Ardent/Stellar/Stardent Denelcor Elexsi ETA Systems Evans and Sutherland Computer Floating Point Systems Galaxy YH-1 Goodyear Aerospace MPP Gould NPL Guiltech Intel Scientific Computers International Parallel Machines Kendall Square Research Key Computer Laboratories MasPar Meiko Multiflow Myrias Numerix Prisma Tera Thinking Machines Saxpy Scientific Computer Systems (SCS) Soviet Supercomputers Supertek Supercomputer Systems Suprenum Vitesse Electronics

9 DARPA Results
Many research and construction efforts … virtually all failed.
DARPA-directed purchases screwed up the market, including the many VC-funded efforts.
No software funding.
Users responded to the massive power potential with LCD software: clusters, clusters, clusters using MPI.
It's not scalar vs. vector, it's memory bandwidth! (see the triad sketch below)
  6-10 scalar processors = 1 vector unit
  16-64 scalars = a 2-6 processor SMP
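The memory-bandwidth point can be illustrated with a STREAM-style triad kernel. This is a minimal sketch, not from the original slides; the array size and the coarse clock()-based timing are arbitrary choices. Delivered Gflop/s on this loop are bounded by sustainable memory bandwidth, not by whether the processor is scalar or vector.

    /* STREAM-style triad: 2 loads + 1 store and 2 flops per element, so the
       measured Gflop/s track the machine's sustainable memory bandwidth. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1L << 24)                 /* ~16M doubles, ~128 MB per array */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        if (!a || !b || !c) return 1;
        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        clock_t t0 = clock();
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;

        double bytes = 3.0 * N * sizeof(double);       /* traffic moved */
        printf("a[1]=%g  %.2f GB/s  %.2f Gflop/s\n",
               a[1], bytes / s / 1e9, 2.0 * N / s / 1e9);
        free(a); free(b); free(c);
        return 0;
    }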

10 The evolution of vector supercomputers

11 The evolution of Cray Inc.

12 Connectivity
[Chart: computer types arranged by connectivity, from WAN/LAN through SAN and DSM to shared memory (SM). Networked supers (VPP uni, NEC mP, NEC super, Cray X…T, all mPv); GRID & P2P (Legion, Condor); clusters (Beowulf, NT clusters, T3E, SP2 (mP), NOW, SGI DSM clusters); old-world micros and vector machines; mainframes, multis, WSs, PCs.]

13 Top500 taxonomy… everything is a cluster aka multicomputer
Clusters are the ONLY scalable structure.
Cluster: n inter-connected computer nodes operating as one system. Nodes: uni- or SMP. Processor types: scalar or vector.
MPP = miscellaneous, not massive (>1000), SIMD, or something we couldn't name.
Cluster types (implied message passing):
  Constellations = clusters of >=16 P, SMP
  Commodity clusters of uni or <=4 Ps, SMP
  DSM: NUMA (and COMA) SMPs and constellations
  DMA clusters (direct memory access) vs. msg. pass
  Uni- and SMP vector clusters: Vector Clusters and Vector Constellations

14

15 The Challenge leading to Beowulf
NASA HPCC Program begun in 1992
Comprised Computational Aero-Science and Earth and Space Science (ESS)
Driven by need for post-processing, data manipulation, and visualization of large data sets
Conventional techniques imposed long user response time and shared resource contention
Cost low enough for a dedicated single-user platform
Requirement: 1 Gflops peak, 10 Gbyte, < $50K
Commercial systems: $1000/Mflops, i.e. $1M/Gflops (a quick cost check follows)
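A back-of-envelope check, not from the original slides, of how far the requirement sat from commercial pricing; the figures are the slide's own.

    /* Beowulf cost target vs. commercial supercomputer pricing. */
    #include <stdio.h>

    int main(void)
    {
        double dollars_per_mflops = 1000.0;   /* commercial: $1000/Mflops   */
        double target_mflops = 1000.0;        /* requirement: 1 Gflops peak */
        double budget = 50000.0;              /* requirement: < $50K        */

        double commercial_cost = dollars_per_mflops * target_mflops;
        printf("commercial cost: $%.0f  budget: $%.0f  gap: %.0fx\n",
               commercial_cost, budget, commercial_cost / budget);
        return 0;
    }

With the slide's numbers the commercial route costs $1,000,000, twenty times the $50K target, which is the gap Beowulf set out to close with commodity parts.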

16 Linux - a web phenomenon
Linus Torvalds, a bored Finnish graduate student, writes a news reader for his PC using the Unix model
Puts it on the internet for others to play with
Others add to it, contributing to open source software
Beowulf adopts early Linux
Beowulf adds Ethernet drivers for essentially all NICs
Beowulf adds channel bonding to the kernel
Red Hat distributes Linux with Beowulf software
Low-level Beowulf cluster management tools added

17

18 Courtesy of Dr. Thomas Sterling, Caltech

19 The Virtuous Economic Cycle drives the PC industry… & Beowulf
[Cycle diagram linking: volume, standards, competition, innovation, greater availability @ lower cost, utility/value, "creates apps, tools, training", "attracts users", "attracts suppliers"; DOJ shown as an outside influence.]

20 BEOWULF-CLASS SYSTEMS
Cluster of PCs: Intel x86, DEC Alpha, Mac Power PC
Pure M2COTS
Unix-like O/S with source: Linux, BSD, Solaris
Message passing programming model: PVM, MPI, BSP, homebrew remedies (a minimal MPI sketch follows)
Single-user environments
Large science and engineering applications
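For concreteness, a minimal MPI program in the spirit of that lowest-common-denominator model. It is a sketch, not taken from the deck, and assumes only the standard MPI C API; the per-rank value stands in for a per-node partial result.

    /* Each rank contributes a partial result; rank 0 collects the sum. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = rank + 1.0;      /* stand-in for per-node work */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d processes = %g\n", size, total);

        MPI_Finalize();
        return 0;
    }

Built and launched in the usual way, e.g. mpicc sum.c -o sum and mpirun -np 8 ./sum, it runs one process per cluster node or CPU, which is exactly the program model Beowulf standardized.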

21 Interesting “cluster” in a cabinet
366 servers per 44U cabinet
Single processor, … GB/computer (24 TBytes)
… Mbps Ethernets
~10x perf*, power, disk, I/O per cabinet
~3x price/perf
Network services… Linux based
*vs. 42 servers, 2 processors, 84 Ethernet, 3 TBytes

22 Lessons from Beowulf An experiment in parallel computing systems
Established vision: low cost, high end computing
Demonstrated effectiveness of PC clusters for some (not all) classes of applications
Provided networking software
Provided cluster management tools
Conveyed findings to the broad community: tutorials and the book
Provided a design standard to rally the community!
Standards beget books, trained people, software … a virtuous cycle that allowed apps to form
Industry begins to form beyond a research project
Courtesy, Thomas Sterling, Caltech.

23 Direction and concerns
Commodity clusters are evolving to be mainline supers
The Beowulf do-it-yourself effect is like VAXen … clusters have taken a long time
Will they drive out or undermine centers? Or is computing so complex as to require a center to manage and support complexity?
Centers: data warehouses; community centers, e.g. weather
Will they drive out a diversity of ideas? Assuming there are some?

24 Grids: Why now?

25 The virtuous cycle of bandwidth supply and demand
[Cycle diagram: increased demand → increase capacity (circuits & bw) → standards → create new service → lower response time → increased demand. Successive services: Telnet & FTP, WWW, audio, video, voice!]

26 Single-stream tcp/ip throughput
744 Mbps over 5000 km to transmit 14 GB ≈ 4e15 bit-meters per second, or 4 Peta Bmps ("peta bumps"). (Arithmetic check below.)
Participants: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA
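The bit-meters-per-second figure of merit is just sustained rate times distance; a trivial check with the slide's numbers, not part of the original deck:

    /* Rate x distance gives the "peta bit-meters per second" figure of merit. */
    #include <stdio.h>

    int main(void)
    {
        double rate_bps = 744e6;        /* 744 Mbps sustained, from the slide */
        double distance_m = 5000e3;     /* 5000 km                            */
        printf("%.2e bit-meters/second (~4 Peta Bmps)\n", rate_bps * distance_m);
        return 0;
    }

The product comes out to about 3.7e15, which rounds to the slide's 4 Peta Bmps.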

27 Map of Gray Bell Prize results
[Map: single-thread, single-stream tcp/ip via 7 hops, desktop-to-desktop … Win 2K out-of-the-box performance*. Endpoints shown: Redmond/Seattle, WA; New York; Arlington, VA; San Francisco, CA; 5626 km, 10 hops.]

28 Ubiquitous 10 GBps SANs in 5 years
1 Gbps Ethernet is a reality now; also Fibre Channel, Myrinet, GigaNet, ServerNet, ATM, …
10 Gbps x4 WDM deployed now (OC192)
3 Tbps WDM working in the lab
In 5 years, expect 10x!
[Chart labels: 1 GBps, 120 MBps (1 Gbps), 80 MBps, 40 MBps, 20 MBps, 5 MBps]

29 The Promise of SAN/VIA:10x in 2 years http://www.ViArch.org/
Yesterday:
  10 MBps (100 Mbps Ethernet)
  ~20 MBps tcp/ip saturates 2 cpus
  round-trip latency ~250 µs
Now:
  Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, …
  Fast user-level communication: tcp/ip ~100 MBps at 10% cpu
  round-trip latency is 15 µs
  1.6 Gbps demoed on a WAN

30 SNAP … c1995: Scalable Network And Platforms. A View of Computing in … We all missed the impact of the WWW!
This talk/essay portrays our view of computer-server architecture trends. (It is silent on the clients: cellphones, toasters, and gameboys.) This is an early draft. We are sending a copy to you in hopes that you'll read and comment on it. We would like to publish it in several forms: a 2 hr video lecture, a kickoff article in a ComputerWorld issue that Gordon is editing, and a monograph, enlarged, to be published within a year. January 1, 1995. Gordon Bell, Jim Gray

31 How Will Future Computers Be Built?
Thesis: SNAP, Scalable Networks and Platforms
Upsize from the desktop to a world-scale computer based on a few standard components
Because:
  Moore's law: exponential progress
  Standardization & commoditization
  Stratification and competition
When: sooner than you think!
  Massive standardization gives massive use
  Economic forces are enormous

32 Computing SNAP built entirely from PCs
[Diagram: legacy mainframe & minicomputer servers & terminals; portables; mobile nets; wide-area global network; wide & local area networks for terminals, PCs, workstations, & servers; person servers (PCs); centralized & departmental uni- & mP servers (UNIX & NT) and scalable computers, both built from PCs; TC = TV + PC home … (CATV or ATM or satellite). A space, time (bandwidth), & generation scalable environment.]
Here's a much more radical scenario, but one that seems very likely to me. There will be very little difference between servers and the person servers, or what we mostly associate with clients. This will come because economy of scale is replaced by economy of volume. The largest computer is no longer cost-effective. Scalable computing technology dictates using the highest-volume, most cost-effective nodes. This means we build everything, including mainframes and multiprocessor servers, from PCs!

33 SNAP Architecture
With this introduction about technology, computing styles, and the chaos and hype around standards and openness, we can look at the Network & Nodes architecture I posit.

34 GB plumbing from the baroque: evolving from the 2 dance-hall model
[PMS diagram: the 2 dance-hall model, Mp - S - Pc, with separate switches fanning out to S.fiber ch. - Ms, S.Cluster, and S.WAN, vs. an integrated MpPcMs node attached directly to a single S.Lan/Cluster/Wan switch.]

35 Grids: Why? The problem or community dictates a Grid
Economics… thief or scavenger
Research funding… that's where the problems are

36 The Grid… including P2P GRID was/is an exciting concept …
They can/must work within a community, organization, or project. What binds it? "Necessity is the mother of invention."
Taxonomy… interesting vs. necessity:
  Cycle scavenging and object evaluation (e.g. QCD, factoring)
  File distribution/sharing aka IP theft (e.g. Napster, Gnutella)
  Databases &/or programs and experiments (astronomy, genome, NCAR, CERN)
  Workbenches: web workflow chem, bio…
  Single, large problem pipeline… e.g. NASA
  Exchanges… many sites operating together
  Transparent web access aka load balancing
  Facilities-managed PCs operating as a cluster!

37 Some observations
Clusters are purchased, managed, and used as a single, one-room facility.
Clusters are the "new" computers. They present unique, interesting, and critical problems… then Grids can exploit them.
Clusters & Grids have little to do with one another… Grids use clusters!
Clusters should be a good simulation of tomorrow's Grid.
Distributed PCs: Grids or Clusters? Perhaps some clusterable problems can be solved on a Grid… but it's unlikely.
Lack of understanding of clusters & variants; socio-, political-, and eco- concerns wrt the Grid.

38 deja’ vu ARPAnet: c1969 NREN: c1988 <’90 Mainframes, minis, PCs/WSs
To use remote programs & data Got FTP & mail. Machines & people overloaded. NREN: c1988 BW => Faster FTP for images, data Latency => Got Tomorrow => Gbit communication BW, latency <’90 Mainframes, minis, PCs/WSs >’90 very large, dep’t, & personal clusters VAX: c1979 one computer/scientist Beowulf: c1995 one cluster ∑PCs /scientist 1960s batch: opti-use allocate, schedule,$ 2000s GRID: opti-use allocate, schedule, $ (… security, management, etc.)

39 The end

40 Modern scalable switches … also hide a supercomputer
Scale from <1 to 120 Tbps
1 Gbps Ethernet switches scale to 10s of Gbps, scaling upward
SP2 scales from 1.2 …

41 CMOS Technology Projections
2001: logic 0.15 um, 38 Mtr, 1.4 GHz; memory 1.7 Gbits, 1.18 access
2005: logic 0.10 um, 250 Mtr, 2.0 GHz; memory 17.2 Gbits, 1.45 access
2008: logic 0.07 um, 500 Mtr, 2.5 GHz; memory 68.7 Gbits, 1.63 access
2011: logic 0.05 um, 1300 Mtr, 3.0 GHz; memory 275 Gbits, 1.85 access
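The implied compound annual growth rates are not stated on the slide, but they follow directly from the 2001 and 2011 endpoints; a quick check (compile with -lm):

    /* Compound annual growth rates implied by the projection table above. */
    #include <stdio.h>
    #include <math.h>

    static double cagr(double start, double end, double years)
    {
        return pow(end / start, 1.0 / years) - 1.0;
    }

    int main(void)
    {
        double years = 2011 - 2001;
        printf("logic transistors: %.0f%%/yr\n", 100 * cagr(38, 1300, years));
        printf("memory bits:       %.0f%%/yr\n", 100 * cagr(1.7, 275, years));
        printf("clock rate:        %.0f%%/yr\n", 100 * cagr(1.4, 3.0, years));
        return 0;
    }

Roughly 40%/yr for logic transistors and 65%/yr for memory bits, against under 10%/yr for clock rate.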

42 Future Technology Enablers
SOCs: system-on-a-chip
GHz processor clock rates
VLIW 64-bit processors: scientific/engineering application address spaces
Gbit DRAMs
Micro-disks on a board
Optical fiber and wave division multiplexing communications (free space?)

43 The End
How can GRIDs become a non-ad-hoc computer structure? Get yourself an application community!

44 Performance versus time for various microprocessors
Moore's law has implications for speed because transistors are smaller and faster. Increases in microprocessor cache size and parallelism come from having more transistors. The result has been the quadrupling of speed every 3 years. This is fortunate since Amdahl posited that one megabyte of memory was needed for every million instructions per second the processor ran. Speed has increased at 60% per year since the late 1980s to keep up with larger memory chips.
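A quick numeric check of those two rules of thumb; this is not from the slides, and the 500-MIPS example processor is hypothetical (compile with -lm):

    /* 60%/year compounds to roughly 4x every 3 years; Amdahl's balance rule
       calls for about 1 MB of memory per MIPS. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double annual = 1.60;                        /* 60% per year */
        printf("3-year factor: %.2fx\n", pow(annual, 3.0));   /* ~4.1x */

        double mips = 500.0;                         /* hypothetical processor */
        double mb_per_mips = 1.0;                    /* Amdahl's rule of thumb */
        printf("balanced memory for %.0f MIPS: ~%.0f MB\n", mips, mips * mb_per_mips);
        return 0;
    }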

45 Volume drives simple, cost to standard platforms
[Chart labels: stand-alone desktops, PCs]
This illustrates the power of volume production. If you assume that power increases linearly with the number of processing elements, then several different platforms can supply power. The most cost-effective is a gang of PCs. Microsoft's scalable server, Tiger, for video on demand demonstrates this. Using workstations is more expensive, and a higher speed interconnect adds a marginal amount. The various multiprocessors are more expensive than LAN-connected workstations. Multis cost about the same amount as massively parallel computers. The second most cost-effective platforms are the small multiprocessors that use the PC's microprocessor. Thus, smaller, high-volume platforms beat larger, more specialized multiprocessors.

46

47 In 5-10 years we can/will have:
More powerful personal computers: processing …x, multiprocessors-on-a-chip
4x resolution (2K x 2K) displays to impact paper; large, wall-sized and watch-sized displays
Low cost storage of one terabyte for personal use
Adequate networking? PCs now operate at 1 Gbps
Ubiquitous access = today's fast LANs; competitive wireless networking
One-chip, networked platforms, e.g. light bulbs, cameras
Some well-defined platforms that compete with the PC for mind (time) and market share: watch, pocket, body implant, home (media, set-top)
Inevitable, continued cyberization… the challenge: interfacing platforms and people.

48 Linus's & Stallman's Law: Linux everywhere, aka the Torvalds stranglehold
Software is or should be free
All source code is "open"
Everyone is a tester
Everything proceeds a lot faster when everyone works on one code
Anyone can support and market the code for any price
Zero-cost software attracts users!
All the developers write code

49 ISTORE Hardware Vision
System-on-a-chip enables computer + memory without significantly increasing the size of the disk
5-7 year target: MicroDrive, 1.7" x 1.4" x 0.2"
  1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
  2006: 9 GB, 50 MB/s? (1.6X/yr capacity, 1.4X/yr BW; projection checked below)
Integrated IRAM processor, 2x height
Connected via crossbar switch growing like Moore's law
16 Mbytes; …; 1.6 Gflops; 6.4 Gops
10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tf
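A sanity check of the MicroDrive projection, not from the original slides: grow the 1999 figures at the slide's 1.6x/yr (capacity) and 1.4x/yr (bandwidth) for the seven years to 2006 (compile with -lm).

    /* Project the 1999 MicroDrive figures forward at the slide's growth rates. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double cap_mb_1999 = 340.0, bw_mbs_1999 = 5.0;
        int years = 7;                                    /* 1999 -> 2006 */

        double cap_2006 = cap_mb_1999 * pow(1.6, years);  /* ~9 GB    */
        double bw_2006  = bw_mbs_1999 * pow(1.4, years);  /* ~50 MB/s */

        printf("2006 capacity: ~%.1f GB, bandwidth: ~%.0f MB/s\n",
               cap_2006 / 1024.0, bw_2006);
        return 0;
    }

The compounded values land at roughly 9 GB and 50 MB/s, matching the 2006 targets on the slide.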

50 The Disk Farm? or a System On a Card?
14" The 500GB disc card An array of discs Can be used as 100 discs 1 striped disc 50 FT discs ....etc LOTS of accesses/second of bandwidth A few disks are replaced by 10s of Gbytes of RAM and a processor to run Apps!!

51 The Network Revolution
Networking folks are finally streamlining the LAN case (SAN), offloading protocol to the NIC
½-power point is 8 KB; minimum round-trip latency is ~50 µs
Per-message cost: 3k instructions + 0.1 instructions/byte (a cost-model sketch follows)
High-Performance Distributed Objects over a System Area Network. Li, L.; Forin, A.; Hunt, G.; Wang, Y. MSR-TR-98-68
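A sketch of the half-power-point idea behind those numbers, with assumed parameters: if per-message overhead is T0 and the wire rate is B, achieved bandwidth for an n-byte transfer is n / (T0 + n/B), which reaches half of B when n = T0 * B. The one-way overhead and the wire rates below are illustrative assumptions; only the 3k + 0.1-instructions/byte CPU cost and the ~50 µs round trip come from the slide.

    /* Half-power point n = T0 * B, plus the slide's per-message CPU cost. */
    #include <stdio.h>

    int main(void)
    {
        double t0 = 25e-6;                        /* assumed one-way overhead, s */
        double rates_MBps[] = { 100, 200, 330 };  /* assumed wire rates          */

        for (int i = 0; i < 3; i++) {
            double B = rates_MBps[i] * 1e6;       /* bytes/s                     */
            double n_half = t0 * B;               /* half-power message size     */
            printf("wire %3.0f MB/s -> half-power point ~%.1f KB\n",
                   rates_MBps[i], n_half / 1024.0);

            /* CPU cost from the slide: 3000 instructions + 0.1 per byte */
            printf("  CPU cost at that size: ~%.0f instructions\n",
                   3000.0 + 0.1 * n_half);
        }
        return 0;
    }

Under this simple model, an 8 KB half-power point together with a ~25 µs one-way overhead corresponds to an effective wire rate of a few hundred MB/s, in line with the 10x-in-2-years claim of the preceding slide.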

