© Gordon Bell 1 NRC Review Panel on High Performance Computing 11 March 1994 Gordon Bell.

1 © Gordon Bell 1 NRC Review Panel on High Performance Computing 11 March 1994 Gordon Bell

2 Position
- Dual use: exploit parallelism with in situ nodes & networks
  - Leverage WS & mP industrial HW/SW/app infrastructure!
- No Teraflop before its time -- it's Moore's Law
- It is possible to help fund computing: heuristics from federal funding & use (~50 computer systems and 30 years)
- Stop Duel Use, genetic engineering of State Computers
  - 10+ years: nil payback, mono use, poor, & still to come
  - plan for apps porting to monos will also be ineffective -- apps must leverage, be cross-platform & self-sustaining
  - let "Challenges" choose apps, not mono use computers
  - "industry" offers better computers & these are jeopardized
  - users, not funders, must be free to choose their computers
  - next generation State Computers "approach" industry
  - 10 Tflop... why?
- Summary recommendations

3 Principal computing environments circa 1994 -- >4 networks to support mainframes, minis, UNIX servers, workstations & PCs
[Diagram; recoverable labels:]
- >4 interconnect & comm. stds: POTS & 3270 terms.; WAN (comm. stds.); LAN (2 stds.); clusters (prop.)
- Worlds: '50s IBM & proprietary mainframe world; '70s mini (prop.) world & '90s UNIX mini world; '80s Unix distributed workstations & servers world; late '80s LAN-PC world; data comm. worlds
- Networks: Token-ring and Ethernet LANs (gateways, bridges, routers, hubs, etc.); POTS net for switching terminals; wide-area inter-site network
- Clients: PCs (DOS, Windows, NT); UNIX workstations; X terminals; ASCII & PC terminals; 3270 (& PC) terminals
- Servers: mainframes; minicomputers; clusters; Novell & NT servers; NFS servers; compute & dbase uni- & mP servers; UNIX multiprocessor servers operated as traditional minicomputers

4 Computing environments circa 2000
[Diagram; recoverable labels:]
- Local & global data comm world: ATM & Local Area Networks for terminals, PCs, workstations & servers; wide-area global ATM network; also 10-100 Mb/s pt-to-pt Ethernet
- Universal high speed data service using ATM or ??
- Legacy mainframe & minicomputer servers & terminals
- Centralized & departmental scalable uni- & mP servers* (NT & UNIX)
- NT, Windows & UNIX person servers*
- NFS, database, compute, print, & communication servers
- Platforms: X86, PowerPC, Sparc, etc.
- TC=TV+PC home... (CATV or ATM) ???
* multicomputers built from multiple simple servers

5 Beyond Dual & Duel Use Technology: Parallelism can & must be free!
- HPCS, corporate R&D, and technical users must have the goal to design, install and support parallel environments using and leveraging every in situ workstation & multiprocessor server as part of the local... national network.
- Parallelism is a capability that all computing environments can & must possess! -- not a feature to segment "mono use" computers
- Parallel applications become a way of computing utilizing existing, zero cost resources -- not subsidy for specialized ad hoc computers
- Apps follow pervasive computing environments

6 Computer genetic engineering & species selection has been ineffective
- Although Problem x Machine Scalability using SIMD for simulating some physical systems has been demonstrated, given extraordinary resources, the efficacy of larger problems to justify cost-effectiveness has not.
- Hamming: "The purpose of computing is insight, not numbers."
- The "demand side" Challenge users have the problems and should be drivers. ARPA's contractors should re-evaluate their research in light of driving needs.
- Federally funded "Challenge" apps porting should be to multiple platforms, including workstations & compatible multis that support // environments, to ensure portability and understand main line cost-effectiveness.
- Continued "supply side" programs aimed at designing, purchasing, supporting, sponsoring, & porting of apps to specialized State Computers, including programs aimed at 10 Tflops, should be re-directed to networked computing.
- Users must be free to choose and buy any computer, including PCs & WSs, WS clusters, multiprocessor servers, supercomputers, mainframes, and even highly distributed, coarse grain, data parallel, MPP State Computers.

7 Performance (t)
[Figure; surviving labels: "The teraflops", "Bell Prize"]

8 We get no Teraflop before its time: it's Moore's Law!
- Flops = f(t,$), not f(t) (technology plans, e.g. BAA 94-08, ignore $s!)
- All Flops are not equal (peak announced performance, PAP, or real app performance, RAP)
- Flops_CMOS,PAP* < C x 1.6**(t-1992) x $; C = 128 x 10**6 flops / $30,000
- Flops_RAP = Flops_PAP x 0.5 for real apps; 1/2 PAP is a great goal
- Flops_supers = Flops_CMOS x 0.1; improvement of supers 15-40%/year; higher cost is f(need for profitability, lack of subsidies, volume, SRAM)
- '92-'94: Flops_PAP/$ = 4K; Flops_supers/$ = 500; Flops_vsp/$ = 50 M (1.6G@$25)
- *Assumes primary & secondary memory size & costs scale with time
  - memory = $50/MB in 1992-1994 violates Moore's Law
  - disks = $1/MB in 1993; size must continue to increase at 60% / year
- When does a Teraflop arrive if only $30 million** is spent on a super?
  - 1 Tflop CMOS PAP in 1996 (x7.8) with 1 GFlop nodes!!!; or 1997 if RAP
  - 10 Tflop CMOS PAP will be reached in 2001 (x78) or 2002 if RAP
- How do you get a teraflop earlier?
  - **A $60 - $240 million Ultracomputer reduces the time by 1.5 - 4.5 years.
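The arrival dates on this slide can be checked with a few lines of arithmetic. The sketch below is not from the talk: it reads the slide's bound as an equality, takes the exponent as t - 1992 so that flops per dollar grow 60% per year, and uses hypothetical function names (`peak_flops`, `arrival_year`).

```python
import math

# Constants quoted on the slide; the function names below are illustrative only.
BASE_YEAR = 1992
GROWTH_PER_YEAR = 1.6                 # CMOS flops per dollar grow ~60% per year
C_FLOPS_PER_DOLLAR = 128e6 / 30_000   # 128 Mflops for $30,000 in 1992
RAP_FRACTION = 0.5                    # real app performance = 0.5 x peak (PAP)


def peak_flops(year: float, dollars: float) -> float:
    """Peak announced performance (PAP) that `dollars` buys in `year`."""
    return C_FLOPS_PER_DOLLAR * GROWTH_PER_YEAR ** (year - BASE_YEAR) * dollars


def arrival_year(target_flops: float, dollars: float, rap: bool = False) -> float:
    """Fractional year in which `dollars` first buys `target_flops`.

    Assumption: the slide's upper bound is treated as an equality.
    """
    flops_1992 = peak_flops(BASE_YEAR, dollars)
    if rap:
        flops_1992 *= RAP_FRACTION
    return BASE_YEAR + math.log(target_flops / flops_1992) / math.log(GROWTH_PER_YEAR)


if __name__ == "__main__":
    for label, target in (("1 Tflop", 1e12), ("10 Tflop", 10e12)):
        for budget in (30e6, 60e6, 240e6):
            pap = arrival_year(target, budget)
            rap = arrival_year(target, budget, rap=True)
            print(f"{label} for ${budget / 1e6:.0f}M: PAP {pap:.1f}, RAP {rap:.1f}")
```

Under these assumptions a $30 million machine reaches 1 Tflop PAP during 1996 and 1 Tflop RAP during 1997, and doubling or octupling the budget pulls the date in by roughly 1.5 and 4.4 years, which agrees with the figures quoted on the slide.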

9 Funding Heuristics (50 computers & 30 years of hindsight)
1. Demand side works, i.e., "we need this product/technology for x"; supply side doesn't work! "Field of Dreams": build it and they will come.
2. Direct funding of university research resulting in technology and product prototypes that are carried over to start up a company is the most effective -- provided the right person & team are backed and have a transfer avenue.
   a. Forest Baskett > Stanford to fund various projects (SGI, SUN, MIPS)
   b. Transfer to large companies has not been effective
   c. Government labs... rare, an accident if something emerges
3. A demanding & tolerant customer or user who "buys" products works best to influence and evolve products (e.g., CDC, Cray, DEC, IBM, SGI, SUN)
   a. DOE labs have been effective buyers and influencers, "Fernbach policy"; unclear if labs are effective product or apps or process developers
   b. Universities were effective at influencing computing in timesharing, graphics, workstations, AI workstations, etc.
   c. ARPA, per se, and its contractors have not demonstrated a need for flops.
   d. Universities have failed ARPA in defining work that demands HPCS -- hence are unlikely to be very helpful as users in the trek to the teraflop.
4. Direct funding of large scale projects is risky in outcome, long-term, training, and other effects. ARPAnet established an industry after it escaped BBN!

10 Funding Heuristics-2
5. Funding product development, targeted purchases, and other subsidies to establish "State Companies" in a vibrant and overcrowded market is wasteful, likely to be wrong, and likely to impede computer development (e.g. by having to feed an overpopulated industry). Furthermore, it is likely to have a deleterious effect on a healthy industry (e.g. supercomputers). A significantly smaller universe of computing environments is needed. Cray & IBM are given; SGI is probably the most profitable technical; HP/Convex are likely to be a contender, & others (e.g., DEC) are trying. No state co. (Intel, TMC, Tera) is likely to be profitable & hence self-sustaining.
6. University-Company collaboration is a new area of government R&D. So far it hasn't worked, nor is it likely to, unless the company invests. Appears to be a way to help a company fund marginal people and projects.
7. CRADAs (co-operative research and development agreements) are very closely allied to direct product development and are equally likely to be ineffective.
8. Direct subsidy of software apps or the porting of apps to one platform, e.g., EMI analysis, is a way to keep marginal computers afloat. If government funds apps, they must be ported cross-platform!
9. Encourage the use of computers across the board, but discourage designs from those who have not used or built a successful computer.

11 Scalability: The Platform of HPCS & why continued funding is unnecessary
- Mono use aka MPPs have been, are, and will be doomed
- The law of scalability
- Four scalabilities: machine, problem x machine, generation (t), & now spatial
- How do flops, memory size, efficiency & time vary with problem size?
- Does insight increase with problem size?
- What's the nature of problems & work for monos?
- What about the mapping of problems onto monos?
- What about the economics of software to support monos?
- What about all the competitive machines? e.g. workstations, workstation clusters, supers, scalable multis, attached P?

12 Special, mono-use MPPs are doomed... no matter how much fedspend!
- Special because it has non-standard nodes & networks -- with no apps
- Having not evolved to become mainline -- events have overtaken them.
- It's special purpose if it's only in Dongarra's Table 3.
- Flop rate, execution time, and memory size vs problem size show limited applicability to very large scale problems that must be scaled to cover the inherent, high overhead.
- Conjecture: a properly used supercomputer will provide greater insight and utility because of the apps and generality -- running more, smaller sized problems with a plan produces more insight
- The problem domain is limited & now they have to compete with:
  - supers -- do scalars, fine grain, and work and have apps
  - workstations -- do very long grain, are in situ and have apps
  - workstation clusters -- have identical characteristics and have apps
  - low priced ($2 million) multis -- are superior, i.e., shorter grain, and have apps
  - scalable multiprocessors -- formed from multis, are in design stage
- Mono useful (>>//) -- hence, are illegal because they are not dual use
- Duel use -- only useful to keep a high budget intact, e.g., 10 TF

13 The Law of Massive Parallelism is based on application scale
- There exists a problem that can be made sufficiently large such that any network of computers can run efficiently given enough memory, searching, & work -- but this problem may be unrelated to any other problem.
- A... any parallel problem can be scaled to run on an arbitrary network of computers, given enough memory and time
- Challenge to theoreticians: How well will an algorithm run?
- Challenge for software: Can a package be scalable & portable?
- Challenge to users: Do larger scale, faster, longer run times increase problem insight and not just flops?
- Challenge to HPCC: Is the cost justified? If so, let users do it!
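A toy cost model shows why scaling the problem up can always buy efficiency. The sketch below is an illustration only, not something from the talk: it assumes compute work grows as n^3 and is split evenly over p processors, while each processor pays a communication cost that grows as n^2 (roughly the shape of dense linear algebra). The function names and the `comm_cost` constant are hypothetical.

```python
def parallel_efficiency(n: int, p: int, comm_cost: float = 1.0) -> float:
    """Fraction of run time spent on useful work in the toy model.

    Assumed costs (not from the slide): n**3 / p compute operations per
    processor and comm_cost * n**2 operations' worth of communication.
    """
    compute = n ** 3 / p
    communicate = comm_cost * n ** 2
    return compute / (compute + communicate)


def size_for_efficiency(p: int, target: float, comm_cost: float = 1.0) -> float:
    """Problem size n at which the toy model reaches `target` efficiency.

    In this model efficiency = n / (n + comm_cost * p); solving for n gives
    the expression below.
    """
    return comm_cost * p * target / (1.0 - target)


if __name__ == "__main__":
    for p in (10, 100, 1000):
        n = size_for_efficiency(p, 0.9)
        print(f"p={p}: n={n:.0f} gives efficiency {parallel_efficiency(round(n), p):.2f}")
```

In this model the required problem size grows linearly with the processor count, so memory (about n^2 words) and run time grow with it. That is the slide's caveat: a large enough problem always exists, but it may not be one anybody needs solved.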

14 Scalabilities
- Size scalable computers are designed from a few components, with no bottleneck component.
- Generation scalable computers can be implemented with the next generation technology with no rewrite/recompile.
- Problem x machine scalability -- ability of a problem, algorithm, or program to exist at a range of sizes so that it can be run efficiently on a given, scalable computer. Although large scale problems allow high flops, large probs running longer may not produce more insight.
- Spatial scalability -- ability of a computer to be scaled over a large physical space to use in situ resources.

15 Linpack rate in Gflops vs Matrix Order [figure]

16 Linpack Solution time vs Matrix Order [figure]

17 GB's Estimate of Parallelism in Engineering & Scientific Applications
- scalar: 60%
- vector: 15%
- mP (<8) vector: 5%
- >>//: 5%
- embarrassingly or perfectly parallel: 15%
[Chart axes: log (# of apps) vs. granularity & degree of coupling (comp./comm.); other labels: new or scaled-up apps; dusty decks for supers; Supers; WSs; massive mCs & WSs; scalable multiprocessors]

18 MPPs are only for unique, very large scale, data parallel apps
[Chart: cost in $M (log scale, 100 down to .01) vs. application characterization (scalar, vector, vector mP, data //, emb. //, gp work, viz, apps); plotted machine classes: supers (s), mP, WS, >>// (mono use)]

19 Applicability of various technical computer alternatives

Domain         PC|WS   Multi servr   SC & Mfrm   >>//   WS Clusters
scalar         1       1             2           na     1*
vector         2*      2             1           3      2
vect. mP       na      2             1           3      na
data //        na      1             2           1      1
ep & inf. //   1       2             3           2      1
gp wrkld       3       1             1           na     2
visualiz'n     1       na            na          na     1
apps           1       1             1           na     from WS

*Current micros are weak, but improving rapidly such that subsequent >>//s that use them will have no advantage for node vectorization

20 Performance using distributed computers depends on problem & machine granularity
- Berkeley's LogP model characterizes granularity & needs to be understood, measured, and used
- Three parameters are given in terms of processing ops:
  - l = latency -- delay time to communicate between apps
  - o = overhead -- time lost transmitting messages
  - g = gap -- 1 / message-passing rate (~ bandwidth) -- time between messages
- p = number of processors
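The slide defines the parameters but gives no cost formulas. The sketch below assumes the commonly cited LogP estimates for short messages: one message costs o + l + o, and a pipelined stream of n messages from one sender costs 2o + l + (n - 1) x max(g, o). It also assumes computation and communication do not overlap. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogP:
    """LogP parameters, all expressed in processor operations (as on the slide)."""
    l: float  # latency: delay to move a message between processors
    o: float  # overhead: processor time lost sending or receiving a message
    g: float  # gap: minimum time between consecutive messages (1 / message rate)
    p: int    # number of processors

    def one_message(self) -> float:
        """Cost of one short message: send overhead + latency + receive overhead."""
        return 2 * self.o + self.l

    def n_messages(self, n: int) -> float:
        """Cost of n pipelined short messages from one sender to one receiver."""
        if n < 1:
            raise ValueError("n must be at least 1")
        # Assumption: the sender can start a new message every max(g, o) ops.
        return self.one_message() + (n - 1) * max(self.g, self.o)


def grain_efficiency(model: LogP, compute_ops: float, messages: int) -> float:
    """Useful fraction of time for a grain of `compute_ops` plus `messages` sends.

    Assumes no overlap between computation and communication.
    """
    comm = model.n_messages(messages) if messages > 0 else 0.0
    return compute_ops / (compute_ops + comm)


if __name__ == "__main__":
    net = LogP(l=6, o=2, g=4, p=8)  # made-up parameter values
    for ops in (10, 100, 1000):
        print(ops, round(grain_efficiency(net, ops, messages=5), 3))
```

Measured this way, granularity is the ratio of compute operations to communication cost per grain: the larger l, o and g are in processor operations, the coarser the grain a machine needs to stay efficient.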

21 Granularity Nomograph [figure]

22 [figure; no text recovered]

23 Economics of Packaged Software

Platform            Cost       Leverage   # copies
MPP                 >100K      1          ~1-10 copies
Minis, mainframe    10-100K    10-100     ~1000s copies
  (also, evolving high performance multiprocessor servers)
Workstation         1-100K     1-10K      1-100K copies
PC                  25-500     50K-1M     1-10M copies

24 Chuck Seitz comments on multicomputers
"I believe that the commercial, medium grained multicomputers aimed at ultra-supercomputer performance have adopted a relatively unprofitable scaling track, and are doomed to extinction.... they may as Gordon Bell believes be displaced over the next several years by shared memory multiprocessors.... For loosely coupled computations at which they excel, ultra-super multicomputers will, in any case, be more economically implemented as networks of high-performance workstations connected by high-bandwidth, local area networks..."

25 Convergence to a single architecture with a single address space that uses a distributed, shared memory
- limited ( > scalable multiprocessors
- workstations with 1-4 processors >> workstation clusters & scalable multiprocessors
- workstation clusters >> scalable multiprocessors
- State Computers built as message passing multicomputers >> scalable multiprocessors

26 Convergence to one architecture: mPs continue to be the main line

27 Re-engineering HPCS
- Genetic engineering of computers has not produced a healthy strain that lives more than one 3-year computer generation. Hence no app base can form.
  - No inter-generational MPPs exist with compatible networks & nodes.
  - All parts of an architecture must scale from generation to generation!
  - An architecture must be designed for at least three 3-year generations!
- High price to support a DARPA U. to learn computer design -- the market is only $200 million and R&D is billions -- competition works far better
- Inevitable movement of standard networks and nodes can or need not be accelerated; these best evolve by a normal market mechanism driven by users
- Dual use of Networks & Nodes is the path to widescale parallelism, not weird computers
  - Networking is free via ATM
  - Nodes are free via in situ workstations
  - Apps follow pervasive computing environments
- Applicability was small and getting smaller very fast, with many experienced computer companies entering the market with fine products, e.g. Convex/HP, Cray, DEC, IBM, SGI & SUN, that are leveraging their R&D, apps, apps, & apps
- Japan has a strong supercomputer industry. The more we jeopardize ours by mandating use of weird machines that take away from use, the weaker it becomes.
- MPP won, mainstream vendors have adopted multiple CMOS. Stop funding!
  - environments & apps are needed, but are unlikely because the market is small

28 Recommendations to HPCS
- Goal: By 2000, massive parallelism must exist as a by-product that leverages a widescale national network & workstation/multi HW/SW nodes
- Dual use not duel use of products and technology, or the principle of "elegance" -- one part serves more than one function
  - network companies supply networks, node suppliers use ordinary workstations/servers with existing apps will leverage $30 billion x 10**6 R&D
- Fund high speed, low latency networks for a ubiquitous service as the base of all forms of interconnections from WANs to supercomputers (in addition, some special networks will exist for small grain probs)
- Observe heuristics in future federal program funding scenarios... eliminate direct or indirect product development and mono-use computers
  - Fund Challenges who in turn fund purchase, not product development
  - Funding or purchase of apps porting must be driven by Challenges, but build on binary compatible workstation/server apps to leverage nodes & be cross-platform based to benefit multiple vendors & have cross-platform use
- Review effectiveness of State Computers, e.g., need, economics, efficacy
  - Each committee member might visit 2-5 sites using a >>// computer
  - Review // program environments & the efficacy to produce & support apps
- Eliminate all forms of State Computers & recommend a balanced HPCS program: nodes & networks; based on industrial infrastructure
  - stop funding the development of mono computers, including the 10 Tflop
  - it must be acceptable & encouraged to buy any computer for any contract

29 Gratis advice for HPCC* & BS*
- D. Bailey warns that scientists have almost lost credibility....
- Focus on Gigabit NREN with low overhead connections that will enable multicomputers as a by-product
- Provide many small, scalable computers vs large, centralized
- Encourage (revert to) & support not so grand challenges
- Grand Challenges (GCs) need explicit goals & plans -- disciplines fund & manage (demand side)... HPCC will not
- Fund balanced machines/efforts; stop starting Viet Nams
- Drop the funding & directed purchase of state computers
- Revert to university research -> company & product development
- Review the HPCC & GCs program's output...
*High Performance Cash Conscriptor; Big Spenders

30 Disclaimer
This talk may appear inflammatory... i.e. the speaker may have appeared "to flame". It is not the speaker's intent to make ad hominem attacks on people, organizations, countries, or computers... it just may appear that way.

31 Scalability: The Platform of HPCS
- The law of scalability
- Three kinds: machine, problem x machine, & generation (t)
- How do flops, memory size, efficiency & time vary with problem size?
- What's the nature of problems & work for the computers?
- What about the mapping of problems onto the machines?
