Slide 1 - 10.05.2015 The difference is in the software... From Beowulf to professional turn-key solutions Einar Rustad - VP Business Development.


1 Slide The difference is in the software... From Beowulf to professional turn-key solutions Einar Rustad - VP Business Development

2 Slide The difference is in the software... Outline Scali Background Clustering Rationale Scali Products Technology

3 Slide The difference is in the software... History and Facts at a glance History: Based on a development project for High Performance SAR Processing (Military), (concurrently with the Beowulf project at NASA) Spin-off from Kongsberg Gruppen ASA, 1997 Organisation: 30 Employees Head Office in Oslo, branch in Houston, sales offices in Germany, France, UK Main Owners Four Seasons Venture, SND Invest, Kongsberg, Intel Corp., Employees

4 Slide The difference is in the software... Paderborn, PC² 1998: PSC2 12 x 8 Torus 192 Processors P-3, 450MHz 86.4GFlops 1997: PSC1 8 x 4 Torus 64 Processors P-3, 300MHz 19.2GFlops
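
The GFlops figures quoted for PSC1 and PSC2 are consistent with a simple peak estimate of processors times clock rate. The sketch below reproduces them under the assumption of one floating-point operation per clock cycle; that assumption is not stated on the slide, and the function name is only illustrative.

```python
def peak_gflops(processors: int, clock_mhz: float, flops_per_cycle: float = 1.0) -> float:
    """Theoretical peak in GFlops: processors x clock (MHz) x flops per cycle.

    flops_per_cycle=1.0 is an assumption chosen because it matches the slide's totals.
    """
    return processors * clock_mhz * flops_per_cycle / 1000.0


for name, procs, mhz in [("PSC1", 64, 300), ("PSC2", 192, 450)]:
    print(f"{name}: {peak_gflops(procs, mhz):.1f} GFlops")
```

These are peak numbers only; sustained application performance is normally lower.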

5 Slide The difference is in the software... A Major Software Challenge

6 Slide The difference is in the software... Increasing Performance Faster Processors Frequency Instruction Level Parallelism Better Algorithms Compilers Brainpower Parallel Processing Compilers Tools (Profilers, Debuggers) More Brainpower

7 Slide The difference is in the software... Clusters vs SMPs Use of SMPs Common Access to Shared Resources Processors Memory Storage Devices Running Multiple Applications Running Multiple Instances of the Same Application Running Parallel Applications Use of Clusters Common Access to Shared Resources Processors Distributed Memory Storage Devices Running Multiple Applications Running Multiple Instances of the Same Application Running Parallel Applications

8 Slide The difference is in the software... Why SMPs don't scale [Diagram labels: CPU, CPU, CPU, CPU, I/O, Memory: "This is an SMP". CPU, Memory, Cache Coherent Interconnect, L-3 Cache: "This is NOT an SMP..."] When CPUs cycle at 1GHz and Memory latency is >100ns, a 1% Cache Miss rate implies <50% CPU Efficiency. But, with sufficiently SLOW Processors, the problem may not be that bad after all…
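
The 50% figure can be checked with a back-of-the-envelope stall model. The sketch below assumes a base rate of one instruction per cycle, one memory reference per instruction, and no overlap between misses and computation; none of these assumptions appear on the slide, and the function name is hypothetical.

```python
def cpu_efficiency(clock_ghz: float, miss_latency_ns: float, miss_rate: float,
                   base_cpi: float = 1.0) -> float:
    """Fraction of cycles doing useful work under a simple stall model.

    Assumes every miss stalls the CPU for the full memory latency.
    """
    miss_penalty_cycles = miss_latency_ns * clock_ghz      # ns x cycles/ns
    effective_cpi = base_cpi + miss_rate * miss_penalty_cycles
    return base_cpi / effective_cpi


# 1 GHz CPU, 100 ns memory latency, 1% miss rate: half the cycles are stalls.
print(cpu_efficiency(1.0, 100.0, 0.01))
# A "sufficiently slow" 100 MHz CPU loses far less to the same memory.
print(cpu_efficiency(0.1, 100.0, 0.01))
```

With the latency fixed in nanoseconds, the miss penalty measured in cycles grows with clock rate, which is the point the slide makes about slow processors.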

9 Slide The difference is in the software... Why Do We Need SMPs? Small SMPs make Great Nodes for building Clusters! The most Cost-Effective Cluster Node is a Dual Processor SMP High Volume Manufacturing High Utilization of Shared Resources –Bus –Memory –I/O

10 Slide The difference is in the software... Clustering makes Mo(o)re Sense Microprocessor Performance Increases 50-60% per Year 1 year lag: 1.0 SHV Unit = 1.6 Proprietary Units 2 year lag: 1.0 SHV Unit = 2.6 Proprietary Units Volume Disadvantage When Volume Doubles, Cost is reduced to 90% 1,000 Proprietary Units vs 1,000,000 SHV units => Proprietary Unit 3 X more Expensive 2 years lag and 1:1000 Volume Disadvantage => 7 X Worse Price/Performance
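
The numbers on this slide follow from two compounding rules: performance grows by a fixed factor per year, and cost falls to 90% each time volume doubles. The sketch below works them through, assuming 60% annual growth (the upper end of the slide's range) and the 1,000 versus 1,000,000 unit volumes; the function names are illustrative only.

```python
import math


def performance_lag(years: float, annual_growth: float = 0.6) -> float:
    """How many older units equal one current unit after `years` of lag."""
    return (1.0 + annual_growth) ** years


def cost_ratio(low_volume: float, high_volume: float, learning: float = 0.9) -> float:
    """Unit cost of the low-volume part relative to the high-volume part,
    assuming cost falls to `learning` (90%) per doubling of volume."""
    doublings = math.log2(high_volume / low_volume)
    return 1.0 / learning ** doublings


lag_2y = performance_lag(2)                  # about 2.6
volume = cost_ratio(1_000, 1_000_000)        # about 3
print(f"2 year lag: {lag_2y:.2f}x, volume: {volume:.2f}x, combined: {lag_2y * volume:.1f}x")
```

Under these assumptions the combined price/performance penalty comes out at roughly 7x, which matches the slide's headline figure for a thousandfold volume difference.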

11 UNCLASSIFIED RR Defence(E), Installation Engineering, Bristol François Moyroud February 14, 2001 Hardware acquisition - massive savings! Proposed HPC platforms: very cost effective! EDS platforms: high-cost Savings made: - Non-EDS compute server acquired by IE (Alpha/PC cluster with 24 proc.): £75k - EDS solution with 24 proc. (SGI Origin 2000): £300k - Savings: £225k - EDS solution with same computing power (SGI Origin 2000): £1.2M - Savings: £1.1M Fan Systems (Bristol) compute server 1.0

12 Slide The difference is in the software... Software Focal Points High Performance Communication ScaMPI –Record Performance (>380MB/s, <4µs Latency) –Rich set of functions –Debugging options (trace file generation) –Performance tuning with high precision timers –Easy to use Cluster Management Scali Universe –Single System Environment –Remote Operation –Job Scheduling –Monitoring –Alarms
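
Latency and bandwidth figures like these are often combined in a first-order model in which transfer time is latency plus message size divided by bandwidth. The sketch below applies that model to the slide's bounds (4 µs, 380 MB/s, with 1 MB taken as 10^6 bytes). The model is an assumption for illustration, not a description of how ScaMPI behaves internally.

```python
def transfer_time_us(size_bytes: float, latency_us: float = 4.0,
                     bandwidth_mb_s: float = 380.0) -> float:
    """Estimated one-way transfer time in microseconds.

    bytes / (MB/s) yields microseconds when 1 MB = 1e6 bytes.
    """
    return latency_us + size_bytes / bandwidth_mb_s


def effective_bandwidth_mb_s(size_bytes: float, latency_us: float = 4.0,
                             bandwidth_mb_s: float = 380.0) -> float:
    """Delivered bandwidth for a single message of the given size."""
    return size_bytes / transfer_time_us(size_bytes, latency_us, bandwidth_mb_s)


def half_performance_size(latency_us: float = 4.0, bandwidth_mb_s: float = 380.0) -> float:
    """Message size (bytes) at which half the peak bandwidth is delivered."""
    return latency_us * bandwidth_mb_s


for size in (64, 1520, 65536):
    print(size, "bytes ->", round(effective_bandwidth_mb_s(size), 1), "MB/s")
```

In this model short messages are dominated by latency, which is why low latency matters as much as peak bandwidth for fine-grained parallel codes.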

13 Slide The difference is in the software... What Kind of Interconnect? The choice of cluster interconnect depends entirely on the Applications and the size of the Cluster Ethernet - Long Latency - Low Bandwidth –Poor Scalability –Embarrassingly Parallel codes SCI - Ultra-low Latency - High Bandwidth –Good Scalability –All kinds of parallel applications

14 Slide The difference is in the software... Channel Bonding option High Performance Interconnect: Torus Topology IEEE/ANSI std SCI 667MBytes/s/segment/ring Shared Address Space System Interconnects Maintenance and LAN Interconnect: 100Mbit/s or Gigabit Ethernet

15 Slide The difference is in the software... 3-D Torus Topology PSB LC-3 PCI-bus B-Link LC-3 SCI Rings Distributed Switching: LC-3 X Y Z

16 Slide The difference is in the software... 2D/3D Torus (D33X) PCI 532MB/s LC-3 PSB66 B-Link 640MB/s 6x 667MB/s SCI links LC-3

17 Slide The difference is in the software... Shared Nothing Data Transfers Application System Host Memory User Data System Buffer Network Adapter

18 Slide The difference is in the software... Application System Host Memory User Data System Buffer Network Adapter Network Adapter Host Memory User Data System Buffer Application System Shared Address Space Data Transfers

19 Slide The difference is in the software... A2A Scalability with 66MHz/64bits PCI

20 Slide The difference is in the software... Scalability

21 Slide The difference is in the software... Fluent Benchmarks (Performance Metric is “Jobs per Day”)
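
A "Jobs per Day" rating is conventionally the number of back-to-back benchmark runs that fit into 24 hours, that is, 86,400 seconds divided by the wall-clock time of one run. The sketch below shows the conversion; the 30-minute run time is a made-up input, not a measured result from the presentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_day(wall_clock_seconds: float) -> float:
    """Number of sequential benchmark runs that fit into one day."""
    if wall_clock_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return SECONDS_PER_DAY / wall_clock_seconds


# Hypothetical example: a run that takes 30 minutes of wall-clock time.
print(jobs_per_day(30 * 60))
```

Higher is better, and halving the run time doubles the rating.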

22 Slide The difference is in the software... Performance with Fluent A benchmark from the Transportation Industry Problem is partitioned along the principal axes and consists of about 4.5 million cells Dual Intel Xeon 2.0GHz, Intel 860 chip-set, 3GB RDRAM per node Fluent 5.7

23 Slide The difference is in the software... Performance with Fluent (cont’d) NEW YORK CITY, NY -- September 25, Sun Microsystems, Inc. (Nasdaq: SUNW) today announced that the newly announced 900 MHz, 72-way Sun Fire™ 15K server outperformed a 1 GHz, 128-way IBM system by over 23 percent on the large FL5L1 dataset, using FLUENT™, a leading Computational Fluid Dynamics (CFD) software application. ARMONK, NY, October 26, 2001—The server p690, code-named "Regatta," today set a world record for processing speed on the important Fluent engineering benchmark, providing nearly 80 percent more power than the new Sun Fire 15K, which has twice as many processors and is nearly double the price.

24 Slide The difference is in the software... Performance with Fluent (cont’d) FL5L3 - Turbulent Flow Through a Transition Duct Scali TeraRack, 96 CPUs IBM “Regatta” 32 CPUs Sun SunFire 15K, 72 CPUs

25 Slide The difference is in the software... Performance with Fluent (cont’d) Relative Performance/Price ratio (running FL5L3 benchmark):

26 Slide The difference is in the software... OpenMP and ScaMPI Clustered SMPs have BOTH shared memories and distributed memories MPI is well suited as a parallelization paradigm between cluster nodes OpenMP, POSIX threads, and SMP parallelized libraries are well suited to exploit parallelism within an SMP node Combining the multithread safe ScaMPI with OpenMP provides higher application performance MPI Only: One MPI process per CPU OpenMP and MPI: One MPI process per SMP node Use OMP_NUM_THREADS to control #CPUs used Effects: Fewer processes Fewer messages exchanged Message size increased Expectation: Improved performance
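
The effects listed above (fewer processes, fewer but larger messages) can be illustrated by counting what an all-to-all exchange looks like in each mode. The sketch below is a counting model only: it assumes every CPU pair exchanges the same number of bytes and that the hybrid mode aggregates all traffic between two nodes into one message per direction. The function and dictionary keys are illustrative, not part of ScaMPI.

```python
def all_to_all_plan(nodes: int, cpus_per_node: int, bytes_per_cpu_pair: int,
                    hybrid: bool) -> dict:
    """Count processes, messages and message size for one all-to-all exchange."""
    processes = nodes if hybrid else nodes * cpus_per_node
    threads = cpus_per_node if hybrid else 1    # value for OMP_NUM_THREADS
    return {
        "mpi_processes": processes,
        "OMP_NUM_THREADS": threads,
        "messages": processes * (processes - 1),
        # Each hybrid message carries the data of threads x threads CPU pairs.
        "message_bytes": bytes_per_cpu_pair * threads * threads,
    }


for hybrid in (False, True):
    label = "OpenMP and MPI" if hybrid else "MPI only"
    print(label, all_to_all_plan(nodes=8, cpus_per_node=2,
                                 bytes_per_cpu_pair=1024, hybrid=hybrid))
```

For 8 dual-CPU nodes this model gives 240 messages of 1 KB in MPI-only mode against 56 messages of 4 KB in hybrid mode. Whether that translates into better performance depends on the application, which is why the slide states it as an expectation.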

27 Slide The difference is in the software... NPB MG (MultiGrid) Class B OS: Linux Memory:512MB CPU:Dual 800 MHz PIII

28 Slide The difference is in the software... Scalability of MM5 The benchmark was run remotely on the “Upsand” cluster at UCSC by Dr. Ole W. Saastad (Scali) in June 2002

29 Slide The difference is in the software... ClusterEdge™ Universe XE SCI Interconnect HW High Performance Communication Libraries ScaMPI Scali IP Shmem Scali SAN

30 Slide The difference is in the software... Platform Support Operating systems RH 6.2, 7.0, 7.1, 7.2 SuSE 7.0, 7.1 Solaris 8 Architectures: ia32 (PII, PIII, P4, AMD Athlon, Athlon MP) ia64 (Itanium) Alpha (EV6, EV67, EV68) SPARC (UltraSPARC III) x86 chipsets: 440LX, 440BX, 440GX, i840, i850, i860 VIA Apollo Pro 133A Serverworks LE, HE, WS (HE–SL) Itanium chipset: Intel 460GX, HP ZX 1 Athlon chipsets: VIA KX133, VIA KT133, AMD 750, AMD760, AMD760MP, AMD760MPX Alpha chipsets: Tsunami/Typhoon SCI boards: Dolphin D311/D312, D315, D316 Dolphin D33X series

31 Slide The difference is in the software... Universe Architecture Cluster Nodes Control Node (Frontend)GUI SCI Remote WorkstationGUI C S TCP/IP Socket Server daemon Node daemon TCP/IP Socket

32 Slide The difference is in the software... Scali Universe Multiple Cluster Management scila Common Login per Cluster Individual or Grouped Node Management

33 Slide The difference is in the software... Software Configuration Management Nodes are categorized once. From then on, new software is installed by one mouse Click, or with a single command.

34 Slide The difference is in the software... System Monitoring Resource Monitoring CPU Memory Disk Hardware Monitoring Temperature Fan Speed Operator Alarms on selected Parameters at Specified Thresholds

35 Slide The difference is in the software... Monitoring contd. Sam Sam

36 Slide The difference is in the software... Events/Alarms

37 Slide The difference is in the software... OpenPBS integration

38 Slide The difference is in the software... Fault Tolerance 2D Torus topology more routing options XY routing algorithm Node 33 fails (3) Nodes on 33’s ringlets become unavailable Cluster fractured with current routing setting
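
The failure scenario can be made concrete with a small model of a 2D torus in which every row and every column is one ringlet. The sketch below assumes a 4 x 4 torus with 1-based (x, y) coordinates, reads "Node 33" as (3, 3), and assumes that a failed node takes both of its ringlets out of service. XY routing is modelled as travelling along the source's row first and then along the destination's column. All names are hypothetical.

```python
def nodes_on_ringlets(failed, size_x, size_y):
    """Healthy nodes that share a row or column ringlet with the failed node."""
    fx, fy = failed
    row = {(x, fy) for x in range(1, size_x + 1)}
    column = {(fx, y) for y in range(1, size_y + 1)}
    return sorted((row | column) - {failed})


def xy_route_blocked(src, dst, failed):
    """True if the fixed X-then-Y route needs a ringlet broken by the failed node."""
    (sx, sy), (dx, dy), (fx, fy) = src, dst, failed
    uses_row = sx != dx        # X leg runs along the source's row ringlet
    uses_column = sy != dy     # Y leg runs along the destination's column ringlet
    return (uses_row and fy == sy) or (uses_column and fx == dx)


failed = (3, 3)
print("on failed ringlets:", nodes_on_ringlets(failed, 4, 4))
print("(1,3) -> (2,1) blocked:", xy_route_blocked((1, 3), (2, 1), failed))
print("(1,1) -> (2,2) blocked:", xy_route_blocked((1, 1), (2, 2), failed))
```

In this model the route from (1, 3) to (2, 1) is blocked only because XY routing insists on using row 3 first; taking the column first would avoid the broken ringlet, which is the kind of extra routing freedom a 2D torus offers.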

39 Slide The difference is in the software... Fault Tolerance Scali advanced routing algorithm: From the Turn Model family of routing algorithms All nodes but the failed one can be utilised as one big partition

40 Slide The difference is in the software... Some Reference Customers Max-Planck-Institut für Plasmaphysik, Germany University of Alberta, Canada University of Manitoba, Canada Etnus Software, USA Oracle Inc., USA University of Florida, USA deCODE Genetics, Iceland Uni-Heidelberg, Germany GMD, Germany Uni-Giessen, Germany Uni-Hannover, Germany Uni-Düsseldorf, Germany Linux NetworX, USA Magmasoft AG, Germany University of Umeå, Sweden University of Linköping, Sweden PGS Inc., USA US Naval Air, USA Spacetec/Tromsø Satellite Station, Norway Norwegian Defense Research Establishment Parallab, Norway Paderborn Parallel Computing Center, Germany Fujitsu Siemens computers, Germany Spacebel, Belgium Aerospatiale, France Fraunhofer Gesellschaft, Germany Lockheed Martin TDS, USA University of Geneva, Switzerland University of Oslo, Norway Uni-C, Denmark University of Lund, Sweden University of Aachen, Germany DNV, Norway DaimlerChrysler, Germany AEA Technology, Germany BMW AG, Germany Audi AG, Germany University of New Mexico, USA

41 Slide The difference is in the software... Reference Customers cont'd Rolls Royce Ltd., UK Norsk Hydro, Norway NGU, Norway University of Santa Cruz, USA Jodrell Bank Observatory, UK NTT, Japan CEA, France Ford/Visteon, Germany ABB AG, Germany National Technical University of Athens, Greece Medasys Digital Systems, France PDG Linagora S.A., France Workstations UK, Ltd., England Bull S.A., France The Norwegian Meteorological Institute, Norway Nanco Data AB, Sweden Aspen Systems Inc., USA Atipa Linux Solution Inc., USA California Institute of Technology, USA Compaq Computer Corporation Inc., USA Fermilab, USA Ford Motor Company Inc., USA General Dynamics Inc., USA Intel Corporation Inc., USA Iowa State University, USA Los Alamos National Laboratory, USA Penguin Computing Inc., USA Times N Systems Inc., USA University of Alberta, Canada Monash University, Australia University of Southern Mississippi, USA Jacusiel Acuna Ltda., Chile University of Copenhagen, Denmark Caton Sistemas Alternativos, Spain Mapcon Geografical Inform, Sweden Fujitsu Software Corporation, USA City Team OY, Finland Falcon Computers, Finland Link Masters Ltd., Holland MIT, USA Paralogic Inc., USA Sandia National Laboratory, USA Sicorp Inc., USA University of Delaware, USA Western Scientific Inc., USA Group of Parallel and Distr. Processing, Brazil

42 Slide The difference is in the software... Conclusions Industrial Users want ISV Applications Single Point of Contact Ease-of-Use Support Lower TCO, not just low Cost Short deployment time Focus on their own areas of expertise, not on being computer companies

43 Slide The difference is in the software... End of Presentation Backup Slides

44 Slide The difference is in the software... SCI vs. Myrinet 2000 All benchmarks conducted by The Numerically Intensive Computing Group at Penn State's Center for Academic Computing, Machines: Dual 1GHz PIII with ServerWorks HE-SL Myrinet setup: GM and MPI-GM (with everything such as directcopy and shared memory transfers enabled) SCI setup: SSP Observations: Myrinet’s eager protocol was broken, and Scali had to change its copyright on the “bandwidth” program to help Myricom debug their protocol. Hence, only ping-pong numbers are reported.

45 Slide The difference is in the software... SCI vs. M2K: Ping-Pong comparison

46 Slide The difference is in the software... ScaMPI All-to-all & Barrier Machine type: i686 Operating system: Linux smp Memory:512MB CPU Type:2 x Pentium III (Coppermine) CPU Frequency: PCI bridge: Relience Computer CNB20HE (rev01)

47 Slide The difference is in the software... Ping-ping Bandwidth

48 Slide The difference is in the software... Ping-ping Latency

49 Slide The difference is in the software... Itanium vs T3E Bandwidth

50 Slide The difference is in the software... Itanium vs T3E Latency

51 Slide The difference is in the software... Scali MPI - Unique Features Fault Tolerant High Bandwidth Low Latency Multi-Thread safe and hot Simultaneous Inter-/Intra-node operation UNIX command line replicated Exact message size option Manual/debugger mode for selected processes Explicit host specification Job queuing PBS, DQS, LSF, CCS, NQS, Maui Conformance to MPI-1.2 verified through 1665 MPI tests

52 Slide The difference is in the software... MM5 on the Scali TeraRack Target: 90 minutes to complete a 3 and 1 km run for the Oslo area Source: Operational Forecasting on the SGI Origin 3800 and Linux Clusters, Roar Skålin, Norwegian Meteorological Institute CAS 2001, Annecy,

53 Slide The difference is in the software... Application Segments Primary market segments are: Automotive, Aerospace and Maritime Industry –Simulation –Computational Fluid Dynamics –Imaging Oil and Gas –Geophysical/Seismic Data Processing –Data Acquisition –Reservoir Simulation Life Sciences –Genomics –Proteomics Commercial Data Base (Oracle)

54 Slide The difference is in the software... Fluent is ported to SCI/ScaMPI Fluent 5.7 (6.0 to come) has support for ScaMPI. Up to over 2x increase in performance compared with ethernet based clusters. Unparalleled performance for short jobs.

55 Slide The difference is in the software... Fluent queue integration

56 Slide The difference is in the software... Cost of running Fluent jobs

57 Slide The difference is in the software... Partner Model Consulting Cluster systems 1st line support Scali Software (OEM) 2nd line support Benefits: Closeness in space and time Excellent support Expert assistance for cluster application development, porting and tuning Example: Code that was assumed “unoptimizable” by the customer was improved by 85% in 7 days Scali Reseller Customer

58 Slide The difference is in the software... Partners and ISVs

59 Slide The difference is in the software... Quote from the beowulf mailing list 26 Sep 2000: Q: Just curious if anyone has any hands-on experience with Dolphin's "Wulfkit" networking product and software? Like, how easy is it to install/maintain/get up-and-running? (from Walt Dabell) A: Yep, I have 20 dual processor (PIII 550's) connected with Wulfkits in a 5x4 matrix. In terms of installation and maintenance, it's a breeze. Once all of the cards are installed they provide a script that installs all of the software on all the nodes, and configures all of the cards, installs the licensing and tests the whole system to see that it works. Believe it or not, this whole procedure only takes about 15 minutes on my 20 node cluster. (Reply from "Jerome, Ron")

60 Slide The difference is in the software... Seismic Data Processing “Scali has successfully integrated and delivered in record time complex clusters for both onshore and offshore processing of seismic data to PGS. The system requirements included different communication and connections to IBM 3590 tape drives. Scali´s middleware has proven to be very robust and a key contributor to the overall success of the deployments. In an organization that has both onshore and offshore facilities spanning all continents. Scali Universe, a comprehensive system for secure remote maintenance on offshore systems and vice versa. I am convinced that Scali´s products and expertise were of vital importance to the overall success of these deployments.” Chris Usher, President, PGS Data Processing

61 Slide The difference is in the software... Traditional Shared Nothing Communication Architecture Machine A: Appl Socket TCP IP MAC Machine B: Appl Socket TCP IP MAC Protocol Entities embedded in Packets

62 Slide The difference is in the software... SCI - Shared Address Space Architecture CPU User Instructions CPU User Instructions Memory Operations in a Packet Switched Network User Mem

63 Slide The difference is in the software... Moore's Law Gordon Moore, Founder of Intel: 1965: Device Density will Double every 12 months 1975: Device Density will Double every 18 months

64 Slide The difference is in the software... Why Clustering Scaling of Resources Sharing of Resources Best Price/Performance Ratio (PPR) PPR is Constant with Growing System Size Standard High Volume (SHV) components Excellent TCO Flexibility High Availability Fault Resilience

65 Slide The difference is in the software... Technology Improvement Rates

66 Slide The difference is in the software... Application Areas ASP´s Interconnect PC Technology Linux OS Scali Software Basic Technologies Scalable Systems HPC Servers ISP´s E-commerce/ Databases

67 Slide The difference is in the software... Myth #1 “You can’t run real applications on PCs, because they don’t have enough memory bandwidth.”

68 Slide The difference is in the software... 2D double complex FFT

69 Slide The difference is in the software... Myth #2 “I agree that the second CPU on a PC doesn’t cost much, but I can’t use it because the PC has too little memory bandwidth.”

70 Slide The difference is in the software... Dual CPU speedup (double complex FFT)

71 Slide The difference is in the software... Myth #3 “This code can’t be optimized any further. This code has been tweaked on for years and we only achieved a few percent improvement.”

72 Slide The difference is in the software... PSTM-kernel Optimizations Dual Pentium III Coppermine 733MHz (256KB L2 cache), 256MB registered PC100 SDRAM,IDE/33 QUANTUM FIREBALLlct08 17GB. Mod1: Selection of optimal compiler and compiler switches Mod2: Rewrite inner loop to take advantage of pre-fetch and non-cache-polluting stores Mod3: Use SSE (Streaming SIMD Extension). Not yet performed.

73 Slide The difference is in the software... Scali Software Platform - SSP Scali Universe Cluster Management Scali High Performance Communication Libraries ScaMPI Scali IP Scali SAN Shmem Scali High Performance Computing Libraries

74 Slide The difference is in the software... Scali Universe Interconnect Independent Basic Cluster Management Functionality Platform Support Ia32, Ia64, AMD, Alpha, Sparc Linux, Solaris GUI or Command Line control Remote Secure Operation using encrypted communication protocol

75 Slide The difference is in the software... Scali Universe XE Interconnect Independent Platform Support Ia32, Ia64, AMD, Alpha, Sparc Linux, Solaris Full set of Cluster Management Tools GUI or Command Line control Remote Secure Operation using encrypted communication protocol

76 Slide The difference is in the software... Value Proposition Applications OS Hardware Scali Software Scali Expertise Products Scali Universe™ Cluster Management High Performance Communication Libraries –ScaMPI – high-performance MPI implementation Latency: 3.6 µs, Bandwidth 385 MBytes/s Core Values –Computer Architecture –Processor and Communication Hardware –Software Design and Development –Parallelization –System integration and packaging

77 Slide The difference is in the software... Scali Products Universe Cluster Management Independent of Cluster Size Independent of Physical Interconnect Supports Intel and Alpha platforms with Linux Supports Sparc platform with Solaris High Performance Libraries for SCI Interconnects From 4 to 256 nodes Supports Intel and Alpha platforms with Linux Supports Sparc platform with Solaris

78 Slide The difference is in the software... Cluster Nodes Disk or Diskless Nodes Disk: OS-Image, Swap, Scratch Diskless: Booting from Network, OS-Image residing on Front-End File Systems Distributed over NFS PVFS over IP

79 Slide The difference is in the software... Execution Environment System Image Single System Image at Application Level Each Node runs one instance of the OS Job Scheduling Open PBS Integration in Scali Universe –Also hooks for DQS, LSF, CCS, NQS, Maui All Remote Operational

80 Slide The difference is in the software... Hardware Monitoring

81 Slide The difference is in the software... Top-level Management Menus

82 Slide The difference is in the software... Installation Scali “Nebula” Front-end Network Configuration Starting Scali OS-Install Generates “savestate” for SSP-Install SSP-Installer Requires a few parameters from the operator Automatically Installs All Scali Packages

83 Slide The difference is in the software... Diagnostics Native Cluster Tests (ScaDiag) Memory CPUs Temperature (“CPU-Burn”) SCI Torus Interconnect Link Diagnostics

