1 © 2005 IBM Essential Overview Louisiana Tech University Ruston, Louisiana Charles Grassl IBM January, 2006

2 2 © 2005 IBM Corporation Agenda Hardware Software Documentation

3 3 © 2005 IBM Corporation Hardware Overview Processors: Nodes: Clusters:

4 4 © 2005 IBM Corporation Product Naming
New Name  Old Name          Market      Processor
iSeries   AS/400            Commercial  RS64
pSeries   RS/6000, SP, SP2  Technical   POWER3, POWER4, POWER5
xSeries   IA-32, IA-64      Server      Xeon, AMD
zSeries   ES/9000           Mainframe   RS64

5 5 © 2005 IBM Corporation Processor Progression
Processor  Years        Clock Rate     Feature
POWER2     1990 – 1994  20 – 60 MHz    RISC
P2SC       1994 – 1998  60 – 150 MHz   Bandwidth
POWER3     1998 – 2002  200 – 450 MHz  Single Chip
POWER4     2001 – 2005  1 – 1.9 GHz    Dual Core
POWER5     2004 –       1.5 – 1.9 GHz  Multi-Thread

6 6 © 2005 IBM Corporation POWER5 Systems POWER5 processors Single and Dual processor chips Modules Dual Chip Modules (DCM) Multi Chip Modules (MCM) Nodes Multiple modules p5-575 p5-595 Cluster Multiple nodes Connected with High Performance Switch (HPS)

7 7 © 2005 IBM Corporation Systems (“Nodes”)
Model   Processors  Clock Rate (GHz)  Memory (x 2^30 byte)
p5-595  16-64       1.65, 1.9         2000
p5-590  8-32        1.65, 1.9         1000
p5-575  8           1.5, 1.9          256
p5-570  2-16        1.65, 1.9         512
p5-550  2-4         1.65              64
p5-520  2           1.65              32
p5-510  1, 2        1.65              1 - 32

8 8 © 2005 IBM Corporation POWER5 Processor Systems [Diagram labels: Processor, Chip, DCM, MCM, p5-575, p5-595, Cluster]

9 9 © 2005 IBM Corporation Cluster 1600 Multi Processor Nodes Physical View Logical View Network, Disk System

10 10 © 2005 IBM Corporation Local System Name IBM p5-575 nodes 1.9 GHz POWER5 processors Single processor chips 8 processors per node HPS interconnect “575” distinction: Dual Chip Module (DCM) 8 DCMs One or two processors per chip Single Core (SC) Dual Core (DC) “595” distinction: Multi Chip Module (MCM) construction 8 MCMs

11 11 © 2005 IBM Corporation POWER5 Processors Multi-processor chip High clock rate: Multiple GHz Three cache levels Bandwidth Latency hiding Shared Memory Large memory size

12 12 © 2005 IBM Corporation POWER5 Features Private L1 cache Shared L2 cache Shared L3 cache Interleaved memory Hardware Prefetch Multiple Page Size support

13 13 © 2005 IBM Corporation Processor Characteristics High frequency clocks Deep pipelines High asymptotic rates Superscalar Speculative out-of-order instructions Up to 8 outstanding cache line misses Large number of instructions in flight Branch prediction Hardware Prefetching

14 14 © 2005 IBM Corporation Processor Features
                    POWER4               POWER5
Clock               1.0 – 1.9 GHz        1.5 – 1.9 - … GHz
Caches              Three levels         Three levels
L3 Speed            1/3 clock frequency  ½ clock frequency
Virtualization      Up to 32 partitions  Up to 254 partitions
Partitions          Unit processor       Fractional
Power Mgmt.         Static               Dynamic
Thread Execution    Single Thread        Multi Threading
Memory Store        Single Buffer        Double Buffer
Renaming Registers  GP: 72, FP: 80       GP: 120, FP: 120

15 15 © 2005 IBM Corporation Caches and Memory
                  POWER4                 POWER5
L1 Cache          Data: 32 kbyte         Data: 32 kbyte
                  Instruction: 64 kbyte  Instruction: 64 kbyte
                  2-way Assoc., FIFO     4-way Assoc., LRU
L2 Cache          1.5 Mbyte              1.9 Mbyte
                  8-way Assoc., FIFO     10-way Assoc., LRU
L3 Cache          32 Mbyte               36 Mbyte
                  8-way Assoc., LRU      12-way Assoc., LRU
                  120 Cycles             ~80 Cycles
Memory Bandwidth  4 Gbyte/s / Chip       16 Gbyte/s / Chip

16 16 © 2005 IBM Corporation POWER4 – POWER5 Comparison
                                   POWER4+  POWER5
Frequency (GHz)                    1.7      1.9
L2 Latency (Cycles)                12       12
L3 Latency (Cycles)                120      80
Memory Latency (Cycles)            351      220
Copy Bandwidth 4 proc. (Gbyte/s)   8        18
Linpack Rate N=1000 (Gflop/s)      3.9      5.6
SPECint_base2000                   1077     1398
SPECfp_base2000                    1598     2576
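The latencies on this slide are quoted in clock cycles, and the two processors run at different clock rates, so a comparison in elapsed time needs a division by frequency. The following is a minimal sketch in Python that uses only the figures from the comparison slide; the names `cycles_to_ns` and `LATENCY_CYCLES` are illustrative and are not part of any IBM tool.

```python
def cycles_to_ns(cycles, freq_ghz):
    """Convert a latency in clock cycles to nanoseconds.

    One cycle at f GHz lasts 1/f nanoseconds, so the conversion
    is a plain division.
    """
    return cycles / freq_ghz


# Figures read from the POWER4 - POWER5 comparison slide.
LATENCY_CYCLES = {
    "POWER4+": {"freq_ghz": 1.7, "L3": 120, "memory": 351},
    "POWER5": {"freq_ghz": 1.9, "L3": 80, "memory": 220},
}


def latency_ns(processor, level):
    """Return the latency of one memory level in nanoseconds."""
    entry = LATENCY_CYCLES[processor]
    return cycles_to_ns(entry[level], entry["freq_ghz"])


if __name__ == "__main__":
    for processor in LATENCY_CYCLES:
        for level in ("L3", "memory"):
            print(f"{processor:8s} {level:6s} {latency_ns(processor, level):6.1f} ns")
```

Under these figures the POWER5 improvement is larger in nanoseconds than in cycles, because the cycle count drops while the clock rate rises.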

17 17 © 2005 IBM Corporation POWER5 Design: Summary More gates 170 million -> 260 million Enhancements Increased cache associativity Increased number of rename registers Reduced L3 and cache latency New features Simultaneous Multi Threading Dynamic power management

18 18 © 2005 IBM Corporation Processor Systems (Nodes) Multiple processors Multiple modules Various construction formats Multi Chip Modules Dual Chip Modules Shared memory

19 19 © 2005 IBM Corporation Multi Chip and Dual Chip Modules Multi Chip Module (MCM) p5-590 p5-595 Chip POWER5 Processor Dual Chip Module (DCM) p5-570 p5-575

20 20 © 2005 IBM Corporation Dual Chip Module Each Module: 1 processor chip 1 L3 cache 1 Memory card Each Processor Chip 2 processors L1 caches Registers Functional units 1 L2 cache 1 path to memory 36 Mbyte L3 Memory

21 21 © 2005 IBM Corporation Multi Chip Module Each Module: 4 processor chips 4 L3 cache chips 2 Memory cards Each Processor Chip 2 processors L1 caches Registers Functional units 1 L2 cache 1 path to memory Memory

22 22 © 2005 IBM Corporation POWER5 Multi Chip Module Four POWER5 chips Four L3 cache chips 95 mm × 95 mm 4,491 signal I/Os 89 layers of metal

23 23 © 2005 IBM Corporation POWER5 Dual Chip Module One POWER5 chip Single or Dual Core One L3 cache chip

24 24 © 2005 IBM Corporation Modifications to POWER4 System Structure [Diagram labels: P, L2, Fab Ctl, L3, L3 Mem Ctl, Memory]

25 25 © 2005 IBM Corporation Switch Technology Internal network In lieu of GigEthernet, Myrinet, Quadrics, etc. Fourth generation HPS Switch (POWER2 generation) SP Switch (POWER2 -> POWER3) SP Switch 2 (POWER3 -> POWER4) HPS (POWER4 -> POWER5) Multiple links per node Match number of links to number of processors

26 26 © 2005 IBM Corporation High Performance Switch (HPS) Also Known As “Federation” Follow on to SP Switch2 Also known as “Colony” Specifications: 2 Gbyte/s (bidirectional) 5 microsecond latency Configuration: Up to four adaptors per node 2 links per adaptor 16 Gbyte/s per node
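The 16 Gbyte/s per-node figure follows from the other numbers on this slide: four adaptors, two links per adaptor, and 2 Gbyte/s per link. A short sketch of that arithmetic in Python, with a hypothetical function name chosen for illustration:

```python
def node_bandwidth_gbyte_s(adaptors, links_per_adaptor, link_gbyte_s):
    """Aggregate switch bandwidth of one node, in Gbyte/s.

    Assumes every link can be driven at its full rate at the same
    time, which is the best case the slide's total implies.
    """
    return adaptors * links_per_adaptor * link_gbyte_s


if __name__ == "__main__":
    # Maximum HPS configuration from the slide: 4 adaptors x 2 links x 2 Gbyte/s.
    print(node_bandwidth_gbyte_s(4, 2, 2.0), "Gbyte/s per node")
```

A node with fewer adaptors scales down linearly under the same assumption; for example, two adaptors would give 8 Gbyte/s.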

27 27 © 2005 IBM Corporation HPS Specifications
             Latency [microsec.]  Bandwidth, single [Mbyte/s]  Bandwidth, multiple [Mbyte/s]
SP Switch 2  15                   350                          550
HPS          5                    1800                         1930

28 28 © 2005 IBM Corporation Software Overview Operating System AIX Compilers C C++ Fortran Batch Queue LoadLeveler (IBM) LSF (Platform) PBS Gridware

29 29 © 2005 IBM Corporation AIX Current Version: AIX 5.3 Processors: POWER3 POWER4 POWER5 Linux Affinity Logical PARtitions (LPAR) Nodes Operating system Memory Network connections Kernel Address Size: 64-bit 32-bit

30 30 © 2005 IBM Corporation Linux on POWER Native Linux, SuSE7 -> SuSE8 RPMs and package managers Cluster Systems Manager 64-bit kernel 32/64-bit applications support (SuSE8)
Compiler  User Name
C         xlc
C++       xlC
Fortran   xlf

31 31 © 2005 IBM Corporation Compilers C and C++ Visual Age C and C++ Professional for AIX Versions 6, 7, 8 ANSI C C++ Compiler names: xlc xlC Fortran XL Fortran for AIX Versions 8, 9, 10 Fortran 77 Fortran 90 Compiler names: xlf77 xlf90

32 32 © 2005 IBM Corporation Compiler Names
Compiler     User Name
Fortran 77   xlf77
Fortran 90   xlf90
C            xlc
C++          xlC
MPI compile  mpxlf, mpcc
Reentrant    xlf_r, xlc_r
AIX uses different compiler names to perform some tasks which are handled by compiler flags on most other systems.

33 33 © 2005 IBM Corporation Compiler Usage
Language     Command  Feature      Extension
ANSI C       xlc      ANSI         .c
             xlc_r    Thread safe  .c
Extended C   cc       Pre-ANSI     .c
MPI, C       mpxlc    MPI          .c
C++          xlC                   .C .cc .cpp
             xlC_r    Thread safe  .C .cc .cpp
Fortran 77   xlf                   .f
             xlf_r    Thread safe  .f
Fortran 90   xlf90                 .f
             xlf90_r  Thread safe  .f
MPI Fortran  mpxlf    MPI          .f

34 34 © 2005 IBM Corporation User Limits Set by the system administrator Ulimit: C or K shell built-in Sets or reports resource limits Limits are defined in /etc/security/limits Sizes are in 512 byte blocks Times are in seconds $ ulimit -a

35 35 © 2005 IBM Corporation Ulimit Defaults
Limit      Definition             Default Value   Typical Value
fsize      File Size              2097151         Unlimited (-1)
core       Core File Size         2097151         Unlimited (-1)
cpu        Per Process limit      -1 (unlimited)  Unlimited (-1)
data       Data Segment Size      262144          Unlimited (-1)
stack      Stack Segment Size     65536*          Unlimited (-1)
No. files  File Descriptor Limit  2000
* 64-bit address mode
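Because the size limits are expressed in 512-byte blocks, the default values are easier to judge once converted to bytes. The sketch below, in Python, applies that conversion to the defaults listed on this slide; the helper names are hypothetical, and treating -1 as "unlimited" follows the slide's own notation.

```python
BLOCK_BYTES = 512  # size limits are counted in 512-byte blocks


def blocks_to_bytes(blocks):
    """Convert a limit in 512-byte blocks to bytes; -1 means unlimited."""
    if blocks == -1:
        return None  # no limit to convert
    return blocks * BLOCK_BYTES


def describe(blocks):
    """Render a limit as a human-readable size in MiB."""
    size = blocks_to_bytes(blocks)
    if size is None:
        return "unlimited"
    return f"{size / 2**20:.1f} MiB"


# Default size limits from the slide, in blocks.
DEFAULT_BLOCKS = {"fsize": 2097151, "core": 2097151, "data": 262144, "stack": 65536}

if __name__ == "__main__":
    for name, blocks in DEFAULT_BLOCKS.items():
        print(f"{name:6s} {blocks:8d} blocks = {describe(blocks)}")
```

Read this way, the default data segment is 128 MiB and the default file size limit is just under 1 GiB, which helps explain why the slide lists unlimited (-1) as the typical setting.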

36 36 © 2005 IBM Corporation Other Defaults Thread control /etc/environment
AIXTHREAD_SCOPE=S
AIXTHREAD_MNRATIO=1:1
AIXTHREAD_COND_DEBUG=OFF
AIXTHREAD_GUARDPAGES=4
AIXTHREAD_MUTEX_DEBUG=OFF
AIXTHREAD_RWLOCK_DEBUG=OFF

37 37 © 2005 IBM Corporation Batch Queuing Compile on any AIX node Use –qarch=pwr5 Submit job with available batch utility Use appropriate queue name Available queuing systems: LoadLeveler PBS Gridware LSF
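The compile step can be illustrated by assembling the command line from the compiler names listed on the Compiler Usage slide plus the -qarch=pwr5 flag named here. This Python sketch only builds the argument list and does not invoke a compiler; `compile_command`, `BASE_COMMAND`, and the file name `solver.f` are hypothetical, and the "_r" suffix rule is inferred from the thread-safe names on that slide.

```python
# Base command names as listed on the Compiler Usage slide.
BASE_COMMAND = {
    "c": "xlc",
    "c++": "xlC",
    "fortran77": "xlf",
    "fortran90": "xlf90",
}


def compile_command(language, source, thread_safe=False, arch="pwr5"):
    """Build an argument list for compiling one source file.

    The thread-safe variant of each command is the base name with
    "_r" appended (xlc_r, xlC_r, xlf_r, xlf90_r).
    """
    base = BASE_COMMAND[language.lower()]
    command = base + "_r" if thread_safe else base
    return [command, f"-qarch={arch}", source]


if __name__ == "__main__":
    # Print the command rather than running it.
    print(" ".join(compile_command("fortran90", "solver.f", thread_safe=True)))
```

The resulting list could be handed to a job script or a build tool; the choice of queue and submission command depends on which batch system the site runs.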

38 38 © 2005 IBM Corporation Cluster Layout [Diagram labels: Compile and Submit Node, Node 0, Node 1, Node 2, Network]

39 39 © 2005 IBM Corporation Documentation Software: www.software.ibm.com Products A-Z X -> xl C, xl C/C++, xl Fortran www.servers.ibm.com/aix Compilers /usr/vac/doc /usr/vacpp/doc /usr/lpp/xlf/doc Redbooks: www.redbooks.ibm.com/ IBM eServer p5 590 and 595 System Handbook

40 40 © 2005 IBM Corporation Documentation AIX Commands Reference AIX command: /usr/sbin/infocenter /opt/ibm_help/help_start.sh http://www.unet.univie.ac.at/aix/aixgen/wbinfnav/aixcmdsrefbooks.htm Google search: “AIX Commands Reference”

41 41 © 2005 IBM Corporation Documentation Library Google Search: AIX 5L documentation Library http://publibn.boulder.ibm.com/cgi-bin/ds_rslt

42 42 © 2005 IBM Corporation Summary: Architecture System architecture Processors Nodes Cluster Processors POWER5 Three levels of cache Nodes: Eight processor p5-575 Cluster: 14 p5-575 nodes HPS interconnect
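The summary figures can be combined into a rough size estimate for the local cluster. The sketch below, in Python, multiplies out the 14 nodes and 8 processors per node from this slide and the 1.9 GHz clock from the local system slide. The peak rate additionally assumes 4 floating-point operations per cycle per processor (two fused multiply-add units); that number is not stated in the presentation and is labeled as an assumption in the code.

```python
NODES = 14                # p5-575 nodes in the cluster (from the summary slide)
PROCESSORS_PER_NODE = 8   # processors per p5-575 node
CLOCK_GHZ = 1.9           # clock rate of the local POWER5 processors

# ASSUMPTION, not from the slides: 2 FMA units x 2 flops each per cycle.
FLOPS_PER_CYCLE = 4


def total_processors(nodes=NODES, per_node=PROCESSORS_PER_NODE):
    """Number of processors in the whole cluster."""
    return nodes * per_node


def peak_gflops(processors, clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in Gflop/s under the assumed flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    count = total_processors()
    print(f"{count} processors, about {peak_gflops(count):.1f} Gflop/s peak (assumed)")
```

This is an upper bound under the stated assumption, not a measured rate; sustained performance depends on the application and on the memory and switch characteristics described earlier.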

