Presentation on theme: "The Thoroughly Modern Mainframe"— Presentation transcript:
1. The Thoroughly Modern Mainframe
Dr. Michael Salsburg, NTSMF Users' Group, Dec 9, 2002
2. Agenda
- Large-Scale WINTEL Servers: disruptive technology or trend?
- Scale Up or Scale Out? A workload-motivated discussion of SMP and CC-NUMA
- PCI-Based I/O
- Consolidation
- Emerging Technologies
3. Server Industry Trends (Source: IDC)
- Intel will dominate the server chip market
- Windows 2000 will be the pervasive server OS
8. Scale Up or Scale Out?
- Two of the three tiers in current application architectures use scale-out for growth:
  - Increase the number of Web servers
  - Increase the number of Application servers
- The database back end cannot be scaled out; scale-up is needed for large database applications
- Scale-out has some inherent downsides:
  - Additional administrative/management attention
  - More "headroom" needed for heavy traffic
9. SMP / NUMA: Workload Discussion
- As code executes on the processor, memory is referenced. These references can be broken into three regions:
  - High Locality of Reference: memory that is immediately re-referenced (> 95%)
  - Working Set: the set of addresses on which the software primarily focuses
  - Persistent Storage: addresses that are stored on physical devices
10. Scale Up or Scale Out: SMP or NUMA? Workload Interference
- When two processes run on the same system, their memory references interfere
- It is preferable to interfere only at the persistent-storage level
- Interference at higher levels can decrease cache efficiency and slow down processing, effectively reducing the available CPU power
11. SMP / NUMA: SMP Topology
- A bank of CPUs shares a bank of memory
- Each CPU has a local cache to optimize high locality of reference
- A cache miss has a uniform latency to fetch data from main memory
- "Dirty" memory references require fetching the updated data from another CPU's cache
- The CPU can "stall" while waiting for a memory reference
12. SMP / NUMA: Workload Discussion
- Percentages of references are based on a TPC-C workload profile
- Relative time units show the orders of magnitude between a cache hit and persistent storage

Reference Level      Time Units
CPU Cache            1
Main Memory          100
Remote Cache         200
Persistent Storage   10,000

Reference Level      Percentage
CPU Cache            98.0%
Main Memory          1.9%
Persistent Storage   0.1%
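The two tables above can be folded into one back-of-the-envelope number: the effective time per memory reference is the percentage-weighted average of the per-level costs. A minimal sketch (the remote-cache row is omitted because the profile table assigns it no percentage):

```python
# TPC-C-style reference profile from the slide:
# level -> (fraction of references, cost in relative time units)
profile = {
    "CPU Cache":          (0.980, 1),
    "Main Memory":        (0.019, 100),
    "Persistent Storage": (0.001, 10_000),
}

# Weighted average cost of a single memory reference
effective_time = sum(frac * cost for frac, cost in profile.values())
print(f"Effective reference time: {effective_time:.2f} time units")  # -> 12.88
```

Even with a 98% cache hit rate, the rare persistent-storage references dominate the average, which is why interference that pushes references down the hierarchy is so costly.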
13. SMP / NUMA: NUMA (Non-Uniform Memory Access)
- Overcomes the bus congestion and physical fabrication limitations found in a single-bus architecture
- Two memory latencies: near and far
- The NUMA ratio is the ratio of far latency to near latency
- Originally the ratio was around 30; now it is around 3
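The impact of the NUMA ratio can be sketched as a weighted average of near and far latencies. The numbers below are illustrative assumptions (a near latency of 100 time units, matching the main-memory cost on the earlier slide, and 20% of references going to far memory), not figures from the talk:

```python
def effective_latency(near, numa_ratio, far_fraction):
    """Average memory latency when far_fraction of references hit far memory."""
    far = near * numa_ratio
    return (1 - far_fraction) * near + far_fraction * far

# Modern NUMA ratio of ~3 vs. the original ratio of ~30
modern   = effective_latency(near=100, numa_ratio=3,  far_fraction=0.2)  # ~140 time units
original = effective_latency(near=100, numa_ratio=30, far_fraction=0.2)  # ~680 time units
```

At the same far-reference rate, dropping the ratio from 30 to 3 cuts the average latency nearly five-fold, which is why careful memory placement mattered far more on early NUMA hardware.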
14. SMP / NUMA: Hybrid (Unisys ES7000)
- Another level of cache is introduced
- Memory accesses can be non-uniform when comparing Next-Level Cache hits to main-memory references
- Overcomes the fabrication/congestion problems of a single-bus architecture
16. PCI-Based I/O
[Chart: maximum bandwidth (per bus, or per direction) in GB/sec by year, 2001-2005: PCI at 0.8 GB/sec (2001 and earlier); PCI-X at 1X, 4X, and 8X; Scalability Ports SP1 and SP2 at 533 MHz reaching 6.4 GB/sec; PCI-Express 16X or HyperTransport by 2005]
18. Enterprise-Level Backup / Restore
- Complete recovery of a 2.5-terabyte database:
  - From tape, the database was recovered in only 88 minutes, with a sustained restore throughput of 2.2 TB/hr
  - From the hardware snapshot, the same database was recovered in only 11 minutes
- Complete backup of a 2.5-terabyte database:
  - Backup to tape took only 68 minutes, with minimal impact on online operations and a sustained throughput of 2.6 TB/hr
19. Consolidation
"[Our] servers were multiplying like rabbits," says Jeff Smith, manager of corporate network services at La-Z-Boy Inc., a Monroe, Mich.-based residential furniture producer that just completed a Windows NT server consolidation project. "Our distributed environment was becoming more and more difficult to manage."
Source: "Thinning the Server Ranks," Computerworld, Aug 26, 2002
20. Consolidation
- How do you stuff over 130 CPUs' worth of workload into a 32x CPU system? Veeerrrry carefully...
- Why are current server farms filled with under-utilized servers? Web hosting sites report:
  - "New web servers are installed when peak CPU utilization reaches above 35%."
  - "Speed and reliability are very important to your web site. All of our servers are maintained at less than 15% CPU utilization. This ensures that your web site downloads as fast as possible!"
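A back-of-the-envelope check of the "130 CPUs into 32" question, assuming the sub-15% figure from the hosting quote holds as the average utilization across the farm (an assumption, not a number from the talk):

```python
n_server_cpus = 130     # CPUs' worth of under-utilized servers (from the slide)
avg_utilization = 0.15  # assumed farm-wide average, per the "less than 15%" quote

busy_cpus = n_server_cpus * avg_utilization   # actual CPU-equivalents of work
target_cpus = 32
consolidated_util = busy_cpus / target_cpus

print(f"{busy_cpus:.1f} busy CPU-equivalents -> "
      f"{consolidated_util:.0%} utilization on a {target_cpus}x system")
```

Under these assumptions the farm is doing only about 19.5 CPUs' worth of real work, which lands around 61% utilization on the 32-way system: high enough to stop wasting 100+ mostly idle CPUs, with headroom still left for peaks.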
21. Consolidation: Responsive Consolidation
- Which would you prefer: an average queue size of 0.2 on a 1x system, or on a 32x system?
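The question can be quantified with elementary queueing theory: at the same per-CPU utilization, one 32-way system holds a far shorter queue than a 1x system, because an arriving request waits only when all 32 CPUs are busy. A minimal M/M/c sketch (the 61% utilization figure is an illustrative assumption):

```python
from math import factorial

def mmc_mean_queue_len(arrival_rate, service_rate, servers):
    """Mean number waiting (Lq) in an M/M/c queue, via the Erlang C formula."""
    a = arrival_rate / service_rate        # offered load in Erlangs
    rho = a / servers                      # per-server utilization, must be < 1
    partial_sum = sum(a**k / factorial(k) for k in range(servers))
    last_term = a**servers / factorial(servers)
    p_wait = last_term / ((1 - rho) * partial_sum + last_term)  # Erlang C
    return p_wait * rho / (1 - rho)

rho = 0.61  # same per-CPU utilization on both systems (assumed)
lq_1x  = mmc_mean_queue_len(rho,      1.0, 1)   # ~0.95 requests waiting
lq_32x = mmc_mean_queue_len(rho * 32, 1.0, 32)  # a small fraction of that
```

For a single server this reduces to the familiar Lq = rho^2 / (1 - rho), about 0.95 at 61% utilization, while the 32-way system's average queue stays close to zero: the large system is both busier-friendly and more responsive.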
22. Consolidation Benefits
- Simplified management / administration
- Higher utilization (less "headroom")
- Less variability of service
- Less overall CPU overhead
- Fewer software licenses
23. Emerging CPU Technologies: 32x Intel CPU TPC-C Results

Date Published   tpmC      Chip               Speed     Cache   Memory
11/11/2001       165,218   Pentium III Xeon   900 MHz   2 MB    64 GB
9/9/2002         308,620   Itanium II         1 GHz     3 MB    256 GB
11/4/2002        203,518   Pentium IV Xeon    1.6 GHz   1 MB
24. Itanium II: What's So Great About 64 Bits?
- For transaction processing, the addressable memory range increases, and therefore so does the amount of main memory that can be used
- The top 5 TPC-C results were achieved using 64-bit computing
- TPC-C is a large database application, which is a sweet spot for 64-bit commercial computing
- Bigger is DEFINITELY better!!
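The addressing point can be made concrete: a flat 32-bit address space tops out at 4 GiB, well below the 64 GB and 256 GB configurations in the TPC-C table on the previous slide, while 64-bit addressing removes that ceiling entirely. A trivial sketch:

```python
GiB = 2**30

addressable_32 = 2**32   # bytes reachable with a flat 32-bit address
addressable_64 = 2**64   # bytes reachable with a 64-bit address

print(addressable_32 // GiB)             # -> 4 (GiB)
print(addressable_64 // addressable_32)  # -> 4294967296 (times more address space)
```

For a database whose hot working set exceeds 4 GiB, the extra address space lets far more of the data stay in main memory instead of on persistent storage, which is exactly the level the earlier latency table showed to be 10,000x slower than cache.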