The Thoroughly Modern Mainframe

Dr. Michael Salsburg NTSMF Users' Group Dec 9, 2002

Agenda
- Large Scale WINTEL Servers
- Disruptive technology or trend?
- Scale Up or Scale Out?
- A Workload-motivated discussion of SMP and CC-NUMA
- PCI-Based I/O
- Consolidation
- Emerging Technologies

Server Industry Trends (Source: IDC)
- Intel will dominate server chip market
- Windows 2000 will be pervasive server OS

The x440 Competition (Gartner, Oct 2002)
Systems: Unisys ES7000 Aries 230; Unisys ES7000 Orion; Egenera Blade Frame; HP ProLiant DL760; HP rp8400; IBM eServer x440
Processors Supported: Intel Xeon 1.4, 1.6 GHz; Pentium 700, 900 MHz; HP PA-8700 750, 875 MHz; 1.4, 1.5, 1.6 GHz
Max Procs: 16 32 96 8 8, 16 by 2002
Max Mem: 32G 64G 288G 16G
Max PCI Slots: 48 12 10 72

A Comparison using Moore’s Law
Comparison of CPU Speeds / tpm-C for 4x CPU WINTEL systems

TPC-C Top 10

Scale Up or Scale Out?
- Two of the three tiers in current application architectures use scale-out for growth
  - Increase # of Web servers
  - Increase # of Application Servers
- Database back end cannot be scaled out
  - Scale up is needed for large database applications
- Scale out has some inherent downsides
  - Additional administrative/management attention
  - More “headroom” needed for heavy traffic

SMP / NUMA Workload Discussion
As code executes on the processor, memory is referenced. These references can be broken into three regions:
- High Locality of Reference: memory is immediately re-referenced (> 95%)
- Working Set: the set of addresses on which the software primarily focuses
- Persistent Storage: addresses that are stored on physical devices

Scale Out - SMP or NUMA? Workload Interference
- When two processes are running on the same system, their memory references will interfere.
- It is preferable to interfere only at the persistent storage level.
- Interference at higher levels can decrease cache efficiency and slow down processing, effectively reducing the CPU power.

SMP / NUMA SMP Topology
- A bank of CPUs shares a bank of memory
- Each CPU has a local cache to optimize high locality of reference
- A cache miss has uniform latency to get data from memory
- “Dirty” memory references require fetching the updated memory from another CPU’s cache
- The CPU can “stall” waiting for a memory reference

SMP / NUMA Workload Discussion
- Percentages of references are based on a TPC-C workload profile
- Relative time units show the orders of magnitude between a cache hit and persistent storage

Reference Level       Time Units
CPU Cache                      1
Main Memory                  100
Remote Cache                 200
Persistent Storage        10,000

Reference Level       Percentage
CPU Cache                  98.0%
Main Memory                 1.9%
Persistent Storage          0.1%
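
The two tables on this slide can be combined into a single weighted average, which shows how much of the total cost comes from the rare slow references. The sketch below is illustrative only: it uses the slide's relative time units and percentages, and it leaves out Remote Cache because the slide gives no percentage for it. The function names are hypothetical.

```python
# Relative cost per reference level, in the slide's time units.
TIME_UNITS = {"CPU Cache": 1, "Main Memory": 100, "Persistent Storage": 10_000}

# Share of references served at each level (TPC-C profile from the slide).
REFERENCE_SHARE = {"CPU Cache": 0.980, "Main Memory": 0.019, "Persistent Storage": 0.001}


def average_cost(time_units, share):
    """Weighted average cost of one reference, in relative time units."""
    return sum(share[level] * time_units[level] for level in share)


def cost_breakdown(time_units, share):
    """Fraction of the total cost contributed by each reference level."""
    total = average_cost(time_units, share)
    return {level: share[level] * time_units[level] / total for level in share}


avg = average_cost(TIME_UNITS, REFERENCE_SHARE)
breakdown = cost_breakdown(TIME_UNITS, REFERENCE_SHARE)

print(f"average cost per reference: {avg:.2f} time units")
for level, fraction in breakdown.items():
    print(f"{level:<20}{fraction:6.1%}")
```

Under these numbers, the 0.1% of references that reach persistent storage account for most of the average cost, which is why interference at the cache and memory levels matters so much.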

SMP / NUMA NUMA (Non-Uniform Memory Access)
- Overcomes the bus congestion and physical fabrication limitations found in a single-bus architecture
- Two memory latencies: near and far
- The NUMA ratio is the ratio of far latency to near latency
- Originally 30, it is now around 3
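
To see why the drop in NUMA ratio matters, the effect on average memory latency can be sketched as follows. This is an illustration under stated assumptions: the near latency of 100 time units is borrowed from the Main Memory figure in the workload discussion, and the 25% share of far references is a hypothetical value, not a measurement from the presentation.

```python
def effective_latency(near, numa_ratio, far_fraction):
    """Average memory latency when far_fraction of references go to far memory."""
    if not 0.0 <= far_fraction <= 1.0:
        raise ValueError("far_fraction must be between 0 and 1")
    far = near * numa_ratio  # NUMA ratio = far latency / near latency
    return (1.0 - far_fraction) * near + far_fraction * far


NEAR_LATENCY = 100   # assumed: relative time units for a near memory reference
FAR_FRACTION = 0.25  # hypothetical share of references that land in far memory

for ratio in (30, 3):
    latency = effective_latency(NEAR_LATENCY, ratio, FAR_FRACTION)
    print(f"NUMA ratio {ratio:>2}: average latency {latency:.0f} time units")
```

With a ratio of 30, even a modest share of far references dominates the average; with a ratio of 3, the penalty for imperfect memory placement is much smaller.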

SMP / NUMA Hybrid (Unisys ES7000)
- Another level of cache is introduced
- Memory accesses can be non-uniform when comparing Next Level Cache hits to memory references
- Overcomes the fabrication/congestion problems of a single-bus architecture

PCI-Based I/O Cellular MultiProcessing (CMP) Architecture

PCI-Based I/O
[Chart: maximum bandwidth in GB/sec (per bus, or per direction) by year, from 2001 & earlier through 2005. Series labels: PCI, PCI-X, PCI-Express (1X, 4X, 8X, 16X) or HyperTransport, and Scalability Ports SP1 and SP2. Clock labels: 66, 133, 266, 533 MHz. Data labels: 0.8 GB, 6.4 GB.]

Enterprise-Level Backup / Restore
- Complete recovery of a 2.5 terabyte database:
  - From tape, the database was recovered in only 88 minutes with a sustained throughput during restore of 2.2 TB/hr.
  - From the hardware snapshot, the same database was recovered in only 11 minutes.
- Complete backup of a 2.5 terabyte database:
  - Backup to tape took only 68 minutes with minimal impact on online operations and sustained throughput of 2.6 TB/hr.

Consolidation
"[Our] servers were multiplying like rabbits," says Jeff Smith, manager of corporate network services at La-Z-Boy Inc., a Monroe, Mich.-based residential furniture producer that just completed a Windows NT server consolidation project. "Our distributed environment was becoming more and more difficult to manage."
Source: "Thinning The Server Ranks," Computerworld, Aug 26, 2002

Consolidation
- How do you stuff over 130 CPUs’ worth of workload into a 32x CPU system? Veeerrrry carefully……
- Why are current server farms filled with under-utilized servers?
- Web Hosting Sites
  - “New web servers are installed when Peak CPU utilization reaches above 35%.”
  - “Speed and reliability are very important to your web site. All of our servers are maintained at less than 15% CPU utilization. This ensures that your web site downloads as fast as possible!”
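
The arithmetic behind the 130-into-32 question can be sketched as below. This is a rough illustration, not a sizing method: it assumes every source CPU runs at the same average utilization, that source and target CPUs are equally fast, and that consolidation adds no overhead. The 15% and 35% figures are taken from the hosting quotes above; the function name is hypothetical.

```python
def consolidated_utilization(source_cpus, source_utilization, target_cpus):
    """Average utilization of the target system after moving all work onto it."""
    busy_cpus = source_cpus * source_utilization  # CPUs' worth of real work
    return busy_cpus / target_cpus


SOURCE_CPUS = 130
TARGET_CPUS = 32

for utilization in (0.15, 0.35):
    result = consolidated_utilization(SOURCE_CPUS, utilization, TARGET_CPUS)
    fits = "fits" if result < 1.0 else "does not fit"
    print(f"{utilization:.0%} per source CPU -> {result:.0%} on {TARGET_CPUS} CPUs ({fits})")
```

Under these assumptions, servers kept at 15% leave about 20 CPUs' worth of real work, which fits on a 32x system; the same farm at a sustained 35% would not, which is one reason the slide says "very carefully".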

Consolidation Responsive Consolidation
Which would you prefer – an average queue size of 0.2 on a 1x or a 32x system?
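
One way to explore this question is with a standard M/M/c queueing model (Poisson arrivals, exponential service times, c identical CPUs sharing one queue). The model is an assumption made here for illustration; the slide does not name one. The sketch finds the per-CPU utilization at which the average number of waiting jobs reaches 0.2, for a 1x and a 32x system. All function names are hypothetical.

```python
def erlang_c(servers, offered_load):
    """Probability that an arriving job has to wait in an M/M/c queue."""
    blocking = 1.0  # Erlang B with zero servers
    for k in range(1, servers + 1):
        # Standard Erlang B recursion, numerically stable for large c.
        blocking = offered_load * blocking / (k + offered_load * blocking)
    rho = offered_load / servers
    return blocking / (1.0 - rho * (1.0 - blocking))


def mean_queue_length(servers, rho):
    """Average number of jobs waiting (not in service) at per-CPU utilization rho."""
    return erlang_c(servers, rho * servers) * rho / (1.0 - rho)


def utilization_for_queue(servers, target_queue, tol=1e-9):
    """Bisect for the utilization at which the mean queue length equals target_queue."""
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_queue_length(servers, mid) < target_queue:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for cpus in (1, 32):
    rho = utilization_for_queue(cpus, 0.2)
    print(f"{cpus:>2}x system: queue of 0.2 is reached at {rho:.1%} utilization")
```

In this model the 32x system reaches the same average queue size at a much higher utilization than the 1x system, which is the sense in which a large shared system needs less headroom.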

Consolidation Benefits
- Simplified Management / Administration
- Higher Utilization (less “headroom”)
- Less Variability of Service
- Less Overall CPU Overhead
- Fewer software licenses

Emerging CPU Technologies 32x INTEL CPU TPC-C Results

Date Published   tpm-C     Chip               Speed     Cache   Memory
11/11/2001       165,218   Pentium III Xeon   900 MHz   2 MB    64 GB
9/9/2002         308,620   Itanium II         1 GHz     3 MB    256 GB
11/4/2002        203,518   Pentium IV Xeon    1.6 GHz   1 MB
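
Because all three results are for 32x systems, they can be compared per CPU and against the earliest result. The sketch below only rearranges the published tpm-C figures from the slide; it says nothing about price, memory size, or other configuration differences between the runs.

```python
CPUS = 32  # every result on the slide is for a 32x system

# (date published, chip, tpm-C) as listed on the slide.
RESULTS = [
    ("11/11/2001", "Pentium III Xeon", 165_218),
    ("9/9/2002", "Itanium II", 308_620),
    ("11/4/2002", "Pentium IV Xeon", 203_518),
]


def per_cpu(tpmc, cpus=CPUS):
    """Throughput attributable to one CPU, assuming an even split."""
    return tpmc / cpus


def relative_to(tpmc, baseline):
    """Throughput as a multiple of the baseline result."""
    return tpmc / baseline


baseline = RESULTS[0][2]
for date, chip, tpmc in RESULTS:
    print(f"{date:<12}{chip:<18}{per_cpu(tpmc):>8,.0f} tpm-C per CPU"
          f"{relative_to(tpmc, baseline):>7.2f}x")
```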

Itanium II What’s so great about 64 bits?
- For transaction processing, 64-bit memory addressing increases the amount of main memory that can be used
- The top 5 TPC-C results were achieved using 64-bit computing
- TPC-C is a large database application, a sweet spot for 64-bit commercial computing
- Bigger is DEFINITELY Better!!
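
The addressing argument comes down to powers of two, sketched below. The 36-bit row is an addition for context: it corresponds to the physical addressing available on PAE-capable 32-bit Intel processors, where each process is still limited to a 32-bit virtual address space. Real 64-bit processors implement fewer than 64 physical address bits, so the last row is an architectural ceiling rather than a buildable memory size.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


for bits in (32, 36, 64):
    size_gib = addressable_bytes(bits) // GIB
    print(f"{bits}-bit addressing: {size_gib:,} GiB")
```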
