The x86 Server Platform.. Resistance is futile…. Dec 6, 2004.

Presentation on theme: "The x86 Server Platform.. Resistance is futile…. Dec 6, 2004."— Presentation transcript:

1 The x86 Server Platform.. Resistance is futile…. Dec 6, 2004

2 2 Server shipments – Total vs x86

3 3 Market Share: Servers, United States, 2Q04
United States: Vendor Revenue by Operating System (Millions of Dollars)

OS        2Q03      3Q03      4Q03      1Q04      2Q04    Share 2Q03   Share 2Q04   Growth 2Q03-2Q04   Growth 1Q04-2Q04
Windows   1,534.1   1,692.3   1,671.6   1,645.6   1,…     …%           36.18%       8.6%               1.2%
Unix      1,622.6   1,474.6   1,554.1   1,374.2   1,…     …%           31.98%       -9.3%              7.1%
Others    …         …         …         …         …       …%           18.52%       3.9%               -5.0%
Linux     …         …         …         …         …       …%           13.32%       41.5%              10.5%
Total     4,410.2   4,487.9   4,920.7   4,472.1   4,…     …%           …            4.4%               2.9%

Source: Michael McLaughlin, Market Share: Servers, United States, 2Q04, 7 October 2004, Gartner

4 4 x86 Platform CPUs: Intel Xeon MP – Gallatin (future is Potomac); Xeon SP/DP – EM64T – Nocona; Itanium II MP – Madison (future is Montecito); AMD Opteron

5 5 Gallatin - MP: 130 nm; 3 GHz; 4 MB L3 Cache; FSB – MHz

6 6 ES7000 – 32 Gallatins

7 7 Nocona – Single Processor with EM64T: 90 nm; Clock Speed – GHz; L3 – 4 MB; FSB – 800 MHz

8 8 Itanium II - Madison: 130 nm; 9 MB L3 cache; 1.6 GHz; FSB – 400 MHz


11 11 STOP: Why Multi-Core? And while we're at it, why Multi-Threading? It's all about balancing silicon real estate, compiler technology, cost, and power to meet the constant pressure to double performance every 18 months.

12 12 Memory Latency vs CPU Speed [Chart: microprocessor operating frequency (GHz) and DRAM access frequency ((10^-9 sec)^-1) by production year; series: microprocessor on-chip clock, commodity DRAM]

13 13 Processor Architecture: When latency → 0 and bandwidth → ∞, we will have the perfect CPU. A great deal of innovation has centered around approximating this perfect world: CISC, CPU cache, RISC, EPIC, multi-threading, multiple cores.

14 14 Complex Instruction Set Computer: Hardware implements assembler instructions directly; for MULT A, B the hardware loads the registers, multiplies, and stores the result. Multiple clocks are needed for an instruction. RAM requirements are relatively small. Compilers translate high-level languages down to assembler instructions (Von Neumann hardware).

15 15 CPU Cache: When CPU speeds started to increase, memory latency emerged as a bottleneck. CPU caches were used to keep local references close to the CPU. For SMP systems, memory banks were more than a clock away. It is not uncommon today to find three orders of magnitude between the fastest and slowest memory latencies.

16 16 Reduced Instruction Set Computer: Hardware is simplified, so fewer transistors are needed for the full instruction set. RAM requirements are higher, to store intermediate results and more code. Compilers are more complex. Clock speeds increase because instructions are simpler. Deterministic, simple instructions allow pipelining.
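The contrast between the CISC and RISC slides can be sketched with a toy model. This is only an illustration: the mnemonics LOAD, PROD, and STORE, the register names, and the dictionary-based "memory" are assumptions made for the example, not a real instruction set.

```python
# Toy model: memory is a dict of named locations.
# The mnemonics (MULT, LOAD, PROD, STORE) are illustrative, not a real ISA.

def cisc_mult(memory, a, b):
    """One complex instruction: MULT a, b loads, multiplies, and stores in hardware."""
    memory[a] = memory[a] * memory[b]
    return 1  # number of instructions issued


def risc_mult(memory, a, b):
    """The same work expressed as four simple register-based instructions."""
    regs = {}
    program = [
        ("LOAD", "r1", a),      # r1 <- memory[a]
        ("LOAD", "r2", b),      # r2 <- memory[b]
        ("PROD", "r1", "r2"),   # r1 <- r1 * r2
        ("STORE", a, "r1"),     # memory[a] <- r1
    ]
    for op, dst, src in program:
        if op == "LOAD":
            regs[dst] = memory[src]
        elif op == "PROD":
            regs[dst] = regs[dst] * regs[src]
        elif op == "STORE":
            memory[dst] = regs[src]
    return len(program)  # number of instructions issued


mem_cisc = {"A": 6, "B": 7}
mem_risc = {"A": 6, "B": 7}
cisc_count = cisc_mult(mem_cisc, "A", "B")
risc_count = risc_mult(mem_risc, "A", "B")
print(mem_cisc["A"], cisc_count)
print(mem_risc["A"], risc_count)
```

Both versions leave the same product in memory; the RISC version needs more instructions (hence more code in RAM), but each one is simple enough to pipeline.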

17 17 Pipelining: Higher Clock Speeds! [Figure labels: 25% busy; 100% busy; 80% busy; 60% busy; 40% busy]
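A rough sketch of why pipelining pays off, using the textbook idealization rather than anything stated on the slide: it assumes every stage takes one cycle and that there are no stalls or hazards. The function names and the 100-instruction, 5-stage figures are chosen only for illustration.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """Ideal pipeline: fill it once, then one instruction completes per cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


n, k = 100, 5  # illustrative values, not taken from the slide
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
print(unpipelined_cycles(n, k), pipelined_cycles(n, k), round(speedup, 2))
```

Under these assumptions the speedup approaches the number of stages as the instruction stream gets longer, which is why every stage being busy matters.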

18 18 Branch Prediction: While instructions are processed in parallel, branches occur. Branch prediction is used to increase the probability that the correct branch is followed. If the prediction is incorrect, the pipeline contents are discarded and the CPU stalls. Statistics: 10%-20% of instructions are branches; predictions are incorrect about 10% of the time. As pipeline depth increases, the probability of a miss increases and more cycles are discarded. 80-deep pipeline / 20% branches / 10% miss => 16 branches in flight, so about an 80% chance of a miss (1 - 0.9^16 ≈ 0.81) and a penalty of 80 cycles.
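The slide's closing arithmetic can be checked with a short calculation. This sketch assumes that branch mispredictions are independent and that the pipeline is full; the function name is made up for the example.

```python
def in_flight_miss_probability(pipeline_depth, branch_fraction, miss_rate):
    """Probability that at least one branch currently in the pipeline is mispredicted.

    Assumes independent predictions and a full pipeline.
    """
    branches_in_flight = pipeline_depth * branch_fraction
    return 1.0 - (1.0 - miss_rate) ** branches_in_flight


# Figures from the slide: 80-deep pipeline, 20% branches, 10% miss rate.
p = in_flight_miss_probability(80, 0.20, 0.10)
print(f"{p:.1%}")
```

With 16 branches in flight the result is a little over 80%, which matches the slide's "80% chance of miss".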

19 19 Itanium II EPIC Instruction Set: Explicitly Parallel Instruction Computing. The compiler can indicate code that can be executed in parallel. Both branches are pipelined, so no cycles are lost to misprediction. The pipeline can be deeper. Complexity continues to move into the compiler.

20 20 Multi-Threading

22 22 Multiple Cores: Fabrication sizes continue to diminish. The additional real estate has been used to put more and more memory on the die. Multi-core technology provides a new way to exploit the additional space. Clock rates cannot continue to climb because of excessive heat: P = C * V^2 * f, where C is the switched capacitance, V is the supply voltage, and f is the clock frequency. Multiple cores are the next step toward faster execution times for applications.
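To see why the power equation favors more cores over higher clocks, here is a small sketch with normalized units. The 0.8 scaling factors are invented for illustration, and the throughput comparison assumes the workload parallelizes perfectly across both cores, which real applications may not.

```python
def dynamic_power(c, v, f):
    """Dynamic power from the slide: P = C * V^2 * f."""
    return c * v ** 2 * f


# Normalized baseline: one core at full voltage and frequency.
single_power = dynamic_power(1.0, 1.0, 1.0)
single_throughput = 1.0

# Hypothetical alternative: two cores, each at 80% voltage and 80% frequency.
dual_power = 2 * dynamic_power(1.0, 0.8, 0.8)
dual_throughput = 2 * 0.8  # assumes perfect parallel scaling

print(round(single_power, 3), single_throughput)
print(round(dual_power, 3), dual_throughput)
```

In this sketch the two slower cores draw about the same power as the single fast core while offering 1.6 times the aggregate clock rate, provided the software can use both.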

23 23 (End of 2005?)

30 30 AMD Opteron 800 Series: 130 nm; Clock Speed – GHz; L2 – 1 MB; 6.4 GB/s HyperTransport

31 31 Architectural Comparison [Block diagram labels: DDR 144-bit; Opteron; PCI-X Bridge; I/O Hub; Other Bridge; HyperTransport GB/s; Xeon; SNC; I/O Hub; Memory Address Buffer; PCI-X Bridge; 6.4 GB/s]

32 32 Mapping Workloads onto Architecture
Consider a dichotomy of workloads:
Large Memory Model – This needs a large, single system image and a large amount of coherent memory
-Database apps - SQL Server / Oracle
-Business Intelligence – Data Warehousing + Analytics
-Memory-resident databases
-64-bit architectures allow memory addressability above 1 TB
Small/Medium Memory Model – This can be cost-effective in workloads that do not require extensive shared memory/state
-Stateless Applications and Web Services
-Web Servers
-Clusters of systems for parallelized applications and grids
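The addressability point above comes down to powers of two. The sketch below assumes a flat, byte-addressable address space; actual processors typically implement fewer physical address bits than the architectural width, so these are upper bounds, and the helper name is chosen only for the example.

```python
GIB = 2 ** 30  # bytes in one gibibyte
TIB = 2 ** 40  # bytes in one tebibyte


def addressable_bytes(address_bits):
    """Bytes reachable with a flat, byte-addressable space of the given width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
print(addressable_bytes(40) // TIB, "TiB with 40-bit addresses")
print(addressable_bytes(64) // TIB, "TiB with 64-bit addresses")
```

A 32-bit address tops out at 4 GiB, while reaching 1 TB needs 40 address bits, which only a 64-bit architecture can provide.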

33 33 Large Server Vendors
Intel Announcement (Nov 19)
-Otellini said product development, marketing and software efforts (for Itanium) will all now be aimed at "greater than four-way systems".
-He also said, "The mainframe isn't dead. That's where I'd like to push Itanium over time."
-The size of the SMP is affected by Intel's chipset support for coherent memory
OEM Vendors (Unisys, HP, SGI, Fujitsu, IBM)
-Each has a unique chipset to build basic four-ways into large SMP systems
-IBM has Power5, which is a direct competitor
Intel 32-bit and EM64T
-This could emerge as the flagship product

34 34 Where Are We Going? Since the early CISC computers, we have moved more and more of the complexity out to the compiler to achieve parallelism and fully exploit the silicon real estate. The power requirements, along with the smaller fabrication sizes, have pushed the CPU vendors to exploit multiple cores. The key to performance for these future machines will be the application's ability to exploit parallelism.
