Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads Lei Yang & Shiliang Hu Computer Sciences Department, University of.


2 Architectural Characterization of an IBM RS6000 S80 Server Running TPC-W Workloads
Lei Yang & Shiliang Hu
Computer Sciences Department, University of Wisconsin - Madison

3 Outline
- TPC-W Benchmarks in Java
- IBM RS6000 S80 Enterprise Server
- Hardware Counters in S80
- Experiment Results
- Problems and Future Work
- Conclusions

4 TPC-W benchmark
- TPC-W is the TPC's newest benchmark for transactional web environments (e-commerce).
- It models an online book store similar to www.amazon.com, with three workload mixes:
  - Browsing: 95% browsing, 5% transactions
  - Shopping: 80% browsing, 20% transactions
  - Ordering: 50% browsing, 50% transactions
- Transactional web environments combine:
  - Web serving of static and dynamic content
  - Online transaction processing (OLTP)
  - Some decision support (DSS)
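The three mixes differ only in how often an emulated browser issues a browsing request versus a transaction. A minimal sketch of that split, assuming a simple two-way random choice per request; the function names and the fixed seed are hypothetical, and the actual benchmark defines per-interaction probabilities that this sketch does not model:

```python
import random

# Browse/transaction split per mix, as stated on the slide.
MIX_BROWSE_FRACTION = {"browsing": 0.95, "shopping": 0.80, "ordering": 0.50}

def next_interaction(mix, rng):
    """Pick 'browse' or 'transaction' for one emulated-browser request."""
    return "browse" if rng.random() < MIX_BROWSE_FRACTION[mix] else "transaction"

def sample_mix(mix, n, seed=0):
    """Draw n requests for a mix and count how many fall in each class."""
    rng = random.Random(seed)  # fixed seed so a run is repeatable
    counts = {"browse": 0, "transaction": 0}
    for _ in range(n):
        counts[next_interaction(mix, rng)] += 1
    return counts

if __name__ == "__main__":
    for mix in MIX_BROWSE_FRACTION:
        print(mix, sample_mix(mix, 10_000))
```

Over many requests the observed fractions approach the stated percentages, which is why the Ordering mix puts far more load on the transaction path than the Browsing mix.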

5 IBM RS6000 S80 Enterprise Server
- 6 RS64-III Pulsar processors (451MHz)
  - 4-issue, in-order superscalar microprocessor with on-chip 128KB L1 I-cache, 128KB L1 D-cache, and 8MB L2 cache
  - No branch prediction; aggressive early branch resolution
  - Coarse-grained, 2-context multithreading
- SMP system with a snooping-bus inter-processor connection
- 8GB main memory, very large disk volumes, and very high-bandwidth I/O systems

6 System Configuration (diagram). Labeled components: RS64-III processors, each with a 32-bit control word, on a snooping bus; AIX kernel with a kernel extension and the Performance Monitor; Java Virtual Machines hosting the Emulated Browser and the Sun Java Web Server 2.0 with Java Servlets; DB2 DBMS processes; HTTP and JDBC connections.

7 Hardware Counters in S80
- 3 levels of objects can be counted, each with its own counting context:
  - System-level counting: whole-system context
  - Process / process group: process-level context
  - Individual thread: thread-level context
- 3 major components:
  - 8 built-in hardware counters in each RS64-III processor
  - Kernel extension to AIX 4.3
  - Performance Monitor (PM) API in the next release of AIX
- Some problems with the current version of the PM API:
  - Cannot count events for an individual processor
  - Some listed events are not available

8 Hardware Counters in S80: Countable Events
- Processor events: execution cycles and the number of instructions executed
- Instruction mix events: pipeline M, S, B and S instructions executed
- Branch events: conditional branch taken/not-taken (T/NT) events, unconditional branches, zero-cycle branches
- Address translation events: TLB/SLB and ERAT/IERAT miss and duration events
- Cache events: cache misses and latencies for each of the L1 I-cache, L1 D-cache, and L2 cache
- Bus and multi-processor bus snooping events: bus utilization

9 Results: CPI for RBE, Java Web Server and DB2
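CPI (cycles per instruction) follows directly from the two processor events the counters expose: execution cycles and instructions executed. A minimal sketch, assuming per-process counter totals have already been read; the totals below are hypothetical placeholders, not measurements from the S80:

```python
def cpi(cycles, instructions):
    """Cycles per instruction from two hardware counter totals."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions

# Hypothetical counter totals, for illustration only (not measured values).
samples = {
    "RBE": {"cycles": 3_000_000, "instructions": 2_000_000},
    "Java Web Server": {"cycles": 5_000_000, "instructions": 2_500_000},
    "DB2": {"cycles": 4_000_000, "instructions": 1_600_000},
}

if __name__ == "__main__":
    for name, c in samples.items():
        print(f"{name}: CPI = {cpi(c['cycles'], c['instructions']):.2f}")
```

Because counting contexts exist at the process level, the same ratio can be computed separately for each software component rather than only for the whole system.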

10 Results: CPU Cycle Counts (chart; axis: cycle counts)

11 Results: Instruction Dispatch, Browsing Mix (chart; axis: dispatch percentage, %)

12 Results: Instruction Dispatch, Shopping Mix (chart; axis: dispatch percentage, %)

13 Results: Instruction Dispatch, Ordering Mix (chart; axis: dispatch percentage, %)

14 Results: Instruction Mix, Browsing Mix (chart; axes: instruction type, percentage %)

15 Results: Instruction Mix, Shopping Mix (chart; axes: instruction type, percentage %)

16 Results: Instruction Mix, Ordering Mix (chart; axes: instruction type, percentage %)

17 Results: Branch Behavior (charts for Browsing Mix and Shopping Mix). Branch categories plotted:
1. Branches conditional taken
2. Branch to link register taken
3. Branch to counter taken
4. Absolute branches
5. Branches unconditional
6. Branches conditional not taken
7. Zero cycle branch not taken
8. Zero cycle branch taken

18 Results: Branch Behavior (chart for Ordering Mix). Branch categories plotted:
1. Branches conditional taken
2. Branch to link register taken
3. Branch to counter taken
4. Absolute branches
5. Branches unconditional
6. Branches conditional not taken
7. Zero cycle branch not taken
8. Zero cycle branch taken
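Raw branch event counts are easier to compare across mixes once they are normalized. A rough sketch of two such derived figures, assuming each category is an independent event count (the slides do not say whether the categories overlap on the RS64-III); the function names are hypothetical:

```python
def conditional_taken_ratio(taken, not_taken):
    """Fraction of conditional branches that were taken."""
    total = taken + not_taken
    if total == 0:
        raise ValueError("no conditional branches counted")
    return taken / total

def branch_breakdown(counts):
    """Share of each branch category in the total, in percent.

    counts maps a category name to its raw event count.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / total for name, value in counts.items()}
```

With no branch prediction on this processor, the taken ratio of conditional branches is one of the more informative numbers to extract from these counters.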

19 Results: Cache Behavior (charts for Browsing Mix and Shopping Mix; axis: latency in cycles). Series plotted:
1. L1 I-cache miss duration latency
2. L1 D-cache miss duration latency

20 Results: Cache Behavior (charts for Shopping Mix and Ordering Mix; axis: latency in cycles). Series plotted:
1. L1 I-cache miss duration latency
2. L1 D-cache miss duration latency

21 Results: Cache Behavior (charts for Browsing Mix, Shopping Mix, and Ordering Mix; axis: count). Series plotted:
1. L2 miss count per instruction
2. L1 I-cache miss count per instruction
3. L1 D-cache miss count per instruction
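The cache figures are derived quantities rather than raw counter values. A minimal sketch of the two ratios involved, assuming the miss "duration" event accumulates the cycles spent waiting on misses (an assumption; the slides do not define the event precisely) and using hypothetical function names:

```python
def misses_per_instruction(miss_count, instructions):
    """Cache misses normalized by instructions executed."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return miss_count / instructions

def average_miss_latency(miss_duration_cycles, miss_count):
    """Average cycles per miss; 0.0 when no misses were counted."""
    if miss_count == 0:
        return 0.0
    return miss_duration_cycles / miss_count
```

Normalizing by instruction count makes the three workload mixes comparable even when they execute different numbers of instructions during a measurement interval.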

22 Problems & Future Work
Problems:
- Large dataset
- Are the network and server-end software the bottleneck?
- Hardware counters vs. simulations
Future work:
- Measurement of other transaction processing and web serving benchmarks for comparison
- More architectural characterization, such as multithreaded processors, multiprocessor scaling, and multiprocessor snooping-bus issues

23 Conclusions
Server-end software is critical for high-end servers:
- Network and server-end software are the bottleneck.
- This is true both for high-end commercial server systems and for other high-performance parallel computers designed for scientific or engineering computing.
Preliminary performance characterization shows:
- CPU utilization is highly dependent upon the application workload.
- The wide (4-issue) dispatch mechanism of the RS64-III appears to be used inefficiently.
- Branch instructions are the second most frequent instruction type, after loads and stores.
- The L2 cache miss rate is unreasonably low, and the L1 D-cache miss latency is considerably larger than that of the L1 I-cache.

24 Acknowledgement
- Trey Cain, for setting up Java TPC-W and for discussion
- Morris Marden, for helping quiet the machine and for discussion
- Prof. Mikko Lipasti, for guidance and support
- Everyone who helped us

