Presented by: Nick Kirchem Feb 13, 2004

Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing
Luiz A. Barroso et al. (Compaq Computer Corporation)

Target and Motivation
- Commercial applications (databases, OLTP)
  - Most important market for high-performance servers
  - Data-dependent computation (low ILP)
  - Little gained by complex multiple-issue out-of-order processors
- Complexity of current processors
  - Long design times
  - High development costs
- Better use of transistors?

Project Goals
- Design a Chip Multiprocessing (CMP) System
  - Integrate 8 simple processor cores on a single chip
  - Exploit thread-level parallelism instead of ILP
- High performance, Low Cost
  - Achieve superior performance on commercial workloads
  - Small team, modest investment, short design time

Architecture Overview

Architecture Elements
- Simple Processors (500 MHz, In-Order)
- No I/O capability on chip (separate I/O nodes)
- Up to 1024 nodes in a system
- Individual L1 Caches (64KB, 2-way set-assoc)
- One Logical L2 Cache, interleaved, 1MB
- Intra-Chip Switch
  - Unidirectional crossbar
  - Transaction based, atomic transfers
  - Bandwidth ~3x memory bandwidth
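One way to picture the "one logical L2 cache, interleaved" point is a mapping from physical address to L2 bank. The sketch below is only an illustration: the slides give the 1MB total, while the line size (`LINE_BYTES`), the bank count (`NUM_BANKS`), and the choice of low-order line-address bits for bank selection are assumptions made for the example.

```python
# Hypothetical parameters: the slides state only "interleaved, 1MB".
LINE_BYTES = 64       # assumed cache-line size
NUM_BANKS = 8         # assumed number of L2 banks
L2_BYTES = 1 << 20    # 1 MB logical L2 (from the slide)


def l2_bank(addr: int) -> int:
    """Return the bank holding the line that contains addr.

    Consecutive lines map to consecutive banks, so sequential
    accesses spread evenly across the banks.
    """
    return (addr // LINE_BYTES) % NUM_BANKS


bank_bytes = L2_BYTES // NUM_BANKS  # capacity of each bank under these assumptions

if __name__ == "__main__":
    for addr in (0, 63, 64, 128, 512):
        print(hex(addr), "-> bank", l2_bank(addr))
```

Under these assumptions, addresses in the same line share a bank, and the pattern wraps around every `NUM_BANKS` lines.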

Intra-Chip Cache Coherence
- MESI protocol
- No Inclusion (1 MB aggregate L1, 1MB L2)
  - But, L2 holds copy of L1 tags and state (no snooping required at L1)
  - L1 filled directly from memory (L2 = victim cache)
- Coherence handled by L2 controllers
  - Can service request directly, forward to owner L1, forward to protocol engine, obtain from Memory
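The four outcomes listed for the L2 controller can be sketched as a small decision function. This is a toy model, not the actual controller logic: the `L2Bank` class, its fields, and the order in which the cases are checked are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class L2Bank:
    """Toy model of one L2 controller (all names are hypothetical)."""
    l2_lines: set = field(default_factory=set)     # lines held in the L2 itself
    l1_owner: dict = field(default_factory=dict)   # duplicate L1 tags: line -> owning core
    local_lines: set = field(default_factory=set)  # lines whose home memory is this node

    def route(self, line: int, requester: int) -> str:
        """Decide how an L1 miss for `line` is serviced (assumed check order)."""
        if line in self.l2_lines:
            return "service from L2"
        owner = self.l1_owner.get(line)
        if owner is not None and owner != requester:
            # The duplicate tags identify the owner, so no L1 is snooped.
            return f"forward to owner L1 (core {owner})"
        if line not in self.local_lines:
            return "forward to protocol engine"
        return "obtain from memory"


if __name__ == "__main__":
    bank = L2Bank(l2_lines={1}, l1_owner={2: 3}, local_lines={1, 2, 4})
    for line in (1, 2, 4, 9):
        print(line, "->", bank.route(line, requester=0))
```

The point of the sketch is that the duplicate L1 tags kept at the L2 let the controller pick the right source without broadcasting to the L1 caches.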

Inter-Node Coherence
- Protocol Engines (microprogrammable controllers)
  - Home: exports local memory
  - Remote: imports remote memory
- Directory Storage
  - Compute ECC at coarse granularity, use extra bits for directory info, so no memory space overhead
  - Directory granularity = 1 node (not individual processor)
- Interconnect: input and output queues, router (point-to-point, 4 links)
- No NAKs – avoid deadlock by sufficient buffering, and guarantee forwarded requests can be serviced
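The directory-storage trick can be checked with a little arithmetic. The sketch below assumes SECDED (single-error-correct, double-error-detect) codes and compares ECC computed per 64-bit word against ECC computed per 256-bit block over a 64-byte line. Those three sizes are assumptions for illustration; the slides say only "coarse granularity".

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a SECDED code protecting `data_bits` bits.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def freed_bits(line_bits: int, fine: int, coarse: int) -> int:
    """ECC bits saved per line by moving from `fine` to `coarse` blocks."""
    fine_total = (line_bits // fine) * secded_check_bits(fine)
    coarse_total = (line_bits // coarse) * secded_check_bits(coarse)
    return fine_total - coarse_total


if __name__ == "__main__":
    # Assumed sizes: 64-byte line, 64-bit vs 256-bit ECC blocks.
    print(freed_bits(512, 64, 256))  # 44
```

Under these assumptions a line carries 8 x 8 = 64 ECC bits at fine granularity and 2 x 10 = 20 at coarse granularity, leaving 44 bits per line for directory state in storage that is already present.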

Performance Evaluation
- OLTP and DSS workloads: TPC-B/D, Oracle database
- SimOS-Alpha environment
- Compared: Piranha (500 MHz, and Full-Custom at 1.25 GHz) against a Next-generation out-of-order Microprocessor (OOO, 1 GHz)
- Single Chip Evaluation
  - OOO outperforms P1 (individual Piranha proc) by 2.3x
  - P8 (8-processor Piranha chip) outperforms OOO by 3x
  - Speedup of P8 over P1 = 7x
- Multi-chip Configurations
  - Four chips (only 4 CPUs per chip ?!)
  - Results show that Piranha scales better than OOO
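The three single-chip figures are mutually consistent, which a short calculation makes explicit. The sketch uses only the ratios quoted above; the "parallel efficiency" figure is a derived illustration, not a number from the presentation.

```python
# Ratios quoted on the slide.
ooo_over_p1 = 2.3   # OOO vs. one Piranha core
p8_over_ooo = 3.0   # eight-core Piranha chip vs. OOO

# Chaining the two ratios gives P8 relative to P1.
p8_over_p1 = ooo_over_p1 * p8_over_ooo

# Derived illustration: fraction of ideal 8x scaling achieved.
efficiency = p8_over_p1 / 8

if __name__ == "__main__":
    print(round(p8_over_p1, 1))   # 6.9
    print(round(efficiency, 2))   # 0.86
```

The product of about 6.9 matches the reported speedup of roughly 7x for P8 over P1.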

Questions/Concerns
- Would the Piranha design be worthwhile if there were a well-designed SMT processor (with 4 or 8 threads)?
- Reliability better or worse with multiple chips per processor?
- Power consumption?
