Architecture and Design of AlphaServer GS320

Presented by Vijeta Johri 02/13/04

Motivation
- Huge demand for small- and medium-scale multiprocessors compared with larger servers
- Scarcity of scalable applications and operating systems
- Achieving high reliability and fault containment is tough
- GS320 is targeted at medium-scale multiprocessing, so it can:
  - take advantage of the smaller size of the system
  - eliminate the inefficiencies of directory-based protocols

AlphaServer GS320 Architecture
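- Hierarchical shared-memory multiprocessor
- 8 QBBs (quad-processor building blocks), each with:
  - a 10-port local switch
  - 4 Alpha processors, each with separate on-chip instruction (I) and data (D) caches and an external cache
  - 4 memory modules (1-8 GB SDRAM memory)
  - an I/O interface supporting 8 PCI buses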

AlphaServer GS320 Architecture
QBBs (contd.)
- DIR (directory): 14-bit entry per 64-byte memory line
  - 6-bit owner field
  - 8-bit coarse sharing vector with QBB granularity
  - Dirty sharing supported
- DTAG (duplicate tag store): functions as a centralized full-map directory and maintains coherence within a QBB
- TTT (transactions-in-transit table): 48-entry associative table
- Global switch: supports virtual lanes and multicast
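The sketch below shows how a 14-bit directory entry with a 6-bit owner field and an 8-bit QBB-granularity vector might be packed. It also shows why a coarse vector sends invalidations to whole QBBs. The bit layout, the processor numbering (0..31, four per QBB), and every function name are hypothetical. The slides give only the field widths, and the owner encoding is treated as an opaque 6-bit value.

```python
# Sketch of the directory entry described above.
# ASSUMPTIONS (not from the slides): owner field in bits 13..8, coarse vector
# in bits 7..0, processors numbered 0..31 with processors 4q..4q+3 in QBB q.
OWNER_BITS = 6                          # 6-bit owner field
VECTOR_BITS = 8                         # one bit per QBB
ENTRY_BITS = OWNER_BITS + VECTOR_BITS   # 14 bits per entry
LINE_BYTES = 64
PROCS_PER_QBB = 4
NUM_QBBS = 8


def qbb_of(proc_id):
    """Hypothetical mapping from a global processor ID to its QBB index."""
    return proc_id // PROCS_PER_QBB


def pack_entry(owner, vector):
    """Pack a 6-bit owner and an 8-bit coarse vector into a 14-bit entry."""
    if not 0 <= owner < (1 << OWNER_BITS):
        raise ValueError("owner does not fit in 6 bits")
    if not 0 <= vector < (1 << VECTOR_BITS):
        raise ValueError("vector does not fit in 8 bits")
    return (owner << VECTOR_BITS) | vector


def unpack_entry(entry):
    """Split a 14-bit entry back into (owner, vector)."""
    return entry >> VECTOR_BITS, entry & ((1 << VECTOR_BITS) - 1)


def add_sharer(entry, proc_id):
    """Mark a sharer at QBB granularity: one bit covers all 4 processors."""
    owner, vector = unpack_entry(entry)
    return pack_entry(owner, vector | (1 << qbb_of(proc_id)))


def qbbs_to_invalidate(entry):
    """Every QBB whose bit is set must be sent the invalidation."""
    _, vector = unpack_entry(entry)
    return [q for q in range(NUM_QBBS) if vector & (1 << q)]


# Directory storage overhead: 14 bits of state per 512 bits of data.
OVERHEAD = ENTRY_BITS / (LINE_BYTES * 8)   # 0.02734375, about 2.7%
```

In this numbering, processors 9 and 10 both live in QBB 2, so they set the same vector bit. An invalidation is then sent to QBB 2 as a whole, and the DTAG maintains coherence inside that QBB.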

Cache Coherence Goals
- Make the common transaction efficient
- Exploit the small system size and interconnect ordering properties to reduce protocol messages and resource occupancy
- Invalidation-based protocol with 4 request types:
  - Read
  - Read-exclusive
  - Exclusive
  - Exclusive-without-data
- Reply forwarding from remote owners
- Eager exclusive replies
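To make the four request types concrete, here is a heavily simplified sketch of what a home directory might send for each one. It covers reply forwarding to a remote owner, dirty sharing on reads, and eager exclusive replies that do not wait for invalidations. The message names and the data structure are hypothetical. Sharers are tracked per node rather than per QBB. The sketch also assumes that an "exclusive" requester still holds a shared copy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Msg = Tuple[str, int]   # (message kind, destination node); names are hypothetical
EXCLUSIVE_TYPES = ("read_exclusive", "exclusive", "exclusive_without_data")


@dataclass
class DirEntry:
    owner: Optional[int] = None                      # None: memory at the home owns it
    sharers: Set[int] = field(default_factory=set)   # excludes the owner


def handle_request(entry: DirEntry, req: str, requester: int) -> List[Msg]:
    """Update `entry` and return the messages the home sends (simplified)."""
    msgs: List[Msg] = []
    if req == "read":
        if entry.owner is None:
            msgs.append(("data", requester))          # memory supplies the data
        else:
            msgs.append(("fwd_read", entry.owner))    # reply forwarding
        entry.sharers.add(requester)                  # dirty sharing: owner kept
        return msgs
    if req not in EXCLUSIVE_TYPES:
        raise ValueError(f"unknown request type: {req}")

    # Every other sharer loses its copy; no acknowledgements are collected.
    for s in sorted(entry.sharers - {requester}):
        msgs.append(("inval", s))
    remote_owner = entry.owner is not None and entry.owner != requester
    if req == "read_exclusive" and remote_owner:
        msgs.append(("fwd_read_exclusive", entry.owner))   # owner sends data
    else:
        if remote_owner:
            msgs.append(("inval", entry.owner))
        # Eager exclusive reply: sent immediately, not after invalidations.
        # "exclusive" assumes the requester still holds a shared copy;
        # "exclusive_without_data" suits writes that overwrite the whole line.
        kind = "data_exclusive" if req == "read_exclusive" else "exclusive_grant"
        msgs.append((kind, requester))
    entry.owner, entry.sharers = requester, set()
    return msgs
```

A real home also serializes these messages on Q1 and multicasts the invalidations. Both of those steps are left out here.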

Cache Coherence
- Handle corner cases without NAKs/retries and without blocking at the home directory
  - Guarantees that the owner node always services a forwarded request
  - All transactions complete with at most 1 message to the home
  - Directory controller implemented as a simple pipelined state machine
  - Eliminates livelock and starvation
- Virtual lanes
  - Q0: processor to home (point-to-point order)
  - Q1: home/memory to processors (total order)
  - Q2: replies from a third-party node or processor to the requestor
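One way to see why the three virtual lanes help is the standard deadlock-avoidance argument. If servicing a message on lane i only ever produces messages on lanes above i, lane dependencies cannot form a cycle. The sketch below checks that property for a simplified message catalogue and counts messages to the home per transaction. The message kinds and what each one generates are illustrative assumptions. In particular, placing writebacks on Q0 is inferred from the lane descriptions.

```python
# Lane of each (hypothetical) message kind, following the lane descriptions:
# Q0 = processor -> home, Q1 = home/memory -> processors, Q2 = third-party replies.
LANE = {
    "request": 0,
    "writeback": 0,      # assumed: writebacks travel processor -> home on Q0
    "forward": 1,
    "invalidate": 1,     # multicast on Q1
    "home_reply": 1,
    "owner_reply": 2,
}

# What servicing each message may cause the receiver to send (simplified).
GENERATES = {
    "request": ["forward", "invalidate", "home_reply"],
    "writeback": [],
    "forward": ["owner_reply"],   # the owner always services a forward
    "invalidate": [],             # no invalidation acknowledgements
    "home_reply": [],
    "owner_reply": [],
}


def lanes_deadlock_free(lane, generates):
    """True if each message only triggers messages on strictly higher lanes.

    Then the lane dependency graph is acyclic: Q2 replies can always be
    consumed, so Q1 can always drain, and then Q0 can always drain.
    """
    return all(lane[out] > lane[msg]
               for msg, outs in generates.items() for out in outs)


def messages_to_home(kind, lane=LANE, generates=GENERATES):
    """Count Q0 (to-home) messages in the transaction started by `kind`."""
    own = 1 if lane[kind] == 0 else 0
    return own + sum(messages_to_home(k, lane, generates) for k in generates[kind])
```

In this model, a request transaction sends exactly one message to the home, the request itself. Letting a Q2 reply trigger a new request would break the lane ordering.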

Cache Coherence
- Dealing with the late request race: a 2-level mechanism
  - Wait for the victim signal before discarding a line from the victim buffer
  - For a writeback to a remote home, the TTT maintains a copy
- Dealing with the early request race
  - Delay the forwarded request on Q1 until the data arrives on Q2
- Allow transactions to be served within a node
- No invalidation-acknowledgement messages
- Multicast is used to send Q1 messages to multiple nodes
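Both races can be illustrated from the owner's side with the sketch below. An early forward, one that arrives before the node's own data, is queued until the Q2 data shows up. A late forward, one that arrives after the line was written back, is served from a retained copy until the victim signal releases it. To keep the model small, only forwarded reads are handled, so the owner keeps its copy under dirty sharing. A single victim buffer stands in for both the processor's victim buffer and the TTT copy. The class and method names are hypothetical.

```python
from collections import deque


class OwnerNode:
    """Sketch: servicing forwarded reads despite the early and late races."""

    def __init__(self):
        self.cache = {}          # addr -> data currently held
        self.pending = set()     # addrs whose own data reply (Q2) is outstanding
        self.deferred = {}       # addr -> deque of requesters forwarded early
        self.victim_buffer = {}  # addr -> data written back, not yet released
        self.sent = []           # (addr, requester, data) replies sent on Q2

    def issue_request(self, addr):
        self.pending.add(addr)

    def on_forward(self, addr, requester):
        """A forwarded read arrives on Q1; it must always be serviced."""
        if addr in self.pending:
            # Early request race: the home already treats us as owner,
            # but our data has not arrived on Q2 yet, so delay the forward.
            self.deferred.setdefault(addr, deque()).append(requester)
        elif addr in self.cache:
            self.sent.append((addr, requester, self.cache[addr]))
        elif addr in self.victim_buffer:
            # Late request race: the line was written back after the home
            # forwarded this request, but the copy is still retained.
            self.sent.append((addr, requester, self.victim_buffer[addr]))
        else:
            raise RuntimeError("forward arrived with no copy to service it")

    def on_data(self, addr, data):
        """Our own data reply arrives on Q2; drain any delayed forwards."""
        self.pending.discard(addr)
        self.cache[addr] = data
        for requester in self.deferred.pop(addr, ()):
            self.sent.append((addr, requester, data))

    def evict(self, addr):
        """Write the line back to the home but keep a copy for late forwards."""
        self.victim_buffer[addr] = self.cache.pop(addr)

    def on_victim_signal(self, addr):
        """Assumed meaning: no forward for this writeback can still arrive."""
        del self.victim_buffer[addr]
```

In either race, the forward is never NAKed back to the home. That matches the guarantee that the owner always services a forwarded request.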

Cache Coherence
[Figure: protocol message flows for dirty sharing, no sharing, and writeback. Legend: R = requestor, H = home, O = owner, S = sharer]
- Marker message: allows the requestor node to disambiguate the order of requests
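The sketch below shows one plausible reading of how a marker lets the requestor disambiguate order. Assume the home sends the marker on Q1 when it forwards the request to a remote owner, and that Q1 is totally ordered. Then any Q1 invalidation seen before the marker was serialized before the request and can be ignored. One seen after the marker was serialized after it, so it must kill the copy that arrives on Q2. These assumptions and all names are ours, not taken from the slides.

```python
class Requester:
    """Requester-side marker handling (one plausible reading, simplified).

    Assumes the marker travels on totally ordered Q1 and the data on Q2.
    """

    def __init__(self):
        self.marker_seen = False
        self.inval_pending = False   # a later write must kill our copy
        self.data = None
        self.line_valid = False

    def on_q1_marker(self):
        self.marker_seen = True

    def on_q1_invalidate(self):
        if not self.marker_seen:
            return                       # serialized before our request: ignore
        if self.data is None:
            self.inval_pending = True    # data still in flight on Q2
        else:
            self.line_valid = False

    def on_q2_data(self, data):
        """Data from the owner satisfies the miss even if later invalidated,
        since our request was ordered before the invalidating write."""
        self.data = data
        self.line_valid = not self.inval_pending
        return data
```

Without the marker, the requestor could not tell whether an invalidation that raced ahead of its data applied to that data or to an older copy.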

Memory Consistency Optimizations
- The Alpha memory model is supported; barrier instructions impose memory ordering
- To implement safe early acknowledgement of invalidations, the reply message is split into:
  - a data component, needed to service the request
  - a commit component, used for ordering
- An early commit component is generated for read and read-exclusive requests
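The split reply can be pictured as a processor tracking two components per outstanding request. The data component completes the access itself. A memory barrier waits only for commit components, since there are no invalidation acknowledgements to count. This is a simplified sketch with hypothetical names. It assumes the barrier's retirement condition depends only on commits, which is our reading rather than a detail given on the slides.

```python
class Processor:
    """Tracks data and commit components of outstanding requests (sketch)."""

    def __init__(self):
        self.awaiting_data = set()
        self.awaiting_commit = set()

    def issue(self, req_id):
        self.awaiting_data.add(req_id)
        self.awaiting_commit.add(req_id)

    def on_data(self, req_id):
        # Data component: the load or store can use the line now.
        self.awaiting_data.discard(req_id)

    def on_commit(self, req_id):
        # Commit component: the request is now ordered with respect to
        # others, e.g. behind the invalidations it caused on Q1.
        self.awaiting_commit.discard(req_id)

    def access_complete(self, req_id):
        return req_id not in self.awaiting_data

    def barrier_can_retire(self):
        # Assumed rule: a barrier retires once all earlier commits arrived.
        return not self.awaiting_commit
```

With an eager exclusive reply, the data can arrive before the commit. The store can then complete, but a following barrier still has to wait until the commit arrives.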

Performance Evaluation
- Relatively high back-to-back read latency
- Effective latency is smaller for independent read misses
- Smaller L2 hit latency than snoopy systems
- The latency impact of sending invalidations is small and independent of the number of sharers
- With a barrier, local-home writes with remote sharers take longer than writes with no sharers
- Conflicting writes to the same line have approximately the same latency as a 1-hop write

Questions
- Do you think the AlphaServer GS320 can completely replace snoopy systems? If not, why not?
- What are the major disadvantages of the AlphaServer GS320?
