1 Lecture 13: LRC & Interconnection Networks
Topics: LRC implementation, interconnection characteristics.

2 Shared Virtual Memory
[Figure: an access sequence (Rd y, Wr x, Rd y, synch) comparing the coherence traffic generated with hardware CC against the traffic generated with software CC.]

3 Eager Release Consistency
Invalidates/updates are sent out to the list of sharers when a processor executes a release.
[Figure: timeline acq, wr x, rel; wr x, rel; acq, wr x, rel, with coherence messages going out at every release.]

4 Lazy Release Consistency
Invalidates/updates are sought only when a processor executes an acquire: fewer messages, but higher implementation complexity.
[Figure: the same acq, wr x, rel timeline, with coherence messages deferred until the next acquire.]
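To make the contrast with the eager scheme on the previous slide concrete, here is a toy message count. It is a minimal sketch under assumed parameters (a single writer, a fixed sharer list, one message per invalidate/update or write-notice transfer), not TreadMarks' actual protocol.

```python
# Toy message count for the acq / wr x / rel timeline shown on slides 3-4.
# Assumptions (not from the slides): a single writer, a fixed set of sharers,
# and one message per invalidate/update or per write-notice transfer.

def eager_messages(num_releases, num_sharers):
    # ERC: invalidates/updates go to every sharer at every release.
    return num_releases * num_sharers

def lazy_messages(num_acquires_by_others):
    # LRC: write notices move only when some other processor acquires the
    # lock (one request + one reply in this simplified model).
    return 2 * num_acquires_by_others

# Three critical sections on the writer, three sharers, one later acquire:
print("eager:", eager_messages(num_releases=3, num_sharers=3))   # -> eager: 9
print("lazy :", lazy_messages(num_acquires_by_others=1))         # -> lazy : 2
```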

5 Causality
Acquires and releases pertain to specific lock variables.
When a process executes an acquire, it should receive all updates that were seen before the corresponding release by the releasing processor.
Therefore, each process must keep track of all write notices (modifications to each shared page) that were applied at every synchronization point.
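As a tiny illustration of this requirement, the sketch below tracks write notices as plain Python sets; the page names, process labels, and lock L are made up, and a real system tracks notices per interval rather than as flat sets.

```python
# Tiny illustration of the causality rule, using plain sets of write notices.
# Page names ("x", "y") and the lock L are made up for the example.

seen = {"P1": set(), "P2": set(), "P3": set()}

seen["P1"].add("x")            # P1 writes page x inside lock L, then releases L

seen["P2"] |= seen["P1"]       # P2 acquires L: gets every notice P1 had seen
seen["P2"].add("y")            # P2 writes page y, then releases L

seen["P3"] |= seen["P2"]       # P3 acquires L from P2: causality requires that
                               # P3 learns about x too, even though P3 never
                               # synchronized with P1 directly
assert seen["P3"] == {"x", "y"}
```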

6 Example
[Figure: acquire (A) and release (R) operations on locks 1 through 5, interleaved across processes P1, P2, P3, and P4.]

7 Example
[Figure: the same acquire/release interleaving as the previous slide.]

8 Implementation
Each pair of synchronization operations in a process defines an interval.
A partial order is defined on intervals based on release-acquire pairs.
For each interval, a process maintains a vector timestamp of "preceding" intervals: the vector stores the last preceding interval for each process.
On an acquire, the acquiring process sends its vector timestamp to the releasing process; the releasing process sends back all write notices that have not yet been seen by the acquirer.
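The sketch below shows one way this bookkeeping might look. The class and method names are illustrative rather than the TreadMarks API, and for brevity the releaser forwards only its own intervals (a real system forwards every missing interval it knows about).

```python
# Minimal sketch of the interval / vector-timestamp bookkeeping described above.

from dataclasses import dataclass, field

@dataclass
class Interval:
    owner: int                # process that created the interval
    index: int                # position in that process's interval sequence
    write_notices: set = field(default_factory=set)   # pages modified in it

class Process:
    def __init__(self, pid, nprocs):
        self.pid = pid
        self.vt = [0] * nprocs     # last interval index seen from each process
        self.intervals = []        # intervals created by this process
        self.invalid_pages = set()

    def write(self, page):
        # Open a new interval lazily and record a write notice in it.
        if not self.intervals or self.intervals[-1].index != self.vt[self.pid]:
            self.intervals.append(Interval(self.pid, self.vt[self.pid]))
        self.intervals[-1].write_notices.add(page)

    def release(self):
        # Close the current interval by advancing our own vector entry.
        self.vt[self.pid] += 1

    def acquire_from(self, releaser):
        # Send our vector timestamp; receive write notices from every interval
        # of the releaser that we have not seen yet, then merge timestamps.
        for iv in releaser.intervals:
            if iv.index >= self.vt[iv.owner]:
                self.invalid_pages |= iv.write_notices
        self.vt = [max(a, b) for a, b in zip(self.vt, releaser.vt)]

# Example: P0 writes page 5 in a critical section; P1's later acquire pulls
# the corresponding write notice and invalidates its copy of page 5.
p0, p1 = Process(0, 2), Process(1, 2)
p0.write(5); p0.release()
p1.acquire_from(p0)
assert p1.invalid_pages == {5}
```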

9 LRC Performance
LRC can reduce traffic by more than a factor of two for many applications (compared to ERC).
Programmers have to think harder (causality!).
High memory overheads at each node (keeping track of vector timestamps and write notices); garbage collection helps significantly.
Memory overheads can be reduced by eagerly propagating write notices to processors or to a home node, but this changes the memory model again!

10 Interconnection Networks
Recall: fully connected networks, arrays/rings, meshes/tori, trees, butterflies, hypercubes.
Consider a k-ary d-cube: a d-dimensional array with k elements in each dimension; there are links between elements that differ in one dimension by 1 (mod k).
Number of nodes: N = k^d
Number of switches: N
Switch degree: 2d
Number of links: Nd
Pins per node: 2wd
Avg. routing distance: d(k-1)/2
Diameter: d(k-1)
Bisection bandwidth: 2wk^(d-1)
Should we minimize or maximize dimension?
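The figures in the table are easy to tabulate for concrete networks; a small helper (hypothetical, with w as an assumed channel width in bits and links taken to be bidirectional) might look like this.

```python
# Quick calculator for the k-ary d-cube figures in the table above.
# w is an assumed channel width in bits; links are taken to be bidirectional.

def kary_dcube(k, d, w=1):
    n = k ** d
    return {
        "nodes/switches": n,
        "switch degree": 2 * d,
        "links": n * d,
        "pins per node": 2 * w * d,
        "avg routing distance": d * (k - 1) / 2,
        "diameter": d * (k - 1),
        "bisection bandwidth": 2 * w * k ** (d - 1),
    }

# 64 nodes two ways: an 8-ary 2-cube (2D torus) vs. a 2-ary 6-cube (hypercube).
for k, d in [(8, 2), (2, 6)]:
    print(f"k={k}, d={d}:", kary_dcube(k, d))
```

For 64 nodes, the 8-ary 2-cube keeps switch degree at 4 but has diameter 14, while the 2-ary 6-cube cuts the diameter to 6 at the cost of degree-12 switches, which is exactly the tension the next slide examines.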

11 Dimension
For a fixed machine size N, low-dimensional networks have significantly higher per-packet latencies, so scalable machines should employ high dimensionality (high cost!).
For a fixed number of pins, message latency decreases at first and then increases as we increase dimensionality.
What if we keep bisection bandwidth constant?
Lower dimensions also reduce wire length.
[Table repeated from the previous slide: switches N, switch degree 2d, links Nd, pins per node 2wd, avg. routing distance d(k-1)/2, diameter d(k-1), bisection bandwidth 2wk^(d-1).]
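The fixed-pin trend can be seen with a back-of-the-envelope model: hold the pin budget per node constant so the channel width shrinks as the degree grows (w = pins / 2d), and charge one router delay per hop plus serialization of the message over the narrower channel. The constants below (128 pins, 256-bit messages, unit router delay) are arbitrary assumptions chosen only to expose the trend.

```python
# Back-of-the-envelope latency sweep under a fixed per-node pin budget.
# The constants (128 pins, 256-bit messages, unit router delay) are arbitrary
# assumptions chosen only to expose the trend, not data from the lecture.

def latency(n_nodes, d, pins=128, t_router=1.0, msg_bits=256):
    k = round(n_nodes ** (1.0 / d))             # arity (rounded, so the node
                                                # count is only approximate)
    w = pins / (2 * d)                          # channel width shrinks as degree grows
    avg_hops = d * (k - 1) / 2                  # from the table above
    return avg_hops * t_router + msg_bits / w   # hop delay + serialization

for d in (1, 2, 3, 4, 6, 8, 10):
    print(f"d={d:2d}  avg latency = {latency(1024, d):7.1f}")
```

With these numbers the 1024-node sweep bottoms out around three or four dimensions: going from a ring to a low-dimensional torus slashes the hop count, but pushing the dimension further mostly narrows the channels and inflates serialization time.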
