Multiprocessor Architecture Basics
The Art of Multiprocessor Programming, Spring 2007
© 2003 Herlihy and Shavit

Multiprocessor Architecture
Abstract models are (mostly) fine for understanding algorithm correctness
To understand how concurrent algorithms perform, you need to understand something about multiprocessor architectures
Pieces
– Processors
– Threads
– Interconnect
– Memory
– Caches
Design an Urban Messenger Service in 1980
Downtown Manhattan. Should you use
– Cars (1980 Buicks, 15 MPG)?
– Bicycles (hire recent graduates)?
Better to use bicycles
Technology Changes
Since 1980, car technology has changed enormously
– Better mileage (hybrid cars, 35 MPG)
– More reliable
Should you rethink your Manhattan messenger service?
Processors
Cycle: fetch and execute one instruction
Cycle times change
– 1980: 10 million cycles/sec
– 2005: 3,000 million cycles/sec
Computer Architecture
Measure time in cycles
– Absolute cycle times change
Memory access: ~100s of cycles
– Changes slowly
– Mostly gets worse
Threads
The execution of a sequential program
– Software, not hardware
A processor can run a thread, then put it aside when
– the thread does I/O, or
– the thread runs out of time,
and run another thread
Interconnect
Bus
– Like a tiny Ethernet
– Broadcast medium
– Connects processors to memory and processors to processors
Network
– Like a tiny LAN
– Mostly used on large machines
Interconnect
Interconnect is a finite resource
– Processors can be delayed if others consume too much of it
– Avoid algorithms that use too much bandwidth
Analogy
You work in an office
– When you leave for lunch, someone else takes over your office
– If you don’t take a break, a security guard shows up and escorts you to the cafeteria
– When you return, you may get a different office
Processor and Memory Are Far Apart
(diagram: processor connected to memory through the interconnect)
Reading from Memory
(animation: the processor sends an address over the interconnect, waits many cycles, and the value comes back)

Writing to Memory
(animation: the processor sends an address and a value, waits, and receives an acknowledgment)
Remote Spinning
A thread waits for a bit in memory to change
– Maybe it tried to dequeue from an empty buffer
It spins
– Repeatedly rereads the flag bit
– A huge waste of interconnect bandwidth
Analogy
In the days before the Internet…
Alice is writing a paper on aardvarks
Her sources are in the university library
– She requests a book by campus mail
– The book arrives by return mail
– She sends it back when not in use
She spends a lot of time in the mail room
Analogy II
Alice buys
– A desk in her office, to keep the books she is using now
– A bookcase in the hall, to keep the books she will need soon
Cache: Reading from Memory
(animation: the processor checks its cache first)
– Cache hit: the line is present; the value is returned with no interconnect traffic
– Cache miss: the line is absent; the address goes out to memory, and the returned line is installed in the cache
Local Spinning
With caches, spinning becomes practical
– First time: load the flag bit into the cache
– As long as it doesn’t change: hit in the cache (no interconnect used)
– When it changes: a one-time cost (see cache coherence below)
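The pattern above can be sketched in Java. This is a minimal illustration, not code from the slides: the class name and methods are invented for the example. After the first read of `flag`, the spinning processor’s rereads hit in its cache until the writer’s update invalidates the line.

```java
class SpinFlag {
    private volatile boolean flag = false;  // volatile: the writer's update becomes visible

    void set() { flag = true; }

    // Spin until the flag flips. After the first read, the line sits in this
    // processor's cache, so rereads are local hits until the writer changes it.
    void await() {
        while (!flag) { /* spin locally */ }
    }

    boolean isSet() { return flag; }
}
```

One thread calls `await()` while another eventually calls `set()`; the waiter generates no interconnect traffic while it spins on the cached copy.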
Granularity
Caches operate at a larger granularity than a word
– Cache line: a fixed-size block containing the address
Locality
If you use an address now, you will probably use it again soon
– Fetch from the cache, not memory
If you use an address now, you will probably use a nearby address soon
– In the same cache line
Hit Ratio
The proportion of requests that hit in the cache
– A measure of the effectiveness of the caching mechanism
– Depends on the locality of the application
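Why the hit ratio matters can be made concrete with the standard average-access-time arithmetic. The numbers below are illustrative assumptions (a 2-cycle hit, a 100-cycle miss), not figures from the slides:

```java
class HitRatio {
    // Average memory access cycles: hits pay the cache latency,
    // misses pay the (much larger) memory latency.
    static double averageAccessCycles(double hitRatio,
                                      double hitCycles,
                                      double missCycles) {
        return hitRatio * hitCycles + (1.0 - hitRatio) * missCycles;
    }
}
```

With a 90% hit ratio, a 2-cycle hit, and a 100-cycle miss, the average is 11.8 cycles; at 99% it drops to about 3 cycles, which is why locality pays off so dramatically.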
L1 and L2 Caches
L1: small and fast
– 1 or 2 cycles
– ~16 byte line
L2: larger and slower
– 10s of cycles
– ~1K line size
When a Cache Becomes Full…
Need to make room for a new entry by evicting an existing one
Need a replacement policy
– Usually some kind of least-recently-used heuristic
Fully Associative Cache
Any line can go anywhere in the cache
– Advantage: can replace any line
– Disadvantage: hard to find lines
Direct-Mapped Cache
Every address has exactly one slot
– Advantage: easy to find a line
– Disadvantage: must replace that fixed line
K-Way Set Associative Cache
Each slot holds k lines
– Advantage: fairly easy to find a line
– Advantage: some choice in which line to replace
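The direct-mapped and set-associative schemes both map an address to a set the same way; they differ only in how many lines each set holds. A minimal sketch of the index arithmetic, with an assumed 64-byte line and 128 sets (both invented for the example):

```java
class CacheMapping {
    static final int LINE_SIZE = 64;   // bytes per line (assumed)
    static final int NUM_SETS  = 128;  // number of sets (assumed)

    // Which set this address maps to: strip the offset within the line,
    // then take the line number modulo the number of sets.
    static int setIndex(long addr) {
        return (int) ((addr / LINE_SIZE) % NUM_SETS);
    }

    // The remaining high-order bits, stored alongside the line so that a
    // lookup can tell which of the conflicting addresses is actually cached.
    static long tag(long addr) {
        return addr / LINE_SIZE / NUM_SETS;
    }
}
```

Addresses exactly `LINE_SIZE * NUM_SETS` bytes apart collide in the same set; in a direct-mapped cache one evicts the other, while a k-way cache can hold k such conflicting lines at once.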
Contention
Alice and Bob are both writing research papers on aardvarks
– Alice has encyclopedia vol. AA–AC
– Bob asks the library for it
– The library asks Alice to return it
– Alice returns it and rerequests it
– The library asks Bob to return it…
Contention
Good to avoid memory contention
– It idles processors
– It consumes interconnect bandwidth
Contention
Alice is still writing a research paper on aardvarks
Carol is writing a tourist guide to the German city of Aachen
No conflict?
– The library deals with volumes, not articles
– Both require the same encyclopedia volume
False Sharing
Two processors may conflict over disjoint addresses
– If those addresses lie on the same cache line
False Sharing
A large cache line size
– increases locality
– but also increases the likelihood of false sharing
Sometimes you need to “scatter” data to avoid this problem
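Whether two addresses can falsely share comes down to whether they fall in the same line. A tiny sketch, assuming a 64-byte line (an assumption, not a figure from the slides):

```java
class FalseSharing {
    static final int LINE_SIZE = 64;  // bytes per cache line (assumed)

    // Two byte offsets share a cache line iff they have the same line number.
    static boolean sameLine(long offsetA, long offsetB) {
        return offsetA / LINE_SIZE == offsetB / LINE_SIZE;
    }
}
```

Two adjacent 8-byte counters (offsets 0 and 8) share a line, so writers to each ping-pong the line between caches; “scattering” them at least one line apart (offsets 0 and 64) removes the conflict.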
Cache Coherence
Processors A and B both cache address x
A writes to x
– Updates its cache
How does B find out?
– Many cache coherence protocols in the literature
MESI
– Modified: the cached data has been modified and must be written back to memory
– Exclusive: not modified; I have the only copy
– Shared: not modified; may be cached elsewhere
– Invalid: cache contents not meaningful
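The four states and a few of their transitions can be sketched as a toy state machine. This is a deliberately simplified model, invented for illustration: it omits write-backs, bus signalling, and the no-other-sharer check a real protocol performs on a load.

```java
enum Mesi { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class MesiLine {
    Mesi state = Mesi.INVALID;

    // Local load of an invalid line: fetch it (assume no other cache has it).
    void localRead()   { if (state == Mesi.INVALID) state = Mesi.EXCLUSIVE; }

    // Another processor loads the line: our exclusive/modified copy is demoted.
    void remoteRead()  { if (state == Mesi.EXCLUSIVE || state == Mesi.MODIFIED) state = Mesi.SHARED; }

    // Local store: we now hold the only up-to-date copy.
    void localWrite()  { state = Mesi.MODIFIED; }

    // Another processor writes the line: our copy becomes stale.
    void remoteWrite() { state = Mesi.INVALID; }
}
```

Tracing one line through load, remote load, store, and remote store visits all four states in order: E, S, M, I.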
MESI on the Bus
(animation, roughly:)
– A processor issues a load request for x on the bus
– Memory responds with the data; the line enters the cache in state E
– A second processor issues a load for x; the first processor responds with the data, and both copies become S
– To modify data cached in state S, a write-through cache broadcasts the write (“write x!”) on the bus
Write-Through Caches
Immediately broadcast changes
Good
– Memory and caches always agree
– More read hits, maybe
Bad (“show stoppers”)
– Bus traffic on all writes
– Most writes are to unshared data (for example, loop indexes)
Write-Back Caches
Accumulate changes in the cache
Write back when the line is evicted
– The cache is needed for something else
– Another processor wants it
Invalidate
(animation: a processor about to write broadcasts “invalidate x” on the bus; its own copy goes from S to M, and the other cached copies go from S to I)
Multicore Architectures
The university president, alarmed by the fall in productivity, puts Alice, Bob, and Carol in the same corridor
– Private desks
– A shared bookcase
Contention costs go way down
Old-School Multiprocessor vs. Multicore
(diagrams: in the old-school multiprocessor, each processor has its own cache and reaches memory over the bus; in a multicore, the cores on one chip share a cache in front of memory)
Multicore
– Private L1 caches
– Shared L2 cache
Communication between same-chip processors is now very fast
Different-chip processors are still not so fast
NUMA Architectures
Alice and Bob transfer to NUMA State University
– No centralized library
– Each office basement holds part of the library
Distributed Shared-Memory Architectures
Alice’s basement has the volumes that start with A
– Aardvark papers are convenient: run downstairs
– Zebra papers are inconvenient: run across campus
SMP vs. NUMA
(diagrams of the SMP and NUMA memory layouts)
– SMP: symmetric multiprocessor
– NUMA: non-uniform memory access
– CC-NUMA: cache-coherent NUMA
Spinning Again
NUMA without caches
– OK if the variable is local
– Bad if it is remote
CC-NUMA
– Like SMP
Relaxed Memory
Remember the flag principle?
– Alice’s and Bob’s flag variables start out false
– Alice writes true to her flag and reads Bob’s
– Bob writes true to his flag and reads Alice’s
– One must see the other’s flag as true
Not Necessarily So
Sometimes the compiler reorders memory operations
This can improve
– cache performance
– interconnect use
But it causes unexpected concurrent interactions
Write Buffers
(diagram: writes are held in a buffer on their way to memory)
– Absorbing: a later write to the same address replaces the buffered one
– Batching: several buffered writes are sent out together
Volatile
In Java, if a variable is declared volatile, operations on it won’t be reordered
Expensive, so use it only when needed
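The flag principle from the previous slides can be demonstrated with `volatile`. This is a sketch, with class and field names invented for the example: because the flags are volatile, the write-then-read in each thread cannot be reordered, so at least one thread must see the other’s flag as true.

```java
class FlagPrinciple {
    volatile boolean aliceFlag = false;
    volatile boolean bobFlag   = false;
    boolean aliceSawBob, bobSawAlice;   // read after join(), so safely published

    // Each thread writes its own flag, then reads the other's.
    void runOnce() {
        Thread alice = new Thread(() -> { aliceFlag = true; aliceSawBob = bobFlag; });
        Thread bob   = new Thread(() -> { bobFlag = true;  bobSawAlice = aliceFlag; });
        alice.start(); bob.start();
        try {
            alice.join(); bob.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If the flags were plain (non-volatile) fields, the compiler or hardware could reorder each thread’s write past its read, and both threads could observe false, which is exactly the relaxed-memory surprise the slides warn about.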
Companion slides for The Art of Multiprocessor Programming, by Maurice Herlihy & Nir Shavit.