Administrative MP4 is out If you are changing groups for MP4, let staff (profs, TAs) and your partners know – We have to change multiple state machines to get to a consistent state about your group status – If you take the extra days, you don’t have to ask/inform TAs/Profs about taking them since we see the status in the compass 10/11/20143
4 Introduction Goal of computer architects, engineers and scientists to get more and more computing power Problem with the goal: – Previous approaches: make clock run faster to speed up – We are beginning to hit some fundamental limits on clock speed – Reason: Einstein’s special theory of relativity No electrical signal can propagate faster than the speed of light: 30 cm/ns in a vacuum and 20 cm/ns over copper or optical fiber This means: – with 10 GHz clock, signals cannot travel more than 2 cm total, – with 100 GHz clock, total signal path length at most 2 mm, and – at 1000 GHz (1 THz) computer would have to be smaller than 100 µm – Reason: heat dissipation making computers this small may be possible, but the faster the computer runs, the more heat it generates, and the smaller the computer is, the harder it is to get rid of the heat.
Computer Speedup Moore’s Law: “The density of transistors on a chip doubles every 18 months, for the same cost” (1965) Image: Tom’s Hardware
10/11/20146 Multiprocessor Systems Continuous need for faster computers – shared memory model – message passing multiprocessor – wide area distributed system Loosely coupledTightly coupled
10/11/20147 Multiprocessors Shared-memory multiprocessor: A computer system in which two or more CPUs share full access to a common RAM Inter-processor communication: one CPU writes some data into memory and another one reads the data out Multiprocessor OS performs: – Regular OS functions: handling system calls, memory management, file system access, management of I/O devices – Unique to multiprocessor OS: process synchronization, resource management, scheduling Uniform vs. non-uniform memory access (UMA, NUMA) multiprocessors
10/11/20148 Multiprocessor Hardware (1) UMA Bus-based Multiprocessors Problem: If the bus is busy when a CPU wants to read or write memory, (a)the CPU waits until the bus becomes idle. As CPU speeds scale up, this imposes an unacceptable performance penalty. What is the solution?
10/11/20149 Multiprocessor Hardware (2) UMA Multiprocessors Using Crossbar Switch Problem: with best caching we can only support 16 or 32 CPUs Solution: crossbar switch with non-blocking network
10/11/ (a) 256-node directory based multiprocessor (b) Fields of 32-bit memory address (c) Directory at node 36 Multiprocessor Hardware (3)
10/11/ Multiprocessor OS Types (1) Each CPU has its own OS bus Characteristics: share single copy of OS code partition multiprocessor memory among four CPUs sharing of set of disks and other I/O devices Design aspects: when a process makes a system call, system call is handled by its own CPU no sharing of processes – consequence: no load balancing on CPUs no sharing of pages – consequence: no load balancing on memory independent buffer caching of disk blocks problem: no load balancing on CPU, memory and inconsistencies
10/11/ Multiprocessor OS Types (2) Master-Slave Multiprocessors Characteristics: one copy of OS and its tables are present on one CPU and not the others all system calls are redirected to the one CPU for processing CPU with OS is called master, all other CPUs running applications are called slaves Design aspects: there is only data structure that keeps track of ready processes – good for load balancing pages can be allocated among all processes dynamically there is only one buffer cache so inconsistencies never occur What are the problems with this OS type? bus
10/11/ Multiprocessor OS Types (3) Symmetric Multiprocessors Characteristics: eliminates the master/slave asymmetry there is one copy of OS, but any CPU can run it when a system call is made, the CPU traps to the kernel, which processes the system call Design aspects: dynamic balance of processes and memory, since there is only one set of OS tables eliminates master as a bottleneck What are the problems with this OS Type? bus
Parallel Programming Parallelization Idea Parallelization is “easy” if processing can be cleanly split into n units:
Parallelization Idea (2) In a parallel computation, we would like to have as many threads as we have processors. e.g., a four-processor computer would be able to run four threads at the same time.
Parallelization Idea (3)
Parallelization Idea (4)
Parallelization Pitfalls But this model is too simple! How do we assign work units to worker threads? What if we have more work units than threads? How do we aggregate the results at the end? How do we know all the workers have finished? What if the work cannot be divided into completely separate tasks?
Parallelization Pitfalls (2) Each of these problems represents a point at which multiple threads must communicate with one another, or access a shared resource. Golden rule: Any memory that can be used by multiple threads must have an associated synchronization system!
The Moral: Be Careful! Synchronization is hard – Need to consider all possible shared state – Must keep locks organized and use them consistently and correctly Knowing there are bugs may be tricky; fixing them can be even worse! Keeping shared state to a minimum reduces total system complexity
10/11/ Multiprocessor Synchronization (1) To ensure atomic execution of TSL, we have to be able to lock the bus
10/11/ Multiprocessor Synchronization (2) Multiple locks used to avoid cache thrashing
10/11/ Multiprocessor Synchronization (3) Spinning versus Switching In some cases CPU must wait – waits to acquire ready list In other cases a choice exists – spinning wastes CPU cycles – switching uses up CPU cycles also – possible to make separate decision each time locked mutex encountered
10/11/ Scheduling of independent processes – use of single data structure for scheduling Priorities and quantum expiration need to be carefully considered – What are the differences between uniprocessor and multiprocessor when quantum expires? Smart scheduling, affinity scheduling/two-level scheduling Multiprocessor Scheduling (1) Time Sharing
10/11/ Multiprocessor Scheduling (2) Space Sharing Processes are related to each other, or Process consisting of multiple threads Simple algorithm: – all threads are created at the same time, – each thread is given a dedicated CPU and start at same time across multiple CPUs – if there are not enough CPUs to start all threads, wait until CPUs become available Consider a central server that keeps track of available CPUs Advantage: no multi-programming Disadvantage: Waste of CPU if it blocks
10/11/ Multiprocessor Scheduling (3) Gang Scheduling Problem: communication between two threads – both belong to process A – both running out of phase
10/11/ Multiprocessor Scheduling (4) Gang Scheduling Solution: Gang Scheduling 1. Groups of related threads scheduled as a unit, or gang 2. All members of gang run simultaneously on different timeshared CPUs 3. All gang members start and end their time slices together
10/11/ Multiprocessor Scheduling (5) Gang Scheduling
10/11/ Conclusion Multiprocessors – Popular and attractive – Offer simple communication model—all CPUs share a common memory – Processes write messages to memory which can be read by others – Synchronization using mutexes, semaphores, monitors and other well-established schemes Problem: – Large multiprocessors are difficult to build and are expensive