Dr. Alexandra Fedorova August 2007 Introduction to Systems Research at SFU

2 CMPT 401 Summer 2007 © A. Fedorova Introduction
Systems: software systems, hardware systems, and the interaction between them
A new research area at SFU: before December 2006 there were no faculty members at SFU doing systems research (not counting networking)
Research opportunities at the undergraduate and graduate level:
–Undergraduate honours thesis
–CMPT 415
–Paid research assistantships
–Master's and Ph.D.

3 CMPT 401 Summer 2007 © A. Fedorova What is Systems Research?
System – a collection of software and hardware components that accomplish a certain goal
Usually this does not include applications, but it does include system software:
–The operating system
–System libraries
Systems research is concerned with building these components and structuring their interaction

4 CMPT 401 Summer 2007 © A. Fedorova Systems Research at SFU
System software design for chip multithreading processors
Computer Architecture
Distributed Systems

5 CMPT 401 Summer 2007 © A. Fedorova System Software Design for Chip Multithreading Processors What is chip multithreading? Why is this research relevant? What research problems are we addressing?

6 CMPT 401 Summer 2007 © A. Fedorova Chip Multithreading (CMT)
Conventional processor: one software thread runs on a chip at a given instant
[Figure: a chip with a single thread, its Level-1 cache, and a Level-2 cache]
CMT processors: multiple threads run on the same chip simultaneously

7 CMPT 401 Summer 2007 © A. Fedorova CMT: The Dominant Architecture
Most new processors are CMT:
–Intel: 100% of new server processors and 90% of high-performance desktop processors will be CMT by the end of 2007
All major hardware vendors are in the CMT business:
–Sun Microsystems Niagara (32 threads on the chip)
–IBM Power4, Power5, Power6
–Intel Hyper-threaded Xeon (servers, desktops)
–Intel Core Duo (desktops and laptops)
–Dell Quad core systems (2x Intel dual-core processors)
–AMD Quad core (coming in Fall 2007)

8 CMPT 401 Summer 2007 © A. Fedorova Why CMT?
Running one thread per chip is inefficient
Due to the nature of modern applications, computational hardware is underutilized:
–Modern applications spend 50-60% of their CPU time accessing memory
–While memory is accessed the CPU pipeline is stalled – it is idle, not doing anything useful
–But while it is stalled, the CPU is still consuming power
–So power is wasted with no benefit
Idea behind CMT: while one thread stalls the pipeline, let another thread use it
–Sort of like overlapping I/O and computation, but at the micro level

9 CMPT 401 Summer 2007 © A. Fedorova CMT: More Efficient CPU Utilization
[Figure: timeline of two hardware threads – while thread 0 stalls on a load from memory, thread 1 issues its adds, subtracts, and loads, so the pipeline stays busy instead of stalling]
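Not in the original slides: a tiny toy model of the idea in this figure, assuming a made-up instruction trace and a made-up 100-cycle miss penalty. It only illustrates why letting a second hardware thread issue instructions during another thread's memory stalls raises pipeline utilization.

```python
# Toy cycle-level model of hardware multithreading (illustrative only; the
# latencies and instruction mix are assumptions, not taken from the slides).
# The "pipeline" is busy on a cycle if at least one ready thread can issue.

MEM_LATENCY = 100                               # assumed cache-miss penalty, in cycles
TRACE = ["add", "load", "sub", "load", "add"]   # hypothetical instruction trace

def simulate(num_threads, trace, cycles=2000):
    # per-thread state: [index into trace, cycles left until ready]
    threads = [[0, 0] for _ in range(num_threads)]
    busy = 0
    for _ in range(cycles):
        # age outstanding memory requests
        for t in threads:
            if t[1] > 0:
                t[1] -= 1
        # issue one instruction from the first ready thread, if any
        for t in threads:
            if t[1] == 0:
                op = trace[t[0] % len(trace)]
                t[0] += 1
                if op == "load":
                    t[1] = MEM_LATENCY   # this thread stalls; others may run
                busy += 1
                break
    return busy / cycles

print(f"pipeline utilization, 1 thread:  {simulate(1, TRACE):.1%}")
print(f"pipeline utilization, 2 threads: {simulate(2, TRACE):.1%}")
```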

10 CMPT 401 Summer 2007 © A. Fedorova How to Enable CMT?
How do we run multiple threads on the same chip?
–Hardware multithreading
–Multicore processing
–A combination of the two

11 CMPT 401 Summer 2007 © A. Fedorova Hardware Multithreading
Run at least two threads on the same processing core
Some hardware is duplicated, some is shared
Shared hardware:
–Pipeline: i.e., functional units, register files, queues
–Caches: Level-1 (L1) instruction and data caches, Level-2 (L2) unified cache
–Interconnects
Multithreaded processors:
–Intel Hyper-threaded Xeon
–IBM Power5, Power6, Cell
–Sun Microsystems Niagara
[Figure: a chip with multiple threads sharing one core, its Level-1 cache, and a Level-2 cache]
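As a side note (not from the slides): on Linux you can see which logical CPUs are hardware threads of the same core by reading the sysfs topology files. A minimal sketch follows; the path is long-standing on Linux, but availability can vary with kernel version.

```python
# Print groups of logical CPUs that are hardware threads of the same core.
# Linux-only sketch; relies on the sysfs topology layout.
import glob

siblings = set()
for path in sorted(glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")):
    with open(path) as f:
        siblings.add(f.read().strip())   # e.g. "0,4" = CPUs 0 and 4 share a core

for group in sorted(siblings):
    print("hardware threads sharing one core:", group)
```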

12 CMPT 401 Summer 2007 © A. Fedorova Multicore Processing
Multiple processing cores on the same chip
Threads share the L2 cache (and other lower-level caches), and interconnects
Multicore processors:
–Intel Core Duo
–AMD Quad Core
–IBM Power4, 5, 6
–Sun Microsystems Niagara
[Figure: a chip with two cores, each with its own L1 cache, sharing an L2 cache]

13 CMPT 401 Summer 2007 © A. Fedorova Multicore + Multithreading
A multicore processor
Each core is multithreaded
Multicore and multithreaded processors:
–Sun Microsystems Niagara
–IBM Power5, Power6
[Figure: a chip with two multithreaded cores, each with its own L1 cache, sharing an L2 cache]

14 CMPT 401 Summer 2007 © A. Fedorova Research on CMT Processors
Computer architecture research:
–How to design a CMT processor to achieve a good combination of CPU utilization, application performance, and power efficiency
System software research:
–How to design system software, i.e., the operating system, that enables applications to perform well on these processors?

15 CMPT 401 Summer 2007 © A. Fedorova OS Design for CMT Processors
Operating systems are traditionally responsible for the allocation of hardware resources
On CMT processors, on-chip resources are shared among threads that run simultaneously
How you allocate those resources among threads determines the performance that those threads will achieve
Let's look at a few examples…

16 CMPT 401 Summer 2007 © A. Fedorova Constructing Optimal Co-schedules
[Figure: four threads (Blue, Red, Green, Yellow) to be paired across two cores that each have a private L1 cache and share the L2 cache]
Blue suffers when it does not have enough L1 cache; Red uses lots of L1 cache
Green does not use much L1 cache; Yellow does not suffer when it does not have much L1 cache
Which threads should be co-scheduled on the same core?

17 CMPT 401 Summer 2007 © A. Fedorova Constructing Optimal Co-schedules (cont.)
How do we find out applications' cache behaviour?
–It turns out you need to consider memory access patterns – this is not trivial to measure
How do you model interactions among applications?
–How do you know if one application's cache usage patterns are incompatible with another's?
These patterns/relationships cannot be measured directly. Can they be modeled?
–Simple models are inaccurate
–Complex models are too inefficient to use inside an operating system scheduler
Approach of my group: use learning methods, feedback-directed scheduling
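The slides do not give the group's algorithm; the sketch below only illustrates the feedback-directed flavour of the idea: sample each thread's cache pressure, then pair cache-heavy threads with cache-light ones so cache-sharing partners are less likely to thrash each other. The miss-rate sampling function is a hypothetical placeholder, not a real API.

```python
# Minimal sketch of feedback-directed co-scheduling (an illustration of the
# idea on this slide, not the actual algorithm used by the group).

def sample_miss_rate(thread_id):
    """Hypothetical: return observed cache misses per 1000 instructions."""
    raise NotImplementedError("read from hardware performance counters on a real system")

def build_coschedule(thread_ids, miss_rates):
    """Pair the thread with the lowest cache pressure with the thread with
    the highest, and so on, so each pair shares a cache more peacefully."""
    ranked = sorted(thread_ids, key=lambda t: miss_rates[t])
    pairs = []
    while len(ranked) >= 2:
        light = ranked.pop(0)    # lowest cache pressure
        heavy = ranked.pop(-1)   # highest cache pressure
        pairs.append((light, heavy))
    if ranked:                   # odd thread left over runs alone
        pairs.append((ranked.pop(),))
    return pairs

# Feedback loop (pseudo-usage): measure, re-pair, run for a time slice, repeat.
# miss_rates = {t: sample_miss_rate(t) for t in threads}
# schedule = build_coschedule(threads, miss_rates)
```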

18 CMPT 401 Summer 2007 © A. Fedorova Heterogeneous Multicore Systems
One size does not fit all
–Application class A runs best on a core with feature set X
–Application class B runs best on a core with feature set Y
Rather than designing a homogeneous multicore system that attempts to satisfy everyone but satisfies no one, design a heterogeneous multicore system (HMC)
[Figure: a chip with two different cores, each with its own L1 cache, sharing an L2 cache]

19 CMPT 401 Summer 2007 © A. Fedorova Scheduling On HMC Systems
[Figure: a chip with Core 1 and Core 2, each with its own L1 cache, sharing an L2 cache; thread Set A wants to run on Core 1, thread Set B wants to run on Core 2]

20 CMPT 401 Summer 2007 © A. Fedorova Scheduling On HMC Systems
If you schedule all threads in Set A on their preferred core, those threads will suffer from:
–A low amount of CPU time
–High response time
This is because there is high demand for that core, and they would have to share it with others
So you might want to schedule threads on their non-preferred core once in a while
How do you balance performance, fair CPU allocation, and good response time?
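The slides leave this as an open question; the sketch below shows one naive way such a trade-off could look, assuming each thread has a single preferred core type and a made-up waiting-time threshold. It is an illustration, not a published policy.

```python
# Sketch of trading performance for fairness on a heterogeneous multicore:
# a thread runs on its preferred core when possible, but is allowed onto its
# non-preferred core once it has waited too long (threshold is an assumption).
from collections import deque

WAIT_THRESHOLD = 3   # assumed: time slices a thread will wait for its preferred core

def pick_next(run_queue, core, now):
    """run_queue: deque of (thread_id, preferred_core, enqueue_time).
    Returns the thread to run next on `core`."""
    for i, (tid, pref, t_enq) in enumerate(run_queue):
        starving = (now - t_enq) >= WAIT_THRESHOLD
        if pref == core or starving:
            del run_queue[i]     # dequeue the chosen thread
            return tid
    # nobody prefers this core and nobody is starving: keep the core busy anyway
    return run_queue.popleft()[0] if run_queue else None
```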

21 CMPT 401 Summer 2007 © A. Fedorova Summary
CMT systems are new and cool, yet prevalent enough for people to care about them
Companies are desperate to hire students with experience on CMT systems
If you are thinking about an academic career: this is a new and hot research area
–Many problems
–Many opportunities to publish
Talk to me if you are interested in research opportunities
Tell your friends who might be interested