Achieving Multiprogramming Scalability of Parallel Programs on Intel SMP Platforms: Nanothreading in the Linux Kernel

Achieving Multiprogramming Scalability of Parallel Programs on Intel SMP Platforms: Nanothreading in the Linux Kernel
Christos D. Antonopoulos, Panagiotis E. Hadjidoukas, Theodore S. Papatheodorou, Dimitrios S. Nikolopoulos, Ioannis E. Venetis, Eleftherios D. Polychronopoulos
High Performance Information Systems Laboratory
Department of Computer Engineering and Informatics, University of Patras, Greece
http://www.hpclab.ceid.upatras.gr
ParCo’99, Delft, The Netherlands, August 17, 1999

Motivation
- Proliferation of IA-based SMPs for parallel and mainstream computing
- Multithreading
  - POSIX Threads 1003.1c standard
  - engineering and desktop applications
- Multiprogramming
  - simultaneous execution of parallel and sequential jobs
  - workload diversity
- Poor integration of multithreading with multiprogramming
  - multiprogramming-oblivious runtime systems
  - multithreading-oblivious operating system kernels
  - poor performance of parallel programs in non-dedicated environments

Adaptability of Parallel Applications in Non-dedicated Multiprogrammed Environments
- Lightweight communication path between the runtime system and the operating system
- Communication of critical scheduling events, such as allocation and preemption of processors, from the operating system to the application
- Communication of the application-level degree of parallelism to the operating system, for guiding processor allocation
- One-to-one mapping: user-level threads to kernel threads, kernel threads to physical processors
- Fast resuming of inopportunely preempted user-level threads that execute on the critical path of the application

The Nanothreading Interface
- Communication between the kernel and the runtime system through loads and stores in shared memory (the shared fields are sketched below)
  - minimal overhead, no additional context switches or kernel crossings
  - fast cloning of execution vehicles (EVs) and processor assignment
- Polling of critical scheduling information from user space
  - actual number of allocated processors
  - actual state of the owned kernel threads
- Adaptation to kernel scheduler interventions
  - automatic adjustment of thread granularity
  - identification and resuming of preempted user-level threads that execute on the critical path
  - minimization of idle time for maximum utilization
- Effective dynamic space and time sharing
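A minimal C sketch of what the shared arena could look like as a data structure; the field names follow the shared-arena diagram on the next slide, while the exact layout, the types, and the NT_MAX_EVS bound are assumptions made for illustration, not the published interface.

    /* Hypothetical layout of the shared arena: one pinned VM page shared
     * between the runtime system and the kernel.  Field names follow the
     * shared-arena diagram; sizes and ordering are illustrative only. */
    #define NT_MAX_EVS 32                            /* assumed upper bound on EVs */

    enum ev_state { EV_RUNNING, EV_PREEMPTED, EV_BLOCKED };

    struct shared_arena {
        /* R/W segment: written by the adaptive application, read by the kernel. */
        volatile int n_cpus_requested;               /* degree of parallelism requested   */

        /* R/O segment (for the application): written only by the OS scheduler. */
        volatile int n_cpus_current;                 /* processors currently allocated    */
        volatile int n_cpus_preempted;               /* EVs preempted by the OS scheduler */
        volatile enum ev_state ev_state[NT_MAX_EVS]; /* kernel-level state per EV         */
    };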

The Shared Arena
[Diagram] A pinned VM page shared between the application (user space) and the OS scheduler (kernel space):
- R/W segment, written by adaptive parallel programs: n_cpus_requested
- R/O segment, written by the OS scheduler: n_cpus_current, n_cpus_preempted, and the per-EV state (blocked / preempted / running) of every worker/idler
- Non-adaptive parallel and sequential programs do not use the arena and are handled directly by the OS scheduler
(A usage sketch of the arena follows below.)
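As a purely illustrative example of the loads-and-stores protocol, a runtime system might consult the arena like this before opening a parallel phase; arena is assumed to point at the mapped page, and spawn_workers() is a hypothetical helper, not part of the actual library.

    /* Illustrative use of the shared arena before a parallel phase
     * (assumes the struct shared_arena sketched earlier). */
    extern struct shared_arena *arena;     /* assumed to point at the mapped page        */
    extern void spawn_workers(int n);      /* hypothetical: creates n user-level threads */

    static void start_parallel_phase(int ideal_parallelism)
    {
        /* Tell the kernel how many processors this phase could use (plain store). */
        arena->n_cpus_requested = ideal_parallelism;

        /* Read how many processors were actually allocated (plain load, no syscall). */
        int cpus = arena->n_cpus_current;
        if (cpus < 1)
            cpus = 1;

        /* Adjust thread granularity to the processors that are really available. */
        spawn_workers(cpus);
    }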

General Functionality
- The application communicates its requests for processors
- EVs upcall to the user-level scheduler upon assignment of processors
- Notification of EV state at the program level, from the runtime system: worker, idler
- Notification of EV state at the kernel level, from the OS: running, preempted
- Polling of the shared arena
  - at the initiation of parallel execution phases
  - at “safe” execution points of the user-level scheduler
- Intra-program priority scheduling: idlers hand off their CPU in favor of (see the sketch below):
  - preempted workers
  - recently unblocked threads
  - EVs belonging to other applications
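The handoff priorities above could be expressed roughly as follows; find_preempted_worker(), run_ready_thread(), and nt_handoff() are hypothetical names standing in for the runtime's internals and the kernel handoff primitive.

    /* Sketch of an idler's decision at a "safe" point of the user-level
     * scheduling loop.  All helpers are hypothetical stand-ins. */
    extern struct shared_arena *arena;
    extern int  find_preempted_worker(struct shared_arena *a); /* EV id, or -1 if none    */
    extern int  run_ready_thread(void);                        /* 1 if a thread was run   */
    extern void nt_handoff(int ev);                            /* yield this CPU to 'ev';
                                                                  -1 lets the kernel pick */

    static void idler_safe_point(void)
    {
        int ev;

        /* 1. Hand the processor to a preempted worker on the critical path. */
        if ((ev = find_preempted_worker(arena)) >= 0) {
            nt_handoff(ev);
            return;
        }

        /* 2. Otherwise run a recently unblocked user-level thread, if any. */
        if (run_ready_thread())
            return;

        /* 3. Nothing useful to do: give the processor to another application. */
        nt_handoff(-1);
    }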

Kernel Implementation Issues in Linux 2.0
- Shared arena
  - pinned memory page
  - application-side copy with R/W privileges
  - trusted kernel-side copy with R/O privileges
  - copy-on-write of the R/O fields to reduce TLB flushes
- EV cloning in the kernel (a user-level analogue is sketched below)
  - batch creation of the EVs that serve a single nanothreading application
  - instruction pointer set to upcall to the user-level scheduler
- Additional functionality
  - share groups
  - binding of kernel threads to processors
  - explicit blocking/unblocking through a counting semaphore
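The paper's kernel clones EVs in a single batch inside the kernel; as a rough user-level analogue of the same idea, each EV resembles a kernel thread created with clone() that shares the address space and starts by upcalling into the user-level scheduler. The entry-point name is hypothetical and the error handling is minimal.

    /* User-level analogue of EV creation (the real mechanism is in-kernel). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdlib.h>

    #define EV_STACK_SIZE (64 * 1024)

    extern int user_level_scheduler(void *arg);   /* hypothetical upcall entry point */

    static void create_evs(int n)
    {
        for (int i = 0; i < n; i++) {
            char *stack = malloc(EV_STACK_SIZE);
            if (stack == NULL)
                break;
            /* Stacks grow downward on IA-32, so pass the top of the block.
             * The flags give LinuxThreads-style sharing of the address space. */
            if (clone(user_level_scheduler, stack + EV_STACK_SIZE,
                      CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, NULL) == -1)
                free(stack);
        }
    }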

Kernel Scheduler Modifications
- Nanothreading scheduler
  - invoked upon changes of the workload of nanothreading jobs and upon time quantum expiration
  - two-level scheduling
- Three-phase scheduling (sketched below)
  1. Assignment of a number of runnable EVs to processors
     - dynamic time/space sharing
  2. Indirect assignment of specific CPUs to nanothreading applications
     - nanothreading applications compete with non-nanothreading applications
     - processor locality
  3. Selection of the specific kernel threads to run on each physical processor
     - affinity scheduling
     - priority: preempted workers → preempted idlers → voluntarily suspended idlers
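A pseudocode-style C sketch of the three phases; the function names are hypothetical and the real modifications live inside the Linux 2.0 scheduler.

    /* Pseudocode-style outline of the three scheduling phases. */
    extern void distribute_processors_among_jobs(void);
    extern void assign_cpus_to_jobs(void);
    extern void pick_ev_for_each_cpu(void);

    void nanothreading_schedule(void)
    {
        /* Phase 1: decide how many processors each runnable nanothreading
         * job receives, based on its n_cpus_requested and the overall load
         * (dynamic time/space sharing). */
        distribute_processors_among_jobs();

        /* Phase 2: map those allocations onto specific CPUs, preferring the
         * CPUs a job ran on before (processor locality), while nanothreading
         * jobs compete with non-nanothreading applications. */
        assign_cpus_to_jobs();

        /* Phase 3: on each allocated CPU, pick the kernel thread to run,
         * honoring affinity and the priority order: preempted workers,
         * then preempted idlers, then voluntarily suspended idlers. */
        pick_ev_for_each_cpu();
    }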

Handoffs and Blocking of Threads in the Kernel
- Handoff scheduling
  - triggered at idling points of the user-level scheduling loop
  - equivalent to the third phase of the nanothreading scheduler
  - may resume an EV from another program to maximize utilization (yielding)
- Blocking (the wake-up path is sketched below)
  - blocking activates local scheduling
  - unblocked threads are resumed immediately or marked as high-priority preempted threads
  - applications with blocked EVs run with lower priority
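The wake-up behavior described above might look roughly like this inside the kernel; struct ev, struct nt_app, and all helpers are hypothetical stand-ins for kernel-internal structures, not the actual patch.

    /* Hedged sketch of the wake-up path for a blocked execution vehicle. */
    struct nt_app;                         /* per-application bookkeeping (hypothetical) */
    struct ev { struct nt_app *app; };     /* minimal stand-in for an execution vehicle  */

    extern int  application_owns_idle_cpu(struct nt_app *app);
    extern void resume_on_idle_cpu(struct ev *ev);
    extern void mark_preempted_high_priority(struct ev *ev);

    void nt_unblock(struct ev *ev)
    {
        if (application_owns_idle_cpu(ev->app)) {
            /* A processor of this application is currently running an idler:
             * resume the unblocked EV on it immediately. */
            resume_on_idle_cpu(ev);
        } else {
            /* Otherwise mark it as a high-priority preempted EV so the
             * user-level scheduler picks it up at the next safe point. */
            mark_preempted_high_priority(ev);
        }
    }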

Runtime System Modifications
- Initialization (sketched below)
  - shared arena setup
  - communication of maximum processor requirements
- Polling of the shared arena before initiating parallel execution
- Polling of the shared arena at idling points
  - handoff scheduling
- Non-blocking synchronization with concurrent queues
  - immunity to preemptions of user-level threads by the OS
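Under the same assumptions as the earlier sketches, runtime initialization could reduce to mapping the arena and publishing the maximum processor requirement; nt_attach_arena() is a hypothetical wrapper for whatever system call the patched kernel provides.

    /* Sketch of runtime-system initialization. */
    extern struct shared_arena *arena;
    extern struct shared_arena *nt_attach_arena(void);  /* hypothetical syscall wrapper */

    void nt_runtime_init(int max_cpus)
    {
        /* Map the pinned page that the kernel shares with this application. */
        arena = nt_attach_arena();

        /* Communicate the maximum processor requirement of the program. */
        arena->n_cpus_requested = max_cpus;
    }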

Performance Evaluation
- Quad Pentium Pro SMP (Compaq ProLiant 5500)
  - 4 Pentium Pro processors clocked at 200 MHz
  - 512 Kbytes L2 cache per processor
  - 512 Mbytes main memory
- Linux kernel version 2.0.36
- Nanothreads Runtime Library (http://www.ac.upc.es/NANOS)
- Multiprogrammed workloads
  - multiple copies of the SPLASH-2 LU, Volrend, FFT, and Raytrace applications with task-queue or master-slave execution paradigms
  - 2-, 4-, and 8-way multiprogramming

Results: SPLASH-2 LU
[Chart] Average turnaround time (seconds) versus multiprogramming degree (1-, 2-, 4-, and 8-way), comparing the native Linux kernel against the nanothreading kernel.

Results: SPLASH-2 Volrend
[Chart] Average turnaround time (seconds) versus multiprogramming degree (1-, 2-, 4-, and 8-way), comparing the native Linux kernel against the nanothreading kernel.

Results: SPLASH-2 Raytrace
[Chart] Average turnaround time (seconds) versus multiprogramming degree (1-, 2-, 4-, and 8-way), comparing the native Linux kernel against the nanothreading kernel.

Results: SPLASH-2 FFT
[Chart] Average turnaround time (seconds) versus multiprogramming degree (1-, 2-, 4-, and 8-way), comparing the native Linux kernel against the nanothreading kernel.

Ongoing and Future Work
- Porting to 2.2 kernels
- Evaluation with non-homogeneous and I/O-centric workloads
- Integration with the OpenMP standard
- Integration with out-of-core multithreading runtime libraries
  - POSIX threads 1003.1c
  - WWW servers
  - database servers
  - JVM
For more information: http://www.hpclab.ceid.upatras.gr