Capriccio: Scalable Threads for Internet Services (von Behren) Kenneth Chiu.

Background Non-blocking I/O, async I/O –Non-blocking I/O: usually doesn’t work well for disks. –Async I/O: issue a request, get completion. epoll()/poll(). Convoy: tendency for threads to “bunch up”. Priority inversion. Call graph. Average, weighted moving average. Capriccio: improvisatory style, free form.

The Problem Web “transactions” involve a number of steps which must be performed in sequence. For high throughput, we want to service many of these requests concurrently. –When does concurrency help? When does it not? If we use a single thread per request, we will have too many threads. If we multiplex requests on a small set of threads, the program is more difficult to write.

Read two numbers and add

Event-driven version (state is tracked explicitly per descriptor):

    while (true) {
        fd = get_read_ready();
        state = lookup(fd);
        if (state.step == READING_FIRST) {
            c = read(fd, …, bytes_left);
            if (have enough) {
                state.step = READING_SECOND;   /* assignment, not comparison */
            }
        } else if (state.step == READING_SECOND) {
            …
        }
    }

Threaded version (state lives in the control flow):

    while (true) {
        int n1, n2;
        readexact(fd, &n1, 4);
        readexact(fd, &n2, 4);
        printf("%d\n", n1 + n2);
    }

Thread Design and Scalability

The Case for User-Level Threads Flexibility –Level of indirection between applications and the kernel, which helps decouple the two. –Kernel-level thread scheduling must handle all applications. User-level scheduling can be tailored. –Lightweight, which means we can use zillions of them. Performance –Cooperative scheduling is nearly free. –Do not require kernel crossing for uncontended locks. (Why do contended locks require kernel crossings?) Disadvantages –Non-blocking I/O requires an additional system call. (Why?) –SMPs

Implementation Context switches –Built on coroutine library. I/O –Intercept blocking system calls, use epoll() and AIO for disk. –Can be less efficient Scheduling –Main scheduling loop looks very much like an event-driven application. (What is an EDA?) –Makes it relatively easy to switch schedulers. Synchronization –Cooperative threading on UP. Efficiency –All O(1), except sleep queue.
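The scheduling bullet above says the main loop looks much like an event-driven application. A minimal sketch of that shape follows, assuming POSIX ucontext as a stand-in for the coroutine library (the slides do not say which library Capriccio uses). All names here (coop_spawn, coop_yield, coop_run, demo_worker) are hypothetical, and the round-robin policy is chosen only for brevity.

```c
#include <assert.h>
#include <ucontext.h>

#define MAX_THREADS 8
#define STACK_SIZE (64 * 1024)

/* One cooperative user-level thread: saved context plus a private stack. */
typedef struct {
    ucontext_t ctx;
    char stack[STACK_SIZE];
    void (*entry)(int);
    int arg;
    int finished;
} coop_thread;

static coop_thread threads[MAX_THREADS];
static ucontext_t scheduler_ctx;   /* context of the scheduling loop */
static int thread_count = 0;
static int current = -1;

/* Runs the thread body, then marks the thread done. Returning from here
   resumes scheduler_ctx because uc_link points at it. */
static void trampoline(int id) {
    threads[id].entry(threads[id].arg);
    threads[id].finished = 1;
}

/* Create a thread; it does not run until coop_run() picks it. */
int coop_spawn(void (*fn)(int), int arg) {
    assert(thread_count < MAX_THREADS);
    int id = thread_count++;
    coop_thread *t = &threads[id];
    getcontext(&t->ctx);
    t->ctx.uc_stack.ss_sp = t->stack;
    t->ctx.uc_stack.ss_size = STACK_SIZE;
    t->ctx.uc_link = &scheduler_ctx;
    t->entry = fn;
    t->arg = arg;
    t->finished = 0;
    makecontext(&t->ctx, (void (*)(void))trampoline, 1, id);
    return id;
}

/* Give up the CPU voluntarily: save this thread, resume the scheduler. */
void coop_yield(void) {
    swapcontext(&threads[current].ctx, &scheduler_ctx);
}

/* Main scheduling loop: pick a runnable thread, switch to it, repeat. */
void coop_run(void) {
    int remaining = thread_count;
    while (remaining > 0) {
        for (int i = 0; i < thread_count; i++) {
            if (threads[i].finished) continue;
            current = i;
            swapcontext(&scheduler_ctx, &threads[i].ctx);
            if (threads[i].finished) remaining--;
        }
    }
}

/* Demo body: record our id twice, yielding after each step. */
int demo_trace[16];
int demo_trace_len = 0;

void demo_worker(int id) {
    for (int step = 0; step < 2; step++) {
        demo_trace[demo_trace_len++] = id;
        coop_yield();
    }
}
```

Spawning demo_worker twice and calling coop_run() interleaves the two workers at their yield points. In a real package the yield would sit inside the intercepted blocking I/O calls rather than in application code, and swapping the for loop for another policy is what makes it easy to switch schedulers.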

Benchmarks Hardware: 2 x 2.4 GHz Xeon, 1 GB memory, 2 x 10K RPM SCSI, GigE. –2 x 1.2 GHz US III. Software: Linux, epoll(), AIO. –Solaris 8. Thread packages: Capriccio, LinuxThreads, NPTL.

Thread Primitives [Table: thread creation, thread context switch, and uncontended mutex lock, compared across Capriccio, Capriccio (notrace), LinuxThreads, NPTL, and Solaris.]

Thread Scalability Producer-consumer

Thread Scalability The drop between 100 and 1000 threads is due to cache footprint.

I/O Performance pipetest –Pass a number of tokens among a set of pipes. Disk scheduling –A number of threads perform random 4 KB reads from a 1 GB file. Disk I/O through buffer cache –200 threads reading with a fixed miss rate.

When concurrency is low, performance is poorer.

Benefits of disk head scheduling.

I/O out of buffer. Performance is lower due to AIO.

Linked Stack Management

Thread Stacks With a lot of threads, the cumulative stack space can be quite large. Solution: Use a dynamic allocation policy and allocate on demand. Link stack chunks together. Problem: How do you link stack chunks together? How do you know when to link a new one?

Weighted Call Graph Use static analysis to create a weighted call graph. Each node is weighted by the maximum stack space that that function might consume. (Why is it maximum, and not exact?) Now what?

Bounds Most real-world programs use recursion. Even without recursion, a static bound wastes too much space. Instead, insert checkpoints at key places to link in new stack chunks. Chunks are switched right before arguments are pushed.
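What a checkpoint does at run time can be sketched as plain bookkeeping. The code below is a simulation only: it tracks chunk usage in ordinary heap structures and never touches the real stack pointer, which an actual implementation would have to switch in compiler-inserted code. The names (linked_stack, checkpoint_enter, checkpoint_leave) and the chunk-size parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* One stack chunk, linked back to the chunk that was active before it. */
typedef struct chunk {
    struct chunk *prev;
    size_t size;   /* capacity in bytes */
    size_t used;   /* bytes currently reserved */
} chunk;

typedef struct {
    chunk *top;    /* chunk currently in use, NULL before the first call */
    int chunks;    /* number of chunks currently linked */
} linked_stack;

/* Checkpoint before a call: reserve `needed` bytes. If the current chunk
   cannot hold them, link a new chunk. Returns 1 if a chunk was linked. */
int checkpoint_enter(linked_stack *s, size_t needed, size_t chunk_size) {
    if (s->top != NULL && s->top->size - s->top->used >= needed) {
        s->top->used += needed;
        return 0;
    }
    chunk *c = malloc(sizeof *c);
    if (c == NULL) abort();
    c->prev = s->top;
    c->size = needed > chunk_size ? needed : chunk_size;
    c->used = needed;
    s->top = c;
    s->chunks++;
    return 1;
}

/* Matching step after the call returns: release the bytes and unlink
   the chunk once it is empty (the first chunk is kept). */
void checkpoint_leave(linked_stack *s, size_t needed) {
    assert(s->top != NULL && s->top->used >= needed);
    s->top->used -= needed;
    if (s->top->used == 0 && s->top->prev != NULL) {
        chunk *empty = s->top;
        s->top = empty->prev;
        free(empty);
        s->chunks--;
    }
}
```

With 1024-byte chunks, reserving 600, then 300, then 300 bytes links a second chunk only on the third call, because just 124 bytes remain in the first one.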

Placing Checkpoints Make sure there is at least one checkpoint in every cycle by inserting checkpoints on back edges. (How?) (Is this efficient?) Then make sure the sum of stack sizes along each path between checkpoints is not too long.
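The two placement rules can be sketched on a small adjacency-matrix call graph. This is an illustrative reading of the slide, not the compiler pass itself: a depth-first search marks back edges as checkpoints, then a pass in topological order adds a checkpoint wherever the accumulated frame sizes would exceed a limit. The type call_graph, the function place_checkpoints, and the limit parameter are hypothetical, and every single frame is assumed to fit within the limit.

```c
#include <assert.h>
#include <string.h>

#define MAX_FUNCS 16

typedef struct {
    int n;                                 /* number of functions */
    int frame[MAX_FUNCS];                  /* max stack bytes per function */
    int calls[MAX_FUNCS][MAX_FUNCS];       /* calls[u][v]: u may call v */
    int checkpoint[MAX_FUNCS][MAX_FUNCS];  /* output: checkpoint on u->v */
} call_graph;

enum { WHITE, GRAY, BLACK };

/* DFS that marks back edges (edges into a function still on the DFS
   stack) and records functions in post-order. */
static void dfs(call_graph *g, int u, int *color, int *order, int *count) {
    color[u] = GRAY;
    for (int v = 0; v < g->n; v++) {
        if (!g->calls[u][v]) continue;
        if (color[v] == GRAY)
            g->checkpoint[u][v] = 1;       /* breaks the cycle */
        else if (color[v] == WHITE)
            dfs(g, v, color, order, count);
    }
    color[u] = BLACK;
    order[(*count)++] = u;
}

void place_checkpoints(call_graph *g, int root, int limit) {
    int color[MAX_FUNCS] = {0};
    int order[MAX_FUNCS];
    int depth[MAX_FUNCS] = {0};  /* worst-case bytes since last checkpoint */
    int count = 0;

    memset(g->checkpoint, 0, sizeof g->checkpoint);
    dfs(g, root, color, order, &count);

    /* Reverse post-order is a topological order once back edges are
       ignored, so depth[u] is final before u's callees are visited. */
    depth[root] = g->frame[root];
    for (int i = count - 1; i >= 0; i--) {
        int u = order[i];
        for (int v = 0; v < g->n; v++) {
            if (!g->calls[u][v] || g->checkpoint[u][v]) continue;
            int d = depth[u] + g->frame[v];
            if (d > limit) {               /* path too long: new chunk */
                g->checkpoint[u][v] = 1;
                d = g->frame[v];
            }
            if (d > depth[v]) depth[v] = d;
        }
    }
}
```

For a graph where A calls B, B calls C and D, and C calls back into B, the C to B edge gets a checkpoint because it closes a cycle, while B to C and B to D get one only if the limit is small relative to the frame sizes.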

Function B is executing. Function D, both ways. Recursion.

Special Cases Function pointers –Difficult, but they try to analyze. External functions –Allow annotations. –Alternatively, link in a large chunk. Variable length arrays –C99

Question What kind of a problem is this? Is it being solved at the right level?

Resource-Aware Scheduling

Admission Control We’ve seen many graphs where performance degrades as some variable increases. The goal of scheduling in Capriccio is to keep performance in the “good” part of the curve.

Blocking Graph Each node is a location where the program blocked. –Location is call chain. Generated at run time. Annotate with resource usage: –Average running time (with exponentially-weighted “moving” average), memory, stack, sockets, etc. Maintain a run queue for each node. Admit threads till resources reach maximum capacity.
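The per-node running-time annotation can be sketched with an exponentially weighted moving average, new_avg = alpha * sample + (1 - alpha) * old_avg. The struct, the function names, and the smoothing factor alpha below are assumptions for illustration; the slide does not give the weights Capriccio uses, and the admission test is reduced to a single resource.

```c
#include <assert.h>

/* Annotation kept for one blocking-graph node. */
typedef struct {
    double avg_run_time;  /* smoothed running time at this node */
    int samples;          /* number of observations so far */
} blocking_node;

/* Fold one observed running time into the node's moving average.
   alpha in (0, 1]: larger values weight recent samples more. */
void node_record(blocking_node *n, double observed, double alpha) {
    assert(alpha > 0.0 && alpha <= 1.0);
    if (n->samples == 0)
        n->avg_run_time = observed;   /* first sample seeds the average */
    else
        n->avg_run_time = alpha * observed + (1.0 - alpha) * n->avg_run_time;
    n->samples++;
}

/* Admit a thread only while the resource stays within capacity. */
int can_admit(double in_use, double expected_need, double capacity) {
    return in_use + expected_need <= capacity;
}
```

With alpha = 0.5, samples of 10, 20, and 5 give averages of 10, 15, and 10 in turn, so a single outlier moves the estimate without replacing it.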

Pitfalls Too many non-linear effects to predict. One solution is to use some kind of instrumentation, plus feedback control. –But even detecting that is hard.

Web Server Test

Summary Control flow maintains state. Control flow can be swapped for explicit state maintenance. Threads perform two functions: –Maintain state (logical threads of programming model) –Allow concurrency (kernel) We should separate the two, since the overhead of concurrency is not necessary when we just want to maintain state. Cooperative multitasking has been denigrated before, but can be good.