Network Applications: High-performance Server Design: Async Servers/Operational Analysis Y. Richard Yang 2/24/2016.



Admin
- Assignment three posted.
- Decisions:
  - Projects or exam 2
  - Date for exam 1

Recap: Async Network Server
- Basic ideas: non-blocking operations
  1. Peek system state (using a select mechanism) to issue only ready operations
  2. Asynchronous initiation (e.g., aio_read) and completion notification (callback)

Recap: Async Network Server using Select
- Example: the select system call checks whether sockets are ready for ops.
[Figure: server TCP socket space, showing two listening sockets (address {*.6789, *:*} with completed-connection queue C1, C2; address {*.25, *:*} with an empty queue) and two established sockets on port 6789 with their sendbuf/recvbuf. Readiness conditions highlighted: a completed connection, a recvbuf that is empty or has data, a sendbuf that is full or has space.]

Recap: Async Network Server using Select
- An event loop issues commands, waits for events, and invokes handlers (callbacks).

  // clients register interests/handlers on events/sources
  while (true) {
    ready events = select()   /* or selectNow(), or select(int timeout) */
    foreach ready event {
      switch (event type) {
        accept:   call accept handler
        readable: call read handler
        writable: call write handler
      }
    }
  }
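The loop above can be exercised without any network at all. The following minimal demo (not from the course code; class name SelectDemo is made up) uses java.nio.channels.Pipe, whose source end is a selectable channel, to show select() reporting readiness only after data is available:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class SelectDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        // A channel must be non-blocking before it can be registered.
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Writing to the sink makes the source readable.
        pipe.sink().write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));

        int ready = selector.select();   // blocks until the source is readable
        String received = null;
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isReadable()) {      // issue only the ready operation
                ByteBuffer buf = ByteBuffer.allocate(64);
                ((Pipe.SourceChannel) key.channel()).read(buf);
                buf.flip();
                received = StandardCharsets.UTF_8.decode(buf).toString();
            }
        }
        selector.selectedKeys().clear(); // the loop must clear the selected set itself
        System.out.println(ready + " " + received);
    }
}
```

Note that the event loop, not the selector, is responsible for clearing the selected-key set each iteration; forgetting this is a classic NIO bug.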

Recap: Need to Manage Finite State Machine
[Figure: per-connection FSM with states Accept Client Connection, Read Request, Find File, Send Response Header, Read File, Send Data; transitions labeled Init, Read input, Read data, Read + Write, Read close, and Idle.]

Another Finite State Machine
[Figure: staged FSM. Start state: !RequestReady, !ResponseReady, initial interest = READ; read from the client channel until the request completes (terminator found, or client requests close), then interest = none. State RequestReady, !ResponseReady: generating response. State RequestReady, ResponseReady: interest = WRITE; write the response. After ResponseSent: interest = none; Closed.]

Finite State Machine Design
- EchoServerV2: mixed read and write
- Example on the last slide: staged
  - First read the request, then write the response
- The choice depends on the protocol and the tolerance of complexity, e.g.,
  - An HTTP/1.0 channel may use staged
  - An HTTP/1.1/2 or chat channel may use mixed
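The staged FSM from the previous slide can be sketched as a small state type. This is a hypothetical sketch (names Stage and StagedConnection are made up, and real code would also flip the SelectionKey's interest set on each transition):

```java
// Stages of one connection: read the request, generate the response,
// write it out, then close. Mirrors the staged FSM on the earlier slide.
enum Stage { READING_REQUEST, GENERATING_RESPONSE, WRITING_RESPONSE, CLOSED }

class StagedConnection {
    Stage stage = Stage.READING_REQUEST;   // initial interest: READ

    // Request terminator seen (or client requested close); interest cleared.
    void onRequestComplete() {
        if (stage == Stage.READING_REQUEST) stage = Stage.GENERATING_RESPONSE;
    }
    // Response buffer is ready; interest becomes WRITE.
    void onResponseReady() {
        if (stage == Stage.GENERATING_RESPONSE) stage = Stage.WRITING_RESPONSE;
    }
    // Response fully written; connection can be closed.
    void onResponseSent() {
        if (stage == Stage.WRITING_RESPONSE) stage = Stage.CLOSED;
    }
}

public class FsmDemo {
    public static void main(String[] args) {
        StagedConnection c = new StagedConnection();
        c.onRequestComplete();
        c.onResponseReady();
        c.onResponseSent();
        System.out.println(c.stage);
    }
}
```

Guarding each transition on the current stage is what makes the design "staged": a write event arriving while the request is still being read is simply ignored.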

Toward a More General Server Framework
- Our example EchoServer is for a specific protocol
- A general async I/O programming framework tries to introduce structure to allow substantial reuse
  - Async I/O programming frameworks are among the more complex software systems
  - We will see one simple example, using EchoServer as a basis

A More Extensible Dispatcher Design
- Fixed accept/read/write functions are not a general design
- Requirement: a map from key to handler
- A solution: use the attachment of each channel
  - Attaching a ByteBuffer to each channel is a narrow design, suitable only for simple echo servers
  - A more general design can use the attachment to store a callback that indicates not only data (state) but also the handler (function)

A More Extensible Dispatcher Design
- Attachment stores a generic event handler
  - Define interfaces IAcceptHandler and IReadWriteHandler
  - Retrieve handlers at run time

  if (key.isAcceptable()) { // a new connection is ready
    IAcceptHandler aH = (IAcceptHandler) key.attachment();
    aH.handleAccept(key);
  }
  if (key.isReadable() || key.isWritable()) {
    IReadWriteHandler rwH = (IReadWriteHandler) key.attachment();
    if (key.isReadable()) rwH.handleRead(key);
    if (key.isWritable()) rwH.handleWrite(key);
  }

Handler Design: Acceptor
- What should an accept handler object know?
  - ServerSocketChannel (so that it can call accept)
    - Can be derived from the SelectionKey in the callback
  - Selector (so that it can register new connections)
    - Can be derived from the SelectionKey in the callback
  - Which ReadWrite handler object to create (different protocols may use different ones)?
    - Pass a factory object: SocketReadWriteHandlerFactory

Handler Design: ReadWriteHandler
- What should a ReadWrite handler object know?
  - SocketChannel (so that it can read/write data)
    - Can be derived from the SelectionKey in the callback
  - Selector (so that it can change state)
    - Can be derived from the SelectionKey in the callback
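The attachment-based dispatch described above can be made concrete. Below is a hypothetical sketch (the interface names follow the class diagram on the next slide; DispatchDemo and StubKey are made-up test scaffolding, with StubKey standing in for a real SelectionKey so the dispatch logic can run without a network):

```java
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Handler interfaces as sketched in the class diagram.
interface IChannelHandler { void handleException(SelectionKey key); }
interface IAcceptHandler extends IChannelHandler { void handleAccept(SelectionKey key); }
interface IReadWriteHandler extends IChannelHandler {
    void handleRead(SelectionKey key);
    void handleWrite(SelectionKey key);
    int getInitOps();
}

public class DispatchDemo {
    // Dispatch on the handler stored in the key's attachment.
    static void dispatch(SelectionKey key) {
        if (key.isAcceptable())
            ((IAcceptHandler) key.attachment()).handleAccept(key);
        if (key.isReadable() || key.isWritable()) {
            IReadWriteHandler rw = (IReadWriteHandler) key.attachment();
            if (key.isReadable()) rw.handleRead(key);
            if (key.isWritable()) rw.handleWrite(key);
        }
    }

    // Test stand-in: a SelectionKey with a fixed ready set and no real channel.
    static class StubKey extends SelectionKey {
        private final int ready;
        StubKey(int ready) { this.ready = ready; }
        public int readyOps() { return ready; }
        public SelectableChannel channel() { return null; }
        public Selector selector() { return null; }
        public boolean isValid() { return true; }
        public void cancel() { }
        public int interestOps() { return 0; }
        public SelectionKey interestOps(int ops) { return this; }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        SelectionKey key = new StubKey(SelectionKey.OP_READ | SelectionKey.OP_WRITE);
        key.attach(new IReadWriteHandler() {  // attach() is inherited from SelectionKey
            public void handleRead(SelectionKey k) { log.append("R"); }
            public void handleWrite(SelectionKey k) { log.append("W"); }
            public int getInitOps() { return SelectionKey.OP_READ; }
            public void handleException(SelectionKey k) { }
        });
        dispatch(key);
        System.out.println(log);
    }
}
```

The dispatcher itself no longer mentions any protocol: everything protocol-specific lives behind the handler interfaces.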

Class Diagram of SimpleNAIO
[Class diagram: Dispatcher; IChannelHandler (handleException()); IAcceptHandler (handleAccept()); IReadWriteHandler (handleRead(), handleWrite(), getInitOps()); Acceptor implements IAcceptHandler; EchoReadWriteHandler implements IReadWriteHandler; ISocketReadWriteHandlerFactory (createHandler()); EchoReadWriteHandlerFactory (createHandler()).]

Class Diagram of SimpleNAIO
[Same class diagram, extended with NewReadWriteHandler (handleRead(), handleWrite(), getInitOps()) and NewReadWriteHandlerFactory (createHandler()), illustrating how a new protocol plugs into the framework.]

SimpleNAIO
- See AsyncEchoServer/v3/*.java

Discussion on SimpleNAIO
- In our current implementation (Server.java):
  1. Create dispatcher
  2. Create server socket channel and listener
  3. Register server socket channel to dispatcher
  4. Start dispatcher thread
- Can we switch 3 and 4?

Extending SimpleNAIO
- A production network server often closes a connection if it does not receive a complete request within TIMEOUT
- One way to implement timeout:
  - the read handler registers a timeout event, with a callback, with a timeout watcher thread
  - the watcher thread invokes the callback upon TIMEOUT
  - the callback closes the connection
- Any problem?

Extending Dispatcher Interface
- Interacting from another thread with the dispatcher thread can be tricky
- Typical solution: an async command queue

  while (true) {
    process async command queue
    ready events = select()   /* or selectNow(), or select(int timeout) */
      to check for ready events from the registered interest events of SelectableChannels
    foreach ready event
      call handler
  }

Question
- How may you implement the async command queue to the selector thread?

  public void invokeLater(Runnable run) {
    synchronized (pendingInvocations) {
      pendingInvocations.add(run);
    }
    selector.wakeup();
  }

Question
- What if another thread wants to wait until a command is finished by the dispatcher thread?

  public void invokeAndWait(final Runnable task) throws InterruptedException {
    if (Thread.currentThread() == selectorThread) {
      // We are in the selector's thread. No need to schedule execution.
      task.run();
    } else {
      // Used to deliver the notification that the task is executed
      final Object latch = new Object();
      synchronized (latch) {
        // Uses the invokeLater method with a newly created task
        this.invokeLater(new Runnable() {
          public void run() {
            task.run();
            // Notifies
            synchronized (latch) { latch.notify(); }
          }
        });
        // Wait for the task to complete.
        latch.wait();
      }
      // Ok, we are done, the task was executed. Proceed.
    }
  }

Recap: Async Network Server
- Basic idea: non-blocking operations
  1. Peek system state (select) to issue only ready operations
  2. Asynchronous initiation (e.g., aio_read) and completion notification (callback)

Alternative Design: Asynchronous Channel using Future/Listener
- Java 7 introduces AsynchronousServerSocketChannel and AsynchronousSocketChannel beyond ServerSocketChannel and SocketChannel
  - accept, connect, read, and write return Futures or take a completion callback; Selectors are not used
s/AsynchronousServerSocketChannel.html nels/AsynchronousSocketChannel.html
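A minimal end-to-end demonstration of the Future style (a sketch, not from the course code; AsyncAcceptDemo is a made-up name, and a real server would use the CompletionHandler form of accept rather than blocking on get()):

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Future;

public class AsyncAcceptDemo {
    public static void main(String[] args) throws Exception {
        // Bind to an ephemeral port on loopback.
        AsynchronousServerSocketChannel server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // accept() returns immediately with a Future; no selector involved.
        Future<AsynchronousSocketChannel> pendingAccept = server.accept();

        AsynchronousSocketChannel client = AsynchronousSocketChannel.open();
        client.connect(new InetSocketAddress("127.0.0.1", port)).get();

        AsynchronousSocketChannel peer = pendingAccept.get();
        client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8))).get();

        ByteBuffer buf = ByteBuffer.allocate(16);
        peer.read(buf).get();   // wait for the read to finish
        buf.flip();
        String msg = StandardCharsets.UTF_8.decode(buf).toString();
        System.out.println(msg);
        client.close(); peer.close(); server.close();
    }
}
```

Every get() here blocks; the asynchrony pays off when, as in the next slide's example, the program does other work between initiating an operation and collecting its result.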

  SocketAddress address = new InetSocketAddress(args[0], port);
  AsynchronousSocketChannel client = AsynchronousSocketChannel.open();
  Future<Void> connected = client.connect(address);
  ByteBuffer buffer = ByteBuffer.allocate(100);
  // wait for the connection to finish
  connected.get();
  // read from the connection
  Future<Integer> future = client.read(buffer);
  // do other things...
  // wait for the read to finish...
  future.get();
  // flip and drain the buffer
  buffer.flip();
  WritableByteChannel out = Channels.newChannel(System.out);
  out.write(buffer);

  class LineHandler implements CompletionHandler<Integer, ByteBuffer> {
    public void completed(Integer result, ByteBuffer buffer) {
      buffer.flip();
      WritableByteChannel out = Channels.newChannel(System.out);
      try {
        out.write(buffer);
      } catch (IOException ex) {
        System.err.println(ex);
      }
    }
    public void failed(Throwable ex, ByteBuffer attachment) {
      System.err.println(ex.getMessage());
    }
  }

  ByteBuffer buffer = ByteBuffer.allocate(100);
  CompletionHandler<Integer, ByteBuffer> handler = new LineHandler();
  channel.read(buffer, buffer, handler);

Extending FSM
- In addition to management threads, a system may still need multiple threads for performance (why?)
  - FSM code can never block, but page faults, file I/O, and garbage collection may still force blocking
  - The CPU may become the bottleneck, and there may be multiple cores supporting multiple threads (typically 2n threads)
[Figure: an Event Dispatcher fans Accept/Readable/Writable events out to Handle Accept, Handle Read, and Handle Write handlers.]
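One common way to combine the two: the event loop stays single-threaded, and potentially blocking work (file I/O, page-fault-prone processing) is handed off to a worker pool. A minimal, hypothetical sketch (WorkerPoolDemo and the string "file-contents" are made up):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkerPoolDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(2);

        // The event-loop thread hands off a potentially blocking operation
        // (here it just returns a string, standing in for reading a file).
        Future<String> f = workers.submit(() -> "file-contents");

        // ... the event loop would keep processing other channels here ...

        String result = f.get();   // completion picked up later
        System.out.println(result);
        workers.shutdown();
    }
}
```

In a real server the worker would not call get() on the event-loop thread; instead it would push a completion command back through the dispatcher's async command queue (invokeLater) so the FSM code still never blocks.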

Summary: Architecture
- Architectures
  - Multi-threaded
  - Asynchronous
  - Hybrid
- Assigned reading: SEDA

Problems of Event-Driven Servers
- Obscure control flow for programmers and tools
- Difficult to engineer, modularize, and tune
- Difficult to achieve performance/failure isolation between FSMs

Another View
- Events obscure control flow, for programmers and tools

Threads:
  thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    read_request(&s);
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
  }
  pin_cache(struct session *s) {
    pin(s);
    if (!in_cache(s))
      read_file(s);
  }

Events:
  AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
  }
  RequestHandler(struct session *s) {
    ...; CacheHandler.enqueue(s);
  }
  CacheHandler(struct session *s) {
    pin(s);
    if (!in_cache(s)) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
  }
  ...
  ExitHandler(struct session *s) {
    ...; unpin(s); free_session(s);
  }

[Figure: web server stages Accept Conn., Read Request, Pin Cache, Read File, Write Response, Exit]
[von Behren]

State Management
- Events require manual state management
- Hard to know when to free
  - Use GC or risk bugs

Threads:
  thread_main(int sock) {
    struct session s;
    accept_conn(sock, &s);
    if (!read_request(&s))
      return;
    pin_cache(&s);
    write_response(&s);
    unpin(&s);
  }
  pin_cache(struct session *s) {
    pin(s);
    if (!in_cache(s))
      read_file(s);
  }

Events:
  AcceptHandler(event e) {
    struct session *s = new_session(e);
    RequestHandler.enqueue(s);
  }
  RequestHandler(struct session *s) {
    ...; if (error) return; CacheHandler.enqueue(s);
  }
  CacheHandler(struct session *s) {
    pin(s);
    if (!in_cache(s)) ReadFileHandler.enqueue(s);
    else ResponseHandler.enqueue(s);
  }
  ...
  ExitHandler(struct session *s) {
    ...; unpin(s); free_session(s);
  }

[Figure: web server stages Accept Conn., Read Request, Pin Cache, Read File, Write Response, Exit]
[von Behren]

Summary: The High-Performance Network Servers Journey
- Avoid blocking (so that we can reach bottleneck throughput)
  - Introduce threads
- Limit unlimited thread overhead
  - Thread pool, async I/O
- Coordinating data access
  - Synchronization (lock, synchronized)
- Coordinating behavior: avoid busy-wait
  - Wait/notify; select FSM; Future/Listener
- Extensibility/robustness
  - Language support / design for interfaces

Beyond Class: Design Patterns
- We have seen Java as an example
- C++ and C# can be quite similar. For C++ and general design patterns:
  - tutorial4.pdf

Some Questions
- When is the CPU the bottleneck for scalability?
  - So that we need to add helpers
- How do we know that we are reaching the limit of scalability of a single machine?
- These questions drive network server architecture design

Operational Analysis
- Relationships that do not require any assumptions about the distribution of service times or inter-arrival times
- Identified originally by Buzen (1976) and later extended by Denning and Buzen (1978)
- We touch on only some techniques/results
  - In particular, bottleneck analysis
- For more details, see the linked reading

Under the Hood (An Example FSM)
[Figure: requests start at arrival rate λ, visit the CPU, memory cache, file I/O (I/O request), and network, and exit at throughput λ until some service center saturates.]

Operational Analysis: Resource Demand of a Request
[Figure: a request makes V_CPU visits to the CPU for S_CPU units of resource time per visit; V_Mem visits to memory for S_Mem per visit; V_Disk visits to the disk for S_Disk per visit; and V_Net visits to the network for S_Net per visit.]
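From these visit counts and per-visit service times, the standard operational-analysis quantity is the per-device service demand (this definition is standard in the operational-analysis literature, though the slide does not state it explicitly):

```latex
D_i = V_i \, S_i
\qquad \text{e.g.} \qquad
D_{\mathrm{CPU}} = V_{\mathrm{CPU}} \, S_{\mathrm{CPU}}
```

D_i is the total service time a request needs at device i, excluding queueing; the device with the largest D_i is the bottleneck targeted by the bottleneck analysis mentioned earlier.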

Operational Quantities
- T: length of the observation interval
- A_i: number of arrivals to device i
- B_i: busy time of device i
- C_i: number of completions at device i
- i = 0 denotes the system as a whole
From these we define, per device i: arrival rate λ_i = A_i / T, throughput X_i = C_i / T, utilization U_i = B_i / T, and mean service time per completion S_i = B_i / C_i.

Utilization Law
- U_i = B_i / T = (C_i / T) · (B_i / C_i) = X_i · S_i
- The law is independent of any assumption on the arrival/service process
- Example: Suppose a NIC processes 125 pkts/sec, and each pkt takes 2 ms. What is the utilization of the NIC?
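Working the example through the law (throughput X = 125 pkts/sec, service time S = 2 ms = 0.002 sec):

```latex
U = X \cdot S = 125\ \tfrac{\text{pkts}}{\text{sec}} \times 0.002\ \tfrac{\text{sec}}{\text{pkt}} = 0.25 = 25\%
```

So the NIC is busy one quarter of the time; it could sustain up to 1/S = 500 pkts/sec before saturating.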

Deriving the Relationship Between R, U, and S for One Device
- Assume flow balance (arrival rate = throughput); by Little's Law, Q = X · R
- Assume PASTA (Poisson arrivals, i.e., memoryless arrivals, see time averages): a new request sees Q requests ahead of it, and service is FIFO, so R = S + Q · S
- According to the utilization law, U = X · S
Combining these: R = S + X·R·S = S + U·R, so R = S / (1 − U).