Modularity and Costs Greg Busby Computer Science 614 March 26, 2002.


1 Modularity and Costs Greg Busby Computer Science 614 March 26, 2002

2 Problem 1 – Complexity
- Protocols are necessary for network communication
- Both ends must agree on a message format before they can exchange messages
- Communication protocols are complex
- Using several protocols together is even more complex

3 Solution 1 – Layers
- Implement each protocol independently, allowing a cleaner implementation
- Layer the protocols
- Maintains modularity
- Reduces complexity – no need to understand interactions between protocols
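The layering idea above can be sketched as follows. This is a minimal illustrative model, not any real stack: each hypothetical layer prepends its own header on send and strips it on receive, so no layer needs to understand another's internals.

```python
# Toy layered protocol stack (all names and header formats are
# hypothetical). Each layer wraps the payload with its own header
# on the way down and strips it on the way up.

class Layer:
    def __init__(self, name):
        self.name = name

    def send(self, payload: bytes) -> bytes:
        # Prepend this layer's header: a tag and the payload length.
        header = f"{self.name}:{len(payload)}|".encode()
        return header + payload

    def receive(self, packet: bytes) -> bytes:
        # Strip and validate this layer's header (outermost first).
        header, _, payload = packet.partition(b"|")
        name, _, length = header.partition(b":")
        assert name.decode() == self.name
        assert int(length.decode()) == len(payload)
        return payload

def send_down(stack, data: bytes) -> bytes:
    for layer in stack:            # innermost layer wraps first
        data = layer.send(data)
    return data

def receive_up(stack, packet: bytes) -> bytes:
    for layer in reversed(stack):  # outermost layer strips first
        packet = layer.receive(packet)
    return packet

stack = [Layer("TCP"), Layer("IP"), Layer("ETH")]
wire = send_down(stack, b"hello")
assert receive_up(stack, wire) == b"hello"
```

Note how the sketch also exhibits the costs discussed on the next slide: every layer adds header bytes and touches the whole message again.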

4 Problem 2 – Delays
- Messages grow larger as each layer adds its own header
- Processing overhead for switching between layers
- Must wait for one protocol to finish before starting the next
- I/O overhead from multiple memory writes as buffers are stored between layers

5 Solution 2 – Improve Performance
Will discuss several approaches, including the pros and cons of each:
- x-Kernel: puts the entire communication system directly in the kernel, with specific objects and support routines
- Integrated Layer Processing (ILP): integrates protocol layers to reduce task switching and memory writes
- Protocol Accelerator (PA): reduces the total data sent and shortens the critical path of code between messages

6 x-Kernel
- Defines a uniform set of abstractions for protocols
- Structures protocols for efficient interaction in the common case
- Supplies primitive routines for common protocol tasks

7 x-Kernel Architecture
- Provides objects for protocols, sessions, and messages
- Creates a kernel for a specific set of protocols (static)
- Instantiates sessions for each protocol as needed (dynamic)
- Messages are active objects that move through protocols/sessions
- Provides specific support routines
(Diagram: example protocol graph with TCP, UDP, IP, and ETH.)

8 x-Kernel Objects
- Protocols: create sessions; demux received messages
- Sessions: represent connections; created and destroyed as connections are made and terminated
- Messages: contain the data itself; passed from level to level
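The relationship between the three object types can be sketched as below. This is a toy model with hypothetical names, not the actual x-Kernel API: a protocol object creates a session per connection and demultiplexes incoming messages to the right one.

```python
# Toy model of the x-Kernel object types (hypothetical names and
# fields). A Protocol creates Sessions for connections and demuxes
# incoming Messages to the right Session.

class Message:
    def __init__(self, key, data):
        self.key = key       # demux key, e.g. a port number
        self.data = data     # the payload itself

class Session:
    def __init__(self, key):
        self.key = key
        self.delivered = []  # data handed up to the next level

    def deliver(self, msg):
        self.delivered.append(msg.data)

class Protocol:
    def __init__(self):
        self.sessions = {}   # key -> active Session

    def open(self, key):
        # Create a session when a connection is established.
        session = Session(key)
        self.sessions[key] = session
        return session

    def close(self, key):
        # Destroy the session when the connection terminates.
        del self.sessions[key]

    def demux(self, msg):
        # Route an incoming message to its session.
        self.sessions[msg.key].deliver(msg)

udp = Protocol()
s = udp.open(53)
udp.demux(Message(53, b"query"))
assert s.delivered == [b"query"]
```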

9 x-Kernel Primitives
- Buffer managers: allocate, concatenate, split, and truncate buffers; operate in the local process heap
- Map managers: add, remove, and map bindings for protocols
- Event managers: provide timers to allow timeouts
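A rough sketch of the buffer-manager operations, assuming the simplest possible semantics (the real routines manage heap buffers; here plain byte strings stand in for them):

```python
# Sketch of x-Kernel-style buffer operations (names are
# illustrative). Protocols manipulate messages by splitting and
# joining buffers, e.g. to peel a header off the front or drop a
# trailer off the back.

def split(buf: bytes, n: int):
    """Split a buffer into its first n bytes and the rest."""
    return buf[:n], buf[n:]

def concat(a: bytes, b: bytes) -> bytes:
    """Join two buffers, e.g. header + payload."""
    return a + b

def truncate(buf: bytes, n: int) -> bytes:
    """Drop a trailer of n bytes from the end of a buffer."""
    return buf[:-n] if n else buf

packet = b"HDRpayloadCRC"
header, rest = split(packet, 3)   # peel 3-byte header
body = truncate(rest, 3)          # drop 3-byte trailer
assert (header, body) == (b"HDR", b"payload")
assert concat(header, body) == b"HDRpayload"
```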

10 x-Kernel Performance
- 2-3x faster than Unix overall
- Unix cost is primarily due to sockets
- Protocol performance is comparable
- Conclusion: the architecture is the difference

11 x-Kernel Conclusions
Pros:
- The architecture simplifies the implementation of protocols
- A uniform interface between protocols makes protocol performance predictable and reduces inter-protocol overhead
- Efficient protocols can be written by tuning the underlying architecture
- No need to know the exact protocol stack
Cons:
- Requires recompiling the kernel for each new set of protocols
- Does not reduce message size (headers) or the sequentiality of processing
- Primarily useful as a research tool for protocol implementation, not to improve performance per se

12 Integrated Layer Processing (ILP)
- Reduces protocol layering costs by integrating processing across layers
- Tunes performance to increase caching and avoid memory I/O
- Eliminates redundant copies (similar to U-Net's shared memory)

13 ILP Architecture
- Combine protocol-specific manipulations in a single loop where possible
- Process small pieces of data to exploit the processor's on-board caches
- Put as much processing as possible in-line (macros) rather than in function calls

14 ILP Loop
- Combine marshalling (encoding), encryption, and checksumming
- Work in memory; reduce copying
- Reduces the steps from 5 to 2 (with increased processing at step 1)
(Diagram: a non-ILP send takes five steps – 1. marshalling (r/w), 2. encryption (r/w), 3. copying (r/w), 4. checksum (r), 5. system copy (r/w) – moving the data through application, kernel, and TCP buffers; an ILP send takes two – 1. marshalling, encryption, and checksumming (r/w), 2. system copy (r/w).)
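The fused loop can be sketched as below. This is an illustrative toy, not the paper's code: a single-byte XOR stands in for encryption, and a one-byte additive checksum stands in for a real checksum. The point is that the integrated version reads and writes each byte once, where the layered version makes a separate pass (and buffer) per function.

```python
# Sketch of integrated layer processing: marshalling, a toy XOR
# "cipher", and a checksum fused into one pass over the data, so
# each byte is touched once instead of once per layer.

KEY = 0x5A  # hypothetical single-byte cipher key

def ilp_send(fields):
    out = bytearray()
    checksum = 0
    for field in fields:                       # marshalling: flatten fields
        for byte in field:
            enc = byte ^ KEY                   # encryption step
            checksum = (checksum + enc) & 0xFF # checksum step
            out.append(enc)                    # single write to the output
    return bytes(out), checksum

def layered_send(fields):
    # The non-integrated version: three separate passes and buffers.
    marshalled = b"".join(fields)
    encrypted = bytes(b ^ KEY for b in marshalled)
    checksum = sum(encrypted) & 0xFF
    return encrypted, checksum

msg = [b"hdr", b"payload"]
assert ilp_send(msg) == layered_send(msg)
```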

15 ILP Processing (send)
- Divide the message into small parts
- Begin marshalling and encryption on part B, then C…
- Process part A once the length is known
- Finish protocol-specific processing
- Doesn't work if A must be processed first (ordering-constrained)
(Diagram: a message consisting of RPC header, data, length, alignment, and TCP header, divided into parts A, B, and C for marshalling, encryption, and checksumming.)

16 ILP Performance
- Processing reduction of 20-25%
- Throughput improvement of 10-15%
- Actually reduces cache usage, although it was designed to optimize it
- Performance gains can easily be masked by strong encryption, which drastically increases processing
- Conclusion: performance results were such that its use is "debatable in existing communication systems…"

17 ILP Conclusions
Pros:
- Decreases memory accesses by up to 30%
- Slightly improved performance
Cons:
- Only applicable to functions without ordering constraints
- Requires macros to increase speed, reducing flexibility
- Protocol stack must be known beforehand

18 The Protocol Accelerator (PA)
- Reduces header overhead by sending non-changing protocol headers only once
- Further reduces total bytes by packing other header information across protocols
- Reduces layered protocol processing overhead by splitting the processing of header and data (canonical processing)

19 PA Header Reduction
Four classes of header information:
- Connection identification – does not change during a session
- Protocol-specific information – depends only on protocol state, not on the message
- Message-specific information – depends on the contents of the message but not on protocol state
- Gossip – optional; included because its overhead is small
Connection cookies: an 8-byte field that replaces the connection identification information on later messages

20 PA Message Format
- The connection cookie suffices as the connection ID on second and later messages
- Packing information is explained below
- Gossip is optional but useful
Fields:
- Connection cookie (62-bit number), plus a connection-id-present bit and a byte-order bit (big- or little-endian)
- Connection identification (first message only)
- Protocol-specific information
- Message-specific information
- Gossip (optional)
- Packing information (if packed)
- Application data
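The first word of the message can be sketched as below. The exact bit layout is an assumption for illustration; the slide only says the 8-byte field holds a 62-bit cookie plus the connection-id-present and byte-order bits, which together fill the 64 bits.

```python
# Sketch of the PA's first word (assumed layout, not the paper's
# exact encoding): a 62-bit connection cookie packed into 8 bytes
# together with the two flag bits.
import struct

CONN_ID_PRESENT = 1 << 63   # full connection id follows (first message)
BIG_ENDIAN_FLAG = 1 << 62   # byte-order bit

def first_word(cookie: int, first_message: bool, big_endian: bool) -> bytes:
    assert 0 <= cookie < (1 << 62)     # cookie must fit in 62 bits
    word = cookie
    if first_message:
        word |= CONN_ID_PRESENT
    if big_endian:
        word |= BIG_ENDIAN_FLAG
    return struct.pack(">Q", word)     # 8 bytes on the wire

w = first_word(cookie=0x1234, first_message=True, big_endian=False)
assert len(w) == 8
decoded = struct.unpack(">Q", w)[0]
assert decoded & ((1 << 62) - 1) == 0x1234   # cookie recovered
assert decoded >> 63 == 1                    # conn-id-present bit set
```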

21 PA Processing Reduction
Canonical protocol processing – breaks the processing in a protocol layer into two parts:
- Pre-processing phase – build or check the message header without changing protocol state
- Post-processing phase – update protocol state; attempt to do this after the message is sent or delivered
Pre-processing at every layer is done before post-processing at any layer
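The two-phase split can be sketched as below. This is a toy with a hypothetical sequence-number layer, not the paper's code: pre-processing builds headers from current state without changing it, the message goes on the wire, and only then do all layers update their state.

```python
# Sketch of canonical protocol processing: all layers' pre-processing
# (header work, no state change) runs before any layer's
# post-processing (state update), so the message can leave as soon
# as the pre-processing pass is done.

class CounterLayer:
    """Toy layer whose header carries a sequence number."""
    def __init__(self, name):
        self.name, self.next_seq = name, 0

    def pre_send(self, msg: bytes) -> bytes:
        # Build the header from current state; do NOT change state.
        return f"{self.name}#{self.next_seq}|".encode() + msg

    def post_send(self):
        # Update protocol state after the message is on the wire.
        self.next_seq += 1

def pa_send(layers, msg: bytes, wire: list):
    for layer in layers:          # phase 1: pre-processing, every layer
        msg = layer.pre_send(msg)
    wire.append(msg)              # message leaves immediately
    for layer in layers:          # phase 2: deferred state updates
        layer.post_send()

wire = []
layers = [CounterLayer("A"), CounterLayer("B")]
pa_send(layers, b"x", wire)
pa_send(layers, b"y", wire)
assert wire == [b"B#0|A#0|x", b"B#1|A#1|y"]
```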

22 PA Processing Reduction (cont.)
- Header prediction: use the post-processing phase to predict the form of the next header
- Packet filters: a pre-pre-processor that checks or ensures header correctness without invoking the protocol where possible; invokes the protocol if necessary
- Message packing: pack backlogged messages together if the application gets ahead – reduces space and processing, since checksums etc. are calculated only once
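Message packing can be sketched as below. The packing format (a list of length prefixes before the concatenated bodies) is an assumption made up for illustration; the point is that per-packet work such as the checksum happens once for the whole batch.

```python
# Sketch of PA message packing (illustrative wire format): backlogged
# messages are packed into one packet, so the checksum and other
# per-packet work are done once for the batch.

def pack(backlog):
    # Packing information: one length per message, then the bodies.
    lengths = ",".join(str(len(m)) for m in backlog).encode()
    body = b"".join(backlog)
    checksum = sum(body) & 0xFFFF      # single checksum for all messages
    return lengths + b";" + body, checksum

def unpack(packet):
    lengths, _, body = packet.partition(b";")
    out, pos = [], 0
    for n in (int(x) for x in lengths.decode().split(",")):
        out.append(body[pos:pos + n])
        pos += n
    return out

msgs = [b"one", b"two", b"three"]
packet, _ = pack(msgs)
assert unpack(packet) == msgs
```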

23 PA Processing (send)
- Check the backlog; queue and exit if there is any
- Create the packing and predicted header; add to the message data
- Run the packet filter to create message-specific data (and gossip, if any); push to the protocol if necessary
- Push the connection cookie onto the front of the message and send
- Pass to the protocol stack for post-processing to update protocol state
(Diagram: the PA's PreSend/Packer and PreDeliver/Unpacker sit between the application and the protocol stack, which connects to the network.)

24 PA Performance
- Can gain an order of magnitude improvement over pure layered protocols
- Maximal throughput achieved by reducing garbage collection and doing post-processing while messages are "on the wire"
- Conclusion: useful in improving performance, as long as the PA is used on both ends of the connection

25 PA Conclusions
Pros:
- Eliminates much of the overhead of layered protocols
- Significant speed improvement
- Canonical processing is applicable in any case
Cons:
- Cannot communicate with a non-PA peer
- A specific PA is needed for each set of protocols
- No fragmentation of messages, so it only works for small messages

26 Summary
- Protocols are layered to improve modularity and reduce complexity
- Layering reduces performance; improving performance reduces modularity and requires foreknowledge of the protocol stack
- Approaches: increase use of the kernel (x-Kernel); integrate the processing of all layers together (ILP); reduce message size and speed up the critical path (PA)
- All improve performance, but only the PA results in a significant improvement

