
1 Fast Paths in Concurrent Programs
Wen Xu, Princeton University; Sanjeev Kumar, Intel Labs; Kai Li, Princeton University

2 Concurrent Programs
- Message-passing style: processes & channels (e.g. streaming languages)
- On uniprocessors: programming convenience
  - Embedded devices
  - Network software stack
  - Media processing
- On multiprocessors: exploit parallelism by partitioning the processes
[Figure: processes P1-P4 connected by channels C1-C3, partitioned across Processor 1 and Processor 2]
Problem: compile a concurrent program to run efficiently on a uniprocessor
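To make the process-and-channel model concrete, here is a minimal C sketch; the channel type, the process bodies, and the interleaving loop in main are illustrative assumptions, not ESP or the paper's runtime.

    #include <stdio.h>

    /* A hypothetical single-slot channel; a real runtime would block or buffer. */
    typedef struct { int full; int value; } channel;

    static void chan_send(channel *c, int v) { c->value = v; c->full = 1; }
    static int  chan_ready(const channel *c) { return c->full; }
    static int  chan_recv(channel *c)        { c->full = 0; return c->value; }

    /* Process P1: produces a value onto channel C1. */
    static void p1_step(channel *c1, int i) {
        if (!chan_ready(c1)) chan_send(c1, i * i);
    }

    /* Process P2: consumes from C1. */
    static void p2_step(channel *c1) {
        if (chan_ready(c1)) printf("P2 received %d\n", chan_recv(c1));
    }

    int main(void) {
        channel c1 = { 0, 0 };
        /* On a uniprocessor, some runtime interleaves the processes. */
        for (int i = 0; i < 3; i++) { p1_step(&c1, i); p2_step(&c1); }
        return 0;
    }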

3 Compiling Concurrent Programs
- Process-based approach
  - Keep the processes separate and context switch between them
  - Small executable (the sum of the processes)
  - Significant overhead
- Automata-based approach
  - Treat each process as a state machine and combine the state machines
  - Small overhead
  - Large executables (potentially exponential)
- One study compared the two approaches and found that, relative to the process-based approach, the automata-based approach generates code that is
  - Twice as fast
  - 2-3 orders of magnitude larger
- Neither approach is satisfactory
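A hedged C sketch of the contrast between the two approaches; the process states, the scheduler loop, and the fused loop are illustrative assumptions rather than actual compiler output.

    #include <stdio.h>

    /* Process-based: each process keeps its own state and the runtime
       dispatches between them, paying a context-switch cost per step. */
    typedef struct { int pc; int last; } proc_state;

    static void producer_step(proc_state *p)        { p->last = p->pc * p->pc; p->pc++; }
    static void consumer_step(proc_state *p, int v) { p->pc++; printf("consume %d\n", v); }

    static void run_process_based(void) {
        proc_state a = { 0, 0 }, b = { 0, 0 };
        for (int i = 0; i < 3; i++) {   /* scheduler loop */
            producer_step(&a);          /* "switch" to the producer */
            consumer_step(&b, a.last);  /* "switch" to the consumer */
        }
    }

    /* Automata-based: the compiler fuses the processes into one state
       machine, so per-step dispatch disappears, but the number of
       combined states can grow very large for real programs. */
    static void run_automata_based(void) {
        for (int i = 0; i < 3; i++)
            printf("consume %d\n", i * i);  /* producer + consumer inlined */
    }

    int main(void) { run_process_based(); run_automata_based(); return 0; }

The sketch shows the trade-off in miniature: the fused version has no dispatch overhead but duplicates the processes' logic in one body.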

4 Our Work
- Our goal: compile concurrent programs
  - Automated using a compiler
  - Low overhead
  - Small executable size
- Our approach: combine the two approaches
  - Use the process-based approach to handle all cases
  - Use the automata-based approach to speed up the common cases

5 Outline
- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions

6 Fast Paths
- Path: a dynamic execution path in the program
- Fast path (or hot path): a well-known technique
  - Identify commonly executed paths (hot paths)
  - Specialize and optimize them (fast paths)
- Two components
  - A predicate that specifies the fast path
  - Optimized code to execute the fast path
- Compilers can be used to automate this, but mostly for sequential programs
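A minimal C sketch of the two components, assuming a hypothetical bounded queue; the predicate, the capacity, and the function names are not from the paper.

    #include <stdio.h>

    enum { CAP = 8 };
    typedef struct { int buf[CAP]; int n; } queue;

    /* General path: handles every case, including a full queue. */
    static int push_general(queue *q, int v) {
        if (q->n == CAP) return -1;   /* a real version might grow or block here */
        q->buf[q->n++] = v;
        return 0;
    }

    /* Fast path = predicate + specialized code for the common case. */
    static int push(queue *q, int v) {
        if (q->n < CAP) {             /* predicate: space available (the common case) */
            q->buf[q->n++] = v;       /* optimized code: no overflow handling */
            return 0;
        }
        return push_general(q, v);    /* otherwise take the general path */
    }

    int main(void) {
        queue q = { {0}, 0 };
        for (int i = 0; i < 10; i++)
            printf("push %d -> %d\n", i, push(&q, i));
        return 0;
    }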

7 Manually Implementing Fast Paths
- To achieve good performance in concurrent programs:
  - Start: insert code that identifies the common case and transfers control to the fast-path code
  - Extract and optimize the fast-path code manually
  - Finish: patch up state and return control at the end of the fast path
- Obvious drawbacks
  - Difficult to implement correctly
  - Difficult to maintain

8 Outline
- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions

9 Our Approach
[Figure: the generated program combines Baseline (process-based) code and Fast Path (automata-based) code, shown as small straight-line fragments such as "a = b; b = c * d; d = 0; if (c > 0) c++;" and "a = c; b = c * d; d = 3; if (c > 0) c++;". Control flow: (1) Test: decide whether the fast path applies; (2) Optimized Code: run the fast path; (3) Abort?: fall back to the baseline code if a common-case assumption fails.]
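A hedged C sketch of this test / optimized-code / abort structure; the data format, the abort condition, and the baseline body are assumptions, not the paper's code.

    #include <stdio.h>

    /* Baseline: general process-based code that handles every case. */
    static void baseline_handle(int *data, int n) {
        for (int i = 0; i < n; i++) data[i] = -data[i];
        printf("baseline handled %d items\n", n);
    }

    /* Combined structure: (1) test, (2) optimized fast-path code,
       (3) abort check that falls back to the baseline mid-path. */
    static void handle(int *data, int n) {
        if (n > 4) {                       /* (1) Test: take the fast path only in the common case */
            baseline_handle(data, n);
            return;
        }
        for (int i = 0; i < n; i++) {      /* (2) Optimized code */
            if (data[i] < 0) {             /* (3) Abort? an assumption is violated */
                baseline_handle(data + i, n - i);   /* resume in the baseline where we left off */
                return;
            }
            data[i] = -data[i];
        }
        printf("fast path handled %d items\n", n);
    }

    int main(void) {
        int a[] = { 1, 2, 3 };
        int b[] = { 1, -2, 3 };
        handle(a, 3);   /* stays on the fast path */
        handle(b, 3);   /* aborts into the baseline */
        return 0;
    }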

10 Specifying Fast Paths
- A concurrent program consists of multiple processes
- A fast path is specified with regular expressions over
  - Statements
  - Conditions (optional)
  - Synchronization (optional)
- Supports early abort
- Advantages: powerful, compact
- Example hint:
    fastpath example {
      process first {
        statement A, B, C, D, #1;
        start A ? (size<100);
        follows B ( C D )*;
        exit #1;
      }
      process second { ... }
      process third { ... }
    }

11 Extracting Fast Paths
- The automata-based approach is used to extract the fast paths
- A fast path involves a group of processes
  - The compiler keeps track of the execution point of each involved process
  - On exit, control is returned to the appropriate location in each process
- Baseline: concurrent code. Fast path: sequential code
- Fairness on the fast path (see the sketch after this slide)
  - Embed scheduling decisions in the fast path to avoid scheduling/fairness overhead there
  - Rely on the baseline code for fairness: it is still taken a fraction of the time
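A hedged C sketch of the fairness idea; the counter, the period, and the helper functions are assumptions, not the compiler's actual mechanism.

    #include <stdio.h>

    static void fast_path(int req)         { printf("fast path: %d\n", req); }
    static void baseline(int req)          { printf("baseline (full scheduler): %d\n", req); }
    static int  fast_path_applies(int req) { return req < 100; }

    enum { FAIRNESS_PERIOD = 4 };   /* illustrative: every Nth request takes the baseline */

    void handle_request(int req) {
        static unsigned count;
        /* The fast path embeds its own scheduling decisions, so it skips the
           scheduler entirely; taking the baseline a fraction of the time lets
           the ordinary scheduler give the other processes a turn. */
        if (++count % FAIRNESS_PERIOD != 0 && fast_path_applies(req))
            fast_path(req);
        else
            baseline(req);
    }

    int main(void) {
        for (int i = 0; i < 8; i++) handle_request(i);
        return 0;
    }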

12 Optimizations on the Fast Path
- Enabling traditional optimizations
  - Generate and optimize the baseline code
  - Generate the fast-path code; fast paths have exit/entry points to the baseline code
  - Use data-flow information from the baseline code at the exit/entry points to seed the analysis and optimize the fast-path code
- Speeding up the fast path using lazy execution (a sketch follows this slide)
  - Delay operations that are not needed while the fast path executes until the end
  - Perform those operations if the fast path is aborted
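A hedged C sketch of lazy execution; the message sizes, the statistics field, and the abort condition are assumptions.

    #include <stdio.h>

    typedef struct { int stats_msgs; } state;

    /* Baseline code: updates the statistics eagerly for every message. */
    static void baseline_send(state *s, const int *sizes, int n) {
        (void)sizes;                        /* the real baseline would inspect each message */
        for (int i = 0; i < n; i++) s->stats_msgs++;
        printf("baseline sent %d messages\n", n);
    }

    /* Fast path: the per-message statistics update is not needed while the
       fast path runs, so it is delayed to the end; if the fast path aborts,
       the delayed work is performed before falling back to the baseline. */
    static void fast_send(state *s, const int *sizes, int n) {
        int sent = 0;                       /* deferred bookkeeping */
        for (int i = 0; i < n; i++) {
            if (sizes[i] > 64) {            /* abort: uncommon message size */
                s->stats_msgs += sent;      /* perform the delayed updates now */
                baseline_send(s, sizes + i, n - i);
                return;
            }
            sent++;                         /* no shared-state update on the fast path */
        }
        s->stats_msgs += sent;              /* single update at the very end */
        printf("fast path sent %d messages\n", n);
    }

    int main(void) {
        state s = { 0 };
        int small[] = { 8, 16, 32 };
        int mixed[] = { 8, 512, 16 };
        fast_send(&s, small, 3);
        fast_send(&s, mixed, 3);
        printf("total messages: %d\n", s.stats_msgs);
        return 0;
    }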

13 Outline
- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions

14 Experimental Evaluation
- Implemented the techniques described in the paper in the ESP compiler (which supports concurrent programs)
- Two classes of programs
  - Filter programs
  - VMMC firmware
- Answer three questions
  - Programming effort (annotation complexity) needed
  - Size of the executable
  - Performance

15 Filter Programs
- Well-defined structure; streaming applications
- Use the filter programs by Proebsting et al.
  - Good for evaluating our technique: concurrency overheads dominate
[Figure: a filter pipeline of processes P1-P4 connected by channels C1-C3]
- Experimental setup
  - 2.66 GHz Pentium 4, 1 GB memory, Linux 2.4
  - 4 versions of the code
- Annotation complexity
  - Program sizes: 153, 125, 190, 196 lines
  - Annotation sizes: 7, 7, 10, 10 lines

16 Filter Programs (cont'd)
[Charts: executable size and performance for Program 1 through Program 4.]
- Better performance than both the process-based and the automata-based versions
- Relatively small executable

17 VMMC Firmware
- Firmware for a gigabit network (Myrinet)
- Experimental setup
  - Measure network performance (latency & bandwidth) between two machines connected with Myrinet
  - 3 versions of the firmware
    - Concurrent C version with manual fast paths
    - Process-based code without fast paths
    - Process-based code with compiler-extracted fast paths
- Annotation complexity (3 fast paths)
  - Fast-path specifications: 20, 14, and 18 lines
  - Manual fast paths in C: 1100 lines total

18 VMMC Firmware (cont'd)
[Charts: latency vs. message size (in bytes), and generated code size in assembly instructions.]

19 Outline
- Motivation
- Fast Paths
- Fast Paths in Concurrent Programs
- Experimental Evaluation
- Conclusions

20 Conclusions
- Fast paths in concurrent programs
  - Evaluated using filter programs and VMMC firmware
- Process-based approach to handle all cases
  - Keeps the executable size reasonable
- Automata-based approach to handle only the common cases (fast paths)
  - Avoids the high overhead of the process-based approach
  - Often outperforms the automata-based code

21 Questions?
