Outline System architecture Current work Experiments Next Steps

Presentation transcript:

1 Outline
Progress
Architecture of current work
Experiments: overhead, accuracy
Next steps

The first slide presents the system architecture, and the second highlights the area we are currently working on. The next slides detail our progress and show how the architecture of that work differs from the planned architecture, then describe the experiments and their results. The final slide lists some of the things we'll be working on next.

2 Collaborative Learning System Architecture
[Architecture diagram. Learning client workstations run the Application, Client Library, Data Acquisition, and Learning (Daikon) under the MPEE; protected client workstations run the Application, Memory Firewall, and Live Shield Evaluation (observe execution) under the MPEE. Both communicate through the Central Management System with the central Merge Constraints (Daikon) and Patch/Repair Code Generation services, exchanging sample data, constraints, patches, and patch/repair results.]

Overall architecture of the system. Items in boxes that touch are running in the same process; boxes connected by arrows communicate via messages. Dashed boxes indicate processes running on the same workstation: for example, Learning runs on the same client workstation as the instrumented application (Data Acquisition, Client Library, MPEE, Application). Data acquisition uses the client library provided by Determina as an extension to the MPEE (DynamoRIO) to instrument the application dynamically. Sample data (the values of variables at program points) is sent to a local copy of Daikon, which calculates constraints. The resulting constraints are sent to a central location, where Daikon merges them into a complete set of constraints that hold for all client executions. These constraints are used to create patches that check the constraints and repair them when violated. The patches are distributed via the CMS to protected workstations. Results from the patches, and any errors encountered at the client, are fed back into the patch/repair code generation process; different patches and repairs are tested on different workstations, working repairs are widely distributed, and ineffective repairs are discarded. The Central Management System (CMS) handles all communication between clients and the central services. Some additional Determina security checks and some details of patch/repair creation are not shown.
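The central merge described above is performed by Daikon; as a rough illustration of the idea, the sketch below (hypothetical names, with each constraint represented as a (program point, predicate text) pair) keeps only the constraints reported by every learning client, so the merged set holds for all observed executions. Daikon's real merge is more sophisticated (it can weaken an invariant rather than discard it), so treat this as a minimal sketch only.

```python
# Minimal sketch of the central constraint merge. The real merge is done
# by Daikon; these names and the set-intersection semantics are a
# simplification for illustration.
def merge_constraints(client_constraint_sets):
    """Keep only the constraints reported by every learning client,
    so the merged set is true for all observed executions."""
    merged = set(client_constraint_sets[0])
    for constraints in client_constraint_sets[1:]:
        merged &= set(constraints)
    return merged

# Hypothetical constraints from two learning clients.
clients = [
    {("GET_handler:entry", "len(uri) <= 512"),
     ("GET_handler:entry", "uri is printable")},
    {("GET_handler:entry", "len(uri) <= 512")},
]
print(merge_constraints(clients))  # only the constraint seen by both clients
```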

3 Current Work
[Same architecture diagram as the previous slide, with everything except data acquisition and learning grayed out.]

Current work is focused on data acquisition (binary instrumentation) and machine learning: the areas on the left that are not grayed out.

4 Progress
Preliminary instrumentation for data acquisition:
Determina client library enhanced to support data acquisition
Primitive and string parameters are logged
Pointers are followed one level (e.g., request.field)
Debug information is used
Optimizations implemented to reduce overhead
Daikon enhanced:
Security-specific invariants
Merge sample data from multiple executions
Community learning integration:
Instrumentation and learning have been tested together
Shared file system used for communication

The application is instrumented using the MPEE (DynamoRIO) client library; Determina made a number of enhancements to the client library to enable this work. We use the debug information generated when compiling the application to determine the types and locations of parameters. We have implemented a number of optimizations to improve performance, and more can (and will) be done. Daikon was enhanced with the maximum string length and printable string invariants (as discussed in October), and it was also modified to merge sample data from multiple executions. As the experiments will show, we have integrated these changes and have promising initial results.
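The two security-specific invariants named above can be illustrated with a short sketch. The function name and the dictionary representation are hypothetical; Daikon's actual invariant machinery works differently, this only shows what the two invariants assert about observed string values at one program point.

```python
import string

PRINTABLE = set(string.printable)

def infer_string_invariants(samples):
    """Infer the two security-specific invariants named on the slide
    from observed string values at a single program point (sketch):
    the maximum observed string length, and whether every observed
    string consisted only of printable characters."""
    max_len = max(len(s) for s in samples)
    all_printable = all(set(s) <= PRINTABLE for s in samples)
    return {"max_string_length": max_len, "printable": all_printable}

# Hypothetical samples of an HTTP request string at one program point.
print(infer_string_invariants(["GET /index.html", "GET /about"]))
# {'max_string_length': 15, 'printable': True}
```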

5 Integration Architecture
[Architecture diagram as used today: learning client workstations run Apache, the Client Library, Data Acquisition, and the MPEE, sending sample data through a shared file system to centralized Create Constraints (Daikon) and Patch/Repair Code Generation; the protected client workstations are unchanged.]

This figure shows the preliminary architecture being used today (again, the non-grayed-out areas). Preliminary data acquisition is working as planned. Constraints across multiple executions are being created, but constraint creation is centralized rather than implemented on each client, and a shared file system is being used in place of the Central Management System (CMS). Work to resolve each of these differences from the final architecture is ongoing. The CMS is currently being augmented to provide communications, and the data acquisition code is written in such a way that it can easily be switched over to the CMS when that support becomes available. Work is also being done on the CMS to support running Daikon on client workstations, which will distribute the workload and reduce the amount of data that needs to be transferred (calculated constraints are much smaller than the sample data).

6 Integration Experiments
Evaluate community effectiveness by comparing:
Learning from one copy of an application
Community-based learning (multiple executions)
Two experiments:
Overhead comparison
Accuracy comparison
Infrastructure:
Apache web server (HTTPD) on Windows
Variables are captured at function entry/exit
A community of ten or more executions of Apache is used

Each experiment compares a single execution against multiple community executions; we expect both lower overhead and greater accuracy from utilizing the community. These experiments are small (only ten community members and limited executions); both overhead and accuracy should improve further as the community grows.

7 Overhead Experiment
Baseline:
Instrument 100% of Apache
Time a sequence of HTTP GET operations
Daikon processes the single output file
Community learning:
Instrument a different 10% of Apache in 10 executions
Instrument a different 1% of Apache in 100 executions
Each execution creates a distinct trace of part of the program
The combined executions instrument all of Apache
Daikon processes all trace files

This experiment compares instrumentation overhead between a single execution of Apache with 100% of the functions instrumented and multiple executions with 10% and 1% of the functions instrumented.
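One way to realize the "a different 10% per execution" split above is a simple round-robin partition of the instrumented functions. This is only a sketch under that assumption; the slide does not say how the experiment actually assigns functions to executions.

```python
def assign_partitions(functions, n_clients):
    """Split the function list into n_clients disjoint slices so that,
    taken together, the community instruments 100% of the program while
    each execution instruments only 1/n_clients of it."""
    return [functions[i::n_clients] for i in range(n_clients)]

# Hypothetical list of 100 instrumentable functions, split 10 ways:
# each execution instruments 10%, and the union covers everything.
funcs = [f"func_{i}" for i in range(100)]
parts = assign_partitions(funcs, 10)
assert all(len(p) == 10 for p in parts)              # equal shares
assert set().union(*map(set, parts)) == set(funcs)   # full coverage
```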

8 Overhead Results
Community learning constraints match baseline constraints
Instrumentation overhead is reduced significantly

[Chart: total overhead, in milliseconds, added to Apache to service the requests; for the multiple-execution cases, the overhead shown is the average time per execution.]

As the chart shows, instrumenting only 10% of the program significantly reduces the added time (by almost 90%). Instrumenting only 1% reduces the overhead further, but it also shows that there is a fixed cost that cannot be eliminated by decreasing the percentage of the program that is instrumented. Note that we expect to be able to optimize the instrumentation to significantly reduce all of these times.

9 Accuracy Experiment
Baseline:
Instrument 100% of Apache
Capture data during one HTTP operation
Build constraints based on the captured data
Test constraints against data captured during all operations
Community learning:
Capture data during ten HTTP operations
Build constraints based on two operations, three operations, …, ten operations
Test each set of constraints against the data captured during all operations

This experiment compares the number of false positives found when learning takes place over one execution versus over multiple executions. The number of false positives is calculated by checking all of the captured samples (for all ten operations) against the constraints created from the subset of samples captured in one operation, two operations, three operations, and so on. Any sample that violates a constraint is a false positive.
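The false-positive count described above can be sketched as follows. The program point name and the example constraint are hypothetical; in the experiment the constraints come from Daikon.

```python
def count_false_positives(constraints, samples):
    """Count samples that violate a learned constraint at their program
    point; each such violation is a false positive, since every sample
    was drawn from a legitimate operation."""
    return sum(
        1
        for point, value in samples
        if any(pt == point and not pred(value) for pt, pred in constraints)
    )

# Hypothetical constraint learned from a subset of the operations:
# request URIs seen so far were at most 10 characters long.
constraints = [("GET_handler:entry", lambda uri: len(uri) <= 10)]
samples = [("GET_handler:entry", "/index"),
           ("GET_handler:entry", "/very/long/path")]
print(count_false_positives(constraints, samples))  # 1
```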

10 Accuracy Experiment Results
False positives are reduced as more community learning is used.

It's important to note that this is a very small experiment. Learning over more executions and for longer periods of time should, as these results seem to indicate, drive the number of false positives very low.

11 Possible Next Steps
Build constraints on the client and merge them centrally
Use CMS to provide communications:
On the client, between data acquisition and Daikon
Between Daikon on the client and central processing
Investigate approaches for data acquisition without debug information
Test constraints against known attacks
Implement simple repair algorithms

Not all of the above will be done immediately, but we expect significant progress in several of these areas.

