Presentation is loading. Please wait.

Presentation is loading. Please wait.

Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd.

Similar presentations


Presentation on theme: "Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd."— Presentation transcript:

1 Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd AustinValeria Bertacco CGO, Chamonix, France April 6, 2011

2 NIST: SW errors cost U.S. ~$60 billion/year as of 2002 FBI CCS: Security Issues $67 billion/year as of 2005  >⅓ from viruses, network intrusion, etc. Software Errors Abound 2

3 Goals of this Work High quality dynamic software analysis  Find difficult bugs that other analyses miss Distribute Tests to Large Populations  Low overhead or users get angry Accomplished by sampling the analyses  Each user only tests part of the program 3

4 Dynamic Dataflow Analysis Associate meta-data with program values Propagate/Clear meta-data while executing Check meta-data for safety & correctness Forms dataflows of meta/shadow information 4

5 a += y z = y * 75 y = x * 1024 w = x + 42 Check w Example Dynamic Dataflow Analysis 5 validate(x) x = read_input() Clear a += y z = y * 75 y = x * 1024 x = read_input() Propagate Associate Input Check a Check z Data Meta-data

6 Distributed Dynamic Dataflow Analysis 6 Split analysis across large populations  Observe more runtime states  Report problems developer never thought to test Instrumented Program Potential problems

7 Taint Analysis (e.g.TaintCheck) Dynamic Bounds Checking FP Accuracy Verification Symbolic Execution Data Race Detection (e.g. Helgrind) Memory Checking (e.g. Dr. Memory) Problem: DDAs are Slow 7 10-200x 2-200x 10-80x 5-50x 100- 500x 2-300x

8 Our Solution: Sampling 8 Lower overheads by skipping some analyses Complete Analysis No Analysis No Analysis

9 Sampling Allows Distribution 9 Developer Beta TestersEnd Users Many users testing at little overhead see more errors than one user at high overhead.

10 a += y z = y * 75 y = x * 1024 w = x + 42 Cannot Naïvely Sample Code 10 Validate(x) x = read_input() a += y y = x * 1024 False Positive False Negative w = x + 42 validate(x) x = read_input() Skip Instr. Input Check w Check z Skip Instr. Check a

11 Sampling must be aware of meta-data Remove meta-data from skipped dataflows  Prevents false positives Our Solution: Sample Data, not Code 11

12 a += y z = y * 75 y = x * 1024 w = x + 42 Dataflow Sampling Example 12 validate(x) x = read_input() a += y y = x * 1024 False Negative x = read_input() Skip Dataflow Input Check w Check z Skip Dataflow Check a

13 Mechanisms for Dataflow Sampling (1) 13 Start with demand analysis Demand Analysis Tool Native Application Native Application Instrumented Application (e.g. Valgrind) Instrumented Application (e.g. Valgrind) Instrumented Application (e.g. Valgrind) Operating System Meta-data No meta-data

14 Mechanisms for Dataflow Sampling (2) 14 Remove dataflows if execution is too slow Sampling Analysis Tool Instrumented Application Instrumented Application Native Application Instrumented Application Instrumented Application Operating System Meta-data Clear meta-data OH Threshold

15 Prototype Setup Taint analysis sampling system  Network packets untrusted Xen-based demand analysis  Whole-system analysis with modified QEMU Overhead Manager (OHM) is user-controlled 15 Xen Hypervisor OS and Applications App … Linux Shadow Page Table Admin VM Taint Analysis QEMU Net Stack OHM

16 Benchmarks Performance – Network Throughput  Example: ssh_receive Accuracy of Sampling Analysis  Real-world Security Exploits 16 NameError Description ApacheStack overflow in Apache Tomcat JK Connector EggdropStack overflow in Eggdrop IRC bot LynxStack overflow in Lynx web browser ProFTPDHeap smashing attack on ProFTPD Server SquidHeap smashing attack on Squid proxy server

17 ssh_receive Performance of Dataflow Sampling 17 Throughput with no analysis

18 Accuracy at Very Low Overhead Max time in analysis: 1% every 10 seconds Always stop analysis after threshold  Lowest probability of detecting exploits 18 NameChance of Detecting Exploit Apache100% Eggdrop100% Lynx100% ProFTPD100% Squid100%

19 Accuracy with Background Tasks ssh_receive running in background 19

20 Conclusion & Future Work Dynamic dataflow sampling gives users a knob to control accuracy vs. performance Better methods of sample choices Combine static information New types of sampling analysis 20

21 BACKUP SLIDES 21

22 Outline 22 Software Errors and Security Dynamic Dataflow Analysis Sampling and Distributed Analysis Prototype System Performance and Accuracy

23 Detecting Security Errors Static Analysis  Analyze source, formal reasoning +Find all reachable, defined errors –Intractable, requires expert input, no system state Dynamic Analysis  Observe and test runtime state +Find deep errors as they happen –Only along traversed path, very slow 23

24 Security Vulnerability Example Buffer overflows a large class of security vulnerabilities 24 void foo() { int local_variables; int buffer[256]; … buffer = read_input(); … return; } Return address Local variables buffer Buffer Fill New Return address Bad Local variables If read_input() reads 200 ints If read_input() reads >256 ints buffer

25 netcat_receive Performance of Dataflow Sampling (2) 25 Throughput with no analysis

26 ssh_transmit Performance of Dataflow Sampling (3) 26 Throughput with no analysis

27 netcat_receive running with benchmark Accuracy with Background Tasks 27

28 Width Test 28


Download ppt "Highly Scalable Distributed Dataflow Analysis Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan Chelsea LeBlancTodd."

Similar presentations


Ads by Google