Binary Analysis for Botnet Reverse Engineering & Defense Dawn Song UC Berkeley.

1 Binary Analysis for Botnet Reverse Engineering & Defense
Dawn Song, UC Berkeley

2 Binary Analysis Is Important for Botnet Defense
Botnet programs: no source code, only binary
Botnet defense needs internal understanding of botnet programs
– C&C reverse engineering: different possible commands, encryption/decryption
– Botnet traffic rewriting
– Botnet infiltration
– Botnet vulnerability discovery

3 BitBlaze Binary Analysis Infrastructure: Architecture
The first infrastructure:
– Novel fusion of static, dynamic, formal analysis methods
    Loop-extended symbolic execution
    Grammar-aware symbolic execution
– Whole system analysis (including OS kernel)
– Analyzing packed/encrypted/obfuscated code
[Diagram: BitBlaze Binary Analysis Infrastructure, comprising Vine (Static Analysis Component), TEMU (Dynamic Analysis Component), and Rudder (Mixed Execution Component)]

4 BitBlaze: Security Solutions via Program Binary Analysis
[Diagram: BitBlaze Binary Analysis Infrastructure supporting Dissecting Malware, Detecting Vulnerabilities, Generating Filters]
Unified platform to accurately analyze security properties of binaries
– Security evaluation & audit of third-party code
– Defense against morphing threats
– Faster & deeper analysis of malware

5 The BitBlaze Approach & Research Foci
Semantics based, focus on root cause: automatically extracting security-related properties from binary code for effective vulnerability detection & defense
1. Build a unified binary analysis platform for security
– Identify & cater to common needs of different security applications
– Leverage recent advances in program analysis, formal methods, binary instrumentation/analysis techniques for new capabilities
2. Solve real-world security problems via binary analysis
– Extracting security-related models for vulnerability detection
– Generating vulnerability signatures to filter out exploits
– Dissecting malware for real-time diagnosis & offense: e.g., botnet infiltration
More than a dozen security applications & publications

6 Plans
Building on BitBlaze to develop new techniques:
– Automatic reverse engineering of C&C protocols of botnets
– Automatic rewriting of botnet traffic to facilitate botnet infiltration
– Vulnerability discovery of botnets

7 Preliminary Work
– Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering
– Binary code extraction and interface identification for botnet traffic rewriting
– Botnet analysis for vulnerability discovery

8 Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering
Juan Caballero, Pongsin Poosankam, Christian Kreibich, Dawn Song

9 Automatic Protocol Reverse-Engineering
Process of extracting the application-level protocol used by a program, without the specification
– Automatic process
– Many undocumented protocols (C&C, Skype, Yahoo)
Encompasses extracting:
1. the Protocol Grammar
2. the Protocol State Machine
Message format extraction is a prerequisite

10 Challenges for Active Botnet Infiltration
1. Understand both sides of C&C protocol
– Message structure
– Field semantics
2. Access to one side of dialog only
3. Handle encryption/obfuscation
Goal: Rewrite C&C messages on either dialog side

11 Technical Contributions
1. Buffer deconstruction, a technique to extract the format of sent messages
– Earlier work only handles received messages
2. Field semantics inference techniques, for messages sent and received
3. Designing and developing Dispatcher
4. Extending a technique to handle encryption
5. Rewriting a botnet dialog using information extracted by Dispatcher

12 Message Format Extraction
Extract format of a single message
Required by Grammar and State Machine extraction
[Diagram: example messages "GET / HTTP/1.1" and "HTTP/1.1 200 OK", with tool labels [Polyglot] and [Dispatcher]]

13 Message Field Tree
Example message: HTTP/1.1 200 OK\r\n\r\n
MSG [0:18]
    Status Line [0:16]
        Version [0:7]
        Delimiter [8:8]
        Status-Code [9:11]
        Delimiter [12:12]
        Reason [13:14]
        Delimiter [15:16]
    Delimiter [17:18]
Field attributes box: Field Range: [3:3] Field Boundary: Fixed Field Semantics: Delimiter Field Keywords: Target: Version
Message format extraction has 2 steps:
1. Extract tree structure
2. Extract field attributes
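The field tree on this slide can be written down as nested tuples and checked mechanically. The following is a minimal sketch, not Dispatcher's own data structure: the tuple layout and the helper names `leaves` and `children_tile_parent` are assumptions made for illustration, while the field names and inclusive byte ranges are taken from the slide.

```python
MESSAGE = b"HTTP/1.1 200 OK\r\n\r\n"

# Each node is (name, start, end, children); offsets are inclusive, as on the slide.
TREE = ("MSG", 0, 18, [
    ("Status Line", 0, 16, [
        ("Version", 0, 7, []),
        ("Delimiter", 8, 8, []),
        ("Status-Code", 9, 11, []),
        ("Delimiter", 12, 12, []),
        ("Reason", 13, 14, []),
        ("Delimiter", 15, 16, []),
    ]),
    ("Delimiter", 17, 18, []),
])


def leaves(node):
    """Return (field name, field bytes) for every leaf, left to right."""
    name, start, end, children = node
    if not children:
        return [(name, MESSAGE[start:end + 1])]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found


def children_tile_parent(node):
    """Check that each node's children cover its range with no gap or overlap."""
    _, start, end, children = node
    if not children:
        return True
    position = start
    for child in children:
        if child[1] != position:
            return False
        position = child[2] + 1
    return position == end + 1 and all(children_tile_parent(c) for c in children)


for field_name, field_bytes in leaves(TREE):
    print(field_name, field_bytes)
```

Concatenating the leaf fields reproduces the original 19-byte message, which is a quick sanity check on any extracted tree.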

14 Sent vs. Received
Both protocol directions from single binary
Different problems
– Taint information harder to leverage
– Focus on how message is constructed, not processed
Different techniques needed:
– Tree structure: Buffer Deconstruction
– Field attributes: New heuristics

15 Outline
Introduction
Problem
Techniques
– Buffer Deconstruction
– Field Semantics Inference
– Handling encryption
Evaluation

16 Buffer Deconstruction
Intuition
– Programs keep fields in separate memory buffers
– Combine those buffers to construct sent message
Output buffer
– Holds message when send function invoked
– Or holds unencrypted message before encryption
Recursive process
– Decompose a buffer into buffers used to fill it
– Starts with output buffer
– Stops when there is nothing left to recurse into

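17 Buffer Deconstruction
Message field tree = inverse of output buffer structure
Output is structure of message field tree
– No field attributes, except range
Example message: HTTP/1.1 200 OK\r\n\r\n
[Diagram, buffer side: Output Buffer (19) is filled from A(17) and B(2); A(17) is filled from C(8), D(1), E(3), F(1), G(2), H(2)]
[Diagram, field tree side: MSG [0:18] has children Status Line [0:16] and Delimiter [17:18]; Status Line has children Version [0:7], Delimiter [8:8], Status Code [9:11], Delimiter [12:12], Reason [13:14], Delimiter [15:16]]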
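The recursion described on these two slides can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Dispatcher's implementation: the `COPIES` log (which buffer was copied into which, at what offset) is a hypothetical stand-in for what a real execution trace would provide, and the left-to-right order of C through H inside A is assumed to be alphabetical because the flattened diagram does not preserve it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    start: int  # inclusive offset in the output message
    end: int    # inclusive offset in the output message
    children: list = field(default_factory=list)


# Hypothetical copy log: destination buffer -> [(offset in destination, source buffer, length)].
# Buffer names and sizes follow the slide; the order of C..H within A is an assumption.
COPIES = {
    "OUT": [(0, "A", 17), (17, "B", 2)],
    "A": [(0, "C", 8), (8, "D", 1), (9, "E", 3), (12, "F", 1), (13, "G", 2), (15, "H", 2)],
}


def deconstruct(buffer_name, size, base=0, copies=COPIES):
    """Decompose a buffer into the buffers used to fill it, recursively."""
    node = Node(buffer_name, base, base + size - 1)
    # Recursion stops when no recorded copy filled this buffer.
    for offset, source, length in sorted(copies.get(buffer_name, [])):
        node.children.append(deconstruct(source, length, base + offset, copies))
    return node


def dump(node, depth=0):
    """Render the tree as indented 'name [start:end]' lines."""
    lines = ["  " * depth + f"{node.name} [{node.start}:{node.end}]"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


tree = deconstruct("OUT", 19)
print("\n".join(dump(tree)))
```

The resulting ranges match the field tree on the slide, which carries only ranges at this stage; field attributes are inferred in a separate step.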

18 Field Attributes Inference
Attributes capture extra information
– E.g., inter-field relationships
Attribute          Value
Field Range        [StartOffset : EndOffset]
Field Boundary     Fixed, Length, Delimiter
Field Semantics    IP address, Timestamp, …
Field Keywords
Techniques identify
– Keywords
– Length fields
– Delimiters
– Variable-length field
– Arrays

19 Field Semantics
A field attribute in the message field tree
Captures the type of data in the field
Semantics: Cookies, Error codes, File data, File information, Filenames, Hash / Checksum, Hostnames, Host information, IP addresses, Keyboard input, Keywords, Length, Padding, Ports, Registry data, Sleep timers, Stored data, Timestamps
Programs contain much semantic info: leverage it!
Semantics in well-defined functions and instructions
– Prototype
Similar to type inference
Differs for received and sent messages

20 Field Semantic Inference
Received message: GET /index.html HTTP/1.1
Sent message: HTTP/1.1 200 OK Content-Length: 25 Hello world!
Function prototype: int stat(const char *path, struct stat *buf);   (path: IN, buf: OUT)
    struct stat { … off_t st_size; /* total size in bytes */ … }
Call: stat(index.html, &file_info);
Semantics labels in the diagram: File path, File length
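One way to read this slide: the known prototype of `stat` says its `path` argument is a file path and its `st_size` output is a file length, so message bytes that flow into or out of those slots inherit the label. The sketch below illustrates that idea only. The event tuples, the two lookup tables, and the helpers `span` and `label_fields` are hypothetical; in a real system the byte ranges would come from taint tracking in the dynamic analysis rather than from string search.

```python
REQUEST = b"GET /index.html HTTP/1.1"
RESPONSE = b"HTTP/1.1 200 OK\r\nContent-Length: 25\r\n\r\n"

# Hypothetical knowledge base derived from function prototypes.
IN_SEMANTICS = {("stat", "path"): "File path"}        # input parameters
OUT_SEMANTICS = {("stat", "st_size"): "File length"}  # output values


def span(message, token):
    """Inclusive (start, end) offsets of the first occurrence of token."""
    start = message.index(token)
    return (start, start + len(token) - 1)


def label_fields(events, table):
    """Map message byte ranges to semantics using (function, slot, range) events."""
    labels = {}
    for function, slot, byte_range in events:
        semantics = table.get((function, slot))
        if semantics is not None:
            labels[byte_range] = semantics
    return labels


# Assumed trace events: request bytes reached stat()'s path argument, and the
# value written into the response was derived from st_size.
received_events = [("stat", "path", span(REQUEST, b"index.html"))]
sent_events = [("stat", "st_size", span(RESPONSE, b"25"))]

received_labels = label_fields(received_events, IN_SEMANTICS)
sent_labels = label_fields(sent_events, OUT_SEMANTICS)
print(received_labels, sent_labels)
```

The two directions use different tables because, as the previous slide notes, inference differs for received and sent messages: received fields are labeled by where they are consumed, sent fields by where their values originate.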

21 Detecting Encoding Functions
Encoding functions = (de)compression, (de)(en)cryption, (de)obfuscation…
– High ratio of arithmetic & bitwise instructions
– Use read/write set to identify buffers
Work in progress on extracting and reusing encoding functions
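The ratio heuristic can be illustrated on a list of instruction mnemonics. This is a rough sketch, not the detector used by Dispatcher: the mnemonic set, the 0.55 threshold, and the minimum length of 20 instructions are all assumed values chosen for the example.

```python
# Assumed set of x86 mnemonics counted as arithmetic or bitwise.
ARITH_BITWISE = {
    "add", "adc", "sub", "sbb", "mul", "imul", "neg",
    "xor", "and", "or", "not", "shl", "shr", "sar", "rol", "ror",
}


def encoding_ratio(mnemonics):
    """Fraction of instructions that are arithmetic or bitwise."""
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m.lower() in ARITH_BITWISE)
    return hits / len(mnemonics)


def looks_like_encoding(mnemonics, threshold=0.55, min_len=20):
    """Flag a function whose executed instructions are mostly arithmetic/bitwise.

    threshold and min_len are illustrative assumptions, not published values.
    """
    return len(mnemonics) >= min_len and encoding_ratio(mnemonics) >= threshold


# Toy traces: a cipher-like loop body and a plain copy loop, each repeated.
cipher_like = ["mov", "xor", "rol", "add", "xor", "shl", "or", "sub", "dec", "jnz"] * 3
copy_loop = ["mov", "mov", "inc", "cmp", "jne"] * 6

print(encoding_ratio(cipher_like), looks_like_encoding(cipher_like))
print(encoding_ratio(copy_loop), looks_like_encoding(copy_loop))
```

A minimum length is included so that very short functions, where a couple of arithmetic instructions dominate the ratio, are not flagged.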

22 MegaD C&C protocol
C&C on tcp/443 using proprietary encryption
Use Dispatcher's output to generate grammar
– 15 different messages seen (7 recv, 8 sent)
– 11 field semantics
Grammar:
    type MegaD_Message = record {
        msg_len : uint16;
        encrypted_payload: bytestring &length = 8*msg_len;
    } &byteorder = bigendian;
    type encrypted_payload = record {
        version : uint16;
        mtype : uint16;
        data : MegaD_data (mtype);
    };
    type MegaD_data (msg_type: uint16) = case msg_type of {
        0x00 -> m00 : msg_0;
        […]
        default -> unknown : bytestring &restofdata;
    };
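The outer framing in this grammar is simple enough to sketch directly: a big-endian 16-bit `msg_len` followed by `8*msg_len` bytes of encrypted payload. The sketch below covers only that framing and the two fixed header fields of the payload. The function names are hypothetical, the cipher is deliberately not modeled (the slide says only that it is proprietary), and applying big-endian byte order to the inner record is an assumption.

```python
import struct


def split_frame(stream: bytes):
    """Split one MegaD_Message off the front of a byte stream.

    Returns (encrypted_payload, remaining_bytes).
    """
    if len(stream) < 2:
        raise ValueError("need at least 2 bytes for msg_len")
    (msg_len,) = struct.unpack_from(">H", stream, 0)  # uint16, big-endian
    end = 2 + 8 * msg_len                             # payload is 8*msg_len bytes
    if len(stream) < end:
        raise ValueError("truncated payload")
    return stream[2:end], stream[end:]


def parse_payload(plaintext: bytes):
    """Parse an already-decrypted payload: version, mtype, then type-specific data.

    Assumes the inner record is also big-endian; decryption is out of scope here.
    """
    if len(plaintext) < 4:
        raise ValueError("payload too short for version and mtype")
    version, mtype = struct.unpack_from(">HH", plaintext, 0)
    return {"version": version, "mtype": mtype, "data": plaintext[4:]}


# Usage with a made-up plaintext payload standing in for a decrypted one.
example_plain = struct.pack(">HH", 1, 0) + b"\x00" * 4   # 8 bytes, so msg_len = 1
example_stream = struct.pack(">H", 1) + example_plain + b"rest"
payload, remainder = split_frame(example_stream)
print(parse_payload(payload), remainder)
```

Because the payload length is a multiple of 8 bytes, `msg_len` counts 8-byte blocks rather than bytes, so a frame is always `2 + 8*msg_len` bytes long.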

23 MegaD Dialog
[Diagram: participants C&C Server and SMTP Test Server; message labels Cmd?, Test SMTP, EHLO, Failed]

24 MegaD Rewriting
[Diagram: participants C&C Server, Template Server, and SMTP Test Server; message labels Cmd?, Test SMTP, EHLO, Failed, Success, Template?, Get Template; annotation Grammar]

25 Summary
– Buffer deconstruction, a technique to extract the format of sent messages
– Field semantics inference techniques, for messages sent and received
– Designed and developed Dispatcher
– Extended technique to handle encryption
– Rewrote MegaD dialog using information extracted by Dispatcher
