Leveraging Textual Specifications for Grammar-based Fuzzing of Network Protocols Samuel Jero, Maria Leonor Pacheco, Dan Goldwasser, Cristina Nita-Rotaru.

Motivation
Network protocol implementations have a long history of bugs and attacks.
Testing protocol implementations by injecting packets.
Grammar-based fuzzing: use the protocol logic when injecting packets.
A case for NLP: grammar-based fuzzing requires a large manual effort to work correctly.
Can we automate it?

Computer networks are at the center of most, if not all, modern applications. There are thousands of different network protocol implementations; they are complex, highly optimized, and have a long history of implementation bugs and attacks. Protocol fuzzing is one of the techniques used to test them. The main idea behind fuzzing is to generate packets and inject them into the protocol stream to find bugs and other vulnerabilities. For this to succeed, the generated packets need to be crafted carefully. With this in mind, grammar-based fuzzing attempts to generate attacks based on a grammar that encodes protocol semantics. The main drawback of this method is that it requires a lot of manual effort: reading all the specification documents and providing the grammar as input. The main goal of our work is to automate this process by relying on NLP techniques.

Grammar-based Fuzzing
Example: TCP packet header.
A TCP packet contains a checksum that covers the rest of the header. To test for other vulnerabilities, an injected packet first needs to pass the checksum check.
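
The checksum constraint can be made concrete with a small sketch. The code below is not taken from the talk or from any fuzzer; it is a minimal illustration, assuming IPv4, a bare 20-byte TCP header with no options, and hypothetical helper names (`with_valid_checksum`, `fuzz_urgent_pointer`). It mutates one field and then recomputes the standard Internet checksum so that the receiver does not drop the packet before the mutated field is ever examined.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by TCP, UDP, and IP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pseudo_header(src_ip: bytes, dst_ip: bytes, tcp_length: int) -> bytes:
    """IPv4 pseudo-header that TCP includes in its checksum (protocol 6)."""
    return struct.pack("!4s4sBBH", src_ip, dst_ip, 0, 6, tcp_length)

def with_valid_checksum(segment: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    """Hypothetical helper: rewrite the checksum field (bytes 16-17)."""
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, len(zeroed)) + zeroed)
    return zeroed[:16] + struct.pack("!H", csum) + zeroed[18:]

def fuzz_urgent_pointer(segment: bytes, value: int,
                        src_ip: bytes, dst_ip: bytes) -> bytes:
    """Hypothetical mutation: overwrite the urgent pointer (bytes 18-19),
    then repair the checksum so the packet is still accepted."""
    mutated = segment[:18] + struct.pack("!H", value) + segment[20:]
    return with_valid_checksum(mutated, src_ip, dst_ip)

# Example values are made up for illustration.
src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
header = struct.pack("!HHIIBBHHH",
                     12345, 80,      # source port, destination port
                     1, 0,           # sequence number, acknowledgment number
                     5 << 4, 0x20,   # data offset (5 words), URG flag
                     65535, 0, 0)    # window, checksum, urgent pointer
fuzzed = fuzz_urgent_pointer(header, 0xFFFF, src, dst)
```

A receiver validates the segment by checksumming the pseudo-header plus the segment, checksum field included, and expecting zero. A random fuzzer that flips bytes without this repair step would mostly exercise the checksum check and nothing behind it.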

Manual Grammar Extraction
The effectiveness of fuzzing depends on correctly capturing the protocol logic.
Different protocols have different logic.
Specifying a protocol grammar takes manual effort.
Untapped resource: natural language specification documents (RFCs).
Two goals:
  Minimize the manual effort used.
  Adapt to new protocols without re-training.

Automating Grammar Extraction
A grammar is composed of a set of fields that correspond to the header, and properties associated with those fields. We identify two NLP problems:
Type Extraction: given a protocol document, extract the set of protocol field and property symbols.
Symbol Identification and Linking: identify mentions of these symbols in text, and link field mentions to their relevant properties.
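
One way to picture such a grammar is as a list of header fields, each carrying zero or more property labels. The sketch below is only an illustration: the class names, the `fields_with` helper, and the property labels (`checksum`, `header_length`, `port`) are assumptions made for this example, not the format used by the authors or by any particular fuzzer. The field names and bit widths are those of the TCP header.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HeaderField:
    name: str                   # field symbol, e.g. "Checksum"
    bits: int                   # width of the field in the header
    properties: List[str] = field(default_factory=list)  # linked property symbols

@dataclass
class Grammar:
    protocol: str
    fields: List[HeaderField]

    def fields_with(self, prop: str) -> List[str]:
        """Names of all fields linked to the given property."""
        return [f.name for f in self.fields if prop in f.properties]

# A partial TCP grammar; the property labels are hypothetical.
tcp = Grammar("TCP", [
    HeaderField("Source Port", 16, ["port"]),
    HeaderField("Destination Port", 16, ["port"]),
    HeaderField("Data Offset", 4, ["header_length"]),
    HeaderField("Checksum", 16, ["checksum"]),
    HeaderField("Urgent Pointer", 16),
])

print(tcp.fields_with("checksum"))       # fields a fuzzer must recompute
print(tcp.fields_with("header_length"))  # fields that bound the header
```

In this picture, type extraction fills in the field and property symbols, and linking decides which property labels attach to which field.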

Zero-Shot Learning for Symbol Identification
A fully supervised approach (Chunk -> Type of Mention) would require a separate classifier for each protocol.
ZSL approach: learn a mapping {Type T, Chunk} -> {t, f}, from a tuple containing input and output to a Boolean value indicating whether the pair is correct.
Learn a similarity function between textual phrases and protocol symbols.
This approach adapts to new, unseen protocols.
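
The pairwise formulation can be sketched in a few lines. Everything below is an assumption made for illustration: the two surface features, the perceptron update, and the tiny training set are stand-ins, not the authors' feature set, learner, or data. What the sketch does show is the shape of the idea: the classifier scores a (chunk, symbol) pair rather than predicting a symbol directly, so symbols that never appeared in training can still be scored.

```python
def trigrams(text):
    """Set of lowercase character trigrams."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def pair_features(chunk, symbol):
    """Features of the pair; none of them depends on the symbol's identity."""
    a, b = trigrams(chunk), trigrams(symbol)
    char_sim = len(a & b) / len(a | b) if a | b else 0.0
    chunk_tokens = set(chunk.lower().split())
    symbol_tokens = symbol.lower().split()
    covered = sum(t in chunk_tokens for t in symbol_tokens) / len(symbol_tokens)
    return [1.0, char_sim, covered]  # bias, character similarity, token coverage

def score(w, chunk, symbol):
    return sum(wi * fi for wi, fi in zip(w, pair_features(chunk, symbol)))

def train(pairs, epochs=10):
    """Plain perceptron over (chunk, symbol, is_correct) triples."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for chunk, symbol, label in pairs:
            y = 1.0 if label else -1.0
            if y * score(w, chunk, symbol) <= 0:  # misclassified: update
                f = pair_features(chunk, symbol)
                w = [wi + y * fi for wi, fi in zip(w, f)]
    return w

def is_mention(w, chunk, symbol):
    """The learned mapping {symbol, chunk} -> {True, False}."""
    return score(w, chunk, symbol) > 0

# Made-up training pairs using TCP-style symbols.
training_pairs = [
    ("the source port number", "Source Port", True),
    ("the source port number", "Checksum", False),
    ("checksum field", "Checksum", True),
    ("checksum field", "Window", False),
    ("the window size", "Window", True),
    ("the window size", "Source Port", False),
]
w = train(training_pairs)

# Symbols from a protocol that was not in the training pairs.
print(is_mention(w, "the data offset", "Data Offset"))
print(is_mention(w, "the data offset", "Sequence Number"))
```

Surface features like these cannot resolve a chunk such as "This field", which shares no characters with the symbol it refers to; that case needs features drawn from the surrounding context.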

Example of ZSL approach for field mentions
Protocol field symbols: Source Port, Destination Port, …, Checksum, Urgent Pointer, Options
Chunked text: [This field] [is] [only] [be interpreted] [in] [segments] [with] [the URG control bit set]
Candidate pairs for the chunk [This field]: (This field, Source Port), (This field, Dest. Port), (This field, Checksum), (This field, Urgent Pointer), (This field, Options)

System Design
[Figure: system design. Training on TCP, SCTP, IPv6, IP, GRE: Pre-process -> Extract Types -> Train Classifier -> Model. Applying the model to DCCP: Pre-process -> Extract Types -> Model -> Post-process -> Protocol Grammar -> Fuzzer.]

Extraction Example
Text: [The offset] from the start of the packet’s DCCP [header] to the start of its application data area, in 32-bit words.
Extract: Header_Length(Data_Offset)

Intrinsic Evaluation: Information Extraction
Our approach: a linear classifier that learns a similarity metric between text chunks and symbol types, considering character-level similarity, writing style, context words, etc.
Baselines:
  Overlap between symbol types and text chunks.
  Rule-based systems that use our feature set:
    RB1: weight each feature by frequency of occurrence.
    RB2: weight each feature by majority vote.

Intrinsic Evaluation: Information Extraction
Table 1: Field Mentions
Table 2: Property Mentions
Properties can span several chunks.
* S-TPR: Span true positive rate
* C-FPR: Chunk false positive rate

The tradeoff in overlap systems is very clear: the higher the required degree of overlap, the higher the precision and the lower the recall. As we become more permissive, recall increases but precision suffers. We can attribute this to the fact that only certain features are good indicators, depending on the case. For this reason, we can benefit from a classifier.
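
The overlap tradeoff is easy to reproduce on a toy example. The sketch below assumes one particular definition of "degree of overlap" (the fraction of a symbol's tokens that appear in the chunk); the slides do not say how the baseline measures overlap, and the chunks and symbols are made up.

```python
def token_overlap(chunk, symbol):
    """Fraction of the symbol's tokens that also occur in the chunk."""
    chunk_tokens = set(chunk.lower().split())
    symbol_tokens = set(symbol.lower().split())
    return len(chunk_tokens & symbol_tokens) / len(symbol_tokens)

def overlap_baseline(chunks, symbols, threshold):
    """Predict a mention for every pair whose overlap reaches the threshold."""
    return {(c, s) for c in chunks for s in symbols
            if token_overlap(c, s) >= threshold}

chunks = ["the source port", "the port", "this field"]
symbols = ["Source Port", "Destination Port"]

strict = overlap_baseline(chunks, symbols, threshold=1.0)
loose = overlap_baseline(chunks, symbols, threshold=0.5)

print(sorted(strict))  # only the exact match survives
print(sorted(loose))   # "port" alone now matches both symbols
```

At the strict threshold the single prediction is correct, but "the port" is not linked to anything. At the loose threshold every chunk containing "port" is linked to both symbols, so more true mentions can be found at the cost of false positives. No single threshold separates the cases, which is the argument for learning how much each feature should count.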

Extrinsic Evaluation: Fuzzer
Use an NLP pipeline to extract protocol grammars in a real scenario.
SNAKE: state-of-the-art grammar-based fuzzer for network transport protocols.
Evaluated approaches:
  Random: no information about the protocol grammar.
  Manual: manually created protocol grammar.
  NLP-based: automatically extracted protocol grammar.

Extrinsic Evaluation: SNAKE Fuzzer
Columns, for each of TCP and DCCP: Unique Pkt Type Traces, Total Strategies, Interesting Attacks, Unique Attacks
Random: 13 1000 18
Manual: TCP 784, 901, 63, 5; DCCP 718, 871, 44, 2
NLP: 713 819 69 816 1022 47
Effort (blue): Random vs Knowledge
Generated attacks (orange): Manual vs NLP

A packet type trace records the order in which different types of packets are observed in a flow. Thus, a packet type trace succinctly summarizes a protocol connection and approximates the path traversed through the code. To effectively test a protocol, as many unique connections, or code paths, as possible should be explored.

The next two metrics that we show correspond to the generated attacks. There are a number of attacks that we consider irrelevant for these protocols, given that TCP and DCCP do not attempt to protect against them. For this reason, we focus on the number of off-path, or interesting, attacks generated. Many of these attacks test the same underlying vulnerability. For this reason, we perform a manual analysis of all reported attacks and identify the number of unique attacks generated.
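
The packet type trace metric can be sketched as follows. The packet representation (a dict with a `type` key) and the example flows are assumptions made for illustration; in particular, whether repeated packet types are collapsed is not stated in the talk, so this sketch keeps every packet.

```python
def packet_type_trace(flow):
    """Order in which packet types were observed in one flow."""
    return tuple(pkt["type"] for pkt in flow)

def unique_traces(flows):
    """Distinct traces across runs: a rough proxy for distinct code paths."""
    return {packet_type_trace(flow) for flow in flows}

# Made-up flows; payloads differ but only the packet types matter.
flows = [
    [{"type": "SYN"}, {"type": "SYN+ACK"}, {"type": "ACK"}, {"type": "FIN"}],
    [{"type": "SYN"}, {"type": "SYN+ACK"}, {"type": "ACK"}, {"type": "FIN"}],
    [{"type": "SYN"}, {"type": "RST"}],
]

print(len(unique_traces(flows)))  # the first two flows share one trace
```

Two test runs that produce the same trace count once, so the number of unique traces rewards strategies that push the implementation down new paths rather than repeating the same exchange.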

Summary and Conclusions
Proposed a methodology for information extraction from technical documents using domain adaptation and minimal supervision.
Built an NLP framework to extract grammars from natural language specification documents and combined it with a state-of-the-art grammar-based fuzzer.
Compared our extracted grammar to manual grammars on two protocols and identified the same set of unique attacks in a fully automated manner.
Promising research direction to achieve effective automated testing.

We find this to be a promising direction to generalize testing strategies that reduce manual effort.

Thank You