Network Traffic Analysis - SiLK

Paul Krystosek, PhD, and Matthew Heckathorn

Notices Copyright 2015 Carnegie Mellon University This material is based upon work funded and supported by the Department of Defense under Contract No. FA C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. [Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution. This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at Carnegie Mellon®, CERT®, CERT Coordination Center® and FloCon® are registered marks of Carnegie Mellon University. DM

Housekeeping
Restrooms are past the registration desk.
Breaks and lunch are in the same location.
Follow the exit signs in case of emergency.
Ask questions at any time; don’t be shy.

Conference WIFI SSID: FLOCON17 PW: SEI20171

Course Objectives
At the end of this module, you will have the knowledge and skills needed to perform the following tasks:
Name the major components of SiLK.
Retrieve network flow records using the rwfilter command.
Manipulate network flow records using basic SiLK commands.
Analyze traffic and profile a network using basic SiLK commands.
©2014 Carnegie Mellon University

Agenda
Network flow
  What is network flow
  Interpreting flow records
  SiLK commands
Basic SiLK tools
  SiLK Records, Files, and the Repository
  Analysis Tools and Categorization
  IP Sets

Schedule
9:00 AM   SiLK Part 1 of 4: Basics of Network Flow and Unix Commands
10:45 AM  Break
11:00 AM  SiLK Part 2 of 4: Basics of SiLK
12:30 PM  Lunch
1:30 PM   SiLK Part 3 of 4: Network flow analysis with SiLK
3:15 PM   Break
3:30 PM   SiLK Part 4 of 4: More network flow analysis with SiLK
5:00 PM   Adjourn
6:00 PM   Welcome Reception (near reception)

Setting up your analysis environment
SSH to flocon.cloudapp.net
Username: demo
Password: flocon2017
ssh -o PubkeyAuthentication=no

Analysis environment continued

Analysis environment – Account creation
Create your own application account Remember your information!

Analysis environment – Account information
Accounts last for 14 days.
The service will be shut down on 1/25/2017.
You are limited to 2 GB of hard drive space. Exceeding this limit will cause your account to be wiped.
Do not store anything of value on the server! All information will be wiped.

Analysis environment – SiLK Training Image

Analysis environment – At the Prompt

Exercise 0: *NIX
PS1='\W \!> '   # this is not permanent
export SILK_IPV6_POLICY=asv4
cd /data/
ls -l silk.conf
less silk.conf   # type “q” to exit from less
cd
PS1 is Prompt String 1, and controls bash’s primary prompt string.
SILK_IPV6_POLICY controls how IPv6 addresses are displayed and determines how wide IP text fields are.
If SiLK is compiled with local time support instead of UTC, the following command will make SiLK behave with UTC times in commands:
export TZ=UTC0

Part I: Network Flow

Part I Lessons: Network Flow
What is Network Flow? Interpreting Flow Records Issuing SiLK Commands

Lesson I.1 What is Network Flow?

Lesson I.1 Learning Objective
Given a sequence of packets and some basic knowledge of packets, the learner will be able to identify the uniflows comprising the packets.

https://malakawijeweera.wordpress.com/2011/02/06/123/
What is Network Flow?
A log of all network activity; not a recording of all packets
A record of metadata from related packets, similar to a phone bill (call detail record)
Content of messages is not recorded, which means:
  much, much more compact
  longer retention
  less processing
  increased privacy
  less impact from encryption

What SiLK Does
Investigation analysis
  most useful for analysing past network events
  may feed an automated report generator
  good for forensics (what happened before the incident?)
Descriptive analysis – profiling/categorizing
Directed analysis (hunt) – looking for specific malicious behavior
Exploratory analysis – looking for the unusual
Predictive analysis

Did you ever wonder…
What’s on my network?
What happened before the event?
Where are policy violations occurring?
What are the most popular web servers?
By how much would volume be reduced with a blacklist?
Do my users browse to known infected web servers?
Do I have a spammer on my network?
When did my web server stop responding to queries?
Who uses my public servers?
Consider the many different ways that flow can be used as part of your overall security suite. It can support forensic and historical analysis, monitor your infrastructure, help you gain situational awareness about your network, and locate some classes of security events. All these questions, and more, can be answered without examining payload content.
Some policy violations that can be detected with network flow:
Attempts to contact blocked sites and/or services
Inside hosts as sources of inbound traffic (type int2int)
Attempts to contact an open network site from a closed network. This could be seen when an open-network host is moved to a closed network.
Exceeding bandwidth restrictions
Insiders scanning an outside network
Attempts to access known proxies, open DNS servers, or Tor

Unidirectional Flows (Uniflows)
[Figure: numbered, color-coded packets travelling in both directions on a network cable]
Each rounded rectangle represents a packet. The picture represents packets flowing in both directions on a network cable. Same-colored packets belong to the same flow. Packets in the same flow never travel in opposite directions, so one TCP connection consists of two flows. The red and green flows appear to be related in time; they might constitute a single TCP connection (session).
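The grouping of packets into uniflows can be sketched with standard text tools. This is a minimal illustration, not how a SiLK sensor works internally: the function name uniflows, the one-packet-per-line input format, and the sample addresses are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical input format, one packet per line:
#   srcIP dstIP srcPort dstPort proto bytes
# Packets sharing the same ordered 5-tuple form one uniflow; the reverse
# direction has a different 5-tuple and so becomes a separate uniflow.
uniflows() {
  awk '{
         key = $1 "|" $2 "|" $3 "|" $4 "|" $5
         if (!(key in pkts)) order[++n] = key   # remember first-seen order
         pkts[key]++
         bytes[key] += $6
       }
       END {
         # one output line per uniflow: 5-tuple, packet count, byte count
         for (i = 1; i <= n; i++)
           print order[i] "|" pkts[order[i]] "|" bytes[order[i]]
       }'
}

# Illustrative packets (made-up addresses): a DNS query and its response.
printf '%s\n' \
  '10.0.0.5 10.0.0.1 50744 53 17 64' \
  '10.0.0.1 10.0.0.5 53 50744 17 80' | uniflows
```

The query and the response share the same pair of hosts and ports, yet they come out as two records because the source and destination are swapped.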

Packet Encapsulation
[Figure: nested headers. Ethernet frame: Dest MAC address, Source MAC address, Type of packet. IP datagram (packet): Src IP address, Dst IP address, Type of segment. Transport segment: Src port, Dest port, Flags. Application layer message (HTTP, SMTP, DNS).]
Packets are like envelopes within envelopes. The innermost envelope contains a message. On Ethernet links, an Ethernet frame carries an IP datagram, which carries a TCP or UDP segment, which carries a message for a network application, such as HTTP (WWW), SMTP (email), or DNS. On non-Ethernet links there is still a frame, but it differs from an Ethernet frame. SiLK does NOT examine the frame.
Ethernet and IP each have a field that describes the type of payload they carry. The field labeled “type of segment” here is IP’s protocol field; SiLK just calls it “protocol.” TCP and UDP have no such field. Instead they have port numbers, which serve double duty as an address (like an apartment number) and as a weak indication of the payload type.

Two TCP/IP Sockets Make a Connection
[Figure: a client process and a server process joined by a connection. Client socket: IP address, L4 protocol TCP, ephemeral port number. Server socket: IP address, L4 protocol TCP, well-known port number.]
A TCP/IP socket is not a physical jack. A TCP/IP socket is a logical entity representing the resources at one end of a TCP/IP connection.
A TCP/IP socket is identified by three items:
(1) An IP address, which identifies a physical interface on a particular host
(2) A Layer 4 (Transport layer) protocol, e.g., TCP or UDP
(3) A port number, which together with the L4 protocol identifies a particular process or service within the host
The L4 protocol must be the same for both sockets in the connection. The IP address may be the same or different between the two sockets. The triple (IP address, L4 protocol, port) must be different between the two sockets in the connection. One socket may participate in multiple connections, particularly on servers; however, the 5-tuple (source IP, destination IP, source port, destination port, L4 protocol) cannot be the same for two concurrent connections.
It doesn’t make a lot of sense to talk about connections in a connectionless protocol like UDP, so we prefer to call these bidirectional flows.

Network Flow versus NetFlow
Network Flow: a generic term for the summarization of packets related to the same flow or connection into a single record.
NetFlow: a term that Cisco coined to describe a set of format specifications for storing network flow information in a digital record. Versions 5 and 9 are the most common. Often used interchangeably with Network Flow. Also written as netflow.
IPFIX: a format specification from the IETF for flow records, an extension of Cisco NetFlow v9.
SiLK: another set of format specifications for flow records and other related data, plus the tool suite to process that data.

What’s in a Record?
Fields found to be useful in analysis:
source address, destination address
source port, destination port (Internet Control Message Protocol [ICMP] type/code)
IP [transport] protocol
bytes, packets in flow
accumulated TCP flags (all packets, first packet)
start time, duration (milliseconds)
end time (derived)
sensor identity
flow termination conditions
application-layer protocol
Much of the data collected in the SiLK environment is network flow data. Each network flow record summarizes one side of a socket connection on the network. The SiLK data includes a subset of the fields that are standard for Cisco NetFlow v5, plus some additional fields described later.
The main fields are the source IP address, destination IP address, source and destination port numbers, and transport protocol. This set of numbers is referred to as the “5-tuple”, and uniquely defines one side of a socket connection at a particular period of time.
For ICMP traffic, the destination port encodes the ICMP type and code values, which are each 8 bits in length, by packing them into the 16-bit destination port field. Some options in the SiLK tools permit filtering based on the individual type and code fields, and their display in human-readable form.
The byte and packet fields simply count how many bytes and packets occurred in the particular flow being recorded.
For any flow of protocol number 6 (TCP), the TCP flags field is the bitwise OR of the flags of all the packets in the flow. The result is a record of which flags were set in any of the packets in the flow.
The start time and duration are time fields. The end time is calculated from the start time and duration, and also has millisecond precision (but not necessarily accuracy).
The sensor ID is a field that indicates which sensor recorded the flow, and each ID number is mapped to a brief text label that is more human-readable.
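The field encodings described above can be illustrated with plain shell arithmetic. This is a minimal sketch, not SiLK code: the function names (icmp_pack, icmp_type, icmp_code, or_flags, end_time_ms) are hypothetical, and it assumes the ICMP type occupies the high-order 8 bits of the destination port and the code the low-order 8 bits.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of SiLK.

# Pack an ICMP type and code (8 bits each) into one 16-bit value.
# Assumption: type in the high byte, code in the low byte.
icmp_pack() { echo $(( ($1 << 8) | $2 )); }
icmp_type() { echo $(( ($1 >> 8) & 0xFF )); }
icmp_code() { echo $(( $1 & 0xFF )); }

# TCP flag bit values from the TCP header.
FIN=1; SYN=2; RST=4; PSH=8; ACK=16

# OR together the flag bytes of every packet in a flow.
or_flags() {
  local acc=0 f
  for f in "$@"; do acc=$(( acc | f )); done
  echo "$acc"
}

# End time is derived: start time plus duration (both in milliseconds here).
end_time_ms() { echo $(( $1 + $2 )); }

icmp_pack 3 1      # type 3 (destination unreachable), code 1 (host unreachable)
or_flags "$SYN" "$ACK" $(( PSH | ACK )) $(( FIN | ACK ))
```

Because the flags are ORed, the record shows that SYN, ACK, PSH, and FIN each appeared at least once, but not how often or in which order.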

DNS packets viewed in Wireshark
Here is a packet sniffer’s display of a simple DNS query and response. Looking at the packet-list pane (under the toolbars), what is the same about the IP addresses of the two packets? What is different? Packet #2 is the currently selected packet. Its details are shown in the packet-details pane, below the packet-list pane. What type of computer is the DNS client? What type of computer is the DNS server? Could we glean the same information if we were looking at flow data?
Wireshark is a registered trademark of the Wireshark Foundation

Sequence Diagram
[Figure: sequence diagram with time running downward. DNS Client 192.168.1.105, UDP port 50744; DNS Server, UDP port 53; arrows labeled Request (type A) and Response (type A)]
The vertical bar on the left represents the DNS Client. The bar on the right represents the server. The black arrows represent packets; they slant downward to show that they take time to travel.
How many [unidirectional] flows do you see?
What can you say about the port numbers being used?
Are there any routers between the client and the server?
Which type of DNS server is this (local/recursive or remote/iterative)? Where is it probably located in the network?
Where do you suppose the flow sensor is placed to be able to record this traffic?

SiLK tool (rwcut) output
sIP| dIP|sPort|dPort|pro|packets|bytes|sensor|type|
   |    |50744|   53| 17|       |   64|    S1| out|
   |    |   53|50744| 17|       |   80|    S1|  in|
This is output from a tool that converts machine-efficient binary (non-text) data into human-readable text. Each row represents one flow record. A flow record has information about a single flow.

Lesson I.1 Summary
Flow records constitute a log of network activity.
Flow analysis can answer many questions without storing content.
Flow records are extremely compact. Benefits are:
  long retention
  faster processing
  reduced privacy concerns
  encryption is not an obstacle
SiLK uses unidirectional flows (uniflows).

thinking-organizationblog.com/are-you-hooked-on-hope-pium-2/
Next Lesson In lesson I.2, you will learn to interpret SiLK flow records and understand the nature of the associated network activity.

Lesson I.2 Interpreting Flow Records

Lesson I.2 Learning Objective
Given a series of uniflows and general knowledge of TCP/IP, the learner will be able to deduce and infer the nature of the network activity.

Network Monitoring
[Figure: the network of interest, with sensors where it connects to the Internet and to other internetworks, a SiLK repository, and a SiLK console, terminal, and iSiLK™ client]
The green network of interest usually has sensors at the perimeter, where it connects to other internetworks such as the Internet. Sensors can also be placed in the interior of the network of interest (the monitored network) to monitor traffic that doesn’t leave the network of interest.
A SiLK console or an iSiLK client workstation can query the SiLK repository to answer questions, such as:
What is the volume of traffic between my network and the Internet?
How many connections were there last week from outside to my server?
What devices query external DNS?
iSiLK is a trademark of Carnegie Mellon University

Realistic Sequence Diagram
[Figure: sequence diagram with a DNS Client (UDP port 50744), Local Server, Sensor, Root Server, .com Server, and .mudynamics.com Server; arrows labeled Request (type A), dest port 53, and Response (type A), src port 53]
Additional information can be placed with the arrows, such as port numbers, flags, etc.
How many flows do you see?
The traffic to/from the 3 outside DNS servers may or may not actually occur, depending on what information is cached. Also, these hosts may or may not be included in the diagram depending upon whether their activity is interesting to analysts.
When the Local Server receives requests from, and replies to, the client it uses port 53. When the Local Server communicates with other name servers it uses an ephemeral port.

What is this? - 1
sIP| dIP|sPort|dPort|pro|packets|flags|initF|  type|
   |    |50744|   53| 17|       |     |     |   out|
   |    |   53|50744| 17|       |     |     |    in|
   |    |49152|   80|  6|       | SRPA| S   |outweb|
   |    |   80|49152|  6|       | S PA| S  A| inweb|
How many flows (uniflows) do you see?
How many conversations (sessions, biflows)?
How many connections (depends upon your definition)?
What is the nature of the activity in the first session?
What is the nature of the activity in the second session?
Are the port numbers the only giveaway?

HTTP Sequence Diagram
[Figure: sequence diagram among HTTP Client 192.168.1.105, HTTP Server, and DNS Server. Arrows in order: DNS Request (type A), DNS Response (type A), SYN, SYN-ACK, ACK, GET, ACK, Okay, RST]
How many uniflows do you see?

What Is This? - 2
sIP| dIP|sPort|dPort|pro|packets| bytes|flags|
   |    |52415|   25|  6|       | 14045|FSRPA|
   |    |   25|52415|  6|       |  1283|FS PA|
   |    |52415|   25|  6|       |    40|  R  |
At this point, we will begin looking at some sample flow data. As each of these examples is shown, pause the presentation for a short time to get familiar with the data before continuing on.
What is shown in this example? Can you tell which IP address appears to be the client and which is the server? We see communications on port 25, which means this is probably email (SMTP) traffic. The destination of the first flow appears to be the server, since it is hosting the SMTP service on port 25. Since the source of the first flow is using a high-numbered ephemeral port, it is the SMTP client.
Take a look at the byte volume. There’s a big difference between the client and the server: the client sent about ten times as much data as the server, a total of about 14k bytes. That makes sense if we consider the client to be sending an email message to the server.
What about the RST packets? Well, they’re sent from the client to the server, and it seems like they’re sent after the FIN connection teardown has completed. Although we can’t tell exactly what is going on here, it’s not uncommon to see spurious RST packets at the end of a TCP session, particularly with high-volume clients and servers.

What Is This? - 3
sIP| dIP|pro|packets|bytes| sTime|
What do you notice in this set of flow records? Does this look like normal traffic to you? Taking a quick look at the IP addresses, you see that the source address remains the same, while the destination address changes. In fact, you’ll notice that the destination address only changes within a given range of addresses; the first two octets of the destination address remain the same while the last two octets change. It looks like the source IP address is scanning the network. This is a scan or sweep.
But can you tell anything else about the scan? Let’s take a closer look. The protocol is always 1. That means this is an ICMP scan. Each scan target receives two packets for a total of 122 bytes. That means each packet is probably half that, or 61 bytes. Each packet probably has a 20-byte IP header and an 8-byte ICMP header, leaving room for 33 bytes of data.
Finally, look at the packet timing. We see about two to four packets per second. Is this a fast scan, or maybe even a denial-of-service attack? Not at that rate. Considering everything we’ve observed, this just looks like routine scan activity.
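The pattern described here, one source touching many destinations over ICMP, can be counted from text output. The following is a hedged sketch rather than a SiLK tool: the function name icmp_fanout is hypothetical, the input is assumed to be pipe-delimited text with sIP, dIP, and protocol in the first three columns (as in the header above), and the sample addresses are made up.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count how many distinct destinations each source
# contacted over ICMP (protocol 1) in pipe-delimited flow text.
icmp_fanout() {
  awk -F'|' '
    { gsub(/ /, "") }                          # drop the column padding
    $3 == 1 && !seen[$1 "|" $2]++ { n[$1]++ }  # first time this pair is seen
    END { for (s in n) print s "|" n[s] }' | sort
}

# Illustrative records (made-up addresses from documentation ranges).
printf '%s\n' \
  '         sIP|       dIP|pro|' \
  '198.51.100.7| 192.0.2.1|  1|' \
  '198.51.100.7| 192.0.2.2|  1|' \
  '198.51.100.7| 192.0.2.3|  1|' | icmp_fanout
```

A source with a large count relative to its peers is a candidate scanner; what counts as large depends on the network and is left to the analyst.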

What Is This? - 4
sIP| dIP|sPort|dPort|pkts| bytes|flags|              sTime|
   |    |40936|   80|  83|  3512|FS PA|2010/12/08T11:00:01|
   |    |   80|40936|  84|104630|FS PA|2010/12/08T11:00:01|
   |    |40938|   80| 120|  4973|FS PA|2010/12/08T11:00:04|
   |    |   80|40938| 123|155795|FS PA|2010/12/08T11:00:05|
   |    |56172|   80|  84|  3553|FS PA|2010/12/08T12:00:02|
   |    |   80|56172|  83|103309|FS PA|2010/12/08T12:00:02|
   |    |56177|   80| 123|  5093|FS PA|2010/12/08T12:00:05|
   |    |   80|56177| 124|157116|FS PA|2010/12/08T12:00:05|
Next example. At first glance, this seems a bit more confusing than our last example. Our first clue in unraveling this traffic is to look at the ports. You should quickly recognize port 80, the port used for normal web or HTTP traffic. Every flow record on the list uses this port, so it’s probably all HTTP traffic.
Looking at the first record, since 80 (the service port) is the destination port, the destination IP address is the server. So we know that the destination is serving HTTP traffic. Similarly, the high port, the ephemeral port, identifies the source of the first record as the client. Also notice that the ephemeral port increases with new connections, which is to be expected.
We see two separate connections from the client to the web server within five seconds, and then two more connections an hour later. Each connection has normal flags (SYN, ACK, and FIN) and a reasonable number of packets. All in all, the first set of four flows (two TCP connections) looks like a standard web page loading and then pulling down some included reference data like images or style sheets.
Finally, take a look at all the flows together. Everything’s pretty similar, except for the timing. There’s repetitive behavior here, but it’s pretty slow: there are 3 seconds between the 2nd and 3rd flow records and between the 6th and 7th, but almost an hour between the 4th and 5th, and the timing is not exact. This does not look like system behavior; instead, it looks like normal user interaction with a web site: click, read, click, read, click. Overall, this looks like normal web browsing activity.

Standard SiLK Types
[Figure: a sensor between the internal network and the external network, with arrows labeled by type: outweb, outicmp, out; inweb, inicmp, in; int2int; ext2ext; outnull and innull (to Null); other*]
The router is the sensor. Modern sensors are more likely to run YAF. The actual types defined for a particular SiLK installation depend upon configuration.
Int2int traffic originates and terminates on the internal network; where it goes in between is unknown, but it certainly gets as far as the sensor; it may go further. Ext2ext similarly originates and terminates on the external network.
Innull is for inbound traffic that doesn’t get past the router/sensor; this happens because the packets are dropped by an ACL on the router, or because the packet was destined for the router itself (such as routing protocol traffic or router management traffic). Outnull is similar, but for outbound packets. Innull and outnull types are usually assigned only if the sensor is a router.
*to/from a network that is neither internal nor external

More complete SiLK Types
[Figure: a sensor between the internal network and the external network, with arrows labeled by type: outweb, outicmp, out, ipv6out; inweb, inicmp, in, ipv6in; int2int, ipv6int2int; ext2ext, ipv6ext2ext; outnull and innull (to Null)]
The router is the sensor. Modern sensors are more likely to run YAF. The actual types defined for a particular SiLK installation depend upon configuration. ICMP traffic can be categorized as inicmp and outicmp, or categorized as in, out, ipv6in, or ipv6out.
Int2int traffic originates and terminates on the internal network; where it goes in between is unknown, but it certainly gets as far as the sensor; it may go further. Ext2ext similarly originates and terminates on the external network.
Innull is for inbound traffic that doesn’t get past the router/sensor; this happens because the packets are dropped by an ACL on the router, or because the packet was destined for the router itself (such as routing protocol traffic or router management traffic). Outnull is similar, but for outbound packets. Innull and outnull types are usually assigned only if the sensor is a router.

Common Types in SiLK
Type               Description
inweb, outweb      Inbound/outbound TCP ports 80, 443, 8080
innull, outnull    Inbound/outbound filtered traffic
inicmp, outicmp    Inbound/outbound IP protocol 1
in, out            Inbound/outbound not in above categories
int2int, ext2ext   Internal to internal, external to external
other              Source not internal or external, or destination not internal, external, or null
With a router, interfaces and/or VLANs are defined as internal, external, or other. The direction of a flow is determined by the ingress interface/VLAN and the egress interface/VLAN.
For a non-router flow generator, address blocks are defined as internal, external, or other. The direction of a flow is determined by the ingress address block and the egress address block.
Names in bold are often default types; confirm with:
rwsiteinfo --fields=default-type
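The first rows of the table can be read as a small decision rule. The sketch below is an illustration only: flow_type is a hypothetical helper, not a SiLK command, and it assumes the direction (in or out) is already known and that the port passed in is the server-side port. The null, int2int, ext2ext, and other types are not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the type for a flow whose direction is known.
# Arguments: direction ("in" or "out"), IP protocol number, server-side port.
flow_type() {
  local dir=$1 proto=$2 port=$3
  if [ "$proto" -eq 6 ] && { [ "$port" -eq 80 ] || [ "$port" -eq 443 ] || [ "$port" -eq 8080 ]; }; then
    echo "${dir}web"     # TCP to or from a web port
  elif [ "$proto" -eq 1 ]; then
    echo "${dir}icmp"    # IP protocol 1
  else
    echo "$dir"          # everything not in the above categories
  fi
}

flow_type out 6 80
flow_type in 1 0
flow_type in 17 53
```

The actual assignment is made when records are packed and depends on the site configuration, so treat this only as a way to remember what the type names mean.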

Lesson I.2 Summary Sensor placement affects what is seen or not seen in flow records. We learned to interpret network activity from flow records. A class of SiLK sensors uses a particular set of record types.

Next Lesson In lesson I.3 we will learn how to issue *NIX commands, how to obtain online help for SiLK commands, and how to obtain information about the SiLK repository using a SiLK command.

Lesson I.3 Issuing SiLK Commands

Lesson I.3 Learning Objectives
The learner will be able to issue simple SiLK commands correctly. The learner will be able to obtain online help text for SiLK commands and other *NIX commands. The learner will be able to obtain information about a SiLK sensor network and a SiLK flow-record repository.

*NIX commands
System prompt
  Info + prompt character, e.g., ~ 101>
User command
  command name: rwfilter (case sensitive)
  options: -h --help -k2 --key=2
  arguments: results.rw
  redirections: > >> < <<
  pipe: |
For example:
rwcut --all-fields results.rw >results.txt
rwcut --fields=1-6 results.rw | more
Up arrow brings back a previous command, so it can be edited and re-issued.
Linux is the registered trademark of Linus Torvalds in the U.S. and other countries
UNIX is a registered trademark of The Open Group

Some standard *NIX commands
ls – list name and attributes of files and directories
cd – change the current working directory
cat – output the contents of a file
head – output the first lines of a file
echo – output the argument
more and less – display a file one page at a time
cut – output only selected fields of a file
sort – reorder the records (lines) of a file
wc – word count (optionally, character and line count) of a file
exit – log out and terminate a terminal window

*NIX Standard Streams
Standard In (stdin) – where normal (especially interactive) input comes from
Standard Out (stdout) – where normal/expected (especially interactive) output goes to
Standard Error (stderr) – where messages (especially unexpected) go to
Defaults:
  stdin – keyboard
  stdout – screen/window
  stderr – screen/window
Defaults are overridden by redirections and pipes.
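A short experiment shows how redirections and pipes override these defaults. This is a minimal sketch using only standard shell features; the function name report and its messages are made up for the example.

```shell
#!/usr/bin/env bash
# A hypothetical function that writes one line to each output stream.
report() {
  echo "3 records"           # normal output goes to stdout
  echo "warning: demo" >&2   # messages go to stderr
}

workdir=$(mktemp -d)
report >"$workdir/out.txt" 2>"$workdir/err.txt"   # send each stream to its own file
report 2>/dev/null | wc -l                        # the pipe carries stdout only
cat "$workdir/out.txt"
cat "$workdir/err.txt"
rm -r "$workdir"
```

Keeping messages on stderr is what lets a tool write data to stdout for the next command in a pipeline while still reporting problems to the screen.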

Shell Scripts
Put a complicated command, pipeline, or sequence of pipelines into a script file.
It saves your commands for reuse or learning.
It eases making changes.
Create your script with nano or vi (vim). vi or vim can be found on every Linux/UNIX (*NIX) system.
Name your shell script something like dothis.sh
Add execute permission: chmod +x dothis.sh
Execute (run) your script: ./dothis.sh
SSH is the registered trademark of SSH Communications Security Corp
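The steps above can be tried end to end without an editor. This is a small sketch under stated assumptions: the script contents are illustrative, a here-document stands in for nano or vi, and a temporary directory is used so nothing is left behind.

```shell
#!/usr/bin/env bash
# Write a two-line script, make it executable, and run it.
workdir=$(mktemp -d)
cat >"$workdir/dothis.sh" <<'EOF'
#!/bin/sh
echo "hello from dothis.sh"
EOF
chmod +x "$workdir/dothis.sh"    # add execute permission
result=$("$workdir/dothis.sh")   # run the script by its path
echo "$result"
rm -r "$workdir"
```

Without the chmod step the shell refuses to run the file directly, which is the most common stumbling block with a first script.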

Exercise 1: Use a few relevant Linux commands
Create a new directory, change to it, and use the echo command with redirect “>” and append “>>” to create a file. Then examine it with ls, cat, and wc.
mkdir ex1
cd ex1
echo > adr1.txt
echo >> adr1.txt
ls adr1.txt
ls -l adr1.txt
cat adr1.txt
wc adr1.txt
cd
The file created here will be used later.

Collection, Packing, and Analysis
Collection of flow data
  Examines packets and summarizes them into standard flow records
  Timeout and payload-size values are established during collection
Packing
  Stores flow records in a scheme optimized for space and ease of analysis
Analysis of flow data
  Investigation of flow records using SiLK tools
Collection and packing are the steps required to build the SiLK repository. Analysis consists of querying the SiLK repository to draw conclusions about network traffic.

Collection
[Figure: tcpdump records packets to a PCAP file; YAF reads packets and produces IPFIX flow records. Settings shown: Idle-timeout, Active-timeout. Record fields shown: Termination-attribute, Application, Start-time, Duration, Packets, Bytes, Flags…]
PCAP (Packet CAPture) is an industry-standard file format for representing packets recorded from a live network.
tcpdump is a utility for capturing live network traffic (packets) and recording it in a PCAP file.
YAF (Yet Another Flowmeter) is a reference implementation of a tool to convert packets into flow data. YAF can obtain packets directly from a live network or from a PCAP file. YAF decides whether packets with the same source and destination belong to the same flow using idle-timeout and active-timeout settings.
IPFIX (IP Flow Information Export) is an IETF standard format for storing network flow records in a file, and a protocol for shipping the records.
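The effect of an idle timeout can be shown with a toy example. This is a hypothetical sketch and not YAF’s implementation: split_by_idle is a made-up function, its input is assumed to be the packet timestamps (in seconds, ascending) for a single 5-tuple, and the active timeout is not modeled.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a gap longer than the idle timeout closes the current
# flow and starts a new one. Output per flow: start time, last time, packets.
split_by_idle() {
  awk -v idle="$1" '
    NR == 1          { start = $1; last = $1; n = 1; next }
    $1 - last > idle { print start, last, n; start = $1; n = 0 }
                     { last = $1; n++ }
    END              { if (NR > 0) print start, last, n }'
}

# Three packets close together, then one after a long pause (idle timeout 30 s).
printf '%s\n' 0 1 2 100 | split_by_idle 30
```

With a 30-second idle timeout the four packets become two flow records; with a timeout longer than the pause they would be summarized as one.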

Analysis
[Figure: the SiLK repository and files of raw (binary) flow records feed a SiLK tool chain, which produces text or another file of raw (binary) flow records]
Raw flow records produced by one SiLK tool chain can be used as input to another SiLK tool chain.

Reporting
[Figure: text passes through UNIX text tools (sed, awk, …) to produce more text, which can go to visualization tools (gnuplot, Rayon, Excel)]
The text output on the right can be prepared for human consumption or fed back in on the left to produce graphs.
Rayon is a trademark of Carnegie Mellon University
Excel is a registered trademark of Microsoft Corporation

Exercise 2: Which sensors are defined?
rwsiteinfo --fields=id-sensor,sensor
rwsiteinfo --fields=id-sensor,sensor | less
rwsiteinfo --fields=id-sensor,sensor,describe-sensor \
  | less
The pound sign (#, hash mark, number sign, tic-tac-toe board) indicates a comment that is ignored by the computer and need not be typed.
“Deprecated” means that the command, option, or feature still works but should no longer be used since it has been replaced by another command, option, or feature.
The backslash (\) at the end of a physical line indicates that the logical line is not finished, and is to be continued on the next physical line. The backslash may be omitted if the command is typed all on one physical line (that is, without pressing Enter or Return before the end of the complete command).

Where can I get more information?
We can’t discuss all parameters for every tool.
Resources:
  Analysts’ Handbook
  SiLK Reference Guide (collected man pages)
  --help option
  man command
  Additional resources shown at the end
tools.netsa.cert.org is the site for the open-source distribution of the SiLK tool suite. Your site may have a version different from the open-source version, so it is wise to check your local manual pages, documentation, and the output of ‘rwfilter --help’ in case of conflicts with the open-source distribution.
Likewise, the Analysts’ Handbook tends to lag slightly behind the most current version of the SiLK analysis suite, since the distributors prefer to make functionality immediately available and update tutorial material later. It is rare for an example in the handbook not to work, and the more detailed explanation may aid you in understanding the commands. The handbook contains a lot more explanation (and examples) of rwfilter than we present in this module.
Finally, seek out experienced analysts to aid you in becoming familiar with rwfilter. While this module will help in getting you started, interaction with an experienced analyst will improve your skills rapidly.

Answers to questions you haven’t asked yet
At this point you probably have dozens of questions. Typical answers are:
Yes, it does, and here is how to do it.
Yes, read about it in <reference>.
Yes, but it will take too long to describe right now.
Yes, but it is not a good idea because <some lame excuse>.
Because <long silence>.
No, it doesn’t because <really good reason>.
No, it doesn’t <long silence>.
No, but that’s a really good idea, please send it to me. Thanks!

Lesson I.3 Summary
We learned the parts of *NIX commands.
Data should be kept in binary form as long as possible.
We learned where to get more information about commands.
We learned to obtain information about the SiLK repository using the rwsiteinfo command.

Next Lesson In lesson II.1 we will learn how to choose just the flow records that are applicable to our inquiry.

Part II: Basic SiLK Tools

Part II Lessons: Network Flow
SiLK Records, Files, and the Repository Analysis Tools and Categorization IP Sets

Lesson II.1 SiLK Records, Files, and the Repository

Lesson II.1 Learning Objectives
The learner will be able to display selected fields from a sequence of flow records. The learner will be able to determine which flow-record fields will be useful for a given analysis. The learner will be able to identify which rwfilter keywords are selection options. The learner will be able to pull flow records from a SiLK repository.

Basic SiLK Tools: rwcut
But I can’t read binary...

rwcut provides a way to display binary records as human-readable ASCII:
  useful for printing flows to the screen
  useful for input to text-processing tools

Usually you’ll only need the --fields option. The fields shown on the slide are:
  sip, dip, sport, dport, protocol, class
  packets, bytes, sensor, scc, dcc, nhip
  type, in, out, dur, stime, etime
  flags, initialflags, sessionflags, application, attributes, itype & icode

Flow records are stored in a compressed binary format unreadable by most humans. rwcut is a tool that prints network flow data in an easy-to-read format. While rwcut provides many options, most of the time you’ll only want the --fields option, so that only the fields of interest to you are displayed.

Derived fields are not actually stored in the flow record. They are derived from stored field(s) and, sometimes, prefix maps. Field names in italics on the slide are derived fields. In SiLK 3.8.1, icmpTypeCode is replaced by the iType and iCode fields.

rwcut Default Display

By default:
  sIP (1), sPort (3)
  dIP (2), dPort (4)
  Protocol (5)
  packets, bytes
  flags
  sTime, eTime, duration
  sensor

--all-fields          # way too much info
--fields=1-5,sTime    # just right

By default, rwcut displays the IP address and port of the source and destination hosts; the protocol in use; the number of packets and the number of bytes in those packets; the flags that appear in the flow; the start and end times, as well as the duration of the flow; and the sensor that collected the flow. While that may be a lot of information, SiLK actually collects even more. You can print ALL of the information collected about the flow with the --all-fields option.
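To make the numeric shorthand concrete, here is a minimal Python sketch of how a --fields value such as 1-5,sTime expands into field names. The slide confirms numbers 1 to 5; numbers 6 to 12 are assumed to follow the default column order shown in the rwcut output, and the function name expand_fields is hypothetical, not part of SiLK.

```python
# Field numbers 1-5 come from the slide; 6-12 are assumed to follow
# the default rwcut column order.
FIELD_NAMES = {
    1: "sIP", 2: "dIP", 3: "sPort", 4: "dPort", 5: "protocol",
    6: "packets", 7: "bytes", 8: "flags", 9: "sTime",
    10: "duration", 11: "eTime", 12: "sensor",
}


def expand_fields(spec):
    """Expand a --fields value such as '1-5,sTime' into a list of field names."""
    names = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item and item.replace("-", "").isdigit():
            low, high = (int(n) for n in item.split("-"))
            names.extend(FIELD_NAMES[n] for n in range(low, high + 1))
        elif item.isdigit():
            names.append(FIELD_NAMES[int(item)])
        else:
            names.append(item)  # already a name; passed through unchecked
    return names


print(expand_fields("1-5,sTime"))
# ['sIP', 'dIP', 'sPort', 'dPort', 'protocol', 'sTime']
```

The sketch only models the naming; rwcut itself decides which names and numbers are valid for your installation.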

Create the ex3records.rw file
# rwfilter will not overwrite a file
rm ex3records.rw
rwfilter --type=all \
    --sensor=S \
    --start=2009/04/20T11 \
    --proto= \
    --pass=stdout \
| rwsort --fields=stime \
| rwfilter --input-pipe=stdin \
    --max-pass= \
    --pass=ex3records.rw

Create the ex3records.rw file in 3 commands
The previous command is a lot to type all at once. Either edit it as a script or, if you wish, run it in three parts:

# rwfilter will not overwrite a file
rm ex3*rw
rwfilter --type=all \
    --sensor=S \
    --start=2009/04/20T11 \
    --proto= \
    --pass=ex3.rw
rwsort --fields=stime \
    --output-path=ex3sorted.rw \
    ex3.rw
rwfilter --max-pass= \
    --pass=ex3records.rw \
    ex3sorted.rw

Exercise 3: What do the data look like?
rwcut ex3records.rw --fields=1-5,packets Try other values for --fields. Try omitting the --fields option. ©2014 Carnegie Mellon University

rwcut --fields=1-5,packets ex3records.rw

            sIP|            dIP|sPort|dPort|pro|   packets|
               |               |50398|   80|  6|          |
               |               |   80|50398|  6|          |
               |               |50189| 5222|  6|          |
               |               | 5222|50189|  6|          |
               |               |49592|  443|  6|          |
               |               |  443|49592|  6|          |
               |               |    0| 2048|  1|          |
               |               |    0|    0|  1|          |
               |               |    0| 2048|  1|          |
               |               |    0|    0|  1|          |
               |               |56515|   53| 17|          |
               |               |   53|56515| 17|          |
               |               |    0| 2048|  1|          |
               |               |    0|    0|  1|          |
               |               |    0| 2048|  1|          |
               |               |    0|    0|  1|          |
               |               |60515|   25|  6|          |
               |               | 1031|   53| 17|          |
               |               |   53| 1031| 17|          |
               |               | 3507|   53|  6|          |
               |               |   53| 3507|  6|          |
               |               | 3508|   53|  6|          |
               |               |   53| 3508|  6|          |

rwcut ex3records.rw sIP| dIP|sPort|dPort|pro| packets| bytes| flags| sTime| duration| eTime|sen| | |50398| 80| 6| 5| 383|FS PA |2009/04/20T11:35:19.439| 0.006|2009/04/20T11:35:19.445| S0| | | 80|50398| 6| 5| 674|FS PA |2009/04/20T11:35:19.440| 0.005|2009/04/20T11:35:19.445| S0| | |50189| 5222| 6| 6| 433|FS PA |2009/04/20T11:35:19.446| 0.009|2009/04/20T11:35:19.455| S0| | | 5222|50189| 6| 5| 446|FS PA |2009/04/20T11:35:19.447| 0.008|2009/04/20T11:35:19.455| S0| | |49592| 443| 6| 10| 1085|FS PA |2009/04/20T11:35:19.463| 0.056|2009/04/20T11:35:19.519| S0| | | 443|49592| 6| 10| 4162|FS PA |2009/04/20T11:35:19.464| 0.055|2009/04/20T11:35:19.519| S0| | | 0| 2048| 1| 276| 23184| |2009/04/20T11:35:19.490| |2009/04/20T12:05:19.389| S0| | | 0| 0| 1| 276| 23184| |2009/04/20T11:35:19.491| |2009/04/20T12:05:19.390| S0| | | 0| 2048| 1| 279| 23436| |2009/04/20T11:35:19.494| |2009/04/20T12:05:19.394| S0| | | 0| 0| 1| 279| 23436| |2009/04/20T11:35:19.495| |2009/04/20T12:05:19.394| S0| | |56515| 53| 17| 1| 65| |2009/04/20T11:35:19.500| 0.002|2009/04/20T11:35:19.502| S0| | | 53|56515| 17| 1| 81| |2009/04/20T11:35:19.502| 0.000|2009/04/20T11:35:19.502| S0| | | 0| 2048| 1| 276| 23184| |2009/04/20T11:35:19.514| |2009/04/20T12:05:19.416| S0| | | 0| 0| 1| 276| 23184| | S0|

rwcut --fields=1-5,packets,flags,stime ex3records.rw sIP| dIP|sPort|dPort|pro| packets| flags| sTime| | |50398| 80| 6| |FS PA |2009/04/20T11:35:19.439| | | 80|50398| 6| |FS PA |2009/04/20T11:35:19.440| | |50189| 5222| 6| |FS PA |2009/04/20T11:35:19.446| | | 5222|50189| 6| |FS PA |2009/04/20T11:35:19.447| | |49592| 443| 6| |FS PA |2009/04/20T11:35:19.463| | | 443|49592| 6| |FS PA |2009/04/20T11:35:19.464| | | 0| 2048| 1| | |2009/04/20T11:35:19.490| | | 0| 0| 1| | |2009/04/20T11:35:19.491| | | 0| 2048| 1| | |2009/04/20T11:35:19.494| | | 0| 0| 1| | |2009/04/20T11:35:19.495| | |56515| 53| 17| | |2009/04/20T11:35:19.500| | | 53|56515| 17| | |2009/04/20T11:35:19.502| | | 0| 2048| 1| | |2009/04/20T11:35:19.514| | | 0| 0| 1| | |2009/04/20T11:35:19.516| | | 0| 2048| 1| | |2009/04/20T11:35:19.528| | | 0| 0| 1| | |2009/04/20T11:35:19.529| | |60515| 25| 6| | S |2009/04/20T11:35:19.529| | | 1031| 53| 17| | |2009/04/20T11:35:23.415| | | 53| 1031| 17| | |2009/04/20T11:35:23.417| | | 3507| 53| 6| | S |2009/04/20T11:35:23.443| | | 53| 3507| 6| | R A |2009/04/20T11:35:23.444| | | 3508| 53| 6| | S |2009/04/20T11:35:23.445| | | 53| 3508| 6| | R A |2009/04/20T11:35:23.446| | | 3507| 53| 6| | S |2009/04/20T11:35:24.084| | | 3508| 53| 6| | S |2009/04/20T11:35:24.085| | | 53| 3507| 6| | R A |2009/04/20T11:35:24.085| | | 53| 3508| 6| | R A |2009/04/20T11:35:24.086| | | 3507| 53| 6| | S |2009/04/20T11:35:24.632| | | 3508| 53| 6| | S |2009/04/20T11:35:24.632| | | 53| 3508| 6| | R A |2009/04/20T11:35:24.633|

rwcut --fields=1-5,iType,iCode,packets,flags,stime ex3records.rw sIP| dIP|sPort|dPort|pro|iTy|iCo| packets| flags| sTime| | |50398| 80| 6| | | |FS PA |2009/04/20T11:35:19.439| | | 80|50398| 6| | | |FS PA |2009/04/20T11:35:19.440| | |50189| 5222| 6| | | |FS PA |2009/04/20T11:35:19.446| | | 5222|50189| 6| | | |FS PA |2009/04/20T11:35:19.447| | |49592| 443| 6| | | |FS PA |2009/04/20T11:35:19.463| | | 443|49592| 6| | | |FS PA |2009/04/20T11:35:19.464| | | 0| 2048| 1| 8| 0| | |2009/04/20T11:35:19.490| | | 0| 0| 1| 0| 0| | |2009/04/20T11:35:19.491| | | 0| 2048| 1| 8| 0| | |2009/04/20T11:35:19.494| | | 0| 0| 1| 0| 0| | |2009/04/20T11:35:19.495| | |56515| 53| 17| | | | |2009/04/20T11:35:19.500| | | 53|56515| 17| | | | |2009/04/20T11:35:19.502| | | 0| 2048| 1| 8| 0| | |2009/04/20T11:35:19.514| | | 0| 0| 1| 0| 0| | |2009/04/20T11:35:19.516| | | 0| 2048| 1| 8| 0| | |2009/04/20T11:35:19.528| | | 0| 0| 1| 0| 0| | |2009/04/20T11:35:19.529| | |60515| 25| 6| | | | S |2009/04/20T11:35:19.529| | | 1031| 53| 17| | | | |2009/04/20T11:35:23.415| | | 53| 1031| 17| | | | |2009/04/20T11:35:23.417| | | 3507| 53| 6| | | | S |2009/04/20T11:35:23.443| | | 53| 3507| 6| | | | R A |2009/04/20T11:35:23.444| | | 3508| 53| 6| | | | S |2009/04/20T11:35:23.445| | | 53| 3508| 6| | | | R A |2009/04/20T11:35:23.446| | | 3507| 53| 6| | | | S |2009/04/20T11:35:24.084| | | 3508| 53| 6| | | | S |2009/04/20T11:35:24.085| | | 53| 3507| 6| | | | R A |2009/04/20T11:35:24.085| | | 53| 3508| 6| | | | R A |2009/04/20T11:35:24.086| | | 3507| 53| 6| | | | S |2009/04/20T11:35:24.632| | | 3508| 53| 6| | | | S |2009/04/20T11:35:24.632| | | 53| 3508| 6| | | | R A |2009/04/20T11:35:24.633|

rwcut --fields=1-5,packets,flags,initialFlags,sessionFlags,stime ex3records.rw sIP| dIP|sPort|dPort|pro| packets| flags|initialF|sessionF| sTime| | |50398| 80| 6| |FS PA | S |F PA |2009/04/20T11:35:19.439| | | 80|50398| 6| |FS PA | S A |F PA |2009/04/20T11:35:19.440| | |50189| 5222| 6| |FS PA | S |F PA |2009/04/20T11:35:19.446| | | 5222|50189| 6| |FS PA | S A |F PA |2009/04/20T11:35:19.447| | |49592| 443| 6| |FS PA | S |F PA |2009/04/20T11:35:19.463| | | 443|49592| 6| |FS PA | S A |F PA |2009/04/20T11:35:19.464| | | 0| 2048| 1| | | | |2009/04/20T11:35:19.490| | | 0| 0| 1| | | | |2009/04/20T11:35:19.491| | | 0| 2048| 1| | | | |2009/04/20T11:35:19.494| | | 0| 0| 1| | | | |2009/04/20T11:35:19.495| | |56515| 53| 17| | | | |2009/04/20T11:35:19.500| | | 53|56515| 17| | | | |2009/04/20T11:35:19.502| | | 0| 2048| 1| | | | |2009/04/20T11:35:19.514| | | 0| 0| 1| | | | |2009/04/20T11:35:19.516| | | 0| 2048| 1| | | | |2009/04/20T11:35:19.528| | | 0| 0| 1| | | | |2009/04/20T11:35:19.529| | |60515| 25| 6| | S | S | S |2009/04/20T11:35:19.529| | | 1031| 53| 17| | | | |2009/04/20T11:35:23.415| | | 53| 1031| 17| | | | |2009/04/20T11:35:23.417| | | 3507| 53| 6| | S | S | |2009/04/20T11:35:23.443| | | 53| 3507| 6| | R A | R A | |2009/04/20T11:35:23.444| | | 3508| 53| 6| | S | S | |2009/04/20T11:35:23.445| | | 53| 3508| 6| | R A | R A | |2009/04/20T11:35:23.446| | | 3507| 53| 6| | S | S | |2009/04/20T11:35:24.084| | | 3508| 53| 6| | S | S | |2009/04/20T11:35:24.085| | | 53| 3507| 6| | R A | R A | |2009/04/20T11:35:24.085| | | 53| 3508| 6| | R A | R A | |2009/04/20T11:35:24.086| | | 3507| 53| 6| | S | S | |2009/04/20T11:35:24.632| | | 3508| 53| 6| | S | S | |2009/04/20T11:35:24.632| | | 53| 3508| 6| | R A | R A | |2009/04/20T11:35:24.633|

Exercise 3a: I wonder what a raw file looks like?
cd                          # make home directory the working directory
rm -f ex3arecords.rw        # remove file; ok if not there
rwfilter --type=in \
    --start-date=2009/4/20:14 --protocol=0- \
    --compress=none \
    --max-pass=1 \
    --pass=ex3arecords.rw
ls -l ex3arecords.rw
rwfileinfo ex3arecords.rw
rwcut --fields=1-5,packets ex3arecords.rw
rwcut --all-fields ex3arecords.rw
hexdump -C ex3arecords.rw   # any readable text?

The option in hexdump uses a capital C. *NIX is case-sensitive for most things.

What happens when there’s a --start-date but no --end-date? If the --start-date specifies just a date, then it represents a whole day; if it specifies an hour, then it represents an hour.

When you type these commands, you don’t have to type the comments, which begin with a pound sign.

Note the colon in the start-date. It can be confusing. Does 21:15 represent 15 minutes past 9 PM, or hour 15 of the 21st day of the month? Later we will see another syntax that may be clearer.

Exercise 3a Output ex3a.sh
-rw-r--r--. 1 pnk pnk 264 Jan 7 21:10 ex3arecords.rw rwfileinfo ex3arecords.rw ex3arecords.rw: format(id) FT_RWIPV6ROUTING(0x0c) version byte-order littleEndian compression(id) none(0) header-length record-length record-version silk-version count-records file-size command-lines 1 rwfilter --type=in --start-date=2009/4/20:11 --protocol=0- --compress=none --max-pass=1 --pass=ex3arecords.rw rwcut --fields=1-5,packets ex3arecords.rw sIP| dIP|sPort|dPort|pro| packets| | | 3507| 53| 6| | rwcut --all-fields ex3arecords.rw sIP| dIP|sPort|dPort|pro| packets| bytes| flags| sTime| duration| eTime|sen| in| out| nhIP|initialF|sessionF|attribut|appli|cla| type| sTime+msec| eTime+msec| dur+msec|iTy|iCo| | | 3507| 53| 6| | | S |2009/04/20T11:35:23.443| |2009/04/20T11:35:23.444| S0| 0| 0| | S | | | 0|all| in|2009/04/20T11:35:23.443|2009/04/20T11:35:23.444| | | |

Repository Example

[Figure: repository directory tree. Labels: RootDir; silk.conf; sensor1, sensor2; in, inweb, int2int, out, outweb, ext2ext; year; month; day; hour, sensor.]

The flowtype is a combination of class and type.
Hourly files are named flowtype-SENSOR_yyyymmdd.hh, e.g., in-SENS1_
© 2006 Carnegie Mellon University
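As a rough illustration of the naming pattern flowtype-SENSOR_yyyymmdd.hh, the following Python sketch builds the file name for one hour of data. The function name hourly_file_name is hypothetical, and the sketch says nothing about the directories above the file, which depend on how your repository is configured.

```python
from datetime import datetime


def hourly_file_name(flowtype, sensor, hour):
    """Build a name following the slide's flowtype-SENSOR_yyyymmdd.hh pattern."""
    return f"{flowtype}-{sensor}_{hour:%Y%m%d}.{hour:%H}"


# One hour of incoming traffic from sensor S0 (values chosen for illustration).
print(hourly_file_name("in", "S0", datetime(2009, 4, 20, 11)))
# in-S0_20090420.11
```

Because sensor, type, and hour are all visible in the name, listing the selected files is a quick way to confirm that a query touches the data you expect.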

Basic SiLK Tools: rwfilter
  Pick files from the repository
  Advanced flow-by-flow filtering
  Direct flow output
  Basic statistics
  Compression
  Plug in additional tools

rwfilter is the Swiss Army Knife of the SiLK toolset. It’s always the starting point for your analyses:
  rwfilter pulls data from the repository. Specific switches locate the data within the repository.
  rwfilter has advanced filtering criteria to evaluate each flow as it’s pulled from the repository.
  rwfilter allows you to direct matching, non-matching, or all flows through standard UNIX pipes.
  rwfilter provides some basic statistics and counting to let you know what was pulled from the repository.
  rwfilter has built-in compression, specified when the tool is installed on your site.
  rwfilter (along with other SiLK tools) contains hooks to plug in additional filtering capabilities to suit your needs.

There’s a lot of complexity in the parameters to rwfilter, but also a lot of expressive power to rapidly retrieve the flows for your analyses.

Swiss Army knife logo is a registered trademark of Victorinox AG

rwfilter Syntax

General form
rwfilter {INPUT | SELECTION} PARTITION OUTPUT [OTHER]

Example call
rwfilter --sensor=S \
    --type=in \
    --start-date=2015/8/5T13 \
    --end-date=2015/8/5T20 \
    --protocol= \
    --pass=workday-5.rw

Parameters to rwfilter can occur in any order, and can be abbreviated to any unique prefix (e.g., ‘--start-date’ can be ‘--start’). The grouping of parameters that we use in this presentation is mainly for explanation. Notice that the date is separated from the hour by a “T” instead of a colon on this slide; this helps to avoid confusion.

  Input parameters specify an input source other than the repository.
  Selection parameters specify the criteria used to select files from the repository.
  Partitioning parameters specify the criteria used to divide data between PASS and FAIL streams.
  Output parameters direct where to send records that match and/or fail to match the partitioning criteria.
  Other parameters are used for special purposes, such as syntax checking, dealing with IPv6 data, etc.

Any given parameter may be used only once in a call to rwfilter, except --pass, --fail, --all, and input filenames. You must choose one category of input: selection parameters, input filenames, or one --xargs.

In general, rwfilter requires:
  an input source (the repository, a file, or a pipe),
  at least one partitioning parameter (--protocol=0-255 is a vacuous parameter),
  an output destination (a pipe, a file, or a statistics specification).

If you miss a required parameter, rwfilter will produce an error message rather than try to guess what you want or go with some hidden default. About the only hidden detail is the location of the repository, and that is very rarely important to analysts.

Selection and Input Criteria
Selection options control access to repository files:
  --start-date=2009/4/21
  --end-date=2009/4/21T03
  --sensor=S0
  --type=in,inweb

Alternatively, use input criteria for a pipe or a file:
  myfile.rw
  stdin (useful for chaining filters through a pipe with stdin/stdout)

In --start-date and --end-date, the hour isn’t an instant in time; it’s a whole hour. --start-date=2009/4/21T00 --end-date=2009/4/21T01 represents a two-hour period.

Most implementations do not use --class, which means --class only has ‘all’ as an available value.

You can use the ‘mapsid’ command to list the available sensors. This doesn’t mean the sensors are reporting data, just that they are configured for your implementation of SiLK.

‘rwfilter --help’ will list the available types in its description of --type. This doesn’t mean the types have data, just that they are configured for your implementation of SiLK.

‘rwfglob’ can be used to find out which sensors and types are recording data, as the sensor and type are incorporated into the file names. That command, described in a later module, takes a list of selection parameters and returns a list of file names in the repository that satisfy the criteria.
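The whole-hour semantics are easy to get wrong, so here is a small Python sketch that lists the hours a selection covers. It models only the cases stated in this lesson: a lone start date (a whole day), a lone start hour (one hour), and a start and end that both name an hour (inclusive). The function names are hypothetical, and any other combination is rejected rather than guessed at.

```python
from datetime import datetime, timedelta


def parse_stamp(text):
    """Return (datetime, has_hour) for 'YYYY/M/D' or 'YYYY/M/DTHH'."""
    if "T" in text:
        return datetime.strptime(text, "%Y/%m/%dT%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def hours_selected(start, end=None):
    """List the hours covered by a --start-date / --end-date pair."""
    first, start_has_hour = parse_stamp(start)
    if end is None:
        # A bare date is a whole day; a date with an hour is that one hour.
        count = 1 if start_has_hour else 24
    else:
        last, end_has_hour = parse_stamp(end)
        if not (start_has_hour and end_has_hour):
            raise ValueError("this sketch only handles hour-precision ranges")
        # Both endpoints are whole hours, so the range is inclusive.
        count = (last - first) // timedelta(hours=1) + 1
    return [first + timedelta(hours=i) for i in range(count)]


print(len(hours_selected("2009/4/21T00", "2009/4/21T01")))  # 2
print(len(hours_selected("2009/4/21")))                     # 24
print(len(hours_selected("2009/4/20T14")))                  # 1
```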

Basic Partitioning Options
  Simple numeric fields: ports, protocol, ICMP type
  Specified IP addresses, CIDR blocks
  Sets of IP addresses
  Combinations of key fields: tuple files

There are a number of places where partitioning criteria can come from:
  Hosts or subnets that you specifically need to monitor
  Firewall rule-sets, if you want to check whether network flows are somehow evading your firewalls
  IDS rules, to look for network traffic that may have triggered an alert
  Network protocol information, to pull out traffic related to some specific protocol

Simple Numeric Key Fields
--protocol=   --sport=   --dport=   --aport=   # source, dest, any

--protocol=6,17            # TCP or UDP
--protocol=0-5,7-16,18-    # not TCP or UDP
--protocol=0-              # all protocols
--dport=80,443             # HTTP or HTTPS
--sport= ,                 # X11 or JetDirect
--aport=20,21              # FTP
--sport=                   # Well-Known Ports

Because these options accept a simple list of integers and integer ranges, most of them are named the same as the fields that they test: protocol, sport, and dport. You should only test the port fields (sport and dport) when protocol is restricted to protocols that use ports, such as TCP and UDP. In particular, the port fields should not be tested when ICMP flows might be examined because, under the hood, icmp-type and icmp-code are usually stored in the dport field.

Protocols
  6,17 – TCP and UDP (the main protocols that use ports)
  0- – all protocols (that is, a partitioning option that always succeeds)
Ports
  80,443 – HTTP and HTTPS
  – X Windows (X11)
  – JetDirect
  20,21 – FTP
  – Well Known Ports
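To see what a list such as 0-5,7-16,18- actually selects, the following Python sketch expands the notation into a set of integers. It is an illustration of the list syntax only, not SiLK's parser; the function name parse_int_list and the maximum argument (255 for protocols) are assumptions made for the example.

```python
def parse_int_list(spec, maximum):
    """Expand '0-5,7-16,18-' into a set of ints; an open range ends at maximum."""
    values = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            last = int(high) if high else maximum
            values.update(range(int(low), last + 1))
        else:
            values.add(int(part))
    return values


not_tcp_udp = parse_int_list("0-5,7-16,18-", maximum=255)
print(6 in not_tcp_udp, 17 in not_tcp_udp, 1 in not_tcp_udp)
# False False True
```

The same expansion shows why --protocol=0- always succeeds: with a maximum of 255 it covers every possible protocol number.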

Specified IP address or CIDR block
--saddress= --daddress= --any-address= --not-saddress= --not-daddress= --not-any-address= May specify a single: IP address CIDR block /24 SiLK has numerous ways to examine IP addresses, so the partitioning options for addresses are NOT the same as the field names. Even --saddress is not the same as the field name sIP. In addition to addresses and CIDR blocks, these options also allow you to specify wildcards, a syntax that is unique to SiLK. ©2014 Carnegie Mellon University

Specified IP addresses or CIDR blocks
--scidr= --dcidr= --any-cidr= --not-scidr= --not-dcidr= --not-any-cidr= May specify multiple: IP addresses , CIDR blocks /24, /24 mixture , /29 These options permit multiple addresses and CIDR block specifications. Wildcards are not permitted since it would be ambiguous whether the commas separated specifications, or were part of a wildcard. ©2014 Carnegie Mellon University
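The matching rule behind a comma-separated list of addresses and CIDR blocks can be sketched with Python's standard ipaddress module. The addresses below are documentation-range placeholders chosen for illustration, not values from the slides, and in_any_block is a hypothetical helper name.

```python
import ipaddress


def in_any_block(address, blocks):
    """True if address falls inside any listed address or CIDR block."""
    addr = ipaddress.ip_address(address)
    # A bare address is treated as a one-address network.
    return any(addr in ipaddress.ip_network(block) for block in blocks)


# Placeholder list: a /24, a /29, and a single address.
blocks = ["192.0.2.0/24", "198.51.100.8/29", "203.0.113.5"]
print(in_any_block("198.51.100.10", blocks))  # True
print(in_any_block("198.51.100.16", blocks))  # False
```

A --not- form of the option would simply invert the result for each flow.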

Sets of arbitrary addresses
--sipset= --dipset= --anyset= --not-sipset= --not-dipset= --not-anyset= Specifies the name of a file storing the IP set: --sipset=internalservers.set --dipset=RussianBizNtwk.set --anyset=TorNodes.set --not-dipset=whitelist.set How these IP sets are created is covered later. ©2014 Carnegie Mellon University

rwfilter output options
--pass-destination=          # file to get records that pass
--fail-destination=          # file to get records that fail
--all-destination=           # file to get all records
--print-statistics           # report recs read/pass/fail
--print-volume-statistics    # report how many recs/pkts/bytes pass/fail

What Is Going on Here? - 5

rwfilter --sensor=S0 \
    --type=in \
    --start=2009/4/21T00 \
    --end=2009/4/21T \
    --daddress= /16 \
    --print-volume-stat

     |  Recs| Packets| Bytes| Files|
Total|  1436|    2615|      |     8|
 Pass|  1436|    2615|      |      |
 Fail|     0|       0|     0|      |

This queries the repository for an 8-hour period on April 21, 2009, pulling all non-web, incoming flows, and reporting volume statistics on how many went to address block /16 (pass) or not (fail). The Total line of the output is all non-web, incoming records during that period. The Pass line is the traffic volume going to that network. The Fail line is the volume of all the other non-web traffic coming in.
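The Total, Pass, and Fail rows can be reproduced in miniature. This Python sketch tallies records, packets, and bytes for a handful of made-up flows against a destination-block test; the flow dictionaries, the 192.0.2.0/24 block, and the function name volume_statistics are all illustrative assumptions, not SiLK output or API.

```python
import ipaddress


def volume_statistics(flows, passes):
    """Tally recs/packets/bytes for Total, Pass, and Fail."""
    rows = {name: {"recs": 0, "packets": 0, "bytes": 0}
            for name in ("Total", "Pass", "Fail")}
    for flow in flows:
        outcome = "Pass" if passes(flow) else "Fail"
        # Every flow counts toward Total and toward exactly one outcome.
        for name in ("Total", outcome):
            rows[name]["recs"] += 1
            rows[name]["packets"] += flow["packets"]
            rows[name]["bytes"] += flow["bytes"]
    return rows


block = ipaddress.ip_network("192.0.2.0/24")  # placeholder destination block
flows = [
    {"dip": "192.0.2.10", "packets": 5, "bytes": 383},
    {"dip": "198.51.100.7", "packets": 1, "bytes": 65},
    {"dip": "192.0.2.99", "packets": 10, "bytes": 1085},
]
stats = volume_statistics(flows, lambda f: ipaddress.ip_address(f["dip"]) in block)
for name, row in stats.items():
    print(name, row)
```

Pass plus Fail always equals Total, which is a handy sanity check when reading real --print-volume-statistics output.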

Exercise 4: rwfilter
1) Find all traffic going outbound to external HTTPS servers on April 20. Save these flows in file https0420.rw. Only pull records captured by sensor S0.
2) How many flow records matched the criteria?

Hint: HTTPS normally uses port 443.

Output Criteria

[Figure: Repository -> rwfilter -> pass / fail]

rwfilter leaves the flows in binary (compact) form.
  --pass, --fail: direct the flows to a file or a pipe
  --all: destination for everything pulled from the repository
One output is required, but more than one can be used (the screen is not allowed for non-text data).

Other useful output:
  --print-statistics or --print-volume-statistics
  --print-filenames, --print-missing-files

A single rwfilter command can provide multiple outputs. You can specify that pass records go to one file or pipe, fail records go to a second file or pipe, and volume statistics go to a third file, all in a single command (although it’s pretty rare to get this complex). --print-filenames is useful not only for ensuring that the files you expect are being selected, but also as a progress meter, displaying the filenames as the files are opened.

What Is This? - 8

rwfilter \
    --start-date=2010/12/08 \
    --type=outweb \
    --bytes= \
    --pass=stdout \
| rwfilter \
    stdin \
    --duration= \
    --pass=long-http.rw \
    --fail=short-http.rw

Slide annotations:
  Chain two rwfilter calls.
  First call: one day’s outgoing web, but only if 100,000 or more bytes per flow.
  Second call: one minute or more -> long; less than one minute -> short.

Hints: Work backwards. How does the second command filter flows? What is the filter criterion for the first command? Could this be accomplished with a single rwfilter? No.

Answer: This pulls very large outbound HTTP flows from the repository (100,000 bytes or more) and separates them based on overall duration, long or short. This might be a very rough first attempt to separate high-volume web traffic into bursty downloads vs. persistent streams.

In short: it classifies 100,000+-byte outbound web flows by fast or slow transfer. Bursty vs. persistent?
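The reason two rwfilter calls are needed becomes clearer when the chain is written out as two successive partitions. In this Python sketch the thresholds (100,000 bytes, 60 seconds) are taken from the slide annotations, while the flow dictionaries and the partition helper are hypothetical.

```python
def partition(flows, test):
    """Split flows into (passed, failed) lists, like --pass and --fail."""
    passed, failed = [], []
    for flow in flows:
        (passed if test(flow) else failed).append(flow)
    return passed, failed


flows = [
    {"id": "a", "bytes": 250_000, "duration": 120.0},
    {"id": "b", "bytes": 400_000, "duration": 3.5},
    {"id": "c", "bytes": 900, "duration": 300.0},
]

# First filter: keep only large flows; small flows are discarded entirely.
large, _ = partition(flows, lambda f: f["bytes"] >= 100_000)
# Second filter: split the large flows by duration.
long_http, short_http = partition(large, lambda f: f["duration"] >= 60)

print([f["id"] for f in long_http])   # ['a']
print([f["id"] for f in short_http])  # ['b']
```

Flow c never reaches either output. A single filter combining both tests would put it in the fail stream alongside b, which is why one call cannot produce the same two files.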

Example Typos

Typo                                        Problem
--port=  --destport=  --sip= or --dip=      No such keywords
--saddress=danset.set                       Needs addr, not filename
--start-date=2006/06/12--end-date           Space needed
--start-date = 2006/06/12                   No spaces around equals
start-date=2006/06/12                       Need dashes
---start-date=2006/06/12                    Only two dashes
--start-date=2005/11/04:06:00:00
--end-date=2005/05/21:17:59:59              Only down to hour

SiLK Commandments

  Thou shalt use sets instead of using several rwfilter commands to pull data for multiple IP addresses.
  Thou shalt store intermediate data on local disks, not network disks.
  Thou shalt make initial pulls from the repository, store the results in a file, and work on the file from then on. The repository is slower than processing a single file.
  Thou shalt work in binary for as long as possible. ASCII representations are much larger and slower than the binary representations of SiLK data.
  Thou shalt filter no more than a week of traffic at a time. The filter runs for an excessive length of time otherwise.
  Thou shalt only run a few rwfilter commands at once.
  Thou shalt specify the type of traffic to filter. Defaults work in mysterious ways.
  Thou shalt appropriately label all output.
  Thou shalt check that SiLK does not provide a feature before building your own.

Lesson II.1 Summary We learned how to display the fields of interest from flow records. Files are chosen from the repository with selection options. Records are chosen from those files with partitioning options. There are lots of ways to partition on IP addresses.

Next Lesson In lesson II.2 we will learn to reduce large numbers of flow records to meaningful information and statistics.

Lesson II.2 Analysis Tools and Categorization

The learner will be able to create a time series of given flow records. The learner will be able to determine all the different values of a given field for given flow records and determine the traffic volumes for those field values. The learner will be able to display the top/bottom n values of a given field as measured by some measure of volume.

Basic SiLK Counting Tools: rwcount, rwstats, rwuniq
“Count [volume] by [key field] and print [summary]”
  basic bandwidth study: “Count bytes by hour and print the results.”
  top 10 talkers list: “Count bytes by source IP and print the 10 highest IPs.”
  user profile: “Count records by dIP-dPort pair and print all the pairs.”
  potential scanners: “Count unique dIPs by sIP and print the sources that contacted more than 100 destinations.”

Many analysis tasks require us to group records together by a key field or fields, calculate a volume for each group, and report or summarize the results. The general request is “Count [volume] by [key field] and print [summary].” Here are some examples.

Suppose we have a day’s worth of all outgoing traffic records from our network, and we want to perform a basic bandwidth study. This can translate to, “Count bytes by hour and print the results.”

Suppose further that one hour shows a large spike in traffic, and we want to know which internal IP addresses are associated with the spike by listing the “top 10 talkers” for that hour. This translates to, “Count bytes by source IP and print the 10 highest-volume IPs.”

In another example, an internal IP is reported to be “behaving strangely,” and we want to summarize all of its activity in a given hour. To do this, we might start with a list of whom it contacted and what kind of interactions it made. This can translate to, “Count records by destination IP and destination port pair, and print out all of the results.”

These kinds of queries can get pretty complicated. For example: in one hour’s worth of traffic, a virus-infected host was seen scanning over 100 different destination IP addresses. Were any other machines on our network behaving similarly? This can translate to, “Count unique destination IPs by source IP, and print the sources that contacted more than 100 destinations.”

While these analyses seem complicated at first glance, they are easy to implement with counting tools.
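The “count [volume] by [key field] and print [summary]” pattern maps directly onto a few lines of code. As a sketch of the top-talkers request, the Python below counts bytes by source IP and prints the highest-volume sources. The flow dictionaries and addresses are placeholders, and top_talkers is a hypothetical name; in SiLK this job belongs to the counting tools introduced in this lesson.

```python
from collections import Counter


def top_talkers(flows, n=10):
    """Count bytes by source IP and return the n highest-volume sources."""
    volume = Counter()
    for flow in flows:
        volume[flow["sip"]] += flow["bytes"]
    return volume.most_common(n)


flows = [
    {"sip": "192.0.2.1", "bytes": 500},
    {"sip": "192.0.2.2", "bytes": 1200},
    {"sip": "192.0.2.1", "bytes": 900},
    {"sip": "192.0.2.3", "bytes": 100},
]
print(top_talkers(flows, n=2))
# [('192.0.2.1', 1400), ('192.0.2.2', 1200)]
```

Swapping the key (sip) or the volume (bytes) gives the other requests in the list, which is the point of the general template.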

Categorization: Bins

For motor vehicle trips, we could bin trip records by
  vehicle style (sedan, coupe, SUV, pickup, van)
  highway or city trip
  personal or business trip
We could measure the trips and aggregate in bins
  total miles
  fuel consumption
  oil consumption
  pollutant emission

[Figure: trip miles flow into three bins labeled sedan, coupe, and pickup; each bin accumulates total miles.]

In this illustration, we are binning by vehicle style. All three tools (rwcount, rwuniq, rwstats) use bins.

Bins

For flows we could bin by
  address or address block
  port
  protocol
  time period
We could measure the flows and aggregate in bins
  count of flow records, packets, bytes
  count of distinct values of other fields, e.g., addr
  earliest sTime, latest eTime

To bin by address block, we’ll need some help from rwnetmask (not covered in this class). So each bin aggregates measures (records, packets, bytes, etc.) for a different address, or a different address block, or a different port, or a different protocol, or a different time period.

Bins

[Figure: the value from each flow record (e.g., packets) is a packet count dropped into one of three bins, TCP, UDP, or ICMP, chosen by the bin key field (e.g., protocol). The aggregate value in each bin is total packets.]

In this example, a single field (protocol) is used as the bin key. The aggregate value field is the packet count.
  If the flow record under examination indicates a TCP flow (protocol 6), then the packet count in that record is added to the TCP bin.
  If the flow record under examination indicates a UDP flow (protocol 17), then the packet count in that record is added to the UDP bin.
  If the flow record under examination indicates an ICMP flow (protocol 1), then the packet count in that record is added to the ICMP bin.
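The binning described above can be written in a few lines. In this Python sketch, protocol is the bin key and the packet count is the aggregate value; the flow dictionaries are made-up examples and bin_packets_by_protocol is a hypothetical name.

```python
from collections import defaultdict

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def bin_packets_by_protocol(flows):
    """Add each flow's packet count to the bin for its protocol."""
    bins = defaultdict(int)
    for flow in flows:
        bins[flow["protocol"]] += flow["packets"]
    return dict(bins)


flows = [
    {"protocol": 6, "packets": 5},
    {"protocol": 17, "packets": 1},
    {"protocol": 6, "packets": 10},
    {"protocol": 1, "packets": 276},
]
for protocol, total in sorted(bin_packets_by_protocol(flows).items()):
    print(PROTOCOL_NAMES[protocol], total)
# ICMP 276
# TCP 15
# UDP 1
```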

Basic SiLK Counting Tools: rwcount, rwstats, rwuniq
  rwcount: count volume across time periods
  rwstats: count volume across IP, port, or protocol and create descriptive statistics
  rwuniq: count volume across any combination of SiLK fields

“Key field” = SiLK fields defining bins
“Volume” = {Records, Bytes, Packets} and a few others (the measure, or aggregate value)
Each tool reads raw binary flow records as input.

There are several basic SiLK counting tools. Each one is suited for different combinations of key fields, volumes, or summaries.
  rwcount is used to count volume across time.
  rwstats is used for counting and summarizing volume across IP, port, or protocol.
  rwuniq is the most general of the counting tools, and can be used to count volumes across any combination of key fields.

For flow records, volume is usually the number of records, the number of bytes, or the number of packets. rwuniq also supports some other volumes we will discuss later. Key fields are all types of SiLK fields, such as port, protocol, time, or IP address. Each tool takes raw binary flow records as input, specified as a file name or piped in from the output of a previous command such as rwfilter.

Let’s look at each of these counting tools in more detail.

rwcount count records, bytes, and packets by time and display results
rwcount --bin-size=300 fast, easy way of summarizing volumes as a time series great for simple bandwidth studies easy to take output and make a graph with graphing S/W rwcount is used to produce a count of records, bytes and packets over time. It provides a fast, easy way of summarizing volumes as a time series, and it is great for simple bandwidth studies. Let’s look at rwcount’s counting options in more detail. ©2014 Carnegie Mellon University

Time Bins

When binning by time, you must specify the period of time for each bin. This is called the bin-size. It’s the size of the bin’s opening, not the volume of the container.

[Figure: a time axis divided into Period1, Period2, and Period3, with bin-size marking the width of one period.]

The time period is called the --bin-size in rwcount, but it is the --bin-time in rwstats and rwuniq.

rwcount

The bin key is always time. You choose the period.
The aggregate measures are chosen for you. They are flows (records), bytes, and packets.

rwfilter --sensor=S0 --start=2009/4/21 \
    --type=in --proto=1 \
    --pass=stdout \
| rwcount --bin-size=3600

               Date|Records| Bytes|Packets|
    /04/21T13:00:00|  10.00|      |  41.00|
2009/04/21T14:00:00|  29.00|      |       |
2009/04/21T15:00:00|  22.00|      |  47.00|
2009/04/21T16:00:00|  10.00|      |  23.00|
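As a rough model of time binning, the Python sketch below assigns each flow to the bin containing its start time and totals records, bytes, and packets per bin. This is a deliberate simplification: the rwcount outputs in this lesson show fractional record counts, which suggests volume can be apportioned across the bins a flow spans, and the sketch does not attempt that. Start times are assumed to be epoch seconds in UTC, and all names here are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def count_by_time(flows, bin_size):
    """Total records, bytes, and packets per time bin of bin_size seconds."""
    bins = defaultdict(lambda: {"records": 0, "bytes": 0, "packets": 0})
    for flow in flows:
        # Round the start time down to the beginning of its bin.
        bin_start = flow["stime"] - flow["stime"] % bin_size
        bins[bin_start]["records"] += 1
        bins[bin_start]["bytes"] += flow["bytes"]
        bins[bin_start]["packets"] += flow["packets"]
    return dict(sorted(bins.items()))


def label(epoch):
    """Format an epoch timestamp the way the slides print bin dates."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y/%m/%dT%H:%M:%S")


def at(hour, minute):
    """Epoch seconds for a time on 2009/04/21 UTC (illustrative helper)."""
    return int(datetime(2009, 4, 21, hour, minute, tzinfo=timezone.utc).timestamp())


flows = [
    {"stime": at(13, 5), "bytes": 100, "packets": 2},
    {"stime": at(13, 50), "bytes": 200, "packets": 3},
    {"stime": at(14, 1), "bytes": 50, "packets": 1},
]
for start, totals in count_by_time(flows, bin_size=60 * 60).items():
    print(label(start), totals)
```

Writing the bin size as 60 * 60 mirrors the shell arithmetic trick of letting the computer work out the number of seconds.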

What Is This? — 9 rwcount MSSP.rw --bin-size=3600
Date| Records| Bytes| Packets| 2010/12/08T00:00:00| | | | 2010/12/08T01:00:00| | | | 2010/12/08T02:00:00| | | | 2010/12/08T03:00:00| | | | 2010/12/08T04:00:00| | | | 2010/12/08T05:00:00| | | | 2010/12/08T06:00:00| | | | ••• Here is an example of the output from a bandwidth study. For our daily traffic file, we invoke rwcount with a bin size of 3600 seconds, which is equal to one hour. Time bins are rounded to each hour, and the counts are displayed on the right. We can see the usual work day pattern as the counts are small during night hours, and swell to higher numbers during the day and into evening. There also appears to be a spike in bytes at five AM. Let’s see if we can find the IP addresses responsible for this spike. For that, we will require rwstats. ©2014 Carnegie Mellon University

Demo: rwcount

The shell can help with the arithmetic: $((24*60*60)). You also can find common periods in the Quick Reference Guide.

Time series for all outgoing traffic on sensor S0:

rwfilter --start=2009/04/21 \
    --end=2009/04/23 \
    --sensor=S \
    --type=out,outweb \
    --proto= \
    --pass=stdout \
| rwcount --bin-size=$((24*60*60))

24 hours times 60 minutes per hour times 60 seconds per minute gives the number of seconds in a day. It is necessary to use double parentheses for this feature of the Bash shell.

Exercise 5: rwcount

Produce a time series with 30-minute intervals, analyzing incoming ICMP traffic collected at sensor S0 on April 20.

Hint: ICMP is protocol 1.

Exercise 5: rwcount solution

rwfilter --start=2009/4/20 \
    --sensor=S0 --type=in \
    --protocol=1 \
    --pass=stdout \
  | rwcount --bin-size=1800

               Date| Records| Bytes| Packets|
    /04/20T13:30:00|    5.05|      |   26.48|
2009/04/20T14:00:00|   21.92|      |   91.35|
2009/04/20T14:30:00|    8.03|      |   60.17|
2009/04/20T15:00:00|   14.58|      |   90.54|
2009/04/20T15:30:00|   17.33|      |        |
2009/04/20T16:00:00|   13.69|      |   95.04|
2009/04/20T16:30:00|   12.89|      |   85.09|
2009/04/20T17:00:00|   11.50|      |   85.59|
2009/04/20T17:30:00|    7.00|      |   45.07|
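Fractional record counts such as 5.05 suggest that a flow spanning more than one bin has its volume divided among those bins. The sketch below shows one way such a split could work, in proportion to the time the flow was active in each bin. It is an illustration under that assumption, not rwcount's code; the scheme actually applied is controlled by rwcount's --load-scheme option (see “man rwcount”). The function name and arguments are hypothetical.

```python
def spread(start, end, volume, bin_size):
    """Split `volume` across time bins in proportion to the time the flow
    spent in each bin. `start` and `end` are epoch seconds.
    Returns {bin_start_epoch: share_of_volume}."""
    if end <= start:
        # Zero-duration flow: the whole volume lands in a single bin.
        return {start - start % bin_size: float(volume)}
    duration = end - start
    shares = {}
    b = start - start % bin_size  # first bin the flow touches
    while b < end:
        overlap = min(end, b + bin_size) - max(start, b)
        shares[b] = volume * overlap / duration
        b += bin_size
    return shares

# One record lasting 200 seconds, straddling the boundary between two
# 30-minute bins: half of the record is credited to each bin.
print(spread(1700, 1900, 1, 1800))
```

Summing such shares over many flows yields non-integer record counts per bin, as in the output above.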

rwuniq
rwuniq will display all bins for a particular field or fields.
Output is normally unsorted. --sort-output causes sorting by the key (bin).

Calling rwuniq

rwuniq --fields=KEY --value=VOLUME

Choose one or several key fields.
Aggregate volume count: records, bytes, or packets.
Standard output formatting options (see “man rwuniq”).
Apply thresholds to bins before outputting: --bytes, --packets, --flows, --sip-distinct, --dip-distinct. Specify a minimum aggregate value or a range.
--sort-output sorts by key (rwstats sorts by value).

The --fields option takes a list of one or more keys. The keys can be any valid specification of SiLK fields, for example a list of names such as sIP, sPort, and sTime. rwuniq and rwstats also support a --bin-time option, in seconds, when one or more key fields is a time field. Fields can also be specified numerically, for example fields one to five for source address, destination address, source port, destination port, and protocol.

rwuniq counts the usual volumes of flows, bytes, and packets, but it also has a few added options. Use the sip-distinct or dip-distinct flags to count unique source addresses or unique destination addresses by key field combination. Use the stime or etime fields to include a minimum start time or maximum end time, respectively, for each key field combination. rwuniq allows you to choose any combination of these volumes for printing. Use the --all-counts option to print everything.

If input is not pre-sorted, there is no guarantee that rwuniq will print key fields in sort order by default. Use the --sort-output flag to ensure this is the case. Note that --sort-output sorts by key field combination, not volume, so it is NOT an option for obtaining top-N lists. rwuniq does not support printing top or bottom N lists. However, any volume can be restricted to print only key combinations whose counts meet a minimum threshold or fall within a range.
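The bin, aggregate, threshold, and sort-by-key steps described here can be sketched in a few lines of Python. The record layout and values are hypothetical, and the function illustrates the idea only; it is not rwuniq's implementation.

```python
from collections import defaultdict

# Hypothetical records: (sIP, sPort, bytes, packets). Values are made up.
records = [
    ("10.0.0.1", 80, 500, 5),
    ("10.0.0.2", 443, 40, 1),
    ("10.0.0.1", 80, 700, 7),
    ("10.0.0.1", 22, 60, 2),
]

def uniq(records, min_bytes=0):
    """Bin by (sIP, sPort), sum records/bytes/packets per bin, keep bins whose
    byte total is at least min_bytes, and return them sorted by key."""
    bins = defaultdict(lambda: {"records": 0, "bytes": 0, "packets": 0})
    for sip, sport, nbytes, npkts in records:
        b = bins[(sip, sport)]
        b["records"] += 1
        b["bytes"] += nbytes
        b["packets"] += npkts
    kept = {k: v for k, v in bins.items() if v["bytes"] >= min_bytes}
    # Sorted by key, not by volume; addresses here sort as plain strings.
    return sorted(kept.items())

for key, totals in uniq(records, min_bytes=100):
    print(key, totals)
```

Because the result is ordered by key, the largest bins can appear anywhere in the list, which is why a top-N question calls for sorting by value instead.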

What Is This? - 10

rwfilter outtraffic.rw \
    --stime=2010/12/08:18:00: /12/08:18:59: \
    --saddress= \
    --pass=stdout \
  | rwuniq --fields=dip,sport \
    --all-counts \
    --sort-output

dIP|sPort| Bytes|Packets| Records|     sTime-Earliest|       eTime-Latest|
   |   80|      |       |        |2010/12/08T18:42:51|2010/12/08T18:58:49|
   |   80|      |       |        |2010/12/08T18:53:59|2010/12/08T19:01:47|
   |   80|      |       |        |2010/12/08T18:29:11|2010/12/08T18:42:51|
   |   80|      |       |        |2010/12/08T18:06:36|2010/12/08T18:32:33|
   |   80|      |       |        |2010/12/08T18:43:30|2010/12/08T18:43:30|
   |   80|      |       |        |2010/12/08T18:08:55|2010/12/08T18:32:25|
   |  443|      |       |        |2010/12/08T18:06:57|2010/12/08T18:51:11|
   |   80|      |       |        |2010/12/08T18:25:22|2010/12/08T19:21:34|
   |   80|      |       |        |2010/12/08T18:08:59|2010/12/08T18:10:09|
   |   80|      |       |        |2010/12/08T18:19:48|2010/12/08T18:26:36|
   |   80|      |       |        |2010/12/08T18:20:03|2010/12/08T19:01:30|
   |   80|      |       |        |2010/12/08T18:18:43|2010/12/08T18:46:55|
   | 1024|      |       |        |2010/12/08T18:50:40|2010/12/08T18:50:40|
   |   80|      |       |        |2010/12/08T18:44:42|2010/12/08T18:44:47|
   |   80|      |       |        |2010/12/08T18:47:50|2010/12/08T18:58:53|

Here is an example of a user profile for the largest offender in the bandwidth study. We use rwuniq with the key fields of destination IP and source port, print all counts for each dIP-sPort pair, and sort the output by key. Sorting the output by the key (bin) makes it obvious that the same IP address ( ) appears twice with two different ports.

What Is This? - 11

rwuniq outtraffic.rw \
    --fields=dip \
    --values=sip-distinct,records,bytes \
    --sip-distinct=400- \
    --sort-output

dIP|sIP-Distin| Bytes| Records|

In this example, we are looking for potential scanners in one hour’s worth of anomalous traffic. We run rwuniq using destination IP as a key field, and specify the sip-distinct flag with a threshold of 400 or more unique addresses. We also set the records and bytes values. Each of these addresses has contacted over 400 destinations. But the last two addresses have record and byte counts that are relatively small compared to the number of addresses contacted, a stronger indication that they might be scanning.
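Counting distinct addresses per key, then keeping only keys at or above a minimum, can be sketched as follows. The function and the sample pairs are hypothetical; the sketch shows the counting idea behind a distinct-address threshold and makes no claim about how rwuniq stores its bins.

```python
from collections import defaultdict

def distinct_peers(pairs, minimum=1):
    """pairs: iterable of (key_ip, peer_ip).
    Count the distinct peers seen for each key and keep only keys with at
    least `minimum` distinct peers."""
    peers = defaultdict(set)
    for key, peer in pairs:
        peers[key].add(peer)  # a set ignores repeated peers
    return {k: len(v) for k, v in peers.items() if len(v) >= minimum}

# Made-up pairs: one key sees three different peers, the other sees one peer twice.
pairs = [
    ("192.0.2.1", "198.51.100.1"),
    ("192.0.2.1", "198.51.100.2"),
    ("192.0.2.1", "198.51.100.3"),
    ("192.0.2.9", "198.51.100.1"),
    ("192.0.2.9", "198.51.100.1"),
]
print(distinct_peers(pairs, minimum=2))
```

Comparing the distinct count with the record and byte totals for the same key is what separates a busy server from a host that touches many addresses with very little traffic.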

Exercise 6: rwuniq
For outgoing flows from S0 on 2009/04/20, write and execute the rwfilter piped to rwuniq commands to list how many TCP flows (records) there were with each different number of packets. Display sorted by the number of packets. Are there any odd results you can explain?
HINT: TCP is protocol 6 (proto=6).
HINT: The TCP 3-way handshake requires 3 packets.

Exercise 6: rwuniq Solution

rwfilter --type=out,outweb \
    --sensor=S \
    --start=2009/4/ \
    --proto= \
    --pass=stdout \
  | rwuniq --fields=packets --sort-output

packets| Records|
      1|        |
      2|        |
      3|        |
      4|        |
      5|        |
      6|        |
      7|        |
      8|        |
      9|        |
     10|        |
     11|        |
     12|        |
     13|        |
     14|        |
     15|        |
     16|        |
     17|        |
     18|        |
     19|        |

What can you say about flows with 1, 2, and 3 packets? It seems as though 4 packets is an oddity. Do you have an explanation? What can be accomplished with 4 TCP packets? There are, of course, exceptions.

rwstats Like rwuniq, rwstats displays bins for a field or fields, but only displays the top N or bottom N bins. The top/bottom N is determined by some traffic volume measurement, such as flows, packets, or bytes. The bins are displayed sorted by the measurement. It also provides percentages.

Calling rwstats

rwstats --overall-stats
    Descriptive statistics on byte and packet counts by record.
    See “man rwstats” for details.

rwstats --fields=KEY --value=VOLUME --count=N or --threshold=N or --percentage=N [--top or --bottom]
    Choose one or two key fields.
    Count one of records, bytes, or packets.
    Great for Top-N lists and count thresholds.
    Standard output formatting options (see “man rwstats”).

There are two different ways to call rwstats. Use the --overall-stats option to print descriptive statistics on byte and packet counts by record. This option is useful as a general summary of record volumes, and can help in determining which volumes are unusually small or large. We aren’t going to go into detail on this option; see “man rwstats” for specifics.

rwstats can also be called with key and volume options. Choose one or, in a few cases, two key fields; choose one of records, bytes, or packets to count; and choose the type of summary to print to the screen. rwstats is great for producing Top-N lists and for thresholding based on counts. Let’s look at rwstats’ counting options in more detail. (For standard output formatting options, such as delimiters and IP formatting options, see “man rwstats”.)

rwstats has three summary options. Use the count flag to list the top or bottom N keys by volume. Use the threshold flag to print all keys whose counts are either greater or less than a threshold (for example, “all destination ports with more than bytes of traffic”). Finally, use the percentage flag to print all keys whose counts make up at least or at most N percent of all traffic (for example, “all source IP addresses that account for at least 5% of all records”). Use the --top or --bottom flags to indicate the direction of the count or threshold: top-N or bottom-N for counts, or volume greater or less than N for thresholds and percentages.
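The three summary options can be compared side by side in a short Python sketch. The byte totals per destination port are invented, the function names are hypothetical, and treating the threshold and percentage boundaries as inclusive is an assumption of this sketch rather than a statement about rwstats.

```python
from collections import Counter

# Hypothetical byte totals per destination port. Values are made up.
volume = Counter({80: 9000, 443: 700, 53: 200, 22: 100})

def top_count(vol, n):
    """Top N keys by volume (the --count idea)."""
    return vol.most_common(n)

def over_threshold(vol, t):
    """All keys whose volume is at least t (the --threshold idea)."""
    return [(k, v) for k, v in vol.most_common() if v >= t]

def over_percentage(vol, pct):
    """All keys that account for at least pct percent of the total
    (the --percentage idea)."""
    total = sum(vol.values())
    return [(k, v) for k, v in vol.most_common() if 100.0 * v / total >= pct]

print(top_count(volume, 2))
print(over_threshold(volume, 200))
print(over_percentage(volume, 5))
```

All three return bins ordered by volume, largest first; a bottom-N variant would reverse the ordering and the comparisons.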

What Is This? - 12

rwfilter outtraffic.rw \
    --stime=2010/12/08T18:00: /12/08T18:59:59 \
    --pass=stdout \
  | rwstats --fields=sip \
    --values=bytes \
    --count=10

INPUT: Records for 1104 Bins and Total Bytes
OUTPUT: Top 10 Bins by Bytes
sIP| Bytes| %Bytes| cumul_%|

Here is our top 10 list of source IPs for the anomalous hour we found in the bandwidth study. We pull out the hour in question and call rwstats using source IP, bytes, and a count of 10. Since rwstats reports both a count and a cumulative percentage, we can easily see that the top ten addresses account for 92.0 percent of the traffic for that hour, with 41.5 percent attributed to the top address alone.

To double-check that the spike is not the result of many different sources sending small amounts of traffic, compare the top 10 list for the anomalous hour with the top 10 list for the previous hour. Most of the addresses in the list are the same, but the top address jumped from accounting for only 0.32 percent of traffic in the previous hour to 43 percent of traffic in this hour. This address definitely looks like the culprit. We can use rwuniq to get a user profile.

Exercise 7: rwstats
What are the top 10 incoming protocols on April 20, 2009, collected on sensor S0?
HINT: Incoming flows have type in or inweb.

Exercise 7: rwstats solution

rwfilter --sensor=S \
    --type=in,inweb \
    --start=2009/4/ \
    --prot=0- --pass=stdout \
  | rwstats --fields=protocol --value=records --count=10

INPUT: 5512 Records for 3 Bins and 5512 Total Records
OUTPUT: Top 10 Bins by Records
pro| Records| %Records| cumul_%|
  6|    4476|         |        |
 17|     896|         |        |
  1|     140|         |        |
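The percentage and cumulative-percentage columns follow directly from the record counts. A minimal sketch, using the three protocol counts shown in this solution (4476, 896, and 140 out of 5512 records); the function name and output formatting are hypothetical.

```python
# (protocol, records) pairs taken from the solution output.
counts = [(6, 4476), (17, 896), (1, 140)]

def with_percentages(rows):
    """Return (key, count, percent_of_total, cumulative_percent) per row,
    ordered from the largest count to the smallest."""
    total = sum(n for _, n in rows)
    out, running = [], 0
    for key, n in sorted(rows, key=lambda r: r[1], reverse=True):
        running += n
        out.append((key, n, 100.0 * n / total, 100.0 * running / total))
    return out

for key, n, pct, cum in with_percentages(counts):
    print(f"{key:>3}|{n:>8}|{pct:>10.6f}|{cum:>10.6f}|")
```

With only three protocols present, the cumulative column reaches 100 percent on the last row, so asking for the top 10 returns every bin.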

Exercise 8: rwstats
What are the top 9 inside hosts according to how many outside hosts they communicate with on April 20, 2009, collected on sensor S0?
HINT: Use --value=distinct:dip

Exercise 8: rwstats solution

rwfilter --start-date=2009/4/20 --sensor=S0 \
    --type=out,outweb \
    --proto=0- --pass=stdout \
  | rwstats --fields=sip --value=distinct:dip --count=9

INPUT: 5001 Records for 14 Bins
OUTPUT: Top 9 Bins by dIP-Distinct
sIP|dIP-Distin|%dIP-Disti| cumul_%|
   |        17|         ?|       ?|
   |        11|         ?|       ?|
   |        11|         ?|       ?|
   |         9|         ?|       ?|
   |         5|         ?|       ?|
   |         3|         ?|       ?|
   |         3|         ?|       ?|
   |         3|         ?|       ?|
   |         2|         ?|       ?|

The percent-of-total and cumulative-percentage columns contain question marks when the primary aggregate value is not one of Bytes, Packets, or Records. The --no-percents option eliminates these columns, which is especially useful when you know they’ll contain only question marks. Remember to “export SILK_IPV6_POLICY=asv4” if you’re not expecting any IPv6 addresses in the output.

rwuniq vs. rwstats - 1

Both: Bin by key.
    rwuniq: all bins, except per thresholds
    rwstats in top/bottom mode: --top or --bottom bins
Both: Default aggregate value is flows (records).
    rwuniq: --sort-output by key, otherwise unsorted
    rwstats in top/bottom mode: sorted by primary aggregate value
Both: Choose which bins have aggregate values significant enough to output.
    rwuniq: thresholds or ranges: --bytes, --packets, --flows, --sip-distinct, --dip-distinct
    rwstats in top/bottom mode: --count, --threshold, --percentage

--value=flows is a synonym for --value=records in both tools. Both tools have a legacy option --flows, but there is no such option as --records.

rwuniq vs. rwstats - 2

Both: Show volume aggregate value[s].
    rwuniq: --all-counts (bytes, pkts, flows, earliest sTime, and latest eTime)
    rwstats in top/bottom mode: --no-percents (good when primary aggregate isn’t Bytes, Packets, or Records)
Both: --bin-time to adjust sTime and eTime
Both: --presorted-input (omit when value includes Distinct fields, even if input is sorted)
Both: --values=Records, Packets, Bytes, sIP-Distinct, dIP-Distinct, Distinct:KEY-FIELD (KEY-FIELD can’t also be a key field in --fields)
    rwuniq: --values=sTime-Earliest, eTime-Latest

Lesson II.2 Summary We learned how to categorize flow records by time or some other field. Display a time series of flows with rwcount. Display all categories (bins) with rwuniq. Display the top or bottom bins, according to some measurement, with rwstats.

Lesson II.3 IP Sets

Given a collection of IP addresses and CIDR blocks, the learner will be able to create an IP Set SiLK-file. Given an IP Set, the learner will be able to display the contents and characteristics of the set. Given an IP Set, the learner will be able to partition flow records based on the presence/absence of IP addresses in the set. Given a sequence of flow records, the learner will be able to extract IP addresses from the records and create an IP Set.

Sets
[Figure: Venn diagram of species in two overlapping circles, red and green; visible labels include Black bears, Trout, Fleas, and Tulips.]
There are six species in the red circle, six species in the green circle, and three species that are in both circles. Four species are outside of both circles. No species appears more than once in a circle (set). Sets never contain duplicates.

Address Lists, Blacklists, Whitelists

Too many addresses for the command line?
    spam block list
    malicious websites
    arbitrary list of any type of addresses
Create an IP set!
    From individual IP addresses in dotted decimal or integer format
    From CIDR blocks, e.g., /16
    From flow records
Use it directly within your rwfilter commands.
    --sipset, --dipset, --anyset
    --not-sipset, --not-dipset, --not-anyset

Sometimes you need to look for traffic from a large number of addresses, or you use the same set of addresses many times. This can happen when working with a spam block list, a list of malicious websites, or simply a list of servers within your domain. In these situations, entering the addresses on the command line is cumbersome. The SiLK toolset lets you group these addresses into IP Sets. rwfilter can use an IP Set directly to match the source address, the destination address, or either one (the anyset options).

Set Tools

rwsetbuild: Create a set from text.
rwsetcat: Display an IP set as text.
rwset: Create sets from binary flow records.
rwsetmember: Test if an address is in given IP sets.
rwsettool: Perform set algebra (intersection, union, set difference) on multiple IP sets.

There are five tools commonly used to work with IP Sets. The rwset command creates sets directly from binary flow files. The rwsetbuild command builds sets from text files of IP addresses, CIDR blocks, and, optionally, ranges of addresses. The rwsetcat command displays the contents of an IP set to a text file or the screen. rwsetmember quickly tests whether an IP address is present in an IP set. rwsettool is a powerful command that provides many functions, including the ability to create unions, intersections, and differences of IP sets. It also allows you to randomly sample IPs from a given IP set.
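The operations these tools provide map onto ordinary set operations. The sketch below uses Python's standard ipaddress module and built-in sets to mimic the ideas of building a set from text, testing membership, and set algebra. The helper name and the sample addresses (documentation ranges) are hypothetical, and the SiLK set file format is not modeled.

```python
import ipaddress

def build_set(lines):
    """Expand text lines holding addresses or CIDR blocks into a set of
    address objects. Text after '#' is treated as a comment. Intended for
    small blocks only, since every address is stored individually."""
    members = set()
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if not text:
            continue
        members.update(ipaddress.ip_network(text, strict=False))
    return members

a = build_set(["192.0.2.0/30", "198.51.100.7"])
b = build_set(["192.0.2.2", "192.0.2.3", "203.0.113.9  # a single host"])

print(ipaddress.ip_address("192.0.2.1") in a)   # membership test
print(sorted(str(ip) for ip in a & b))          # intersection
print(len(a | b), len(a - b))                   # union size, difference size
```

Listing the same address twice in the input changes nothing, because a set holds each member at most once.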

Creating a Set from a text file

Start with a text file containing IP addresses.
    IPv4 in dotted quad notation
    IPv6 in canonical format (e.g., 2001:db8::f00)
Run rwsetbuild to make the conversion from text to set.

$ cat sample.set.txt
$
$ rwsetbuild sample.set.txt sample.set
$ ls -l sample*
-rw-r--r--. 1 pnk pnk 124 Jan 7 17:22 sample.set
-rw-r--r--. 1 pnk pnk 32 Jan 7 17:21 sample.set.txt
$ rwsetcat sample.set

Exercise 9: Create a set file
In Exercise 1 you created the text file adr1.txt. It should contain two IPv4 addresses in dotted quad notation. Create a set file from it.
HINT: Use rwsetbuild <text file> <set file>
HINT: If you run it twice, rwsetbuild will not overwrite the set file. You’ll have to delete it first.

Exercise 9: Create a set file solution

$ cd ~/ex1
$ rwsetbuild adr1.txt adr1.set
$ ls -l adr1*
-rw-rw-r-- 1 demo demo 133 Dec 22 15:03 adr1.set
-rw-rw-r-- 1 demo demo 24 Dec 22 14:32 adr1.txt
$ rwsetcat adr1.set

Create a Set from Private IP CIDR Blocks

$ cd $HOME
$ cp /data/scripts/private_example.set.txt .    # copy file
$ cat private_example.set.txt                   # display file
/8     # RFC 1918 private
/12    # RFC 1918 private
/24    # documentation (example.com or example.net)
/16    # RFC 1918 private
/24    # documentation (example.com or example.net)
/24    # documentation (example.com or example.net)
$ rwsetbuild private_example.set.txt private_example.set
$ rwsetcat private_example.set | head -n
$ rwsetcat --count-ips private_example.set

Find Addresses from Traffic NOT in the IP Set

rwfilter --start=2009/4/20 \
    --end=2009/4/24 \
    --type=in,inweb \
    --not-sipset=private_example.set \
    --pass=stdout \
  | rwset --sip-file=outside_not_private.set
rwsetcat --count-ips outside_not_private.set
6237

The bluered repository contains traffic from an air-gapped network, mostly using private addresses, so it is unusual.
Is there a way to produce the same set without --not-sipset? Use --sipset with --fail instead of --pass. But that solution doesn’t allow you to combine --sipset with other partitioning options, like --dPort.
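The equivalence between “not in the set, pass” and “in the set, fail” can be checked with a small Python sketch. The partition helper and the records are hypothetical, and the three RFC 1918 ranges are written out here as an assumption about what a private-address set could hold; the sketch only mirrors the pass/fail idea, not rwfilter itself.

```python
import ipaddress

# RFC 1918 private ranges (an assumption about the set's contents).
PRIVATE = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def in_set(addr, networks=PRIVATE):
    """True if addr falls inside any of the networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)

def partition(records, test):
    """Split records into (passed, failed) lists according to test."""
    passed, failed = [], []
    for rec in records:
        (passed if test(rec) else failed).append(rec)
    return passed, failed

# Made-up (sIP, dIP) records.
records = [
    ("10.1.2.3", "192.0.2.5"),
    ("198.51.100.7", "10.0.0.9"),
    ("203.0.113.4", "10.0.0.9"),
]

# The --not-sipset with --pass idea: keep records whose source is NOT in the set.
not_private, _ = partition(records, lambda r: not in_set(r[0]))
# The --sipset with --fail idea: keep records that FAIL the membership test.
_, failed = partition(records, lambda r: in_set(r[0]))

# Collecting source addresses from the kept records, as a set.
outside = {sip for sip, _ in not_private}
print(not_private == failed, sorted(outside))
```

The two results match only while the set membership test is the sole condition; once another partitioning test is added, a record can fail for reasons other than its source address, which is the limitation noted above.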

Examine the IP Set

$ rwsetcat outside_not_private.set | less
$ rwsetcat --cidr-blocks outside_not_private.set | less
$ rwsetcat --network-structure=8 outside_not_private.set
/8| 6237
$ rwsetcat --network-structure=16 outside_not_private.set \
    | wc -l
2
$ rwsetcat --network-structure=16 outside_not_private.set
/16|
/16| 305
$ rwsetcat --network-structure=24 outside_not_private.set \
    | wc -l
264

Does the number 6237 look familiar? It is the number of flow records that produced this set. What does it mean that these numbers match? It means that every record had a different source address.
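Counting set members per enclosing block, which is the kind of summary shown here, can be sketched with the standard ipaddress module. The function name and the sample addresses are hypothetical, and the output format differs from rwsetcat's.

```python
import ipaddress
from collections import Counter

def hosts_per_block(addresses, prefix_len):
    """Count how many distinct addresses fall in each /prefix_len block."""
    counts = Counter()
    for a in set(addresses):  # duplicates do not count twice
        block = ipaddress.ip_network(f"{a}/{prefix_len}", strict=False)
        counts[str(block)] += 1
    return counts

# Made-up addresses from documentation-style ranges.
addrs = ["198.51.100.1", "198.51.100.2", "198.51.7.9", "203.0.113.5"]

print(hosts_per_block(addrs, 16))
print(len(hosts_per_block(addrs, 24)))  # number of /24 blocks in use
```

The number of entries in the result plays the role of the line count obtained above by piping the summary through wc -l.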

Exercise 10 Sets
1) For April 21, 2009 on sensor S0, make a set-file of addresses of all actual inside hosts. Should we examine incoming or outgoing traffic?
2) Make a set-file of all outside addresses. Can you make both sets with one command?
HINT: Pipe rwfilter to rwset.

You should examine outgoing traffic to identify ACTUAL inside hosts. This traffic originates from actual inside hosts. Incoming traffic is destined for addresses that the outside sender hopes exist; they may simply be addresses that are being probed.
You can make both sets with one command, depending on how you define “outside addresses.” It can be done if your definition is: external addresses to which outgoing traffic is directed, but not necessarily addresses from which inbound traffic originates.

Set Exercise 10 solution

rwfilter --start-date=2009/4/21 \
    --sensor=S0 \
    --type=out,outweb \
    --proto=0- \
    --pass=stdout \
  | rwset --sip-file=insidehosts.set \
    --dip-file=outsidehosts.set

This is a correct solution depending on your definition of outside hosts. It creates a set of outside hosts to which inside hosts sent traffic. It does not show all the outside hosts that sent traffic to the inside.

Exercise 11 Sets
Examine the two set-files from Exercise 10. How many addresses are in each set? What are they?
HINT: How big are the set files? What can you say about the files?

Set Exercise 11 solution

ls -l insidehosts.set
rwfileinfo insidehosts.set
rwsetcat insidehosts.set --count
rwsetcat insidehosts.set | less

ls -l outsidehosts.set
rwfileinfo outsidehosts.set
rwsetcat outsidehosts.set --count
rwsetcat outsidehosts.set | less

“ls” tells you almost nothing interesting: who owns the set, which file permissions are established, the date of last modification, and the size of the set in bytes. “rwfileinfo” doesn’t tell you much more; in particular, it does not tell you how many addresses are in the set. “rwsetcat” is the best tool for examining IP Sets.

Exercise 12 Sets
Which /16 networks are on the inside?
Which /8 networks are on the outside?
Bonus question: How many /24 networks are on the outside?
HINT: Use --network-struc=N, where N comes from CIDR notation /N.

Exercise 12 solution

rwsetcat --network-struc=16 \
    insidehosts.set
rwsetcat --network-struc=8 \
    outsidehosts.set

Bonus question
rwsetcat --network-struc=24 \
    outsidehosts.set \
  | wc -l

/16 networks on the inside: 1 (with 7 hosts)
/16| 7
/8 networks on the outside: 8 (with 17,142 hosts)
/8| 17122
/8| 1
/24 networks on the outside: 11

Set-like Files: Bags
Wouldn’t it be nice to count something per address and associate the two? It would, and the structure that does it is called a Bag.
rwbag
rwbagbuild
rwbagcat
rwbagtool

Bag Example

rm -f sf.bag
rwfilter --type=out,outweb \
    --sensor=S0 \
    --start=2009/4/ \
    --proto= \
    --pass=stdout \
  | rwbag --sip-flows=sf.bag
rwbagcat sf.bag
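A Bag pairs each key with a counter, which is close to what Python's collections.Counter does. The sketch below counts flows per source address from hypothetical (sIP, dIP) records; it illustrates the key-to-count idea only and does not read or write SiLK bag files.

```python
from collections import Counter

def sip_flows(records):
    """Count flows per source address: one increment per record."""
    bag = Counter()
    for sip, _dip in records:
        bag[sip] += 1
    return bag

# Made-up records for two separate time periods.
day1 = [("10.0.0.1", "192.0.2.5"), ("10.0.0.1", "192.0.2.6"), ("10.0.0.2", "192.0.2.5")]
day2 = [("10.0.0.1", "198.51.100.1"), ("10.0.0.3", "198.51.100.1")]

bag1, bag2 = sip_flows(day1), sip_flows(day2)
print(bag1)
print(bag1 + bag2)  # counts for the same key add together
```

Adding two counters key by key is the kind of arithmetic combination that distinguishes a Bag from an IP Set, which records only presence.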

Set-like Files: Prefix Map (PMap)
How do I work with, say, service names like HTTP and HTTPS rather than 80 and 443? Use a PMap, short for Prefix Map rwpmapbuild rwpmapcat rwpmaplookup

Pmap Example

rwfilter --start=2009/4/21 \
    --sensor=S0 \
    --proto=0- \
    --pass=stdout \
  | rwuniq --pmap-file=pname:/data/scripts/protocols.pmap \
    --fields=src-pname,proto \
    --values=bytes --sort-out

src-pname|pro| Bytes|
     ICMP|  1|      |
      TCP|  6|      |
      UDP| 17|      |
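A prefix map attaches a label to each key so that output can show the label instead of the raw value. For a map keyed by address blocks, the lookup amounts to finding the most specific block that contains the address. The sketch below shows that idea with an invented map; the block-to-label pairs, the default label, and the function name are hypothetical, and the binary pmap format is not modeled.

```python
import ipaddress

# Hypothetical prefix map: CIDR block -> label. Entries are made up.
PMAP = {
    "10.0.0.0/8": "internal",
    "10.1.0.0/16": "servers",
    "192.0.2.0/24": "partner",
}

def lookup(addr, pmap=PMAP, default="UNKNOWN"):
    """Return the label of the most specific block containing addr,
    or `default` when no block matches."""
    ip = ipaddress.ip_address(addr)
    best_len, best_label = -1, default
    for cidr, label in pmap.items():
        net = ipaddress.ip_network(cidr)
        if ip in net and net.prefixlen > best_len:
            best_len, best_label = net.prefixlen, label
    return best_label

for addr in ("10.1.2.3", "10.9.9.9", "192.0.2.77", "203.0.113.1"):
    print(addr, lookup(addr))
```

Several blocks may share one label, which is what lets a label act as a category when it is used as a key field.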

Set-like Files: Tuple
Is there a way to search for multiple, independent field values without resorting to multiple rwfilter commands?
Yes, with a tuple file. It can be used in addition to or instead of other partitioning parameters (use it instead of, say, proto=6 dport=25,58,143,158,209,366,465,587).

rwfilter … --tuple-file=email-ports.txt …

Tuple Example

rwfilter --start=2009/4/20 \
    --sensor=S0 \
    --tuple-file=/data/scripts/legacy_services.tuple \
    --type=in,inweb --pass=stdout \
  | rwcut --fields=1-5,packets

sIP| dIP|sPort|dPort|pro| packets|
   |    |   25|60045|  6|        |
   |    | 1720|   53|  6|        |
   |    | 3389|  443|  6|        |
   |    | 3389|  443|  6|        |
   |    | 1719|   25|  6|        |
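Matching against a tuple list means a record passes when its combination of field values appears in the list. A minimal Python sketch of that test, using (protocol, dPort) pairs built from the e-mail ports named earlier; the record layout and function name are hypothetical, and the tuple file syntax is not modeled.

```python
# Hypothetical tuple list: (protocol, dPort) pairs for TCP e-mail services.
EMAIL = {(6, p) for p in (25, 58, 143, 158, 209, 366, 465, 587)}

def matches(record, tuples=EMAIL):
    """record: dict with 'proto' and 'dport' keys.
    Pass when the (proto, dport) combination is listed."""
    return (record["proto"], record["dport"]) in tuples

# Made-up records.
records = [
    {"proto": 6, "dport": 25},    # TCP to port 25: listed
    {"proto": 17, "dport": 25},   # UDP to port 25: not listed
    {"proto": 6, "dport": 80},    # TCP to port 80: not listed
]
print([matches(r) for r in records])
```

The values are tested together as one combination, so a row pairing TCP with port 25 does not match a UDP flow to port 25. Rows with different protocols and ports can sit in the same list, which is what makes one pass over the data enough.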

Comparison of IP Set, Bag, Tuple, PMap

            | IP Set                 | Bag             | Tuple                    | PMap
Semantics   | presence               | volume          | conditionals             | categories
Columns     | 1                      | 2               | 1–5                      |
1st Column  | IP Addr                | various         | Flow-Label Field         | IP Addr or Proto/Port
2nd Column  | -                      | measure         | Flow-Label Field or none | Label
Usual Role  | input, interim, output | interim, output | input, interim           | input

Used for Partitioning: yes, no
Used for Field Output:
Binary/Text: binary, text
Combine: set algebra, arithmetic

All of these SiLK structures can be thought of as tables.
Semantics: An IP address is either present in an IP Set or it is not. Every address in the set shares some set of characteristics.
Bags typically use IP address as the key (1st column) and have a measurement of traffic volume for the value (2nd column), although this isn’t always so.
A Tuple-file contains a collection of rows, each containing a tuple of flow record fields from the flow label (source/destination IP address, source/destination port, transport-layer protocol), providing a condition for passing a partitioning test.
PMaps have a key of either IP address or protocol/port. Although the key must be unique in each row, the Label value may appear in multiple rows. The labels represent categories.

Lesson II.3 Summary An IP Set is a collection of IP addresses. There are no duplicates in a set. An IP address is either in a given set or it is not. rwsetbuild creates an IP Set from a text file. rwset creates an IP Set from flow records. IP sets can be used for partitioning flow records with rwfilter. rwsetcat displays the contents or summaries of an IP Set.

Part II Summary—Basic SiLK
rwsiteinfo rwcut rwfilter rwcount rwstats rwuniq rwsetbuild rwsetcat rwset rwsetmember

Additional Learning Resources

A look ahead There is more to the SiLK Analysis Tool Suite than we’ve discussed TCP Flags Application Label PySiLK Plug-ins Rayon

Analyst’s Handbook Full Title: Using SiLK for Network Traffic Analysis Analyst’s Handbook for SiLK Versions and Later Ron Bandes Timothy Shimeall Matt Heckathorn Sidney Faber Note that E1 uses SiLK as of November, 2016

What is the Handbook? From Page 1 of the Handbook
This handbook provides a tutorial introduction to network traffic analysis using the System for Internet Level Knowledge (or SiLK) tool suite. … SiLK tools provide network security analysts with the means to understand, query, and summarize both recent and historical traffic data represented as network flow records. The SiLK tools provide network security analysts with a relatively complete high-level view of traffic across an enterprise network, subject to placement of sensors.

What This Handbook Doesn’t Cover
From Page 3 of the Handbook This handbook is not an exhaustive description of all the tools in the SiLK tool suite or of all the options in the described tools. Rather, it offers concepts and examples to allow analysts to accomplish needed work while continuing to build their skills and familiarity with the tools.

Who is it for? The Analyst’s Handbook is useful for beginning, intermediate and advanced analysts using the SiLK tool suite for network flow analysis. It includes reviews of networking and Unix commands relevant for anyone using SiLK for network analysis. It provides readable descriptions of the SiLK tools and how they are used The numerous examples will often form the basis of an analytic script you need Once you know where “the good stuff” is located, you will refer to it often

What’s in it? Table of Contents Network and Unix review
SiLK Repository Essential SiLK tools Using the larger SiLK tool suite Using PySiLK for advanced analysis Additional information on SiLK

Network Profiling Using Flow
AKA “The Network Profiling Report” Brief description Sample table Sample script Also described in podcast: Using Network Flow Data to Profile Your Network and Reduce Vulnerabilities

From the Abstract This report provides a step-by-step guide for profiling—discovering public-facing assets on a network—using network flow (netflow) data. Netflow data can be used for forensic purposes, for finding malicious activity, and for determining appropriate prioritization settings. The goal of this report is to create a profile to see a potential attacker’s view of an external network. Readers will learn how to choose a data set, find the top assets and services with the most traffic on the network, and profile several services. A case study provides an example of the profiling process.

Table 1: Example Profiling Spreadsheet
Internal IP Protocol Internal Port Internal Name External IP External Port External Name Comments

Example Script to find probable web servers

$ rwfilter sample.rw --type=outweb \
    --sport=80,443, --protocol= \
    --packets=4- --ack-flag=1 --pass=stdout \
  | rwstats --fields=sip --percentage=1 --bytes

INPUT: Records for 24 Bins and Total Bytes
OUTPUT: Top 7 bins by Bytes (1% == )
sIP| Bytes| %Bytes| cumul_%|

Note: the scripts use many “legacy” options such as --bytes.

Online Print Resources
Using SiLK for Network Traffic Analysis Analyst's Handbook for SiLK Versions and Later (4.9MB pdf) a tutorial introduction to network traffic analysis using the SiLK tool suite SiLK Tool Suite Quick Reference (497.4KB pdf) A quick reference guide to the core analysis tools Updated June 2016 for SiLK 3.12 SiLK Analysis Suite - overview of the analysis tools Be careful on these web pages… the descriptions are of the latest released version of the tools, which may not be the same as you are using. Check the version with rwfilter --version

PySiLK: SiLK in Python (html, 400.8KB pdf) document containing the PySiLK and SiLK Python manual pages provided as a single reference for manipulating SiLK Flow data from within Python
The SiLK Reference Guide (html, 1.7MB pdf) every SiLK manual page presented in a single document (not to be confused with the Quick Reference Guide – QRG)
SiLK Tooltips Wiki (html) tips and tricks to use with the SiLK analysis suite. These documents point out features and uses of the tools that are very helpful and not immediately obvious.

Network Profiling Using Flow A step-by-step guide for profiling—discovering public-facing assets on a network—using network netflow data. Network Profiling with SiLK This FloCon presentation describes how to use SiLK to create an inventory of assets on a network and their characteristics and associated purposes. view.cfm?assetid=50813

YAF – Yet Another Flowmeter A description of the network flow event generator used with SiLK. YAF Application Labeling yaf can examine packet payloads and determine the application protocol in use within a flow

Live DVD with SiLK Tools and Repository
The NetSA Live DVD is a Fedora 22 (64-bit) respin, and contains the following tools from the CERT NetSA Security Suite: Analysis Pipeline fixbuf netsa-python Rayon SiLK super_mediator YAF

Docker Image Available on docker hub at: If you have docker installed: docker pull certcc/silklive

SiLK Reference Data
Two sets of publicly available SiLK Reference Data:
FCCX-15 is from a Cyber Exercise conducted by the Software Engineering Institute at Carnegie Mellon University in June
LBNL-05 is derived from anonymized enterprise packet header traces obtained from Lawrence Berkeley National Laboratory and ICSI, and is used here with their permission.

Online Video Resources
The SEI YouTube Channel has several SiLK related videos available. For details see: Or do a Google search for “SEI youtube channel” and ignore the Stockholm Environment Institute The following videos are available for public access and relevant for analysts using SiLK: Understanding the SiLK Repository (4:11 minutes) This tutorial focuses on gaining an understanding of the SiLK repository. rwsiteinfo: Part of the SiLK Analysis Tool Suite (7:04 minutes) This tutorial covers rwsiteinfo, one of the tools in the SiLK Analysis Tool Suite. rwfileinfo: Part of the SiLK Analysis Tool Suite (9:17 minutes) This tutorial covers rwfileinfo, one of the tools in the SiLK Analysis Tool Suite. Pretty Printing in SiLK (4:17 minutes) This tutorial covers the Pretty Printing options available in many of the SiLK Analysis tools.

SEI Podcast Series: Network Flow and Beyond with Tim Shimeall (24:03 minutes)
Webinar: Building Analytics for Network Flow Records (59:51 minutes)
Using Network Flow to Gain Cyber Situational Awareness (60:13 minutes)

SEI Cyber Minute
Not network flow or SiLK related, but of general interest (upcoming video on “Fighting Phishing with Network Flow Analysis”)

All our Netflow Tools http://tools.netsa.cert.org/

Questions?

Extra slides

Building Analytics for Network Flow Records
Example Analytics

Profiling Top Five Services and Protocols
Explore: Characterize traffic on our network by protocols and services
  What protocols are being used?
  What services are we requesting?
  What services are we providing?
  Changes over time? Anomalies?
Model:
  Input: Pre-retrieve traffic for some time period, typically a full day's worth.
  Use rwfilter with rwstats to generate top-5 statistics
  Output: Multiple tables containing summarizations of traffic
Test, Analyze, and Refine:
  Include test cases for how to handle large sets of data
  Reliability / performance / interpretable results
  Would a list of top 10s provide more value for the performance hit?
  Is generating a table of all protocols and services and then filtering text, rather than SiLK binary, feasible?
  Inbound vs. outbound traffic?
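One of the refinement questions above asks whether a top-N table could be built by filtering text instead of SiLK binary records. The following is a minimal sketch of that idea, not the presenters' method. It assumes the flow records have already been exported as pipe-delimited text (the export step is not shown), that the protocol and destination port sit in columns 3 and 4, and it uses made-up sample records. The names top_n and sample_flows are hypothetical.

```bash
#!/bin/sh
# top_n FIELD N
# Reads pipe-delimited records on stdin, counts the values found in
# column FIELD, and prints the N most frequent as "value|count".
top_n() {
    field=$1
    n=$2
    awk -F'|' -v f="$field" '
        { gsub(/ /, "", $f); count[$f]++ }
        END { for (v in count) print v "|" count[v] }' \
    | sort -t'|' -k2,2nr -k1,1n \
    | head -n "$n"
}

# Hypothetical sample records: sIP|dIP|protocol|dPort
sample_flows() {
cat <<'EOF'
10.0.0.1|192.0.2.10|6|80
10.0.0.2|192.0.2.10|6|443
10.0.0.1|192.0.2.53|17|53
10.0.0.3|192.0.2.10|6|80
10.0.0.2|192.0.2.1|1|0
10.0.0.3|192.0.2.53|17|53
10.0.0.1|192.0.2.10|6|80
EOF
}

echo "Top protocols:"
sample_flows | top_n 3 5
echo "Top destination ports:"
sample_flows | top_n 4 5
```

Counting in text keeps the whole table available for later filtering, at the cost of converting every record to text first. Whether that trade is worthwhile on a full day of traffic is exactly the performance question the slide raises.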

Profiling Top Five Services and Protocols Script
#!/bin/bash
# Pull one day of outbound traffic into a local file
echo Initial dataset:
rwfilter --type=out,outweb --start-date=2011/09/28:00 \
    --end-date=2011/09/28:23 --protocol=0- --pass=sample.rw

echo -e "\nTop protocols:"
rwstats sample.rw --fields=protocol --count=5

echo -e "\nTop services being requested:"
rwstats sample.rw --count=5 --fields=dport

echo -e "\nTop services being provided:"
rwstats sample.rw --count=5 --fields=sport
# Network Profiling Using Flow (Section 3 Script)
# Available at /data/scripts/top5ServicesProtocols.sh

Profiling Active Addresses
Explore: Identify addresses that have actively talked during a given time period
  How many active addresses are using TCP? What about other protocols?
  Are there addresses that are merely passing through our network (transiting)?
  Awareness of network address topology.
  Changes over time? Anomalies?
Model:
  Input: Pre-retrieve traffic for some time period, typically a full day's worth.
  Use rwfilter with rwset tools to generate and understand address lists
  Output: Multiple tables containing summarizations of active addresses, multiple set files
Test, Analyze, and Refine:
  Include test cases for how to handle large sets of data
  Reliability / performance / interpretable results
  What about separating out UDP and ICMP from other IP protocols?
  What about traffic heading into the network from an internal host?

Profiling Active Addresses Script
#!/bin/bash
# Sources with established-looking TCP flows (4+ packets, ACK set)
echo Number of TCP talkers:
rwfilter sample.rw --type=out,outweb --protocol=6 --packets=4- --ack-flag=1 \
    --pass=stdout | rwset --sip-file=tcp_talkers.set
rwsetcat tcp_talkers.set --count

echo -e "\nNumber of talkers on other protocols:"
rwfilter sample.rw --type=out --protocol=0-5,7- --pass=stdout \
    | rwset --sip-file=other_talkers.set
rwsetcat other_talkers.set --count

# Combine both sets and summarize by network block
rwsettool --union tcp_talkers.set other_talkers.set --output-path=talkers.set
rwsetcat talkers.set --network-structure

echo -e "\n\nClass C network blocks:"
rwsetcat talkers.set --network-structure=C

echo -e "Transit traffic:"
rwfilter sample.rw --type=out,outweb --not-sipset=talkers.set \
    --pass=stdout | rwtotal --sip-first-8 --summation --skip-zeroes --no-titles \
    | cut -f 2 -d "|"
rwfilter sample.rw --type=out,outweb --dipset=talkers.set --pass=stdout \
    | rwtotal --sip-first-8 --summation --skip-zeroes --no-titles | cut -f 2 -d "|"
# Network Profiling Using Flow (Section 4 Script)
# Available at /data/scripts/activeAddresses.sh

Profiling Assets by Service
Explore: Identify assets in our network by popular services
  What assets are running web servers? DNS servers? Telnet servers?
  What assets are telnet clients?
  Changes over time? Policy violations?
Model:
  Input: Pre-retrieve traffic for some time period, typically a full day's worth.
  Use rwfilter, rwstats, rwuniq, and rwset tools to generate asset lists and statistics
  Output: Multiple tables containing summarizations of traffic, multiple set files
Test, Analyze, and Refine:
  Include test cases for how to handle large sets of data
  Reliability / performance / interpretable results
  What additional services are of interest?
  Do we have to limit our output in some manner? (e.g., > 1% of packets)

Profiling Assets by Service Script
#!/bin/bash
echo "######## Web Servers ########"
# The port list after 443 is cut off in the transcript.
rwfilter sample.rw --type=outweb --sport=80,443 --protocol=6 \
    --packets=4- --ack-flag=1 --pass=stdout | rwstats --fields=sip --percentage=1 \
    --bytes --no-titles | cut -f 1 -d "|" | rwsetbuild > web_servers.set

echo Potential Web Servers:
# The first line of this command is missing in the transcript; it is
# reconstructed here to mirror the filter above.
rwfilter sample.rw --type=outweb --sport=80,443 --protocol=6 \
    --packets=4- --ack-flag=1 --sipset=web_servers.set --pass=stdout | \
    rwuniq --fields=sip,sport --bytes --sort-output

echo -e "\n######## Web Clients ########"
rwfilter sample.rw --type=out,outweb --protocol=6 --ack-flag=1 --packets=4- \
    --dport=80,8000,8080,443,1935,1755,554 --pass=stdout | \
    rwset --sip-file=tcpclients.set
rwfilter sample.rw --type=out,outweb --protocol=17 --dport=1755,554 \
    --pass=stdout | rwset --sip-file=udpclients.set
rwsettool --union tcpclients.set udpclients.set --output-path=webclients.set
rm tcpclients.set
rm udpclients.set
# **********************************SNIP*******************************************
# Network Profiling Using Flow (Section 5 Script)
# Available at /data/scripts/assetsByService.sh

Understanding Host Roles
Explore: Characterize hosts that seem to act as servers
  Of the hosts communicating on TCP port 25 (SMTP), how much non-SMTP traffic does each generate?
  Changes over time? After event?
Model:
  Input: Assume small population of interesting hosts (specified as IP set)
  Pre-retrieve traffic of interest (as rw file)
  Use rwfilter with rwuniq to pull out SMTP vs. non-SMTP flow counts
  Output: Table of behavior per IP address
Test, Analyze, and Refine:
  Include test cases for addresses with known and unknown roles
  Reliability / performance / interpretable results
  How about non-SMTP activity? (e.g., POP, IMAP)
  How about non-SMTP related to e-mail? (e.g., DNS)

Initial Host Characterization Script
#!/bin/bash
rm -f more-mail-saddr.txt more-nomail-saddr.txt more-nomail.rw
# Split traffic from the hosts of interest into SMTP (pass) and everything else (fail)
rwfilter in_month.rw --sipset=interest.set --pass=stdout \
    | rwfilter stdin --protocol=6 --aport=25 \
        --fail=more-nomail.rw --pass=stdout \
    | rwuniq --fields=sIP --no-titles --ip-format=zero-padded \
        --sort-output --output-path=more-mail-saddr.txt
rwuniq more-nomail.rw --fields=sIP --ip-format=zero-padded \
    --no-titles --sort-output --output-path=more-nomail-saddr.txt
# Pair the two counts per address and show the five busiest mail senders
echo ' sIP| mail|| not mail|' \
    ; join more-mail-saddr.txt more-nomail-saddr.txt \
    | sort -t'|' -nrk2,2 \
    | head -n 5
# Using SiLK for Network Traffic Analysis (Example 3.37)
# Available at /data/scripts/rwuniq- .sh
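The last step of the script above pairs two per-address count files with join and ranks the result. The sketch below isolates that step so its behavior can be seen without a flow repository. It assumes both inputs use an "address| count|" layout sorted by zero-padded address. The sample counts are made up, and the names join_counts and demo_dir are hypothetical.

```bash
#!/bin/sh
# join_counts MAIL_FILE OTHER_FILE
# Both inputs hold "address| count|" lines sorted by address, which
# join requires. Prints "address| mail| other|" rows, largest mail
# count first.
join_counts() {
    join "$1" "$2" | sort -t'|' -k2,2nr
}

# Hypothetical sample data standing in for the two count files.
demo_dir=$(mktemp -d)
cat > "$demo_dir/mail.txt" <<'EOF'
010.000.000.001|   12|
010.000.000.002|  340|
010.000.000.004|    7|
EOF
cat > "$demo_dir/other.txt" <<'EOF'
010.000.000.001|  950|
010.000.000.002|   15|
010.000.000.003|   88|
EOF

join_counts "$demo_dir/mail.txt" "$demo_dir/other.txt"
rm -rf "$demo_dir"
```

By default join emits only addresses present in both files, so a host with mail traffic but no other traffic (or the reverse) does not appear in the table. That is worth keeping in mind when testing against addresses with known roles.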

Identifying Backwards Traffic
Explore: Identify hosts for which sensors record traffic that appears reversed (possible forged addresses).
  Of the hosts sending or receiving TCP traffic, for which do sensors record traffic that is “inbound” but from local hosts, or “outbound” and going to local hosts?
  Changes over time? Related to event?
Model:
  Assume we have hosts of interest, expressed as IP set
  Assume we have date/time range of interest, expressed as parameters
  Use rwfilter to pull traffic into files for later analysis
Test, Analyze, and Refine:
  Test against normal traffic and traffic engineered to be reversed
  Reliability / interpretable results
  Traffic not completely reversed?
  Traffic observed in both groups?

Initial Backwards Traffic Script
#!/bin/bash
START=2009/4/20T12
END=2009/4/20T13
SENNAME=S0
rm -f strange_in.rw strange_out.rw
# "Inbound" records whose source is one of our own addresses
rwfilter --sensor=$SENNAME --type=in,inweb --start-date=$START \
    --end-date=$END --protocol=6 --bytes-per-packet=65- \
    --sipset=mynetwork.set --flags-all=SAF/SAFR,SAR/SAFR,SAFR/SAFR \
    --packets=4- --pass=strange_in.rw
# "Outbound" records going to our own addresses from elsewhere
rwfilter --sensor=$SENNAME --type=out,outweb --start-date=$START \
    --end-date=$END --dipset=mynetwork.set --not-sipset=mynetwork.set \
    --pass=strange_out.rw
exit 0
# Using SiLK for Network Traffic Analysis (Example 4.22)
# Available at /data/scripts/backdoor-filt.sh

Finding Scanners
Explore: Identify external addresses that are mapping our network
  Much existing work in literature (Gates, Jung, etc.)
  Simple approach exploits attacker workflow (minimal advance knowledge)
  Of quick TCP traffic with relatively few bytes per packet, which have flags that could reflect scanning?
  Are the sources frequent enough?
Model:
  Assume a date of interest, expressed as parameters
  Use rwfilter to isolate TCP traffic with few bytes per packet, then to split out flag combinations.
  Use rwbag to count sources per bag combinations
  Use rwbagtool to manipulate counts and isolate frequent sources
  Use rwbagtool to produce set of IP addresses for sources
Test, Analyze, and Refine:
  Use test cases with known scans and without known scans
  False positives and false negatives
  Trends over time?
Gates, C., et al., “Detecting Scans at the ISP level”
Jung, J., et al., “Fast Portscan Detection Using Sequential Hypothesis Testing”

Initial Scanner Detection Script
#!/bin/bash
rm -f fastfile.rw fast-{low,high}.{set,bag} scan.set
# The --duration range is missing from the transcript.
rwfilter --start=2009/04/20 --sensor=S0 --type=in,inweb --bytes-per=40-65 \
    --protocol=6 --duration= --pass=fastfile.rw
# Count short SYN-only flows per source
rwfilter fastfile.rw --flags-all=S/SRF --packets=1-3 --pass=stdout \
    | rwbag --sip-flows=fast-low.bag
# Count completed or reset flows per source
rwfilter fastfile.rw --flags-all=SAF/SARF,SR/SRF --pass=stdout \
    | rwbag --sip-flows=fast-high.bag
rwbagtool fast-high.bag --maxcounter=10 --coverset --output-path=fast-high.set
rwbagtool fast-low.bag --mincounter=10 --coverset --output-path=fast-low.set
rwsettool --difference fast-low.set fast-high.set --output-path=scan.set
exit 0
# Using SiLK for Network Traffic Analysis (Example 4.33)
# Available at /data/scripts/rwbag_scanners.sh

How well does compression work?
FILE=compress-none.rw
# The list of record counts is missing from the transcript.
for RECS in
do
    rm -f $FILE
    rwfilter --start-date=2009/4/20:11 \
        --type=all \
        --protocol=0- \
        --compress=none \
        --max-pass=$RECS \
        --pass=$FILE
    ls -l $FILE
done

No compression vs compression
-rw-r--r--. 1 pnk pnk  264 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  352 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  440 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  528 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  616 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  704 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  792 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  880 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  968 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk 1056 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk 1144 Jan 5 22:02 compress-none.rw
-rw-r--r--. 1 pnk pnk  256 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  272 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  282 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  290 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  298 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  303 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  324 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  333 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  344 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  355 Jan 5 22:02 compress-best.rw
-rw-r--r--. 1 pnk pnk  364 Jan 5 22:02 compress-best.rw
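The listing above is easier to judge as ratios. The sketch below hard-codes the byte sizes shown in the listing and prints, for each step, the uncompressed size, the compressed size, and the compressed size as a percentage of the uncompressed one. It assumes the two groups of files are listed in the same loop order, so that the first compress-none.rw pairs with the first compress-best.rw. The function name compression_ratios is hypothetical.

```bash
#!/bin/sh
# Byte sizes copied from the ls -l listing, in listing order.
NONE_SIZES="264 352 440 528 616 704 792 880 968 1056 1144"
BEST_SIZES="256 272 282 290 298 303 324 333 344 355 364"

# compression_ratios
# Prints one "uncompressed compressed percent" line per pair of sizes.
compression_ratios() {
    awk -v none="$NONE_SIZES" -v best="$BEST_SIZES" 'BEGIN {
        n = split(none, u, " ")
        split(best, c, " ")
        for (i = 1; i <= n; i++)
            printf "%d %d %.0f%%\n", u[i], c[i], 100 * c[i] / u[i]
    }'
}

compression_ratios
```

In this listing the uncompressed file grows by a constant 88 bytes per step, while the compressed file grows by 8 to 21 bytes, so the benefit of compression increases with file size: the smallest pair is nearly the same size, and the largest compressed file is roughly a third of its uncompressed counterpart.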

Visualization

One of the best results of visualization
Flow Visualization
Visualization has many uses:
  Analysis
  Explanation
  Discovery
One of the best results of visualization is to speed up whatever you are doing

Popular types of visualization:
Bar Chart
Time Series
Scatter Plot
Histogram
Link diagram (directed graph)
Heat Map
Other:
  Timelines
  Geographic maps
  Pie charts

Software to do visualization: Rayon
Rayon was written to work and play well with SiLK and Python
It fits in with the Unix pipe mode of scripting
It doesn’t (yet?) handle everything we want to do
Viz Type        Application
Bar Chart       rycategories
Time Series     rytimeseries
Scatter Plot    ryscatterplot
Histogram
Link diagram    GraphViz
Heat Map        ryhilbert

What does our data look like?

Let's take a closer look
2009/04/20

2009/04/20 2009/04/22 2009/04/23 2009/04/24
