Vytautas Krakauskas LITNET CERT Swedbank SIRT


1 Nfsen + Hadoop
Vytautas Krakauskas
LITNET CERT
Swedbank SIRT

2 Problems
Limited storage capacity
Large data set processing time

3 Storage capacity
Steadily increasing network traffic
Up to six months of history for incident handling
I/O is the major bottleneck

4 Processing time
Currently no SMP support in nfdump
Important if I/O bottleneck is resolved

5 Processing with Nfdump

6 Distributed processing

7 The idea
Distribute nfcap files between multiple nodes
Process the files using nfdump
Combine the output and return to nfsen
Nfsen and nfdump usage should feel the same

8 1. File distribution
nfcapd stores files on a temporary file system
  due to "random" write of stat header
  copy to HDFS at the end of each interval
  bonus: limited backup while system is being tested
Redundant copies on multiple nodes
  higher redundancy for faster processing and better reliability
  lower redundancy for larger storage capacity
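The copy step above can be sketched as follows. This is a minimal illustration, not the code from the nfdist patches: it assumes the standard `hdfs dfs -put` client with the generic `-D dfs.replication=...` option, and the paths and the helper names `is_finished` and `hdfs_put_command` are hypothetical. It only builds the command line, so it can be inspected without a cluster.

```python
import posixpath
import re

# nfcapd names rotated files nfcapd.YYYYMMDDhhmm. The file that is still
# being written carries a different name, so it never matches this pattern.
FINISHED = re.compile(r"^nfcapd\.\d{12}$")


def is_finished(name):
    """Return True if the file name looks like a completed capture interval."""
    return FINISHED.match(name) is not None


def hdfs_put_command(local_path, hdfs_dir, replication=2):
    """Build the command that copies one finished capture file into HDFS.

    The replication factor is the trade-off from the slide: a higher value
    gives more nodes a local copy, a lower value saves storage.
    """
    name = posixpath.basename(local_path)
    if not is_finished(name):
        raise ValueError("not a finished capture file: " + name)
    dest = posixpath.join(hdfs_dir, name)
    return ["hdfs", "dfs", "-D", "dfs.replication=%d" % replication,
            "-put", local_path, dest]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(hdfs_put_command("/tmp/flows/nfcapd.201005011200",
                                    "/netflow/router1")))
```

Passing the resulting list to `subprocess.run` at the end of each capture interval would perform the actual copy.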

9 Modified architecture

10 2. Processing
Process using nfdump
  I/O through stdin/stdout
Each node works only with locally stored files
  Currently based on the first block
Aggregate when possible
  based on: stats type, aggregation options, filters
Copy the results back to the HDFS for the combiner
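The locality rule above (a node only processes files whose first block it stores) can be illustrated with a small planning function. This is a sketch under assumptions: the input mapping of file name to hosts holding the first block is taken as given, and the least-loaded tie-break is an illustrative choice that the slides do not specify. The name `assign_files` is hypothetical.

```python
def assign_files(first_block_hosts):
    """Assign each file to one host that stores the file's first HDFS block.

    first_block_hosts maps a file name to the list of hosts holding a
    replica of its first block. Among those hosts, the one with the fewest
    files assigned so far is chosen (ties broken by host name).
    """
    plan = {}
    for name in sorted(first_block_hosts):
        hosts = first_block_hosts[name]
        if not hosts:
            raise ValueError("no replica location known for " + name)
        host = min(hosts, key=lambda h: (len(plan.get(h, [])), h))
        plan.setdefault(host, []).append(name)
    return plan


if __name__ == "__main__":
    # Hypothetical block locations for four capture files on two nodes.
    locations = {
        "nfcapd.201005011200": ["node1", "node2"],
        "nfcapd.201005011205": ["node1", "node2"],
        "nfcapd.201005011210": ["node1"],
        "nfcapd.201005011215": ["node2"],
    }
    for host, files in sorted(assign_files(locations).items()):
        print(host, files)
```

With a replication factor equal to the number of nodes, every node is a candidate for every file, which is one way to read the slide's remark that higher redundancy allows faster processing.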

11 3. Combining
Combine the results as a single stream
  a custom tool (nfcat)
  some information is lost (e.g. ident)
nfdump does the final processing
  single instance (a bottleneck)
Displays the results
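Why a final processing pass is needed after combining can be shown with a toy model of a Top N statistic. This sketch is not how nfcat or nfdump work internally: it represents each node's partial result as a plain mapping from IP address to flow count, and the names `combine_top_n` and `local_top` are hypothetical. It shows that per-node counts must be summed before ranking, and that merging already truncated per-node rankings can give a wrong answer.

```python
from collections import Counter


def combine_top_n(partials, n=10):
    """Sum per-node flow counts per IP, then rank once over the totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    # Sort by descending count, then by IP string for a stable order.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def local_top(partial, n):
    """Truncate one node's counts to its own top n (loses information)."""
    return dict(combine_top_n([partial], n))


if __name__ == "__main__":
    # Made-up counts for two nodes, for illustration only.
    node_a = {"10.0.0.1": 5, "10.0.0.2": 4, "10.0.0.3": 3}
    node_b = {"10.0.0.3": 3, "10.0.0.4": 4, "10.0.0.2": 1}

    exact = combine_top_n([node_a, node_b], n=1)
    truncated = combine_top_n([local_top(node_a, 1), local_top(node_b, 1)], n=1)
    print("exact:", exact)
    print("from truncated partials:", truncated)
```

In this example the busiest address overall is not the busiest on either node, so each node has to pass on its full aggregated counts rather than its own top list. That keeps the combined stream large, which is consistent with the single final nfdump instance being a bottleneck.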

12 Modified architecture

13 Comparison
Limited to nfdump
  Additional delays when using nfsen
Original
  single nfdump instance
  files on a local file system
Distributed
  Two nodes
  processes per node: 2
  HDFS replication factor: 2

14 Comparison
Top10 IPs, ordered by flows
1-18 files (5-90 minute period)
Filter "proto icmp"

15 Comparison

16 Conclusions
Overhead has a significant impact for short periods
  Initialization
  Job scheduling
  Combining and re-processing
Limited speed gains due to aggregation
Filtering is essential for achieving good speed gains
Still needs some issues to be addressed
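The first conclusion can be made concrete with a toy cost model. This is an assumption-laden sketch, not a fit to the measurements in the talk: it treats initialization, job scheduling and combining as one fixed overhead, splits the per-file work evenly across workers, and uses made-up timings. The function names are hypothetical.

```python
def distributed_time(files, per_file, workers, overhead):
    """Estimated wall time: fixed overhead plus evenly split per-file work."""
    return overhead + files * per_file / workers


def speedup(files, per_file, workers, overhead):
    """Single-instance time divided by the estimated distributed time."""
    single = files * per_file
    return single / distributed_time(files, per_file, workers, overhead)


if __name__ == "__main__":
    # Hypothetical numbers: 1 time unit per file, 3 units of fixed overhead,
    # 4 workers (two nodes with two processes each, as in the comparison).
    for files in (1, 6, 18):
        print(files, "files -> speedup", round(speedup(files, 1.0, 4, 3.0), 2))
```

In this model a query over a single file is slower than the original setup, and the speedup only approaches the worker count as the period grows. The model ignores the cost of the final single-instance pass growing with the combined output, which is what the remarks on aggregation and filtering refer to.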

17 Thank you!

18 The code
Patches (nfdist branch)

19 Comparison: bad case

