Nfsen + Hadoop Vytautas Krakauskas LITNET CERT Swedbank SIRT.


1 Nfsen + Hadoop Vytautas Krakauskas LITNET CERT Swedbank SIRT

2 Problems
Limited storage capacity
Large data set processing time

3 Storage capacity
Steadily increasing network traffic
Up to six months of history for incident handling
I/O is the major bottleneck

4 Processing time
Currently no SMP support in nfdump
Important if I/O bottleneck is resolved

5 Processing with Nfdump

6 Distributed processing

7 The idea
1. Distribute nfcap files between multiple nodes
2. Process the files using nfdump
3. Combine the output and return to nfsen
4. Nfsen and nfdump usage should feel the same
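The first three steps follow a scatter/gather pattern, which can be sketched as below. This is only an illustration: the round-robin placement stands in for whatever placement HDFS actually chooses, and the names distribute, run_pipeline, process and combine are hypothetical, not taken from nfdist.

```python
from collections import defaultdict

def distribute(files, nodes):
    """Step 1: assign each capture file to a node.

    Round-robin is a stand-in here; in the talk, HDFS decides placement.
    """
    placement = defaultdict(list)
    for i, name in enumerate(sorted(files)):
        placement[nodes[i % len(nodes)]].append(name)
    return dict(placement)

def run_pipeline(files, nodes, process, combine):
    """Steps 2 and 3: process each node's local files, then combine."""
    placement = distribute(files, nodes)
    partials = [process(node, local) for node, local in placement.items()]
    return combine(partials)

# Usage with trivial stand-ins: count the files seen by all nodes.
total = run_pipeline(
    ["nfcapd.201101010000", "nfcapd.201101010005", "nfcapd.201101010010"],
    ["node1", "node2"],
    process=lambda node, local: len(local),
    combine=sum,
)
```

Step 4 (unchanged look and feel) is a property of the front end and is not modelled here.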

8 1. File distribution
nfcapd stores files on a temporary file system
– due to "random" write of stat header
– copy to HDFS at the end of each interval
– bonus: limited backup while system is being tested
Redundant copies on multiple nodes
– higher redundancy for faster processing and better reliability
– lower redundancy for larger storage capacity
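A minimal sketch of the copy step and of the redundancy trade-off, assuming the standard `hdfs dfs -put` shell command is available on the collector. The paths and the helper names hdfs_put_command and usable_capacity_tb are hypothetical; how the copy is actually triggered at the end of each interval is not stated on the slide.

```python
import posixpath

def hdfs_put_command(local_path, hdfs_dir):
    """Build the command that copies one finished capture file into HDFS."""
    name = posixpath.basename(local_path)
    return ["hdfs", "dfs", "-put", local_path, posixpath.join(hdfs_dir, name)]

def usable_capacity_tb(raw_tb, replication):
    """Every block is stored `replication` times, so usable space shrinks accordingly."""
    return raw_tb / replication

cmd = hdfs_put_command("/tmp/nfcapd/nfcapd.201101010000", "/flows/router1")
# To actually run it: subprocess.run(cmd, check=True)
```

With 8 TB of raw disk, a replication factor of 2 leaves about 4 TB for flow data, while a factor of 4 leaves about 2 TB but gives the scheduler more nodes holding a local copy of each file.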

9 Modified architecture

10 2. Processing
Process using nfdump
– I/O through stdin/stdout
Each node works only with locally stored files
– Currently based on the first block
Aggregate when possible based on:
– stats type, aggregation options, filters
Copy the results back to the HDFS for the combiner
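The per-node step can be sketched as piping a capture file through nfdump. This assumes an nfdump build that reads binary flow data from stdin when no -r/-R option is given and treats `-w -` as "write binary records to stdout"; both should be checked against the man page of the installed version, and the nfdist patches may behave differently. The function names are hypothetical.

```python
import subprocess

def nfdump_partial_command(flow_filter, aggregate=False):
    """Build an nfdump command that filters stdin and emits binary records on stdout."""
    cmd = ["nfdump"]
    if aggregate:
        cmd.append("-a")          # aggregate flows on the node when the query allows it
    cmd += ["-w", "-"]            # assumption: "-" means stdout for binary output
    cmd.append(flow_filter)       # the filter expression is the last argument
    return cmd

def run_through(cmd, data):
    """Feed `data` to the command on stdin and return what it writes to stdout."""
    result = subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True)
    return result.stdout

cmd = nfdump_partial_command("proto icmp")
# On a worker node: partial = run_through(cmd, open(local_file, "rb").read())
```

Filtering on the node means only matching records travel back to the combiner, which is why the choice of filter matters so much for the overall speed.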

11 3. Combining
Combine the results as a single stream
– a custom tool (nfcat)
– some information is lost (e.g. ident)
nfdump does the final processing
– single instance (a bottleneck)
Displays the results
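Why a final processing pass is needed can be shown with plain counters: a node can only pre-aggregate, it cannot safely cut its result down to the top N, because an address that is second on every node may be first overall. The sketch below uses Python dictionaries as a stand-in for nfdump's binary records; it is not how nfcat works internally, and the addresses and counts are made up.

```python
from collections import Counter

def combine_counts(partials):
    """Merge per-node flow counts into one table."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def top_n(counts, n):
    """Return the n largest entries, ties broken by address for determinism."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

node_a = {"10.0.0.1": 5, "10.0.0.2": 4, "10.0.0.3": 4}
node_b = {"10.0.0.4": 5, "10.0.0.3": 4, "10.0.0.2": 1}

# Correct: merge the full per-node tables, then rank once.
correct = top_n(combine_counts([node_a, node_b]), 1)

# Wrong: rank on each node first, then merge the truncated results.
truncated = top_n(combine_counts([dict(top_n(node_a, 1)), dict(top_n(node_b, 1))]), 1)
```

Here `correct` names 10.0.0.3 with 8 flows, while `truncated` misses it entirely. The nodes therefore have to ship complete aggregated tables, and the single final instance still has real work to do.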

12 Modified architecture

13 Comparison
Limited to nfdump
– Additional delays when using nfsen
Original
– single nfdump instance
– files on a local file system
Distributed
– Two nodes
– processes per node: 2
– HDFS replication factor: 2

14 Comparison
Top10 IPs, ordered by flows
1-18 files (5-90 minute period)
Filter “proto icmp”
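For the single-instance baseline, the test query probably corresponds to an nfdump invocation along the following lines. The exact command is not shown in the slides, so this is a reconstruction using standard nfdump options (-R for a file range, -s for the statistic, -n for the number of entries); the directory and file names are hypothetical.

```python
def files_to_minutes(n_files, interval_minutes=5):
    """1-18 files covering 5-90 minutes implies one file per 5-minute interval."""
    return n_files * interval_minutes

def top10_icmp_command(directory, first_file, last_file):
    """Top 10 IP addresses ordered by flows, restricted to ICMP traffic."""
    file_range = f"{directory}/{first_file}:{last_file}"
    return ["nfdump", "-R", file_range, "-s", "ip/flows", "-n", "10", "proto icmp"]

cmd = top10_icmp_command("/data/router1", "nfcapd.201101010000", "nfcapd.201101010125")
```

The range above spans 18 five-minute files, the longest period in the comparison.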

15 Comparison

16 Conclusions
Overhead has a significant impact for short periods
– Initialization
– Job scheduling
– Combining and re-processing
Limited speed gains due to aggregation
Filtering is essential for achieving good speed gains
Still needs some issues to be addressed
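The first conclusion can be illustrated with a toy cost model: a fixed overhead plus per-file work divided among the workers. The numbers below are invented for illustration and are not measurements from the talk; only the worker count (two nodes with two processes each) comes from the comparison setup.

```python
def single_time(n_files, seconds_per_file):
    """One nfdump instance reads every file in turn."""
    return n_files * seconds_per_file

def distributed_time(n_files, seconds_per_file, workers, overhead_seconds):
    """Fixed overhead (initialization, scheduling, combining) plus parallel work."""
    return overhead_seconds + n_files * seconds_per_file / workers

# Hypothetical values: 2 s per file, 20 s fixed overhead, 4 workers.
for n in (1, 18):
    print(n, single_time(n, 2), distributed_time(n, 2, workers=4, overhead_seconds=20))
```

Under these assumed values the distributed run loses badly for a single file (2 s against 20.5 s) and only pulls ahead for the longest period (36 s against 29 s), which matches the qualitative shape of the conclusion.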

17 Thank you!

18 The code
– https://github.com/vytautas/nfdist
Patches (nfdist branch)
– https://github.com/vytautas/nfdump
– https://github.com/vytautas/nfsen

19 Comparison: bad case

