NIMD 1 Project Argus Massive Data NIMD PI Meeting December 2, 2004.


1 NIMD 1 Project Argus Massive Data NIMD PI Meeting December 2, 2004

2 NIMD 2 Massive Structured Data
Static data
– Focus on 10^10 to 10^12 records
– Typical record size 100 to 1,000 bytes
– Typical collection size between terabyte and petabyte
– Smaller than large collections including unstructured data because field size is much smaller
Streaming data
– 1,000 to 5,000 records per second
– Approx 100M to 400M records per day
– Static data corresponds to a few years of stream
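The figures on this slide can be cross-checked with a little arithmetic. The following is a minimal sketch; the function names are hypothetical, and the 10^11-record collection in the last line is an assumed mid-range example, not a number from the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def records_per_day(records_per_second):
    """Daily record volume for a stream arriving at a constant rate."""
    return records_per_second * SECONDS_PER_DAY


def collection_bytes(num_records, bytes_per_record):
    """Raw size of a static collection, ignoring index overhead."""
    return num_records * bytes_per_record


def years_of_stream(num_records, records_per_second):
    """How many years of streaming it takes to accumulate num_records."""
    return num_records / (records_per_day(records_per_second) * 365)


print(records_per_day(1_000))            # 86,400,000 (roughly 100M per day)
print(records_per_day(5_000))            # 432,000,000 (roughly 400M per day)
print(collection_bytes(10**10, 100))     # 10**12 bytes, about a terabyte
print(collection_bytes(10**12, 1_000))   # 10**15 bytes, about a petabyte
print(years_of_stream(10**11, 1_000))    # a little over 3 years
```

Under these assumptions, the low and high ends of the record-count and record-size ranges reproduce the terabyte-to-petabyte span, and a mid-range collection corresponds to a few years of stream.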

3 NIMD 3 Approximate Structured Matching
[Figure: a range or point query, with records labeled Exact Match, Near Match, or No Match according to their Distance from the query]

4 NIMD 4 Data Matching and Retrieval
Matcher finds data that matches query exactly or is close to it
Different versions for different data volumes

Version       Volume            Time complexity
In-memory     10^6 to 10^8      Logarithmic
Disk-based    10^7 to 10^10     Low power
Distributed   10^9 to 10^12     Same as underlying matcher
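To illustrate what exact, near, and range matching with logarithmic lookup can look like, here is a minimal sketch over a single numeric field. It is not the Argus matcher, whose index structure the slides do not describe; the class `SortedMatcher`, its methods, and the `tolerance` parameter are all hypothetical, and a sorted list with binary search stands in for whatever structure the real system uses.

```python
from bisect import bisect_left, bisect_right


class SortedMatcher:
    """Hypothetical one-field matcher over a sorted list of numeric values."""

    def __init__(self, values):
        self.values = sorted(values)

    def range_query(self, low, high):
        """Return all stored values v with low <= v <= high."""
        start = bisect_left(self.values, low)
        end = bisect_right(self.values, high)
        return self.values[start:end]

    def match(self, query, tolerance):
        """Classify the closest stored value as exact, near, or no match."""
        if not self.values:
            return ("no match", None)
        i = bisect_left(self.values, query)
        # Only the neighbours of the insertion point can be the closest value.
        candidates = self.values[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda v: abs(v - query))
        distance = abs(best - query)
        if distance == 0:
            return ("exact match", best)
        if distance <= tolerance:
            return ("near match", best)
        return ("no match", None)


matcher = SortedMatcher([40, 10, 30, 20])
print(matcher.match(20, tolerance=2))   # ('exact match', 20)
print(matcher.match(31, tolerance=2))   # ('near match', 30)
print(matcher.match(35, tolerance=2))   # ('no match', None)
print(matcher.range_query(15, 30))      # [20, 30]
```

Locating the closest value takes a logarithmic number of comparisons; the range query additionally pays for the size of the returned slice.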

5 NIMD 5 Disk-Matcher Experiments
[Plot: Retrieval Time (msec) versus Number of Records (10^2 to 10^6) for range queries, approximate queries, and exact queries, with the available-memory limit marked; growth-rate annotations on the curves: lg n, n^0.15, lg n, n^0.5]

6 NIMD 6 Monitoring Streaming Data
[Diagram components: Rete Network Generator, Query Rete Networks, Data Tables, Analyst, Identified Threats, Intermediate Tables, Data Streams, Query Table, Stream Anomaly Monitoring, Do_queries, Scheduler]

7 NIMD 7 Monitoring Streaming Data
Monitoring structured data streams for anomalies, hazards or alerts posted by analysts.
Alert profiles = continuous persistent queries (10^5)
Daily stream volumes target 10^8+ records.
System is optimized for very high selectivity queries
– “Needle in a field of haystacks” challenge
– Alert profiles can be anything (relational, aggregation, …)
Functions atop DBMS (now), or full DYNAMiX matcher (coming soon)
Based on modified Rete algorithm
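The idea of alert profiles as continuous persistent queries can be sketched as follows. This is a deliberately naive baseline that checks every profile against every incoming record; it is not the modified Rete algorithm the slide refers to, and the class `AlertMonitor`, its methods, and the record fields `sender` and `amount` are hypothetical.

```python
from collections import defaultdict


class AlertMonitor:
    """Naive persistent-query monitor: every profile sees every record."""

    def __init__(self):
        self.predicates = {}             # profile name -> function(record) -> bool
        self.thresholds = {}             # profile name -> (key field, count threshold)
        self.counts = defaultdict(int)   # (profile name, key value) -> records seen

    def add_predicate_profile(self, name, predicate):
        """Register a relational profile: fires on each record it accepts."""
        self.predicates[name] = predicate

    def add_count_profile(self, name, key_field, threshold):
        """Register an aggregation profile: fires once when a key reaches threshold."""
        self.thresholds[name] = (key_field, threshold)

    def process(self, record):
        """Feed one stream record; return the names of profiles that fired."""
        alerts = []
        for name, predicate in self.predicates.items():
            if predicate(record):
                alerts.append(name)
        for name, (key_field, threshold) in self.thresholds.items():
            key = (name, record[key_field])
            self.counts[key] += 1
            if self.counts[key] == threshold:
                alerts.append(name)
        return alerts


monitor = AlertMonitor()
monitor.add_predicate_profile("large_transfer", lambda r: r["amount"] > 10_000)
monitor.add_count_profile("repeat_sender", "sender", 3)

stream = [
    {"sender": "a", "amount": 50},
    {"sender": "a", "amount": 20_000},
    {"sender": "a", "amount": 5},
    {"sender": "b", "amount": 5},
]
for record in stream:
    print(monitor.process(record))
```

The cost of this baseline grows with the number of profiles times the number of records, which is the motivation for sharing work across profiles when the targets are on the order of 10^5 profiles and 10^8 records per day.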

8 NIMD 8 Adapted Rete Algorithm
(n+Δn)(m+Δm) = nm + Δn·m + n·Δm + Δn·Δm
Old results: nm. New incremental results: Δn·m + n·Δm + Δn·Δm.
When Δn and Δm are very small compared to n and m, the Rete time complexity of the incremental join is worst case O(n+m), and using B-trees it goes to O(log n + log m + Δn + Δm).
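The decomposition on this slide can be sketched as an equi-join that stores both inputs and, for each new batch, emits only the three incremental terms. This is an illustration rather than the Argus implementation: the class `IncrementalJoin` and its method are hypothetical, and in-memory hash indexes (Python dicts) stand in for the B-trees mentioned on the slide.

```python
from collections import defaultdict


class IncrementalJoin:
    """Equi-join that returns only the results created by each new batch."""

    def __init__(self):
        self.left = defaultdict(list)    # join key -> stored left values (n)
        self.right = defaultdict(list)   # join key -> stored right values (m)

    def add_batch(self, delta_left, delta_right):
        """Take new (key, value) pairs for each side; return new join results."""
        new_results = []
        # Term 1: new left rows joined with the stored right rows.
        for key, lval in delta_left:
            for rval in self.right.get(key, ()):
                new_results.append((key, lval, rval))
        # Term 2: stored left rows joined with the new right rows.
        for key, rval in delta_right:
            for lval in self.left.get(key, ()):
                new_results.append((key, lval, rval))
        # Term 3: new left rows joined with the new right rows.
        delta_right_index = defaultdict(list)
        for key, rval in delta_right:
            delta_right_index[key].append(rval)
        for key, lval in delta_left:
            for rval in delta_right_index.get(key, ()):
                new_results.append((key, lval, rval))
        # Fold the deltas into the stored tables for the next batch.
        for key, lval in delta_left:
            self.left[key].append(lval)
        for key, rval in delta_right:
            self.right[key].append(rval)
        return new_results


join = IncrementalJoin()
first = join.add_batch([(1, "a"), (2, "b")], [(1, "x")])
second = join.add_batch([(1, "c")], [(2, "y"), (1, "z")])
print(first)    # [(1, 'a', 'x')]
print(second)   # [(1, 'c', 'x'), (2, 'b', 'y'), (1, 'a', 'z'), (1, 'c', 'z')]
```

The old results are never recomputed: each batch touches only the index entries for keys that appear in the deltas, so the work per batch depends on the delta sizes and the number of matches rather than on a full rescan of both tables.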

9 NIMD 9 Finding Novel Patterns in Data
Primary topic of Hypothesis Generation and Tracking paper
Scales well for massive data because algorithms are near linear in number of records, rather than n^2

10 NIMD 10 Need for Suitable Data
Most suitable data is classified or proprietary
Fabricated data does not have “right” distribution
– Risk of tailoring solution to fabricated characteristics
Ideal is real data processed to be unclassified, but still retaining relevant characteristics of original

