Hash Function comparison for PSAMP purposes: results and suggestions Maurizio Molina,

Motivation, Background
© NEC Europe Ltd., 2002 Network Laboratories, Heidelberg

In PSAMP, hash functions operating on portions of the packet header and/or payload are useful for two reasons:
– Emulating random sampling: the Sampling ID
– Generating a "compact" packet identifier: the Digest ID
PSAMP must specify which hash function to use, for consistent packet sampling and identification.
But the requirements differ between the two cases, so the criteria for choosing the "best" hash function differ too
– and so can the choice of the "best" hash function itself!
We compared 4 hash functions:
– IPSX
– BOB
– MMH
– CRC32
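A minimal sketch of how a hash can serve both purposes, assuming zlib's CRC32 as a stand-in for every hash function (IPSX, BOB and MMH are not reproduced here) and using hypothetical packet fields and a hypothetical 10% selection range. Because selection is a deterministic function of packet bytes that do not change in transit, two observation points configured with the same function and range pick the same packets, which is why the function has to be specified.

```python
import zlib

HASH_RANGE = 2**32  # zlib.crc32 returns an unsigned 32-bit integer


def sampling_id(invariant_bytes: bytes) -> int:
    """Sampling ID: hash over header/payload bytes that do not change in transit."""
    return zlib.crc32(invariant_bytes)


def is_selected(invariant_bytes: bytes, sampling_fraction: float) -> bool:
    """Select iff the Sampling ID falls in the selection range [0, fraction * range).

    If Sampling IDs are uniform over the hash range, this selects about
    `sampling_fraction` of all packets, emulating random sampling.
    """
    return sampling_id(invariant_bytes) < int(sampling_fraction * HASH_RANGE)


def digest_id(packet_bytes: bytes) -> int:
    """Digest ID: compact identifier of a (sampled) packet."""
    return zlib.crc32(packet_bytes)


# Hypothetical packets: the TTL changes hop by hop, so it is kept out of the hash input.
packets = [{"ttl": 64, "invariant": b"flow-%d|seq-%d" % (i % 50, i)} for i in range(1000)]
at_point_b = [dict(p, ttl=p["ttl"] - 1) for p in packets]  # same packets, one hop later

chosen_a = [digest_id(p["invariant"]) for p in packets if is_selected(p["invariant"], 0.1)]
chosen_b = [digest_id(p["invariant"]) for p in at_point_b if is_selected(p["invariant"], 0.1)]
assert chosen_a == chosen_b  # same function and range -> same packets, same Digest IDs
```

Keeping the TTL out of the hash input is what makes the decision survive the hop; hashing a field that changes in transit would break consistency between the two points.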

Hash functions for random sampling emulation
Requirements:
– Good uniformity of distribution: the Sampling ID must be uniformly distributed over the hash range (the space of possible hash results), ideally even when the hash input is far from uniform!
– Computation speed: it must operate at line rate

Testing for uniformity of distribution: method (1/2)
Subdivide the hash range into N bins and evaluate the fraction of hash results falling into each bin
– Ideally each fraction is 1/N, but in practice it fluctuates around 1/N
Repeat the experiment 60 times and calculate confidence intervals
[Figure: per-bin fractions around the ideal value 1/N]
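The binning procedure can be sketched as follows. The number of bins, the synthetic 20-byte inputs, the use of CRC32 and the 95% normal-approximation interval (z = 1.96; a Student-t quantile with 59 degrees of freedom would be slightly larger) are all assumptions, not details taken from the slides.

```python
import math
import random
import statistics
import zlib

HASH_BITS = 32  # zlib.crc32 yields values in [0, 2**32)


def bin_fractions(hashes, n_bins):
    """Fraction of hash results that fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for h in hashes:
        counts[(h * n_bins) >> HASH_BITS] += 1  # bin index in [0, n_bins)
    return [c / len(hashes) for c in counts]


def run_experiment(n_bins=8, runs=60, packets_per_run=2000, seed=1):
    """Repeat the binning `runs` times on fresh synthetic inputs.

    Returns (mean fraction, CI half-width) per bin; the ideal mean is 1/n_bins.
    The 95% interval uses the normal approximation (z = 1.96).
    """
    rng = random.Random(seed)
    per_run = []
    for _ in range(runs):
        # Hypothetical synthetic input: 20 random bytes per "packet".
        inputs = [rng.getrandbits(160).to_bytes(20, "big") for _ in range(packets_per_run)]
        per_run.append(bin_fractions([zlib.crc32(x) for x in inputs], n_bins))
    results = []
    for b in range(n_bins):
        samples = [fractions[b] for fractions in per_run]
        half_width = 1.96 * statistics.stdev(samples) / math.sqrt(runs)
        results.append((statistics.mean(samples), half_width))
    return results


for i, (mean, half_width) in enumerate(run_experiment()):
    print(f"bin {i}: {mean:.4f} +/- {half_width:.4f} (ideal {1 / 8:.4f})")
```

Swapping `zlib.crc32` for another function, or the random inputs for a real trace, gives the per-function comparison the slides describe.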

Testing for uniformity of distribution: method (2/2)
Metrics:
– Standard deviation of the per-bin averages
– Average confidence interval size
The lower the metric, the better!
[Figure: per-bin results around 1/N, better vs. worse cases]
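One reading of the two metrics, sketched under assumptions: the "averages" are the per-bin mean fractions over all runs, their standard deviation is taken across bins (population form), and each confidence interval is the approximate 95% interval of one bin (z = 1.96), with its full width averaged over bins. The function name and the runs-by-bins matrix layout are hypothetical.

```python
import math
import statistics


def uniformity_metrics(fractions_per_run):
    """Return (std of per-bin averages, average CI width); lower is better.

    fractions_per_run: one row per run, one column per bin (at least 2 runs).
    """
    runs = len(fractions_per_run)
    n_bins = len(fractions_per_run[0])
    columns = [[row[b] for row in fractions_per_run] for b in range(n_bins)]
    averages = [statistics.mean(col) for col in columns]
    # Spread of the per-bin averages across bins (their mean is 1/N when rows sum to 1).
    std_of_averages = statistics.pstdev(averages)
    # Full width of an approximate 95% interval per bin, averaged over bins.
    ci_widths = [2 * 1.96 * statistics.stdev(col) / math.sqrt(runs) for col in columns]
    return std_of_averages, statistics.mean(ci_widths)


uniform = [[0.25] * 4 for _ in range(60)]
print(uniformity_metrics(uniform))  # (0.0, 0.0)
```

A function whose runs all land exactly on 1/N scores (0.0, 0.0); the first number grows when some bins are systematically over- or under-filled, the second when results vary more from run to run.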

Testing for uniformity of distribution: results
The performance of the 4 hash functions is very close
– with both real and synthetic input packet traces

Testing for speed: results
IPSX is much faster (6.69 times faster than BOB)!

Hash functions for random sampling emulation: conclusion
IPSX, the simplest and fastest, has uniformity of distribution comparable to the others
– IPSX is the preferred one!
– MUST for IPSX, MAY for BOB (second in rank)

Hash functions for compact packet identifier generation
Requirements:
– Low collision probability of the Digest ID; ideally it should stay low even when the hash inputs are very similar (or "slowly variant")
– Computation speed, though this is less critical than for random sampling emulation, since this hash will likely operate only on sampled packets
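The slides do not give a formula, but for an ideally uniform b-bit digest the standard birthday approximation estimates the probability that at least two of n distinct packets share a Digest ID: P ≈ 1 - exp(-n(n-1) / 2^(b+1)). A small sketch under that idealization (real functions can do worse on very similar inputs, which is what the collision tests probe):

```python
import math


def collision_probability(n_packets: int, digest_bits: int) -> float:
    """Birthday approximation of P(at least one shared Digest ID) among n packets.

    Assumes an ideal hash: digests independent and uniform over 2**digest_bits values.
    """
    pairs = n_packets * (n_packets - 1) / 2
    # 1 - exp(-x), computed with expm1 for accuracy when x is tiny
    return -math.expm1(-pairs / 2 ** digest_bits)


print(round(collision_probability(2**16, 32), 3))  # 0.393
```

With 2^16 identified packets and a 32-bit digest (the output size of CRC32), even an ideal hash already has roughly a 39% chance of at least one collision.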

Testing for collision probability: results
We excluded IPSX because its fixed input key size (16 bytes) limits how small the collision probability can get
Results:
– BOB and CRC32 exploit longer keys better than MMH
– BOB and CRC32 have similar performance
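A constructed illustration (not the authors' test) of why a fixed 16-byte input caps the achievable collision probability: if packets agree in every byte the hash sees, they collide no matter how good the hash is. CRC32, the key lengths and the traffic pattern are assumptions chosen for illustration.

```python
import zlib


def digest_id(packet: bytes, key_len: int) -> int:
    """Digest ID over the first key_len bytes of the packet (CRC32 assumed)."""
    return zlib.crc32(packet[:key_len])


def count_collisions(packets, key_len):
    """Number of packets whose Digest ID was already produced by an earlier packet."""
    seen = set()
    collisions = 0
    for p in packets:
        d = digest_id(p, key_len)
        if d in seen:
            collisions += 1
        seen.add(d)
    return collisions


# Hypothetical "slowly variant" traffic: identical first 16 bytes, differing only later.
header = bytes(16)
packets = [header + i.to_bytes(4, "big") for i in range(1000)]
print(count_collisions(packets, 16))  # 999: the hash never sees the differing bytes
print(count_collisions(packets, 20))  # 0
```

With the 20-byte key every packet gets a distinct Digest ID here; for equal-length CRC32 inputs that differ only within 32 consecutive bits this is guaranteed, since CRC32 detects all such burst errors.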

Testing for speed: results
BOB is the best, but its lead over CRC32 is small!

Hash functions for compact packet identifier generation: conclusion
Based on these results alone, we would indicate BOB as the preferred one
But the differences from CRC32 seem small, and CRC32 is more established
– In draft-ietf-psamp-sample-tech-04.txt we indicated CRC32 as the preferred one!
– MUST for CRC32, MAY for BOB (first in rank, but "new")
Discussion: does this "close" the issue too early (the tests were limited)?
– Alternative: indicate two MUSTs (CRC32 and BOB)?