Finding Self-similarity in Opportunistic People Networks Yung-Chih Chen 1 Ling-Jyh Chen 1, Yung-Chih Chen 1, Tony Sun 2 Paruvelli Sreedevi 1, Kuan-Ta Chen.

Slides:



Advertisements
Similar presentations
Network and Service Assurance Laboratory Analysis of self-similar Traffic Using Multiplexer & Demultiplexer Loaded with Heterogeneous ON/OFF Sources Huai.
Advertisements

Departments of Medicine and Biostatistics
1 Location-Aided Routing (LAR) in Mobile Ad Hoc Networks Young-Bae Ko and Nitin H. Vaidya Yu-Ta Chen 2006 Advanced Wireless Network.
Sampling Distributions (§ )
Finding Self-similarity in People Opportunistic Networks Ling-Jyh Chen, Yung-Chih Chen, Paruvelli Sreedevi, Kuan-Ta Chen Chen-Hung Yu, Hao Chu.
Modeling and Analysis of Random Walk Search Algorithms in P2P Networks Nabhendra Bisnik, Alhussein Abouzeid ECSE, Rensselaer Polytechnic Institute.
On Using Probabilistic Forwarding to Improve HEC-based Data Forwarding in Opportunistic Networks Ling-Jyh Chen 1, Cheng-Long Tseng 2 and Cheng-Fu Chou.
2  Something “feels the same” regardless of scale 4 What is that???
Chen Chu South China University of Technology. 1. Self-Similar process and Multi-fractal process There are 3 different definitions for self-similar process.
1 Self-Similar Ethernet LAN Traffic Carey Williamson University of Calgary.
By Libo Song and David F. Kotz Computer Science,Dartmouth College.
CMPT 855Module Network Traffic Self-Similarity Carey Williamson Department of Computer Science University of Saskatchewan.
Probabilistic Aggregation in Distributed Networks Ling Huang, Ben Zhao, Anthony Joseph and John Kubiatowicz {hling, ravenben, adj,
On the Self-Similar Nature of Ethernet Traffic - Leland, et. Al Presented by Sumitra Ganesh.
October 14, 2002MASCOTS Workload Characterization in Web Caching Hierarchies Guangwei Bai Carey Williamson Department of Computer Science University.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
Data Sources The most sophisticated forecasting model will fail if it is applied to unreliable data Data should be reliable and accurate Data should be.
Understanding and Predicting Host Load Peter A. Dinda Carnegie Mellon University
無線區域網路中自我相似交通流量之 成因與效能評估 The origin and performance impact of self- similar traffic for wireless local area networks 報 告 者:林 文 祺 指導教授:柯 開 維 博士.
Opportunistic Networking (aka Pocket Switched Networking)
CS 6401 Network Traffic Characteristics Outline Motivation Self-similarity Ethernet traffic WAN traffic Web traffic.
Survival analysis Brian Healy, PhD. Previous classes Regression Regression –Linear regression –Multiple regression –Logistic regression.
©2003/04 Alessandro Bogliolo Background Information theory Probability theory Algorithms.
A Machine Learning-based Approach for Estimating Available Bandwidth Ling-Jyh Chen 1, Cheng-Fu Chou 2 and Bo-Chun Wang 2 1 Academia Sinica 2 National Taiwan.
Traffic modeling and Prediction ----Linear Models
1 Chapters 9 Self-SimilarTraffic. Chapter 9 – Self-Similar Traffic 2 Introduction- Motivation Validity of the queuing models we have studied depends on.
Network Traffic Modeling Punit Shah CSE581 Internet Technologies OGI, OHSU 2002, March 6.
Inference for regression - Simple linear regression
Detecting Node encounters through WiFi By: Karim Keramat Jahromi Supervisor: Prof Adriano Moreira Co-Supervisor: Prof Filipe Meneses Oct 2013.
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
HSRP 734: Advanced Statistical Methods July 10, 2008.
UNIVERSITY of NOTRE DAME COLLEGE of ENGINEERING Preserving Location Privacy on the Release of Large-scale Mobility Data Xueheng Hu, Aaron D. Striegel Department.
The Effects of Ranging Noise on Multihop Localization: An Empirical Study from UC Berkeley Abon.
Random Sampling, Point Estimation and Maximum Likelihood.
Quantitative Skills 1: Graphing
Switch-and-Navigate: Controlling Data Ferry Mobility for Delay-Bounded Messages Liang Ma*, Ting He +, Ananthram Swami §, Kang-won Lee + and Kin K. Leung*
Repeated measures ANOVA in SPSS Cross tabulations Survival analysis.
A Hybrid Routing Approach for Opportunistic Networks Ling-Jyh Chen 1, Chen-Hung Yu 2, Tony Sun 3, Yung-Chih Chen 1, and Hao-hua Chu 2 1 Academia Sinica.
Slide 1 © 2002 McGraw-Hill Australia, PPTs t/a Introductory Mathematics & Statistics for Business 4e by John S. Croucher 1 n Learning Objectives –Identify.
7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.
On Exploiting Transient Contact Patterns for Data Forwarding in Delay Tolerant Networks Wei Gao and Guohong Cao Dept. of Computer Science and Engineering.
PPWEB: A Peer-to-Peer Approach for Web Surfing On the Go Ling-Jyh Chen, Ting-Kai Huang Institute of Information Science, Academia Sinica, Taiwan Guang.
Chapter 12 Survival Analysis.
Borgan and Henderson:. Event History Methodology
Sampling Error.  When we take a sample, our results will not exactly equal the correct results for the whole population. That is, our results will be.
1 Self Similar Traffic. 2 Self Similarity The idea is that something looks the same when viewed from different degrees of “magnification” or different.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
1 On the Levy-walk Nature of Human Mobility Injong Rhee, Minsu Shin and Seongik Hong NC State University Kyunghan Lee and Song Chong KAIST.
Self-generated Self-similar Traffic Péter Hága Péter Pollner Gábor Simon István Csabai Gábor Vattay.
An Evaluation of Routing Reliability in Non-Collaborative Opportunistic Networks Ling-Jyh Chen, Che-Liang Chiou, and Yi-Chao Chen Institute of Information.
01/20151 EPI 5344: Survival Analysis in Epidemiology Actuarial and Kaplan-Meier methods February 24, 2015 Dr. N. Birkett, School of Epidemiology, Public.
Sampling and estimation Petter Mostad
The field of statistics deals with the collection,
Lecture 4: Likelihoods and Inference Likelihood function for censored data.
01/20151 EPI 5344: Survival Analysis in Epidemiology Quick Review from Session #1 March 3, 2015 Dr. N. Birkett, School of Epidemiology, Public Health &
Chapter 9 Inferences Based on Two Samples: Confidence Intervals and Tests of Hypothesis.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
Topic 19: Survival Analysis T = Time until an event occurs. Events are, e.g., death, disease recurrence or relapse, infection, pregnancy.
Two Types of Empirical Likelihood Zheng, Yan Department of Biostatistics University of California, Los Angeles.
Chapter 6: Descriptive Statistics. Learning Objectives Describe statistical measures used in descriptive statistics Compute measures of central tendency.
DATA ANALYSIS AND MODEL BUILDING LECTURE 7 Prof. Roland Craigwell Department of Economics University of the West Indies Cave Hill Campus and Rebecca Gookool.
1 Interesting Links. On the Self-Similar Nature of Ethernet Traffic Will E. Leland, Walter Willinger and Daniel V. Wilson BELLCORE Murad S. Taqqu BU Analysis.
Chapter 15: Exploratory data analysis: graphical summaries CIS 3033.
Estimating the Value of a Parameter Using Confidence Intervals
Oliver Schulte Machine Learning 726
Evaluation of Load Balancing Algorithms and Internet Traffic Modeling for Performance Analysis By Arthur L. Blais.
Mark E. Crovella and Azer Bestavros Computer Science Dept,
Ling-Jyh Chen and Ting-Kai Huang
CPSC 641: Network Traffic Self-Similarity
Kaplan-Meier survival curves and the log rank test
Presentation transcript:

Finding Self-similarity in Opportunistic People Networks Yung-Chih Chen 1 Ling-Jyh Chen 1, Yung-Chih Chen 1, Tony Sun 2 Paruvelli Sreedevi 1, Kuan-Ta Chen 1 Chen-Hung Yu 3, Hao-Hua Chu 3 1 Academia Sinica, Taiwan 2 UCLA, USA 3 National Taiwan University, Taiwan

Motivation Investigate fundamental properties of opportunistic networks Better understand network connectivity Solve the long been ignored censorship issue

Contribution Point out and recover censorship within mobility traces of opportunistic networks –Propose Censorship Removal Algorithm –Recover censored measurements Prove the inter-contact time process as self-similar for future research on opportunistic networks

Outline Trace Description Censorship Issue –Survival Analysis –Censorship Removal Algorithm Self-similarity

Trace Description UCSD campus trace*UCSD campus trace* –77 days, 275 nodes involved –Client-based trace PDAs record Wi-Fi based APs nearby Dartmouth College trace**Dartmouth College trace** –1,777 days, 5148 nodes involved –Interface-based trace APs maintain the association log for each wireless interface –77 days extracted for comparison Wireless Topology Discovery (WTD Project) *UCSD: Wireless Topology Discovery (WTD Project) RAWDAD **Dartmouth: RAWDAD

Basic Terms What is Contact ? –Two nodes are of their wireless radio range –Associated to the same AP at the same time What is Inter-contact Time ? –Period between two consecutive contacts Used to observe Network Connectivity –Distribution of inter-contact time Disconnection duration Reconnection frequency

Basic Terms (Con’t) Inter-contact time = 3 weeks (Weeks) Inter-contact time 7 weeks Inter-contact time ?? Observation End In the last case, the inter-contact time has been censored as 6 weeks. Case A Case B Case C

Censorship Inter-contact time samples end after the termination of the observation. Censored measurements are inevitable. UCSD Trace Dartmouth College Trace Censored Data

Survival Analysis Important in biostatistics, medicine, … –Estimate patients’ time to live/death –Map to censored inter-contact time samples Censored samples should have the same likelihood distribution as the uncensored’s. –Kaplan-Meier Product Limit Estimator –Kaplan-Meier Estimator (a.k.a. Survival Function or Product Limit Estimator)

Kaplan-Meier Estimator Suppose there are N samples ( t 1 <t 2 <t 3 …<t N ) At time t i : –d i uncensored samples (complete samples) –n i events (censored/uncensored) The survival function is:

Kaplan-Meier Estimator – An Example 10 inter-contact time samples: 1, 2 +, 3 +, 3.5 +, 4, 5 +,9, 9.5 +, 10, (in weeks, + for censorship) i-c time interval nini d i (death) c i (censored) Survival function S(t) 01000S(0)=1 (0,1]1010S(1)= 1* 9/10=0.9 (1,4]613S(4)=0.9*5/6=0.75 (4,9]411S(9)=0.75*3/4=0.56 (9,10]211S(10)=0.56 *1/2=0.28 (10,11]101S(11)=0.28*1/1= 0.28

Survival Analysis (Con’t) Inter-contact time dist.  power-law dist. Power-law dist. Survival Curve

Censorship Removal Algorithm Based on the survival function S(t) –t 1 < t 2 < t 3 …<t N (N : total sample number) –Death Ratio during t i ~ t i+1 : D( t i ) = S( t i-1 )-S( t i ) S( ti ) –Ci: # of censored samples at t i – Iteratively select Ci*D( t i ) samples from Ci Uniformly distribute their estimated inter-contact time by S( t i ) Mark them as uncensored samples –Terminate when all the censored samples are removed

Censorship Removal Algorithm (Con’t) Recovered inter-contact time measurements UCSD TraceDartmouth Trace

Censorship Removal Algorithm (Con’t) Compare the recovered values to their exact values in original trace. 80.4% censored measurements are recovered. Pr (T>t) 77 days (with censorship) 1,177 days (with exact values) Inter-contact time

Outline Trace Description Censorship Issue –Survival Analysis –Censorship Removal Algorithm Self-similarity

Self-Similarity What is self-similarity? –By definition, a self-similar object is exactly or approximately similar to part of itself. In opportunistic network, we focus on the network connectivity With recovered measurements, we prove inter- contact time series as a self-similar process –Reconnection/disconnection –Similar mobility pattern in people opp. networks

Self-Similarity A self-similar series –Distribution should be heavy-tailed –Examined by three statistical analyses Variance-Time Plot, R/S Plot, Periodogram Plot Estimated by a specific parameter : Hurst H should be in the range of 0.5~1 –Results of three methods should be in the 95% confidence interval of Whittle estimator

Self-Similarity (Con’t) Previous works show inter-contact time dist. as power-law dist. XA random variable X is called heavy-tailed: X –If P[X>x] ~ cx - α, with 0 ∞ –α can be found by log-log plot –Survival curves show the α for UCSD: 0.26 Dartmouth: 0.47

Self-Similarity (Con’t) Variance-Time MethodVariance-Time Method –Variance decreases very slowly, even when the size grows large The Hurst estimates are –UCSD: –Dartmouth: UCSD Dartmouth

Self-Similarity (Con’t) Rescaled Adjusted Range (R/S) methodRescaled Adjusted Range (R/S) method –Keep similar properties when the dataset is divided into several sub-sets The Hurst estimates are –UCSD: –Dartmouth: UCSD Dartmouth

Self-Similarity (Con’t) Periodogram MethodPeriodogram Method –Use the slope of power spectrum of the series as frequency approaches zero The Hurst estimates are –UCSD: –Dartmouth: UCSD Dartmouth

Self-Similarity (Con’t) Whittle EstimatorWhittle Estimator –Usually being considered as a more robust method –Provide a confidence interval Results of the three graphical methods are in the 95% confidence interval. Aggregation level (UCSD) Aggregation level (Dartmouth) Hurst Estimate 95% Confidence Interval

Conclusion Two major properties exists in modern opportunistic networks: –Censorship –Self-similarity Using CRA, we could recover censored inter-contact time to have more accurate datasets. With recovered datasets, we prove that inter-contact time series is self-similar.

Thank You !