The University of Texas at Arlington Topics in Random Processes CSE 5301 – Data Modeling Guest Lecture: Dr. Gergely Záruba.

Poisson Process
The Poisson process is very handy when modeling birth/death processes, such as the arrival of calls to a telephone system. We can approach it by discretizing time and assuming that in any time slot an event happens with probability p; however, two events cannot happen in the same slot, so we must shrink the slots until they are infinitesimally small. This leads us to talk about a rate (events/time). The number of events in a given time duration is then Poisson distributed, and the time between consecutive events (the inter-arrival time) is exponentially distributed.
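The slide's two facts can be checked with a small sketch (a minimal illustration, not part of the lecture's code): drawing exponential inter-arrival times produces event counts whose empirical rate matches the process rate.

```python
import random

def poisson_arrivals(rate, horizon, rng):
    """Generate Poisson-process event times on [0, horizon) by
    drawing exponential inter-arrival times (rate = events/time)."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)   # exponential gap between events
        if t >= horizon:
            return times
        times.append(t)

rng = random.Random(42)
rate, horizon = 3.0, 1000.0
events = poisson_arrivals(rate, horizon, rng)
# The count in the window is Poisson(rate * horizon), so the
# empirical rate should be close to `rate`.
print(len(events) / horizon)
```

The names `poisson_arrivals` and the chosen rate/horizon are illustrative only.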

Heavy Tails
A pdf is a function that is non-negative and whose integral over its whole domain is 1. We have looked at pdfs that decay exponentially (e.g., the Gaussian and the exponential). Can we create a pdf-like function that decays like 1/x? No: although the function converges to 0, its integral grows logarithmically and thus will never integrate to 1. Can we create a pdf-like function that decays like x^-2? Yes: its integral behaves like 1/x, so it can work as a pdf. However, when calculating the mean, the integral again becomes divergent, so this pdf has an infinite mean. Similarly, an x^-3-like decay results in a pdf with a finite mean but infinite variance. These latter two pdfs are good examples of heavy-tailed distributions. Unfortunately, many real processes have heavy-tailed features.
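As a concrete (illustrative) example of such a distribution, a Pareto pdf with minimum xm decays like x^-(alpha+1): alpha = 1 gives an infinite mean, and 1 < alpha <= 2 gives a finite mean but infinite variance. A sketch of sampling it via its inverse cdf:

```python
import random

def pareto_sample(alpha, xm, rng):
    """Inverse-CDF draw from a Pareto distribution whose pdf decays
    like x**-(alpha+1); all samples are >= xm."""
    u = rng.random()                       # uniform on [0, 1)
    return xm / (1.0 - u) ** (1.0 / alpha)

rng = random.Random(7)
# alpha = 1.5: finite mean (alpha*xm/(alpha-1) = 3) but infinite
# variance, so the sample mean converges very slowly.
samples = [pareto_sample(1.5, 1.0, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))
```

With alpha set to 1.0 instead, the running mean never settles at all, which is the "infinite mean" case described above.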

Heavy Tails' Impact on the CLT
For the Central Limit Theorem to work, the distributions from which the added random numbers are drawn must have finite means and variances. Thus, if the underlying distribution from which the samples come is heavy tailed, we cannot use significance testing and confidence intervals. Unfortunately, it has been shown that Internet traffic is most closely modeled by a Pareto distribution, so if traffic is modeled this way, the significance of results cannot be established.
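The finite-variance behavior that confidence intervals rely on can be seen in a small sketch (illustrative code, not from the lecture): for a uniform source, the spread of sample means shrinks like 1/sqrt(n).

```python
import random
import statistics

def spread_of_sample_means(n, reps, rng):
    """Std. dev. of `reps` sample means, each over n uniform draws."""
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(1)
# 100x more samples per mean => roughly 10x smaller spread; this
# 1/sqrt(n) shrinkage is exactly what heavy tails break.
small = spread_of_sample_means(25, 200, rng)
large = spread_of_sample_means(2500, 200, rng)
print(small, large)
```

Repeating this with a heavy-tailed source (e.g., Pareto with alpha near 1) shows the spread refusing to shrink, which is why the confidence-interval machinery below cannot be applied there.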

Random Numbers in Computers
Truly random numbers do not exist in proper digital logic. Instead, we use algorithms that generate sequences of numbers whose statistical properties look as if they were generated by a uniform distribution: pseudo-random and quasi-random number generators. A pseudo-random number generator produces a random-looking integer between 0 and MAX_RAND (cf. C's RAND_MAX). The numbers are actually a deterministic sequence, where the seed is the first number in the series. This lack of randomness is actually great for computer simulations: our experiments can be repeated deterministically. We usually set the seed once, at the beginning; different experiments are created by setting different seeds.
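The seeding discipline described above can be sketched as follows (a toy illustration; the function name is hypothetical):

```python
import random

def experiment(seed):
    """A stand-in for one simulation run: the same seed always
    reproduces exactly the same pseudo-random sequence."""
    rng = random.Random(seed)
    return [rng.randint(0, 2**31 - 1) for _ in range(5)]

# Same seed => identical run (repeatable experiments);
# different seeds => independent-looking runs.
assert experiment(123) == experiment(123)
assert experiment(123) != experiment(456)
```

Using a private `random.Random(seed)` object per run, rather than the global generator, keeps runs independent of each other.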

Random Variate Generation
So, we can generate uniform random numbers, and by dividing by MAX_RAND we get a semi-continuous random number in (0,1). How can we use this to generate random numbers with different distributions? We obtain the cdf of the target distribution and invert it; substituting the (0,1) uniform random number into the inverse cdf then yields a properly distributed random number. As an example, the exponential distribution's inverse cdf is -log(1-u)/lambda (where u is the (0,1) uniform random number).
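The exponential example in the slide can be sketched directly (illustrative code; the slide's formula is used verbatim):

```python
import math
import random

def exponential_variate(lam, rng):
    """Inverse-CDF method: F(x) = 1 - exp(-lam*x), so
    F^-1(u) = -log(1 - u) / lam for u uniform on (0, 1)."""
    u = rng.random()
    return -math.log(1.0 - u) / lam

rng = random.Random(3)
lam = 2.0
xs = [exponential_variate(lam, rng) for _ in range(200_000)]
print(sum(xs) / len(xs))   # should be close to 1/lam = 0.5
```

The same pattern works for any distribution whose cdf has a closed-form inverse.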

How About Generating Normal?
Unfortunately, the normal distribution does not have a closed form for its inverse cdf (and it is not the only such distribution). One option is to sum up a large number (>30) of uniform random numbers and scale the sum to the desired mean and variance. There are other methods, usually based on trigonometric functions (e.g., the Box-Muller transform).
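A common variant of the summing approach (a sketch, not the lecture's code) uses exactly 12 uniforms, since their sum has mean 6 and variance 1:

```python
import random
import statistics

def approx_normal(mu, sigma, rng):
    """CLT-based approximation: the sum of 12 U(0,1) draws has
    mean 6 and variance 1, so (sum - 6) is roughly standard normal."""
    z = sum(rng.random() for _ in range(12)) - 6.0
    return mu + sigma * z

rng = random.Random(9)
xs = [approx_normal(10.0, 2.0, rng) for _ in range(100_000)]
print(statistics.fmean(xs), statistics.stdev(xs))
```

Note the approximation truncates the tails at +/- 6 sigma, which is usually acceptable for simulation but not for tail-sensitive work.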

Resolution of Random Numbers and Its Impact
When we compute rand()/MAX_RAND, we end up with a floating-point number. However, not only is this number rational (we cannot have irrational numbers in computers), its possible values are never closer to each other than 1/MAX_RAND. Thus, when substituting into the inverse cdf, the shape of the inverse cdf determines how closely spaced the final random variates can be. This is an especially significant problem for heavy-tailed distributions, where theoretically there is a larger chance of very small or very large values, which the resolution of the (0,1) random number may prevent from ever being generated.
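A quick worked example of the cap (illustrative; MAX_RAND here stands in for the generator's resolution, using a 31-bit value like C's common RAND_MAX):

```python
# u can never get closer to 1 than 1/MAX_RAND, so the inverse cdf
# is never evaluated nearer the tail than that.
MAX_RAND = 2**31 - 1
alpha, xm = 1.0, 1.0

largest_u = 1.0 - 1.0 / MAX_RAND
# Pareto inverse cdf: x = xm / (1 - u)**(1/alpha).  The largest
# variate we can ever produce is capped by the u-resolution:
largest_variate = xm / (1.0 - largest_u) ** (1.0 / alpha)
print(largest_variate)   # about 2.1e9, although the true tail is unbounded
```

So a generator with this resolution silently truncates the Pareto tail at roughly MAX_RAND times xm, exactly the region that makes the distribution heavy tailed.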

Case Study: Aloha
The Aloha networking protocol was used to connect computers distributed over the Hawaiian islands with a shared-bandwidth wireless network. The medium (the radio bandwidth) had to be shared by many radios; the task is to come up with effective and fair ways to share it. Aloha was the first approach, and it is characterized by the lack of an approach: as soon as data is ready to be transmitted, a radio node transmits it. This is a straightforward application of a Poisson process.

Case Study: Aloha
Let us assume that any time a node transmits, it transmits for a duration T, and that the arrival of transmission events follows a Poisson process (a straightforward assumption). Suppose a node starts transmitting at time t0. For this transmission to be successful, no other node may start transmitting in the time period (t0-T, t0+T), i.e., the collision domain is 2T. What is the utilization of this network? The arrival rate is l; this is our best guess (the mean of the Poisson) of how many arrivals happen per unit time, taking T as the time unit. What then is the probability that no other node transmits for a 2T duration? It is the probability that a Poisson distribution with mean 2lT yields 0 arrivals.

Case Study: Aloha
Thus it is (2l)^0 * e^(-2l) / 0!, which is e^(-2l) (with T = 1). The total utilization is therefore l * e^(-2l). Differentiating, we find its maximum at l = 0.5, where the value is 1/(2e), about 18%. This is not great. We can easily double the performance by synchronizing the network (slotted Aloha) and making sure that nodes can only start transmitting at time instants that are integer multiples of T, which shrinks the collision domain from 2T to T.
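The maximum claimed above can be verified numerically with a short sketch (illustrative code; l is written as the offered load g):

```python
import math

def pure_aloha_throughput(g):
    """Utilization of pure Aloha: S(g) = g * exp(-2g), with T = 1
    as the time unit and g the Poisson arrival rate."""
    return g * math.exp(-2.0 * g)

# Scan the load axis for the maximizing rate.
best_g = max((g / 1000.0 for g in range(1, 2000)),
             key=pure_aloha_throughput)
print(best_g, pure_aloha_throughput(best_g))
```

The scan lands on g = 0.5 with throughput 1/(2e), matching the result obtained by differentiation.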

Revisiting Confidence Intervals
We have a random process that generates random numbers with an underlying mean mu and standard deviation sigma. We take N samples, and based on these N samples we calculate the sample mean m and the unbiased sample standard deviation s. We want to claim a confidence level C (e.g., C = 0.95), i.e., we want to say something about the relationship of m and mu; more precisely, we want Pr(m-e < mu < m+e) > C. The interval (m-e, m+e) is our confidence interval. The error is e = z*s/sqrt(N) if we have taken enough samples (N > 30) or if we know that the underlying distribution is Gaussian (otherwise, we have to calculate with the Student t value instead). Here z is the appropriate quantile of the standard normal (e.g., for a 95% confidence it is 1.96).
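The interval computation can be sketched directly from these definitions (illustrative code with a Gaussian test source):

```python
import math
import random
import statistics

def confidence_interval(samples, z=1.96):
    """95% CI for the mean: m +/- z*s/sqrt(N), valid for N > 30 or
    a Gaussian underlying distribution."""
    m = statistics.fmean(samples)
    s = statistics.stdev(samples)          # unbiased sample std dev
    e = z * s / math.sqrt(len(samples))
    return m - e, m + e

rng = random.Random(5)
samples = [rng.gauss(10.0, 2.0) for _ in range(100)]
lo, hi = confidence_interval(samples)
print(lo, hi)   # should usually bracket the true mean, 10.0
```

Over many repetitions, about 95% of such intervals bracket the true mean; any single interval may miss.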

Being Confident the Relative Way
Unlike social-science statisticians, when using computer simulations we usually do not have a limit on our sample size: we can always run more simulations (time permitting) with different seeds. We can draw our data points in figures with confidence bars over them, but this can result in messy figures. Sometimes it is better to show just our best estimate while making sure that we bound the error. Thus we can talk about figures having a confidence value and bounded relative errors. Our relative error is the error e divided by the best guess m: e_r = e/m. We should run enough simulations (large enough N) to claim a large confidence that our relative error is less than a small number (in engineering, a 95% confidence with a 5% relative error is usually an acceptable minimum).

Being Confident; the Recipe
1. Make sure the simulation does not model heavy-tailed distributions and that you have an argument that the underlying random variable has a finite mean and variance.
2. Set N0 = 30 (if you do not know that the underlying distribution is Gaussian).
3. Run the simulation N0 times with N0 different seeds.
4. Calculate the mean m0, the standard deviation s0, the error e0, and the relative error e_r0 based on a chosen confidence level C.
5. If e_r0 is less than the target relative error e_rt (e.g., 5%), you are done with this data point.
6. If not, estimate how many simulation runs you need to be confident. This can be done from the current data: assuming the mean and the standard deviation will not change for the worse, the target absolute error is e_t0 = m0 * e_rt. Turning the error formula around, N = (z*s/e)^2, so the number of samples needed can be estimated as N1 = (z*s0/e_t0)^2.
7. Throw out the data points, choose N1 different seeds (different from before), and repeat the experiment from step 3. Iterate until step 5 checks out (which will likely happen on your second iteration).
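The whole recipe can be sketched in code (an illustrative implementation, not the lecture's Aloha program; `run_simulation` is a hypothetical stand-in for one seeded simulation run):

```python
import math
import random
import statistics

def run_simulation(seed):
    """Hypothetical stand-in for one simulation run: returns one
    sample of the performance metric being estimated."""
    rng = random.Random(seed)
    return statistics.fmean(rng.expovariate(1.0) for _ in range(50))

def estimate_with_relative_error(target_rel_err=0.05, z=1.96, n0=30):
    """Steps 2-7: start with N0 = 30 runs and grow N until the
    relative error at the chosen confidence drops below the target."""
    n, seed_base = n0, 0
    while True:
        samples = [run_simulation(seed_base + i) for i in range(n)]
        m = statistics.fmean(samples)
        s = statistics.stdev(samples)
        e = z * s / math.sqrt(n)
        if e / m <= target_rel_err:
            return m, e / m, n
        # Estimate the needed N from the current s and the target
        # absolute error m * e_rt, then redo with fresh seeds.
        e_target = m * target_rel_err
        n = max(n + 1, math.ceil((z * s / e_target) ** 2))
        seed_base += 10_000        # new, non-overlapping seeds

m, rel_err, n = estimate_with_relative_error()
print(m, rel_err, n)
```

Replacing `run_simulation` with a real experiment (e.g., one Aloha simulation run per seed) gives the procedure the next slide refers to.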

Case Study: Aloha, a Simulation
Look at the main code in the Aloha simulation example; it corresponds to the previous recipe. Try to enter simulation parameters that will not result in confidence after 30 runs.