Model Fitting. Jean-Yves Le Boudec.


1 Model Fitting. Jean-Yves Le Boudec

2 Contents

3 Virus Infection Data. We would like to capture the growth of the number of infected hosts (explanatory model). An exponential model seems appropriate. How can we fit the model? In particular, what is the value of the growth rate α?

4 Least Squares Fit of Virus Infection Data. Least squares fit: α = 0.5173. Mean doubling time: 1.34 hours. Prediction at +6 hours: 100 000 hosts.

5 Least Squares Fit of Virus Infection Data in Log Scale. Least squares fit: α = 0.39. Mean doubling time: 1.77 hours. Prediction at +6 hours: 39 000 hosts.
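
The doubling times quoted on these two slides follow directly from the fitted growth rate, and the log-scale fit is an ordinary straight-line regression of log y on t. A minimal Python sketch, assuming the exponential model has the form y(t) = a·exp(αt); the symbols a and α and the function names are assumptions of this sketch, and the data are synthetic, not the virus data:

```python
import math

def doubling_time(alpha):
    """Time for a * exp(alpha * t) to double: ln(2) / alpha."""
    return math.log(2) / alpha

def fit_log_scale(ts, ys):
    """Least squares fit of log(y) = log(a) + alpha * t; returns (a, alpha)."""
    n = len(ts)
    logs = [math.log(y) for y in ys]
    t_bar = sum(ts) / n
    l_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (l - l_bar) for t, l in zip(ts, logs))
    alpha = sxy / sxx
    a = math.exp(l_bar - alpha * t_bar)
    return a, alpha

# Synthetic, noise-free data (hypothetical values, not the virus data).
ts = [0.5 * k for k in range(20)]
ys = [3.0 * math.exp(0.4 * t) for t in ts]
a_hat, alpha_hat = fit_log_scale(ts, ys)
print(a_hat, alpha_hat)

# Doubling times for the two growth rates quoted on the slides.
print(doubling_time(0.5173), doubling_time(0.39))
```

A fit in natural scale minimizes the squared error on y itself rather than on log y, which generally needs a numerical optimizer and gives more weight to the largest observations.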

6 Compare the Two. LS fit in natural scale; LS fit in log scale.

7 Which Fitting Method Should I Use? Which optimization criterion should I use? The answer lies in a statistical model: model not only the interesting part, but also the noise. For example: α = 0.5173

8 How can I tell which is correct? α = 0.39

9 Look at Residuals = Validate Model


11 Least Squares Fit = Gaussian iid Noise. Assume a model with iid Gaussian noise of constant variance (homoscedasticity). The theorem says: minimizing least squares = computing the MLE for this model. This is how we computed the estimates for the virus example.
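
The equivalence can be read off the log-likelihood. A short derivation, assuming the model is written y_i = f_i(β) + ε_i with ε_i iid N(0, σ²); the symbols f_i, β and σ are notation chosen for this sketch, not taken from the slide:

```latex
\ell(\beta,\sigma)
  = -\frac{n}{2}\log\left(2\pi\sigma^{2}\right)
    - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\bigl(y_i - f_i(\beta)\bigr)^{2}
\quad\Longrightarrow\quad
\hat{\beta} = \arg\min_{\beta}\sum_{i=1}^{n}\bigl(y_i - f_i(\beta)\bigr)^{2},
\qquad
\hat{\sigma}^{2} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f_i(\hat{\beta})\bigr)^{2}
```

For any fixed σ, only the sum of squares depends on β, so maximizing the likelihood over β is the same as minimizing the sum of squares.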

12 Least Squares and Projection. Write down which is which: data point, predicted response and estimated parameter for the virus example. Figure labels: data point; predicted response; estimated parameter; manifold (where the data point would lie if there were no noise).

13 Confidence Intervals


15 Robustness to « Outliers »

16 A Simple Example. Least Squares; L1 Norm Minimization

17 Mean Versus Median
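
Fitting a single constant to data illustrates the contrast: the least squares criterion is minimized by the mean, while the L1 criterion is minimized by the median, which is far less sensitive to an outlier. A small Python sketch with a made-up sample (the data and the grid search are illustrative assumptions):

```python
import statistics

def sum_sq(data, m):
    """Least squares criterion for a constant fit m."""
    return sum((x - m) ** 2 for x in data)

def sum_abs(data, m):
    """L1 criterion for a constant fit m."""
    return sum(abs(x - m) for x in data)

data = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical sample with one outlier
mean = statistics.fmean(data)
median = statistics.median(data)

# Brute-force search over a grid of candidate constants in [0, 100].
grid = [k / 10 for k in range(0, 1001)]
best_l2 = min(grid, key=lambda m: sum_sq(data, m))
best_l1 = min(grid, key=lambda m: sum_abs(data, m))
print(mean, best_l2)
print(median, best_l1)
```

The outlier drags the least squares answer far from the bulk of the data, while the L1 answer stays at the middle observation.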

18 2. Linear Regression. Also called « ANOVA » (Analysis of Variance) = least squares + linear dependence on the parameter. A special case where computations are easy.

19 Example 4.3. What is the parameter β? Is it a linear model? How many degrees of freedom? What do we assume on ε_i? What is the matrix X?


21 Does this model have full rank?

22 Some Terminology. The x_i are called explanatory variables; they are assumed fixed and known. The y_i are called response variables; they are « the data », assumed to be one sample output of the model.

23 Least Squares and Projection. Figure labels: data point; predicted response; estimated parameter; manifold (where the data point would lie if there were no noise).

24 Solution of the Linear Regression Model

25 Least Squares and Projection. The theorem gives H and K. Figure labels: data; residuals; predicted response; estimated parameter; manifold (where the data point would lie if there were no noise).
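
For reference, the standard closed form for a full-rank linear model y = Xβ + ε is sketched below. The assignment of the letters K and H to the estimator matrix and the projection matrix is an assumption made here, since the slide's own definitions were not captured in the transcript:

```latex
\hat{\beta} = K y, \quad K = (X^{T}X)^{-1}X^{T},
\qquad
\hat{y} = X\hat{\beta} = H y, \quad H = X (X^{T}X)^{-1} X^{T},
\qquad
e = y - \hat{y} = (I - H)\, y
```

H is the orthogonal projection onto the column space of X, which is the "manifold" of the figure, so the predicted response is the point of the manifold closest to the data.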

26 The Theorem Gives β with Confidence Interval

27 SSR. Confidence intervals use the quantity s; s² is called the « Sum of Squared Residuals ». Figure labels: data; residuals; predicted response.

28 Validate the Assumptions with Residuals

29 Residuals. Residuals are given by the theorem. Figure labels: data; residuals; predicted response.

30 Standardized Residuals. The residuals e_i are an estimate of the noise terms ε_i. They are not (exactly) normal iid. What is the variance of e_i? A: σ² (1 − H_{i,i}), where σ² is the noise variance. Standardized residuals are not exactly normal iid either, but their variance is 1.
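
As an illustration, the residuals, the diagonal of H and the standardized residuals can be computed by hand for a straight-line model. This Python sketch assumes the model y = b0 + b1·x with two parameters and replaces the unknown σ by its usual estimate; the data and names are hypothetical:

```python
import math

def simple_regression(xs, ys):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, residuals, leverages, standardized residuals).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Diagonal entries H_ii of the hat matrix for a straight-line model.
    leverages = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    # Noise variance estimate: sum of squared residuals / (n - 2).
    sigma2_hat = sum(e * e for e in residuals) / (n - 2)
    # Divide each residual by its estimated standard deviation.
    standardized = [e / math.sqrt(sigma2_hat * (1 - h))
                    for e, h in zip(residuals, leverages)]
    return b0, b1, residuals, leverages, standardized

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]  # hypothetical data
b0, b1, res, lev, std_res = simple_regression(xs, ys)
print(b0, b1)
print(std_res)
```

The leverages sum to the number of parameters (here 2), and points far from the mean of x have the largest H_ii, hence the smallest residual variance.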

31 Which of these two models could be a linear regression model? A: both. Linear regression does not mean that y_i is a linear function of x_i. Caution: there is a hidden assumption. Noise is iid Gaussian -> homoscedasticity.
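
Two hypothetical examples (not the two models shown on the slide) of models that are nonlinear in the explanatory variable yet linear in the parameters, and therefore linear regression models:

```latex
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \epsilon_i
\qquad\text{and}\qquad
\log y_i = \log a + \alpha\, t_i + \epsilon_i
```

In the second one the parameters are (log a, α) and the iid Gaussian noise acts on log y_i, so the homoscedasticity assumption applies in log scale, not in natural scale.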


33 3. Linear Regression with L1 Norm Minimization = L1 norm minimization + linear dependence on the parameter. More robust. Less traditional.

34 This is convex programming
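
One standard way to see this, assuming the linear model y = Xβ + noise, is to rewrite the L1 criterion as a linear program with auxiliary variables u_i (notation introduced here for illustration):

```latex
\min_{\beta,\,u}\; \sum_{i=1}^{n} u_i
\quad\text{subject to}\quad
-u_i \le y_i - (X\beta)_i \le u_i, \qquad i = 1,\dots,n
```

At the optimum each u_i equals |y_i − (Xβ)_i|, so the objective is the L1 norm of the residuals, and any linear programming solver can be used.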


36 Confidence Intervals. No closed form. Compare to the median! Bootstrap: how?
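
One possible answer, sketched for the simplest L1 estimator (the median) rather than for a full L1 regression: resample the data with replacement, recompute the estimate each time, and take percentiles of the results. The function name, the number of resamples and the data are assumptions of this sketch:

```python
import random
import statistics

def bootstrap_median_ci(data, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(data)
    # Median of each resample (drawn with replacement), sorted.
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lo = medians[int(tail * n_boot)]
    hi = medians[int((1 - tail) * n_boot) - 1]
    return lo, hi

# Hypothetical sample with one outlier.
data = [2.1, 2.4, 2.5, 2.7, 3.0, 3.1, 3.3, 3.8, 4.0, 4.6, 50.0]
lo, hi = bootstrap_median_ci(data)
print(lo, hi)
```

For an L1 regression the same loop would resample (x_i, y_i) pairs and refit the regression on each resample.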


38 4. Choosing a Distribution. Know a catalog of distributions, guess a fit. Shape: kurtosis, skewness; power laws; hazard rate. Fit. Verify the fit visually or with a test (see later).

39 Distribution Shape. Distributions have a shape. By definition, the shape is what remains the same when we shift or rescale. Example: normal distribution: what is the shape parameter? Example: exponential distribution: what is the shape parameter?

40 Standard Distributions. In a given catalog of distributions, we give only the distributions with different shapes. For each shape, we pick one particular distribution, which we call standard. Standard normal: N(0,1). Standard exponential: Exp(1). Standard uniform: U(0,1).

41 Log-Normal Distribution


43 Skewness and Kurtosis

44 Power Laws and Pareto Distribution

45 Complementary Distribution Functions in Log-Log Scales. Panels: Pareto; Lognormal; Normal.

46 Zipf’s Law


48 Hazard Rate. Interpretation: probability that a flow dies in the next dt seconds given that it is still alive. Used to classify distributions: aging, memoryless, fat tail. Ex: normal? Exponential? Pareto? Log-normal?

49 The Weibull Distribution. Standard Weibull CDF: F(x) = 1 − exp(−x^c) for x ≥ 0. Aging for c > 1. Memoryless for c = 1. Fat tailed for c < 1.
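
The three regimes can be checked numerically from the hazard rate f(x) / (1 − F(x)). The sketch below assumes the standard Weibull CDF F(x) = 1 − exp(−x^c), for which the hazard rate simplifies to c·x^(c−1); the function names and evaluation points are illustrative:

```python
import math

def weibull_cdf(x, c):
    """Standard Weibull CDF with exponent c, for x >= 0."""
    return 1.0 - math.exp(-(x ** c))

def weibull_pdf(x, c):
    """Derivative of the standard Weibull CDF."""
    return c * x ** (c - 1) * math.exp(-(x ** c))

def hazard(x, c):
    """Hazard rate f(x) / (1 - F(x)); simplifies to c * x**(c - 1)."""
    return weibull_pdf(x, c) / (1.0 - weibull_cdf(x, c))

xs = [0.5, 1.0, 1.5, 2.0]
aging = [hazard(x, 2.0) for x in xs]       # increasing hazard rate
memoryless = [hazard(x, 1.0) for x in xs]  # constant hazard rate
fat_tail = [hazard(x, 0.5) for x in xs]    # decreasing hazard rate
print(aging)
print(memoryless)
print(fat_tail)
```

For c = 1 the distribution is the standard exponential, whose hazard rate is constant.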

50 Fitting a Distribution. Assume iid. Use maximum likelihood. Ex: assume Gaussian; what are the parameters? Frequent issues: censoring, combinations.
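
For the Gaussian example, the maximum likelihood estimates have a closed form: the sample mean and the square root of the variance computed with divisor n. A minimal Python sketch with a made-up sample (names and data are assumptions):

```python
import math
import statistics

def gaussian_mle(data):
    """MLE of (mu, sigma) for iid Gaussian data."""
    mu = statistics.fmean(data)
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(statistics.pvariance(data, mu))
    return mu, sigma

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of iid N(mu, sigma^2) observations."""
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu_hat, sigma_hat = gaussian_mle(data)
print(mu_hat, sigma_hat)
print(gaussian_log_likelihood(data, mu_hat, sigma_hat))
```

For distributions without a closed form, the same log-likelihood function would be handed to a numerical optimizer.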

51 Censored Data. We want to fit a log-normal distribution, but we have only data samples with values less than some maximum. Lognormal is fat tailed, so we cannot ignore the tail. Idea: use the model and estimate F0 and a (truncation threshold).


53 Combinations. We want to fit a log-normal distribution to the body and a Pareto distribution to the tail. Model: MLE satisfies


55 5. Heavy Tails. Recall what fat tail is. Heavier than fat:

56 Heavy Tail Means Central Limit Does Not Hold. Central limit theorem: a sum of n independent random variables with finite second moment tends to have a normal distribution when n is large. This explains why we can often use the normal assumption. But it does not always hold: it does not hold if the random variables have infinite second moment.
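
A small simulation makes the contrast visible. The sketch below draws Pareto samples with index p using Python's `random.paretovariate` and compares sample means for p = 3 (finite second moment) and p = 1 (infinite mean and second moment); the sample sizes and seed are arbitrary choices:

```python
import random
import statistics

def pareto_sample_means(p, n_points, n_samples, seed=0):
    """Means of n_samples independent Pareto(p) samples of n_points each."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.paretovariate(p) for _ in range(n_points)])
        for _ in range(n_samples)
    ]

light = pareto_sample_means(p=3.0, n_points=2000, n_samples=200)
heavy = pareto_sample_means(p=1.0, n_points=2000, n_samples=200)

# For p = 3 the sample means cluster around the true mean p / (p - 1) = 1.5.
print(statistics.median(light), max(light))
# For p = 1 the sample means are much larger and far more spread out.
print(statistics.median(heavy), max(heavy))
```

With p = 3 the sample means concentrate near 1.5, whereas with p = 1 they keep drifting upward as the sample size grows and are dominated by a few huge values.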

57 Central Limit Theorem for Heavy Tails. One sample of 10000 points, Pareto p = 1. Panels: normal qqplot; histogram; complementary d.f. in log-log scale.

58 Panels: 1 sample, 10000 points; average of 1000 samples; for p = 1, p = 1.5, p = 2, p = 2.5, p = 3.

59 Convergence for Heavy Tailed Distributions

60 Importance of Second Moment

61 RWP with Heavy Tail. Stationary?

62 Evidence of Heavy Tail

63 Testing Heavy Tail. Assume you have a very large data set; otherwise no statement can be made. One can look at the empirical cdf in log scale.
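
A rough sketch of this visual check: for a Pareto-like tail with index p, the empirical complementary cdf plotted in log-log scale is close to a straight line of slope −p. The code below estimates that slope by least squares; the estimator, the mid-point correction and the synthetic data are assumptions of this sketch, and positive data are required:

```python
import math

def ccdf_loglog_slope(data):
    """Least squares slope of log(empirical CCDF) against log(x)."""
    xs = sorted(data)
    n = len(xs)
    # Empirical CCDF at the i-th order statistic, with a mid-point
    # correction so that it never reaches 0 (log would be undefined).
    pts = [(math.log(x), math.log((n - i - 0.5) / n))
           for i, x in enumerate(xs)]
    u_bar = sum(u for u, _ in pts) / n
    v_bar = sum(v for _, v in pts) / n
    suu = sum((u - u_bar) ** 2 for u, _ in pts)
    suv = sum((u - u_bar) * (v - v_bar) for u, v in pts)
    return suv / suu

# Exact quantiles of a Pareto distribution with index p = 1.5 (synthetic).
p = 1.5
n = 1000
data = [(1 - (i + 0.5) / n) ** (-1 / p) for i in range(n)]
slope = ccdf_loglog_slope(data)
print(slope)
```

On real data only the upper tail would be fitted, and the estimate is unreliable unless the data set is very large, as the slide warns.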

64 Taqqu’s Method. A better method (numerically safer) is as follows. Aggregate the data multiple times.

65 We should have and If  ≈ log ( m 2 / m 1 ) then measure p =  /  p est = average of all p’s 64

66 Example. Figure annotations: log(2) / p; log(2).

67 Evidence of Heavy Tail. p = 1.08 ± 0.1

68 A Load Generator: Surge. Designed to create load for a web server. Used in the next lab. Sophisticated load model. It is an example of a benchmark; there are many others (see lecture).

69 User Equivalent Model. Idea: find a stochastic model that represents the user well. The user is modelled as a sequence of downloads, followed by “think time”. The tool can implement several “user equivalents”. Used to generate real work over TCP connections.

70 Characterization of UE. Weibull distributions

71 Successive file requests are not independent. Q: What would be the distribution if they were independent? A: geometric.

72 Fitting the Distributions. Done by the Surge authors with the aest tool + ad hoc (least squares fit of histogram). What other method could one use? A: maximum likelihood with numerical optimization; the issue is non-iid-ness.

