Computer Performance Modeling. Dirk Grunwald. Prelude to Jain, Chapter 12: Laws of Large Numbers and the Normal Distribution.

Weak Law of Large Numbers
- Assume we conduct a random experiment n times, and let S_n be the number of times that event A occurs.
- We intuitively expect the relative frequency S_n/n to approach P[A] as n grows.
- Let A be an event with probability P[A], and suppose we perform a Bernoulli sequence of n trials, where a success corresponds to an occurrence of event A.
- S_n has a Binomial distribution with E[S_n] = nP[A] and Var[S_n] = nP[A](1 - P[A]).

Weak Law of Large Numbers
- Thus, E[S_n/n] = (1/n) E[S_n] = P[A].
- Var[S_n/n] = (1/n²) Var[S_n] = P[A](1 - P[A]) / n.
- Now, apply Chebyshev's inequality: P[ |S_n/n - P[A]| ≥ ε ] ≤ Var[S_n/n] / ε² = P[A](1 - P[A]) / (nε²).
- We can make the right-hand side arbitrarily small by increasing n.
- This shows that P[A] can be estimated from S_n/n.
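
To make this concrete, here is a minimal simulation sketch in Python (not from the original slides; it assumes only the standard library) showing S_n/n settling toward P[A] as n grows:

```python
import random

def estimate_probability(p_true, n, seed=0):
    """Estimate P[A] by the relative frequency S_n/n of successes
    in n Bernoulli trials with true success probability p_true."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n) if rng.random() < p_true)
    return successes / n

# The estimate tightens as n grows, as the weak law predicts.
for n in (10, 100, 10_000, 1_000_000):
    print(n, estimate_probability(0.3, n))
```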

How large should n be?
- Writing p = P[A], Chebyshev's inequality gives P[ |S_n/n - p| ≥ ε ] ≤ p(1 - p) / (nε²).
- Clearly, p(1 - p) has its maximum at p = 1/2. Hence, no matter the value of p, to be certain that the bound is at most δ it suffices that n ≥ 1 / (4δε²).

How large should n be?
- If we know approximately what the value of p is, the requirement will be satisfied if n ≥ p(1 - p) / (δε²).
- If we have no idea what p is, then n ≥ 1 / (4δε²) will be a (very) sloppy bound.

Applying the Weak Law
- Assume that each terminal in an interactive system has the same probability p of being in use during the peak period of the day.
- We want to know how many observations n need to be made such that P[ |S_n/n - p| ≥ 0.1 ] ≤ 0.05.
- In other words, we want to know that we have approximated p to within 0.1 with reasonable (95%) confidence.

Applying the Weak Law
- If the first 100 observations indicate that p is about 0.2, how many more trials are needed?
- Using the sloppy bound, n ≥ 1 / (4 · 0.05 · 0.1²) = 500 observations.
- But, knowing that p ≈ 0.2, we can use n ≥ p(1 - p) / (δε²) = 0.16 / (0.05 · 0.1²) = 320 observations.
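
As a sanity check, a small Python sketch (an illustration, not part of the slides) that evaluates both Chebyshev bounds:

```python
def chebyshev_n_sloppy(eps, delta):
    """Samples needed when nothing is known about p: p(1 - p) <= 1/4."""
    return 1 / (4 * delta * eps ** 2)

def chebyshev_n(p, eps, delta):
    """Samples needed when an approximate value of p is known."""
    return p * (1 - p) / (delta * eps ** 2)

print(chebyshev_n_sloppy(0.1, 0.05))  # 500.0 observations
print(chebyshev_n(0.2, 0.1, 0.05))    # 320.0 observations
```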

The Weak Law & the Central Limit Theorem
- The weak law is a powerful tool, but it's fairly imprecise.
- We're about to look at the Central Limit Theorem, which shows us that 62 samples would be enough to estimate p.
- To understand the central limit theorem, we need to visit the normal distribution.

Normal Distribution
- A continuous random variable X is normal with parameters μ and σ > 0 if it has the density function f(x) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²)).
- We indicate this by writing X ~ N(μ, σ²).
- A standard normal is N(0, 1), whose density is φ(x) = (1 / √(2π)) e^(-x²/2).

Standard Normal Distribution
- The corresponding standard normal distribution function is Φ(x) = ∫_{-∞}^{x} φ(t) dt.
- The standard normal is important because you can calculate every normal from it: if X ~ N(μ, σ²), then P[X ≤ x] = Φ((x - μ) / σ).
- You can look up Φ in the CRC tables, statistics books, or Mathematica.
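
If no table is at hand, Φ can also be computed from the error function in the standard library; a minimal sketch (mine, not the slides'):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2), reduced to the standard normal."""
    return Phi((x - mu) / sigma)

print(Phi(2.0))                  # ~0.9772
print(normal_cdf(120, 100, 10))  # same value: 120 is two sigma above the mean
```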

Properties of Normal Distributions
- Suppose X_1, X_2, …, X_n are n independent random variables with X_i ~ N(μ_i, σ_i²). Then Y = X_1 + X_2 + … + X_n is normally distributed with mean μ_1 + μ_2 + … + μ_n and variance σ_1² + σ_2² + … + σ_n².
- The normal distribution is symmetric around the mean: f(μ + x) = f(μ - x).
- This symmetry is needed when looking up the standard normal in tables.
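
The closure of normals under addition is easy to check empirically; a quick sketch (an illustration with made-up parameters):

```python
import random
from statistics import mean, variance

rng = random.Random(1)
params = [(1.0, 2.0), (3.0, 0.5), (-2.0, 1.5)]  # hypothetical (mu_i, sigma_i)

# Draw many samples of Y = X_1 + X_2 + X_3 and check mean and variance.
samples = [sum(rng.gauss(mu, sigma) for mu, sigma in params)
           for _ in range(200_000)]
print(mean(samples))      # ~ 1 + 3 - 2 = 2
print(variance(samples))  # ~ 4 + 0.25 + 2.25 = 6.5
```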

The Standard Normal
[Figure: the standard normal density, with markers at μ ± σ and μ ± 2σ; the total area under the curve is 1.]

Sample Normal Table
[Table of standard normal values omitted in the transcript.]

Example: Normal Distribution
- Suppose the number of buffers in use in a message system is X ~ N(100, 100), so σ = 10.
- Calculate the probability that the number of buffers in use does not exceed 120. Well, 120 is two standard deviations above the mean; looking up Φ(2) in the N(0, 1) table gives 0.9772.
- What's the probability it's between 80 and 120 buffers? This is Φ(2) - Φ(-2) = 0.9772 - (1 - 0.9772) = 0.9545.
- What's P[X ≥ 130]? This is three standard deviations out; the tables show 1 - Φ(3) = 1 - 0.99865 = 0.00135.
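
The same three answers, computed rather than looked up (a sketch using the erf-based Φ from above, repeated here so the snippet stands alone):

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

mu, sigma = 100, 10  # X ~ N(100, 100), so sigma = 10

print(Phi((120 - mu) / sigma))                           # P[X <= 120]       ~0.9772
print(Phi((120 - mu) / sigma) - Phi((80 - mu) / sigma))  # P[80 <= X <= 120] ~0.9545
print(1 - Phi((130 - mu) / sigma))                       # P[X >= 130]       ~0.00135
```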

Central Limit Theorem
- Suppose X_1, X_2, …, X_n are n independent, identically distributed random variables with mean μ and variance σ². Let S_n = X_1 + X_2 + … + X_n. Then, for each x < y, P[ x ≤ (S_n - nμ)/(σ√n) ≤ y ] → Φ(y) - Φ(x) as n → ∞.
- In other words, regardless of the underlying distribution, S_n ~ N(nμ, nσ²) for sufficiently large n.

But wait, it gets better!
- The terms of S_n don't even have to have the same distribution, within some reasonable constraints.
- This is the basis for the observation that sums of many random variables (height, IQ, grades in classes) tend to be normally distributed.
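
A quick demonstration (an illustration, not from the slides): standardized sums of uniform random variables already look very normal at n = 50:

```python
import random
from math import erf, sqrt

rng = random.Random(2)
n, trials = 50, 100_000
mu, var = 0.5, 1.0 / 12.0  # mean and variance of Uniform(0, 1)

# Standardize S_n and compare the empirical CDF at 1.0 against Phi(1).
hits = 0
for _ in range(trials):
    s = sum(rng.random() for _ in range(n))
    z = (s - n * mu) / sqrt(n * var)
    if z <= 1.0:
        hits += 1

print(hits / trials)                 # empirical P[Z <= 1], close to...
print(0.5 * (1 + erf(1 / sqrt(2))))  # Phi(1) ~ 0.8413
```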

Limiting Samples Using the Central Limit Theorem
- Recall that a binomial distribution has mean np and variance npq, where q = 1 - p. Now, consider an experiment where we determine whether a sample obeys a property, and we want to determine P[A] from S_n, much as in the Weak Law of Large Numbers.
- We saw that Chebyshev's inequality allowed us to approximate the n needed to estimate p by S_n/n with some precision.

We can transform this into something handled by the Central Limit Theorem: P[ |S_n/n - p| ≤ ε ] = P[ -ε√(n/(pq)) ≤ (S_n - np)/√(npq) ≤ ε√(n/(pq)) ].

Applying the Central Limit Theorem
- So, we can conclude that P[ |S_n/n - p| ≤ ε ] ≈ 2Φ(r) - 1,
- where r = ε√(n/(pq)).

Applying the Central Limit Theorem
- Now we look up the value of r that makes 2Φ(r) - 1 = 1 - α hold. The definition of r then yields the following estimate: n = r²pq/ε² ≤ r²/(4ε²).
- The right-hand side holds because pq is maximal at p = 1/2.
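
In code (a sketch; statistics.NormalDist provides the standard-library inverse of Φ):

```python
from statistics import NormalDist

def clt_sample_size(eps, alpha, p=0.5):
    """Samples n so that P[|S_n/n - p| <= eps] >= 1 - alpha under the
    normal approximation; the default p = 0.5 is the worst case."""
    r = NormalDist().inv_cdf(1 - alpha / 2)  # r = 1.96 for alpha = 0.05
    return r * r * p * (1 - p) / eps ** 2

print(clt_sample_size(0.1, 0.05))         # ~96 samples (worst case)
print(clt_sample_size(0.1, 0.05, p=0.2))  # ~61.5, i.e. 62 samples
```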

Example: Estimating Needed Samples
- Same problem as earlier: sample the terminals n times, compute S_n, and then take S_n/n as the approximation of p.
- For α = 0.05, we find that r = 1.96 (using the normal tables). Using the previous equation with the worst case pq = 1/4 gives n ≈ 96; with the approximate value p = 0.2, n ≈ 62 suffices.
- However, this only approximates p to within ±0.1, i.e. the range 0.1 to 0.3. We can do better.

Example: Estimating Needed Samples
- "If we make 500 observations to estimate p and let α = 0.05, what is the value of ε?"
- In other words, what is the maximum error in the estimate at the 5% level of uncertainty?
- As before, α = 0.05 implies r = 1.96, so ε = r√(pq/n).
- This has values in the range 0.035 (for p ≈ 0.2) to 0.044 (worst case, p = 1/2), depending on p.
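
Solving the same relation for ε instead of n (a sketch mirroring the function above):

```python
from math import sqrt
from statistics import NormalDist

def clt_precision(n, alpha, p=0.5):
    """Half-width eps with P[|S_n/n - p| <= eps] >= 1 - alpha."""
    r = NormalDist().inv_cdf(1 - alpha / 2)
    return r * sqrt(p * (1 - p) / n)

print(clt_precision(500, 0.05))         # ~0.044 (worst case p = 1/2)
print(clt_precision(500, 0.05, p=0.2))  # ~0.035
```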

So who cares?
- The central limit theorem and the normal distribution will be combined to produce a powerful tool: the confidence interval (see Jain, chapter 13).
- This tool will be used to determine when:
  - observations are "statistically different";
  - we know we've made enough observations in an experiment;
  - we can placate thesis committees and bosses.

Quantiles
[Figure: a density curve with a shaded area of 0.20 under the graph, illustrating a quantile.]

Measures of Central Tendencies
- Mean: the expectation of the distribution.
- Median: the 50th percentile (Q50); half the samples have a value less than this.
- Mode: the most common value.
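
All three are one-liners in Python's statistics module; a sketch with hypothetical data:

```python
from statistics import mean, median, mode

samples = [1, 2, 2, 3, 4, 5, 19]  # hypothetical response times

print(mean(samples))    # ~5.14: pulled up by the outlier 19
print(median(samples))  # 3: half the samples fall below this
print(mode(samples))    # 2: the most common value
```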

Selecting the Right Central Metric
[Figure omitted in the transcript: a decision chart for choosing among mean, median, and mode.]

Detecting Skew
[Figure omitted in the transcript.]

Harmonic Mean
- Useful for analyzing benchmarks.
- "Suppose you drive your car one mile at 20 miles per hour and a second mile at 60 miles per hour. What is your average speed for the two miles?"
  - Not 40 MPH: we didn't drive for equal times!
  - It took 3 minutes to drive the first mile, 1 minute to drive the second mile, and thus 4 minutes to drive the two miles.
- Therefore, the average speed was 30 MPH.

Harmonic Mean
- If we define the harmonic mean of x_1, …, x_n to be n / (1/x_1 + 1/x_2 + … + 1/x_n),
- then, for our example, we get 2 / (1/20 + 1/60) = 2 / (4/60) = 30 MPH.
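
The standard library computes this directly; a sketch of the driving example:

```python
from statistics import harmonic_mean

speeds = [20, 60]  # miles per hour, one mile driven at each speed
print(harmonic_mean(speeds))                     # 30.0 MPH
print(len(speeds) / sum(1 / s for s in speeds))  # same value, from the definition
```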

Harmonic Mean
- Programs A and B each require 1,000,000 instructions on computer X.
- Program A makes better use of caches and executes at 2,000,000 inst/sec (0.5 sec).
- Program B runs at only 500,000 inst/sec (2 sec).
- What is the average instruction execution rate of programs A and B? Total work is 2,000,000 instructions in 2.5 seconds, i.e. 800,000 inst/sec: the harmonic mean of the two rates.

Harmonic Mean
- The harmonic mean should be used for summarizing performance expressed as a rate. It corresponds accurately with the computation time that will actually be consumed by running real programs: the harmonic mean of rates is equivalent to the total number of operations divided by the total time.
- But this only holds if the programs run for the same number of instructions, just as the cars must drive the same distance at their different speeds.
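
A sketch verifying that claim on the two-program example (total operations over total time equals the harmonic mean of the rates):

```python
from statistics import harmonic_mean

rates = [2_000_000, 500_000]  # inst/sec for programs A and B
print(harmonic_mean(rates))   # 800000.0 inst/sec

# Same instruction count per program, so this must equal
# total instructions / total time.
instructions = 1_000_000
total_time = sum(instructions / r for r in rates)  # 0.5 s + 2.0 s = 2.5 s
print(2 * instructions / total_time)               # 800000.0 inst/sec
```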

Generalized (Weighted) Harmonic Mean
- Suppose a sequence of programs of path lengths l_1, l_2, …, l_n instructions run at the rates s_1, s_2, …, s_n.
- Then the generalized harmonic mean is (l_1 + l_2 + … + l_n) / (l_1/s_1 + l_2/s_2 + … + l_n/s_n).

Generalized Harmonic Mean
- A company wants to measure the MIPS rating of one of its computers, based on the performance of three programs.
- [Table omitted in the transcript: columns Runs/Day (f_i), Instructions/Run (l_i), and MIPS/Run (s_i).]
- Use the generalized harmonic mean with f_i · l_i instructions executed at rate s_i.
- The average MIPS rating is 3.974.
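
A sketch of the calculation; the workload numbers below are hypothetical, since the slide's table values did not survive the transcript (so the result differs from the 3.974 above):

```python
def generalized_harmonic_mean(lengths, rates):
    """Weighted harmonic mean: total instructions / total time."""
    total_instructions = sum(lengths)
    total_time = sum(l / s for l, s in zip(lengths, rates))
    return total_instructions / total_time

# Hypothetical workload: per-day instruction counts f_i * l_i, in millions,
# and the MIPS rate s_i for each program.
lengths = [10, 5, 20]    # millions of instructions
rates = [4.0, 2.0, 5.0]  # MIPS
print(generalized_harmonic_mean(lengths, rates))  # ~3.89 MIPS
```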

Geometric Mean
- The arithmetic mean is used if the sum of a quantity is of interest; the geometric mean is used if the product is of interest.
- For example:
  - the first-level cache has a 10% miss rate;
  - the second-level cache has a 5% miss rate;
  - you'd use the geometric mean, √(0.10 · 0.05) ≈ 7.1%, to compute the "average" miss rate.
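
In code (a sketch of the cache example):

```python
from statistics import geometric_mean

miss_rates = [0.10, 0.05]  # first- and second-level cache miss rates
print(geometric_mean(miss_rates))  # ~0.0707: the "average" miss rate
```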

Example of Geometric Mean
- Given improvements in different protocol layers [table omitted in the transcript]:
- Improvements in earlier layers influence the performance of later layers, so the individual improvement factors multiply.
- The geometric mean of the improvements is 13%.
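
A sketch of how such an average is computed; the per-layer factors below are hypothetical stand-ins for the slide's missing table:

```python
from statistics import geometric_mean

# Hypothetical improvement factors, one per protocol layer
# (1.20 means a 20% improvement at that layer).
factors = [1.20, 1.05, 1.15, 1.12]

avg_factor = geometric_mean(factors)
print(f"average improvement per layer: {avg_factor - 1:.1%}")  # ~12.9%
```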