Known Probability Distributions


Known Probability Distributions
Engineers frequently work with data that can be modeled as one of several known probability distributions. Being able to model the data allows us to:
- model real systems
- design
- predict results
Key discrete probability distributions include:
- binomial / multinomial
- negative binomial
- hypergeometric
- Poisson
EGR 252 - 6

Discrete Uniform Distribution
Simplest of all discrete distributions. All possible values of the random variable have the same probability, i.e.,
$f(x; k) = 1/k, \quad x = x_1, x_2, x_3, \ldots, x_k$
Expectations of the discrete uniform distribution:
$\mu = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad \sigma^2 = \frac{1}{k}\sum_{i=1}^{k} (x_i - \mu)^2$
(Exercise: draw the uniform distribution.)

Binomial & Multinomial Distributions
Bernoulli trials:
- Inspect tires coming off the production line. Classify each as defective or not defective. Define “success” as defective. If historical data shows that 95% of all tires are defect-free, then P(“success”) = 0.05.
- Signals picked up at a communications site are either incoming speech signals or “noise.” Define “success” as the presence of speech. P(“success”) = P(“speech”)
- Administer a test drug to a group of patients with a specific condition. P(“success”) = ___________
Bernoulli process:
- n repeated trials
- the outcome of each trial may be classified as “success” or “failure”
- the probability of success (p) is constant from trial to trial
- repeated trials are independent

Binomial Distribution
Example: Historical data indicates that 10% of all bits transmitted through a digital transmission channel are received in error. Let X = the number of bits in error in the next 4 bits transmitted. Assume that the transmission trials are independent. What is the probability that
- exactly 2 of the bits are in error?
- at most 2 of the 4 bits are in error?
- more than 2 of the 4 bits are in error?
The number of successes, X, in n Bernoulli trials is called a binomial random variable. Remember, YOU define what a “success” is, e.g., votes, defects, errors.

Binomial Distribution
The probability distribution of a binomial random variable is called the binomial distribution:
$b(x; n, p) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
where p = probability of success (here, an error in transmission) and q = probability of failure = 1 - p.
For our example,
$b(x; 4, 0.1) = \binom{4}{x} (0.1)^x (0.9)^{4-x}, \quad x = 0, 1, 2, 3, 4$

For Our Example …
$b(x; 4, 0.1) = \binom{4}{x} (0.1)^x (0.9)^{4-x}, \quad x = 0, 1, 2, 3, 4$
What is the probability that exactly 2 of the bits are in error?
$P(X = 2) = \binom{4}{2} (0.1)^2 (0.9)^2 = 0.0486$
At most 2 of the 4 bits are in error?
$P(X \le 2) = P(0) + P(1) + P(2) = \binom{4}{0}(0.1)^0(0.9)^4 + \binom{4}{1}(0.1)^1(0.9)^3 + \binom{4}{2}(0.1)^2(0.9)^2 = 0.9963$

Your turn …
What is the probability that more than 2 of the 4 bits are in error?
$P(X > 2) = P(3) + P(4) = \binom{4}{3}(0.1)^3(0.9)^1 + \binom{4}{4}(0.1)^4(0.9)^0 = 0.0037$
Equivalently, $P(X > 2) = 1 - P(X \le 2) = 1 - 0.9963 = 0.0037$.
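
The three bit-error probabilities can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binom_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.1  # 4 transmitted bits, each in error with probability 0.1

p_exactly_2 = binom_pmf(2, n, p)
p_at_most_2 = sum(binom_pmf(x, n, p) for x in range(3))   # x = 0, 1, 2
p_more_than_2 = sum(binom_pmf(x, n, p) for x in (3, 4))

print(f"P(X = 2)  = {p_exactly_2:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 2)  = {p_more_than_2:.4f}")
```

Summing the probability mass function over the values of interest replaces the lookup in Table A.1.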

Expectations of the Binomial Distribution
The mean and variance of the binomial distribution are given by
$\mu = np, \qquad \sigma^2 = npq$
Suppose, in our example, we check the next 20 bits. What is the expected number of bits in error? What is the standard deviation?
$\mu = np = 20(0.1) = 2$
$\sigma^2 = npq = 20(0.1)(0.9) = 1.8, \quad \sigma = \sqrt{1.8} = 1.34$

Another example
A worn machine tool produces 1% defective parts. If we assume that parts produced are independent, what is the mean number of defective parts that would be expected if we inspect 25 parts? What is the expected variance of the 25 parts?
$\mu = np = 25(0.01) = 0.25$
$\sigma^2 = npq = 25(0.01)(0.99) = 0.2475$
NOTE: 0.2475 ≠ 0.25; the variance is close to the mean but not equal to it.

Helpful Hints …
Sometimes it helps to draw a picture. Suppose we inspect the next 5 parts and mark each event on the number line x = 0, 1, …, 5:
- P(at least 3) = P(X ≥ 3)
- P(2 ≤ X ≤ 4)
- P(less than 4) = P(X < 4) = P(X ≤ 3)
Appendix Table A.1 (pp. 742-747) lists Binomial Probability Sums, $\sum_{x=0}^{r} b(x; n, p)$

Your turn …
Use Table A.1 to determine
1. b(x; 15, 0.4), P(X ≤ 8) = 0.9050
2. b(x; 15, 0.4), P(X < 8) = P(X ≤ 7) = 0.7869
3. b(x; 12, 0.2), P(2 ≤ X ≤ 5) = P(X ≤ 5) - P(X ≤ 1) = 0.9806 - 0.2749 = 0.7057
4. b(x; 4, 0.1), P(X > 2) = 1 - P(X ≤ 2) = 1 - 0.9963 = 0.0037

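Multinomial Experiments
What if there are more than 2 possible outcomes? (e.g., acceptable, scrap, rework)
That is, suppose we have:
- n independent trials
- k outcomes that are mutually exclusive (e.g., ♠, ♣, ♥, ♦) and exhaustive (i.e., $\sum_{i=1}^{k} p_i = 1$)
Then
$f(x_1, x_2, \ldots, x_k; p_1, p_2, \ldots, p_k, n) = \binom{n}{x_1, x_2, \ldots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$
where $\sum_{i=1}^{k} x_i = n$ and $\binom{n}{x_1, \ldots, x_k} = \frac{n!}{x_1!\, x_2! \cdots x_k!}$.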

Example
Look at problem 5.22, pg. 152.
x1 = 5, x2 = 2, x3 = 1; p1 = 0.5, p2 = 0.25, p3 = 0.25; n = 8
$f(5, 2, 1; 0.5, 0.25, 0.25, 8) = \binom{8}{5, 2, 1}(0.5)^5(0.25)^2(0.25)^1 = \frac{8!}{5!\,2!\,1!}(0.5)^5(0.25)^2(0.25)^1 = \frac{21}{256} \approx 0.082031$
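
The multinomial calculation can be sketched in a few lines of Python. The helper name `multinomial_pmf` is an assumption made for illustration; it evaluates n!/(x1! x2! … xk!) times the product of the p_i^x_i.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing counts[i] outcomes of type i, with n = sum(counts)."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # exact integer division at every step
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Values from the slide: x = (5, 2, 1), p = (0.5, 0.25, 0.25), n = 8
result = multinomial_pmf([5, 2, 1], [0.5, 0.25, 0.25])
print(f"f(5, 2, 1; 0.5, 0.25, 0.25, 8) = {result:.6f}")
```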

Hypergeometric Distribution
Example*: Automobiles arrive in a dealership in lots of 10. Five out of each 10 are inspected. For one lot, it is known that 2 out of 10 do not meet prescribed safety standards. What is the probability that at least 1 out of the 5 tested from that lot will be found not meeting safety standards?
*from Complete Business Statistics, 4th ed. (McGraw-Hill)
Note the difference between binomial (assumes sampling “with replacement”) and hypergeometric (sampling “without replacement”). Also, binomial assumes independent trials, while hypergeometric does not.

This example follows a hypergeometric distribution:
- A random sample of size n is selected without replacement from N items.
- k of the N items may be classified as “successes” and N - k as “failures.”
The probability of getting x successes in the sample (given k successes in the lot) is
$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$
where
k = number of “successes” in the lot = 2
n = number in sample = 5
N = the lot size = 10
x = number found = 1 or 2

Hypergeometric Distribution
In our example,
$P(X \ge 1) = h(1; 10, 5, 2) + h(2; 10, 5, 2) = \frac{\binom{2}{1}\binom{8}{4}}{\binom{10}{5}} + \frac{\binom{2}{2}\binom{8}{3}}{\binom{10}{5}} = 0.556 + 0.222 = 0.778$

Expectations of the Hypergeometric Distribution
The mean and variance of the hypergeometric distribution are given by
$\mu = \frac{nk}{N}, \qquad \sigma^2 = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)$
What is the expected number of cars that fail inspection in our example? What is the standard deviation?
$\mu = nk/N = 5 \cdot 2/10 = 1$
$\sigma^2 = (5/9)(5 \cdot 2/10)(1 - 2/10) = 0.444, \quad \sigma = 0.667$

Your turn …
A worn machine tool produced defective parts for a period of time before the problem was discovered. Normal sampling of each lot of 20 parts involves testing 6 parts and rejecting the lot if 2 or more are defective. If a lot from the worn tool contains 3 defective parts:
- What is the expected number of defective parts in a sample of six from the lot?
- What is the expected variance?
- What is the probability that the lot will be rejected?
N = 20, n = 6, k = 3
$\mu = nk/N = 6 \cdot 3/20 = 18/20 = 0.9$
$\sigma^2 = (14/19)(6 \cdot 3/20)(1 - 3/20) = 0.5637$
$P(X \ge 2) = 1 - [P(0) + P(1)] = 1 - 0.7982 = 0.2018$
or, equivalently,
$P(X \ge 2) = P(2) + P(3) = \frac{\binom{3}{2}\binom{17}{4}}{\binom{20}{6}} + \frac{\binom{3}{3}\binom{17}{3}}{\binom{20}{6}} = 0.1842 + 0.0175 \approx 0.2018$
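
Both hypergeometric examples (the car lot and the worn machine tool) can be reproduced with a short sketch. The names `hypergeom_pmf` and `hypergeom_mean_var` are hypothetical helpers written for this illustration, following the h(x; N, n, k) notation used in the slides.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, n: int, k: int) -> float:
    """h(x; N, n, k): x successes in a sample of n drawn without replacement
    from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def hypergeom_mean_var(N: int, n: int, k: int):
    """Mean and variance of the hypergeometric distribution."""
    mean = n * k / N
    var = (N - n) / (N - 1) * n * (k / N) * (1 - k / N)
    return mean, var

# Car lot: N = 10, n = 5, k = 2, at least one substandard car in the sample
p_cars = sum(hypergeom_pmf(x, 10, 5, 2) for x in (1, 2))

# Worn tool: N = 20, n = 6, k = 3, lot rejected if 2 or more defectives
p_reject = sum(hypergeom_pmf(x, 20, 6, 3) for x in (2, 3))
mean_tool, var_tool = hypergeom_mean_var(20, 6, 3)

print(f"P(X >= 1), cars = {p_cars:.3f}")
print(f"P(X >= 2), tool = {p_reject:.4f}")
print(f"tool: mean = {mean_tool:.2f}, variance = {var_tool:.4f}")
```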

Binomial Approximation
Note: if N >> n, then we can approximate the hypergeometric distribution with the binomial distribution.
For example: Automobiles arrive in a dealership in lots of 100. 5 out of each 100 are inspected. 2 out of every 10 (p = 0.2, i.e., k = 20 per lot) are below safety standards. What is the probability that at least 1 out of 5 will be found not meeting safety standards?
Recall: P(X ≥ 1) = 1 - P(X < 1) = 1 - P(X = 0)
Hypergeometric distribution:
$h(0; 100, 5, 20) = \frac{\binom{20}{0}\binom{80}{5}}{\binom{100}{5}} = 0.3193$, so $1 - P(0) = 1 - 0.3193 = 0.6807$
Binomial distribution (from Table A.1, n = 5, p = 0.2):
$b(0; 5, 0.2) = 0.3277$, so $1 - P(0) = 1 - 0.3277 = 0.6723$
NOTE: If N = 200, then the hypergeometric distribution yields P(X ≥ 1) = 0.676. Comparing to example 5.14, we can see that the binomial approximation gets very close as N gets very large relative to n. (Compare to example 5.15, pg. 155)
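
To see how the hypergeometric probability approaches the binomial value as the lot grows, the comparison can be tabulated for several lot sizes. This is a sketch under the slide's assumptions (sample of 5, 20% of each lot substandard); the function names and the extra lot size of 1000 are illustrative additions.

```python
from math import comb

def p_at_least_one_hypergeom(N: int, n: int, k: int) -> float:
    """1 - h(0; N, n, k): at least one success when sampling without replacement."""
    return 1 - comb(N - k, n) / comb(N, n)

def p_at_least_one_binom(n: int, p: float) -> float:
    """1 - b(0; n, p): at least one success in n independent trials."""
    return 1 - (1 - p) ** n

n = 5
binom = p_at_least_one_binom(n, 0.2)

# k = N // 5 keeps 20% of every lot substandard
hyper = {N: p_at_least_one_hypergeom(N, n, N // 5) for N in (10, 100, 200, 1000)}

for N, prob in hyper.items():
    print(f"N = {N:4d}: hypergeometric = {prob:.4f}, binomial = {binom:.4f}")
```

The gap between the two columns shrinks as N increases relative to n, which is the behavior the slide describes.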

Negative Binomial Distribution
Example: Historical data indicates that 30% of all bits transmitted through a digital transmission channel are received in error. An engineer is running an experiment to try to classify these errors, and will start by gathering data on the first 10 errors encountered. What is the probability that the 10th error will occur on the 25th trial?

This example follows a negative binomial distribution:
- Repeated independent trials.
- Probability of success = p and probability of failure = q = 1 - p.
- Random variable, X, is the number of the trial on which the kth success occurs.
The probability associated with the kth success occurring on trial x is given by
$b^*(x; k, p) = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k, k+1, k+2, \ldots$
where
k = “success number” = 10
x = trial number on which the kth success occurs = 25
p = probability of success (error) = 0.3
q = 1 - p = 0.7

Negative Binomial Distribution
In our example,
$b^*(25; 10, 0.3) = \binom{24}{9}(0.3)^{10}(0.7)^{15} = 0.037$

Geometric Distribution
Example: In our example, what is the probability that the 1st bit received in error will occur on the 5th trial?
This is an example of the geometric distribution, which is a special case of the negative binomial in which k = 1. The probability associated with the 1st success occurring on trial x is given by
$g(x; p) = p q^{x-1}, \quad x = 1, 2, 3, \ldots$
$g(5; 0.3) = (0.3)(0.7)^4 = 0.072$

Your turn …
A worn machine tool produces 1% defective parts. If we assume that parts produced are independent:
- What is the probability that the 2nd defective part will be the 6th one produced?
$b^*(6; 2, 0.01) = \binom{5}{1}(0.01)^2(0.99)^4 = 0.00048$
- What is the probability that the 1st defective part will be seen before 3 are produced?
$P(X < 3) = P(1) + P(2) = (0.01)(0.99)^{1-1} + (0.01)(0.99)^{2-1} = 0.0199$
- How many parts can we expect to produce before we see the 1st defective part? (Hint: see Theorem 5.4, pg. 161)
$\mu = 1/p = 1/0.01 = 100$
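
The negative binomial and geometric examples share one formula, so a single helper covers both. The sketch below uses hypothetical function names (`neg_binom_pmf`, `geometric_pmf`) and treats the geometric distribution as the k = 1 special case, as the slides do.

```python
from math import comb

def neg_binom_pmf(x: int, k: int, p: float) -> float:
    """b*(x; k, p): probability that the k-th success occurs on trial x."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

def geometric_pmf(x: int, p: float) -> float:
    """g(x; p): probability that the first success occurs on trial x (k = 1)."""
    return neg_binom_pmf(x, 1, p)

# Bit-error channel, p = 0.3
p_10th_error_on_25th = neg_binom_pmf(25, 10, 0.3)
p_first_error_on_5th = geometric_pmf(5, 0.3)

# Worn machine tool, p = 0.01
p_2nd_defect_is_6th = neg_binom_pmf(6, 2, 0.01)
p_first_defect_before_3 = geometric_pmf(1, 0.01) + geometric_pmf(2, 0.01)
expected_parts = 1 / 0.01  # mean of the geometric distribution, 1/p

print(f"b*(25; 10, 0.3) = {p_10th_error_on_25th:.4f}")
print(f"g(5; 0.3)       = {p_first_error_on_5th:.4f}")
print(f"b*(6; 2, 0.01)  = {p_2nd_defect_is_6th:.5f}")
print(f"P(X < 3)        = {p_first_defect_before_3:.4f}")
```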

Poisson Process
The number of occurrences in a given interval or region with the following properties:
- “memoryless”: the number of occurrences in one interval is independent of the number in a different interval
- P(occurrence) during a very short interval or small region is proportional to the size of the interval and doesn’t depend on the number occurring outside the region or interval
- P(X > 1) in a very short interval is negligible

Poisson Process
Examples:
- Number of bits transmitted per minute.
- Number of calls to customer service in an hour.
- Number of bacteria in a given sample.
- Number of hurricanes per year in a given region.

Poisson Process Example
An average of 2.7 service calls per minute are received at a particular maintenance center. The calls correspond to a Poisson process. To determine personnel and equipment needs to maintain a desired level of service, the plant manager needs to be able to determine the probabilities associated with numbers of service calls.
What is the probability that fewer than 2 calls will be received in any given minute?

Poisson Distribution
The probability associated with the number of occurrences in a given period of time is given by
$p(x; \lambda t) = \frac{e^{-\lambda t}(\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots$
where
λ = average number of outcomes per unit time or region = 2.7
t = time interval or region = 1 minute

Our Example
The probability that fewer than 2 calls will be received in any given minute is
$P(X < 2) = P(X = 0) + P(X = 1) = \frac{e^{-2.7}(2.7)^0}{0!} + \frac{e^{-2.7}(2.7)^1}{1!} = 0.2487$
The mean and variance are both λt, so μ = 2.7(1) = 2.7.
Note: Table A.2, pp. 748-750, gives $\sum_{x=0}^{r} p(x; \mu)$

Poisson Distribution
If more than 6 calls are received in a 3-minute period, an extra service technician will be needed to maintain the desired level of service. What is the probability of that happening?
μ = λt = 2.7(3) = 8.1 ≈ 8
P(X > 6) = 1 - P(X ≤ 6)
See page 668: with μ = 8 and r = 6, P(X ≤ 6) = 0.3134, so P(X > 6) = 1 - 0.3134 = 0.6866.
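
The Poisson sums read from Table A.2 can also be computed directly, which avoids rounding μ = 8.1 down to 8. The sketch below is illustrative; `poisson_pmf` and `poisson_cdf` are assumed helper names, and the "table" variable simply mirrors the rounded mean used on the slide.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """p(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def poisson_cdf(r: int, mu: float) -> float:
    """Cumulative sum of p(x; mu) for x = 0..r (the quantity tabulated in Table A.2)."""
    return sum(poisson_pmf(x, mu) for x in range(r + 1))

rate = 2.7  # average calls per minute

p_fewer_than_2 = poisson_cdf(1, rate * 1)        # one-minute interval
p_more_than_6_table = 1 - poisson_cdf(6, 8)      # mean rounded to 8, as in the table lookup
p_more_than_6_exact = 1 - poisson_cdf(6, rate * 3)  # mean 8.1, no rounding

print(f"P(X < 2), t = 1 min     = {p_fewer_than_2:.4f}")
print(f"P(X > 6), mu = 8 (table) = {p_more_than_6_table:.4f}")
print(f"P(X > 6), mu = 8.1       = {p_more_than_6_exact:.4f}")
```

Because the tail probability grows with the mean, the unrounded value is somewhat larger than the table-based one.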

Poisson Distribution
[Figure: the effect of λ on the Poisson distribution]

Continuous Probability Distributions
Many continuous probability distributions, including:
- Uniform
- Normal
- Gamma
- Exponential
- Chi-Squared
- Lognormal
- Weibull

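Uniform Distribution
Simplest; characterized by the interval endpoints, A and B.
$f(x; A, B) = \frac{1}{B - A}, \quad A \le x \le B$ (= 0 elsewhere)
Mean and variance:
$\mu = \frac{A + B}{2}$ and $\sigma^2 = \frac{(B - A)^2}{12}$
(Exercise: draw the distribution.)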

Example
A circuit board failure causes a shutdown of a computing system until a new board is delivered. The delivery time X is uniformly distributed between 1 and 5 days. What is the probability that it will take 2 or more days for the circuit board to be delivered?
interval = [1, 5]
$f(x) = 1/(B - A) = 1/(5 - 1) = 1/4, \quad 1 \le x \le 5$ (0 elsewhere)
First, sketch the distribution to get the “intuitive” answer: the rectangle from 2 to 5 has width 3 and height 1/4. Then
$P(X \ge 2) = \int_2^5 \frac{1}{4}\,dx = 0.75$

Normal Distribution
- The “bell-shaped curve”; also called the Gaussian distribution.
- The most widely used distribution in statistical analysis:
  - forms the basis for most of the parametric tests we’ll perform later in this course
  - describes or approximates many phenomena in nature, industry, or research
- Random variables (X) following this distribution are called normal random variables.
- The parameters of the normal distribution are μ and σ (sometimes μ and σ²).
Note: nonparametric tests are distribution-free and assume no underlying distribution (see ch. 16).

Normal Distribution
The density function of the normal random variable X, with mean μ and variance σ², is
$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
[Figure: normal curve with μ = 5, σ = 1.5]
Properties of the curve:
- the peak is both the mean and the mode and occurs at x = μ
- the curve is symmetrical about a vertical axis through the mean
- total area under the curve and above the horizontal axis = 1
- points of inflection are at x = μ ± σ

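Standard Normal RV …
Note: the probability of X taking on any value between x1 and x2 is given by
$P(x_1 < X < x_2) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{x_1}^{x_2} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$
To ease calculations, we define a normal random variable
$Z = \frac{X - \mu}{\sigma}$
where Z is normally distributed with μ = 0 and σ² = 1.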

Standard Normal Distribution
Table A.3: “Areas Under the Normal Curve”

Examples (draw the area on the picture)
1. P(Z ≤ 1) = 0.8413
2. P(Z ≥ -1) = 1 - P(Z < -1) = 1 - 0.1587 = 0.8413
3. P(-0.45 ≤ Z ≤ 0.36) = P(Z < 0.36) - P(Z < -0.45) = 0.6406 - 0.3264 = 0.3142

Your turn …
Use Table A.3 to determine (draw the picture!)
1. P(Z ≤ 0.8) = 0.7881
2. P(Z ≥ 1.96) = 1 - 0.975 = 0.025 (= P(Z < -1.96)). Note symmetry!
3. P(-0.25 ≤ Z ≤ 0.15) = 0.5596 - 0.4013 = 0.1583
4. P(Z ≤ -2.0 or Z ≥ 2.0) = 2 × 0.0228 = 0.0456

The Normal Distribution “In Reverse”
Example: Given a normal distribution with μ = 40 and σ = 6, find the value of X for which 45% of the area under the normal curve is to the left of X.
Will X be greater than 40? (No: less than half the area lies to its left, so X must be below the mean.)
If P(Z < k) = 0.45, then k = -0.125 (interpolating in Table A.3).
Z = -0.125 = (X - 40)/6
X = (-0.125)(6) + 40 = 39.25
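
The table lookups for the standard normal distribution, including the "in reverse" problem, can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). This is a sketch; the variable names are illustrative, and small differences from the slide values come from table rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

p_z_le_1 = Z.cdf(1)                       # P(Z <= 1)
p_z_ge_minus_1 = 1 - Z.cdf(-1)            # P(Z >= -1)
p_between = Z.cdf(0.36) - Z.cdf(-0.45)    # P(-0.45 <= Z <= 0.36)
p_two_tails = 2 * Z.cdf(-2.0)             # P(Z <= -2 or Z >= 2), by symmetry

# "In reverse": the x with 45% of the area to its left, for mu = 40, sigma = 6
x_45 = NormalDist(mu=40, sigma=6).inv_cdf(0.45)

print(f"P(Z <= 1)             = {p_z_le_1:.4f}")
print(f"P(Z >= -1)            = {p_z_ge_minus_1:.4f}")
print(f"P(-0.45 <= Z <= 0.36) = {p_between:.4f}")
print(f"P(|Z| >= 2)           = {p_two_tails:.4f}")
print(f"x with 45% to the left = {x_45:.2f}")
```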

Normal Approximation to the Binomial
If n is large and p is not close to 0 or 1, or if n is smaller but p is close to 0.5, then the binomial distribution can be approximated by the normal distribution using the transformation
$Z = \frac{X - np}{\sqrt{npq}}$
NOTE: add or subtract 0.5 from X to be sure the value of interest is included (draw a picture to know which).
Look at example 6.15, pg. 191.

Look at example 6.15, pg. 191 (draw the picture!)
p = 0.4, n = 100
$\mu = np = 100(0.4) = 40$
$\sigma = \sqrt{npq} = \sqrt{100(0.4)(0.6)} = 4.899$
If x = 30, then $z = \frac{(30 - 0.5) - 40}{4.899} = -2.14$ and
$P(X < 30) \approx P(Z < -2.14) = 0.0162$

Your Turn
Refer to the previous example. DRAW THE PICTURE!!
1. What is the probability that more than 50 survive?
x > 50: $z = \frac{(50 + 0.5) - 40}{4.899} = 2.14$, so $P(Z > 2.14) = P(Z < -2.14) = 0.0162$ (by symmetry)
2. What is the probability that exactly 45 survive?
x = 45: $z_1 = (45.5 - 40)/4.899 = 1.12$ and $z_2 = (44.5 - 40)/4.899 = 0.92$, so
$P(X = 45) \approx P(Z < 1.12) - P(Z < 0.92) = 0.8686 - 0.8212 = 0.0474$
NOTE: the exact value is $b(45; 100, 0.4) = \binom{100}{45}(0.4)^{45}(0.6)^{55} = 0.0478$
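
The quality of the normal approximation can be judged by computing it next to the exact binomial value. The sketch below assumes the setting of the example (n = 100, p = 0.4) and applies the 0.5 continuity correction; the variable names are illustrative, and results differ slightly from the slide because z is not rounded to two decimals.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.4
mu = n * p                       # 40
sigma = sqrt(n * p * (1 - p))    # about 4.899
approx = NormalDist(mu, sigma)

# Continuity correction: shift by 0.5 so the integer of interest is included or excluded
p_less_than_30 = approx.cdf(29.5)                             # P(X < 30) = P(X <= 29)
p_more_than_50 = 1 - approx.cdf(50.5)                         # P(X > 50) = P(X >= 51)
p_exactly_45_approx = approx.cdf(45.5) - approx.cdf(44.5)     # P(X = 45)

# Exact binomial probability for comparison
p_exactly_45_exact = comb(n, 45) * p**45 * (1 - p) ** 55

print(f"P(X < 30)  approx = {p_less_than_30:.4f}")
print(f"P(X > 50)  approx = {p_more_than_50:.4f}")
print(f"P(X = 45)  approx = {p_exactly_45_approx:.4f}, exact = {p_exactly_45_exact:.4f}")
```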

Gamma & Exponential Distributions
Recall the Poisson process:
- Number of occurrences in a given interval or region
- “Memoryless” process
Sometimes we’re interested in the time or area until a certain number of events occur. For example: An average of 2.7 service calls per minute are received at a particular maintenance center. The calls correspond to a Poisson process.
- What is the probability that up to a minute will elapse before 2 calls arrive?
- How long before the next call?

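Gamma Distribution
The density function of the random variable X with gamma distribution having parameters α (number of occurrences) and β (time or region) is
$f(x; \alpha, \beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta}, \quad x > 0$ (0 elsewhere)
$\mu = \alpha\beta, \qquad \sigma^2 = \alpha\beta^2$
Describes the time until a specified number of Poisson events occurs.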

Exponential Distribution
Special case of the gamma distribution with α = 1:
$f(x; \beta) = \frac{1}{\beta} e^{-x/\beta}, \quad x > 0$ (0 elsewhere)
Describes the time until or time between Poisson events.
$\mu = \beta, \qquad \sigma^2 = \beta^2$

Example
An average of 2.7 service calls per minute are received at a particular maintenance center. The calls correspond to a Poisson process. What is the probability that up to a minute will elapse before 2 calls arrive?
β = 1/λ = 1/2.7 = 0.3704, α = 2
$P(X \le 1) = \int_0^1 \frac{1}{\beta^2} x e^{-x/\beta}\,dx = 2.7^2 \int_0^1 x e^{-2.7x}\,dx = \left[-2.7x e^{-2.7x} - e^{-2.7x}\right]_0^1 = 1 - e^{-2.7}(1 + 2.7) = 0.7513$

Example (cont.)
What is the expected time before the next call arrives?
The time to the next call follows an exponential distribution (α = 1) with β = 1/2.7, so
$\mu = \beta = 1/2.7 = 0.3704$ min.

Your turn …
Look at problem 6.40, page 205.
α = 2, β = 3
$P(X > 9) = \frac{1}{9}\int_9^{\infty} x e^{-x/3}\,dx = \left[-\frac{x}{3} e^{-x/3} - e^{-x/3}\right]_9^{\infty} = 4e^{-3} = 0.1991$
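
When α is a positive integer, the gamma integral has a closed form: integrating by parts repeatedly gives P(X ≤ x) = 1 − e^(−x/β) Σ_{j=0}^{α−1} (x/β)^j / j!. The sketch below uses that identity instead of numerical integration; the function name is an illustrative assumption and the formula applies only to integer α.

```python
from math import exp, factorial

def gamma_cdf_integer_alpha(x: float, alpha: int, beta: float) -> float:
    """P(X <= x) for a gamma distribution with integer shape alpha and scale beta."""
    t = x / beta
    return 1 - exp(-t) * sum(t**j / factorial(j) for j in range(alpha))

rate = 2.7  # service calls per minute, so beta = 1 / rate

# Probability that the 2nd call arrives within one minute (alpha = 2)
p_two_calls_within_minute = gamma_cdf_integer_alpha(1, 2, 1 / rate)

# Expected wait for the next call: exponential case (alpha = 1), mean = beta
mean_wait_next_call = 1 / rate

# Problem 6.40 as worked on the slide: alpha = 2, beta = 3, P(X > 9)
p_problem_6_40 = 1 - gamma_cdf_integer_alpha(9, 2, 3)

print(f"P(X <= 1), alpha = 2       = {p_two_calls_within_minute:.4f}")
print(f"mean wait for next call    = {mean_wait_next_call:.4f} min")
print(f"P(X > 9), alpha = 2, b = 3 = {p_problem_6_40:.4f}")
```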

Chi-Squared Distribution
Special case of the gamma distribution with α = ν/2 and β = 2:
$f(x; \nu) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} x^{\nu/2 - 1} e^{-x/2}, \quad x > 0$ (0 elsewhere)
where ν is a positive integer. The single parameter, ν, is called the degrees of freedom.
$\mu = \nu, \qquad \sigma^2 = 2\nu$
Note: this will become important when we start talking about statistical inference. Stay tuned!

Lognormal Distribution
When the random variable Y = ln(X) is normally distributed with mean μ and standard deviation σ, then X has a lognormal distribution with the density function
$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad x > 0$ (0 elsewhere)
Uses:
- reliability and maintainability
- environmental engineering: concentration of pollutants, particle size in emissions
- long-term rate of return on stock investments

Example
Look at problem 6.72, pg. 207 …
Since ln(X) has a normal distribution with μ = 5 and σ = 2, the probability that X > 50,000 is
$P(X > 50{,}000) = 1 - P(X < 50{,}000) = 1 - P\left(Z < \frac{\ln(50{,}000) - 5}{2}\right) = 1 - P(Z < 2.91) = 1 - 0.9982 = 0.0018$
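
The lognormal probability reduces to a standard normal lookup after taking logarithms, which the following sketch makes explicit. It assumes only the parameters stated in the example (μ = 5, σ = 2 for ln X); the variable names are illustrative.

```python
from math import log
from statistics import NormalDist

mu, sigma = 5, 2          # parameters of ln(X)
threshold = 50_000

# Standardize ln(threshold), then use the standard normal upper tail
z = (log(threshold) - mu) / sigma
p_exceeds = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")
print(f"P(X > 50,000) = {p_exceeds:.4f}")
```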

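Weibull Distribution
Used for many of the same applications as the gamma and exponential distributions, but does not require the memoryless property of the exponential.
$f(x; \alpha, \beta) = \alpha\beta x^{\beta-1} e^{-\alpha x^{\beta}}, \quad x > 0$ (0 elsewhere)
with cumulative distribution function $F(x) = 1 - e^{-\alpha x^{\beta}}$ for x ≥ 0.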

Example
Designers of wind turbines for power generation are interested in accurately describing variations in wind speed, which in a certain location can be described using the Weibull distribution with α = 0.02 and β = 2. A designer is interested in determining the probability that the wind speed in that location is between 3 and 7 mph.
$P(3 < X < 7) = F(7) - F(3) = \left[1 - e^{-(0.02)7^2}\right] - \left[1 - e^{-(0.02)3^2}\right] = 0.45996 \approx 0.46$
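
The wind-speed probability follows directly from the Weibull cumulative distribution function F(x) = 1 − e^(−αx^β) used in the example. A minimal sketch, with `weibull_cdf` as an illustrative helper name and the slide's parameterization (α = 0.02, β = 2):

```python
from math import exp

def weibull_cdf(x: float, alpha: float, beta: float) -> float:
    """F(x) = 1 - exp(-alpha * x**beta) for x > 0, and 0 otherwise."""
    return 1 - exp(-alpha * x**beta) if x > 0 else 0.0

alpha, beta = 0.02, 2
p_3_to_7 = weibull_cdf(7, alpha, beta) - weibull_cdf(3, alpha, beta)

print(f"P(3 < X < 7) = {p_3_to_7:.5f}")
```

Note that other references parameterize the Weibull distribution with a scale parameter instead of α, so parameters must be converted before using a library implementation.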

Populations and Samples
Population: “a group of individual persons, objects, or items from which samples are taken for statistical measurement”
Sample: “a finite part of a statistical population whose properties are studied to gain information about the whole”
(Merriam-Webster Online Dictionary, http://www.m-w.com/, October 5, 2004)

Examples
Populations:
- Students pursuing undergraduate engineering degrees
- Cars capable of speeds in excess of 160 mph
- Potato chips produced at the Frito-Lay plant in Kathleen
- Freshwater lakes and rivers
Samples:
- 1000 engineering students selected at random from all engineering programs in the US
- 50 cars selected at random from among those certified as having achieved 160 mph or more during 2003
- 10 chips selected at random every 5 minutes as the conveyor passes the inspector
- 4 samples taken from randomly selected locations in randomly selected and representative freshwater lakes and rivers
OTHERS?

Basic Statistics (review)
1. Sample Mean:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Example: At the end of a team project, team members were asked to give themselves and each other a grade on their contribution to the group. The results for two team members (Q and S) were as follows:
Q S 92 85 95 88 75 78
$\bar{X}_Q = 87.5, \qquad \bar{X}_S = 85$

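Basic Statistics (review)
2. Sample Variance:
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
For our example (grades for team members Q and S):
Q S 92 85 95 88 75 78
The values 7.593857 and 7.25718 are the sample standard deviations, so
$S_Q = 7.593857$, $S_Q^2 = 57.67$
$S_S = 7.25718$, $S_S^2 = 52.67$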

Your Turn
Work in groups of 4 or 5. Find the mean, variance, and standard deviation for your group of the (approximate) number of hours spent working on homework each week.

Sampling Distributions
If we conduct the same experiment several times with the same sample size, the probability distribution of the resulting statistic is called a sampling distribution.
Sampling distribution of the mean: if n observations are taken from a normal population with mean μ and variance σ², then $\bar{X}$ is normally distributed with
$\mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$

Central Limit Theorem
Given: $\bar{X}$, the mean of a random sample of size n taken from a population with mean μ and finite variance σ².
Then the limiting form of the distribution of
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
as n → ∞ is the standard normal distribution n(z; 0, 1).

Central Limit Theorem
- If the population is known to be normal, the sampling distribution of $\bar{X}$ will follow a normal distribution.
- Even when the distribution of the population is not normal, the sampling distribution of $\bar{X}$ is approximately normal when n is large.
NOTE: when n is not large and the population is not normal, we cannot assume the distribution of $\bar{X}$ is normal.

Example
The time to respond to a request for information from a customer help line is uniformly distributed between 0 and 2 minutes. In one month 48 requests are randomly sampled and the response time is recorded. What is the probability that the average response time is between 0.9 and 1.1 minutes?
f(x) = 1/2, 0 < x < 2 (uniform distribution)
$\mu = (a + b)/2 = 1, \qquad \sigma^2 = (b - a)^2/12 = 1/3$
$\mu_{\bar{X}} = 1, \qquad \sigma_{\bar{X}}^2 = (1/3)/48 = 1/144$, so $\sigma_{\bar{X}} = 1/12$
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$: $Z_1 = (0.9 - 1)/(1/12) = -1.2$, $Z_2 = (1.1 - 1)/(1/12) = 1.2$
$P(0.9 < \bar{X} < 1.1) = P(Z < 1.2) - P(Z < -1.2) = 0.8849 - 0.1151 = 0.7698$
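
The help-line example applies the Central Limit Theorem to a uniform population. The sketch below computes the same probability with `statistics.NormalDist`, treating the sample mean as approximately normal; that approximation is the assumption being illustrated, and the variable names are not from the slides.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 0, 2, 48            # uniform on [0, 2] minutes, sample of 48 requests

mu = (a + b) / 2              # population mean
var = (b - a) ** 2 / 12       # population variance
std_error = sqrt(var / n)     # standard deviation of the sample mean

# By the CLT, the sample mean is approximately normal(mu, std_error)
xbar = NormalDist(mu, std_error)
p_between = xbar.cdf(1.1) - xbar.cdf(0.9)

print(f"mu = {mu}, sigma^2 = {var:.4f}, std. error = {std_error:.4f}")
print(f"P(0.9 < Xbar < 1.1) = {p_between:.4f}")
```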

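Sampling Distribution of the Difference Between Two Averages
Given: Two independent samples of size n1 and n2 are taken from two populations with means μ1 and μ2 and variances σ1² and σ2².
Then $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed with
$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
so that
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
is approximately standard normal.
See example 8.8, pg 213 and example 8.9, pg 214.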

Sampling Distribution of S²
Given: S² is the variance of a random sample of size n taken from a normal population with mean μ and finite variance σ².
Then
$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$
has a χ² distribution with ν = n - 1 degrees of freedom.

χ² Distribution
$\chi_\alpha^2$ represents the χ² value above which we find an area of α, that is, for which $P(\chi^2 > \chi_\alpha^2) = \alpha$.

Example
Look at example 8.10, pg. 256: μ = 3, σ = 1, n = 5.
If the χ² value fits within an interval that covers 95% of the χ² values with 4 degrees of freedom, then the estimate for σ is reasonable. (See Table A.5, pp. 755-756)
From the book (and Excel): s² = 0.815
$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(4)(0.815)}{1} = 3.26$
We are looking for χ² values that cover 95%, meaning the values between α = 0.975 and α = 0.025. From Table A.5: $\chi_{0.025}^2 = 11.143$ and $\chi_{0.975}^2 = 0.484$.
Since 0.484 < 3.26 < 11.143, the estimate σ = 1 is reasonable.

Your turn …
If a sample of size 7 is taken from a normal population (i.e., n = 7), what value of χ² corresponds to $P(\chi^2 < \chi_\alpha^2) = 0.95$? (Hint: first determine α.)
NOTE the figure associated with Table A.5!! (The tabulated values give areas to the right of the χ² value.)
ν = 7 - 1 = 6; α = 1 - 0.95 = 0.05; $\chi_{0.05}^2 = 12.592$

t-Distribution
Recall, by the CLT:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is n(z; 0, 1)
Assumption: we know σ. (Generally, if an engineer is concerned with a familiar process or system, this is reasonable, but …)

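What if we don’t know σ?
New statistic:
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
where
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
and T follows a t-distribution with ν = n - 1 degrees of freedom.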

Characteristics of the t-Distribution
Look at fig. 8.13, pg. 259. Note:
- Shape: symmetrical about 0
- Effect of ν: the variance, as seen in the width of the curve, depends on the sample size; as ν increases, the curve looks more like a normal distribution (hence, the CLT)
See Table A.4, pp. 753-754: critical values of t for several values of α and degrees of freedom. Note that the table yields the right tail of the distribution.

Using the t-Distribution
Testing assumptions about the value of μ.
Example: problem 8.52, pg. 265. What value of t corresponds to P(t < tα) = 0.95?
$\bar{x} = 0.475$
$s^2 = \sum (x - \bar{x})^2/(n - 1) = 0.0336, \quad s = 0.1832$
μ = 0.5
$t = \frac{0.475 - 0.5}{0.1832/\sqrt{8}} = -0.39$
Looking in Table A.4, pg. 672, for an α value associated with (n - 1) = 7 degrees of freedom and a t-value of 0.39 (by symmetry), we see it is somewhere between 0.4 and 0.3; call it 0.35.
P(T < -0.39) ≈ 0.35 … inconclusive

Comparing Variances of 2 Samples
Given two samples of size n1 and n2, with sample means $\bar{X}_1$ and $\bar{X}_2$ and variances $s_1^2$ and $s_2^2$ …
Are the differences we see in the means due to the means or due to the variances (that is, are the differences due to real differences between the samples or to variability within each sample)?
See figure 8.16, pg. 262.

F-Distribution
Given: $S_1^2$ and $S_2^2$, the variances of independent random samples of size n1 and n2 taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively.
Then
$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$
has an F-distribution with ν1 = n1 - 1 and ν2 = n2 - 1 degrees of freedom. (See Table A.6, pp. 757-760)

Example
Problem 8.55, pg. 266
$S_1^2$ = (5*SUMSQ(A1:A5)-SUM(A1:A5)^2)/(5*4) = 15750
$S_2^2$ = (6*SUMSQ(B1:B6)-SUM(B1:B6)^2)/(6*5) = 10920
F = 15750/10920 = 1.44
f0.05(4, 5) = 5.19
NOTE: if the two population variances are equal, then $F = S_1^2/S_2^2$, so we are testing the hypothesis that $\sigma_1^2/\sigma_2^2 = 1$, in which case F should be close to 1.