# Random Sampling and Data Description

## Presentation on theme: "Random Sampling and Data Description"— Presentation transcript:

Chapter 6 Random Sampling and Data Description

Learning Objectives
- Compute and interpret the sample mean, sample variance, sample standard deviation, sample median, and sample range
- Explain the concepts of sample mean, sample variance, population mean, and population variance
- Construct and interpret visual data displays
- Explain the concept of random sampling
- Construct and interpret normal probability plots

Data Summary and Display
- Essential to good statistical thinking
- Focuses attention on important features of the data
- Provides insight about the type of model that should be used
- The computer has become an important tool in the presentation and analysis of data
- The user enters the data and then selects the types of analysis
- Packages are available for both mainframe and personal computers

Sample Mean
- Useful for describing data features numerically
- Can characterize the location or central tendency of the data
- The arithmetic mean of the observations is referred to as the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where the $n$ observations in the sample are denoted by $x_1, x_2, \ldots, x_n$
- The sample mean is a reasonable estimate of the population mean, $\mu$

Sample Standard Deviation
- The sample mean does not provide all of the information about a sample
- Variability in the data may be described by the sample variance, $s^2$, or the sample standard deviation
- The sample standard deviation, $s$, is the positive square root of the sample variance
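
For reference, the sample variance is conventionally defined as follows. This is the standard textbook form, assumed here to match the formula shown on the slide, with $\bar{x}$ denoting the sample mean of the $n$ observations $x_1, x_2, \ldots, x_n$:

$$
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}
$$

The divisor is $n - 1$ rather than $n$, which makes $s^2$ an unbiased estimator of the population variance.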

Sample Range
- The difference between the largest and smallest observations, the sample range, is a useful measure of variability
- Sample range: $r = \max(x_i) - \min(x_i)$
- As the variability in sample data increases, the sample range increases
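
The numerical summaries described so far (sample mean, sample variance, sample standard deviation, and sample range) can be sketched in a few lines of Python. The function names and the five data values are illustrative assumptions, not taken from the slides, and the variance uses the usual $n - 1$ divisor.

```python
import math

def sample_mean(xs):
    """Arithmetic mean of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [12.0, 15.0, 11.0, 14.0, 13.0]

print("mean    :", sample_mean(data))       # 13.0
print("variance:", sample_variance(data))   # 2.5
print("std dev :", round(sample_std(data), 4))
print("range   :", sample_range(data))      # 4.0
```

Python's standard library offers `statistics.mean`, `statistics.variance`, and `statistics.stdev` for the same quantities; the explicit versions above are only meant to mirror the definitions.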

Sample Median and Sample Mode
- Two more measures of central tendency
- The median divides the data into two equal parts, half below the median and half above
- If the number of data points is even, the median is halfway between the two central values
- If the number of data points is odd, the median is the central value
- The mode is the most frequently occurring data point(s)
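
A minimal sketch of the median and mode rules above. The helper names and the small samples are hypothetical; the mode function returns a list because several values can tie for most frequent.

```python
from collections import Counter

def sample_median(xs):
    """Central value (odd n) or midpoint of the two central values (even n)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def sample_modes(xs):
    """All values that occur most frequently, in ascending order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

# Hypothetical samples with an odd and an even number of observations.
odd_sample = [3, 1, 4, 1, 5]
even_sample = [3, 1, 4, 1, 5, 9]

print(sample_median(odd_sample))    # 3
print(sample_median(even_sample))   # 3.5
print(sample_modes(odd_sample))     # [1]
```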

Example
- The data below are the joint temperatures of the O-rings (degrees F) for each test firing or actual launch of the space shuttle rocket motor (from Presidential Commission on the Space Shuttle Challenger Accident)
- Compute the sample mean, sample median, sample range, and sample standard deviation

Solution
- Sample mean is 65.85
- Sample median is 67 [67.5]
- Sample range is 53, or $r = 84 - 31$
- Sample standard deviation is 12.16

Random Sampling
- We are usually interested in working with a sample of observations selected from a population
- Must understand the relationship between the population and the sample
- It is often impossible or impractical to observe the entire population
- Use a probability distribution as a model for the population
- Sample from the population to make decisions about the population

Understand Random Sampling
- Wish to reach a conclusion about the proportion of people who earn at least \$35,000 in a specific year
- Let $p$ represent the unknown value of this proportion
- Impractical to question every individual
- Make an inference regarding the true proportion $p$
- Select a random sample of size $n$
- Use the observed sample proportion $\hat{p}$, computed by dividing the number of individuals in the sample who earn at least \$35,000 by the total sample size $n$
- Many random samples are possible
- The value of $\hat{p}$ will vary from sample to sample; that is, $\hat{p}$ is a random variable
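
The idea that the observed proportion changes from one random sample to the next can be illustrated with a small simulation. This is a sketch under stated assumptions: the "true" proportion of 0.30, the sample size of 200, and the seed are all hypothetical, and each individual is modeled as an independent yes/no draw.

```python
import random

def sample_proportion(sample):
    """Observed proportion: number of 'yes' individuals divided by sample size."""
    return sum(sample) / len(sample)

def draw_sample(rng, true_p, n):
    """Draw n individuals; each earns at least the threshold with probability true_p."""
    return [rng.random() < true_p for _ in range(n)]

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_P = 0.30            # hypothetical population proportion (unknown in practice)
N = 200                  # hypothetical sample size

# Five different random samples give five (generally different) estimates.
estimates = [sample_proportion(draw_sample(rng, TRUE_P, N)) for _ in range(5)]
for i, p_hat in enumerate(estimates, start=1):
    print(f"sample {i}: p_hat = {p_hat:.3f}")
```

Each printed value is one realization of the observed proportion; the spread among them is what it means for the estimate to be a random variable.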

Statistic
- A statistic is any function of the observations in a random sample
- The sample mean $\bar{X}$, the sample variance $S^2$, and the sample standard deviation $S$ are statistics

Data Display
- Graphical displays of sample data are very powerful
- Many techniques are available

Stem-and-Leaf Diagrams
- A stem-and-leaf diagram is a good way to represent the data
- Steps:
  1. Divide each data point into two parts: a stem and a leaf
  2. List the stem values in a vertical column
  3. Record the leaf for each observation beside its stem
  4. Write the units for stems and leaves on the display

Example
- Consider the data 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
- Select the numbers 2, 3, and 4 as stem values
- Record the leaf for each observation beside its stem
- The last column in the diagram is a frequency count of the number of leaves associated with each stem

| Stem | Leaf | Frequency |
|------|------|-----------|
| 2 | 1 4 4 6 7 7 | 6 |
| 3 | 0 2 8 | 3 |
| 4 | 1 | 1 |
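
The stem-and-leaf steps can be sketched in Python for the ten observations in this example. The function name is hypothetical, and the sketch assumes two-digit non-negative integers, with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits).

    Stems that have no observations are simply absent from the result.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(stems)

# The ten observations from the example.
observations = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
diagram = stem_and_leaf(observations)

print("Stem | Leaf        | Frequency")
for stem in sorted(diagram):
    leaves = " ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem:>4} | {leaves:<11} | {len(diagram[stem])}")
```

The frequency column is just the number of leaves on each stem, which gives 6, 3, and 1 for stems 2, 3, and 4.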

Frequency Distributions
- A more compact summary of data than a stem-and-leaf diagram
- Must divide the range of the data into intervals, called class intervals, cells, or bins
- The number of bins depends on the number of observations
- A common choice is a number of bins approximately equal to the square root of the number of observations
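
A minimal sketch of building a frequency distribution with the square-root guideline for the number of bins. The function name, the equal-width bins spanning the minimum to the maximum, and the choice to round the square root are assumptions made for illustration; the data are the ten observations from the stem-and-leaf example.

```python
import math

def frequency_distribution(values, num_bins=None):
    """Count observations in equal-width bins between min and max.

    If num_bins is not given, use the rounded square root of the sample size.
    Each bin includes its lower edge; the last bin also includes the maximum.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all observations are equal; no range to divide")
    if num_bins is None:
        num_bins = max(1, round(math.sqrt(len(values))))
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in values:
        # Clamp so that the maximum falls in the last bin.
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts

observations = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
edges, counts = frequency_distribution(observations)  # sqrt(10) rounds to 3 bins

for i, count in enumerate(counts):
    print(f"{edges[i]:6.2f} to {edges[i + 1]:6.2f}: {count}")
```

In practice, bin edges are usually rounded to convenient values rather than taken directly from the minimum and maximum.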

Histograms
- Visual display of the frequency distribution
- Gives insight about possible choices of probability distribution
- Stages for constructing a histogram:
  1. Label the bin boundaries on a horizontal scale
  2. Mark and label the vertical scale with the frequencies
  3. Above each bin, draw a rectangle whose height is equal to the frequency corresponding to that bin
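
The construction stages can be mimicked with a plain-text histogram, where each bar is drawn sideways and its length equals the bin frequency. This is only a sketch: the function name is hypothetical, and the bin boundaries (20, 30, 40, 50) with frequencies 6, 3, and 1 are an assumed binning of the ten observations used in the stem-and-leaf example.

```python
def text_histogram(edges, counts):
    """Return one text row per bin: boundary label, bar of '#', and frequency."""
    rows = []
    for i, count in enumerate(counts):
        label = f"[{edges[i]:g}, {edges[i + 1]:g})"
        rows.append(f"{label:<12} {'#' * count} {count}")
    return rows

# Assumed bin boundaries and frequencies (see the lead-in above).
bin_edges = [20, 30, 40, 50]
bin_counts = [6, 3, 1]

rows = text_histogram(bin_edges, bin_counts)
for row in rows:
    print(row)
```

A plotting library would draw the same rectangles vertically; the text version just keeps the example dependency-free.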

Cumulative Frequency Plot
- A variation of the histogram
- Useful in data interpretation
- The height of each bar is the total number of observations that are less than or equal to the upper limit of the class
- Illustrated in the graph on the right
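
Cumulative frequencies are running totals of the class frequencies, which the following sketch computes. The class upper limits and counts are hypothetical values chosen for illustration.

```python
from itertools import accumulate

def cumulative_frequencies(counts):
    """Running total of class frequencies, one value per class."""
    return list(accumulate(counts))

# Hypothetical classes for integer data: upper limits 29, 39, 49.
class_upper_limits = [29, 39, 49]
class_counts = [6, 3, 1]

cumulative = cumulative_frequencies(class_counts)
for upper, total in zip(class_upper_limits, cumulative):
    print(f"observations <= {upper}: {total}")
```

The last cumulative value always equals the total number of observations, which is a quick check on the tally.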

Example
- Consider the following data on the motor fuel octane ratings of several blends of gasoline
- Construct a frequency distribution and histogram
- Use 8 classes

Solution
- Illustrated in the figure on the right

Probability Plots
- A graphical method for determining whether sample data points conform to a hypothesized distribution
- Very simple and can be constructed quickly
- Uses special graph paper, known as probability paper
- The focus here is primarily on the normal probability plot

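Constructing a Probability Plot
- The sample data points are first ranked from smallest to largest: $x_1, x_2, \ldots, x_n$ is arranged as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$
- The ordered observations $x_{(j)}$ are plotted against their observed cumulative frequency $(j - 0.5)/n$ on the probability paper
- If the hypothesized distribution adequately describes the data, the plotted points fall approximately along a straight line
- A normal probability plot can also be constructed on ordinary graph paper by plotting the standardized normal scores $z_j$ against $x_{(j)}$
- The standardized normal scores satisfy $\frac{j - 0.5}{n} = P(Z \le z_j) = \Phi(z_j)$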
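
The standardized normal scores can be computed by inverting the standard normal cumulative distribution function at $(j - 0.5)/n$. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name and the five data values are hypothetical.

```python
from statistics import NormalDist

def normal_scores(values):
    """Pair each ordered observation x_(j) with its normal score z_j.

    z_j is the standard normal quantile at the plotting position (j - 0.5) / n.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (x, std_normal.inv_cdf((j - 0.5) / n))
        for j, x in enumerate(ordered, start=1)
    ]

# Hypothetical sample; plotting z_j against x_(j) gives the probability plot.
sample = [10.2, 9.6, 11.1, 10.0, 10.5]
pairs = normal_scores(sample)
for x, z in pairs:
    print(f"x = {x:5.1f}   z = {z:+.4f}")
```

If the points $(x_{(j)}, z_j)$ lie close to a straight line, the normal distribution is a plausible model for the data.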

Example
- A soft-drink bottler is studying the internal pressure of 1-liter glass bottles
- A random sample of 16 bottles is tested, and the pressure strengths (psi) are obtained
- The data are shown below
- Does it seem reasonable to conclude that pressure strength is normally distributed?

Solution
- Use the steps above to construct a probability plot
- The data fall approximately along a straight line
- The assumption of normality appears reasonable

Next Agenda
- Discusses point estimation of parameters
- Introduces some of the important properties of estimators, the method of maximum likelihood, sampling distributions, and the central limit theorem