STAT 311 Chapter 1 - Overview and Descriptive Statistics


1 STAT 311
Chapter 1 - Overview and Descriptive Statistics
Chapter 2 - Probability
Chapter 3 - Discrete Random Variables and Probability Distributions
Chapter 4 - Continuous Random Variables and Probability Distributions
Chapter 5 - Joint Probability Distributions and Random Samples
Chapter 6 - Point Estimation
Note that these are textbook chapters, although Lecture Notes may be referenced.

2 Chapter 1 - Overview and Descriptive Statistics
1.1 - Populations, Samples and Processes
1.2 - Pictorial and Tabular Methods in Descriptive Statistics
1.3 - Measures of Location
1.4 - Measures of Variability

3

4 311 / 312 IN A NUTSHELL

5 “Probability Theory” makes theoretical predictions of the occurrence of events where randomness is present, via known mathematical models.

8 What is “random variation” in the distribution of a population?
Examples: toasting time, temperature settings, etc. of a population of toasters…
POPULATION 1: Little to no variation (e.g., product manufacturing). In engineering situations such as this, we try to maintain “quality control,” i.e., “tight tolerance levels,” high precision, low variability.
But what about a population of, say, people?

9 What is “random variation” in the distribution of a population?
Example: Body Temperature (°F)
POPULATION 1: Little to no variation (e.g., clones). Most individual values ≈ population mean value (98.6 °F). Very little variation about the mean!
[Figure: density curve of body temperature, concentrated near the mean of 98.6 °F]

10 What is “random variation” in the distribution of a population?
Examples: Body Temperature (°F), Gender, Race, Age, Height, Annual Income, …
POPULATION 2: Much variation (more common). Much more variation about the mean!
[Figure: density curve spread widely about the mean]
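The contrast between the two populations can be sketched with a small simulation. This is only an illustration: the mean of 98.6 °F comes from the slides, but the two standard deviations (`SD_LOW`, `SD_HIGH`) and the population size are hypothetical values chosen to make the difference visible.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

MEAN_TEMP_F = 98.6   # population mean quoted on the slide
SD_LOW = 0.05        # hypothetical spread for "little to no variation"
SD_HIGH = 0.7        # hypothetical spread for "much variation"
SIZE = 10_000        # hypothetical population size

# Two simulated populations with the same mean but different variability.
population_1 = [random.gauss(MEAN_TEMP_F, SD_LOW) for _ in range(SIZE)]
population_2 = [random.gauss(MEAN_TEMP_F, SD_HIGH) for _ in range(SIZE)]

for name, pop in (("Population 1", population_1), ("Population 2", population_2)):
    print(name, "mean:", round(statistics.fmean(pop), 2),
          "std dev:", round(statistics.stdev(pop), 2))
```

Both populations share essentially the same mean; only the spread about that mean differs.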

11 Example Click on image for full .pdf article
Links in article to access datasets

12 Random Sample and “sampling frame”
POPULATION: Women in U.S. who have given birth
Study Question: How can we estimate “mean age at first birth” of women in the U.S.?
“Random Variable” X = Age at first birth
Suppose we know that X follows a “normal distribution” (a.k.a. “bell curve”) in the population. That is, the Population Distribution of X ~ N(μ, σ), with mean μ = ??? and standard deviation σ. μ and σ are “population characteristics,” i.e., “parameters” (fixed, unknown).
Random Sample {x1, x2, x3, x4, … , x400}. How is this accomplished? Via a “sampling frame”: hospital records, etc.

13 Point estimate of μ and “Sampling Distribution”
POPULATION: Women in U.S. who have given birth
Study Question: How can we estimate “mean age at first birth” of women in the U.S.?
“Random Variable” X = Age at first birth, with Population Distribution X ~ N(μ, σ). The mean μ = ??? and the standard deviation σ are “parameters” (fixed, unknown). Other possible parameters: standard deviation, median, minimum, maximum.
Random Sample {x1, x2, x3, x4, … , x400}. FORMULA: sample mean x̄ = (x1 + x2 + … + x400) / 400.
The sample mean x̄ is an example of a “sample characteristic” = “statistic” (numerical info culled from a sample). This is called a “point estimate” of μ from the one sample.
Can it be improved, and if so, how?
Choose a bigger sample, which should reduce “variability.” How big???
Average the sample means of many samples, not just one. (This introduces “sampling variability.”) “Sampling Distribution” of the sample mean ~ ???
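A minimal simulation sketch of the “bigger sample reduces variability” idea follows. The population parameters `MU` and `SIGMA`, the smaller sample size of 25, and the number of replications are all hypothetical; only the sample size of 400 comes from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA = 26.0, 1.5   # hypothetical population mean and standard deviation


def sample_mean(n):
    """Draw one random sample of size n from N(MU, SIGMA); return its mean."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])


def sd_of_sample_means(n, replications=500):
    """Spread of the sample mean across many independent samples of size n."""
    means = [sample_mean(n) for _ in range(replications)]
    return statistics.stdev(means)


sd_small = sd_of_sample_means(25)    # hypothetical smaller sample size
sd_large = sd_of_sample_means(400)   # sample size used on the slides

print("std dev of sample means, n = 25: ", round(sd_small, 3))
print("std dev of sample means, n = 400:", round(sd_large, 3))
```

The sample means from the larger samples cluster more tightly around the population mean, which is the sense in which a bigger sample reduces variability.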

14 Without knowing every value in the population, it is not possible to determine the exact value of the mean μ with 100% “certainty.” HOWEVER…

15 Random Sample
POPULATION: Women in U.S. who have given birth
Study Question: How can we estimate “mean age at first birth” of women in the U.S.?
“Random Variable” X = Age at first birth, with Population Distribution X ~ N(μ, σ). The mean μ = ??? and the standard deviation σ are “parameters” (fixed, unknown). For concreteness, suppose σ = 1.5.
Random Sample {x1, x2, x3, x4, … , x400}. FORMULA: sample mean x̄ = (x1 + x2 + … + x400) / 400.

16 95% CONFIDENCE INTERVAL FOR µ
Without knowing every value in the population, it is not possible to determine the exact value of the mean μ with 100% “certainty.” HOWEVER…
BASED ON OUR SAMPLE DATA, the true value of μ is between 25.453 and 25.747, with 95% “confidence” (…akin to “probability”).
This is called an “interval estimate” of μ from the sample. It is used in “Statistical Inference” via “Hypothesis Testing”… (Stat 312)
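The interval endpoints can be reproduced with a short calculation. This is a sketch under stated assumptions: the sample size n = 400 is from the slides, the value 1.5 given “for concreteness” on the previous slide is taken to be the population standard deviation σ, and the sample mean of 25.6 is not stated in the transcript but inferred here as the midpoint of the interval. The formula used is the standard known-σ interval, x̄ ± z·σ/√n.

```python
from math import sqrt
from statistics import NormalDist

n = 400        # sample size from the slides
sigma = 1.5    # assumed to be the population standard deviation
x_bar = 25.6   # assumed sample mean (midpoint of 25.453 and 25.747)

# Critical value leaving 2.5% in each tail of the standard normal.
z = NormalDist().inv_cdf(0.975)

margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")  # (25.453, 25.747)
```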

17 Arithmetic Mean, Geometric Mean, Harmonic Mean
POPULATION: Women in U.S. who have given birth
Study Question: How can we estimate “mean age at first birth” of women in the U.S.?
“Random Variable” X = Age at first birth, with Population Distribution X ~ N(μ, σ). The mean μ = ??? and the standard deviation σ are “parameters” (fixed, unknown).
Random sample of size n {x1, x2, x3, x4, … , xn}.
Arithmetic Mean, Geometric Mean, Harmonic Mean: each of these gives an estimate of μ for a particular sample. Any general sample estimator for μ is denoted by the symbol μ̂. Likewise for σ, whose estimator is denoted σ̂.
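The three means named on this slide can be compared on a small sample. This is only a sketch: the list `ages` is made-up data, not taken from the study, and the functions are the ones provided by Python's standard `statistics` module.

```python
from statistics import fmean, geometric_mean, harmonic_mean

# Hypothetical sample of ages at first birth (illustrative values only).
ages = [19, 22, 24, 25, 27, 28, 31, 34]

am = fmean(ages)            # arithmetic mean: sum of values divided by n
gm = geometric_mean(ages)   # geometric mean: n-th root of the product
hm = harmonic_mean(ages)    # harmonic mean: n divided by the sum of reciprocals

print("arithmetic:", round(am, 3))
print("geometric: ", round(gm, 3))
print("harmonic:  ", round(hm, 3))
```

For positive data the three values are always ordered harmonic ≤ geometric ≤ arithmetic, so they generally give different estimates from the same sample.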

18 “PARAMETER ESTIMATION”
POPULATION: Women in U.S. who have given birth
Study Question: How can we estimate “mean age at first birth” of women in the U.S.?
“Random Variable” X = Age at first birth, with Population Distribution X ~ N(μ, σ). The mean μ = ??? and the standard deviation σ are “parameters” (fixed, unknown).
Random Sample {x1, x2, x3, x4, … , xn}. FORMULA: sample mean x̄ = (x1 + x2 + … + xn) / n.
Extending these ideas to other parameters of a population gives rise to the general theory of “PARAMETER ESTIMATION” (Stat 311).

19 How do we estimate these?
What do we want to know about this population?
POPULATION: composed of “units” (people, rocks, toasters, ...). How is the “Random Variable” X (age, income level, …) distributed?
To make certain calculations simpler, we assume that populations are “arbitrarily large” (or indeed, infinite).
Suppose we know that X follows a known “probability distribution” in the population (for example a “normal distribution,” a.k.a. “bell curve,” or a distribution with a heavily skewed tail), but with parameters having unknown values. That is, the Population Distribution of X ~ N(μ, σ) or, more generally, X ~ Dist(θ1, θ2, …). The parameters are “population characteristics” (fixed, unknown). How do we estimate these?
SAMPLE: For a particular parameter θ, we want to define a corresponding “parameter estimator” θ̂.
Ideal properties: an Unbiased estimator of θ, with Minimum Variance among all such unbiased estimators, i.e., an “MVUE.”
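The two ideal properties can be illustrated by comparing two estimators of the mean of a normal population: the sample mean and the sample median. Both are unbiased in this setting, but the sample mean has the smaller variance (for a normal population it is known to be the MVUE of the mean). The sketch below is a simulation, not a proof, and `MU`, `SIGMA`, `N`, and `REPS` are hypothetical values.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

MU, SIGMA = 26.0, 1.5   # hypothetical population parameters
N, REPS = 25, 2000      # hypothetical sample size and number of samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))      # estimator 1: sample mean
    medians.append(statistics.median(sample))   # estimator 2: sample median

# Unbiasedness: both averages should land close to MU.
print("average of sample means:  ", round(statistics.fmean(means), 3))
print("average of sample medians:", round(statistics.fmean(medians), 3))

# Minimum variance: the sample mean should vary less than the sample median.
print("variance of sample means:  ", round(statistics.variance(means), 4))
print("variance of sample medians:", round(statistics.variance(medians), 4))
```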

20 Quantitative [measurement]
What do we want to know about this population?
POPULATION: composed of “units” (people, rocks, toasters, ...). How is the “Random Variable” X (age, income level, …) distributed?
To make certain calculations simpler, we assume that populations are “arbitrarily large” (or indeed, infinite).
“Random Variable” X = any numerical value that can be assigned to each unit of a population. “Random” refers to the notion that this value is unknown until actually observed (usually as part of an outcome of an experiment to test a specific hypothesis). Contrast this with the idea of a “nonrandom” variable with no empirical error, e.g., X = # cards in a deck = 52.
There are two general types: Quantitative and Qualitative.
Quantitative [measurement]: length, mass, temperature, pulse rate, # puppies, shoe size (10, 10½, 11).

21 Quantitative [measurement]
What do we want to know about this population? How is the “Random Variable” X (age, income level, …) distributed?
There are two general types of random variable: Quantitative and Qualitative.
Quantitative [measurement]: length, mass, temperature, pulse rate, # puppies, shoe size.
CONTINUOUS variables can take their values at any point in a continuous interval.
DISCRETE variables only take their values in disconnected jumps.

22 Qualitative [categorical]
What do we want to know about this population? How is the “Random Variable” X (age, income level, …) distributed?
There are two general types of random variable: Quantitative and Qualitative.
Qualitative [categorical]: video game levels (1, 2, 3, ...), income level (low, mid, high), zip code, PIN #, color (Red, Green, Blue).
ORDINAL, RANKED variables have ordered labels.
NOMINAL variables have unordered labels.
IMPORTANT SPECIAL CASE: Binary (or Dichotomous), e.g., “Pregnant?” (Yes / No), Coin toss (Heads / Tails), Treatment (Drug / Placebo).

23 Let X = “Number of Successes in the sample.”
POPULATION of Successes and Failures. Define a new parameter π = P(Success).
Suppose we intend to select a random sample of size n from this population of Successes and Failures, in such a way that the “Success or Failure” outcome of any selected individual conveys no information about the “Success or Failure” outcome of any other selected individual. That is, the “Success or Failure” outcomes between any two individuals are independent. (Think of tossing a coin n times.)
Let X = “Number of Successes in the sample” (0, 1, 2, …, n), a Discrete Random Variable.
Point estimator: a natural estimator for π could be the sample proportion of Success, π̂ = X / n.
Ex: n = 500 tosses, X = 285 Heads, so π̂ = 285 / 500 = 0.57.
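The sample proportion for the coin-toss example can be computed directly. This is a minimal sketch; the function name `sample_proportion` is a hypothetical helper introduced only for illustration, while the counts (285 Heads in 500 tosses) come from the slide.

```python
def sample_proportion(successes, n):
    """Return the sample proportion of Success, X / n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("number of successes must be between 0 and n")
    return successes / n


# Example from the slide: n = 500 tosses, X = 285 Heads.
estimate = sample_proportion(285, 500)
print("estimated P(Success):", estimate)  # 0.57
```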

