CHAPTER 2: 2.1 - Basic Definitions and Properties. Population Characteristics = “Parameters”; Sample Characteristics = “Statistics”; Random Variables.


CHAPTER 2

2.1 - Basic Definitions and Properties
• Population Characteristics = “Parameters”
• Sample Characteristics = “Statistics”
• Random Variables (Numerical vs. Categorical)

2.2, 2.3 - Exploratory Data Analysis
• Graphical Displays
• Descriptive Statistics
  Measures of Center (mode, median, mean)
  Measures of Spread (range, variance, standard deviation)

POPULATION – composed of “units” (people, rocks, toasters, ...)

“Random Variable” X = any numerical value that can be assigned to each unit of a population. “Random” refers to the notion that this value is unknown until actually observed (usually as part of an outcome of an experiment to test a specific hypothesis). [Contrast this with the idea of a “nonrandom” variable with no empirical error, e.g., X = # cards in a deck = 52.]

Important Fact: To make certain calculations simpler, we assume that populations are “arbitrarily large” (or indeed, infinite).

What do we want to know about this population?

There are two general types of random variable: Quantitative and Qualitative.

Quantitative [measurement]
• length
• mass
• temperature
• pulse rate
• # puppies
• shoe size (10, 10½, 11)

Quantitative [measurement] variables are of two kinds:

CONTINUOUS (can take their values at any point in a continuous interval)
• length
• mass
• temperature

DISCRETE (only take their values in disconnected jumps)
• pulse rate
• # puppies
• shoe size

Qualitative [categorical]

ORDINAL, RANKED (1, 2, 3)
• video game levels (1, 2, 3, ...)
• income level (1 = low, 2 = mid, 3 = high)

NOMINAL
• zip code
• ID #
• color (Red, Green, Blue)

IMPORTANT CASE: Binary (or Dichotomous)
• Gender (Male / Female)
• “Pregnant?” (Yes / No)
• Coin toss (Heads / Tails)
• Treatment (Drug / Placebo)
X = 1 (“Success”) or 0 (“Failure”)

Another way to code a qualitative variable: define X using “indicator variables” I_1, I_2, I_3, one per category. Note that I_1 + I_2 + I_3 = 1.

Example: Excel file of patient blood types. Note that each patient row sums to 1, i.e., O + A + B + AB = 1.
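The indicator-variable coding can be sketched in a few lines of Python. This is only an illustration: the patient list and the helper name `indicators` are hypothetical, and the data are made up rather than taken from the Excel file mentioned in the slide.

```python
# Hypothetical blood types for five patients (made-up data for illustration).
BLOOD_TYPES = ["O", "A", "B", "AB"]
patients = ["O", "A", "AB", "O", "B"]


def indicators(value, categories):
    """Return one 0/1 indicator per category: 1 if value is that category, else 0."""
    return [1 if value == category else 0 for category in categories]


# One row of indicators (O, A, B, AB) per patient.
rows = [indicators(p, BLOOD_TYPES) for p in patients]

for p, row in zip(patients, rows):
    # Each patient belongs to exactly one category, so every row sums to 1.
    print(p, row, "row sum =", sum(row))
```

Because each unit falls into exactly one category, the row sum is always 1, which is the same statement as O + A + B + AB = 1.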

“Population Distribution of X” (somewhat idealized)

Population mean: μ
Population “standard deviation”: σ

μ (“mu”) and σ (“sigma”) are examples of parameters – nonrandom “population characteristics” whose exact values cannot be directly measured, but can (hopefully) be estimated from known “sample characteristics” – statistics.
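To fix the definitions, here is a minimal Python sketch that computes both parameters for a toy population. It assumes a small, fully known, hypothetical population of ages, which is exactly the situation the slides say we rarely have; real populations are treated as arbitrarily large and their parameters must be estimated.

```python
import statistics

# Hypothetical, fully known population of ages (an assumption for illustration only).
population = [22, 25, 31, 34, 40, 47, 53, 58, 64, 70]

# Population mean: the average over every unit in the population.
mu = statistics.mean(population)

# Population standard deviation: pstdev divides by N, the population size.
sigma = statistics.pstdev(population)

print("mu =", mu)
print("sigma =", round(sigma, 2))
```

Here `statistics.pstdev` is used rather than `statistics.stdev` because the list is treated as the whole population, not as a sample drawn from it.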

How do we infer information about the population variable X?

Random variable X (Example: X = Age), with “Population Distribution of X” (somewhat idealized), mean μ and standard deviation σ.

SAMPLE of size n: x_1, x_2, x_3, x_4, x_5, x_6, …etc…, x_n
x_1 = value of X for 1st individual
x_2 = value of X for 2nd individual
…etc….

“Parameter Estimation” / “Statistical Inference”

From a SAMPLE of size n, x_1, x_2, …, x_n, of the random variable X (Example: X = Age), compute the sample mean, an example of a statistic:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + \cdots + x_n}{n}$$

There are many potential random samples of a fixed size n, each with its own estimate of µ. It will eventually become important to understand the structure of their variability.
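The point about many potential samples can be made concrete with a short simulation. The sketch below is an illustration under stated assumptions: the population of ages is simulated (uniform integers from 18 to 90), and the sample size 25, the 1000 repetitions, and the helper name `sample_mean` are arbitrary choices, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of ages; in practice it is never fully observed.
population = [random.randint(18, 90) for _ in range(10_000)]
mu = statistics.mean(population)


def sample_mean(pop, n):
    """Draw a random sample of size n (without replacement) and return its mean."""
    sample = random.sample(pop, n)
    return sum(sample) / n


n = 25
# Each random sample of size n gives its own estimate of mu.
estimates = [sample_mean(population, n) for _ in range(1000)]

print("population mean mu:", round(mu, 2))
print("first five sample means:", [round(m, 2) for m in estimates[:5]])
print("spread of the sample means:", round(statistics.stdev(estimates), 2))
```

The sample means differ from one sample to the next but cluster around μ; the last printed line is one simple summary of that variability.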

Statistics are numerical values that are culled from a random sample of measurements taken from a specific population, in an effort to “summarize” its overall distribution, and estimate certain parameters (i.e., numerical characteristics) of that population.

Statistics – as a discipline – consists of a collection of formal testing procedures, designed to infer a conclusion regarding a specific hypothesis about the population, based on the sample data.

Statistics is sometimes referred to as the “search for sources of random variation” in a system. How much of a signal is genuinely significant information to be detected, and how much is random “noise”?

The “classical scientific method” provides a general framework for conducting formal statistical analysis.

