The two-way frequency table. The χ² statistic. Techniques for examining dependence amongst two categorical variables.


Situation: We have two categorical variables R and C. The number of categories of R is r. The number of categories of C is c. We observe n subjects from the population and count $x_{ij}$ = the number of subjects for which R = i and C = j. (R = rows, C = columns.)

Example: Both Systolic Blood Pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects. The categories for Blood Pressure are: < The categories for Cholesterol are: <

Table: two-way frequency table of Serum Cholesterol (rows) by Systolic Blood Pressure (columns), with row and column totals.

Figure: three-dimensional bar graph of the two-way frequency table.

Example: This comes from the drug use data. The two variables are: 1. Age (C) and 2. Antidepressant Use (R), measured for a sample of n = 33,957 subjects.

Two-way Frequency Table Percentage antidepressant use vs Age

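The χ² statistic for measuring dependence amongst two categorical variables. Define $E_{ij} = \frac{R_i C_j}{n}$ = expected frequency in the (i,j)th cell in the case of independence, where $R_i$ is the total for row i, $C_j$ is the total for column j, and n is the grand total.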

Observed frequencies:

          Columns
  Rows    1      2      3      4      5      Total
  1       x_11   x_12   x_13   x_14   x_15   R_1
  2       x_21   x_22   x_23   x_24   x_25   R_2
  3       x_31   x_32   x_33   x_34   x_35   R_3
  4       x_41   x_42   x_43   x_44   x_45   R_4
  Total   C_1    C_2    C_3    C_4    C_5    n

Expected frequencies:

          Columns
  Rows    1      2      3      4      5      Total
  1       E_11   E_12   E_13   E_14   E_15   R_1
  2       E_21   E_22   E_23   E_24   E_25   R_2
  3       E_31   E_32   E_33   E_34   E_35   R_3
  4       E_41   E_42   E_43   E_44   E_45   R_4
  Total   C_1    C_2    C_3    C_4    C_5    n

Justification: In the case of independence, the proportion in column j for row i equals the overall proportion in column j: $\frac{E_{ij}}{R_i} = \frac{C_j}{n}$, and the proportion in row i for column j equals the overall proportion in row i: $\frac{E_{ij}}{C_j} = \frac{R_i}{n}$. Both conditions give $E_{ij} = \frac{R_i C_j}{n}$.

The χ² statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij}$ = expected frequency in the (i,j)th cell in the case of independence and $x_{ij}$ = observed frequency in the (i,j)th cell.
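As a rough illustration of the two definitions above, the following sketch computes the expected frequencies $E_{ij} = R_i C_j / n$ and the χ² statistic for a small table. The 2 x 3 table of counts and the function names are made up for illustration; they are not the blood pressure or drug use data from the slides.

```python
def expected_frequencies(observed):
    """Return E[i][j] = R_i * C_j / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]          # R_i
    col_totals = [sum(col) for col in zip(*observed)]    # C_j
    n = sum(row_totals)                                  # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum (x_ij - E_ij)^2 / E_ij over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for x, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table of counts (assumption, not data from the slides).
observed = [[20, 30, 50],
            [30, 30, 40]]

expected = expected_frequencies(observed)
chi2 = chi_square(observed)

print(expected)
print(round(chi2, 4))
```

In this made-up table both rows have the same total, so both rows of expected frequencies are identical, which is exactly the "same distribution across each row" property of independence.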

Example: studying the relationship between Systolic Blood pressure and Serum Cholesterol In this example we are interested in whether Systolic Blood pressure and Serum Cholesterol are related or whether they are independent. Both were measured for a sample of n = 1237 cases

Table: observed frequencies, Serum Cholesterol (rows) by Systolic Blood Pressure (columns), with row and column totals.

Table: expected frequencies, Serum Cholesterol (rows) by Systolic Blood Pressure (columns), with row and column totals. In the case of independence, the distribution across a row is the same for each row, and the distribution down a column is the same for each column.

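Standardized residuals: $r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$. The χ² statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} r_{ij}^2$.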

Properties of the χ² statistic
1. The χ² statistic is never negative.
2. Small values of χ² are consistent with Rows and Columns being independent. In this case χ² will be in the range of (r – 1)(c – 1).
3. Large values of χ² indicate that Rows and Columns are not independent.
4. Later on we will discuss this in more detail (when we study Hypothesis Testing).

Example: This comes from the drug use data. The two variables are: 1. Role (C) and 2. Antidepressant Use (R), measured for a sample of n = 33,957 subjects.

Two-way Frequency Table Percentage antidepressant use vs Role

Calculation of χ²: the raw data and the expected frequencies.

The residuals and the calculation of χ².

Probability Theory Modelling random phenomena

Some counting formulae. Permutations: the number of ways that you can order n objects is n! = n(n-1)(n-2)(n-3)…(3)(2)(1). Example: the number of ways you can order the three letters A, B, and C is 3! = 3(2)(1) = 6: ABC, ACB, BAC, BCA, CAB, CBA.

Definition: 0! = 1. Reason: mathematical consistency. In many of the formulae given later, this definition leads to consistency.

Permutations: the number of ways that you can choose k objects from n objects in a specific order is $\frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$. Example: the number of ways you can choose two letters from the four letters A, B, C, D in a specific order is $\frac{4!}{2!} = 4 \cdot 3 = 12$:

AB BA AC CA AD DA BC CB BD DB CD DC

Example: Suppose that we have a committee of 10 people. We want to choose a chairman, a vice-chairman, and a treasurer for the committee. The chairman is chosen first, the vice-chairman second and the treasurer third. How many ways can this be done? Answer: $\frac{10!}{7!} = 10 \cdot 9 \cdot 8 = 720$.
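The ordered-selection count can be checked numerically. The sketch below evaluates n!/(n-k)! and, for the four-letter example, also enumerates the ordered pairs directly. The helper name `ordered_choices` is an illustrative choice, not something from the slides.

```python
import math
from itertools import permutations


def ordered_choices(n, k):
    """Number of ways to choose k objects from n in a specific order: n!/(n-k)!."""
    return math.factorial(n) // math.factorial(n - k)


# Enumerate every ordered pair of distinct letters from A, B, C, D.
letters = ["A", "B", "C", "D"]
pairs = ["".join(p) for p in permutations(letters, 2)]

# Chairman, vice-chairman and treasurer chosen in order from 10 people.
committee_ways = ordered_choices(10, 3)

print(pairs)
print(len(pairs), ordered_choices(4, 2))
print(committee_ways)
```

Taking k = n gives n!/0!, which equals n! only because 0! is defined to be 1.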

Example: How many ways can we order n objects? Answer: n!, or equivalently, choose n objects from n objects in a specific order: $\frac{n!}{(n-n)!} = \frac{n!}{0!} = n!$. This is what is meant by the statement that the definition 0! = 1 leads to mathematical consistency.

Combinations: the number of ways that you can choose k objects from n objects (order irrelevant) is $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$.

Example: the number of ways you can choose two letters from the four letters A, B, C, D is $\binom{4}{2} = \frac{4!}{2!\,2!} = 6$: {A,B}, {A,C}, {A,D}, {B,C}, {B,D}, {C,D}.

Example: Suppose we have a committee of 10 people and we want to choose a sub-committee of 3 people. How many ways can this be done? Answer: $\binom{10}{3} = \frac{10!}{3!\,7!} = 120$.

Example: Random sampling. Suppose we have a club of N = 1000 persons and we want to choose a sample of k = 250 of these individuals to determine their opinion on a given issue. How many ways can this be performed? Answer: $\binom{1000}{250}$. The choice of the sample is called random sampling if all of the choices have the same probability of being selected.
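A short sketch of the unordered counts in the examples above, using the factorial formula n!/(k!(n-k)!). The function name `n_choose_k` is illustrative. The number of possible samples of 250 from 1000 is far too large to write out by hand, so the code only reports how many digits it has.

```python
import math
from itertools import combinations


def n_choose_k(n, k):
    """Number of ways to choose k objects from n when order is irrelevant."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


# Enumerate the unordered pairs of letters from A, B, C, D.
subsets = list(combinations("ABCD", 2))

# Sub-committees of 3 from a committee of 10.
sub_committees = n_choose_k(10, 3)

# Possible samples of 250 from a club of 1000.
samples = n_choose_k(1000, 250)

print(subsets)
print(sub_committees)
print(len(str(samples)), "digits")
```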

Important Note: 0! is always defined to be 1. Also, the quantities $\binom{n}{k}$ are called Binomial Coefficients.

Reason: The Binomial Theorem, $(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$.

Binomial Coefficients can also be calculated using Pascal’s triangle
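One way to sketch the Pascal's triangle construction: each row starts and ends with 1, and every interior entry is the sum of the two entries above it. Row n then lists the binomial coefficients for k = 0, …, n. The function name `pascal_rows` is an illustrative choice.

```python
def pascal_rows(n_max):
    """Return rows 0..n_max of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        # Interior entries are sums of adjacent entries in the previous row.
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows


rows = pascal_rows(5)
for row in rows:
    print(row)
```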

Random Variables Probability distributions

Definition: A random variable X is a number whose value is determined by the outcome of a random experiment (random phenomenon).

Examples
1. A die is rolled and X = number of spots showing on the upper face.
2. Two dice are rolled and X = total number of spots showing on the two upper faces.
3. A coin is tossed n = 100 times and X = number of times the coin toss resulted in a head.
4. A person is selected at random from a population and X = weight of that individual.
5. A sample of n = 100 individuals is selected at random from a population (i.e. all samples of n = 100 have the same probability of being selected). X = the average weight of the 100 individuals.

In all of these examples X fits the definition of a random variable, namely: a number whose value is determined by the outcome of a random experiment (random phenomenon).

Random variables are either
Discrete
– Integer valued
– The set of possible values for X is a set of integers
Continuous
– The set of possible values for X is all real numbers
– Ranges over a continuum

Examples: Discrete
– A die is rolled and X = number of spots showing on the upper face.
– Two dice are rolled and X = total number of spots showing on the two upper faces.
– A coin is tossed n = 100 times and X = number of times the coin toss resulted in a head.

Examples: Continuous
– A person is selected at random from a population and X = weight of that individual.
– A sample of n = 100 individuals is selected at random from a population (i.e. all samples of n = 100 have the same probability of being selected). X = the average weight of the 100 individuals.

Probability distribution of a Random Variable

The probability distribution of a discrete random variable is described by its probability function p(x). p(x) = the probability that X takes on the value x.

Examples: Discrete
– A die is rolled and X = number of spots showing on the upper face.

  x      1     2     3     4     5     6
  p(x)   1/6   1/6   1/6   1/6   1/6   1/6

– Two dice are rolled and X = total number of spots showing on the two upper faces.

  x      2      3      4      5      6      7      8      9      10     11     12
  p(x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

Graphs: To plot a graph of p(x), draw bars of height p(x) above each value of x. Example: rolling a die

Rolling two dice

Note: 1. $0 \le p(x) \le 1$
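The probability function for the total on two dice can be rebuilt by listing all 36 equally likely outcomes and counting how many give each total. This is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each total, then convert counts to probabilities.
counts = Counter(a + b for a, b in outcomes)
p = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, px in p.items():
    print(x, px)

# Every probability lies between 0 and 1, and together they add up to 1.
print(all(0 <= px <= 1 for px in p.values()), sum(p.values()))
```

Note that `Fraction` prints values in lowest terms, so 6/36 appears as 1/6.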

The probability distribution of a continuous random variable is described by its probability density curve f(x),

i.e. a curve which has the following properties:
1. f(x) is never negative.
2. The total area under the curve f(x) is one.
3. The area under the curve f(x) between a and b is the probability that X lies between the two values.

An important discrete distribution: the Binomial distribution. Suppose we have an experiment with two outcomes – Success (S) and Failure (F). Let p denote the probability of S (Success). In this case q = 1 - p denotes the probability of Failure (F). Now suppose this experiment is repeated n times independently.

Let X denote the number of successes occurring in the n repetitions. Then X is a random variable. Its possible values are 0, 1, 2, 3, 4, …, (n – 2), (n – 1), n, and p(x) for any of the above values of x is given by: $p(x) = \binom{n}{x} p^x q^{n-x}$

X is said to have the Binomial distribution with parameters n and p.

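Summary: X is said to have the Binomial distribution with parameters n and p.
1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. The probability function is $p(x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, 2, …, n.

Examples: 1. A coin is tossed n = 5 times. X is the number of heads occurring in the 5 tosses of the coin. In this case p = ½ and $p(x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^5$:

  x      0      1      2       3       4      5
  p(x)   1/32   5/32   10/32   10/32   5/32   1/32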
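The coin example can be reproduced with a few lines of code. This sketch evaluates the binomial probability function with n = 5 and p = 1/2, using exact fractions so the values can be compared with thirty-seconds directly. The function name `binomial_pmf` is an illustrative choice.

```python
import math
from fractions import Fraction


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return math.comb(n, x) * p**x * q ** (n - x)


n, p = 5, Fraction(1, 2)
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, px in enumerate(pmf):
    print(x, px)
print(sum(pmf))
```

`Fraction` reduces to lowest terms, so 10/32 is displayed as 5/16.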

Random Variables: numerical quantities whose values are determined by the outcome of a random experiment.

Discrete Random Variables
Discrete Random Variable: A random variable usually assuming an integer value. A discrete random variable assumes values that are isolated points along the real line. That is, neighbouring values are not “possible values” for a discrete random variable.
Note: Usually associated with counting
– The number of times a head occurs in 10 tosses of a coin
– The number of auto accidents occurring on a weekend
– The size of a family

Continuous Random Variables
Continuous Random Variable: A quantitative random variable that can vary over a continuum. A continuous random variable can assume any value along a line interval, including every possible value between any two points on the line.
Note: Usually associated with a measurement
– Blood pressure
– Weight gain
– Height

Probability Distributions of a Discrete Random Variable

Probability Distribution & Function
Probability Distribution: A mathematical description of how probabilities are distributed among the possible values of a random variable.
Notes:
– The probability distribution allows one to determine probabilities of events related to the values of a random variable.
– The probability distribution may be presented in the form of a table, chart, or formula.
Probability Function: A rule that assigns probabilities to the values of the random variable.

Example: In baseball the number of individuals, X, on base when a home run is hit ranges in value from 0 to 3. The probability distribution is known and is given below: P(X = 2) = P(the random variable equals 2) = p(2). Note: This chart implies the only values X takes on are 0, 1, 2, and 3. If the random variable X is observed repeatedly, the probability p(x) represents the proportion of times the value x appears in that sequence.

A Bar Graph

Comments: Every probability function must satisfy:
1. The probability assigned to each value of the random variable must be between 0 and 1, inclusive: $0 \le p(x) \le 1$.
2. The sum of the probabilities assigned to all the values of the random variable must equal 1: $\sum_x p(x) = 1$.

Mean and Variance of a Discrete Probability Distribution: these describe the centre and spread of a probability distribution. The mean (denoted by the Greek letter μ (mu)) measures the centre of the distribution. The variance (σ²) and the standard deviation (σ) measure the spread of the distribution. σ is the Greek letter for s.

Mean of a Discrete Random Variable: The mean, μ, of a discrete random variable X is found by multiplying each possible value x by its own probability and then adding all the products together: $\mu = \sum_x x\,p(x)$
Notes:
– The mean is a weighted average of the values of X.
– The mean is the long-run average value of the random variable.
– The mean is the centre of gravity of the probability distribution of the random variable.

Variance and Standard Deviation
Variance of a Discrete Random Variable: The variance, σ², of a discrete random variable X is found by multiplying each possible value of the squared deviation from the mean, (x − μ)², by its own probability and then adding all the products together: $\sigma^2 = \sum_x (x - \mu)^2\,p(x)$
Standard Deviation of a Discrete Random Variable: The positive square root of the variance: $\sigma = \sqrt{\sigma^2}$

Example The number of individuals, X, on base when a home run is hit ranges in value from 0 to 3.

Computing the mean: Note: μ is the long-run average value of the random variable, and μ is the centre of gravity of the probability distribution of the random variable.

Computing the variance: Computing the standard deviation:
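The probabilities for the baseball example did not survive in this transcript, so the sketch below applies the same three computations to a different, fully known distribution: a single fair die, with p(x) = 1/6 for x = 1, …, 6. The function names are illustrative, and the die is a stand-in, not the example from the slides.

```python
import math
from fractions import Fraction


def mean(dist):
    """Weighted average: sum of x * p(x) over all values x."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Sum of squared deviations from the mean, weighted by p(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Stand-in distribution (assumption): one fair die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = mean(die)
var = variance(die)
sd = math.sqrt(var)   # positive square root of the variance

print(mu, var, round(sd, 4))
```

Any table of values and probabilities can be passed in the same way, as a dictionary mapping each x to p(x).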

The Binomial distribution
1. We have an experiment with two outcomes – Success (S) and Failure (F).
2. Let p denote the probability of S (Success).
3. In this case q = 1 - p denotes the probability of Failure (F).
4. This experiment is repeated n times independently.
5. X denotes the number of successes occurring in the n repetitions.

The possible values of X are 0, 1, 2, 3, 4, …, (n – 2), (n – 1), n, and p(x) for any of the above values of x is given by: $p(x) = \binom{n}{x} p^x q^{n-x}$. X is said to have the Binomial distribution with parameters n and p.

Summary: X is said to have the Binomial distribution with parameters n and p.
1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. The probability function is $p(x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, 2, …, n.

Example: 1. A coin is tossed n = 5 times. X is the number of heads occurring in the 5 tosses of the coin. In this case p = ½ and $p(x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^5$:

  x      0      1      2       3       4      5
  p(x)   1/32   5/32   10/32   10/32   5/32   1/32

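Computing the summary parameters for the distribution – μ, σ², σ

Computing the mean: $\mu = \sum_x x\,p(x) = \frac{80}{32} = 2.5$. Computing the variance: $\sigma^2 = \sum_x (x - \mu)^2\,p(x) = 1.25$. Computing the standard deviation: $\sigma = \sqrt{1.25} \approx 1.118$.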

Example: A surgeon performs a difficult operation n = 10 times. X is the number of times that the operation is a success. The success rate for the operation is 80%. In this case p = 0.80 and X has a Binomial distribution with n = 10 and p = 0.80.

Computing p(x) for x = 1, 2, 3, …, 10

The Graph

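Computing the summary parameters for the distribution – μ, σ², σ

Computing the mean: $\mu = \sum_x x\,p(x) = 8$. Computing the variance: $\sigma^2 = \sum_x (x - \mu)^2\,p(x) = 1.6$. Computing the standard deviation: $\sigma = \sqrt{1.6} \approx 1.265$.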
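For the surgeon example (n = 10, p = 0.80), the probability function and the three summary parameters can be computed directly from the definitions. This is a sketch with illustrative names. The final comparison with n·p and n·p·q uses the standard binomial shortcuts, which are well known but are not stated in the slides.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.80
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# Summary parameters computed from the definitions of mean and variance.
mu = sum(x * px for x, px in pmf.items())
var = sum((x - mu) ** 2 * px for x, px in pmf.items())
sd = math.sqrt(var)

for x, px in pmf.items():
    print(x, round(px, 4))
print(round(mu, 4), round(var, 4), round(sd, 4))

# Shortcut check (assumption: standard binomial identities, not from the slides).
print(math.isclose(mu, n * p), math.isclose(var, n * p * (1 - p)))
```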