# Correlations Revisited


Probability
"I think you're begging the question," said Haydock, "and I can see looming ahead one of those terrible exercises in probability where six men have white hats and six men have black hats and you have to work it out by mathematics how likely it is that the hats will get mixed up and in what proportion. If you start thinking about things like that, you would go round the bend. Let me assure you of that!"
Agatha Christie, The Mirror Crack'd

Misunderstanding of probability may be the greatest of all impediments to scientific literacy.
Stephen Jay Gould

The Personal Probability Interpretation
Personal probability of an event = the degree to which a given individual believes the event will happen. It is sometimes called subjective probability, because the degree of belief may differ from one individual to another. Restrictions on personal probabilities: they must fall between 0 and 1 (or between 0 and 100%), and they must be coherent (an individual's probabilities must not contradict one another).

Probability Definitions and Relationships
Sample space: All the possible outcomes that can occur. Simple event: one outcome in the sample space; a possible outcome of a random circumstance. Event: a collection of one or more simple events in the sample space; often written as A, B, C, and so on.

Assigning Probabilities
A probability is a value between 0 and 1 and is written either as a fraction or as a proportion. A probability simply is a number between 0 and 1 that is assigned to a possible outcome of a random circumstance. For the complete set of distinct possible outcomes of a random circumstance, the total of the assigned probabilities must equal 1.

Classical Approach
A mathematical index of the relative frequency or likelihood of the occurrence of a specific event. Based on games of chance, where the specific conditions of the game are known.

Determining the probability of an Outcome (Classical)
A Simple Lottery: Choose a three-digit number between 000 and 999. The player wins if his or her three-digit number is chosen. Suppose the 1000 possible three-digit numbers (000, 001, 002, ..., 999) are equally likely. In the long run, a player should win about 1 out of 1000 times, so the probability of winning is 1/1000 = 0.001. This does not mean a player will win exactly once in every thousand plays.

Example: Probability of Simple Events
Random Circumstance: A three-digit winning lottery number is selected. Sample Space: {000, 001, 002, 003, ..., 997, 998, 999}. There are 1000 simple events. Probabilities for Simple Events: Assume all three-digit numbers are equally likely, so the probability that any specific three-digit number is the winner is 1/1000. Event A = last digit is a 9 = {009, 019, ..., 999}. Since one out of every ten numbers is in this set, P(A) = 1/10. Event B = three digits are all the same = {000, 111, 222, 333, 444, 555, 666, 777, 888, 999}. Since event B contains 10 simple events, P(B) = 10/1000 = 1/100.
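The lottery example can be checked by listing the sample space and counting. The sketch below is only an illustration: the helper name `probability` is made up for this example, and it assumes, as the slide does, that all 1000 numbers are equally likely.

```python
from fractions import Fraction

# Sample space: the 1000 three-digit strings "000" .. "999" (assumed equally likely).
sample_space = [f"{n:03d}" for n in range(1000)]

def probability(event):
    """Classical probability: number of favourable outcomes / number of outcomes.

    `event` is a function that returns True for outcomes belonging to the event.
    """
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

p_a = probability(lambda s: s[-1] == "9")       # Event A: last digit is a 9
p_b = probability(lambda s: len(set(s)) == 1)   # Event B: all three digits the same

print(p_a, p_b)  # prints: 1/10 1/100
```

Using `Fraction` keeps the results exact, so they can be compared directly with the 1/10 and 1/100 stated above.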

Estimating Probabilities from Observed Categorical Data - Empirical Approach
Assuming data are representative, the probability of a particular outcome is estimated to be the relative frequency (proportion) with which that outcome was observed.

Methods of sampling
Simple random selection: every member of the population has an equal chance of being selected. Systematic: every Xth person. Stratified: random sampling by subgroup. Why?

Determining the probability of an Outcome – Empirical Approach
Observe the relative frequency of random circumstances. The Probability of Lost Luggage: “1 in 176 passengers on U.S. airline carriers will temporarily lose their luggage.” This number is based on data collected over the long run. So the probability that a randomly selected passenger on a U.S. carrier will temporarily lose luggage is 1/176, or about 0.006.

Proportions and Percentages as Probabilities
The proportion of passengers who lose their luggage is 1/176, or about 0.006 (6 out of 1000). About 0.6% of passengers lose their luggage. The probability that a randomly selected passenger will lose his or her luggage is about 0.006. The probability that you will lose your luggage is about 0.006. The last statement is not exactly correct: your probability depends on other factors (how late you arrive at the airport, etc.).

Example: Probability of Male versus Female Births
The long-run relative frequency of males born in the United States is about 0.512 (512 boys born per 1000 births). The table provides results of a simulation: the proportion is far from 0.512 over the first few weeks but in the long run settles down around 0.512.
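A small simulation can illustrate the same settling-down behaviour. This is a sketch, not the simulation behind the slide's table: it counts births rather than weeks, and the seed, the checkpoints, and the function name `running_proportions` are arbitrary choices made here.

```python
import random

P_BOY = 0.512            # long-run relative frequency quoted above
rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable

def running_proportions(n_births, checkpoints):
    """Simulate births one at a time and record the proportion of boys
    observed so far at each checkpoint."""
    boys = 0
    results = {}
    for i in range(1, n_births + 1):
        if rng.random() < P_BOY:
            boys += 1
        if i in checkpoints:
            results[i] = boys / i
    return results

props = running_proportions(100_000, {10, 100, 1_000, 10_000, 100_000})
for n, p in sorted(props.items()):
    print(f"after {n:>7} births: proportion of boys = {p:.4f}")
```

The early proportions can wander well away from 0.512, while the later ones should sit close to it.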

Nightlights and Myopia
Assuming these data are representative of a larger population, what is the approximate probability that someone from that population who sleeps with a nightlight in early childhood will develop some degree of myopia? Note: 79 of the 232 nightlight users developed some degree of myopia. So we estimate the probability to be 79/232 = 0.34.

Complementary Events
One event is the complement of another event if the two events do not contain any of the same simple events and together they cover the entire sample space. Notation: A^c represents the complement of A. Note: P(A) + P(A^c) = 1. Example: A Simple Lottery (cont.). A = player buying a single ticket wins. A^c = player does not win. P(A) = 1/1000, so P(A^c) = 999/1000.

Mutually Exclusive Events
Two events are mutually exclusive if they do not contain any of the same simple events (outcomes). Example: A Simple Lottery. A = all three digits are the same. B = the first and last digits are different. The events A and B are mutually exclusive.
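The difference between mutually exclusive events and complementary events can be seen by building the lottery events as sets. The sketch below assumes the same equally likely three-digit sample space; the variable names are illustrative only.

```python
from fractions import Fraction

# Sample space: three-digit strings "000" .. "999", assumed equally likely.
sample_space = {f"{n:03d}" for n in range(1000)}

A = {s for s in sample_space if len(set(s)) == 1}   # all three digits the same
B = {s for s in sample_space if s[0] != s[-1]}      # first and last digits differ

# Mutually exclusive: the events share no outcomes.
mutually_exclusive = A.isdisjoint(B)

# Complementary events must also cover the whole sample space between them.
covers_sample_space = (A | B) == sample_space

# The true complement of A is everything in the sample space that is not in A.
A_complement = sample_space - A
p_A = Fraction(len(A), len(sample_space))
p_A_complement = Fraction(len(A_complement), len(sample_space))

print(mutually_exclusive, covers_sample_space, p_A + p_A_complement)
```

Here A and B are mutually exclusive but not complements, because outcomes such as 010 belong to neither event.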

Independent and Dependent Events
Two events are independent of each other if knowing that one will occur (or has occurred) does not change the probability that the other occurs. Two events are dependent if knowing that one will occur (or has occurred) changes the probability that the other occurs.

Example Independent Events
Customers put a business card in a restaurant's glass bowl. A drawing is held once a week for a free lunch. You and Vanessa each put a card in the bowl in two consecutive weeks. Event A = you win in week 1. Event B = Vanessa wins in week 2. Events A and B refer to different random circumstances and are independent.

Example: Dependent Events
Event A = Alicia is selected to answer Question 1. Event B = Alicia is selected to answer Question 2. Events A and B refer to different random circumstances, but are A and B independent events? P(A) = 1/50. If event A occurs, her name is no longer in the bag, so the probability of B becomes 0. If event A does not occur, there are 49 names in the bag (including Alicia's name), so the probability of B becomes 1/49. Knowing whether A occurred changes the probability of B. Thus, the events A and B are not independent.
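The Alicia example can be worked through with exact fractions. The sketch assumes what the numbers above imply: 50 names in the bag and drawing without replacement. Combining the two cases with the law of total probability is an addition made here, not a step shown on the slide.

```python
from fractions import Fraction

n_names = 50  # names in the bag at the start (implied by P(A) = 1/50)

p_a = Fraction(1, n_names)                  # Alicia is drawn for Question 1
p_b_given_a = Fraction(0)                   # her name is gone, so she cannot be drawn again
p_b_given_not_a = Fraction(1, n_names - 1)  # 49 names remain, including hers

# Law of total probability: weight each case by how likely that case is.
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Independence would require the conditional probability to equal the overall one.
independent = (p_b_given_a == p_b) and (p_b_given_not_a == p_b)

print(p_b, independent)  # prints: 1/50 False
```

Before anything is drawn, Alicia's overall chance of answering Question 2 is still 1/50; it is the knowledge of what happened on Question 1 that changes it.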

Joint and Marginal Probabilities
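These probabilities express the frequency of an event as a fraction of the total. A joint probability is the proportion of the total in which two events occur together; a marginal probability is the proportion of the total in which a single event occurs, regardless of the other.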

Unions and intersections
P(A ∪ B) ≠ P(A) + P(B) because A and B overlap. P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A ∩ B is the intersection of A and B; it includes everything that is in both A and B, and is counted twice if we add P(A) and P(B).
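The addition rule can be verified by counting, reusing two events from the earlier lottery example (last digit is a 9; all three digits the same). The choice of these two events is only for illustration; they overlap in the single outcome 999.

```python
from fractions import Fraction

# Sample space: three-digit strings "000" .. "999", assumed equally likely.
sample_space = {f"{n:03d}" for n in range(1000)}

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event), len(sample_space))

A = {s for s in sample_space if s[-1] == "9"}       # last digit is a 9
B = {s for s in sample_space if len(set(s)) == 1}   # all three digits the same

p_union_direct = prob(A | B)                       # count the union directly
p_union_rule = prob(A) + prob(B) - prob(A & B)     # addition rule
p_naive_sum = prob(A) + prob(B)                    # double-counts the overlap

print(p_union_direct, p_union_rule, p_naive_sum)   # prints: 109/1000 109/1000 11/100
```

The naive sum is too large by exactly P(A ∩ B) = 1/1000, the probability of the outcome counted twice.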

Conditional Probability
Consider two events A and B. What is the probability of A, given the information that B occurred? P(A | B) = ? Example: What is the probability that a woman is married, given that she is 18 to 29 years old?

Probability Problems
P(Married | 18-29) = 7842/22,512 ≈ 0.348

Conditional probability and independence
If we know that one event has occurred, it may change our view of the probability of another event. Let A = {rain today}, B = {rain tomorrow}, C = {rain in 90 days' time}. It is likely that knowledge that A has occurred will change your view of the probability that B will occur, but not of the probability that C will occur. We write P(B|A) ≠ P(B), P(C|A) = P(C). P(B|A) denotes the conditional probability of B, given A. We say that A and C are independent, but A and B are not. Note that for independent events P(A ∩ C) = P(A)P(C).

Consider the classic data set on the next Slide consisting of forecasts and observations of tornados (Finley, 1884). Let F = {Tornado forecast} T = {Tornado observed} Use the frequencies in the table to estimate probabilities – it’s a large sample, so estimates should not be too bad.

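Forecasts of tornados
Each number in the table is a count of occasions. The counts implied by the probabilities that follow are: 2803 occasions in total; a tornado was forecast on 100 of them (28 with a tornado observed, 72 without) and not forecast on 2703 (23 with a tornado observed, 2680 without).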

P(T) = 51/2803 ≈ 0.0182. P(T ∩ F) = 28/2803 ≈ 0.0100. P(T|F) = 28/100 = 0.28. P(T|F^c) = 23/2703 ≈ 0.0085. Knowledge of the forecast changes P(T), so F and T are not independent. P(F|T) = 28/51 ≈ 0.549. P(T|F) and P(F|T) are often confused but are different quantities, and can take very different values.
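The tornado probabilities can be reproduced from four counts. The sketch below takes the totals that appear in the fractions above (2803 occasions, 51 tornados observed, 100 tornados forecast, 28 both) and derives the rest by subtraction; the variable names are illustrative.

```python
# Counts taken from the fractions quoted above.
n_total = 2803     # all forecast occasions
n_T = 51           # tornado observed
n_F = 100          # tornado forecast
n_T_and_F = 28     # tornado forecast and observed

# Counts derived by subtraction.
n_not_F = n_total - n_F            # no tornado forecast
n_T_and_not_F = n_T - n_T_and_F    # tornado observed without a forecast

p_T = n_T / n_total                          # marginal probability of a tornado
p_T_and_F = n_T_and_F / n_total              # joint probability
p_T_given_F = n_T_and_F / n_F                # conditional on a forecast
p_T_given_not_F = n_T_and_not_F / n_not_F    # conditional on no forecast
p_F_given_T = n_T_and_F / n_T                # the "reverse" conditional

for name, value in [("P(T)", p_T), ("P(T and F)", p_T_and_F),
                    ("P(T|F)", p_T_given_F), ("P(T|not F)", p_T_given_not_F),
                    ("P(F|T)", p_F_given_T)]:
    print(f"{name:12s} = {value:.4f}")
```

Note that the two conditional probabilities divide the same joint count, 28, by different totals (100 forecasts versus 51 tornados), which is why P(T|F) and P(F|T) differ.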

Continuous and discrete random variables
A continuous random variable is one which can (in theory) take any value in some range, for example crop yield or maximum temperature. A discrete variable has a countable set of values. They may be counts, such as numbers of accidents; categories, such as much above average, above average, near average, below average, much below average; or binary variables, such as dropout/no dropout.

Probability distributions
If we measure a random variable many times, we can build up a distribution of the values it can take. Imagine an underlying distribution of values which we would get if it was possible to take more and more measurements under the same conditions. This gives the probability distribution for the variable.

Continuous probability distributions
Because continuous random variables can take all values in a range, it is not possible to assign probabilities to individual values. Instead we have a continuous curve, called a probability density function, which allows us to calculate the probability of a value falling within any interval. This probability is calculated as the area under the curve between the values of interest. The total area under the curve must equal 1.

Normal (Gaussian) distributions
Normal (also known as Gaussian) distributions are by far the most commonly used family of continuous distributions. They are ‘bell-shaped’ and are indexed by two parameters. The mean μ: the distribution is symmetric about this value. The standard deviation σ: this determines the spread of the distribution. Roughly 2/3 of the distribution lies within 1 standard deviation of the mean, and about 95% within 2 standard deviations.
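The "roughly 2/3" and "95%" figures can be checked as areas under the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `area_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

within_1 = area_within(1)
within_2 = area_within(2)

print(f"within 1 sd: {within_1:.4f}")
print(f"within 2 sd: {within_2:.4f}")
```

Because every normal distribution is a shifted and rescaled copy of the standard one, the same proportions hold for any mean and standard deviation.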

The probability of continuous variables
IQ test: mean = 100 and sd = 15. What is the probability of randomly selecting an individual with a test score of 130 or greater? P(X ≤ 95)? P(X ≥ 112)? P(X ≤ 95 or X ≥ 112)?
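One way to answer these questions is to treat the scores as normally distributed, which is an assumption made here based on the surrounding slides rather than something the slide states. The sketch uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Assumed model: scores are normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

p_ge_130 = 1 - iq.cdf(130)   # P(X >= 130): upper-tail area beyond 130
p_le_95 = iq.cdf(95)         # P(X <= 95): lower-tail area up to 95
p_ge_112 = 1 - iq.cdf(112)   # P(X >= 112): upper-tail area beyond 112

# "X <= 95" and "X >= 112" cannot both happen, so their probabilities add.
p_either = p_le_95 + p_ge_112

print(f"P(X >= 130)            = {p_ge_130:.4f}")
print(f"P(X <= 95)             = {p_le_95:.4f}")
print(f"P(X >= 112)            = {p_ge_112:.4f}")
print(f"P(X <= 95 or X >= 112) = {p_either:.4f}")
```

For a continuous variable, P(X ≥ 130) and P(X > 130) are the same, since any single value has probability 0.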

The probability of continuous variables (cont.)
What is the probability of randomly selecting three people with a test score greater than 112? Remember the multiplication rule for independent events.
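The multiplication rule turns this into a one-line calculation. The sketch again assumes normally distributed scores with mean 100 and sd 15, and that the three people are selected independently of one another.

```python
from statistics import NormalDist

# Assumed model: scores are normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Probability that one randomly selected person scores above 112.
p_one = 1 - iq.cdf(112)

# Multiplication rule for independent events: P(all three) = p * p * p.
p_all_three = p_one ** 3

print(f"P(one person above 112)   = {p_one:.4f}")
print(f"P(all three above 112)    = {p_all_three:.4f}")
```

The result is a little under 1%, much smaller than the probability for a single person, because all three events must occur together.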