Probability Distributions CSLU 2850.Lo1 Spring 2008 Cameron McInally firstname.lastname@example.org Fordham University May contain work from the Creative Commons.
Probability Distributions First, we will cover Probability, which is the foundation for understanding and interpreting these statistics. Next, we will be introduced to statistical inference, which uses summary statistics to help you reach conclusions about your data.
Probability Distributions Sometimes it is easy to make predictions. The speed at which objects fall. The reaction when two chemicals are mixed. What will happen if you flip a coin. Well, we must be more specific here. We know that it will land on heads or tails. But, we do not know which one it will land on.
Probability Distributions Random Phenomenon A phenomenon where we know all the possible outcomes, but we do not know which outcome we will get next. Individual outcomes are uncertain but follow a general pattern of occurrences. We study random phenomena to quantify the general pattern of occurrences, in order to make general predictions.
Probability Distributions Theoretical Probability The likelihood that an event will occur.
Probability Distributions Relative frequency To determine a probability, we can repeatedly perform the action and measure the outcomes.
Probability Distributions Law of Large Numbers As the number of replications increases, the relative frequency will approach the probability of the event. What happens if we perform an infinite number of trials?
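The Law of Large Numbers can be tried out with a short simulation. The sketch below is only an illustration: it assumes a fair coin (probability 0.5 of heads), and the function name `relative_frequency` and the fixed seed are choices made for this example, not part of the lecture.

```python
import random


def relative_frequency(num_flips, seed=0):
    """Flip a simulated fair coin num_flips times; return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


# The relative frequency should settle near 0.5 as the number of flips grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

With only a handful of flips the relative frequency can be far from 0.5; with many flips it should sit close to it.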
Probability Distributions Probability Distribution Contains two important elements 1) The probability of each event (or combination of events) must lie between 0 and 1. 2) The sum of the probabilities of all possible events must be equal to 1.
Probability Distributions Discrete Probability Distribution The probabilities are associated with a series of discrete outcomes. E.g. rolling dice, flipping coins. Rolling a die can be written as: $P(y) = \frac{1}{6}, \quad y = 1, 2, \ldots, 6$ This is read as: “The probability of y equals one-sixth, for values of y from 1 to 6.”
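As a small sketch, the fair-die distribution can be written as a table of outcomes and probabilities and checked against the two requirements of a probability distribution. The names `die` and `is_valid_distribution` are made up for this illustration.

```python
from fractions import Fraction

# P(y) = 1/6 for y = 1, ..., 6 (a fair six-sided die)
die = {y: Fraction(1, 6) for y in range(1, 7)}


def is_valid_distribution(dist):
    """Check both requirements: every probability is in [0, 1] and the total is 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    return in_range and sum(dist.values()) == 1


print(is_valid_distribution(die))
# Probability of a combination of events: rolling an even number.
print(die[2] + die[4] + die[6])
```

Exact fractions are used instead of floating-point numbers so that the sum comes out to exactly 1.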
Probability Distributions Discrete Probability Distribution In this picture, the lines that connect points are simply for reference. The function only has values on the integer inputs.
Probability Distributions Poisson Distribution Discrete does not mean finite. The Poisson Distribution is one case. It is a discrete probability distribution that covers an infinite number of possible outcomes.
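To see what "discrete but infinite" means, here is a minimal sketch using the standard Poisson formula, P(Y = y) = e^(-λ) λ^y / y! for y = 0, 1, 2, and so on. The rate λ = 3 is an arbitrary value picked for the illustration, and `poisson_pmf` is a name chosen for this example.

```python
import math


def poisson_pmf(y, lam):
    """Probability of observing exactly y events when the average rate is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


lam = 3.0  # illustrative rate, not taken from the lecture

# Every non-negative integer has a positive probability, yet the running
# total of the probabilities approaches 1 as more outcomes are included.
running_total = 0.0
for y in range(0, 21):
    running_total += poisson_pmf(y, lam)
    if y % 5 == 0:
        print(y, poisson_pmf(y, lam), running_total)
```

No outcome is ever impossible, but the probabilities shrink fast enough that their infinite sum is still 1.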
Probability Distributions Continuous Probability Distribution Probabilities are assigned to a continuous range of values instead of distinct individual values. E.g. playing darts. Positive probabilities cannot be assigned to a specific value in a continuous distribution. The probability of a specific value is zero. It is important for you to realize why this is so. Meditate on it!!!
Probability Distributions Probability Density Function (PDF) Used to describe continuous probability distributions. When we plot the PDF against the range of possible values, we get a curve whose height is greatest at the most likely values. The probability associated with a range of values is equal to the area under the PDF curve. So, what value must the total area under the curve equal?
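The idea that probability is area under the curve can be sketched numerically. The example below assumes a very simple density that spreads probability evenly over the interval from 0 to 2 (so its height is 0.5). The density, the function names, and the trapezoid rule used to approximate the area are all choices made for this illustration.

```python
def uniform_pdf(y):
    """Density that spreads probability evenly over the interval [0, 2]."""
    return 0.5 if 0 <= y <= 2 else 0.0


def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    if a == b:
        return 0.0  # a single point has no width, so it has no area
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width


print("whole range:", area_under(uniform_pdf, 0, 2))
# Shrinking the range around y = 1 shrinks the probability toward zero.
for half_width in (0.1, 0.01, 0.001, 0.0):
    print(half_width, area_under(uniform_pdf, 1 - half_width, 1 + half_width))
```

As the range around a single value narrows, the area (and so the probability) shrinks to zero, which is why a specific value in a continuous distribution has probability zero.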
Probability Distributions Random Variables A variable whose values occur at random. A Discrete Random Variable comes from a Discrete Probability Distribution. A Continuous Random Variable comes from a Continuous Probability Distribution. We reserve Uppercase letters for Random Variables. Lowercase letters are reserved for the values a Random Variable may attain.
Probability Distributions Random Variables For example, let’s take the case of flipping a coin. The probability that a random variable Y can have a value y is: $P(Y = y) = \frac{1}{2}$, for y = heads or tails.
Probability Distributions Random Variables When we perform an experiment and a Random Variable actually obtains a value, we call this an observation. A collection of observations is called a Sample. If observations are generated at random, with no bias, then the sample can be called a Random Sample.
Probability Distributions Random Samples When we take samples, usually we want the sample to be random. For instance, let’s say we stopped people on the street and asked them what their net worth is. In this case, net worth is the Random Variable. If we selected only people in their mid-twenties, then our observations are going to be biased. Our sample does not take into account those who are too young to work, those who are retired, and many other variations of people.
Probability Distributions Random Samples As we take more observations for our random sample, the distribution of the sample should approach the actual distribution of values.
Probability Distributions Normal Distribution Arguably, the most important probability distribution in Statistics. Comes in the form of a Bell-shaped curve. The green line is the standard normal distribution.
Probability Distributions Normal Probability Distribution Function $f(y) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(y-\mu)^2 / (2\sigma^2)}$ This equation can appear daunting. There are two main points to get out of it!!! f(y) is the common way to represent a probability function.
Probability Distributions Normal Probability Distribution Function Things to get out of this equation: 1. $\mu$, pronounced mu, is the mean or center of the distribution. 2. $\sigma$, pronounced sigma, measures the standard deviation or spread of the distribution. These are two important parameters in the function. They define the shape and the values of the distribution.
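The effect of the two parameters can be explored with a few lines of code. This sketch writes out the standard normal density formula directly. The function name `normal_pdf` and the parameter values tried at the bottom are arbitrary choices for illustration.

```python
import math


def normal_pdf(y, mu=0.0, sigma=1.0):
    """Height of the normal curve at y, for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((y - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Changing mu slides the curve left or right; the peak is always at y = mu.
print(normal_pdf(0.0, mu=0.0, sigma=1.0), normal_pdf(5.0, mu=5.0, sigma=1.0))

# Increasing sigma spreads the curve out, so the peak gets lower.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, normal_pdf(0.0, mu=0.0, sigma=sigma))
```

The curve keeps the same bell shape throughout; mu only moves it and sigma only stretches or squeezes it.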
Probability Distributions Normal Probability Distribution Function We will work with a normal probability distribution and the effect of different sigma and mu values, later, in the computer lab. Let’s try to talk through it now though.
Probability Distributions Normal Distributions About what percentage of values lie within 1 standard deviation of the mean? Within 2 standard deviations? How about 3 standard deviations?
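These questions can be checked numerically. For a normal distribution, the fraction of values within k standard deviations of the mean is erf(k / √2), where erf is the error function from Python's `math` module. The helper name `fraction_within` is made up for this sketch.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 2), "percent")
```

The results are roughly 68%, 95%, and 99.7%, and they do not depend on the particular mean or standard deviation.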
Probability Distributions Normal Probability Plot Since Normal Distributions are powerful, it is convenient to be able to check if our data follows a normal distribution. We do this by calculating the Normal Score. A Normal Score is the value you would expect if your sample came from a standard normal distribution.
Probability Distributions Normal Scores Example For a sample size of 5, there are 5 normal scores: -1.163, -0.495, 0, 0.495, 1.163 Imagine generating many samples of standard normal data, with 5 observations each. Take the average of the lowest numbers, then the second lowest, and so forth. These averages are the Normal Scores.
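The "imagine generating many samples" recipe can be carried out directly. This is a rough simulation sketch: the number of samples (20,000), the seed, and the function name are arbitrary choices, and the results only approximate the exact normal scores.

```python
import random


def simulated_normal_scores(sample_size, num_samples=20_000, seed=0):
    """Average the lowest, second lowest, ... values over many standard normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = [0.0] * sample_size
    for _ in range(num_samples):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(sample_size))
        for rank, value in enumerate(sample):
            totals[rank] += value
    return [total / num_samples for total in totals]


# Compare with the slide's values: -1.163, -0.495, 0, 0.495, 1.163
print([round(score, 2) for score in simulated_normal_scores(5)])
```

With enough simulated samples, the averages should land close to the five normal scores listed above.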
Probability Distributions Normal Scores Example A Normal Score tells us how many standard deviations above or below the mean we expect the observation of that rank to be. Once you have the Normal Scores, plot these against the 5 observations from your sample, sorted from lowest to highest. This is called the Normal Probability Plot. If your data is normally distributed, the points should fall close to a straight line.
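A sketch of how the points of a Normal Probability Plot could be built without drawing anything. The five observations are made-up numbers for illustration only. Using the correlation of the points as a quick "how straight is the line" check is an addition for this example, not a method stated in the lecture.

```python
import math

NORMAL_SCORES_5 = [-1.163, -0.495, 0.0, 0.495, 1.163]  # from the slide above


def plot_points(observations):
    """Pair each normal score with the observation of the same rank."""
    return list(zip(NORMAL_SCORES_5, sorted(observations)))


def straightness(points):
    """Correlation of the plotted points; values near 1 mean nearly a straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


sample = [12.1, 9.4, 10.2, 8.7, 10.9]  # hypothetical observations
points = plot_points(sample)
print(points)
print(straightness(points))
```

The pairs printed first are exactly what would be plotted: normal score on one axis, sorted observation on the other.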
Probability Distributions Parameters and Estimators We have previously assumed that we know the parameter values of our probability distributions, i.e. mean and standard deviation. In most cases, we do not know these parameters. So, we have to use our data to estimate them. We call these Estimators.
Probability Distributions Estimators In a normal distribution, we have two parameters: $\mu$ and $\sigma$. We can estimate the value of $\mu$ by calculating the sample average. We call it $\bar{y}$. We can estimate the value of $\sigma$ by calculating the sample standard deviation. We call it s.
Probability Distributions Consistent Estimators The values $\bar{y}$ and s have a special property. They are consistent estimators. This means, by the law of large numbers, as you increase the size of the random sample, the values of these estimators come closer and closer to the actual parameter values.
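Consistency can be watched in a simulation where, unlike in real data, we know the true parameters. The sketch below assumes a normal distribution with mean 50 and standard deviation 10. Both values, the seed, and the function name `estimate` are arbitrary choices for illustration.

```python
import random
import statistics


def estimate(n, mu=50.0, sigma=10.0, seed=0):
    """Draw a random sample of size n and return (sample average, sample std dev s)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.mean(sample), statistics.stdev(sample)


# As n grows, the estimates should settle near the true values 50 and 10.
for n in (5, 50, 500, 20_000):
    print(n, estimate(n))
```

`statistics.stdev` divides by n - 1, which is the usual definition of the sample standard deviation s.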
Probability Distributions Standard Error The Standard Error is the standard deviation of the sample average. Often, we do not know the true standard deviation, so the Standard Error has to be estimated also. The estimated standard error of the mean is the standard deviation of the sample divided by the square root of the number of observations in the sample: $s / \sqrt{n}$.
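A minimal sketch of the calculation: the sample standard deviation divided by the square root of the number of observations. The eight observations are made-up numbers, and `standard_error` is a name chosen for this example.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s divided by the square root of n."""
    s = statistics.stdev(sample)  # sample standard deviation
    n = len(sample)               # number of observations
    return s / math.sqrt(n)


observations = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(statistics.mean(observations), standard_error(observations))
```

Because n sits under a square root, quadrupling the number of observations only halves the standard error.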
Probability Distributions Central Limit Theorem This is the 2nd most important theorem in Statistics; the first is the Law of Large Numbers. This theorem states that the distribution of an average tends to be normal, even when the distribution the observations come from is non-normal. So, if we continually take samples of increasing size, the distribution of the sample averages will come closer and closer to a normal distribution.
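The Central Limit Theorem can be illustrated by averaging values from a clearly non-normal distribution. This sketch assumes an exponential distribution (strongly skewed) as the starting point and uses "fraction of values within one standard deviation of the center" as a rough normality check, since a normal distribution gives about 68%. Both choices, and all names and sizes, are assumptions made for the illustration.

```python
import random
import statistics


def sample_means(sample_size, num_samples=5_000, seed=0):
    """Averages of many samples drawn from a skewed (exponential) distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


def fraction_within_one_sd(values):
    """Fraction of values within one standard deviation of their own average."""
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    return sum(1 for v in values if abs(v - center) <= spread) / len(values)


# Sample size 1 is just the raw skewed data; larger samples give averages
# whose distribution should look more and more normal (fraction near 0.68).
for size in (1, 5, 50):
    print(size, fraction_within_one_sd(sample_means(size)))
```

For the raw exponential data the fraction is noticeably higher than 68%; for averages of 50 observations it should be close to the normal value.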
Probability Distributions Homework (Always Due in One Week) Read Chapter 5. Complete Chapter 5, pg. 201: 1(a-c), 2, 4(a-c), 5(a-d).