# The T Distribution ©Dr. B. C. Paul 2005



## Wasn’t the Herby Assembly Line Problem Fun?

- But there is one little problem.
- We knew that our mean value could have been all over the map relative to the real true mean.
- We calculated our standard deviation from the same sample.
- How come our mean could be anything and yet our standard deviation is God’s own value for the standard deviation?

## It Isn't

- When our value for the standard deviation is just an estimate, we have another chance for things to be way out in the tails.
- Sadisticians (whoops, I mean statisticians) figured out the probability distribution for what would happen then.
  - Called it the T distribution.
  - First published in 1908, perfected in 1926.
- We look up values for areas under the curve of a T distribution just like we did with a normal distribution.

## Let’s Redo Herby’s Problem Right This Time

- We will use the T distribution.
- S is the estimated standard deviation.
- The test statistic has a T distribution (assuming the underlying population really is normally distributed).
- The distribution has n-1 degrees of freedom.
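
For reference, the usual one-sample form of the test statistic is sketched below. The symbols $\bar{x}$ (sample mean), $\mu_0$ (the hypothesized true mean) and $n$ (number of samples) are labels assumed here; the transcript itself only names $S$ and the $n-1$ degrees of freedom.

$$
t = \frac{\bar{x} - \mu_0}{S / \sqrt{n}}, \qquad \text{degrees of freedom} = n - 1
$$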

## Degrees of Freedom!

What are you talking about? This isn’t an Amnesty International class.

- Consider the number of equations and the number of unknowns.
  - To uniquely solve for 3 unknowns you need 3 independent equations.
- Each sample is like an equation.
- If I have one sample, I first use it as an estimate of the mean.
  - I can’t calculate a standard deviation; I don’t have enough data.
- If I have two samples:
  - I can estimate the standard deviation and still have one degree of freedom to measure something else.
  - Happens to be the mean.
- How much extra data do I have above the bare minimum?
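
The counting argument can be tried out with Python's standard library, as a rough sketch. The helper names and the two measurements are hypothetical, chosen only for illustration; `statistics.stdev` is the standard sample standard deviation, which divides by n-1.

```python
import statistics

def degrees_of_freedom(samples):
    """Samples left over after one is spent on estimating the mean."""
    return len(samples) - 1

def estimated_std(samples):
    """Sample standard deviation S, or None when there is no data to spare."""
    if degrees_of_freedom(samples) < 1:
        return None  # one sample gives a mean but no measure of spread
    return statistics.stdev(samples)  # divides by n - 1

one = [3.8]        # hypothetical single measurement
two = [3.8, 4.4]   # hypothetical pair of measurements

print(degrees_of_freedom(one), estimated_std(one))
print(degrees_of_freedom(two), estimated_std(two))
```

With one sample the function returns `None` because nothing is left after estimating the mean. With two samples there is one degree of freedom, and a standard deviation can be estimated.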

## So How Do I Use This?

(I have a really bad feeling you’re going to tell me.)

- Note that this table is set up differently from the table of Z values for the normal distribution.
- The area under the curve comes from the top line.
- The degrees of freedom come from the side.
- The value in the middle is the T value (equivalent to the Z value).
- Remember that in the normal table the Z value was on the edge and the area under the curve was in the middle of the table.

## Let’s Do the Problem

- X = 3.8
- S = 0.73
- N = 7

OK, so what is t?

## Finding t

- If we do this as a two tailed test (i.e. we would be concerned if our balls were too hard or too soft) we can only have 2.5% in each tail.
  - Pick 97.5.
- We have 7 samples, hence n-1 or 6 degrees of freedom.
- Read into the table: 2.45.
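
The lookup can be mimicked in a few lines of Python, as a minimal sketch. The excerpt below is taken from a standard T table (it is not the table image from the slide), and the names `AREAS`, `T_TABLE`, `lookup_t` and `two_tailed_t` are hypothetical.

```python
# Excerpt of a standard T table: area to the left across the top,
# degrees of freedom down the side, T value in the middle.
AREAS = (0.95, 0.975, 0.995)
T_TABLE = {
    1: (6.314, 12.706, 63.657),
    2: (2.920, 4.303, 9.925),
    3: (2.353, 3.182, 5.841),
    4: (2.132, 2.776, 4.604),
    5: (2.015, 2.571, 4.032),
    6: (1.943, 2.447, 3.707),
}

def lookup_t(area, df):
    """Read the T value for a given area (top line) and degrees of freedom (side)."""
    return T_TABLE[df][AREAS.index(area)]

def two_tailed_t(alpha, n_samples):
    """Split alpha across both tails, then look up with n - 1 degrees of freedom."""
    area = round(1 - alpha / 2, 4)  # 5% alpha -> 2.5% per tail -> 97.5% area
    return lookup_t(area, n_samples - 1)

t_value = two_tailed_t(0.05, 7)
print(round(t_value, 2))
```

For 7 samples and a 5% alpha level this reads the 97.5% column on the row for 6 degrees of freedom, which is the 2.45 quoted on the slide once rounded to two decimals.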

## Plug and Chug

- The result is 4.48.
- We can still reject the null hypothesis with an alpha level of 5%, but it is now much closer than before.
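
The arithmetic behind the 4.48 is not in the transcript. One reading that reproduces it, offered here as an assumption, is the upper end of the two tailed interval around the sample mean, X + t·S/√N, compared against the 4.5 value mentioned later in the presentation. The comparison with the normal value 1.96 is likewise an assumption about what "before" refers to.

```python
import math

x_bar, s, n = 3.8, 0.73, 7   # values from the slide
t_value = 2.45               # T table, 97.5% area, 6 degrees of freedom
z_value = 1.96               # normal table, 97.5% area
target = 4.5                 # value quoted later in the presentation (assumed target)

std_error = s / math.sqrt(n)             # estimated standard deviation of the mean
upper_t = x_bar + t_value * std_error    # upper limit using the T distribution
upper_z = x_bar + z_value * std_error    # upper limit using the normal distribution

print(f"upper limit with T: {upper_t:.2f}")
print(f"upper limit with Z: {upper_z:.2f}")
print("reject null hypothesis:", upper_t < target)
```

Under this reading the T based limit lands just under 4.5, while the normal based limit leaves more room, which matches the remark that the result is now much closer.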

## Some Observations About Degrees of Freedom and the T Statistic

- 95% of a normal distribution is within 1.96 standard deviations of the mean.
- 95% of a T distribution is within 2.45 estimated standard deviations of the mean if the standard deviation estimate came from 7 samples.
  - With 20 samples it is 2.09 estimated standard deviation units.
  - With 50 samples it is 2.01.
  - With 100 samples it is 1.98.
  - With 500 samples it is 1.96.
- Note that as the number of samples increases the T distribution converges to a normal distribution.
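
These values can be checked numerically. The sketch below uses only the Python standard library: it integrates the T density with Simpson's rule and finds the 97.5% point by bisection. The function names are hypothetical helpers written for this illustration, and in practice a statistics library would normally do this job.

```python
import math

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """Area under the T curve to the left of x (Simpson's rule from 0 to |x|)."""
    if x == 0:
        return 0.5
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_critical(area, df):
    """T value with the given area (at least 0.5) to its left, by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < area:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n_samples in (7, 20, 50, 100, 500):
    print(n_samples, round(t_critical(0.975, n_samples - 1), 2))
```

Each line pairs a sample count with the two tailed 95% multiplier for n-1 degrees of freedom, so the drift toward the normal value of 1.96 can be seen directly.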

## So When Do I Use a T Distribution?

- The underlying population must be realistic to model as having a normal distribution.
- The standard deviation of the population must have been estimated from a standard deviation calculation using a sample of the population.
- You can get out of using the T distribution and pretend that God gave you the standard deviation if you used about 100 or more samples to calculate your estimate of the standard deviation.
  - People with a lot of experience with a distribution often ignore the T distribution completely because they have seen results from hundreds of samples.
  - They are not “doing it wrong” using a simple normal distribution if they have that kind of data supporting their standard deviation value.

## Common Cheating on Random Samples

- Experiments should be planned before we look at the data.
- If we look at the data and then decide what the experiment should have been, we are “political spin doctors”, not scientists.
  - A spin doctor looks at a result and then tries to make it say what he wants.
  - A scientist sets up the test and lets the truth be whatever it is.
- Often we had a theory that made us want to look deeper.
  - Many theories are based on observations.
  - But the scientific method causes you to then plan an experiment and go out and get the data you need to test the theory.
- It’s a subtle difference, but it’s often ignored.
- The doctrine of “political correctness” is causing us all to lose our integrity.

## Back to Herby and the Two Tailed Test

- If it is true that hard balls make no difference, only soft ones, then the test should have been set up as one tailed only.
- If the concern was the line being out of spec and that causing unhappy customers, we could not know the sample would come out below 4.5 unless we peeked first.
  - If at that point we decided we only cared about soft balls, we distort the reliability of our analysis.
  - The data would have not only determined what the values of the test statistics were; it would have determined the test.
  - Normal distribution theory only accounts for the data determining the test statistic.
  - We in fact do not have good models for exactly what the consequences are if we let the data set up the test. We can say we are taking a chance of something bad happening.
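
To get a feel for how much the choice of test moves the goalposts, the sketch below compares the two cutoffs for 6 degrees of freedom at a 5% alpha level. It assumes 4.5 is the hypothesized mean and reuses the sample values quoted earlier (3.8, 0.73, 7). The one tailed cutoff 1.94 comes from a standard T table (95% area), not from the transcript.

$$
|t| = \frac{|3.8 - 4.5|}{0.73 / \sqrt{7}} \approx 2.54,
\qquad
t_{0.975,\,6} \approx 2.45 \ \text{(two tailed)},
\qquad
t_{0.95,\,6} \approx 1.94 \ \text{(one tailed)}
$$

The sample clears the two tailed cutoff only narrowly but clears the one tailed cutoff comfortably, so switching to one tail after seeing the data would make the result look stronger than the planned test supports.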

## My Choice

- So why did I do this example as a two tailed test?
  1. Because that sample size analysis I did is nastier to explain if I’m only working on one side.
  2. Because it sets up a great discussion on random samples, peeking, and cherry picking data.
  3. Because it allowed me to discuss when I should run one and two tailed tests.
- The story problem as told is inconclusive about whether Herby was vulnerable to the line being out of spec on one side only or on both sides.

## Look at the Problems We Have Run So Far

- We looked at a storm washing out the drainage system in a subdivision.
  - Only too much rain would create the disaster; we really only were worried about rain events that were too big.
  - (And we ran a one tailed test on the upper side.)
- We looked at a mine and the amount of ore below cut-off grade that would go to the dump.
  - We aren’t going to dump our high grade ore; we really only care about how much stuff is on the lower end.
  - (And we ran a one tailed test on the lower side.)
- We looked at tolerance on a machined part.
  - The spec said we had to be plus or minus, so our customer would be upset if the pegs were too big or too little.
  - (And we ran a two tailed test.)
- Determine whether to run a one or two tailed test based on the concerns for the process or design you are working on, not from peeking at the data.
