Week 6. Statistics etc. GRS LX 865 Topics in Linguistics.


1 Week 6. Statistics etc. GRS LX 865 Topics in Linguistics

2 Update on our sentence processing experiment… Quick graph of reaction time per region.

3 Update Seems nice; there’s a difference in region 5 (where the NP, I, they, John were) and also in region 6. From slowest to fastest: John, he, NP, I. Something like what we expected—but wait…

4 Update Two further things that this didn’t account for: Different people read at different speeds. I is a lot shorter than the photographer; might it go faster just for that reason? To take account of people’s reading speeds, we tried average RT per character on the fillers.
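The per-character correction described above can be sketched as follows. This is a minimal sketch with made-up numbers: the function names and the 25 ms/char filler reading speed are illustrative assumptions, not values from the actual experiment.

```python
# Sketch of a per-character reading-speed correction (hypothetical numbers).

def rt_per_char(rt_ms, region_text):
    """Raw reaction time divided by the region's length in characters."""
    return rt_ms / len(region_text)

def residual_rt(rt_ms, region_text, subject_ms_per_char):
    """Observed RT minus what the subject's filler reading speed predicts."""
    return rt_ms - subject_ms_per_char * len(region_text)

# "the photographer" is 16 characters; "I" is only 1.
# A subject who averages 25 ms/char on fillers, reading the long NP in 480 ms:
print(rt_per_char(480, "the photographer"))      # 30.0 ms per character
print(residual_rt(480, "the photographer", 25))  # 80 ms slower than predicted
```

Either the rate (ms/char) or the residual (observed minus predicted) can then be compared across conditions without the region-length confound.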

5 Subject RT/c Average RT per character was pretty much all over the map. So at least it seemed worth factoring out. Overhead?

6 Items? It’s also important to look at the items. Were any always answered incorrectly? Those might have been too hard or had something else wrong with them. (Not clear that we actually care whether the answer was right.)

7 End result so far? So, taking all of that into account, I ended up with this… Not what we were going for.

8 So… There’s still work to be done. Since I’m not sure exactly what work that is, once again… no lab work to do. Instead, we’ll talk about statistics generally… Places to go: http://davidmlane.com/hyperstat/ http://www.stat.sc.edu/webstat/

9 Measuring things When we go out into the world and measure something like reaction time for reading a word, we’re trying to investigate the underlying phenomenon that gives rise to the reaction time. When we measure reaction time for reading I vs. they, we are trying to find out if there is a real, systematic difference between them (such that I is generally faster).

10 Measuring things So, suppose for any given person, it takes A ms to read I and B ms to read they. If our measurement worked perfectly, we’d get A whenever we measure for I and B whenever we measure for they. But it’s a noisy world.

11 Measuring things Measurement never works perfectly. There is always additional noise of some kind or another. You’re likely to get a value near A when you measure I, but you’re not guaranteed to get A. Similarly, there are differences between subjects, differences between items, differences of still other sorts…

12 A common goal Commonly what we’re after is an answer to the question: are these two things that we’re measuring actually different? So, we measure for I and for they. Of the measurements we’ve gotten, I seems to be around A, they seems to be around B, and B is a bit longer than A. The question is: given the inherent noise of measurement, how likely is it that we got that difference just by chance?

13 Some stats talk There are two major uses for statistics: describing a set of data in some comprehensible way, and drawing inferences from a sample about a population. That last one is the useful one for us; by picking some random representative sample of the population, we can estimate characteristics of the whole population by measuring things in our sample.

14 Normally… Many things we measure, with their noise taken into account, can be described (at least to a good approximation) by the “bell-shaped” normal distribution. Often as we do statistics, we implicitly assume that this is the case…

15 First some descriptive stuff Central tendency: What’s the usual value for this thing we’re measuring? There are various ways to measure it; the most common is the arithmetic mean (“average”). The average is computed by adding up the measurements and dividing by the number of measurements.
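The computation described on the slide is one line in most languages; a quick sketch with hypothetical reaction times:

```python
from statistics import mean

# Hypothetical reaction times (ms) for one region:
rts = [350, 420, 380, 400, 450]

# Add up the measurements, divide by how many there are:
avg = sum(rts) / len(rts)
print(avg)        # 400.0
print(mean(rts))  # same value, via the standard library
```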

16 Descriptive stats Spread: How often is the measurement right around the mean? How far out does it get? Range (maximum - minimum) is the basic measure. Variance and standard deviation are more sophisticated measures of the width of the measurement distribution. You describe a normal distribution in terms of two parameters: mean and standard deviation.
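All three spread measures, on the same hypothetical data as before (Python’s `statistics` module; `pvariance`/`pstdev` treat the data as the whole population, while `variance`/`stdev` treat it as a sample):

```python
from statistics import pvariance, pstdev

rts = [350, 420, 380, 400, 450]  # hypothetical reaction times (ms)

rng = max(rts) - min(rts)  # range: 100
var = pvariance(rts)       # variance: mean squared deviation from the mean
sd = pstdev(rts)           # standard deviation: square root of the variance

print(rng, var, round(sd, 1))
```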

17 Interesting facts about stdev About 68% of the observations will be within one standard deviation of the mean. About 95% of the observations will be within two standard deviations of the mean. Percentile example: with mean 80 and stdev 5, a score of 75 falls at the 15.9th percentile.
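Both facts, plus the slide’s percentile example, can be checked with the standard library’s normal distribution:

```python
from statistics import NormalDist

d = NormalDist(mu=80, sigma=5)

# Fraction of observations within one / two standard deviations:
print(round(d.cdf(85) - d.cdf(75), 3))  # 0.683
print(round(d.cdf(90) - d.cdf(70), 3))  # 0.954

# Score 75 with mean 80, stdev 5 (one sd below the mean):
print(round(d.cdf(75) * 100, 1))        # 15.9
```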

18 So, more or less, … If we knew the actual mean and standard deviation of the variable we’re measuring, we could be 95% sure that any given measurement we do will land within two standard deviations of that mean—and 68% sure that it will be within one. Of course, we can’t know the actual mean. But we’d like to.

19 Confidence intervals It turns out that you can run this logic in reverse as well, coming up with a confidence interval (I won’t tell you precisely how, but here’s the idea): Given where you see the measurements coming up, they must be 68% likely to be within one standard deviation of the mean, and 95% likely to be within two, so the more measurements you have, the better guess you can make about where the mean is. A 95% CI like 209.9 < µ < 523.4 means “we’re 95% confident that the real mean is in there”.
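A rough sketch of how a 95% CI for the mean is typically computed: mean plus or minus 1.96 standard errors. The data are made up, and 1.96 is the normal-distribution cutoff; for a sample this small, a t-distribution critical value would be slightly more accurate.

```python
from math import sqrt
from statistics import mean, stdev

rts = [350, 420, 380, 400, 450, 390, 410, 430, 370, 400]  # hypothetical

n = len(rts)
m = mean(rts)
se = stdev(rts) / sqrt(n)  # standard error of the (sample) mean

# 1.96 is the two-sided 95% cutoff for the normal distribution:
lo, hi = m - 1.96 * se, m + 1.96 * se
print(f"95% CI: {lo:.1f} < mu < {hi:.1f}")
```

More measurements shrink `se`, so the interval tightens, matching the slide’s point that more data gives a better guess.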

20 Hypothesis testing Testing to see if the means generating two distributions are actually different. The idea is to determine how likely it is that we could get the difference we observe by chance. After all, you could roll 25 sixes in a row; it’s just very unlikely: (1/6)^25. (Null hypothesis = chance.) Once you estimate the sample means and standard deviations, this is something you basically look up (t-test, based on the number of observations you made). This is what you see reported as p. “p < 0.05” means that if there were really no difference, a result this extreme would come up less than 5% of the time.
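Here is a sketch of the t statistic itself (Welch’s version, with hypothetical RTs; the exact p still has to be looked up in a table or a stats package), plus the slide’s dice probability:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples (sketch)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

i_rts    = [350, 360, 340, 355, 345]  # hypothetical RTs for "I"
they_rts = [400, 410, 395, 405, 390]  # hypothetical RTs for "they"

print(welch_t(they_rts, i_rts))  # 10.0 -- far beyond the rough cutoff of ~2,
                                 # so the looked-up p would be well under 0.05

# Rolling 25 sixes in a row: possible, just absurdly unlikely.
print((1 / 6) ** 25)
```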

21 Significance Generally, 0.05 is taken to be the level of “significance”—if the difference you measure has only a 5% chance of having arisen by pure accident, then that difference is significant. There’s no real magic about 0.05; it’s just a convention. Hard to say that 0.055 and 0.045 are seriously qualitatively different.

22 ANOVA Analysis of variance—like the t-test, except for more than two means at once. Still trying to discover whether there are differences in the underlying distributions behind several means that are unlikely to have arisen just by chance. I hope to come back to this. Perhaps it can be tacked on to a different lab.
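To see what ANOVA measures, the one-way F statistic can be computed by hand: variance between the group means relative to variance within the groups. The groups below are made up for illustration.

```python
from statistics import mean

def one_way_f(*groups):
    """One-way ANOVA F statistic (sketch): between-group over within-group variance."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

i_rts    = [350, 360, 340]  # hypothetical RTs, three conditions
they_rts = [400, 410, 390]
john_rts = [450, 460, 440]
print(one_way_f(i_rts, they_rts, john_rts))  # 75.0 -- a large F; the group
                                             # means differ far more than chance
```

A large F gets looked up (against the F distribution for these degrees of freedom) to yield a p value, just as with the t-test.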

23 Statistical power In general, the more samples you get, the better off you are—the more statistical power your analysis has. Power also goes up as the variance goes down, and it depends on the significance level you’ve chosen. Technically, statistical power has to do with how likely it is that you will correctly reject a false null hypothesis.
Reject H0: if H0 is true, that’s a Type I error; if H0 is false, that’s correct.
Do not reject H0: if H0 is true, that’s correct; if H0 is false, that’s a Type II error.

24 Correlation and Chi square Correlation between two measured variables is often measured in terms of (Pearson’s) r. If r is close to 1 or -1, the value of one variable can predict quite accurately the value of the other. If r is close to 0, predictive power is low. The chi-square test is supposed to help us decide whether two conditions/factors are independent of one another or not. (Does knowing one help predict the effect of the other?)
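Both measures can be sketched from scratch (hypothetical data; in practice a stats package computes these and the associated p values for you):

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    obs = [[a, b], [c, d]]
    rows, cols = [a + b, c + d], [a + c, b + d]
    # Expected counts if the two factors were independent:
    exp = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    return sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))

# r near 1: one variable predicts the other almost perfectly.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(chi_square_2x2(30, 10, 10, 30))  # 20.0, well above the 3.84 cutoff
                                       # for p < 0.05 at 1 degree of freedom
```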

25 Much more to it… Mainly I just wanted you to see some terminology. I hope to get some workable data from some experiment or lab we do that we can put into a stats program, perhaps just WebStat. …
