Organizing and Analyzing Data. Types of statistical analysis DESCRIPTIVE STATISTICS: Organizes data measures of central tendency mean, median, mode measures.

Organizing and Analyzing Data

Types of statistical analysis
DESCRIPTIVE STATISTICS: organizes data
- Measures of central tendency: mean, median, mode
- Measures of variability: range, standard deviation
INFERENTIAL STATISTICS: analyzes data
- Measures of association: correlation coefficient
- Measures of causation: statistical significance

Descriptive Statistics: Measures of Central Tendency
- Mode: the most frequently occurring score in a distribution
- Median: the middle score in a distribution; half the scores are above it and half are below it
- Mean: the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

Central Tendency PRACTICE: Using the data set below, compute the three measures of central tendency.
2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11
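A quick way to check answers to the practice problem is a short Python sketch using the standard-library `statistics` module (the variable names are illustrative):

```python
import statistics

# Practice data set from the slide
scores = [2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11]

mean = statistics.mean(scores)      # add the scores, then divide by how many there are
median = statistics.median(scores)  # middle score once the data are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

With 11 scores, the median is the 6th value in sorted order; with an even number of scores, `statistics.median` averages the two middle values.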

Descriptive Statistics: Measures of Variability
- Range: the difference between the highest and lowest score in a set of data (example data set: 2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11)
- Standard deviation: a computed measure of how much scores vary around the mean, calculated by finding the square root of the variance; it determines how spread out (wide or narrow) the normal distribution curve is
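The range and standard deviation for the same data set can be sketched the same way. This assumes the sample formulas (dividing by n - 1), which match the step-by-step standard deviation calculation later in this presentation; `statistics.pstdev` would give the population version instead.

```python
import statistics

scores = [2, 4, 5, 7, 7, 8, 9, 9, 9, 10, 11]

data_range = max(scores) - min(scores)  # highest score minus lowest score
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(data_range, round(variance, 2), round(std_dev, 2))
```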

Bell Curve: “Normal” Distribution

The red area represents the first standard deviation: about 68% of the data falls within one standard deviation of the mean. The green area represents the second standard deviation: about 95% of the data falls within the green PLUS the red area. The blue area represents the third standard deviation: about 99.7% of the data falls within the blue PLUS the green PLUS the red area.
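These percentages are rounded values that come from the normal distribution itself. A minimal Python check, using the standard fact that the share of a normal distribution within k standard deviations of the mean equals erf(k/√2), gives about 68.3%, 95.4%, and 99.7% (the helper name `fraction_within` is just for illustration):

```python
import math

def fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # For a normal curve, P(|X - mean| < k * SD) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```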

Standard Deviation

Two classes took a recent quiz. There were 10 students in each class, and each class had an average score of 81.5.

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the quiz? Why or why not?

The answer is… No. The average (mean) does not tell us anything about the distribution or variation in the grades.

Here are Dot-Plots of the grades in each class:

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Well, for example, let's say that for a set of data the average is 17.95 and the range is 23. But what if the data looked like this:

Here is the average, and here is the range. But really, most of the numbers are in this area and are not evenly distributed throughout the range.

The Standard Deviation is a number that measures how far, typically, the numbers in a set of data are from their mean.

If the Standard Deviation is large, the numbers are spread out from their mean. If the Standard Deviation is small, the numbers are close to their mean.
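Written as a formula, the step-by-step procedure used for Team A is the sample standard deviation, which divides by n - 1. Here x_i is each score, \bar{x} is the mean, and n is the number of scores; this notation is ours, not the slides':

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}},
\qquad \text{e.g., Team A: } s = \sqrt{\frac{214.5}{10 - 1}} \approx 4.88
```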

Here are the scores on the math quiz for Team A: 72 76 80 80 81 83 84 85 85 89. Average: 81.5

To see how far each score is from the mean, start with the lowest score, 72. How far away is 72 from the mean of 81.5? 72 - 81.5 = -9.5

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5? 89 - 81.5 = 7.5

So, the first step to finding the Standard Deviation is to find all the distances from the mean. Next, square each of the distances to turn them all into positive numbers. Then add up all of the squared distances:

Score   Distance from Mean   Distance Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5                  2.25
80      -1.5                  2.25
81      -0.5                  0.25
83       1.5                  2.25
84       2.5                  6.25
85       3.5                 12.25
85       3.5                 12.25
89       7.5                 56.25
Sum of squared distances: 214.5

Divide by (n - 1), where n represents the number of scores you have: 214.5 / (10 - 1) ≈ 23.8

Finally, take the square root of that result: √23.8 ≈ 4.88

This is the Standard Deviation of Team A's scores: 4.88.
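The same five steps can be sketched in Python. One assumption: the ten Team A scores used here are 72, 76, 80, 80, 81, 83, 84, 85, 85, 89. The second 80 and second 85 are inferred, because they are the only two extra values consistent with the stated mean of 81.5 and the stated sum of squared distances of 214.5.

```python
import math

# Team A scores; the second 80 and second 85 are inferred from the stated mean and sum (assumption)
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

mean = sum(team_a) / len(team_a)        # average: 81.5
distances = [x - mean for x in team_a]  # step 1: distance from the mean
squared = [d ** 2 for d in distances]   # step 2: square each distance
total = sum(squared)                    # step 3: add them up
variance = total / (len(team_a) - 1)    # step 4: divide by n - 1
std_dev = math.sqrt(variance)           # step 5: take the square root

print(mean, total, round(variance, 1), round(std_dev, 2))
```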

The Standard Deviation for the other class (Team B) is 15.92:

Score   Distance from Mean   Distance Squared
57      -24.5                600.25
65      -16.5                272.25
83        1.5                  2.25
94       12.5                156.25
95       13.5                182.25
96       14.5                210.25
98       16.5                272.25
93       11.5                132.25
71      -10.5                110.25
63      -18.5                342.25
Sum of squared distances: 2280.5
2280.5 / (10 - 1) ≈ 253.4
√253.4 ≈ 15.92

Now, let's compare the two classes again:

                        Team A   Team B
Average on the Quiz     81.5     81.5
Standard Deviation      4.88     15.92

Which is the “smarter” class and why? Team A St. Dev = 4.88; Team B St. Dev = 15.92
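The comparison can be reproduced with the library's sample standard deviation, `statistics.stdev`. Team B's scores are as listed on the slide; for Team A, the second 80 and second 85 are inferred from the stated mean and sum of squared distances (an assumption). The library gives about 15.918 for Team B, which rounds to 15.92.

```python
import statistics

# Team A's second 80 and second 85 are inferred (assumption); Team B as listed on the slide
classes = {
    "Team A": [72, 76, 80, 80, 81, 83, 84, 85, 85, 89],
    "Team B": [57, 65, 83, 94, 95, 96, 98, 93, 71, 63],
}

summary = {}
for name, scores in classes.items():
    # Same mean, very different spread around it
    summary[name] = (statistics.mean(scores), statistics.stdev(scores))
    print(f"{name}: mean = {summary[name][0]}, SD = {summary[name][1]:.2f}")
```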


Skew: “Non-Normal” distribution

INFERENTIAL STATISTICS
- Correlational design uses the correlation coefficient: How strong is the relationship between the two variables? As one goes up, does the other go slightly or more extremely up or down?
- Experimental design uses statistical significance: How confident am I that the difference between my experimental group and control group is a result of the treatment?

Correlation Coefficient
- A statistic that quantifies a relation between two variables
- Can be either positive or negative
- Falls between -1.00 and +1.00
- The size of the number (not the sign) indicates the strength of the relation

Positive Correlation
- Association between variables such that high scores on one variable tend to go with high scores on the other variable
- A direct relation between the variables

Negative Correlation
- Association between variables such that high scores on one variable tend to go with low scores on the other variable
- An inverse relation between the variables

Correlational Research
The correlation technique indicates the degree of association between two variables. Correlations vary in direction:
- Positive association: increases in the value of variable X are associated with increases in the value of variable Y
- Negative association: increases in the value of variable X are associated with decreases in the value of variable Y
- No relation: values of variable X are not related to values of variable Y

Correlation
- Correlation coefficient: a statistical measure of the extent to which two factors vary together, and thus how well either factor predicts the other
- The sign indicates the direction of the relationship (positive or negative)
- The absolute value indicates the strength of the relationship (0.00 to 1.00)
- Example: r = +.37
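The slides do not show how r is computed. One standard formula is Pearson's correlation coefficient, sketched below in Python; the sleep and quiz data are made up purely for illustration, and `pearson_r` is a hypothetical helper name (Python 3.10 and later also provide `statistics.correlation`).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together: sum of products of paired deviations
    co = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable on its own
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co / (spread_x * spread_y)

# Hypothetical data: hours of sleep vs. quiz score
hours = [5, 6, 6, 7, 8, 8, 9]
scores = [70, 72, 75, 80, 83, 85, 88]

r = pearson_r(hours, scores)
print(f"r = {r:+.2f}")             # the sign gives the direction
print(f"strength = {abs(r):.2f}")  # the absolute value gives the strength
```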

Check Your Learning: Which is stronger, a correlation of 0.25 or -0.74?

Misleading Correlations: Correlation is NOT Causation
Something to think about: There is a 0.91 correlation between ice cream consumption and drowning deaths. Does eating ice cream cause drowning? Does grief cause us to eat more ice cream?

Correlation: Correlation is NOT causation (e.g., arm span and height)

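The Limitations of Correlation
Correlation is not causation. Invisible third variables can produce a correlation.
Three possible causal explanations for a correlation between X and Y: X causes Y, Y causes X, or a third variable causes both.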
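To see how an invisible third variable can create a correlation, here is a small simulation in the spirit of the ice cream and drowning example. Everything in it is hypothetical: it assumes daily temperature drives both ice cream sales and drownings, with no direct link between the two, and `pearson_r` and `residuals` are illustrative helpers, not library functions. Removing the straight-line effect of temperature from both variables and correlating what is left should make the apparent relationship mostly disappear.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    co = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return co / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

random.seed(42)
# Hypothetical third variable: daily temperature (degrees C) over 200 days
temp = [random.uniform(10, 35) for _ in range(200)]
# Temperature drives both outcomes; neither outcome affects the other
ice_cream = [3.0 * t + random.gauss(0, 5) for t in temp]
drownings = [0.2 * t + random.gauss(0, 1) for t in temp]

raw_r = pearson_r(ice_cream, drownings)
controlled_r = pearson_r(residuals(ice_cream, temp), residuals(drownings, temp))
print(f"ice cream vs. drownings: r = {raw_r:.2f}")
print(f"after removing temperature: r = {controlled_r:.2f}")
```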

Inferential Statistics: Statistical Significance
- A computation that determines the degree of confidence that your experimental results occurred due to the treatment and not other factors
- How likely/probable are results like mine to occur by chance?
- A statistical computation and statement of how likely it is that an obtained result occurred by chance

Statistical significance is calculated by determining the probability that the differences between sets of data occurred by chance rather than as a result of the experimental treatment. The significance level (α) is the probability cutoff, chosen in advance, for deciding whether results could have been obtained by chance. The most common pre-determined value is 5% (.05), which means a result counts as significant when there is a 5% chance or less that results like it would be obtained by chance alone.
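One way to put a number on "how likely are results like mine to occur by chance" is a permutation test, swapped in here as an illustration (the slides do not name a specific test; t-tests are also common). If the treatment had no effect, the group labels would be arbitrary, so the sketch counts how many relabelings of the pooled scores give a difference in means at least as large as the one observed. The caffeine data and the function name are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(control, treatment):
    """Exact two-sided permutation test on the difference in group means."""
    observed = abs(fmean(treatment) - fmean(control))
    pooled = control + treatment
    extreme = 0
    total = 0
    # Try every way of splitting the pooled scores into groups of the original sizes
    for idx in combinations(range(len(pooled)), len(treatment)):
        chosen = set(idx)
        fake_treat = [pooled[i] for i in chosen]
        fake_ctrl = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count splits at least as extreme as the real one (small tolerance for rounding)
        if abs(fmean(fake_treat) - fmean(fake_ctrl)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical data: hours stayed awake past midnight
no_caffeine = [1.0, 1.5, 2.0, 2.5, 3.0]
caffeine = [3.5, 4.0, 4.5, 5.0, 5.5]

p = permutation_p_value(no_caffeine, caffeine)
print(f"p = {p:.4f}")
```

Here the caffeine group holds the five largest values, so only 2 of the 252 possible splits are as extreme (this one and its mirror image), giving p ≈ .008, below the common .05 cutoff.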

Statistical Significance and the Null Hypothesis
Two hypotheses need to be formed:
- Research hypothesis: the one being tested by the researcher
- Null hypothesis: the one that assumes any differences within the set of data are due to chance and are not significant

The Null Hypothesis
Instead of testing for the intended result directly, researchers test the “null,” which is the OPPOSITE of their hypothesis. If there is a statistically significant difference between the control and the experimental group, and the researcher is confident it is because of the IV, he/she REJECTS THE NULL.
Example 1: Caffeine has NO effect on students’ ability to stay awake past 2 a.m.
Example 2: Music has NO effect on subjects’ memory.

If the experiment reveals a statistically significant effect (a significant difference between the experimental and control groups), then we REJECT THE NULL. If the null hypothesis is rejected, what does that mean for:
- Caffeine and ability to stay awake past 2 a.m.?
- Music and memory?

If the experiment reveals NO statistically significant effect (no significant difference between the experimental and control groups), then we FAIL TO REJECT THE NULL (often loosely called “accepting” the null). If the null hypothesis is not rejected, what does that mean for:
- Caffeine and ability to stay awake past 2 a.m.?
- Music and memory?

STATISTICAL SIGNIFICANCE
Recall: the most common pre-determined significance level (α) is .05, meaning a result counts as significant when there is a 5% chance or less that it would be obtained by chance alone.
Null hypothesis: “Energy drinks have no effect on AP Calculus exam results”
The results reveal a significance level (p) of:
- .06 (Reject or accept the null hypothesis?)
- .0008 (Reject or accept the null hypothesis?)

Statistical Significance
Null hypothesis: “There is no difference in students’ performance on CSTs whether or not they are fed breakfast beforehand”
Obtained significance levels (p):
- .0555
- .04
- .008
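The decision in these exercises follows one rule: compare the obtained value (p) with the pre-determined α of .05. A minimal sketch, using "at or below" to match the "5% chance or below" wording earlier in this presentation (many texts write p < α instead); the function name `decide` is hypothetical:

```python
ALPHA = 0.05  # conventional pre-determined significance level

def decide(p_value, alpha=ALPHA):
    """Compare an obtained p-value with the significance level alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# p-values from the energy drink and breakfast examples
for p in (0.06, 0.0008, 0.0555, 0.04, 0.008):
    print(f"p = {p}: {decide(p)}")
```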

ERRORS: Type I (false positive) and Type II (false negative)
Null hypothesis: Drug X has no effect on anxiety.
- Type I error (false positive): rejecting the null hypothesis when it is true (Drug X really has no effect!)
- Type II error (false negative): failing to reject the null hypothesis when it is false (Drug X actually does have an effect!)
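The two error types can be made concrete with a simulation. This sketch assumes a hypothetical Drug X trial with normally distributed anxiety scores, a known standard deviation of 1 (so a simple z-test applies), 20 people per group, and α = .05; the helper names are illustrative. When the drug truly has no effect, every rejection is a Type I error, and the rate should land near 5%. When the drug has a modest effect (assumed here to be half a standard deviation), every failure to reject is a Type II error.

```python
import math
import random
from statistics import fmean

def z_test_p(group1, group2, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known population SD."""
    se = sigma * math.sqrt(1 / len(group1) + 1 / len(group2))
    z = (fmean(group2) - fmean(group1)) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

def rejection_rate(true_effect, trials=2000, n=20, alpha=0.05):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rejections = 0
    for _ in range(trials):
        placebo = [random.gauss(0, 1) for _ in range(n)]
        drug = [random.gauss(true_effect, 1) for _ in range(n)]
        if z_test_p(placebo, drug) <= alpha:
            rejections += 1
    return rejections / trials

random.seed(1)
type_1 = rejection_rate(true_effect=0.0)      # null true: each rejection is a false positive
type_2 = 1 - rejection_rate(true_effect=0.5)  # null false: each non-rejection is a false negative
print(f"Type I error rate  ~ {type_1:.3f}")
print(f"Type II error rate ~ {type_2:.3f}")
```

With only 20 people per group, a real but modest effect is missed most of the time; larger samples lower the Type II rate without changing α.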
