A revision example.

Solution Statement A looks reasonable; but read the others to make sure. The experimenter knows what the experiment is about. B is false. C is false: the placebo effect is neutralised by having a placebo condition. D is false: the double blind is not a problem, but a procedure in experimental design. The answer is A.

Lecture 3 GETTING TO KNOW YOUR DATA

Don’t rush in! Many people can’t wait to do STATISTICAL TESTS on their data. But there are problems with that approach. You must GET TO KNOW YOUR OWN DATA first. Otherwise, you may come to seriously ERRONEOUS CONCLUSIONS.

Results of the Caffeine experiment

The raw data You have been looking at the RAW DATA, that is, the ORIGINAL SCORES achieved by the participants. From inspection, it seems that the Caffeine group tended to have higher scores. With large data sets, however, it can be very difficult to see what’s going on.

Summarising the data We need to SUMMARISE these results, in order to bring out their most important features. There are 2 ways of doing this. We can make a picture, or GRAPH, of the data. We can calculate measures known as STATISTICS, which encapsulate the most important properties of the Caffeine and Placebo results.

Graphs The first step in your analysis is to make a picture or GRAPH of your data, so that you can see at a glance what happened in the experiment.

Levels of measurement The kind of graph you need depends upon the LEVEL OF MEASUREMENT. There are three levels:
1. The SCALE level. The data are measures on an independent scale with units. Heights, weights, performance scores and IQs are scale data. Each score has ‘stand-alone’ meaning.
2. The ORDINAL level. Data in the form of RANKS (1st, 3rd, 53rd). A rank has meaning only in relation to the other individuals in the sample. A rank does not express, in units, the extent to which a property is possessed.
3. The NOMINAL level. Assignments to categories (so-many males, so-many females).

Distribution A DISTRIBUTION is a table or graph showing the FREQUENCIES with which different values are to be found. The first approach in the analysis of a data set is to picture the data as a whole by obtaining a graph of the distribution.

A picture of the results

What happened in this experiment? The scores of the Caffeine group TEND to be higher than do the scores of the Placebo group. There is, however, considerable overlap: some participants in the Placebo condition outperformed those in the Caffeine condition.

Human variability In the Caffeine distribution, values are densest around 13; whereas in the Placebo distribution, values are densest around 9. But there is a huge RANGE in performance. The worst performer was in the Caffeine group; the best was in the Placebo group.

Histograms A HISTOGRAM is useful for displaying the distribution of a large data set. Here is a histogram of the heights of 1000 men. Note that you cannot recover the raw data from a histogram.

A histogram

Histograms The entire range of variation (shown on the x-axis) is divided into CLASS INTERVALS. The heights of the bars are proportional to the FREQUENCIES of values (y-axis) falling within the class intervals represented by the bases of the bars. The bars touch each other, indicating the CONTINUOUS variation of the variable.
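The division into class intervals described above can be sketched in a few lines of Python. This is only an illustration: the function name, the interval width and the heights are all made up for the example and are not the lecture's data set of 1000 men.

```python
from collections import Counter

def class_interval_frequencies(values, start, width):
    """Count how many values fall in each class interval [lower, lower + width)."""
    counts = Counter((v - start) // width for v in values)
    return {
        (start + k * width, start + (k + 1) * width): counts[k]
        for k in sorted(counts)
    }

# Hypothetical heights in cm (illustration only).
heights = [162, 168, 171, 174, 175, 176, 179, 181, 183, 190]
freqs = class_interval_frequencies(heights, start=160, width=10)

# Crude text histogram: one '#' per observation in the interval.
for (lower, upper), n in freqs.items():
    print(f"{lower}-{upper}: {'#' * n}")
```

Note that only the frequencies survive: as with any histogram, the individual heights cannot be recovered from `freqs`. Intervals containing no values are simply omitted in this sketch.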

Histograms of the Caffeine and Placebo distributions The data from our experiment are really too scarce for histograms. But you can see that the scores of the Caffeine group tend to be higher than the scores of the Placebo group. In each histogram, you also see the two atypical scores: 20 in the Placebo group; 2 in the Caffeine group.

Horizontal histograms You might prefer to reverse the scales, so that frequency is measured along the horizontal axis. Again, it is clear that scores obtained under the Caffeine condition tend to be higher.

Outliers I have drawn your attention to the atypical scores of 2 (in the Caffeine group) and of 20 (in the Placebo group). Such atypical scores are known as OUTLIERS. With small data sets, outliers can have marked effects upon the values of some statistics and make them unrepresentative of the data as a whole.

Stem-and-leaf displays

Stem-and-leaf displays This kind of display is very useful with small data sets. The vertical STEM is a scale along which values can vary. But the Placebo scale is in the original scale units; whereas the Caffeine stem scale is in units of ten: the upper scale value 1 represents the interval from 10 to 14; the lower scale value 1 represents scores from 15 to 19. A LEAF is an observation at a particular point on the stem scale. A many-leafed stem can represent either a recurring value or a range of common values, depending on the stem unit.

Stem-and-leaf displays In the Placebo group, the stem point 8 has the most leaves. In the Caffeine group, the stem point 12 has the most leaves.

The caffeine scores The atypical ‘extreme score’ (2) has been identified. The stem scale unit is ten. The leaves at upper stem point 1 represent the numbers 10, 11, 12, 13 and 14, the repetitions of a digit indicating recurrence of the same value. The leaves at lower stem point 1 represent values between 15 and 19, inclusive. On the left of the stem are the frequencies of occurrence of values within the ranges indicated on the stem. The stem plus the frequencies show the FREQUENCY DISTRIBUTION.
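A display of this kind, with a stem in units of ten split into an upper row (leaves 0 to 4) and a lower row (leaves 5 to 9), might be built roughly as follows. The scores are hypothetical, not the lecture's Caffeine data, and the sketch assumes non-negative whole-number scores.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Group scores on a stem in units of ten, split into two rows per stem."""
    rows = defaultdict(list)
    for s in sorted(scores):
        tens, units = divmod(s, 10)
        half = 0 if units < 5 else 1  # 0 = upper row (leaves 0-4), 1 = lower row (leaves 5-9)
        rows[(tens, half)].append(units)
    return dict(rows)

# Hypothetical scores (illustration only).
scores = [2, 9, 10, 11, 12, 12, 13, 14, 15, 17]
display = stem_and_leaf(scores)

# Frequency on the left, then the stem value, then the leaves.
for (tens, half), leaves in sorted(display.items()):
    print(f"{len(leaves):>2}  {tens} | {''.join(str(u) for u in leaves)}")
```

The repeated leaf 2 on the upper stem point 1 stands for the value 12 occurring twice.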

Statistics The word STATISTICS has more than one meaning. A STATISTIC is a measure which summarises an important aspect of a distribution. But STATISTICS is also a discipline which is concerned, not only with description of data that have already been gathered but also with the making of inferences about data that MIGHT be gathered in the future. We shall now try to use some STATISTICS to describe the Caffeine and Placebo distributions.

The average An AVERAGE is a value that is TYPICAL or REPRESENTATIVE of those in a distribution. It is clear that the average score of the Caffeine distribution is higher than the average score of the Placebo distribution. Several different measures of ‘the average’ are available.

The mean

Formula for the mean
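The formula itself did not survive in this transcript. The standard definition of the arithmetic mean is given here as a reminder; the symbols (M for the mean, X for a score, n for the number of scores) are an assumed notation and may differ from the original slide.

```latex
M = \frac{\sum X}{n}
```

In words: add up all the scores and divide by the number of scores.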

Calculating the means

The mean as the ‘centre of gravity’ The mean can be thought of as THE CENTRE OF GRAVITY of a distribution, the point at which it would BALANCE on a knife-point. We can see (because this distribution is symmetrical) that the mean of this distribution is 3.

Scenario 2

Interpretation of Scenario 2 The scores of both groups cluster around the same value: 12. Since the distributions are completely symmetrical, the mean of either is clearly 12. In the Caffeine distribution, however, the scores are more widely SPREAD OUT or DISPERSED than those of the Placebo group. Perhaps, over and above individual differences, caffeine promotes performance in some participants, but impedes it in others.

Dispersion or spread The DISPERSION of a distribution is the extent to which scores are spread out, scattered about or DEVIATE from the central mean. Dispersion is another very important aspect of a data set and one which must be examined carefully when interpreting the data. There are several ways of measuring the dispersion of a distribution.

The simple range The SIMPLE RANGE is the highest score minus the lowest score. So, for the Placebo group in Scenario 2, the simple range is (15 – 9) = 6 score units. For the Caffeine group, the simple range is (18 – 6) = 12 score units. The Caffeine distribution shows twice as much spread or dispersion of scores around the mean.
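The calculation can be sketched in Python. The Scenario 2 scores are not reproduced in this transcript, so the two lists below are hypothetical: they were chosen only so that their extremes match the values quoted above (9 and 15; 6 and 18) and both are symmetrical about 12.

```python
def simple_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical data matching the quoted extremes (illustration only).
placebo = [9, 10, 11, 12, 12, 12, 13, 14, 15]
caffeine = [6, 8, 10, 12, 12, 12, 14, 16, 18]

placebo_range = simple_range(placebo)
caffeine_range = simple_range(caffeine)
print(placebo_range, caffeine_range)
```

Only the two extreme scores in each list affect the result; the other seven could be changed freely without altering the range.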

A problem with the simple range The simple range statistic only uses two scores out of the whole distribution. Should those particular scores be highly atypical of the distribution, the range may not reflect the typical spread of scores about the mean of the distribution. The data from the Caffeine experiment (Scenario 1) exemplify this.

Use of the range Nevertheless, the range can be a very useful statistic when you are EXPLORING a data set. There are more complex RANGE STATISTICS which use more of the information in a data set than does the simple range.

The variance and the standard deviation (SD) The VARIANCE and the STANDARD DEVIATION (SD) are also measures of dispersion. Both statistics use the values of ALL the scores in the distribution.

Deviation scores The DEVIATION SCORE is the building block from which the variance and SD are calculated. When a score is greater than the mean, the deviation will have a positive sign. When a score is less than the mean, the deviation will have a negative sign. When a score is equal to the mean, the deviation is zero.

Deviations sum to zero The mean is the centre of gravity, or balance point. The deviations are the distances of the points from the balance point. They must sum to zero: the positives and negatives must cancel each other out.

The mean deviation is zero Deviations about the mean sum to zero. So the MEAN DEVIATION will always be zero. The mean deviation would be USELESS as a measure of spread.

The squared deviations The sum of the SQUARED deviations is always either positive (when scores have different values) or zero (if all the scores have the same value). If there is any variability in the scores at all, the sum of the squared deviations will have a positive value.
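Both points can be checked on a small example. The scores below are hypothetical; they form a symmetrical distribution with a mean of 3.

```python
# Hypothetical scores (illustration only).
scores = [1, 2, 3, 3, 3, 4, 5]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]        # score minus mean
squared_deviations = [d ** 2 for d in deviations]

print(deviations)               # negative, zero and positive deviations
print(sum(deviations))          # the deviations cancel out
print(sum(squared_deviations))  # positive whenever the scores vary
```

With less tidy data, floating-point rounding can leave the sum of deviations at a tiny non-zero value instead of exactly zero.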

Formula for the variance The Greek letter sigma (Σ) is used to indicate that you are to obtain the deviation of each score from the mean, square it, then add up all the squared deviations. Why is 1 subtracted from the number of scores? Explanation later!
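The formula did not survive in this transcript. The usual sample variance matches the description above (sum of squared deviations, divided by the number of scores minus 1). The symbols are an assumed notation: s² for the variance, X for a score, M for the mean and n for the number of scores.

```latex
s^2 = \frac{\sum (X - M)^2}{n - 1}
```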

Applying the formula

Variance of the Caffeine scores

A problem with the variance The simple range statistic has the merit of being in the same units as the raw data. The variance, since it is based on the squares of the deviations, is in SQUARED UNITS and is therefore difficult to interpret. If you take the (positive) square root of the variance, you have the STANDARD DEVIATION, which is in the original units of measurement.

The SD is the positive square root of the variance We found that the variance of the scores in the Caffeine condition was 10.73. Taking the square root of 10.73 gives an SD of 3.28. The square root operation restores the measure of spread to the original measurement units: we can say that the standard deviation is 3.28 hits.
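The arithmetic can be reproduced with Python's standard library. The variance of 10.73 comes from the lecture; the small sample used for the second check is hypothetical.

```python
import math
import statistics

# Variance of the Caffeine scores, as quoted in the lecture.
variance = 10.73
sd = math.sqrt(variance)
print(round(sd, 2))

# Hypothetical sample (illustration only): statistics.variance and
# statistics.stdev both use the n - 1 divisor, and the SD is the
# square root of the variance.
sample = [1, 2, 3, 3, 3, 4, 5]
sample_variance = statistics.variance(sample)
sample_sd = statistics.stdev(sample)
print(sample_variance, sample_sd)
```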

Statistical summary of the data This table is an adequate summary of the results. Always accompany the values of the means (or some other measure of ‘the average’) with information about the SPREAD of the data. Here we see that the standard deviations of the two sets of scores have similar values. That has implications for further analysis.

Distribution shape We have measured the AVERAGE and the SPREAD of the Caffeine and Placebo distributions. We noted that both distributions were (at least approximately) SYMMETRICAL. There are circumstances in which that would not be the case.

A disappointing result The mean for the Caffeine group is only very slightly greater than the Placebo mean. But note that both means are near the top of the scale (20).

Scenario 3: a ceiling effect

Ceiling effect … The scores of both groups are bunched around the top of the scale. Any possible effect of caffeine intake has been masked by a CEILING EFFECT. The task chosen was TOO EASY for the participants. No conclusions about the effects of ingestion of caffeine can be drawn from these data.

Another disappointing result Again the Caffeine mean is only slightly greater than the Placebo mean. But both means are near the bottom of the scale (zero).

Scenario 4: a floor effect

Floor effect The scores of both groups are bunched around the bottom of the scale. The task was too difficult. No conclusions about the effects of ingestion of caffeine can be drawn from these data either.

Skewness In both Scenarios 3 and 4, the distributions are asymmetric or SKEWED. When a distribution has a tail to the left, it is said to be NEGATIVELY SKEWED; when it has a tail to the right, it is POSITIVELY SKEWED. In Scenario 3, the distributions are negatively skewed; whereas in Scenario 4 they are positively skewed. Ceiling and floor effects result in skewed distributions; though skewness of distribution does not necessarily imply a ceiling or floor effect.
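The sign of the skew can also be checked numerically. The lecture does not say which measure of skewness it has in mind, so the sketch below uses one common choice, the moment coefficient of skewness, purely as an assumption. Both data sets are hypothetical, on a scale from 0 to 20.

```python
def skewness(scores):
    """Moment coefficient of skewness: mean cubed deviation / (mean squared deviation) ** 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return m3 / m2 ** 1.5

# Hypothetical scores (illustration only).
ceiling_scores = [20, 20, 19, 19, 18, 17, 14]  # bunched at the top, tail to the left
floor_scores = [0, 0, 1, 1, 2, 3, 6]           # bunched at the bottom, tail to the right

print(skewness(ceiling_scores))  # negative value: negatively skewed
print(skewness(floor_scores))    # positive value: positively skewed
```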

Summary The three most important properties of a distribution are:
1. The typical value, AVERAGE, or CENTRAL TENDENCY.
2. The SPREAD or DISPERSION of scores around the average.
3. The SHAPE of the distribution.

Summary … The MEAN is the arithmetical average. The VARIANCE and STANDARD DEVIATION (its square root) are measures of SPREAD. There are also measures of the asymmetry or SKEWNESS of the distribution. This property, however, is often clear from inspection of the graph.

Key terms distribution, histogram, stem-&-leaf display, outlier, average, mean, spread or dispersion, simple range, deviation, variance, standard deviation

Key terms… ceiling and floor effects, positive and negative skewness

Multiple-choice example

Study questions
1. The mean weight of three people in a car is 170 pounds. They pick up another person, whose weight is 190 pounds. What is now the mean weight of the people in the car?
2. We have seen that the mean of the scores in the Caffeine group is 11.90 and the SD = 3.28. Suppose we add a constant of 2 to each of the 20 scores. What effects would that have upon the values of the mean, the variance and the SD?
3. What would be the effects of multiplying each score by 2?
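Once you have worked out your own answers, a short script is one way to check them. The 20 Caffeine scores are not reproduced in this transcript, so the sketch uses a small hypothetical sample instead; the pattern of changes it shows, not the particular numbers, is what matters.

```python
import statistics

# Question on the car: total weight of the first three people, plus the newcomer.
new_mean_weight = (3 * 170 + 190) / 4
print(new_mean_weight)

def summary(scores):
    """Return (mean, variance, SD), using the n - 1 divisor for the variance."""
    return (statistics.mean(scores),
            statistics.variance(scores),
            statistics.stdev(scores))

# Hypothetical scores (illustration only, not the Caffeine data).
scores = [8, 10, 12, 14, 16]
shifted = [x + 2 for x in scores]   # add a constant of 2 to each score
doubled = [x * 2 for x in scores]   # multiply each score by 2

for label, data in [("original", scores), ("plus 2", shifted), ("times 2", doubled)]:
    print(label, summary(data))
```

Compare the three printed summaries with one another to see which statistics change and by how much.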