1 Week 1 Review of basic concepts in statistics handout available at 30-9-2007 Trevor Thompson.



2 Review of following topics:
- Population vs. sample
- Measurement scales
- Plotting data
- Mean & Standard deviation
- Degrees of freedom
- Transforming data
- Normal distribution
Howell (2002) Chap 1-3. Statistical Methods for Psychology

3 Population vs. sample
- Population: an entire collection of measurements (e.g. reaction times, IQ scores, height, or even height of male Goldsmiths students)
- Sample: a smaller subset of observations taken from the population
- The sample should be drawn randomly to make inferences about the population. Random assignment to groups improves validity

4 Population vs. sample
In general:
- population parameters = Greek letters
- sample statistics = English letters
- worth learning a glossary of other symbols now to avoid later confusion (e.g. Σ = the sum of)

            Population            Sample
mean        μ (mu)                X̄
variance    σ² (sigma squared)    s²

5 Measurement scales
- Categorical or Nominal: e.g. male/female, or Catholic/Protestant/other
- Continuous
  - Ordinal: e.g. private/sergeant/admiral
  - Interval: e.g. temperature in Celsius
  - Ratio: e.g. weight, height etc.

6 Plotting data
- Basic rule is to select the plot which represents what you want to say in the clearest and simplest way
- Avoid chart junk (e.g. plotting in 3D where 2D would be clearer)
- Popular options include bar charts, histograms, pie charts etc. See any textbook. SPSS charts discussed in workshop

7 Summary statistics
- Two essential components of data are: (i) central tendency of the data (e.g. mean) & (ii) spread of the data (e.g. standard deviation)
- Although mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful

8 Measures of central tendency
- Mode: the most frequent observation, e.g. 2 in the data 1, 2, 2, 3, 4, 5
- Median: the middle number of a dataset arranged in numerical order, e.g. 2 in the data 0, 1, 2, 5, 1000 (average of the middle two numbers when there is an even number of scores). Relatively uninfluenced by outliers
- Mean: the sum of the scores divided by the number of scores, $\bar{X} = \frac{\sum X}{N}$
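The three measures can be checked on the slide's own example data with a short Python sketch. It uses only the standard-library `statistics` module; the variable names and the extra even-length list are illustrative and not part of the original slides.

```python
import statistics

mode_data = [1, 2, 2, 3, 4, 5]      # example data from the Mode bullet
median_data = [0, 1, 2, 5, 1000]    # example data from the Median bullet

mode_value = statistics.mode(mode_data)          # most frequent observation
median_value = statistics.median(median_data)    # middle value of the sorted data
mean_value = statistics.mean(median_data)        # sum of scores divided by N

# With an even number of scores, the median is the average of the middle two.
even_median = statistics.median([1, 2, 3, 4])

print(mode_value, median_value, mean_value, even_median)
```

The outlier 1000 pulls the mean far above the bulk of the scores while the median stays at the middle score, which is the sense in which the median is relatively uninfluenced by outliers.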

9 Measures of dispersion
Several ways to measure spread of data:
- Range (max-min), IQR or Inter-Quartile Range (middle 50%), Average Deviation, Mean Absolute Deviation
- Variance: average of the squared deviations
- Variance for a population of 3 scores (-10, 0, 10) is (200/3)
- Standard deviation is simply the square root of the variance
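As a rough check of the three-score example, the sketch below computes several of these measures with Python's standard-library `statistics` module. The scores are treated as a complete population, so the variance divides by N; the variable names are illustrative.

```python
import statistics

scores = [-10, 0, 10]   # the population of 3 scores from the slide

data_range = max(scores) - min(scores)            # max - min
mean = statistics.mean(scores)
# Mean absolute deviation: average distance of each score from the mean.
mean_abs_dev = sum(abs(x - mean) for x in scores) / len(scores)
# Population variance: average of the squared deviations (divides by N).
pop_variance = statistics.pvariance(scores)
# Standard deviation: square root of the variance.
pop_sd = statistics.pstdev(scores)

print(data_range, mean_abs_dev, pop_variance, pop_sd)
```

The squared deviations are 100, 0 and 100, so their average is 200/3, matching the slide.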

10 Calculating sample variance
Population variance ($\sigma^2$) is the true variance of the population, calculated by
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
This equation is used when we have all values in a population (unusual).
However, the variance of a sample ($s^2$), if calculated by dividing by N, tends to be smaller than the variance of the population from which it was drawn. So, we use this equation:
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
The correction factor of N-1 increases the variance to be closer to the true population variance (in fact, the average of all possible sample variances exactly equals $\sigma^2$)
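The claim about the average of all possible sample variances can be illustrated by brute force on a tiny population. The sketch below is a demonstration under stated assumptions, not a proof: it reuses the three scores (-10, 0, 10) as the population, and it assumes samples of size 2 drawn with replacement. Both choices are made here for illustration and do not come from the slides.

```python
import itertools
import statistics

population = [-10, 0, 10]
pop_variance = statistics.pvariance(population)   # divides by N

# Every ordered sample of size 2, drawn with replacement (assumption of this sketch).
samples = list(itertools.product(population, repeat=2))

# Sample variance with N-1 in the denominator, averaged over all samples.
avg_n_minus_1 = statistics.mean([statistics.variance(s) for s in samples])

# Same samples, but dividing by N instead: too small on average.
avg_n = statistics.mean([statistics.pvariance(s) for s in samples])

print(pop_variance, avg_n_minus_1, avg_n)
```

Under these assumptions the N-1 version averages to the population variance, while the N version averages to only half of it.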

11 Degrees of freedom
- Why is N-1 used to calculate sample variance?
- When calculating sample variance, we first calculate the sample mean, which makes the last number in the dataset redundant, i.e. we lose a degree of freedom (the last number is not free to vary)
- e.g. M=10, sample data: 12, 9, 10, 11, 8. Calculating the sample mean (10) means that we have already (implicitly) included the last number in our calculations. If we (knew and) used the population mean rather than the sample mean this would not be the case, so we could use N not N-1.
- Howell illustrates this with a worked example (and a mathematical proof can be retrieved with an internet search)
- Bottom line is that whenever we have to estimate a parameter (e.g. the mean) from the sample, we lose a degree of freedom
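The "last number is not free to vary" idea can be made concrete with the slide's sample. In this minimal Python sketch (variable names are illustrative), once the sample mean and the first four scores are known, the fifth score is fully determined.

```python
data = [12, 9, 10, 11, 8]   # sample data from the slide
n = len(data)
sample_mean = sum(data) / n

# Deviations from the sample mean always sum to zero.
deviations = [x - sample_mean for x in data]
deviation_sum = sum(deviations)

# Given the mean and the first n-1 scores, the last score is fixed.
implied_last = n * sample_mean - sum(data[:-1])

print(sample_mean, deviation_sum, implied_last)
```

Only four of the five deviations carry independent information, which is why the sum of squared deviations is divided by N-1 = 4 rather than by 5.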

12 Transforming data
- One reason we might transform data is to convert from one scale to another, e.g. feet into inches, centigrade into fahrenheit, raw IQ scores into standard IQ scores
- Scale conversion can usually be achieved by a simple linear transformation (multiplying/dividing by a constant and adding/subtracting a constant):
  $X_{new} = b \cdot X_{old} + c$
- So to convert centigrade data into fahrenheit we would apply the following:
  $F = 1.8 \cdot C + 32$
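A linear transformation is easy to sketch in Python. The helper name `linear_transform` and the temperature readings below are hypothetical, chosen only to illustrate the centigrade to fahrenheit case with b = 1.8 and c = 32.

```python
import statistics

def linear_transform(values, b, c):
    """Apply X_new = b * X_old + c to every value."""
    return [b * x + c for x in values]

celsius = [0.0, 10.0, 20.0, 30.0]                  # hypothetical readings
fahrenheit = linear_transform(celsius, b=1.8, c=32.0)

# The mean is transformed like any single score; the SD is only rescaled by b.
mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)
mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

print(fahrenheit, mean_f, sd_f)
```

Adding the constant c shifts every score equally, so it moves the mean but leaves the spread unchanged; only the multiplier b affects the standard deviation.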

13 Transforming data
- Z-transform (standardisation) is one common type of linear transform, which produces a new variable with M=0 & SD=1:
  $z = \frac{X - M}{SD}$
- Standardisation is useful when comparing the same dimension measured on different scales (e.g. anxiety scores measured on a VAS and a questionnaire)
- After standardisation these scales could also be added together (adding two quantities on different scales is obviously problematic)
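The following Python sketch standardises a small set of scores and confirms that the result has mean 0 and SD 1. The function name `z_scores` is illustrative, the data are reused from the degrees-of-freedom slide, and using the sample SD (N-1 denominator) is an assumption, since the slide does not say which SD it has in mind.

```python
import statistics

def z_scores(values):
    """Standardise values: subtract the mean, divide by the SD."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (N-1); an assumption of this sketch
    return [(x - m) / sd for x in values]

raw = [12, 9, 10, 11, 8]
z = z_scores(raw)

z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)

print(z, z_mean, z_sd)
```

Because each z-score is expressed in SD units, scores from two differently scaled measures of the same dimension become directly comparable after this step.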

14 Normal Distribution
- Many real-life variables (height, weight, IQ etc.) are distributed like this
- Mathematical equation mimics this normal (or Gaussian) distribution

15 Normal Distribution
- The mathematical normal distribution is useful as its known mathematical properties give us useful info about our real-life variable (assuming our real-life variable is normally distributed)
- For example, scores about 2 standard deviations (more precisely 1.96) or more above the mean make up the extreme 2.5% of scores (calculus is used to derive this)
- Consequently, a person with an IQ score of 130 (M=100, SD=15) would be in roughly the top 2.5% (assuming IQ is normally distributed)
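These tail areas can be looked up numerically rather than derived by hand. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) and assumes, as the slide does, that IQ is normally distributed with M=100 and SD=15.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion of scores above 130, i.e. more than 2 SDs above the mean.
prop_above_130 = 1 - iq.cdf(130)

# Number of SDs above the mean that cuts off exactly the top 2.5%.
cutoff_sd = NormalDist().inv_cdf(0.975)

# IQ score corresponding to that cutoff.
cutoff_iq = iq.inv_cdf(0.975)

print(prop_above_130, cutoff_sd, cutoff_iq)
```

The proportion above 130 comes out slightly under 2.5%, because the exact 2.5% cutoff sits at about 1.96 SDs rather than 2.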

16 Normal Distribution
- Normality is an important assumption (though more about this next week). Violations of normality generally take two forms:
  - SKEWNESS
  - KURTOSIS