STAT03 - Descriptive statistics (cont.) - variability
Lecturer: Smilen Dimitrov
Applied statistics for testing and evaluation – MED4

Slide 2: Introduction
We previously discussed measures of central tendency (location) of a data sample (collection) in descriptive statistics: the arithmetic mean, median and mode. We also discussed the range as a measure of statistical dispersion (variability).
Here we continue with other important measures of variability, namely variance and standard deviation. We will also get acquainted with some parameters leading to their definitions.
We will look at how to perform these operations in R, and learn a bit more about plotting as well.

Slide 3: Variability and deviations
A measure of variability is perhaps the most important quantity in statistical analysis.
– The greater the variability in the data, the greater will be our uncertainty in the values of the parameters estimated from the data, and
– the lower will be our ability to distinguish between competing hypotheses about the data.
A measure of variability is a single number describing the variability of the data. The measures we are ultimately looking for are variance and standard deviation.

Slide 4: Variability and deviations
Deviations are the distances of the individual values in the data sample from the mean value.
Plotting: each deviation can be drawn as a line, using lines in a for loop.
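
The plotting idea on this slide can be sketched in R roughly as follows. The data vector `y` is made up for illustration (it is not data from the lecture), and the variable names are our own; the sketch only assumes base R graphics.

```r
# Hypothetical sample, made up for illustration (not data from the lecture)
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)
y_mean <- mean(y)

# Deviations: signed distance of each value from the mean
deviations <- y - y_mean

# Plot the values in observation order and mark the mean as a dashed horizontal line
plot(y, xlab = "Observation index", ylab = "Value", pch = 16)
abline(h = y_mean, lty = 2)

# Draw each deviation as a vertical line from the mean to the data point
for (i in seq_along(y)) {
  lines(c(i, i), c(y_mean, y[i]))
}
```

Drawing the lines one at a time in a loop keeps the link between each data point and its deviation explicit.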

Slide 5: Variability and deviations
The longer the lines, the more variable the data. Could we use the sum of the deviations as a measure of variability? No: by the definition of the arithmetic mean, it is the line positioned such that the positive and negative deviations cancel, so their sum is always zero.
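
A standard derivation of this cancellation is sketched here. The notation is an assumption of ours: $y_i$ for the data values, $N$ for the sample size and $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$ for the arithmetic mean.

```latex
\sum_{i=1}^{N} (y_i - \bar{y})
  = \sum_{i=1}^{N} y_i - N\bar{y}
  = \sum_{i=1}^{N} y_i - N \cdot \frac{1}{N} \sum_{i=1}^{N} y_i
  = 0
```

The result holds for any data set, so the plain sum of deviations carries no information about spread.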

Slide 6: Absolute deviations
The minus signs of the deviations can be seen as the reason the sum cancels. We could try using the absolute deviations instead: their sum is greater than 0 whenever the values are not all identical. However, it is hard to compute, so we need an easier way.
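
As a small check in R, using a made-up sample (an assumption for illustration, not lecture data), the signed deviations cancel while the absolute deviations do not:

```r
# Hypothetical sample, made up for illustration
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)
deviations <- y - mean(y)

sum_dev <- sum(deviations)           # signed deviations cancel out
sum_abs_dev <- sum(abs(deviations))  # absolute deviations do not

print(sum_dev)      # 0
print(sum_abs_dev)  # 16
```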

Slide 7: Squared deviations and sum of squares
Squaring the deviations is computationally less intensive. Their sum is, again, greater than 0 whenever the values are not all identical. It is the well-known sum of squares or, more properly, the sum of squared deviations. It is an unscaled (unadjusted) measure of dispersion.
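
In the assumed notation $y_i$ for the data values, $\bar{y}$ for their mean and $N$ for the sample size, the sum of squares is $SS = \sum_{i=1}^{N} (y_i - \bar{y})^2$. A minimal R sketch on a made-up sample:

```r
# Hypothetical sample, made up for illustration
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)

# Sum of squared deviations from the mean
sum_of_squares <- sum((y - mean(y))^2)
print(sum_of_squares)  # 40
```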

Slide 8: Scaling the sum of squares – Mean Squared Deviation
What would happen to the sum of squares if we added an additional data point? It would get bigger, of course (unless the new point lies exactly at the mean). So the sum of squares will usually grow with the size of the data collection.
– That is a manifestation of the fact that it is unscaled.
– Scaling (also known as normalizing) means adjusting the sum of squares so that it does not grow as the size of the data collection grows.
We do not want our measure of variability to depend on sample size in this way, so the obvious solution is to divide by the number of data points, N, to get the mean squared deviation (MSD). The MSD could be taken as the variance we are looking for, but…
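
To illustrate the effect of scaling, the sketch below compares a made-up sample with a copy of itself that is twice as long, mimicking "more data with the same spread". The helper names `sum_sq` and `msd` are our own, not built-in R functions.

```r
# Hypothetical sample, made up for illustration
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)

# Sum of squared deviations from the mean
sum_sq <- function(v) sum((v - mean(v))^2)

# Mean squared deviation: sum of squares divided by the number of data points
msd <- function(v) sum_sq(v) / length(v)

# Repeat the sample to mimic collecting twice as much data with the same spread
y_double <- c(y, y)

print(sum_sq(y))         # 40
print(sum_sq(y_double))  # 80: the unscaled measure grows with sample size
print(msd(y))            # 4
print(msd(y_double))     # 4: the scaled measure does not
```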

Slide 9: Degrees of freedom
Suppose we had a sample of five numbers and their average was 4. What was the sum of the five numbers? It must have been 20, otherwise the mean would not have been 4. Now let us think about each of the five numbers in turn: we are going to put a number in each of five boxes. If we allow the numbers to be positive or negative real numbers, how many values could the first number take?

Slide 10: Degrees of freedom
You will realize it could take any value. Suppose it was a 2.

Slide 11: Degrees of freedom
How many values could the next number take? It could be anything. Say it was a [...]

Slide 12: Degrees of freedom
And the third number could be anything. Suppose it was a [...]

Slide 13: Degrees of freedom
The fourth number could be anything at all. Say it was [...]

Slide 14: Degrees of freedom
Now, how many values could the last number take? Just one: it has to be another 7, because the numbers have to add up to 20 for the mean of the five numbers to be 4.

Slide 15: Degrees of freedom
We have total freedom in selecting the first number, and the second, third and fourth numbers, but no choice at all in selecting the fifth. With five numbers (and their mean) we have four degrees of freedom. In general, we have (N-1) degrees of freedom if we estimated the mean from a sample of size N. More generally still, we can propose a formal definition: degrees of freedom is the sample size, N, minus the number of parameters, p, estimated from the data.
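
The box-filling argument can be replayed in R. In the sketch below, the first value (2) and the forced last value (7) come from the slides. The three middle values are our own illustrative choice, picked so that the numbers work out, and should not be read as the values used in the lecture.

```r
n <- 5
target_mean <- 4
required_sum <- n * target_mean  # the five numbers must add up to 20

# Four freely chosen values: the first is from the slides,
# the other three are illustrative assumptions
free_values <- c(2, 7, 4, 0)

# The fifth value is no longer free: it is fixed by the required sum
last_value <- required_sum - sum(free_values)
print(last_value)  # 7

all_values <- c(free_values, last_value)
print(mean(all_values))  # 4

# One parameter (the mean) was fixed, so n - 1 values were free
deg_freedom <- n - 1
print(deg_freedom)  # 4
```

Changing any of the four free values changes `last_value` accordingly, but never leaves it open.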

Slide 16: Scaling the sum of squares – variance
The mean is a parameter estimated from the data itself, hence we lose one degree of freedom. Thus we finally arrive at a definition of variance: the sum of squares divided by the degrees of freedom. The only difference between the MSD and the variance is the divisor: N for the MSD, N-1 for the variance.
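
With the assumed notation $SS$ for the sum of squares and $N$ for the sample size, the two quantities are $MSD = SS / N$ and $s^2 = SS / (N - 1)$. The R sketch below computes both by hand on a made-up sample and compares the second with the built-in `var()`, which uses the N-1 divisor.

```r
# Hypothetical sample, made up for illustration
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)
n <- length(y)

ss <- sum((y - mean(y))^2)  # sum of squares: 40

msd <- ss / n             # divide by N: 4
variance <- ss / (n - 1)  # divide by the degrees of freedom: 40 / 9, about 4.44

print(msd)
print(variance)
print(var(y))  # built-in sample variance, same value as `variance`
```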

Slide 17: Standard deviation
Variance is measured in squared units (cm²) relative to the original units of the data (cm). Therefore another measure is used, the standard deviation: the square root of the variance, measured in the same units as the data.
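
A short R sketch on a made-up sample, taking the square root of the variance by hand and comparing it with the built-in `sd()`:

```r
# Hypothetical sample, made up for illustration (think of lengths in cm)
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)

variance <- var(y)        # in squared units (cm^2)
std_dev <- sqrt(variance) # back in the original units (cm), about 2.11

print(std_dev)
print(sd(y))  # built-in sample standard deviation, same value
```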

Slide 18: Sample and population parameters
Usually you are interested in drawing conclusions about the population from which your (random) sample of data is drawn. It is very important to keep in mind the difference between the descriptive statistics that characterise your sample and the corresponding parameters that characterise the population from which your sample is drawn.
– Population (finite or infinite): has the “true” parameters. Ex.: all raisin boxes ever produced by the company/factory.
– Sample (finite): gives estimates of the population parameters (mean, variance, standard deviation). Ex.: the particular data collection for only 17 particular raisin boxes.
Relating the sample estimates to the population parameters needs (probability) distributions.
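
The distinction can be illustrated with a small simulation. Everything about the "population" below is an assumption made purely for illustration: the normal distribution, the true mean of 42.5 and the true standard deviation of 1.5 (think of grams per box). Only the sample size of 17 is taken from the slide.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters (assumed, not from the lecture)
true_mean <- 42.5
true_sd <- 1.5

# A random sample of 17 boxes drawn from the assumed population
box_sample <- rnorm(17, mean = true_mean, sd = true_sd)

# Sample statistics: estimates of the population parameters
estimates <- c(mean = mean(box_sample),
               variance = var(box_sample),
               sd = sd(box_sample))
print(estimates)
```

The printed estimates will generally be close to, but not equal to, the assumed true values, and a different sample would give different estimates.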

Slide 19: Geometric interpretations - quantity graph
Standard deviation: same units as the quantity.

Slide 20: Geometric interpretations - quantity graph
Variance: area.

Slide 22: Geometric interpretation - histogram (frequency count)
More commonly, the geometric interpretation is made on a histogram, which makes it easier to see the spread. If there are no deviations, the standard deviation is 0 and the whole histogram collapses to a single peak.
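
One way to draw this in R is sketched below, with a made-up sample and base graphics. The choice of line styles is ours; the solid line marks the mean and the dashed lines mark one standard deviation on either side.

```r
# Hypothetical sample, made up for illustration
y <- c(12, 15, 11, 14, 18, 13, 16, 12, 15, 14)
y_mean <- mean(y)
y_sd <- sd(y)

# Frequency-count histogram of the sample
hist(y, main = "Histogram with mean and one standard deviation", xlab = "Value")

# Solid vertical line at the mean, dashed lines one standard deviation away
abline(v = y_mean, lwd = 2)
abline(v = c(y_mean - y_sd, y_mean + y_sd), lty = 2)

# With no deviations at all, the standard deviation is exactly 0
sd_constant <- sd(rep(14, 10))
print(sd_constant)  # 0
```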

Slide 23: Review
Descriptive statistics:
– Measures of central tendency (location): arithmetic mean, median, mode
– Measures of statistical variability (dispersion, spread): range, variance, standard deviation

Slide 24: Exercise for mini-module 3 – STAT03
Exercise
Use the following data: the data in the following table come from three market gardens. The data show the ozone concentrations in parts per hundred million (pphm) on ten consecutive summer days.
1. Import the data into R and, for each garden, find the central tendency parameters of the ozone concentrations.
2. Using R, for each garden, find the dispersion parameters: the sample variance and the sample standard deviation.
3. Using R, plot the relative frequency histogram for each of the gardens. Mark graphically the arithmetic mean and the one-standard-deviation range on each graph.
Delivery: Deliver the collected data (in tabular format), the found statistics and the requested graphs for the assigned years in an electronic document. You are welcome to include R code as well.