Understanding Your Data Set

Presentation transcript:

Understanding Your Data Set: Statistics are used to describe data sets. –They give us a metric in place of a graph. What are some types of statistics used to describe data sets? –Average, range, variance, standard deviation, coefficient of variation, standard error

Table 1. Total length (cm) and average length of spotted gar collected from a local farm pond and from a local lake. (Columns: Number, Pond length, Lake length; last row: Average.)

Are the two samples equal? –What about 47.2 and 47.3? If we sampled all of the gar in each water body, would the averages be different? –How different? Would the lake fish average still be larger?

Range: Simply the distance between the smallest and largest value. Figure 1. Range of spotted gar length (cm) collected from a pond and a lake. The dashed line represents the overlap in range.
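As a rough illustration of the range and of the overlap shown in Figure 1, here is a minimal Python sketch. The lengths are hypothetical values made up for the example (they are not the measurements from Table 1), and `value_range` is an illustrative helper name.

```python
# Hypothetical gar lengths in cm; NOT the values from the original table.
pond = [31.0, 38.5, 42.0, 47.5, 55.0]
lake = [50.0, 58.5, 66.0, 71.5, 80.0]


def value_range(values):
    """Return (smallest, largest, range) for a list of numbers."""
    smallest, largest = min(values), max(values)
    return smallest, largest, largest - smallest


pond_min, pond_max, pond_range = value_range(pond)
lake_min, lake_max, lake_range = value_range(lake)

# Width of the region where the two ranges overlap (0 if they do not overlap).
overlap = max(0.0, min(pond_max, lake_max) - max(pond_min, lake_min))

print(f"pond range = {pond_range} cm, lake range = {lake_range} cm")
print(f"overlap = {overlap} cm")
```

The range uses only the two most extreme values, so a single unusual fish can change it a lot.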

Does the difference in average length (47.2 vs. 68.2) seem as large as before?

Variance: An index of variability used to describe the dispersion among the measurements in a sample from a population. To calculate it, we need the distance between each sample point and the sample mean.

Figure 2. Length (cm) of each spotted gar collected from the pond. The horizontal solid line represents the sample mean length.

We can easily put this new data set into a spreadsheet table (columns: #, Length, Mean, Difference). By adding up all of the differences, we might hope to get a number that reflects how scattered the data points are. –The closer each number is to the mean, the smaller the total difference. After adding up all of the differences, we get zero (Sum = 0). –This is true of every data set: the positive and negative differences from the mean always cancel. What can we do to get rid of the negative values?

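Sum of Squares: Squaring each difference removes the negative signs (table columns: #, Length, Mean, Difference, Difference²). The sum of the squared differences is now a number we can use! This value is called the SUM OF SQUARES.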
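The two steps above (differences that cancel to zero, then squared differences that do not) can be sketched in a few lines of Python. The lengths are hypothetical numbers chosen to keep the arithmetic easy to follow; they are not the pond data.

```python
# Hypothetical gar lengths in cm; not the data from the slides.
lengths = [40.0, 44.0, 46.0, 50.0, 60.0]

mean = sum(lengths) / len(lengths)

# Difference between each length and the sample mean.
differences = [x - mean for x in lengths]

# Positive and negative differences cancel, so this sum is zero.
sum_of_differences = sum(differences)

# Squaring removes the signs; the total is the sum of squares (SOS).
sum_of_squares = sum(d ** 2 for d in differences)

print(f"mean = {mean}")
print(f"differences = {differences}")
print(f"sum of differences = {sum_of_differences}")
print(f"sum of squares = {sum_of_squares}")
```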

Back to Variance: Sum of Squares (SOS) will continue to increase as we increase our sample size. –A sample of 100 replicates that are not highly variable can have a higher SOS than a sample of 10 replicates that are highly variable. To account for sample size, we need to divide SOS by the number of samples minus one (n – 1). –We’ll get to the reason for using (n – 1) instead of n later.

Calculate Variance (σ²): $\sigma^2 = S^2 = \dfrac{\sum (X_i - X_m)^2}{n - 1}$, where the numerator is the SOS and the denominator is the degrees of freedom. Variance for Pond = $S^2$ = SOS / 9 (n = 10, so n – 1 = 9).
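A minimal Python sketch of this formula follows. The data are hypothetical, `sample_variance` is an illustrative name, and the result is compared with `statistics.variance` from the standard library, which also divides by n – 1.

```python
import statistics

# Hypothetical gar lengths in cm; not the pond data from the slides.
lengths = [40.0, 44.0, 46.0, 50.0, 60.0]


def sample_variance(values):
    """Sum of squares divided by the degrees of freedom (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1)


variance = sample_variance(lengths)

print(f"variance (by hand)    = {variance}")
print(f"variance (statistics) = {statistics.variance(lengths)}")
```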

More on Variance Variance tends to increase as the sample mean increases –For our sample, the largest difference between any point and the mean was 30.8 cm. Imagine measuring a plot of cypress trees. How large of a difference would you expect (if measured in cm)? The variance for the lake sample =
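One way to see how variance depends on the size of the things being measured is to scale a data set. The sketch below uses hypothetical gar lengths and a hypothetical "tree" data set that is simply ten times larger; both are assumptions for illustration only. It also computes the coefficient of variation (SD divided by the mean), one of the statistics listed at the start, which does not change with scale.

```python
import statistics

# Hypothetical lengths in cm; the tree values are the gar values times 10.
gar_cm = [40.0, 44.0, 46.0, 50.0, 60.0]
tree_cm = [x * 10 for x in gar_cm]


def coefficient_of_variation(values):
    """Standard deviation expressed as a fraction of the mean."""
    return statistics.stdev(values) / statistics.mean(values)


var_gar = statistics.variance(gar_cm)
var_tree = statistics.variance(tree_cm)
cv_gar = coefficient_of_variation(gar_cm)
cv_tree = coefficient_of_variation(tree_cm)

# Values 10x larger give a variance 100x larger, but the same CV.
print(f"variance: gar = {var_gar:.1f}, tree = {var_tree:.1f}")
print(f"CV:       gar = {cv_gar:.3f}, tree = {cv_tree:.3f}")
```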

Standard Deviation Calculated as the square root of the variance. –Variance is not a linear distance (we had to square it). Think about the difference in shape of a meter stick versus a square meter. By taking the square root of the variance, we return our index of variability to something that can be placed on a number line.

Calculate SD: For our gar sample, the SD is the square root of the pond variance. –Reported with the mean as: 47.2 ± SD (mean ± SD). Standard Deviation is often abbreviated as σ (sigma) or as SD. SD is a measure, in the same units as the data, that describes the scatter of our data set. –It also tends to increase with the mean.
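As a small sketch of this step, the Python below takes the square root of the variance and formats the result as mean ± SD. The lengths are hypothetical and are not the pond sample.

```python
import math

# Hypothetical gar lengths in cm; not the pond data from the slides.
lengths = [40.0, 44.0, 46.0, 50.0, 60.0]

n = len(lengths)
mean = sum(lengths) / n
variance = sum((x - mean) ** 2 for x in lengths) / (n - 1)

# The square root returns the index of variability to the original units (cm).
sd = math.sqrt(variance)

report = f"{mean:.1f} ± {sd:.1f}"
print(f"mean ± SD = {report} cm")
```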

Standard Error: Calculated as SE = σ / √(n). –Indicates how close we are to estimating the true population mean. –For our pond example (n = 10): SE = SD / √10. –Reported with the mean as 47.2 ± SE (mean ± SE). –Based on the formula, the SE decreases as sample size increases. Why is this not a mathematical artifact, but a true reflection of the population we are studying?
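The formula can be sketched in Python as follows. The data and the helper name `standard_error_from_sd` are assumptions for illustration. The last two lines hold SD fixed at an arbitrary 10 cm to show how SE shrinks as n grows.

```python
import math
import statistics

# Hypothetical gar lengths in cm; not the pond data from the slides.
lengths = [40.0, 44.0, 46.0, 50.0, 60.0]


def standard_error_from_sd(sd, n):
    """SE = SD / sqrt(n)."""
    return sd / math.sqrt(n)


sd = statistics.stdev(lengths)
se = standard_error_from_sd(sd, len(lengths))
print(f"SD = {sd:.2f} cm, SE = {se:.2f} cm")

# Same SD, larger sample: quadrupling n halves the SE.
se_n25 = standard_error_from_sd(10.0, 25)
se_n100 = standard_error_from_sd(10.0, 100)
print(f"SD = 10: SE at n=25 is {se_n25}, SE at n=100 is {se_n100}")
```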

Normal Distribution: Many measured characteristics approximately follow a normal distribution. –For example: height, length, speed, etc. One of the assumptions of the ANOVA test is that the sample data are ‘normally distributed.’

Sample Distribution Approaches Normal Distribution With Sample Size

Sample Size The number of individuals within a population you measure/observe. –Usually impossible to measure the entire population As sample size increases, we get closer to the true population mean. –Remember, when we take a sample we assume it is representative of the population.

Effect of Increasing Sample Size: I measured the length of 100 gar, then calculated SD and SE for the first 10, then for the first 20, and so on in steps of 10 until all 100 individuals were included.

Sample Size

SD = square root of the variance, where $\mathrm{Var} = \sum (X_i - X_m)^2 / (n - 1)$

Sample Size SE = SD / √(n)
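The exercise described above can be imitated with simulated data. This is only a sketch: the 100 "gar lengths" are drawn from a normal distribution with an arbitrary mean of 50 cm and SD of 10 cm (both assumptions, not the author's measurements), and the random seed is fixed only so the run is repeatable.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

# Simulated measurements: 100 lengths from a normal distribution.
# The mean (50 cm) and SD (10 cm) are arbitrary assumptions.
lengths = [random.gauss(50.0, 10.0) for _ in range(100)]

results = []
for n in range(10, 101, 10):
    subset = lengths[:n]            # first 10, first 20, ..., all 100
    sd = statistics.stdev(subset)   # uses n - 1 in the denominator
    se = sd / math.sqrt(n)
    results.append((n, sd, se))

for n, sd, se in results:
    print(f"n = {n:3d}  SD = {sd:5.2f}  SE = {se:5.2f}")
```

In runs like this, SD usually settles near a stable value as n grows, while SE keeps shrinking roughly in proportion to 1/√n.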

MEAN ± CONFIDENCE INTERVAL When a population is sampled, a mean value is determined and serves as the point-estimate for that population. However, we cannot expect our estimate to be the exact mean value for the population. Instead of relying on a single point-estimate, we estimate a range of values, centered around the point-estimate, that probably includes the true population mean. That range of values is called the confidence interval.

Confidence Interval: consists of two numbers (high and low), computed from a sample, that bound an interval estimate of a parameter. With a 95% confidence interval, there is a 5% chance that our interval does not include the true population mean. $\bar{y} \pm t_{\alpha = 0.05} \, (\sigma / \sqrt{n}) = \bar{y} \pm t_{\alpha = 0.05} \, (\mathrm{SE})$, where we use $t_{\alpha = 0.05} = 1.96$.
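A minimal Python sketch of the interval, under the slide's simplification of using 1.96 as the multiplier. The data and the function name `confidence_interval` are hypothetical.

```python
import math
import statistics

# Hypothetical gar lengths in cm; not the pond data from the slides.
lengths = [40.0, 44.0, 46.0, 50.0, 60.0]


def confidence_interval(values, critical_value=1.96):
    """Return (low, high) for mean ± critical_value * SE."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    margin = critical_value * se
    return mean - margin, mean + margin


low, high = confidence_interval(lengths)
print(f"95% CI: {low:.1f} to {high:.1f} cm")
```

The value 1.96 is the large-sample multiplier. For a sample as small as the five values here, the t value for 95% confidence is larger than 1.96, so the `critical_value` argument would need to be changed to get an accurate interval.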