Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009.

Slides:



Advertisements
Similar presentations
Statistical Techniques I EXST7005 Start here Measures of Dispersion.
Advertisements

SUMMARIZING DATA: Measures of variation Measure of Dispersion (variation) is the measure of extent of deviation of individual value from the central value.
BHS Methods in Behavioral Sciences I April 18, 2003 Chapter 4 (Ray) – Descriptive Statistics.
The Basics of Regression continued
Measures of Variability or Dispersion
Biostatistics Unit 2 Descriptive Biostatistics 1.
Measures of Variability
Distributions When comparing two groups of people or things, we can almost never rely on a single comparison Example: Are men taller than women?
2.3. Measures of Dispersion (Variation):
Standard Error for AP Biology
Business Statistics - QBM117 Statistical inference for regression.
Quantitative Genetics
Probability and Statistics in Engineering Philip Bedient, Ph.D.
Central Tendency and Variability
Measures of Variability: Range, Variance, and Standard Deviation
Chapter 4 SUMMARIZING SCORES WITH MEASURES OF VARIABILITY.
July, 2000Guang Jin Statistics in Applied Science and Technology Chapter 4 Summarizing Data.
The Data Analysis Plan. The Overall Data Analysis Plan Purpose: To tell a story. To construct a coherent narrative that explains findings, argues against.
Answering questions about life with statistics ! The results of many investigations in biology are collected as numbers known as _____________________.
STATISTICS: BASICS Aswath Damodaran 1. 2 The role of statistics Aswath Damodaran 2  When you are given lots of data, and especially when that data is.
SECTION 3.2 MEASURES OF SPREAD Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Agronomic Spatial Variability and Resolution What is it? How do we describe it? What does it imply for precision management?
Basic Statistics Measures of Variability The Range Deviation Score The Standard Deviation The Variance.
 IWBAT summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.
JDS Special Program: Pre-training1 Basic Statistics 01 Describing Data.
© Copyright McGraw-Hill CHAPTER 3 Data Description.
Agronomic Spatial Variability and Resolution What is it? How do we describe it? What does it imply for precision management?
Statistics 1 Measures of central tendency and measures of spread.
Measures of Variability Objective: Students should know what a variance and standard deviation are and for what type of data they typically used.
Describing Behavior Chapter 4. Data Analysis Two basic types  Descriptive Summarizes and describes the nature and properties of the data  Inferential.
Chapter 8 Quantitative Data Analysis. Meaningful Information Quantitative Analysis Quantitative analysis Quantitative analysis is a scientific approach.
Skewness & Kurtosis: Reference
TYPES OF STATISTICAL METHODS USED IN PSYCHOLOGY Statistics.
13-1 Copyright  2010 McGraw-Hill Australia Pty Ltd PowerPoint slides to accompany Croucher, Introductory Mathematics and Statistics, 5e Chapter 13 Measures.
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
Measures of Dispersion
Understanding Your Data Set Statistics are used to describe data sets Gives us a metric in place of a graph What are some types of statistics used to describe.
PCB 3043L - General Ecology Data Analysis. PCB 3043L - General Ecology Data Analysis.
PCB 3043L - General Ecology Data Analysis.
Measures of Variation Range Standard Deviation Variance.
Quality Control: Analysis Of Data Pawan Angra MS Division of Laboratory Systems Public Health Practice Program Office Centers for Disease Control and.
1.  In the words of Bowley “Dispersion is the measure of the variation of the items” According to Conar “Dispersion is a measure of the extent to which.
© 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 5. Measuring Dispersion or Spread in a Distribution of Scores.
Variability Introduction to Statistics Chapter 4 Jan 22, 2009 Class #4.
Aron, Aron, & Coups, Statistics for the Behavioral and Social Sciences: A Brief Course (3e), © 2005 Prentice Hall Chapter 2 The Mean, Variance, Standard.
Agronomic Spatial Variability and Resolution What is it? How do we describe it? What does it imply for precision management?
Session 8 MEASURES Of DISPERSION. Learning Objectives  Why do we study dispersion?  Measures of Dispersion Range Standard Deviation Variance.
CHAPTER 2: Basic Summary Statistics
From the population to the sample The sampling distribution FETP India.
Understanding Your Data Set Statistics are used to describe data sets Gives us a metric in place of a graph What are some types of statistics used to describe.
Descriptive Statistics Used in Biology. It is rarely practical for scientists to measure every event or individual in a population. Instead, they typically.
Chapter 6: Random Errors in Chemical Analysis. 6A The nature of random errors Random, or indeterminate, errors can never be totally eliminated and are.
Statistical Analysis IB Topic 1. IB assessment statements:  By the end of this topic, I can …: 1. State that error bars are a graphical representation.
PCB 3043L - General Ecology Data Analysis Organizing an ecological study What is the aim of the study? What is the main question being asked? What are.
Different Types of Data
Standard Error for AP Biology
Descriptive Statistics
PCB 3043L - General Ecology Data Analysis.
Basic Statistics Measures of Variability.
Standard Error for AP Biology
Central Tendency and Variability
Summary descriptive statistics: means and standard deviations:
Stats for AP Biology SLIDE SHOWS MODIFIED FROM:
Construction Engineering 221
How do we categorize and make sense of data?
Summary descriptive statistics: means and standard deviations:
Statistical Analysis IB Topic 1.
Standard Error for AP Biology
CHAPTER 2: Basic Summary Statistics
2.3. Measures of Dispersion (Variation):
Presentation transcript:

Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009

Describing Scientific Data 32.6 cm Length of coho salmon returning to North Creek October 8, 2003 * * pretend data These data need to be included in a report to Pacific Salmon Commission on salmon return in streams of the Lake Washington watershed So, what now? Do we just leave these data as they appear here?

Describing Scientific Data Length of coho salmon returning to North Creek October 8, 2003 We can create a

Describing Scientific Data Length of coho salmon returning to North Creek October 8, 2003 A common mistake This is NOT a frequency distribution

Describing Scientific Data Length of coho salmon returning to North Creek October 8, 2003 * * pretend data These data need to be included in a report to Pacific Salmon Commission on salmon return in streams of the Lake Washington watershed Next steps? What information does a frequency distribution reveal? This is also known as a “Histogram”

Describing Scientific Data Length of coho salmon returning to North Creek October 8, 2003

Describing Scientific Data Mean values can hide important information Valiela (2001) Populations with the same mean value but different frequency distributions Non-normal distribution Normal distribution of data

Describing Scientific Data Normal distributions are desirable and required for many statistical methods. Even simple comparisons of mean values is NOT good if distributions underlying those means are not “normal”. Thus, data with non-normal distributions are usually mathematically transformed to create a normal distribution. Problems with non-normal distributions

Describing Scientific Data Mean values can hide important information Populations with the same mean value but different frequency distributions

Describing Scientific Data Expressing Variation in a Set of Numbers 32.6 cm Range: difference between largest and smallest sample Range = 33.5 Range = 14.1 – 47.6 (or sometimes expressed as both smallest and largest values)

Describing Scientific Data Expressing Variation in a Set of Numbers 32.6 cm Variance & Standard Deviation: single-number expressions of the degree of spread in the data Variance x – x For each value, calculate its deviation from the mean

Describing Scientific Data Expressing Variation in a Set of Numbers 32.6 cm Variance & Standard Deviation: single-number expressions of the degree of spread in the data Variance ( x – x ) 2 Square each deviation to get absolute value of each deviation

Describing Scientific Data Expressing Variation in a Set of Numbers 32.6 cm Variance & Standard Deviation: single-number expressions of the degree of spread in the data Variance ( x – x ) 2 n - 1 Σ Add up all the deviations and divide by the number of values (to get average deviation from the mean)

Describing Scientific Data Expressing Variation in a Set of Numbers Variance : An expression of the mean amount of deviation of the sample points from the mean value Example of 2 data points: 20.0 & Calculate the mean: ( )/2 = Calculate the deviations from the mean: 20.0 – 24.3 = – 24.3 = Square the deviations to remove the signs (negative): (-4.3) 2 = 18.5 (4.3) 2 = Sum the squared deviations: = Divide the sum of squares by the number of samples (minus one*) to standardize the variation per sample: 37.0 / (2-1) = 37.0 * Differs from textbook Variance = 37.0

Describing Scientific Data Expressing Variation in a Set of Numbers Standard Deviation (SD): Also an expression of the mean amount of deviation of the sample points from the mean value Using the square root of the variance places the expression of variation back into the same units as the original measured values (37.0) 0.5 = 6.1 SD = 6.1 SD = √ variance

Describing Scientific Data Length of coho salmon returning to North Creek October 8, 2003 SD captures the degree of spread in the data 32.8 ± 8.9 Mean Standard Deviation Mean Conventional expression Mean ± 1 SD

CV = (variance / x) * 100 Describing Scientific Data Expressing Variation in a Set of Numbers Coefficient of Variation (CV): An expression of variation present relative to the size of the mean value Particularly useful when comparing the variation in means that differ considerably in magnitude or comparing the degree of variation of measurements with different units Sometimes presented as SD / x

An example of how the CV is useful Describing Scientific Data Coefficient of Variation (CV): An expression of variation present relative to the mean value Soil moisture in two sites SiteMeanSD CV

Describing Scientific Data The Bottom Line on expressing variation ExpressionWhat it isWhen to use it Range Spread of data When extreme absolute values are of importance Variance Average deviation of samples from the mean Often an intermediate calculation – not usually presented Standard Deviation Average deviation of samples from the mean on scale of original values Simple & standard presentation of variation; okay when mean values for comparison are similar in magnitude Coefficient of Variation Average deviation of samples from the mean on scale of original values relative to size of the mean Use to compare amount of variation among samples whose mean values differ in magnitude Bottom Line: use the most appropriate expression; not just what everyone else uses!

Does variation only obscure the true values we seek?

The importance of knowing the degree of variation Why are measures of variation important?

The importance of knowing the degree of variation Spawning site density (# / meter) North Creek0.40 ± 5.2 a Bear Creek0.42 ± 0.2 a Density of sockeye salmon spawning sites along two area creeks (mean ± SD) What can you conclude from the means? What can you conclude from the SDs?

The importance of knowing the degree of variation Annual temperature (°F) Darrington (western WA) 49.0 ± 11.2 a Leavenworth (eastern WA) 48.4 ± 24.7 a Mean annual air temperature at sites in eastern and western WA at 1,000 feet elevation (mean ± SD) What can you conclude from the means? What can you conclude from the SDs?

Look before you leap: The value of examining raw data

Types of variables? Wetland # Plant Species Mean WLF (cm) How might we describe these data? Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?

Puget Sound Wetlands (Cooke & Azous 2001) Water Level Fluctuation # Plant species Scatter plots help you to assess the nature of a relationship between variables The value of examining raw data Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?

Puget Sound Wetlands Water Level Fluctuation # Plant species What might you do to describe this relationship? The value of examining raw data Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?

Puget Sound Wetlands Water Level Fluctuation # Plant species So – does this capture the nature of this relationship? Group work time to devise strategy for expressing relationship The value of examining raw data Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?

Puget Sound Wetlands Water Level Fluctuation # Plant species Describing Scientific Data

BOTTOM LINE Puget Sound Wetlands Water Level Fluctuation # Plant species Describing Scientific Data

Sampling the World

Rarely can we measure everything – Instead we usually SAMPLE the world The “entire world” is called the “population” What we actually measure is called the “sample”

QUESTION Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples How would you study this?

Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples 1.

Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples 2.

Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples 3. A major research question: How many samples do you need to accurately describe (1) each population and (2) their possible difference? Depends on sample variability (standard error – see textbook pages ) and degree of difference between populations Sample size calculations before study are important (statistics course for more detail)

Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples 4. Samples are taken that are representative of the population

Samples are taken representative of the population What criteria are used to locate the samples? Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples

Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples Samples are taken representative of the population What criteria are used to locate the samples?

Samples are taken representative of the population What criteria are used to locate the samples? Randomization Commonly, but not in all studies (statistics course) Bottom Line: sampling scheme needs to be developed INTENTIONALLY. It must match the question & situation, not what is “usually done”. Stratification Accounts for uncontrolled variables (e.g., elevation, aspect) Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades? Populations & Samples

Describing & Taking Data Conventional WisdomBest Practices Examine the data Compare averagesLook at raw data as well as summaries Summarize the data Take averagesFrequency distributions, Means, etc. Describe variation Use Standard DeviationUse measure appropriate to need Use variation data As an adjunct to means (testing for differences) Use for understanding system as well as for statistical tests Describing relationships Fit a line to dataUse approach appropriate to need; examine raw data Creating a sampling scheme Take random samplesUse approach appropriate to need SOME BOTTOM LINES FOR THE LAST 2 DAYS