Environmental Data Analysis with MatLab Lecture 3: Probability and Measurement Error.

SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectra
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

purpose of the lecture: apply principles of probability theory to data analysis, and especially use them to quantify measurement error

Error, an unavoidable aspect of measurement, is best understood using the ideas of probability.

random variable, d: it has no fixed value until it is realized. Beforehand it is indeterminate; each realization yields a specific value, e.g. d = 1.04 or d = 0.98.

random variables have systematics: a tendency to take on some values more often than others

example: d = number of deuterium atoms in methane. Replacing the hydrogens of CH4 with deuterium one at a time gives the molecules CH4, CH3D, CH2D2, CHD3 and CD4, corresponding to d = 0, 1, 2, 3, 4.

the tendency of a random variable to take on a given value, d, is described by a probability, P(d). P(d) is measured in percent, in the range 0% to 100%, or as a fraction in the range 0 to 1.

four different ways to visualize probabilities (here, as a table):

d    P(d)
0    10%
1    30%
2    40%
3    15%
4     5%
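As a quick check of the table (a Python sketch, not part of the lecture), the probabilities sum to 1 and give a mean of 1.75 deuterium atoms:

```python
# The deuterium table as a discrete probability distribution
d = [0, 1, 2, 3, 4]
P = [0.10, 0.30, 0.40, 0.15, 0.05]

total = sum(P)                               # must be 1 (100%)
mean = sum(di * Pi for di, Pi in zip(d, P))  # expected number of D atoms

print(round(total, 6), round(mean, 6))  # 1.0 1.75
```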

probabilities must sum to 100%: the probability that d is something is 100%

continuous variables can take fractional values; for example, the depth, d, of a fish in a pond might be d = 2.37

The area, A, under the probability density function, p(d), between depths d1 and d2 quantifies the probability that the fish is between depths d1 and d2.

an integral is used to determine area, and thus probability; the probability that d is between d1 and d2 is
P(d1 < d < d2) = ∫ p(d) dd, integrated from d1 to d2

the probability that the fish is at some depth in the pond is 100% or unity; that is, the probability that d is between its minimum and maximum bounds, dmin and dmax, is
∫ p(d) dd = 1, integrated from dmin to dmax
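The integrals can be approximated on a grid (a Python sketch, not from the lecture, using the pond-depth picture with a uniform p.d.f. on 0 ≤ d ≤ 5 as an example):

```python
# Grid-based version of P(d1 < d < d2) = integral of p(d) from d1 to d2,
# for a uniform p.d.f. p(d) = 1/5 on depths 0 <= d <= 5
Dd = 0.001                         # grid spacing
d = [i * Dd for i in range(5000)]  # depths from 0 to 5
p = [0.2] * len(d)                 # uniform p.d.f., height 1/(5 - 0)

d1, d2 = 1.0, 2.0
prob = sum(pi * Dd for di, pi in zip(d, p) if d1 <= di < d2)
total = sum(pi * Dd for pi in p)   # area over the full range

print(round(prob, 3))   # 0.2
print(round(total, 3))  # 1.0
```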

How do these two p.d.f.’s differ?

Summarizing a probability density function:
typical value, the “center of the p.d.f.”
amount of scatter around the typical value, the “width of the p.d.f.”

several possible choices of a “typical value”

One choice of the ‘typical value’ is the mode or maximum likelihood point, d_mode. It is the d at the peak of the p.d.f.

Another choice of the ‘typical value’ is the median, d_median. It is the d that divides the p.d.f. into two pieces, each with 50% of the total area.

A third choice of the ‘typical value’ is the mean or expected value, d_mean. It is a generalization of the usual definition of the mean of a list of numbers.

step 1: usual formula for the mean of N data: d̄ ≈ (1/N) Σ_i d_i
step 2: replace the data with its histogram (bin centers d_s, counts N_s): d̄ ≈ Σ_s d_s N_s / N
step 3: replace the histogram with the probability distribution, since N_s/N ≈ P(d_s): d̄ ≈ Σ_s d_s P(d_s)

If the data are continuous, use the analogous formula containing an integral: d̄ = ∫ d p(d) dd

MatLab scripts for mode, median and mean:

% mode: d at the peak of p(d)
[pmax, i] = max(p);
themode = d(i);

% median: d where the cumulative area first exceeds 0.5
pc = Dd*cumsum(p);
for i=[1:length(p)]
    if( pc(i) > 0.5 )
        themedian = d(i);
        break;
    end
end

% mean: Dd * sum(d .* p)
themean = Dd*sum(d.*p);
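The same recipes can be transcribed into Python (a sketch for checking the logic, not part of the lecture; variable names mirror the MatLab script). Applied to a discretized p.d.f. p(d) proportional to d on 0 ≤ d ≤ 10, it recovers the known mode 10, median 10/√2 ≈ 7.07 and mean 20/3 ≈ 6.67:

```python
# Python sketch of the mode/median/mean recipes, applied to a discretized
# p.d.f. p(d) proportional to d on the grid 0 <= d <= 10 with spacing Dd
Dd = 0.01
n = 1001
d = [i * Dd for i in range(n)]   # grid from 0 to 10
p = [di for di in d]             # unnormalized p(d) ~ d
area = Dd * sum(p)
p = [pi / area for pi in p]      # normalize so Dd*sum(p) = 1

# mode: d at the peak of p(d)
themode = d[max(range(n), key=lambda i: p[i])]

# median: d where the cumulative area first exceeds 0.5
pc = 0.0
themedian = None
for i in range(n):
    pc += Dd * p[i]
    if pc > 0.5:
        themedian = d[i]
        break

# mean: Dd * sum(d .* p)
themean = Dd * sum(di * pi for di, pi in zip(d, p))
print(themode, round(themedian, 2), round(themean, 2))  # 10.0 7.07 6.67
```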

several possible ways to quantify width

One possible measure of width is the length, d_50, of the d-axis over which 50% of the area lies: the area between d_typical − d_50/2 and d_typical + d_50/2 is 50%. This measure is seldom used.

A different approach to quantifying the width of p(d): define a function that grows away from the typical value, q(d) = (d − d_typical)². Then the area under q(d)p(d) is small if most of the area of p(d) is near d_typical (a narrow p.d.f.), and large if most of the area is far from d_typical (a wide p.d.f.). So quantify width as the area under q(d)p(d).

variance: use the mean, d̄, for d_typical, giving
σ_d² = ∫ (d − d̄)² p(d) dd
the width is actually the square root of the variance, that is, σ_d

visualization of a variance calculation: plot p(d), q(d) and their product q(d)p(d) between dmin and dmax, then compute the area under q(d)p(d)

MatLab script for mean and variance:

% mean
dbar = Dd*sum(d.*p);
% variance: area under q(d)*p(d), with q = (d - dbar).^2
q = (d-dbar).^2;
sigma2 = Dd*sum(q.*p);
sigma = sqrt(sigma2);
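As a check (a Python sketch, not from the lecture): for a uniform p.d.f. on 0 ≤ d ≤ 1 this recipe should give the textbook result σ_d² = (dmax − dmin)²/12 = 1/12 ≈ 0.0833:

```python
# Variance recipe applied to a uniform p.d.f. on [0, 1],
# discretized at the midpoints of 1000 bins of width Dd
Dd = 0.001
n = 1000
d = [(i + 0.5) * Dd for i in range(n)]  # bin midpoints on [0, 1]
p = [1.0] * n                           # uniform p.d.f., Dd*sum(p) = 1

dbar = Dd * sum(di * pi for di, pi in zip(d, p))    # mean
q = [(di - dbar) ** 2 for di in d]                  # q(d) = (d - dbar)^2
sigma2 = Dd * sum(qi * pi for qi, pi in zip(q, p))  # variance
print(round(dbar, 4), round(sigma2, 4))  # 0.5 0.0833
```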

two important probability density functions: the uniform p.d.f. and the Normal p.d.f.

uniform p.d.f.: a box-shaped function; the probability is the same everywhere in the range of possible values, p(d) = 1/(dmax − dmin) on dmin ≤ d ≤ dmax

Normal p.d.f.: a bell-shaped function with large probability near the mean, d̄; the variance is σ²
p(d) = 1/(√(2π) σ) exp( −(d − d̄)² / 2σ² )

exemplary Normal p.d.f.’s: one set with the same variance but different means; another with the same mean but different variances

for a Normal p.d.f., the probability that d falls within d̄ ± nσ is about 68.27% for n = 1, 95.45% for n = 2 and 99.73% for n = 3
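The probability that a Normal random variable falls within ±nσ of its mean can be computed numerically (a Python sketch, not from the lecture), recovering the familiar 68/95/99.7 percentages:

```python
import math

# Sum p(d)*Dd over dbar - n*sigma < d < dbar + n*sigma (midpoint rule)
dbar, sigma, Dd = 0.0, 1.0, 1e-4

def normal_p(d):
    # Normal p.d.f. with mean dbar and variance sigma^2
    return math.exp(-(d - dbar) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

percent = {}
for n in (1, 2, 3):
    npts = n * 20000  # bins of width Dd covering dbar +/- n*sigma
    mid = [dbar - n * sigma + (i + 0.5) * Dd for i in range(npts)]
    percent[n] = 100.0 * sum(normal_p(di) * Dd for di in mid)

print({n: round(v, 2) for n, v in percent.items()})  # {1: 68.27, 2: 95.45, 3: 99.73}
```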

functions of random variables: data with measurement error → data analysis process → inferences with uncertainty

simple example: one datum, d, with a uniform p.d.f. on 0 < d < 1 (data with measurement error) → data analysis process, m = d² → one model parameter, m (inference with uncertainty)

functions of random variables: given p(d), with m = d², what is p(m)?

use the chain rule and the definition of probability to deduce the relationship between p(d) and p(m):
p(m) = p[d(m)] |∂d/∂m|
the absolute value is added to handle the case where the direction of integration reverses, that is, m2 < m1

with m = d² and d = m^(1/2):
intervals: d = 0 corresponds to m = 0; d = 1 corresponds to m = 1
p.d.f.: p(d) = 1, so p[d(m)] = 1
derivative: ∂d/∂m = (1/2) m^(−1/2)
so: p(m) = (1/2) m^(−1/2) on the interval 0 < m < 1

note that p(d) is constant while p(m) is concentrated near m = 0
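The derived p(m) = (1/2) m^(−1/2) can be checked by simulation (a Python sketch, not from the lecture). Integrating p(m) from 0 to 1/4 gives m^(1/2) evaluated at 1/4, i.e. 1/2, so half the realizations of m should pile up below 0.25:

```python
import random

# Draw many d's from the uniform p.d.f. on 0 < d < 1 and form m = d^2
random.seed(1)
N = 100000
m = [random.random() ** 2 for _ in range(N)]

# Fraction of m's below 0.25; the derived p(m) predicts 0.5
frac_below = sum(1 for mi in m if mi < 0.25) / N
print(round(frac_below, 1))  # 0.5
```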

mean and variance of linear functions of random variables: given that p(d) has mean d̄ and variance σ_d², with m = cd, what are the mean m̄ and variance σ_m² of p(m)?

formula for the mean:
m̄ = ∫ m p(m) dm = ∫ cd p(d) dd = c ∫ d p(d) dd = c d̄
the mean of m is c times the mean of d; the result does not require knowledge of p(d)

formula for the variance:
σ_m² = ∫ (m − m̄)² p(m) dm = ∫ (cd − cd̄)² p(d) dd = c² ∫ (d − d̄)² p(d) dd = c² σ_d²
the variance of m is c² times the variance of d
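Both linear-function rules can be verified numerically (a Python sketch, not from the lecture), using the discrete deuterium distribution from earlier as p(d), with c = 3 chosen arbitrarily:

```python
# p(d): the discrete deuterium distribution; m = c*d with c = 3 (arbitrary)
d = [0, 1, 2, 3, 4]
P = [0.10, 0.30, 0.40, 0.15, 0.05]
c = 3.0

dbar = sum(di * Pi for di, Pi in zip(d, P))                 # mean of d
var_d = sum((di - dbar) ** 2 * Pi for di, Pi in zip(d, P))  # variance of d

m = [c * di for di in d]                                    # m = c*d, same probabilities
mbar = sum(mi * Pi for mi, Pi in zip(m, P))                 # mean of m
var_m = sum((mi - mbar) ** 2 * Pi for mi, Pi in zip(m, P))  # variance of m

print(round(mbar / dbar, 6), round(var_m / var_d, 6))  # 3.0 9.0
```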

What’s Missing? So far, we have only the tools to study a single inference made from a single datum. That’s not realistic. In the next lecture, we will develop the tools to handle many inferences drawn from many data.