Likelihood Methods in Ecology
November 16th – 20th, 2009, Millbrook, NY
Instructors: Charles Canham and María Uriarte
Teaching Assistant: Liza Comita

Daily Schedule
- Morning
  - 8:30 – 9:20  Lecture
  - 9:20 – 10:10  Case Study or Discussion
  - 10:30 – 12:00  Lab
- Lunch: 12:00 – 1:30 (in this room)
- Afternoon
  - 1:30 – 2:20  Lecture
  - 2:20 – 3:10  Lab
  - 3:30 – 5:00  Lab

Course Outline: Statistical Inference Using Likelihood
- Principles and practice of maximum likelihood estimation
- Know your data – choosing appropriate likelihood functions
- Formulate statistical models as alternate hypotheses
- Find the ML estimates of the parameters of your models
- Compare alternate models and choose the most parsimonious
- Evaluate individual models
- Advanced topics

Likelihood is much more than a statistical method... (it can completely change the way you ask and answer questions…)

Lecture 1: An Introduction to Likelihood Estimation
- Probability and probability density functions
- Maximum likelihood estimates (versus traditional "method of moments" estimates)
- Statistical inference
- Classical "frequentist" statistics: limitations and mental gyrations...
- The "likelihood" alternative: basic principles and definitions
- Model comparison as a generalization of hypothesis testing

A simple definition of probability for discrete events: "...the ratio of the number of events of type A to the total number of all possible events (outcomes)..."
The enumeration of all possible outcomes is called the sample space (S). If there are n possible outcomes in the sample space S, and m of those are favorable to event A, then the probability of event A is given as P{A} = m/n.

Probability defined more generally...
- Consider an outcome X from some process that has a set of possible outcomes S:
  - If X and S are discrete, then P{X} = X/S (the number of outcomes favorable to X divided by the total number of outcomes in S)
  - If X is continuous, then the probability has to be defined in the limit of a shrinking interval: P{x ≤ X ≤ x + Δx} = g(x) Δx as Δx → 0, where g(x) is a probability density function (PDF)

The Normal Probability Density Function (PDF)

g(x) = 1 / (σ √(2π)) · exp( -(x - μ)² / (2σ²) )

where μ = mean and σ² = variance.

Properties of a PDF:
(1) g(x) ≥ 0 for all x
(2) ∫ g(x) dx = 1 (integrated over all possible values of x)
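As a quick illustration of these properties, here is a minimal R sketch (the mean and sd values are arbitrary, not from the slides) that evaluates the normal PDF and checks that it integrates to 1:

# Evaluate the normal PDF at a few points (mean = 10, sd = 2 chosen arbitrarily)
g <- function(x) dnorm(x, mean = 10, sd = 2)
g(c(8, 10, 12))

# The density is non-negative everywhere and integrates to 1
integrate(g, lower = -Inf, upper = Inf)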

Common PDFs...
- For continuous data: Normal, Lognormal, Gamma
- For discrete data: Poisson, Binomial, Multinomial, Negative Binomial

See McLaughlin (1993) "A compendium of common probability distributions" in the reading list.
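Each of these distributions has a built-in density function in R that can later serve as a likelihood function. A minimal sketch evaluating a few of them (the data values and parameter values below are placeholders, not from the course examples):

# Continuous data: density of a single observation
dnorm(2.0, mean = 0, sd = 1)          # Normal
dlnorm(2.0, meanlog = 0, sdlog = 1)   # Lognormal
dgamma(2.0, shape = 2, rate = 1)      # Gamma

# Discrete data: probability of a single count (or vector of counts)
dpois(3, lambda = 2)                             # Poisson
dbinom(3, size = 10, prob = 0.3)                 # Binomial
dmultinom(c(2, 3, 5), prob = c(0.2, 0.3, 0.5))   # Multinomial
dnbinom(3, size = 2, mu = 2)                     # Negative binomial ("mu" parameterization)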

Why are PDFs important? Answer: because they are used to calculate likelihood… (And in that case, they are called “likelihood functions”)

Statistical "Estimators"
A statistical estimator is a function applied to a sample of data in order to estimate an unknown population parameter (an "estimate" is simply the result of applying an "estimator" to a particular sample).

Properties of Estimators
Some desirable properties of "point estimators" (functions used to estimate a fixed parameter):
- Bias: if the average error is zero, the estimator is unbiased
- Efficiency: the estimator with the minimum variance is the most efficient (note: the most efficient estimator is often biased)
- Consistency: as sample size increases, the probability that the estimate is close to the true parameter increases
- Asymptotic normality: a consistent estimator whose distribution around the true parameter θ approaches a normal distribution, with standard deviation shrinking in proportion to 1/√n as the sample size n grows
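For intuition, a small simulation sketch (population values chosen arbitrarily) showing unbiasedness, consistency, and the 1/√n shrinkage for the sample mean as an estimator of the population mean:

# Simulate many samples and look at the behavior of the sample mean
set.seed(1)
true_mean <- 5
for (n in c(10, 100, 1000)) {
  means <- replicate(2000, mean(rnorm(n, mean = true_mean, sd = 2)))
  cat("n =", n,
      " mean of estimates =", round(mean(means), 3),
      " sd of estimates =", round(sd(means), 4), "\n")
}
# The estimates center on the true mean (unbiased), and their spread
# shrinks roughly in proportion to 1/sqrt(n) (consistency, asymptotic normality)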

Maximum likelihood (ML) estimates versus method of moments (MOM) estimates
Bottom line: MOM was born in the era before computers and was good enough; ML needs computing power, but its estimates have more desirable properties…

Doing it MOM's way: Central Moments
The method of moments estimates parameters directly from the sample moments, e.g.:
- First moment (the sample mean): x̄ = (1/n) Σ xᵢ
- Second central moment (the sample variance): m₂ = (1/n) Σ (xᵢ - x̄)²
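A minimal sketch of these method-of-moments calculations in R (the observations below are made up for illustration):

# Method-of-moments calculations on a sample
x <- c(4.1, 5.3, 3.8, 6.2, 5.0, 4.7)   # made-up observations
n <- length(x)

m1 <- sum(x) / n                # first moment: the sample mean
m2 <- sum((x - m1)^2) / n       # second central moment: note it divides by n, not n - 1
c(mean = m1, second_central_moment = m2)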

What's wrong with MOM's way?
- Nothing, if all you are interested in is calculating properties of your sample…
- But MOM's formulas are generally not the best way¹ to infer the statistical properties of the population from which the sample was drawn…
  For example: population variance (because the second central moment is a biased underestimate of the population variance)

¹ …in the formal terms of bias, efficiency, consistency, and asymptotic normality
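A small simulation sketch of that bias (population values chosen arbitrarily): the second central moment, which divides by n, systematically underestimates the population variance, while the usual n − 1 correction does not:

# Compare the MOM variance (divide by n) with the usual unbiased estimator (n - 1)
set.seed(42)
true_var <- 4
n <- 10
mom_var <- replicate(5000, {
  x <- rnorm(n, mean = 0, sd = sqrt(true_var))
  mean((x - mean(x))^2)                  # second central moment
})
unbiased_var <- replicate(5000, var(rnorm(n, mean = 0, sd = sqrt(true_var))))

mean(mom_var)        # noticeably below the true variance of 4 (biased low)
mean(unbiased_var)   # close to 4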

The Maximum Likelihood alternative…
Going back to PDFs: in plain language, a PDF allows you to calculate the probability that an observation will take on a given value (x), given the underlying (true?) parameters of the population.

But there's a problem…
- The PDF defines the probability of observing an outcome (x), given that you already know the true population parameter (θ)
- But we want to generate an estimate of θ, given our data (x)
- And, unfortunately, the two are not identical: P(x | θ) ≠ P(θ | x)

Fisher and the concept of "Likelihood"...
The "Likelihood Principle", in plain English: "the likelihood (L) of the parameter estimates (θ), given a sample (x), is proportional to the probability of observing the data, given the parameters..."

L(θ | x) ∝ P(x | θ)

{and this probability is something we can calculate, using the appropriate underlying probability model (i.e. a PDF)}

R. A. Fisher (1890–1962)
"Likelihood and Probability in R. A. Fisher's Statistical Methods for Research Workers" (John Aldrich): a good summary of the evolution of Fisher's ideas on probability, likelihood, and inference… Contains links to PDFs of Fisher's early papers… A second page shows the evolution of his ideas through changes in successive editions of Fisher's books…
b+lik.htm
[Photo: Fisher at age 22]

Calculating Likelihood and Log-Likelihood for Datasets
From basic probability theory: if two events (A and B) are independent, then P(A,B) = P(A)P(B).
More generally, for i = 1..n independent observations and a vector X of observations (xᵢ):

  L(θ | X) = ∏ᵢ g(xᵢ | θ)

where g is the appropriate PDF. But logarithms are easier to work with, so...

  ln L(θ | X) = Σᵢ ln g(xᵢ | θ)
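A minimal R sketch of the same idea (the data below are made up): the product of the individual probabilities and the exponential of the summed log-probabilities agree, and the log scale is numerically safer as the dataset grows:

# Likelihood of a small binomial dataset, two ways
x <- c(3, 5, 4)           # made-up successes
N <- c(10, 10, 10)        # trials
p <- 0.4                  # candidate parameter value

prod(dbinom(x, N, p))                   # likelihood: product of probabilities
exp(sum(dbinom(x, N, p, log = TRUE)))   # same value via the summed log-likelihood

# With many observations the raw product underflows toward zero,
# but the summed log-likelihood stays a usable (negative) number
sum(dbinom(rep(x, 200), rep(N, 200), p, log = TRUE))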

Likelihood "Surfaces"
The variation in likelihood across a set of candidate parameter values defines a likelihood "surface"... For a model with just one parameter, the surface is simply a curve (aka a "likelihood profile").
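With two parameters the surface becomes a true surface. A hedged sketch (made-up normal data; the grid limits are arbitrary) that maps the log-likelihood over a grid of mean and standard deviation values and draws it as contours:

# Log-likelihood surface for the mean and sd of a normal model
set.seed(7)
obs <- rnorm(30, mean = 12, sd = 3)     # made-up data

mu_grid    <- seq(10, 14, length.out = 80)
sigma_grid <- seq(1.5, 5, length.out = 80)
loglik <- outer(mu_grid, sigma_grid,
                Vectorize(function(mu, sigma) sum(dnorm(obs, mu, sigma, log = TRUE))))

# Contours of the surface; the peak is the joint ML estimate
contour(mu_grid, sigma_grid, loglik, nlevels = 30, xlab = "mean", ylab = "sd")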

“Support” and “Support Limits” Log-likelihood = “Support” (Edwards 1992)

A (somewhat trivial) example
MOM vs ML estimates of the probability of survival for a population:
- Data: a quadrat in which 16 of 20 seedlings survived during a census interval. (Note that in this case the quadrat is the unit of observation, so sample size = 1.)
- i.e. given N = 20 and x = 16, what is p?

# Evaluate the binomial likelihood over a grid of candidate values of p
x <- seq(0, 1, 0.005)
y <- dbinom(16, 20, x)
plot(x, y)
# Value of p that maximizes the likelihood
x[which.max(y)]

A more realistic example

# Create some data (5 quadrats)
N <- c(11, 14, 8, 22, 50)
x <- c(8, 7, 5, 17, 35)

# Calculate the log-likelihood for each candidate probability of survival
p <- seq(0, 1, 0.005)
log_likelihood <- rep(0, length(p))
for (i in 1:length(p)) {
  log_likelihood[i] <- sum(log(dbinom(x, N, p[i])))
}

# Plot the likelihood profile
plot(p, log_likelihood)

# What probability of survival maximizes the log-likelihood?
p[which.max(log_likelihood)]

# How does this compare to the average across the 5 quadrats?
mean(x/N)    # 0.665
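Tying this back to the earlier slide on "support": a sketch continuing the code above that reports the MLE and the range of p whose log-likelihood lies within 2 units of the maximum (the 2-unit cutoff is a common convention for support limits, assumed here rather than taken from the slides):

# Maximum-likelihood estimate and a 2-unit support interval for p
mle_index <- which.max(log_likelihood)
p[mle_index]                                      # MLE of survival probability

within_2_units <- p[log_likelihood >= max(log_likelihood) - 2]
range(within_2_units)                             # approximate support limits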

Focus in on the MLE…

# What is the log-likelihood at the MLE?
max(log_likelihood)

Things to note about log-likelihoods:
- They should always be negative! (if not, you have a problem with your likelihood function; with discrete data each term is a probability ≤ 1, so its log is ≤ 0)
- The absolute magnitude of the log-likelihood increases as sample size increases
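A quick sketch of that second point, reusing the quadrat data above: evaluating the same parameter value on a duplicated dataset roughly doubles the magnitude of the log-likelihood:

# Log-likelihood magnitude grows with sample size
p_hat <- p[which.max(log_likelihood)]
sum(dbinom(x, N, p_hat, log = TRUE))                  # the original 5 quadrats
sum(dbinom(rep(x, 2), rep(N, 2), p_hat, log = TRUE))  # same data duplicated: ~2x the magnitude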

An example with continuous data…

The normal PDF:  g(x) = 1 / (σ √(2π)) · exp( -(x - μ)² / (2σ²) )
where x = observed value, μ = mean, and σ² = variance.

In R: dnorm(x, mean = 0, sd = 1, log = FALSE)

> dnorm(2, 2.5, 1)
[1] 0.3520653
> dnorm(2, 2.5, 1, log = TRUE)
[1] -1.043939

Problem: now there are TWO unknowns needed to calculate the likelihood (the mean and the variance)!
Solution: treat the variance just like another parameter in the model, and find the ML estimate of the variance just like you would any other parameter… (this is exactly what you'll do in the lab this morning…)
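A hedged sketch of that solution (made-up data; not the lab's actual code), treating the mean and standard deviation as two free parameters and letting a numerical optimizer find the joint ML estimates:

# Joint ML estimation of the mean and sd of a normal model
set.seed(123)
obs <- rnorm(50, mean = 2.5, sd = 1.2)     # made-up data

# Negative log-likelihood as a function of the parameter vector c(mean, sd)
negloglik <- function(par) {
  -sum(dnorm(obs, mean = par[1], sd = par[2], log = TRUE))
}

# Numerical minimization from a rough starting guess; the sd is kept positive
fit <- optim(par = c(0, 1), fn = negloglik, method = "L-BFGS-B", lower = c(-Inf, 1e-6))
fit$par      # ML estimates of the mean and sd
-fit$value   # maximized log-likelihood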