Fitting models to data – IV (Yet more on Maximum Likelihood Estimation). Fish 458, Lecture 11

The Poisson Distribution. The density function is
\[ \Pr(k \mid r, t) = \frac{(rt)^{k} e^{-rt}}{k!} \]
where rt is the expected number of events (r is a rate and t is the time or effort over which events are counted), and k is the number of discrete events (count data). The Poisson distribution has only one parameter, rt, which is both the mean and the variance. In practice, however, the variance of count data is often larger than the Poisson model implies, so use this model with care – better still, consider the negative binomial distribution first!
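As a minimal sketch of the density function above (Python with numpy/scipy; the rate r and time t below are made-up values, not data from the lecture):

```python
import numpy as np
from math import factorial
from scipy.stats import poisson

r, t = 0.5, 2.0          # hypothetical capture rate and observation time (made-up values)
mu = r * t               # the Poisson mean (and variance) is rt
k = np.arange(0, 6)      # a few possible counts

# Evaluate the density function by hand and with scipy; the two agree
pmf_manual = np.array([mu**int(ki) * np.exp(-mu) / factorial(int(ki)) for ki in k])
pmf_scipy = poisson.pmf(k, mu)
print(np.allclose(pmf_manual, pmf_scipy))   # True
print(poisson.mean(mu), poisson.var(mu))    # both equal rt = 1.0
```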

Poisson Model – Example-I. 100 longline sets are observed, and the number of seabirds captured on each set is recorded. What is the rate (number per set) at which seabirds are captured?

Poisson Model – Example-II. The log-likelihood function (after removal of constants that do not depend on r) is
\[ \ell(r) = \sum_{i} \left(k_i \ln r - r\right) = \ln r \sum_{i} k_i - n\,r \]
where k_i is the number of seabirds captured on set i and n = 100 is the number of sets (one unit of effort per observation). This function is maximized at r = 0.69. How else could we have obtained the same estimate for r?
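A sketch of this maximization in Python. The counts below are hypothetical, chosen only so that their mean is 0.69 to match the slide; the original data table is not reproduced in the transcript:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical seabird counts for 100 longline sets (48 sets with 0 birds, 38 with 1, ...)
counts = np.repeat([0, 1, 2, 3], [48, 38, 11, 3])

def negloglik(r):
    # Poisson negative log-likelihood with the constant log(k!) terms dropped
    return -(np.log(r) * counts.sum() - len(counts) * r)

fit = minimize_scalar(negloglik, bounds=(1e-6, 10), method="bounded")
print(fit.x)           # approximately 0.69
print(counts.mean())   # 0.69 -- the MLE equals the sample mean
```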

The Negative Binomial Distribution-I. The negative binomial distribution extends the Poisson distribution by allowing the rate parameter to be a (gamma-distributed) random variable. Written in terms of its mean, the density function is
\[ \Pr(x \mid R, k) = \frac{\Gamma(k+x)}{\Gamma(k)\,\Gamma(x+1)}\left(\frac{k}{k+R}\right)^{k}\left(\frac{R}{k+R}\right)^{x} \]
where R is the expected value of the observations (which may be discrete or continuous), x is the observed value, and k is an "overdispersion" parameter.

The Negative Binomial Distribution-II. The mean of the negative binomial distribution is R. Its variance is
\[ \operatorname{Var}(x) = R + \frac{R^{2}}{k} \]
so the variance always exceeds the mean. The negative binomial distribution collapses to the Poisson distribution as \(k \rightarrow \infty\).
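A small sketch (Python/scipy, with hypothetical values of R and k) showing how this mean-and-overdispersion parameterization maps onto scipy's (n, p) parameterization, and that the variance is R + R^2/k:

```python
from scipy.stats import nbinom

R, k = 2.5, 3.0            # hypothetical mean and overdispersion parameter
n, p = k, k / (k + R)      # scipy's parameterization: n = k, p = k / (k + R)

print(nbinom.mean(n, p))   # 2.5       -> the mean is R
print(nbinom.var(n, p))    # 4.583...  -> equals R + R**2 / k
print(R + R**2 / k)        # same value
# As k becomes very large the variance approaches R, i.e. the Poisson limit
print(nbinom.var(1e6, 1e6 / (1e6 + R)))   # ~2.5
```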

The Negative Binomial Distribution-III

The Negative Binomial Distribution-IV. Consider the case in which we monitor the catch of a given species (in numbers) as a function of fishing effort. If captures occur randomly over time, we would expect the catch to be Poisson-distributed, with mean (and variance) equal to the product of the fishing duration and the capture rate. For this problem we fit both the Poisson model and the negative binomial model (a sketch of such a fit follows below).
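A hedged sketch of these two fits in Python. The effort values and catches below are simulated (not the data used in the lecture), and scipy's nbinom is reparameterized in terms of the mean r*t and the overdispersion parameter k:

```python
import numpy as np
from scipy.stats import poisson, nbinom
from scipy.optimize import minimize

rng = np.random.default_rng(1)
effort = rng.uniform(1, 10, size=50)     # simulated fishing durations
catch = rng.poisson(0.8 * effort)        # simulated catches (true capture rate 0.8)

def nll_poisson(params):
    r, = params
    return -poisson.logpmf(catch, r * effort).sum()

def nll_negbin(params):
    r, k = params
    mu = r * effort
    return -nbinom.logpmf(catch, k, k / (k + mu)).sum()

fit_p = minimize(nll_poisson, x0=[1.0], bounds=[(1e-6, None)])
fit_nb = minimize(nll_negbin, x0=[1.0, 5.0], bounds=[(1e-6, None), (1e-6, None)])
print(fit_p.x)    # estimated capture rate under the Poisson model
print(fit_nb.x)   # estimated rate and overdispersion k under the negative binomial
```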

The Negative Binomial Distribution-V. The estimated k is 84 for this fit; since the negative binomial collapses to the Poisson as k becomes large, this is good evidence that the Poisson model is adequate.

The Negative Binomial Distribution-VI. The data are now (really) overdispersed relative to a Poisson distribution. The estimates of the capture rate are again essentially identical, but the negative binomial fit indicates lower precision than the Poisson fit.

Overdispersion. Overdispersion means that the variance of the data is greater than expected under the assumed distribution (e.g. for the Poisson, variance = mean). If the data are overdispersed but this is ignored, you are overweighting the data (i.e. underestimating their uncertainty).
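A quick diagnostic sketch (Python; the counts are simulated by drawing Poisson observations whose rate is itself gamma-distributed, which produces overdispersion). Comparing the sample variance to the sample mean is a rough first check:

```python
import numpy as np

rng = np.random.default_rng(2)
rates = rng.gamma(shape=2.0, scale=1.5, size=200)   # heterogeneous underlying rates
counts = rng.poisson(rates)                         # Poisson counts given those rates

mean, var = counts.mean(), counts.var(ddof=1)
print(mean, var)     # the variance noticeably exceeds the mean
print(var / mean)    # a ratio well above 1 suggests overdispersion
```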

Likelihood "Cheat sheet".
Continuous data that can be negative: normal / t.
Continuous data that cannot be negative: lognormal / gamma.
Discrete data with 2 outcomes: binomial.
Discrete data with many possible outcomes: Poisson / negative binomial / multinomial.

Fitting – Miscellany-II. Robustness. In many cases the assumptions underlying the likelihood function are wrong: some data points are "too unlikely" under the assumed distribution. Such data points are outliers. Outliers can either be left out of the analysis or the likelihood can be "robustified" to reduce their influence. Robustification approaches include: minimizing the median residual, leaving out the largest residuals, and downweighting large residuals (one illustrative sketch follows below).
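One way to downweight large residuals (offered as an illustrative sketch, not necessarily the robustification used in the lecture) is to replace a normal likelihood with a heavier-tailed Student's t likelihood; the degrees of freedom (4 below) and all data values are arbitrary assumptions:

```python
import numpy as np
from scipy.stats import norm, t
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, size=40)   # simulated linear data
y[[5, 20]] += 6.0                                 # two artificial outliers

def nll(params, dist):
    a, b, sigma = params
    resid = y - (a + b * x)
    if dist == "normal":
        return -norm.logpdf(resid, scale=sigma).sum()
    # scaled t likelihood: downweights large residuals relative to the normal
    return -t.logpdf(resid / sigma, df=4).sum() + len(y) * np.log(sigma)

start = [0.0, 1.0, 1.0]
bnds = [(None, None), (None, None), (1e-6, None)]
fit_norm = minimize(nll, start, args=("normal",), bounds=bnds)
fit_t = minimize(nll, start, args=("t",), bounds=bnds)
print(fit_norm.x)   # intercept/slope pulled toward the outliers
print(fit_t.x)      # heavier-tailed likelihood reduces the outliers' influence
```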

Fitting – Miscellany-III. Contradictory data. All probability statements are based on the assumptions of the model and likelihood function, and these may be wrong! Often, when we have two (or more) data sources, they disagree. The problem is that (at least) one data source is not measuring what we think it is. Possible solutions: include some probability that each index tells us nothing, or run separate assessments for each index in turn.

Contradictory data (northern cod). The northern cod dilemma: two abundance indices – one increasing (and relatively precise), the other not (and noisy). To pool or not to pool!

Additional Readings.
Chen, Y.; Fournier, D. 1999. Can. J. Fish. Aquat. Sci. 56: 1525–.
Fournier, D.A.; Hampton, J.; Sibert, J.R. 1998. Can. J. Fish. Aquat. Sci. 55: 2105–2116.
Schnute, J.T.; Hilborn, R. 1993. Can. J. Fish. Aquat. Sci. 50: 1916–1927.