Statistical modelling and latent variables. Constructing models based on insight and motivation.

Statistical modelling - why? A statistical model describes a possible distribution of the incoming data, given some (unknown) parameter values (the likelihood). If the model is to be useful, knowing the parameter values should answer some of the questions you have:
– Yes/no answers, detecting new effects
– Decision-making
– Prediction
– Quantifying effects
– Statistics (probabilities, averages, variances etc.)
– Results that can be used as data for further analysis
With data in hand, the statistical model can be used for saying something about the parameter values (inference).

Models and reality A model that exactly describes reality is unrealistic, but... if a model contains properties we know not to be the case, our inference will suffer (GIGO):
– Unrealistic estimates, effects, uncertainties, probabilities, predictions
– Faulty decision-making
– Incorrect answers to yes/no questions
Give your model the chance to be right!
– Exception: when added realism makes the inference much harder without affecting the accuracy of what you want answered.
Einstein: Everything should be as simple as possible, but not simpler! (See also Occam’s razor.)

When a model clashes with reality - scale data
Confidence interval for average mammoth body mass. Dataset: x = (5000kg, 6000kg, 11000kg).
Model 1: x_i ~ N(μ, σ²) i.i.d.
– Allows single mammoths to have negative mass!
– The resulting 95% confidence interval, C(μ) = (-650kg, 15300kg), contains expectation values that cannot possibly be correct!
Model 2: log(x_i) ~ N(μ, σ²) i.i.d. (i.e. x_i ~ logN(μ, σ²))
– Only positive measured body masses and expectancy, E(x_i) = exp(μ + σ²/2).
– Resulting 95% bootstrapped confidence interval: (2500kg, 10400kg).
Even better if we could include some prior assumptions. Getting an unbiased point estimate is a bit more involved; if only an unbiased estimate is wanted, model 1 may be better.
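A small Python sketch of this comparison (the bootstrap scheme and the random seed are my own choices, so the endpoints will not match the slide's numbers exactly):

```python
import numpy as np
from scipy import stats

x = np.array([5000.0, 6000.0, 11000.0])  # body masses in kg

# Model 1: x_i ~ N(mu, sigma^2) i.i.d. -- classical t-interval for mu
m, se = x.mean(), x.std(ddof=1) / np.sqrt(len(x))
t = stats.t.ppf(0.975, df=len(x) - 1)
print("Normal model, 95% CI for mu:", (m - t * se, m + t * se))  # roughly (-650, 15300)

# Model 2: log(x_i) ~ N(mu, sigma^2), so E(x_i) = exp(mu + sigma^2/2)
# Nonparametric bootstrap of that expectation
rng = np.random.default_rng(1)
boot = []
for _ in range(10_000):
    lx = rng.choice(np.log(x), size=len(x), replace=True)
    boot.append(np.exp(lx.mean() + lx.var(ddof=1) / 2))
print("Lognormal model, bootstrap 95% CI for E(x):",
      tuple(np.percentile(boot, [2.5, 97.5])))
```

The normal-model interval dips below zero even though a body mass cannot; the lognormal interval stays positive by construction.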

When a model clashes with reality – independence vs time series
Simulated water temperature series with expectancy μ = 10. Assume the standard deviation is known, σ = 2. We want to estimate μ and test μ = 10.
Model 1, independence: T_i = μ + σε_i, ε_i ~ N(0,1) i.i.d.
– The graph seems to tell a different story...
– 95% conf. int. for μ: (11.02, 11.80). μ = 10 rejected!
Model 2, auto-correlated model with expectancy μ, standard deviation σ and auto-correlation a.
– Linear dependency between the temperature one day and the next.
– 95% conf. int. for μ: (8.70, 14.10). μ = 10 not rejected.
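A minimal Python sketch of the same contrast (the series length n = 100, auto-correlation a = 0.9 and random seed are my own assumptions, so the intervals will not reproduce the slide's numbers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, a, n = 10.0, 2.0, 0.9, 100

# Simulate an AR(1) series around mu with stationary standard deviation sigma
T = np.empty(n)
T[0] = mu + sigma * rng.standard_normal()
for i in range(1, n):
    T[i] = mu + a * (T[i - 1] - mu) + sigma * np.sqrt(1 - a**2) * rng.standard_normal()

z = stats.norm.ppf(0.975)
Tbar = T.mean()

# Model 1: pretend the observations are independent
se_iid = sigma / np.sqrt(n)
print("i.i.d. model 95% CI:", (Tbar - z * se_iid, Tbar + z * se_iid))

# Model 2: variance of the sample mean under AR(1) correlation (a treated as known)
var_mean = sigma**2 / n * (1 + 2 * sum((1 - k / n) * a**k for k in range(1, n)))
se_ar = np.sqrt(var_mean)
print("AR(1) model 95% CI: ", (Tbar - z * se_ar, Tbar + z * se_ar))
```

The i.i.d. interval is far too narrow because it ignores that neighbouring days carry largely the same information; the AR(1) interval is wide enough that μ = 10 is not rejected.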

Some notation: Use Pr(x) to denote the probability that a certain random variable (X) has the value x. This is not possible for continuous variables (except in the form Pr(a<X<b)). Keep in mind that probabilities should sum to one: Pr(A)+Pr(not A)=1. Use f(x) to denote the probability density of a continuous random variable (X), as a function of its input argument, x. Use f for different such variables, using the input argument to denote which density we are looking at (f(x) is the density of x, f(y) is the density of y). Keep in mind that a probability density should integrate to one, ∫f(x)dx=1. I will switch between probability and probability density, since some of the variables we may be looking at are discrete, while others are continuous.

Parameters, observations and latent variables – Observations, D
Observations, D:
o Before getting them, they are random variables. You cannot accurately predict them; they have an element of stochasticity. You can assign a statistical probability distribution to them (the model).
o After getting them, they should tell you something about their distribution, and this again should answer the questions you are asking.
o Do not gather data that are not relevant to your questions! (If you have a choice between gathering data that is a little relevant or a lot relevant, choose the latter.)

Parameters, observations and latent variables – Parameters, θ
Their values are assumed to be fixed but unknown. Getting data (D) should reveal something about the nature of the parameter set (θ). The parameters should affect the outcome of observations, i.e. the likelihood, Pr(D|θ) or f(D|θ) as a function of θ, should not be flat. The arrows (green in the figure) mean that we have a specification of the probability of D given θ (the likelihood). The reason we are interested in the parameter values is that this should answer some questions we may have. In frequentist statistics: we look at functions of the data which relate to the parameters in a simple way, estimators. An estimator is a random variable, just like the individual data.
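As a worked illustration (my own example, not from the slides): for an i.i.d. normal sample, the likelihood and a standard estimator are

```latex
% Observations D = (x_1, ..., x_n), parameters theta = (mu, sigma^2)
f(D \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
% An estimator is a function of the data, e.g. the sample mean as an estimator of mu:
\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

Different values of μ and σ² make the observed data more or less probable, which is exactly why the data carry information about the parameters.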

Parameters, observations and latent variables – Latent variables, L
Latent variables (L) are unobserved but random: Pr(L) or f(L). They can add realism to our modelling: things we observe can depend on unobserved states and processes (that have some element of unpredictability/randomness in them). They affect the outcome of observations (D), just like parameters, Pr(D|L). Thus getting data should reveal something about the latent variables. Since both D and L are stochastic, this is a conditional probability, and we need to be able to deal with such probabilities. Since L are unknown random variables rather than unknown fixed values, we can use probability theory to sum up what we know about the latent variables, given the data.

Conditional probabilities – definitions and intuition
Pr(B|A) means the probability that B is the case, given that we know that A is the case. For example, Pr(rain | overcast) means the probability that it rains in those cases where it is overcast. A is probabilistic evidence for B when Pr(B|A)>Pr(B). Technically, it is defined by looking at the distribution of both A and B and then “zooming in on A”: Pr(B|A)=Pr(A and B)/Pr(A). So it is the fraction of probability space where both A and B happen, relative to the fraction where A happens; we remove all possibilities of A not happening. Ex: Pr(rain and overcast)/Pr(overcast) = Pr(rain | overcast). (Figure: Venn-style diagram with sunny/overcast/rain regions, showing Pr(rain and overcast) as the dark area relative to the whole space, and Pr(rain | overcast) as the dark area relative to the overcast region.)
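A tiny numeric illustration with made-up weather probabilities (my own numbers, not from the slides):

```python
# Made-up weather probabilities, only to illustrate the definition
p_overcast = 0.40
p_rain_and_overcast = 0.24

# Conditioning: zoom in on the overcast part of the probability space
p_rain_given_overcast = p_rain_and_overcast / p_overcast
print(p_rain_given_overcast)  # 0.6

# Running the definition backwards recovers the joint probability (next slide)
print(p_rain_given_overcast * p_overcast)  # 0.24
```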

Conditional probabilities – combined probabilities
If we run the definition of conditional probability backwards, we get the probability for a combination: Pr(A and B)=Pr(B|A)Pr(A). (We “zoom out” from the probability of B when A is certain to the probability of A and B, where neither A nor B is a certainty.) Ex: Pr(rain | overcast) Pr(overcast) = Pr(rain and overcast). (Figure: the same Venn-style diagram, now going from the conditional probability back to the joint probability.)

Conditional probabilities – dependence and information (1)
Independence means Pr(A and B)=Pr(A)Pr(B), which is equivalent to Pr(B|A)=Pr(B) and Pr(A|B)=Pr(A). For instance, the probability of getting a 6 on the second throw of a die is the same as the probability of getting a 6 on the second throw given that you got 3 on the first. Knowing the result of the first die does not help you predict the outcome of the next. With dependency, the probabilities change when we condition: getting information about A also gives us information (evidence) about B. The arrows describe how we model. In the case A→B, it says that we start with a probabilistic description of A (Pr(A)) and then supply this with a description of B given A (Pr(B|A)). This gives us the combined probability, Pr(A,B)=Pr(A)Pr(B|A). Typically, we build our models from our understanding of what affects what, and how. (For instance, occupancy affects detections but not vice versa.)
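A quick simulation sketch of the dice example (the sample size and seed are my own choices):

```python
# Check that the second die ignores the first: Pr(second=6 | first=3) ~ Pr(second=6)
import numpy as np

rng = np.random.default_rng(42)
first = rng.integers(1, 7, size=1_000_000)
second = rng.integers(1, 7, size=1_000_000)

print("Pr(second=6)           ~", np.mean(second == 6))
print("Pr(second=6 | first=3) ~", np.mean(second[first == 3] == 6))
# Both estimates land near 1/6 = 0.1667
```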

Conditional probabilities – dependence and information (2)
If B depends on A, then A depends on B: dependency flows both ways. We only use arrows to show in which way we do our modelling (usually based on our understanding of what affects what). We can describe the dependency structure of several phenomena. Ex: A→B→C gives Pr(A,B,C)=Pr(A)Pr(B|A)Pr(C|B), while A←B→C gives Pr(A,B,C)=Pr(B)Pr(A|B)Pr(C|B). We may model by first sketching what influences what; that will then inform us about the structure of the combined probabilities. When we specify what influences what and *how*, we have a model. Concrete examples: A) Carrying capacity → actual population size → measured population size. B) Finch egg laying ← season → river discharge. In A→B→C, knowing B tells us something about both A and C, and knowing A tells us something about B and C. But, conditioned on B, A says nothing about C, or vice versa. (A simulation sketch of the first chain is given below.)
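A minimal sketch of the chain A→B→C (all the distributions are my own made-up choices, only to show how the factorization Pr(A)Pr(B|A)Pr(C|B) is used for simulation and what conditional independence looks like):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

A = rng.normal(1000, 100, size=n)      # carrying capacity
B = A + rng.normal(0, 50, size=n)      # actual population size, given A
C = B + rng.normal(0, 50, size=n)      # measured population size, given B

print("corr(A, C) =", np.corrcoef(A, C)[0, 1])  # clearly positive: A informs C

# Condition on B by restricting to a narrow slice of B-values
idx = np.abs(B - 1000) < 5
print("corr(A, C | B near 1000) =", np.corrcoef(A[idx], C[idx])[0, 1])  # near 0
```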

Conditional probabilities – from conditional probabilities to marginal probabilities
Sometimes we want the distribution of a quantity without having to specify anything else. The law of total probability says that the marginal (unconditional) probability of B is Pr(B) = Σ_A Pr(B|A)Pr(A), summing over the possible outcomes of A. Example: Pr(rain) = Pr(rain | overcast)Pr(overcast) + Pr(rain | sunny)Pr(sunny). Useful when calculating likelihoods (later).
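Continuing the made-up weather numbers from before (my own values):

```python
# Law of total probability with made-up numbers
p_overcast, p_sunny = 0.40, 0.60
p_rain_given_overcast, p_rain_given_sunny = 0.60, 0.05

p_rain = p_rain_given_overcast * p_overcast + p_rain_given_sunny * p_sunny
print(p_rain)  # 0.24 + 0.03 = 0.27
```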

Conditional probabilities – Bayes’ theorem
Looking at latent variables L and data D, we start out with a specification of L (the marginal probabilities, Pr(L)) and a specification of D given L (Pr(D|L)). Known: the data. Unknown: the latent variables. We are interested in the opposite of what has been modelled: the marginal probability of the data and the probability of the latent variables given the data. Since D and L are dependent, we can do this. The law of total probability gives us the first: Pr(D) = Σ_L Pr(D|L)Pr(L). The definition of conditional probabilities gives us the second, Bayes’ theorem: Pr(L|D) = Pr(D|L)Pr(L)/Pr(D). For continuous variables, replace probabilities with probability densities and sums with integrals.
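A minimal Bayes' theorem sketch with a binary latent state (all numbers are my own made-up assumptions): L is "species present at a site", D is "a detection was made".

```python
p_present = 0.30               # Pr(L): prior probability of presence
p_detect_given_present = 0.80  # Pr(D|L = present)
p_detect_given_absent = 0.05   # Pr(D|L = absent), e.g. false detections

# Law of total probability: marginal probability of a detection
p_detect = (p_detect_given_present * p_present
            + p_detect_given_absent * (1 - p_present))

# Bayes' theorem: probability of presence given that something was detected
p_present_given_detect = p_detect_given_present * p_present / p_detect
print(p_detect)                # 0.24 + 0.035 = 0.275
print(p_present_given_detect)  # 0.24 / 0.275 ~ 0.873
```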