Probabilistic tomography

Probabilistic tomography. Jeannot Trampert, Utrecht University.

Is this the only model compatible with the data? Project some chosen model onto the model null space of the forward operator, then add it to the original model: the same data fit is guaranteed! (Deal et al., JGR 1999)
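A small numpy sketch of this null-space construction, with an invented forward operator G (5 data, 10 model parameters) standing in for the tomographic operator:

```python
import numpy as np

# Toy illustration with an invented forward operator G (5 data, 10 model
# parameters), so G has a non-trivial model null space.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 10))
m = rng.standard_normal(10)            # some estimated model

# Basis of the null space of G from the SVD
U, s, Vt = np.linalg.svd(G)
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:].T               # columns span null(G)
P_null = null_basis @ null_basis.T     # orthogonal projector onto null(G)

# Project any chosen model onto the null space and add it to m:
m_chosen = rng.standard_normal(10)
m_alt = m + P_null @ m_chosen

print(np.allclose(G @ m, G @ m_alt))   # True: identical data fit
```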

Forward problem (exact theory): true model → data. Inverse problem (approximate theory, uneven data coverage, data errors): data → estimated model, followed by appraisal → model uncertainty. The inverse problem is ill-posed (null space, continuity) and ill-conditioned (error propagation).

In the presence of a model null space, the cost function has a flat valley and the standard least-squares analysis underestimates the realistic uncertainty for the given data.

Multiple minima are possible if the model does not depend continuously on the data, even for a linear problem. Figure: data versus model.

Does it matter? Yes! Figure: dlnVs, dlnVΦ, dlnρ; SPRD6 (Ishii and Tromp, 1999); the input model is the density from SPRD6 (Resovsky and Trampert, 2002).

The most general solution of an inverse problem (Bayes, Tarantola)
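In the notation used on the later slides (prior ρ(m), likelihood L(m), posterior σ(m)), this general solution can be written σ(m) = k ρ(m) L(m), with k a normalisation constant; for Gaussian data errors, L(m) ∝ exp[-½ (d_obs - g(m))ᵀ C_D⁻¹ (d_obs - g(m))], where g is the forward operator and C_D the data covariance.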

Only a full model space search can estimate σ(m):
- Exhaustive search
- Brute-force Monte Carlo (Shapiro and Ritzwoller, 2002)
- Simulated annealing (global optimisation with a convergence proof)
- Genetic algorithms (global optimisation with no convergence proof)
- Sample the prior ρ(m) and apply the Metropolis rule on L(m); this results in importance sampling of the posterior σ(m) (Mosegaard and Tarantola, 1995); a minimal sketch follows below
- Neighbourhood algorithm (Sambridge, 1999)
- Neural networks (Meier et al., 2007)
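A minimal Python sketch of the Mosegaard and Tarantola recipe (walk so the samples follow the prior, accept moves on the likelihood ratio), assuming a toy two-parameter problem with a uniform prior and a Gaussian likelihood; the bounds, step size and "true" model are invented for illustration:

```python
import numpy as np

# Toy two-parameter problem: uniform prior box, Gaussian likelihood.
rng = np.random.default_rng(1)
lower, upper = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

def log_likelihood(m):
    # log L(m) for a synthetic "observed" model with 0.05 data error
    return -0.5 * np.sum((m - np.array([0.3, -0.2]))**2) / 0.05**2

m = np.zeros(2)
samples = []
for _ in range(20000):
    m_prop = m + 0.1 * rng.standard_normal(2)      # random walk in the prior
    if np.all(m_prop >= lower) and np.all(m_prop <= upper):
        # Metropolis rule applied to the likelihood only
        if np.log(rng.uniform()) < log_likelihood(m_prop) - log_likelihood(m):
            m = m_prop
    samples.append(m.copy())

samples = np.array(samples)
# The retained samples are distributed according to sigma(m) = k rho(m) L(m);
# 1D marginals follow from histograms of each column.
print(samples.mean(axis=0), samples.std(axis=0))
```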

The model space is HUGE! Draw 1000 models per second, with each parameter m_i ∈ {0,1}: M = 30 → 13 days; M = 50 → 35702 years. In seismic tomography, M = O(1000-100000).
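A quick back-of-the-envelope check of these figures, assuming each of the M parameters is binary and models are drawn at 1000 per second:

```python
# Back-of-the-envelope check of the numbers quoted above (binary
# parameters, 1000 models drawn per second).
seconds_per_day = 86400
seconds_per_year = 86400 * 365
print(2**30 / 1000 / seconds_per_day)    # ~12.4 days ("13 days")
print(2**50 / 1000 / seconds_per_year)   # ~35702 years
```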

The model space is EMPTY! Tarantola, 2005

The curse of dimensionality → small problems (M ~ 30):
- Exhaustive search: IMPOSSIBLE
- Brute-force Monte Carlo: DIFFICULT
- Simulated annealing (global optimisation with a convergence proof): DIFFICULT
- Genetic algorithms (global optimisation with no convergence proof): ???
- Sampling ρ(m) and applying the Metropolis rule on L(m), i.e. importance sampling of σ(m) (Mosegaard and Tarantola, 1995): BETTER, BUT HOW TO GET A MARGINAL?
- Neighbourhood algorithm (Sambridge, 1999): THIS WORKS
- Neural networks: PROMISING

The neighbourhood algorithm (Sambridge, 1999), stage 1: guided sampling of the model space. Samples concentrate in areas (neighbourhoods) of better fit.

The neighbourhood algorithm (NA), stage 2: importance sampling. Resampling so that the sampling density reflects the posterior. Figure: 2D and 1D marginals.
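A much-simplified sketch of this appraisal idea: the posterior is taken as constant inside the Voronoi cell of each stage-1 sample (nearest-neighbour interpolation of its misfit) and this surrogate is resampled with a random walk, so that the sampling density reflects the posterior. Sambridge's actual algorithm uses an exact Gibbs sampler on the Voronoi structure; the ensemble, misfit and bounds below are synthetic.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)

# Synthetic "stage-1" ensemble and toy misfits (illustration only).
ensemble = rng.uniform(-1, 1, size=(500, 2))
misfit = np.sum((ensemble - np.array([0.3, -0.2]))**2, axis=1) / 0.1**2
log_post = -0.5 * misfit                # surrogate log-posterior of each sample
tree = cKDTree(ensemble)

def surrogate_log_post(m):
    # Posterior of the nearest stage-1 sample: constant in its Voronoi cell
    return log_post[tree.query(m)[1]]

m = ensemble[np.argmax(log_post)].copy()
resampled = []
for _ in range(20000):
    m_prop = m + 0.1 * rng.standard_normal(2)
    if np.all(np.abs(m_prop) <= 1):
        if np.log(rng.uniform()) < surrogate_log_post(m_prop) - surrogate_log_post(m):
            m = m_prop
    resampled.append(m.copy())

resampled = np.array(resampled)
# 1D and 2D marginals follow from histograms of the resampled ensemble.
marginal_1d, edges = np.histogram(resampled[:, 0], bins=30, density=True)
```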

Advantages of NA:
- Interpolation in model space with Voronoi cells
- Relative ranking in both stages (less dependent on data uncertainty)
- Marginals calculated by Monte Carlo integration → convergence check

NA and least-squares (LS) consistency: as damping is reduced, the LS solution converges towards the most likely NA solution. In the presence of a null space, the LS solution will diverge, but the NA solution remains unaffected.

Finally some tomography! We sampled all models compatible with the data using the neighbourhood algorithm (Sambridge, GJI 1999).
Data: 649 modes (normal modes, surface-wave fundamentals and overtones) from He and Tromp, Laske and Masters, Masters and Laske, Resovsky and Ritzwoller, Resovsky and Pestena, Trampert and Woodhouse, Tromp and Zanzerkia, Woodhouse and Wong, van Heijst and Woodhouse, and Widmer-Schnidrig.
Parameterization: 5 layers (2891-2000 km, 2000-1200 km, 1200-660 km, 660-400 km, 400-24 km).
Seismic parameters: dlnVs, dlnVΦ, dlnρ, and relative topography on the CMB and the 660-km discontinuity.

Back to density. Figure: stage-1 and stage-2 sampling of density, the most likely model, and SPRD6.

Gravity-filtered models over 15×15-degree equal-area caps. The likelihoods in each cap are nearly Gaussian. Figure: most likely model (shown above one standard deviation). Uniform uncertainties: dlnVs = 0.12%, dlnVΦ = 0.26%, dlnρ = 0.48%.

Chemical heterogeneity: likelihood of the correlation between seismic parameters as evidence for chemical heterogeneity. Figure: Vs-VΦ correlation; the half-height vertical bar corresponds to SPRD6.

Chemical heterogeneity: likelihood of the rms amplitude of the seismic parameters as evidence for chemical heterogeneity. Figure: rms density; the half-height vertical bar corresponds to SPRD6.

The neural network (NN) approach (Bishop, 1995; MacKay, 2003): a neural network can be seen as a non-linear filter between some input and output. The NN is an approximation to any function g in the non-linear relation d = g(m). A training set is used to calculate the coefficients of the NN by non-linear optimisation.
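A minimal sketch of this idea with scikit-learn, assuming a toy non-linear forward relation g and an invented training-set size (this is not the surface-wave forward problem):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

def g(m):
    # Toy non-linear forward relation d = g(m) (illustration only)
    return np.sin(3.0 * m[:, 0]) + m[:, 1]**2

# Training set of (model, data) pairs
m_train = rng.uniform(-1, 1, size=(5000, 2))
d_train = g(m_train)

# Small network whose coefficients are found by non-linear optimisation
net = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=2000)
net.fit(m_train, d_train)

# The trained network now approximates g for new models
print(net.predict(rng.uniform(-1, 1, size=(3, 2))))
```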

The neural network (NN) approach. Figure: a network trained on the forward relation versus one trained on the inverse relation.

The NN inversion workflow:
1) Generate a training set D = {d_n, m_n}: a collection of random Earth models m_n and the corresponding (observed or synthetic) dispersion curves d_n, which serve as network input.
2) Network training: the network takes a dispersion curve as input vector, and its output is the set of parameters of a Gaussian mixture model representing the conditional probability density p(m|d_n, w).
3) Forward-propagate a new datum through the trained network (i.e. solve the inverse problem): for an observed datum d_obs the output is p(m|d_obs, w*).
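A hedged PyTorch sketch of this workflow. To keep it short, the network outputs a single Gaussian (mean and log-variance) rather than the full Gaussian mixture used by Meier et al.; the toy forward relation, noise level and dimensions are invented for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def g(m):
    # Toy forward relation m -> d (invented; not the dispersion-curve problem)
    return torch.stack([torch.sin(3.0 * m[:, 0]), m[:, 0] * m[:, 1]], dim=1)

# 1) Training set D = {(d_n, m_n)} from random models, with synthetic data noise
m_train = torch.rand(5000, 2) * 2 - 1
d_train = g(m_train) + 0.01 * torch.randn(5000, 2)

# Network: input d_n, output mean and log-variance of p(m | d, w)
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 4))

# 2) Network training: minimise the negative log-likelihood of m_n given d_n
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    out = net(d_train)
    mu, logvar = out[:, :2], out[:, 2:]
    nll = 0.5 * ((m_train - mu)**2 / logvar.exp() + logvar).sum(dim=1).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

# 3) Forward-propagate an observed datum: the output characterises p(m | d_obs, w*)
with torch.no_grad():
    d_obs = g(torch.tensor([[0.4, -0.3]]))
    out = net(d_obs)
    print(out[:, :2], out[:, 2:].exp().sqrt())   # posterior mean and std. dev.
```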

Model parameterization (figure labels): topography; a sediment layer; Vs in 3 crustal layers (Vp and ρ linearly scaled); Moho depth; Vpv, Vph, Vsv, Vsh and η in 7 layers, varying by ±10% around PREM; the 220-km discontinuity; Vp, Vs and ρ in 8 layers, varying by ±5% around PREM; the 400-km discontinuity.
We consider local dispersion curves as the response of a one-dimensional Earth model parameterized as a stratified model with various layers. The data we consider have no sensitivity below 400 km; above 400 km we allow the main discontinuities, such as the 220-km discontinuity, the Moho and topography, to vary, as well as the indicated parameters in the layers. There are three crustal layers with a sedimentary layer of variable thickness on top. All prior information is explicitly defined by the bounds of variation on the model parameters; Moho depth, for example, is allowed to vary between 10 and 110 km under continents and between 0 and 40 km under oceans. We proceed by randomly generating a series of Earth models in which all the indicated parameters vary, and compute the corresponding synthetic dispersion curves. Surface waves in the period range considered cannot resolve all three crustal layers, so instead of inverting for the shear-wave velocity in each crustal layer individually we invert for the vertically averaged crustal shear-wave velocity. This is done with an extended neural network approach, the mixture density network, following the flowchart shown earlier.

Example of a Vs realisation

Advantages of NN:
- The curse of dimensionality is not a problem, because the NN approximates a function, not a data prediction!
- Flexible: invert for any combination of parameters (single marginal output).

Advantages of NN (continued): 4. The information gain I_post - I_prior shows which parameters are resolved. Figure: information gain for Moho depth (Meier et al., 2007).
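One common way to quantify such an information gain, stated here only as a plausible choice and not necessarily the exact definition used by Meier et al., is the Kullback-Leibler divergence between posterior and prior, I = ∫ p(m|d) log[ p(m|d) / p(m) ] dm: it is zero when the data add nothing and grows as the posterior narrows relative to the prior.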

Figure: mean Moho depth [km] and its standard deviation σ [km] (Meier et al., 2007). The aim of my project is to map discontinuities within the Earth, such as the crustal thickness shown here, and to obtain the corresponding uncertainties on those estimates. Over the last decades, seismologists have deduced images of the Earth's interior from the information contained in hundreds of thousands of recorded seismograms. These images generally show fast and slow P- and S-wave velocity anomalies with respect to a reference model at various depths. Relatively little effort has been undertaken to actually map the topography of the various discontinuities within the Earth, knowledge of which is crucial for various other fields in the Earth sciences. This forms the main motivation of my PhD research: developing a new approach to invert surface-wave data for discontinuities within the Earth. First I give a short introduction to the kind of data we consider, then I explain the basic concepts of the method we use, and then I show an application of this approach to global crustal thickness.

Figure: mean Moho depth [km] (Meier et al., 2007) versus CRUST2.0 Moho depth [km].

NA versus NN. NA: joint pdf → marginals by integration → small problems. NN: 1D marginals only (one network per parameter) → large problems.

Advantage of having a pdf: quantitative comparison.

Mean models of Gaussian pdfs, compatible with observed gravity data, with almost uniform standard deviations: dlnVs = 0.12%, dlnVΦ = 0.26%, dlnρ = 0.48%.

Thermo-chemical parameterization: temperature, variation of Pv (perovskite fraction), and variation of total Fe.

Sensitivities

Superplumes (Ricard et al., JGR 1993; Lithgow-Bertelloni and Richards, Reviews of Geophysics, 1998).

Tackley, GGG, 2002