# Uncertainty Quantification & the PSUADE Software

Mahmoud Khademi
Supervisor: Prof. Ned Nedialkov
Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada
2012

Outline
- Introduction to Uncertainty Quantification (UQ)
- Identification
- Characterization
- Propagation
- Analysis
- Common algorithms and methods
- PSUADE: UQ software library and environment (https://computation.llnl.gov/casc/uncertainty_quantification/)
- Conclusions & future research directions

Introduction to UQ
- Quantitative characterization and reduction of uncertainty
- Estimating the probability of certain outcomes when some aspects of the system are unknown
- Advances in simulation-based scientific discovery led to the emergence of verification and validation (V&V) and UQ
- Many problems in the natural sciences and engineering involve uncertainty

Identification
- Model structure: models are only approximations of reality
- Numerical approximation: methods are not exact
- Input and model parameters may be known only approximately
- Variations in inputs and model parameters due to differences between instances of the same object
- Noise, measurement errors, and lack of data

Characterization
- Aleatoric (statistical) uncertainties: differ each time we run the same experiment
- Monte Carlo methods are used; a probability density function (PDF) can be represented by its moments
- Epistemic (systematic) uncertainties: due to things we could in principle know but don't in practice
- Fuzzy logic or generalizations of Bayes theory are used

Propagation
- How does uncertainty evolve?
- Analyzing the impact that parameter uncertainties have on outputs
- Finding the major sources of uncertainty (sensitivity analysis)
- Exploring "interesting" regions in parameter space (model exploration)

Analysis
- Assessing "anomalous" regions in parameter space (risk analysis)
- Establishing the integrity of a simulation model (validation)
- Providing information on which additional physical experiments are needed to improve understanding of the system (experimental guidance)

Selecting Proper Methods
- Is there a nonlinear relationship between the uncertain variables and the output variables?
- Is the uncertain parameter space high-dimensional?
- There may be some model-form uncertainties
- What is the computational cost per simulation?
- Which experimental data are available?

Monte Carlo Algorithms
- Based on repeated random sampling to compute their results
- Used when it is not feasible to compute an exact result with a deterministic algorithm
- Useful for simulating systems with many degrees of freedom, e.g. cellular structures

Monte Carlo Method: Outline
1. Define a domain of possible inputs
2. Generate inputs randomly from a probability density function over the domain
3. Perform a deterministic computation on the inputs
4. Aggregate the results
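
The four steps can be sketched in a few lines of Python. This is only an illustration: the toy model `x * x`, the uniform input distribution on [0, 1], and the function names are assumptions chosen for the example and do not come from the slides.

```python
import random
import statistics


def model(x):
    """Deterministic computation applied to one sampled input (toy example)."""
    return x * x


def monte_carlo(n_samples, low=0.0, high=1.0, seed=0):
    """Propagate a uniform input on [low, high] through `model`."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    # Steps 1-2: define the domain and draw inputs from a PDF over it.
    inputs = [rng.uniform(low, high) for _ in range(n_samples)]
    # Step 3: run the deterministic computation on every input.
    outputs = [model(x) for x in inputs]
    # Step 4: aggregate the results into summary statistics.
    return statistics.fmean(outputs), statistics.variance(outputs)


mean, var = monte_carlo(100_000)
print(f"mean ~ {mean:.4f}, variance ~ {var:.4f}")
```

For this toy model the exact values are 1/3 for the mean and 4/45 for the variance, so the printed estimates give a quick check of the sampling error.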

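Polynomial Regression
- Input data: $(x_i, y_i)$, $i = 1, \dots, N$
- Model: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m + \varepsilon$
- Unknown parameters: $a_0, a_1, \dots, a_m$
- $\varepsilon$: random error with mean zero conditioned on $x$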
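
As a rough sketch of how the unknown parameters can be estimated, the code below fits the usual polynomial model y = a_0 + a_1 x + ... + a_m x^m + ε by least squares, solving the normal equations with plain Gaussian elimination. The function names and the sample data are hypothetical, and the normal equations are used only for brevity; production code would normally rely on a numerically more robust solver.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Return least-squares coefficients [a_0, ..., a_degree]."""
    m = degree + 1
    # Normal equations (V^T V) a = V^T y, with V[i][j] = xs[i] ** j.
    vtv = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    vty = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    return solve_linear_system(vtv, vty)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]  # noise-free quadratic data
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

Because the sample data are generated without noise, the fit should recover the generating coefficients 1, 2, and 3 up to rounding error.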

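MARS
- MARS (multivariate adaptive regression splines) is a weighted sum of basis functions: $\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)$
- Each basis function $B_i(x)$ is the constant 1, a hinge function, or a product of hinge functions, where a hinge function has the form $\max(0, x - t)$ or $\max(0, t - x)$ for a knot $t$
- Each step of the forward pass finds the pair of basis functions that gives the maximum reduction in error
- The backward pass prunes the model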
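
To make the structure of a MARS model concrete, the sketch below evaluates a small hand-written model built from the constant 1, hinge functions, and a product of hinges. The weights, knots, and function names are invented for illustration, and the forward and backward passes that would select them from data are not implemented here.

```python
def hinge_pos(x, knot):
    """max(0, x - knot): zero to the left of the knot."""
    return max(0.0, x - knot)


def hinge_neg(x, knot):
    """max(0, knot - x): zero to the right of the knot."""
    return max(0.0, knot - x)


def mars_predict(x, terms):
    """Evaluate a weighted sum of basis functions at the point x.

    `terms` is a list of (weight, basis) pairs; each basis is a callable
    that takes the input vector x and returns a number.
    """
    return sum(weight * basis(x) for weight, basis in terms)


# Hypothetical two-input model, written by hand for illustration only:
# f(x) = 5 + 2*max(0, x0 - 1) - 3*max(0, 1 - x0)
#          + 0.5*max(0, x0 - 1)*max(0, x1 - 2)
terms = [
    (5.0, lambda x: 1.0),
    (2.0, lambda x: hinge_pos(x[0], 1.0)),
    (-3.0, lambda x: hinge_neg(x[0], 1.0)),
    (0.5, lambda x: hinge_pos(x[0], 1.0) * hinge_pos(x[1], 2.0)),
]

print(mars_predict([3.0, 4.0], terms))  # 5 + 4 - 0 + 2 = 11.0
print(mars_predict([0.0, 4.0], terms))  # 5 + 0 - 3 + 0 = 2.0
```

The mirrored pair `hinge_pos` and `hinge_neg` sharing one knot corresponds to the pair of basis functions that a forward-pass step would add together.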

MARS Versus Linear Regression

Principal Component Analysis
- Consider a set of $N$ points in $n$-dimensional space: $\{x_1, x_2, \dots, x_N\}$
- Principal Component Analysis (PCA) looks for an $n \times m$ linear transformation matrix $W$ mapping the original $n$-dimensional space into an $m$-dimensional feature space, where $m < n$: $y_k = W^T x_k$, $k = 1, \dots, N$
- High variance is associated with more information

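Principal Component Analysis
- The scatter matrix of the transformed feature vectors is $W^T S_T W$, where $S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T$ is the scatter matrix of the input vectors and $\mu$ is their mean
- The projection is chosen to maximize the determinant of the total scatter matrix of the projected samples: $W_{opt} = \arg\max_W |W^T S_T W| = [w_1 \; w_2 \; \dots \; w_m]$
- $\{w_i \mid i = 1, \dots, m\}$ is the set of eigenvectors corresponding to the $m$ largest eigenvalues of the scatter matrix of the input vectors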
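
A minimal sketch of the computation for the special case m = 1: build the scatter matrix of the input vectors, find its leading eigenvector by power iteration, and project the points onto it. Power iteration is a simplification chosen here to keep the example dependency-free; a general implementation would use a full eigendecomposition to obtain all m directions. The function names and the sample points are assumptions made for the example.

```python
import math


def scatter_matrix(points):
    """Return (S, mu) with S = sum_k (x_k - mu)(x_k - mu)^T and mu the mean."""
    n = len(points[0])
    mu = [sum(p[d] for p in points) / len(points) for d in range(n)]
    s = [[0.0] * n for _ in range(n)]
    for p in points:
        diff = [p[d] - mu[d] for d in range(n)]
        for i in range(n):
            for j in range(n):
                s[i][j] += diff[i] * diff[j]
    return s, mu


def leading_eigenvector(matrix, iterations=200):
    """Power iteration towards the eigenvector of the largest eigenvalue.

    Assumes a nonzero matrix and a start vector that is not orthogonal
    to the leading eigenvector.
    """
    n = len(matrix)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def project(points, w):
    """Map each x_k to the one-dimensional feature y_k = w^T x_k."""
    return [sum(wi * xi for wi, xi in zip(w, p)) for p in points]


points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
s_t, mu = scatter_matrix(points)
w1 = leading_eigenvector(s_t)
features = project(points, w1)
print(w1, features)
```

The sample points lie exactly on the line through the origin with direction (1, 2), so the recovered direction should be that vector normalized to unit length.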

PSUADE: How Does It Work?
- The input section allows users to specify the number of inputs, their names, ranges, distributions, etc.
- The driver program can be written in any language, provided that it is executable.
- Run PSUADE with: [Linux] psuade psuade.in
- On completion of the runs, information is displayed and a data file is created for further analysis
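
Since the driver can be any executable, it could for instance be a small Python script. The sketch below rests on assumptions that are not stated in the slides and should be checked against the PSUADE manual: that the driver is called with two arguments (an input file path and an output file path), that the input file holds the number of inputs followed by one value per input, and that the output file should contain the output value. The model in `simulate` is purely hypothetical.

```python
import sys


def simulate(x):
    """Hypothetical model standing in for a real simulation code."""
    return x[0] ** 2 + 0.1 * x[1]


def run_driver(input_path, output_path):
    """Read one sample point, evaluate the model, write one output value."""
    with open(input_path) as f:
        tokens = f.read().split()
    n_inputs = int(tokens[0])  # assumed layout: the count comes first
    x = [float(t) for t in tokens[1:1 + n_inputs]]
    y = simulate(x)
    with open(output_path, "w") as f:
        f.write(f"{y:.16e}\n")
    return y


# Assumed calling convention: driver <input file> <output file>
if __name__ == "__main__" and len(sys.argv) == 3:
    run_driver(sys.argv[1], sys.argv[2])
```

Keeping the file handling in `run_driver` separate from `simulate` means the model can be swapped without touching the input and output logic.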

PSUADE Capabilities
- Can study first-order sensitivities of individual input parameters (main effect)
- Can construct a relationship between some input parameters and the model output (response surface modeling)
- Can quantify the impact of a subset of parameters on the output (global sensitivity analysis)
- Can identify the subset of parameters accounting for output variability (parameter screening)

PSUADE Capabilities
- Monte Carlo, quasi-Monte Carlo, Latin hypercube and variants, factorial, Morris method, Fourier Amplitude Sensitivity Test (FAST), etc.
- Simulator execution environment
- Markov chain Monte Carlo for parameter estimation and basic statistical analysis
- Many different types of response surfaces
- Many methods for main, second-order, and total-order effect analyses
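
Of the sampling methods listed, Latin hypercube sampling is easy to illustrate. The sketch below is a generic textbook version and does not reflect PSUADE's own implementation; the function names and the example bounds are assumptions. Each input range is split into as many equal-width strata as there are samples, and every stratum is used exactly once per input.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims, one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    # Transpose so that each row is one sample point.
    return [list(point) for point in zip(*columns)]


def scale(point, bounds):
    """Map a unit-cube point to the box given by (low, high) pairs."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(point, bounds)]


samples = latin_hypercube(n_samples=5, n_dims=3)
for s in samples:
    print(scale(s, [(0.0, 1.0), (-1.0, 1.0), (10.0, 20.0)]))
```

Compared with plain Monte Carlo sampling, this guarantees that every input covers its whole range even when only a few simulations are affordable.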

Figure (input x1): scatter plot of x1 and y, with linear regression, quadratic regression, and MARS fits of y with respect to x1.

Figure (input x2): scatter plot of x2 and y, with linear regression, quadratic regression, and MARS fits of y with respect to x2.

Figure (input x3): scatter plot of x3 and y, with linear regression, quadratic regression, and MARS fits of y with respect to x3.

Sensitivity Analysis

MARS screening rankings:
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 3 (score = 0.0)
* Rank 3 : Input = 2 (score = 0.0)

MOAT Analysis (ordered):
Input 1 (mu*, sigma, dof) = e e-05 17
Input 3 (mu*, sigma, dof) = e e+00 -1
Input 2 (mu*, sigma, dof) = e e+00 -1

delta_test: perform Delta test:
Order of importance (based on 20 best configurations):
(D)Rank 1 : input 1 (score = 80 )
(D)Rank 2 : input 3 (score = 48 )
(D)Rank 3 : input 2 (score = 38 )
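
The mu* and sigma columns in the MOAT (Morris one-at-a-time) output summarize elementary effects: the change in output per unit change in one input while the others are held fixed. The sketch below is a simplified one-at-a-time design on the unit cube, not the trajectory design that the Morris method normally uses and not PSUADE's implementation; `toy_model` and all parameter values are assumptions for illustration.

```python
import random
import statistics


def elementary_effects(f, n_inputs, n_repeats=20, delta=0.25, seed=0):
    """Simplified one-at-a-time screening on the unit cube.

    Returns (mu_star, sigma): per input, the mean absolute elementary
    effect and the standard deviation of the elementary effects.
    """
    rng = random.Random(seed)
    effects = [[] for _ in range(n_inputs)]
    for _ in range(n_repeats):
        # Base point chosen so that x_i + delta stays inside [0, 1].
        base = [rng.uniform(0.0, 1.0 - delta) for _ in range(n_inputs)]
        y0 = f(base)
        for i in range(n_inputs):
            moved = base[:]
            moved[i] += delta
            effects[i].append((f(moved) - y0) / delta)
    mu_star = [statistics.fmean([abs(e) for e in col]) for col in effects]
    sigma = [statistics.stdev(col) for col in effects]
    return mu_star, sigma


def toy_model(x):
    # Linear in x[0], nonlinear in x[1], independent of x[2].
    return 2.0 * x[0] + x[1] ** 2 + 0.0 * x[2]


mu_star, sigma = elementary_effects(toy_model, n_inputs=3)
for i, (m, s) in enumerate(zip(mu_star, sigma), start=1):
    print(f"Input {i}: mu* = {m:.3f}, sigma = {s:.3f}")
```

A large mu* marks an influential input, while a large sigma suggests nonlinearity or interaction; an input with both near zero can be screened out.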

Sensitivity Analysis

Gaussian process-based sensitivity analysis:
* Rank 1 : Input = 1 (score = 100.0)
* Rank 2 : Input = 2 (score = 75.9)
* Rank 3 : Input = 3 (score = 5.9)

Sum-of-trees-based sensitivity analysis:
* SumOfTrees screening rankings (with bootstrapping)
* Minimum points per node = 10
* Rank 2 : Input = 3 (score = 0.9)
* Rank 3 : Input = 2 (score = 0.0)

Correlation Analysis

Pearson correlation coefficients (PEAR) - linear relationship - which gives a measure of relationship between X_i's & Y.
* Pearson Correlation coeff. (Input 1) = e-01
* Pearson Correlation coeff. (Input 2) = e-18
* Pearson Correlation coeff. (Input 3) = e-18

Spearman coefficients (SPEA) - nonlinear relationship -
* Spearman coefficient(ordered) (Input 1 ) = e-01
* Spearman coefficient(ordered) (Input 2 ) = e-02
* Spearman coefficient(ordered) (Input 3 ) = e-02
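
For reference, both coefficients can be computed in a few lines. The sketch below uses the standard definitions: the Pearson coefficient measures linear association, and the Spearman coefficient is the Pearson coefficient of the ranks, so it also captures monotonic nonlinear relationships. The sample data and function names are assumptions for illustration and are unrelated to the PSUADE run shown above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(x) for x in xs]  # monotonic but strongly nonlinear
print(f"Pearson  = {pearson(xs, ys):.3f}")
print(f"Spearman = {spearman(xs, ys):.3f}")
```

On this monotonic exponential example the Spearman coefficient reaches 1 while the Pearson coefficient stays noticeably below 1, which is the kind of gap that signals a nonlinear relationship.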

Main Effect Analysis

RS-based 1-input Sobol' decomposition:
RSMSobol1: Normalized VCE (ordered) for input 1 = e+00
RSMSobol1: Normalized VCE (ordered) for input 2 = e-32
RSMSobol1: Normalized VCE (ordered) for input 3 = e-33

McKay's correlation ratio:
INPUT 1 = 7.27e-01 (raw = 2.02e-09)
INPUT 2 = 1.14e-11 (raw = 3.17e-20)
INPUT 3 = 1.77e-35 (raw = 4.92e-44)
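
Both quantities above estimate a main effect of the form Var(E[Y | X_i]) / Var(Y), the share of output variance explained by input i alone. The sketch below estimates this ratio with a simple binning scheme, which is a stand-in chosen for brevity and is neither McKay's replicated-sampling estimator nor PSUADE's response-surface-based method. The test function and names are assumptions.

```python
import random
import statistics


def correlation_ratio(xs, ys, n_bins=10):
    """Estimate Var(E[Y | X]) / Var(Y) by binning X into equal-width bins."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        k = min(int((x - lo) / width), n_bins - 1)  # clamp the point x == hi
        bins[k].append(y)
    mean_y = statistics.fmean(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    # Between-bin sum of squares: spread of the conditional means.
    between = sum(
        len(b) * (statistics.fmean(b) - mean_y) ** 2 for b in bins if b
    )
    return between / total


rng = random.Random(0)
x1 = [rng.random() for _ in range(2000)]
x2 = [rng.random() for _ in range(2000)]
y = [a * a for a in x1]  # hypothetical output that depends on x1 only

ratio_x1 = correlation_ratio(x1, y)
ratio_x2 = correlation_ratio(x2, y)
print(f"input 1: {ratio_x1:.3f}, input 2: {ratio_x2:.3f}")
```

As in the output above, an input that drives the response gets a ratio close to 1 and an irrelevant input gets a ratio close to 0.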

Response surface analysis (MARS)
Response surface analysis (Linear regression)

Response surface analysis (Cubic)

Response surface analysis (Sum-of-trees)
Response surface analysis (Quartic)

Future Research Directions
- Resolving the curse of dimensionality
- Representation of uncertainty
- Bayesian computation & machine learning techniques, e.g. stochastic multi-scale systems for model selection, classification & decision making
- Visualization in high-dimensional spaces