Statistics 2 for Chemical Engineering lecture 4


Presentation transcript:

2DS01 Statistics 2 for Chemical Engineering lecture 4

Contents
- Summary of previous lectures
- Limitations of factorial designs and standard RSM designs
- Mixture designs
- D-optimal designs

Summary of previous lectures
- one-way ANOVA: compare means of several groups
- noise reduction through blocking
- factorial designs:
  - screening: blocks, fractions, centre points
  - optimisation: steepest ascent; designs: CCD, Box-Behnken

Example 1: adhesive
- factors: amount of adhesive, temperature
- constraints (in terms of coded variables):
  - too little adhesive at too low temperature: unsatisfactory bonding
  - too much adhesive at too high temperature: damage
- experimental region: [figure not reproduced; axes: amount of adhesive, temperature]
Source: Montgomery, Design and Analysis of Experiments, 5th edition

Example 2: separation of chlorophenols
- Factors: pH, percentage organic modifier
- Constraints: retention times should be neither too short nor too long
- Model (based on RPLC knowledge): complete second-order model + third-order term in pH
- Experimental region: [figure not reproduced]
Source: P.F. de Aguiar et al., Chem. Intell. Lab. Syst. 30 (1995), 199-210.
RPLC = reversed-phase liquid chromatography

Example 3: blending of gasoline
- Factors: types of octanes
- Constraints: effect of octanes depends only on proportions
- Model: not known in general; sometimes only a small number of octanes are active
- Experimental region: simplex (triangle, tetrahedron)

Mixtures: necessity for new designs
- for independent factors, factorial designs are suitable (experimental region: hypercube)
- in mixtures, factors are dependent because they add up to 100%
- notions of effects and interactions do not carry over to mixture experiments
- hypercube experimental regions give poor coverage of the experimental region of mixtures [figure not reproduced]

Mixture designs
- factors are ingredients of a mixture
- factors are dependent
- constraints: $0 \le x_i \le 1$ and $x_1 + x_2 + x_3 + \ldots + x_p = 1$
- experimental region is a simplex [figures not reproduced: $x_1 + x_2 = 1$ and $x_1 + x_2 + x_3 = 1$]

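Trilinear coordinate system
[Figure not reproduced: triangle with vertices (1,0,0), (0,1,0), (0,0,1) for $x_1$, $x_2$, $x_3$; grid lines at 0.2, 0.4, 0.6, 0.8; marked points include the centroid (1/3,1/3,1/3) and the edge midpoint (1/2,1/2,0).]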

Simplex lattice design
{p,m} simplex lattice design:
- p = number of factors
- m+1 = number of factor levels
- $x_i = 0, 1/m, 2/m, \ldots, 1$ ($i = 1, \ldots, p$)
- total number of design points: $\binom{p+m-1}{m}$
Examples: {3,2} lattice and {3,3} lattice [figures not reproduced]
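The lattice definition above can be turned into a short enumeration. The sketch below is illustrative only; the function name `simplex_lattice` is made up for this example. It lists every point whose coordinates are multiples of 1/m and sum to 1, and compares the count with the binomial coefficient $\binom{p+m-1}{m}$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def simplex_lattice(p, m):
    """Return all points of the {p, m} simplex lattice as tuples of Fractions."""
    points = []
    # Each coordinate is k/m with integer k; keep the combinations whose k's sum to m.
    for ks in product(range(m + 1), repeat=p):
        if sum(ks) == m:
            points.append(tuple(Fraction(k, m) for k in ks))
    return points

for p, m in [(3, 2), (3, 3)]:
    pts = simplex_lattice(p, m)
    # The number of points should agree with C(p + m - 1, m).
    print((p, m), len(pts), comb(p + m - 1, m))
```

Exact fractions are used so that the sum-to-one constraint is checked without rounding error.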

Simplex centroid design
p components:
- $p$ permutations of (1,0,...,0)
- $\binom{p}{2}$ permutations of (1/2,1/2,0,...,0)
- $\binom{p}{3}$ permutations of (1/3,1/3,1/3,0,...,0)
- ...
- total $2^p - 1$ design points
Example: 3 components, with design points $x_1 = 1$; $x_2 = 1$; $x_3 = 1$; $x_1 = x_2 = 1/2$; $x_1 = x_3 = 1/2$; $x_2 = x_3 = 1/2$; $x_1 = x_2 = x_3 = 1/3$ [figure not reproduced]
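A minimal sketch of how the simplex centroid points can be generated: every non-empty subset of components gets equal shares and the remaining components are set to 0. The helper name `simplex_centroid` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import combinations

def simplex_centroid(p):
    """Return the 2**p - 1 points of the simplex centroid design for p components."""
    points = []
    for k in range(1, p + 1):
        # Every subset of k components shares the mixture equally (1/k each).
        for subset in combinations(range(p), k):
            points.append(tuple(Fraction(1, k) if i in subset else Fraction(0)
                                for i in range(p)))
    return points

design = simplex_centroid(3)
for point in design:
    print(tuple(str(x) for x in point))
print(len(design))
```

Because the loop runs over subset size k, the points come out in the order in which the design can be executed sequentially: pure blends first, then binary blends, and so on.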

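Models for mixture designs
Polynomial models for mixture responses may be written in different ways because of the constraint $x_1 + x_2 + x_3 + \ldots + x_p = 1$. The usual interpretation of the constant term does not make sense (measurements at (0,0,...,0) are impossible). The constant term can always be removed, e.g., for 3 components we may write
$\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 = \beta_0 (x_1 + x_2 + x_3) + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 = (\beta_0 + \beta_1) x_1 + (\beta_0 + \beta_2) x_2 + (\beta_0 + \beta_3) x_3$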

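Scheffé canonical polynomials
In order to have meaningful interpretations of the coefficients, one applies canonical forms of polynomials for mixture data. Scheffé introduced the following polynomials (examples for p=3):
- linear: $E(Y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
- quadratic: $E(Y) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3$
- special cubic: quadratic model $+\ \beta_{123} x_1 x_2 x_3$
- cubic: special cubic model $+\ \delta_{12} x_1 x_2 (x_1 - x_2) + \delta_{13} x_1 x_3 (x_1 - x_3) + \delta_{23} x_2 x_3 (x_2 - x_3)$
There exist other types of canonical polynomials:
- Cox polynomials
- homogeneous polynomials (Kronecker type)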

Mixture models: interpretation of coefficients
- the usual interpretation of interaction no longer holds because the mixture factors are dependent
- $\beta_i$ is the expected response when $x_i = 1$ and $x_j = 0$ for $j \ne i$ ("pure blend")
- $\beta_i x_i + \beta_j x_j + \beta_{ij} x_i x_j$ is the expected response when $x_i + x_j = 1$ (binary blend)
- the excess term $\beta_{ij} x_i x_j$ indicates an "interaction" effect:
  - $\beta_{ij} > 0$: "(binary) synergistic blending"
  - $\beta_{ij} < 0$: "(binary) antagonistic blending"
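To make the interpretation concrete, the sketch below evaluates a quadratic Scheffé polynomial at a pure blend and at a 50/50 binary blend. The coefficient values are hypothetical and chosen only for illustration; they are not taken from the lecture data.

```python
from itertools import combinations

def scheffe_quadratic(x, beta, beta_pair):
    """Evaluate sum_i beta_i*x_i + sum_{i<j} beta_ij*x_i*x_j at mixture point x."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    cross = sum(beta_pair[(i, j)] * x[i] * x[j]
                for i, j in combinations(range(len(x)), 2))
    return linear + cross

# Hypothetical coefficients (illustration only).
beta = [10.0, 12.0, 8.0]
beta_pair = {(0, 1): 4.0, (0, 2): -2.0, (1, 2): 0.0}

pure_1 = scheffe_quadratic([1, 0, 0], beta, beta_pair)        # equals beta_1
blend_12 = scheffe_quadratic([0.5, 0.5, 0], beta, beta_pair)  # 50/50 blend of 1 and 2
additive_12 = 0.5 * beta[0] + 0.5 * beta[1]                   # prediction without interaction
excess_12 = blend_12 - additive_12                            # equals beta_12 / 4
print(pure_1, blend_12, excess_12)
```

With these made-up numbers the excess at the 50/50 blend is positive, which corresponds to synergistic blending of components 1 and 2; the negative coefficient for the pair (1, 3) would give antagonistic blending.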

Simplex-lattice versus simplex centroid designs
- simplex-lattice allows for a fine grid on the experimental region
- {p,m} simplex-lattice cannot detect synergisms of order higher than m
- simplex centroid may be executed sequentially (first pure blends, then binary mixtures, ...)
- both designs have most of their points on the boundary (= at least one factor equal to 0)

General recommendations for mixture designs
- allow enough degrees of freedom (# design points - # model terms) to allow precise estimation of the variance
- add extra points of special interest
- replicate the design
- add points in the interior
  - to increase coverage of the experimental region
  - to increase degrees of freedom for variance estimation
- perform a lack-of-fit test if there are replicates
- use a linear model when screening; use higher-order models for optimization
- perform blocking if necessary

Various remarks about mixture designs
- mixture designs may be combined with factorial designs when some variables are not related to the mixture ("process variables")
- pseudocomponents may be used when there are further restrictions on the mixture ingredients, like $0 \le x_i \le 0.3$

Example of analysis of mixture data
- octane blending with 3 components
- response is octane rating
- goal is optimization of octane rating
- simplex centroid design: $2^3 - 1 = 7$ points
- two additional check points (of commercial interest, current production process)
- every observation repeated, so in total 18 observations
- all experiments under the same conditions, so no blocks
- because the goal is optimization, we start with the quadratic model (simplest model that allows optimization)

Results of analysis of mixture data: quadratic model
- residuals look OK
- significant model (p-value in ANOVA < 0.05; see also high $R^2$)
- BUT: significant lack-of-fit (this option must be activated in Statgraphics with a right mouse click)

Results of analysis of mixture data: special-cubic model
- choose the next simplest model (leaves more degrees of freedom for accurate estimation of the error variance)
- residuals look OK
- significant model (p-value in ANOVA < 0.05) and no significant lack-of-fit

Further results for the special-cubic model
- residuals show only a slight indication of not being normally distributed
- slight pattern in residual plots (variance not constant)
- BC "interaction" not significant (unimportant when optimizing)
- antagonistic blending of AB and AC

Optimization results optimum near x1=1.0

Limitations of factorial designs + classical RSM designs
- experimental region may not be a hypercube
  - corners of the experimental region cannot be reached
  - specific constraints
  - process factors are ingredients of a mixture
- chemical knowledge postulates an asymmetrical model
  - interaction not possible
  - extra higher-order term for one factor
Factorial designs and classical RSM designs (CCD, Box-Behnken) cannot be used in these circumstances.

Some desirable properties of designs
1. requires a minimum number of experimental runs
2. allows precise estimates of regression coefficients
3. allows precise predictions of responses
4. allows experiments to be performed in blocks
5. makes it possible to detect lack-of-fit
Note: 2. and 3. seem similar, but are not the same! We will generalize the use of corner points in $2^p$ designs using criterion 2.

Example: simple linear regression
- given: minimal and maximal settings of the factor
- problem: which settings are optimal for determining the slope?
[Figure not reproduced: two panels with factor axis from min to max, labelled "large effect in slope" and "small effect in slope".]

Simple linear regression: variance of slope
$\operatorname{Var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Distribution of design points: simple linear regression
Recall: the variance of the slope is small if $S = \sum_i (x_i - \bar{x})^2$ is large.
Experimental region: $-1 \le x \le +1$
- n = 2: $x_1 = -1$ and $x_2 = +1$ (or vice versa): $S = 2$
- n = 3:
  - $x_1 = -1$, $x_2 = 0$, $x_3 = +1$: $S = 2$
  - $x_1 = -1$, $x_2 = -1$, $x_3 = +1$: $S = 8/3 > 2$
  - $x_1 = -1$, $x_2 = c$, $x_3 = +1$: $S = \tfrac{2}{3}(c^2 + 3)$
- "optimal solution" (not feasible!): 1½ measurements at -1 and 1½ measurements at +1
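The values of S quoted above can be checked with a few lines of code. This is only a verification sketch; the function name `spread` is made up for this example, and S is taken to be the sum of squared deviations from the mean, which reproduces the numbers on the slide.

```python
from fractions import Fraction

def spread(xs):
    """Return S = sum of (x_i - mean)^2 for a list of design points."""
    xs = [Fraction(x) for x in xs]
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

for design in ([-1, 1], [-1, 0, 1], [-1, -1, 1]):
    print(design, spread(design))
```

Placing the third run at an endpoint instead of at the centre increases S from 2 to 8/3, so the slope is estimated more precisely even though the design is no longer symmetric.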

General setup: matrix formulation
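As a reminder of what the matrix formulation usually looks like, here is a sketch in standard linear-model notation. It assumes uncorrelated errors with common variance $\sigma^2$ and a design matrix $X$ of full column rank; the symbols are the conventional ones and are not copied from the slide.

```latex
\begin{aligned}
Y &= X\beta + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \sigma^2 I, \\
\hat\beta &= (X^t X)^{-1} X^t Y, \\
\operatorname{Cov}(\hat\beta) &= \sigma^2 (X^t X)^{-1}.
\end{aligned}
```

The matrix $X^t X$ depends only on the chosen design points and not on the measured responses, which is why designs can be compared before any experiment is run.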

Design matrix: quadratic linear regression
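A small sketch of the design matrix for a quadratic model in one factor, assuming the model $E(Y) = \beta_0 + \beta_1 x + \beta_{11} x^2$ and the three design points -1, 0, 1. The helper names are hypothetical.

```python
def quadratic_design_matrix(xs):
    """Rows (1, x, x^2) for the model E(Y) = b0 + b1*x + b11*x^2."""
    return [[1, x, x * x] for x in xs]

def information_matrix(X):
    """Return X^t X as a nested list."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

X = quadratic_design_matrix([-1, 0, 1])
for row in X:
    print(row)
print(information_matrix(X))
```

Each row of X corresponds to one experimental run and each column to one model term, so adding a model term adds a column and adding a run adds a row.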

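Information matrix and confidence regions
Confidence region for the regression parameters: $(\beta - \hat\beta)^t X^t X (\beta - \hat\beta) \le c$, where the constant $c$ depends on the confidence level and on the estimated error variance.
Properties of the confidence region:
- it is an ellipsoid
- its volume is proportional to $\left(\det (X^t X)^{-1}\right)^{1/2}$
- the lengths of its axes are proportional to the square roots of the eigenvalues of $(X^t X)^{-1}$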

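Information matrix and prediction variance
$\operatorname{Var}(\hat{Y}(x)) = \sigma^2 f^t(x) (X^t X)^{-1} f(x)$, where $f^t(x)$ is a row vector built like a row of the design matrix X.
Example: for simple linear regression, $f^t(x) = (1, x)$.
In order to compare designs one uses the scaled prediction variance: $n \operatorname{Var}(\hat{Y}(x)) / \sigma^2 = n f^t(x) (X^t X)^{-1} f(x)$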

Comparison of designs: n=3
Model in both cases: $E(Y) = \beta_0 + \beta_1 x$
- design -1, 0, 1: $(X^t X)^{-1}_{2,2} = 1/2$; scaled prediction variance $1 + \tfrac{3}{2} x^2$; better choice for the maximum prediction variance
- design -1, 1, 1: $(X^t X)^{-1}_{2,2} = 3/8$; scaled prediction variance $\tfrac{3}{8}(3 - 2x + 3x^2)$; better choice for the slope
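The comparison can be reproduced numerically. The sketch below assumes the straight-line model $E(Y) = \beta_0 + \beta_1 x$ and the scaled prediction variance $n f^t(x)(X^t X)^{-1} f(x)$ with $f^t(x) = (1, x)$; the function names are made up, and the maximum is taken over a grid on $[-1, 1]$.

```python
from fractions import Fraction

def inverse_information(design):
    """(X^t X)^{-1} for the model E(Y) = b0 + b1*x, as exact fractions."""
    n, sx, sxx = len(design), sum(design), sum(x * x for x in design)
    determinant = n * sxx - sx * sx
    return [[Fraction(sxx, determinant), Fraction(-sx, determinant)],
            [Fraction(-sx, determinant), Fraction(n, determinant)]]

def scaled_prediction_variance(design, x):
    """n * f(x)^t (X^t X)^{-1} f(x) with f(x) = (1, x)."""
    inv = inverse_information(design)
    x = Fraction(x)
    return len(design) * (inv[0][0] + 2 * inv[0][1] * x + inv[1][1] * x * x)

grid = [Fraction(k, 10) for k in range(-10, 11)]
for design in ([-1, 0, 1], [-1, 1, 1]):
    slope_factor = inverse_information(design)[1][1]   # Var(slope) / sigma^2
    worst = max(scaled_prediction_variance(design, x) for x in grid)
    print(design, slope_factor, worst)
```

The unbalanced design has the smaller slope variance factor (3/8 against 1/2) but the larger worst-case scaled prediction variance (3 against 5/2), which is the trade-off stated on the slide.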

Exact design versus continuous designs
- continuous design: mathematical design that puts weights on design points; the optimal distribution may not be feasible (non-integer weights)
- exact design: optimal distribution with integer weights; always feasible

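Confidence region: example 1
- $\beta_1$ small variance, i.e. known with high precision
- $\beta_2$ large variance, i.e. known with low precision
- axes of the ellipsoid parallel to the coordinate axes, hence the parameter estimates for $\beta_1$ and $\beta_2$ are uncorrelated
[Figure not reproduced: confidence ellipse in the ($\beta_1$, $\beta_2$) plane.]

Confidence region: example 2
- $\beta_1$ and $\beta_2$ known with the same precision
- axes of the ellipsoid parallel to the coordinate axes, hence the parameter estimates for $\beta_1$ and $\beta_2$ are uncorrelated
[Figure not reproduced: confidence ellipse in the ($\beta_1$, $\beta_2$) plane.]

Confidence region: example 3
- $\beta_1$ medium variance, i.e. known with medium precision
- $\beta_2$ large variance, i.e. known with low precision
- axes of the ellipsoid not parallel to the coordinate axes, hence the parameter estimates for $\beta_1$ and $\beta_2$ are correlated
[Figure not reproduced: confidence ellipse in the ($\beta_1$, $\beta_2$) plane.]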

Optimality criteria
Several criteria are used to construct optimal designs:
- based on $(X^t X)^{-1}$:
  - A-optimality: minimize the trace (= sum of eigenvalues) of $(X^t X)^{-1}$
  - D-optimality: minimize the determinant of $(X^t X)^{-1}$, i.e. maximize $\det(X^t X)$
- based on prediction variance:
  - G-optimality: minimize the maximum scaled prediction variance
  - V-optimality: minimize the average scaled prediction variance
Note: the usual $2^p$ designs are D-optimal!
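The following sketch evaluates the four criteria for two designs under a first-order model in two factors, $f^t(x) = (1, x_1, x_2)$: the $2^2$ factorial with corners at ±1 and a "shrunk" variant with corners at ±1/2. The shrunk design, the 3×3 evaluation grid for G and V, and all function names are assumptions made for illustration. Here A is the trace of $(X^t X)^{-1}$ (smaller is better) and D is $\det(X^t X)$ (larger is better).

```python
from fractions import Fraction
from itertools import product

def minor(M, i, j):
    """Matrix M without row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion (fine for the small matrices used here)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inverse(M):
    """Inverse via the adjugate, with exact Fractions."""
    d, n = det(M), len(M)
    return [[Fraction((-1) ** (i + j) * det(minor(M, j, i)), d) for j in range(n)]
            for i in range(n)]

def model_row(point):
    """Row f(x) = (1, x1, x2) of a first-order model in two factors."""
    return [Fraction(1)] + [Fraction(v) for v in point]

def criteria(design, region):
    """Return A (trace of inverse), D (det of X^t X), G (max SPV), V (mean SPV)."""
    X = [model_row(pt) for pt in design]
    cols = list(zip(*X))
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    inv = inverse(xtx)
    k = len(inv)

    def spv(pt):  # scaled prediction variance n * f^t (X^t X)^{-1} f
        f = model_row(pt)
        return len(design) * sum(f[i] * inv[i][j] * f[j]
                                 for i in range(k) for j in range(k))

    values = [spv(pt) for pt in region]
    return {"A": sum(inv[i][i] for i in range(k)), "D": det(xtx),
            "G": max(values), "V": sum(values) / len(values)}

region = list(product([-1, 0, 1], repeat=2))            # evaluation grid for G and V
full = list(product([-1, 1], repeat=2))                 # 2^2 factorial
shrunk = list(product([Fraction(-1, 2), Fraction(1, 2)], repeat=2))
print(criteria(full, region))
print(criteria(shrunk, region))
```

On all four criteria the corner design comes out ahead of the shrunk one, in line with the note that the usual $2^p$ designs are D-optimal.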

Algorithms
- several algorithms exist to compute (approximately) D-optimal designs
- algorithms usually require a candidate set of design points
- exhaustive search of all possible subsets is often not possible
- exchange algorithms try to optimize the criterion by exchanging candidate points or coordinates of candidate points
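The idea of an exchange algorithm can be sketched in a few lines. The example below is a naive point-exchange search written for illustration; it is not the algorithm implemented in any particular package. It assumes a quadratic model in one factor, a candidate grid on $[-1, 1]$ with step 0.1, and a 3-run design, and it replaces one design point at a time whenever that increases $\det(X^t X)$.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def d_criterion(design):
    """det(X^t X) for the model E(Y) = b0 + b1*x + b11*x^2."""
    X = [[1, x, x * x] for x in design]
    cols = list(zip(*X))
    return det([[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols])

def exchange(candidates, start):
    """Swap design points for candidates while the D-criterion keeps improving."""
    design = list(start)
    best = d_criterion(design)
    improved = True
    while improved:
        improved = False
        for i in range(len(design)):
            for c in candidates:
                trial = design[:i] + [c] + design[i + 1:]
                value = d_criterion(trial)
                if value > best:  # accept strict improvements only
                    design, best, improved = trial, value, True
    return sorted(design), best

candidates = [Fraction(k, 10) for k in range(-10, 11)]
design, value = exchange(candidates, candidates[:3])  # deliberately poor start
print([str(x) for x in design], value)
```

Such a search only guarantees a local optimum, so in practice it is common to restart from several random initial designs and keep the best result.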

Software
- Matlab -> Statistics Toolbox
  - cordexch (coordinate exchange algorithm)
  - rowexch (row exchange algorithm)
  - x2fx (generates design matrix for standard models)
- Statgraphics -> Special -> Experimental Design -> Optimize Design
- Gosset: http://www.research.att.com/~njas/gosset/ (limited Windows version, called Strategy, available at http://www.strategy4doe.com/)

Example: separation of chlorophenols
- steps in pH: 0.1
- steps in organic modifier: 1%
- constraints:
  - $5.7 \le \mathrm{pH} \le 7.2$
  - $24\% \le \text{modifier} \le 50\%$
  - $\text{modifier} + 14.8 \cdot \mathrm{pH} \le 129.8$
- model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \beta_{111} x_1^3$
- at least 7 runs are necessary for 7 parameters, plus additional runs to estimate the variance
- how many possible combinations are there to check?
Source: P.F. de Aguiar et al., Chem. Intell. Lab. Syst. 30 (1995), 199-210.
RPLC = reversed-phase liquid chromatography
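The size of the search problem can be estimated by enumerating the candidate set. The sketch below reads all three constraints as "less than or equal" (in particular modifier + 14.8·pH ≤ 129.8), uses the step sizes given above, and counts how many subsets of 7 distinct runs could be formed. Integer arithmetic is used for the constraint to avoid floating-point rounding at the boundary.

```python
from math import comb

# pH is stored in tenths (57..72) and modifier in whole percent (24..50).
# The constraint modifier + 14.8*pH <= 129.8 is multiplied by 100 to stay in integers.
candidates = [(ph10 / 10, modifier)
              for ph10 in range(57, 73)
              for modifier in range(24, 51)
              if 100 * modifier + 148 * ph10 <= 12980]

n_candidates = len(candidates)
n_subsets = comb(n_candidates, 7)   # ways to choose 7 distinct candidate runs
print(n_candidates, n_subsets)
```

Even for the minimal 7-run design the number of subsets is far too large for an exhaustive search, which motivates exchange algorithms.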

Literature
- P.F. de Aguiar et al., D-optimal designs (tutorial), Chem. Intell. Lab. Syst. 30 (1995), 199-210.
- L.E. Eriksson et al., Mixture design – design generation, PLS analysis, and model usage (tutorial), Chem. Intell. Lab. Syst. 43 (1998), 1-24.
- NIST Engineering Statistics Handbook: http://www.itl.nist.gov/div898/handbook/