# Multivariate Statistical Process Control and Optimization

Multivariate Statistical Process Control and Optimization
Alexey Pomerantsev & Oxana Rodionova, Semenov Institute of Chemical Physics, Russian Chemometrics Society. Photo © Chris Marks. Traditionally, at the beginning I show you a picture from the previous conference. Here it is; our compliments to Chris for this excellent photo. Some of you may remember the Belokurikha highlands and the pleasant time we had there.

Agenda
- Introduction
- SPC
- MSPC
- Passive optimization (E-MSPC)
- Active optimization (MSPO)
- Conclusions

The topic of this lecture is statistical process control and optimization. The first word, control, is well known by now. Multivariate statistical process control (MSPC) is a very popular approach that helps us to understand and run real-world technological processes. It combines well-established statistical methods, such as statistical process control, with modern multivariate data analysis techniques in order to produce new knowledge about the process in question. This knowledge gives an easy way to monitor and control the process, but it does not offer a method to optimize the process performance. However, we believe that the general aim of any statistical analysis of a technology is to improve the quality of the final product and/or to reduce the production costs. In other words, there should be a method that can optimize the process.

Statistical Process Control (SPC)
- SPC Objective: to monitor the performance of the process
- SPC Concept: to study historical data representing good past process behaviour
- SPC Method: conventional statistical methods
- SPC Approach: to plot univariate charts in order to monitor key process variables

In this slide I show you the main features of SPC.

Historical Process Data (Chemical Reactor)
Production cycles s1, s2, ..., s54; key process variables (sensors) X1, X2, ..., X17. Here we show our first example, an industrial chemical process. It is described by 17 key process variables, which are the readings of various sensors. We have a set of historical data about this process, represented by the 54 rows of the table. Each row represents a time point at which the sensor readings were recorded.

Shewhart Charts (1931) Using the Shewhart approach we can represent the data of sensor 1 by the following chart. Here we see the sensor readings versus time, together with two limits: the first is the boundary of the normal state, and the second is the limit for the out-of-control process state. That is the first sensor's chart, this is the second sensor's chart, and now we can see both sensors together. Owing to the different natural sensor scales such a plot is very inconvenient, so we scaled the data and plotted all sensors together. Now all sensors share a common control limit, equal to 1.
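The two steps just described, per-sensor Shewhart limits and scaling to a common limit, can be sketched in Python. This is a minimal illustration with numpy only; the sensor values below are synthetic stand-ins, not the real reactor data.

```python
import numpy as np

def shewhart_limits(x, k_warn=2.0, k_action=3.0):
    """Centre line plus warning and action limits for one sensor's history."""
    mu, sigma = x.mean(), x.std(ddof=1)
    return {"centre": mu,
            "warning": (mu - k_warn * sigma, mu + k_warn * sigma),
            "action": (mu - k_action * sigma, mu + k_action * sigma)}

def autoscale(X):
    """Centre and scale every sensor so all charts share one control limit."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 54 production cycles, 2 sensors on very different natural scales
rng = np.random.default_rng(0)
X = rng.normal(loc=[50.0, 1.2], scale=[5.0, 0.1], size=(54, 2))
limits = shewhart_limits(X[:, 0])
Z = autoscale(X)   # after scaling, a single limit applies to every sensor
```

After autoscaling, every sensor has zero mean and unit spread, which is why one shared control boundary becomes possible.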

Panel Process Control (just a game)
This, however, is still inconvenient for control, and now we present the process model in a column-diagram view. This is just a computer game that simulates the process behavior; playing it can give us some ideas about the process. Now each sensor is represented by a bar, and the control levels are shown by the red lines. Each bar should stay inside and not exceed the limits. Watching these bars, the operator can control the process and even manage it. Let us try to do it too. Using these handles we can increase and decrease a variable's value; however, we cannot operate them independently. If we increase the sensor 10 reading, we see that the other readings change too: some become higher, others become lower. Thus we see they are correlated. No wonder, as everybody expects temperature and pressure to be highly correlated in a chemical reactor.

Multivariate Statistical Process Control (MSPC)
- MSPC Objective: to monitor the performance of the process
- MSPC Concept: to study historical data representing good past process behavior
- MSPC Method: projection methods of multivariate data analysis (PCA, PCR, PLS)
- MSPC Approach: to plot multivariate score plots to monitor the process behavior

This idea moves us to Multivariate Statistical Process Control. It has the same objective and concept as conventional SPC; the main distinction is that now we take into account the internal links present in the system. I am sure everybody here knows how to do it: multivariate data analysis in general, and projection techniques in particular, are the tools we need.

Projection Methods: Initial Data, Data Plane, Data Center, PCs, Data Projections
Usually we spend a lot of time explaining these pictures to industry people. Now I may relax a little and give you a chance to enjoy our advanced approach to teaching PCA to engineers.

Low Dimensional Presentation
Now we can represent our data in a low dimensional subspace.

MSPC Charts (Chemical Reactor)
Samples, Key Variables. We applied a projection method (namely PCA) to our example and obtained the following results. In this plot we see the score representation of all samples in the two-dimensional space; the red Hotelling T² ellipse stands for the limit of the in-control process state. The second plot shows all sensors in the loading space. Here we can clearly see two groups of similar sensors, as well as a negative correlation between sensor X7 and the first group.
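A minimal sketch of how such a score plot and its control ellipse can be computed: PCA scores via the SVD, then the per-sample Hotelling T² statistic with an F-based 95% limit. The data here are random stand-ins, and the limit formula is one common form of the in-control boundary.

```python
import numpy as np
from scipy.stats import f as f_dist

def pca_scores(X, n_pc=2):
    """Project mean-centred data onto the first n_pc principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_pc].T

def hotelling_t2(T, alpha=0.95):
    """Per-sample T^2 statistic and an F-based in-control limit."""
    n, a = T.shape
    t2 = ((T ** 2) / T.var(axis=0, ddof=1)).sum(axis=1)
    limit = a * (n - 1) / (n - a) * f_dist.ppf(alpha, a, n - a)
    return t2, limit

rng = np.random.default_rng(1)
X = rng.normal(size=(54, 17))   # 54 cycles x 17 sensors (stand-in data)
T = pca_scores(X)               # coordinates for the score plot
t2, limit = hotelling_t2(T)     # samples with t2 > limit fall outside the ellipse
```

Points whose T² exceeds the limit lie outside the red ellipse and signal an out-of-control state.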

Panel Process Control (not just a game)
This is our game again, but now it is more complicated. Besides the bar chart we show the score chart, where the current process state is represented by a dot within the control ellipse. Watching this plot, the operator can control and manage the process behavior in an easy way. Moreover, having the PCA model of the process, we can now actively interfere in the management and suggest a stabilization procedure that moves the process into a stable state. This is a first attempt to optimize the process. A more advanced approach will be considered with the next example, the famous Norwegian cruise ship.

Cruise Ship Control (by Kim Esbensen)
Here is an example earlier presented by Kim Esbensen and recently by Oxana Rodionova. We will use it to explain what process optimization is, and for this purpose some background details are necessary. This is a cabotage vessel that runs along the seashore in Norway. In order to study this process, a special agent was introduced into the crew: actually, a PhD student who gathered information for his thesis.

Key Process Variables This student collected the information shown in the table. Each row represents a short trip from one port to another. Each trip is characterized by uncontrolled variables, the weather conditions encountered on the way, and by controlled variables, the captain's orders about the propeller set-up. The last group combines variables that represent the results of the trip: the fuel consumption and the ship's speed.

PLS1 Prediction of Fuel Consumption
Samples, Predicted vs. Measured. Using these data the student constructed a PLS model that predicts the fuel consumption from the given weather conditions and the known (or snooped) captain's set-up. Now we come very close to the concepts of passive and active optimization. Weather conditions X1, X2, X3, X4; captain's set-up X5, X6, X7; PLS1 → fuel consumption Y.

Passive Optimization (Captain, Student, Computer)
To explain these concepts we will show you a drama in three acts with three personages: the captain (or technologist, or manager, in other words the person in charge), the student (scientist, chemometrician, in other words an absolutely irresponsible person), and a computer. The first act: passive optimization. First scene: the captain thinks about the weather (the uncontrolled variables X1, X2, X3, X4) and gives an order. Second scene: the student thinks about the captain's set-up (X5, X6, X7), the computer calculates, and the student makes a prediction: 42. Third scene: the captain thinks about the prediction and gives another order; the student thinks about the new order, the computer calculates, and the student makes a new prediction. The captain delivers a (censored) speech about the student's mother, about the proper usage of computers, and about the extraordinary utility of chemometrics for navigation. The curtain falls; entr'acte.

Active Optimization (Student, Computer, Captain)
The second act of the play: active optimization. Fourth scene: the student thinks about the weather conditions (X1, X2, X3, X4), and the computer thinks about the uncontrolled variables and calculates the optimal variables X5, X6, X7 for the propeller set-up. Fifth scene: the student thinks about the optimized set-up variables and advises the captain. The captain gives an order, and the computer thinks about the captain's mother, about the proper usage of students, and about the extraordinary utility of chemometrics for computers.

In Hard Thinking about PC and PCs
Forty-two (censored). The last act: the captain is in hard thinking about his PC and the PCs. The drama is over.

Multivariate Statistical Process Optimization (MSPO)
- MSPO Objective: to optimize the performance of the process (product quality)
- MSPO Concept: to study historical data representing good past process behavior
- MSPO Methods: projection methods and the Simple Interval Calculation (SIC) method
- MSPO Approach: to plot the predicted quality at each process stage

That was a play; now we shall see how it works in practice. The MSPO objective is not to control but to optimize the process performance, i.e. the final product quality. To reach this target we use historical records as a knowledge base. These data are analyzed with the conventional projection technique and (this is a novel point!) with the Simple Interval Calculation method.

Technological Scheme. Multistage Process
To illustrate our ideas, we consider a multi-stage continuous technological process. It is represented by 25 key variables x and by one output variable y, the final quality of the product. The whole cycle is divided into seven stages numbered with Roman numerals. The first stage (I) is represented by six input variables (W1, W2, W3 and S1, S2, S3) that stand for the properties of the raw components S and W. At the second stage (II) component W is refined, and variables WR1 and WR2 characterize this process. Variables CW1, CW2, and CW3 (stage III) represent the properties of the outcome product CW. The next stage (IV) is the mixing of the raw component S and the refined component CW; the result M is characterized by variables M1, M2, and M3. Afterwards, blend M is also refined (stage V), with the process characteristics MR1 and MR2, and the properties of the outcome CM are presented by variables CM1, CM2, and CM3 (stage VI). The last stage (VII) stands for the ultimate amendments, which are done with additives A1, ..., A6. The output variable (P = y) is the final product quality.

Historical Process Data
X preprocessing, Y preprocessing. We have a collection of historical data measured for 154 samples that characterize proper process performance. Each sample corresponds to the entire production cycle shown in the previous slide. The whole data set is divided horizontally (by samples) into two parts: the training set (102 objects) and the test set (52 objects). All data are also divided vertically (by variables) into 7 blocks in conformity with the technological stages: for example, block XIV is the (102×3) matrix that includes the 3 variables M1, M2, and M3 for the 102 calibration samples. The data are centered and scaled in such a way that each variable, including the quality measure y, varies within the range (–1, +1), and all values outside this interval are considered invalid.
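The range scaling described above can be sketched as follows. This is a minimal version that assumes a simple linear map of each variable's training range onto (−1, +1); new samples falling outside that interval are then easy to flag as invalid.

```python
import numpy as np

def range_scale(X_train, X_new=None):
    """Map each variable's training range linearly onto (-1, +1)."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    centre, half = (hi + lo) / 2.0, (hi - lo) / 2.0
    scale = lambda X: (X - centre) / half
    return scale(X_train if X_new is None else X_new)

rng = np.random.default_rng(3)
X_train = rng.normal(size=(102, 25))
Z = range_scale(X_train)                       # training data now inside (-1, +1)
Z_test = range_scale(X_train, rng.normal(size=(52, 25)))
valid = np.all(np.abs(Z_test) <= 1, axis=1)    # flag test samples inside the range
```

The same `centre` and `half` learned from the training set must be reused for any new sample, which is why the function scales `X_new` with the training statistics.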

Quality Data (Standardized Y Set)
It is also assumed that the highest product quality corresponds to y = +1 and the lowest one to y = –1.

General PLS Model Following the classical MSPC approach, we take the whole calibration data set with 25 X-variables and construct a general PLS1 regression model employing 6 PLS components. After that we perform an additional data scaling, multiplying some columns by a factor of –1 in order to make all regression coefficients positive. This is done to standardize the process response, because now an increase in any process variable x leads to an improvement of the quality y.

SIC Prediction. All Test Samples
On the basis of the PLS model we also construct the SIC model of the entire process. Here you can see the predicted quality Y for all test samples together with the SIC intervals of uncertainty. For illustration purposes we select five samples from the test set. They present the most typical cases with respect to the product quality as well as to the SIC status, i.e. the sample's relationship to the process MSPC model.

SIC Prediction. Selected Test Samples
| Sample No | Quality status | SIC status |
|---|---|---|
| 1 | Normal | Insider |
| 2 | High | Outsider |
| 3 | Normal | Absolute outsider |
| 4 | Low | Outsider |
| 5 | Normal | Insider |

Let us look at these selected samples more closely and talk about SIC object classification. To understand this approach we need to keep two items in mind. The first is that the SIC method gives a measure of the calibration error. It is shown in the left plot by the black error bounds around the reference test values; we call them test intervals. The second item is the SIC prediction intervals, shown by the blue bars. Inspecting this plot, one can see that the blue bars are sometimes inside the black bars; these are samples 1 and 5. It means that the prediction error is less than the calibration error, which is the good case. Sample 3 demonstrates the inverse case: the blue bar is outside the black one. This is a bad case, as the prediction is worse than the calibration. Finally, samples 2 and 4 represent cases where the prediction is biased against the test. The SIC object status plot shows these relations in a compact form. The SIC leverage is the width of the prediction interval divided by the calibration error, so it measures precision. The SIC residual is the difference between prediction and test, so it is a bias characteristic. Samples lying in this triangle are similar to the calibration samples, so they are insiders; in other words, they are inside the model. They completely agree with the model, thus insiders are the most trusted objects in prediction. Samples that are out of the triangle lie outside the model and are therefore called outsiders. Outsiders do not contradict the model and, if added to the calibration set, would improve the calibration accuracy. However, while they are not in the calibration set, they are less than perfect in prediction. There may be two reasons: the width of the prediction interval (that is, the SIC leverage) is greater than the calibration error, or there is a bias (characterized by the SIC residual).
Usually, working with a new sample, we do not know its reference y value. In this case it is impossible to calculate the SIC residual, but we can calculate the SIC leverage. A sample whose leverage is greater than one cannot become an insider for any response value. Such samples form a special class of objects called absolute outsiders. Even having no information about their reference values, we can confidently state that the prediction error of such samples will be larger than the obtained calibration error. Now we see that the selected test samples have the properties shown in the table above with respect to the quality of the final product and to their SIC status.

Passive Optimization in Practice
- Objective: to predict the future process output while being in the middle of the process
- Concept: to study historical data representing good past process behaviour
- Method: Simple Interval Prediction
- Approach: expanding multivariate statistical process control (E-MSPC)

We will show how passive optimization can be implemented in this example, using the selected samples for illustration. The main objective of passive optimization is to predict the future process output while the process is still running. We will use the historical records as the base and the SIC method as the main tool.

Expanding MSPC, Sample 1 To perform passive optimization we apply the method of expanding MSPC, which appears to be a new approach in this area. At the end of the first stage, which is actually the beginning of the process, we can predict y using the upper block as the training X set and the lower block as the new X set. We obtain PLS point estimates (red dots) and the SIC intervals (blue bars). We also show the actual X-variables (dotted line) that were used in the production of these samples. They can be presented in the same plot owing to the uniform scaling performed before modeling. At the next stage we may expand the X data block and obtain a new prediction model. Using this calibration we can try different values for the forthcoming stage, the variables WR1 and WR2 that represent the set-up of stage II. We can select the best set-up and continue the process with these corrections. When stage II is over, we start to adjust the variables of the next stage, using a new expanded model. Thus we build a series of PLS1 regressions in which the X matrix expands together with the process while the Y matrix stays fixed.

Expanding MSPC, Samples 2 & 3
The results of expanding modeling and the corresponding SIC modeling are presented in these slides, where the five plots correspond to the five selected samples. Each plot shows the prediction of the future quality y obtained at each process stage. The green rhombus in the right part of each plot shows the quality value y actually obtained in production. The SIC prediction intervals shrink along the process, but the reference value is always located inside them. Sample 2 (outsider) has a high predicted quality at the beginning of the cycle, and later actions improve it. Sample 3 (absolute outsider) has the largest prediction intervals in the course of the process, but no corrections could improve its normal quality status.

Expanding MSPC, Samples 4 & 5
Sample 4 (outsider) initially has a low predicted quality, and the correction actions do not recover it; they even make it worse. Sample 5 (insider) keeps its normal status along the cycle without changes. The width of the SIC interval (i.e. the degree of uncertainty) is smallest for the insiders (sample 5) and largest for the absolute outsider (sample 3).

Active Optimization in Practice
- Objective: to find corrections for each process stage that improve the future process output (product quality)
- Concept: corrections are admissible if they are similar to ones that have sometimes happened in the historical data in a similar situation
- Method: Simple Interval Prediction and status classification
- Approach: Multivariate Statistical Process Optimization (MSPO)

You have seen how passive optimization is done. In its course we did not suggest optimal process corrections but just checked how good the captain's (operator's) actions were. Now we shall consider how to find the optimal correcting actions that can be performed at the end of each stage, i.e. active optimization. In practice this means the proper choice of the controlled variables that become the input variables for the next stage.

Linear Optimization A linear function always reaches its extremum on the border of the region. Since all our models are linear, the main problem of linear optimization is not to find a solution but to restrict the area where this solution is sought. Optimizing a linear model over an unreasonable region of predictors always gives a senseless decision in which the optimized characteristic tends to infinity.

Optimization Problem Model
Weather conditions X1, X2, X3, X4; captain's set-up X5, X6, X7; PLS1 → fuel consumption Y. In general: fixed variables Xfix and optimized variables Xopt; PLS1 → quality measure Y.

Model: Y = X·a = Y0 + Xopt·a2, where Y0 = Xfix·a1 = const.
Task: for given Xfix and a, find the Xopt that maximizes (or minimizes) Y.
Solution: max(Y) = Y0 + max(Xopt)·a2, since all a > 0.

Let us recall the problem of the cruise ship. There we had a model consisting of two X blocks: the fixed weather conditions and the free set-up values. The data were used to predict the quality measure, which was the fuel consumption. In general we can express the problem as follows: our linear prediction model is represented as a sum of two terms, and for a given subset of fixed X values and given parameter values a, we must find the values of the second X subset that optimize the quality Y. The solution is easy, as the optimum always occurs at the boundary of the area of the optimized variables; in our case these are the upper limits, since we have corrected all coefficients a to be positive.
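The boundary solution can be sketched directly. The helper below and its toy numbers are hypothetical; the box bounds on Xopt stand for the restricted area discussed next.

```python
import numpy as np

def optimal_setup(x_fix, a1, a2, lower, upper):
    """For the linear model Y = x_fix . a1 + x_opt . a2, the maximum over a
    box is on the boundary: each x_opt[j] goes to its upper bound when
    a2[j] > 0 and to its lower bound otherwise."""
    x_opt = np.where(a2 > 0, upper, lower)
    return x_opt, float(x_fix @ a1 + x_opt @ a2)

# toy numbers: three optimized variables, each bounded by (-1, +1)
x_opt, y_max = optimal_setup(
    x_fix=np.array([0.2, -0.1]), a1=np.array([1.0, 2.0]),
    a2=np.array([0.5, 0.3, -0.2]),
    lower=-np.ones(3), upper=np.ones(3))
```

With all coefficients forced positive by the earlier sign standardization, every optimized variable simply goes to its upper bound, which is exactly why restricting those bounds sensibly becomes the whole problem.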

Interval Prediction of Xopt
PLS2 → Xopt. Let us consider, just as an example, the optimization of the fourth stage of the process. Here we have 11 fixed variables in the three past stages, and our aim is to find the optimal set-up for the next, fourth block. The first, rough restrictions on the predictor area come from the physical or technological meaning of the process variables; in our case they are ±1. To find a more sensible range, we suggest applying the main MSPC concept again. We use this block of historical data as the X matrix and that block as the Y matrix, and build a PLS2 model that predicts the fourth-block variables in the course of production. The red dots present the PLS predictions of variable M1 for all selected test samples. To get the intervals we could draw RMSEP intervals around these points; however, we prefer to use the SIC intervals, which take into account the individual features of the samples. Here you can see the ultimate results, where the optimal X values (the upper bounds) are marked with green squares.

Dubious Result of Optimization
Applying this method to all stages of the process, we obtain the optimized process variables (blue line), which lie above the old test X values (black line). We can also build a prediction of quality based on this optimized process set-up. Oops, we are in a mess: the predicted quality is senseless. No drink can be stronger than 100 degrees! What has happened? The point is that the predicted Xopt variables are out of the model!

Concept: corrections are admissible if they are similar to ones that have sometimes happened in the historical data in a similar situation. The optimal variables Xopt should be within the model! Let us come back to the predicted X intervals. To get a reasonable prediction we have to follow this concept. What model do we mean? This is the PLS2 model, which predicts the next block of X variables using all previous blocks as X data. If we look at the SIC intervals for each model together with their object status plot, we see that some samples are out of the insider area; in other words, they are out of the model. We have to bring them in by reducing their intervals so that they are located completely inside the test intervals. Now all samples are insiders, and the corresponding intervals give a rational forecast.

Sample 1 Normal Quality Insider
At last you can see a reasonable output. This is the optimization of sample 1. It was a normal-quality insider, and it became a little bit better.

Sample 2 High Quality Outsider
Sample 2 (an outsider of high quality) did not change its quality, but it also did not become worse.

Sample 3 Normal Quality Abs. Outsider
Sample 3 (absolute outsider of normal quality) turned into a high quality object

Sample 4 Low Quality Outsider
Sample 4 (outsider of low quality) became a normal quality object

Sample 5 Normal Quality Insider
Sample 5 (insider of normal quality) did not change its quality; it kept its normal status. The presented results give some grounds for general conclusions. Active optimization shows that the status of a sample with respect to the general calibration model determines its chances to be optimized. The insiders (test samples 1 and 5) are the worst objects for improvement. These samples are so average, so normal, that no actions could enhance them. Apparently this reflects the common technological wisdom that if something is going well, let it go at that. The historical experience collected in the actual process records constricts the area of available actions with the x-variables to a very narrow range, which does not permit any enhancements. On the other hand, the outsiders, the outstanding objects far from the common zero level, give an excellent chance to improve or to worsen them. If we undertake some correcting actions with such marginal samples, they immediately respond with improvements. This happens because the area of possible X-variables is much wider for these unusual, extreme samples. A special case is the absolute outsider (sample 3), an extraordinary object with abnormal initial properties of the raw components. The historical data contain no knowledge about such samples, so the area of possible corrections is always so wide that it requires some shrinkage to put it in the routine mainstream. Nevertheless, even restricted, such a sample demonstrates excellent possibilities for 'upbringing', and it could become a high-quality object. Further generalization, association, and parallels to human inclinations and their development are, unfortunately, out of scope today.

Philosophy of MSPO. Food Industry