
Elaine Martin Centre for Process Analytics and Control Technology University of Newcastle, England www.ncl.ac.uk/cpact/ The Conjunction of Process and Spectral Data for Enhanced Fault Detection

Motivation
- It is conjectured that some factors relating specifically to a process cannot be identified from the spectroscopic measurements but can be described by the process data, or vice versa.
- Consequently, one way to enhance prediction accuracy, process performance and fault detection is to integrate process and spectral data.
- The aim of the subsequent studies was to investigate the combined power of spectral and process data.

Overview
- Process Modelling
  - Fermentation Process
    - Spectral Data
    - Spectral and Process Data
- Process Monitoring and Fault Detection
  - Polymer-resin Manufacturing Process
    - Process Data
    - Process and Spectral Data

Challenges in the Monitoring of Fermentation Processes
- Fermentation is a process in which micro-organisms convert chemical species to products of higher value.
- On-line information relating to the progression of the process is not easily obtained.
- Near Infrared (NIR) and Mid Infrared spectroscopy have been applied to the monitoring of fermentation processes.
- The successful implementation of these spectroscopic approaches requires appropriate multivariate data analysis techniques, such as partial least squares (PLS).

Experimental Data Set
- The industrial pilot-plant scale Streptomyces fermentation process involves two stages:
  - Seed stage
  - Final stage
- The seed stage generates the biomass.
  - The starting ingredients include carbohydrate, soya protein, vegetable oil and trace elements in water.
- The biomass is transferred to the final stage for the production of the desired product.
  - The final stage is a fed-batch process lasting approximately 140 hours.
- NIR measurements were collected for the final stage of the process.

Spectral Data Acquisition
- The NIR spectral data were recorded using a Zeiss Corona 45

Description of the Data Set
- Final stage data from 7 standard batches and 7 Design of Experiment batches form the basis of the subsequent analysis.
- Data collected included on-line process data, off-line data, biochemical and NIR measurements.

Methodological Summary
- Pre-processing of the spectral data set
  - First derivatives
  - Splining
- Segmented wavelength region selection
- Global modelling: Linear PLS, Neural Network PLS, Quadratic PLS
- Local modelling: Linear PLS, Neural Network PLS, Quadratic PLS
- Bagging of the models
  - Linear partial least squares
  - Averaging

Data Pre-processing
- The NIR data (Zeiss Corona NIR) were recorded every 15 minutes and the first derivatives were taken.
- Since only ten values of titre were recorded, a spline was fitted to the data.
  - The splined titre values were aligned to the 550 spectral values for each batch.
- The range utilised for both the spectral and quality data was 43.75 to 125 log hours.
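The pre-processing steps above can be sketched roughly as follows. This is only an illustration on synthetic data, not the authors' code. The slides do not say which spline was used, so a natural cubic spline is assumed; the number of wavelengths, the titre sampling times, the titre curve and the convention that sample point k is taken at 0.25 × k hours are also assumptions.

```python
import numpy as np

def natural_cubic_spline(x_knots, y_knots, x_new):
    """Evaluate a natural cubic spline through (x_knots, y_knots) at x_new."""
    x = np.asarray(x_knots, dtype=float)
    y = np.asarray(y_knots, dtype=float)
    n = len(x) - 1
    h = np.diff(x)
    # Solve for the second derivatives M, with M[0] = M[n] = 0 (natural ends).
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0
    for i in range(1, n):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    x_new = np.asarray(x_new, dtype=float)
    # Index of the knot interval that contains each evaluation point.
    i = np.clip(np.searchsorted(x, x_new, side="right") - 1, 0, n - 1)
    left, right = x_new - x[i], x[i + 1] - x_new
    return ((M[i] * right**3 + M[i + 1] * left**3) / (6.0 * h[i])
            + (y[i] / h[i] - M[i] * h[i] / 6.0) * right
            + (y[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * left)

SAMPLE_INTERVAL_H = 0.25              # one spectrum every 15 minutes
n_spectra, n_wavelengths = 550, 100   # 100 wavelengths is an assumption

rng = np.random.default_rng(0)
spectra = rng.normal(size=(n_spectra, n_wavelengths)).cumsum(axis=1)  # synthetic
first_derivative = np.diff(spectra, axis=1)  # derivative along the wavelength axis

spectrum_times = SAMPLE_INTERVAL_H * np.arange(n_spectra)   # hours
titre_times = np.linspace(0.0, spectrum_times[-1], 10)      # ten off-line samples (made up)
titre_values = 1.0 - np.exp(-titre_times / 50.0)            # made-up titre curve
titre_aligned = natural_cubic_spline(titre_times, titre_values, spectrum_times)

# Keep only the 43.75 to 125 hour range for both spectra and titre.
in_window = (spectrum_times >= 43.75) & (spectrum_times <= 125.0)
X = first_derivative[in_window]
y = titre_aligned[in_window]
```

With the assumed indexing, the 43.75 to 125 hour range corresponds to sample points 175 to 500.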

Data Pre-processing

NIR Data and First Derivatives
[Figure: two panels, NIR Data and First Derivative]

Spectral Window Selection Algorithm
[Flowchart]
1. Select training and validation batches.
2. Mean centre and take derivatives of the spectral data.
3. Generate random centres and widths.
4. Build the model 'input' matrix, eliminating common data.
5. Generate the PLS model.
6. Calculate RMS errors.
7. Generate random changes to centres and widths.
8. Apply the random changes to the current centres and widths.
9. Build a new input matrix, generate the model and calculate RMS errors.
10. Decision (Y/N): Has the RMS on training data decreased?
11. Decision (Y/N): Has the number of iterations been exceeded and are there more models to build?
12. Present the final bagged model.

Spectral Window Selection Algorithm
[Figure: a spectral window defined by its Centre and Width, shown at each stage of the search]
1. Generate a random increment in centre and width.
2. Update the centre and width.
3. Has the prediction error decreased? Yes: a step in the right direction. Take another step with the same centre and width increment.
4. Step too far: the prediction error has increased. Go back to where we were.
5. Generate a new increment in centre and width and continue the search.
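The search described on these two slides might look something like the sketch below. It is an interpretation, not the authors' implementation. The function names, the number of windows, the step sizes, the number of PLS components and the synthetic data are all assumptions, and "eliminating common data" is read here as counting overlapping windows only once. A small NIPALS PLS1 routine is included so that the block needs only NumPy.

```python
import numpy as np

def pls1_fit(X, y, n_components=3):
    """Fit linear PLS1 (NIPALS); return (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1])):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:
            break
        w = w / norm
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = (yr @ t) / tt
        Xr = Xr - np.outer(t, p)   # deflate X
        yr = yr - c * t            # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def window_columns(centres, widths, n_wavelengths):
    """Union of the columns covered by the windows (overlaps counted once)."""
    mask = np.zeros(n_wavelengths, dtype=bool)
    for c, w in zip(centres, widths):
        lo = max(int(round(c - w / 2)), 0)
        hi = min(int(round(c + w / 2)), n_wavelengths - 1)
        mask[lo:hi + 1] = True
    return np.flatnonzero(mask)

def rmse(model, X, y, cols):
    x_mean, y_mean, coef = model
    pred = (X[:, cols] - x_mean) @ coef + y_mean
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def sws_search(X_train, y_train, n_windows=3, n_iter=200, seed=0):
    """Random search over window centres and widths, judged on training RMS."""
    rng = np.random.default_rng(seed)
    n_wl = X_train.shape[1]
    centres = rng.uniform(0, n_wl - 1, n_windows)
    widths = rng.uniform(1, n_wl / 4, n_windows)

    def evaluate(c, w):
        cols = window_columns(c, w, n_wl)
        model = pls1_fit(X_train[:, cols], y_train)
        return rmse(model, X_train, y_train, cols), cols, model

    best_err, best_cols, best_model = evaluate(centres, widths)
    step = None
    for _ in range(n_iter):
        if step is None:  # generate a new random increment
            step = (rng.normal(0, 2.0, n_windows), rng.normal(0, 1.0, n_windows))
        new_c = np.clip(centres + step[0], 0, n_wl - 1)
        new_w = np.clip(widths + step[1], 1, n_wl)
        err, cols, model = evaluate(new_c, new_w)
        if err < best_err:   # right direction: accept and reuse the increment
            centres, widths = new_c, new_w
            best_err, best_cols, best_model = err, cols, model
        else:                # step too far: stay where we were, redraw
            step = None
    return best_err, best_cols, best_model

# Synthetic example: only columns 30 to 39 carry information about y.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 60))
y = X[:, 30:40].sum(axis=1) + 0.1 * rng.normal(size=120)
err, cols, model = sws_search(X, y)
```

Running `sws_search` with different `seed` values gives different final windows, which is why the slides go on to combine several such models by bagging.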

Benefits of the SWS Algorithm
- SWS covers not only the two extremes (a single wavelength and the full wavelength set) but also selections restricted to multiple sub-sets of the full set.
- Finds the 'best' possible models for the product concentration and the biochemical components.
- Finds the 'best' wavelength range from which these models can be built.

Bagging
- SWS does not provide a unique model.
- To obtain a more robust model, bagging is implemented.
- Bagging, or the 'Resample and Combine' method, is an algorithm that improves the robustness of models by combining predictions from different models.

Bagging of Models
- 30 models were generated by changing the initial random seed of the wavelength selection algorithm.
- Bagging was applied to the 30 models:
  - The average value was calculated from the output of the 30 models.
  - A PLS model was fitted between the real and fitted values to give a weighted average.
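The two ways of combining the 30 models can be illustrated as below. This is a hedged sketch on synthetic predictions. The slides do not state how many PLS components were used for the weighted combination, so a single component is assumed, and all function names are hypothetical.

```python
import numpy as np

def average_bagging(preds):
    """Simple mean of the model outputs (rows: samples, columns: models)."""
    return preds.mean(axis=1)

def pls_bagging_fit(preds, y):
    """One-component PLS of the real values y on the individual model outputs."""
    p_mean, y_mean = preds.mean(axis=0), y.mean()
    Pc, yc = preds - p_mean, y - y_mean
    w = Pc.T @ yc
    w = w / np.linalg.norm(w)      # weight direction across the models
    t = Pc @ w                     # combined score
    c = (yc @ t) / (t @ t)         # regression of y on the score
    return p_mean, y_mean, c * w

def pls_bagging_predict(fit, preds):
    p_mean, y_mean, weights = fit
    return (preds - p_mean) @ weights + y_mean

def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in for the outputs of 30 SWS models on 200 samples.
rng = np.random.default_rng(0)
y_real = np.linspace(0.0, 1.0, 200)
offsets = rng.normal(0.0, 0.05, size=30)                 # per-model bias
preds = y_real[:, None] + offsets + rng.normal(0.0, 0.03, size=(200, 30))

y_avg = average_bagging(preds)
fit = pls_bagging_fit(preds, y_real)
y_pls = pls_bagging_predict(fit, preds)
print("average bagging RMS:", rms(y_avg, y_real))
print("PLS bagging RMS:    ", rms(y_pls, y_real))
```

The PLS weights are fitted on data where the real values are known and are then applied unchanged to the model outputs for new samples.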

Global and Local Modelling

Local Modelling
Two critical points, at 70 and 100 hours, were identified from plots of the biochemical data.

[Figure: First Time Interval, Second Time Interval, Third Time Interval]

Local Modelling Approach
- Three time regions for both the spectra and the quality variable values (titre) were selected.
  - Samples up to 70 log hours, i.e. sample points 175-280.
  - From 70 log hours to 100 log hours, i.e. sample points 280-400.
  - From 100 log hours up to the end of the chosen window, i.e. sample points 400-500.
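The sample-point ranges follow from the 15-minute sampling interval (for example, 70 hours / 0.25 hours = sample point 280). A small sketch of the split is given below. The helper names and the placeholder arrays are assumptions, and half-open slices are used so that the boundary samples 280 and 400 are not assigned to two regions, which the slides do not specify.

```python
import numpy as np

SAMPLE_INTERVAL_H = 0.25  # one spectrum every 15 minutes

def hours_to_sample(hours):
    """Convert log hours to a sample-point index."""
    return int(round(hours / SAMPLE_INTERVAL_H))

# Start of the window, the two critical points, and the end of the window.
boundaries_h = [43.75, 70.0, 100.0, 125.0]
edges = [hours_to_sample(h) for h in boundaries_h]

def split_into_regions(X, y, edges):
    """Return one (X, y) pair per time region; rows are sample points."""
    return [(X[a:b], y[a:b]) for a, b in zip(edges[:-1], edges[1:])]

# Placeholder arrays standing in for one batch of spectra and titre values.
X = np.zeros((550, 5))
y = np.arange(550.0)
regions = split_into_regions(X, y, edges)
for k, (Xr, yr) in enumerate(regions, start=1):
    print(f"time interval {k}: {len(yr)} samples")
```

A separate model is then fitted to each region, in contrast to the single global model fitted over the whole window.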

Local Modelling Approach
[Figure: Region 2 and Region 3]

Results: Time Interval 1
- The RMS of the training set for models 1, 7 and 29 is large.
- The RMS of the validation data set for models 1, 7 and 29 is small.
- The RMS error for PLS Bagging is smaller than the error of each individual model.
[Figure: RMS error after PLS Bagging]

Linear PLS – Region 1 (Wavelength Selection)
[Figure: Training Data Set and Validation Data Set]

Results: Time Interval 1
- The wavelengths between 30 and 40 are selected most frequently.

Neural Network PLS – Region 2 (Wavelength Selection)
[Figure: Training Data Set and Validation Data Set]

Polynomial PLS – Region 3 (Wavelength Selection)
[Figure: Training Data Set and Validation Data Set]

Local Modelling: Training Data Set
[Figure: Global Modelling predictions and Local Modelling predictions for time intervals 1, 2 and 3]

Local Modelling: Validation Data Set
[Figure: 1st Time Interval, 2nd Time Interval, 3rd Time Interval]

Genetic Algorithm Results
- Genetic algorithms (GAs) make it possible to select individual wavelengths, but the resulting models potentially do not predict future samples well.
[Figure: two panels, SWS and Genetic Algorithms]

GA Results – Region 2
[Figure: two panels, SWS Averaging and GA Averaging]
RMS of validation: SWS 0.048, GAs 0.069

Genetic Algorithm Results

TRAINING
                       Time Interval 1        Time Interval 2        Time Interval 3
                       PLS        Average     PLS        Average     PLS        Average
                       Bagging    Bagging     Bagging    Bagging     Bagging    Bagging
SWS with Linear PLS    0.018      0.034       0.025      0.034       0.039      0.060
GAs with Linear PLS    0.018                  0.023      0.024       0.037      0.038

VALIDATION RESULTS
                       Time Interval 1        Time Interval 2        Time Interval 3
                       PLS        Average     PLS        Average     PLS        Average
                       Bagging    Bagging     Bagging    Bagging     Bagging    Bagging
SWS with Linear PLS    0.045      0.049       0.059      0.048       0.095      0.058
GAs with Linear PLS    0.045      0.043       0.067      0.069       0.177      0.139

Summary of Results
- GAs produced slightly better predictions for the training data set but poorer predictions for the validation data set, which indicates overfitting.
- On the validation data, SWS in combination with bagging for local modelling gave better results than the GA in combination with bagging.
- Local modelling gives better results than global modelling.
- SWS with bagging gives better results than the purported 'one-shot wonder' models.

Design of Experiment Data: Integration of Process and Spectral Data

Conjunction of Process and Spectral Data
- In the later stages of the fermentation, the error in the calibration models was observed to be greater, with offsets being present.
- During this time, significant changes in the fermentation broth concentrations occur.
- The offset can potentially be modelled by utilising other process information such as off-gas measurements.

Data Set and Aim
- The aim is to infer product concentration and the biochemical components from the spectral data.
- The work uses the off-line, biochemical and NIR data for the design of experiment batches.
- Conditions changed in the experimental design:
  - Temperature (°C)
  - pH
  - Sugar feed (g/h)
  - Oil feed (%)

Conjunction of Process and Spectral Data
[Block diagram]
- First step: calculation of the calibration spectral residuals.
  - The spectral data are passed through a model, and the model output is subtracted from the biochemical concentration to give the calibration spectral residuals.
- Second step: modelling of the calibration spectral residuals from the process data and the generation of the innovations.
  - The process data are passed through a model, and the model output is subtracted from the calibration spectral residuals to give the innovations.
- Final step: prediction of the product concentration.
  - The biochemical concentration predictions from the spectra are added to the residuals prediction from the process data to give the final product concentrations.
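The three steps can be sketched on synthetic data as follows. The slides do not say which model type was used for each block, so ordinary least squares is used here purely as a stand-in for the calibration and residual models. All variable names, dimensions and the simulated data are assumptions.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept (a stand-in for the calibration models)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def rms(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic data: the concentration depends on the spectra plus a process
# effect (column 0 of the process data) that the spectra cannot explain.
rng = np.random.default_rng(0)
n = 300
spectra = rng.normal(size=(n, 8))
process = rng.normal(size=(n, 5))
concentration = (spectra @ rng.normal(size=8)
                 + 0.5 * process[:, 0]
                 + 0.05 * rng.normal(size=n))

# First step: calibration model on the spectra and its residuals.
spectral_model = fit_linear(spectra, concentration)
spectral_pred = predict_linear(spectral_model, spectra)
spectral_residuals = concentration - spectral_pred

# Second step: model the residuals from the process data; what is left
# over are the innovations.
residual_model = fit_linear(process, spectral_residuals)
residual_pred = predict_linear(residual_model, process)
innovations = spectral_residuals - residual_pred

# Final step: spectral prediction plus the residual prediction.
final_pred = spectral_pred + residual_pred

print("RMS, spectra only:      ", rms(spectral_pred, concentration))
print("RMS, spectra + process: ", rms(final_pred, concentration))
```

In this construction the innovations are exactly the errors of the final prediction, so any structure remaining in them points to effects that neither data source has captured.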

Conjunction of Process and Spectral Data
- Five variables were considered to be the most important for the prediction of product concentration: CER (carbon dioxide evolution rate), CO2 Total, pH, OUR (oxygen uptake rate) and Temperature.
[Figure: time series plots of pH, CER, CO2 Total and OUR]

Conjunction of Process and Spectral Data
[Figure: Predictions and Residuals. Panels: predicted training values, residuals for the training data set, predicted validation values]

Conjunction of Process and Spectral Data
Final predictions of the product
[Figure: real values, predicted values and final predicted values for the validation data, together with the new residuals]
- The offset is reduced.
- The residuals exhibit less structure and reflect noise.

Conclusions
- A Spectral Window Selection (SWS) algorithm has been proposed to select a window of wavenumbers.
- Multiple models are 'bagged' to produce a more robust model.
- SWS produces better results than when the complete wavelength region is included.
- Process data were combined with spectral data to eliminate offsets.
- The wavelength selection-bagging approach in combination with the process data is now under investigation.
- The results to date are promising.
