ENBIS-18 in Nancy

2 – 25 September 2018; Ecoles des Mines, Nancy (France)
Abstract submission: 20 December 2017 – 4 June 2018

Process Optimization through PLS Model Inversion Using Historical Data (Not Necessarily from DOE)

3 September 2018, 15:30 – 15:50


Submitted by
Alberto J. Ferrer-Riquelme
Alberto Ferrer (Universidad Politécnica de Valencia), Daniel Palací-López (Universidad Politécnica de Valencia)
Although process data in modern industry share many of the characteristics of Big Data (i.e. volume, variety, veracity, velocity and value), they may not really be that “Big” in comparison to data from other sectors such as social networks, sales, marketing and finance. However, the questions we are trying to answer with industrial process data are highly complex. Not only do we want to find and interpret patterns in the data and use them for predictive purposes, but we also want to extract meaningful relationships that can be used for troubleshooting and process optimization (García-Muñoz and MacGregor 2016).

Optimizing a production process requires building a causal model that explains how changes in input variables (e.g. materials and their properties, processing conditions…) relate to changes in the outputs (e.g. amount of product obtained, its quality, purity, value, generated pollutants…). For this purpose, deterministic (i.e. first principles-based) models are always desirable. However, the lack of knowledge and the considerable resources generally required to construct such models properly make their use infeasible in a large number of cases. This is why data-driven models are often resorted to (Liu and MacGregor 2005, Bonvin et al. 2016).

To guarantee causality when using data-driven approaches, independent variation in the input variables is required. This can be obtained from a Design of Experiments (DOE) (Box, Hunter and Hunter 2005) performed on the plant, but such experiments are quite difficult to carry out in practice. By contrast, large amounts of historical plant operating data (highly collinear, low-rank data not from a DOE) are available in most production processes. In these contexts, classical linear regression (LR) or even machine learning (ML) methods cannot be used for process optimization because none of the infinitely many good prediction models that can be fitted is unique or causal. The reason is that the process variables are highly correlated and the number of independent variations in the process is much smaller than the number of measured variables. This calls for the use of latent variable models such as PLS (Partial Least Squares).
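The non-uniqueness argument can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: the data are simulated (not plant data), the sizes and variable names are hypothetical, and NumPy is assumed to be available. The sketch builds five measured variables driven by only two independent sources of variation and shows that two very different coefficient vectors reproduce the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "historical" data (hypothetical): 5 measured variables driven by
# only 2 independent sources of variation, so X has rank 2 rather than 5.
n_obs, n_latent, n_vars = 200, 2, 5
T_true = rng.normal(size=(n_obs, n_latent))
mixing = rng.normal(size=(n_latent, n_vars))
X = T_true @ mixing
y = T_true @ np.array([1.0, -0.5])

rank_X = np.linalg.matrix_rank(X)

# One least-squares solution: the minimum-norm one (small singular values
# are truncated through the explicit rcond cutoff).
b_min_norm = np.linalg.lstsq(X, y, rcond=1e-10)[0]

# Any direction in the null space of X can be added to the coefficients
# without changing the fitted values.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
null_direction = Vt[-1]  # right singular vector with a (numerically) zero singular value
b_alt = b_min_norm + 10.0 * null_direction

same_fit = np.allclose(X @ b_min_norm, X @ b_alt)
coef_gap = np.linalg.norm(b_min_norm - b_alt)

print("rank of X:", rank_X)
print("identical fitted values:", same_fit)
print("distance between coefficient vectors:", coef_gap)
```

Both coefficient vectors predict equally well on data of this kind, yet they imply very different effects for the individual variables, which is why neither can be read causally.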

PLS models are especially suited to handle “Big” Data. They assume that the input (X) space and the output (Y) space are not of full statistical rank, so they not only model the relationship between X and Y (as classical LR and ML models do) but also provide models for both the X and Y spaces. This gives them a very useful property: uniqueness and causality in the reduced latent space, regardless of whether the data come from a DOE or from the daily production process (historical data) (MacGregor 2018).

Following the ideas of Jaeckle and MacGregor (2000) and Tomba, Barolo and García Muñoz (2012), in this talk we illustrate how to guide a process optimization by PLS model inversion using real historical data from a petrochemical process (not obtained from a DOE). Open issues will also be discussed.
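As a rough illustration of what inverting a PLS model can look like, the sketch below fits a two-component, single-response PLS model with a plain NIPALS loop and then solves for inputs that are predicted to give a target output. It is not the procedure used in the talk: the data are simulated, the names `fit_pls1`, `predict` and `y_target` are hypothetical, NumPy is assumed to be available, and the unweighted minimum-norm choice of scores is an assumption made only to keep the example short.

```python
import numpy as np

def fit_pls1(X, y, n_components):
    """NIPALS PLS for one response; X and y must already be mean-centred."""
    Xr, yr = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)      # unit-norm weight vector
        t = Xr @ w                  # scores
        tt = t @ t
        p = Xr.T @ t / tt           # X loadings
        qa = yr @ t / tt            # y loading
        Xr -= np.outer(t, p)        # deflate X
        yr -= qa * t                # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    W_star = W @ np.linalg.inv(P.T @ W)   # scores from original X: T = X @ W_star
    return W_star, P, q

# Simulated collinear "historical" data (hypothetical, not from a DOE).
rng = np.random.default_rng(1)
n_obs, n_vars = 300, 6
T_true = rng.normal(size=(n_obs, 2))
X_raw = T_true @ rng.normal(size=(2, n_vars)) + 0.05 * rng.normal(size=(n_obs, n_vars)) + 50.0
y_raw = T_true @ np.array([2.0, 1.0]) + 0.05 * rng.normal(size=n_obs) + 10.0

x_mean, y_mean = X_raw.mean(axis=0), y_raw.mean()
W_star, P, q = fit_pls1(X_raw - x_mean, y_raw - y_mean, n_components=2)

def predict(x):
    """Predicted response of the fitted PLS model for one input vector x."""
    return (x - x_mean) @ W_star @ q + y_mean

# Inversion: find scores tau with q @ tau = y_target - y_mean, then map them
# back to the input space through the X loadings.
y_target = 12.0
tau = q * (y_target - y_mean) / (q @ q)   # unweighted minimum-norm solution (assumption)
x_new = x_mean + P @ tau

# With 2 latent variables and 1 response there is a 1-D null space:
# moving along it changes the inputs but not the predicted output.
null_dir = np.array([-q[1], q[0]])        # orthogonal to q
x_alt = x_mean + P @ (tau + 0.5 * null_dir)

y_hat_new = predict(x_new)
y_hat_alt = predict(x_alt)
print("predicted y at x_new:", y_hat_new)
print("predicted y at x_alt:", y_hat_alt)
```

Because the candidate inputs are built from the X loadings, they stay within the correlation structure of the historical data. The null-space direction is where additional criteria (for example cost or operating constraints) could be used to choose among solutions that the model predicts to be equivalent.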
