ENBIS-18 in Nancy, 2 – 25 September 2018; Ecoles des Mines, Nancy (France). Abstract submission: 20 December 2017 – 4 June 2018
The following abstracts have been accepted for this event:
Application of the Bayesian Spline Model to Estimate Task-Specific Exposures for Volatile Organic Compounds
Authors: M. Abbas Virji (National Institute for Occupational Safety and Health), E. Andres Houseman (Consultant)
Primary area of focus / application: Other: Statistical methods in industrial hygiene
Keywords: Bayesian, Limit of detection, Non-stationary, Spline model, Time-series
Submitted at 31-May-2018 15:04 by M. Abbas Virji
Accepted
Self-Prediction of Migraine Days: Analysis of Cohort of Migraine Patients Using a Digital Platform
Authors: Marina Vives-Mestres (Curelator Inc.), Kenneth J. Shulman (Curelator Inc.), Alec Mian (Curelator Inc.), Noah Rosen (Northwell Health)
Primary area of focus / application: Business
Secondary area of focus / application: Modelling
Keywords: Headache, Self-monitoring, Self-prediction, eHealth, mHealth
Submitted at 1-Jun-2018 11:23 by Marina Vives-Mestres
In this study, we examine the ability of 1,537 migraineurs to predict their own attacks 24 hours in advance. Predicting migraine days might be expected to be difficult, because there is considerable confusion about what counts as a premonitory symptom and what counts as a trigger factor, that is, about cause and effect. Additionally, migraine premonitory symptoms and potential trigger factors show significant inter-individual variation [1]. However, learning to predict migraine attacks accurately may aid self-management of the condition, impact quality of life and allow optimal timing of medication dosing. Understanding the strategy of good predictors may also yield generalizable and useful information for other individuals.
Individuals with migraine registered to use Curelator Headache® and then used the digital platform daily to enter lifestyle factors, possible headaches and medications, as well as their migraine expectation for the next 24 hours. For each individual we are interested in four variables: (1) number of correct migraine day predictions, (2) number of correct migraine-free day predictions, (3) number of wrong migraine day predictions and (4) number of wrong migraine-free day predictions. The four variables (a 2x2 contingency table) form a composition, which lives in a restricted space (the simplex); a multiple regression on its log-ratio coordinates is fitted and adjusted for covariates.
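As an illustration of the log-ratio step, the sketch below maps one individual's four prediction counts to three isometric log-ratio (ilr) balances. The abstract does not say which log-ratio coordinates, which partition of the parts, or which treatment of zero counts was used, so the partition (correct vs. wrong, then migraine vs. migraine-free within each), the pseudocount of 0.5, the function name and the example counts are all assumptions made here for illustration.

```python
import math


def ilr_balances(correct_migraine, correct_free, wrong_migraine, wrong_free,
                 pseudocount=0.5):
    """Return three ilr balances for a 2x2 table of prediction counts.

    Assumed sequential binary partition (not taken from the abstract):
      1. correct predictions vs. wrong predictions
      2. correct migraine days vs. correct migraine-free days
      3. wrong migraine days vs. wrong migraine-free days
    The pseudocount is an assumed, simple way to avoid log(0).
    """
    x1, x2, x3, x4 = (
        c + pseudocount
        for c in (correct_migraine, correct_free, wrong_migraine, wrong_free)
    )
    # Balance between two groups of r and s parts:
    # sqrt(r*s/(r+s)) * ln(geometric mean of group 1 / geometric mean of group 2).
    correct_vs_wrong = 0.5 * math.log((x1 * x2) / (x3 * x4))  # r = s = 2
    within_correct = math.sqrt(0.5) * math.log(x1 / x2)       # r = s = 1
    within_wrong = math.sqrt(0.5) * math.log(x3 / x4)         # r = s = 1
    return correct_vs_wrong, within_correct, within_wrong


# Hypothetical individual: 12 correct migraine days, 60 correct migraine-free
# days, 8 wrong migraine-day predictions, 10 wrong migraine-free predictions.
print(ilr_balances(12, 60, 8, 10))
```

The balances are unconstrained real numbers, so each can serve as the response of an ordinary multiple regression on covariates such as migraine frequency or gender. Because only ratios enter, the result is the same whether counts or proportions are supplied (with the pseudocount set to zero).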
Individuals who predicted better than random differed from those with random predictions in gender (females made up a greater proportion of the non-random group), age, number of tracked days and migraine frequency. In all cases the average is higher in the non-random predictors group. Almost all individuals with non-random predictions have a migraine expectation that is positively associated with migraine occurrence the next day. The retained log-ratio model includes the variables total migraine days tracked with Curelator, migraine frequency, gender and account type. Good migraine day predictors have a higher migraine frequency than good migraine-free day predictors. Individuals who tracked more migraines overall are worse predictors than those who tracked fewer migraines. Regularly menstruating females use the high/moderate predictions more often than other females and, finally, paid users wrongly predict migraine days more often than other users.
Migraine frequency is the most relevant variable explaining migraine predictions, so it is possible that the strategy of good predictors is simply a priori knowledge of their probability of having a migraine, based on past experience. The second most relevant variable is the number of tracked migraine days, possibly indicating that individuals who have more trouble managing their condition keep using Curelator for longer. Finally, up to 46 daily factors were included in the simplicial model. They barely improved it, but they indicate that individual predictions are based not only on what happens the day before the migraine, but also on a longer-term relation between factors and migraine occurrence.
1. Peris F et al. Towards improved migraine management: Determining potential trigger factors in individual patients. Cephalalgia. 2017; 37(5):452-463
Statistical Analysis to Predict Clinical Outcomes with Complex Physiologic Data
Authors: Monica Puertas (Instituto para la Calidad - Pontificia Universidad Católica del Peru), Jose Zayas-Castro (University of South Florida), Peter Fabri (University of South Florida)
Primary area of focus / application: Other: South American session
Secondary area of focus / application: Mining
Keywords: Prognostic analysis, Clinical outcomes, ICU patients, Platelet count
Submitted at 1-Jun-2018 22:25 by Monica Puertas
Accepted
Optimal Bayesian Design via MCMC Simulations for a Soldering Reliability Study
Authors: Rossella Berni (Department of Statistics, Informatics, Applications - University of Florence)
Primary area of focus / application: Design and analysis of experiments
Secondary area of focus / application: Reliability
Keywords: Optimal experimental design, Bayesian design, Utility function, Reliability
Submitted at 2-Jun-2018 19:18 by Rossella Berni
Notwithstanding the generality achieved, in actual applications further flexibility is often needed, for example by defining a utility function in which the cost of each observation depends on the value taken by the independent variable. Moreover, the relevance of costs may also be evaluated through specific weights, which take environmental conditions and technological information into account.
In this talk, we consider how to improve the construction of optimal designs in the technological field by applying Markov Chain Monte Carlo simulations and by evaluating: i) a hierarchical structure of the observed data; ii) a utility function including costs and weights; iii) model discrimination.
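To make the idea of a cost-dependent utility concrete, here is a minimal sketch that is not the author's model. It assumes a straight-line model y = b0 + b1*x with known noise variance and independent normal priors, for which the expected information gain has a closed form, so no MCMC is needed; the cost function, the weight, the candidate levels and all function names are hypothetical. In a non-conjugate or hierarchical setting such as the one described above, the expected utility would instead have to be approximated by simulation.

```python
import math
from itertools import combinations_with_replacement


def utility(design, cost, weight=1.0, prior_precision=1.0, noise_var=1.0):
    """Expected information gain of a design minus weighted observation costs.

    Assumed model: y = b0 + b1*x + noise, noise ~ N(0, noise_var),
    b0 and b1 independent N(0, 1/prior_precision) a priori.
    """
    n = len(design)
    sum_x = sum(design)
    sum_xx = sum(x * x for x in design)
    # Entries of the 2x2 posterior precision matrix R + X'X / noise_var.
    a = prior_precision + n / noise_var
    b = sum_x / noise_var
    d = prior_precision + sum_xx / noise_var
    log_det_posterior = math.log(a * d - b * b)
    log_det_prior = 2.0 * math.log(prior_precision)
    gain = 0.5 * (log_det_posterior - log_det_prior)
    # The cost of each run depends on the level x at which it is taken.
    return gain - weight * sum(cost(x) for x in design)


def best_design(candidates, n_runs, cost, weight=1.0):
    """Exhaustively search all designs of n_runs points on the candidate levels."""
    designs = combinations_with_replacement(candidates, n_runs)
    return max(designs, key=lambda design: utility(design, cost, weight))


levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(best_design(levels, 4, cost=abs, weight=0.0))   # costs ignored
print(best_design(levels, 4, cost=abs, weight=10.0))  # extreme levels are expensive
```

With the weight set to zero the search reduces to a Bayesian D-optimal design; raising the weight shifts the chosen runs toward cheaper levels, which is the trade-off a cost-aware utility is meant to capture.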
Deep k-Means: Jointly Clustering with k-Means and Learning Representations
Authors: Thibaut Thonet (University Grenoble Alpes - LIG)
Primary area of focus / application: Other: Session on deep learning
Secondary area of focus / application: Mining
Keywords: Clustering, Deep learning, k-means, Auto-encoders, Unsupervised learning
Submitted at 3-Jun-2018 19:58 by Thibaut Thonet
Accepted
Process Optimization through PLS Model Inversion Using Historical Data (Not Necessarily from DOE)
Authors: Alberto Ferrer (Universidad Politécnica de Valencia), Daniel Palací-López (Universidad Politécnica de Valencia)
Primary area of focus / application: Other: Process Chemometrics
Keywords: Process optimization, Partial Least Squares (PLS), Model inversion, Latent variables
Submitted at 4-Jun-2018 00:24 by Alberto J. Ferrer-Riquelme
Optimizing a production process requires building a causal model that explains how changes in input variables (e.g. materials and their properties, processing conditions…) relate to changes in the outputs (e.g. amount of product obtained, its quality, purity, value, generated pollutants…). For this purpose, deterministic (i.e. first principles-based) models are always desirable. However, the lack of knowledge and the substantial resources required to construct such models properly make their use infeasible in a large number of cases. This is why data-driven models are often resorted to (Liu and MacGregor 2005, Bonvin et al. 2016).
To guarantee causality when using data-driven approaches, independent variation in the input variables is required. This could be obtained from a Design of Experiments (DOE) (Box, Hunter and Hunter 2005) performed on the plant, but in practice this is quite difficult to arrange. In contrast, large amounts of historical plant operating data (highly collinear, low-rank data not from a DOE) are available in most production processes. In these contexts, classical linear regression (LR) or even machine learning (ML) methods cannot be used for process optimization, because infinitely many models can be fitted that predict well and none of them is unique or causal. The reason is that the process variables are highly correlated and the number of independent variations in the process is much smaller than the number of measured variables. This calls for the use of latent variable models such as PLS (Partial Least Squares).
PLS models are especially suited to handling “Big” Data. They assume that the input (X) space and the output (Y) space are not of full statistical rank, so they not only model the relationship between X and Y (as classical LR and ML models do) but also provide models for both the X and Y spaces. This gives them a very useful property: uniqueness and causality in the reduced latent space, regardless of whether the data come from a DOE or from daily production (historical data) (MacGregor 2018).
Following the ideas of Jaeckle and MacGregor (2000) and Tomba, Barolo and García Muñoz (2012), in this talk we illustrate how to guide a process optimization by PLS model inversion using real historical data from a petrochemical process (not obtained from a DOE). Open issues will also be discussed.
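The following is a minimal sketch of PLS model inversion for a single output, under stated assumptions rather than a reproduction of the authors' procedure. It fits a PLS model with the NIPALS algorithm on synthetic collinear data (six measured inputs driven by two latent factors, standing in for historical plant data), then finds the minimum-norm latent scores that give a desired output and maps them back to the input space. The data, the target value and the function names are hypothetical, and the unweighted minimum-norm choice is one simple option among several.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector, unit length
        t = Xr @ w                    # scores
        p = Xr.T @ t / (t @ t)        # X loadings
        q_a = yr @ t / (t @ t)        # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q_a * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "W": np.column_stack(W), "P": np.column_stack(P), "q": np.array(q)}


def predict(model, x):
    """Predict y for one input vector x by projecting it onto the latent space."""
    W, P = model["W"], model["P"]
    t = np.linalg.solve(W.T @ P, W.T @ (x - model["x_mean"]))
    return model["y_mean"] + t @ model["q"]


def invert(model, y_target):
    """Return the input vector with minimum-norm scores that predicts y_target."""
    q = model["q"]
    t_target = q * (y_target - model["y_mean"]) / (q @ q)
    return model["x_mean"] + model["P"] @ t_target


# Synthetic stand-in for historical data: 6 collinear inputs, 2 latent drivers.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(50, 6))
y = latent @ np.array([1.0, -0.5]) + 0.01 * rng.normal(size=50)

model = fit_pls1(X, y, n_components=2)
x_new = invert(model, y_target=1.5)
print(x_new, predict(model, x_new))
```

Because the proposed input vector is built from the X loadings, it stays inside the latent subspace and so respects the correlation structure seen in the historical data. When there are more latent variables than outputs, the scores that reach the target are not unique; the remaining directions form a null space that could be explored to optimize secondary criteria such as cost.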