ENBIS: European Network for Business and Industrial Statistics
ENBIS-18 in Nancy
2 – 25 September 2018; Ecoles des Mines, Nancy (France)
Abstract submission: 20 December 2017 – 4 June 2018
The following abstracts have been accepted for this event:
-
A Comparison of Determining the Number of Components of a PLS Regression for MAR Mechanism
Authors: Titin Agustin Nengsih (University of Strasbourg)
Primary area of focus / application: Modelling
Secondary area of focus / application: Mining
Keywords: Missing data, Imputation methods, PLS regression, NIPALS, Comparison study, Missing at random
The goal of our simulation study is to analyze the impact of the proportion of missing data, under a missing at random (MAR) assumption, on the estimation of the number of components of a PLS regression. We compare six criteria for selecting the number of components, applied to PLS regression with the NIPALS algorithm (NIPALS-PLSR) on incomplete data and to PLS regression on data sets imputed by three methods: multiple imputation by chained equations (MICE), k-nearest neighbor imputation (KNNimpute) and singular value decomposition imputation (SVDimpute). The criteria are Q2-LOO, Q2-10-fold, AIC, AIC-DoF, BIC, and BIC-DoF, evaluated for proportions of missing data ranging from 5 to 50%. Our simulation study shows that, whatever the criterion used, the correct number of components of a PLS regression is difficult to determine, especially for small sample sizes and when the proportion of missing data exceeds 30%. MICE came closest to the correct number of components at every proportion of missing data, although its execution time is very long. NIPALS-PLSR ranked second, followed by KNNimpute and SVDimpute. For every criterion except Q2-LOO, the selected number of components is far from the true one, and tolerance to incomplete data sets depends on the sample size, the proportion of missing data and the chosen component selection method. -
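To make the MAR assumption concrete, here is a minimal sketch of how missingness could be injected into a simulated data set. It is not the authors' simulation design: the function name `make_mar`, the logistic link and the choice of a single fully observed "driver" column are all assumptions made for illustration. The key property is that the probability of a value being missing depends only on another, always observed, variable.

```python
import math
import random


def make_mar(rows, target_col, driver_col, base_rate, seed=0):
    """Return a copy of rows where target_col is blanked (None) at random.

    The missingness probability depends only on driver_col, which stays
    fully observed, so the mechanism is missing at random (MAR).
    base_rate sets the missingness probability at the mean driver value
    (hypothetical parameterisation, chosen for this sketch).
    """
    rng = random.Random(seed)
    drivers = [r[driver_col] for r in rows]
    mean = sum(drivers) / len(drivers)
    sd = math.sqrt(sum((d - mean) ** 2 for d in drivers) / len(drivers)) or 1.0
    offset = math.log(base_rate / (1.0 - base_rate))
    out = []
    for r in rows:
        z = (r[driver_col] - mean) / sd
        # Logistic link: larger driver values make the target more likely missing.
        p_missing = 1.0 / (1.0 + math.exp(-(offset + z)))
        new_row = list(r)
        if rng.random() < p_missing:
            new_row[target_col] = None
        out.append(new_row)
    return out


# Usage: two Gaussian columns; column 1 goes missing depending on column 0.
_rng = random.Random(1)
data = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(2000)]
incomplete = make_mar(data, target_col=1, driver_col=0, base_rate=0.3)
```

An imputation method or NIPALS-PLSR would then be run on `incomplete`, and the selected number of components compared with the one used to generate the data.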
Information Design & Usability – Fancy Things Sell
Authors: Anja Zernig (KAI - Kompetenzzentrum Automobil- und Industrieelektronik GmbH), Claudia Korizek (Julius Blum GmbH), Wei-Ting Yang (École des Mines de Saint-Étienne), Stefanie Feiler (AICOS Technologies AG), Kathrin Plankensteiner (FH Vorarlberg)
Primary area of focus / application: Other: Young Statisticians
Keywords: Visualization, Interactive session, Knowledge sharing, Discussions
This interactive session consists of two interconnected parts:
First, Claudia Korizek presents the concept of Information design and its do’s and don’ts, enriched with some tips and tricks. Wei-Ting Yang, who already has some experience in that field, then presents the concept of Usability.
In the second part of this session, everyone is invited to share his/her experiences in Information design and Usability in small discussion groups. Listeners are warmly welcome, too!
The second part of this session, guided by Stefanie Feiler, is on putting things into practice: in small groups you will tackle a project yourself. Onlookers are of course warmly welcome, too!
This special session, organized by young statisticians, addresses both young and experienced statisticians to share their experiences and to learn from each other. With your participation, we are looking forward to a lively and informative session. -
Online NMF with Minimum Volume Constraint for Hyperspectral Pushbroom Imaging Systems and the Estimation of the Regularization Parameter
Authors: Ludivine Nus (CRAN), Sebastian Miron (CRAN), David Brie (CRAN)
Primary area of focus / application: Design and analysis of experiments
Keywords: Hyperspectral imaging, Pushbroom imager, Online non-negative matrix factorization, Minimum volume constraint, Pareto front, Minimum distance criterion
Submitted at 15-May-2018 12:57 by Ludivine NUS
Accepted
A relevant method dedicated to this type of application is on-line Non-negative Matrix Factorization (NMF), an adaptive version of classical NMF. For a non-negative matrix X, NMF consists in finding two non-negative matrices S and A such that X ≈ SA. The goal of on-line NMF methods is to sequentially update, in real time, the endmembers (S) and the abundances (A) for each newly acquired sample. In general, NMF suffers from non-uniqueness of the solution. To reduce the set of admissible solutions, we integrate a minimum volume simplex (MVS) constraint, resulting in the on-line MVS-NMF method.
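As an illustration of the factorization X ≈ SA, the sketch below implements classical batch NMF with multiplicative updates for the squared Frobenius error. It is deliberately not the on-line MVS-NMF method of the abstract: there is no sequential update and no minimum volume term. The helper names and default parameters are assumptions made for this example.

```python
import random


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


def frob_error(X, S, A):
    """Squared Frobenius norm of X - SA."""
    R = matmul(S, A)
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def nmf(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Batch NMF: X (n x m) ~ S (n x rank) times A (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    S = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    A = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # A <- A * (S^T X) / (S^T S A), element-wise
        St = transpose(S)
        num, den = matmul(St, X), matmul(matmul(St, S), A)
        A = [[a * u / (d + eps) for a, u, d in zip(ar, ur, dr)]
             for ar, ur, dr in zip(A, num, den)]
        # S <- S * (X A^T) / (S A A^T), element-wise
        At = transpose(A)
        num, den = matmul(X, At), matmul(S, matmul(A, At))
        S = [[s * u / (d + eps) for s, u, d in zip(sr, ur, dr)]
             for sr, ur, dr in zip(S, num, den)]
    return S, A


# Usage: factorize an exactly rank-2 non-negative matrix.
S_true = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
A_true = [[1.0, 2.0, 0.0, 1.0, 3.0], [0.0, 1.0, 2.0, 1.0, 0.5]]
X = matmul(S_true, A_true)
S_hat, A_hat = nmf(X, rank=2)
```

Because the updates only multiply non-negative quantities, S and A stay non-negative throughout. The recovered factors are generally not equal to `S_true` and `A_true`, which is the non-uniqueness the abstract refers to.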
However, the effectiveness of on-line MVS-NMF depends on determining the optimal strength of the minimum volume simplex term. To address this problem, we formulate it as a bi-objective optimization problem, resulting in a linear plot (response curve) of the data fitting cost versus the regularization cost. To estimate the optimal value of the MVS hyperparameter, we propose to use the Minimum Distance Criterion (MDC). This choice is motivated by the fact that the MDC solution is unique under mild conditions, unlike other criteria (e.g., the maximum curvature of the L-curve). Experiments on a simulated image and on real hyperspectral wood data show that our method is well-suited to estimating the optimal value of the MVS hyperparameter. -
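The selection step can be sketched as follows, under stated assumptions: each candidate hyperparameter value yields one point (data fitting cost, regularization cost) on the response curve, and the chosen value is the one whose point lies closest to a reference point. Taking the origin as reference after rescaling both costs to [0, 1] is an assumption of this sketch, and the function name and numbers are hypothetical; the abstract does not give these details.

```python
import math


def minimum_distance_choice(candidates):
    """Pick the hyperparameter whose response-curve point is nearest the origin.

    candidates: list of (hyperparameter, fit_cost, reg_cost) tuples.
    Both costs are rescaled to [0, 1] first (an assumption of this sketch)
    so that neither objective dominates the distance.
    """
    fits = [c[1] for c in candidates]
    regs = [c[2] for c in candidates]

    def scale(value, lo, hi):
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    def distance(c):
        return math.hypot(scale(c[1], min(fits), max(fits)),
                          scale(c[2], min(regs), max(regs)))

    return min(candidates, key=distance)[0]


# Usage with made-up costs for three candidate values of the MVS weight.
curve = [(0.01, 1.0, 10.0), (0.1, 2.0, 3.0), (1.0, 9.0, 1.0)]
best = minimum_distance_choice(curve)  # the middle candidate balances both costs
```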
Sequential Detection of Transient Changes and Its Application to Spectral Analysis
Authors: Blaise Guepie (UTT), Edith Grall (UTT), Pierre Beauseroy (UTT), Igor Nikiforov (UTT), Frédéric Michel (CEA)
Primary area of focus / application: Process
Keywords: Sequential detection, Transient signal, Sodium fast reactor, Spectral analysis
Submitted at 17-May-2018 17:35 by Blaise GUEPIE
Accepted
The optimality criterion considered minimizes the worst-case probability of missed detection, subject to an upper bound on the worst-case probability of false alarm during a given period. This kind of criterion is typical of safety-critical applications such as the security of cyber-physical systems (see for example [1]) or of nuclear reactors.
This presentation studies the problem of monitoring the heat exchanger of sodium fast reactors (SFR). SFRs use sodium-heated steam generators or a sodium-gas heat exchanger to transfer energy from the secondary to the tertiary circuit. In both cases, the heat exchanger should be permanently monitored in order to detect a leak of water or nitrogen into the sodium circuit, which can affect SFR performance or safety.
The monitoring system uses accelerometers installed on the heat exchanger. The goal is to detect small leaks in the heat exchanger in the presence of high normal operating noise coming from different equipment (pumps, turbine, steam generator ...).
The proposed solution is based on the spectral analysis of the accelerometer signals. The previously developed suboptimal CUSUM-type transient change detection algorithm [2], applied to the fast Fourier transform, is studied. The worst-case probability of missed detection and the worst-case probability of false alarm are calculated and analyzed as functions of the spectral densities of the normal and abnormal operating modes of the heat exchanger.
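For readers unfamiliar with CUSUM-type detection of transient changes, the following is a simplified sketch of a window-limited CUSUM for a shift in the mean of Gaussian observations. It is not the algorithm of [2] and does not operate on Fourier coefficients: the Gaussian model, the fixed threshold and all parameter names are assumptions chosen to keep the example short. The window length plays the role of the maximal duration of the transient.

```python
def window_limited_cusum(samples, mu0, mu1, sigma, window, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    At each time n the statistic is the largest sum of log-likelihood
    ratios over the stretches k..n that start within the last `window`
    samples. An alarm is raised when it reaches `threshold`.
    """
    llrs = []
    for n, x in enumerate(samples):
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2).
        llrs.append((mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2)
        best, running = float("-inf"), 0.0
        for value in reversed(llrs[-window:]):
            running += value
            best = max(best, running)
        if best >= threshold:
            return n
    return None


# Usage: a transient of three samples at level 2 inside zero-level noise-free data.
signal = [0.0] * 10 + [2.0] * 3 + [0.0] * 10
alarm = window_limited_cusum(signal, mu0=0.0, mu1=2.0, sigma=1.0,
                             window=3, threshold=5.0)
```

Raising the threshold lowers the false alarm probability but increases the probability of missing a short transient, which is the trade-off the optimality criterion above formalizes.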
References:
1. V. L. Do, L. Fillatre, I. Nikiforov and P. Willett. Security of SCADA Systems Against Cyber-Physical Attacks. IEEE Aerospace and Electronic Systems Magazine, v. 32, n. 5, pp. 28-45.
2. B. K. Guépié, L. Fillatre and I. Nikiforov. Detecting a Suddenly Arriving Dynamic Profile of Finite Duration. IEEE Transactions on Information Theory, v. 63, n. 5, pp. 3039-3052. -
A Bayesian Self-Starting Shiryaev Statistic for Phase I Data
Authors: Panagiotis Tsiamyrtzis (Athens University of Economics and Business), Konstantinos Bourazas (Athens University of Economics and Business)
Primary area of focus / application: Modelling
Secondary area of focus / application: Process
Keywords: Bayesian Statistical Process Control and monitoring, AMOC, Persistent shifts, Phase I, Short runs
Submitted at 18-May-2018 10:57 by Panagiotis Tsiamyrtzis
Accepted
In this work, we focus on detecting persistent shifts in the parameters of interest under at most one change (AMOC) scenarios during phase I, where only low-volume data are available. We propose a Bayesian scheme based on the cumulative posterior probability that a step change has already occurred. The proposed methodology generalizes Shiryaev’s methodology, as it allows both the parameters and the shift magnitude to be unknown. Furthermore, Shiryaev’s assumption that the prior probability on the location of the change point is constant will be relaxed. Posterior inference for the unknown parameters and the location of a (potential) change point will be provided.
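As background, the classical Shiryaev statistic that this work generalizes can be sketched in a few lines. The version below assumes everything the proposed scheme does not: known Normal pre-change and post-change means, a known standard deviation, and a constant prior probability `rho` that the change occurs at each step (a geometric prior). The function names are illustrative only.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def shiryaev_posteriors(samples, mu0, mu1, sd, rho):
    """Posterior probability, after each sample, that the change has occurred.

    Classical known-parameter recursion with a geometric prior on the
    change point; not the self-starting scheme of the abstract.
    """
    p = 0.0
    posteriors = []
    for x in samples:
        # Probability that the change has occurred by now, before seeing x.
        prior = p + (1.0 - p) * rho
        num = prior * normal_pdf(x, mu1, sd)
        den = num + (1.0 - prior) * normal_pdf(x, mu0, sd)
        p = num / den
        posteriors.append(p)
    return posteriors


# Usage: a step change from mean 0 to mean 3 after the third observation.
probs = shiryaev_posteriors([0.0, 0.0, 0.0, 3.0, 3.0, 3.0],
                            mu0=0.0, mu1=3.0, sd=1.0, rho=0.05)
```

An alarm would be raised the first time the posterior probability exceeds a chosen level.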
A real data set will illustrate the Bayesian self-starting Shiryaev scheme, while a simulation study will evaluate its performance against standard competitors in the case of Normal data. -
Point Processes for Studying Failures Distribution on Linear Networks
Authors: Nicolas Dante (Institut Elie Cartan de Lorraine (IECL)), Bérengère Sixta Dumoulin (Syndicat des eaux d’Ile-de-France (SEDIF)), Radu Stefan Stoica (Institut Elie Cartan de Lorraine (IECL))
Primary area of focus / application: Modelling
Keywords: Point processes, Linear networks, Statistical inference, Modelling failures on linear networks
Submitted at 18-May-2018 16:05 by Nicolas Dante
Accepted
This talk presents the application of point processes on linear networks to the spatial spread of failures in a water distribution network.
First, point processes on linear networks and the main results on their characteristics are recalled. Next, simulation algorithms are discussed, and then inference methods are proposed. Results are presented on simulated and real data. Finally, conclusions and perspectives are outlined.
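The simplest point process on a linear network is the homogeneous Poisson process, and simulating it gives a feel for the setting. The sketch below is only that baseline, not one of the models or algorithms of the talk: the network is assumed to be a list of straight segments, `intensity` is the expected number of points per unit length, and the function name is hypothetical.

```python
import math
import random


def simulate_poisson_on_network(segments, intensity, seed=0):
    """Simulate a homogeneous Poisson process on a network of segments.

    segments: list of ((x0, y0), (x1, y1)) pairs.
    Points are generated with exponential gaps along the concatenated
    length of the network, then mapped back to planar coordinates.
    """
    rng = random.Random(seed)
    lengths = [math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in segments]
    total = sum(lengths)
    points = []
    position = rng.expovariate(intensity)
    while position < total:
        offset = position
        for (a, b), length in zip(segments, lengths):
            if length > 0 and offset <= length:
                t = offset / length
                points.append((a[0] + t * (b[0] - a[0]),
                               a[1] + t * (b[1] - a[1])))
                break
            offset -= length
        position += rng.expovariate(intensity)
    return points


# Usage: an L-shaped network of total length 3 with about 50 points per unit length.
pipes = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 2.0))]
failures = simulate_poisson_on_network(pipes, intensity=50.0)
```

Observed failure locations can be compared with such simulations to judge whether failures cluster along the network more than complete randomness would suggest.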