ENBIS-18 in Nancy

2 – 6 September 2018; École des Mines, Nancy (France). Abstract submission: 20 December 2017 – 4 June 2018

The following abstracts have been accepted for this event:

  • A Comparison of Determining the Number of Components of a PLS Regression for MAR Mechanism

    Authors: Titin Agustin Nengsih (University of Strasbourg)
    Primary area of focus / application: Modelling
    Secondary area of focus / application: Mining
    Keywords: Missing data, Imputation methods, PLS regression, NIPALS, Comparison study, Missing at random
    Submitted at 14-May-2018 13:50 by Titin Agustin Nengsih
    Accepted
    3-Sep-2018 14:20 A Comparison of Determining the Number of Components of a PLS Regression for MAR Mechanism
    Missing data are endemic in business and industry research and have been a pervasive problem in data analysis since the origin of data collection. Several methods have been developed for handling incomplete data; imputation is the process of substituting missing data before estimating the relevant model parameters. PLS (Partial Least Squares) regression is a multivariate model whose parameters can be estimated by either of two algorithms (SIMPLS or NIPALS); it has been used extensively in business and industry research because of its effectiveness in analyzing causal relationships between several components. However, little discussion can be found on how to handle missing data when using a PLS regression. The NIPALS algorithm has the interesting property of being able to provide estimates on incomplete data. Selecting the number of components to build a representative model is an important problem in PLS regression. Fitting the number of components of a PLS regression on an incomplete data set leads to the problem of model validation, which is generally done using cross-validation. Determination of the number of components relies on several different criteria, such as the Q2 criterion, the Akaike Information Criterion (AIC), or the Bayesian Information Criterion (BIC).

    The goal of our simulation study is to analyze the impact of the proportion of missing data, under a missing at random (MAR) assumption, on the estimation of the number of components of a PLS regression. We compare six criteria for selecting the number of components, applied both to PLS regression with the NIPALS algorithm (NIPALS-PLSR) on incomplete data and to PLS regression on data sets completed with three imputation methods: multiple imputation by chained equations (MICE), k-nearest neighbour imputation (KNNimpute) and singular value decomposition imputation (SVDimpute). The criteria are Q2-LOO, Q2-10-fold, AIC, AIC-DoF, BIC, and BIC-DoF, evaluated on different proportions of missing data (ranging from 5% to 50%) under a MAR assumption. Our simulation study shows that, whatever the criterion used, the correct number of components of a PLS regression is difficult to determine, especially for small sample sizes and when the proportion of missing data is larger than 30%. MICE came closest to the correct number of components at each level of missingness, although it requires a very long execution time. NIPALS-PLSR ranked second, followed by KNNimpute and SVDimpute. With every criterion except Q2-LOO, the selected number of components is far from the true one, and tolerance to incomplete data sets depends on the sample size, the proportion of missing data and the chosen component selection method.
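
    To make the component-selection step concrete, the following is a minimal sketch (ours, not the authors' code) of a Q2-style leave-one-out selection on KNN-imputed data. It assumes scikit-learn, with PLSRegression and KNNImputer standing in for the NIPALS and KNNimpute implementations used in the study.

    ```python
    # Minimal sketch: choose the number of PLS components on imputed data by
    # maximizing a leave-one-out Q2 criterion. Illustrative only; this is not
    # the simulation code of the study.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.impute import KNNImputer
    from sklearn.model_selection import LeaveOneOut

    def q2_loo(X, y, n_components):
        """Leave-one-out Q2 = 1 - PRESS / TSS for a PLS model."""
        press = 0.0
        for train, test in LeaveOneOut().split(X):
            pls = PLSRegression(n_components=n_components).fit(X[train], y[train])
            press += ((y[test] - pls.predict(X[test]).ravel()) ** 2).sum()
        return 1.0 - press / ((y - y.mean()) ** 2).sum()

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 8))
    y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=50)        # two informative directions
    X_miss = X.copy()
    X_miss[rng.random(X.shape) < 0.20] = np.nan              # ~20% values missing at random
    X_imp = KNNImputer(n_neighbors=5).fit_transform(X_miss)  # KNN imputation
    q2 = {k: q2_loo(X_imp, y, k) for k in range(1, 6)}
    print(max(q2, key=q2.get), q2)                           # retained number of components
    ```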
  • Information Design & Usability – Fancy Things Sell

    Authors: Anja Zernig (KAI - Kompetenzzentrum Automobil- und Industrieelektronik GmbH), Claudia Korizek (Julius Blum GmbH), Wei-Ting Yang (École des Mines de Saint-Étienne), Stefanie Feiler (AICOS Technologies AG), Kathrin Plankensteiner (FH Vorarlberg)
    Primary area of focus / application: Other: Young Statisticians
    Keywords: Visualization, Interactive session, Knowledge sharing, Discussions
    Submitted at 14-May-2018 16:35 by Anja Zernig
    Accepted
    4-Sep-2018 15:20 Special Session: Young Statisticians Inviting
    Our business as statisticians and data scientists is to find answers by investigating data with suitable methods. On the one hand, correct and reliable results have to be provided. On the other hand, these results need to be presented in a way that meets the expectations of the customer. Most often, customers ask for one of two deliverables: the answer to a specific question obtained by investigating a given data set, or a software routine that the customer wants to apply to further data sets on their own. For the first deliverable, the preparation and presentation of the outcomes are of utmost importance; here, the discipline called “Information design” provides approaches to the appropriate visualization of information. For the second deliverable, the software routine, an intuitive interface is needed; the discipline called “Usability” deals with questions of what such interfaces should look like.

    This interactive session consists of two interconnected parts:

    First, Claudia Korizek presents the concept of Information design and its dos and don'ts, enriched with some tips and tricks. The concept of Usability is then presented by Wei-Ting Yang, who already has some experience in that field.

    In the second part of this session, guided by Stefanie Feiler, you will put things into practice: in small discussion groups you will share your experiences with Information design and Usability and tackle a project yourself. Listeners and onlookers are of course warmly welcome, too!

    This special session, organized by young statisticians, invites both young and experienced statisticians to share their experiences and to learn from each other. With your participation, we look forward to a lively and informative session.
  • Online NMF with Minimum Volume Constraint for Hyperspectral Pushbroom Imaging Systems and the Estimation of the Regularization Parameter

    Authors: Ludivine Nus (CRAN), Sebastian Miron (CRAN), David Brie (CRAN)
    Primary area of focus / application: Design and analysis of experiments
    Keywords: Hyperspectral imaging, Pushbroom imager, Online non-negative matrix factorization, Minimum volume constraint, Pareto front, Minimum distance criterion
    Submitted at 15-May-2018 12:57 by Ludivine NUS
    Accepted
    3-Sep-2018 14:00 Online NMF with Minimum Volume Constraint for Hyperspectral Pushbroom Imaging Systems and the Estimation of the Regularization Parameter
    This work aims at developing real-time hyperspectral image unmixing methods. Such methods are required in industrial applications for controlling and sorting input materials. One of the most widely used technical solutions is a pushbroom imager: the hyperspectral data cube is acquired slice by slice, sequentially in time, as the objects move on a conveyor belt.

    A relevant method for this type of application is online Non-negative Matrix Factorization (NMF), an adaptive version of the classical NMF. For a non-negative matrix X, NMF consists in finding two non-negative matrices S and A such that X ≈ SA. The goal of online NMF methods is to sequentially update the endmembers (S) and the abundances (A) in real time for each newly acquired sample. In general, NMF suffers from non-uniqueness of the solution. In order to reduce the set of admissible solutions, we integrate a minimum volume simplex (MVS) constraint, resulting in the online MVS-NMF method.
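
    For illustration only (an assumed generic form, not the authors' exact updates), one online step for a newly acquired slice x_t ≈ S a_t can solve the abundances by non-negative least squares and then take a projected gradient step on the endmembers:

    ```python
    # Hypothetical sketch of a single online NMF update; the authors' actual
    # algorithm (with the MVS constraint) is more elaborate.
    import numpy as np
    from scipy.optimize import nnls

    def online_nmf_step(S, x_t, eta=1e-3):
        a_t, _ = nnls(S, x_t)                # abundances of the new sample
        grad = np.outer(S @ a_t - x_t, a_t)  # gradient of 0.5*||x_t - S a_t||^2 in S
        S = np.maximum(S - eta * grad, 0.0)  # projected gradient step keeps S >= 0
        return S, a_t
    ```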

    However, the effectiveness of online MVS-NMF hinges on choosing the strength of the minimum volume simplex term. To address this problem, we formulate it as a bi-objective optimization problem, resulting in a plot (response curve) of the data-fitting cost versus the regularization cost. In order to estimate the optimal value of the MVS hyperparameter, we propose to use the Minimum Distance Criterion (MDC); this choice is motivated by the fact that the MDC solution is unique under mild conditions, unlike other criteria (e.g., the maximum curvature of the L-curve). Through experiments on a simulated image and on real hyperspectral wood data, we show that our method is well suited for estimating the optimal value of the MVS hyperparameter.
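
    As a minimal sketch of the MDC step itself (assuming the data-fitting and regularization costs have already been computed on a grid of candidate hyperparameter values):

    ```python
    # Pick the hyperparameter whose point on the (fit, regularization) response
    # curve is closest to the utopia point, after normalizing both axes.
    import numpy as np

    def mdc_select(lambdas, fit_costs, reg_costs):
        f = np.asarray(fit_costs, float)
        g = np.asarray(reg_costs, float)
        fn = (f - f.min()) / (f.max() - f.min())   # normalize to [0, 1]
        gn = (g - g.min()) / (g.max() - g.min())
        return lambdas[int(np.argmin(np.hypot(fn, gn)))]
    ```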
  • Sequential Detection of Transient Changes and Its Application to Spectral Analysis

    Authors: Blaise Guepie (UTT), Edith Grall (UTT), Pierre Beauseroy (UTT), Igor Nikiforov (UTT), Frédéric Michel (CEA)
    Primary area of focus / application: Process
    Keywords: Sequential detection, Transient signal, Sodium fast reactor, Spectral analysis
    Submitted at 17-May-2018 17:35 by Blaise GUEPIE
    Accepted
    4-Sep-2018 10:10 Sequential Detection of Transient Changes and Its Application to Spectral Analysis
    This presentation addresses the sequential detection of an abrupt change of finite duration, often called a transient change. In contrast to traditional sequential change-point detection, where the post-change period is assumed to be infinitely long, the detection of a transient change should be done with a short detection delay: a detection with a delay greater than a prescribed value is considered missed. Hence, the traditional quickest detection criterion, which minimizes the average detection delay provided that the average run length to false alarm is lower bounded by a given constant, is compromised.

    The optimality criterion considered here minimizes the worst-case probability of missed detection provided that the worst-case probability of false alarm during a given period is upper bounded. This kind of criterion is typical of safety-critical applications such as the security of cyber-physical systems (see, for example, [1]) or of nuclear reactors.

    The present talk studies the problem of monitoring sodium fast reactor (SFR) heat exchangers. SFRs use sodium-heated steam generators or a sodium-gas heat exchanger to transfer energy from the secondary to the tertiary circuit. In both cases, the heat exchanger must be permanently monitored in order to detect a leak of water or nitrogen into the sodium circuit, which can affect the SFR's performance or safety.

    The monitoring system uses accelerometers installed on the heat exchanger. The goal is to detect small leaks of the heat exchanger in the presence of high normal operating noise coming from other equipment (pumps, turbine, steam generator, etc.).

    The proposed solution is based on spectral analysis of the accelerometer signals. The previously developed suboptimal CUSUM-type transient change detection algorithm [2], applied to the fast Fourier transform of the signals, is studied. The worst-case probability of missed detection and the worst-case probability of false alarm are calculated and analyzed as functions of the spectral densities of the normal and abnormal operating modes of the heat exchanger.
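
    For intuition, here is a minimal window-limited CUSUM-type sketch for a transient mean shift in Gaussian noise; it is a simplified stand-in for the algorithm of [2], not its exact form.

    ```python
    # Alarm at the first time the log-likelihood ratio, summed over a sliding
    # window of the assumed transient length L, exceeds the threshold h.
    import numpy as np

    def transient_cusum(x, mu0, mu1, sigma, L, h):
        x = np.asarray(x, float)
        llr = (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma**2  # per-sample LLR
        for n in range(L, len(x) + 1):
            if llr[n - L:n].sum() >= h:
                return n      # alarm after n observations
        return None           # no detection within the record
    ```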

    References:
    1. V. L. Do, L. Fillatre, I. Nikiforov and P. Willett, “Security of SCADA Systems Against Cyber-Physical Attacks,” IEEE Aerospace and Electronic Systems Magazine, vol. 32, no. 5, pp. 28–45, 2017.
    2. B. K. Guépié, L. Fillatre and I. Nikiforov, “Detecting a Suddenly Arriving Dynamic Profile of Finite Duration,” IEEE Transactions on Information Theory, vol. 63, no. 5, pp. 3039–3052, 2017.
  • A Bayesian Self-Starting Shiryaev Statistic for Phase I Data

    Authors: Panagiotis Tsiamyrtzis (Athens University of Economics and Business), Konstantinos Bourazas (Athens University of Economics and Business)
    Primary area of focus / application: Modelling
    Secondary area of focus / application: Process
    Keywords: Bayesian Statistical Process Control and monitoring, AMOC, Persistent shifts, Phase I, Short runs
    Submitted at 18-May-2018 10:57 by Panagiotis Tsiamyrtzis
    Accepted
    4-Sep-2018 10:10 A Bayesian Self-Starting Shiryaev Statistic for Phase I Data
    In Statistical Process Control & Monitoring (SPC&M), our interest is in detecting when a process deteriorates from its “in control” state, typically established during a phase I exercise. The phase I data therefore play a crucial role, as they are assumed to be a random sample from the in-control distribution and are used to calibrate a control chart that will evaluate the process in phase II.

    In this work, we focus our attention on detecting persistent shifts in the parameters of interest under at most one change (AMOC) scenarios during phase I, where low-volume data are available. We propose a Bayesian scheme based on the cumulative posterior probability that a step change has already occurred. The proposed methodology generalizes Shiryaev's methodology, as it allows both the parameters and the shift magnitude to be unknown. Furthermore, Shiryaev's assumption that the prior probability on the location of the change point is constant is relaxed. Posterior inference for the unknown parameters and the location of a (potential) change point will be provided.

    A real data set will illustrate the Bayesian self-starting Shiryaev scheme, while a simulation study will evaluate its performance against standard competitors in the case of Normal data.
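
    For background, the classical Shiryaev recursion, with fully known in-control and out-of-control densities and a geometric prior on the change point, can be sketched as follows; the proposed scheme generalizes it to unknown parameters and shift magnitudes.

    ```python
    # Posterior probability that a change has already occurred, for a Gaussian
    # mean shift mu0 -> mu1 and a geometric(p) prior on the change point.
    import numpy as np
    from scipy.stats import norm

    def shiryaev_posterior(x, mu0, mu1, sigma, p):
        pi, path = 0.0, []
        for xi in x:
            f0 = norm.pdf(xi, mu0, sigma)
            f1 = norm.pdf(xi, mu1, sigma)
            num = (pi + (1.0 - pi) * p) * f1
            pi = num / (num + (1.0 - pi) * (1.0 - p) * f0)
            path.append(pi)
        return np.array(path)  # signal when this crosses a chosen threshold
    ```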
  • Point Processes for Studying Failures Distribution on Linear Networks

    Authors: Nicolas Dante (Institut Elie Cartan de Lorraine (IECL)), Bérengère Sixta Dumoulin (Syndicat des eaux d’Ile-de-France (SEDIF)), Radu Stefan Stoica (Institut Elie Cartan de Lorraine (IECL))
    Primary area of focus / application: Modelling
    Keywords: Point processes, Linear networks, Statistical inference, Modelling failures on linear networks
    Submitted at 18-May-2018 16:05 by Nicolas Dante
    Accepted
    4-Sep-2018 11:40 Point Processes for Studying Failures Distribution on Linear Networks
    Point processes on linear networks have already been applied in the social and cognitive sciences. At first glance, transposing a point process from a classical Euclidean space to a space induced by a network is straightforward. Nevertheless, the use of an appropriate distance and the structure of the network itself alter the classical concept of stationarity, and hence preclude the direct use of the tools already developed for studying point processes (Baddeley et al., 2016).

    This talk presents the application of point processes on linear networks to the spatial distribution of failures in the water distribution network.

    First, point processes on linear networks and the main results on their characteristics are recalled. Next, simulation algorithms are discussed, and inference methods are proposed. Results are presented on simulated and real data. Finally, conclusions and perspectives are outlined.
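
    As a minimal illustration of the simulation step (our sketch, not the talk's method), a homogeneous Poisson point process on a linear network can be generated by drawing a Poisson number of points proportional to the total network length and placing each point uniformly on a length-weighted segment:

    ```python
    # Homogeneous Poisson process with intensity lam (points per unit length)
    # on a network given as an (m, 4) array of [x0, y0, x1, y1] segments.
    import numpy as np

    def poisson_on_network(segments, lam, seed=0):
        rng = np.random.default_rng(seed)
        seg = np.asarray(segments, float)
        lengths = np.hypot(seg[:, 2] - seg[:, 0], seg[:, 3] - seg[:, 1])
        n = rng.poisson(lam * lengths.sum())                  # total point count
        idx = rng.choice(len(seg), size=n, p=lengths / lengths.sum())
        t = rng.random(n)                                     # position along segment
        x = seg[idx, 0] + t * (seg[idx, 2] - seg[idx, 0])
        y = seg[idx, 1] + t * (seg[idx, 3] - seg[idx, 1])
        return np.column_stack([x, y])
    ```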