ENBIS-16 in Sheffield

11 – 15 September 2016; Sheffield
Abstract submission: 20 March – 4 July 2016

My abstracts


The following abstracts have been accepted for this event:

  • Big Data in the Shipping Industry

    Authors: Ibna Zaman (Newcastle University), Shirley Coleman (Newcastle University), Kayvan Pazouki (Newcastle University), Rose Norman (Newcastle University)
    Primary area of focus / application: Mining
    Secondary area of focus / application: Design and analysis of experiments
    Keywords: Sensor technology, Data-oriented, Correlation, Operational activities
    Submitted at 20-May-2016 14:24 by Ibna Zaman
    Accepted
    13-Sep-2016 09:40 Big Data in the Shipping Industry
    Shipping is one of the oldest industries in the world and the backbone of global commercial trade. The industry nowadays faces challenges due to the current economic situation, new regulations, and a lack of innovation. Automation and sensor technology are creating a huge amount of data; by 2020, annual data generation is projected to grow by up to 4,300%. Big data has become a buzzword in the industry. It is produced in large volumes in the shipping industry from multiple sources, including power systems, engine sensors, navigation and meteorological input. Analysing this big data provides real-time transparency, predictive analysis of performance and support in decision making. Such analysis identifies correlations between different measurable or unmeasurable parameters and discovers hidden patterns and trends, with a great impact on vessel performance monitoring. This paper presents “Auto-Mode Detection” and “ECO Speed”, which are data-oriented systems for marine services. The Auto-Mode Detection system detects the vessel’s mode automatically based on its operational activities; it removes human intervention from the system and helps to monitor vessel performance across different operational activities. “ECO Speed” reports the optimum speed of the vessel, together with the estimated fuel consumption and duration for the voyage distance, so that operators can decide on the vessel speed for the upcoming journey. Both systems demonstrate how shipping data is being turned into value. In this way, operational data extends its usefulness into giving real help to management decision making and planning.
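The speed/fuel/duration trade-off behind a system like “ECO Speed” can be sketched in a few lines. This is a minimal illustration, not the authors' system: the cubic propeller-law fuel model, the constant `k`, and the function name `eco_speed` are all hypothetical assumptions.

```python
def eco_speed(distance_nm, speeds, max_hours, k=0.002):
    """Pick an economical speed for a voyage, assuming (hypothetically)
    that fuel rate grows with the cube of speed, so total fuel over a
    fixed distance grows with speed squared: fuel = k * v**2 * distance.
    Returns (speed in knots, duration in hours, fuel in arbitrary units)."""
    feasible = [v for v in speeds if distance_nm / v <= max_hours]
    if not feasible:
        raise ValueError("no candidate speed meets the deadline")
    best = min(feasible, key=lambda v: k * v**2 * distance_nm)
    return best, distance_nm / best, k * best**2 * distance_nm
```

Under this toy model the least-fuel feasible speed is simply the slowest one that still meets the deadline, which is the familiar slow-steaming logic.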
  • Random Forest-Based Approach for Physiological Functional Variable Selection for Driver's Stress Level Classification

    Authors: Naska Haouij (CEA-LinkLab, Telnet Innovation Labs), Raja Ghozi (U2S-ENIT, CEA-LinkLab), Jean-Michel Poggi (Univ. Paris Descartes et Univ. Paris Sud), Sylvie Sevestre Ghalila (CEA-LinkLab), Mériem Jaidane (U2S-ENIT, CEA-LinkLab)
    Primary area of focus / application:
    Keywords: Random forests, Functional data, Variable selection, Physiological signal, Stress level classification
    Submitted at 22-May-2016 19:16 by Naska Haouij
    14-Sep-2016 10:50 Random Forest-Based Approach for Physiological Functional Variable Selection for Driver's Stress Level Classification
    With increasing urbanization and technological advances, urban driving is bound to be a complex task that requires higher levels of alertness. Thus, the driver’s mental workload should be optimal in order to manage critical situations in such challenging driving conditions. Past studies that relied on the driver’s performance used subjective measures. The new wearable and non-intrusive sensor technology is not only providing real-time physiological monitoring, but is also enriching the tools for monitoring human affective and cognitive states.

    This study focuses on a driver’s physiological changes, recorded using portable sensors, on different urban routes. Specifically, we consider the Electrodermal Activity (EDA) measured at two locations (hand and foot), the Electromyogram (EMG), Heart Rate (HR) and Respiration (RESP) from ten driving experiments over three types of routes: rest area, city, and highway driving. The data are taken from the physiological database labelled "drivedb", available online on PhysioNet.

    Several studies have addressed driver's stress level recognition using physiological signals. Classically, researchers extract expert-based features from physiological signals and select the features most relevant to stress level recognition. This work aims to provide a random forest-based method for the selection of physiological functional variables in order to classify the stress level during real-world driving experience. The contribution of this study is twofold: on the methodological side, it considers physiological signals as functional variables and offers a procedure for data processing and variable selection. On the applied side, the proposed method provides a "blind" procedure for driver's stress level classification that does not depend on expert-based studies of physiological signals.
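The importance-based selection step described above can be sketched with scikit-learn on synthetic stand-in data. The variable names, sample sizes, artificial labels and the uniform-share threshold rule are illustrative assumptions, not the drivedb study or the authors' functional-variable procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
names = ["EDA_hand", "EDA_foot", "EMG", "HR", "RESP"]
# 200 synthetic windows of 5 summarised physiological variables;
# the stress label is (artificially) driven by EDA_hand and HR only
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 3] + 0.3 * rng.normal(size=200) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
ranking = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])
# keep variables whose importance exceeds the uniform share (a simple rule)
selected = [n for n, imp in ranking if imp > 1 / len(names)]
```

In the functional setting of the paper, whole signals (or groups of basis coefficients) would play the role that single columns play in this sketch.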
  • Prior Information, but no MCMC: A Bayesian Normal Linear Regression Case Study

    Authors: Katy Klauenberg (Physikalisch-Technische Bundesanstalt), Gerd Wübbeler (Physikalisch-Technische Bundesanstalt), Bodo Mickan (Physikalisch-Technische Bundesanstalt), Peter Harris (National Physical Laboratory), Clemens Elster (Physikalisch-Technische Bundesanstalt)
    Primary area of focus / application: Metrology & measurement systems analysis
    Keywords: Bayesian inference, Prior knowledge, Conjugate prior distribution, Linear regression, Normal inverse Gamma distribution, Gaussian measurement error, Sonic nozzle calibration
    Submitted at 23-May-2016 13:11 by Katy Klauenberg
    12-Sep-2016 11:50 Prior Information, but no MCMC: A Bayesian Normal Linear Regression Case Study
    Regression is a very common task, and frequently additional a priori information is available. Bayesian inference is well suited for these situations. However, the need for MCMC methods and difficulties in eliciting prior distributions often prevent the application of Bayesian inference.

    Oftentimes prior knowledge is available from an ensemble of previous, similar regressions. Pooling the posterior distributions from previous regressions and approximating their average by a wide distribution from a parametric family is a suitable way to describe practitioners' a priori belief about a similar regression. This applies in particular to linear regression models with Gaussian measurement errors: prior information from previous regressions can often be expressed by the normal inverse Gamma distribution - a conjugate prior. Because the resulting posterior distribution has an analytical, closed form, one can easily derive estimates, uncertainties and credible intervals for all parameters, for the regression curve, and for predictions in this class of problems. In addition, we describe Bayesian tools to assess the plausibility of the assumptions behind the suggested approach - again without Markov chain Monte Carlo (MCMC) methods.

    A typical problem from metrology demonstrates the practical value. We apply Normal linear regression with conjugate priors to the calibration of a flow device and illustrate how prior knowledge from previous calibrations can enable robust predictions, even for extrapolations. In addition, we provide software and suggest graphical displays that enable practitioners to apply Bayesian Normal linear regression themselves.
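The MCMC-free update at the heart of such an analysis is standard Normal-inverse-Gamma conjugate algebra, sketched below. This is not the authors' software; the hyperparameter names m0, V0, a0, b0 are the usual textbook notation, adopted here as an assumption.

```python
import numpy as np

def nig_posterior(X, y, m0, V0, a0, b0):
    """Conjugate update for y = X @ beta + eps, eps ~ N(0, sigma2 * I),
    with prior beta | sigma2 ~ N(m0, sigma2 * V0) and
    sigma2 ~ InvGamma(a0, b0).  Returns the posterior hyperparameters
    (mn, Vn, an, bn) in closed form -- no MCMC required."""
    V0inv = np.linalg.inv(V0)
    prec_n = V0inv + X.T @ X           # posterior precision (up to sigma2)
    Vn = np.linalg.inv(prec_n)
    mn = Vn @ (V0inv @ m0 + X.T @ y)
    an = a0 + len(y) / 2.0
    bn = b0 + 0.5 * (y @ y + m0 @ V0inv @ m0 - mn @ prec_n @ mn)
    return mn, Vn, an, bn
```

From (mn, Vn, an, bn) one obtains point estimates, credible intervals and predictive distributions analytically, which is what makes the "no MCMC" workflow possible.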
  • A Two-Stage Bayesian Approach for the Analysis of Multispectral Camera Measurements

    Authors: Marcel Dierl (Physikalisch-Technische Bundesanstalt), Timo Eckhard (Chromasens GmbH), Bernhard Frei (Chromasens GmbH), Maximilian Klammer (Chromasens GmbH), Sascha Eichstädt (Physikalisch-Technische Bundesanstalt), Clemens Elster (Physikalisch-Technische Bundesanstalt)
    Primary area of focus / application: Metrology & measurement systems analysis
    Keywords: Multispectral imaging, Color inspection, Inverse problems, Reconstruction techniques, Bayesian statistics
    Submitted at 24-May-2016 08:59 by Marcel Dierl
    Accepted
    12-Sep-2016 12:10 A Two-Stage Bayesian Approach for the Analysis of Multispectral Camera Measurements
    Estimation of spectral reflectance from the responses of multispectral imaging systems is important for numerous applications in imaging science, and several reconstruction principles have been proposed to this end. These principles have in common that calculating spectral reflectance from measurement data requires solving an ill-posed inverse problem. In many practical situations appropriate prior knowledge is available that should be utilized to regularize the reconstruction. However, this is not straightforward for many of the data analysis methods currently applied in spectral measurement, and it is often realized by using suitable training data and special kernel functions.

    Here we present a two-stage Bayesian approach that allows us to incorporate prior knowledge about spectral content. Such prior knowledge can originate from previous monochromator or spectrophotometer measurements. For the Bayesian analysis we apply truncated normal distributions that ensure the physical constraint of positivity, and we use specially designed prior covariance matrices to obtain smooth recovered spectra. In the first step, the calibration stage, spectral sensitivity curves connecting camera responses with spectral reflectances are determined for each channel of the multispectral imaging system. In the subsequent measurement stage, these results are used to estimate the reflectance spectrum from the camera’s response. The approach yields analytical expressions for a fast and efficient estimation of spectral reflectance and is thus suitable for real-time applications. Besides point estimates, probability distributions are also obtained which completely characterize the uncertainty associated with the reconstructed spectrum.

    We demonstrate the performance of our approach by using simulated data for the camera responses and spectral curves. It is shown that through incorporation of prior knowledge the Bayesian treatment yields improved reconstruction results for a wide range of standard deviations of the prior compared to methods that resort to training data only. Reflectance spectra which are not fully captured by training data can also be well estimated with our approach.
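The measurement stage can be sketched as a linear-Gaussian reconstruction with a smoothness-encoding prior covariance. Everything below is a simplified stand-in: the sensitivity matrix, grid sizes and kernel parameters are invented, and the positivity truncation used in the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ch, n_wl = 6, 31                        # 6 camera channels, 31 wavelengths
lam = np.linspace(400.0, 700.0, n_wl)     # wavelength grid in nm

# Smooth prior covariance (squared-exponential kernel): encodes the
# expectation that reflectance spectra vary slowly with wavelength
ell, tau = 60.0, 0.3
Sigma = tau**2 * np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / ell) ** 2)

S = rng.uniform(size=(n_ch, n_wl))        # stand-in sensitivities (calibration stage)
r_true = 0.5 + 0.3 * np.sin(lam / 60.0)   # a smooth "true" reflectance
sigma = 1e-3                              # camera noise level
c = S @ r_true + sigma * rng.normal(size=n_ch)   # simulated camera response

# Measurement stage: Gaussian posterior mean of the reflectance spectrum
r0 = np.full(n_wl, 0.5)                   # prior mean spectrum
K = Sigma @ S.T @ np.linalg.inv(S @ Sigma @ S.T + sigma**2 * np.eye(n_ch))
r_hat = r0 + K @ (c - S @ r0)
```

The posterior covariance, Sigma - K @ S @ Sigma, would give the full uncertainty characterization mentioned in the abstract.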
  • Association Rules and Compositional Data Analysis: An Odd Couple?

    Authors: Josep Antoni Martín-Fernández (University of Girona), Marina Vives-Mestres (University of Girona), Ron S. Kenett (KPA Group, Raanana, Israel; University of Torino, Torino, Italy; NYU Center for Risk Engineering, New York, USA)
    Primary area of focus / application: Mining
    Secondary area of focus / application: Business
    Keywords: Data mining, Log ratio, Measures of interestingness, Simplex
    Submitted at 24-May-2016 09:01 by Josep Antoni Martín-Fernández
    13-Sep-2016 16:00 Association Rules and Compositional Data Analysis: An Odd Couple?
    Many modern organizations generate a large amount of transaction data on a daily basis. Association rule (AR) mining is a powerful semantic data analytic technique for extracting information from transaction databases. AR mining was originally developed for basket analysis, where combinations of items in a shopping basket are evaluated. An itemset is a set of two or more items. To generate ARs, we first detect the most frequent itemsets. Then, as a second step, all possible association rules are generated from each itemset, and any AR that does not satisfy a minimum confidence threshold is removed. Typically, too many ARs are found and, after initial filtering, one has to rank rules using additional measures of interest. The R package “arules” provides a broad variety (more than a dozen) of interest measures for ARs. In this work we exploit the fact that an AR can be expressed as a contingency table with compositional data (CoDa) structure, so that AR and CoDa are not “an odd couple”. We present the properties of AR-related compositional measures. Then we show how to confirm the significance of an AR and provide an interpretation of the effects between the itemsets. We contrast CoDa visualization techniques with classical examples to show how this approach proves helpful in analyzing a transaction database.
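The contingency-table view of a rule can be made concrete with a small sketch. The 2x2 counts and the function name are illustrative; the log odds ratio below stands in for the kind of log-ratio (log-contrast) quantity CoDa works with and is not necessarily the measure studied in the paper.

```python
import math

def rule_measures(n11, n10, n01, n00):
    """Interest measures for a rule A -> B from its 2x2 contingency table:
    n11 = A and B, n10 = A only, n01 = B only, n00 = neither."""
    N = n11 + n10 + n01 + n00
    support = n11 / N                        # P(A and B)
    confidence = n11 / (n11 + n10)           # P(B | A)
    lift = confidence / ((n11 + n01) / N)    # P(B | A) / P(B)
    # a CoDa-flavoured quantity: the log odds ratio is a log-contrast
    # of the four cell proportions
    log_odds_ratio = math.log((n11 * n00) / (n10 * n01))
    return support, confidence, lift, log_odds_ratio
```

Treating the four cell proportions as a composition on the simplex is what lets log-ratio methods be applied to rules in this way.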
  • What Statistical Handbooks do not Teach about Shewhart Control Charts?

    Authors: Vladimir Shper (Moscow Institute of Steel & Alloys), Yuri Adler (Moscow Institute of Steel & Alloys)
    Primary area of focus / application: Education & Thinking
    Keywords: Shewhart Control Chart, Statistical thinking, Simulation, Study
    Submitted at 24-May-2016 21:28 by Vladimir Shper
    Accepted
    14-Sep-2016 09:20 What Statistical Handbooks do not Teach about Shewhart Control Charts?
    In this paper we’d like to discuss some questions about the application of Shewhart Control Charts (ShCC) in real life. Such application is based on the information one gets out of a ShCC. This information may be divided into two groups: information about the values of the process under study, and information about the structure which these values elucidate. Which group of information is more important for the correct interpretation of a ShCC is the goal of our discussion. In fact, we discuss the transition from Phase I, the preliminary analysis used to design the control chart, to Phase II, the use of the designed chart for process monitoring and improvement. Our approach begs a simple but interesting question: does the order of the statistics sampled from the process under study matter or not? A closely connected question concerns the advice given by many textbooks about the comparative performance of different types of ShCC. Such comparisons are often based on simulations of real processes (for example, Average Run Length studies), so one may ask: when a statistician simulates a process by taking random values from an appropriate distribution and neglecting the order of points, to what degree may he/she hope to get a copy close enough to the original? Our answer is obviously very predictable: it depends! But we are going to move further than that. Our paper discusses the conditions under which it is reasonable to think that the traditional way of designing a control chart is appropriate, and when the tacit assumption that the order of points does not matter is unrealistic. Examples from real processes and some results of simulations are presented as well.
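The simulation question raised above can be illustrated with a small sketch: 3-sigma limits are set as if the points were independent N(0, 1), and an AR(1) process (the coefficient 0.8 is chosen purely for illustration) shows how ignoring the order of points changes the estimated Average Run Length (ARL).

```python
import random

def run_length(step, limit=3.0):
    """Points plotted until the chart signals (|value| beyond the limits)."""
    n = 0
    while True:
        n += 1
        if abs(step()) > limit:
            return n

def arl(make_process, reps=2000, seed=7):
    """Monte Carlo estimate of the in-control Average Run Length."""
    random.seed(seed)
    return sum(run_length(make_process()) for _ in range(reps)) / reps

def iid_process():
    # independent N(0, 1) observations: the textbook assumption
    return lambda: random.gauss(0.0, 1.0)

def ar1_process(phi=0.8):
    # autocorrelated observations with the same innovations: order matters
    state = {"x": 0.0}
    def step():
        state["x"] = phi * state["x"] + random.gauss(0.0, 1.0)
        return state["x"]
    return step
```

For iid N(0, 1) data the estimated ARL is close to the nominal 370, while the AR(1) process, whose marginal variance is inflated to 1/(1 - phi**2), signals far sooner against the same limits; this is one concrete way in which neglecting the order of points makes a simulated comparison unlike the original process.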