ENBIS-18 in Nancy

2 – 5 September 2018; École des Mines, Nancy (France)
Abstract submission: 20 December 2017 – 4 June 2018

My abstracts


The following abstracts have been accepted for this event:

  • Statistical Engineering: An Idea Whose Time Has Come?

    Authors: Roger Hoerl (Union College)
    Primary area of focus / application: Other: Statistical Engineering
    Keywords: Engineering, Problem solving, Complex problems, Unstructured problems
    Submitted at 29-May-2018 16:13 by Roger Hoerl
    Accepted
    3-Sep-2018 11:00 Statistical Engineering: An Idea Whose Time Has Come?
    Most problems discussed in statistics textbooks, journals, and conference sessions tend to be well-defined, fairly narrow in scope, and to have a single “correct” analysis. In the words of Xiao-Li Meng, they “...correspond to a recognizable textbook chapter.” However, real problems faced by statisticians, and by other professionals utilizing statistics, are often large, complex, and unstructured. For example, there may not be agreement on exactly what the real problem actually is. Further, these problems are usually too complex to be solved with one statistical method, and they require a sequential approach integrating multiple methods in an overall problem-solving strategy. If individual tools, no matter how powerful, are not effective at addressing such problems, what approaches should practitioners use? Can these approaches be studied, researched, and perfected over time? I argue that the discipline known as statistical engineering, which focuses on the creative integration of multiple methods, is a viable approach for attacking such problems, and perhaps the only one. This will be illustrated with a large, complex, unstructured problem from GE Global Research. In addition, the current state of the theory and practice of statistical engineering will be presented.
  • A New ISO Standard for Isolated Lot Inspection by Variables Sampling

    Authors: Rainer Göb (University of Wuerzburg)
    Primary area of focus / application: Other: Sampling
    Secondary area of focus / application: Quality
    Keywords: ISO, Isolated lot inspection, Variables sampling, Design principles
    Submitted at 29-May-2018 16:44 by Rainer Göb
    Accepted
    5-Sep-2018 11:10 A New ISO Standard for Isolated Lot Inspection by Variables Sampling
    The ISO (International Organization for Standardization) acceptance sampling standards and their national versions are the most widely used statistical standards. ISO 2859 is a multi-part series of standards for attributes sampling.
    ISO 3951 is a multi-part series of standards for variables sampling where the proportion nonconforming is determined as the probability of a normally distributed measurement falling outside a specification range.

    ISO 2859 includes the process-oriented attributes sampling standard 2859-1 and the isolated-lot-oriented standard 2859-2. ISO 3951 contains only the process-oriented part, namely 3951-1, which is a descendant of the US military standard MIL-STD-414. For some time, stakeholders, particularly from the pharmaceutical and food industries, have been requesting a variables sampling standard for isolated lot inspection. A draft for such a standard has recently been developed by the responsible technical committee TC 69 “Application of statistical methods” at ISO. The talk outlines the new standard and explains its technical background and design principles, in particular the technical details of matching sampling plans between the existing attributes sampling standard 2859-2 and the new variables sampling standard.
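As a rough illustration of the matching principle mentioned above, the sketch below picks a known-sigma variables plan (single upper specification limit U, acceptance rule (U − x̄)/σ ≥ k) so that its OC curve agrees with that of an attributes single sampling plan at two quality levels. The attributes plan (n = 125, c = 3) and the matching points are made-up values for illustration, not figures taken from ISO 2859-2 or the draft standard.

```python
# Matching a known-sigma variables sampling plan to an attributes plan at two
# quality levels. All plan parameters below are illustrative assumptions.
import numpy as np
from scipy.stats import binom, norm

n_a, c = 125, 3            # attributes single sampling plan (assumed)
p1, p2 = 0.01, 0.05        # two quality levels used for matching (assumed)

# Acceptance probabilities of the attributes plan at the two quality levels
pa1 = binom.cdf(c, n_a, p1)
pa2 = binom.cdf(c, n_a, p2)

# Known-sigma variables plan with upper limit U: accept if (U - xbar)/sigma >= k.
# Its OC curve is Pa(p) = Phi(sqrt(n) * (z_{1-p} - k)); solve for n and k so the
# curve passes through (p1, pa1) and (p2, pa2).
sqrt_n = (norm.ppf(pa1) - norm.ppf(pa2)) / (norm.ppf(1 - p1) - norm.ppf(1 - p2))
k = norm.ppf(1 - p1) - norm.ppf(pa1) / sqrt_n
n_v = int(np.ceil(sqrt_n ** 2))

print(f"attributes plan: n={n_a}, c={c}")
print(f"matched variables plan (known sigma): n={n_v}, k={k:.3f}")
for p in (0.005, 0.01, 0.02, 0.05, 0.08):
    pa_attr = binom.cdf(c, n_a, p)
    pa_var = norm.cdf(np.sqrt(n_v) * (norm.ppf(1 - p) - k))
    print(f"p={p:.3f}  attributes Pa={pa_attr:.3f}  variables Pa={pa_var:.3f}")
```

The comparison table illustrates why a variables counterpart to ISO 2859-2 is attractive: a variables plan can reproduce essentially the same OC behaviour with a much smaller sample size.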
  • Short One-Sided Confidence Intervals for a Proportion

    Authors: Jens Bischoff (University of Wuerzburg), Rainer Göb (University of Wuerzburg)
    Primary area of focus / application: Other: Sampling
    Secondary area of focus / application: Quality
    Keywords: One-sided confidence intervals, Average coverage, Local restrictions, Prior information, Frequentistic
    Submitted at 29-May-2018 16:47 by Jens Bischoff
    Accepted (view paper)
    5-Sep-2018 10:50 Short One-Sided Confidence Intervals for a Proportion
    Shortest frequentist confidence intervals for a proportion p under prior information have been discussed in the literature. In industrial practice, one-sided confidence intervals for p are often more relevant than two-sided intervals; in quality control and auditing, in particular, one-sided upper limits are the most relevant. No attempts have yet been made to develop one-sided confidence intervals shorter than the classical Clopper-Pearson intervals. We present a concept of the volume of one-sided confidence intervals under prior information, so that minimum-volume intervals can be defined. The technique for constructing minimum-volume intervals with prescribed pointwise coverage cannot be adapted from the two-sided case. As an alternative, we consider average coverage and derive intervals with smaller volume than the classical Clopper-Pearson intervals, under additional restrictions on the local pointwise coverage.
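For orientation, the classical benchmark mentioned in the abstract, the one-sided upper Clopper-Pearson limit, can be computed from a Beta quantile; the snippet below is a minimal sketch with made-up sample figures (the average-coverage construction proposed in the talk is not reproduced here).

```python
# One-sided upper Clopper-Pearson confidence limit for a binomial proportion.
# The sample figures in the example are purely illustrative.
from scipy.stats import beta

def clopper_pearson_upper(x, n, conf=0.95):
    """Exact one-sided upper limit: the conf-quantile of Beta(x + 1, n - x)."""
    if x >= n:
        return 1.0
    return beta.ppf(conf, x + 1, n - x)

# Example: an audit sample of n = 200 items with x = 3 nonconforming items
print(clopper_pearson_upper(3, 200))   # upper 95% confidence limit for p
```

Intervals of the kind proposed in the talk aim to fall below this benchmark on average by relaxing the pointwise coverage requirement to an average-coverage requirement under prior information.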
  • ARIMA Time Series Analysis of Nano Exposure Measurements

    Authors: Wouter Fransman (TNO)
    Primary area of focus / application: Other: Statistical methods in industrial hygiene
    Keywords: Time series, Nanomaterials, Exposure, ARIMA
    Submitted at 30-May-2018 11:42 by Wouter Fransman
    Accepted
    4-Sep-2018 11:40 ARIMA Time Series Analysis of Nano Exposure Measurements
    Real-time measurement strategies for nano-exposure at the workplace result in large amounts of data, typically with sampling intervals of 1-30 seconds. These data are, for instance, of interest for determining the effect of a specific task on a worker’s exposure to nanoparticles. Such data have generally been analyzed by comparing means using summary statistics like the (geometric) mean and standard deviation, t-tests, or standard regression techniques. However, in this paper we argue that such methods neglect important aspects of exposure measurement sequences, such as autocorrelation and the dynamics in the data, and can therefore lead to erroneous conclusions. To overcome these problems, we propose the use of time-series methods based on standard ARIMA models, and extensions of those models that include explanatory variables, to statistically test for task effects on exposure. After illustrating the advantages of this methodology with a simulation study, we present the results for some real-data examples. In conclusion, we suggest a stepwise approach for the analysis of real-time exposure measurements that accounts for the dynamic nature of the data. We argue that recording process information during exposure measurement can substantially improve the conclusions that can be drawn from such a measurement series.
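As a hedged sketch of the kind of analysis described, the snippet below fits an ARIMA(1,0,0) model with a task indicator as an explanatory (exogenous) variable using statsmodels, next to a naive t-test that ignores autocorrelation. The simulated exposure series, the AR parameter, and the size of the task effect are illustrative assumptions, not the authors' data.

```python
# ARIMA with an exogenous task indicator vs. a naive t-test on simulated
# real-time exposure data (all parameter values are illustrative assumptions).
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(1)
n = 1800                                    # e.g. 30 minutes of 1-second readings
task = np.zeros(n)
task[600:1200] = 1.0                        # hypothetical task window

# Simulated log particle-number concentration: AR(1) noise plus a task effect
phi, sigma, task_effect = 0.9, 0.10, 0.15
eps = np.empty(n)
eps[0] = rng.normal(scale=sigma / np.sqrt(1 - phi ** 2))
for t in range(1, n):
    eps[t] = phi * eps[t - 1] + rng.normal(scale=sigma)
log_conc = pd.Series(9.0 + task_effect * task + eps, name="log_conc")
exog = pd.DataFrame({"task": task})

# Naive comparison of means: ignores autocorrelation and overstates significance
naive = stats.ttest_ind(log_conc[task == 1], log_conc[task == 0])

# ARIMA(1,0,0) with the task indicator as explanatory variable
fit = SARIMAX(log_conc, exog=exog, order=(1, 0, 0), trend="c").fit(disp=False)

print("naive t-test p-value:      ", naive.pvalue)
print("ARIMA task-effect estimate:", fit.params["task"])
print("ARIMA task-effect p-value: ", fit.pvalues["task"])
```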
  • Choice of Number of Whole Plots and Number of Runs in the Design of Split-Plot Experiments

    Authors: Jacqueline Asscher (Kinneret College)
    Primary area of focus / application: Design and analysis of experiments
    Secondary area of focus / application: Consulting
    Keywords: Split-plot, DOE, Consulting, Components of variance
    Submitted at 30-May-2018 21:11 by Jacqueline Asscher
    Accepted
    4-Sep-2018 14:30 Choice of Number of Whole Plots and Number of Runs in the Design of Split-Plot Experiments
    A common example of a split-plot experiment is an experiment conducted to improve a process that runs in batches (“whole plots”), where some factors are changed between batches and other factors are changed within batches, resulting in two components of variance. The split-plot structure may be inherent in the process being investigated or in the experimental setup, or may be adopted by choice, either to save time or money or to improve the design.

    While split-plot designs are familiar to experts in Design of Experiments (DOE), they are typically not fully understood by practitioners or clients, even though such designs may be very attractive to them due to the lure of savings and convenience.

    A key decision in the design of a split-plot experiment is the choice of how many whole plots and how many runs to include. The issues that arise in making this decision are defined and discussed here.

    Before considering the statistical properties of the design, the following questions are addressed: How do we calculate how this choice affects savings in time and/or money? How do we present the calculations? How do we clarify and display the design options to the owner of the experiment?

    The variance of the effects depends on the number of whole plots and runs and on the size of the two components of variance, the variation between whole plots and the variation between runs within whole plots. The power of the tests to identify active effects depends on the size of the variance of the effects, the size of the effects, and on our ability to estimate the two components of variance. The latter is also determined by the number of whole plots and runs.

    Questions that arise here include: What happens if I assume that the between plot variation is relatively small, or relatively large? What happens if I don’t know? What happens when the estimation of certain effects is of critical importance?

    Other considerations in the choice of how many whole plots and how many runs to include are the proportion of factors that can only be changed between whole plots and the choice of model that is to be fitted.

    Finally, the relationship between all of the issues involved is discussed.
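Under standard balanced split-plot assumptions (w whole plots of m runs each, two-level factors, an effect expressed as the difference between the means at the two factor levels), the variance bookkeeping discussed in the abstract above can be sketched numerically. The run budget, variance components, and effect size below are illustrative only, and the power calculation uses a crude normal approximation that ignores the degrees of freedom available for estimating the variance components.

```python
# Effect variances and approximate power in a balanced split-plot design,
# for a fixed total run budget and varying numbers of whole plots.
# All numerical settings are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def effect_variances(w, m, var_wp, var_e):
    """Variances of two-level factor effects (difference of means) in a balanced split plot."""
    var_whole = 4 * (var_wp + var_e / m) / w   # whole-plot factor: whole-plot error does not cancel
    var_sub = 4 * var_e / (w * m)              # subplot factor: whole-plot error cancels within plots
    return var_whole, var_sub

def approx_power(effect, var_effect, alpha=0.05):
    """Rough normal-approximation power for detecting an effect of the given size."""
    se = np.sqrt(var_effect)
    return norm.cdf(abs(effect) / se - norm.ppf(1 - alpha / 2))

N = 32                                         # total number of runs, held fixed
for w in (4, 8, 16):                           # candidate numbers of whole plots
    m = N // w
    v_wp, v_sub = effect_variances(w, m, var_wp=1.0, var_e=1.0)
    print(f"w={w:2d}, m={m:2d}: var(whole-plot effect)={v_wp:.2f}, "
          f"var(subplot effect)={v_sub:.2f}, "
          f"approx power (whole-plot effect = 1): {approx_power(1.0, v_wp):.2f}")
```

The table makes the trade-off concrete: for a fixed run budget, adding whole plots mainly buys precision (and power) for the whole-plot factors, while the subplot effects are already estimated precisely.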
  • Autoencoding any Data through Kernel Autoencoders

    Authors: Pierre Laforgue (Télécom ParisTech)
    Primary area of focus / application: Modelling
    Keywords: Kernel methods, Autoencoders, Operator-valued kernels, Representation learning
    Submitted at 31-May-2018 11:22 by Kevin Elgui
    Accepted (view paper)
    5-Sep-2018 09:00 Autoencoding any Data through Kernel Autoencoders
    This work investigates a novel algorithmic approach to data representation based on kernel methods. Assuming the observations lie in a Hilbert space X, this work introduces a new formulation of Representation Learning, stated as a regularized empirical risk minimization problem over a class of composite functions. These functions are obtained by composing elementary mappings from vector-valued Reproducing Kernel Hilbert Spaces (vv-RKHSs), and the risk is measured by the expected distortion rate in the input space X. The proposed algorithms crucially rely on the form taken by the minimizers, revealed by a dedicated Representer Theorem. Beyond a first extension of the autoencoding scheme to possibly infinite-dimensional Hilbert spaces, an important application of the introduced Kernel Autoencoders (KAEs) arises when X is itself assumed to be an RKHS: this makes it possible to extract finite-dimensional representations from any kind of data. Numerical experiments on simulated data as well as real labeled graphs (molecules) provide empirical evidence of the performance attained by KAEs.
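As a toy companion to the formulation above, the sketch below implements a two-layer kernel autoencoder in the representer-theorem form (coefficients attached to the training points), using decomposable operator-valued kernels (a Gaussian scalar kernel times the identity) and generic numerical optimization. The kernels, hyperparameters, latent dimension, and data are illustrative assumptions; this is not the authors' implementation.

```python
# Minimal two-layer kernel autoencoder (KAE) sketch with decomposable Gaussian
# operator-valued kernels; all settings below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, p, d = 30, 3, 2               # sample size, input dimension, latent dimension
X = rng.normal(size=(n, p))      # toy data standing in for Hilbert-space inputs

def gauss_gram(A, B, gamma):
    """Gaussian Gram matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

gamma_x, gamma_z, lam = 0.5, 0.5, 1e-3
K_x = gauss_gram(X, X, gamma_x)              # input-layer Gram matrix (fixed)

def unpack(theta):
    A = theta[: n * d].reshape(n, d)         # encoder coefficients (representer form)
    B = theta[n * d:].reshape(n, p)          # decoder coefficients (representer form)
    return A, B

def objective(theta):
    A, B = unpack(theta)
    Z = K_x @ A                              # latent codes f(x_i)
    K_z = gauss_gram(Z, Z, gamma_z)          # latent-layer Gram matrix
    X_hat = K_z @ B                          # reconstructions g(f(x_i))
    distortion = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    penalty = lam * (np.trace(A.T @ K_x @ A) + np.trace(B.T @ K_z @ B))
    return distortion + penalty

theta0 = 0.01 * rng.normal(size=n * d + n * p)
res = minimize(objective, theta0, method="L-BFGS-B")   # numeric gradients: fine for a toy
A_opt, _ = unpack(res.x)
Z_opt = K_x @ A_opt                          # learned finite-dimensional representation
print("final objective:", res.fun)
print("latent codes shape:", Z_opt.shape)
```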