ENBIS-18 in Nancy

2 – 25 September 2018; Ecoles des Mines, Nancy (France)
Abstract submission: 20 December 2017 – 4 June 2018


The following abstracts have been accepted for this event:

  • O2S2 - Object Oriented Spatial Statistics: a Review with Examples

    Authors: Piercesare Secchi (Politecnico di Milano)
    Primary area of focus / application: Other:
    Keywords: Object oriented data analysis, Kriging for object data, Random domain decomposition
    Submitted at 11-Jun-2018 12:55 by Piercesare Secchi
    5-Sep-2018 12:00 Closing Keynote: Piercesare Secchi on: "O2S2 - Object Oriented Spatial Statistics: A Review with Examples"
    I will review recent advances in Object Oriented Spatial Statistics, a system of ideas, algorithms and methods that allows the analysis of high dimensional and complex data when their spatial dependence is an important issue. At the intersection of different disciplines - including mathematics, statistics, computer science and engineering - Object Oriented Spatial Statistics provides the right perspective to address key problems in varied contexts, from Earth and life sciences to urban planning. Through examples, I will illustrate a few paradigmatic methods applied to problems of prediction, classification and smoothing, giving emphasis to industrial applications of the key ideas Object Oriented Spatial Statistics relies upon.
  • Identification of Outliers and Influential Observations: An Application

    Authors: Rodrigo S. Von Doellinger (Division of Methods and Quality, Brazilian Institute of Geography and Statistics), Maysa S. De Magalhães (National School of Statistical Sciences, Brazilian Institute of Geography and Statistics), Pedro N. Silva (National School of Statistical Sciences, Brazilian Institute of Geography and Statistics)
    Primary area of focus / application: Other: South American Session
    Keywords: Selective editing, Outliers, Influential observations, Score function
    Submitted at 19-Jun-2018 14:44 by Maysa de Magalhães
    4-Sep-2018 14:30 Identification of Outliers and Influential Observations: An Application
    To provide good-quality statistical information, it may not be necessary to identify every error present in the data. It is sufficient to detect influential observations, that is, those whose inclusion in or exclusion from the analysis significantly affects the estimate of the parameter of interest.

    The approach generally used to identify influential observations is called selective editing (Latouche and Berthelot, 1992; Lawrence and McKenzie, 2000). In selective editing methods, potentially influential observations are ranked by the values of a score function, which expresses the impact of the error on the estimate of the parameter of interest. Observations with scores above a pre-set threshold are considered critical and should be reviewed. Defining the score function involves determining the probability that an observation is in error (the risk component) as well as the magnitude of the error (the influence component). Risk and influence components are used by the score functions presented in the literature (Jäder and Norberg, 2005). According to Di Zio et al. (2008), the methods commonly employed to obtain the risk and influence components are based on comparing the observed values of a given variable with the values predicted by a particular model. The differences between observed and predicted values are used to calculate scores that identify the observations with the greatest impact on the estimate of the parameter of interest.

    Di Zio et al. (2008) proposed a multivariate model to estimate both the probability of error and the error magnitude. The method is based on contaminated normal models (Little, 2008). The observed data are described by a mixture of two multivariate normal distributions that represent the erroneous (contaminated) data and the error-free data. It is assumed that the distribution of the contaminated data can be obtained from the distribution of the error-free data by inflating the variance (Ghosh-Dastidar and Schafer, 2006).
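
    As a minimal sketch of the risk component under such a contamination model, the snippet below computes the posterior probability that a bivariate observation comes from the variance-inflated component. It assumes the mean, covariance, contamination proportion and inflation factor are already known; in practice they would have to be estimated from the data. The function names and the parameters pi_err and alpha are hypothetical and are not taken from the cited papers.

```python
import math


def bivariate_normal_logpdf(y, mean, cov):
    """Log-density of a bivariate normal; cov is ((s11, s12), (s12, s22))."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = y[0] - mean[0], y[1] - mean[1]
    # Mahalanobis distance from the closed-form inverse of a 2x2 matrix.
    maha = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * maha


def posterior_error_prob(y, mean, cov, pi_err, alpha):
    """Posterior probability that y belongs to the contaminated component.

    Assumed model (illustrative): error-free data ~ N(mean, cov) with weight
    1 - pi_err, contaminated data ~ N(mean, alpha * cov) with weight pi_err,
    where alpha > 1 inflates the variance.
    """
    inflated = tuple(tuple(alpha * s for s in row) for row in cov)
    log_clean = math.log1p(-pi_err) + bivariate_normal_logpdf(y, mean, cov)
    log_contam = math.log(pi_err) + bivariate_normal_logpdf(y, mean, inflated)
    # Equivalent to contam / (clean + contam), computed on the log scale.
    return 1.0 / (1.0 + math.exp(log_clean - log_contam))


if __name__ == "__main__":
    mean = (0.0, 0.0)
    cov = ((1.0, 0.3), (0.3, 1.0))
    for point in [(0.1, -0.2), (2.5, 2.0), (6.0, -5.0)]:
        p = posterior_error_prob(point, mean, cov, pi_err=0.1, alpha=10.0)
        print(point, round(p, 4))
```

    Observations far from the bulk of the data receive a posterior probability close to one, and a score function could combine this probability with an estimate of the error magnitude.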

    In this paper, the selective editing method proposed by Di Zio et al. (2008) was applied to identify outliers and influential observations in the Household Budget Survey (HBS 2008/2009) of the Brazilian Institute of Geography and Statistics (IBGE), using two variables: monthly household income and annual household expenditure.

    Di Zio M., Guarnera U., Luzi O. (2008). Contamination models for the detection of outliers and influential errors in continuous multivariate data. UNECE, Conference of European Statisticians, Work Session on Statistical Editing, Vienna.
    Ghosh-Dastidar B., Schafer J.L. (2006). Outlier Detection and Editing Procedures for Continuous Multivariate Data. Journal of Official Statistics, 22 (3), 487–506.
    Jäder A., Norberg A. (2005). A Selective Editing Method considering both suspicion and potential impact, developed and applied to the Swedish Foreign Trade Statistics, UN/ECE Work Session on Statistical Data Editing, Ottawa. (http://www.unece.org/stats/documents/2005.05.sde.htm).
    Latouche M., Berthelot J.M. (1992). Use of a score function to prioritize and limit recontacts in editing business surveys. Journal of Official Statistics, 8 (3), 389–400.
    Lawrence D., McKenzie R. (2000). The General Application of Significance Editing. Journal of Official Statistics, 16 (3), 243–253.
  • Effect of Neglecting Autocorrelation in Regression CUSUM Charts to Monitor Counts Time Series

    Authors: Orlando Yesid Esparza Albarracín (University of São Paulo), Airlane Pereira Alencar (University of São Paulo), Linda Lee Ho (University of São Paulo)
    Primary area of focus / application: Other: South American Session
    Keywords: Health surveillance, Average run length, Autocorrelation, Time series, Control charts
    Submitted at 25-Jun-2018 23:07 by Linda Ho
    4-Sep-2018 14:50 Effect of Neglecting Autocorrelation in Regression CUSUM Charts to Monitor Counts Time Series
    In practice, count time series are usually monitored with Poisson or negative binomial regression models (the latter to handle overdispersion). The main concern is that the usual generalized linear models (GLM) assume independence, whereas time series data are generally autocorrelated. One possibility is to fit a generalized autoregressive moving average (GARMA) model, which models counts under the negative binomial distribution with time-varying means and includes lagged terms to account for the autocorrelation.

    The main contribution of our research is to measure the impact, in terms of the average run length (ARL), on the performance of CUSUM charts with different statistics when the serial correlation is neglected in the regression model. This is done by simulating correlated processes using GARMA, fitting independent GLM models and building the corresponding CUSUM charts. High autocorrelation leads to an increase in false alarms. This analysis may help practitioners implement control charts that take the serial correlation into account, at no extra cost for fitting an appropriate model.
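
    The qualitative effect can be illustrated with a much simpler setting than the one studied in the talk. The sketch below replaces the GARMA negative binomial counts with a unit-variance Gaussian AR(1) series, purely for brevity, and estimates by simulation the in-control ARL of an upper CUSUM designed as if the data were independent. The reference value k, the limit h and the function names are illustrative assumptions.

```python
import random


def cusum_run_length(phi, k=0.5, h=4.0, max_steps=100_000, rng=random):
    """Run length of an upper CUSUM on an in-control Gaussian AR(1) series."""
    # Innovation s.d. chosen so the marginal variance equals 1 for any phi.
    sigma_e = (1.0 - phi * phi) ** 0.5
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    c = 0.0
    for t in range(1, max_steps + 1):
        c = max(0.0, c + x - k)
        if c > h:
            return t  # false alarm, since the process is in control
        x = phi * x + rng.gauss(0.0, sigma_e)
    return max_steps  # censored run


def estimate_arl(phi, n_runs=2000, seed=1):
    """Monte Carlo estimate of the in-control ARL for autocorrelation phi."""
    rng = random.Random(seed)
    total = sum(cusum_run_length(phi, rng=rng) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    for phi in (0.0, 0.3, 0.6):
        print(f"phi={phi:.1f}  estimated in-control ARL={estimate_arl(phi):.1f}")
```

    With positive phi, the estimated in-control ARL falls below the value obtained for independent data, which means more frequent false alarms when the chart ignores the serial correlation.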
  • RUL Prediction and Maintenance Policy for Wind Turbine Component

    Authors: Jinrui Ma (University of Technology of Troyes), Mitra Fouladirad (University of Technology of Troyes), Antoine Grall (University of Technology of Troyes)
    Primary area of focus / application: Other: Reliability
    Keywords: Wind speed model, Wind turbine, Deterioration modelling, RUL prediction, Maintenance
    Submitted at 26-Jun-2018 19:35 by Jinrui MA
    Accepted
    4-Sep-2018 09:40 RUL Prediction and Maintenance Policy for Wind Turbine Component
    Due to the inherent characteristics of wind, wind turbine operation is nonlinear and random. A wind turbine works over a wide range of wind speeds and is subject to continuously changing stochastic loads. Hence the deterioration of a wind turbine component can be considered a stochastic process. Maintenance operations are planned considering the influence of wind speed on component deterioration, the location of the wind farm and its accessibility. The aim is to propose a predictive maintenance policy. Prediction of the Remaining Useful Lifetime (RUL) of a wind turbine component is the key to this problem.

    In this paper, a flexible wind speed model is proposed to generate long-term continuous wind speed data used to simulate the real wind speed at a wind farm. A deterioration model that accounts for the influence of wind speed is proposed. A RUL prediction model that takes into account the stochastic deterioration and the influence of future wind is studied. A predictive maintenance policy is proposed.
  • Generalized Regression – A Unified Framework for Linear Models

    Authors: Chris Gotwalt (Director of Statistical Research and Development, SAS Institute - JMP Division)
    Primary area of focus / application: Other: Software
    Secondary area of focus / application: Modelling
    Keywords: Generalized regression, Demonstration, User interface, Modelling
    Submitted at 27-Jun-2018 13:05 by Chris Gotwalt
    3-Sep-2018 11:00 Generalized Regression – A Unified Framework for Linear Models
    Linear and generalized linear models are the central modeling tools of the applied statistician. Although there is a lot in common between, say, logistic regression, the analysis of designed experiments, and the application of modern variable selection methods like the Lasso, learning how to use these methods is made more complicated than it should be.

    This is because, even within the same software package, the terminology and user interface for these methods are often quite different. The Generalized Regression platform in JMP Pro, unlike previous linear modeling tools, provides a common framework for a vast array of models. It has been designed so that wherever the same concepts apply in different classes of models, they have the same names and the output matches closely. Furthermore, with its Interactive Solution Path, one is able to instantly explore and evaluate the tradeoffs of different models.

    In this session, we will illustrate the use of Generalized Regression on several types of models. We will give special attention to how it can be used to demonstrate visually to students the consequences of underfitting a model, the poor generalization performance of overfitting, as well as the immediate practical consequences of mishandling multicollinear data. We believe that this user interface can streamline the teaching of statistical modeling in a way that makes the concepts much clearer to students.
  • On Some Conservative Bounds for the Barrier Crossing Probability

    Authors: Igor Nikiforov (Institut Charles Delaunay, ROSAS, Université de Technologie de Troyes)
    Primary area of focus / application: Other: Sequential detection
    Keywords: Barrier crossing probability, Distribution overbounding, Autoregressive process, Numerical integration
    Submitted at 28-Jun-2018 17:18 by Igor Nikiforov
    Accepted
    4-Sep-2018 10:50 On Some Conservative Bounds for the Barrier Crossing Probability
    For some safety-critical applications, it is important to calculate the probability that a discrete-time scalar (or vector) autoregressive (AR) process leaves an open interval (or an open ball) during a certain period of time. Let us consider the following two scenarios where conservative bounds for the barrier crossing probability are useful.

    The first possible scenario is related to control charts: the Shewhart chart and the Geometric Moving Average (GMA) chart, which are used for detecting abrupt changes. Traditionally, the Average Run Length (ARL) to a false alarm and the worst-case mean detection delay are used as statistical performance measures of the Shewhart and GMA charts. The disadvantage of the ARL criterion for safety-critical applications lies in the existence of the right "tail" of the detection delay distribution. For safety-critical applications, it is more convenient to use the probability of missed detection and the probability of false alarm defined with respect to some periods. These probabilities reduce to the barrier crossing probability for the AR process.

    The second scenario is related to the barrier crossing probability for a certain risk indicator, which is the AR process generated by some estimation errors. The safety of the system is compromised if the probability that this risk indicator leaves a given confidence zone at least once during a certain period becomes too large. Sometimes, we are also interested in calculating the instantaneous risk probability.

    In practice, the main difficulty for the above-mentioned scenarios is that the Cumulative Distribution Functions (CDFs) (with infinite support) of the innovation noise in the above-mentioned AR model and its initial state are unknown and only their upper and lower bounds are available.

    Numerical methods to compute the conservative bounds for the above-mentioned barrier crossing probability are considered in the presentation.
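
    To make the quantity concrete, the sketch below estimates by plain Monte Carlo the probability that a scalar AR(1) process leaves the open interval (-h, h) at least once within a given horizon. It does not compute the conservative bounds discussed in the talk: it assumes fully known Gaussian innovations and a known initial state, whereas the talk addresses the case where only bounds on the CDFs are available. The function name and all parameter values are illustrative assumptions.

```python
import random


def crossing_probability(phi, h, horizon, sigma=1.0, x0=0.0,
                         n_runs=20_000, seed=0):
    """Monte Carlo estimate of P(|X_t| >= h for some t in 1..horizon).

    Assumed model (illustrative): X_t = phi * X_{t-1} + e_t with
    e_t ~ N(0, sigma^2) independent and X_0 = x0.
    """
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_runs):
        x = x0
        for _ in range(horizon):
            x = phi * x + rng.gauss(0.0, sigma)
            if abs(x) >= h:
                crossings += 1
                break  # the barrier was crossed; stop this path
    return crossings / n_runs


if __name__ == "__main__":
    for h in (2.0, 3.0, 4.0):
        p = crossing_probability(phi=0.5, h=h, horizon=50)
        print(f"h={h:.1f}  estimated crossing probability={p:.4f}")
```

    For very small probabilities, as required in safety-critical settings, plain simulation of this kind becomes inefficient, which is one reason numerical methods are of interest.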