ENBIS: European Network for Business and Industrial Statistics
ENBIS-18 in Nancy
2 – 25 September 2018; Ecoles des Mines, Nancy (France)
Abstract submission: 20 December 2017 – 4 June 2018
The following abstracts have been accepted for this event:
-
O2S2 - Object Oriented Spatial Statistics: a Review with Examples
Authors: Piercesare Secchi (Politecnico di Milano)
Primary area of focus / application: Other:
Keywords: Object oriented data analysis, Kriging for object data, Random domain decomposition
Submitted at 11-Jun-2018 12:55 by Piercesare Secchi
Accepted
-
Identification of Outliers and Influential Observations: An Application
Authors: Rodrigo S. Von Doellinger (Division of Methods and Quality, Brazilian Institute of Geography and Statistics), Maysa S. De Magalhães (National School of Statistical Sciences, Brazilian Institute of Geography and Statistics), Pedro N. Silva (National School of Statistical Sciences, Brazilian Institute of Geography and Statistics)
Primary area of focus / application: Other: South American Session
Keywords: Selective editing, Outliers, Influential observations, Score function
Submitted at 19-Jun-2018 14:44 by Maysa de Magalhães
Accepted
The approach generally used to identify influential observations is called selective editing (Latouche and Berthelot, 1992; Lawrence and McKenzie, 2000). In selective editing methods, potentially influential observations are ranked by the values of a score function, which expresses the impact of an error on the estimate of the parameter of interest. Observations with scores above a pre-set threshold are considered critical and should be reviewed. Defining the score function requires determining the probability that an observation is in error (the risk component) as well as the magnitude of the error (the influence component). Both components are used by score functions presented in the literature (Jäder and Norberg, 2005). According to Di Zio et al. (2008), the methods commonly employed to obtain the risk and influence components are based on comparing the observed values of a given variable with the values predicted by a particular model. The differences between observed and predicted values are used to calculate scores that identify the observations with the greatest impact on the estimate of the parameter of interest.
Di Zio et al. (2008) proposed a multivariate model to estimate both the probability of error and the error magnitude. The method is based on contaminated normal models (Little, 2008). The observed data are described by a mixture of two multivariate normal distributions, representing the erroneous (contaminated) data and the error-free data. It is assumed that the distribution of the contaminated data is obtained from the distribution of the error-free data by inflating the variance (Ghosh-Dastidar and Schafer, 2006).
In this paper, the selective editing method proposed by Di Zio et al. (2008) was applied to identify outliers and influential observations in the Household Budget Survey (HBS 2008/2009) of the Brazilian Institute of Geography and Statistics (IBGE), using two variables: monthly household income and annual household expenditure.
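As an illustration only, the following minimal sketch shows how a contaminated normal model can supply both components of a score: the posterior probability that an observation is erroneous (risk) and the expected size of the error (influence). The function names `log_mvn_pdf` and `selective_editing_scores`, the parameter values, and the toy data are hypothetical. The parameters are treated as known here, whereas in practice they would have to be estimated, and the additive-error form used for the expected error is an assumption rather than the exact formulation of Di Zio et al. (2008).

```python
import numpy as np


def log_mvn_pdf(y, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of y."""
    d = mean.shape[0]
    diff = y - mean
    solved = np.linalg.solve(cov, diff.T).T
    maha = np.sum(diff * solved, axis=1)  # squared Mahalanobis distances
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def selective_editing_scores(y, weights, mean, cov, pi_err, alpha):
    """Risk and score under a two-component contaminated normal model.

    Assumed model: error-free data ~ N(mean, cov); erroneous data
    ~ N(mean, alpha * cov) with alpha > 1; pi_err is the prior error rate.
    """
    log_clean = np.log1p(-pi_err) + log_mvn_pdf(y, mean, cov)
    log_contam = np.log(pi_err) + log_mvn_pdf(y, mean, alpha * cov)
    # Risk component: posterior probability that the row is erroneous.
    tau = 1.0 / (1.0 + np.exp(log_clean - log_contam))
    # Influence component: expected error, assuming an additive error
    # with covariance (alpha - 1) * cov on top of the true value.
    expected_error = tau[:, None] * (1.0 - 1.0 / alpha) * (y - mean)
    # Relative impact on the weighted total of each variable.
    corrected_total = np.abs(np.sum(weights[:, None] * (y - expected_error), axis=0))
    relative_impact = weights[:, None] * np.abs(expected_error) / corrected_total
    return tau, relative_impact.max(axis=1)  # one global score per row


# Toy data: two plausible rows and one clearly atypical row.
y = np.array([[10.0, 10.0], [10.1, 9.9], [16.0, 4.0]])
weights = np.ones(3)  # hypothetical sampling weights
mean = np.array([10.0, 10.0])
cov = 0.25 * np.eye(2)
tau, scores = selective_editing_scores(y, weights, mean, cov, pi_err=0.05, alpha=10.0)
ranking = np.argsort(-scores)  # rows to review first
```

Rows would then be sent for review in the order given by `ranking` until the score falls below a chosen threshold.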
References
Di Zio M., Guarnera U., Luzi O. (2008). Contamination models for the detection of outliers and influential errors in continuous multivariate data. UNECE, Conference of European Statisticians, Work Session on Statistical Editing, Vienna.
Ghosh-Dastidar B., Schafer J.L. (2006). Outlier Detection and Editing Procedures for Continuous Multivariate Data. Journal of Official Statistics, 22 (3), 487–506.
Jäder A., Norberg A. (2005). A Selective Editing Method considering both suspicion and potential impact, developed and applied to the Swedish Foreign Trade Statistics. UN/ECE Work Session on Statistical Data Editing, Ottawa. (http://www.unece.org/stats/documents/2005.05.sde.htm).
Latouche M., Berthelot J.M. (1992). Use of a score function to prioritize and limit recontacts in editing business surveys. Journal of Official Statistics, 8 (3), 389–400.
Lawrence D., McKenzie R. (2000). The General Application of Significance Editing. Journal of Official Statistics, 16 (3), 243–253.
-
Effect of Neglecting Autocorrelation in Regression CUSUM Charts to Monitor Counts Time Series
Authors: Orlando Yesid Esparza Albarracín (University of São Paulo), Airlane Pereira Alencar (University of São Paulo), Linda Lee Ho (University of São Paulo)
Primary area of focus / application: Other: South American Session
Keywords: Health surveillance, Average run length, Autocorrelation, Time series, Control charts
Submitted at 25-Jun-2018 23:07 by Linda Ho
Accepted
The main contribution of our research is to measure the impact, in terms of the average run length (ARL), on the performance of CUSUM charts with different statistics when serial correlation is neglected in the regression model. This is done by simulating correlated processes using GARMA models, fitting GLMs that assume independence, and building the corresponding CUSUM charts. High autocorrelation leads to an increase in false alarms. This analysis may help practitioners implement control charts that take serial correlation into account, without extra cost for fitting an appropriate model.
-
RUL Prediction and Maintenance Policy for Wind Turbine Component
Authors: Jinrui Ma (University of Technology of Troyes), Mitra Fouladirad (University of Technology of Troyes), Antoine Grall (University of Technology of Troyes)
Primary area of focus / application: Other: Reliability
Keywords: Wind speed model, Wind turbine, Deterioration modelling, RUL prediction, Maintenance
In this paper, a flexible wind speed model is proposed to generate long-term continuous wind speed data that simulate the real wind speed at a wind farm. A deterioration model that accounts for the influence of wind speed is proposed. A RUL prediction model that takes into account the stochastic deterioration and the influence of future wind is studied. A predictive maintenance policy is proposed.
-
Generalized Regression – A Unified Framework for Linear Models
Authors: Chris Gotwalt (Director of Statistical Research and Development, SAS Institute - JMP Division)
Primary area of focus / application: Other: Software
Secondary area of focus / application: Modelling
Keywords: Generalized regression, Demonstration, User interface, Modelling
Submitted at 27-Jun-2018 13:05 by Chris Gotwalt
Accepted
This is because, even within the same software package, the terminology and user interface for these methods are often quite different. The Generalized Regression platform in JMP Pro, unlike previous linear modeling tools, provides a common framework for a vast array of models. It has been designed so that wherever the same concepts apply in different classes of models, they have the same names and the output matches closely. Furthermore, with its Interactive Solution Path, one is able to instantly explore and evaluate the tradeoffs of different models.
In this session, we will illustrate the use of Generalized Regression on several types of models. We will give special attention to how it can be used to demonstrate visually to students the consequences of underfitting a model, the poor generalization performance of overfitting, as well as the immediate practical consequences of mishandling multicollinear data. We believe that this user interface can streamline how statistical modeling is taught and make the concepts much clearer to students.
-
On Some Conservative Bounds for the Barrier Crossing Probability
Authors: Igor Nikiforov (Institut Charles Delaunay, ROSAS, Université de Technologie de Troyes)
Primary area of focus / application: Other: Sequential detection
Keywords: Barrier crossing probability, Distribution overbounding, Autoregressive process, Numerical integration
The first possible scenario is related to control charts: the Shewhart chart and the Geometric Moving Average (GMA) chart, which are used for detecting abrupt changes. Traditionally, the Average Run Length (ARL) to a false alarm and the worst-case mean detection delay are used as statistical performance measures of the Shewhart and GMA charts. The disadvantage of the ARL criterion for safety-critical applications lies in the existence of the right "tail" of the detection delay distribution. For safety-critical applications, it is more convenient to use the probability of missed detection and the probability of false alarm defined with respect to given periods. These probabilities reduce to the barrier crossing probability for the AR process.
The second scenario is related to the barrier crossing probability for a certain risk indicator, which is the AR process generated by some estimation errors. The safety of the system is compromised if the probability that this risk indicator leaves a given confidence zone at least once during a certain period becomes too large. Sometimes, we are also interested in the calculation of the instantaneous risk probability.
In practice, the main difficulty for the above-mentioned scenarios is that the Cumulative Distribution Functions (CDFs) (with infinite support) of the innovation noise in the above-mentioned AR model and its initial state are unknown and only their upper and lower bounds are available.
Numerical methods to compute the conservative bounds for the above-mentioned barrier crossing probability are considered in the presentation.
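To make the quantity concrete, the following minimal sketch estimates a two-sided barrier crossing probability for an AR(1) process by plain Monte Carlo simulation. It is not the conservative bounding method of the presentation: it assumes the innovation and initial-state distributions are fully known and Gaussian, which is exactly what the abstract says is unavailable in practice. The function name `barrier_crossing_probability` and all parameter values are hypothetical.

```python
import numpy as np


def barrier_crossing_probability(phi, sigma, barrier, horizon,
                                 n_paths=20000, seed=0):
    """Monte Carlo estimate of P(|x_t| > barrier for some t in 0..horizon).

    Assumed model: x_t = phi * x_{t-1} + e_t, with e_t ~ N(0, sigma^2),
    |phi| < 1, and x_0 drawn from the stationary distribution.
    """
    rng = np.random.default_rng(seed)
    stationary_sd = sigma / np.sqrt(1.0 - phi ** 2)
    x = rng.normal(0.0, stationary_sd, size=n_paths)
    crossed = np.abs(x) > barrier  # crossing already at time 0
    for _ in range(horizon):
        x = phi * x + rng.normal(0.0, sigma, size=n_paths)
        crossed |= np.abs(x) > barrier  # remember any earlier crossing
    return float(crossed.mean())


# Same seed, so both calls reuse the same simulated paths.
p_low = barrier_crossing_probability(phi=0.8, sigma=1.0, barrier=2.0, horizon=50)
p_high = barrier_crossing_probability(phi=0.8, sigma=1.0, barrier=4.0, horizon=50)
p_never = barrier_crossing_probability(phi=0.8, sigma=1.0, barrier=1e6, horizon=50)
```

Setting `horizon=0` gives the instantaneous probability at a single time point under the same assumptions. A simulation of this kind provides a point estimate only and offers no guarantee of conservativeness when the true distributions are merely bounded.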