ENBIS-16 in Sheffield

11–15 September 2016, Sheffield
Abstract submission: 20 March – 4 July 2016

My abstracts

The following abstracts have been accepted for this event:

  • From Text to Clouds and Meaning: Exploring Unstructured Data in JMP

    Authors: Volker Kraft (SAS Institute / JMP Division)
    Primary area of focus / application: Other: Software
    Secondary area of focus / application: Mining
    Keywords: JMP, Unstructured data, Text mining, Software
    Submitted at 24-Feb-2016 23:27 by Volker Kraft
    Accepted
    12-Sep-2016 14:30 From Text to Clouds and Meaning: Exploring Unstructured Data in JMP
    Not all data is structured, but unstructured data still carries exploitable information. Unstructured text may come, for example, from comment fields in surveys or from incident reports, and you want to explore it to better understand the information it contains. Text mining, based on transforming free text into numerical summaries, can pave the way for new findings.

    During a live demonstration of the new text mining feature in JMP, we will start with a multi-step text preparation using techniques such as stemming and tokenizing. This data curation is pivotal for the subsequent analysis phase, in which we explore clusters and semantics in the data. Finally, sharing text mining results with other JMP platforms takes familiar multivariate analysis and predictive modeling to the next level.
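    The same pipeline can be sketched outside JMP as well. Below is a minimal Python illustration using scikit-learn (an assumption; the talk demonstrates JMP's built-in platform, not this code). The corpus and parameter choices are hypothetical, and stemming is omitted to keep the example dependency-free.

        # Minimal text-mining sketch: tokenize, weight terms, reduce, cluster.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.cluster import KMeans

        comments = [  # hypothetical survey comment field
            "screen cracked after one week",
            "battery drains too fast",
            "great battery life, love it",
            "cracked screen on delivery",
        ]

        # Tokenizing and term weighting: free text becomes a numerical
        # document-term matrix (the "numerical summaries" of the abstract)
        X = TfidfVectorizer(lowercase=True, stop_words="english").fit_transform(comments)

        # Latent semantic analysis: SVD compresses the matrix to a few "topics"
        X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

        # Cluster the documents in the reduced space
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_lsa)
        for label, text in sorted(zip(labels, comments)):
            print(label, text)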
  • How to Evaluate Uncertainty of Categorical Measurements?

    Authors: Emil Bashkansky (ORT Braude College of Engineering), Tamar Gadrich (ORT Braude College of Engineering)
    Primary area of focus / application: Metrology & measurement systems analysis
    Secondary area of focus / application: Quality
    Keywords: Classification matrix, Categorical scale, Bayesian approach, Acceptance sampling
    Submitted at 25-Mar-2016 06:52 by Emil Bashkansky
    Accepted
    We show how to interpret sampled measurement results when they belong to a categorical scale. The proposed approach takes into account the sampled nature of observations and observation errors, and combines both with prior information (if it exists) about the studied population. The appropriate mathematical tools are presented, considering all these aspects and providing an adequate description of the partition of the studied property by categories and its parameters. We demonstrate that the most likely or expected estimators may differ significantly from those observed in the sample, and sometimes even conflict with the assumed confusion matrix. A technique for determining the conflict-free region is presented, as well as a two-stage procedure of assessment updating, based on verifying the accordance of the newly observed information with the information already available. The main propositions of the paper are supported by numerical examples and graphs.
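    As a rough numerical illustration of how the most likely partition can differ from the observed one, here is a short Python sketch using a simple moment-type inversion of an assumed confusion matrix; the numbers are hypothetical and the authors' actual procedure is more elaborate.

        import numpy as np

        # Hypothetical confusion matrix: C[i, j] = P(an item of true category i
        # is classified into category j); rows sum to 1.
        C = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.8]])

        # Observed category shares in the sample
        observed = np.array([0.50, 0.15, 0.35])

        # Moment-type estimate of the true partition: solve observed = C.T @ true
        true_est = np.linalg.solve(C.T, observed)
        print(true_est)  # ~[0.55, 0.01, 0.44]: far from the observed shares;
                         # a negative component would signal conflict with C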
  • Evaluating Latent Ability by Binary Testing

    Authors: Emil Bashkansky (ORT Braude College of Engineering), Vladimir Turetsky (ORT Braude College of Engineering)
    Primary area of focus / application: Metrology & measurement systems analysis
    Secondary area of focus / application: Quality
    Keywords: Binary testing, Latent ability, Test item difficulty, Item response model, Maximum likelihood estimation, Placebo, Replications
    Submitted at 25-Mar-2016 07:01 by Emil Bashkansky
    Accepted
    Binary tests designed to measure abilities of the objects under test (OUT) are widely used in many fields of measurement theory and practice. The number of test items in such tests is usually very limited, and the response to each test item provides only one bit of information per OUT. The problem of correct ability assessment is even more complicated when the difficulty levels of the test items are unknown beforehand. This makes the search for effective ways of planning and processing the results of such tests highly relevant. In recent years there has been some progress, driven both by the development of computational tools and by the emergence of new ideas. The latter are associated with the use of so-called "scale-invariant item response models". Together with the maximum likelihood estimation (MLE) approach, they have helped to solve some problems of engineering and proficiency testing. However, a number of issues related to the assessment of uncertainties, the scheduling of replications, the use of placebo, and the evaluation of multidimensional abilities still represent a challenge for researchers. The authors attempt to outline ways to solve the above problems.
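    To make the MLE step concrete, here is a minimal Python sketch of estimating a latent ability from binary responses under a Rasch-type (logistic) item response model; the difficulties and responses are hypothetical stand-ins, and the scale-invariant models referred to above generalize this setting.

        import numpy as np
        from scipy.optimize import minimize_scalar

        b = np.array([-1.0, 0.0, 0.5, 1.5])  # assumed item difficulties
        y = np.array([1, 1, 0, 1])           # one OUT's binary responses

        def neg_log_lik(theta):
            # Rasch-type model: P(success on item j) = 1 / (1 + exp(-(theta - b_j)))
            p = 1.0 / (1.0 + np.exp(-(theta - b)))
            return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

        # With so few items (one bit each) the likelihood is flat; bounding the
        # search also guards against divergence for all-0 or all-1 response patterns
        res = minimize_scalar(neg_log_lik, bounds=(-4.0, 4.0), method="bounded")
        print("MLE of latent ability:", res.x)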
  • Strategies for Sequential Experimental Design

    Authors: Rachel Silvestrini (Rochester Institute of Technology)
    Primary area of focus / application: Design and analysis of experiments
    Secondary area of focus / application: Other: Invited by Tim Robinson
    Keywords: Design of Experiments, Linear models, Optimal design, Military application
    Submitted at 31-Mar-2016 14:10 by Rachel Silvestrini
    Accepted
    12-Sep-2016 14:00 Strategies for Sequential Experimental Design
    Sequential experimental design is the process in which trials or runs are added to a pre-existing set of experiments. There are many considerations regarding the nature of sequential experimentation, such as how many runs should be added at a time and when the experiment should be deemed complete. In this presentation, two optimal experimental approaches are discussed: power-optimal and D-optimal. Power-optimal experiments maximize, for the next set of runs chosen, the statistical power of a parameter in a linear model; D-optimal experiments minimize the variance of the parameter estimates in a linear model. Military test and evaluation applications are used as the basis for examples and for application of the sequential methods discussed.
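    As a minimal illustration of the D-optimal variant, the Python sketch below augments an existing design with the candidate run that maximizes det(X'X) for a first-order linear model; the design, candidate grid, and model are hypothetical stand-ins.

        import numpy as np
        from itertools import product

        # Existing runs for the model y = b0 + b1*x1 + b2*x2 (columns: 1, x1, x2)
        X = np.array([[1.0, -1.0, -1.0],
                      [1.0,  1.0, -1.0],
                      [1.0, -1.0,  1.0]])

        # Candidate next runs on a 3-level grid
        candidates = list(product([-1.0, 0.0, 1.0], repeat=2))

        # D-optimal augmentation: add the run maximizing det(X'X), i.e. minimizing
        # the generalized variance of the parameter estimates
        def d_crit(run):
            Xa = np.vstack([X, [1.0, *run]])
            return np.linalg.det(Xa.T @ Xa)

        best = max(candidates, key=d_crit)
        print("next run:", best, "-> det(X'X):", d_crit(best))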
  • Disaggregated Electricity Forecasting using Wavelet-Based Clustering of Individual Consumers

    Authors: Jean-Michel Poggi (University of Paris Sud - Orsay), Jairo Cugliari (Univ. Lyon 2), Yannig Goude (EDF R&D, Paris Saclay)
    Primary area of focus / application: Mining
    Secondary area of focus / application: Modelling
    Keywords: Forecasting, Clustering, Functional data, Electricity consumption
    Submitted at 1-Apr-2016 06:59 by Jean-Michel Poggi
    Accepted
    13-Sep-2016 15:10 Disaggregated Electricity Forecasting using Wavelet-Based Clustering of Individual Consumers
    Electricity load forecasting is crucial for utilities, for production planning as well as for marketing offers. The increasing deployment of smart-grid infrastructure calls for more flexible, data-driven forecasting methods that adapt almost automatically to new data sets.

    We propose clustering tools useful for forecasting load consumption. The idea is to disaggregate the global signal in such a way that the sum of the disaggregated forecasts significantly improves the prediction of the whole global signal. The strategy has three steps: first we cluster curves to define super-consumers, then we build a hierarchy of partitions, and finally we select the best partition within this hierarchy with respect to a disaggregated forecast criterion.

    The proposed strategy is applied to a data set of individual consumers from the French electricity provider EDF. Disaggregation yields a substantial gain of 16% in forecast accuracy compared to the one-cluster approach, while preserving meaningful classes of consumers.
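    The skeleton of this strategy can be sketched in a few lines of Python. The example below uses PyWavelets and scikit-learn on simulated curves, with wavelet-energy features for the clustering step and a naive persistence forecast standing in for the real per-cluster forecasting model; everything here is a hypothetical stand-in for the paper's actual method and data.

        import numpy as np
        import pywt
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        hours = np.arange(128)
        # Simulated individual load curves: a daily cycle plus consumer-specific noise
        curves = (1.0 + rng.random((200, 1))) * np.sin(2 * np.pi * hours / 24) ** 2 \
                 + 0.1 * rng.standard_normal((200, 128))

        def wavelet_energies(x, wavelet="db4", level=4):
            # Energy of the detail coefficients at each scale as the feature vector
            coeffs = pywt.wavedec(x, wavelet, level=level)
            return np.array([np.sum(c ** 2) for c in coeffs[1:]])

        # Steps 1-2: cluster consumers into "super-consumers" on wavelet features
        features = np.array([wavelet_energies(c) for c in curves])
        labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)

        # Step 3: forecast each super-consumer separately and sum the forecasts
        # (persistence, i.e. the last observed value, stands in for the real model)
        cluster_curves = [curves[labels == k].sum(axis=0) for k in range(5)]
        forecast_total = sum(c[-1] for c in cluster_curves)
        print("disaggregated one-step forecast of total load:", forecast_total)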
  • The Effect of Misspecified Suspensions in Lifetime Prediction with Heavily Censored Data

    Authors: Nikolaus Haselgruber (CIS Consulting in Industrial Statistics GmbH)
    Primary area of focus / application: Reliability
    Secondary area of focus / application: Consulting
    Keywords: Lifetime, Suspended items, Reliability, Prediction
    Submitted at 4-Apr-2016 10:22 by Nikolaus Haselgruber
    Accepted
    13-Sep-2016 14:30 The Effect of Misspecified Suspensions in Lifetime Prediction with Heavily Censored Data
    To provide point and interval estimates for lifetime problems, a representative random sample with lifetime observations is required. In many applications, only part of the individual sample items have reached end of life at the point of analysis, while the rest are suspended items, i.e., they are still alive at that time. If the lifetime information of the sample is taken from a warranty database, complete information is usually available for the failed items, while for the suspended items only the start time is known. In cases where lifetime is not measured in absolute calendar time but, e.g., in operating time for industrial equipment or mileage for some vehicle components, the age of the suspended items at the point of analysis has to be estimated.
    This presentation shows the effect of misspecified suspensions on the result, based on simulated data, and discusses possibilities for avoiding this problem.
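    A small simulation in this spirit can be written in a few lines of Python. The sketch below fits a Weibull model to heavily censored data by maximum likelihood, first with the correct suspension ages and then with the suspension ages understated by half; the parameters and censoring scheme are hypothetical.

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(1)
        shape, scale = 2.0, 1000.0                # true Weibull parameters (hypothetical)
        life = scale * rng.weibull(shape, 500)    # true lifetimes, e.g. operating hours
        obs_end = 600.0                           # point of analysis: ~70% still alive
        t = np.minimum(life, obs_end)
        failed = life <= obs_end

        def neg_log_lik(params, t, failed):
            k, lam = np.exp(params)               # log-parametrization keeps both positive
            z = (t / lam) ** k
            # Failures contribute log f(t); suspensions contribute log S(t) = -z
            ll = np.where(failed, np.log(k / lam) + (k - 1) * np.log(t / lam) - z, -z)
            return -ll.sum()

        def fit(t, failed):
            res = minimize(neg_log_lik, x0=np.log([1.0, t.mean()]), args=(t, failed))
            return np.exp(res.x)  # (shape, scale) estimates

        print("correct suspension ages:     ", fit(t, failed))
        t_bad = np.where(failed, t, 0.5 * t)      # understate suspended ages by 50%
        print("misspecified suspension ages:", fit(t_bad, failed))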