ENBIS-8 in Athens

21 – 25 September 2008
Abstract submission: 14 March – 11 August 2008

The following abstracts have been accepted for this event:

  • Multivariate Expected Improvement Using Two-Sided Desirabilities

    Authors: Simone Wenzel and Joachim Kunert
    Affiliation: Technical University of Dortmund, Germany
    Primary area of focus / application:
    Submitted at 14-May-2008 06:45 by Simone Wenzel
    Accepted
    22-Sep-2008 14:40 Multivariate Expected Improvement Using Two-Sided Desirabilities
    Efficient global optimization has become a widely used technique in engineering applications, especially in the automotive and aerospace industries. In 1998, Schonlau, Welch and Jones introduced an efficient global optimization algorithm (EGO). Starting from a small initial design, a surrogate model is fitted and then updated sequentially to find the global optimum. Each updating point balances exploitation of the model information (which area is optimal) against the need for exploration where the uncertainty is high; the point chosen is the one with the largest expected improvement. Since 1998, many variations of the expected improvement criterion have been published, adapting the EGO algorithm to noisy or multi-objective problems. We consider a multivariate situation in which quality is measured by several variables. A common practice for such multi-objective problems is to transform the multivariate data into a univariate desirability index. However, the distribution of the desirability index, and hence the expected improvement, is not easy to calculate. Some results on the distribution exist for one-sided desirabilities, but not for the two-sided case. A full Monte Carlo simulation is not feasible, because the expected improvement is needed for every possible design point, which would require too many simulations and too much computing time. We therefore propose to obtain only a rough approximation by calculating the improvement for a small number of virtual observations at each design point. These observations are chosen using the mean and the variances of the surrogate model.
    The usability of the approach is demonstrated with a multi-objective optimization problem from mechanical engineering.
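
    The idea of evaluating the improvement at only a few virtual observations per design point can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the Derringer-Suich form of the two-sided desirability, the geometric-mean index, independent responses, and the three-point Gauss-Hermite nodes and weights used to place the virtual observations are all choices made here for the example.

```python
import itertools
import math

def two_sided_desirability(y, lower, target, upper, s=1.0, t=1.0):
    """Two-sided desirability in [0, 1] (Derringer-Suich form, assumed here)."""
    if y <= lower or y >= upper:
        return 0.0
    if y <= target:
        return ((y - lower) / (target - lower)) ** s
    return ((upper - y) / (upper - target)) ** t

def desirability_index(ys, specs):
    """Geometric mean of the individual desirabilities."""
    ds = [two_sided_desirability(y, *spec) for y, spec in zip(ys, specs)]
    return math.prod(ds) ** (1.0 / len(ds))

# Three-point Gauss-Hermite rule for a standard normal variable (assumed choice
# of virtual observations: mean and mean +/- sqrt(3) standard deviations).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def approx_expected_improvement(means, sds, specs, best_index):
    """Weighted average improvement over 3**k virtual observations.

    means, sds: surrogate predictions for the k responses at one design point.
    best_index: best desirability index observed so far.
    Responses are treated as independent, which is an assumption.
    """
    ei = 0.0
    for combo in itertools.product(range(3), repeat=len(means)):
        ys = [m + NODES[k] * s for m, s, k in zip(means, sds, combo)]
        weight = math.prod(WEIGHTS[k] for k in combo)
        ei += weight * max(desirability_index(ys, specs) - best_index, 0.0)
    return ei
```

    With k responses this costs 3**k index evaluations per design point, which stays cheap for a handful of quality variables but would need a sparser rule for many responses.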
  • Modeling Mortality Pattern using Support Vector Machines

    Authors: Anastasia Kostaki, Javier M. Moguerza, Alberto Olivares, Stelios Psarakis
    Affiliation: Dept. of Statistics, Athens University of Economics and Business & Dept. of Statistics and Operational Research, Rey Juan Carlos University (Spain)
    Primary area of focus / application:
    Submitted at 14-May-2008 10:20 by Alberto Olivares
    Accepted
    22-Sep-2008 11:55 Modeling Mortality Pattern using Support Vector Machines
    A topic of interest in process modeling, particularly in demographic and biostatistical analysis as well as in actuarial practice, is the graduation of the age-specific mortality pattern. A classical graduation technique used extensively by demographers and actuaries is to fit parametric models that accurately reproduce this pattern. Recently, particular emphasis has been given to graduation using nonparametric techniques such as kernel estimators. Support Vector Machines (SVM) are an alternative methodology that could be used for mortality graduation. This paper evaluates SVM techniques as tools for graduating mortality rates. To that end, we apply this methodology to empirical death rates from a variety of populations and time periods. For comparison, we also apply kernel techniques and fit the Heligman-Pollard model to the same empirical data sets.
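
    For readers unfamiliar with the parametric benchmark, the eight-parameter Heligman-Pollard curve can be written as a short function. This is a minimal sketch: the parameter values below are hypothetical, chosen only to produce a plausible shape, and are not estimates from any of the data sets in the paper.

```python
import math

def heligman_pollard_qx(age, A, B, C, D, E, F, G, H):
    """Heligman-Pollard death probability q_x at a given age.

    The model specifies the odds q_x / (1 - q_x) as the sum of a childhood
    term, an accident hump and a senescent term.
    """
    child = A ** ((age + B) ** C)
    # ln(age) is undefined at age 0; the hump term vanishes in that limit.
    hump = D * math.exp(-E * (math.log(age) - math.log(F)) ** 2) if age > 0 else 0.0
    senescent = G * H ** age
    odds = child + hump + senescent
    return odds / (1.0 + odds)

# Hypothetical parameter values, for illustration only.
PARAMS = dict(A=0.0005, B=0.01, C=0.1, D=0.001, E=10.0, F=20.0, G=0.00005, H=1.1)
curve = [heligman_pollard_qx(x, **PARAMS) for x in range(0, 101)]
```

    Fitting the eight parameters to empirical death rates would require a nonlinear least squares routine, which is outside the scope of this sketch.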
  • A comparative study of tests for scale equality with application in industrial statistics

    Authors: Marco Marozzi
    Affiliation: Università della Calabria
    Primary area of focus / application:
    Submitted at 14-May-2008 10:36 by Marco Marozzi
    Accepted
    23-Sep-2008 15:20 A comparative study of tests for scale equality with application in industrial statistics
    Situations where scale parameters are not nuisance factors to control but outcomes to explain arise often in quality control and industrial statistics. For example, a measurable characteristic of a raw material must have some specified average value, but its variability should also be small to keep the characteristics of the end product within specifications. It is therefore central to determine whether two samples of products differ significantly in their variability.
    Comparing variances or other measures of scale is much harder than comparing means or other measures of location. There are two reasons for this (Boos and Brownie, 2004). The first is that normal theory test statistics for detecting location shifts are standardized so as to be robust to non-normality via the central limit theorem, so the corresponding test procedures have approximately the correct level. This is not true for normal theory test statistics for detecting scale shifts, which are not asymptotically distribution free but depend on the kurtosis of the parent distributions. The second reason is that for mean comparisons the hypothesis that the populations may differ only in location is often appropriate, which allows the use of permutation methods that have the exact level for any distribution. For variance comparisons, by contrast, the hypothesis that the populations may differ only in scale rarely makes sense, since one usually wants to allow mean differences. It is therefore necessary to adjust for unknown means or locations by subtracting means or other location measures, but the transformed data are not exchangeable, so permutation tests provide only approximate solutions (Good, 2000).
    The literature on tests for the equality of variances is vast (Conover et al., 1981). A test that usually stands out in terms of power and robustness against non-normality is the W50 Brown-Forsythe (1974) modification of the Levene (1960) test, in which the sample median replaces the sample mean as the estimate of the location parameter. In this paper we focus on the two-sample scale problem, and in particular on Levene type tests. We consider ten Levene type tests: the W50 test along with its bootstrap and permutation versions, the M50 (Pan, 1999) test, the L50 (Pan, 1999) test along with its bootstrap and permutation versions, and the R (O’Brien, 1979) test along with its bootstrap and permutation versions. We also consider the F test, the modified Fligner-Killeen (1976) FK test and the two approaches of Shoemaker (1995 and 1999). The type I error rate and power of the tests are investigated. We discuss the application of the tests to real data sets in the context of quality control and industrial statistics.

    Boos, D. D. and Brownie, C. (2004) Comparing variances and other measures of dispersion, Statistical Science, 19, 4, 571-578.
    Brown, M. B. and Forsythe, A. B. (1974) Robust tests for the equality of variances, Journal of the American Statistical Association, 69, 364-367.
    Conover, W. J., Johnson, M. E. and Johnson, M. M. (1981) A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data, Technometrics, 23, 351-361.
    Fligner, M. A. and Killeen, T. J. (1976) Distribution-free two-sample tests for scale, Journal of the American Statistical Association, 71, 210-213.
    Good, P. (2000) Permutation tests: a practical guide to resampling methods for testing hypotheses, 2nd ed., Springer-Verlag, New York.
    Levene, H. (1960) Robust tests for equality of variances. In Contributions to Probability and Statistics (I. Olkin, ed.) 278-292, Stanford Univ. Press, Stanford.
    O’Brien, R. G. (1979) A general ANOVA method for robust tests of additive models for variances, Journal of the American Statistical Association, 74, 877-880.
    Pan, G. (1999) On a Levene type test for equality of two variances, Journal of Statistical Computation and Simulation, 63, 59-71.
    Shoemaker, L. H. (1995) Tests for differences in dispersion based on quantiles, The American Statistician, 49, 2, 179-182.
    Shoemaker, L. H. (1999) Interquantile tests for dispersion in skewed distributions, Communications in Statistics – Simulation and Computation, 28, 189-205.
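
    As an illustration of the Levene type construction discussed in this abstract, the two-sample W50 statistic is a one-way ANOVA F statistic computed on absolute deviations from each sample median. The sketch below also includes a simple permutation version that shuffles those deviations; the function names, the number of permutations and the choice to permute the deviations themselves are assumptions made for this example and may differ from the variants compared in the paper.

```python
import random
import statistics

def abs_dev_from_median(sample):
    """Absolute deviations of a sample from its own median."""
    med = statistics.median(sample)
    return [abs(v - med) for v in sample]

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    within = sum(sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups)
    if within == 0:
        return float("inf")  # degenerate case: no within-group spread
    return (between / (len(groups) - 1)) / (within / (n_total - len(groups)))

def w50_statistic(x, y):
    """Brown-Forsythe W50 statistic for two samples."""
    return anova_f([abs_dev_from_median(x), abs_dev_from_median(y)])

def w50_permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Approximate permutation p-value obtained by shuffling the deviations."""
    rng = random.Random(seed)
    dx, dy = abs_dev_from_median(x), abs_dev_from_median(y)
    observed = anova_f([dx, dy])
    pooled = dx + dy
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if anova_f([pooled[:len(dx)], pooled[len(dx):]]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

    Because the median-centred deviations are not exchangeable, the permutation p-value is only approximate, which is the point made with reference to Good (2000).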
  • On-line Monitoring and Classification of Paper Formation using Image Analysis

    Authors: Marco S. Reis, Armin Bauer
    Affiliation: University of Coimbra, Voith Paper Automation
    Primary area of focus / application:
    Submitted at 14-May-2008 14:09 by Marco P. Seabra dos Reis
    Accepted
    23-Sep-2008 16:10 On-line Monitoring and Classification of Paper Formation using Image Analysis
    Paper formation (the distribution and intermixing of fibres in a paper sheet) plays a central role in paper products and is usually evaluated off-line, with a significant delay relative to the high production rates achieved in modern paper machines.

    In this work, we present an approach for evaluating and monitoring paper formation using images acquired in-line, in-situ and in real time with a specially designed sensor. The methodology essentially consists of applying wavelet texture analysis (WTA) to the raw images (Bharati et al., 2004) in order to compute a wavelet signature vector for each image, on the basis of which images of different formation quality levels can be discriminated. A principal component analysis (PCA; Jackson, 1991) of these features confirms the differences in formation quality levels defined a priori from visual inspection and, furthermore, suggests a new subclass for abnormal samples, related to the bulkiness of fibre flocks.

    A PCA-MSPC monitoring approach is also proposed, providing good preliminary results when applied to the available images, as analyzed with the ROC curve for the method and confirmed with a Monte Carlo study using subimages with 1/4 of the size of the original ones.

    References
    Bharati, M. H., Liu, J. J. & MacGregor, J. F. (2004). Image Texture Analysis: Methods and Comparisons. Chemometrics and Intelligent Laboratory Systems, 72, 57-71.

    Jackson, J. E. (1991). A User's Guide to Principal Components. New York: Wiley.
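
    A wavelet signature vector of the kind described above can be sketched in a few lines. The abstract does not state which wavelet, how many decomposition levels or which features were used, so the Haar wavelet, the two levels and the mean-energy features below are assumptions made for illustration only.

```python
def haar_step(image):
    """One level of the orthonormal 2D Haar transform.

    image: list of rows with even side lengths (assumed).
    Returns the approximation band and three detail bands.
    """
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        rows = ([], [], [], [])
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 2.0)  # local average
            rows[1].append((a - b + d - e) / 2.0)  # difference across columns
            rows[2].append((a + b - d - e) / 2.0)  # difference across rows
            rows[3].append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(rows[0])
        col_diff.append(rows[1])
        row_diff.append(rows[2])
        diag.append(rows[3])
    return approx, (col_diff, row_diff, diag)

def mean_energy(band):
    """Mean squared coefficient of one sub-band."""
    values = [v * v for row in band for v in row]
    return sum(values) / len(values)

def wavelet_signature(image, levels=2):
    """Concatenate detail-band energies over the requested levels."""
    signature = []
    current = image
    for _ in range(levels):
        current, details = haar_step(current)
        signature.extend(mean_energy(band) for band in details)
    return signature
```

    Stacking one such signature per image gives the feature matrix on which a PCA could then be run to compare formation quality classes.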
  • Multivariate Class Prediction with Gene Expression Data

    Authors: Marco S. Reis
    Affiliation: University of Coimbra
    Primary area of focus / application:
    Submitted at 14-May-2008 14:12 by Marco P. Seabra dos Reis
    Accepted
    23-Sep-2008 15:20 Multivariate Class Prediction with Gene Expression Data
    Gene expression profiling has been widely used to perform genome-wide studies with several purposes, such as studying the molecular mechanisms of diseases and cell biology, finding biomarkers for certain organism malfunctions, and classifying certain traits on the basis of gene expression patterns and discovering new ones.

    Gene expression data are acquired through DNA microarray technology, where genomic DNA sequences from genes immobilized on a solid matrix (probes) are hybridized with labelled mRNA representative of different cell states (targets). The magnitude of the signal intensity at each probe location is then interpreted as a measure of the expression level of that particular gene in the state corresponding to the label being analyzed.

    Microarray data have been widely analyzed through univariate techniques. This class of techniques tries to identify the genes that best differentiate between the states under analysis (usually two) through F and t statistics, or through other univariate methodologies such as the “signal to noise ratio” (Golub et al., 1999) and the SAM (“Significance Analysis of Microarrays”; Tusher et al., 2001) methods. The simplicity underlying these methodologies enables them to adequately control classification error rates, such as the False Positive Rate (FPR), the Family-Wise Error Rate (FWER) and the False Discovery Rate (FDR). However, they tend to disregard the cooperative behaviour of gene expression, i.e., the combined activity of genes under certain cell conditions. This turns out to be a significant drawback of the univariate methodologies, as it is well known that gene activity is rarely the isolated result of the action of a single gene, but rather the consequence of a cascade of events in which several gene clusters participate.

    In this context, multivariate approaches offer more flexibility for describing gene co-expression patterns, but also present some methodological limitations. For instance, Fisher Discriminant Analysis (FDA) requires the number of variables (genes in microarray data) to be less than the number of observations, a condition not met in practice. Such multivariate techniques therefore require a preliminary variable selection stage, usually based on univariate approaches, in which data dimensionality is reduced until the necessary condition for applying multivariate methods is met. On the other hand, not all genes are expected to participate in each physiological response, only clusters of functionally related genes, and the methods should therefore be able to identify such clusters.

    In this work, an intrinsically multivariate approach is presented in which the preliminary variable reduction stage is not required, although it can still be conducted after a first run of the proposed methodology, on the basis of the multivariate information generated in that first run. The approach combines PLS-DA (“Partial Least Squares Discriminant Analysis”) and FDA, and incorporates a “non-classification” analysis that enables the uncertainty of each class prediction to be assessed according to two distance measures between the expression profile under analysis and the training dataset entities. We also propose a gene VIP (variable importance in projection) metric for the combined PLS-DA/FDA methodology, in order to identify key genes segregating the different classes.

    The approach is illustrated using a well-known data set (Golub et al., 1999), in which different expression phenotypes were measured in samples from patients with different types of leukaemia: acute lymphoblastic leukaemia (ALL), subdivided according to lineage (ALL-B and ALL-T), and acute myeloid leukaemia (AML).


    References
    Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286, 531-537.

    Tusher, V.G., Tibshirani, R., Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA, 98, 5116-5121.
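
    The univariate “signal to noise ratio” mentioned in this abstract scores each gene by the difference of the class means divided by the sum of the class standard deviations. A minimal sketch of such a ranking follows; the function names and the dictionary layout are hypothetical, and the use of the sample standard deviation is an assumption. It illustrates the univariate baseline, not the PLS-DA/FDA method proposed in the work.

```python
import statistics

def signal_to_noise(class_a, class_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene."""
    return (statistics.fmean(class_a) - statistics.fmean(class_b)) / (
        statistics.stdev(class_a) + statistics.stdev(class_b)
    )

def rank_genes(expression, labels, positive):
    """Rank genes by absolute signal-to-noise ratio, largest first.

    expression: dict mapping gene name to a list of expression values.
    labels: class label of each sample, in the same order as the values.
    positive: label of the first class; all other samples form the second.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == positive]
        b = [v for v, lab in zip(values, labels) if lab != positive]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
```

    Each gene is scored in isolation here, which is exactly why such rankings cannot capture the cooperative behaviour of gene clusters discussed above.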
  • Optimal Change-Point Detection for EWMA Processes

    Authors: S. Psarakis and A.N. Yannacopoulos
    Affiliation: Department of Statistics - Athens University of Economics and Business
    Primary area of focus / application:
    Submitted at 14-May-2008 14:30 by Stelios Psarakis
    Accepted
    23-Sep-2008 09:40 Optimal Change-Point Detection for EWMA Processes
    We revisit the problem of change-point detection in EWMA processes by reformulating it as an optimal stopping problem. The optimal stopping rules are obtained as solutions of variational inequalities. In certain limiting cases, a connection is made with optimal stopping problems for an associated Brownian motion. Furthermore, we consider the application of the results to selected problems.
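
    For orientation, the EWMA statistic and the classical control-chart stopping rule can be sketched as below. This is the standard chart with time-varying limits, shown only as a baseline; it is not the optimal stopping rule derived in the paper, and the default smoothing constant and limit width are conventional values assumed for the example.

```python
import math

def ewma_first_signal(data, target=0.0, sigma=1.0, lam=0.2, width=3.0):
    """Return the first time (1-based) the EWMA leaves its control limits.

    The statistic follows z_t = lam * x_t + (1 - lam) * z_{t-1}, started at
    the target. Returns None if no signal occurs.
    """
    z = target
    for t, x in enumerate(data, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact variance factor of the EWMA at time t for independent data.
        var_factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        limit = width * sigma * math.sqrt(var_factor)
        if abs(z - target) > limit:
            return t
    return None
```

    For example, ten in-control observations at the target followed by a sustained shift of three standard deviations trigger a signal two observations after the change.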