ENBIS-7 in Dortmund

24 – 26 September 2007

My abstracts


The following abstracts have been accepted for this event:

  • Analysis of the efficiency of new pattern recognition methods for control charts

    Authors: Adam HAMROL, Agnieszka KUJAWIŃSKA
    Primary area of focus / application:
    Submitted at 23-Aug-2007 11:48 by
    The paper concerns the analysis of process stability with the use of process control charts. A new idea of pattern recognition and two original methods of data processing, called OTT and MW, are described. The software application CCAUS (Control Charts - Analysis Unnatural Symptoms), which supports control chart analysis with OTT and MW, is presented as well. The paper also contains results of the verification of the proposed methods, performed on the basis of data obtained from two machining operations.
    Process control charts are used to identify the occurrence of special causes disturbing a monitored process. When the process is under statistical control, the points on the control chart should follow a random pattern and the measurements should follow a normal distribution. There are many special patterns on the chart that indicate the process has lost its stability and that the process operator should take corrective action. The operator therefore has to track the patterns on the control charts and decide whether the process should be corrected or not. He should have profound knowledge about the process, about the possible sources of special causes and about the efficiency of corrective actions. There is always a risk that an experienced worker may resign from his post and the company will thus lose his knowledge.
    The above-mentioned problems concerning the analysis of control charts have been addressed by substituting artificial intelligence tools for human judgement. This was made possible by designing and implementing methods for classifying patterns on process control charts, called OTT and MW.
    The OTT and MW methods provided the authors with good results in process stability assessment. The methods proved to be more efficient than the human operator. They also let experts create a set of unconventional patterns of process instability, which significantly widens the range of their application.
    Practical implications:
    The verification of the developed methods was carried out on the basis of data obtained from grinding and superfinishing processes. It turned out that both methods are more effective than a human process operator. A special software application was developed to support the data processing.
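    The abstract does not specify how OTT and MW work, so as a minimal sketch of what automated pattern recognition on a control chart involves, the fragment below implements two classical Western Electric run rules instead; the function name and data are illustrative only.

```python
def detect_unnatural_patterns(points, center, sigma):
    """Flag two classical non-random patterns on a control chart:
    a point beyond the 3-sigma limits (Western Electric rule 1) and a
    run of 8 consecutive points on one side of the center line (rule 4)."""
    alarms = []
    for i, x in enumerate(points):
        if abs(x - center) > 3.0 * sigma:
            alarms.append((i, "beyond 3-sigma limit"))
    run, side = 0, 0
    for i, x in enumerate(points):
        s = 1 if x > center else -1
        run = run + 1 if s == side else 1
        side = s
        if run == 8:
            alarms.append((i, "run of 8 on one side"))
    return alarms

# A stable process triggers no alarms; a sustained shift triggers rule 4.
print(detect_unnatural_patterns(
    [0.1, -0.2, 0.05, 0.3, -0.1, 0.2, -0.3, 0.15], center=0.0, sigma=1.0))
print(detect_unnatural_patterns(
    [0.5, 0.4, 0.6, 0.7, 0.5, 0.8, 0.6, 0.9], center=0.0, sigma=1.0))
```

    An automated analyser such as CCAUS would evaluate a larger rule set, but each rule reduces to a scan of this kind over the charted points.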
  • Implementation of a Kolmogorov-Smirnov-type test for the partial homogeneity of Markov processes with application to credit ratings

    Authors: Frederik Kramer and Rafael Weissbach (University of Dortmund, Dortmund, Germany)
    Primary area of focus / application:
    Submitted at 25-Aug-2007 19:41 by
    In banking, the default behaviour of the counterpart is of interest
    not only for the pricing of transactions under credit risk but also
    for the assessment of portfolio credit risk. Typically, the
    estimation of credit rating transitions is based on a homogeneous
    Markov process, i.e. the assumption that the migrations have
    constant default intensities. The estimate is only unbiased if the
    assumption holds. However, the recent release of new regulatory
    capital requirements for financial intermediaries, named Basel II,
    requires estimating the probability of default devoid of any
    (known) bias. We use a test of the hypothesis that default
    intensities are chronologically constant within a group of similar
    counterparts, in this case a rating class. The
    Kolmogorov-Smirnov-type test builds on the asymptotic normality of
    counting processes in event history analysis. Right-censoring
    accommodates Markov processes with more than one non-absorbing
    state. A simulation study confirms the consistency as well as the
    sufficient power of the test in practice. We demonstrate the
    implementation of the test and discuss some computational problems
    and numerical effects that arise while calculating the test
    statistic. The final test statistic is based on the maximization of
    pointwise statistics. Since the maximization must be performed on a
    discrete grid, we show the effect of using different numbers of
    grid points. For smaller numbers, the maxima are more likely to
    slip through the grid and the test loses its actual level (and
    power). Two examples of rating systems show inhomogeneities for a
    few migrations to neighbouring rating classes.

    Specifics: Mr Kramer would prefer to give a talk.
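    The authors' multi-state implementation is not given in the abstract; as a one-dimensional sketch of the grid effect they describe, the fragment below evaluates a Kolmogorov-Smirnov-type statistic for a counting process against its constant-intensity fit on grids of different resolutions (all names and data are hypothetical).

```python
import math
import random

def ks_type_statistic(event_times, horizon, n_grid):
    """Maximum standardized deviation between the observed counting
    process N(t) and the constant-intensity fit lambda_hat * t,
    evaluated on a discrete grid of n_grid points over (0, horizon]."""
    n = len(event_times)
    lam_hat = n / horizon  # maximum-likelihood constant intensity
    stat = 0.0
    for k in range(1, n_grid + 1):
        t = horizon * k / n_grid
        n_t = sum(1 for s in event_times if s <= t)  # events up to time t
        stat = max(stat, abs(n_t - lam_hat * t) / math.sqrt(n))
    return stat

random.seed(1)
# Homogeneous case: exponential waiting times with constant intensity 1.
t, times = 0.0, []
while True:
    t += random.expovariate(1.0)
    if t >= 50.0:
        break
    times.append(t)

# A coarse grid can let the maximum 'slip through'; a finer grid cannot,
# since every coarse grid point here is also a fine grid point.
print(ks_type_statistic(times, 50.0, n_grid=5))
print(ks_type_statistic(times, 50.0, n_grid=500))
```

    The coarse-grid statistic is never larger than the fine-grid one, which is the mechanism behind the loss of level and power reported for small grids.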
  • Industrial Data Mining - A real-life example of simulating and optimizing an entire semiconductor fab with heavy-duty Six Sigma data mining tools.

    Authors: Marc Anger (StatSoft (Europe), Hamburg, Germany)
    Primary area of focus / application:
    Submitted at 28-Aug-2007 08:59 by
    Production of wafers is a real headache in the semiconductor industry. Despite accurate planning and controlling, there are still systematic and random effects which influence the production yield. Considering the life cycle of semiconductor designs, it is key to commercial success to shorten the production ramp-up and then achieve maximum yield. Sometimes it is more productive to have simple and actionable engineering rules than a deep understanding of the root causes.

    Traditional tools like wafer maps suffer from merely representing the status quo, by which point the damage is already done. Therefore the production was optimized using additional data mining techniques:

    - Visualization of response and influencing variables to understand the characteristics.

    - Feature selection to find the relevant ones among 1,400 pieces of equipment (each with 10 to 300 tools).

    - Prediction models like CART, CHAID, gradient boosted trees, MARSplines and neural networks were used to show interactions between the equipment and between the tools and to find good and bad combinations. Learning from this, brand-new equipment can be classified as to whether it raises the yield or tends to produce scrap.

    - STATISTICA QC Miner was the software behind the scenes.

    - Simple and actionable rules were derived from the analysis and yield was significantly boosted.
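    The actual feature-selection procedure is not disclosed in the abstract; a minimal filter-type sketch, assuming binary equipment-usage indicators per lot and synthetic data (all names hypothetical), could look like this:

```python
import random

def rank_equipment(yields, usage):
    """Rank equipment columns by the absolute difference in mean yield
    between lots that passed through the equipment and lots that did
    not -- a simple filter-type feature selection."""
    scores = []
    for j in range(len(usage[0])):
        used = [y for y, row in zip(yields, usage) if row[j] == 1]
        unused = [y for y, row in zip(yields, usage) if row[j] == 0]
        if used and unused:
            diff = abs(sum(used) / len(used) - sum(unused) / len(unused))
        else:
            diff = 0.0
        scores.append((diff, j))
    return sorted(scores, reverse=True)

random.seed(0)
# Synthetic lots: passing through equipment 2 degrades yield by 10
# points; the other four columns are pure noise.
usage = [[random.randint(0, 1) for _ in range(5)] for _ in range(200)]
yields = [90.0 - 10.0 * row[2] + random.gauss(0.0, 2.0) for row in usage]
print(rank_equipment(yields, usage)[0][1])  # index of the flagged equipment
```

    With 1,400 equipment columns, a cheap filter ranking of this kind is what makes the subsequent tree and boosting models tractable.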
  • Validating Clinical Trial Protocols with Simulations

    Authors: Tony Greenfield, Ron S. Kenett
    Primary area of focus / application:
    Submitted at 29-Aug-2007 12:40 by
    Clinical trials are on the critical path of drug and treatment development. They are expensive in time as well as in money. A clinical trial is essential before any new, and perhaps revolutionary, product can reach the market. The trial protocol is a statement of the design of the clinical trial, of how it will be managed, and of how a multitude of assumptions will be tested empirically. The trial will determine whether the proposed treatment actually does what its sponsors claim it can achieve.

    Clinical trials raise complex statistical and ethical issues. A clinical trial that is not properly designed statistically, for example with very low power, can be considered unethical. But an over-designed trial, which lasts a long time and involves too many patients, is also unethical. The former may fail to show that a drug is more effective than its comparator, so patients will have been subjected to a trial with little hope of a useful result. The latter will require some patients to continue receiving the less effective treatment longer than necessary, and it will delay the marketing of the more effective drug.

    Protocols of clinical trials are traditionally designed by medical experts with the help of statisticians. The main role of a statistician has typically been to determine sample sizes. However, the evaluation of the trial strategy involves many parameters not addressed by simple power calculations based on t-tests or ANOVA.

    In this work we describe how, using specially designed simulations, we can evaluate a clinical trial protocol and assess the impact of various assumptions such as drop-out rates, patient presentation rates, compliance, treatment effects, end-point dependencies, exclusion criteria, and distributions of population and response variables. The evaluation focuses on the overall power of the trial to detect clinically significant differences, and on its cost. We demonstrate the approach with a case study.
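    The protocol simulations described are far richer than any single test; as a minimal sketch of the core idea, assuming a two-arm trial with a normal endpoint and independent random drop-out, power could be estimated by Monte Carlo like this (all parameter values illustrative):

```python
import math
import random

def simulate_power(n_per_arm, effect, sd, dropout_rate,
                   n_sims=2000, z_crit=1.96):
    """Monte Carlo power of a two-arm trial with a normal endpoint and
    independent random drop-out, analysed with a known-sd z-test at
    two-sided alpha = 0.05."""
    hits = 0
    for _ in range(n_sims):
        ctrl = [random.gauss(0.0, sd) for _ in range(n_per_arm)
                if random.random() > dropout_rate]
        treat = [random.gauss(effect, sd) for _ in range(n_per_arm)
                 if random.random() > dropout_rate]
        if len(ctrl) < 2 or len(treat) < 2:
            continue
        diff = sum(treat) / len(treat) - sum(ctrl) / len(ctrl)
        se = sd * math.sqrt(1.0 / len(ctrl) + 1.0 / len(treat))
        if abs(diff) / se > z_crit:
            hits += 1
    return hits / n_sims

random.seed(7)
# Drop-out erodes the power computed for the nominal sample size.
print(simulate_power(64, effect=0.5, sd=1.0, dropout_rate=0.0))
print(simulate_power(64, effect=0.5, sd=1.0, dropout_rate=0.3))
```

    A protocol-level simulation would layer presentation rates, compliance, and end-point dependencies on top of this loop, which is exactly what simple power formulas cannot capture.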
  • Paired Comparisons in Visual Perception Studies using Small Sample Sizes

    Authors: J. Engel and R. Rajae-Joordens
    Primary area of focus / application:
    Submitted at 31-Aug-2007 10:02 by
    Paired comparison is a useful method of experimental design with various applications, such as perceived crime seriousness and the measurement of health status. Industrial applications concern the relative importance of factors before including them in an experiment, consumer tests in the food industry, and visual perception research. Visual perception researchers perform experiments on display systems in which they ask subjects to compare or rank displays according to a specified criterion, such as brightness or sharpness. In this way, they investigate perceived differences between displays in the hope of explaining them from physical specifications. A classical model for such experiments is the Thurstone model of the form H(pAB) = a – b, where pAB is the probability that display A is preferred over display B, and a and b are scores for the two displays. It is important to estimate the scores, to test differences of scores, and to test the effect of factors like image content and gender of subjects. For these purposes, Generalized Linear Models (GLMs) appear to be very useful.

    We shall first embed the Thurstone model into the GLM framework and discuss a multiple testing procedure for differences of scores that controls the family-wise error rate. Further, we present tests for the effects of factors on the scores. Secondly, in a simulation study we determine the testing power as a function of the number of subjects. Finally, a case study is worked out as an example and we end with some discussion points.
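    As a minimal sketch of the Thurstone model with H taken as the probit link (not the authors' GLM implementation), the fragment below estimates scores by gradient ascent on the Bernoulli likelihood, with one score anchored at zero for identifiability; data and names are hypothetical.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function, so H = norm_cdf^(-1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_thurstone(wins, n_items, n_iter=2000, lr=0.01):
    """Maximum-likelihood Thurstone scores s with
    P(i preferred over j) = norm_cdf(s[i] - s[j]), fitted by gradient
    ascent; s[0] is anchored at 0 for identifiability.
    wins[(i, j)] = (times i preferred over j, total comparisons)."""
    s = [0.0] * n_items
    for _ in range(n_iter):
        grad = [0.0] * n_items
        for (i, j), (w, n) in wins.items():
            d = s[i] - s[j]
            p = min(max(norm_cdf(d), 1e-9), 1.0 - 1e-9)
            g = norm_pdf(d) * (w / p - (n - w) / (1.0 - p))
            grad[i] += g
            grad[j] -= g
        for k in range(1, n_items):  # item 0 stays at score 0
            s[k] += lr * grad[k]
    return s

# Three displays, 20 comparisons per pair; display 2 is preferred most.
data = {(1, 0): (14, 20), (2, 0): (18, 20), (2, 1): (15, 20)}
scores = fit_thurstone(data, 3)
print([round(v, 2) for v in scores])
```

    In a GLM package the same fit is obtained as a binomial regression with a probit link on item indicator contrasts, which is what makes the multiple-testing machinery for score differences directly available.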
  • Notes on Experimental Design for Statistical Calibration

    Authors: Paolo Cozzucoli
    Primary area of focus / application:
    Submitted at 31-Aug-2007 10:12 by
    Consider, for example, the problem of measuring a specific pollutant in samples taken to monitor air pollution; suppose that we can use two different instruments and/or methods to do so: one very precise but slow and expensive, and another very quick and cheap but less precise. Using the less precise measurements Y, we want to estimate the true value of the pollutant X; this is a typical calibration problem. A statistical calibration study is usually carried out in two distinct stages. At the first stage, responses are observed corresponding to known regressor values; using these observations the operator obtains useful information about the calibration curve. At the second stage, one or more responses are observed corresponding to an unknown value of the regressor; the estimation of this unknown value is of primary importance for prediction. In general, the methodology was developed for point and interval estimation and has been extensively applied in chemistry, biology and engineering. For a general review and references on calibration, see Osborne (1991). In this paper we consider a specific experimental design for statistical calibration, assuming a linear model, that improves the estimation of the calibration curve. We investigate this improvement by considering the corresponding confidence intervals. We are interested in showing that the confidence intervals are shorter than those obtained under the standard design.
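    As a minimal sketch of the two-stage procedure under the linear model (not the paper's proposed design, and with interval estimation omitted), stage one fits the calibration line from known regressor values and stage two inverts it for an unknown regressor value; all data are illustrative.

```python
def fit_line(xs, ys):
    """Stage 1: ordinary least squares for y = b0 + b1 * x, fitted on
    responses observed at known regressor (reference) values."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def inverse_predict(b0, b1, y0):
    """Stage 2: classical estimator of the unknown regressor value x0
    from a new response y0 on the fitted calibration line."""
    return (y0 - b0) / b1

# Stage 1: precise reference values x vs. cheap-instrument readings y.
x_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x_ref, y_obs)

# Stage 2: a new cheap reading y0 = 5.0 is mapped back to an estimate of x.
print(round(inverse_predict(b0, b1, 5.0), 2))
```

    The choice of the reference values x_ref is precisely the design question the paper addresses: spreading them well shrinks the uncertainty of (b0, b1) and hence the calibration confidence intervals.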