ENBIS9 Goteborg

20 – 24 September 2009 Abstract submission: 1 February – 31 May 2009

Generic implementation format of predictive statistical models using Python

23 September 2009, 12:00 – 12:15


Abstract

Submitted by
Ilmari Juutilainen
Authors
Juutilainen, Ilmari
Affiliation
Data Mining Group, University of Oulu
Abstract
The sharing and exchange of fitted statistical models between different systems and modeling software has recently attracted attention as the variety of applications of statistical modeling has increased. A standardized implementation format for predictive models would allow the statistical model to be fully separated from the software that uses it. This would be a clear advantage in the maintenance of software applications: models could be updated independently of the application logic.

In this study, a standardized format for presenting and implementing predictive statistical models is proposed. The format is applicable to models that have a single real-valued response variable and multiple real-valued explanatory variables. The proposed format comprises a Python script file that determines the conditional distribution of the response, and a supplementary XML file that documents the explanatory variables of the model.
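As an illustration only, the supplementary XML file might look like the string below and could be read with Python's standard library. The element and attribute names (`model`, `variable`, `name`, `unit`, `min`, `max`) and the two example variables are hypothetical; the abstract does not specify the XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML documentation of the explanatory variables.
# The schema shown here is an assumption, not the format defined in the paper.
VARIABLES_XML = """<model name="example">
  <variable name="temperature" unit="C" min="0" max="100"/>
  <variable name="pressure" unit="bar" min="1" max="10"/>
</model>"""


def read_variables(xml_text):
    """Return one dict per <variable> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": v.get("name"),
            "unit": v.get("unit"),
            "min": float(v.get("min")),
            "max": float(v.get("max")),
        }
        for v in root.findall("variable")
    ]


variables = read_variables(VARIABLES_XML)
```

In such a layout, the order of the `variable` elements could define the order of the entries in the input vector passed to the model functions.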

Each predictive model is implemented as a Python file. The model file must implement a public application programming interface (API) consisting of three functions. Each function takes a vector of input variables as its argument and returns, respectively, the corresponding conditional mean, deviation and cumulative distribution function of the response. Optionally, the API can be extended with a function that returns the inverse cumulative distribution function.
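A minimal sketch of what such a model file could look like is given here, assuming a linear model with normally distributed errors. The function names (`mean`, `deviation`, `cdf`, `inverse_cdf`), the coefficients, and the choice to return the distribution functions as callables are assumptions made for illustration; the abstract does not fix them.

```python
from statistics import NormalDist

# Hypothetical fitted parameters of a linear model with normal errors.
INTERCEPT = 1.5
COEFFICIENTS = [0.8, -0.3]
SIGMA = 2.0


def mean(x):
    """Conditional mean of the response given the input vector x."""
    return INTERCEPT + sum(b * xi for b, xi in zip(COEFFICIENTS, x))


def deviation(x):
    """Conditional deviation of the response (constant in this sketch)."""
    return SIGMA


def cdf(x):
    """Return the function y -> P(Y <= y | x)."""
    return NormalDist(mean(x), deviation(x)).cdf


def inverse_cdf(x):
    """Optional extension: return the function p -> quantile of Y given x."""
    return NormalDist(mean(x), deviation(x)).inv_cdf
```

A caller would then evaluate, for example, `cdf([1.0, 2.0])(3.0)` to obtain the probability that the response is at most 3.0 for that input vector.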

A case implementation of the proposed format is presented and analyzed. The results show that the format allows predictive models to be integrated easily into different software. The execution speed is satisfactory for most applications, as thousands of predictions are produced per second.
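The kind of integration described above can be sketched as follows: a host application loads a model file from disk at run time and calls its functions. This is not the case implementation from the paper. The file name `model.py`, the helper `load_model`, and the model source are hypothetical, and the measured time depends entirely on the machine.

```python
import importlib.util
import os
import tempfile
import time

# Hypothetical model file contents; in practice the file would be
# delivered separately from the application.
MODEL_SOURCE = '''
def mean(x):
    return 1.5 + 0.8 * x[0] - 0.3 * x[1]
'''


def load_model(path):
    """Load a Python model file from an arbitrary path as a module."""
    spec = importlib.util.spec_from_file_location("model", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.py")
    with open(path, "w") as f:
        f.write(MODEL_SOURCE)
    model = load_model(path)

# Time a batch of predictions; no particular rate is claimed here.
start = time.perf_counter()
predictions = [model.mean([float(i), 2.0]) for i in range(10000)]
elapsed = time.perf_counter() - start
```

Replacing the model then amounts to replacing the file at `path`; the application code that calls `load_model` stays unchanged.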

The proposed implementation format provides several advantages. The model implementation is a Python file that can be read, understood and edited by a human. Python-implemented models are system and platform independent and are accessible from other programming languages. Finally, the proposed approach is general and can be employed with any model family.