ENBIS: European Network for Business and Industrial Statistics
ENBIS9 Goteborg
20 – 24 September 2009
Abstract submission: 1 February – 31 May 2009
Generic implementation format of predictive statistical models using Python
23 September 2009, 12:00 – 12:15
Abstract
- Submitted by
- Ilmari Juutilainen
- Authors
- Juutilainen, Ilmari
- Affiliation
- Data Mining Group, University of Oulu
- Abstract
- The sharing and exchange of fitted statistical models between different systems and modeling software has recently attracted attention as the variety of applications of statistical modeling has increased. A standardized implementation format for predictive models would make it possible to fully separate a statistical model from the software that uses it. This would be a clear advantage in the maintenance of software applications: models could be updated independently of the application logic.
In this study, a standardized format for presenting and implementing predictive statistical models is proposed. The format is applicable to models with a single real-valued response variable and multiple real-valued explanatory variables. The proposed format consists of a Python script file that determines the conditional distribution of the response, and a supplementary XML file that documents the explanatory variables of the model.
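The abstract does not describe the layout of the supplementary XML file, so the following is only a sketch of how an application might read such variable documentation. The element and attribute names (`model`, `variable`, `name`, `unit`, `min`, `max`) and the two example variables are hypothetical and are not taken from the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the actual schema is not given in the abstract.
VARIABLES_XML = """
<model name="example_model">
  <variable name="temperature" unit="degC" min="0" max="100"/>
  <variable name="pressure" unit="bar" min="1" max="10"/>
</model>
"""


def read_variables(xml_text):
    """Return one dict per documented explanatory variable, in file order."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "name": v.get("name"),
            "unit": v.get("unit"),
            "min": float(v.get("min")),
            "max": float(v.get("max")),
        }
        for v in root.findall("variable")
    ]


variables = read_variables(VARIABLES_XML)
```

Keeping the documentation in a separate file of this kind would let an application check the order and range of its inputs without executing the model code.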
The actual implementation of each predictive model is a Python file. The model file must implement a public application programming interface (API) consisting of three functions. Each function takes a vector of input variables as its argument and returns, respectively, the corresponding conditional mean, deviation and cumulative distribution function of the response. Optionally, the API can be extended with a function that returns the inverse cumulative distribution function.
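As an illustration of the API described above, a model file might look roughly like the following sketch. The function names (`mean`, `deviation`, `cdf`, `inverse_cdf`), the linear form of the model, the normal error distribution and all parameter values are assumptions made for this example; the abstract does not specify them. The sketch relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
"""Hypothetical model file: linear mean with normally distributed errors."""
from statistics import NormalDist

# Illustrative parameter values, not results from the study.
INTERCEPT = 1.0
COEFFICIENTS = (0.5, -2.0)
SIGMA = 0.8


def mean(x):
    """Conditional mean of the response given the input vector x."""
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected %d explanatory variables" % len(COEFFICIENTS))
    return INTERCEPT + sum(b * xi for b, xi in zip(COEFFICIENTS, x))


def deviation(x):
    """Conditional standard deviation of the response (constant in this sketch)."""
    return SIGMA


def cdf(x):
    """Return the conditional CDF as a function y -> P(Y <= y | x)."""
    return NormalDist(mean(x), deviation(x)).cdf


def inverse_cdf(x):
    """Optional: return the conditional quantile function p -> y."""
    return NormalDist(mean(x), deviation(x)).inv_cdf
```

With these assumed names, an application could call `mean([2.0, 0.5])` for a point prediction, or `inverse_cdf([2.0, 0.5])(0.95)` for an upper quantile of the response.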
A case implementation of the proposed format is presented and analyzed. The results show that the format allows easy integration of predictive models into different software. The execution speed is satisfactory for most applications, as thousands of predictions per second are produced.
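One way a Python application could integrate such a model file and check its throughput is sketched below. This is not the case implementation from the study: the stand-in model source, the helper names `load_model` and `predictions_per_second`, and the file name `example_model.py` are hypothetical, and the measured rate depends entirely on the machine and the model.

```python
import importlib.util
import os
import tempfile
import time

# Stand-in model file contents; a real file would come from the modeling side.
MODEL_SOURCE = '''
def mean(x):
    return 1.0 + 0.5 * x[0] - 2.0 * x[1]
'''


def load_model(path, name="model"):
    """Import a model file from an arbitrary path and return it as a module."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def predictions_per_second(model, x, n=10000):
    """Time n calls to model.mean and return the observed call rate."""
    start = time.perf_counter()
    for _ in range(n):
        model.mean(x)
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")


# Write the stand-in model to a temporary file, load it, and use it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_model.py")
    with open(path, "w") as f:
        f.write(MODEL_SOURCE)
    model = load_model(path)
    prediction = model.mean([2.0, 0.5])
    rate = predictions_per_second(model, [2.0, 0.5])
```

Because the application only depends on the function names, replacing the model file with an updated one would not require changes to the application logic.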
The proposed implementation format provides several advantages. The model implementation is a Python file that can be read, understood and edited by a human. Python-implemented models are system and platform independent and are accessible from other programming languages. Finally, the proposed approach is general and can be employed with any model family.