ENBIS-16 in Sheffield

11 – 15 September 2016; Sheffield
Abstract submission: 20 March – 4 July 2016

Bayesian Networks or Regression: Which Model is more Useful in my Case?

12 September 2016, 10:00 – 10:20


Submitted by
Jan-Willem Bikker (CQM)
The Dutch consultancy firm CQM (Consultants in Quantitative Methods, ~35 consultants) supports many customers’ projects in R&D, as well as in planning and supply chain. Naturally, we also follow trends in machine learning and data science. For one relatively new type of project, we tried two approaches: a statistical approach using regression/ANOVA-type models, and one using Bayesian networks (also called probabilistic networks), a machine learning technique popular in, e.g., computer science. Both approaches can be applied to a project in which a user’s overall rating of a product is linked to assessments of underlying aspects of the product. We use such models to identify the product’s most important aspects, while recognizing the limitations of an observational study. One difference between the two modeling approaches is that Bayesian networks lack a quantification of sampling uncertainty, which we find important when reporting results. Another is the type of effect that is estimated: regression focuses on the underlying truth, whereas Bayesian networks focus on the prediction of future observations, which leads to different effect sizes. The latter can be adapted, and we then see that these two models (and a few others as well) give very similar results. This reassuring result is possibly typical of comparisons between classical statistical methods and machine learning methods. Finally, we also encountered a variant of this case study in which 95% of the predictor values are missing, posing some modeling challenges. The discussion includes possibilities of imputation, SEM with missing values, and an analysis with data in long format.
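
As a rough illustration of the regression side of this comparison, the sketch below fits an ordinary least squares model linking an overall rating to three aspect ratings and reports approximate 95% intervals, i.e. the kind of sampling-uncertainty statement mentioned above. It is not the analysis from the talk: the data are simulated, and the names `aspects`, `overall`, and `true_effects`, the sample size, and the effect values are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): 200 users each rate three aspects of a product.
n = 200
aspects = rng.normal(size=(n, 3))
true_effects = np.array([0.8, 0.3, 0.0])  # assumed values, for illustration only
overall = 5.0 + aspects @ true_effects + rng.normal(scale=1.0, size=n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), aspects])
beta, *_ = np.linalg.lstsq(X, overall, rcond=None)

# Standard errors from the usual OLS covariance formula: sigma^2 (X'X)^-1.
residuals = overall - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Approximate 95% intervals using the normal quantile 1.96.
lower = beta - 1.96 * se
upper = beta + 1.96 * se

names = ["intercept", "aspect_1", "aspect_2", "aspect_3"]
for name, b, lo, hi in zip(names, beta, lower, upper):
    print(f"{name}: {b:.2f} [{lo:.2f}, {hi:.2f}]")
```

An aspect whose interval is wide or includes zero would be reported with corresponding caution; a point estimate of aspect importance on its own carries no such statement.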
