Ethical considerations

Statistical analysis can be misleading, hence expressions such as "how to lie with statistics". A simple example is extrapolation, that is, prediction outside the range of the original data. To avoid such mistakes, the model designer should always keep in mind where the underlying data came from, accept the associated limitations and, above all, use his or her common sense. When building or evaluating a regression model, plausibility questions should always be asked, such as:

• Are the signs and values of the regression coefficients in line with a priori intuition? If not, this can help to reveal hidden multicollinearity.

• Is the fit of the data in line with expectations? A poor fit could be improved by refining the model. An unusually good fit could indicate that a trivial relationship is being modelled, or that too many observations that did not fit well have been deleted.

• Can observations that do not fit be explained? If not, could the theoretical framework underlying the regression be incorrect?
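As a rough illustration of how these questions can be backed up with numbers, the following pure-Python sketch computes a variance inflation factor (VIF) for the special case of two predictors, the R² of a simple one-predictor least-squares fit, and the observations whose residuals exceed a chosen multiple of the residual standard error. The function names and the default threshold of 2 are illustrative assumptions, not taken from the source; in practice a statistics package, and studentized residuals that account for leverage, would normally be used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).

    With more predictors, each VIF requires an auxiliary regression of one
    predictor on all the others; this shortcut only holds for two.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def fit_simple(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(x, y, b0, b1):
    """Share of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def flag_large_residuals(x, y, b0, b1, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard errors."""
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two coefficients).
    s = math.sqrt(sum(e * e for e in resid) / (len(resid) - 2))
    return [i for i, e in enumerate(resid) if abs(e) > threshold * s]
```

For example, with x = 1, ..., 8 and y = 2x + 1 except that the seventh value is raised by 15, `fit_simple` returns a slope of about 2.89 instead of 2, and `flag_large_residuals` returns `[6]`, the index of the altered observation. Flagging an observation is only the start: it still has to be explained rather than silently removed.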

In short, the modeller should always display genuine self-criticism.
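The extrapolation pitfall mentioned at the start of this section can at least partly be guarded against mechanically. The sketch below assumes that the estimation data are held as rows of numeric predictor values with a matching tuple of predictor names (both hypothetical conventions). It reports every predictor whose value in a new observation lies outside the range seen during estimation.

```python
def extrapolation_warnings(train_rows, new_row, names):
    """Describe each predictor in new_row that lies outside its training range."""
    warnings = []
    for j, name in enumerate(names):
        column = [row[j] for row in train_rows]
        lo, hi = min(column), max(column)
        if not lo <= new_row[j] <= hi:
            warnings.append(f"{name}={new_row[j]} outside [{lo}, {hi}]")
    return warnings
```

For instance, with training rows `(1, 10), (2, 20), (3, 30)` and names `("x1", "x2")`, the new row `(5, 25)` yields `["x1=5 outside [1, 3]"]`. Passing this check does not rule out extrapolation: with correlated predictors, a combination of individually in-range values can still lie far from the observed data.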

Ethical considerations come into play when the model designer deliberately manipulates the construction of the regression model (Berenson and Levine, 1996). The key here is intent. Unethical behaviour occurs when regression analysis is used to:

1. Forecast a response variable of interest with the wilful intent of possibly excluding certain variables from consideration in the model;

2. Delete observations from the model to obtain a better model without giving reasons for deleting these observations;

3. Make forecasts without providing an evaluation of assumptions when he or she knows that the assumptions of least squares regression have been violated.
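One practical safeguard against the second kind of manipulation is to make it impossible to delete an observation without recording a reason. The `DeletionLog` class below is a hypothetical helper, not an established library API. It assumes that each observation is a tuple whose first element is a unique identifier.

```python
from dataclasses import dataclass, field


@dataclass
class DeletionLog:
    """Hypothetical helper that records why each observation was removed."""

    entries: list = field(default_factory=list)

    def drop(self, rows, obs_id, reason):
        """Return rows without observation obs_id, logging the reason.

        Each row is assumed to be a tuple whose first element is a unique id.
        """
        if not reason or not reason.strip():
            raise ValueError("a reason must be given for deleting an observation")
        kept = [row for row in rows if row[0] != obs_id]
        if len(kept) == len(rows):
            raise KeyError(obs_id)
        self.entries.append((obs_id, reason.strip()))
        return kept

    def report(self):
        """One line per deletion, suitable for publishing with the model."""
        return [f"{obs_id}: {reason}" for obs_id, reason in self.entries]
```

The resulting report can be published alongside the model, so that readers can judge for themselves whether each deletion was justified.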

In the long run such manipulations are disastrous: even without them, it is difficult enough to get lay decision makers to accept the outcomes of computer models presented by experts.

