Avoid these mistakes with interaction models

When we’re trying to statistically work out whether a correlation, effect or treatment outcome is particularly strong (or even whether the sign varies) among certain types of units or is dependent on some other variable, we typically use interaction models – and more specifically, multiplicative interaction terms in some linear regression framework. Interaction models, like any model, provide a rich variety of possible mistakes and pitfalls. In this post I have gathered a few of the mistakes that we should avoid, of varying degrees of sophistication. Throughout, I assume that the reader is familiar with the basics of linear regression analysis.

I’ll probably be adding more to the list – feel free to holler if you think there is anything that should be on it!

1. Failing to include constitutive terms

A mistake that perhaps isn’t very common nowadays is to neglect to include the constitutive terms. That is, to specify a model like the following:

Y=\gamma_{0}+\gamma_{1} XZ +e

rather than the correctly specified

Y=\beta_{0}+\beta_{1} X+\beta_{2} Z + \beta_{3}XZ +\epsilon.

The intuitive argument often goes along the lines of “well I’m not interested in the effects of X or Z by themselves, but only in their interaction, so therefore I should leave them out.”

This, however, is a misspecification that produces meaningless results. Why? Consider as an example a case where the true \beta_{2}=\beta_{3}=0, so that Y is truly just a linear function of X (and further, assume for simplicity that X and Z are independent). What will the model Y=\gamma_{0}+\gamma_{1} X Z+e really pick up then? The correlation between X and Y: the product XZ is itself correlated with X whenever Z has a non-zero mean, so \gamma_{1} absorbs part of \beta_{1} and will generally be non-zero even though there is no interaction at all. (Under these assumptions, \gamma_{1}=\beta_{1}\,E[Z]\,\mathrm{Var}(X)/\mathrm{Var}(XZ).) This extends even when \beta_{2}\ne 0, and in the presence of correlation between X and Z. In summary: an interaction model without the constitutive terms will force the interaction term to pick up the constitutive linear correlations.
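A small simulation can make this concrete. The sketch below is only an illustration under assumed values (the sample size, the coefficients, the mean of Z and the helper `ols` are all made up for the example): Y depends on X alone, yet the model with only the product term reports a clearly non-zero coefficient, while the full model puts the interaction near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed data-generating process: Y depends on X only (beta_2 = beta_3 = 0).
X = rng.normal(size=n)
Z = rng.normal(loc=2.0, size=n)          # independent of X, non-zero mean
Y = 1.0 + 0.5 * X + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients, intercept first (illustrative helper)."""
    A = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

gamma = ols(Y, X * Z)        # misspecified: product term only
beta = ols(Y, X, Z, X * Z)   # full model with constitutive terms

print("gamma_1 (XZ only):   ", round(gamma[1], 3))
print("beta_3  (full model):", round(beta[3], 3))
```

How large the spurious coefficient is depends on the distributions chosen here; the point is only that it is non-zero without any true interaction.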

2. Interpreting coefficients for constitutive terms as unconditional effects

A related error when interpreting coefficients from the correctly specified Y=\beta_{0}+\beta_{1} X+\beta_{2} Z + \beta_{3}XZ+\epsilon is to take \beta_{1} and \beta_{2} as the unconditional effects of X and Z.

This is conceptually nonsensical since the point of including an interaction in the first place is to let the effect of X vary. We can also have a look at the marginal effect:

\frac{\partial Y}{\partial X}=\beta_{1}+\beta_{3}Z.

That is, \beta_{1} is the effect of X only when Z=0 (and conversely for Z when X=0). Obviously, this matters when we do get a significant interaction effect, but even when we don’t, the marginal effect of X is going to vary somewhat over Z, even if just due to randomness (it could even vary a lot when sample sizes are too small to detect a significant interaction).
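As a rough illustration, the marginal effect of X can be evaluated at a few values of Z once the model is fitted. Everything in this sketch is assumed for the example: the simulated coefficients, the chosen Z values and the function name `marginal_effect_of_x`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Assumed data-generating process with a true interaction of 0.8.
X = rng.normal(size=n)
Z = rng.normal(size=n)
Y = 1.0 + 0.5 * X + 0.3 * Z + 0.8 * X * Z + rng.normal(size=n)

# Fit Y = b0 + b1*X + b2*Z + b3*X*Z by least squares.
A = np.column_stack([np.ones(n), X, Z, X * Z])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)

def marginal_effect_of_x(z):
    """Estimated effect of X at a given value of Z: b1 + b3 * z."""
    return b[1] + b[3] * z

for z in (-1.0, 0.0, 1.0):
    print(f"Z = {z:+.0f}: effect of X = {marginal_effect_of_x(z):.2f}")
```

Only at Z=0 does the reported effect coincide with the coefficient on X; at the other values it differs in size and, here, even in sign.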

3. Using linearly specified controls

Let’s say we suspect that the interaction effect between X and Z is in fact driven by some other variable, \varphi: suppose that the effect of some blood pressure medication decreases with age, but really the moderating factor is BMI (which increases with age). So we include a control!

Y=\beta_{0}+\beta_{1} X+\beta_{2} Z + \beta_{3}XZ + \beta_{4}\varphi +\epsilon.

Done? Unfortunately not: we have controlled for the linear effect of \varphi, but it may still be confounding the interaction between X and Z. To correctly control for a potential confounder in an interaction model, we have to include its interactions with both X and Z as well:

Y=\beta_{0}+\beta_{1} X+\beta_{2} Z + \beta_{3}XZ + \beta_{4}\varphi +\beta_{5}X\varphi+\beta_{6}Z\varphi+\epsilon.

There, that’s better!
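To see why the linear control is not enough, here is a minimal simulated sketch of the medication example. The variable roles (X as treatment, Z as standardised age, `phi` as BMI), all coefficients and the helper `ols` are assumptions made for the illustration: the treatment effect truly varies with `phi` only, never with Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

X = rng.integers(0, 2, size=n).astype(float)  # treatment indicator
Z = rng.normal(size=n)                         # "age" (standardised)
phi = 0.7 * Z + rng.normal(size=n)             # "BMI", correlated with age
# The true moderator is phi; there is no X*Z interaction at all.
Y = 1.0 + 1.0 * X + 0.2 * Z + 0.3 * phi - 0.5 * X * phi + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients, intercept first (illustrative helper)."""
    A = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

linear_control = ols(Y, X, Z, X * Z, phi)
full_control = ols(Y, X, Z, X * Z, phi, X * phi, Z * phi)

print("XZ coefficient, phi entered linearly:", round(linear_control[3], 3))
print("XZ coefficient, phi fully interacted:", round(full_control[3], 3))
print("X*phi coefficient, full model:       ", round(full_control[5], 3))
```

With only the linear control, the XZ coefficient comes out clearly negative; once the interactions with `phi` are included it shrinks towards zero and the moderation is attributed to `phi`.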

4. Missing non-linear main effects: correlated IVs

This one I’ve described before in much more detail in this post. To recap briefly: if X and Z are also correlated with each other, any unspecified non-linearity in the main effects (X\rightarrow Y or Z\rightarrow Y) will be picked up by the interaction term instead. With large sample sizes, this means that even very small deviations from the specified linear effects are going to yield significant interaction terms. If your main independent variables are correlated, you have to make sure that their functional forms are correctly specified, or the interaction term may very well just be an artifact of violations of the linearity assumption.
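A quick simulated sketch of this mechanism, under assumed values (a correlation of 0.6 between X and Z, a quadratic term of 0.3 in X, and the illustrative helper `ols`): there is no interaction in the data, but the linear specification finds one, and adding the squared term removes it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
rho = 0.6  # assumed correlation between X and Z

X = rng.normal(size=n)
Z = rho * X + np.sqrt(1 - rho**2) * rng.normal(size=n)
# The main effect of X is mildly non-linear; there is no interaction.
Y = 1.0 + X + 0.3 * X**2 + 0.5 * Z + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients, intercept first (illustrative helper)."""
    A = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

linear_main = ols(Y, X, Z, X * Z)           # linear main effects only
quadratic_main = ols(Y, X, Z, X * Z, X**2)  # functional form of X corrected

print("XZ coefficient, linear main effects:", round(linear_main[3], 3))
print("XZ coefficient, with X^2 included:  ", round(quadratic_main[3], 3))
```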

5. Missing non-linear main effects: range of X dependent on Z

A special case of no. 4 above is when the main effect of X is truly non-linear, but the range of X is dependent on Z. Suppose, for example, that Y=f(X) is increasing at an increasing rate (X+X^{2} for positive X, let’s say), and we only observe the higher values of X for higher values of Z. Since the slope of f is steeper at higher X, we will then tend to find larger linear coefficients for X when Z is higher – i.e., a positive interaction coefficient, that is yet again an artifact of the misspecified functional form.
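The following sketch illustrates this with made-up ranges: Z is a binary group indicator, X is observed on [0, 1] when Z=0 and on [1, 2] when Z=1, and Y follows X+X^{2} plus noise. The ranges, the sample size and the helper `ols` are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

Z = rng.integers(0, 2, size=n).astype(float)  # binary moderator
X = rng.uniform(0.0, 1.0, size=n) + Z         # range of X shifts up with Z
# Same convex function of X in both groups; no interaction with Z.
Y = X + X**2 + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients, intercept first (illustrative helper)."""
    A = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

linear_main = ols(Y, X, Z, X * Z)           # linear in X
quadratic_main = ols(Y, X, Z, X * Z, X**2)  # allows the curvature in X

print("XZ coefficient, linear in X:      ", round(linear_main[3], 2))
print("XZ coefficient, with X^2 included:", round(quadratic_main[3], 2))
```

The linear model fits a steeper line in the group that sits on the steeper part of the curve and reports the difference as an interaction. Note that the corrected estimate is noisier, because X and X^{2} are highly collinear within each narrow range.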

