Confounders, mediators, moderators and covariates

I recently put together some slides to explain mediators and mediation analysis to some people who knew slightly less than I did on the topic.

I started looking for some nice examples that would describe what a mediator was. I found plenty. Of course, it was also important to pre-empt confusion between similar and related terms, and since mediators and confounders are regularly mixed up I also looked for nice examples of confounders. Again, I found lots of good examples. For completeness I looked for examples of moderators and covariates. What struck me was that there were nice examples for each term separately, but I couldn’t find a really comprehensive example that included them all and clearly delineated the differences between them. So I put my mind to it. And failed.

But then I asked someone cleverer than me (one of my former PhD supervisors) and he provided the bones of a nice example to which I have added. I can’t claim that the result is as clean and clear as I would like, but it is the best I have and I would welcome corrections and clarifications.

So here is the basic formulation.

Slide 3

We have an exposure (or treatment if you are trials minded) that we think is associated with an outcome of some type. Mediators and confounders are similar except for the direction of effect between them and the exposure/treatment. Mediators are additionally characterised by lying on the causal pathway between exposure and outcome. Moderators are simply interaction terms that change the size or direction (or both) of the effect of the exposure on outcome. I have represented them here using a vertical line between exposure and moderator that feeds into an arrow leading to outcome (which isn’t conventional but represents the relationship better). Meanwhile, covariates are variables that might affect outcomes but are not associated with anything else. So far, so confusingly boring. But hopefully grounding this stuff in a concrete example will help. Here’s the one I have.

 

Slide 4

Okay, so maternal deprivation is associated with mothers giving birth to babies with lower birthweight. This is well established. So well established I can’t find the reference. Let’s move on.

Next the mediator. The causal pathway through which deprivation might act could be through diet, i.e. being poor might mean mothers can’t afford good food and their diet consequently suffers. This sub-optimal diet is the same diet nourishing the unborn child who is smaller as a result. Mediation analysis can formally test whether this hypothesis is true.
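To make the mediation idea concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the effect sizes (-0.5, 0.6, -0.3), the helper names `slope`, `slopes2` and `decompose`, and the use of a simple product-of-coefficients decomposition. It splits the total effect of deprivation into a direct part and an indirect part through diet; it is not a formal test of mediation, which would also need some measure of uncertainty such as bootstrapped intervals.

```python
import random


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes2(x, z, y):
    """Least-squares slopes of y on two predictors x and z (with intercept)."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def simulate_mediation(n=5000, seed=1):
    """Made-up data: deprivation -> diet -> birthweight, plus a direct path."""
    rng = random.Random(seed)
    deprivation = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed path a: more deprivation, poorer diet.
    diet = [-0.5 * d + rng.gauss(0, 1) for d in deprivation]
    # Assumed path b (diet -> birthweight, 0.6) and direct path c' (-0.3).
    birthweight = [0.6 * m - 0.3 * d + rng.gauss(0, 1)
                   for d, m in zip(deprivation, diet)]
    return deprivation, diet, birthweight


def decompose(n=5000, seed=1):
    """Split the total effect into direct and indirect (a * b) parts."""
    x, m, y = simulate_mediation(n, seed)
    total = slope(x, y)           # birthweight ~ deprivation
    a = slope(x, m)               # diet ~ deprivation
    direct, b = slopes2(x, m, y)  # birthweight ~ deprivation + diet
    return {"total": total, "a": a, "b": b,
            "direct": direct, "indirect": a * b}


if __name__ == "__main__":
    for name, value in decompose().items():
        print(f"{name:>8}: {value:+.3f}")
```

With ordinary least squares the total effect equals the direct effect plus the indirect effect, so comparing `total` with `direct` shows how much of the association runs through diet under these assumed numbers.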

The confusion between mediators and confounders arises from the fact that both are associated with the exposure and with the outcome. Now the confounder I have chosen is age. According to figure 2 here, there is an association between maternal age and deprivation. Logically, deprivation cannot influence someone’s age, so the arrow only goes from age to deprivation. I like that about this example. We are also assuming that maternal age is associated with low birthweight, which it seems to be.
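A small simulation can show what confounding by age does to a naive estimate. This is only a sketch: the signs and sizes of the effects (age lowering deprivation by 0.6, age raising birthweight by 0.5, a true deprivation effect of -0.3) are invented for illustration, as are the helper names.

```python
import random


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes2(x, z, y):
    """Least-squares slopes of y on two predictors x and z (with intercept)."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def simulate_confounding(n=5000, seed=2):
    """Made-up data in which age affects both deprivation and birthweight."""
    rng = random.Random(seed)
    age = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: the arrow goes from age to deprivation, never the reverse.
    deprivation = [-0.6 * a + rng.gauss(0, 1) for a in age]
    # Assumption: age effect 0.5, true deprivation effect -0.3.
    birthweight = [0.5 * a - 0.3 * d + rng.gauss(0, 1)
                   for a, d in zip(age, deprivation)]
    return age, deprivation, birthweight


def crude_and_adjusted(n=5000, seed=2):
    """Deprivation slope without and with adjustment for age."""
    age, deprivation, birthweight = simulate_confounding(n, seed)
    crude = slope(deprivation, birthweight)
    adjusted, _age_coef = slopes2(deprivation, age, birthweight)
    return crude, adjusted


if __name__ == "__main__":
    crude, adjusted = crude_and_adjusted()
    print(f"crude:    {crude:+.3f}")
    print(f"adjusted: {adjusted:+.3f}")
```

Under these assumptions the crude slope overstates the harm of deprivation, because part of what it picks up is the age effect; adjusting for age brings the estimate back towards the value used to generate the data.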

Smoking is what I have chosen for my moderator. Now this study investigated black smoke (so air pollution) and did indeed find a significant interaction (so moderation) between deprivation and black smoke exposure on birthweight. This means that deprived mothers who smoke produce babies even smaller than the separate effects of smoking and deprivation would lead us to expect. I’m gonna go ahead and call that a win.
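One simple way to see moderation in data is to estimate the effect of the exposure separately within each level of the moderator. The sketch below does this on simulated data with a binary smoke-exposure flag; the effect sizes (a deprivation slope of -0.3 without smoke and -0.6 with it) and the function names are assumptions made up for the example, not figures from the study.

```python
import random


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_moderation(n=10000, seed=3):
    """Made-up data where smoke exposure changes the deprivation effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        deprivation = rng.gauss(0, 1)
        smoke = rng.random() < 0.5
        # Assumed slopes: -0.3 if unexposed, -0.6 if exposed (the interaction).
        effect = -0.6 if smoke else -0.3
        # Assumed main effect of smoke exposure: -0.2.
        main = -0.2 if smoke else 0.0
        birthweight = effect * deprivation + main + rng.gauss(0, 1)
        rows.append((deprivation, smoke, birthweight))
    return rows


def stratified_slopes(rows):
    """Deprivation slope within each smoke-exposure stratum."""
    result = {}
    for level in (False, True):
        x = [d for d, s, _ in rows if s == level]
        y = [b for _, s, b in rows if s == level]
        result[level] = slope(x, y)
    return result


if __name__ == "__main__":
    slopes = stratified_slopes(simulate_moderation())
    print(f"unexposed: {slopes[False]:+.3f}")
    print(f"exposed:   {slopes[True]:+.3f}")
```

If there were no moderation the two slopes would be about the same. The gap between them is the interaction, which a regression would capture with a deprivation-by-smoke product term.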

Finally, there may be additional covariates that we want to control for, such as maternal height. This could just as easily be called genetic factors or something else, but the point is that there may be things we need to account for because we think they are related to the outcome (but nothing else).
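A covariate in this sense should leave the exposure estimate roughly where it was while soaking up some of the unexplained variation in the outcome. Here is a rough sketch of that on simulated data, assuming height is independent of deprivation and using invented effect sizes (-0.3 for deprivation, 0.8 for height) and hypothetical helper names.

```python
import random
from statistics import pvariance


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes2(x, z, y):
    """Least-squares slopes of y on two predictors x and z (with intercept)."""
    n = len(y)
    mx, mz, my = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((c - mz) ** 2 for c in z)
    sxz = sum((a - mx) * (c - mz) for a, c in zip(x, z))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    szy = sum((c - mz) * (b - my) for c, b in zip(z, y))
    det = sxx * szz - sxz ** 2
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def compare_models(n=5000, seed=4):
    """Deprivation slope and residual variance without and with height."""
    rng = random.Random(seed)
    deprivation = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: height is unrelated to deprivation.
    height = [rng.gauss(0, 1) for _ in range(n)]
    birthweight = [-0.3 * d + 0.8 * h + rng.gauss(0, 1)
                   for d, h in zip(deprivation, height)]
    crude = slope(deprivation, birthweight)
    adjusted, h_coef = slopes2(deprivation, height, birthweight)
    # Variance of the residuals; the intercept does not affect a variance.
    resid_crude = pvariance([y - crude * d
                             for d, y in zip(deprivation, birthweight)])
    resid_adjusted = pvariance([y - adjusted * d - h_coef * h
                                for d, h, y in zip(deprivation, height, birthweight)])
    return crude, adjusted, resid_crude, resid_adjusted


if __name__ == "__main__":
    crude, adjusted, rc, ra = compare_models()
    print(f"slope without height: {crude:+.3f} (residual variance {rc:.2f})")
    print(f"slope with height:    {adjusted:+.3f} (residual variance {ra:.2f})")
```

In this setup the two slopes should be close to each other, and the model that includes height should have the smaller residual variance, which is why such covariates are usually included for precision rather than to remove bias.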

A simplified pathway diagram

Of course, while cherry picking (sorry, systematically reviewing) the literature I found this, which observes that there is an interaction between maternal age and deprivation. This would indicate that age may be a moderator as well as a confounder. And smoking and deprivation are also linked. And this really is where it becomes clear that in most situations the pathway diagram is complicated and uncertain. Some of the relationships will be stronger than others, there will be associations between practically all of the variables, and there will be feedback loops all over the place.

Moreover, I have come to the conclusion that mediators and confounders are kind of intuitive. People understand them when given an example. Clear examples of interactions/moderation are rarer and I suppose that is because interactions are counter-intuitive. They are effects that are different from the sum of their parts, and real-life examples that everyone recognises are not commonplace (prove me wrong commenters – I’d love some better examples). [UPDATE: I have come up with an example that parents might appreciate. You have two kids. Each on their own is well behaved and adorable. Put them together and you might expect double the cuteness, but that is often not the case: they bicker and argue and vie for supremacy. Their cuteness is not the sum of their cutenesses; they are each less well behaved than on their own. This is your classic qualitative interaction.]

Nevertheless, and I’d like to finish on an ill-deserved upbeat tone, the discipline of clearly describing what we think the relationships and pathways are is incredibly powerful and important to hone. In particular, pre-specifying the order in which we expect things to happen allows us to identify and test causal hypotheses. In most situations, the process of committing this to paper is the hardest part. It is only once that step has been made that statistics can really begin to be applied (of course statistical thinking can help you get there too). To paraphrase Indiana Jones’ dad*, once the pathway is clear the appropriate statistical approach often presents itself. (I would like to thank Zoe Kelson and Daniel Farewell for constructive comments on earlier drafts.)

Solution presents itself

* I couldn’t get a picture of Indiana Jones’ dad with this text. Here’s one of Han Solo instead.

18 thoughts on “Confounders, mediators, moderators and covariates”

  1. Found this through google while looking for mediation moderation vs. confounding interaction–suspect I’ll be a regular visitor!

  2. Thanks so much for this posting! I have been searching for this information as this issue was recently raised in response to some analyses I have been working on.
    I was wondering if you might be able to provide a citation that discusses this? Any information would be great! Thanks!

    1. Hi Paul,

      I’m afraid I don’t have a single citation that covers all of this stuff. If I come across one I will update the post.

      Mark

  3. Beautifully explained. Thanks a ton!
    A few more questions, please:
    1. Partial and complete mediation?
    I think what you’ve mentioned is partial mediation, and an example of complete mediation is:
    x1 affects x2, which affects Y.
    x1 has no direct impact on Y.

    Please correct me if I am wrong. Please also provide a suitable example.

    2. Also, somewhere I found: intervening variables are also called mediating variables.

    3. And there is the phrase ‘control variable’. So is a control variable a moderator or a mediating variable?

    Waiting for your answer, as I can’t find such a simple explanation anywhere else. Thanks again.

    1. Hi Learner!

      1. Yes, your interpretation of partial and complete mediation is correct. My example does indeed have partial mediation. I guess an example of full mediation would be something like x1 = “time spent outdoors”, x2 = “being bitten by mosquitoes” and Y = “contracting Zika virus”. This assumes that Zika is only delivered through mosquitoes. You could see that time spent outdoors might increase your chance of contracting Zika, but only through being bitten. I suppose I would say that full mediation is similar to confounding (i.e. the relationship is entirely driven by something else). The only difference is the direction of the relationship. Thanks for this comment.

      2. Yes, mediator, mediating variable, intervening variable and intermediary variable are all synonyms.

      3. A control variable is a term from basic science which refers to a variable held constant throughout the trial in both arms. So, if you were checking the effect of fertilisers on plant growth, you would want to keep the exposure to light the same in both groups. Light exposure would be a control variable.

      Thanks

      Mark

  4. I guess I’m not the only one to find this site via web search of proper terminology!

    Recently I started a text analysis project that aims to check whether a study controlled for confounders that might distort the true connection between two variables.

    I’m doing this by using Python scripts to check for the absence or presence of keywords associated with those third variables.

    This is the best overview I have found of how those statistical terms relate to one another, and I’ll cite it in the future if anyone asks what they mean. Thank you for posting this!

    1. Thanks AnalyticAscent,

      See the response to Learner for some more synonyms. Sounds like a challenging project. Best of luck.

      Mark

  5. My understanding is that a moderator is the same as an effect modifier, and that effect modifiers are also on the causal pathway, as mediators are. Am I right?

    1. Yes, I think most people use moderator in the same way as effect modifier.
      You could have an effect modifier that was not a mediator (using the Baron and Kenny approach).

  6. Hi, I was looking for an explanation of the difference between mediators and confounders and came across your blog. Thank you! The diagram is especially helpful to understand the difference.

  7. Hi,

    I thought of a real-life example of a moderator that I think is pretty clear, and helps me when thinking of interaction:

    For asbestos exposure, the outcome of lung cancer is moderated by tobacco smoke. The risk of lung cancer for those who are exposed to asbestos AND who are smokers is 50-90 times higher than for those who are exposed to asbestos and who are not smokers.

    As a historical footnote, the Lorillard company used to sell a brand called “Kent”, which had the most dangerous form of asbestos in the filter in the 1950s. They sold 11.4 billion of those cigarettes.

    https://www.asbestos.com/asbestos/smoking/

  8. I just came across this… thank you Mark.

    I do have an important comment. A confounder must not be on the causal pathway. In contrast, a mediator must be on the causal pathway.

    Why does this matter? In analysis, such as regression, if you control for a variable as if it were a confounder, but it is actually on the causal pathway, your results could show false positives, false negatives, possibly even a false protective effect from a toxic exposure. If you control for a mediator you can have the same problem. However a mediator can be valuable for explaining how an exposure affects an outcome.

    This business of getting causal paths right and not controlling for things on a causal path is really important for getting causality right. E.g. if you manipulate X you can expect a specific effect on Y.

    You said something similar in your last paragraph, which I really like.
