Mendelian Randomisation and the Prevention of Spurious Findings

19 April 2012 by Suzi Gage, posted in Uncategorized

Epidemiology uses statistical methods to investigate patterns in public health. Or, if you believe certain news articles, it splits every possible life influence into things that either cause or cure cancer (or occasionally do both). Epidemiologists investigate health outcomes in a number of different ways, ranging from looking at changes in whole populations to conducting randomised controlled trials (RCTs), which test one intervention against another, or against a placebo. RCTs are the ‘gold standard’ of epidemiological research, as their design means they can provide the strongest evidence for or against a hypothesis. Other epidemiological methods involve observation, rather than manipulation, and this can be problematic.

One week there may be a paper suggesting vitamin pills protect against heart disease, and the next week evidence emerges that they don’t. This is often because of misleading findings from observational studies. People who decide to take vitamin supplements are likely to be different in a number of ways from people who don’t take them. They might lead a more healthy lifestyle, exercise more, smoke less, and eat more vegetables. These differences are likely to impact on their chance of getting heart disease. Epidemiologists call these differences confounding variables, and although it’s possible to take them into account in various statistical ways, the scientist has to know what they are in order to control for them. Miss one confounder out of your analysis, and it may inflate any relationship between the two factors you’re interested in, so you’ve got a spurious finding. One way of getting round this problem is to conduct RCTs, so different types of people are randomly distributed across your intervention condition. Although this may be fine to do with a vitamin pill, it’s sometimes not ethical or even possible. If withholding the intervention being investigated is thought to lead to harm, it is unethical to do so, and if the intervention is something like drinking alcohol, it’s impractical to get people to either drink or not drink for an experiment.
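To make the confounding problem concrete, here is a minimal simulation sketch. Everything in it is invented for illustration: the variable names, the effect sizes and the assumption that a single "healthy lifestyle" score drives both pill-taking and heart disease risk. Vitamins are given no effect at all, yet a naive comparison still suggests they are protective.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def adjusted_slope(x, y, confounder):
    """Slope of y on x after removing the confounder's linear effect from both."""
    bx, by = slope(confounder, x), slope(confounder, y)
    x_resid = [a - bx * c for a, c in zip(x, confounder)]
    y_resid = [b - by * c for b, c in zip(y, confounder)]
    return slope(x_resid, y_resid)


def simulate_vitamins(n=20000, seed=1):
    """Hypothetical cohort in which vitamins have NO effect on heart disease risk."""
    rng = random.Random(seed)
    lifestyle, vitamins, risk = [], [], []
    for _ in range(n):
        healthy = rng.gauss(0, 1)  # confounder: healthy lifestyle score
        # Healthier people are more likely to take a vitamin pill.
        takes_pill = 1.0 if healthy + rng.gauss(0, 1) > 0 else 0.0
        lifestyle.append(healthy)
        vitamins.append(takes_pill)
        # Risk depends on lifestyle only; the pill does not appear here.
        risk.append(-healthy + rng.gauss(0, 1))
    return lifestyle, vitamins, risk


lifestyle, vitamins, risk = simulate_vitamins()
print("ignoring lifestyle:     ", round(slope(vitamins, risk), 2))
print("adjusting for lifestyle:", round(adjusted_slope(vitamins, risk, lifestyle), 2))
```

Under these assumptions the first estimate should come out clearly negative (vitamins look protective) and the second close to zero. The catch is the one described above: the adjustment only works because the simulation lets us measure the confounder.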

Investigating alcohol in observational studies has the same problems mentioned above; people who drink are different to people who don’t. Socioeconomic status, likelihood to smoke and education level may all affect the relationship you’re interested in. There are added complications in that people may have stopped drinking because they are unwell, so instead of drinking affecting the disease, the disease is changing alcohol use. This is called reverse causation, and may explain why we hear that a glass of wine is good for you (sorry to be the bringer of bad news).

But advances in the understanding of genetics can help. Everyone has 23 pairs of chromosomes, and coded in these are genes: the building blocks for who we are. My genes are practically identical to yours, but there are some key genes which have different DNA code in different people. Some of these differences can lead to diseases, such as Cystic Fibrosis, caused by a faulty gene not properly making an important protein. Some have less extreme, but still very interesting, effects. A gene variation has been found, common in East Asian populations, which means a protein needed to break down a metabolite of alcohol is not produced in a working form. People with this variation get unpleasant symptoms when they drink due to a build-up of acetaldehyde in their blood, so very often they avoid alcohol. By looking at someone’s DNA at this known location, we can investigate the effect of alcohol use on whatever we’re interested in, using the gene as a proxy variable instead of directly analysing alcohol intake, as people with the unusual variation will be less likely to drink. All well and good, but surely this is still just observing, so how do we stop the interference of confounding?

It turns out that our genes have some very useful properties which make them well suited to this task. The genes that you have are fixed at conception, before your environment can influence them, so environmental confounders should be randomly distributed between your gene categories. Also, when your parents’ chromosomes divided to create the egg or sperm that contained the genes you inherited from them, gene variants were, for the most part, passed on independently of one another, so whether you also inherited a genetic confounder is down to chance. Because of these neat properties, you can assume that your proxy gene will be independent of any confounding variable affecting the exposure you’re interested in. Where confounder levels would have been uneven across people grouped by alcohol consumption, they will be randomly distributed across the gene variation. Results using this technique have shown evidence against alcohol being a gateway drug leading to illicit drug use, as the gene variation was not associated with illicit drug use.
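The reasoning above can be sketched as a small simulation. The post does not name an estimator, so this sketch assumes one common choice, often called the Wald ratio: the gene-outcome association divided by the gene-exposure association. All numbers (variant frequency, effect sizes, a true effect of 0.5) and all variable names are made up for illustration.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def simulate_alcohol(n=50000, true_effect=0.5, seed=2):
    """Hypothetical cohort with an unmeasured confounder and a proxy gene."""
    rng = random.Random(seed)
    gene, confounder, alcohol, outcome = [], [], [], []
    for _ in range(n):
        g = 1.0 if rng.random() < 0.3 else 0.0   # carries the variant; fixed at conception
        u = rng.gauss(0, 1)                      # unmeasured confounder, generated independently of g
        a = 2.0 - 1.5 * g + u + rng.gauss(0, 1)  # carriers drink less; confounder raises drinking
        y = true_effect * a + 2.0 * u + rng.gauss(0, 1)
        gene.append(g)
        confounder.append(u)
        alcohol.append(a)
        outcome.append(y)
    return gene, confounder, alcohol, outcome


gene, confounder, alcohol, outcome = simulate_alcohol()

# Observational estimate: regress the outcome directly on alcohol use.
naive = slope(alcohol, outcome)

# Mendelian randomisation estimate (Wald ratio, an assumed choice of estimator).
wald_ratio = slope(gene, outcome) / slope(gene, alcohol)

print("gene vs confounder:  ", round(slope(gene, confounder), 2))
print("naive estimate:      ", round(naive, 2))
print("Mendelian randomised:", round(wald_ratio, 2))
```

In this toy setting the gene should show almost no association with the confounder, the naive estimate should overshoot the true effect of 0.5, and the ratio should land near it, without the confounder ever entering the analysis.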

Of course, nothing’s perfect. There are a few conditions where this technique will fall down, but as long as you’re aware of them, you should be able to avoid the problems. Firstly, ‘linkage disequilibrium’ occasionally occurs. Certain genes are more likely to move together during meiosis, meaning they are not inherited independently. If your proxy gene travels with a gene which affects your outcome of interest, this will bias your findings. There is a method to check for linkage disequilibrium, so you can ensure it’s not a problem. Also, there are certain genes which have an impact on a number of different traits (this is called pleiotropy), so if your gene has a direct effect on the outcome you are interested in, it is unsuitable for Mendelian Randomisation. Finally, the technique fails to work effectively if there are systematic differences in the genetics of the population you are investigating. For example, if a population is made up of two groups of people that used to live separately, but now live together, there will be non-random genetic differences between the groups due to selective mating over the time when they were separate. This may mean other differences between the groups will not be randomly distributed across the gene you’re interested in, making the population unsuitable.
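The last limitation, a population made up of two formerly separate groups, can be illustrated with another hedged sketch. The group sizes, variant frequencies and effect sizes are all assumptions chosen for the example. Alcohol is given no effect on the outcome, but the outcome differs between the groups for unrelated reasons, and the variant is more common in one group than the other.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def wald_ratio(gene, exposure, outcome):
    """Gene-outcome association divided by gene-exposure association."""
    return slope(gene, outcome) / slope(gene, exposure)


def simulate_two_groups(n=40000, seed=3):
    """Hypothetical mixed population; alcohol has NO effect on the outcome."""
    rng = random.Random(seed)
    group, gene, alcohol, outcome = [], [], [], []
    for i in range(n):
        s = i % 2                            # which formerly separate group a person belongs to
        freq = 0.1 if s == 0 else 0.5        # the variant is much rarer in group 0
        g = 1.0 if rng.random() < freq else 0.0
        a = 2.0 - 1.5 * g + rng.gauss(0, 1)  # carriers drink less in both groups
        y = 1.0 * s + rng.gauss(0, 1)        # outcome differs by group only
        group.append(s)
        gene.append(g)
        alcohol.append(a)
        outcome.append(y)
    return group, gene, alcohol, outcome


group, gene, alcohol, outcome = simulate_two_groups()

pooled = wald_ratio(gene, alcohol, outcome)

within = []
for s in (0, 1):
    idx = [i for i, grp in enumerate(group) if grp == s]
    within.append(wald_ratio([gene[i] for i in idx],
                             [alcohol[i] for i in idx],
                             [outcome[i] for i in idx]))

print("pooled estimate:       ", round(pooled, 2))
print("within-group estimates:", [round(w, 2) for w in within])
```

Here the pooled estimate should be pushed away from zero even though alcohol does nothing, because the gene is acting as a marker of group membership. Analysing each group separately should bring the estimates back towards zero, which is one simple way of seeing why a mixed population like this is unsuitable unless the structure is accounted for.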

However, although this technique can only be used in very specific circumstances, where a gene is known to affect the exposure you’re interested in and doesn’t suffer from the limitations mentioned above, it is a really elegant technique which will hopefully stop spurious results from observational studies becoming newspaper fodder.

Check this article out for an overview of MR.

9 Responses to “Mendelian Randomisation and the Prevention of Spurious Findings”

  1. Pete Etchells

    Great blog post Suzi, really interesting. How difficult was it to get to grips with these techniques? It all sounds kinda scary!

  2. Suzi Gage

    Hi Pete!
    It’s not difficult to get to grips with at all! The analysis itself is much easier than a normal regression because you don’t need to put in all the confounders, as you normally would. The difficult part is finding your genetic proxy!

  3. leo oberoi

    Good Evening Suzi
    This is a great piece of work you are sharing with us; it is really informative.
    Mendelian Randomization is a new term to me, but as a biology person I have a little knowledge about genes etc.

    I would like to add one more aspect to your information. Like epidemiologists, medical geneticists are also doing the same job of getting fruitful, informative patterns out of the vast genetic information we have.
    Bioinformatics is the key to doing this. I would like to hear your thoughts on this.

  4. Suzi Gage

    Hi Leo.
    Thanks so much for your comment – I’m really glad that you found the article informative.
    Bioinformatics is a field I know little about. I agree though that it and epidemiology can fit hand in hand – finding relevant genotypes and suchlike is essential to techniques like Mendelian Randomisation. How do you think bioinformatics can help us in medicine?

  5. Mat Hickman

    Hi Suzi,

    Interesting article and always good to read about good applications of genetics! 23andMe has a good website, including a guest login, that illustrates many of the characteristics that genes contribute to (with a big health warning that lots of these associations are *not* definitely proven -- similar companies may report entirely contradictory effects from the same gene!)

    I had a couple of thoughts:
    - We tend to try and avoid saying that people with Mendelian conditions like cystic fibrosis have 'faulty genes' as it can suggest that they are themselves in some way 'faulty' or 'at fault' -- in particular the parents (who must also have [at least] one copy of that gene variant) might feel at fault for passing on a faulty gene. Usually we'd just talk about a changed version of a gene. (Just to really tie you in vocabulary knots, 'normal' is also a word best left avoided as it immediately suggests people without the normal version of a gene are abnormal...!)

    - I'm sure you're aware that epigenetics is being studied increasingly; that is, the overlying control exerted on genes, often through environmental influences. I realise it would've made the article a lot longer to put in all the caveats, but I think it's important to pay lip service to things like imprinting and other epigenetic mechanisms, which mean that at least some of the genes you inherit *are* directly affected by environmental influences.

  6. Suzi Gage

    Hi Mat, thanks for the comment. Of course you're right that 'faulty gene' is an unhelpful term. In fact, I think I will edit that out: having family connections with CF, the last thing I want to do is add to any confusion/unhelpfulness there.

    Personally, I'm a bit wary of websites like 23andme, I think talk of 'the gene for X' is pretty unhelpful. Indeed, in terms of the relationships of genes mentioned with alcohol and tobacco use, they're by no means 'the gene for smoking' or 'the gene for drinking', and it's important to keep that in sight at all times. There are plenty of other influences, particularly on environmental factors like smoking and drinking behaviour.

    As for epigenetics, that's true in terms of the expression of genes, but as I understand it epigenetics doesn't affect whether you inherit certain genes or not. Of course your genes will interact with your environment, for example, you cannot become addicted to tobacco if you never try a cigarette.

  7. Mat Hickman

    Yes, 23andme etc. largely paint a simplified picture of things, which can be misleading. It's very difficult to get a handle on the extent to which many genes influence characteristics/susceptibility to different conditions (as a very broad catch-all). And absolutely right about 'the gene for', which is only appropriate if followed by 'the protein/RNA...', but not by a phenotype -- suggests deterministic action of genes that is far from accurate.

    Yup, epigenetics doesn't affect whether or not you inherit that gene, but if a gene is completely switched off (from conception), it's tantamount to not inheriting it. I suspect it's rare for both maternal and paternal copies of a gene to be completely silenced; nonetheless, genes presumably do run the risk of becoming confounded if they're expressed at significantly different levels from birth.

