DecodeME is the biggest study ever on myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). In our previous blog post, we explained the genetic study’s methods and main results. In this second article, we delve deeper into the genes associated with ME/CFS. The clearest signals point to genes such as CA10, SHISA6, SOX6, LRRC7, and DCC, which are involved in neuronal development and communication in the brain.

Fine-mapping and gene prioritization
It’s far from easy to determine which genes are causally related to a disease. The challenge is that most DNA signals lie in non-coding regions, often between genes. They do not change the gene itself but slightly alter how much it is expressed, like a volume knob. We often do not know which variant drives the signal or which gene(s) it acts on. Narrowing down the causal variant is called ‘fine-mapping’, and working out which gene it affects is called gene prioritization.
The gene prioritization that DecodeME used thus far focused on gene expression data from the Genotype-Tissue Expression (GTEx) project. Here’s how that works. In the GTEx project, researchers created a large database from regular people who passed away. From each donor, they extracted DNA and RNA from multiple tissues. RNA is the middleman between DNA and proteins, so it provides a snapshot of how much a gene is being expressed. Researchers measure this by counting the number of short RNA fragments, called sequencing reads, that align to each gene’s transcript. By matching that information with the donor’s DNA, one gets a database of how well DNA patterns correlate with the expression of a certain gene.
This is precisely what we need. The signals from DecodeME are DNA letters (the expert term is ‘single nucleotide polymorphisms’ or SNPs) that form a certain pattern near genes. Suppose, for example, that the DecodeME data suggest that gene X might be associated with ME/CFS. In that case, we can check if the SNP pattern matches that for the expression of gene X in the GTEx database.
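To make the idea concrete, here is a minimal sketch of the kind of relationship such a database stores: how much a gene's expression changes per extra copy of a DNA letter. The numbers are invented toy values, not GTEx data, and the function name `eqtl_slope` is our own. Real analyses use many donors, covariates, and dedicated colocalization tools, so this only illustrates the principle.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2 copies)."""
    mean_x, mean_y = mean(dosages), mean(expression)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    return sxy / sxx


# Invented toy data for six hypothetical donors (NOT real GTEx values).
toy_dosages = [0, 0, 1, 1, 2, 2]                       # copies of the variant
toy_expression = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]  # read counts for gene X

slope = eqtl_slope(toy_dosages, toy_expression)
print(f"Expression changes by {slope:.1f} units per copy of the variant")
```

A slope clearly different from zero suggests that the variant acts like the volume knob described above for that gene in that tissue.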
It’s a reasonable approach, but unfortunately it doesn’t work as well as scientists had hoped. Studies found that only around 40% of DNA signals can be matched with gene expression data. One possible reason is that genes that are causally related to a disease are tightly regulated, as turning their expression too high or too low would be pathological. As a consequence, their expression varies little between people, which leaves few differences for the GTEx database to pick up. Another potential explanation is that gene expression needs more context: perhaps it only matters at specific times, such as during infection or in early development.
Whatever the reasons, there are limitations to using gene expression data to determine which genes are implicated in disease. The DecodeME preprint used this approach to estimate which genes are likely associated with ME/CFS, categorizing them into Tier 1 or Tier 2 groups. It also highlights certain genes, such as RABGAP1L, because the GTEx data matched well in multiple tissues. This is a valid and useful approach, but we think there’s quite some uncertainty as to whether these genes are truly involved in ME/CFS.

What are the alternatives? There are statistical and machine learning tools for fine-mapping and gene prioritization with funny names such as SuSiE, CALDERA, or FLAMES. These take the correlation among SNPs into account and add biological information to estimate which genes are likely the relevant ones. Unfortunately, these methods are quite complex to set up, so we’ll leave them to the experts.
In this blog, we’ll use a simple approach that is easy to follow. It’s based on two principles:
- Often, the relevant gene(s) are close to the SNP signal. This is certainly not always the case, but previous studies have shown that proximity is a good first guess. We’ll focus on the two closest genes.
- Genes in regions with few other genes have less competition. We have more certainty that these genes are related to ME/CFS than those in genomic regions that are stacked with genes.
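The two principles above can be sketched in a few lines of Python. This is only an illustration of the heuristic, not the code we used for the tables: the gene names and coordinates are hypothetical, and the 500 kb window for counting neighbouring genes is an assumption chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # first base of the gene (bp)
    end: int    # last base of the gene (bp)


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a SNP to a gene; 0 if the SNP lies inside it."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def prioritize(pos: int, genes: list[Gene], window: int = 500_000, top_n: int = 2):
    """Return the top_n closest genes and how many genes lie within the window."""
    ranked = sorted(genes, key=lambda g: distance_to_gene(pos, g))
    crowding = sum(distance_to_gene(pos, g) <= window for g in genes)
    return [g.name for g in ranked[:top_n]], crowding


# Hypothetical coordinates, invented purely for illustration.
toy_genes = [
    Gene("GENE_A", 1_000_000, 1_050_000),
    Gene("GENE_B", 1_200_000, 1_260_000),
    Gene("GENE_C", 2_400_000, 2_450_000),
]
closest, crowding = prioritize(1_100_000, toy_genes)
print(closest, crowding)
```

A low `crowding` count means the closest genes have little competition, so we can be more confident that the signal points to them.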
Zooming in on the signal
Take the DNA pattern on chromosome 1 that was associated with ME/CFS. DecodeME found eleven Tier 1 protein-coding genes in this region. That’s a lot. It makes it hard to say which one(s) are causally related to ME/CFS and which ones are accidental bystanders. The same is true for the chr6p22.2 region on chromosome 6, which has seven Tier 1 genes. In contrast, the hit on chromosome 17 has only one gene close by: CA10. This is a strong indication that this gene is involved in ME/CFS pathology.
CA10 has been associated with widespread chronic pain, and the protein it makes plays a role in the central nervous system, especially in brain development. When we go through the list of genes potentially associated with ME/CFS, a lot of them point to the brain, neurons, and their synapses. This seems like a recurring theme that hasn’t been fully highlighted yet in the DecodeME preprint.
Let’s go over the eight SNP signals that reached statistical significance. We have added plots of these genomic regions at the end of the article if you’re interested in a graphical overview. A summary is available in the table below.
| Top SNP | -Log10p | Implicated protein-coding genes | Relation to neurons and synapses |
| --- | --- | --- | --- |
| 1: 173,846,152 T/C | 7.59 | Unclear, too many genes | / |
| 6: 26,239,176 A/G | 8.39 | Unclear, too many genes | / |
| 6: 97,984,426 C/CA | 7.31 | Unclear, no genes nearby | / |
| 12: 118,202,773 CTTTTTTTTTTTT/C | 6.78 | TAOK3, PEBP1 | PEBP1 makes a protein that is processed into HCNP which may be involved in neural development. |
| 13: 53,194,927 GT/G | 6.94 | OLFM4 | OLFM4 helps regulate neural development and synaptic function, and has been implicated in depression and insomnia. |
| 15: 54,866,724 A/G | 8.12 | UNC13C | UNC13C is predicted to be involved in glutamatergic synaptic transmission |
| 17: 52,183,006 C/T | 8.67 | CA10 | CA10 makes a protein thought to play a role in the central nervous system, especially in brain development. Associated with chronic pain. |
| 20: 48,914,387 T/TA | 11.02 | Unclear, too many genes (ARFGEF2, CSE1L, STAU1) | / |
A big caveat is that the genes highlighted above are involved in other functions besides neural development (more about that later). The pattern, however, is repeated if we look at more signals just below the genome-wide significance threshold of 5*10^-8 (7.3 on the -Log10p scale).
This threshold divides the default p-value of 0.05 by 1 million, as there are approximately 1 million independent SNPs tested in genome-wide association studies (GWAS). However, this is just a rough approximation, and there is no hard threshold: when you go below it, the chance of a false positive gets slightly bigger, but in a continuous fashion. So, it’s worth having a look there too. Note that the hits on chromosomes 12 and 13 are actually below the threshold too, in the main DecodeME analysis. They only reached significance in different analyses using other controls or with only ME/CFS patients who reported an infectious onset.
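For readers who want to check the arithmetic, here is a small sketch of how the threshold and the -Log10p scale relate. The variable and function names are our own; the only inputs are the 0.05 level and the rough count of 1 million independent SNPs mentioned above.

```python
import math

ALPHA = 0.05                # conventional significance level
N_INDEPENDENT = 1_000_000   # rough number of independent SNPs in a GWAS

threshold = ALPHA / N_INDEPENDENT        # 5*10^-8
threshold_log = -math.log10(threshold)   # about 7.3 on the -Log10p scale


def neg_log10_to_p(value: float) -> float:
    """Convert a -Log10p value from the tables back to a p-value."""
    return 10 ** (-value)


def is_genome_wide_significant(neg_log10_p: float) -> bool:
    """True if the signal reaches the conventional genome-wide threshold."""
    return neg_log10_p >= threshold_log


print(f"threshold = {threshold:.1e}, -Log10p = {threshold_log:.2f}")
for label, value in [("chr17 (CA10)", 8.67), ("chr13 (OLFM4)", 6.94)]:
    print(label, f"p = {neg_log10_to_p(value):.1e}", is_genome_wide_significant(value))
```

With the values from the table, the chromosome 17 hit clears the threshold while the chromosome 13 hit falls just short of it.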
The table below shows the extra regions with a p-value below 5*10^-7 (6.3 on the -Log10p scale). We found several interesting SNP signals where there was only one protein-coding gene in the region. Even though we lost signal strength (we’re using less significant p-values), we gained more in the interpretation of the signal (more certainty on which gene it points to).
| Top SNP | -Log10p | Implicated protein-coding genes | Relation to neurons and synapses |
| --- | --- | --- | --- |
| 17: 11,325,637 G/C | 7.08 | SHISA6 | SHISA6 is involved in excitatory chemical synaptic transmission and is located at glutamatergic, excitatory synapses. |
| 11: 16,217,844 C/G | 6.968 | SOX6 | SOX6 encodes a protein that is required for normal development of the central nervous system |
| 1: 73,126,414 C/CA | 6.925 | LRRIQ3, NEGR1 (uncertain, wide interval) | NEGR1 predicted to be involved in nervous system development and regulation of synapse assembly. |
| 1: 91,028,158 C/T | 6.724 | ZNF644, BARHL2 | BARHL2 is involved in generation of neurons and regulation of axon extension. |
| 1: 69,696,474 A/G | 6.685 | LRRC7 | LRRC7 is involved in regulation of neuron projection development |
| 12: 123,924,955 G/A | 6.614 | ZNF664, CCDC92 (uncertain, many genes) | None |
| 18: 53,232,948 C/T | 6.606 | DCC | DCC makes a protein that mediates axon guidance of neuronal growth |
| 6: 4,336,259 T/C | 6.537 | Unclear, too many genes | / |
Where have we seen those genes before?
The interesting ones are those that are the sole gene close to the SNP signal: these have a higher chance of being causally related to ME/CFS. Using the GWAS catalog, we can see in which previous GWAS studies these genes showed up. We highlight some in the table below, focusing on traits with the clearest effect and those closest to the SNP signal from DecodeME.
| Top SNP | Gene | Also implicated in the following traits |
| --- | --- | --- |
| 13: 53,194,927 GT/G | OLFM4 | Insomnia, major depressive disorder, BMI, educational attainment, etc. |
| 15: 54,866,724 A/G | UNC13C | Level of protogenin in blood, educational attainment, restless legs syndrome, breast cancer, severe COVID-19 infection, etc. |
| 17: 52,183,006 C/T | CA10 | Insomnia, smoking initiation, restless legs syndrome, educational attainment, multisite chronic pain, etc. |
| 17: 11,325,637 G/C | SHISA6 | Sleep duration, insomnia, myopia, smoking initiation, etc. |
| 11: 16,217,844 C/G | SOX6 | Serum albumin amount, bilirubin measurement, height, testosterone, blood pressure, etc. |
| 1: 69,696,474 A/G | LRRC7 | Educational attainment, bone tissue density, BMI, adolescent idiopathic scoliosis, depression/anxiety, etc. |
| 18: 53,232,948 C/T | DCC | Insomnia, pain measurement, depression, intelligence, autism spectrum disorder, schizophrenia, etc. |
A first conclusion is that these genes have already been associated with multiple human traits. This is especially the case for those that have been tested with enormous sample sizes, such as BMI, educational attainment, smoking initiation, and height. Medical conditions that repeatedly show up are insomnia, depression, pain, and restless legs syndrome. OLFM4 and DCC were also found in a recent genetic study of fibromyalgia.
The genes highlighted by DecodeME are thus clearly not uniquely linked to ME/CFS. But there is a caveat: the DNA signal around them is often different, suggesting that the same genes may be used in different ways. OLFM4, for example, has been associated with both depression and ME/CFS, but DecodeME found that their signals do not match. This could mean that the same gene points to different pathways or tissues.
Immune-related genes
In our previous article, we wrote that there are also DecodeME genes pointing to the immune system, but that these were less clear. What we mean is that they show up in regions stacked with other genes so that it’s hard to know which ones are causally related to ME/CFS.
An example is RABGAP1L, a gene that helps to expel bacteria from cells and limits viral replication. It showed a good match with the GTEx data, but lies in the region chr1q25.1, where there are 11 candidate genes. Another example is the butyrophilin gene family on chromosome 6, which includes BTN2A2. These genes lie within the major histocompatibility complex (MHC) region, which is packed with interesting immune-related genes. It could be that these are related to ME/CFS, but the many other candidate genes nearby create ambiguity.

The gene TAOK3 wasn’t included in the list of DecodeME gene candidates, but it lies closest to the signal on chromosome 12. It plays a role in T-cell activation and has been linked to lupus. Another interesting lead for the immune system is OLFM4, which we mentioned earlier. We highlighted its potential link to the brain, but it has a clearer connection to the innate immune system and was found to be a biomarker for the severity of infectious diseases. It is a good illustration of how the same gene may point in different directions.
Lastly, the DecodeME preprint also found that an HLA allele (HLA-DQA1*05:01) was associated with ME/CFS at genome-wide significance. HLA refers to a group of genes that help your immune system tell the difference between your own cells and foreign invaders. Unfortunately, the HLA region is notoriously difficult to measure because many of its genes look alike. Because the HLA results in DecodeME weren’t very clear, the researchers will do additional analyses to verify the association.
In conclusion, genetic clues to the immune system are likely present in the DecodeME data, but we might need further fine-mapping and gene prioritization to confirm them.
Autophagy and intracellular transport
There are other intriguing patterns in the genes highlighted by DecodeME, but most are rather speculative.
Several genes are related to autophagy, the process that degrades and recycles parts of a cell. An example is FBXL4 on chromosome 6, which is involved in mitophagy (autophagy of mitochondria). Researchers at La Trobe University in Melbourne, Australia, have plans to explore this pathway further. CCPG1 on chromosome 15 is another example, as it facilitates autophagy in the endoplasmic reticulum (ER-phagy). A third gene, RABGAP1L on chromosome 1, has been linked to autophagy during bacterial infection. These genes might point to a defect in cleaning up cellular components.
Others noted genes involved in vesicle transport within cells. Vesicles are protective bubbles that carry molecules to different cellular compartments. RABGAP1L likely plays a role in this vesicle trafficking, as does ARFGEF2 on chromosome 20. The locus on chromosome 20 is worth exploring further because it has by far the strongest signal in DecodeME (an odds ratio of 1.095 and a -Log10p-value of 11.02). Although the region has multiple candidate genes, it’s quite likely that ARFGEF2, CSE1L, or STAU1 are involved in ME/CFS pathology because the signal around them is so strong. The gene-based test of MAGMA, a tool that helps you estimate which genes are relevant, highlighted all three of them. CSE1L shuttles certain proteins back into the cytoplasm after they’ve transported their cargo into the nucleus, while STAU1 is involved in the transport of messenger RNA within the cell. Perhaps these genes point to an issue with intracellular traffic in ME/CFS.
The most consistent link, however, is the one pointing to neuronal development and communication in the brain with genes such as CA10, SHISA6, SOX6, LRRC7, and DCC. Interestingly, this aligns quite well with another major genetic study on ME/CFS. The team of Michael Snyder at Stanford University published it as a preprint earlier this year. It used a very different approach to DecodeME. Rather than screening all common DNA variants, it focused on rare ones that are likely to lead to a loss of function. It used a neural network trained on biological data, such as protein interactions, and mapped genes into networks based on what they do. Several of the risk genes that study associated with ME/CFS (such as NLGN2 and SYNGAP1) are involved in synaptic function: how neurons communicate with each other in the brain and nervous system. This aligns well with many of the genes found in DecodeME.
Confounding: Can we trust the results?
Lastly, we’ll explore whether the DecodeME results are truly robust. Some questioned, for example, if the results could be due to group differences other than ME/CFS. Perhaps patients were accidentally selected from a different ancestral group than the controls. Their DNA might be different because of historical migration patterns and selection bias rather than disease-related biology. This is known as population stratification, and it can create false signals in genetic studies if not properly controlled.
Luckily, the DecodeME researchers took care of this issue. They included only British participants with European ancestry so that they were as similar as possible to the controls in the UK Biobank. Next, they used principal component analysis (PCA) and included the first 20 principal components as covariates in the regression analysis. These 20 dimensions capture most of the genetic variation in all participants, patients and controls combined. The graph below shows what these look like. There was good matching between the patients (green) and controls (black) across all the components, suggesting there were no large population differences between them.

But even if there were group differences, these principal components would have soaked them up. Including these components in the regression analysis is like asking: beyond the variance captured by the first 20 dimensions, what other group differences do you see? That’s how the DecodeME results were calculated and why we’re pretty confident that they don’t reflect population stratification.
There’s also a formal test for this using p-value statistics.* If there are no population differences, the p-values would be uniform. Each value between 0 and 1 would be equally likely. So, researchers can check if the median p-value statistic in their results is similar to the expected value of 0.5. With population differences, the statistics would be inflated across the board, including the median. If the signal reflects susceptibility to a disease, however, only a small proportion would be abnormal. You would get some significant results with low p-values, but the median remains close to the expected value.
The ratio of the median p-value statistic* and its expected value is called the genomic inflation factor, and it should be close to 1. In DecodeME, it was 1.066, which indicates no significant problem with inflation and thus likely no population differences driving the results.
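The calculation can be sketched as follows, using the chi-square version described in the notes at the end of this article. This is a minimal illustration with names of our own choosing, not the DecodeME pipeline. It relies on the fact that a chi-square statistic with 1 degree of freedom is the square of a standard normal value, so Python's standard library is enough.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def p_to_chi2(p: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2)  # lower-tail quantile; the sign vanishes when squared
    return z * z


# Expected median chi-square under no inflation (about 0.45 for 1 df).
EXPECTED_MEDIAN_CHI2 = p_to_chi2(0.5)


def genomic_inflation_factor(p_values) -> float:
    """Median observed chi-square divided by its expected value."""
    return median(p_to_chi2(p) for p in p_values) / EXPECTED_MEDIAN_CHI2


# Evenly spread p-values mimic a study without any inflation.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation_factor(uniform_p)
print(f"lambda = {lam:.3f}")
```

Perfectly uniform p-values give a factor very close to 1. Population stratification would shift the whole distribution and push the factor well above 1, whereas a handful of true signals leaves the median almost untouched.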

A more subtle point argues that the ME/CFS group might have differed from controls in socio-economic status, intelligence, education level, or some other confounding factor. Perhaps DecodeME mainly reflects the genetic susceptibility to these factors rather than ME/CFS? It is, for example, notable how many of the DecodeME genes have previously been linked to intelligence or educational attainment. However, if there was a difference between the ME/CFS and the control group in, say, intelligence, it would likely be quite small compared to genetic studies that studied the entire spectrum of intelligence. So, one would expect the effect size in DecodeME to be much smaller than those found in studies on intelligence, which doesn’t appear to be the case.**
A last point is that the genetic hits might represent pain, depression, or sleep problems rather than ME/CFS. Some of the genes have previously shown up in genetic studies of people who have these symptoms, but not ME/CFS. This is a more interesting conundrum, but one that can be answered using the questionnaire data DecodeME collected. Hopefully, the researchers will provide more clarity on this in future publications, for example, by testing if the results also apply in ME/CFS without depression.
Bonus section: the graphs
Below are graphs of the genomic regions that we made ourselves in R using the DecodeME summary data. Protein-coding genes are added using the Ensembl database. Plots usually use a region of 250-300 thousand base pairs (kb) on each side of the top SNP. We zoomed out using a bigger window of 1000 kb on each side to spot recurrent patterns in the SNP signal and get a better view of the entire region. Most of the causal genes, however, are much closer to the SNP signal than the width of the graph.
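Our plots were made in R, but the window logic is simple enough to sketch in Python. The helper names below are hypothetical, and we make no assumption about which of the two alleles in a label is the reference allele.

```python
import re


def parse_top_snp(label: str):
    """Parse a label such as '17: 52,183,006 C/T' into (chrom, pos, allele_1, allele_2)."""
    match = re.fullmatch(r"\s*(\w+):\s*([\d,]+)\s+([ACGT]+)/([ACGT]+)\s*", label)
    if match is None:
        raise ValueError(f"Unrecognised SNP label: {label!r}")
    chrom, pos, allele_1, allele_2 = match.groups()
    return chrom, int(pos.replace(",", "")), allele_1, allele_2


def plot_window(pos: int, flank_kb: int = 1000):
    """Start and end (bp) of a region extending flank_kb on each side of the SNP."""
    flank = flank_kb * 1000
    return max(1, pos - flank), pos + flank


chrom, pos, allele_1, allele_2 = parse_top_snp("17: 52,183,006 C/T")
start, end = plot_window(pos)
print(f"chr{chrom}:{start:,}-{end:,}")
```

Passing `flank_kb=250` instead gives the narrower window that most regional plots use.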
The 8 hits
Let’s go through each of the loci. The one below on chromosome 1 might have more than one signal. The significant SNPs lie above the genes ZBTB37, which is involved in regulating the immune system, and DARS2, which encodes a mitochondrial enzyme. The points to the right lie above RABGAP1L but did not reach significance.

The next region on chromosome 6 is the one we already saw. It’s so stacked with genes that it’s hard to say which ones are relevant. The DecodeME preprint highlights BTN2A2, which regulates T-cell activity.

The other signal on chromosome 6 has no protein-coding genes close by, making it difficult to say which one it is influencing. FBXL4 is the gene linked to mitophagy, while POU3F2 is involved in neuronal differentiation.

The signal on chromosome 12 was not significant in the main analysis and has several genes nearby. Likely candidates are TAOK3, SUDS3, and PEBP1. The latter is suspected to be involved in neural development but has many other roles. TAOK3 is the immune-related gene linked to lupus. SUDS3 helps regulate gene expression by modifying how tightly DNA is packaged.

The locus on chromosome 13 also didn’t reach significance. The closest gene is OLFM4, which we discussed above.

The FUMA-defined region for chromosome 15 only includes the gene UNC13C, but it isn’t exactly close to the signal, and if you zoom out as we do below, there are other candidate genes.

On chromosome 17, CA10 is the most likely candidate.

Chromosome 20 has the strongest signal in DecodeME, with ARFGEF2, CSE1L, or STAU1 as potential candidates.

Extra regions
The following regions had p-values below 5*10^-7 but fell just short of the genome-wide significance threshold. On chromosome 1, we have LRRC7 as a likely candidate. It was the most significant gene in the MAGMA gene-based testing.

Another signal on chromosome 1 has no genes nearby. NEGR1, however, is a familiar gene. It plays a role in neuronal development and brain connectivity and has been linked to brain diseases such as schizophrenia, insomnia, and depression.

In a third locus on chromosome 1, the closest gene is ZNF644, which has been linked to myopia.

There is also an extra signal on chromosome 6, which seems to lie between protein-coding genes, making it hard to decipher.

On chromosome 11, SOX6 seems the most likely candidate.

The locus on chromosome 12 is stacked with genes, so it is hard to determine which ones are related to ME/CFS.

Lastly, on chromosome 18, DCC is the only one close to the signal.

Acknowledgement
Thanks to forestglip and others on the Science for ME forum for their analysis of the DecodeME results and for helping us explore many complex tools for genetic analysis such as FUMA, MAGMA, LocusZoom, and BIGA GWAS.
Notes
* More precisely: the median chi-square value rather than the p-value is used to calculate the genomic inflation factor. The expected median chi-square statistic is 0.45 using 1 degree of freedom.
** It’s difficult to compare effect sizes because GWAS use continuous measures of intelligence while ME/CFS is coded binary (you have it or not).
Updates
We changed the text from: “The one below on chromosome 1 is likely to have more than one signal.” to: “The one below on chromosome 1 might have more than one signal.” Rather than two signals, it’s quite likely that the pattern is caused by linkage disequilibrium that stretches out above the RABGAP1L region.
We initially only used the term fine-mapping in our first section but this refers to identifying the causal variant out of all the significant variants, not identifying the gene that the variant affects. The latter is usually referred to as gene prioritization.
We also adjusted the sentence: “Although the region has multiple candidate genes, it’s quite likely that ARFGEF2, CSE1L, and STAU1 are involved in ME/CFS pathology” by changing ‘and’ into ‘or’. A strong signal in this region doesn’t imply that multiple genes are involved. It may just be that only one of the genes in the region has a strong effect.
You suggest that a series of genes in the CNS and immune system are affected, but what is the suggested disease model? ME/CFS is an acquired disease, not a disease from birth and those CNS genes you cited tend to be involved in neurodevelopment or regeneration after injury. So what is the approximate model?
Thanks Angie. Working out a disease model might be something for a future blog post!
We suspect that these genes point to a susceptibility for developing the disease, for example in how your brain is wired, how neurons communicate, or how the nervous system responds to peripheral input. The evidence for an immune trigger such as Epstein-Barr virus or COVID-19 also seems quite strong. So the challenge for a disease model of ME/CFS will be about connecting those two pieces of evidence.
This is really interesting. I performed my own analysis on the DecodeME summary statistics, including fine-mapping of the main cohort (GWAS-1) and tissue and cell-type analysis. The results are on GitHub, and some of them are summarized here:
https://paolomaccallini.wordpress.com/2025/10/04/cell-type-analysis-of-decodeme-gwas-data/
Interesting thanks! Would like to use FLAMES as it seems like the latest new tool for gene prioritization by the Dutch team of Danielle Posthuma which also developed FUMA and MAGMA. Unfortunately, the coding is rather difficult to set up. Would you be able to do it?
https://github.com/Marijn-Schipper/FLAMES
https://pubmed.ncbi.nlm.nih.gov/39930082/
Hi, I didn’t know Flame, it seems interesting. MAGMA is brilliant, to me: the idea of performing tissue and cell-type analysis by regression instead of by classic enrichment (hypergeometric distribution) is really captivating. If Flame is developed by the same group, it must be really good. I’ll try to study it.
My fine-mapping attempt of DecodeME was performed using SusieR with Linkage Disequilibrium matrices from the original UK Biobank (downloaded from the Broad Institute repository). In order to use them, I had to lift over the DecodeME summary statistics from GRCh38 to GRCh37. This is not a perfect approach because there is a loss of about half of the variants. But it is the best I could do. I then used GTEx v10 to map the fine-mapped variants to genes. The result is a set of 18 genes that is a subset of the original 32 candidate genes proposed in the DecodeME preprint.
“In order to use them, I had to lift over the DecodeME summary statistics from GRCh38 to GRCh37. This is not a perfect approach because there is a loss of about half of the variants.”
This is curious. I used the GenomicRanges and rtracklayer packages in R and only lost about 25,000 variants out of almost 9 million. FUMA/MAGMA report the same in the log file: “25262 positions did not align with the GRCh37 reference.”
I also attempted a meta analysis using DecodeME, UK Biobank, and Million Veteran Project (European ancestry only). This generates a meta-GWAS of 21,500 cases and some interesting results:
https://github.com/paolomaccallini-hub/MetaME
There is a problem though: DecodeME and UK Biobank have overlapping controls. Now I am trying to solve this problem using a correction of the weights used in the meta analysis.