Tartu scientists map how genes affect metabolism

Researchers at the University of Tartu have mapped how genes influence human metabolism using data from more than half a million people. The large-scale study gives scientists a more precise picture of the causes of disease and could help speed up drug development.
People often assume that each gene acts like a single switch affecting only one specific trait. The new research shows this view is too simple: a single genetic variant can influence the levels of hundreds of molecules in the body.
"I like to think of metabolic markers as indicators of a building's indoor climate. If something — say ventilation — doesn't work, it's not just one thing that goes wrong; all the indoor climate indicators shift at once," said Kaur Alasoo, Associate Professor of Bioinformatics at the University of Tartu.
The pharmaceutical industry often treats such metabolic abnormalities as direct drug targets and tries to bring them back to normal. However, a disease may not subside if a drug artificially suppresses just one marker, because the abnormality may be a consequence rather than a root cause. The new dataset helps researchers distinguish true causes of disease from accompanying symptoms. To extend the analogy: if the ventilation is broken, there is little point in chasing individual mold spores or excess carbon dioxide in the air.

To create the map, researchers analyzed data from the Estonian Biobank and the UK Biobank, covering a total of 619,372 individuals. "Essentially, we combined all metabolic associations with all diseases in both biobanks. There were around 100 million comparisons, maybe even a billion — we didn't count exactly," Alasoo said.
Such a large dataset promises major time and cost savings in future drug development. "In the age of biobanks, it's been shown that if drugs are developed not from chemical molecules but based on target pathways and hypotheses tested with existing data, the process is actually faster and about 2.5 times cheaper for pharmaceutical companies," said Priit Palta, Senior Researcher at the University of Tartu. With such data, companies could assess a drug candidate's prospects at a very early stage.
Early decisions can prevent costly failures later in lab and animal testing, reducing business risk. "A senior scientific executive at a pharmaceutical company recently said they have to make billion-dollar bets for each drug development project. With our results, it may become possible to decide with the first million dollars whether it's worth continuing," Alasoo recalled.
Practical benefits
The research team demonstrated the dataset's usefulness with several case studies. Among other topics, they examined type 2 diabetes and branched-chain amino acids, whose levels rise alongside disease onset. Because earlier observational studies showed strong links, pharmaceutical companies have long invested heavily in developing inhibitors to suppress these amino acids.
However, the Tartu gene map indicated that lowering these amino acids does not reduce diabetes risk: the analysis showed that their elevated levels are a consequence of the disease rather than a cause. Findings like this could spare pharmaceutical companies costly failed clinical trials, highlighting the value of the dataset in early drug development.
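The reasoning behind this kind of finding can be illustrated with a toy simulation. It is a sketch under invented assumptions, not the study's actual method: a hypothetical genetic variant, carried by 20% of a simulated population, raises a metabolite by a fixed amount. If the metabolite causes disease, carriers of the variant fall ill more often; if the metabolite merely rises as a consequence of disease, carriers and non-carriers fall ill equally often, even though sick people have higher metabolite levels in both scenarios. The effect sizes, the variant frequency and the function name `simulate` are all hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate(metabolite_is_cause: bool, n: int = 200_000, seed: int = 1) -> dict:
    """Toy population with invented effect sizes (illustration only)."""
    rng = random.Random(seed)
    people = {True: 0, False: 0}       # number of carriers / non-carriers
    cases = {True: 0, False: 0}        # disease cases among carriers / non-carriers
    met_sum = {True: 0.0, False: 0.0}  # metabolite totals for diseased / healthy
    met_n = {True: 0, False: 0}        # number of diseased / healthy people
    for _ in range(n):
        carrier = rng.random() < 0.2  # hypothetical variant frequency of 20%
        if metabolite_is_cause:
            # The variant raises the metabolite, and the metabolite raises disease risk.
            metabolite = 0.5 * carrier + rng.gauss(0.0, 1.0)
            diseased = rng.random() < sigmoid(-2.0 + metabolite)
        else:
            # Disease has other causes; the metabolite rises as a consequence of it.
            diseased = rng.random() < sigmoid(-2.0 + rng.gauss(0.0, 1.0))
            metabolite = 0.5 * carrier + 1.0 * diseased + rng.gauss(0.0, 1.0)
        people[carrier] += 1
        cases[carrier] += diseased
        met_sum[diseased] += metabolite
        met_n[diseased] += 1
    return {
        "risk_carriers": cases[True] / people[True],
        "risk_noncarriers": cases[False] / people[False],
        # Observational gap: mean metabolite in diseased minus healthy people.
        "metabolite_gap": met_sum[True] / met_n[True] - met_sum[False] / met_n[False],
    }


for cause in (True, False):
    r = simulate(cause)
    label = "metabolite causes disease" if cause else "metabolite is a consequence"
    print(f"{label}: carriers {r['risk_carriers']:.3f}, "
          f"non-carriers {r['risk_noncarriers']:.3f}, "
          f"metabolite gap {r['metabolite_gap']:.2f}")
```

With these made-up parameters, comparing carriers with non-carriers separates the two scenarios, while a simple comparison of sick and healthy people shows higher metabolite levels in both. This is, in simplified form, why genetic data can help tell a cause apart from an accompanying symptom.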
Researchers also studied the link between lactic acid (lactate) levels and blocked blood vessels. They identified three separate genetic regions that increase both lactate levels and the risk of pulmonary embolism. The results suggest that blood lactate levels reflect whether platelets are activated. Using lactate as an indirect measure may allow labs to avoid complex and expensive cell-based procedures in the future.
Previously, measuring such activity required dedicated experiments and cell samples. "If we wanted to measure platelet activation directly, we'd need a separate experiment and collect cells from people. Now we can instead use lactate to indirectly observe what platelets are doing," Alasoo explained. This could make lactate a convenient indicator of platelet activity.
Technical challenges and solutions
The massive dataset also pushed researchers' computing systems to the limit. Existing programs could not process the data efficiently. "With big data, it often happens that when something new appears that algorithm developers haven't accounted for, we end up breaking them," said Palta. To overcome this, the team developed a new algorithm that sped up data comparisons by about a thousandfold.
The scope of the analysis was also limited by the cost of measurement methods. To map metabolic markers in blood, researchers can choose between nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry. Mass spectrometry can detect thousands of molecules, but it is expensive per sample, so studies using it usually include only a few hundred or a few thousand participants, too few to capture rare genetic variants.
To include more people and detect rare variants, the Tartu team used the cheaper NMR technique, which typically measures a few hundred key metabolic indicators. In total, they analyzed 249 different markers. "It was pure pragmatism," Alasoo said.
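To see why sample size matters, the back-of-the-envelope arithmetic below assumes a variant carried by one in 10,000 people and compares a hypothetical 1,000-person mass spectrometry study with the 619,372 people in the combined biobank data. It treats each person as an independent random draw, and the function names are illustrative only.

```python
def expected_carriers(sample_size: int, frequency: float) -> float:
    """Average number of carriers expected in a sample."""
    return sample_size * frequency


def chance_of_no_carriers(sample_size: int, frequency: float) -> float:
    """Probability that not a single carrier is sampled (independent draws)."""
    return (1.0 - frequency) ** sample_size


FREQUENCY = 1 / 10_000  # assumed: one carrier per 10,000 people

for label, n in [("1,000-person study (hypothetical)", 1_000),
                 ("combined biobank cohorts", 619_372)]:
    print(f"{label}: ~{expected_carriers(n, FREQUENCY):.1f} expected carriers, "
          f"P(no carriers) = {chance_of_no_carriers(n, FREQUENCY):.3g}")
```

Under these assumptions, the small study expects about 0.1 carriers and has roughly a 90% chance of seeing none at all, while the combined cohorts expect about 62 carriers, enough to begin estimating the variant's effect.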
Broader implications
Large sample sizes are essential for capturing rare mutations. Some variants may occur in only one in 10,000 people, making them difficult to detect without very large datasets. The study showed that rare mutations often have strong biological effects, frequently disrupting protein structure and normal bodily function. While each individual variant is uncommon, such mutations are collectively widespread in the population.
Such rare variants also expose a limitation of consumer genetic tests: although these tests can help estimate the risk of certain diseases, they may miss rare variants.
"These blind spots show the inherent challenges of personalized medicine," Alasoo said. "People can have many gene variants indicating high risk, but also a unique rare variant that further modifies that risk. We'll never be able to create a perfect risk model — that's a fundamental limitation."
Nevertheless, the dataset provides a strong foundation for many new medical discoveries. "From a practical standpoint, the main value isn't that a single metabolite immediately becomes a clinical biomarker or drug target, but that this collection of results and associations helps guide decisions in molecular medicine about where to look next," Palta said.
The dataset has also been made freely available to other research groups to support further scientific progress.
What's next
The researchers' next major step is to apply machine learning to uncover additional biological patterns. "One direction is definitely using AI to assemble these massive result sets and find all kinds of associations, because it's not possible to do manually," said Palta.
Alasoo agreed: "If we keep digging into this data — and others will too, since it's publicly available — more and more examples will emerge. But that requires someone to come in with a focused research question."
In summary, the Estonian research team has created a powerful new tool for the scientific community. As Palta put it: "The biobank is not just a large collection of data, but a research infrastructure that enables world-class questions about human biology, health, and disease."
The study was published in the journal Nature.
Editor: Argo Ideon