Improved Estonian AI voice cloning making phone scams harder to spot

New research and a TV experiment suggest AI-generated voices are becoming increasingly convincing in Estonian, raising concerns over a new wave of phone scams.
Despite increasing warnings, victims across the country continue to fall for voice-based phone scams. One recent case saw the Estonian Artists' Association (EKL) lose €700,000, with experts warning AI could soon erase one of the last remaining safeguards — the Estonian language itself.
Research led by Tanel Alumäe, associate professor of speech processing and head of the Laboratory of Language Technology at Tallinn University of Technology (TalTech), shows AI voice synthesis is improving quickly, with tests focused on whether listeners can still distinguish human speech from machine-generated audio.
"Speech synthesis doesn't require knowledge of grammar; you just convert text to speech," Alumäe said, predicting that within a few years, people may struggle to tell synthetic voices apart from real ones.
Researchers say convincing AI-generated voices are likely a welcome tool for criminals, even as the technology also brings clear benefits ranging from public transport announcements to assistive tools in healthcare.
TalTech IT student Annabel Kukk said these developments matter a great deal to, for example, people who have lost their voice.
"They would be given the opportunity to communicate with the help of voice cloning," she said.
Last fall, Alumäe assigned the topic to Kukk, who wrote her bachelor's thesis on it this spring after extensive testing using multiple voice modulation programs.
She said her research focused on Estonian-language voice conversion, a relatively underexplored area.
Familiar voices easy to fake
To train and test the systems, Kukk generated large sets of synthetic Estonian voice samples. Participants were then asked to listen to audio clips and judge whether each voice sounded human or artificial.
The study required volunteer voice donors to help train the software. In addition to ordinary speakers, Kukk used well-known voices, among them those of ERR journalists Anu Välba, Taavi Eilat and Merilin Pärli, arguing that familiarity makes detection harder.

ETV's "Pealtnägija" worked with Kukk to run a brief experiment using the cloned voices of the three journalists, finding that even their own colleagues struggled to reliably tell whether clips of their voices were real or AI-generated.
Kukk warned that two of the biggest threats lie in scam calls and identity theft.
"Even if we recognize people's voices, even our own loved ones' voices, that's no guarantee," she said, stressing that victims can still be fooled even when listening carefully. "Synthetic voice technology has gotten that good already."
Researchers also warn the technology is evolving so quickly that newer voice systems have already surpassed those used in early experiments, underscoring how rapidly the landscape is shifting.
Alumäe warned that while AI-generated Estonian voices aren't yet as convincing as in English, it's only a matter of time until they are.
PPA: Start using code words
As stories circulate of cloned voices used in scam calls to urgently beg for money, police say it is still difficult to confirm how often AI is already involved. Either way, they say the trend is accelerating.
"When someone claims that was their child or loved one's voice, it's very hard for us to disprove that," said Elari Haugas, Serious Crimes Unit chief at the Police and Border Guard Board (PPA).
"But we can see the technology is advancing day by day and week by week, and of course criminals are keeping up with it and will try to take advantage of these possibilities," he added.
AI tools that can clone or synthesize voices are also becoming increasingly widespread, making voice generation accessible to almost anyone. Even voice messages or recorded calls can later be manipulated using AI.
As a result, authorities are urging people to verify unexpected requests for money or personal data using safeguards such as agreed code words with family members.
"If the caller can't provide the code word, the best thing to do is end the call," Haugas said.
--
Editor: Marko Tooming, Aili Vahtla









