This is what a scam sounds like: the rise of voice hacking


One day, you receive a call that seems to come from your bank. The voice on the other end belongs to the manager of your usual branch, asking you to urgently transfer money to avoid an overdraft. The request is so convincing that you proceed without hesitation. However, when you later contact your branch to confirm, you realize something is off: the call was fraudulent. The voice you heard wasn’t who you thought it was; it was a clone created with artificial intelligence.

This type of scam, known as voice hacking, is a growing threat that uses advanced technologies to manipulate the human voice for fraudulent purposes. In recent years, voice hacking has gained traction as cybercriminals use artificial intelligence and voice synthesis techniques to create highly convincing imitations of someone’s voice.

What is Voice Hacking?

Voice hacking is a cybercrime technique that uses advanced voice synthesis technology to impersonate someone’s voice for fraudulent ends. Attackers can replicate voices using artificial intelligence, enabling them to create fake voice messages that sound authentic. These attacks can target individuals, companies, and even customer service systems with the goal of stealing confidential information, performing fraudulent transactions, or emotionally manipulating the victim.

The example at the beginning isn’t the only case in which scammers use what’s also known as deep voice to trick their victims. Two well-known cases involve Jennifer DeStefano and Ruth Card: using AI-generated synthetic voices, scammers placed distress calls that sounded like DeStefano’s daughter and Card’s grandson pleading for help. Fortunately, both women discovered in time that their relatives were safe.

In the business world, the fake CEO case has become increasingly common. An employee receives a call from what sounds like their superior, instructing them to transfer funds to a specific account. Who would question a direct order? In this article you can learn more about CEO fraud and other similar scams.

What are the targets?

Voice hacking poses a significant risk to individuals and organizations alike. As we’ve seen, the increasing sophistication of these technologies makes it crucial to develop advanced detection and analysis measures to mitigate their impact. While some attacks target individuals, criminals also know they can extract greater rewards from businesses and public administrations. The main objectives of voice hacking include:

Unauthorized access to systems and services

Cloned or manipulated voices can bypass voice-based authentication, granting access to bank accounts, payment platforms, and other digital services that rely on voice verification.

Social engineering and information manipulation

By accurately replicating a person’s voice, attackers can deceive employees, customers, or automated systems into disclosing sensitive data, altering account details, or executing fraudulent instructions undetected.

Creation of fake content for disinformation

Voice deepfakes can be used to fabricate statements attributed to public figures, business leaders, or influencers to manipulate public opinion, sow distrust, or damage reputations.

Financial fraud and targeted scams

Voice cloning facilitates unauthorized transactions, fraudulent payment orders, and employee deception, often leading to significant financial losses.

Evasion of security and surveillance systems

In environments where identity verification depends on voice, these techniques can be used to impersonate individuals, access restricted areas, or bypass voice-dependent control systems.

Voice Hacking: understanding the technical process

Not all voice scams are the same. Cybercriminals use different techniques and technologies in each case. Here’s an overview of the most common types, how they work, and their security impact:

Voice Cloning

This attack focuses on replicating a specific person’s voice. Using AI and deep learning models, attackers analyze recordings of the person’s voice to study their speech pattern, tone, rhythm, and accent. Generative neural networks help create a voice model that precisely replicates the victim’s vocal features. With enough recordings to train the model, attackers can make voice authentication systems recognize a fake voice as legitimate.
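
To see why a well-trained clone can slip past a naive voice authentication check, consider the minimal sketch below. It compares speaker embeddings with cosine similarity against a fixed threshold; the embedding function here is a deliberately crude, hypothetical stand-in (a spectral fingerprint), not a real speaker-encoder model, so only the decision logic is the point.

```python
import numpy as np

def extract_speaker_embedding(audio: np.ndarray) -> np.ndarray:
    """Toy stand-in for a speaker encoder: a crude spectral fingerprint.
    A real system would run a neural speaker-encoder model here."""
    spectrum = np.abs(np.fft.rfft(audio, n=2048))
    bands = spectrum[:1024].reshape(64, 16).mean(axis=1)  # 64 coarse frequency bands
    return bands / (np.linalg.norm(bands) + 1e-9)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def verify_speaker(enrolled_voiceprint: np.ndarray,
                   incoming_audio: np.ndarray,
                   threshold: float = 0.75) -> bool:
    """Naive check: accept any audio whose voiceprint is close enough to the
    enrolled one. A clone trained on the victim's recordings produces an
    embedding near the genuine voiceprint, so it clears the same threshold."""
    candidate = extract_speaker_embedding(incoming_audio)
    return cosine_similarity(enrolled_voiceprint, candidate) >= threshold
```

The weakness is structural: the check only asks whether two voiceprints are similar, not whether the audio was produced by a live human speaker, which is exactly the gap a high-quality clone exploits.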

Voice Synthesis

Voice synthesis involves generating artificial voices from text. Using specialized software, an attacker can create a voice that sounds human but doesn’t belong to any real person. These systems are trained on large volumes of audio data to generate voices that mimic human speech patterns. Unlike voice cloning, synthesized voices may not belong to real individuals but can still be highly convincing. This technique is often used to manipulate automated systems like virtual assistants.
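
As a minimal illustration of the text-in, audio-out pattern, the sketch below uses the open-source pyttsx3 library, which drives the operating system’s built-in voices. The neural text-to-speech systems used in real attacks sound far more natural, but the interface is essentially the same; the message text and output file name are placeholders.

```python
# Minimal text-to-speech sketch using pyttsx3 (offline, OS-provided voices).
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 160)    # speaking speed in words per minute
engine.setProperty("volume", 0.9)  # output volume, 0.0 to 1.0

# Speak a message aloud and also render it to an audio file.
message = "Hello, this is your branch manager. Please confirm the transfer."
engine.say(message)
engine.save_to_file(message, "synthetic_message.wav")
engine.runAndWait()                # blocks until speaking and file rendering finish
```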

Replay Attacks

In replay attacks, cybercriminals record legitimate voice interactions between a person and an authentication system and replay them later to try to gain unauthorized access. This is particularly effective when the authentication system only verifies voice and doesn’t consider other interaction dynamics. Even though modern systems use temporary keys or dynamic voice traits, many are still vulnerable to these attacks.
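
One countermeasure hinted at above is a challenge-response step: the system asks the caller to speak a freshly generated phrase on every attempt, so audio captured in an earlier session no longer matches. Below is a minimal sketch of that idea; the speech recognition and speaker verification a real system would also perform are deliberately left out.

```python
import secrets

WORDS = ["amber", "falcon", "harbor", "nickel", "orchid", "quartz", "tundra", "willow"]

def new_challenge(n_words: int = 3) -> str:
    """Issue a fresh, unpredictable phrase for every authentication attempt."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))

def verify_attempt(expected_phrase: str, spoken_transcript: str) -> bool:
    """Accept only if the caller actually spoke this attempt's phrase.
    In a real system the transcript would come from speech recognition and
    would be combined with speaker verification; both are omitted here."""
    return spoken_transcript.strip().lower() == expected_phrase.lower()

# A replayed recording answers an old prompt, not the current one:
challenge = new_challenge()                            # e.g. "falcon quartz amber"
replayed_transcript = "please approve the payment"     # captured in an earlier session
print(verify_attempt(challenge, replayed_transcript))  # False: the replay is rejected
```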

Voice Modulation Attacks

This technique involves altering the attacker’s voice in real time using voice modulation software. These programs can change pitch, speed, and other parameters on the fly, making the forgery difficult to detect. For instance, an attacker can alter their voice to sound like someone else or even manipulate a voice recording during a live call. Techniques such as pitch shifting and time-stretching allow attackers to infiltrate systems without being flagged by voice authentication tools.
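
As an offline illustration of the two transformations just mentioned, the sketch below uses the librosa and soundfile libraries to pitch-shift and time-stretch a recording; real-time voice changers apply the same operations to a live microphone stream. The input and output file names are placeholders.

```python
# Offline sketch of pitch shifting and time stretching with librosa.
import librosa
import soundfile as sf

# Load a recording at its native sample rate (file name is a placeholder).
y, sr = librosa.load("original_voice.wav", sr=None)

# Raise the pitch by four semitones without changing the speaking speed.
shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)

# Speak about 10% faster without changing the pitch.
stretched = librosa.effects.time_stretch(shifted, rate=1.1)

sf.write("modulated_voice.wav", stretched, sr)
```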

All these techniques—voice cloning, voice synthesis, replay attacks, and voice modulation—fall under the umbrella of voice hacking: a set of methods aimed at compromising authentication and communication systems through voice manipulation or impersonation. Whether replicating a real voice or generating convincing artificial ones, these attacks are a growing threat in a world where voice is increasingly used for identification and control.

How to protect yourself from voice hacking

Gradiant applies advanced artificial intelligence and multimedia forensic analysis technologies to tackle the challenge of deepfakes. In insurance, fintech, media, and other industries where voice authentication and virtual assistants are increasingly integrated, manipulation detection is crucial to prevent fraud such as the CEO scam described earlier. Tools capable of identifying these threats in real time allow companies to protect customer trust and minimize operational and reputational risks.

Gradiant leads research in this field and is developing a suite of multimodal, AI-based tools to combat the threats posed by the malicious use of AI in multimedia content. In an ecosystem where voice deepfakes can compromise digital payments, financial services, or the accuracy of published information, insurers and media outlets need robust solutions to verify the authenticity of interactions and multimedia content. Combining advanced technologies with prevention strategies is key to strengthening security and ensuring operational integrity in the digital world.


This publication is part of fAIr (Fight Fire with fAIr), funded by the European Union through NextGeneration-EU and the Recovery, Transformation and Resilience Plan (PRTR) via INCIBE.


The views and opinions expressed are those of the author(s) and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the European Commission is responsible for them.