Abstract:
The emergence of audio deepfake technologies has raised new concerns about digital security, privacy, and trust in media. Audio deepfakes are AI-generated synthetic audio clips that can accurately imitate a real human voice and can be used for malicious purposes such as voice phishing, impersonation, and misinformation. This research presents a detection system based on Convolutional Neural Networks (CNNs) trained on multiple engineered audio features, including Mel-Frequency Cepstral Coefficients (MFCCs), mel-spectrograms, and chroma features. The system is evaluated on public datasets including ASVspoof 2019, WaveFake, and Fake-or-Real (FoR), and uses preprocessing steps such as normalization, resampling, and fixed-length trimming to standardize the input. The CNN model consists of several convolutional layers, pooling layers, and fully-connected layers, is trained with binary cross-entropy loss, and is evaluated under a cross-validation framework. In testing, the system demonstrated high accuracy and strong generalization across several spoof types. Overall, this study demonstrates the potential of deep-learning-based audio feature analysis to deliver scalable audio deepfake detection suitable for real-time deployment in security, forensic, and media-verification applications.
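As a concrete illustration of the preprocessing and feature-extraction stage summarized above (not the authors' exact code), the following sketch uses librosa; the 16 kHz target sample rate, 4-second clip length, and the 13-MFCC / 128-mel-band settings are all hypothetical parameter choices.

```python
# Sketch of the preprocessing and feature-extraction step described above.
# Assumptions (not from the paper): librosa, a 16 kHz target sample rate,
# 4-second clips, and 13 MFCCs / 128 mel bands as parameter choices.
import numpy as np
import librosa

TARGET_SR = 16_000    # resample everything to a common rate
CLIP_SECONDS = 4      # fixed-length trimming/padding target

def extract_features(path: str) -> dict[str, np.ndarray]:
    # Resample on load so all inputs share one sample rate.
    y, sr = librosa.load(path, sr=TARGET_SR)
    # Peak-normalize the amplitude (guard against silent clips).
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak
    # Trim or zero-pad to a fixed length so feature maps have a fixed shape.
    y = librosa.util.fix_length(y, size=TARGET_SR * CLIP_SECONDS)
    return {
        "mfcc": librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),
        "mel": librosa.power_to_db(
            librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128),
            ref=np.max,
        ),
        "chroma": librosa.feature.chroma_stft(y=y, sr=sr),
    }
```

Resampling and fixed-length trimming matter here because a CNN expects inputs of a constant shape; without them, feature matrices from clips of different lengths or sample rates could not be batched together.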
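Similarly, a minimal Keras sketch of the kind of architecture the abstract describes, stacked convolution and pooling blocks over a mel-spectrogram "image", fully-connected layers, and a sigmoid output trained with binary cross-entropy, could look as follows; all layer counts and sizes are illustrative assumptions, not the paper's reported configuration.

```python
# Minimal CNN sketch for binary real-vs-fake audio classification.
# Input: 128 mel bands x 126 frames (4 s at 16 kHz, hop length 512),
# one channel. Layer sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 126, 1)),
    layers.Conv2D(16, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # estimated probability of "fake"
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```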