While recent advancements in deepfake audio detection and countermeasure strategies show promise, it is crucial to note that many of these solutions have been developed and tested on static audio recordings. These recordings typically last between 2 and 10 seconds and exhibit limited variation in background noise, speaker count, artifacts, and recording conditions.
We refer to models trained on such recordings as static deepfake audio models. These models, however, may not perform consistently in real-time scenarios, such as continuous audio streams from on-device applications or communication platforms. In a Teams group call, for example, static deepfake models may struggle because they are not adapted to the dynamic variations of real-time conversational speech.
This study systematically evaluates the viability of using static deepfake audio detection models in real-time and continuous conversational speech scenarios across communication platforms.
Our audio deepfake models achieved notable results: the leading model reached an EER of 7.39% and a t-DCF score of 0.215 on the ASVspoof 2019 dataset, surpassing the baseline results of the ASVspoof 2019 challenge. When evaluated on real-time, continuous data from our Teams dataset, however, the models' performance degraded, revealing clear weaknesses.
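To make the EER metric concrete: it is the operating point at which a detector's false-acceptance rate (spoofed audio accepted as genuine) equals its false-rejection rate (genuine audio flagged as spoofed). The sketch below is purely illustrative and is not the study's actual evaluation pipeline; the synthetic score distributions are assumptions for demonstration.

```python
import numpy as np

def compute_eer(genuine_scores, spoof_scores):
    """Approximate the Equal Error Rate by sweeping thresholds over
    all observed scores and finding where FAR and FRR cross."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    # False rejection rate: fraction of genuine scores below the threshold.
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of spoof scores at or above the threshold.
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2

# Toy example with synthetic detector scores (illustrative only):
# genuine audio tends to score higher than spoofed audio.
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 1.0, 1000)
spoof = rng.normal(-1.0, 1.0, 1000)
eer = compute_eer(genuine, spoof)
print(f"EER: {eer:.2%}")
```

The more the two score distributions overlap, the higher the EER; a perfect detector would achieve 0%, while random guessing yields 50%.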
The Audio Deepfake Detector project underscores RediMinds’ commitment to advancing AI in a manner that prioritizes security, privacy, and ethical considerations. Through a meticulous, step-by-step approach, we have created a cutting-edge tool that significantly enhances our ability to distinguish between real and manipulated audio. As we move forward, our focus remains on continuous improvement and adaptation to emerging threats, ensuring our solutions remain at the forefront of the fight against digital misinformation.
More details of our work can be found in our associated publication.