Symmetric Saliency-Based Adversarial Attack to Speaker Identification

Jiadi Yao; Xing Chen; Xiao-Lei Zhang; Wei-Qiang Zhang; Kunde Yang

To our knowledge, existing adversarial attack approaches to speaker identification either incur high computational cost or are not very effective. To address this issue, in this letter we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples for speaker identification. It contains two novel components. First, it uses a saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so that the attacker focuses on adding artificial noise to the important samples. Second, it uses an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields state-of-the-art performance, i.e., over 97% targeted attack success rate and a signal-to-noise ratio (SNR) of over 39 dB on both the open-set and closed-set speaker identification tasks, with a low computational cost.
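The two quantities named in the abstract, the signal-to-noise figure in dB and the angle between speaker embeddings, can be illustrated with a minimal sketch. This is not the SSED implementation: the function names `snr_db` and `angle_between` are hypothetical, waveforms and embeddings are assumed to be plain lists of floats, and the exact form of the letter's angular loss is not given here, so the second function only computes the angle that such a loss would operate on.

```python
import math


def snr_db(clean, adversarial):
    """Return the SNR in dB of the clean signal relative to the added noise.

    Hypothetical helper: the noise is taken to be adversarial - clean.
    """
    if len(clean) != len(adversarial):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in clean)
    noise_power = sum((a - x) ** 2 for x, a in zip(clean, adversarial))
    if noise_power == 0:
        return math.inf  # no perturbation was added
    return 10.0 * math.log10(signal_power / noise_power)


def angle_between(u, v):
    """Return the angle in radians between two embedding vectors.

    Hypothetical helper: an angular loss would try to enlarge this angle
    between the adversarial embedding and the source speaker's embedding.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    adversarial = [1.01, -1.01, 1.01, -1.01]
    print(round(snr_db(clean, adversarial), 3))
    print(angle_between([1.0, 0.0], [0.0, 1.0]))
```

A higher SNR means the added noise is weaker relative to the speech, which is why the abstract reports it alongside the attack success rate.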


Speaker recognition is vulnerable to spoofing attacks [1]. Many spoofing attack techniques against speaker recognition have been developed, including replay, voice conversion, impersonation, text-to-speech synthesis, and adversarial attacks [2]. In response, various detection methods [3][4][5] and countermeasures [6] against spoofing attacks are under active development. In this letter, we focus on developing adversarial attacks to speaker identification. An adversarial attack to speaker identification aims to make an identification system wrongly recognize the adversarial voice of a source speaker as a targeted imposter speaker, where the adversarial voice, a.k.a. adversarial example, is produced by adding human-imperceptible noise to the speech of the source speaker. Such attacks pose a serious threat to modern speaker identification systems based on deep learning.
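The definition above, an adversarial example formed by adding noise to the source speech and counted as successful when the system outputs the targeted speaker, can be sketched as follows. The names `make_adversarial` and `targeted_success_rate` are hypothetical, signals are assumed to be lists of floats, and the speaker labels are made-up placeholders rather than data from the letter.

```python
def make_adversarial(clean, noise):
    """Add a perturbation to the source speech, sample by sample.

    Hypothetical helper: how the noise is generated is outside this sketch.
    """
    if len(clean) != len(noise):
        raise ValueError("speech and noise must have the same length")
    return [x + n for x, n in zip(clean, noise)]


def targeted_success_rate(predicted, targets):
    """Return the fraction of trials where the system output the target speaker."""
    if not predicted or len(predicted) != len(targets):
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(1 for p, t in zip(predicted, targets) if p == t)
    return hits / len(predicted)


if __name__ == "__main__":
    adversarial = make_adversarial([0.5, -0.25], [0.25, 0.25])
    print(adversarial)
    # Placeholder labels: what the system predicted vs. the intended targets.
    predicted = ["bob", "bob", "carol", "alice"]
    targets = ["bob", "bob", "carol", "carol"]
    print(targeted_success_rate(predicted, targets))
```

An untargeted attack would instead count a trial as successful whenever the prediction differs from the source speaker; the targeted criterion shown here is the stricter one.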
