An Approach for Identification of Speaker using Deep Learning

  • Syeda Rabia
  • Syed Mujtaba Haider
  • Abdul Basit
Keywords: Speaker Identification, Deep Learning, Automatic Speaker Recognition (ASR)

Abstract

The volume of audio data grows daily across the world with the rise of telephone conversations, video conferences, podcasts, and voice notes. This study presents a mechanism for identifying the speaker in an audio file based on biometric features of the human voice such as frequency, amplitude, and pitch. We propose an unsupervised learning model that uses wav2vec 2.0, in which the model learns speech representations from the dataset provided. We used the LibriSpeech dataset in our research and achieved an error rate of 1.8.
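
The abstract names frequency, amplitude, and pitch as example voice features but does not say how they are measured, and wav2vec 2.0 itself learns representations from the raw waveform rather than from hand-computed features. Purely as an illustration of what two of these quantities mean, the sketch below computes a root-mean-square amplitude and an autocorrelation-based pitch estimate for a synthetic tone. The function names, the 50 to 400 Hz search range, and the test signal are assumptions made for this example, not part of the authors' method.

```python
import math


def rms_amplitude(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency in Hz.

    Picks the lag with the largest autocorrelation inside the
    assumed pitch range [fmin, fmax] (hypothetical defaults).
    """
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the signal at this lag.
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voiced frame: a 200 Hz sine, 0.1 s at 8 kHz.
sample_rate = 8000
tone = [
    0.5 * math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)
]

amplitude = rms_amplitude(tone)
pitch = estimate_pitch(tone, sample_rate)
```

Real speech is far less regular than a pure tone, so an estimator this simple would need framing, voicing detection, and smoothing before it could serve as a speaker feature.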

 

References

[1] Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. NeurIPS 2020.
[2] Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli. Unsupervised Speech Recognition.
[3] Qiantong Xu, Tatiana Likhomanenko, Jacob Kahn, Awni Hannun, Gabriel Synnaeve, Ronan Collobert. Iterative Pseudo-Labeling for Speech Recognition.
[4] Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli. Unsupervised Cross-Lingual Representation Learning for Speech Recognition.
[5] A. Baevski, S. Schneider, and M. Auli. vq-wav2vec: Self-supervised learning of discrete speech representations. ICLR 2020.
[6] Chorowski, Jan, et al. "Unsupervised speech representation learning using wavenet autoencoders." IEEE/ACM transactions on audio, speech, and language processing (2019).
[7] M. Ravanelli, J. Zhong, S. Pascual, P. Swietojanski, J. Monteiro, J. Trmal, and Y. Bengio. Multi-task self-supervised learning for robust speech recognition. arXiv, 2020.
[8] D. Jiang, X. Lei, W. Li, N. Luo, Y. Hu, W. Zou, and X. Li. Improving transformer-based speech recognition using unsupervised pre-training. arXiv, abs/1910.09932, 2019.
[9] D. S. Park, Y. Zhang, Y. Jia, W. Han, C.-C. Chiu, B. Li, Y. Wu, and Q. V. Le. Improved noisy student training for automatic speech recognition. arXiv, abs/2005.09629, 2020.
Published
2023-01-31
How to Cite
Rabia, S., Haider, S., & Basit, A. (2023). An Approach for Identification of Speaker using Deep Learning. International Journal of Artificial Intelligence & Mathematical Sciences, 1(2), 7-11. https://doi.org/10.58921/ijaims.v1i2.36