A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation

Authors

  • Anup Kadam, Indian Institute of Information Technology, Pune
  • Sagar Rane, Indian Institute of Information Technology, Pune
  • Arpit Kumar Mishra, Indian Institute of Information Technology, Pune
  • Shailesh Kumar Sahu, Indian Institute of Information Technology, Pune
  • Shubham Singh, Indian Institute of Information Technology, Pune
  • Shivam Kumar Pathak, Indian Institute of Information Technology, Pune

DOI:

https://doi.org/10.4108/eai.14-4-2021.169187

Keywords:

Video Synthesis, Voice Cloning, Lip Synchronization, Video Generation Application

Abstract

Fields such as media, education, and the corporate sector have begun to focus on content creation, which has led to strong demand for synthetic media generated from limited data. To synthesize a high-quality artificial video, the lip movements must be synchronized with the audio. This survey compares methods for voice cloning and lip synchronization. The voice-cloning methods covered include state-of-the-art approaches such as WaveNet and other text-to-speech techniques. The lip-synchronization methods are grouped into constrained and unconstrained approaches, and recent work such as LipGAN and Wav2Lip is discussed. The methods are compared and the best-performing one is suggested. Beyond the comparison, the survey covers the methods' drawbacks, future scope, and applications, along with the social and ethical issues they raise.

Published

14-04-2021

How to Cite

1. Kadam A, Rane S, Mishra AK, Sahu SK, Singh S, Pathak SK. A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation. EAI Endorsed Trans Creat Tech [Internet]. 2021 Apr. 14 [cited 2024 Apr. 18];8(28):e2. Available from: https://publications.eai.eu/index.php/ct/article/view/1417