In a stride toward blending reality with simulation, a research team from the School of Computer Science and Engineering (SCSE) at Nanyang Technological University, Singapore (NTU) has introduced an artificial intelligence (AI) program named "DIverse yet Realistic Facial Animations" (DIRFA).
The program takes an audio clip and a static photo of a face and produces a video of that person speaking, with realistic 3D facial expressions and head movements synchronized to the audio.
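To make the dataflow concrete, here is a minimal, hypothetical Python sketch of the kind of pipeline the article describes: audio clip and photo in, video frames out. Every function and parameter name below is an assumption for illustration, not DIRFA's actual code.

```python
# Hypothetical sketch of an audio-driven talking-face pipeline.
# None of these names come from DIRFA; they only illustrate the dataflow
# described in the article: audio clip + static photo -> video frames.
import numpy as np

FPS = 25              # output video frame rate (assumed)
SAMPLE_RATE = 16_000  # audio sample rate (assumed)

def audio_to_features(audio: np.ndarray) -> np.ndarray:
    """Chop the waveform into one window per video frame and summarize it.

    Real systems use richer features (e.g. mel-spectrograms); per-frame
    RMS energy is enough here to show the shape of the data.
    """
    samples_per_frame = SAMPLE_RATE // FPS
    n_frames = len(audio) // samples_per_frame
    windows = audio[: n_frames * samples_per_frame].reshape(n_frames, -1)
    return np.sqrt((windows ** 2).mean(axis=1, keepdims=True))  # (n_frames, 1)

def features_to_animation(feats: np.ndarray) -> np.ndarray:
    """Stand-in for the learned audio-to-animation mapping.

    Returns one row of animation parameters per frame, e.g.
    [mouth_openness, head_yaw, head_pitch]. DIRFA learns this mapping
    from data; a fixed linear map keeps the sketch runnable.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(size=(feats.shape[1], 3))
    return feats @ w

def render_frames(photo: np.ndarray, anim: np.ndarray) -> list[np.ndarray]:
    """Stand-in renderer: a real system warps the photo per frame."""
    return [photo.copy() for _ in anim]  # one frame per animation row

# Usage: one second of silence plus a dummy 64x64 RGB photo.
audio = np.zeros(SAMPLE_RATE, dtype=np.float32)
photo = np.zeros((64, 64, 3), dtype=np.uint8)
frames = render_frames(photo, features_to_animation(audio_to_features(audio)))
print(len(frames), "frames at", FPS, "fps")  # -> 25 frames at 25 fps
```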
Realistic Talking AI Faces
Unlike existing approaches, which struggle with pose variation and emotional control, DIRFA was trained on more than one million audiovisual clips from over 6,000 individuals drawn from an open-source database, according to the research team.
The training objective was to predict cues from speech and map them to matching facial expressions and head movements, yielding a program that can generate highly realistic talking-face videos.
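In other words, the system learns a mapping from audio features to facial-animation parameters using paired audiovisual data. The sketch below illustrates that supervised idea with a tiny regressor trained on synthetic data; it is a toy stand-in under stated assumptions (feature and parameter sizes are invented), not DIRFA's architecture or training recipe.

```python
# Hypothetical sketch: learn a speech-feature -> animation-parameter mapping
# by regression on paired data, the supervised idea behind audio-driven
# talking-face models. Synthetic data stands in for real audiovisual clips.
import torch
import torch.nn as nn

torch.manual_seed(0)
# e.g. 80 mel bins in, 6 pose/lip parameters out (assumed sizes)
N_FRAMES, AUDIO_DIM, ANIM_DIM = 1024, 80, 6

# Synthetic "dataset": in reality these pairs come from audiovisual clips,
# with animation parameters extracted from the video track.
audio_feats = torch.randn(N_FRAMES, AUDIO_DIM)
true_map = torch.randn(AUDIO_DIM, ANIM_DIM) / AUDIO_DIM ** 0.5
anim_params = audio_feats @ true_map + 0.05 * torch.randn(N_FRAMES, ANIM_DIM)

model = nn.Sequential(            # small MLP regressor, frame by frame
    nn.Linear(AUDIO_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, ANIM_DIM),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(500):
    pred = model(audio_feats)
    loss = nn.functional.mse_loss(pred, anim_params)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(f"step {step:3d}  mse {loss.item():.4f}")
```

Real systems also condition on temporal context so expressions evolve smoothly across frames; a per-frame model is used here only to keep the sketch short.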
Assoc Prof Lu Shijian, the corresponding author and leader of the study, emphasized DIRFA's potential impact on multimedia communication, saying the program could revolutionize the field by combining AI and machine learning techniques.
"Our program also builds on previous studies and represents an advancement in the technology, as videos created with our program are complete with accurate lip movements, vivid facial expressions and natural head poses, using only their audio recordings and static images," the lead author said in a statement.
Dr. Wu Rongliang, the first author of the study and a Ph.D. graduate of NTU's SCSE, highlighted the complexity of speech variation and the wealth of information speech conveys beyond its linguistic content. He described the team's approach as a pioneering effort in audio representation learning within AI and machine learning.
Potential Applications
The researchers believe that DIRFA's applications could extend across various industries and domains, including healthcare. By enabling more sophisticated and realistic virtual assistants and chatbots, the program could significantly enhance user experiences.
Furthermore, DIRFA could become a valuable tool for individuals with speech or facial disabilities, facilitating communication through expressive avatars or digital representations.
Assoc Prof Lu noted that DIRFA's interface still needs refinement, particularly to give users more control over certain outputs. Although the program already generates talking faces with accurate lip movements, vivid facial expressions, and natural head poses, the team said it is committed to refining its features and expanding its capabilities.
Looking ahead, the NTU researchers plan to fine-tune DIRFA's facial expressions using a broader range of datasets that incorporate more varied facial expressions and voice audio clips.
The study, titled "Audio-driven talking face generation with diverse yet realistic facial animations", was published in the journal Pattern Recognition.