New research from the University of Michigan explores the potential of AI to interpret the meaning behind dog barks, distinguishing between playful and aggressive tones.
According to the research team, the AI models used in this study can also extract other information from animal sounds, such as age, breed, and sex.
This research, conducted in collaboration with Mexico's National Institute of Astrophysics, Optics and Electronics (INAOE) in Puebla, showed that AI models initially designed for human speech could be modified to understand animal communication.
How Can AI Decode Dog Barks, Arfs, and Woofs?
Rada Mihalcea, a leading professor of computer science and engineering at the University of Michigan, emphasized that using speech processing models first trained on human language opens new possibilities for understanding the nuances of dog barks.
She pointed out that while much about animal communication remains unknown, advances in AI can significantly enhance our comprehension of it without needing to start from scratch.

One of the primary challenges in creating AI models for analyzing animal sounds is the scarcity of publicly available data.
Artem Abzaliev, the lead author and a doctoral student at the University of Michigan, noted that animal sounds must often be passively recorded in natural settings or, in the case of domestic pets, with the owner's permission.
This logistical difficulty in obtaining data has hindered progress in creating effective AI models for animal vocalization analysis. Despite these obstacles, the researchers found a way to repurpose an existing AI model originally designed for human speech.
This strategy allowed them to utilize robust models that are the foundation of many current voice-enabled technologies, such as voice-to-text and language translation. Abzaliev explained that these models can learn and encode the complex patterns of human language and speech.
The team wanted to explore whether this capability could be extended to deciphering and interpreting dog barks. They used a dataset of sounds recorded from 74 dogs of various breeds, ages, and sexes across a range of situations.
Wav2Vec2 For Understanding Dogs
Humberto Pérez-Espinosa, a collaborator at INAOE, gathered the dataset. Abzaliev then leveraged these recordings to refine a machine-learning model that recognizes patterns in large datasets.
The model chosen for this task was Wav2Vec2, initially trained using human speech data. The AI model was trained to identify different features of the dog barks, such as whether they indicated playfulness or aggression.
This approach proved promising, showing that AI models trained on human speech can be effectively adapted to understand animal vocalizations. The researchers believe these models can reveal much about animal communication that remains unexplored.
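The general setup described above can be sketched with the Hugging Face `transformers` implementation of Wav2Vec2. The snippet below is a minimal illustration, not the study's actual code: it builds a tiny, randomly initialized Wav2Vec2 classifier (the researchers instead fine-tuned a model pretrained on human speech) and runs a one-second dummy waveform through it to produce a two-way prediction. The configuration sizes and the playful-vs-aggressive label names are assumptions made for the sketch.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

# Tiny configuration so the model can be built offline for illustration;
# real work would start from weights pretrained on human speech.
config = Wav2Vec2Config(
    hidden_size=32,          # small transformer width for the sketch
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),       # shrunken convolutional feature extractor
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
    num_labels=2,            # e.g. 0 = playful, 1 = aggressive (assumed labels)
)
model = Wav2Vec2ForSequenceClassification(config)
model.eval()  # disable training-time augmentation such as SpecAugment

# A stand-in for one second of bark audio sampled at 16 kHz.
waveform = torch.randn(1, 16000)

with torch.no_grad():
    logits = model(input_values=waveform).logits  # shape: (batch, num_labels)

labels = ["playful", "aggressive"]
pred = logits.argmax(dim=-1).item()
print(labels[pred])
```

In practice one would start from pretrained human-speech weights, for example `Wav2Vec2ForSequenceClassification.from_pretrained("facebook/wav2vec2-base", num_labels=2)`, and fine-tune on labeled bark recordings, which is the transfer-learning idea the study demonstrates.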
"This is the first time that techniques optimized for human speech have been built upon to help with the decoding of animal communication," Mihalcea said in a statement.
"Our results show that the sounds and patterns derived from human speech can serve as a foundation for analyzing and understanding the acoustic patterns of other sounds, such as animal vocalizations," she added.
The study's findings were recently shared at the Joint International Conference on Computational Linguistics, Language Resources and Evaluation and are available on the arXiv preprint server.