Stability AI, known for its popular AI art generator Stable Diffusion, has introduced a new AI model for generating sound effects and short musical samples.
This model, named Stable Audio Open, can produce up to 47 seconds of audio from text descriptions. Unlike many other generative audio models, it was trained exclusively on royalty-free recordings from FreeSound and the Free Music Archive.
"We're excited to announce Stable Audio Open, an open source model optimised for generating short audio samples, sound effects and production elements using text prompts," Stability AI said in a statement.
"This release marks a key milestone as we further open portions of our generative audio capabilities to empower sound designers, musicians and creative communities," it added.
Stable Audio Open: Generative AI for Musicians
Like other text-to-audio generators, Stable Audio Open takes a text prompt and generates a matching audio recording. Trained on roughly 486,000 samples from royalty-free music libraries, the model is meant to be a versatile tool for creating a wide range of audio elements.
Users can generate drum beats, instrument riffs, ambient noises, and other production elements for media projects such as videos, films, and TV shows. The model can also be used to edit existing songs or to apply the style of one genre to another.
Stability AI pitches Stable Audio Open as a resource for sound designers, musicians, and other creative professionals, letting them create high-quality audio from simple text prompts. Because of its specialized training, the model is particularly well suited to generating the kinds of sounds that are essential to music production and sound design.
One of the main advantages of this open-source release is that users can fine-tune the model on their own audio data. For instance, a drummer could fine-tune it on recordings of their own playing to produce new beats in their style.
This gives creators greater customization and flexibility, letting them generate unique audio that matches their specific needs and artistic vision.
Commercial Version of Stable Audio
The commercial version of Stable Audio offers more advanced capabilities, including generating full tracks with coherent musical structures up to three minutes in length, audio-to-audio generation, and multi-part musical compositions.
Stable Audio Open, by contrast, focuses on shorter audio samples and is not optimized for creating complete songs, melodies, or vocals.
Despite this limitation, the open-source model offers useful insight into generative AI for sound design, and Stability AI presents it as part of a commitment to responsible development alongside creative communities.
The training data for Stable Audio Open came from FreeSound and the Free Music Archive, which, according to Stability AI, ensures that the model respects creators' rights while still providing a robust dataset for audio generation.
Stability AI encourages sound designers, musicians, developers, and audio enthusiasts to download the model from Hugging Face, explore its features, and share their feedback.