In this video I share a quick demo of the music generation engine that you can try yourself:
Demo: https://huggingface.co/spaces/facebook/MusicGen
GitHub: https://github.com/facebookresearch/audiocraft
MusicGen is a single-stage auto-regressive Transformer model trained over a 32kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz. Unlike existing methods like MusicLM, MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the model can predict them in parallel, requiring only 50 auto-regressive steps per second of audio.
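To make the delay trick concrete, here is a minimal sketch (names and the `PAD` convention are illustrative, not taken from the audiocraft codebase) of how shifting codebook k right by k steps interleaves the 4 codebooks into a single autoregressive sequence:

```python
# Hypothetical sketch of MusicGen's codebook delay pattern. With 4
# codebooks and a one-step delay between consecutive codebooks, the
# token for codebook k at frame t is emitted at autoregressive step
# t + k, so all 4 codebooks are predicted in one pass of ~50 steps
# per second of audio (at 50 Hz).

PAD = -1  # placeholder for positions not yet (or never) emitted

def apply_delay_pattern(codes, n_codebooks=4):
    """Shift codebook k right by k steps and return the interleaved grid.

    codes: list of n_codebooks lists, each of length n_frames.
    The result spans n_frames + n_codebooks - 1 autoregressive steps.
    """
    n_frames = len(codes[0])
    n_steps = n_frames + n_codebooks - 1
    grid = [[PAD] * n_steps for _ in range(n_codebooks)]
    for k in range(n_codebooks):
        for t in range(n_frames):
            grid[k][t + k] = codes[k][t]
    return grid

# 3 frames of toy tokens for each of the 4 codebooks
codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
grid = apply_delay_pattern(codes)
# At step 0 only codebook 0's first token exists; codebook 3's first
# token appears at step 3. Total steps: 3 + 4 - 1 = 6.
```

The payoff of the pattern is that each step still predicts 4 tokens (one per codebook) in parallel, but each codebook can condition on the previous codebooks' slightly earlier frames, avoiding the multiple cascaded models that earlier systems needed.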