Riffusion uses Stable Diffusion to make music. The creators, Seth Forsgren and Hayk Martiros, fine-tuned it to generate audio spectrograms. The resulting images can be played back as sound, producing (decidedly lo-fi) magic and, one suspects, a whole new batch of copyright lawyers' eyes glinting in the darkness of their recuperation sarcophagi.
From the about page:
Really? Yup.
This is the v1.5 stable diffusion model with no modifications, just fine-tuned on images of spectrograms paired with text. Audio processing happens downstream of the model.
It can generate infinite variations of a prompt by varying the seed. All the same web UIs and techniques like img2img, inpainting, negative prompts, and interpolation work out of the box.
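To make the "audio processing happens downstream" part a little more concrete: a spectrogram image stores only how loud each frequency is over time, not the phase, so turning it back into sound means estimating the phase. One classic way to do that is the Griffin-Lim algorithm. The sketch below is a minimal NumPy version and is not Riffusion's actual pipeline. The helper `image_to_magnitude`, its linear pixel scaling, and the `N_FFT` and `HOP` values are all assumptions made for illustration; the real encoding may well differ.

```python
import numpy as np

N_FFT = 512   # FFT size (assumed for this sketch)
HOP = 128     # hop between frames (assumed; N_FFT / 4)


def image_to_magnitude(pixels, max_magnitude=1.0):
    """Hypothetical helper: map an 8-bit grayscale image to magnitudes.

    Assumes rows are frequency with the highest frequency at the top,
    columns are time, and brightness scales linearly with magnitude.
    """
    flipped = pixels[::-1]  # put the lowest frequency in row 0
    return flipped.astype(float) / 255.0 * max_magnitude


def stft(signal, n_fft=N_FFT, hop=HOP):
    """Short-time Fourier transform, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window
    n_frames = frames.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by ~0 at the very edges of the signal.
    return out / np.maximum(norm, 1e-8)


def griffin_lim(magnitude, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between the time and
    # frequency domains, keeping the known magnitude each round.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase)
        rebuilt = stft(audio)
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase)
```

Under these assumptions, an image with `N_FFT // 2 + 1` rows could be played back with `griffin_lim(image_to_magnitude(pixels))`. The result is an approximation, which fits the lo-fi character of the output: the magnitudes are kept, but the phase is only a guess.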