In this paper, we present Visualyre, a web application that synthesizes images based on the semantics of a song's lyrics and the mood of its music.

We use a multimodal approach: initial images are generated from the lyrics (text) of a song with a text-to-image generative model, then refined by a style transfer model conditioned on the mood of the music (audio).

Our target users are independent music artists: the application gives composers and songwriters a means to generate suitable images for their music, such as album covers.

We discuss possible uses of such an application, as well as improvements for future iterations.

This paper has been accepted for publication in the proceedings of the Audio Mostly 2021 conference.