In this work, we introduce a novel task, playlist context prediction. From a large corpus of playlist titles, we manually curate a subset of multilingual labels referring to user activities (e.g. “jogging”, “meditation”, “au calme”), which we use as the target labels in the prediction task.
We explore different approaches to computing track-level contextual semantic embeddings and aggregating them into a playlist representation, from which we predict the playlist context. Our baseline results show that the task can be addressed with a simple framework using either audio information or the distributional similarity of tracks, measured through track-context co-occurrences.
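To make the aggregate-then-predict idea concrete, the following is a minimal sketch rather than the framework evaluated in this work. It assumes that track embeddings are already available as fixed-size vectors, uses mean pooling as one possible aggregation, and swaps in a nearest-centroid classifier as a stand-in predictor. The function names (`playlist_embedding`, `fit_centroids`, `predict_context`) and the two-dimensional toy vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def playlist_embedding(track_vectors):
    """Mean-pool track embeddings into one L2-normalised playlist vector.

    Mean pooling is an assumption here; other aggregations are possible.
    """
    pooled = np.mean(np.asarray(track_vectors, dtype=float), axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


def fit_centroids(playlists, labels):
    """Compute one centroid per context label from labelled playlists."""
    centroids = {}
    for label in sorted(set(labels)):
        vectors = [
            playlist_embedding(tracks)
            for tracks, lab in zip(playlists, labels)
            if lab == label
        ]
        centroids[label] = np.mean(vectors, axis=0)
    return centroids


def predict_context(track_vectors, centroids):
    """Return the label whose centroid has the largest dot product with the playlist."""
    query = playlist_embedding(track_vectors)
    return max(centroids, key=lambda label: float(query @ centroids[label]))


# Toy data (hypothetical): each playlist is a list of 2-d track embeddings.
train_playlists = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.8, 0.2], [1.0, 0.0]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.2, 0.8], [0.0, 1.0]],
]
train_labels = ["jogging", "jogging", "meditation", "meditation"]

centroids = fit_centroids(train_playlists, train_labels)
predicted = predict_context([[0.95, 0.05], [0.7, 0.3]], centroids)
print(predicted)
```

In this sketch the track vectors could come from either source mentioned above (audio features or track-context co-occurrence statistics); the aggregation and prediction steps do not depend on which one is used.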