Cover song detection has mainly been addressed using audio-based systems. Despite their potential impact in industrial contexts, low performance and a lack of scalability have prevented such systems from being adopted in practice.

In this work, we study how textual music information (such as metadata and lyrics) can be used along with audio for the large-scale cover identification problem. We show that these methods can significantly increase the accuracy and scalability of cover detection systems on the Million Song Dataset (MSD) and Second Hand Song datasets.

By leveraging standard tf-idf based text similarity measures on song titles and lyrics, we achieved an absolute increase of 35.5% in mean average precision compared to the current state-of-the-art methods on MSD. These experimental results suggest that new methodologies leveraging more sophisticated NLP-based techniques should be encouraged in the MIR community to improve current cover identification systems.
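
As an illustration of the kind of text similarity measure mentioned above, the following is a minimal sketch of tf-idf weighting with cosine similarity, used to rank candidate covers for a query song. It is not the system evaluated in this work: the whitespace tokenizer, the smoothed idf formula, and the function names (`tfidf_vectors`, `cosine`, `rank_candidates`) are assumptions made for the example, and each document is assumed to be a song's title and lyrics joined into one string.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely normalize punctuation and remove stop words as well.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed idf (one common variant, chosen here for illustration).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_idx, docs):
    """Rank all other documents by tf-idf cosine similarity to the query."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_idx]
    scores = [
        (i, cosine(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_idx
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates(0, docs)` returns `(index, score)` pairs sorted from most to least similar to the first song. Songs sharing many title or lyric terms with the query rank first, and such a ranked list is what a mean average precision evaluation would then score.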

This work has been presented at the Late-Breaking Demo session of the 2018 ISMIR conference.