Week 1 - Moodify: Detecting the Mood of Music

Emir Kaan Kırmacı
BBM406 Spring 2021 Projects
Apr 11, 2021 · 4 min read


Hello everyone, and welcome to the first blog post for our BBM406 project — Moodify! In this post, we’ll introduce the project: what it’s about and how we’re planning to implement it. We’ll also cover what we plan to work on before next week’s post.

Introduction

In our daily lives, there are times when we feel furious and simply want to listen to some calm songs to shed our anger and feel better. Other times we feel happy and energetic, and we’d like to listen to more thrilling songs. Because of this, when we boot up our favorite music application, we don’t want it to play a song that would ruin our mood, do we? That’s where the idea of Moodify comes in.

The Idea of the Project

In this project, we’re going to design a classifier that predicts the dominant mood of a given music sample. We’re planning to use a Deep Learning model that takes raw audio features (such as Mel spectrograms) as input to do the classification. We may also use text data in the form of lyrics alongside the audio data to create a hybrid model. There’s already quite a bit of research on both audio and text data, but from what we’ve seen, text is more common in the literature. We think raw audio is more interesting to work on: instead of handling lyrics the way we would in an NLP project, we get to work with the sound of the music itself.
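To make the “raw audio features” idea concrete, here is a minimal NumPy-only sketch of how a log-Mel spectrogram can be computed from a waveform. In practice a library like librosa would do this for us; the parameter values below (FFT size, hop length, number of Mel bands) are illustrative defaults we picked for the sketch, not final project choices:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=64):
    """Log-Mel spectrogram from a raw waveform (NumPy-only sketch)."""
    # Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame: (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel filterbank, equally spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    # Pool FFT bins into Mel bands and compress with log
    return np.log(power @ fb.T + 1e-10)  # (n_frames, n_mels)

# Example: one second of a 440 Hz tone sampled at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr)  # shape: (42, 64)
```

The resulting 2-D array of (time frame × Mel band) energies is exactly the kind of image-like input a CNN can consume.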

One of the reasons for picking this subject is that music mood detection can be applied in a variety of areas. For example, it fits naturally into Recommender Systems for, as the name suggests, recommending music to people based on their mood. Because of this, it could also help improve the recommendation systems of applications like Spotify.

Implementation and Related Works

While reviewing works related to our project, we saw that both Machine Learning- and Deep Learning-based methods have been used. From what we’ve seen, the papers using classical Machine Learning methods [1, 2] mostly worked with lyrics and therefore relied on methods that perform well for text classification, such as the Naïve Bayes classifier or Support Vector Machines. In comparison, the Deep Learning-based methods [3, 4] mostly used raw audio data, with networks such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks with the Long Short-Term Memory (LSTM) architecture. Since we mainly want to use raw audio data, we decided on a Deep Learning approach. We haven’t decided which model to use just yet, so we need to check the literature to get a clearer idea. For now, though, we think that using a CNN and processing the audio as if it were an image would be effective, given how good CNNs are at detecting patterns in classification problems. From what we’ve seen, using Mel spectrograms as the CNN’s input would be a good start [5].
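As a rough sketch of what such a model might look like, here is a tiny PyTorch CNN that treats a Mel spectrogram as a one-channel image. PyTorch, the layer sizes, and the `MoodCNN` name are all placeholders of our own; only the 7 output classes correspond to the mood labels we’re considering:

```python
import torch
import torch.nn as nn

class MoodCNN(nn.Module):
    """Minimal CNN that classifies a 1-channel Mel spectrogram into moods."""
    def __init__(self, n_moods=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Pool to a fixed grid so clips of any length map to one feature size
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_moods)

    def forward(self, x):            # x: (batch, 1, n_mels, n_frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)    # (batch, n_moods) logits

model = MoodCNN()
# A batch of 2 spectrograms, 128 Mel bands x 130 time frames
logits = model(torch.randn(2, 1, 128, 130))  # shape: (2, 7)
```

The adaptive pooling layer is one simple way to handle music clips of varying length without cropping or padding, though we haven’t settled on that design yet.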

Dataset

For our dataset, we would like to use Google Research’s AudioSet ontology [6], which includes 16,955 music videos labeled with 7 different moods (Happy, Funny, Sad, Tender, Exciting, Angry, Scary); the audio samples can also be accessed via YouTube. Although the dataset is huge and has the mood labels we need, it doesn’t ship raw audio files; it comes with already-extracted features instead, and we don’t know whether those features will be compatible with the model we choose. Manually downloading all the audio from YouTube also seems like a challenging task, so we still need to decide which dataset to use.

Plans for the Next Week

For the next week, we plan to work on these topics:

  • Do a more detailed literature review to find out which model to use
  • After choosing the model, decide on which features would be suitable to use
  • Decide on which dataset to use

That’s it for this week. Stay tuned for our next post, and have an amazing week!

Emir Kaan Kırmacı, Tuna Karacan, Cihad Özcan

References

[1] T. Dang & K. Shirai. (2009). “Machine Learning Approaches for Mood Classification of Songs toward Music Search Engine.”

[2] S. Raschka. (2016). “MusicMood: Predicting the mood of music from song lyrics using machine learning.”

[3] M. Malik et al. (2017). “Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition.”

[4] R. Delbouys et al. (2018). “Music Mood Detection Based on Audio and Lyrics with Deep Neural Net.”

[5] Valerio Velardo, “The Sound of AI” (YouTube channel), for more audio-related AI material.

[6] Google Research, AudioSet ontology. Available at: https://research.google.com/audioset/ontology/music_mood_1.html
