Week 1 - Moodify: Detecting the Mood of Music

Emir Kaan Kırmacı
BBM406 Spring 2021 Projects
Apr 11, 2021 · 4 min read


Hello everyone, and welcome to the first blog post for our BBM406 project — Moodify! In this post, we’ll introduce the project: what it’s about and how we’re planning to implement it. We’ll also cover what we plan to work on before next week’s post.

Introduction

In our daily lives, there are times when we feel furious and simply want to listen to some calm songs to shed our anger and feel better. Other times we feel happy and energetic, and we’d like to listen to more thrilling songs. Because of this, when we boot up our favorite music application, we don’t want it to play a song that would ruin our mood, do we? That’s where the idea of Moodify comes in.

The Idea of the Project

In this project, we’re going to design a classifier that predicts the dominant mood of a given music sample. We’re planning to use a Deep Learning model that takes raw audio features (such as Mel spectrograms) as input to do the classification. We may also use text data in the form of lyrics alongside the audio data to create a hybrid model. There’s already quite a bit of research on both audio and text data, but from what we’ve seen, text is more common in the literature. We think raw audio is more interesting to work on: instead of handling lyrics the way we would in an NLP project, we get to work with the sound of the music itself.
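To make the “raw audio features” idea concrete, here is a minimal NumPy-only sketch of how a log-Mel spectrogram can be computed from a waveform. In practice a library like librosa would do this for us; the parameter values below (FFT size, hop length, number of Mel bands) are illustrative defaults we picked for the sketch, not final project choices:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=64):
    """Log-Mel spectrogram from a raw waveform (NumPy-only sketch)."""
    # Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum of each frame: (n_frames, n_fft // 2 + 1)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel filterbank, equally spaced on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    # Pool FFT bins into Mel bands and compress with log
    return np.log(power @ fb.T + 1e-10)  # (n_frames, n_mels)

# Example: one second of a 440 Hz tone sampled at 22.05 kHz
sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr)  # shape: (42, 64)
```

The resulting 2-D array of (time frame × Mel band) energies is exactly the kind of image-like input a CNN can consume.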

One of the reasons for picking this subject is that music mood detection can be applied in a variety of areas. For example, it fits naturally into Recommender Systems for, as the name suggests, recommending music to people based on their mood. Because of this, it could also help improve the recommendation systems of applications like Spotify.

Implementation and Related Works

While reviewing works related to our project, we saw that both Machine Learning- and Deep Learning-based methods have been used. From what we’ve seen, the papers using classical Machine Learning methods [1, 2] mostly worked with lyrics and therefore relied on methods that perform well for text classification, such as the Naïve Bayes classifier or Support Vector Machines. In comparison, the Deep Learning-based methods [3, 4] mostly used raw audio data, with networks such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks with the Long Short-Term Memory (LSTM) architecture. Since we mainly want to use raw audio data, we decided on a Deep Learning approach. We haven’t decided which model to use just yet, so we need to check the literature to get a clearer idea. For now, though, we think that using a CNN and processing the audio as if it were an image would be effective, given how good CNNs are at detecting patterns in classification problems. From what we’ve seen, using Mel spectrograms as the CNN’s input would be a good start [5].
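As a rough sketch of what such a model might look like, here is a tiny PyTorch CNN that treats a Mel spectrogram as a one-channel image. PyTorch, the layer sizes, and the `MoodCNN` name are all placeholders of our own; only the 7 output classes correspond to the mood labels we’re considering:

```python
import torch
import torch.nn as nn

class MoodCNN(nn.Module):
    """Minimal CNN that classifies a 1-channel Mel spectrogram into moods."""
    def __init__(self, n_moods=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Pool to a fixed grid so clips of any length map to one feature size
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_moods)

    def forward(self, x):            # x: (batch, 1, n_mels, n_frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)    # (batch, n_moods) logits

model = MoodCNN()
# A batch of 2 spectrograms, 128 Mel bands x 130 time frames
logits = model(torch.randn(2, 1, 128, 130))  # shape: (2, 7)
```

The adaptive pooling layer is one simple way to handle music clips of varying length without cropping or padding, though we haven’t settled on that design yet.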

Dataset

For our dataset, we would like to use Google Research’s AudioSet ontology [6], which includes 16,955 music videos labeled with 7 different moods (Happy, Funny, Sad, Tender, Exciting, Angry, Scary); the audio samples can also be accessed via YouTube. Although the dataset is huge and has the mood labels we need, it doesn’t ship raw audio files; it comes with already-extracted features instead, and we don’t know whether those features will be compatible with the model we choose. Manually downloading all the audio from YouTube also seems like a challenging task, so we still need to decide which dataset to use.

Plans for the Next Week

For the next week, we plan to work on these topics:

  • Do a more detailed literature review to find out which model to use
  • After choosing the model, decide on which features would be suitable to use
  • Decide on which dataset to use

That’s it for this week. Stay tuned for our next post, and have an amazing week!

Emir Kaan Kırmacı, Tuna Karacan, Cihad Özcan

References

[1] T. Dang & K. Shirai. (2009). “Machine Learning Approaches for Mood Classification of Songs toward Music Search Engine.”

[2] S. Raschka. (2016). “MusicMood: Predicting the mood of music from song lyrics using machine learning.”

[3] M. Malik et al. (2017). “Stacked Convolutional and Recurrent Neural Networks for Music Emotion Recognition.”

[4] R. Delbouys et al. (2018). “Music Mood Detection Based on Audio and Lyrics with Deep Neural Net.”

[5] Valerio Velardo, “The Sound of AI” (YouTube channel), for more audio-related AI material.

[6] Google Research, AudioSet ontology. Available at: https://research.google.com/audioset/ontology/music_mood_1.html
