Overview
We created a machine learning model that predicts the genre of a song entered by the user. The model was trained on a dataset of over 200,000 Spotify songs covering 24 genres and 15 audio features (e.g. acousticness, danceability, energy, loudness). The application is served as a Flask app.
Recognition
Awarded "The Most Technically Challenging" and voted "Class Favorite"
Intent
I have long been passionate about individualized music playlists and the algorithms behind Spotify's song recommendations. While Spotify helps me discover many new artists, I consume music far faster than I can find a song that exactly matches my "mood" at the time. My "mood" varies depending on the season, the time of day, current events, and whether I am listening solo or with others. When I listen solo, I sometimes settle into a particular rhythm and want to keep listening at the same tempo (beats per minute) with very little variation. On Day 2 of the six-month data analytics bootcamp, I pitched the idea of using a machine learning model to write new music to my classmates, who later became my close friends.
Because we had only three weeks to execute and debug the final project, we limited its scope to predicting a music genre from an input song. The web experience lets the user type in any song that exists on Spotify. The system then runs the song through a trained machine learning model based on a Random Forest Classifier and outputs the most relevant predicted genre, displayed alongside the song's actual genre per Spotify.
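The flow described above (look up a song, arrange its audio features, ask the model for a genre, and show it next to Spotify's genre) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `FEATURE_ORDER`, `to_feature_vector`, `predict_genre`, `ThresholdModel`, and `fake_lookup` are all hypothetical names, the catalog entry is made-up data, and the stand-in model only mimics the `predict()` interface of a trained classifier.

```python
from typing import Callable, Dict, List

# Hypothetical subset and ordering of features; the project used 15 audio features.
FEATURE_ORDER: List[str] = ["acousticness", "danceability", "energy", "loudness"]


def to_feature_vector(audio_features: Dict[str, float]) -> List[float]:
    """Arrange a song's audio features in the order the model expects."""
    missing = [name for name in FEATURE_ORDER if name not in audio_features]
    if missing:
        raise ValueError(f"missing audio features: {missing}")
    return [float(audio_features[name]) for name in FEATURE_ORDER]


def predict_genre(song: str, lookup: Callable[[str], dict], model) -> Dict[str, str]:
    """Return the model's predicted genre next to the genre stored for the song."""
    record = lookup(song)  # assumed to return {"genre": ..., "features": {...}}
    vector = to_feature_vector(record["features"])
    predicted = model.predict([vector])[0]  # predict() takes a list of rows
    return {
        "song": song,
        "predicted_genre": predicted,
        "actual_genre": record["genre"],
    }


class ThresholdModel:
    """Stand-in for the trained classifier (assumption, for illustration only)."""

    def predict(self, rows):
        # Index 2 is "energy" in FEATURE_ORDER.
        return ["Electronic" if row[2] > 0.7 else "Soul" for row in rows]


def fake_lookup(title: str) -> dict:
    """Stand-in for a Spotify lookup; the catalog entry is invented."""
    catalog = {
        "Example Song": {
            "genre": "Electronic",
            "features": {
                "acousticness": 0.1,
                "danceability": 0.8,
                "energy": 0.9,
                "loudness": -5.0,
            },
        }
    }
    return catalog[title]


result = predict_genre("Example Song", fake_lookup, ThresholdModel())
print(result)
```

In a web app, a function like `predict_genre` would sit behind the route that handles the user's search, with the real lookup and the trained model passed in place of the stand-ins.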
Although this final project was awarded "The Most Technically Challenging", it is only the beginning of a more complex system that will generate license-free music based on user input and a dataset.
Project Presentation and Live Demonstration
* A GitHub repository is where the open source community stores source files for projects and research, across various programming languages, for public access and contribution.
Highlights
1. Large data set: we applied data cleaning, processing, and analysis to 232K songs from Spotify
2. The data set spans 26 genres, including R&B, Hip-Hop, Soul, Electronic, etc.
3. The data set includes 15 audio analysis features from Spotify's "Audio Features API". Features include key, mode, time_signature, acousticness, danceability, energy, instrumentalness, liveness, loudness, etc.
4. We selected a Random Forest Classifier as the starting model for this project
5. After creating the machine learning model, we fine-tuned it by applying feature engineering and selecting the hyperparameters that yielded the highest accuracy
6. The final trained model's genre prediction accuracy for an input song increased from a 4.6% baseline to 34.356%.
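The modeling steps in the list above can be illustrated with a small scikit-learn sketch. It is not the project's pipeline: the data is synthetic (random features with 15 columns and 4 made-up genre labels), the hyperparameter grid is arbitrary, and comparing against a most-frequent-class `DummyClassifier` is an assumption, since the write-up does not say how the baseline was measured.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the song data: 600 "songs", 15 features, 4 genres.
rng = np.random.default_rng(0)
n_songs, n_features, n_genres = 600, 15, 4
y = rng.integers(0, n_genres, size=n_songs)
genre_centers = rng.normal(size=(n_genres, n_features))
X = genre_centers[y] + rng.normal(scale=1.0, size=(n_songs, n_features))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumed baseline: always predict the most frequent genre in the training set.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Illustrative hyperparameter grid; the values are not from the project.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
tuned_acc = search.score(X_test, y_test)  # accuracy of the best refit model

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"tuned accuracy:    {tuned_acc:.3f}")
print(f"best parameters:   {search.best_params_}")
```

The synthetic genres here are deliberately easy to separate, so the accuracy numbers say nothing about the real task; the sketch only shows the shape of a baseline-versus-tuned comparison.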
Photos captured during presentation