MusicRating
The task is to predict the rating a user will give to a song (https://www.kaggle.com/c/MusicHackathon). The interesting part is that the problem comes with a tremendous amount of data, including users' ratings, profiles, preferences, and so on, in various formats: ratings, words, binary values. So the big challenge is how to select features, which turns out to be the key to this problem.

The basic idea of my approach is to build one model per artist (rather than one per artist-track pair). For a particular artist, I extract all of its ratings from train.csv and build the features for each user from both users.csv and words.csv. From users.csv (which contains the users' profiles) I take each user's age, sex, and answers to the habit questions. From words.csv (a survey of the users) I use the score the user gave to this song as an additional feature. I then combine the two feature sets and fit a Lasso regression (L1 norm) model.

Due to time constraints, I did not fully optimize the algorithm, and a lot of work remains to be done. My final RMSE was 16.68; the leader's was 13.24.
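The per-artist approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the toy ratings, the two-element feature vectors, and the names `fit_lasso`, `fit_per_artist`, `predict`, and `rmse` are all assumptions made for the example. The Lasso fit is written as a plain coordinate-descent loop so the sketch has no dependencies; on the real data a library implementation would be the more practical choice.

```python
from collections import defaultdict
from math import sqrt


def soft_threshold(z, t):
    """Shrink z toward zero by t; this is what makes Lasso weights sparse."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b - Xw||^2 + alpha*||w||_1.

    Written for clarity on toy data, not for speed on the full dataset.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = sum(y) / n  # unpenalized intercept
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_wo_j = b + sum(w[k] * X[i][k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                zj += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (zj / n) if zj > 0 else 0.0
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(d)) for i in range(n)) / n
    return w, b


def fit_per_artist(ratings, user_features, alpha=0.1):
    """Group ratings by artist and fit one Lasso model per artist."""
    by_artist = defaultdict(list)
    for artist, user, rating in ratings:
        by_artist[artist].append((user_features[user], rating))
    models = {}
    for artist, rows in by_artist.items():
        X = [features for features, _ in rows]
        y = [rating for _, rating in rows]
        models[artist] = fit_lasso(X, y, alpha)
    return models


def predict(models, artist, features):
    w, b = models[artist]
    return b + sum(wi * xi for wi, xi in zip(w, features))


def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data: each user has two made-up features standing in for
# the combined users.csv / words.csv features.
user_features = {"u1": [0.0, 0.0], "u2": [1.0, 0.0], "u3": [0.0, 1.0], "u4": [1.0, 1.0]}
ratings = [  # (artist, user, rating)
    ("A", "u1", 10.0), ("A", "u2", 30.0), ("A", "u3", 15.0), ("A", "u4", 35.0),
    ("B", "u1", 50.0), ("B", "u2", 60.0), ("B", "u4", 80.0),
]

models = fit_per_artist(ratings, user_features, alpha=0.5)
a_rows = [(user_features[u], r) for a, u, r in ratings if a == "A"]
a_preds = [predict(models, "A", f) for f, _ in a_rows]
print("artist A training RMSE:", rmse([r for _, r in a_rows], a_preds))
```

Larger values of `alpha` push more weights to exactly zero, which is how the L1 penalty doubles as a feature selector; the value 0.5 here is arbitrary and would need tuning on held-out ratings.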