FacebookMissingLinks
This is the code for the Kaggle Facebook Recruiting Competition (http://www.kaggle.com/c/FacebookRecruiting). The task is to predict missing links in an asymmetric social network, where A following B does not imply that B follows A.

My approach has two phases: first selecting candidates, then ranking them.

In the first phase, for each node to be predicted, I select all nodes within 3 levels of it as candidates. My statistics show that this usually misses only about 8% of the true links, and it runs quite fast.

In the second phase, all candidates are ranked by the probability that the given node will follow them, which turns the task into a classification problem. The features I use are: whether the candidate follows the given node; the percentage of the node's followers who follow the candidate; the percentage of the node's followers whom the candidate follows; the percentage of the node's followees who follow the candidate; and the percentage of the node's followees whom the candidate follows.

One thing that bothered me for a whole week is that the classes are heavily skewed: positive training examples make up less than 1% of the data. A classifier trained on this distribution tends to have very low recall, which makes it nearly useless. To mitigate this, I under-sampled the negative examples to roughly one positive per ten negatives (a ratio found by experiment) and tested the classifier on the original distribution. Logistic regression gave a good result (about 73% recall and 23% precision). My best score is about 71.4% mean average precision, while the leader's is 72.98%.
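A minimal sketch of the candidate-selection and feature steps described above. It assumes the graph arrives as `(follower, followee)` pairs, treats edges as undirected when collecting the 3-level neighbourhood, and drops the node itself plus the nodes it already follows. These choices, and the names `build_graph`, `candidates`, `fraction` and `features`, are illustrative assumptions, not the repository's actual code.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build adjacency sets from (follower, followee) pairs."""
    followees = defaultdict(set)  # node -> nodes it follows
    followers = defaultdict(set)  # node -> nodes that follow it
    for src, dst in edges:
        followees[src].add(dst)
        followers[dst].add(src)
    return followees, followers


def candidates(node, followees, followers, depth=3):
    """Breadth-first search up to `depth` hops, ignoring edge direction (assumption)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, dist = frontier.popleft()
        if dist == depth:
            continue
        for nxt in followees[current] | followers[current]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    # Assumption: exclude the node itself and nodes it already follows.
    return seen - {node} - followees[node]


def fraction(group, predicate):
    """Share of nodes in `group` satisfying `predicate` (0.0 for an empty group)."""
    if not group:
        return 0.0
    return sum(1 for x in group if predicate(x)) / len(group)


def features(node, cand, followees, followers):
    """The five ranking features for candidate `cand` of `node`."""
    fols, fees = followers[node], followees[node]
    return [
        1.0 if node in followees[cand] else 0.0,         # cand follows node
        fraction(fols, lambda f: cand in followees[f]),  # node's followers who follow cand
        fraction(fols, lambda f: f in followees[cand]),  # node's followers whom cand follows
        fraction(fees, lambda f: cand in followees[f]),  # node's followees who follow cand
        fraction(fees, lambda f: f in followees[cand]),  # node's followees whom cand follows
    ]
```

For example, with edges `[(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (2, 1)]`, `candidates(1, ...)` yields `{3, 4, 5}`: node 2 is within reach but is excluded because node 1 already follows it.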
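The under-sampling step can be sketched as follows, assuming each training example is a `(features, label)` pair with label 1 when the candidate was actually followed. The names `undersample` and `precision_recall`, the default of ten negatives per positive, and the fixed seed are hypothetical. Only the training set is re-balanced; precision and recall are meant to be measured on the untouched original distribution.

```python
import random


def undersample(examples, neg_per_pos=10, seed=0):
    """Keep every positive and about `neg_per_pos` random negatives per positive."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    rng = random.Random(seed)  # fixed seed for reproducibility
    k = min(len(negatives), neg_per_pos * len(positives))
    sample = positives + rng.sample(negatives, k)
    rng.shuffle(sample)
    return sample


def precision_recall(labels, predictions):
    """Precision and recall of 0/1 predictions against 0/1 labels."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    predicted_pos = sum(predictions)
    actual_pos = sum(labels)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall
```

The balanced sample could then be passed to any logistic regression implementation, such as scikit-learn's `LogisticRegression`; the text does not say which library was used, so this pairing is an assumption.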
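For reference, a sketch of mean average precision in the form commonly used for Kaggle ranking tasks. The cutoff `k=10` is an assumption (the text only says mean average precision). Here `actual` is the set of nodes a user really followed and `predicted` is the ranked candidate list; both function names are illustrative.

```python
def average_precision(actual, predicted, k=10):
    """Average precision of a ranked list against a set of true items, cut at k."""
    if not actual:
        return 0.0
    predicted = predicted[:k]
    hits, score = 0, 0.0
    for i, p in enumerate(predicted):
        # Count each correct item once, at its first position.
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(actual), k)


def mean_average_precision(actuals, predictions, k=10):
    """Mean of average_precision over all users (0.0 if there are none)."""
    if not actuals:
        return 0.0
    total = sum(average_precision(a, p, k) for a, p in zip(actuals, predictions))
    return total / len(actuals)
```

For instance, if a user truly followed `{1, 2}` and the ranking is `[1, 3, 2]`, the hits at ranks 1 and 3 give (1/1 + 2/3) / 2 = 5/6.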