🔥 Recommendations for C++ using collaborative filtering
- Supports user-based and item-based recommendations
- Works with explicit and implicit feedback
- Uses high-performance matrix factorization
🎉 Zero dependencies
Add the header to your project (supports C++20 and later).
There is also support for CMake and FetchContent:
include(FetchContent)
FetchContent_Declare(disco GIT_REPOSITORY https://github.com/ankane/disco-cpp.git GIT_TAG v0.1.4)
FetchContent_MakeAvailable(disco)
target_link_libraries(app PRIVATE disco::disco)
Include the header
#include "disco.hpp"
Prep your data in the format user_id, item_id, value
disco::Dataset<std::string, std::string> data;
data.push("user_a", "item_a", 5.0);
data.push("user_a", "item_b", 3.5);
data.push("user_b", "item_a", 4.0);
IDs can be integers, strings, or any other hashable data type
disco::Dataset<int, std::string> int_data; // integer user IDs
int_data.push(1, "item_a", 5.0);
If users rate items directly, this is known as explicit feedback. Fit the recommender with:
auto recommender = disco::Recommender<std::string, std::string>::fit_explicit(data);
If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Use 1.0 or a value such as the number of purchases or page views for each entry in the dataset, and fit the recommender with:
auto recommender = disco::Recommender<std::string, std::string>::fit_implicit(data);
Get user-based recommendations - “users like you also liked”
recommender.user_recs(user_id);
Get item-based recommendations - “users who liked this item also liked”
recommender.item_recs(item_id);
Pass a count as the second argument to specify the number of recommendations (default is 5)
recommender.user_recs(user_id, 5);
Get the predicted rating for a specific user and item
recommender.predict(user_id, item_id);
Get similar users
recommender.similar_users(user_id);
For a complete example, download the MovieLens 100K dataset and use:
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include "disco.hpp"
disco::Dataset<int, std::string> load_movielens(const std::string& path) {
    // read movies (u.item is pipe-delimited: movie id|title|...)
    std::unordered_map<std::string, std::string> movies;
    std::ifstream movies_file{path + "/u.item"};
    if (!movies_file.is_open()) {
        throw std::runtime_error{"Could not open file"};
    }
    std::string line;
    while (std::getline(movies_file, line)) {
        size_t n = line.find('|');
        size_t n2 = line.find('|', n + 1);
        // map movie id -> title
        movies.emplace(line.substr(0, n), line.substr(n + 1, n2 - n - 1));
    }

    // read ratings and create dataset (u.data is tab-delimited: user id, item id, rating, timestamp)
    disco::Dataset<int, std::string> data;
    std::ifstream ratings_file{path + "/u.data"};
    if (!ratings_file.is_open()) {
        throw std::runtime_error{"Could not open file"};
    }
    while (std::getline(ratings_file, line)) {
        size_t n = line.find('\t');
        size_t n2 = line.find('\t', n + 1);
        size_t n3 = line.find('\t', n2 + 1);
        data.push(
            std::stoi(line.substr(0, n)),
            movies.at(line.substr(n + 1, n2 - n - 1)),
            std::stof(line.substr(n2 + 1, n3 - n2 - 1))
        );
    }
    return data;
}
int main() {
    // https://grouplens.org/datasets/movielens/100k/
    const char* movielens_path = std::getenv("MOVIELENS_100K_PATH");
    if (movielens_path == nullptr) {
        std::cout << "Set MOVIELENS_100K_PATH" << std::endl;
        return 1;
    }

    disco::Dataset<int, std::string> data = load_movielens(movielens_path);
    auto recommender = disco::Recommender<int, std::string>::fit_explicit(data, {.factors = 20});

    std::string movie{"Star Wars (1977)"};
    std::cout << "Item-based recommendations for " << movie << std::endl;
    for (const auto& [item_id, score] : recommender.item_recs(movie)) {
        std::cout << "- " << item_id << std::endl;
    }

    int user_id = 123;
    std::cout << std::endl << "User-based recommendations for " << user_id << std::endl;
    for (const auto& [item_id, score] : recommender.user_recs(user_id)) {
        std::cout << "- " << item_id << std::endl;
    }

    return 0;
}
Save recommendations to your database.
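How you save recommendations depends on your database. One simple option is to write them as CSV and bulk-load the file. The sketch below is not part of disco: `RecRow`, `csv_quote`, and `to_csv` are hypothetical names, and it assumes you have already collected (user_id, item_id, score) triples, for example by looping over `user_recs` as in the program above. Every item ID is quoted because movie titles can contain commas.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row type: one recommendation for one user.
struct RecRow {
    int user_id;
    std::string item_id;
    double score;
};

// Quote a CSV field: wrap it in double quotes and double any embedded quotes.
std::string csv_quote(const std::string& field) {
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // escape " as ""
        out += c;
    }
    out += '"';
    return out;
}

// Serialize rows as CSV with a header line.
std::string to_csv(const std::vector<RecRow>& rows) {
    std::ostringstream out;
    out << "user_id,item_id,score\n";
    for (const auto& row : rows) {
        out << row.user_id << ',' << csv_quote(row.item_id) << ',' << row.score << '\n';
    }
    return out.str();
}
```

With PostgreSQL, for instance, a file produced this way could be loaded with `COPY ... WITH (FORMAT csv, HEADER)`.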
Alternatively, you can store only the factors and use a library like pgvector-cpp. See an example.
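If you store only the factors, recommendations come from scoring items against a user's factor vector. Below is a minimal sketch of that ranking. It assumes the score is the plain inner product of user and item factors; whether disco adds other terms is not covered here. The names `dot` and `top_items` are hypothetical. pgvector can perform the same kind of inner-product ordering inside PostgreSQL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Inner product of two factor vectors of the same length.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}

// Indices of the `count` items whose factors have the largest inner product
// with `user`, best first. Ties come back in unspecified order.
std::vector<std::size_t> top_items(const std::vector<double>& user,
                                   const std::vector<std::vector<double>>& items,
                                   std::size_t count) {
    std::vector<std::pair<double, std::size_t>> scored;  // (score, item index)
    scored.reserve(items.size());
    for (std::size_t i = 0; i < items.size(); ++i) {
        scored.emplace_back(dot(user, items[i]), i);
    }
    count = std::min(count, scored.size());
    auto middle = scored.begin() + static_cast<std::ptrdiff_t>(count);
    // Only the first `count` entries need to be sorted.
    std::partial_sort(scored.begin(), middle, scored.end(),
                      [](const auto& a, const auto& b) { return a.first > b.first; });
    std::vector<std::size_t> result;
    result.reserve(count);
    for (auto it = scored.begin(); it != middle; ++it) result.push_back(it->second);
    return result;
}
```

In practice you would also skip items the user has already rated before taking the top entries.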
Disco uses high-performance matrix factorization.
- For explicit feedback, it uses the stochastic gradient method with twin learners
- For implicit feedback, it uses the conjugate gradient method
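To make the explicit-feedback bullet concrete, here is a minimal sketch of matrix factorization trained with plain stochastic gradient descent on squared error with L2 regularization. It is not disco's implementation. It omits the twin learners, any learning-rate schedule, and bias terms. The names `Rating`, `Factors`, `fit_sgd`, and `rmse`, along with the default learning rate and regularization, are illustrative assumptions. Each rating is modeled as the dot product of a user vector and an item vector, and every observed rating nudges both vectors to reduce its error.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One observed rating, with users and items already mapped to dense indices.
struct Rating {
    std::size_t user;
    std::size_t item;
    double value;
};

// Row u of `p` is user u's factor vector; row i of `q` is item i's.
struct Factors {
    std::vector<std::vector<double>> p;
    std::vector<std::vector<double>> q;
};

// Predicted rating: dot product of the user and item vectors.
double predict(const Factors& f, std::size_t u, std::size_t i) {
    double s = 0.0;
    for (std::size_t k = 0; k < f.p[u].size(); ++k) s += f.p[u][k] * f.q[i][k];
    return s;
}

// Root-mean-square error over a set of ratings.
double rmse(const Factors& f, const std::vector<Rating>& ratings) {
    assert(!ratings.empty());
    double sum = 0.0;
    for (const auto& r : ratings) {
        double e = r.value - predict(f, r.user, r.item);
        sum += e * e;
    }
    return std::sqrt(sum / static_cast<double>(ratings.size()));
}

// For each rating, step p_u and q_i along the negative gradient of
// (r - p_u . q_i)^2 + reg * (|p_u|^2 + |q_i|^2), with the constant 2 folded into lr.
Factors fit_sgd(const std::vector<Rating>& ratings, std::size_t users, std::size_t items,
                std::size_t factors, int iterations,
                double lr = 0.05, double reg = 0.01, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> init(0.0, 0.1);  // small random starting factors
    Factors f{std::vector<std::vector<double>>(users, std::vector<double>(factors)),
              std::vector<std::vector<double>>(items, std::vector<double>(factors))};
    for (auto& row : f.p) for (double& x : row) x = init(gen);
    for (auto& row : f.q) for (double& x : row) x = init(gen);

    for (int it = 0; it < iterations; ++it) {
        for (const auto& r : ratings) {
            assert(r.user < users && r.item < items);
            double err = r.value - predict(f, r.user, r.item);
            for (std::size_t k = 0; k < factors; ++k) {
                double pu = f.p[r.user][k];
                double qi = f.q[r.item][k];
                f.p[r.user][k] += lr * (err * qi - reg * pu);
                f.q[r.item][k] += lr * (err * pu - reg * qi);
            }
        }
    }
    return f;
}
```

Raising `factors` lets the model capture more structure, at the cost of more work per update and a greater risk of overfitting sparse data.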
Specify the number of factors and iterations
auto recommender = disco::Recommender<int, int>::fit_explicit(data, {.factors = 8, .iterations = 20});
Pass a callback to show progress
auto callback = [](const disco::FitInfo& info) {
    std::cout << info.iteration << ": " << info.train_loss << std::endl;
};
auto recommender = disco::Recommender<int, int>::fit_explicit(data, {.callback = callback});
Note: train_loss is not available for implicit feedback.
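The callback is simply a function that the fitting loop invokes once per iteration. The sketch below shows the pattern outside disco. `FitInfo` here is a hypothetical stand-in with only the two fields used above, iterations are numbered from 1 by assumption, and the "model" is a single constant fitted to a few values by gradient descent on mean squared error, which keeps the reported loss easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for disco::FitInfo with the two fields used above.
struct FitInfo {
    int iteration;
    double train_loss;
};

// Mean squared error of predicting the constant `c` for every value.
double mse(const std::vector<double>& values, double c) {
    double sum = 0.0;
    for (double v : values) sum += (v - c) * (v - c);
    return sum / static_cast<double>(values.size());
}

// Fits a single constant to `values` by gradient descent on MSE. If a callback
// is given, it receives the iteration number (from 1) and the loss after each step.
double fit_mean(const std::vector<double>& values, int iterations, double lr,
                const std::function<void(const FitInfo&)>& callback = nullptr) {
    assert(!values.empty());
    double c = 0.0;
    for (int it = 1; it <= iterations; ++it) {
        double grad = 0.0;
        for (double v : values) grad += 2.0 * (c - v);  // d/dc of sum (v - c)^2
        c -= lr * grad / static_cast<double>(values.size());
        if (callback) callback(FitInfo{it, mse(values, c)});
    }
    return c;
}
```

Because the callback is optional, the same loop runs silently when none is passed.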
Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.
recommender.user_recs(new_user_id); // returns no recommendations
There are a number of ways to deal with this, but here are some common ones:
- For user-based recommendations, show new users the most popular items
- For item-based recommendations, make content-based recommendations
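The first strategy can be sketched in a few lines: count interactions per item, rank items by that count, and use the ranking whenever personalized recommendations come back empty. `most_popular` and `recs_with_fallback` are hypothetical helpers, and the input is assumed to be the same (user_id, item_id) pairs you pushed into the dataset. Breaking ties by item ID is a design choice that keeps the output deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Interaction = std::pair<std::string, std::string>;  // (user_id, item_id)

// Items ordered by number of interactions, most popular first.
// Ties are broken by item ID so the output is deterministic.
std::vector<std::string> most_popular(const std::vector<Interaction>& interactions,
                                      std::size_t count) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& interaction : interactions) ++counts[interaction.second];

    std::vector<std::pair<std::string, std::size_t>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });

    std::vector<std::string> result;
    for (std::size_t i = 0; i < std::min(count, ranked.size()); ++i) {
        result.push_back(ranked[i].first);
    }
    return result;
}

// Prefer personalized recommendations; fall back to popular items when there are
// none (for example, for a user the model has never seen).
std::vector<std::string> recs_with_fallback(const std::vector<std::string>& personalized,
                                            const std::vector<Interaction>& interactions,
                                            std::size_t count) {
    if (!personalized.empty()) return personalized;
    return most_popular(interactions, count);
}
```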
Get IDs
recommender.user_ids();
recommender.item_ids();
Get the global mean
recommender.global_mean();
Get factors
recommender.user_factors(user_id);
recommender.item_factors(item_id);
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/disco-cpp.git
cd disco-cpp
cmake -S . -B build
cmake --build build
build/test