Disco C++

🔥 Recommendations for C++ using collaborative filtering

Supports user-based and item-based recommendations
Works with explicit and implicit feedback
Uses high-performance matrix factorization

🎉 Zero dependencies

Installation

Add the header to your project (supports C++20 and greater).

There is also support for CMake and FetchContent:

include(FetchContent)

FetchContent_Declare(disco GIT_REPOSITORY https://github.com/ankane/disco-cpp.git GIT_TAG v0.1.4)
FetchContent_MakeAvailable(disco)

target_link_libraries(app PRIVATE disco::disco)

Getting Started

Include the header

#include "disco.hpp"

Prep your data in the format user_id, item_id, value

disco::Dataset<std::string, std::string> data;
data.push("user_a", "item_a", 5.0);
data.push("user_a", "item_b", 3.5);
data.push("user_b", "item_a", 4.0);

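IDs can be integers, strings, or any other hashable data type. The template parameters set the user and item ID types, so integer user IDs need a dataset declared like this:

disco::Dataset<int, std::string> data;
data.push(1, "item_a", 5.0);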

If users rate items directly, this is known as explicit feedback. Fit the recommender with:

auto recommender = disco::Recommender<std::string, std::string>::fit_explicit(data);

If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Use 1.0 or a value like the number of purchases or page views as the value in the dataset, and fit the recommender with:

auto recommender = disco::Recommender<std::string, std::string>::fit_implicit(data);
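With implicit feedback, raw events usually need to be collapsed into one value per user/item pair before they are pushed. A minimal sketch, assuming a hypothetical `Event` record where each row is one purchase or page view:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical raw event: one purchase or page view.
struct Event {
    std::string user_id;
    std::string item_id;
};

// Collapses repeated events into a count per (user_id, item_id) pair.
// std::map keeps the output in a deterministic (sorted) order.
std::map<std::pair<std::string, std::string>, float>
count_events(const std::vector<Event>& events) {
    std::map<std::pair<std::string, std::string>, float> counts;
    for (const auto& e : events) {
        counts[{e.user_id, e.item_id}] += 1.0f;
    }
    return counts;
}
```

Each entry could then be passed to `data.push(user_id, item_id, count)`; pushing 1.0 instead of the count treats every interaction equally.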

Get user-based recommendations - “users like you also liked”

recommender.user_recs(user_id);
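A common way to produce “users like you also liked” results from a factorization model is to score every item against the user's factor vector. The sketch below assumes plain `std::vector<float>` factors and dense item indices, and `top_items_for_user` is a hypothetical name; it illustrates the technique rather than disco's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <unordered_set>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Dot product of two equal-length factor vectors.
float dot(const Vec& a, const Vec& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0f);
}

// Scores every item by dot(user, item), skips items the user has already
// interacted with, and keeps the `count` highest scores.
// Returns (item index, score) pairs sorted by descending score.
std::vector<std::pair<std::size_t, float>> top_items_for_user(
    const Vec& user_factors,
    const std::vector<Vec>& item_factors,
    const std::unordered_set<std::size_t>& seen,
    std::size_t count = 5) {
    std::vector<std::pair<std::size_t, float>> scored;
    for (std::size_t i = 0; i < item_factors.size(); i++) {
        if (seen.count(i) == 0) {
            scored.emplace_back(i, dot(user_factors, item_factors[i]));
        }
    }
    auto k = static_cast<std::ptrdiff_t>(std::min(count, scored.size()));
    // Only the first k positions need to be fully sorted.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    scored.resize(static_cast<std::size_t>(k));
    return scored;
}
```

`std::partial_sort` avoids sorting the whole catalog when only a handful of recommendations are needed.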

Get item-based recommendations - “users who liked this item also liked”

recommender.item_recs(item_id);
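Item-based recommendations are commonly computed by comparing item factor vectors with each other. The sketch below uses cosine similarity, a common choice; whether disco uses exactly this measure is an assumption here, and `similar_items` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Cosine similarity: dot(a, b) / (|a| * |b|), or 0 if either vector is zero.
float cosine(const Vec& a, const Vec& b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    if (na == 0.0f || nb == 0.0f) {
        return 0.0f;
    }
    return dot / (std::sqrt(na) * std::sqrt(nb));
}

// Ranks all other items by cosine similarity to item `target` and keeps
// the `count` most similar. Returns (item index, similarity) pairs.
std::vector<std::pair<std::size_t, float>> similar_items(
    const std::vector<Vec>& item_factors, std::size_t target, std::size_t count = 5) {
    std::vector<std::pair<std::size_t, float>> scored;
    for (std::size_t i = 0; i < item_factors.size(); i++) {
        if (i != target) {
            scored.emplace_back(i, cosine(item_factors[target], item_factors[i]));
        }
    }
    std::sort(scored.begin(), scored.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    if (scored.size() > count) {
        scored.resize(count);
    }
    return scored;
}
```

Applying the same function to user factor vectors gives a similar-users list.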

Pass a count as the second argument to specify the number of recommendations (default is 5)

recommender.user_recs(user_id, 5);

Get the predicted rating for a specific user and item

recommender.predict(user_id, item_id);
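In many explicit-feedback factorization models, ratings are centered on the global mean before fitting, so a prediction is the mean plus the dot product of the user and item factors. The sketch below works under that assumption, which is not documented behavior of disco; `predict_rating` is a hypothetical helper.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumes ratings were centered on the global mean before factorization
// (an assumption, not disco's documented behavior):
// prediction = global_mean + dot(user_factors, item_factors).
float predict_rating(float global_mean,
                     const std::vector<float>& user_factors,
                     const std::vector<float>& item_factors) {
    if (user_factors.size() != item_factors.size()) {
        throw std::invalid_argument{"factor vectors must have the same length"};
    }
    float score = global_mean;
    for (std::size_t i = 0; i < user_factors.size(); i++) {
        score += user_factors[i] * item_factors[i];
    }
    return score;
}
```

For example, a global mean of 3.5 with factors {0.5, 0.25} and {1.0, 2.0} gives 3.5 + 0.5 + 0.5 = 4.5.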

Get similar users

recommender.similar_users(user_id);

Examples

MovieLens

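Download the MovieLens 100K dataset and set the MOVIELENS_100K_PATH environment variable to the extracted directory (the one containing u.item and u.data).

Then use: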

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <unordered_map>

#include "disco.hpp"

disco::Dataset<int, std::string> load_movielens(const std::string& path) {
    // read movies
    std::unordered_map<std::string, std::string> movies;
    std::ifstream movies_file{path + "/u.item"};
    if (!movies_file.is_open()) {
        throw std::runtime_error{"Could not open file"};
    }
    std::string line;
    while (std::getline(movies_file, line)) {
        size_t n = line.find('|');
        size_t n2 = line.find('|', n + 1);
        movies.emplace(line.substr(0, n), line.substr(n + 1, n2 - n - 1));
    }

    // read ratings and create dataset
    disco::Dataset<int, std::string> data;
    std::ifstream ratings_file{path + "/u.data"};
    if (!ratings_file.is_open()) {
        throw std::runtime_error{"Could not open file"};
    }
    while (std::getline(ratings_file, line)) {
        size_t n = line.find('\t');
        size_t n2 = line.find('\t', n + 1);
        size_t n3 = line.find('\t', n2 + 1);
        data.push(
            std::stoi(line.substr(0, n)),
            movies.at(line.substr(n + 1, n2 - n - 1)),
            std::stof(line.substr(n2 + 1, n3 - n2 - 1))
        );
    }

    return data;
}

int main() {
    // https://grouplens.org/datasets/movielens/100k/
    const char* movielens_path = std::getenv("MOVIELENS_100K_PATH");
    if (movielens_path == nullptr) {
        std::cout << "Set MOVIELENS_100K_PATH" << std::endl;
        return 1;
    }

    disco::Dataset<int, std::string> data = load_movielens(movielens_path);
    auto recommender = disco::Recommender<int, std::string>::fit_explicit(data, {.factors = 20});

    std::string movie{"Star Wars (1977)"};
    std::cout << "Item-based recommendations for " << movie << std::endl;
    for (const auto& [item_id, score] : recommender.item_recs(movie)) {
        std::cout << "- " << item_id << std::endl;
    }

    int user_id = 123;
    std::cout << std::endl << "User-based recommendations for " << user_id << std::endl;
    for (const auto& [item_id, score] : recommender.user_recs(user_id)) {
        std::cout << "- " << item_id << std::endl;
    }

    return 0;
}

Storing Recommendations

Save recommendations to your database.

Alternatively, you can store only the factors and use a library like pgvector-cpp. See an example.
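If you store only the factors, each vector has to be serialized into a form the database accepts. pgvector accepts text input like `[1,2,3]` for its `vector` type; the hypothetical helper below builds that string from a factor vector. How the string is sent (for example, as a bound parameter) depends on your database client.

```cpp
#include <cstddef>
#include <limits>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Formats a factor vector in pgvector's text input format, e.g. "[1,2.5,-3]".
std::string to_pgvector_literal(const std::vector<float>& factors) {
    std::ostringstream out;
    // Use the classic locale so the decimal separator is always '.'.
    out.imbue(std::locale::classic());
    // max_digits10 (9 for float) is enough digits to round-trip each value.
    out.precision(std::numeric_limits<float>::max_digits10);
    out << '[';
    for (std::size_t i = 0; i < factors.size(); i++) {
        if (i > 0) {
            out << ',';
        }
        out << factors[i];
    }
    out << ']';
    return out.str();
}
```

Once stored, similar items can be queried with pgvector's distance operators, such as `<=>` for cosine distance.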

Algorithms

Disco uses high-performance matrix factorization.

For explicit feedback, it uses the stochastic gradient method with twin learners
For implicit feedback, it uses the conjugate gradient method
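To show the general shape of gradient-based factorization for explicit feedback, here is a deliberately simpler sketch than what disco does: plain stochastic gradient descent without twin learners or bias terms. The `Rating` and `Factors` types, `sgd_factorize`, and the default hyperparameters are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// One observed rating, already mapped to dense indices.
struct Rating {
    std::size_t user;
    std::size_t item;
    float value;
};

struct Factors {
    std::vector<std::vector<float>> users;
    std::vector<std::vector<float>> items;
};

// Plain SGD: for each rating, compute e = r - p_u . q_i and step against the
// gradient of e^2 + lambda * (|p_u|^2 + |q_i|^2), with the constant factor
// of 2 folded into the learning rate.
Factors sgd_factorize(const std::vector<Rating>& ratings,
                      std::size_t num_users, std::size_t num_items,
                      std::size_t factors = 8, int iterations = 20,
                      float learning_rate = 0.05f, float lambda = 0.02f,
                      unsigned seed = 42) {
    std::mt19937 rng{seed};
    std::uniform_real_distribution<float> init{0.0f, 0.1f};
    Factors f;
    f.users.assign(num_users, std::vector<float>(factors));
    f.items.assign(num_items, std::vector<float>(factors));
    // Small random initial values break the symmetry between factors.
    for (auto* matrix : {&f.users, &f.items}) {
        for (auto& row : *matrix) {
            for (auto& x : row) {
                x = init(rng);
            }
        }
    }

    for (int iter = 0; iter < iterations; iter++) {
        for (const auto& r : ratings) {
            auto& p = f.users[r.user];
            auto& q = f.items[r.item];
            float pred = 0.0f;
            for (std::size_t k = 0; k < factors; k++) {
                pred += p[k] * q[k];
            }
            float err = r.value - pred;
            for (std::size_t k = 0; k < factors; k++) {
                float pk = p[k];  // use the old value when updating q
                p[k] += learning_rate * (err * q[k] - lambda * pk);
                q[k] += learning_rate * (err * pk - lambda * q[k]);
            }
        }
    }
    return f;
}
```

The `factors` and `iterations` parameters play the same role as the options shown next, though the defaults here are not disco's.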

Specify the number of factors and iterations

auto recommender = disco::Recommender<int, int>::fit_explicit(data, {.factors = 8, .iterations = 20});

Progress

Pass a callback to show progress

auto callback = [](const disco::FitInfo& info) {
    std::cout << info.iteration << ": " << info.train_loss << std::endl;
};
auto recommender = disco::Recommender<int, int>::fit_explicit(data, {.callback = callback});

Note: train_loss is not available for implicit feedback

Cold Start

Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.

recommender.user_recs(new_user_id); // returns an empty result

There are a number of ways to deal with this, but here are some common ones:

For user-based recommendations, show new users the most popular items
For item-based recommendations, make content-based recommendations
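For user-based recommendations, a popularity fallback can be built from the same interaction data. The sketch below uses a hypothetical `most_popular` helper over (user_id, item_id) pairs; it is independent of disco and meant to be called when `user_recs` returns nothing.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Ranks items by number of interactions, breaking ties by item ID so the
// output is deterministic, and returns the top `count` item IDs.
std::vector<std::string> most_popular(
    const std::vector<std::pair<std::string, std::string>>& interactions,  // (user_id, item_id)
    std::size_t count = 5) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& interaction : interactions) {
        counts[interaction.second]++;
    }
    std::vector<std::pair<std::string, std::size_t>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    std::vector<std::string> items;
    for (std::size_t i = 0; i < ranked.size() && i < count; i++) {
        items.push_back(ranked[i].first);
    }
    return items;
}
```

In practice the popular list changes slowly, so it can be computed once and cached rather than rebuilt per request.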

Reference

Get ids

recommender.user_ids();
recommender.item_ids();

Get the global mean

recommender.global_mean();

Get factors

recommender.user_factors(user_id);
recommender.item_factors(item_id);


History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/ankane/disco-cpp.git
cd disco-cpp
cmake -S . -B build
cmake --build build
build/test
