ankane/disco-cpp

Disco C++

🔥 Recommendations for C++ using collaborative filtering

  • Supports user-based and item-based recommendations
  • Works with explicit and implicit feedback
  • Uses high-performance matrix factorization

🎉 Zero dependencies

Installation

Add the header (disco.hpp) to your project (supports C++20 and later).

There is also support for CMake and FetchContent:

include(FetchContent)

FetchContent_Declare(disco GIT_REPOSITORY https://github.com/ankane/disco-cpp.git GIT_TAG v0.1.4)
FetchContent_MakeAvailable(disco)

target_link_libraries(app PRIVATE disco::disco)

Getting Started

Include the header

#include "disco.hpp"

Prep your data in the format user_id, item_id, value

disco::Dataset<std::string, std::string> data;
data.push("user_a", "item_a", 5.0);
data.push("user_a", "item_b", 3.5);
data.push("user_b", "item_a", 4.0);

IDs can be integers, strings, or any other hashable data type

disco::Dataset<int, std::string> int_data;
int_data.push(1, "item_a", 5.0);
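Hashability matters because a recommender typically maps each distinct ID to a row of a factor matrix. Below is a minimal sketch of such a mapping, assuming a std::unordered_map-based lookup (which requires std::hash for the ID type). The IdMap class is hypothetical and is not part of Disco's API; Disco's internals may differ.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: assigns each distinct ID a dense index (0, 1, 2, ...)
// in first-seen order, so IDs of any hashable type can address matrix rows.
template <typename T>
class IdMap {
public:
    // Returns the existing index for id, or assigns the next free one.
    std::size_t add(const T& id) {
        auto [it, inserted] = index_.try_emplace(id, ids_.size());
        if (inserted) {
            ids_.push_back(id);
        }
        return it->second;
    }

    // Returns the index for id, or std::nullopt if it was never added.
    std::optional<std::size_t> find(const T& id) const {
        auto it = index_.find(id);
        if (it == index_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Maps a dense index back to the original ID (throws if out of range).
    const T& id(std::size_t index) const { return ids_.at(index); }

    std::size_t size() const { return ids_.size(); }

private:
    std::unordered_map<T, std::size_t> index_;  // needs std::hash<T>
    std::vector<T> ids_;                         // index -> original ID
};
```

The same class works for IdMap<int> and IdMap<std::string>, mirroring how Disco accepts either as user or item IDs.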

If users rate items directly, this is known as explicit feedback. Fit the recommender with:

auto recommender = disco::Recommender<std::string, std::string>::fit_explicit(data);

If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Use 1.0 for each interaction, or a value such as the number of purchases or page views, and fit the recommender with:

auto recommender = disco::Recommender<std::string, std::string>::fit_implicit(data);
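For implicit feedback, raw events usually need to be collapsed into one value per user and item first. This sketch counts interactions per (user, item) pair; the Event struct and count_interactions function are hypothetical, and each resulting entry would then be passed to data.push(user_id, item_id, value).

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event record: one row per purchase or page view.
struct Event {
    std::string user_id;
    std::string item_id;
};

// Collapses raw events into one implicit-feedback value per (user, item) pair:
// the number of times that user interacted with that item.
std::map<std::pair<std::string, std::string>, float>
count_interactions(const std::vector<Event>& events) {
    std::map<std::pair<std::string, std::string>, float> counts;
    for (const auto& e : events) {
        counts[{e.user_id, e.item_id}] += 1.0f;
    }
    return counts;
}
```

A std::map is used so the output order is deterministic; std::pair has no standard std::hash, so an unordered map would need a custom hasher.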

Get user-based recommendations - “users like you also liked”

recommender.user_recs(user_id);

Get item-based recommendations - “users who liked this item also liked”

recommender.item_recs(item_id);

Pass a count as the second argument to specify the number of recommendations (default is 5)

recommender.user_recs(user_id, 5);

Get the predicted rating for a specific user and item

recommender.predict(user_id, item_id);
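For explicit feedback, a common matrix factorization model predicts a rating as the global mean plus the dot product of the user's and item's factor vectors, which is one way global_mean(), user_factors(), and item_factors() in the Reference section could fit together. Treat this as an assumption about the model rather than Disco's exact formula; global_mean and predict_rating below are hypothetical functions:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// The global mean is the average of all training ratings.
float global_mean(const std::vector<float>& ratings) {
    assert(!ratings.empty());
    return std::accumulate(ratings.begin(), ratings.end(), 0.0f) /
           static_cast<float>(ratings.size());
}

// Assumed explicit-feedback model: rating ~ global mean + user . item
float predict_rating(float mean,
                     const std::vector<float>& user_factors,
                     const std::vector<float>& item_factors) {
    assert(user_factors.size() == item_factors.size());
    return mean + std::inner_product(user_factors.begin(), user_factors.end(),
                                     item_factors.begin(), 0.0f);
}
```

Centering on the global mean lets the factors model only how each user and item deviates from the average rating.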

Get similar users

recommender.similar_users(user_id);
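Similarity between users (or between items) is commonly measured with cosine similarity of their factor vectors. The sketch below assumes that measure; Disco's exact similarity function is not documented here, and cosine_similarity and most_similar are hypothetical helpers:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
// Returns 0 when either vector has zero length.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0f || norm_b == 0.0f) {
        return 0.0f;
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Returns the index of the row most similar to rows[self],
// or rows.size() if there is no other row.
std::size_t most_similar(const std::vector<std::vector<float>>& rows, std::size_t self) {
    std::size_t best = rows.size();
    float best_score = -2.0f;  // below the minimum cosine value of -1
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (i == self) {
            continue;
        }
        float score = cosine_similarity(rows[self], rows[i]);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

Cosine similarity ignores vector length, so two users with the same taste profile count as similar even if one has much stronger preferences.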

Examples

MovieLens

Download the MovieLens 100K dataset.

And use:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <unordered_map>

#include "disco.hpp"

disco::Dataset<int, std::string> load_movielens(const std::string& path) {
    // read movies
    std::unordered_map<std::string, std::string> movies;
    std::ifstream movies_file{path + "/u.item"};
    if (!movies_file.is_open()) {
        throw std::runtime_error{"Could not open file"};
    }
    std::string line;
    while (std::getline(movies_file, line)) {
        size_t n = line.find('|');
        size_t n2 = line.find('|', n + 1);
        movies.emplace(line.substr(0, n), line.substr(n + 1, n2 - n - 1));
    }

    // read ratings and create dataset
    disco::Dataset<int, std::string> data;
    std::ifstream ratings_file{path + "/u.data"};
    if (!ratings_file.is_open()) {
        throw std::runtime_error{"Could not open file"};
    }
    while (std::getline(ratings_file, line)) {
        size_t n = line.find('\t');
        size_t n2 = line.find('\t', n + 1);
        size_t n3 = line.find('\t', n2 + 1);
        data.push(
            std::stoi(line.substr(0, n)),
            movies.at(line.substr(n + 1, n2 - n - 1)),
            std::stof(line.substr(n2 + 1, n3 - n2 - 1))
        );
    }

    return data;
}

int main() {
    // https://grouplens.org/datasets/movielens/100k/
    const char* movielens_path = std::getenv("MOVIELENS_100K_PATH");
    if (movielens_path == nullptr) {
        std::cout << "Set MOVIELENS_100K_PATH" << std::endl;
        return 1;
    }

    disco::Dataset<int, std::string> data = load_movielens(movielens_path);
    auto recommender = disco::Recommender<int, std::string>::fit_explicit(data, {.factors = 20});

    std::string movie{"Star Wars (1977)"};
    std::cout << "Item-based recommendations for " << movie << std::endl;
    for (const auto& [item_id, score] : recommender.item_recs(movie)) {
        std::cout << "- " << item_id << std::endl;
    }

    int user_id = 123;
    std::cout << std::endl << "User-based recommendations for " << user_id << std::endl;
    for (const auto& [item_id, score] : recommender.user_recs(user_id)) {
        std::cout << "- " << item_id << std::endl;
    }

    return 0;
}

Storing Recommendations

Save recommendations to your database.

Alternatively, you can store only the factors and use a library like pgvector-cpp. See an example.
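pgvector accepts vectors in a text form such as [1,2,3], so storing factors can be as simple as formatting each vector that way and casting it to vector in SQL. A minimal sketch; to_pgvector_text is a hypothetical helper, and how you bind the resulting string depends on your database client:

```cpp
#include <cstddef>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Formats a factor vector in pgvector's text representation, e.g. "[1,0.5,-2]".
std::string to_pgvector_text(const std::vector<float>& factors) {
    std::ostringstream out;
    // Enough significant digits to round-trip every float value.
    out.precision(std::numeric_limits<float>::max_digits10);
    out << '[';
    for (std::size_t i = 0; i < factors.size(); ++i) {
        if (i > 0) {
            out << ',';
        }
        out << factors[i];
    }
    out << ']';
    return out.str();
}
```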

Algorithms

Disco uses high-performance matrix factorization.

Specify the number of factors and iterations

auto recommender = disco::Recommender<int, int>::fit_explicit(data, {.factors = 8, .iterations = 20});

Progress

Pass a callback to show progress

auto callback = [](const disco::FitInfo& info) {
    std::cout << info.iteration << ": " << info.train_loss << std::endl;
};
auto recommender = disco::Recommender<int, int>::fit_explicit(data, {.callback = callback});

Note: train_loss is not available for implicit feedback
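To make factors, iterations, and a progress callback concrete, here is a textbook explicit-feedback matrix factorization trained with plain stochastic gradient descent. This is a swapped-in technique for illustration, not necessarily the solver Disco uses. Rating, SketchOptions, SketchFitInfo, Model, and fit_sgd are all hypothetical, and train_loss here is simply this sketch's training RMSE:

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Hypothetical types for illustration only (not Disco's).
struct Rating {
    std::size_t user;  // dense user index
    std::size_t item;  // dense item index
    float value;
};

struct SketchFitInfo {
    int iteration;
    float train_loss;  // RMSE on the training ratings
};

struct SketchOptions {
    std::size_t factors = 8;
    int iterations = 20;
    float learning_rate = 0.05f;
    float regularization = 0.02f;
    std::function<void(const SketchFitInfo&)> callback;  // called once per iteration
};

struct Model {
    float global_mean = 0.0f;
    std::vector<std::vector<float>> user_factors;
    std::vector<std::vector<float>> item_factors;

    float predict(std::size_t u, std::size_t i) const {
        float sum = global_mean;
        for (std::size_t f = 0; f < user_factors[u].size(); ++f) {
            sum += user_factors[u][f] * item_factors[i][f];
        }
        return sum;
    }
};

// Explicit-feedback matrix factorization trained with plain SGD.
Model fit_sgd(const std::vector<Rating>& ratings, std::size_t num_users,
              std::size_t num_items, const SketchOptions& options) {
    Model model;
    for (const auto& r : ratings) model.global_mean += r.value;
    if (!ratings.empty()) model.global_mean /= static_cast<float>(ratings.size());

    // Small random initial factors (fixed seed for reproducibility).
    std::mt19937 rng{42};
    std::normal_distribution<float> init{0.0f, 0.1f};
    model.user_factors.assign(num_users, std::vector<float>(options.factors));
    model.item_factors.assign(num_items, std::vector<float>(options.factors));
    for (auto& row : model.user_factors) for (auto& x : row) x = init(rng);
    for (auto& row : model.item_factors) for (auto& x : row) x = init(rng);

    for (int iter = 1; iter <= options.iterations; ++iter) {
        // One pass over the ratings, nudging factors to reduce each error.
        for (const auto& r : ratings) {
            float err = r.value - model.predict(r.user, r.item);
            auto& p = model.user_factors[r.user];
            auto& q = model.item_factors[r.item];
            for (std::size_t f = 0; f < options.factors; ++f) {
                float pf = p[f];
                p[f] += options.learning_rate * (err * q[f] - options.regularization * pf);
                q[f] += options.learning_rate * (err * pf - options.regularization * q[f]);
            }
        }
        if (options.callback) {
            float sq = 0.0f;
            for (const auto& r : ratings) {
                float e = r.value - model.predict(r.user, r.item);
                sq += e * e;
            }
            float rmse = ratings.empty() ? 0.0f : std::sqrt(sq / static_cast<float>(ratings.size()));
            options.callback(SketchFitInfo{iter, rmse});
        }
    }
    return model;
}
```

Each iteration makes one pass over the ratings, so more iterations trade training time for a closer fit; more factors let the model capture more structure but also make it easier to overfit.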

Cold Start

Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.

recommender.user_recs(new_user_id); // returns an empty vector

There are a number of ways to deal with this, but here are some common ones:

  • For user-based recommendations, show new users the most popular items
  • For item-based recommendations, make content-based recommendations
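A minimal sketch of the popularity fallback for new users, assuming interactions are available as (user_id, item_id) pairs; most_popular_items is a hypothetical helper, not part of Disco:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns up to `count` item IDs ordered by number of interactions
// (ties broken by ID so the result is deterministic).
std::vector<std::string> most_popular_items(
    const std::vector<std::pair<std::string, std::string>>& interactions,  // (user_id, item_id)
    std::size_t count) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const auto& interaction : interactions) {
        ++counts[interaction.second];
    }
    std::vector<std::pair<std::string, std::size_t>> ranked(counts.begin(), counts.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });
    std::vector<std::string> result;
    for (std::size_t i = 0; i < ranked.size() && i < count; ++i) {
        result.push_back(ranked[i].first);
    }
    return result;
}
```

You could call it whenever recommender.user_recs(user_id) comes back empty.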

Reference

Get IDs

recommender.user_ids();
recommender.item_ids();

Get the global mean

recommender.global_mean();

Get factors

recommender.user_factors(user_id);
recommender.item_factors(item_id);

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/disco-cpp.git
cd disco-cpp
cmake -S . -B build
cmake --build build
build/test
