DeepLatent is a unified latent variable modeling framework for analyzing large multimodal and multilingual datasets. It relies on variational inference using deep neural networks for estimation.
The package currently supports:
- Generic latent factor models
- Topic models: The latent variables are a mixture of topics within documents.
- Ideal point models: The latent variables are interpreted as ideological dimensions.
- Multilingual and multimodal support:
  - Learn topics / ideal points across multiple modalities (e.g., texts and images, texts and votes, etc.)
  - Learn the weight of each modality in determining the latent variables per observation
- Flexible metadata handling:
  - `prevalence`: covariates that influence the latent variables
  - `content`: covariates that influence the response variables conditional on the latent variables (e.g., topic-word distributions)
  - `labels`: outcomes for classification or regression tasks
  - `prediction`: additional predictors for the labels
- Flexible input/output representations:
  - Document embeddings (for texts, images, audio-visual data)
  - Word frequencies (bag-of-words)
  - Raw images
  - Discrete choice data
  - Voting records
**Topic models**

- Learns topics on the simplex
- Supports `dirichlet` or `logistic_normal` priors (optionally conditioned on covariates)
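To picture the difference between the two priors: a logistic-normal draw is a Gaussian sample pushed through a softmax, which lands on the simplex just like a Dirichlet draw. A minimal numpy sketch of the idea (an illustration, not DeepLatent code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics = 5

# Dirichlet prior: samples live directly on the simplex
theta_dir = rng.dirichlet(np.ones(n_topics))

# Logistic-normal prior: a Gaussian draw pushed through a softmax
z = rng.normal(size=n_topics)
theta_ln = np.exp(z - z.max())
theta_ln /= theta_ln.sum()

# Both are valid document-topic distributions (non-negative, sum to 1)
```

The logistic-normal form makes it easy to condition the prior mean on covariates, which is what the covariate-conditioned option exploits.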
**Ideal point models**

- Learns unconstrained latent variables (ℝⁿ) for ideal point modeling
- Designed for political texts, images, audio and video recordings, surveys, and votes
- Uses a `gaussian` prior (optionally conditioned on covariates)
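A covariate-conditioned Gaussian prior can be pictured as a normal distribution whose mean is a linear function of the prevalence covariates. A toy numpy sketch of that idea (an assumption-laden illustration, not the package's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

n_docs, n_covariates, n_dims = 100, 3, 1
X = rng.normal(size=(n_docs, n_covariates))   # prevalence covariates
B = rng.normal(size=(n_covariates, n_dims))   # coefficients (learned in practice)

# Prior over ideal points: mean shifts with covariates, unit variance
prior_mean = X @ B
ideal_points = prior_mean + rng.normal(size=(n_docs, n_dims))
```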
**Installation**

Install from PyPI:

```shell
pip install deeplatent
```

Or install from source:

```shell
git clone https://github.com/PinchOfData/DeepLatent.git
cd deeplatent
pip install -e .
```

For a development setup:

```shell
git clone https://github.com/PinchOfData/DeepLatent.git
cd deeplatent
python setup_dev.py
```

Supports text, embeddings, votes, and survey questions:
```python
import sys
sys.path.append('../src/')

from sklearn.feature_extraction.text import CountVectorizer

from corpus import Corpus

modalities = {
    "text": {
        "column": "doc_clean",
        "views": {
            "bow": {
                "type": "bow",
                "vectorizer": CountVectorizer()
            }
        }
    },
    "image": {
        "column": "image_path",
        "views": {
            "embedding": {
                "type": "embedding",
                "embed_fn": my_image_embedder  # user-supplied embedding function
            }
        }
    }
}

# df is a pandas DataFrame with one row per observation
my_dataset = Corpus(df, modalities=modalities)
```

Optionally include metadata: `prevalence`, `content`, `labels`, `prediction`.
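The `embed_fn` above (`my_image_embedder`) is user-supplied. A placeholder sketch of what such a function might look like, assuming it maps a list of image paths to an `(n_images, dim)` array (here it returns deterministic pseudo-random vectors; in practice you would run a pretrained vision encoder):

```python
import numpy as np

def my_image_embedder(image_paths):
    """Map a list of image paths to an (n_images, dim) embedding matrix.

    Placeholder: returns pseudo-random vectors keyed on the path.
    Swap in a real encoder (e.g., a CLIP or ResNet forward pass).
    """
    dim = 512
    embeddings = []
    for path in image_paths:
        rng = np.random.default_rng(abs(hash(path)) % (2**32))
        embeddings.append(rng.normal(size=dim))
    return np.stack(embeddings)

emb = my_image_embedder(["a.jpg", "b.jpg"])
# emb.shape == (2, 512)
```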
**Topic model**

```python
from models import GTM

model = GTM(
    n_topics=20,
    doc_topic_prior="logistic_normal",
    ae_type="wae"
)
```

**Ideal point model**

```python
from models import IdealPointNN

model = IdealPointNN(
    n_ideal_points=1,  # one-dimensional ideal point model
    ae_type="vae"
)
```

Key arguments:

| Argument | Description |
|---|---|
| `ae_type` | `"wae"` (Wasserstein autoencoder), `"vae"` (variational autoencoder), or `"ae"` (plain autoencoder) |
| `fusion` | `"poe"` (Product of Experts), `"moe_gating"` (Mixture of Experts), or `"moe_average"` (simple averaging across modalities) |
| `update_prior` | Learn a structured prior conditioned on `prevalence` covariates |
| `w_prior` | Strength of prior alignment for `wae` |
| `w_pred_loss` | Weight of the supervised loss predicting `labels` |
| `kl_annealing_*` | Strength of prior alignment for `vae`; helps prevent posterior collapse |
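To illustrate the fusion options: a Product of Experts combines the per-modality Gaussian posteriors by multiplying their densities, which amounts to precision-weighted averaging (more confident modalities count more), whereas `moe_average` is a plain mean. A standalone numpy sketch of the PoE rule (a conceptual illustration, not the package internals):

```python
import numpy as np

def poe_fuse(means, variances):
    """Product of Gaussian experts: precision-weighted combination."""
    means, variances = np.asarray(means), np.asarray(variances)
    precisions = 1.0 / variances
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mean = fused_var * (precisions * means).sum(axis=0)
    return fused_mean, fused_var

# Two modalities' posteriors over one latent dimension
mean, var = poe_fuse(means=[0.0, 2.0], variances=[1.0, 1.0])
# Equal confidence -> fused mean 1.0, fused variance 0.5
```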
**Topic model methods**

- `get_topic_words()` – top words per topic
- `get_covariate_words()` – word shifts by `content` covariates
- `get_top_docs()` – representative documents
- `get_topic_word_distribution()` – topic-word matrix
- `get_covariate_word_distribution()` – word shift matrix
- `plot_topic_word_distribution()` – word clouds / bar plots
- `visualize_docs()` – document embeddings (UMAP, t-SNE, PCA)
- `visualize_words()` – word embeddings
- `visualize_topics()` – topic embeddings
**Ideal point model methods**

- `get_ideal_points()` – ℝⁿ latent space
- `get_predictions()` – supervised output
- `get_modality_weights()` – fusion weights (PoE or gating)
Check out the example notebooks to get started.
Download sample data to run some notebooks: Congressional Speeches CSV
- *Deep Latent Variable Models for Unstructured Data*, Germain Gauthier, Philine Widmer, Elliott Ash (2025)
- *The Neural Ideal Point Model*, Germain Gauthier, Hugo Subtil, Philine Widmer (2025)
This package is under active development 🚧 — feedback and contributions welcome!