You’re building an AI agent. Not just a simple chatbot - a real agent that needs to remember conversations, search through knowledge bases, and maintain state across multiple steps.
Here’s what usually happens: You start with a simple script. Then you need conversation history. So you add a list. Then you need to search documents. So you add another data structure. Then you need to persist state across restarts. So you add a database. Then you realize your code is a tangled mess of state management, database calls, and business logic all mixed together.
Sound familiar?
Here’s a better approach: use LangGraph for agent orchestration and Beanis for state management and vector storage.
LangGraph gives you a clean way to define agent workflows as graphs. Each step is a node. State flows between nodes. You can visualize it, debug it, and modify it without rewriting everything.
Beanis gives you a Redis-backed ODM (Object Document Mapper) with built-in vector search. Store documents, embeddings, conversation history, and agent state in Redis with a clean Python API. No manual serialization, no key management headaches.
Together? You get stateful AI agents that actually work in production.
Let’s build a RAG agent that answers questions over a knowledge base, remembers the conversation across turns, and persists everything in Redis.

Input/Output example:
```
INPUT:  "How many students are at Notre Dame?"
OUTPUT: "In 2014, the Notre Dame student body consisted of 12,179 students."
[Retrieved 3 relevant documents from 100 stored]
```
The complete code is ~200 lines. And it actually works.
```
User Query
    ↓
┌─────────────────────────────────────┐
│         LangGraph Workflow          │
│                                     │
│   ┌─────────────────────────────┐   │
│   │  1. Retrieve Context        │   │
│   │     (Vector Search)         │   │
│   └────────────┬────────────────┘   │
│                ↓                    │
│   ┌─────────────────────────────┐   │
│   │  2. Load History            │   │
│   │     (From Redis)            │   │
│   └────────────┬────────────────┘   │
│                ↓                    │
│   ┌─────────────────────────────┐   │
│   │  3. Generate Response       │   │
│   │     (OpenAI + Context)      │   │
│   └────────────┬────────────────┘   │
│                ↓                    │
│   ┌─────────────────────────────┐   │
│   │  4. Save to History         │   │
│   │     (Persist in Redis)      │   │
│   └─────────────────────────────┘   │
└─────────────────────────────────────┘
```
Each node runs independently. State flows through the graph. If a node fails, you can retry it. Want to add a new step? Add a node and wire it up. No spaghetti code.
With Beanis, you define models like Pydantic classes. The magic? Vector fields and automatic indexing.
```python
from datetime import datetime
from typing import List, Optional

from pydantic import Field
from typing_extensions import Annotated

from beanis import Document, VectorField


class KnowledgeDocument(Document):
    """Document with vector embeddings for RAG"""
    title: str
    context: str
    question: Optional[str] = None
    # Vector embedding (1536 dims for OpenAI text-embedding-3-small)
    # See: https://platform.openai.com/docs/guides/embeddings
    embedding: Annotated[List[float], VectorField(dimensions=1536)]
    source: str = "squad"
    created_at: datetime = Field(default_factory=datetime.now)

    class Settings:
        name = "knowledge_docs"


class ConversationHistory(Document):
    """Conversation history for context-aware responses"""
    session_id: str
    role: str  # "user" or "assistant"
    content: str
    timestamp: datetime = Field(default_factory=datetime.now)
    retrieved_docs: Optional[List[str]] = None

    class Settings:
        name = "conversations"
```
That’s it. Beanis handles key generation, serialization, and index creation under the hood.

Next, load data from the SQuAD dataset and store it in Redis:
```python
import redis
from datasets import load_dataset
from langchain_openai import OpenAIEmbeddings

from beanis import init_beanis


async def ingest_data(api_key: str):
    # Connect to Redis
    redis_client = redis.Redis(host="localhost", port=6379, decode_responses=False)

    # Initialize Beanis (one line - handles all Redis indexes automatically)
    await init_beanis(database=redis_client, document_models=[KnowledgeDocument])

    # Load embeddings (using OpenAI's text-embedding-3-small model)
    # This model generates 1536-dimensional vectors optimized for semantic search
    # Alternatives: text-embedding-3-large (3072 dims), text-embedding-ada-002 (1536 dims)
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small", openai_api_key=api_key)

    # Load SQuAD dataset (100 Wikipedia passages about various topics)
    dataset = load_dataset("rajpurkar/squad", split="train")

    # Ingest documents
    for example in dataset.select(range(100)):  # First 100 for demo
        embedding = embeddings.embed_query(example["context"])
        doc = KnowledgeDocument(
            title=example["title"],
            context=example["context"],
            question=example["question"],
            embedding=embedding,
        )
        await doc.insert()  # One line: saves to Redis + creates vector index
```
Why Beanis saves you time here: doing this manually, you’d serialize the embedding to bytes, construct keys, run FT.CREATE, and handle errors yourself. With Beanis it’s a single `await doc.insert()` - everything happens automatically, no raw FT.CREATE commands.

Run this once, and you’ve got 100 documents with embeddings in Redis, ready for semantic search.
Now for the interesting part - the agent workflow:
```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langgraph.graph import END, StateGraph

from beanis.odm.indexes import IndexManager


class RAGAgent:
    def __init__(self, redis_client, openai_api_key: str):
        self.redis_client = redis_client
        self.embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            openai_api_key=openai_api_key,
        )
        self.llm = ChatOpenAI(
            model="gpt-4o-mini",
            temperature=0.7,
            openai_api_key=openai_api_key,
        )
        # Build the workflow graph
        self.graph = self._build_graph()

    def _build_graph(self) -> StateGraph:
        """Define the agent workflow"""
        workflow = StateGraph(RAGAgentState)

        # Define nodes (steps)
        workflow.add_node("retrieve_context", self._retrieve_context)
        workflow.add_node("load_history", self._load_conversation_history)
        workflow.add_node("generate_response", self._generate_response)
        workflow.add_node("save_history", self._save_conversation)

        # Define edges (flow)
        workflow.set_entry_point("retrieve_context")
        workflow.add_edge("retrieve_context", "load_history")
        workflow.add_edge("load_history", "generate_response")
        workflow.add_edge("generate_response", "save_history")
        workflow.add_edge("save_history", END)

        return workflow.compile()
```
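The graph is typed over a `RAGAgentState` schema that the snippets don’t show. A minimal sketch, with field names inferred from how the nodes read and write state (not necessarily the repo’s exact definition):

```python
from typing import List
from typing_extensions import TypedDict


class RAGAgentState(TypedDict, total=False):
    query: str                        # the user's question
    session_id: str                   # identifies the conversation
    retrieved_docs: List[str]         # document IDs returned by vector search
    retrieved_context: str            # concatenated retrieved passages
    conversation_history: List[dict]  # prior turns as {"role", "content"}
    final_response: str               # the generated answer
```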
Clean, right? Each node is a method. State flows through the graph. Want to add a fact-checking step? Add a node between generate_response and save_history. Want to run multiple retrievers in parallel? Make multiple entry points and combine results in a merge node.
```python
async def _retrieve_context(self, state: RAGAgentState) -> RAGAgentState:
    """Retrieve relevant documents using vector similarity"""
    # Generate query embedding
    query_embedding = self.embeddings.embed_query(state["query"])

    # Search Redis using Beanis
    results = await IndexManager.find_by_vector_similarity(
        redis_client=self.redis_client,
        document_class=KnowledgeDocument,
        field_name="embedding",
        query_vector=query_embedding,
        k=3,  # Top 3 results
    )

    # Fetch documents
    retrieved_texts = []
    doc_ids = []
    for doc_id, score in results:
        doc = await KnowledgeDocument.get(doc_id)
        if doc:
            retrieved_texts.append(f"Context: {doc.context}")
            doc_ids.append(str(doc.id))

    combined_context = "\n\n".join(retrieved_texts)

    return {
        **state,
        "retrieved_docs": doc_ids,
        "retrieved_context": combined_context,
    }
```
Beanis handles the Redis FT.SEARCH commands for you. You just call find_by_vector_similarity and get results. No manual index management, no raw Redis commands.
```python
async def _load_conversation_history(self, state: RAGAgentState) -> RAGAgentState:
    """Load recent conversation from Redis"""
    # Get last 5 messages for this session
    history_docs = await ConversationHistory.find_many(
        ConversationHistory.session_id == state["session_id"],
        sort=[("timestamp", -1)],
        limit=5,
    )

    conversation_history = [
        {"role": doc.role, "content": doc.content}
        for doc in reversed(history_docs)
    ]

    return {**state, "conversation_history": conversation_history}
```
This is just querying Redis, but Beanis makes it look like an ORM. Filter by session_id, sort by timestamp, limit results. Clean API, no manual key construction.
```python
from langchain_core.messages import AIMessage  # marks prior assistant turns


async def _generate_response(self, state: RAGAgentState) -> RAGAgentState:
    """Generate response using LLM with context"""
    messages = [
        SystemMessage(content=f"""You are a helpful AI assistant.
Answer based on this context: {state["retrieved_context"]}""")
    ]

    # Add conversation history (assistant turns use AIMessage, not
    # SystemMessage, so the model sees correct role attribution)
    for msg in state.get("conversation_history", []):
        if msg["role"] == "user":
            messages.append(HumanMessage(content=msg["content"]))
        else:
            messages.append(AIMessage(content=msg["content"]))

    # Add current query
    messages.append(HumanMessage(content=state["query"]))

    # Generate
    response = await self.llm.ainvoke(messages)

    return {**state, "final_response": response.content}
```
Standard LangChain stuff. The key is that state flows naturally through the graph.
```python
async def _save_conversation(self, state: RAGAgentState) -> RAGAgentState:
    """Save conversation to Redis"""
    # Save user message
    user_msg = ConversationHistory(
        session_id=state["session_id"],
        role="user",
        content=state["query"],
    )
    await user_msg.insert()

    # Save assistant response
    assistant_msg = ConversationHistory(
        session_id=state["session_id"],
        role="assistant",
        content=state["final_response"],
        retrieved_docs=state.get("retrieved_docs", []),
    )
    await assistant_msg.insert()

    return state
```
Two inserts. That’s it. Beanis handles serialization, timestamp generation, everything.
```python
# Initialize
redis_client = redis.Redis(host="localhost", port=6379, decode_responses=False)
await init_beanis(
    database=redis_client,
    document_models=[KnowledgeDocument, ConversationHistory],
)
agent = RAGAgent(redis_client=redis_client, openai_api_key=api_key)

# Query
result = await agent.query(
    query="What universities are mentioned?",
    session_id="user-123",
)
print(result["response"])
# Output: "The university mentioned is the University of Notre Dame."
```
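The `query` entry point called here isn’t shown in the snippets above. A plausible sketch - the method name and return shape are inferred from the calling code, not taken from the repo - simply seeds the state and runs the compiled graph:

```python
class RAGAgentQueryMixin:
    """Sketch of the query entry point (assumed, not the repo's exact code)."""

    async def query(self, query: str, session_id: str) -> dict:
        # Seed the initial state and run the LangGraph workflow end to end
        final_state = await self.graph.ainvoke(
            {"query": query, "session_id": session_id}
        )
        return {
            "response": final_state["final_response"],
            "retrieved_docs": final_state.get("retrieved_docs", []),
        }
```

Because the compiled graph exposes `ainvoke`, the wrapper stays a few lines: all the real work happens in the nodes.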
Real Examples from the SQuAD Dataset:
```
INPUT:  "Tell me about education"
OUTPUT: "Education encompasses primary, secondary, and higher education levels.
         In formal education, structured systems prepare individuals for the
         workforce and promote social cohesion..."
[Retrieved 3 documents, 530 queries/second]

INPUT:  "What year is mentioned?"
OUTPUT: "The year mentioned is 1879, specifically in the context of a fire
         that destroyed the Main Building and library collection."
[Search took 1.89ms]

INPUT:  "How many students are there?"
OUTPUT: "In 2014, the Notre Dame student body consisted of 12,179 students."
[Vector search: 27x faster than naive Python comparison]
```
That’s it. On every query, the agent retrieves relevant documents, loads conversation history, generates a grounded response, and persists the exchange.
All state is persistent. Restart your app? History is still there. Scale horizontally? Multiple instances share the same Redis.
Clear workflow visualization: You can literally draw your agent’s logic as a graph. New team member? Show them the graph. Debugging? Trace through the graph.
Easy to extend: Want to add a fact-checking step? Add a node. Want parallel retrieval from multiple sources? Add parallel entry points. Want conditional logic? Add conditional edges.
Stateful by design: LangGraph manages state flow between nodes. No global variables, no passing dictionaries through 10 functions.
Error handling: Node failed? Retry it. Want to checkpoint state? Built-in support.
Fewer lines, less complexity: Compare the approaches:
```python
# Without Beanis (manual Redis):
import struct
import uuid

# 1. Construct key manually
key = f"doc:{uuid.uuid4()}"

# 2. Serialize embedding to bytes
embedding_bytes = struct.pack(f"{len(embedding)}f", *embedding)

# 3. Create hash manually
await redis.hset(key, mapping={
    "title": title,
    "context": context,
    "embedding": embedding_bytes,
})

# 4. Create vector index manually
await redis.execute_command(
    "FT.CREATE", "idx", "ON", "HASH", "PREFIX", "1", "doc:",
    "SCHEMA", "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32", "DIM", "1536", "DISTANCE_METRIC", "COSINE",
)
# Total: ~15 lines per document type, error-prone

# With Beanis:
doc = KnowledgeDocument(title=title, context=context, embedding=embedding)
await doc.insert()
# Total: 2 lines, indexes created automatically
```
No key management: You define models. Beanis generates Redis keys. Update a document? Beanis updates the right hash and indexes.
Vector search included: Other Redis libraries? You’re writing raw FT.SEARCH commands. Beanis? Call find_by_vector_similarity.
Type safety: Pydantic validation on all fields. Try to insert invalid data? Fails before hitting Redis.
Async native: Everything is async. No blocking calls, no thread pools.
Just Redis: No RedisJSON module needed. No RediSearch setup (though it uses it). Works with vanilla Redis or Redis Stack.
You get stateful agents with persistent memory, vector search, and clean orchestration. All backed by Redis, which you’re probably already running.
Run multiple search strategies simultaneously for better results:
```python
# Add multiple retrieval nodes
workflow.add_node("retrieve_semantic", self._retrieve_semantic)  # Vector search
workflow.add_node("retrieve_keyword", self._retrieve_keyword)    # Full-text search
workflow.add_node("combine_results", self._combine_results)

# Both run in parallel
workflow.set_entry_point("retrieve_semantic")
workflow.set_entry_point("retrieve_keyword")

# Merge results
workflow.add_edge("retrieve_semantic", "combine_results")
workflow.add_edge("retrieve_keyword", "combine_results")


async def _retrieve_keyword(self, state):
    """Full-text search using Redis FT.SEARCH"""
    # Beanis also supports full-text search on regular fields
    results = await KnowledgeDocument.find_many(
        KnowledgeDocument.context.contains(state["query"]),
        limit=3,
    )
    return {**state, "keyword_results": results}


async def _combine_results(self, state):
    """Merge semantic + keyword results"""
    all_docs = state["retrieved_docs"] + state["keyword_results"]
    # Deduplicate and rerank
    unique_docs = list({doc.id: doc for doc in all_docs}.values())
    return {**state, "combined_docs": unique_docs[:5]}
```
LangGraph handles parallel execution automatically. This hybrid approach (semantic + keyword) often beats pure vector search, especially for technical terms or proper nouns. Learn more about Redis full-text search.
Add decision points:
```python
def _should_search_web(self, state):
    """Decide if we need web search"""
    if not state["retrieved_context"]:
        return "web_search"
    return "generate_response"


workflow.add_conditional_edges(
    "retrieve_context",
    self._should_search_web,
    {
        "web_search": "web_search",
        "generate_response": "generate_response",
    },
)
```
Route based on state.
Save intermediate state:
```python
class AgentCheckpoint(Document):
    session_id: str
    current_step: str
    state_data: dict
    timestamp: datetime = Field(default_factory=datetime.now)


# Save after each node
async def _checkpoint_state(self, state):
    checkpoint = AgentCheckpoint(
        session_id=state["session_id"],
        current_step="generate_response",
        state_data=state,
    )
    await checkpoint.insert()
```
Restart from any point.
Benchmarked on an M1 Mac with 100 documents: the Redis operations (vector search, history reads and writes) complete in a few milliseconds and are negligible. The bottleneck is the LLM call, which is unavoidable.
Memory: ~4KB per document with embeddings. 100 docs = ~400KB. 10K docs = ~40MB. Redis can easily handle millions.
Forgetting to initialize Beanis: You need to call init_beanis() before using document models. Do it once at app startup.
Wrong embedding dimensions: Make sure your VectorField(dimensions=...) matches your embedding model. OpenAI text-embedding-3-small is 1536 dimensions.
Not handling async properly: Everything in Beanis and LangGraph is async. Use await, run in asyncio.run(), don’t mix sync and async.
Stale conversation history: If your conversations get really long, limit what you load. Don’t pass 100 messages to the LLM - it’s expensive and slow.
Vector search returning nothing: Your query needs to be embedded with the same model you used for documents. Different model = different vector space = no matches.
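A cheap guard against the dimension-mismatch pitfall above is to fail fast before inserting. This helper is illustrative, not part of Beanis:

```python
EXPECTED_DIMS = 1536  # must match VectorField(dimensions=...) on the model


def validate_embedding(embedding, expected_dims=EXPECTED_DIMS):
    """Raise early if an embedding won't fit the declared vector index."""
    if len(embedding) != expected_dims:
        raise ValueError(
            f"Embedding has {len(embedding)} dims, index expects {expected_dims}"
        )
    return embedding
```

Call it on every vector before `doc.insert()`; a `ValueError` at ingestion time is far easier to debug than a silent empty search result.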
Full working example: github.com/andreim14/beanis-examples/tree/main/langgraph-agent
```shell
git clone https://github.com/andreim14/beanis-examples.git
cd beanis-examples/langgraph-agent

# Install
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt

# Start Redis
docker run -d -p 6379:6379 redis:latest

# Set API key
echo "OPENAI_API_KEY=your-key-here" > .env

# Ingest data
python ingest_data.py

# Run agent
python main.py
```
The example includes the ingestion script (`ingest_data.py`) and the full agent (`main.py`), ready to run.
Good fit: stateful agents that need persistent memory and semantic search - RAG assistants, support bots with conversation history, multi-step workflows - especially if you already run Redis.

Not a fit: stateless, single-turn LLM calls with no retrieval, or workloads that need a full document database rather than Redis.
Building stateful AI agents doesn’t have to be messy. LangGraph gives you clean workflow orchestration. Beanis gives you persistent state and vector search with a clean API. Together, they let you build production-ready agents in a few hundred lines of code.
No manual state management. No key construction. No serialization headaches. Just define your workflow, define your models, and write your business logic.
Everything is on GitHub. The code works. Try it.
Built with ❤️ by Andrei Stefan Bejgu - AI Applied Scientist @ SylloTips
Look, I’m going to be straight with you: most vision-language models are living in an ImageNet bubble, and Concept-pedia proves it.
We built a massive dataset with 165,000+ semantically-annotated concepts and found something wild - models that supposedly achieve “human-level” performance on standard benchmarks completely fall apart when you test them on real-world visual diversity.
What we’re releasing: the Concept-pedia dataset (165,000+ semantically-annotated concepts), the manually-curated Concept-10k benchmark, and three fine-tuned SigLIP models - all on Hugging Face.
The bottom line: If your ImageNet accuracy is 80% but your Concept-10k score is 45%, you don’t have a general vision model - you have an ImageNet classifier. Time to fix that.
For over a decade, ImageNet has been the gold standard for computer vision. Its 1,000 categories became THE benchmark everyone optimized for.
But here’s the thing: the real world doesn’t have just 1,000 visual concepts.
Try asking state-of-the-art models about concepts outside ImageNet’s distribution and watch what happens. That model bragging about 85% ImageNet accuracy? It’ll confidently tell you a Bombay cat is just a “black cat” and an Allen wrench is a “screwdriver.” Not great when you’re building real applications.
We’re not talking about obscure edge cases here. These are everyday objects that humans recognize instantly. The problem? The entire field has been optimizing for a test that doesn’t reflect reality.
I’m thrilled to share our paper “Concept-pedia: A Wide-coverage Semantically-annotated Multimodal Dataset”, published at EMNLP 2025 - the Conference on Empirical Methods in Natural Language Processing.
Unlike most datasets that just throw images and labels together, we built Concept-pedia on top of BabelNet - the world’s largest multilingual semantic network. What does that mean practically? Every single concept comes with definitions, relationships to other concepts, and support for multiple languages. It’s not “here’s a picture, here’s what we think it is” - it’s “here’s a concept that exists in a web of human knowledge, and here’s what it looks like.”
And we’re not talking about 1,000 ImageNet categories repeated in different poses. We have concepts ranging from specific cat breeds to architectural elements to types of pasta you’ve probably never heard of.
Creating a huge dataset is one thing. Making sure it’s actually useful? That’s different. We manually went through and curated Concept-10k - 10,000 concepts that are diverse, human-verified, and designed to test whether models actually understand visual concepts or just memorized ImageNet.
We had expert annotators verify every single image. Multiple rounds. We made sure the difficulty was balanced (mix of easy, medium, and genuinely hard examples) and that we covered the full range of semantic categories. This isn’t a toy benchmark - when models fail here, it tells you something real about their limitations.
Most vision-language datasets give you image-text pairs. Cool. We give you that PLUS the semantic relationships. Hypernymy (is-a relationships), meronymy (part-of relationships), connections to Wikipedia, WordNet, you name it.
This isn’t just for show - having this structure means you can actually reason about concepts, not just pattern match. Your model can understand that a “Bombay cat” is a type of “cat” which is a type of “feline” which is a type of “mammal.” Try doing that with CLIP trained on web-scraped captions.
Our experiments reveal a critical issue: modern vision-language models are heavily anchored to ImageNet.
When we evaluate state-of-the-art models on Concept-10k:
| Model | ImageNet Performance | Concept-10k Performance | Drop |
|---|---|---|---|
| CLIP (ViT-L/14) | 75.5% | 42.3% | -33.2% |
| ALIGN | 76.4% | 43.8% | -32.6% |
| OpenCLIP | 78.2% | 45.1% | -33.1% |
Performance drops by over 30 points when tested on diverse concepts!
Three words: we’ve been lazy. Well, not lazy exactly - but we’ve been optimizing for the wrong thing for so long that nobody questioned it.
Most vision-language models get trained on data that looks suspiciously like ImageNet. Maybe the images come from the web instead of Flickr, but the distribution? Pretty similar. Common objects. Western-centric. Same biases, bigger scale.
Then we evaluate on… ImageNet. Or benchmarks that are basically “ImageNet but slightly different.” We’ve been testing on variations of the same exam for a decade, and then acting surprised when our models can’t handle concepts outside that narrow bubble.
The real problem? Those impressive benchmark scores gave everyone a false sense of progress. “Look, we hit 85% on ImageNet!” Cool, but can your model tell a moka pot from a french press? Because my grandma can, and she’s never seen a neural network in her life.
Let’s see where models fail:
- Concept: “Allen wrench” (a specific type of hex key)
- Concept: “Bombay cat” (a specific cat breed)
- Concept: “Takoyaki pan” (Japanese cooking equipment)
These aren’t edge cases - they’re everyday objects that humans recognize instantly.
BabelNet is massive - we’re talking about millions of concepts across hundreds of languages. But not every concept is visual. “Democracy”? Great concept, hard to photograph. So we had to filter.
We started with their full knowledge graph and pulled out concepts that actually have clear visual representations. Things you can point a camera at. That still left us with 165,000+ concepts spanning everything from animals to architecture to food to specialized tools.
The key was maintaining the semantic annotations through this process. We didn’t just want labels - we wanted the full context: definitions, relationships, multilingual mappings, connections to Wikipedia. All of it.
Finding images for 165,000 concepts isn’t trivial. We queried multiple sources for each concept, then hit them with automatic quality filters (blurry images? Gone. Watermarks everywhere? Nope.). We checked for diversity too - different angles, lighting conditions, contexts. Nobody wants a cat breed dataset where every photo is a professional studio shot.
Deduplication was huge. The internet loves copying the same image everywhere, so we had to be aggressive about catching duplicates.
For the evaluation benchmark, automation wasn’t enough. We brought in expert annotators and had them verify every single image across 10,000 concepts. Multiple rounds of review. We weren’t just checking “is this the right label?” - we were checking “is this actually a good example? Is it ambiguous? Would a human struggle with this?”
We also calibrated difficulty. Some concepts are easy (most people can spot a golden retriever). Some are hard (distinguishing between types of wrenches requires domain knowledge). The benchmark needed both.
Remember those 30+ point drops in performance? That’s not a bug, it’s the whole point. Models don’t just perform “a bit worse” on unfamiliar concepts - they completely faceplant. And here’s the kicker: the concepts they’re failing on aren’t even more visually complex than ImageNet categories. A Bombay cat isn’t harder to recognize than an Egyptian cat. The model just never learned to care about that distinction.
When we compared models that use semantic annotations vs pure vision-language pretraining, the difference was clear. Having access to the knowledge graph - understanding that concepts have relationships and hierarchies - legitimately helps with generalization.
It’s almost like… treating visual understanding as part of broader knowledge helps you understand things better? Shocking, I know.
If there’s one thing that consistently breaks modern vision models, it’s fine-grained understanding. Specific dog breeds? Nope. Different types of the same tool? Forget it. Region-specific cultural objects? Not a chance.
Medical instruments, technical equipment, subspecies of animals - these are all areas where models basically give up and output the closest generic category they know. It’s like asking someone who only studied from flashcards to handle nuance. They can’t.
I know the instinct is “just add more data” but that’s not it. We tested this. Throwing more examples of the same distribution at the problem doesn’t fix the fundamental issue.
What you need is semantic diversity, not scale. A million more images of “dog” doesn’t teach your model about specific breeds if all those images are labelled “dog.” You need the structure, the relationships, the actual understanding that different concepts exist and matter.
That CLIP model you’re using that claims 80% ImageNet accuracy? In your specific domain, it might be sitting at 45%. Or worse.
I’ve seen people deploy models in production based purely on ImageNet scores, then act shocked when the thing can’t tell medical instruments apart or consistently fails on region-specific products. Test on data that actually looks like what you’ll see in production, not the same academic benchmarks everyone else uses.
If you’re working in healthcare, industrial inspection, e-commerce with diverse products, cultural heritage - basically anything that isn’t “generic web images” - you need to assume standard models will underperform.
Fine-tuning helps, but it’s not magic. You’re still building on a foundation that fundamentally doesn’t understand fine-grained distinctions. Better approach? Start with models that have semantic grounding (like ours) or invest in seriously good domain-specific data collection.
And for the love of god, evaluate on YOUR concepts, not ImageNet. Your stakeholders don’t care if the model knows “Egyptian cat” when your actual use case needs to distinguish between different manufacturing defects.
Image-text correlation can only get you so far. When you incorporate actual semantic knowledge - hierarchies, relationships, definitions - generalization improves dramatically.
Think about it: if your model knows that “Siamese cat” is-a “cat” is-a “feline” is-a “mammal,” it can reason about things it’s never seen. Without that structure, it’s just pattern matching pixels to tokens and hoping for the best.
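That kind of hierarchy reasoning is easy to sketch. The chain below is toy data for illustration only - real lookups would traverse BabelNet’s knowledge graph:

```python
# Toy is-a edges (illustrative; BabelNet holds the real relations)
HYPERNYMS = {
    "Siamese cat": "cat",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def hypernym_chain(concept):
    """Walk the is-a chain from a concept up to the root."""
    chain = [concept]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def is_a(concept, ancestor):
    """True if `ancestor` appears anywhere above `concept` in the hierarchy."""
    return ancestor in hypernym_chain(concept)[1:]
```

A model with access to this structure can infer facts about a “Siamese cat” it has never seen an image of, simply because it knows where the concept sits.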
Concept-pedia isn’t just a dataset - it’s a different way of thinking about visual understanding.
For researchers, it means you can finally test your models on something other than ImageNet variants. 165K+ concepts spanning actual diversity. When your model fails, Concept-10k tells you exactly where and why - fine-grained categories? Cultural concepts? Specialized domains? You’ll know.
And because everything’s grounded in BabelNet, you can extend to multilingual scenarios without starting from scratch. The semantic structure is already there.
For training, the semantic annotations are the real value. Instead of just feeding models image-text pairs and hoping they figure out relationships, you can give them the structure directly. “This is a Bombay cat, which is a type of cat, which is a feline…” The hierarchy matters.
We’re expanding to 500K+ concepts for v2. We’re also working on temporal understanding (video concepts, not just static images) and spatial reasoning (3D object understanding).
We’re building an interactive evaluation platform so you can test your own models on Concept-10k without downloading everything. And we’re developing semantic-aware training methods that actually leverage the knowledge graph instead of just including it as metadata.
Look, the field spent a decade optimizing for ImageNet. Can’t blame anyone - it was the benchmark we had, and it drove real progress. But we’ve reached the point where ImageNet performance and real-world capability have diverged so much that the benchmark is actively misleading.
Concept-pedia is our push to evaluate on actual diversity, incorporate semantic knowledge instead of just pattern matching, and build for real-world deployment instead of academic leaderboards. The visual world has way more than 1,000 concepts. Our models should too.
This work was a collaborative effort with colleagues in the Sapienza NLP group.
Presented at EMNLP 2025 in Suzhou, China.
The entire Concept-pedia ecosystem is now available on Hugging Face, making it dead simple to use these models and datasets in your own projects. Whether you’re training a new vision-language model, evaluating your existing system, or just exploring the dataset, here’s everything you need to know.
We’ve released three fine-tuned SigLIP models and two comprehensive datasets:
Models (Vision-Language):
- `sapienzanlp/siglip-base-patch16-256-ft-concept-pedia` (0.2B params) - Fast and efficient
- `sapienzanlp/siglip-large-patch16-256-ft-concept-pedia` (0.7B params) - Better accuracy
- `sapienzanlp/siglip-so400m-patch14-384-ft-concept-pedia` (0.9B params) - Best performance

Datasets:

- `sapienzanlp/Concept-10k` - Text annotations and metadata (34.3K concepts)
- `sapienzanlp/Concept-10k-imgs` - Full image dataset with visual content (4.26 GB)

All models are trained on the full Concept-pedia dataset, giving them knowledge of 165K+ visual concepts beyond traditional ImageNet categories.
Here’s how to get started with zero-shot image classification using our models. This example shows you how to classify an image into one of several possible concepts:
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Load the base model (fastest option)
model_name = "sapienzanlp/siglip-base-patch16-256-ft-concept-pedia"
processor = AutoProcessor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Load your image
image = Image.open("your_image.jpg")

# Define candidate concepts - can be anything!
candidate_concepts = [
    "Bombay cat",
    "Persian cat",
    "Siamese cat",
    "Maine Coon cat",
    "tabby cat",
]

# Process the inputs
inputs = processor(
    text=candidate_concepts,
    images=image,
    return_tensors="pt",
    padding=True,
)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits_per_image
    probs = logits.softmax(dim=1)

# Print results
print("Classification results:")
for concept, prob in zip(candidate_concepts, probs[0]):
    print(f"  {concept}: {prob.item():.1%}")
```
The beauty of this approach? You can test any visual concept you want, not just the 1,000 categories in ImageNet. Want to distinguish between types of pasta, breeds of dogs, or specific tools? Just change the candidate_concepts list.
Remember that ImageNet anchor problem? Our models were trained specifically to avoid it. Instead of optimizing for ImageNet’s 1,000 categories, we trained on the full 165K concept distribution.
This means they can actually distinguish between specific cat breeds (Bombay cat vs Persian cat vs Scottish Fold), handle specialized domains (medical equipment, industrial tools, architectural elements), recognize culturally-specific objects, and work with long-tail concepts that most models have never seen.
They’re not perfect - nothing is - but they’re substantially better at real-world diversity than models anchored to ImageNet.
The dataset comes in two flavors - one with just metadata and one with images. Here’s how to load and explore them:
```python
from datasets import load_dataset

# Load the text/metadata dataset (lightweight)
dataset = load_dataset("sapienzanlp/Concept-10k")

# Look at the first example
example = dataset["test"][0]
print(f"Concept: {example['concept']}")
print(f"Category: {example['category']}")
print(f"Caption: {example['caption']}")
print(f"BabelNet ID: {example['bn_id']}")
```
Each entry includes the concept name (“Allen wrench”, “Bombay cat”, whatever), its semantic category (ARTIFACT, ANIMAL, FOOD, etc.), a natural language caption describing it, and a BabelNet ID that links it to the full knowledge graph. The image dataset adds the actual visual content.
For the full visual experience with images:
from datasets import load_dataset
from PIL import Image
# Load the image dataset
img_dataset = load_dataset("sapienzanlp/Concept-10k-imgs")
# Browse examples
for i in range(5):
    example = img_dataset['train'][i]

    # Access the image
    img = example['jpg']

    # Show or save it
    img.show()  # Opens in default viewer
    # Or save: img.save(f"concept_{i}.jpg")

    print(f"Image {i}: {example['__key__']}")
The image dataset is about 4.26 GB, so it might take a few minutes to download the first time. After that, it’s cached locally.
from transformers import AutoModel, AutoProcessor
from PIL import Image
import torch

def find_similar_concepts(query_image_path, concept_database):
    """
    Find the most similar concepts to a query image.

    Args:
        query_image_path: Path to query image
        concept_database: List of concept names to search

    Returns:
        Ranked list of (concept, score) tuples
    """
    # Load model
    model_name = "sapienzanlp/siglip-base-patch16-256-ft-concept-pedia"
    processor = AutoProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # Load image
    image = Image.open(query_image_path)

    # Process
    inputs = processor(
        text=concept_database,
        images=image,
        return_tensors="pt",
        padding=True
    )

    # Get scores
    with torch.no_grad():
        outputs = model(**inputs)
        scores = outputs.logits_per_image[0].softmax(dim=0)

    # Rank results
    results = sorted(
        zip(concept_database, scores.tolist()),
        key=lambda x: x[1],
        reverse=True
    )
    return results

# Example usage
concepts = [
    "espresso machine", "coffee grinder", "french press",
    "moka pot", "pour over coffee maker", "cold brew maker"
]
results = find_similar_concepts("kitchen_appliance.jpg", concepts)

print("Top 3 matches:")
for concept, score in results[:3]:
    print(f"  {concept}: {score:.1%}")
Use Concept-10k as a benchmark to test how well your model handles diverse concepts:
from datasets import load_dataset
from tqdm import tqdm
def evaluate_on_concept10k(your_model, your_processor):
    """Evaluate any vision-language model on Concept-10k"""
    # Load the image set plus the text metadata
    # (NOTE: this assumes the two datasets are row-aligned)
    dataset = load_dataset("sapienzanlp/Concept-10k-imgs")
    meta = load_dataset("sapienzanlp/Concept-10k")
    test_data = dataset['train']

    correct = 0
    total = 0

    # Group by concept for efficiency
    from collections import defaultdict
    concept_groups = defaultdict(list)
    for i, example in enumerate(test_data):
        concept = meta['test'][i]['concept']
        concept_groups[concept].append((example['jpg'], i))

    # Test each concept
    for concept, examples in tqdm(concept_groups.items()):
        for img, idx in examples:
            # Your model's prediction logic here
            prediction = your_model.predict(img)
            if prediction == concept:
                correct += 1
            total += 1

    accuracy = correct / total
    print(f"Accuracy on Concept-10k: {accuracy:.2%}")
    return accuracy
Want to understand what’s in the dataset? Here’s a quick analysis script:
from datasets import load_dataset
from collections import Counter
import matplotlib.pyplot as plt
# Load dataset
dataset = load_dataset("sapienzanlp/Concept-10k")
test_data = dataset['test']
# Analyze categories
categories = [ex['category'] for ex in test_data]
category_counts = Counter(categories)
# Plot distribution
plt.figure(figsize=(12, 6))
plt.bar(category_counts.keys(), category_counts.values())
plt.xticks(rotation=45, ha='right')
plt.title('Concept Distribution across Categories')
plt.xlabel('Category')
plt.ylabel('Number of Concepts')
plt.tight_layout()
plt.savefig('concept_distribution.png')
# Find longest concepts
concepts = [ex['concept'] for ex in test_data]
longest = sorted(concepts, key=len, reverse=True)[:10]
print("Longest concept names:")
for i, concept in enumerate(longest, 1):
    print(f"  {i}. {concept} ({len(concept)} chars)")
# Category breakdown
print(f"\nTotal categories: {len(category_counts)}")
print(f"Total concepts: {len(test_data)}")
print(f"Average concepts per category: {len(test_data) / len(category_counts):.1f}")
The full Concept-10k dataset has 34,345 rows spread across 28 semantic categories. We’re talking artifacts (tools, equipment), food (dishes, ingredients, cuisines), animals (species, breeds), plants, locations, structures, people (occupations, roles), organizations, diseases, substances, media, and more. Basically everything you might actually encounter in images.
The BabelNet ID (bn_id) in each entry is your gateway to the full knowledge graph. Through that ID, you can pull semantic relationships (is-a, part-of, related-to), get definitions in dozens of languages, and connect to Wikipedia, WordNet, and other structured resources. It’s not just “here’s a label” - it’s “here’s where this concept sits in human knowledge.”
Pick your model based on what you actually need. The base model (0.2B params) is fast enough for real-time stuff, the large model (0.7B) gives you better accuracy for production, and the SO400M model (0.9B) is when you need the absolute best performance and don’t care about inference speed.
The text dataset is tiny (few MB), downloads instantly. The image dataset is 4.26 GB, so first download takes a minute. If you’re memory-constrained, stream it:
# Stream the large dataset without downloading everything
dataset = load_dataset("sapienzanlp/Concept-10k-imgs", streaming=True)

# Take the first 100 examples from the stream
from itertools import islice

for example in islice(dataset['train'], 100):
    # Process each example here
    pass
For inference: batch your images together, use GPU if you have one (model.to("cuda")), and for god’s sake cache your processor and model instead of reloading them every time.
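That advice can be sketched as a small pattern (the helper names are mine, but the model ID is the one used throughout this post): load the model once behind a cache, move it to GPU when available, and score whole batches in one forward pass.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def load_pipeline(model_name="sapienzanlp/siglip-base-patch16-256-ft-concept-pedia"):
    """Load the model/processor once; every later call reuses the cached pair."""
    import torch
    from transformers import AutoModel, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device).eval()
    return model, processor, device

def classify_batch(image_paths, concepts):
    """Score a batch of images against the candidate concepts in one pass."""
    import torch
    from PIL import Image

    model, processor, device = load_pipeline()
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=concepts, images=images,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        # One row of probabilities per image
        return model(**inputs).logits_per_image.softmax(dim=1).cpu()
```

Thanks to `lru_cache`, the expensive `from_pretrained` calls happen exactly once per process, no matter how many batches you push through.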
Stop using ImageNet-trained models for everything. Seriously. Here’s when Concept-pedia is the better choice:
Your domain isn’t well-covered by ImageNet: Building a medical diagnosis tool? Industrial quality inspection system? Cultural heritage preservation app? ImageNet won’t cut it.
You need fine-grained recognition: If distinguishing between a Golden Retriever and a Labrador matters, or you need to tell apart a cappuccino from a flat white, you need fine-grained understanding.
You want actual zero-shot capability: Not “zero-shot on similar stuff to training data” but real zero-shot - throw any concept at it and get reasonable results.
You’re building multilingual systems: BabelNet integration means your visual concepts come with multilingual support out of the box.
You care about real-world diversity: ImageNet is super Western-centric. If you’re building for global users, you need concepts from different cultures.
You want semantic grounding: Connecting visual concepts to knowledge graphs unlocks explainability, reasoning, and integration with other AI systems.
Pitfall 1: Testing on ImageNet after training on Concept-pedia
If you fine-tune on Concept-pedia and then evaluate on ImageNet, you might see a performance drop. That’s expected! Concept-pedia is designed for broader coverage, not ImageNet-specific optimization.
Solution: Evaluate on Concept-10k or your specific domain, not ImageNet.
Pitfall 2: Using too many candidate concepts at once
The models work best with 10-100 candidate concepts per query. If you have 10,000+ concepts, consider using a retrieval stage first.
Solution: Use semantic search or clustering to narrow down candidates before classification.
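A minimal sketch of that two-stage idea, with toy 2-d vectors standing in for real text embeddings (the `shortlist` helper and the embeddings are illustrative, not part of the released models): embed every concept name once, rank by cosine similarity to a query embedding, and only run classification on the top-k survivors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def shortlist(query_vec, concept_vecs, k=50):
    """Return the k concept names most similar to the query embedding."""
    ranked = sorted(concept_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Toy 2-d "embeddings" just to show the flow
concept_vecs = {
    "espresso machine": [0.9, 0.1],
    "oak tree": [0.1, 0.9],
    "moka pot": [0.8, 0.2],
}
top = shortlist([1.0, 0.0], concept_vecs, k=2)
```

With 10,000+ concepts you would precompute `concept_vecs` offline, shortlist per query, and then hand only those candidates to the classifier.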
Pitfall 3: Assuming perfect accuracy on rare concepts
Even our models struggle with extremely rare or ambiguous visual concepts. They’re better than ImageNet-anchored models, but not perfect.
Solution: Use confidence thresholds and human-in-the-loop verification for critical applications.
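A simple way to wire that in (the `classify_with_threshold` helper is a hypothetical sketch; tune the threshold on your own validation data): accept the top prediction only when its probability clears a bar, and route everything else to human review.

```python
def classify_with_threshold(probs, threshold=0.5):
    """probs: dict mapping concept name -> softmax probability.

    Returns the top concept if it is confident enough,
    or None to signal that a human should verify."""
    concept, p = max(probs.items(), key=lambda kv: kv[1])
    return concept if p >= threshold else None
```

For example, `{"bombay cat": 0.82, "persian cat": 0.18}` is accepted, while a flat distribution like `{"bombay cat": 0.40, "persian cat": 0.35, "scottish fold": 0.25}` comes back as `None`.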
LangChain Integration:
from langchain.tools import Tool
from transformers import AutoModel, AutoProcessor
from PIL import Image
import torch

def create_concept_classifier_tool():
    model_name = "sapienzanlp/siglip-base-patch16-256-ft-concept-pedia"
    model = AutoModel.from_pretrained(model_name)
    processor = AutoProcessor.from_pretrained(model_name)

    def classify(image_path: str, concepts: str) -> str:
        # concepts should be comma-separated
        concept_list = [c.strip() for c in concepts.split(',')]
        image = Image.open(image_path)
        inputs = processor(
            text=concept_list, images=image,
            return_tensors="pt", padding=True
        )
        with torch.no_grad():
            probs = model(**inputs).logits_per_image.softmax(dim=1)[0]
        best = concept_list[int(probs.argmax())]
        return f"{best} ({probs.max().item():.1%})"

    return Tool(
        name="ConceptClassifier",
        func=classify,
        description="Classifies images into fine-grained visual concepts"
    )
FastAPI Endpoint:
from fastapi import FastAPI, File, UploadFile, Query
from transformers import AutoModel, AutoProcessor
from typing import List
from PIL import Image
from io import BytesIO
import torch

app = FastAPI()

# Load model at startup
@app.on_event("startup")
async def load_model():
    app.state.model = AutoModel.from_pretrained(
        "sapienzanlp/siglip-base-patch16-256-ft-concept-pedia"
    )
    app.state.processor = AutoProcessor.from_pretrained(
        "sapienzanlp/siglip-base-patch16-256-ft-concept-pedia"
    )

@app.post("/classify")
async def classify_image(
    file: UploadFile = File(...),
    concepts: List[str] = Query(["cat", "dog", "bird"])
):
    # Read image
    image_bytes = await file.read()
    image = Image.open(BytesIO(image_bytes))

    # Classify
    inputs = app.state.processor(
        text=concepts,
        images=image,
        return_tensors="pt",
        padding=True
    )
    with torch.no_grad():
        outputs = app.state.model(**inputs)
        probs = outputs.logits_per_image.softmax(dim=1)[0]

    results = {
        concept: float(prob)
        for concept, prob in zip(concepts, probs)
    }
    return {"predictions": results}
Everything is freely available for research and commercial use:
Hugging Face Resources:
Paper:
If you’re in research, Concept-10k gives you a benchmark that actually tests real-world generalization instead of ImageNet memorization. The semantic annotations let you train models that learn structured knowledge, not just pixel-text correlations. And when models fail, you can diagnose exactly which concept types are problematic.
If you’re building production systems, this is your reality check. Test on Concept-10k before deploying, incorporate the semantic structure if you can, and understand your model’s limitations before your users find them for you.
For the field overall, we need to shift evaluation beyond ImageNet-centric metrics. We need to integrate vision with knowledge graphs. We need to care about long-tail concepts and real-world diversity. Concept-pedia is one step in that direction.
We spent a decade building models that ace ImageNet and fail in the real world. That 30+ point performance drop on Concept-10k? That’s the gap between what we think our models can do and what they actually can do.
Concept-pedia gives you 165K+ semantically-annotated concepts for training, Concept-10k for honest evaluation, and evidence that our current approaches are way more limited than the benchmarks suggested. The semantic structure shows a path forward - combine vision with knowledge graphs instead of just scaling up image-text pairs.
All the code and data is on Hugging Face. The models are ready to use. The benchmark is waiting.
Time to build multimodal AI that actually handles real-world visual diversity, not just ImageNet variations.
If you use Concept-pedia in your research, please cite our paper:
@inproceedings{ghonim-etal-2025-conceptpedia,
title = "Concept-pedia: A Wide-coverage Semantically-annotated Multimodal Dataset",
author = "Ghonim, Karim and
Bejgu, Andrei Stefan and
Fern{\'a}ndez-Castro, Alberte and
Navigli, Roberto",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.1745/",
pages = "34405--34426",
}
Plain text citation:
Karim Ghonim, Andrei Stefan Bejgu, Alberte Fernández-Castro, and Roberto Navigli. 2025. Concept-pedia: A Wide-coverage Semantically-annotated Multimodal Dataset. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34405–34426, Suzhou, China. Association for Computational Linguistics.
Published at EMNLP 2025 - Conference on Empirical Methods in Natural Language Processing, Suzhou, China
Built with ❤️ by Andrei Stefan Bejgu - AI Applied Scientist @ SylloTips
Let’s say you’re building a food delivery app. User opens the app in downtown Rome, and they want to see Italian restaurants within 2km. Simple enough, right?
Here’s your PostgreSQL query with PostGIS (the standard geo-spatial extension for PostgreSQL):
SELECT *,
       ST_Distance(location, ST_MakePoint(12.4922, 41.8902)) AS distance
FROM restaurants
WHERE ST_DWithin(
        location,
        ST_MakePoint(12.4922, 41.8902),
        2000  -- 2km in meters
      )
  AND cuisine = 'italian'
  AND rating >= 4.5
ORDER BY distance
LIMIT 20;
This query takes 750 milliseconds. On a modern database server. With indexes.
“750ms isn’t that bad,” you might think. Let me tell you why you’re wrong.
Your app has 10,000 concurrent users during lunch rush. Each one opening the app, scrolling around, changing filters. That’s not 10,000 queries - that’s more like 50,000 queries in a few minutes as users pan the map and adjust their search.
PostgreSQL can handle maybe 150 of these geo-spatial queries per second on decent hardware. You need 830 queries per second. Your database is now 5.5x overloaded, CPU pinned at 100%, queries timing out, and users seeing that spinning loader that makes them switch to a competitor.
The problem? PostGIS calculations are computationally expensive. For every single query, it has to compute the spherical distance from the user to each candidate restaurant, filter out everything beyond the radius, and sort the survivors by distance.
It’s doing all this math in real-time, from scratch, every single time. Your database server’s CPU is melting just to tell someone that La Carbonara is 450 meters away.
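For intuition, here is roughly the kind of math that runs per candidate row - a standard haversine great-circle distance in pure Python (PostGIS’s geodesic code is more exact, but the cost profile is the same: a handful of trig calls per row, per query).

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Query point near the Colosseum vs. La Carbonara's coordinates
d = haversine_m(41.8902, 12.4922, 41.8933, 12.4829)
```

Cheap for one pair, but multiply by thousands of rows and tens of thousands of queries and the CPU bill adds up fast.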
Sure. You could throw money at it. Add read replicas. Maybe shard by geography. Get bigger servers with more CPU cores.
At peak load, you’d need roughly 500 database connections running geo-spatial calculations simultaneously. That’s not cheap. And you’re still doing the same expensive calculations over and over for queries that barely change (restaurant locations don’t move much).
There’s a better way, and it doesn’t involve explaining to your CTO why the database bill is suddenly five figures a month.
Cache the hot queries in Redis. Not the data - the actual geo-spatial indexes.
Redis has built-in geo-spatial commands (GEOADD, GEORADIUS) that are specifically designed for this. They pre-compute the indexes, store them in memory, and can serve 10,000+ geo queries per second on a single instance.
Here’s what changes:
PostgreSQL becomes your source of truth (persistent, reliable, handles writes), and Redis becomes your speed layer (ephemeral, fast, handles 99% of reads).
The cache misses? Sure, they still take 750ms while you populate Redis. But once the cache is warm (which happens quickly), your database can go back to doing what it’s good at - handling transactions and complex queries - instead of calculating the same distances a million times a day.
Here’s the basic architecture - it’s simpler than you might think:
OpenStreetMap API
↓ (import once)
PostgreSQL
↓ (sync to cache)
Redis Cache
↓ (serve queries)
Your Users
PostgreSQL is your source of truth. It has all the restaurant data, handles writes, maintains referential integrity - all the stuff databases are good at.
Redis sits in front as a cache layer. When you import restaurant data, you push it to Redis and create geo-spatial indexes. When users query “restaurants near me,” you hit Redis first. 12ms response time, no database load.
The magic? Redis’s GEOADD and GEORADIUS commands. They’re specifically built for this use case. You give Redis a set of coordinates, and it pre-computes geohashes and stores them in a way that makes radius queries blazing fast. No expensive spherical distance calculations at query time - it’s all pre-indexed.
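To build intuition for why that’s fast, here is a toy version of the interleaved-bit encoding behind geohashes. This is an illustration of the idea, not Redis’s actual implementation (Redis stores a 52-bit score of this kind in a sorted set).

```python
def geo_encode(lat, lon, bits=26):
    """Interleave longitude/latitude bits into one integer (geohash-style)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code = 0
    for _ in range(bits):
        # One bit of longitude: which half of the current range?
        mid = (lon_lo + lon_hi) / 2
        code <<= 1
        if lon >= mid:
            code |= 1
            lon_lo = mid
        else:
            lon_hi = mid
        # One bit of latitude
        mid = (lat_lo + lat_hi) / 2
        code <<= 1
        if lat >= mid:
            code |= 1
            lat_lo = mid
        else:
            lat_hi = mid
    return code

def geo_decode(code, bits=26):
    """Invert geo_encode back to the center of the encoded cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    for i in range(bits):
        shift = 2 * (bits - i) - 1
        if (code >> shift) & 1:
            lon_lo = (lon_lo + lon_hi) / 2
        else:
            lon_hi = (lon_lo + lon_hi) / 2
        if (code >> (shift - 1)) & 1:
            lat_lo = (lat_lo + lat_hi) / 2
        else:
            lat_hi = (lat_lo + lat_hi) / 2
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2
```

Because nearby points get numerically close codes, a radius query turns into a handful of range scans over a sorted structure instead of per-row trigonometry.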
When you save a restaurant to Redis through Beanis, it automatically stores the document as a Redis hash, adds its coordinates to a geo index with GEOADD, and updates a sorted set for each indexed field.
When a user queries nearby restaurants, Beanis runs a GEORADIUS lookup against that geo index, fetches the matching documents by ID, and applies any extra filters.
If Redis doesn’t have the data (cache miss), you fall back to PostgreSQL, get the results in 750ms, cache them in Redis, and serve them. Next request for that area? 12ms.
This tutorial shows you how to use Beanis’s GeoPoint type to handle geo-spatial indexing automatically.
The key insight: Redis isn’t trying to be your database. It’s your speed layer. PostgreSQL can rebuild the cache anytime, so you don’t need to worry about Redis durability. You’re trading a bit of staleness (cached data might be a few seconds old) for massive performance gains.
Here’s the essential Beanis code for caching restaurant data:
from beanis import Document, Indexed, GeoPoint
from beanis.odm.indexes import IndexedField
from typing_extensions import Annotated
from typing import Optional
from datetime import datetime
from pydantic import Field

class RestaurantCache(Document):
    """Redis cache model - mirrors PostgreSQL data"""

    # Source tracking
    db_id: Indexed(int)  # Link to PostgreSQL
    name: str

    # ⭐ The magic: Geo-spatial index
    location: Annotated[GeoPoint, IndexedField()]
    # This automatically creates the Redis GEORADIUS index!

    # Indexed fields for fast filtering
    cuisine: Indexed(str)      # Creates sorted set
    rating: Indexed(float)     # Creates sorted set
    price_range: Indexed(int)  # Creates sorted set
    is_active: Indexed(bool)   # Creates sorted set

    # Other fields (not indexed)
    address: str = ""
    phone: Optional[str] = None
    cached_at: datetime = Field(default_factory=datetime.now)

    class Settings:
        name = "RestaurantCache"
What Beanis does automatically:
- location: Annotated[GeoPoint, IndexedField()] → creates a GEOADD index in Redis
- Indexed(str/int/float/bool) → creates sorted sets for filtering
- Document → handles serialization and Redis hash storage

The key Beanis operation - saving to Redis:
from beanis import GeoPoint

# Create and save to Redis
restaurant = RestaurantCache(
    db_id=123,
    name="La Carbonara",
    location=GeoPoint(latitude=41.8933, longitude=12.4829),  # ⭐ Geo-spatial index
    cuisine="italian",
    rating=4.5,
    price_range=2,
    is_active=True
)

await restaurant.insert()  # Saves to Redis with all indexes
What Beanis does behind the scenes:
- Stores the document as a hash: RestaurantCache:123 → {name: "La Carbonara", ...}
- Adds the location to the geo index: GEOADD RestaurantCache:location 12.4829 41.8933 "123"
- Updates the filter sets: RestaurantCache:idx:cuisine:italian → {123, ...}, RestaurantCache:idx:rating → {(123, 4.5), ...}, RestaurantCache:idx:price_range → {(123, 2), ...}

The core Beanis feature - finding nearby restaurants:
from beanis.odm.indexes import IndexManager

# ⭐ Query the Redis geo-spatial index
results_with_distance = await IndexManager.find_by_geo_radius_with_distance(
    redis_client=redis_client,
    document_class=RestaurantCache,
    field_name="location",
    longitude=12.4922,
    latitude=41.8902,
    radius=2.0,  # 2km
    unit="km"
)

# Returns: [(doc_id, distance_km), ...]
# Example: [("123", 0.45), ("456", 1.2), ("789", 1.8)]

# Get full documents
for doc_id, distance in results_with_distance:
    restaurant = await RestaurantCache.get(doc_id)
    print(f"{restaurant.name}: {distance:.2f}km away")
What this does:
It runs Redis’s GEORADIUS command internally and returns document IDs paired with their distances.

With filters:
# Fetch and filter
results = []
for doc_id, distance in results_with_distance:
    doc = await RestaurantCache.get(doc_id)

    # Filter using indexed fields (fast in-memory)
    if doc.cuisine == "italian" and doc.rating >= 4.5:
        results.append((doc, distance))
Initialize Beanis on application startup:
from fastapi import FastAPI
from beanis import init_beanis
import redis.asyncio as redis

app = FastAPI()

@app.on_event("startup")
async def startup():
    # Connect to Redis
    redis_client = redis.Redis(
        host="localhost",
        port=6379,
        decode_responses=True
    )

    # Initialize Beanis with your document models
    await init_beanis(
        database=redis_client,
        document_models=[RestaurantCache]  # ⭐ Register your models
    )
That’s it! Beanis will register your models, wire up the geo and sorted-set indexes, and handle serialization on every read and write.
I tested this with a real Paris dataset - 2,600+ restaurants imported from OpenStreetMap. Here’s what actually happened (not theoretical numbers, actual measurements):
PostgreSQL does its thing: 850ms to calculate distances and sort results. Then we cache those results in Redis (adds another 120ms). Total: ~970ms. Not great, but it only happens once per cache region.
PostgreSQL would still take 750ms every time (it has to recalculate everything). Redis? 12ms. Same query, 62x faster.
Add a cuisine filter and minimum rating. PostgreSQL now takes 820ms (more work to do). Redis? 15ms. It’s using pre-computed sorted sets for the filters, so barely any extra work.
Tested with just 1,030 restaurants. PostgreSQL: 380ms (half the data, but still expensive calculations). Redis: 8ms.
This is where it gets fun. Simulated 100 users each making 10 queries (1,000 total queries):
Once the cache is warm, hit rate sits around 99.8%. Nearly every query is served from Redis at 12-15ms. The database? It’s basically idle. CPU usage dropped from 100% to like 1% for the occasional cache refresh.
Throughput went from ~150 req/sec (PostgreSQL bottleneck) to 10,000+ req/sec (limited by network and application code, not Redis).
The gap only gets worse as your dataset grows. With 50,000 restaurants, PostgreSQL queries would be 1.5-2 seconds. Redis? Still 12-15ms. The pre-computed indexes don’t care about dataset size nearly as much.
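You can see why in miniature: a pre-sorted index answers a range query with two binary searches, while a per-query scan touches every row. This toy example uses plain numbers as stand-ins for pre-computed geohash scores.

```python
import bisect
import random

random.seed(0)
# Pretend these are 50,000 pre-computed geohash scores, kept sorted (the "index")
scores = sorted(random.uniform(0, 1) for _ in range(50_000))

def range_query_indexed(lo, hi):
    """O(log n) boundary search on the pre-sorted index."""
    i = bisect.bisect_left(scores, lo)
    j = bisect.bisect_right(scores, hi)
    return scores[i:j]

def range_query_scan(lo, hi):
    """O(n) per-query scan - what recomputing distances amounts to."""
    return [s for s in scores if lo <= s <= hi]
```

Both return the same rows, but the indexed version does ~17 comparisons to find the boundaries instead of 50,000 - which is why Redis barely notices the dataset growing.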
So your cache is fast, but what happens when restaurant data changes? You’ve got a few options depending on your needs.
Just set a TTL and forget about it. Every cached restaurant gets a timestamp. If the cache is older than, say, an hour, refresh it from PostgreSQL:
async def get_with_ttl(restaurant_id: int, max_age: int = 3600):
    """Refresh cache if older than 1 hour"""
    cached = await RestaurantCache.find_one(db_id=restaurant_id)

    # is_stale() is assumed to compare cached_at against max_age
    if cached and not cached.is_stale(max_age):
        return cached  # Cache fresh

    # Refresh from Postgres
    db_restaurant = db.query(RestaurantDB).get(restaurant_id)

    # Update cache
    if cached:
        cached.cached_at = datetime.now()
        # Update other fields...
        await cached.save()
    else:
        # Create new cache entry
        pass

    return cached
This works great for data that doesn’t change often. Restaurant info? Rarely changes. Locations? Never change. Ratings? Maybe update hourly. You can live with slight staleness here.
If you need fresher data, update both PostgreSQL and Redis at the same time when something changes:
async def update_restaurant(restaurant_id: int, updates: dict):
    """Update both Postgres and Redis atomically"""
    # Update Postgres (source of truth)
    db_restaurant = db.query(RestaurantDB).get(restaurant_id)
    for key, value in updates.items():
        setattr(db_restaurant, key, value)
    db.commit()

    # Update cache immediately
    cached = await RestaurantCache.find_one(db_id=restaurant_id)
    if cached:
        for key, value in updates.items():
            setattr(cached, key, value)
        cached.cached_at = datetime.now()
        await cached.save()

    print(f"✅ Cache updated for restaurant {restaurant_id}")
This keeps your cache fresh at the cost of slightly slower writes (you’re hitting two systems). But reads are still blazing fast, and you never serve stale data.
Sometimes you just want to blow away the cache entry and let it rebuild naturally on the next read:
async def delete_restaurant(restaurant_id: int):
    """Delete from both Postgres and Redis"""
    # Delete from Postgres
    db.query(RestaurantDB).filter_by(id=restaurant_id).delete()
    db.commit()

    # Invalidate cache
    cached = await RestaurantCache.find_one(db_id=restaurant_id)
    if cached:
        await cached.delete()

    print(f"🗑️ Cache invalidated for restaurant {restaurant_id}")
This is simple and safe. Next read will be slower (cache miss), but the cache rebuilds itself automatically. Works great for deletions or when you’re not sure what changed.
Which one should you use? Depends on your use case. For restaurant data, I’d start with time-based expiration (hourly refresh) and only add write-through if you’re seeing complaints about stale data. The nuclear option is fine for deletions and rare updates.
User opens your app near the Colosseum in Rome and searches for Italian restaurants within 3km:
GET /restaurants/nearby?lat=41.8902&lon=12.4922&radius=3&cuisine=italian&min_rating=4.5
Your app hits Redis first. Behind the scenes, Beanis runs:
a GEORADIUS lookup to find all restaurants within 3km (~4ms), then fetches and filters the matching documents.
Total: 8ms. Your API adds some overhead (JSON serialization, HTTP), so the user sees ~12-15ms response time. The database? Didn’t do anything.
{
  "total": 12,
  "restaurants": [
    {
      "id": 4521,
      "name": "La Carbonara",
      "cuisine": "italian",
      "rating": 4.8,
      "distance_m": 145
    },
    ...
  ]
}
Redis doesn’t have data for this area yet. No problem: fall back to PostgreSQL (~750ms for the geo query), then write the results into Redis (~120ms).
Total: ~870ms. Not great, but it only happens once for each geographic area. Every subsequent query for that area hits the cache at 12ms.
The user might notice the first query is slower, but they’re comparing it to “no results yet” (fresh app open), so it’s fine. After that? Everything is instant.
Add metrics to track cache effectiveness:
from prometheus_client import Counter, Histogram
import time

cache_hits = Counter('cache_hits_total', 'Number of cache hits')
cache_misses = Counter('cache_misses_total', 'Number of cache misses')
response_time = Histogram('response_time_seconds', 'Response time')

async def find_nearby_with_metrics(...):
    start = time.time()
    results = await RestaurantCache.find_near(**query_params).to_list()

    if results:
        cache_hits.inc()
    else:
        cache_misses.inc()

    response_time.observe(time.time() - start)
    return results
Track your cache hit rate, response-time percentiles, and Redis memory usage. For production monitoring, consider using Redis monitoring tools and set up alerts on hit rate and memory.
Using Beanis as a Redis cache layer over PostgreSQL turns a 750ms geo query into a 12ms lookup. That’s the difference between an app that feels sluggish and one that feels instant.
PostgreSQL stays as your source of truth - handles writes, maintains data integrity, can rebuild the cache whenever needed. Redis sits in front as your speed layer - serves 99% of reads, handles massive concurrency, keeps your database idle.
The pattern works because restaurant data doesn’t change much. Locations never move. Ratings update occasionally. You can afford to have slightly stale cache data (refresh hourly or on writes) in exchange for 62x performance improvement.
Remember: Redis is your cache, not your database. Don’t try to make it persistent. Don’t worry about durability. If Redis crashes, PostgreSQL rebuilds it. That’s the whole point - you get speed without sacrificing reliability.
And Beanis handles all the messy Redis commands for you. You just define a model with GeoPoint, call find_by_geo_radius, and it works. No manual GEOADD or GEORADIUS commands. No serialization headaches. Just fast geo queries with a clean Python API.
Full working example:
Quick start:
# Clone the repo
git clone https://github.com/andreim14/beanis-examples.git
cd beanis-examples/restaurant-finder
# Install dependencies
pip install -r requirements.txt
# Start databases with Docker
docker run -d --name restaurant-postgres -e POSTGRES_USER=restaurant_user -e POSTGRES_PASSWORD=restaurant_pass -e POSTGRES_DB=restaurant_db -p 5432:5432 postgis/postgis:15-3.3
docker run -d --name restaurant-redis -p 6379:6379 redis:7-alpine
# Start the API
python main.py
# Import sample data (Paris)
curl -X POST "http://localhost:8000/import/area?lat=48.8584&lon=2.2945&radius_km=5"
# Run the interactive demo
python demo.py
The demo includes:
Questions? Drop a comment below! 🚀
Built with ❤️ by Andrei Stefan Bejgu - AI Applied Scientist @ SylloTips
Word Sense Linking automatically identifies ambiguous words in text and links them to their correct meanings. It’s like having a semantic understanding layer for your NLP pipeline.
Use it for:
Install and run in 3 lines:
from wsl import WSL
model = WSL.from_pretrained("Babelscape/wsl-base")
result = model("The bank can guarantee deposits will cover tuition.")
# Automatically knows: bank=financial institution, cover=pay for
Read this sentence:
“The bank can guarantee deposits will eventually cover future tuition costs.”
What does “bank” mean? Financial institution or riverbank? What about “cover”? Pay for, or physically cover?
Humans get this instantly from context. But how do we teach machines to do the same?
I’m excited to share our paper “Word Sense Linking: Disambiguating Outside the Sandbox”, published at ACL 2024 (Findings) - the Association for Computational Linguistics, the top venue in NLP research.
Word Sense Disambiguation (WSD) has been around for decades. The benchmark scores look great. But when you try to actually use it in a real application, you hit a wall.
The problem? Traditional WSD operates in a sandbox. It assumes you’ve already done most of the hard work:
First, you need to manually identify which words in your text are ambiguous. In our example sentence, you’d have to mark “bank” and “cover” as words that need disambiguation. Already a problem - how do you know which words are ambiguous without understanding context?
Second, you need to provide candidate meanings. You have to tell the system “bank could mean: financial institution, riverbank, or slope” and “cover could mean: pay for, place over, include, or protect.” Again, this assumes you already know what senses exist for each word.
Only then does WSD do its job - picking the right sense from your provided list.
It’s like having a translator who needs you to first identify which words need translating, then give them a dictionary of possible translations, and only then they’ll pick the right one. If you can do all that, you probably don’t need the translator.
Let’s see this in action with: “The bank can guarantee deposits will cover tuition.”
Traditional WSD requires:
# Input with pre-marked spans and candidates
{
    "text": "The bank can guarantee deposits will cover tuition.",
    "spans": [
        {"text": "bank", "start": 4, "end": 8},     # You mark this
        {"text": "cover", "start": 37, "end": 42}   # And this
    ],
    "candidates": {
        "bank": ["financial institution", "riverbank", "slope"],   # You provide these
        "cover": ["pay for", "place over", "include", "protect"]   # And these
    }
}
# WSD picks: bank → financial institution, cover → pay for
WSL needs only:
# Just raw text!
text = "The bank can guarantee deposits will cover tuition."
# WSL automatically:
# 1. Identifies ALL ambiguous spans (bank, guarantee, deposits, cover, tuition)
# 2. Retrieves candidates from WordNet
# 3. Links each to correct sense
The key difference: WSD assumes you know what to disambiguate. WSL figures it out.
This makes WSL practical for real applications where you don’t have annotated data or pre-identified spans.
Our solution: Word Sense Linking (WSL) removes these constraints.
WSL does two things automatically:
No manual preprocessing. No candidate provision. Just real text in, meanings out.
| Model | Precision | Recall | F1 |
|---|---|---|---|
| ConSeC (previous SOTA) | 80.4% | 64.3% | 71.5% |
| WSL (our model) | 75.2% | 76.7% | 75.9% |
We achieve a 4.4-point F1 improvement on the ALL_FULL benchmark, with significantly better recall - we find more correct senses.
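As a quick sanity check on the table, F1 is the harmonic mean of precision and recall:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

wsl_f1 = f1(75.2, 76.7)      # ~75.9, matching the table
consec_f1 = f1(80.4, 64.3)   # ~71.5
```

ConSeC’s higher precision is dragged down by its low recall; WSL trades a little precision for much better coverage, which is what wins on F1.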
The best part? We released it as an easy-to-use Python library.
pip install git+https://github.com/Babelscape/WSL.git
from wsl import WSL
# Load pre-trained model
wsl_model = WSL.from_pretrained("Babelscape/wsl-base")
# Disambiguate!
result = wsl_model("Bus drivers drive busses for a living.")
WSLOutput(
    text='Bus drivers drive busses for a living.',
    spans=[
        Span(
            start=0, end=11,
            text='Bus drivers',
            label='bus driver: someone who drives a bus'
        ),
        Span(
            start=12, end=17,
            text='drive',
            label='drive: operate or control a vehicle'
        ),
        Span(
            start=18, end=24,
            text='busses',
            label='bus: a vehicle carrying many passengers'
        ),
        Span(
            start=31, end=37,
            text='living',
            label='living: the financial means whereby one lives'
        )
    ]
)
Notice what it did automatically: it found the multi-word span “Bus drivers”, handled the inflected form “busses”, and linked every ambiguous span to a sense definition - all from raw text.
Query: “python tutorial”
WSL disambiguates: Programming language? Or the snake?
result = wsl_model("Looking for python tutorial")
# Identifies: "python: a high-level programming language"
Better search results, better user experience.
Context-aware filtering:
wsl_model("The wedding shooting was beautiful")
# "shooting: the act of making a photograph"
wsl_model("There was a shooting downtown")
# "shooting: the act of firing a projectile"
Same word, different meanings, different moderation actions.
Improve retrieval for RAG:
from wsl import WSL
wsl_model = WSL.from_pretrained("Babelscape/wsl-base")
# User query with ambiguous terms
query = "What's the best bank for deposits?"
# Disambiguate before retrieval
result = wsl_model(query)
# Now you know it's about financial institutions
# Not riverbanks!
Better semantic understanding = better retrieval = better LLM responses.
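One hedged sketch of folding WSL’s output into a retrieval pipeline: append each disambiguated gloss to the query before embedding it, so a dense retriever sees the intended sense. The `expand_query` helper is illustrative (not part of the WSL library); `spans` mirrors the (text, label) structure shown in the output above.

```python
def expand_query(query, spans):
    """Append each disambiguated gloss so an embedding model sees the intended sense.

    `spans` is a list of (surface_text, label) pairs, where labels look like
    "bank: a financial institution" as in WSL's output."""
    glosses = [label.split(": ", 1)[1] for _, label in spans if ": " in label]
    if not glosses:
        return query
    return query + " | " + "; ".join(glosses)

spans = [("bank", "bank: a financial institution"),
         ("deposits", "deposit: money given as security")]
expanded = expand_query("What's the best bank for deposits?", spans)
```

The expanded string now mentions “financial institution”, so riverbank documents fall away in the embedding space before the LLM ever sees them.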
Extract entities with precise meanings:
text = "Apple released a new chip for their computers"
result = wsl_model(text)
# "Apple: a major American tech company" (not the fruit!)
# "chip: electronic equipment consisting of a small circuit" (not food!)
Build more accurate knowledge graphs automatically.
WSL uses a retriever-reader architecture: a retriever fetches candidate senses from the inventory, and a reader selects the correct sense for each span in context.
Both components are transformer-based and trained end-to-end.
The model learns to identify ambiguous spans, retrieve their candidate senses, and link each span to its correct meaning.
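Conceptually, the flow can be sketched in a few lines. This is a toy illustration of the retriever-reader idea, not the actual trained components (which are transformers, not word-overlap heuristics):

```python
# Toy sense inventory standing in for WordNet
TOY_INVENTORY = {
    "bank": ["bank: a financial institution",
             "bank: sloping land beside a body of water"],
}

def retrieve(span: str) -> list[str]:
    # Retriever: fetch candidate senses for a span
    return TOY_INVENTORY.get(span, [])

def read(context: str, candidates: list[str]) -> str:
    # Toy reader: pick the candidate whose gloss overlaps the context most
    def overlap(candidate: str) -> int:
        return len(set(candidate.split()) & set(context.lower().split()))
    return max(candidates, key=overlap)

sense = read("the bank approved my loan for a financial emergency",
             retrieve("bank"))
print(sense)  # the financial sense wins on gloss overlap
```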
The key difference comes down to what you need to provide as input.
Traditional WSD needs you to do the hard parts first: identify ambiguous words, provide candidate senses, and prepare your text in a specific format. It then picks the right sense from your candidates. This works fine for academic benchmarks where everything is pre-annotated, but it’s impractical for real applications where you’re processing arbitrary text.
Word Sense Linking does all of that automatically. You throw raw text at it - anything from tweets to research papers - and it figures out which spans are ambiguous, retrieves candidate senses from WordNet, and links everything to the correct meanings. No preprocessing, no manual annotation, no candidate lists.
That’s why we’re seeing actual adoption in production systems. WSL doesn’t require you to build annotation pipelines or maintain sense inventories. It just works on whatever text you have.
Let’s disambiguate a complex sentence:
from wsl import WSL
wsl_model = WSL.from_pretrained("Babelscape/wsl-base")
text = """
The bank can guarantee deposits will eventually
cover future tuition costs because they understand
the financial burden.
"""
result = wsl_model(text)
# Print identified spans with definitions
for span in result.spans:
print(f"{span.text:20} → {span.label}")
Output:
bank → bank: a financial institution
guarantee → guarantee: give surety or assume responsibility
deposits → deposit: money given as security
cover → cover: be sufficient to meet or pay for
tuition → tuition: a fee paid for instruction
financial → financial: involving or relating to money
burden → burden: an onerous or difficult concern
Every ambiguous term correctly disambiguated!
Here’s how to use WSL to improve search:
from wsl import WSL
from typing import List, Dict
class SemanticSearch:
def __init__(self):
self.wsl = WSL.from_pretrained("Babelscape/wsl-base")
def understand_query(self, query: str) -> Dict:
"""Disambiguate query terms for better search"""
result = self.wsl(query)
return {
"original_query": query,
"disambiguated_terms": [
{
"term": span.text,
"sense": span.label,
"start": span.start,
"end": span.end
}
for span in result.spans
]
}
# Usage
search = SemanticSearch()
analysis = search.understand_query("Looking for python courses for machine learning")
print(analysis)
# Knows it's about programming, not snakes!
Model Size: ~400MB (base model)
Inference Speed: ~100-200ms per sentence (GPU)
Memory: ~2GB GPU RAM
Keep these figures in mind when sizing production deployments.
Traditional WSD achieved high benchmark scores but couldn’t escape the lab.
Word Sense Linking makes lexical semantics practical.
We’re bridging the gap between research and real-world applications.
This work was done in collaboration with Edoardo Barba, Luigi Procopio, Alberte Fernández-Castro, and Roberto Navigli.
Presented at ACL 2024 in Bangkok, Thailand.
Ready to add semantic understanding to your NLP pipeline?
# Install
pip install git+https://github.com/Babelscape/WSL.git
# Use
from wsl import WSL
model = WSL.from_pretrained("Babelscape/wsl-base")
result = model("Your text here")
We’re working on:
Word Sense Linking solves a decades-old problem: making word sense disambiguation practical.
No more sandboxes. No more manual preprocessing. Just real text in, precise meanings out.
Whether you’re building search engines, content moderation systems, or RAG applications - understanding what words really mean in context is crucial.
And now it’s as simple as:
from wsl import WSL
model = WSL.from_pretrained("Babelscape/wsl-base")
result = model("Your ambiguous text here")
Give it a try and let me know what you build with it!
If you use Word Sense Linking in your research or applications, please cite our paper:
@inproceedings{bejgu-etal-2024-wsl,
title = "Word Sense Linking: Disambiguating Outside the Sandbox",
author = "Bejgu, Andrei Stefan and
Barba, Edoardo and
Procopio, Luigi and
Fern{\'a}ndez-Castro, Alberte and
Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.851/",
}
Plain text citation:
Andrei Stefan Bejgu, Edoardo Barba, Luigi Procopio, Alberte Fernández-Castro, and Roberto Navigli. 2024. Word Sense Linking: Disambiguating Outside the Sandbox. In Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand. Association for Computational Linguistics.
Published at ACL 2024 (Findings) - Association for Computational Linguistics, Bangkok, Thailand
Built with ❤️ by Andrei Stefan Bejgu - AI Applied Scientist @ SylloTips
You’re building an AI app that needs semantic search for RAG. Everyone on Twitter is telling you to use Pinecone ($70/month minimum, plus 100+ lines of boilerplate). Or Weaviate (yet another service to manage and monitor). Or pgvector (slow queries and complex tuning).
Meanwhile, you already have Redis running. You’re using it for caching, session storage, maybe job queues. It’s just sitting there, being fast and reliable.
Here’s what most people don’t realize: Redis is also a vector database. And if you’re already running it, you’re paying for vector search capability whether you use it or not.
Use Beanis - a Redis ODM with built-in vector search.
The entire RAG system:
# models.py (14 lines)
from beanis import Document, VectorField
from typing import List
from typing_extensions import Annotated
class KnowledgeBase(Document):
text: str
embedding: Annotated[List[float], VectorField(dimensions=1024)]
class Settings:
name = "knowledge"
# ingest.py (20 lines)
from transformers import AutoModel
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True)
async def ingest_text(text: str):
embedding = model.encode([text])[0].tolist()
doc = KnowledgeBase(text=text, embedding=embedding)
await doc.insert()
# search.py (15 lines)
from beanis.odm.indexes import IndexManager
async def search(query: str):
query_emb = model.encode([query])[0].tolist()
results = await IndexManager.find_by_vector_similarity(
redis_client, KnowledgeBase, "embedding", query_emb, k=5
)
return [await KnowledgeBase.get(doc_id) for doc_id, score in results]
That’s it. That’s the entire RAG system.
Vector indexes are now created automatically when you call init_beanis(). No manual Redis commands. No setup scripts. Just define your model with VectorField() and you’re done.
Let’s be real: Pinecone is a great product. But it’s solving a problem you might not have.
Pinecone makes sense if you’re doing massive-scale vector search across billions of documents, need global replication, want managed infrastructure with SLAs, and have the budget for it. If that’s you, use Pinecone.
But most apps don’t need that. You’ve got maybe 10K-1M documents. You already run Redis. You’re okay with self-hosting. And you really don’t want another monthly bill.
Here’s what changes when you use Beanis + Redis instead:
The trade-off? You’re self-hosting. If that’s scary, stick with Pinecone. If you’re already running Redis and don’t mind managing it, this approach is simpler and cheaper.
Pinecone (verbose):
# Setup
import pinecone
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment="us-west1-gcp")
index = pinecone.Index("my-index")
# Upsert (complex)
vectors = [(str(i), embedding, {"text": text}) for i, (text, embedding) in enumerate(docs)]
index.upsert(vectors=vectors, namespace="docs")
# Search (multiple steps)
query_response = index.query(
vector=query_embedding,
top_k=5,
namespace="docs",
include_metadata=True
)
results = [match['metadata']['text'] for match in query_response['matches']]
# ~100+ lines for production setup
Beanis (clean):
# Setup
doc = KnowledgeBase(text=text, embedding=embedding)
await doc.insert()
# Search
results = await IndexManager.find_by_vector_similarity(
redis_client, KnowledgeBase, "embedding", query_embedding, k=5
)
# ~50 lines total
pip install beanis transformers redis
Just 3 packages. No complex setup, no account creation.
docker run -d -p 6379:6379 redis/redis-stack:latest
Use redis-stack (includes RediSearch module for vector search).
from beanis import Document, VectorField
from typing import List
from typing_extensions import Annotated
class KnowledgeBase(Document):
text: str
embedding: Annotated[List[float], VectorField(dimensions=1024)]
class Settings:
name = "knowledge"
14 lines. That’s your entire data model. The VectorField() tells Beanis to automatically create a vector index with HNSW algorithm for lightning-fast similarity search.
Vector indexes are created automatically - no manual setup needed!
from transformers import AutoModel
import redis.asyncio as redis
from beanis import init_beanis
# Load open-source embedding model (no API key!)
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v4', trust_remote_code=True) # https://huggingface.co/jinaai/jina-embeddings-v4
async def ingest_text(text: str):
# Generate embedding
embedding = model.encode([text])[0].tolist()
# Store in Redis
doc = KnowledgeBase(text=text, embedding=embedding)
await doc.insert()
print(f"✓ Indexed: {text[:50]}...")
# Initialize
redis_client = redis.Redis(decode_responses=True)
await init_beanis(database=redis_client, document_models=[KnowledgeBase])
# Ingest your documents
texts = ["Redis is fast", "Python is great", "Beanis is simple"]
for text in texts:
await ingest_text(text)
20 lines. Documents are now searchable. Vector indexes were created automatically!
from beanis.odm.indexes import IndexManager
async def search(query: str, k: int = 5):
# Embed query
query_embedding = model.encode([query])[0].tolist()
# Search!
results = await IndexManager.find_by_vector_similarity(
redis_client=redis_client,
document_class=KnowledgeBase,
field_name="embedding",
query_vector=query_embedding,
k=k
)
# Get documents
docs = []
for doc_id, similarity_score in results:
doc = await KnowledgeBase.get(doc_id)
docs.append((doc.text, similarity_score))
return docs
# Search
results = await search("what is semantic search?")
for text, score in results:
print(f"{score:.3f}: {text}")
15 lines. Semantic search working.
Let’s say you’re building a documentation search. User asks:
Query: “how to cancel my subscription?”
Traditional keyword search: ❌ No results (docs say “termination policy”)
Semantic search with Beanis: ✅ Finds:
Why? Vector embeddings understand meaning, not just keywords.
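The intuition in a few lines: embeddings map text to vectors whose direction encodes meaning, and cosine similarity measures the angle between them. The 3-dimensional vectors below are toy stand-ins for real 1024-dimensional embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": similar meanings point in similar directions
cancel_subscription = [0.9, 0.1, 0.0]
termination_policy = [0.8, 0.2, 0.1]   # different words, similar meaning
pizza_recipe = [0.0, 0.1, 0.9]         # unrelated topic

print(f"{cosine(cancel_subscription, termination_policy):.2f}")  # 0.98
print(f"{cosine(cancel_subscription, pizza_recipe):.2f}")        # 0.01
```

"Cancel my subscription" and "termination policy" share almost no keywords, but their vectors point the same way.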
I benchmarked this against the usual suspects with 10,000 documents (real measurements, not marketing numbers):
Beanis + Redis: 15ms queries, ~50 lines of code, Docker run for setup, $0 incremental cost.
Pinecone: 40ms queries (network latency kills you), 100+ lines of setup code, API key dance, $70+/month.
Weaviate: 35ms queries, another service to deploy and monitor, 80+ lines of code, self-hosting overhead.
pgvector: 200ms queries (PostgreSQL isn’t optimized for vector search), 60+ lines of code, need to tune indexes carefully.
Beanis wins on speed (local Redis beats API calls) and simplicity (ODM pattern means less code). The only thing you lose is managed infrastructure - if that matters, Pinecone might be worth it.
Here’s the thing: if you’re running a modern web app, you’re probably already using Redis. Caching, session storage, job queues, rate limiting - Redis does all of it.
Now you can add vector search to that list. Same service, same monitoring, same deployment pipeline. No new infrastructure to learn.
Before, your architecture looked like this:
After:
That’s one fewer service to monitor, deploy, and pay for. Your vectors sit right next to your cache, so queries are faster (data locality). And when you’re debugging at 2 AM, you only need to check two services instead of three.
The cost savings alone ($70-500/month depending on Pinecone tier) probably justify the few hours it takes to set this up. And operationally, it’s just simpler. Fewer dashboards to check, fewer alerts to configure, fewer things that can break.
Once you’ve got the basics working, there are some cool extensions worth knowing about.
Multimodal Search: Jina v4 can embed both text and images into the same vector space. This means you can search for images using text queries, or find relevant text using an image. It’s the same model.encode() API, just pass an image instead:
from PIL import Image
# Search with text
text_emb = model.encode(["red sports car"])[0].tolist()
results = await IndexManager.find_by_vector_similarity(...)
# Search with image
img = Image.open("car.jpg")
img_emb = model.encode_image([img])[0].tolist()
results = await IndexManager.find_by_vector_similarity(...)
Both queries work against the same index. Pretty wild.
Hybrid Search: You can combine vector similarity with traditional filters. Add indexed fields to your model and filter before or after the vector search:
class KnowledgeBase(Document):
text: str
embedding: Annotated[List[float], VectorField(dimensions=1024)]
category: Indexed(str) # Filter by category
date: datetime
language: Indexed(str) # Filter by language
This lets you do things like “find similar documents, but only in English” or “semantic search within the ‘documentation’ category.”
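If you'd rather not rely on index-side filtering, a simple pattern is to over-fetch from the vector search and filter in Python. The stand-in dicts below play the role of fetched documents; with Beanis you'd check `doc.language` on each hit after fetching it:

```python
# Post-filter vector hits by a metadata field, keeping the top k survivors
def filter_by_language(hits: list[tuple[dict, float]],
                       language: str, k: int = 5) -> list[tuple[dict, float]]:
    return [(doc, score) for doc, score in hits if doc["language"] == language][:k]

# (document, similarity score) pairs, as returned by a vector search
hits = [({"text": "Guide d'installation", "language": "fr"}, 0.91),
        ({"text": "Install guide", "language": "en"}, 0.89),
        ({"text": "Setup tutorial", "language": "en"}, 0.84)]

english = filter_by_language(hits, "en")
print([doc["text"] for doc, _ in english])  # ['Install guide', 'Setup tutorial']
```

Since filtering discards results, fetch more candidates than you need (say 3-5x your k) so enough survive the filter.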
Production Tuning: When you’re ready to scale, Redis Cluster handles sharding automatically, and you can tune the HNSW algorithm parameters for your recall/speed trade-off:
VectorField(
dimensions=1024,
algorithm="HNSW",
m=32, # More connections = better recall, more memory
ef_construction=400 # Higher = better index quality, slower indexing
)
Start with the defaults. Only tune if you’re seeing issues.
Yes. Use redis-stack (includes RediSearch module) or install RediSearch manually. Regular Redis doesn’t have vector search.
Yes! Just swap the model:
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(model="text-embedding-3-small", input=text)
embedding = response.data[0].embedding
But Jina v4 is free, faster, and runs locally.
Redis can handle millions of vectors. With proper sharding (Redis Cluster), billions.
Memory usage: ~4KB per document (1024-dim vectors). 1M docs = ~4GB RAM.
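That estimate is simple arithmetic: 1024 float32 values at 4 bytes each is 4096 bytes per vector, before metadata and index overhead. A quick back-of-envelope helper:

```python
# Raw vector storage only; RediSearch/HNSW index overhead comes on top
def vector_memory_gb(num_docs: int, dims: int = 1024, bytes_per_dim: int = 4) -> float:
    return num_docs * dims * bytes_per_dim / (1024 ** 3)

print(1024 * 4)                              # 4096 bytes per vector (~4KB)
print(f"{vector_memory_gb(1_000_000):.1f}")  # 3.8 GB for 1M documents
```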
# Update
doc = await KnowledgeBase.get(doc_id)
doc.text = "Updated text"
doc.embedding = new_embedding
await doc.save()
# Delete
await doc.delete()
Indexes update automatically.
Clone and run:
git clone https://github.com/andreim14/beanis-examples.git
cd beanis-examples/simple-rag
# Install
pip install -r requirements.txt
# Start Redis
docker run -d -p 6379:6379 redis/redis-stack:latest
# Ingest sample docs (vector indexes created automatically!)
python ingest.py
# Search!
python search.py "what is semantic search?"
Full working example in the repo.
If you already use Redis:
Start building:
In the next post, I’ll show you how to build a multimodal RAG system that searches PDFs, diagrams, and code screenshots using Jina v4’s vision capabilities.
Spoiler: It’s also ~50 lines of code.
Built with ❤️ by Andrei Stefan Bejgu - AI Applied Scientist @ SylloTips
My Redis code was drowning in json.dumps(), json.loads(), and fragile string manipulation. There had to be a better way.
That’s why I built Beanis - a Redis ODM that brings the elegance of modern ORMs to Redis, without sacrificing the speed that makes Redis special.
I was working on a real-time recommendation system that needed to query thousands of products per second. Redis was the obvious choice for speed, but the code was becoming unmaintainable:
# The old way - painful and error-prone
product_key = f"Product:{product_id}"
data = await redis.hgetall(product_key)
# Manual type conversion for EVERY field
product = {
'name': data.get('name', ''),
'price': float(data.get('price', 0)) if data.get('price') else 0.0,
'stock': int(data.get('stock', 0)) if data.get('stock') else 0,
'tags': json.loads(data.get('tags', '[]')),
'metadata': json.loads(data.get('metadata', '{}')),
}
# And that's just reading ONE document!
I wanted to write code that looked like this instead:
# The Beanis way - clean and type-safe
product = await Product.get(product_id)
Spoiler: I made it happen. And it’s only 8% slower than raw Redis.
I’ve spent years working with databases in AI/ML projects. I love MongoDB’s ODMs like Beanie - the clean API, Pydantic integration, and how they let you focus on business logic instead of CRUD boilerplate. But when you need Redis-level performance, you’re stuck with manual serialization and key management.
The existing Redis libraries weren’t cutting it:
I wanted something that combined:
When I couldn’t find it, I built it. Beanis is what I wish existed when I started working with Redis.
Let’s build something real: a product catalog for an e-commerce platform. You need:
With raw redis-py, you’d write something like this for a single product insert:
import json
import redis.asyncio as redis
async def create_product(redis_client, product_data):
# Generate unique ID
product_id = await redis_client.incr("product:id:counter")
product_key = f"Product:{product_id}"
# Manually serialize complex types
redis_data = {
'id': str(product_id),
'name': product_data['name'],
'price': str(product_data['price']),
'category': product_data['category'],
'stock': str(product_data['stock']),
'tags': json.dumps(product_data.get('tags', [])),
'metadata': json.dumps(product_data.get('metadata', {}))
}
# Save to hash
await redis_client.hset(product_key, mapping=redis_data)
# Manually maintain indexes for queries
await redis_client.zadd(f"Product:idx:price", {product_key: product_data['price']})
await redis_client.sadd(f"Product:idx:category:{product_data['category']}", product_key)
await redis_client.sadd("Product:all", product_key)
return product_id
# Query by price range - also manual
async def find_products_by_price(redis_client, min_price, max_price):
keys = await redis_client.zrangebyscore(
"Product:idx:price",
min_price,
max_price
)
products = []
for key in keys:
data = await redis_client.hgetall(key)
# Manual deserialization for each product
products.append({
'id': data['id'],
'name': data['name'],
'price': float(data['price']),
'stock': int(data['stock']),
'tags': json.loads(data['tags']),
'metadata': json.loads(data['metadata'])
})
return products
That’s over 50 lines for basic CRUD + one query. And we haven’t even added validation, audit trails, or automatic timestamps yet.
Here’s the same functionality with Beanis:
from beanis import Document, Indexed, init_beanis
from beanis.odm.actions import before_event, Insert, Update
from typing import Optional, Set
from datetime import datetime
from pydantic import Field
import redis.asyncio as redis
class Product(Document):
name: str = Field(min_length=1, max_length=200)
description: Optional[str] = None
price: Indexed[float] = Field(gt=0) # Auto-indexed, validated > 0
category: Indexed[str] # Auto-indexed
stock: int = Field(ge=0) # Validated >= 0
tags: Set[str] = set()
metadata: dict = {}
# Audit fields - automatically managed
created_at: datetime = Field(default_factory=datetime.now)
updated_at: datetime = Field(default_factory=datetime.now)
@before_event(Insert)
async def on_create(self):
self.created_at = datetime.now()
@before_event(Update)
async def on_update(self):
self.updated_at = datetime.now()
class Settings:
key_prefix = "Product"
# Initialize once
client = redis.Redis(decode_responses=True)
await init_beanis(database=client, document_models=[Product])
# Create - with validation!
product = Product(
name="MacBook Pro M3",
price=2499.99,
category="electronics",
stock=50,
tags={"laptop", "apple", "premium"},
metadata={"warranty": "2 years", "color": "Space Gray"}
)
await product.insert()
# Query - indexes handled automatically
expensive = await Product.find(
category="electronics",
price__gte=1000,
price__lte=3000
)
# Update - type-safe
await product.update(stock=45, price=2299.99)
# Complex queries
out_of_stock = await Product.find(stock=0)
premium_laptops = await Product.find(
category="electronics",
price__gte=2000
)
That’s about 30 lines - including validation, audit trails, and automatic indexing. A 70% reduction in code.
Beanis isn’t just wrapping Redis - it’s bringing Pydantic’s power to your data layer:
from pydantic import EmailStr, HttpUrl, validator
from decimal import Decimal
class User(Document):
email: EmailStr # Automatic email validation
username: str = Field(min_length=3, max_length=20, pattern="^[a-zA-Z0-9_]+$")
age: int = Field(ge=13, le=120)
website: Optional[HttpUrl] = None
balance: Decimal = Decimal("0.00")
@validator('username')
def username_alphanumeric(cls, v):
if not v.isalnum():
raise ValueError('Username must be alphanumeric')
return v.lower()
# This will raise validation errors BEFORE hitting Redis
try:
user = User(
email="not-an-email", # ❌ Invalid
username="ab", # ❌ Too short
age=200 # ❌ Too old
)
except ValidationError as e:
print(e)
No more manually maintaining sorted sets and managing index consistency:
class Article(Document):
title: str
views: Indexed[int] # Sorted set automatically maintained
published_at: Indexed[datetime] # Time-based queries
author: Indexed[str] # Categorical filtering
score: Indexed[float] # Range queries
# All these queries use optimized indexes under the hood
trending = await Article.find(views__gte=10000)
recent = await Article.find(
published_at__gte=datetime.now() - timedelta(days=7)
)
popular_by_author = await Article.find(
author="john_doe",
score__gte=4.5
)
Behind the scenes, Beanis creates the index structures for each Indexed field and keeps them consistent on every insert, update, and delete.
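To make that concrete, here is roughly the bookkeeping such a layer performs on every insert, sketched as the Redis commands it would issue. This is an illustration of the pattern, not Beanis's actual internals:

```python
# Generate the Redis commands an ODM-style layer might issue for one insert:
# a hash for the document, plus a sorted set (numeric fields) or plain set
# (categorical fields) per index
def index_commands(prefix: str, doc_id: str, doc: dict, indexed: set) -> list:
    key = f"{prefix}:{doc_id}"
    cmds = [("HSET", key, doc), ("SADD", f"{prefix}:all", key)]
    for field in sorted(indexed):
        value = doc[field]
        if isinstance(value, (int, float)):
            cmds.append(("ZADD", f"{prefix}:idx:{field}", {key: value}))  # range queries
        else:
            cmds.append(("SADD", f"{prefix}:idx:{field}:{value}", key))   # equality lookups
    return cmds

cmds = index_commands("Article", "42", {"title": "Hello", "views": 10000}, {"views"})
print([c[0] for c in cmds])  # ['HSET', 'SADD', 'ZADD']
```

Every one of those commands is something you'd otherwise write and keep in sync by hand, on every write path.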
Working with complex types? Beanis has you covered:
import numpy as np
from PIL import Image
from beanis.odm.custom_encoders import register_custom_encoder, register_custom_decoder
# NumPy arrays
@register_custom_encoder(np.ndarray)
def encode_numpy(arr: np.ndarray) -> str:
return arr.tobytes().hex()
@register_custom_decoder(np.ndarray)
def decode_numpy(data: str, dtype=np.float32) -> np.ndarray:
return np.frombuffer(bytes.fromhex(data), dtype=dtype)
# PIL Images
@register_custom_encoder(Image.Image)
def encode_image(img: Image.Image) -> str:
buffer = io.BytesIO()
img.save(buffer, format='PNG')
return base64.b64encode(buffer.getvalue()).decode()
class MLModel(Document):
name: str
weights: np.ndarray # Seamlessly stored and retrieved
bias: np.ndarray
thumbnail: Image.Image
# It just works!
model = MLModel(
name="sentiment-classifier",
weights=np.random.rand(100, 50),
bias=np.zeros(50)
)
await model.insert()
Building location-based features? We got you:
from beanis import GeoPoint
class Restaurant(Document):
name: str
cuisine: Indexed[str]
location: GeoPoint # Lat/lon with automatic geo-indexing
rating: Indexed[float]
# Find restaurants
italian_nearby = await Restaurant.find_near(
location=GeoPoint(lat=41.9028, lon=12.4964), # Rome, Italy
radius=2000, # 2km
category="italian",
rating__gte=4.0
)
# Get distance to each result
for restaurant in italian_nearby:
distance = restaurant.location.distance_to(
GeoPoint(lat=41.9028, lon=12.4964)
)
print(f"{restaurant.name}: {distance:.2f}m away")
Implement audit trails, cache invalidation, or notifications:
class Order(Document):
user_id: str
total: Decimal
status: str = "pending"
# Audit trail
created_at: datetime
updated_at: datetime
status_history: list = []
@before_event(Insert)
async def set_timestamps(self):
now = datetime.now()
self.created_at = now
self.updated_at = now
@before_event(Update)
async def track_changes(self):
self.updated_at = datetime.now()
# Track status changes
if hasattr(self, '_original_status') and self.status != self._original_status:
self.status_history.append({
'from': self._original_status,
'to': self.status,
'at': datetime.now().isoformat()
})
@after_event(Update)
async def notify_status_change(self):
if self.status == "shipped":
await send_notification(self.user_id, f"Order {self.id} shipped!")
@after_event(Delete)
async def cleanup(self):
# Clean up related data
await OrderItem.delete_many(order_id=self.id)
I benchmarked Beanis against raw redis-py with 10,000 operations:
| Operation | Raw Redis | Beanis | Overhead | Why? |
|---|---|---|---|---|
| Insert | 0.45ms | 0.49ms | +8% | Pydantic validation |
| Get by ID | 0.38ms | 0.41ms | +8% | Type conversion |
| Range Query | 0.52ms | 0.56ms | +7% | Index optimization |
| Batch Insert (100) | 42ms | 47ms | +12% | Validation batching |
The verdict: ~8% overhead for features you’d have to build anyway (validation, serialization, type safety).
Let’s be honest about the trade-offs:
❌ Ultra-low latency requirements (< 1ms per operation)
❌ Simple key-value caching (use raw redis-py)
❌ You need RedisJSON/RediSearch modules (use Redis OM instead)
❌ Prototyping with unpredictable schema (use raw Redis first)

✅ Building production APIs with complex data models
✅ Need type safety and validation
✅ Working with teams who value clean code
✅ Migrating from MongoDB/Postgres but need Redis speed
# 10,000+ products, 1000+ queries/second
products = await Product.find(
category="electronics",
price__gte=100,
price__lte=500,
stock__gt=0
)
class Session(Document):
user_id: str
token: str
expires_at: Indexed[datetime]
# Auto-cleanup expired sessions
await Session.delete_many(expires_at__lt=datetime.now())
class Score(Document):
player_id: Indexed[str]
score: Indexed[int]
achieved_at: datetime
# Top 10 globally
top_players = await Score.find(score__gte=1000).sort('-score').limit(10)
Already have a Redis codebase? Here’s how to migrate incrementally without breaking production.
Look at your existing Redis keys and group them:
# Current Redis structure
# User:1 -> hash {name, email, age}
# User:2 -> hash {name, email, age}
# User:idx:email -> sorted set
# User:all -> set
# This becomes a Beanis document
class User(Document):
name: str
email: Indexed[str]
age: int
class Settings:
key_prefix = "User"
Start with basic types, add constraints later:
# Phase 1: Just types
class Product(Document):
name: str
price: float
stock: int
# Phase 2: Add validation
class Product(Document):
name: str = Field(min_length=1, max_length=200)
price: float = Field(gt=0) # Must be positive
stock: int = Field(ge=0) # Can't be negative
Run both systems in parallel:
async def create_product_safe(data):
# Write to Beanis
product = Product(**data)
await product.insert()
# Still write to old Redis (for rollback safety)
await redis_client.hset(
f"Product:{product.id}",
mapping=legacy_serialize(data)
)
return product
# After 1-2 weeks of dual-write, stop reading from old keys
# After 1 month, stop dual-writing
async def verify_migration():
"""Compare old vs new data"""
old_keys = await redis_client.keys("Product:*")
for key in old_keys:
product_id = key.split(":")[1]
# Get from both systems
old_data = await redis_client.hgetall(key)
new_product = await Product.get(product_id)
# Compare
assert old_data['name'] == new_product.name
assert float(old_data['price']) == new_product.price
# ... verify all fields
Beanis doesn’t have built-in TTL yet, but you can implement it:
class CachedResult(Document):
query_hash: Indexed[str]
result_data: dict
created_at: datetime = Field(default_factory=datetime.now)
class Settings:
key_prefix = "Cache"
async def is_expired(self, ttl_seconds: int = 300) -> bool:
age = (datetime.now() - self.created_at).total_seconds()
return age > ttl_seconds
# Usage
import hashlib

async def get_with_cache(query: str, ttl: int = 300):
query_hash = hashlib.md5(query.encode()).hexdigest()
# Check cache
cached = await CachedResult.find_one(query_hash=query_hash)
if cached and not await cached.is_expired(ttl):
return cached.result_data
# Compute and cache
result = await expensive_operation(query)
await CachedResult(
query_hash=query_hash,
result_data=result
).insert()
return result
Prevent race conditions with version numbers:
class BankAccount(Document):
account_number: str
balance: Decimal
version: int = 0
async def withdraw(self, amount: Decimal):
# Read current version
original_version = self.version
# Check balance
if self.balance < amount:
raise InsufficientFunds()
# Update
self.balance -= amount
self.version += 1
try:
await self.save()
except Exception:
# In a real implementation, check if version changed
# and retry or raise ConcurrentModificationError
raise
# Better: wrap the read-check-write in a Redis MULTI/EXEC pipeline
# (for true atomicity, also WATCH the account key before reading)
async def atomic_withdraw(account_id: str, amount: Decimal):
async with redis_client.pipeline(transaction=True) as pipe:
account = await BankAccount.get(account_id)
if account.balance >= amount:
account.balance -= amount
await account.save()
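The version field really pays off in a compare-and-swap retry loop. Here's a minimal sketch against an in-memory dict standing in for Redis (with real Redis you'd pair the same pattern with WATCH/MULTI/EXEC):

```python
# In-memory stand-in for Redis; each account carries a version counter
store = {"acct-1": {"balance": 100, "version": 0}}

def withdraw(account_id: str, amount: int, retries: int = 3) -> bool:
    for _ in range(retries):
        snapshot = dict(store[account_id])  # read balance + version
        if snapshot["balance"] < amount:
            return False                    # insufficient funds
        if store[account_id]["version"] == snapshot["version"]:
            # Nobody wrote in between: commit and bump the version
            store[account_id] = {"balance": snapshot["balance"] - amount,
                                 "version": snapshot["version"] + 1}
            return True
        # Version moved: another writer got there first, retry with fresh data
    return False

print(withdraw("acct-1", 30))    # True
print(store["acct-1"])           # {'balance': 70, 'version': 1}
print(withdraw("acct-1", 1000))  # False
```

The demo is single-threaded, so the check-then-set can't actually race here; in concurrent async code, the version comparison is what catches interleaved writes.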
Process thousands of records efficiently:
# ❌ Slow: One query per item
products = []
for product_id in product_ids:
product = await Product.get(product_id)
products.append(product)
# ✅ Fast: Batch fetch
products = await Product.find(
id__in=product_ids
).to_list()
# ✅ Even faster: Pipeline for insertions
async def bulk_insert_products(product_data_list):
# Constructing the models validates every record up front (fails fast)
products = [Product(**data) for data in product_data_list]
# Bulk insert (uses Redis pipeline internally)
await Product.insert_many(products)
Redis favors denormalization - embrace it:
class Order(Document):
user_id: str
items: list[dict] # [{product_id, quantity, price}]
# Denormalized fields for fast queries
total_amount: Decimal
item_count: int
user_email: str # Copied from User
@classmethod
async def create_order(cls, user: User, items: list):
total = sum(item['price'] * item['quantity'] for item in items)
order = cls(
user_id=user.id,
items=items,
total_amount=total,
item_count=len(items),
user_email=user.email # Denormalize for queries
)
await order.insert()
return order
# Now you can query orders by email without joining
expensive_orders = await Order.find(
user_email="[email protected]",
total_amount__gte=1000
)
Problem: Every indexed field creates a sorted set. Too many = memory bloat.
# ❌ Bad: 10 indexes = 10 sorted sets per document
class User(Document):
name: Indexed[str]
email: Indexed[str]
age: Indexed[int]
created_at: Indexed[datetime]
last_login: Indexed[datetime]
status: Indexed[str]
role: Indexed[str]
department: Indexed[str]
manager_id: Indexed[str]
salary: Indexed[Decimal]
# ✅ Good: Only index what you query
class User(Document):
name: str
email: Indexed[str] # Frequent lookups
age: int
created_at: Indexed[datetime] # Time-range queries
last_login: datetime # Don't need to query this
status: Indexed[str] # Filter by active/inactive
role: str # Can filter client-side
department: str
manager_id: str
salary: Decimal # Sensitive, don't index
# ❌ Returns an un-awaited coroutine - the query never actually runs
user = User.get(user_id) # Missing await!
# ✅ Always await
user = await User.get(user_id)
# ✅ Awaiting inside a comprehension works, but fetches sequentially
users = [await User.get(uid) for uid in user_ids]
# ✅ Even better: batch fetch
users = await User.find(id__in=user_ids).to_list()
# ❌ N+1 queries (slow!)
orders = await Order.find_all()
for order in orders:
user = await User.get(order.user_id) # N queries!
print(f"{user.name}: ${order.total}")
# ✅ Denormalize (recommended for Redis)
class Order(Document):
user_id: str
user_name: str # Denormalized
total: Decimal
orders = await Order.find_all()
for order in orders:
print(f"{order.user_name}: ${order.total}") # No extra query!
# ✅ Or batch fetch users
orders = await Order.find_all()
user_ids = {order.user_id for order in orders}
users = {u.id: u for u in await User.find(id__in=user_ids)}
for order in orders:
user = users[order.user_id]
print(f"{user.name}: ${order.total}")
import redis.asyncio as redis
from redis.asyncio.connection import ConnectionPool
# ✅ Reuse connections
pool = ConnectionPool.from_url(
"redis://localhost",
max_connections=50,
decode_responses=True
)
client = redis.Redis(connection_pool=pool)
await init_beanis(database=client, document_models=[Product, User])
# If inserting many documents, validate in bulk
products_data = [...] # 1000 products
# ✅ Validate everything up front so one bad record fails before any write
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor() as executor:
validated = list(executor.map(
lambda d: Product(**d),
products_data
))
# Then insert (uses pipeline automatically)
await Product.insert_many(validated)
# ❌ Fetching everything then filtering in Python
all_products = await Product.find_all()
cheap = [p for p in all_products if p.price < 100]
# ✅ Filter in Redis
cheap = await Product.find(price__lt=100)
# ✅ Use projections (when implemented)
# cheap = await Product.find(price__lt=100).project(['name', 'price'])
pip install beanis
from beanis import Document, Indexed, init_beanis
import redis.asyncio as redis
# 1. Define your model
class User(Document):
username: str
email: Indexed[str]
score: Indexed[int] = 0
# 2. Initialize
client = redis.Redis(decode_responses=True)
await init_beanis(database=client, document_models=[User])
# 3. Use it!
user = User(username="john", email="[email protected]")
await user.insert()
# Find users
top_users = await User.find(score__gte=100)
Full documentation: andreim14.github.io/beanis
Beanis is production-ready today with:
Roadmap:
I built Beanis to scratch my own itch, and now I’m sharing it with the world. If it sounds like a fit for your stack, give it a try:
pip install beanis

Found a bug? Have a feature request? Open an issue - I read and respond to everything.
Happy coding! 🚀
Beanis is inspired by Beanie by Roman Right. Standing on the shoulders of giants.
Built with ❤️ by Andrei Stefan Bejgu - AI Applied Scientist @ SylloTips