TL;DR: OpenTelemetry's GenAI instrumentation gives you traces, but conversations span multiple traces. We built an SDK that bridges this gap with conversation IDs, automatic cost tracking, and context propagation - without replacing your existing OTel setup.
If you're building LLM applications, you've probably tried instrumenting them with OpenTelemetry. The GenAI semantic conventions are great - they give you `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and all the standard attributes you need.
But here's what they don't give you:
A user opens your chatbot and has a 10-turn conversation. With standard OTel instrumentation, you get:
```
Trace 1: User asks "What's the weather?"
└─ Span: OpenAI API call

Trace 2: User asks "Will I need an umbrella?"
└─ Span: OpenAI API call

Trace 3: User asks "What about tomorrow?"
└─ Span: OpenAI API call
```
The problem: Each request is a separate trace with a different trace_id. You can't answer:
- "What was the full conversation?"
- "How much did this entire conversation cost?"
- "What's the average cost per conversation for user X?"
You're stuck with individual traces that don't connect to the business concept of a conversation.
OTel GenAI gives you token counts (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`), but:
- ❌ No cost calculation (even though you know your pricing)
- ❌ No cost aggregation across multiple LLM calls
- ❌ No way to track: "This RAG workflow cost $0.47"
You end up writing custom logic to:
- Extract token counts from traces
- Multiply by your model pricing
- Aggregate costs across related spans
- Store and query this data separately
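Concretely, that "custom logic" boils down to a few lines of arithmetic you then have to maintain and keep in sync with pricing changes. A rough sketch (the per-million-token prices are the illustrative values used later in this post, not authoritative pricing):

```python
# Hypothetical sketch of the manual cost math you end up writing.
# Prices are USD per million tokens (illustrative values only).
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.0},
}

def span_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single LLM call, computed from its token counts."""
    p = PRICING[model]
    return (input_tokens / 1_000_000) * p["input"] + (output_tokens / 1_000_000) * p["output"]

# A 1,000-token prompt with a 500-token completion on gpt-4o:
cost = span_cost_usd("gpt-4o", 1000, 500)
# (1000/1M * 2.50) + (500/1M * 10.0) = 0.0025 + 0.005 = 0.0075
```

The formula itself is trivial; the pain is wiring it into trace post-processing and keeping the pricing table current.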
Your RAG pipeline has multiple steps:
```python
def handle_query(user_query):
    # Step 1: Retrieve documents (embedding call)
    docs = retrieve_documents(user_query)

    # Step 2: Rerank (another LLM call)
    relevant_docs = rerank(docs, user_query)

    # Step 3: Generate response (GPT-4 call)
    response = generate_response(relevant_docs, user_query)
    return response
```

With standard OTel, you get three separate spans, but:
- ❌ No way to tag them all as part of "RAG workflow ID: xyz"
- ❌ No automatic cost aggregation across the workflow
- ❌ Manual span correlation required in your observability backend
When you're building production LLM applications, you need to answer questions like:
Business Questions:
- "What's our average cost per conversation?"
- "Which users are having the most expensive conversations?"
- "How much did we spend on GPT-4 vs Claude this week?"
Engineering Questions:
- "This conversation went off the rails - what was the full context?"
- "Our costs spiked - which workflows are responsible?"
- "How do I debug this multi-step RAG pipeline?"
OpenTelemetry gives you traces and spans, but you need conversations and workflows.
We built last9-genai as an extension to OpenTelemetry that adds:
- Conversation tracking - Group all spans from a multi-turn conversation
- Workflow tracking - Track complex multi-step AI operations
- Automatic cost tracking - Calculate costs from token usage + your pricing
- Context propagation - Automatically tag all nested operations
Important: This is not a replacement for OTel GenAI instrumentation. It works alongside it (or standalone).
```python
from last9_genai import conversation_context, observe, ModelPricing
from openai import OpenAI

client = OpenAI()

# Define your pricing (USD per million tokens)
pricing = {
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
}

@observe(pricing=pricing)
def chat(message: str):
    """Each call is automatically tracked with conversation context."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )
    return response.choices[0].message.content

# User has a 3-turn conversation
with conversation_context(conversation_id="user_123_session_456", user_id="user_123"):
    response1 = chat("What's the weather in SF?")  # Turn 1
    response2 = chat("Will I need an umbrella?")   # Turn 2
    response3 = chat("What about tomorrow?")       # Turn 3
```

What you get:
All three LLM calls are automatically tagged with:
```json
{
  "gen_ai.conversation.id": "user_123_session_456",
  "gen_ai.conversation.turn": 1,       // auto-incremented
  "gen_ai.usage.cost_usd": 0.000234,   // calculated automatically
  "user.id": "user_123",
  "gen_ai.l9.span.kind": "llm"         // span classification
}
```

Now you can query: "Show me all spans where `conversation_id = user_123_session_456`" - and you get the complete conversation, even though the turns live in separate traces.
```python
from last9_genai import workflow_context, observe, ModelPricing

pricing = {
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
    "text-embedding-3-small": ModelPricing(input=0.02, output=0.0),
}

@observe(span_kind="tool", pricing=pricing)
def retrieve_documents(query: str):
    # Embedding API call
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    # Search vector DB...
    return ["doc1", "doc2", "doc3"]

@observe(pricing=pricing)
def rerank_documents(docs, query):
    # LLM call to rerank
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"Rerank these documents by relevance to '{query}': {docs}",
        }],
    )
    return response

@observe(pricing=pricing)
def generate_response(docs, query):
    # Final LLM call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": f"Use these docs: {docs}",
        }],
    )
    return response.choices[0].message.content

# Track the entire RAG workflow
with workflow_context(workflow_id="rag_12345", workflow_type="rag"):
    docs = retrieve_documents("What is RAG?")           # Cost: $0.000002
    ranked = rerank_documents(docs, "What is RAG?")     # Cost: $0.000123
    answer = generate_response(ranked, "What is RAG?")  # Cost: $0.000456
    # Total workflow cost: $0.000581 (automatically calculated)
```

What you get:
All spans tagged with:
```json
{
  "workflow.id": "rag_12345",
  "workflow.type": "rag",
  "gen_ai.usage.cost_usd": 0.000581,   // aggregated across all steps
  "gen_ai.l9.span.kind": "tool"        // or "llm" depending on operation
}
```

Now you can answer:
- "Show me all RAG workflows that cost more than $1"
- "What's the average cost of our RAG pipeline?"
- "Which workflow ID had the error?"
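As a sanity check on the numbers above: the workflow-level figure is just the sum of the per-span costs tagged with the same `workflow.id`.

```python
# Per-step costs from the RAG example above (USD)
step_costs = [0.000002, 0.000123, 0.000456]

# The workflow-level cost is the sum of gen_ai.usage.cost_usd
# over all spans sharing the same workflow.id
workflow_cost = sum(step_costs)
# 0.000002 + 0.000123 + 0.000456 = 0.000581
```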
Real applications have both - a multi-turn conversation where each turn triggers a workflow:
```python
with conversation_context(conversation_id="support_session_789", user_id="user_456"):
    # Turn 1: User asks a question
    with workflow_context(workflow_id="rag_001", workflow_type="rag"):
        docs = retrieve_documents(user_query)
        response1 = generate_response(docs, user_query)

    # Turn 2: Follow-up question
    with workflow_context(workflow_id="rag_002", workflow_type="rag"):
        docs = retrieve_documents(followup_query)
        response2 = generate_response(docs, followup_query)
```

All spans automatically inherit:
```json
{
  "gen_ai.conversation.id": "support_session_789",
  "user.id": "user_456",
  "workflow.id": "rag_001",         // or "rag_002"
  "workflow.type": "rag",
  "gen_ai.conversation.turn": 1     // or 2
}
```

Context propagates automatically - no need to manually pass IDs through your call stack.
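The post doesn't describe the SDK's internals, but "automatic" propagation like this is typically built on Python's `contextvars`, which flow through nested calls without threading IDs through every function signature. A minimal sketch of the idea (illustrative, not the SDK's actual code):

```python
# Sketch of context propagation with contextvars (not the SDK's implementation).
from contextlib import contextmanager
from contextvars import ContextVar

_conversation_id = ContextVar("conversation_id", default=None)

@contextmanager
def conversation_scope(conversation_id):
    """Make the conversation ID visible to every call inside the block."""
    token = _conversation_id.set(conversation_id)
    try:
        yield
    finally:
        _conversation_id.reset(token)

def tag_span():
    """Deeply nested code reads the ID without it being passed down."""
    return {"gen_ai.conversation.id": _conversation_id.get()}

with conversation_scope("support_session_789"):
    assert tag_span()["gen_ai.conversation.id"] == "support_session_789"
assert tag_span()["gen_ai.conversation.id"] is None  # reset outside the block
```

Because `ContextVar` values also follow `asyncio` tasks, the same mechanism works in async applications.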
Traditional OTel observability is trace-centric:

```
trace_id: abc123 → Single request/response
```

But LLM applications are conversation-centric:

```
conversation_id: user_session_456 → Multiple traces over time
```
conversation_id becomes your primary grouping dimension, letting you:
- Debug full conversations, not just individual API calls
- Calculate conversation-level metrics (cost, latency, token usage)
- Track user behavior across sessions
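Once every span carries a conversation ID, reconstructing a conversation is just a group-and-sort over exported spans, whatever backend you use. A backend-agnostic sketch (the span dicts here are made-up sample data):

```python
from collections import defaultdict

# Exported spans from three separate traces (illustrative sample data)
spans = [
    {"trace_id": "abc", "gen_ai.conversation.id": "s456",
     "gen_ai.conversation.turn": 2, "gen_ai.usage.cost_usd": 0.0003},
    {"trace_id": "def", "gen_ai.conversation.id": "s456",
     "gen_ai.conversation.turn": 1, "gen_ai.usage.cost_usd": 0.0002},
    {"trace_id": "ghi", "gen_ai.conversation.id": "s999",
     "gen_ai.conversation.turn": 1, "gen_ai.usage.cost_usd": 0.0009},
]

# Group by conversation ID, ignoring trace boundaries
by_conversation = defaultdict(list)
for span in spans:
    by_conversation[span["gen_ai.conversation.id"]].append(span)

# Full conversation in order, plus a conversation-level cost metric
turns = sorted(by_conversation["s456"], key=lambda s: s["gen_ai.conversation.turn"])
total_cost = sum(s["gen_ai.usage.cost_usd"] for s in turns)
# turns come back as [turn 1 (trace def), turn 2 (trace abc)]
```

This is the query your observability backend runs for you; the SDK's job is making sure the grouping attributes are on every span.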
You bring your model pricing, we handle the calculation:
```python
pricing = {
    "gpt-4o": ModelPricing(input=2.50, output=10.0),  # USD per million tokens
}

@observe(pricing=pricing)
def chat(message):
    # The SDK automatically:
    # 1. Extracts token counts from the response
    # 2. Calculates cost: (input_tokens/1M * 2.50) + (output_tokens/1M * 10.0)
    # 3. Adds a gen_ai.usage.cost_usd attribute to the span
    return client.chat.completions.create(...)
```

No custom backends, no post-processing - cost is a first-class metric in your spans.
This is an OpenTelemetry extension, not a proprietary solution:
- ✅ Uses standard OTel SDK (TracerProvider, SpanProcessor)
- ✅ Exports to any OTLP backend (Last9, Datadog, Honeycomb, etc.)
- ✅ Works alongside OTel auto-instrumentation
- ✅ No vendor lock-in
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from last9_genai import Last9SpanProcessor

provider = TracerProvider()

# Use any OTLP backend
otlp_exporter = OTLPSpanExporter(endpoint="https://your-backend.com")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Add conversation/workflow tracking
provider.add_span_processor(Last9SpanProcessor())
```

Use the @observe() decorator for automatic tracking:
```python
@observe(pricing=pricing)
def my_llm_function(prompt: str):
    # Automatically tracked:
    # - Input/output capture
    # - Latency
    # - Token usage
    # - Cost calculation
    # - Error handling
    # - Conversation context (if inside conversation_context)
    return client.chat.completions.create(...)
```

No manual span creation, no boilerplate.
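For intuition, here is roughly what a decorator in this style does under the hood - a stripped-down toy version with timing, error capture, and attribute tagging only (not the SDK's actual implementation):

```python
# Toy sketch of an @observe-style decorator (illustrative only).
import functools
import time

def observe_sketch(**span_attrs):
    """Wrap a call in a 'span' dict recording name, status, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"name": fn.__name__, **span_attrs}
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                span["status"] = "ok"
                return result
            except Exception as exc:
                span["status"] = "error"
                span["error.type"] = type(exc).__name__
                raise
            finally:
                span["duration_ms"] = (time.perf_counter() - start) * 1000
                wrapper.last_span = span  # a real SDK exports this to OTel instead
        return wrapper
    return decorator

@observe_sketch(span_kind="llm")
def fake_llm(prompt):
    return f"echo: {prompt}"

fake_llm("hi")
# fake_llm.last_span now holds the name, span_kind, status, and duration
```

The real decorator additionally reads token usage from the response, applies your `ModelPricing`, and attaches the active conversation/workflow context.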
Problem: User reports "the chatbot gave a weird response"
Without conversation tracking:
```
Search for: user_id = "user_123" AND timestamp ~= "2024-01-15"
→ Get 47 traces
→ Manually correlate which traces belong to the problematic conversation
→ Miss context from earlier turns
```

With conversation tracking:

```
Search for: conversation_id = "user_123_session_456"
→ Get all 8 turns of the conversation in order
→ See full context: what the user asked, what the model responded
→ Identify that turn 6 had incorrect context from turn 2
```
Problem: You need to charge customers based on LLM usage
Without cost tracking:
1. Export all traces to data warehouse
2. Parse token counts from span attributes
3. Join with pricing table
4. Calculate costs per trace
5. Group by customer_id
6. Generate invoice
With cost tracking:
```
Query: SELECT SUM(gen_ai.usage.cost_usd)
       WHERE customer_id = "acme_corp"
       AND date = "2024-01"
→ $247.32
```
Problem: RAG pipeline costs are too high
With workflow tracking:
```
Query: AVG(gen_ai.usage.cost_usd) GROUP BY workflow.type
→ rag_pipeline_v1: $0.42 per query
→ rag_pipeline_v2: $0.18 per query
```

```
Query: Show me spans WHERE workflow.type = "rag_pipeline_v1"
       AND gen_ai.usage.cost_usd > 0.50
→ See which queries are expensive
→ Identify that the reranking step is the culprit
→ Optimize or remove reranking
```
```shell
# From GitHub (available now)
pip install git+https://github.com/last9/[email protected]

# Or from PyPI (coming soon)
pip install "last9-genai[otlp]"
```

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from last9_genai import Last9SpanProcessor, conversation_context, observe, ModelPricing
from openai import OpenAI

client = OpenAI()

# Set up OpenTelemetry
provider = TracerProvider()
otlp_exporter = OTLPSpanExporter(endpoint="https://otlp.your-backend.io")
provider.add_span_processor(BatchSpanProcessor(otlp_exporter))

# Add Last9 processor for conversation/cost tracking
provider.add_span_processor(Last9SpanProcessor())
trace.set_tracer_provider(provider)

# Define your pricing
pricing = {
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
}

@observe(pricing=pricing)
def chat(message):
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )

# Start tracking!
with conversation_context(conversation_id="user_session", user_id="user_123"):
    chat("Hello!")
```

| Feature | OTel GenAI Instrumentation | Last9 GenAI SDK |
|---|---|---|
| Auto-instrument LLM SDKs | ✅ Yes (monkey-patching) | ❌ No (manual decorator) |
| Standard `gen_ai.*` attributes | ✅ Yes | ✅ Yes |
| Conversation tracking | ❌ No | ✅ Yes (conversation_id) |
| Multi-turn tracking | ❌ No | ✅ Yes (auto turn counter) |
| Workflow tracking | ❌ No | ✅ Yes (workflow_id) |
| Cost calculation | ❌ No | ✅ Yes (with your pricing) |
| Cost aggregation | ❌ No | ✅ Yes (workflow-level) |
| Context propagation | ❌ Manual | ✅ Automatic (thread-safe) |
| Span classification | ❌ No | ✅ Yes (llm/tool/chain/agent) |
| Custom attributes | ✅ Manual | ✅ Automatic propagation |
Recommendation: Use both together!
- OTel GenAI auto-instruments your LLM SDKs (traces every API call)
- Last9 GenAI adds conversation/workflow context and cost tracking
We built this SDK because we needed it at Last9 for our own LLM applications. The gap between "traces" and "conversations" was too painful, and cost tracking was essential for production.
Try it out:
- GitHub: https://github.com/last9/python-ai-sdk
- Examples: https://github.com/last9/python-ai-sdk/tree/main/examples
- Installation: https://github.com/last9/python-ai-sdk#installation
Questions? Feedback?
- Open an issue: https://github.com/last9/python-ai-sdk/issues
- Tweet at us: @last9io
- LLM observability is different - conversations span multiple traces, traditional trace_id isn't enough
- Conversation ID is your primary dimension - group all related spans across traces
- Cost is a first-class metric - calculate it automatically from token usage + your pricing
- Context propagates automatically - no need to pass IDs through your call stack
- Works with existing OTel - it's an extension, not a replacement
Build better LLM applications with proper observability. 🚀
Written by the Last9 team - making observability suck less, one SDK at a time.