feat: Replace Pydantic with native Python dataclasses for cog.BaseModel #2681
Merged
tempusfrangit merged 23 commits into main on Feb 4, 2026
Remove the legacy pydantic-based Python SDK to prepare for the dataclass-based implementation. This includes all server code, type definitions, and associated tests.
Replace pydantic with a pure dataclass-based implementation:
- Type inspection without pydantic overhead
- Schema generation using native Python types
- Custom coder system for complex type serialization
- API compatible with existing predictors
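To illustrate what "schema generation using native Python types" can look like, here is a minimal sketch that derives a JSON-Schema-like dict from a dataclass. It uses only the standard library; `schema_for`, `_JSON_TYPES`, and `Output` are hypothetical names for this example and are not taken from cog's code.

```python
import dataclasses
import typing

# Hypothetical mapping from Python primitives to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for(cls: type) -> dict:
    """Build a small JSON-Schema-like dict from a dataclass's fields."""
    hints = typing.get_type_hints(cls)
    properties = {}
    required = []
    for f in dataclasses.fields(cls):
        tp = hints[f.name]
        if typing.get_origin(tp) is list:
            # list[T] becomes an array whose items have T's JSON type.
            (item,) = typing.get_args(tp)
            prop = {"type": "array", "items": {"type": _JSON_TYPES[item]}}
        else:
            prop = {"type": _JSON_TYPES[tp]}
        properties[f.name] = prop
        has_default = (
            f.default is not dataclasses.MISSING
            or f.default_factory is not dataclasses.MISSING
        )
        if not has_default:
            required.append(f.name)
    return {"type": "object", "properties": properties, "required": required}


@dataclasses.dataclass
class Output:
    text: str
    scores: list[float]
    truncated: bool = False
```

Calling `schema_for(Output)` lists `text` and `scores` as required and leaves `truncated` optional because it has a default. The sketch only handles primitives and flat lists; a real implementation would need to cover the other types the PR mentions (files, paths, secrets, and so on).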
Remove multi-wheel complexity now that pydantic-based cog is replaced:
- pkg/wheels: embed only the cog wheel, remove cog-dataclass
- pkg/dockerfile: simplify wheel installation to single embedded wheel
- integration-tests: remove cog_dataclass condition
- CI: remove dataclass-specific test matrix entries
- tox: remove pydantic version matrix
- mise: consolidate coglet-python test task
Delete tests that specifically test pydantic 1.x/2.x behavior which is no longer relevant with the dataclass-based implementation.
The dataclass implementation handles Pydantic BaseModel outputs via duck-typing - it checks for model_dump() (v2) or dict() (v1) methods in cog/json.py:make_encodeable(). Users can still use Pydantic for their own model types.
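The duck-typing described above can be sketched roughly as follows. This is not the code in cog/json.py; `make_encodeable_sketch` and the two fake model classes are hypothetical, and the only behavior taken from the commit message is the check for `model_dump()` (v2) before `dict()` (v1).

```python
import dataclasses
from typing import Any


def make_encodeable_sketch(obj: Any) -> Any:
    """Recursively convert obj into JSON-friendly builtins."""
    # Pydantic v2 models expose model_dump(); check it first because
    # v2 models also keep a deprecated dict() method.
    model_dump = getattr(obj, "model_dump", None)
    if callable(model_dump):
        return make_encodeable_sketch(model_dump())
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return make_encodeable_sketch(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        return {k: make_encodeable_sketch(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [make_encodeable_sketch(v) for v in obj]
    # Pydantic v1 models expose dict().
    to_dict = getattr(obj, "dict", None)
    if callable(to_dict):
        return make_encodeable_sketch(to_dict())
    return obj


class FakeV2Model:
    """Stand-in for a Pydantic v2 model (no pydantic import needed)."""

    def model_dump(self) -> dict:
        return {"a": 1}


class FakeV1Model:
    """Stand-in for a Pydantic v1 model that nests a v2-style model."""

    def dict(self) -> dict:
        return {"b": [FakeV2Model()]}
```

Because the check is by method name rather than by class, the encoder never has to import pydantic, which is the point of the duck-typing approach.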
Remove obsolete skips - the test uses Python 3.10 which is supported. Verified passing with both Python and Rust coglet servers.
Remove obsolete skips - the tests use Python 3.10 which is supported. These are slow tests that will run in CI (not -short mode).
Remove obsolete skips. This test verifies cog version in base images. Verified passing with both Python and Rust coglet servers.
coglet_alpha is no longer a supported configuration - remove all skips.
- Simplify format_validation_error to use cog's already-formatted errors
- Remove unwrap_pydantic_serialization_iterators (no longer needed)
- Remove schema_via_fastapi fallback, use cog._schemas directly
- Update Runtime enum: remove Pydantic variant, rename NonPydantic to Cog
- Update SdkImplementation: remove Pydantic/Dataclass, use Cog/Unknown
- Update detection to check for cog._adt module
- Update comments to remove pydantic references
…BaseModel pydantic.BaseModel outputs are no longer supported. Users should use cog.BaseModel (a dataclass) or @dataclass for structured outputs.
Add support for user-defined healthcheck() method on predictors:
- Add Healthcheck event type to eventtypes.py
- Add get_healthcheck() helper to predictor.py
- Add healthcheck() method to Worker and _ChildWorker classes
- Add healthcheck() to PredictionRunner
- Update /health-check endpoint to call user healthcheck
- Add UNHEALTHY status to Health enum

Features:
- Sync and async healthcheck methods supported
- 5 second timeout for healthcheck execution
- Returns UNHEALTHY with error details on failure/timeout/exception

Remove [cog_dataclass] skip from healthcheck integration tests.
Add healthcheck support to coglet-rust:

Protocol:
- Add ControlRequest::Healthcheck and ControlResponse::HealthcheckResult
- Add HealthcheckStatus enum (Healthy/Unhealthy)

Orchestrator:
- Add HealthcheckResult type with healthy()/unhealthy() constructors
- Add healthcheck() method to Orchestrator trait
- Implement request/response flow via control channel
- Add semaphore to prevent concurrent healthchecks (skip if busy)
- Handle healthcheck results in event loop

HTTP:
- Add HealthResponse enum (includes transient UNHEALTHY state)
- Update /health-check to call user healthcheck when ready
- Return user_healthcheck_error in response on failure

Worker:
- Add healthcheck() to PredictHandler trait (default: healthy)
- Handle Healthcheck requests in worker event loop

Python integration (coglet-python):
- Add has_healthcheck() and is_healthcheck_async() to PythonPredictor
- Implement healthcheck_sync() with ThreadPoolExecutor + 5s timeout
- Implement healthcheck_async() with asyncio.wait_for + 5s timeout
- Wire up in PythonPredictHandler::healthcheck()
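The async path named above (asyncio.wait_for with a 5-second timeout) could look roughly like this on the Python side. This is a sketch under assumptions: `run_async_healthcheck`, the `(healthy, error)` return shape, and the error strings are hypothetical and not taken from coglet-python.

```python
import asyncio

HEALTHCHECK_TIMEOUT = 5.0  # seconds, per the commit message


async def run_async_healthcheck(healthcheck, timeout: float = HEALTHCHECK_TIMEOUT):
    """Await a user healthcheck coroutine; return (healthy, error)."""
    try:
        # wait_for cancels the healthcheck task if it exceeds the timeout.
        result = await asyncio.wait_for(healthcheck(), timeout=timeout)
    except asyncio.TimeoutError:
        return False, f"healthcheck timed out after {timeout:.1f} seconds"
    except Exception as exc:
        # Any exception from user code is reported as unhealthy.
        return False, f"healthcheck raised: {exc}"
    if not result:
        return False, "healthcheck returned False"
    return True, None
```

A caller would run it with something like `asyncio.run(run_async_healthcheck(predictor.healthcheck))`, where `predictor.healthcheck` is an `async def` method. Unlike the thread-based sync path, a timed-out coroutine is actually cancelled here.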
- Remove [coglet_rust] skip from existing sync healthcheck tests
- Add async healthcheck tests:
  - healthcheck_async_custom: async healthcheck returning True
  - healthcheck_async_unhealthy: async healthcheck returning False
  - healthcheck_async_exception: async healthcheck raising exception
  - healthcheck_async_timeout: async healthcheck timing out (>5s)
Python type fixes:
- _adt.py: Fix type hints for PrimitiveType methods to handle Any
- config.py: Add type arguments to dict types
- input.py: Add cast for default_factory, add type ignore for field()
- coder.py: Rename factory parameter from cls to tpe (static method)
- coders/*.py: Match renamed parameter in factory method overrides
- http.py: Add type ignores for dynamic FastAPI types and coglet module
- _inspector.py: Remove unused imports, add 'from None' to re-raises

Makefile:
- Update tox env from typecheck-pydantic2 to typecheck (pydantic removed)

Cleanup:
- Remove unused warnings import from _inspector.py
- Remove experimental coders warning
- Change timeout format from {} to {:.1} to output '5.0' instead of '5'
- Update test harness waitForServer to accept UNHEALTHY and BUSY as valid 'ready' states
- Remove Pydantic compat code from cog.Path
- Update README, docs/python.md, docs/llms.txt
- Clean up comments referencing pydantic
- Remove pydantic from dependencies in pyproject.toml
- Simplify dependencies to minimal set
- Remove PYDANTIC_V2 constant from pyright config
- Delete cog-dataclass/ directory (was scaffold, code now in python/cog/)
- Remove unused Type import from types.py
- Remove pydantic from Go dockerfile test expectation
- Remove pydantic comment from requirements_test.go
- Fix pyright warnings in openapi_schema.py (use Any type)
- Sanitize validation error messages to first line only
4cce837 to 69745a5
Use prediction to trigger slow healthcheck mode instead of relying on call counting, which was flaky due to harness also calling healthcheck.
Use ThreadPoolExecutor with shutdown(wait=False) to avoid blocking when sync healthcheck exceeds timeout. Previously the context manager would wait for the thread to complete even after timeout.
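The difference between the context manager and an explicit shutdown(wait=False) can be sketched as below. The names `run_sync_healthcheck` and `slow_check` and the `(healthy, error)` return shape are hypothetical; only the ThreadPoolExecutor, the timeout, and `shutdown(wait=False)` come from the commit message.

```python
import concurrent.futures
import time


def run_sync_healthcheck(healthcheck, timeout: float = 5.0):
    """Run a blocking healthcheck in a thread; return (healthy, error)."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(healthcheck)
    try:
        healthy = bool(future.result(timeout=timeout))
        return healthy, (None if healthy else "healthcheck returned False")
    except concurrent.futures.TimeoutError:
        return False, f"healthcheck timed out after {timeout:.1f} seconds"
    except Exception as exc:
        return False, f"healthcheck raised: {exc}"
    finally:
        # `with ThreadPoolExecutor()` would call shutdown(wait=True) on exit
        # and block until the stuck thread finished; wait=False returns now.
        executor.shutdown(wait=False)


def slow_check() -> bool:
    """Example healthcheck that takes longer than a short timeout."""
    time.sleep(0.5)
    return True
```

With `run_sync_healthcheck(slow_check, timeout=0.05)` the caller gets an unhealthy result after roughly the timeout rather than after the full sleep. Note that the worker thread itself is not killed; it keeps running in the background until the healthcheck returns.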
8a1c2e7 to a95882c
a95882c to 75759b1
Add _sanitize_validation_message() that only passes through known safe validation patterns (Field required, Invalid value, fails constraint, does not match regex/choices). Unknown messages are replaced with generic 'Invalid value' to prevent potential stack trace or internal details from reaching clients. This addresses CodeQL security warning about information exposure.
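An allowlist of this kind might be sketched as follows. The regular expressions are guesses built from the pattern names in the commit message, and `sanitize_validation_message` is a hypothetical stand-in, not the actual `_sanitize_validation_message()` implementation.

```python
import re

# Hypothetical allowlist; the exact patterns in the PR are not shown here.
_SAFE_PATTERNS = [
    re.compile(r"^Field required$"),
    re.compile(r"^Invalid value\b"),
    re.compile(r"\bfails constraint\b"),
    re.compile(r"\bdoes not match (regex|choices)\b"),
]


def sanitize_validation_message(message: str) -> str:
    """Return the first line if it matches a known-safe pattern."""
    stripped = message.strip()
    # Only the first line is considered, so multi-line tracebacks are cut off.
    first_line = stripped.splitlines()[0] if stripped else ""
    if any(p.search(first_line) for p in _SAFE_PATTERNS):
        return first_line
    # Anything unrecognized is replaced with a generic message.
    return "Invalid value"
```

The design choice is deny-by-default: a new error format has to be added to the allowlist before clients can see it, which trades some detail in error responses for a guarantee that internal text does not leak.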
ca4bd72 to 3c81266
michaeldwan (Member) approved these changes on Feb 4, 2026
This beefy PR looks good. It's a lot, but no blocking issues... so @tempusfrangit merge it and we'll chat through a few gaps in test coverage that were accidentally covered in the deleted tests.
      test-coglet-python:
    -   name: "Test Coglet Python bindings (${{ matrix.runtime }})"
    +   name: "Test Coglet Python bindings"
Member
Probably out of scope for this PR, but I wonder if we should be testing this, and all the other python code, in a matrix across supported python versions.
Contributor
Author
It's something we should consider, but if we fail on any version of Python it's a bug in Maturin/PyO3, since we're compiling to pure abi3 (3.10+).
This PR replaces Pydantic with native Python dataclasses as the foundation for cog.BaseModel and the input/output type system. This is a significant architectural change that simplifies the codebase, removes a major dependency, and provides a more predictable, lightweight runtime.

Motivation

Key Changes

Python SDK (python/cog/)
- cog.BaseModel: Now a native Python dataclass instead of Pydantic BaseModel
- … (_adt.py): Algebraic Data Type utilities for type-safe unions and enums
- … (coder/): Modular encode/decode system for all supported types (primitives, files, paths, secrets, lists, etc.)
- … (input.py): Clean dataclass-based input validation
- cog.Field(): Provides metadata (default, description, ge, le, choices) without Pydantic dependency

Rust Coglet (crates/)
- healthcheck() method support with 5-second timeout
- UNHEALTHY status in health response

Integration Tests
- … (build_pydantic1_none, complex_types, etc.)
- … [coglet_rust] for healthcheck tests (removed skips)
- … complex_output to use cog.BaseModel instead of pydantic.BaseModel

Removed Features

Breaking Changes
- cog.BaseModel is now a dataclass - Models must be defined as dataclasses, not Pydantic models
- cog.Field() API changes - Uses dataclass field metadata instead of Pydantic Field
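To make the "dataclass field metadata" idea concrete, here is a sketch using only the standard library. `field_sketch`, `validate`, and `Settings` are hypothetical names for illustration; they are not cog.Field() or cog.BaseModel, whose actual signatures are not shown in this PR text. The constraint names (ge, le, choices) are the ones listed in the description.

```python
import dataclasses


def field_sketch(default=dataclasses.MISSING, *, description=None,
                 ge=None, le=None, choices=None):
    """Hypothetical stand-in for cog.Field(): constraints live in metadata."""
    metadata = {"description": description, "ge": ge, "le": le, "choices": choices}
    return dataclasses.field(default=default, metadata=metadata)


def validate(instance) -> None:
    """Check ge/le/choices metadata on every field of a dataclass instance."""
    for f in dataclasses.fields(instance):
        value = getattr(instance, f.name)
        ge = f.metadata.get("ge")
        le = f.metadata.get("le")
        choices = f.metadata.get("choices")
        if ge is not None and value < ge:
            raise ValueError(f"{f.name} fails constraint >= {ge}")
        if le is not None and value > le:
            raise ValueError(f"{f.name} fails constraint <= {le}")
        if choices is not None and value not in choices:
            raise ValueError(f"{f.name} does not match choices {choices}")


@dataclasses.dataclass
class Settings:
    prompt: str = field_sketch(description="Text prompt")
    steps: int = field_sketch(20, ge=1, le=100)
    mode: str = field_sketch("fast", choices=["fast", "quality"])

    def __post_init__(self) -> None:
        # Plain dataclasses do not validate, so the check is explicit.
        validate(self)
```

`Settings(prompt="hi")` succeeds with the defaults, while `Settings(prompt="hi", steps=0)` raises a ValueError. The practical consequence of the breaking change is visible here: validation is something the framework has to run explicitly, since a dataclass on its own only stores values.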