feat: Replace Pydantic with native Python dataclasses for cog.BaseModel #2681

Merged
tempusfrangit merged 23 commits into main from feat/dataclass-cog
Feb 4, 2026

@tempusfrangit
Contributor

This PR replaces Pydantic with native Python dataclasses as the foundation for cog.BaseModel and the input/output type system. This is a significant architectural change that simplifies the codebase, removes a major dependency, and provides a more predictable, lightweight runtime.

Motivation

  • Simplification: Pydantic added complexity for maintaining compatibility across v1 and v2 APIs
  • Performance: Native dataclasses have lower overhead than Pydantic models
  • Predictability: Removes magic behavior and implicit coercion from Pydantic
  • Maintenance: Eliminates need to support both Pydantic 1.x and 2.x codepaths

Key Changes

Python SDK (python/cog/)

  • New cog.BaseModel: Now a native Python dataclass instead of Pydantic BaseModel
  • New ADT system (_adt.py): Algebraic Data Type utilities for type-safe unions and enums
  • New type coders (coder/): Modular encode/decode system for all supported types (primitives, files, paths, secrets, lists, etc.)
  • Simplified input handling (input.py): Clean dataclass-based input validation
  • Updated cog.Field(): Provides metadata (default, description, ge, le, choices) without Pydantic dependency
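
As an illustration of the `cog.Field()` bullet above, here is a minimal sketch of how constraint metadata can ride on a plain dataclass field. It is not the code in python/cog/: `Field`, `validate`, and `ExampleInput` below are hypothetical stand-ins written against the standard library only, and the error wording is an assumption.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Optional, Sequence


def Field(
    default: Any = MISSING,
    *,
    description: Optional[str] = None,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    choices: Optional[Sequence[Any]] = None,
) -> Any:
    """Hypothetical stand-in for cog.Field(): keep constraints in dataclass metadata."""
    metadata = {"description": description, "ge": ge, "le": le, "choices": choices}
    return field(default=default, metadata=metadata)


def validate(instance: Any) -> None:
    """Check each field's value against the constraints stored in its metadata."""
    for f in fields(instance):
        value = getattr(instance, f.name)
        ge, le, choices = (f.metadata.get(k) for k in ("ge", "le", "choices"))
        if ge is not None and value < ge:
            raise ValueError(f"{f.name} fails constraint >= {ge}")
        if le is not None and value > le:
            raise ValueError(f"{f.name} fails constraint <= {le}")
        if choices is not None and value not in choices:
            raise ValueError(f"{f.name} does not match choices {list(choices)}")


@dataclass
class ExampleInput:
    # Illustrative model only; the field names are made up for this sketch.
    prompt: str = Field(description="Text prompt")
    steps: int = Field(default=20, ge=1, le=100)
    scheduler: str = Field(default="ddim", choices=("ddim", "euler"))
```

With this sketch, `validate(ExampleInput(prompt="a cat"))` passes silently, while `validate(ExampleInput(prompt="a cat", steps=0))` raises `ValueError`.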

Rust Coglet (crates/)

  • Removed Pydantic detection: No longer checks for or handles Pydantic-specific code paths
  • Simplified input processing: Direct dataclass field extraction
  • User-defined healthcheck support: New healthcheck() method support with 5-second timeout
    • Sync and async healthcheck support
    • Proper error handling and timeout messages
    • UNHEALTHY status in health response

Integration Tests

  • Removed Pydantic-specific tests (build_pydantic1_none, complex_types, etc.)
  • Added async healthcheck tests (4 new test files)
  • Enabled [coglet_rust] for healthcheck tests (removed skips)
  • Updated complex_output to use cog.BaseModel instead of pydantic.BaseModel

Removed Features

  • Pydantic 1.x/2.x compatibility layer

Breaking Changes

  1. cog.BaseModel is now a dataclass - Models must be defined as dataclasses, not Pydantic models
  2. No implicit type coercion - Types must match exactly (no automatic string→int conversion)
  3. cog.Field() API changes - Uses dataclass field metadata instead of Pydantic Field
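
To make breaking change 2 concrete, a rough sketch of what "no implicit coercion" means for a dataclass-based input. A plain dataclass stores whatever it is given, so a check has to reject mismatches rather than convert them. `check_exact_types` and `SamplingInput` are hypothetical names, the sketch only handles fields annotated with plain classes such as `int` or `str`, and how strict the real check is (for example, whether an `int` is accepted for a `float` field) is not stated in this PR.

```python
from dataclasses import dataclass, fields
from typing import Any


def check_exact_types(instance: Any) -> None:
    """Raise TypeError when a value's type differs from its field annotation.

    Assumption for this sketch: every annotation is a plain class and the
    match must be exact, so "20" is rejected for an int field, not converted.
    """
    for f in fields(instance):
        value = getattr(instance, f.name)
        if type(value) is not f.type:
            raise TypeError(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )


@dataclass
class SamplingInput:
    # Illustrative model only.
    steps: int
    scale: float
```

Here `check_exact_types(SamplingInput(steps=20, scale=7.5))` passes, while `SamplingInput(steps="20", scale=7.5)` fails the check with `TypeError` instead of being converted to `20`.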

Remove the legacy pydantic-based Python SDK to prepare for the
dataclass-based implementation. This includes all server code,
type definitions, and associated tests.

Replace pydantic with a pure dataclass-based implementation:
- Type inspection without pydantic overhead
- Schema generation using native Python types
- Custom coder system for complex type serialization
- API compatible with existing predictors

Remove multi-wheel complexity now that pydantic-based cog is replaced:
- pkg/wheels: embed only the cog wheel, remove cog-dataclass
- pkg/dockerfile: simplify wheel installation to single embedded wheel
- integration-tests: remove cog_dataclass condition
- CI: remove dataclass-specific test matrix entries
- tox: remove pydantic version matrix
- mise: consolidate coglet-python test task

Delete tests that specifically test pydantic 1.x/2.x behavior which is
no longer relevant with the dataclass-based implementation.

The dataclass implementation handles Pydantic BaseModel outputs via
duck-typing - it checks for model_dump() (v2) or dict() (v1) methods
in cog/json.py:make_encodeable(). Users can still use Pydantic for
their own model types.

Remove obsolete skips - the test uses Python 3.10 which is supported.
Verified passing with both Python and Rust coglet servers.

Remove obsolete skips - the tests use Python 3.10 which is supported.
These are slow tests that will run in CI (not -short mode).

Remove obsolete skips. This test verifies cog version in base images.
Verified passing with both Python and Rust coglet servers.

coglet_alpha is no longer a supported configuration - remove all skips.

- Simplify format_validation_error to use cog's already-formatted errors
- Remove unwrap_pydantic_serialization_iterators (no longer needed)
- Remove schema_via_fastapi fallback, use cog._schemas directly
- Update Runtime enum: remove Pydantic variant, rename NonPydantic to Cog
- Update SdkImplementation: remove Pydantic/Dataclass, use Cog/Unknown
- Update detection to check for cog._adt module
- Update comments to remove pydantic references
…BaseModel

pydantic.BaseModel outputs are no longer supported. Users should use
cog.BaseModel (a dataclass) or @dataclass for structured outputs.

Add support for user-defined healthcheck() method on predictors:
- Add Healthcheck event type to eventtypes.py
- Add get_healthcheck() helper to predictor.py
- Add healthcheck() method to Worker and _ChildWorker classes
- Add healthcheck() to PredictionRunner
- Update /health-check endpoint to call user healthcheck
- Add UNHEALTHY status to Health enum

Features:
- Sync and async healthcheck methods supported
- 5 second timeout for healthcheck execution
- Returns UNHEALTHY with error details on failure/timeout/exception

Remove [cog_dataclass] skip from healthcheck integration tests.

Add healthcheck support to coglet-rust:

Protocol:
- Add ControlRequest::Healthcheck and ControlResponse::HealthcheckResult
- Add HealthcheckStatus enum (Healthy/Unhealthy)

Orchestrator:
- Add HealthcheckResult type with healthy()/unhealthy() constructors
- Add healthcheck() method to Orchestrator trait
- Implement request/response flow via control channel
- Add semaphore to prevent concurrent healthchecks (skip if busy)
- Handle healthcheck results in event loop
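
The "skip if busy" semaphore lives in the Rust orchestrator; the idea can be sketched in Python with a non-blocking acquire. This is only an analogy under stated assumptions: `HealthcheckGate` and `try_run` are hypothetical names, and returning `None` for a skipped check is a choice made for this sketch, since the commit message does not say what the orchestrator reports in that case.

```python
import threading
from typing import Callable, Optional


class HealthcheckGate:
    """Allow at most one healthcheck in flight; skip callers that arrive while busy."""

    def __init__(self) -> None:
        self._slot = threading.Semaphore(1)

    def try_run(self, healthcheck: Callable[[], bool]) -> Optional[bool]:
        """Run the healthcheck, or return None immediately if one is already running."""
        # blocking=False returns False right away instead of queueing the caller.
        if not self._slot.acquire(blocking=False):
            return None
        try:
            return healthcheck()
        finally:
            self._slot.release()
```

Skipping rather than queueing keeps a slow healthcheck from piling up further healthcheck calls behind it.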

HTTP:
- Add HealthResponse enum (includes transient UNHEALTHY state)
- Update /health-check to call user healthcheck when ready
- Return user_healthcheck_error in response on failure

Worker:
- Add healthcheck() to PredictHandler trait (default: healthy)
- Handle Healthcheck requests in worker event loop

Python integration (coglet-python):
- Add has_healthcheck() and is_healthcheck_async() to PythonPredictor
- Implement healthcheck_sync() with ThreadPoolExecutor + 5s timeout
- Implement healthcheck_async() with asyncio.wait_for + 5s timeout
- Wire up in PythonPredictHandler::healthcheck()
- Remove [coglet_rust] skip from existing sync healthcheck tests
- Add async healthcheck tests:
  - healthcheck_async_custom: async healthcheck returning True
  - healthcheck_async_unhealthy: async healthcheck returning False
  - healthcheck_async_exception: async healthcheck raising exception
  - healthcheck_async_timeout: async healthcheck timing out (>5s)
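
The four async test cases above (True, False, exception, timeout) map onto the outcomes of an `asyncio.wait_for` wrapper. A minimal sketch of that pattern, not the coglet-python implementation: `run_async_healthcheck`, the `(healthy, error)` return shape, and the error strings are all assumptions for illustration; only the 5-second default comes from this PR.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

HEALTHCHECK_TIMEOUT_S = 5.0  # timeout stated in the PR description


async def run_async_healthcheck(
    healthcheck: Callable[[], Awaitable[bool]],
    timeout: float = HEALTHCHECK_TIMEOUT_S,
) -> Tuple[bool, Optional[str]]:
    """Return (healthy, error); error is None only when the check passed."""
    try:
        # wait_for cancels the healthcheck coroutine once the timeout elapses.
        result = await asyncio.wait_for(healthcheck(), timeout=timeout)
    except asyncio.TimeoutError:
        return False, f"healthcheck timed out after {timeout:.1f} seconds"
    except Exception as exc:
        return False, f"healthcheck raised: {exc}"
    if not result:
        return False, "healthcheck returned False"
    return True, None
```

For example, `asyncio.run(run_async_healthcheck(check))` with an `async def check()` that returns `True` yields `(True, None)`; a check that sleeps past the timeout yields `False` with a timeout message.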
Python type fixes:
- _adt.py: Fix type hints for PrimitiveType methods to handle Any
- config.py: Add type arguments to dict types
- input.py: Add cast for default_factory, add type ignore for field()
- coder.py: Rename factory parameter from cls to tpe (static method)
- coders/*.py: Match renamed parameter in factory method overrides
- http.py: Add type ignores for dynamic FastAPI types and coglet module
- _inspector.py: Remove unused imports, add 'from None' to re-raises

Makefile:
- Update tox env from typecheck-pydantic2 to typecheck (pydantic removed)

Cleanup:
- Remove unused warnings import from _inspector.py
- Remove experimental coders warning
- Change timeout format from {} to {:.1} to output '5.0' instead of '5'
- Update test harness waitForServer to accept UNHEALTHY and BUSY as valid 'ready' states
@tempusfrangit tempusfrangit requested a review from a team as a code owner February 4, 2026 00:34
@tempusfrangit tempusfrangit marked this pull request as draft February 4, 2026 00:34
Comment thread python/cog/server/http.py Dismissed
- Remove Pydantic compat code from cog.Path
- Update README, docs/python.md, docs/llms.txt
- Clean up comments referencing pydantic
- Remove pydantic from dependencies in pyproject.toml
- Simplify dependencies to minimal set
- Remove PYDANTIC_V2 constant from pyright config
- Delete cog-dataclass/ directory (was scaffold, code now in python/cog/)
- Remove unused Type import from types.py
- Remove pydantic from Go dockerfile test expectation
- Remove pydantic comment from requirements_test.go
- Fix pyright warnings in openapi_schema.py (use Any type)
- Sanitize validation error messages to first line only

Use prediction to trigger slow healthcheck mode instead of relying on
call counting, which was flaky due to harness also calling healthcheck.

Use ThreadPoolExecutor with shutdown(wait=False) to avoid blocking
when sync healthcheck exceeds timeout. Previously the context manager
would wait for the thread to complete even after timeout.
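
A small sketch of the difference this commit describes, using only `concurrent.futures` from the standard library. `run_sync_healthcheck` and its return shape are hypothetical and not taken from coglet-python. The point is the `finally` clause: `with ThreadPoolExecutor(...)` calls `shutdown(wait=True)` on exit and so blocks until the worker thread finishes, whereas an explicit `shutdown(wait=False)` returns immediately.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, Tuple


def run_sync_healthcheck(
    healthcheck: Callable[[], bool], timeout: float = 5.0
) -> Tuple[bool, Optional[str]]:
    """Run a blocking healthcheck in a worker thread; give up after `timeout` seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(healthcheck)
        try:
            result = future.result(timeout=timeout)
        except FutureTimeout:
            return False, f"healthcheck timed out after {timeout:.1f} seconds"
        except Exception as exc:
            return False, f"healthcheck raised: {exc}"
        return (True, None) if result else (False, "healthcheck returned False")
    finally:
        # Do not wait for a still-running worker; the caller gets its answer now.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    start = time.monotonic()
    print(run_sync_healthcheck(lambda: time.sleep(0.5) or True, timeout=0.05))
    print(f"returned after {time.monotonic() - start:.2f}s")
```

The worker thread itself is not cancelled and keeps running until the healthcheck returns; only the caller stops waiting for it.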
@tempusfrangit tempusfrangit marked this pull request as ready for review February 4, 2026 04:29
Add _sanitize_validation_message() that only passes through known safe
validation patterns (Field required, Invalid value, fails constraint,
does not match regex/choices). Unknown messages are replaced with
generic 'Invalid value' to prevent potential stack trace or internal
details from reaching clients.

This addresses CodeQL security warning about information exposure.
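
The allow-list approach described above can be sketched as follows. This is an illustration rather than the actual `_sanitize_validation_message()`: the function name below drops the leading underscore, and the exact regular expressions are assumptions built from the pattern names listed in the commit message.

```python
import re

# Assumed patterns, derived from the names in the commit message.
_SAFE_PATTERNS = (
    re.compile(r"Field required"),
    re.compile(r"Invalid value"),
    re.compile(r"fails constraint"),
    re.compile(r"does not match (regex|choices)"),
)


def sanitize_validation_message(message: str) -> str:
    """Pass through the first line of known-safe messages; replace anything else."""
    lines = message.splitlines()
    first_line = lines[0].strip() if lines else ""
    if any(pattern.search(first_line) for pattern in _SAFE_PATTERNS):
        return first_line
    # Unknown text could carry a stack trace or internal details, so drop it.
    return "Invalid value"
```

An allow-list fails closed: a new, unrecognized message shape is replaced with the generic text instead of leaking to clients.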
Member

@michaeldwan michaeldwan left a comment

This beefy PR looks good. It's a lot, but no blocking issues... so @tempusfrangit merge it and we'll chat through a few gaps in test coverage that were accidentally covered in the deleted tests.

Comment thread .github/workflows/ci.yaml

  test-coglet-python:
-    name: "Test Coglet Python bindings (${{ matrix.runtime }})"
+    name: "Test Coglet Python bindings"
Member

Probably out of scope for this PR, but I wonder if we should be testing this, and all the other python code, in a matrix across supported python versions.

Contributor Author

It should be something we consider, but if we fail on any version of Python, it's a bug in Maturin/PyO3, since we're compiling to pure ABI3 (3.10+).

@tempusfrangit tempusfrangit merged commit df576ff into main Feb 4, 2026
31 of 32 checks passed
@tempusfrangit tempusfrangit deleted the feat/dataclass-cog branch February 4, 2026 19:55
3 participants