feat: Replace Pydantic with native Python dataclasses for cog.BaseModel by tempusfrangit · Pull Request #2681 · replicate/cog

tempusfrangit · 2026-02-04T00:34:28Z

This PR replaces Pydantic with native Python dataclasses as the foundation for cog.BaseModel and the input/output type system. This is a significant architectural change that simplifies the codebase, removes a major dependency, and provides a more predictable, lightweight runtime.

Motivation

- Simplification: Pydantic added complexity for maintaining compatibility across the v1 and v2 APIs
- Performance: Native dataclasses have lower overhead than Pydantic models
- Predictability: Removes magic behavior and implicit coercion from Pydantic
- Maintenance: Eliminates the need to support both Pydantic 1.x and 2.x codepaths

Key Changes

Python SDK (`python/cog/`)

- New cog.BaseModel: Now a native Python dataclass instead of a Pydantic BaseModel
- New ADT system (_adt.py): Algebraic Data Type utilities for type-safe unions and enums
- New type coders (coder/): Modular encode/decode system for all supported types (primitives, files, paths, secrets, lists, etc.)
- Simplified input handling (input.py): Clean dataclass-based input validation
- Updated cog.Field(): Provides metadata (default, description, ge, le, choices) without a Pydantic dependency
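To make the `cog.Field()` bullet concrete, here is a minimal sketch of how constraints such as `ge`, `le` and `choices` can live in standard `dataclasses.field(metadata=...)` and be checked without Pydantic. `constrained_field`, `check_constraints` and `Settings` are hypothetical names used only for illustration; this is not cog's actual `Field()` implementation or signature.

```python
from dataclasses import dataclass, field, fields
from typing import Any, List, Optional, Sequence

_MISSING: Any = object()  # sentinel meaning "no default given"


def constrained_field(
    default: Any = _MISSING,
    *,
    description: Optional[str] = None,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    choices: Optional[Sequence[Any]] = None,
) -> Any:
    """Hypothetical stand-in for cog.Field(): keep constraints in dataclass field metadata."""
    metadata = {"description": description, "ge": ge, "le": le, "choices": choices}
    if default is _MISSING:
        return field(metadata=metadata)  # required field, no default
    return field(default=default, metadata=metadata)


def check_constraints(instance: Any) -> List[str]:
    """Compare each field value with the ge/le/choices stored in its metadata."""
    errors: List[str] = []
    for f in fields(instance):
        value = getattr(instance, f.name)
        ge, le, choices = f.metadata.get("ge"), f.metadata.get("le"), f.metadata.get("choices")
        if ge is not None and value < ge:
            errors.append(f"{f.name}: must be >= {ge}")
        if le is not None and value > le:
            errors.append(f"{f.name}: must be <= {le}")
        if choices is not None and value not in choices:
            errors.append(f"{f.name}: must be one of {list(choices)}")
    return errors


@dataclass
class Settings:
    steps: int = constrained_field(50, description="Number of denoising steps", ge=1, le=500)
    scheduler: str = constrained_field("ddim", choices=["ddim", "euler"])
```

Because the constraints are ordinary metadata, any tool that can call `dataclasses.fields()` (schema generation, validation) can read them without importing a validation library.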

Rust Coglet (`crates/`)

- Removed Pydantic detection: No longer checks for or handles Pydantic-specific code paths
- Simplified input processing: Direct dataclass field extraction
- User-defined healthcheck support: New healthcheck() method with a 5-second timeout
  - Sync and async healthcheck support
  - Proper error handling and timeout messages
  - UNHEALTHY status in health response

Integration Tests

- Removed Pydantic-specific tests (build_pydantic1_none, complex_types, etc.)
- Added async healthcheck tests (4 new test files)
- Enabled [coglet_rust] for healthcheck tests (removed skips)
- Updated complex_output to use cog.BaseModel instead of pydantic.BaseModel

Removed Features

- Pydantic 1.x/2.x compatibility layer

Breaking Changes

- cog.BaseModel is now a dataclass: Models must be defined as dataclasses, not Pydantic models
- No implicit type coercion: Types must match exactly (no automatic string→int conversion)
- cog.Field() API changes: Uses dataclass field metadata instead of Pydantic Field
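The "no implicit type coercion" change is easiest to see next to Pydantic, whose default (lax) mode converts the string "5" into the integer 5 for an `int` field. The sketch below shows the strict alternative. `decode_strict` is a hypothetical helper; the exact checks cog performs may differ, for example in how it treats `bool` or int-to-float widening.

```python
from typing import Any


def decode_strict(value: Any, expected: type) -> Any:
    """Hypothetical strict check: accept a value only if it already has the declared type."""
    # bool is a subclass of int in Python, so reject it explicitly for non-bool fields.
    if isinstance(value, bool) and expected is not bool:
        raise TypeError(f"expected {expected.__name__}, got bool")
    if not isinstance(value, expected):
        # No conversion is attempted: the string "5" is not turned into the int 5.
        raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")
    return value
```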

Remove the legacy pydantic-based Python SDK to prepare for the dataclass-based implementation. This includes all server code, type definitions, and associated tests.

Replace pydantic with a pure dataclass-based implementation:
- Type inspection without pydantic overhead
- Schema generation using native Python types
- Custom coder system for complex type serialization
- API compatible with existing predictors
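A minimal sketch of a coder registry, assuming each coder class exposes a static `factory(tpe)` method that returns a coder instance for the types it handles (the static `factory` method and its `tpe` parameter are mentioned in the type-fixes commit in this PR). `Coder`, `register_coder`, `lookup_coder` and the `date`-based `DateCoder` are hypothetical and only illustrate the pattern; the real coders cover cog's own types such as files, paths and secrets.

```python
from abc import ABC, abstractmethod
from datetime import date
from typing import Any, List, Optional, Type


class Coder(ABC):
    """Hypothetical coder interface: converts one Python type to and from JSON-friendly values."""

    @staticmethod
    @abstractmethod
    def factory(tpe: type) -> Optional["Coder"]:
        """Return a coder for `tpe` if this class can handle it, else None."""

    @abstractmethod
    def encode(self, value: Any) -> Any:
        """Python value -> JSON-friendly value."""

    @abstractmethod
    def decode(self, value: Any) -> Any:
        """JSON-friendly value -> Python value."""


_CODERS: List[Type[Coder]] = []


def register_coder(coder_cls: Type[Coder]) -> Type[Coder]:
    _CODERS.append(coder_cls)
    return coder_cls


def lookup_coder(tpe: type) -> Optional[Coder]:
    # Ask each registered coder class in turn; the first one that accepts `tpe` wins.
    for coder_cls in _CODERS:
        coder = coder_cls.factory(tpe)
        if coder is not None:
            return coder
    return None


@register_coder
class DateCoder(Coder):
    @staticmethod
    def factory(tpe: type) -> Optional[Coder]:
        return DateCoder() if tpe is date else None

    def encode(self, value: date) -> str:
        return value.isoformat()

    def decode(self, value: str) -> date:
        return date.fromisoformat(value)
```

Keeping the type check inside each coder's `factory` means adding a new supported type only requires registering one more class, rather than editing a central `if/elif` chain.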

Remove multi-wheel complexity now that pydantic-based cog is replaced:
- pkg/wheels: embed only the cog wheel, remove cog-dataclass
- pkg/dockerfile: simplify wheel installation to single embedded wheel
- integration-tests: remove cog_dataclass condition
- CI: remove dataclass-specific test matrix entries
- tox: remove pydantic version matrix
- mise: consolidate coglet-python test task

Delete tests that specifically test pydantic 1.x/2.x behavior which is no longer relevant with the dataclass-based implementation.

The dataclass implementation handles Pydantic BaseModel outputs via duck-typing - it checks for model_dump() (v2) or dict() (v1) methods in cog/json.py:make_encodeable(). Users can still use Pydantic for their own model types.
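The duck-typing check can be sketched as a small recursive function. `make_encodeable_sketch` is a hypothetical simplification, not the code in `cog/json.py`; it shows the checks described above (`model_dump()` for Pydantic v2, `dict()` for Pydantic v1) alongside dataclasses and built-in containers, and the exact ordering of those checks is an assumption.

```python
import dataclasses
from typing import Any


def make_encodeable_sketch(obj: Any) -> Any:
    """Recursively turn model-like objects into plain dicts/lists (hypothetical simplification)."""
    # Pydantic v2 models expose model_dump().
    if callable(getattr(obj, "model_dump", None)):
        return make_encodeable_sketch(obj.model_dump())
    # Dataclass instances (but not dataclass types themselves).
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return make_encodeable_sketch(dataclasses.asdict(obj))
    if isinstance(obj, dict):
        return {key: make_encodeable_sketch(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [make_encodeable_sketch(item) for item in obj]
    # Pydantic v1 models expose dict(); checked after the built-in containers.
    if callable(getattr(obj, "dict", None)):
        return make_encodeable_sketch(obj.dict())
    return obj
```

Duck typing lets the encoder handle Pydantic objects without importing Pydantic at all, so the dependency can be dropped while user code that still returns Pydantic models keeps serializing.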

Remove obsolete skips - the test uses Python 3.10 which is supported. Verified passing with both Python and Rust coglet servers.

Remove obsolete skips - the tests use Python 3.10 which is supported. These are slow tests that will run in CI (not -short mode).

Remove obsolete skips. This test verifies cog version in base images. Verified passing with both Python and Rust coglet servers.

coglet_alpha is no longer a supported configuration - remove all skips.

- Simplify format_validation_error to use cog's already-formatted errors
- Remove unwrap_pydantic_serialization_iterators (no longer needed)
- Remove schema_via_fastapi fallback, use cog._schemas directly
- Update Runtime enum: remove Pydantic variant, rename NonPydantic to Cog
- Update SdkImplementation: remove Pydantic/Dataclass, use Cog/Unknown
- Update detection to check for cog._adt module
- Update comments to remove pydantic references
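The detection change can be illustrated with a Python sketch that reports the SDK as `Cog` when `cog._adt` is importable and `Unknown` otherwise. The commit makes this change inside coglet, so the function and enum below are hypothetical stand-ins that only mirror the renamed `SdkImplementation` variants; they are not coglet's code.

```python
import importlib.util
from enum import Enum


class SdkImplementation(Enum):
    COG = "cog"
    UNKNOWN = "unknown"


def detect_sdk(marker_module: str = "cog._adt") -> SdkImplementation:
    """Report COG if the marker module can be found, else UNKNOWN (hypothetical sketch)."""
    try:
        spec = importlib.util.find_spec(marker_module)
    except ModuleNotFoundError:
        # find_spec imports the parent package first; a missing parent raises here.
        return SdkImplementation.UNKNOWN
    return SdkImplementation.COG if spec is not None else SdkImplementation.UNKNOWN
```

Probing for a module that only the new dataclass-based SDK ships avoids importing the whole SDK just to decide which code path to use.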

pydantic.BaseModel outputs are no longer supported. Users should use cog.BaseModel (a dataclass) or @dataclass for structured outputs.

Add support for user-defined healthcheck() method on predictors:
- Add Healthcheck event type to eventtypes.py
- Add get_healthcheck() helper to predictor.py
- Add healthcheck() method to Worker and _ChildWorker classes
- Add healthcheck() to PredictionRunner
- Update /health-check endpoint to call user healthcheck
- Add UNHEALTHY status to Health enum

Features:
- Sync and async healthcheck methods supported
- 5 second timeout for healthcheck execution
- Returns UNHEALTHY with error details on failure/timeout/exception

Remove [cog_dataclass] skip from healthcheck integration tests.
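A hypothetical sketch of the `/health-check` logic: the user healthcheck is consulted only when the server is otherwise ready, and a `False` return or an exception yields `UNHEALTHY` with error details. Only `UNHEALTHY`, `BUSY` and the `user_healthcheck_error` field appear in this PR's commit messages; the other enum members, the response shape and the error strings are assumptions, and the 5-second timeout is omitted here.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Health(Enum):
    # STARTING and READY are assumed member names; UNHEALTHY is the status this PR adds.
    STARTING = "STARTING"
    READY = "READY"
    BUSY = "BUSY"
    UNHEALTHY = "UNHEALTHY"


def health_check_response(
    state: Health, user_healthcheck: Optional[Callable[[], bool]] = None
) -> Dict[str, Any]:
    """Build a hypothetical /health-check body, consulting the user healthcheck only when READY."""
    if state is not Health.READY or user_healthcheck is None:
        return {"status": state.value}
    try:
        healthy = user_healthcheck()
    except Exception as exc:  # an exception is reported as UNHEALTHY with its message
        return {"status": Health.UNHEALTHY.value, "user_healthcheck_error": f"healthcheck raised: {exc}"}
    if not healthy:
        return {"status": Health.UNHEALTHY.value, "user_healthcheck_error": "healthcheck returned False"}
    return {"status": Health.READY.value}
```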

Add healthcheck support to coglet-rust:

Protocol:
- Add ControlRequest::Healthcheck and ControlResponse::HealthcheckResult
- Add HealthcheckStatus enum (Healthy/Unhealthy)

Orchestrator:
- Add HealthcheckResult type with healthy()/unhealthy() constructors
- Add healthcheck() method to Orchestrator trait
- Implement request/response flow via control channel
- Add semaphore to prevent concurrent healthchecks (skip if busy)
- Handle healthcheck results in event loop

HTTP:
- Add HealthResponse enum (includes transient UNHEALTHY state)
- Update /health-check to call user healthcheck when ready
- Return user_healthcheck_error in response on failure

Worker:
- Add healthcheck() to PredictHandler trait (default: healthy)
- Handle Healthcheck requests in worker event loop

Python integration (coglet-python):
- Add has_healthcheck() and is_healthcheck_async() to PythonPredictor
- Implement healthcheck_sync() with ThreadPoolExecutor + 5s timeout
- Implement healthcheck_async() with asyncio.wait_for + 5s timeout
- Wire up in PythonPredictHandler::healthcheck()
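The async half of the Python integration can be sketched with `inspect.iscoroutinefunction` for detection and `asyncio.wait_for` for the 5-second limit. The function names echo `has_healthcheck()`, `is_healthcheck_async()` and `healthcheck_async()` from the commit message, but these bodies, the `(ok, error)` return shape and the example predictor classes are hypothetical; the real versions are part of coglet-python.

```python
import asyncio
import inspect
from typing import Any, Optional, Tuple

HEALTHCHECK_TIMEOUT_S = 5.0  # the 5-second limit from the PR description


def has_healthcheck(predictor: Any) -> bool:
    return callable(getattr(predictor, "healthcheck", None))


def is_healthcheck_async(predictor: Any) -> bool:
    # True for `async def healthcheck(self)`, also when accessed as a bound method.
    return inspect.iscoroutinefunction(getattr(predictor, "healthcheck", None))


async def healthcheck_async(
    predictor: Any, timeout: float = HEALTHCHECK_TIMEOUT_S
) -> Tuple[bool, Optional[str]]:
    """Await the user's async healthcheck, turning timeouts and exceptions into failures."""
    try:
        ok = await asyncio.wait_for(predictor.healthcheck(), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the pending healthcheck coroutine before raising.
        return False, f"healthcheck timed out after {timeout:.1f}s"
    except Exception as exc:
        return False, f"healthcheck raised: {exc}"
    return (True, None) if ok else (False, "healthcheck returned False")


class SlowAsyncPredictor:
    async def healthcheck(self) -> bool:
        await asyncio.sleep(1.0)
        return True


class FastAsyncPredictor:
    async def healthcheck(self) -> bool:
        return True
```

Unlike a blocked thread, an awaited coroutine can be cancelled, so the async path can actually stop a slow healthcheck rather than just stop waiting for it.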

- Remove [coglet_rust] skip from existing sync healthcheck tests
- Add async healthcheck tests:
  - healthcheck_async_custom: async healthcheck returning True
  - healthcheck_async_unhealthy: async healthcheck returning False
  - healthcheck_async_exception: async healthcheck raising exception
  - healthcheck_async_timeout: async healthcheck timing out (>5s)

Python type fixes:
- _adt.py: Fix type hints for PrimitiveType methods to handle Any
- config.py: Add type arguments to dict types
- input.py: Add cast for default_factory, add type ignore for field()
- coder.py: Rename factory parameter from cls to tpe (static method)
- coders/*.py: Match renamed parameter in factory method overrides
- http.py: Add type ignores for dynamic FastAPI types and coglet module
- _inspector.py: Remove unused imports, add 'from None' to re-raises

Makefile:
- Update tox env from typecheck-pydantic2 to typecheck (pydantic removed)

Cleanup:
- Remove unused warnings import from _inspector.py
- Remove experimental coders warning

- Change timeout format from {} to {:.1} to output '5.0' instead of '5'
- Update test harness waitForServer to accept UNHEALTHY and BUSY as valid 'ready' states

- Remove Pydantic compat code from cog.Path
- Update README, docs/python.md, docs/llms.txt
- Clean up comments referencing pydantic

- Remove pydantic from dependencies in pyproject.toml
- Simplify dependencies to minimal set
- Remove PYDANTIC_V2 constant from pyright config
- Delete cog-dataclass/ directory (was scaffold, code now in python/cog/)

- Remove unused Type import from types.py
- Remove pydantic from Go dockerfile test expectation
- Remove pydantic comment from requirements_test.go
- Fix pyright warnings in openapi_schema.py (use Any type)
- Sanitize validation error messages to first line only

Use prediction to trigger slow healthcheck mode instead of relying on call counting, which was flaky due to harness also calling healthcheck.
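A sketch of the trigger-based approach: the test predictor's healthcheck stays fast until a prediction flips it into slow mode, so extra healthcheck calls made by the harness cannot change when the slowdown starts, assuming the harness itself never runs predictions. `TriggeredSlowHealthcheckPredictor` is a hypothetical plain class (a real test would use a cog predictor class), and the 6-second default assumes the 5-second healthcheck timeout.

```python
import time


class TriggeredSlowHealthcheckPredictor:
    """Hypothetical test predictor: healthcheck() only becomes slow after a prediction runs."""

    def __init__(self, slow_seconds: float = 6.0) -> None:
        self.slow_seconds = slow_seconds  # 6 s default exceeds the 5 s healthcheck timeout
        self.slow_mode = False

    def predict(self, text: str) -> str:
        # The prediction is the trigger, independent of how many times
        # healthcheck() has already been called while waiting for readiness.
        self.slow_mode = True
        return text

    def healthcheck(self) -> bool:
        if self.slow_mode:
            time.sleep(self.slow_seconds)
        return True
```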

Use ThreadPoolExecutor with shutdown(wait=False) to avoid blocking when sync healthcheck exceeds timeout. Previously the context manager would wait for the thread to complete even after timeout.
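A minimal sketch of the fix, assuming a healthcheck that returns a bool. Leaving a `with ThreadPoolExecutor()` block calls `shutdown(wait=True)`, so a stuck healthcheck would block the caller past the timeout; calling `shutdown(wait=False)` returns immediately instead. `run_sync_healthcheck` and its `(ok, error)` result are hypothetical, not the server's actual function.

```python
import concurrent.futures
from typing import Callable, Optional, Tuple


def run_sync_healthcheck(fn: Callable[[], bool], timeout: float = 5.0) -> Tuple[bool, Optional[str]]:
    """Run a blocking healthcheck in a worker thread without waiting past `timeout`."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(fn)
        ok = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return False, f"healthcheck timed out after {timeout:.1f}s"
    except Exception as exc:
        return False, f"healthcheck raised: {exc}"
    finally:
        # Do not block on the worker thread; a stuck healthcheck keeps running in the background.
        executor.shutdown(wait=False)
    return (True, None) if ok else (False, "healthcheck returned False")
```

Python cannot kill the worker thread: a healthcheck that never returns keeps its thread alive, and `concurrent.futures` still joins such threads when the interpreter exits.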

Add _sanitize_validation_message() that only passes through known safe validation patterns (Field required, Invalid value, fails constraint, does not match regex/choices). Unknown messages are replaced with generic 'Invalid value' to prevent potential stack trace or internal details from reaching clients. This addresses CodeQL security warning about information exposure.
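An allowlist-based sanitizer can be sketched as below. The patterns mirror the message kinds listed in the commit, but their exact regexes are assumptions, as is applying the first-line truncation from the earlier cleanup commit before matching. `sanitize_validation_message` is a hypothetical name; the commit's function is `_sanitize_validation_message()`.

```python
import re

# Hypothetical allowlist modelled on the message kinds named in the commit.
_SAFE_PATTERNS = [
    re.compile(r"\bField required\b"),
    re.compile(r"\bInvalid value\b"),
    re.compile(r"\bfails constraint\b"),
    re.compile(r"\bdoes not match (regex|choices)\b"),
]


def sanitize_validation_message(message: str) -> str:
    # Keep only the first line so multi-line details such as tracebacks are dropped.
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    if any(pattern.search(first_line) for pattern in _SAFE_PATTERNS):
        return first_line
    # Anything not on the allowlist is replaced with a generic message.
    return "Invalid value"
```

An allowlist fails closed: a new or unexpected error format is hidden by default instead of leaking internals until someone adds it to a denylist.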

michaeldwan

This beefy PR looks good. It's a lot, but no blocking issues... so @tempusfrangit merge it and we'll chat through a few gaps in test coverage that were accidentally covered in the deleted tests.

michaeldwan · 2026-02-04T18:58:46Z


```diff
  test-coglet-python:
-    name: "Test Coglet Python bindings (${{ matrix.runtime }})"
+    name: "Test Coglet Python bindings"
```


Probably out of scope for this PR, but I wonder if we should be testing this, and all the other python code, in a matrix across supported python versions.

It should be something we consider, but if we fail on any version of Python, it's a bug in Maturin/PyO3, since we're compiling to pure ABI3 (3.10+).

tempusfrangit added 16 commits February 3, 2026 13:19

chore: remove pydantic-based cog implementation

8ca3b9d

test: remove pydantic-specific integration tests

198a563

test: unskip complex_output test

f2af071

test: unskip setup_subprocess_multiprocessing test

5379e46

test: unskip torch_baseimage tests

0e3e77c

test: unskip build_cog_version_match test

333967a

test: remove coglet_alpha skips from integration tests

5fb9af8

test: update complex_output to use cog.BaseModel instead of pydantic.BaseModel

b30904b

fix(coglet): correct healthcheck timeout message format and harness

1848c26

tempusfrangit requested a review from a team as a code owner February 4, 2026 00:34

tempusfrangit marked this pull request as draft February 4, 2026 00:34

github-advanced-security AI found potential problems Feb 4, 2026

Comment thread python/cog/server/http.py Dismissed

tempusfrangit added 3 commits February 3, 2026 18:48

docs: remove all Pydantic references

edf1eb2

chore: remove pydantic dependency and cog-dataclass scaffold

e9bc372

tempusfrangit force-pushed the feat/dataclass-cog branch from 4cce837 to 69745a5 Compare February 4, 2026 02:49

tempusfrangit added 3 commits February 3, 2026 18:56

test: fix healthcheck timeout tests to use trigger-based approach

1aa2921

fix: remove unused imports in openapi_schema.py

64fc84b

fix: properly timeout sync healthchecks in Python server

5572f98

tempusfrangit marked this pull request as ready for review February 4, 2026 04:29

tempusfrangit requested review from markphelps and michaeldwan February 4, 2026 04:31

tempusfrangit force-pushed the feat/dataclass-cog branch from 8a1c2e7 to a95882c Compare February 4, 2026 04:42

tempusfrangit force-pushed the feat/dataclass-cog branch from a95882c to 75759b1 Compare February 4, 2026 04:49

tempusfrangit force-pushed the feat/dataclass-cog branch from ca4bd72 to 3c81266 Compare February 4, 2026 05:39

michaeldwan approved these changes Feb 4, 2026

tempusfrangit merged commit df576ff into main Feb 4, 2026
31 of 32 checks passed

tempusfrangit deleted the feat/dataclass-cog branch February 4, 2026 19:55

tempusfrangit merged 23 commits into main from feat/dataclass-cog

3 participants
