@Erol444

🚀 Added

🌀 Execution Engine `v1.8.0`

Steps gated by control flow (e.g. after a ContinueIf block) can now run even when they have no data-derived lineage — meaning they don't receive batch-oriented inputs from upstream steps. Lineage and execution dimensionality are now derived from control flow predecessor steps. Existing workflows are unaffected.

🔀 Control flow lineage — The compiler now tracks lineage coming from control flow steps (e.g. branches after ContinueIf). When a step has no batch-oriented data inputs but is preceded by control flow steps, its execution slices and batch structure are taken from those control flow predecessors.
🔓 Loosened compatibility check — Previously, steps with control flow predecessors but no data-derived lineage would fail at compile time with ControlFlowDefinitionError. That check is now relaxed: lineage is derived from control flow predecessors when no input data lineage exists. The strict check still runs when the step does have data-derived lineage.
✨ New step patterns — Steps triggered only by control flow that don't consume batch data now compile and run correctly. For example, you can send email notifications or run other side-effect steps after a ContinueIf without wiring any data into parameters like message_parameters — the step will execute once per control flow branch.
🐛 Batch.remove_by_indices nested batch fix (breaking) — When removing indices via Batch.remove_by_indices, nested Batch elements are now recursively filtered by the same index set. Previously, only the top-level batch was filtered while nested batches were left unchanged, which could cause downstream blocks to silently process None values or fail outright.
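
The nested-batch fix can be pictured with a small stand-in class. This is only an illustrative sketch: `ToyBatch` and its fields are hypothetical and are not the Batch class shipped in inference. It only mirrors the behaviour described above, where the same index set is applied to the top-level batch and to any nested batches.

```python
from dataclasses import dataclass
from typing import Any, List, Set, Tuple

Index = Tuple[int, ...]


@dataclass
class ToyBatch:
    """Hypothetical stand-in: parallel lists of elements and their indices."""

    content: List[Any]
    indices: List[Index]

    def remove_by_indices(self, to_remove: Set[Index]) -> "ToyBatch":
        kept_content: List[Any] = []
        kept_indices: List[Index] = []
        for element, index in zip(self.content, self.indices):
            if index in to_remove:
                # Drop the element at this level.
                continue
            if isinstance(element, ToyBatch):
                # Apply the same index set to the nested batch as well.
                element = element.remove_by_indices(to_remove)
            kept_content.append(element)
            kept_indices.append(index)
        return ToyBatch(kept_content, kept_indices)


inner = ToyBatch(content=["crop-a", "crop-b"], indices=[(0, 0), (0, 1)])
outer = ToyBatch(content=[inner, "image-1"], indices=[(0,), (1,)])

# Remove one nested element and one top-level element in a single call.
filtered = outer.remove_by_indices({(0, 1), (1,)})
print(filtered.indices)
print(filtered.content[0].content)
```

Without the recursive call, `inner` would still hold the element at index `(0, 1)` after filtering, which is the kind of leftover entry the fix is meant to prevent.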

Please review our change log 🥼 which outlines all introduced changes. PR: #2106 by @dkosowski87

Warning

One breaking change is included due to a bug fix in Batch.remove_by_indices with nested batches (see above); impact is expected to be minimal.

🚧 Maintenance

fix version for exe/dmg download by @Erol444 in #2104
Qwen3.5 Block Safety by @Matvezy in #2111
Fix ONNX batch size limit fast path for fixed-batch models by @grzegorz-roboflow in #2112
Redirect versionless models auth and metadata to new model registry when USE_INFERENCE_MODELS=True by @PawelPeczek-Roboflow in #2105
feat: expose stream/pipeline metrics via Prometheus /metrics endpoint by @alexnorell in #2097
Fix aliases mixup between rfdetr-xlarge and rfdetr-2xlarge by @Erol444 in #2054
metadata on upload to dataset block by @digaobarbosa in #2064
Qwen3 5 docs by @Erol444 in #2114
Update execution engine version in tests to 1.8.0 by @dkosowski87 in #2115

Full Changelog: v1.1.0...v1.1.1

@Erol444

ℹ️ About `1.1.0` release

This inference release brings important changes to the ecosystem:

We have dropped support for Python 3.9, which has reached end of life (EOL).
We have not made inference-models the default backend for running predictions; this change is postponed until version 1.2.0.

🚀 Added

🧠 Qwen3.5

Thanks to @Matvezy, inference now supports the new Qwen3.5 model.

Qwen3.5 is Alibaba's latest open-source model family (released Feb 2026), ranging from 0.8B to 397B parameters. Its headline feature is native multimodal (text + vision) support. inference and Workflows support the small 0.8B-parameter version.

The model is available only with the inference-models backend and was released in inference-models 0.20.0.

🪄 GPT-5.4 support

Thanks to @Erol444, the LLM Workflows block now supports GPT-5.4, keeping inference current with the latest OpenAI model lineup.

⚙️ Selectable inference backend for batch processing

Following up on the inference 1.0.0 release, Roboflow clients can now select which inference backend is used for batch processing, which gives finer-grained control when mixing legacy and new engine workloads.

With inference-cli, you can specify which models backend is used: inference-models or old-inference.

inference rf-cloud batch-processing process-images-with-workflow \
    --workflow-id <your-workflow> \
    --batch-id <your-batch> \
    --api-key <your-api-key> \
    --inference-backend inference-models
# or - for videos
inference rf-cloud batch-processing process-videos-with-workflow \
    --workflow-id <your-workflow> \
    --batch-id <your-batch> \
    --api-key <your-api-key> \
    --inference-backend inference-models

The same can be configured in the Roboflow App and via the HTTP integration; check out Swagger.
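
If the CLI is driven from a script, the command can be assembled programmatically. The sketch below is only one way to do it: the helper name `build_batch_command` is hypothetical, and it uses only the subcommands, flags and backend names shown in the commands above.

```python
import shlex
from typing import List

# Backend names as listed in the release notes above.
ALLOWED_BACKENDS = {"inference-models", "old-inference"}


def build_batch_command(
    workflow_id: str,
    batch_id: str,
    api_key: str,
    backend: str = "inference-models",
    videos: bool = False,
) -> List[str]:
    """Hypothetical helper: build the argument list for the batch-processing CLI."""
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    subcommand = (
        "process-videos-with-workflow" if videos else "process-images-with-workflow"
    )
    return [
        "inference", "rf-cloud", "batch-processing", subcommand,
        "--workflow-id", workflow_id,
        "--batch-id", batch_id,
        "--api-key", api_key,
        "--inference-backend", backend,
    ]


command = build_batch_command(
    "my-workflow", "my-batch", "MY_API_KEY", backend="old-inference"
)
# Print a shell-safe rendering; the list itself can be passed to subprocess.run.
print(shlex.join(command))
```

Passing the list form to `subprocess.run(command, check=True)` avoids shell-quoting problems with IDs or keys that contain special characters.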

Caution

Currently, the default backend is old-inference, but that will change in the near future. Roboflow clients should verify the new backend and, if they want to keep using the old-inference backend, make the necessary adjustments in their integrations.

🦺 Maintenance

🐍 Drop of Python 3.9 and upgrade to `transformers>=5`

We've ported all public builds to Python versions newer than 3.9. Supporting Python 3.9 was slowing us down when onboarding new features. Thanks to dropping it, we could migrate to transformers>=5 and enable the new Qwen3.5 model.

Other changes

Fix theme build by @Erol444 in #2093
fix docs.yml to correctly build css by @Erol444 in #2095
fix: pinned models no longer block LRU eviction by @hansent in #2091
Add support for pushing back to client HTTP 402 errors by @PawelPeczek-Roboflow in #2099
Add env flag to globally disable selected inference-models backends by @PawelPeczek-Roboflow in #2096
Added RF-DETR by @Erol444 in #2098
Qwen3 5 improvements by @Matvezy in #2101
Fix OpenCV ffmpeg/gstreamer support in JP51 core image by @alexnorell in #2100
Release/1.1.0 by @PawelPeczek-Roboflow in #2102

Full Changelog: v1.0.5...v1.1.0

@hansent

What's Changed

feat: add model cold start, model/workflow/workspace ID response headers by @hansent in #2052
Support yololite object detection in inference_models with ONNX backend by @leeclemnet in #2078
Fix the input parameter types accepted by the Ethernet IP PLC block for PLC reads/writes by @shntu in #2061
Try to address TRT issue by @PawelPeczek-Roboflow in #2079
Release new inference-models by @PawelPeczek-Roboflow in #2084
Email message serialization fix by @dkosowski87 in #2083
Support New Roboflow API Usage Paused Error 423 by @maxschridde1494 in #2082
Expose /healthz and /readiness endpoints even if API_KEY is not set by @ecarrara in #2077
feat: inference_models adapters respect countinference for credit verification bypass by @hansent in #2081
Update docs by @Erol444 in #2076
Reduce flash-attn MAX_JOBS to 1 for JP7.1 build by @alexnorell in #2068
Bump the npm_and_yarn group across 2 directories with 10 updates by @dependabot[bot] in #2085
Fix shared model cache race conditions causing pod crashes by @hansent in #2080
fix inference-models pypi publishing by @grzegorz-roboflow in #2086
Qwen3 5 and move to transformers 5 by @Matvezy in #2070
Correct resize procedure for RF-DETR models trained on versions with non-stretch, non-square resize by @mkaic in #2067
ENT-969: Add TestPatternStreamProducer as a built-in video source type by @NVergunst-ROBO in #2056
fix: handle expired Redis lock release gracefully by @rafel-roboflow in #2060
Revert/qwen 3.5 by @PawelPeczek-Roboflow in #2087
feat: gate structured access logging behind STRUCTURED_API_LOGGING env var by @hansent in #2088
Deploy inference-models-0.19.5 by @PawelPeczek-Roboflow in #2089

New Contributors

@maxschridde1494 made their first contribution in #2082
@ecarrara made their first contribution in #2077

Full Changelog: v1.0.4...v1.0.5

@leeclemnet

What's Changed

Update sam3_3d tdfy commit to latest main by @leeclemnet in #2050
skip /usage/plan request when api key is not provided by @rafel-roboflow in #2059
Fix issue with rfdetr-segmentation class remapping by @PawelPeczek-Roboflow in #2075

Full Changelog: v1.0.3...v1.0.4

@alexnorell

What's Changed

Fix JP7.1 container build OOM during ORT compilation by @alexnorell in #2065
Add AV codec dependencies to base image by @shntu in #2039
Fix: Updated aiohttp to >=3.13.3 to address CVEs (#1949) by @thchann in #2069
Add upper-bound constraints for aiohttp by @PawelPeczek-Roboflow in #2071
Change the ranking priority for AutoLoader - ONNX packages over Torch by @PawelPeczek-Roboflow in #2047
Loosening typing-extensions dependency by @PawelPeczek-Roboflow in #2072
Prepare inference 1.0.3 release by @PawelPeczek-Roboflow in #2073

New Contributors

@thchann made their first contribution in #2069

Full Changelog: v1.0.2...v1.0.3

@alexnorell

What's Changed

Add single-tenant workflow cache mode and thread workflow_version_id across the stack by @alexnorell in #2031
Add JetPack 7.1 container build workflow and CLI support by @alexnorell in #2032
Fix: Set task_type for SegmentAnything3_3D_Objects by @leeclemnet in #2030
feat(workflows): support custom image names in dataset upload block by @rafel-roboflow in #2034
Expose inference configuration flags for sam3-3d by @leeclemnet in #2040
feat(sam3): enable SDK-based remote execution for SAM3 workflow blocks by @hansent in #2042
Add examples/sam-3d notebooks by @leeclemnet in #2043
Add per-request 100ms duration floor via internal execution header by @hansent in #2037
bugfix: fix version field in polygon and halo v2 visualization block manifests by @lrosemberg in #2044
Fix large weights cdn download issue by @Matvezy in #2046
Fix torch.compile for sam3-3d by @leeclemnet in #2041
Fix overlapping parameter in inference-cli by @PawelPeczek-Roboflow in #2038
Bug/dg 306 wrong workflow that doesnt raise error and provokes 500 by @rafel-roboflow in #2036
Add output from mask measurement block to label visualization by @jeku46 in #2035
feat: add PINNED_MODELS and PRELOAD_API_KEY for preload on serverless by @hansent in #2048
Bump version to 1.0.2 by @PawelPeczek-Roboflow in #2051

Full Changelog: v1.0.1...v1.0.2

@PawelPeczek-Roboflow

What's Changed

Fix issue with RF-Detr model post-processing in TRT by @PawelPeczek-Roboflow in #2029

Full Changelog: v1.0.0...v1.0.1

@leeclemnet

🚀 Added

💪 `inference 1.0.0` just landed 🔥

We are excited to announce the official 1.0.0 release of Inference, which was previewed 2 weeks ago with the 1.0.0rc1 release.

Over the past years, Inference has evolved from a lightweight prediction server into a widely adopted runtime powering local deployments, Docker workloads, edge devices, and production systems. After hundreds of releases, the project has matured — and so has the need for something faster, more modular, and more future-proof.

inference 1.0.0 closes one chapter and opens another. This release introduces a new prediction engine that will serve as the foundation for future development.

⚡ New prediction engine: `inference-models`

We are introducing inference-models, a redesigned engine to run models focused on:

faster model loading and inference
improved resource utilization
better modularity and extensibility
cleaner separation between serving and model runtime
support for different backends, including TensorRT

Important

With inference 1.0.0 we also released the first stable build of inference-models, 0.19.0. You can use the engine in inference by setting the env variable USE_INFERENCE_MODELS=True.
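
As a minimal sketch of setting the flag from Python, the helper below prepares an environment for a child process. The helper name `env_with_new_engine` is hypothetical, and the sketch assumes the flag is read from the process environment when inference starts, so it must be in place before the server or package is launched.

```python
import os
from typing import Dict, Optional


def env_with_new_engine(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: copy an environment and opt in to inference-models."""
    env = dict(os.environ if base is None else base)
    # The variable name and value come from the release notes.
    env["USE_INFERENCE_MODELS"] = "True"
    return env


child_env = env_with_new_engine()
# Pass child_env to whatever command you normally use to start inference,
# for example: subprocess.run(your_start_command, env=child_env, check=True)
print(child_env["USE_INFERENCE_MODELS"])
```

Copying the environment rather than mutating `os.environ` keeps the opt-in limited to the launched process, which makes it easy to compare the old and new engines side by side.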

Caution

The new inference-models engine is wrapped with adapters so it can serve as a drop-in replacement for the old engine. We are making it the default engine on the Roboflow platform, but for clients running inference locally USE_INFERENCE_MODELS is set to False by default. We would like all clients to test the new engine; when the flag is not set, inference works as usual.
In approximately 2 weeks, with the inference 1.1.0 release, we will make inference-models the default engine for everyone.

Caution

inference-models is a completely new backend in which we've fixed a lot of problems and bugs. As a result, predictions from your model may be different, but according to our tests they are better quality-wise. That being said, we may still have introduced some minor bugs. Please report any problems to us and we will do our best to fix them 🙏

🛣️ Roadmap

Today's release is just the start of broader changes in inference. The plan for the future is the following:

Shortly after the release, we will complete our work on the Roboflow platform, including migrating the small fraction of models not yet onboarded into the new registry used by inference-models and adjusting automations on the platform. Until this is finished, clients who very recently uploaded or renamed models may be impacted by HTTP 404 errors; contact us to receive support in such cases.
There will be consecutive hot-fixes (if needed), released as 1.0.x versions.
Clients running inference locally should test the inference-models backend now, as in approximately 2 weeks inference-models will become the default engine.
We still have some work to do in 1.x.x, mainly to provide patches, but we are starting the march towards 2.0, which should bring new quality to other components of inference. Stay tuned for updates.
You should expect that new contributions to inference will be based on the inference-models engine and may not work if you don't migrate.

Caution

One of the problems we have not addressed in 1.0.0 is purging the models cache: the new inference-models engine uses a different local cache structure than the old engine. As a result, an inference server with USE_INFERENCE_MODELS=True does not clean up the volume holding models pulled from the platform. If you run locally, that should generally not be an issue, since we expect clients to use only a limited number of different models in their deployments.
If you use a large number of models or your disk space is tight, you should perform periodic clean-ups of /tmp/cache when running the new inference. This issue will be addressed before the 1.1.0 release.
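
A periodic clean-up could look like the sketch below. It is not an official tool: the helper `purge_stale_files` and the age-based policy are assumptions made for illustration, and the layout of the cache directory is not taken into account. The demo runs against a temporary directory; in a real deployment the path would be `/tmp/cache`, and removed model files would presumably need to be downloaded again on next use.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List

SECONDS_PER_DAY = 86400


def purge_stale_files(
    cache_dir: str, max_age_days: float, dry_run: bool = True
) -> List[Path]:
    """Hypothetical helper: find (and optionally delete) files older than max_age_days."""
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    # Collect candidates first so deletion does not interfere with directory traversal.
    candidates = [path for path in root.rglob("*") if path.is_file()]
    stale = sorted(path for path in candidates if path.stat().st_mtime < cutoff)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


# Demo on a throwaway directory instead of the real cache.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "old-model.bin"
    new_file = Path(tmp) / "new-model.bin"
    old_file.write_bytes(b"x")
    new_file.write_bytes(b"y")
    # Backdate one file by 40 days so it falls outside the 30-day window.
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(old_file, (forty_days_ago, forty_days_ago))

    removed = [p.name for p in purge_stale_files(tmp, max_age_days=30, dry_run=False)]
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```

Running with `dry_run=True` first (the default) lists what would be deleted without touching the disk, which is a sensible check before scheduling the clean-up.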

🎨 Semantic Segmentation in `inference`

Thanks to @leeclemnet, the DeepLabV3Plus segmentation model was onboarded to inference and can be used by clients.

📐 Area Measurement block 🤝 Workflows

Thanks to @jeku46 we can now measure area size with Workflows.

🚧 Maintenance

add missing ffmpeg package for dev by @rafel-roboflow in #2009
fix expose sam3 with proper envs by @rafel-roboflow in #2011
Detections Class Replacement support for strings by @Erol444 in #2000
fix: Send termination_reason via data channel on WebRTC stream timeout by @balthazur in #2008
Remove content length validation to allow for chunked responses by @dkosowski87 in #2015
added processing_timeout support to webrtc's StreamConfig dataclass by @Erol444 in #2017
fix: Return 400 instead of 500 for raw bytes sent as base64 image by @bigbitbus in #2016
Added claude sonnet 4.6 by @Erol444 in #2014
Fix mkdocs-macros Jinja2 syntax errors in generated block docs by @yeldarby in #2012
Add remote GPU processing time collection and forwarding by @hansent in #2007
Add semantic-segmentation endpoints + deep_lab_v3_plus by @leeclemnet in #2018
Update CODEOWNERS: Add dkosowski87 and reorganize team assignments by @hansent in #2021
Add support for gemini 3.1 pro in gemini block by @Erol444 in #2024
Add area_measurement workflow block by @jeku46 in #2013
Auto-detect Jetson JetPack version in CLI server start by @alexnorell in #1958
Ged rid of unstable assertions on predictions in e2e tests by @PawelPeczek-Roboflow in #2026
ENT-884: Add workflow_version_id support to inference pipeline by @NVergunst-ROBO in #2022
Add JetPack 7.1 support for NVIDIA Thor by @alexnorell in #1935

🏅 New Contributors

@dkosowski87 made their first contribution in #2015
@leeclemnet made their first contribution in #2018

Full Changelog: v0.64.8...v1.0.0

@Erol444

💪 Added

Fisheye cameras in camera calibration block by @Erol444 in #1996
The calibration block previously supported only polynomial calibration, which does not handle fisheye distortion well. This change adds support for fisheye calibration.

Heatmap block by @Erol444 in #1986
This change adds a heatmap block (which uses supervision's heatmap annotator) that supports both:

detections, so the heatmap is based on where detections occurred
tracklets, which ignores stationary objects (default: on), so the heatmap shows movement rather than the objects themselves


🚧 Maintenance

temporarily pin z3-solver version by @grzegorz-roboflow in #1990
Code workflow block icon issue by @Erol444 in #1988
Optimize cosine_similarity by @KRRT7 in #1989
add inference version to the request headers by @japrescott in #1985
Fix video frame count estimation by detecting actual FPS from uploaded video by @rafel-roboflow in #1992
Mark file processing in webrtc worker for downstream blocks to pick frame timestamp correctly by @grzegorz-roboflow in #1995
add frame size to webrtc video metadata by @rafel-roboflow in #1997
enable gzip compression by default by @rafel-roboflow in #1998
WIP: enabled sam3 visual segment by @rafel-roboflow in #1975
added ffmpeg to docker dependencies by @rafel-roboflow in #2002
rename seg-preview to sam3 by @rafel-roboflow in #2005
Fix RF-DETR-Seg mask postprocessing for letterboxed input case by @mkaic in #2001
Enable inference pipeline api on jetpack 6.2.0 by @grzegorz-roboflow in #2006

Full Changelog: v0.64.7...v0.64.8

`inference 1.0.0rc1` — Release Candidate

Today marks an important milestone for Inference.

Over the past years, Inference has grown from a lightweight prediction server into a widely adopted runtime used across local deployments, Docker, edge devices, and production systems. Hundreds of releases later, the project has matured significantly, and so has the need for something faster, more modular, and future-proof.

inference 1.0.0rc1 is a preview of 1.0.0 release which will close one chapter and open another - this release introduces a new prediction engine that will become the foundation for all future development.

🚀 New prediction engine - `inference-models`

We are introducing inference-models, a redesigned execution engine focused on:

faster model loading and inference
improved resource utilization
better modularity and extensibility
cleaner separation between serving and model runtime
stronger foundations for future major versions

The engine is already available today in:

inference-models package → 0.18.6rc8 (RC)
inference package and Docker → enabled with env variable

USE_INFERENCE_MODELS=True

inference-models wrapped within the old inference is a drop-in replacement. This allows testing the new runtime without changing existing integrations.

Important

Predictions from your models may change, but generally for the better! inference-models is a completely new engine for running models: we have fixed a lot of bugs and made it multi-backend, capable of running ONNX, Torch and even TRT models! It automatically negotiates with the Roboflow model registry to choose the best package to run in your environment. We have already migrated almost all Roboflow models to the new registry and are working hard to achieve full coverage soon!

📅 What happens next

Next week
- Stable Inference 1.0.0
- Stable inference-models release
- Roboflow platform updated to use inference-models as the default engine
In the coming weeks
- inference-models becomes the default engine for public builds (USE_INFERENCE_MODELS becomes opt-out, not opt-in)
- continued performance improvements and runtime optimizations

🔭 Looking forward - the road to `2.0`

This engine refresh is only the first step.
We are starting work toward Inference 2.0, a larger modernization effort similar in spirit to the changes introduced with inference-models.

Stay tuned for future updates!

Releases: roboflow/inference
