[Workflow] Optimize workflow flush query for pathological cases
Overhaul workflow flushing query to avoid lateral sub-queries, minimize buffer hits, avoid repeated TOAST reads, and use a more efficient hash join for same-workflow dependencies.
For a 10_000->1 fan-in workflow the result is 31x faster, with 222x fewer buffer hits.
[Pro] Fix bootstrap file checking on Windows
File checks failed on Windows because Mix copies files to _build instead of symlinking. The
NIF was navigating from priv_dir up to find source files, which only worked when symlinks
resolved to deps.
[Refresh] Gracefully check for hex availability
Ensure hex is available before attempting to update in the oban_pro.refresh task.
[Pro] Update app modules list after compiling sources
Encrypted modules handled by the :oban_pro compiler weren't included in the .app file
because it was generated before they were compiled. Releases use the modules list to determine
what to load at boot, causing releases to crash with UndefinedFunctionError for encrypted
modules.
[Pro] Fix Windows NIF loading in bootstrap module
On Windows, BEAM passes NIF function pointers via a callback table at load time instead of resolving them through dynamic linking. The bootstrap module now properly defines, resolves, and casts callbacks for cross-platform operation.
[Migration] Stop adding suspended state to the oban_jobs_state enum
The suspended state must be added by Oban's v14 migration prior to running the Pro 1.7.0
migration. Postgres prohibits altering a type and referencing it within the same transaction,
and the upgrade could fail.
[Workflow] Use t:add_cascade_opts/0 for add_graft/4
The add_graft/4 function accepts workflow opts in addition to job opts, but only specified the
t:add_opts/0 type.
chunk_id generation for smoother upgrade to v1.7
Generate a chunk_id at job insertion time and store it in meta. This pre-populates the
identifier that Pro v1.7 uses for optimized chunk queries, ensuring jobs inserted on v1.6 will
work seamlessly immediately after upgrading. Pro v1.7 will backfill jobs with a missing chunk_id
after a minute or so, but does so in batches.
[Worker] Conditionally add @impl to worker opts/0
Oban v2.21 marked c:Worker.__opts__/0 as a callback and made it public. Now, Oban.Pro.Worker
marks the __opts__/0 function as a callback for Oban v2.21, but not for older versions.
[Unique] Include suspended state in unique bitmap mapping
The suspended state is available for uniqueness as of Oban v2.21, but it was missing from the
unique bitmap mapping. The state is now supported to prevent errors when inserting unique jobs.
[Pro] Use suspended state for workflow and chain tracking
Jobs waiting on workflow or chain dependencies now use a proper suspended job state instead of
the previous on_hold pseudo-state.
This provides cleaner state semantics, better query performance through simplified indexes, and
enables the database triggers to track workflow state counts accurately. The scheduled_at
timestamp is preserved directly on suspended jobs, eliminating the need for orig_scheduled_at
in meta.
[Pro] Add usage rules for agentic coding assistants
Ship reference documents that help coding agents understand Pro's idioms and best practices. Rules cover workers, queues, composition primitives (workflows, batches, chains, chunks), plugins, and testing.
[Chunk] Optimize queries with centralized index and better job acking
Use a pre-computed chunk_id for chunk tracking, enabling a partial index for much faster
chunk lookups. This eliminates dynamic query construction based on partitioning fields in favor
of direct chunk_id matching.
Chunks now use a single SQL operation for acking, reducing database round-trips when completing, cancelling, or retrying jobs within a chunk. The new acking operation has better compatibility with non-Postgres databases such as CockroachDB.
[Chunk] Add :snooze support to chunk workers
Chunks can now selectively snooze jobs using {:snooze, period, jobs}, or snooze some jobs with
snooze: {period, jobs} in the keyword list result. Snoozed jobs are rescheduled after the
specified period, while unlisted jobs complete normally.
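A minimal sketch of both snooze forms in a chunk worker. The module name, queue options, and the readiness check are hypothetical; the process/1 callback and return shapes follow the entry above:

```elixir
defmodule MyApp.MailChunk do
  # Queue name and chunk options here are illustrative.
  use Oban.Pro.Workers.Chunk, queue: :mailers, size: 100, timeout: :timer.seconds(5)

  @impl true
  def process(jobs) do
    # Hypothetical split: jobs whose upstream data isn't ready yet get snoozed.
    {ready, waiting} = Enum.split_with(jobs, &data_ready?/1)

    deliver_all(ready)

    case waiting do
      # Everything processed, so the chunk completes normally.
      [] -> :ok
      # Reschedule only the waiting jobs for 60 seconds; the rest complete.
      _ -> [snooze: {60, waiting}]
    end
  end

  defp data_ready?(%Oban.Job{args: %{"ready" => ready?}}), do: ready?
  defp deliver_all(jobs), do: Enum.each(jobs, &IO.inspect/1)
end
```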
[DynamicCron] Add get/2 function for fetching entries by name
Provides a convenient way to retrieve a single cron entry without fetching all entries. Accepts either a worker module or custom string name and returns {:ok, entry} or {:error, message}.
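A hedged usage sketch. The worker module and entry name are illustrative, and the assumption that the Oban instance name is the first argument may not match the actual signature:

```elixir
alias Oban.Pro.Plugins.DynamicCron

# Fetch by worker module.
{:ok, entry} = DynamicCron.get(Oban, MyApp.DailyWorker)

# Or by the custom string name the entry was inserted under.
case DynamicCron.get(Oban, "nightly-report") do
  {:ok, entry} -> entry
  {:error, message} -> raise message
end
```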
[DynamicLifeline] Improve workflow rescue accuracy and remove limit
The DynamicLifeline's workflow rescue mechanism now queries the aggregate table for workflows with suspended jobs, providing more accurate detection than scanning the jobs table for suspended jobs alone.
This catches edge cases where workflows are stuck but their suspended jobs may have been lost or
deleted. Legacy on_hold workflows that predate the aggregate table are still found via a
direct jobs table query.
Only workflows that have been executing for more than a minute are candidates for rescuing by default.
[DynamicLifeline] Automatically repair chunk jobs missing chunk_id
Chunk workers now use a pre-computed chunk_id for grouping. Jobs created before this change
won't have a chunk_id in their metadata, which would prevent them from being grouped correctly.
The DynamicLifeline plugin now automatically computes and sets the chunk_id for any chunk jobs
that are missing it, similar to how it repairs missing partition_key values for partitioned
queues.
[DynamicPruner] Add configurable preserve_workflows option
Allow disabling workflow job preservation during pruning via the new preserve_workflows
option, which defaults to true for backwards compatibility. When disabled, jobs are pruned
regardless of whether their workflow is still active. This is useful for large workflows that
naturally run longer than a pruning cycle.
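A configuration sketch. The mode value is illustrative; preserve_workflows: false allows pruning jobs even when their workflow is still active:

```elixir
config :my_app, Oban,
  plugins: [
    {Oban.Pro.Plugins.DynamicPruner,
     mode: {:max_age, {7, :days}},
     preserve_workflows: false}
  ]
```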
[Migration] Replace generated columns with expression indexes
Generated columns for uniq_key and partition_key were originally added for CockroachDB
compatibility but introduced unnecessary complexity and excessive table locking for large
tables. This change replaces them with expression indexes directly on the meta JSONB fields.
A generated_columns migration option is still available for apps that are running CockroachDB
and need the old functionality. The Smart engine detects which mode is being used and handles
conflicts accordingly.
[Migration] Add partial indexes for pruning and staging
Partial indexes reduce index size and improve query performance by only indexing rows that match
the filter condition. This adds partial indexes for terminal job states and a staging index for
jobs ready to transition to available.
Also fixes completed job pruning to use completed_at instead of scheduled_at, which is the
semantically correct timestamp for determining job age.
The new staging query may perform 2-10x faster depending on the number of jobs and overall state distribution.
[Rate Limit] Add centralized module for using rate limits outside of job execution
The module's functions allow checking, resetting, and consuming rate limits from running queues.
The consume/3 function is fully distributed and spreads consumption across multiple nodes when
a single producer cannot satisfy the request.
[Smart] Add auto_space option to spread out bulk inserts
When inserting large batches of jobs, auto_space schedules each batch at
increasing intervals. This prevents overwhelming queues when jobs can't
all execute immediately.
[Smart] Add on_conflict: :skip option for bulk inserts
Support skipping unique conflicts without row locking during insert_all. When enabled, conflicting jobs are silently skipped and only newly inserted jobs are returned. This improves performance for high-throughput scenarios where tracking conflicts isn't needed.
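A usage sketch, where MyApp.Worker stands in for any unique worker:

```elixir
jobs = Enum.map(1..100, &MyApp.Worker.new(%{id: &1}))

# Conflicting unique jobs are skipped without row locks; only the
# newly inserted jobs are returned.
inserted = Oban.insert_all(jobs, on_conflict: :skip)
```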
[Smart] Add transaction option for bulk insert atomicity
Support transaction: :per_batch to commit each batch independently during insert_all/2.
Previously inserted batches persist even if a later batch fails. The default transaction: :all
preserves the existing all-or-nothing behavior.
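A usage sketch, assuming a hypothetical worker and an illustrative batch_size option:

```elixir
jobs = Enum.map(1..50_000, &MyApp.Worker.new(%{id: &1}))

# Each batch commits on its own, so earlier batches survive a failure
# in a later batch. transaction: :all (the default) stays all-or-nothing.
Oban.insert_all(jobs, batch_size: 1_000, transaction: :per_batch)
```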
[Smart] Add telemetry sub-spans for engine fetch_jobs
Instrument the fetch_jobs transaction with nested telemetry spans for granular observability
into acking, flushing, demand calculation, and job fetching. Each sub-span emits standard span
events under [:oban, :engine, :fetch_jobs, :ack | :flush | :demand | :fetch].
[Smart] Support selecting between multiple rate limiting algorithms
Rate limited queues can select from :sliding_window, :fixed_window, and :token_bucket
algorithms to control how rate quotas are consumed. The :sliding_window algorithm remains the
default.
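A configuration sketch. The :algorithm key within rate_limit is an assumption based on this entry; the allowed and period keys follow standard Smart engine rate limit options:

```elixir
config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  queues: [
    default: [
      local_limit: 10,
      rate_limit: [allowed: 60, period: 60, algorithm: :token_bucket]
    ]
  ]
```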
[Smart] Add :fixed_window rate limiting algorithm
Introduce a fixed window algorithm, which resets the count when the period expires rather than using weighted averaging.
[Smart] Add token bucket rate limiting algorithm
Introduce token bucket algorithm that refills tokens continuously at a fixed rate rather than
resetting at period boundaries. Tokens refill at allowed / period per second, providing
smoother rate limiting with natural burst handling.
[Worker] Add @impl declaration for worker __opts__/0
Mark Oban.Pro.Worker.__opts__/0 as implementing the new __opts__/0 public callback from
Oban.Worker.
[Worker] Add on_cancelled/2 and on_discarded/2 worker hooks
Introduce two new worker callbacks that fire when jobs are cancelled or discarded, regardless of how the state transition happens:
- on_cancelled/2 receives a :dependency or :manual reason
- on_discarded/2 receives an :exhausted reason
Both callbacks work with global hooks via attach_hook/1 and module-level hooks via the
:hooks option.
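A sketch of a hook module wired up both ways. The argument order (reason first) is an assumption, and the logging bodies are placeholders:

```elixir
defmodule MyApp.Hooks do
  # Fires on cancellation, whether manual or from a failed dependency.
  def on_cancelled(reason, %Oban.Job{} = job) when reason in [:dependency, :manual] do
    IO.puts("cancelled #{job.id}: #{reason}")
  end

  # Fires when a job exhausts its attempts.
  def on_discarded(:exhausted, %Oban.Job{} = job) do
    IO.puts("discarded #{job.id} after #{job.attempt} attempts")
  end
end

# Globally, for every Oban.Pro.Worker:
Oban.Pro.Worker.attach_hook(MyApp.Hooks)

# Or per worker module:
# use Oban.Pro.Worker, hooks: [MyApp.Hooks]
```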
[Worker] Apply structured args timeout/1 and backoff/1
Automatically apply encryption and structuring before calling user defined timeout/1 or
backoff/1 implementations. This allows pattern matching on structured args without any code
changes.
[Worker] Variable weight rate limit tracking via options and callback
Add support for job weights in rate limiting, allowing jobs to consume variable amounts of rate limit capacity:
- use Oban.Pro.Worker, rate: [weight: 5]
- Worker.new(args, rate: [weight: 3])
- weight/1 for dynamic weight calculation at dispatch time
Jobs with higher weights consume more rate limit capacity, enabling fine-grained control over resource-intensive operations.
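A sketch of the dynamic form. The exact weight/1 signature is an assumption, and the sizing heuristic is hypothetical:

```elixir
defmodule MyApp.Export do
  use Oban.Pro.Worker, queue: :exports, rate: [allowed: 100, period: 60]

  # Hypothetical dynamic weight: larger exports consume more of the
  # rate limit, capped at 10 units.
  def weight(%Oban.Job{args: %{"rows" => rows}}), do: min(div(rows, 1_000) + 1, 10)

  @impl true
  def process(%Oban.Job{}), do: :ok
end
```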
[Workflow] Optimize workflow flushing with de-duplication
Restructure workflow flushing to compute dependency states once per unique dependency rather than once per job-dep combination. This eliminates the M*N scaling problem when M jobs share common dependencies.
Benchmarks show ~2x faster execution, 7x fewer buffer hits, and 15x fewer index scans for workflows with shared dependencies.
[Workflow] Flushing is optimized to load minimal data up front
Only the exact data needed for workflow flush operations is loaded from the database, rather
than the entire job structure. This saves data over the wire, serialization overhead, and memory
usage for active workflows or jobs with large args, errors, or meta.
The full job structure is loaded asynchronously when cancellation callbacks are needed.
[Workflow] Add table for centralized workflow tracking
Introduces a dedicated table to track workflow metadata and job state counts, replacing expensive aggregation queries with precomputed values. This improves performance for large workflows and enables efficient filtering/sorting in Oban Web.
[Workflow] Add unique workflow support to prevent duplicates
Workflows can now be created with unique: true to prevent multiple workflows with the same
name from running concurrently. When a duplicate unique workflow is inserted, its jobs are
marked with conflict?: true instead of being inserted.
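A sketch under the assumption that the name and unique options are passed to Workflow.new/1 as shown in this entry; worker modules are placeholders:

```elixir
alias Oban.Pro.Workflow

workflow =
  Workflow.new(name: "nightly-sync", unique: true)
  |> Workflow.add(:fetch, MyApp.FetchWorker.new(%{}))
  |> Workflow.add(:store, MyApp.StoreWorker.new(%{}), deps: [:fetch])

jobs = Oban.insert_all(workflow)

# If a "nightly-sync" workflow is already running, the returned jobs
# are marked conflict?: true instead of being inserted.
Enum.any?(jobs, & &1.conflict?)
```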
[Pro] Packages are distributed with encrypted source code
Pro packages are encrypted, with licenses that stay fresh for 30 days. Development remains seamless: documentation, type signatures, and LSP integration all work normally.
Enterprise license holders receive unencrypted source code.
See the Upgrade Guide for details on checking license status and refreshing.
[DynamicPartitioner] Deprecate the DynamicPartitioner plugin
The complexity and edge cases introduced by partitioned tables far outweigh the benefits for most applications.
[Workflow] Deprecate after_cancelled/2 in favor of universal on_cancelled/2
The after_cancelled/2 callback is deprecated in favor of the universal on_cancelled/2 hook.
Currently, both hooks will be called if defined, and users should switch to on_cancelled/2.
[Migration] Fix migration check crash with dynamic repo config
The migration telemetry handler crashed on startup when users configured a placeholder repo
module with get_dynamic_repo providing the actual repo at runtime. The handler attempted to
call get_dynamic_repo/0 on the static repo before evaluating the dynamic repo callback.
[Refresher] Add error handling to producer record refreshing
Previously, if refresh_producers or cleanup_producers raised (e.g., due to a connection checkout timeout), the GenServer would crash and restart, causing missed heartbeats and timer resets. Now errors are caught and logged, allowing the refresh cycle to continue uninterrupted.
[Testing] Ensure ordered run_workflow/2 output
Always order workflow jobs by execution completion order to preserve sequential execution results.
[Worker] Trigger on_cancelled/2 when deadline force-cancels
When a job with deadline: [force: true] exceeds its deadline during execution, the
on_cancelled/2 hook is now called with :deadline as the reason. This allows workers to
perform cleanup or notifications when jobs are terminated due to deadline expiration.
[Oban] Add job diagnostics for runtime process inspection
Introduce runtime diagnostics for executing Oban.Pro.Worker jobs, allowing external tools like
Oban Web to request process information (stacktrace, memory, status) for running jobs via
PubSub.
[DynamicCron] Skip worker validation when deleting DynamicCron
When a worker module is removed from the codebase, it was previously impossible to delete its persisted cron entry using the delete: true option because validation would fail trying to load the module.
[Smart] Fix unique violation retry loop for similar states
Clearing unique violations could fail to find conflicting jobs when the job's current state
(e.g., retryable) was included in its unique states. This caused the clearing query not to
match, so nothing would be cleared, causing an infinite retry loop.
Retries are now limited, and an error is logged for excessive retry loops.
[Smart] Prevent chain race condition on concurrent insert
When two transactions concurrently insert jobs for the same chain without any prior jobs in the
chain, both could see "no existing job" and insert as available, violating sequential
guarantees.
[Smart] Include :meta options in t:partition/0
Using meta for partitioning has been supported for a while now, but the type was outdated.
[Workflow] Fix context for deps in nested sub-workflows
When using add_cascade with a fan-out tuple (e.g., {items, &fun/2}) inside a nested
sub-workflow that also has dependencies, the fan-out jobs would incorrectly receive the outer
workflow's context instead of their immediate parent sub-workflow's context.
[DynamicCron] Optimize cron entry insertion history tracking
The history tracking for cron entries retained too much information and was needlessly expensive to read and write. This simplifies the insertion format and streamlines tracking updates.
The optimization is backward compatible; existing arrays with history will continue to work and will simply be replaced with single-element arrays on the next insertion.
[Workflow] Support fan-out tuple form in add_graft/4
The type specification for add_graft/4 indicated support for the cascade fan-out form
{Enum.t(), (any(), map() -> any())}, but only the single-function form was implemented.
Now you can fan out graft points the same way you would with add_cascade:
Workflow.new()
|> Workflow.add(:setup, SetupWorker.new(%{}))
|> Workflow.add_graft(:process, {items, &process_item/2}, deps: :setup)
|> Workflow.add(:finalize, FinalizeWorker.new(%{}), deps: :process)
[Workflow] Wait for arbitrarily nested grafted sub-workflows
Jobs depending on a graft now properly wait for dynamically appended sub-workflows at any
nesting depth. Previously, when a grafted workflow used append + add_many to create
additional jobs, the dependent job could run before those nested jobs completed.
The flush mechanism now tracks the complete ancestor chain so workflows flush correctly regardless of how deeply they are nested.
[Workflow] Inherit all context for nested sub-workflows
When using add_cascade with fan-out inside nested sub-workflows, context from intermediate
parent workflows was not accessible. Jobs would only see context from the outermost workflow,
skipping any workflows in between.
Now context is correctly inherited through any depth of workflow nesting. Each level's context is merged in order from outermost to innermost, with closer ancestors overriding values from farther ones.
[Smart] Ensure partition key cache never hits under load
The timed cache for partition keys could fail to return cached results when jobs were being inserted concurrently, causing repeated database queries for available partition keys. This removes stale telemetry handlers that were interfering with cache lookups.
[DynamicLifeline] Correctly filter empty partitions during rescue queries
The partition rescue query could incorrectly check producers with empty partitions after queue updates.
[Smart] Improve partition query by repairing unpartitioned jobs
Partitioning selects fewer keys, more accurately, to prioritize active partitions. This ensures partitions with higher priority or older jobs are processed first, while still distributing work across all active partitions.
Jobs in partitioned queues that are missing a partition key (such as jobs scheduled before a
queue was partitioned) are now repaired by DynamicLifeline automatically.
Overall, this change means fetching jobs in partitioned queues requires fewer resources even after bulk job inserts.
[Smart] Add telemetry and logging for unique repairs
Emit [:oban, :engine, :uniq_violation_repaired] telemetry on every unique repair and log an
error once per unique key to surface potential issues that could otherwise silently degrade
staging and fetching performance.
[Workflow] Preserve name for deeply nested sub-workflows
Preserve the sub-workflow's name for jobs from nested sub-workflows when composing workflows
with add_workflow/3. Previously, nesting a workflow containing add_many/3 inside another
add_workflow/3 would incorrectly overwrite the inner jobs' metadata.
[Workflow] Prevent context conflicts in deeply nested workflows
Preserve the workflow id for jobs from nested sub-workflows to prevent multiple context jobs
from sharing the same workflow_id. This eliminates an Ecto.MultipleResultsError when using
put_context/2 in workflows nested multiple levels deep via add_workflow/2.
[Workflow] Wait for dynamically appended jobs for grafted workflows
Jobs depending on a grafted workflow now correctly wait for all dynamically inserted jobs,
including those added via add_many/4 within apply_graft/2.
Previously, when a grafter job used apply_graft/2 with a workflow containing nested
sub-workflows (from add_many/4), dependent jobs would execute before the appended jobs
completed.
[Workflow] Track recursive graft dependencies from parent workflows
When a grafting job creates sub-grafts recursively, jobs depending on the original graft now correctly wait for all recursive grafts to complete.
The fix ensures nested graft jobs share the root graft's workflow_id so the parent's wildcard
dependency can find them.
[Worker] Validate structured args during update_job/3
Structured args weren't validated during update_job/3 calls, which could lead to errors when
the job eventually processed. Now, same validation is applied on update as when the job is built
with new/2
[Decorator] Correct typespec for the subset of unique opts for decorated jobs
The typespecs for period, states, and timestamp were incomplete or too loose. The new
typespecs reference the types from Oban.Job.
[Smart] Fix inaccurate acking for externally modified jobs
When another node or process updates a job while it's executing, or inserts a conflicting unique job, the ack query could fail to lock or update. This could cause mismatches between tracked and actual job states.
[Worker] Ensure after_process/3 hooks trigger on cancellation
Executing jobs that were manually cancelled via Oban.cancel_job or similar mechanisms didn't
have after_process/3 hooks triggered, due to a caching issue. That situation is now handled and
hooks are called as expected.
[Refresher] Cleanup producers regardless of running queues
The refresher will now clean up stale producers from any leader node, not just nodes that are running queues. This ensures producers are cleaned up even with subtle misconfigurations, e.g. disabling queues without disabling plugins.
[Docs] Extract svg diagrams into separate files
Inline svg images involve a lot of text and can overwhelm an LLM's context window. This extracts the svgs out as separate assets and loads them dynamically instead.
[Workflow] Add atom_keys option to configure cascade context
The atom_keys option controls whether the keys in a cascade function's context map are
atomized or converted to strings. This helps with consistency for nested workflows, or when
atoms may not exist between nodes.
[Testing] Add missing functions to make a drop-in replacement for Oban.Testing
Add missing functions (build_job/3, perform_job/1,2, and with_testing_mode/2) to ensure
Oban.Pro.Testing is fully compatible with Oban.Testing. This allows seamless migration from
Pro without modifying existing test helpers or test code.
[Workflow] Flatten contexts within nested grafts
When building complex workflows with nested grafts (a graft within another graft), the recorded results from upstream jobs were being incorrectly nested in the context passed to downstream jobs. Now, flattening is applied correctly at any level of nesting.
[Workflow] Prevent deadlocks for queues processing workflows
Adds locking to prevent deadlocks when jobs from the same workflow or chain complete simultaneously on different nodes.
[Workflow] Correct condition guard in status query
The query used to expand the status for sub-workflows didn't make use of the existing workflow indexes, which could lead to poor performance in busy systems.
[DynamicCron] Compare inserted times in UTC for guaranteed cron
Prevent double triggering jobs for guaranteed cron during DST changes. Time is always compared in UTC, rather than shifting to the current timezone.
[Refresher] Prevent active producers from erroneous cleanup
When a queue's producer is stuck in a transaction retry loop, it isn't able to handle new messages, including periodic refresh requests. This would allow the producer record to become outdated in the database, at which point it is subject to erroneous cleanup.
Refreshing now proactively discovers and refreshes all active producers through the Registry, rather than relying on individual producer messages.
[Workflow] Explicitly group workflow conditions for flushing query
The boolean grouping of statements in a workflow query was incorrectly interpreted and resulted in a highly inefficient workflow query.
[Smart] Fetch and execute jobs with missing partition keys
Previously, partitioned queues wouldn't run jobs with null partition keys. This was an edge case, usually caused by adding partition config to a queue with older, scheduled jobs in it.
[Relay] Add with_retries option to await/2
It's now possible to keep waiting for a job to complete across retries. The awaiting process will keep waiting until the job has exhausted all attempts or the timeout is reached. This makes async/await more reliable for jobs with flickering failures.
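A usage sketch with a hypothetical worker. Relay.await/2 ordinarily takes a timeout as its second argument, so combining :timeout and :with_retries in a keyword list here is an assumption:

```elixir
alias Oban.Pro.Relay

relay = Relay.async(MyApp.FlakyWorker.new(%{id: 1}))

# Keep waiting across retries until attempts are exhausted or the
# timeout elapses.
result = Relay.await(relay, timeout: :timer.seconds(30), with_retries: true)
```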
[Refresher] Fix shutdown order to prevent rescue mistakes
Queue producer refreshing, which is essential to identifying live queues, could terminate too
early during node shutdown. When the shutdown_grace_period exceeded the refresh interval,
producers would lose the ability to update their timestamps while still running, leading to
false stale producer detection and unnecessary job rescues.
Refreshing now survives instance shutdowns and continues updating producer timestamps throughout the grace period.
[Smart] Prevent deadlock during concurrent unique insert and fetch operations
Inserting unique jobs that conflicted with currently executing jobs could cause a deadlock. The ack query now takes an explicit lock to ensure updates complete, while also preventing the deadlock.
[Workflow] Update workflow flushing syntax for CRDB
The flushing query used a syntax that is valid in Postgres, but incompatible with how CRDB handles JSONB function output.
[Workflow] Include cascade options in apply_graft/2
The spec only included t:new_opts/0, but the function also creates cascade jobs and should
include t:add_cascade_opts/0.
[DynamicQueues] Correct typespec for update/3
The typespec listed t:queue_opts/0, which is a union of tuples, but should be a keyword list.
[DynamicPruner] Retain jobs in active workflows when pruning
The DynamicPruner no longer deletes jobs that are part of an active workflow or related
sub-workflow. If any job in a workflow is in an incomplete state (e.g. executing), no jobs in
the workflow will be pruned.
[DynamicQueues] Enable clearing limits on queue config update
Clearing a DynamicQueue's rate_limit or global_limit by setting nil, or swapping between
options, now works with a config update.
[Workflow] Handle empty enumerables passed to add_many/4
Empty sub-workflows no longer prevent downstream jobs in a workflow from executing. Now, step elimination ensures the workflow executes in the expected order when a sub-workflow is created from an empty enumerable.
[Worker] Move option validation to after_compile/2 hook
Validating options with an exception in an after_verify/1 hook breaks progressive compilation
and causes ongoing warnings during subsequent compilations. Options are now validated with
after_compile/2 instead.
[Smart] Prevent leaking tracked partitions between job fetches.
A race condition between acking and fetching could cause tracked global partitions to be retained for jobs that were no longer processing. The logic is now simplified and corrected to ensure only actively running job partitions are tracked.
[Smart] Correct expiration for partition key caching
Partition keys are now cached and expired independently to compensate for differing queue access patterns and load.
[Smart] Limit partition keys fetched with optional config
Previously, the number of unique partitions fetched from the database was unlimited. The resulting query, and amount of data fetched, could cause performance problems in systems with high cardinality.
[Smart] Properly enforce partitioned global concurrency between nodes.
Simplified partition tracking prevents concurrency violations for queues with global_limit: 1
and bursts of activity. Lingering tracked values could lead to incorrect demand in some
circumstances.
[Decorator] Update validation and handling for unique state groups
Decorated jobs now allow Oban v2.20's unique state groups. The states portion of decorated
typespecs is corrected as well.
[Workflow] Return unknown status for missing workflows
Rather than causing an exception, the status/2 function returns an empty status map with the
:unknown status when no workflow jobs can be found in the database.
[Workflow] Pass scheduling fields through when creating cascade jobs
Scheduling options weren't passed through to cascade jobs, which made it difficult to delay cascade workflows.
[Workflow] Include sub-workflows when cancelling workflow jobs
Sub workflows can't run if the parent workflow is cancelled. Now sub-workflows are cancelled by
default, though the functionality can be disabled using with_subs: false.
[Workflow] Correct t:add_opt/0 type for workflows.
The type was incorrectly auto-formatted into an invalid form.
[Testing] Restrict execution to single batch with run_batch/1
The run_batch/2 helper would insert the batch but run all available jobs rather than
restricting to the current batch. Now run_batch/2 limits execution as expected, similarly to
run_workflow/2.
[Testing] Prevent config fetching errors while draining jobs
Some functions, such as Workflow.get_job/3, rely on the presence of an Oban config while
draining. Now the generated config is registered and available to any functions that may require
it while testing.
[Testing] Preserve original job order for perform_chunk/1
The order of jobs passed to perform_chunk/1 was reversed after pre-processing was applied.
[Testing] Ignore context jobs when draining sub-workflows
Nested contexts within sub-workflows would mistakenly be attempted while draining. Context jobs shouldn't be processed at any point, including during test runs.
[Worker] Improve error message for hook module validation.
The error message only mentioned after_process/2,3, not the other hooks added more recently.
[Workflow] Only atomize top level keys in recorded cascade
Cascade functions may record arbitrary terms, including maps with purposeful string keys and structs. Now only the top level of recorded maps are atomized and nested map keys are left as is.
[Worker] Pass changes from before_process in after_process
Changes made to jobs in a before_process hook are now passed through to after_process.
Previously, only modified args were retained.
[Plugins] Add logging for unexpected messages to all processes
Add a catch-all to log a warning when plugins, or other GenServers, receive unexpected messages.
[Refresher] Prevent simultaneous producer refresh deadlocks.
Deleting the same producer records from multiple nodes simultaneously could lead to a deadlock. Now, only the leader will perform deletions.
[Workflow] Include grafted workflows in cascade context
The results from a graft are now passed down to the context for cascade jobs. In addition, nested map keys from grafted cascade jobs are atomized when possible.
[Workflow] Increase rescue limit to capture more potentially stuck workflows
The previously lowered limit failed to rescue stalled workflows in especially busy systems.
[Worker] Dump structs nested within maps in structured args
Structs within a map weren't dumped to a map the way they were at the top level. Now structs are uniformly escaped at the top level, within embeds, within lists, and within maps.
[Smart] Cast states to strings on unique violation
Handling unique violations from the available->executing transition correctly uses strings
instead of atoms now.
The workflow indexes changed between v1.6.0-rc.5 and v1.6.0. If you're upgrading from a v1.6 release candidate, you should rerun index migrations:
Oban.Pro.Migration.down(only: :indexes)
Oban.Pro.Migration.up(only: :indexes)
[Workflow] Optimize workflow deps queries for index use
Revamped queries are able to fully utilize workflow indexes for all dependency checks. The resulting queries remain extremely fast even in systems with thousands of parallel workflows.
[Workflow] Correct add_cascade_opts type definition.
The type incorrectly combined Job.option and a list of add options, rather than forming a single list of types.
[Refresher] Ensure the refresher flushes continuously.
This fixes a critical bug that caused producer records to accumulate indefinitely after the first 15 seconds of the refresher running.
[Smart] Block chain preparation during ack updates
Acking chains while inserting a new job in the chain was prone to a race condition that would lead to "stuck" chains. This augments the chain insertion query so that it blocks until other jobs in the chain are committed to prevent transactional races.
[Smart] Correct locking when flush handlers present
Nodes must acquire a lock to safely coordinate workflow, batch, and chain queries while acking. This corrects the logic used to check whether there are pending flushes during a transaction.
[Pro] Address deprecation warnings from Elixir v1.19.
[Refresher] Simplify refreshing logic for stale producers.
Now that producer tracking is centralized, the queries for purging stale records can be much simpler and purge records even sooner.
[Decorator] Ensure decorated modules are loaded before processing.
Ensure the module is loaded before converting the function to prevent unknown atom issues.
[Migration] Refactor migration versioning to support separate schema/index tracking
Split version tracking to independently track schema and index migrations, enabling more
granular migration control when using the :only option.
[Migration] Fix migration version check and warning logic when using split migrations.
[Workflow] Fix sub-workflow context merging for cascades
Scoping within a sub-workflow prevented correctly fetching recorded cascade values. Now context is merged between parent and child cascade jobs.
[Workflow] Prevent seq scan from workflow deps query
The workflow deps query checked for the presence of a 'workflow_id' on the wrong table reference, which prevented proper index usage and forced a sequential scan.
[Workflow] Correctly resolve downstream graft dependencies
A race condition in deps checking could allow workflow jobs that depended on a graft job to execute too soon.
[Workflow] Introduce add_graft/3 and apply_graft/2.
Graft jobs serve as placeholders in a workflow. When a graft job executes, it must build and attach a new sub-workflow at that point. Any downstream jobs that depend on the grafter will wait for the entire grafted sub-workflow to complete.
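As a hypothetical sketch (the worker modules and argument shapes here are illustrative, not from the official docs, and exact signatures may differ), a graft placeholder could be added and later expanded like this:

```elixir
# Illustrative names: FetchWorker, MyApp.ExpandWorker, etc. are assumptions.
workflow =
  Workflow.new()
  |> Workflow.add(:fetch, FetchWorker.new(%{}))
  |> Workflow.add_graft(:expand, MyApp.ExpandWorker.new(%{}), deps: :fetch)
  |> Workflow.add(:finish, FinishWorker.new(%{}), deps: :expand)

# Inside the graft worker, apply_graft/2 attaches a dynamically built
# sub-workflow at the graft point. The :finish job then waits for the
# entire grafted sub-workflow to complete before executing.
def process(job) do
  sub =
    Workflow.new()
    |> Workflow.add(:step, StepWorker.new(%{}))

  Workflow.apply_graft(job, sub)
end
```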
[Plugins] Enable default logger output for all plugins.
All plugins now implement a callback that allows their output to be logged automatically by the
default Oban.Telemetry logger.
[Workflow] Optimize workflow deps query to allow index usage.
The query now uses an index-enabled operator rather than a function. The resulting query is much uglier, but hundreds of times faster on larger tables.
[Smart] Safely decode legacy rate limit data from existing producers.
[Smart] Expire locally cached producers after a brief time to prevent memory leaks and ensure producers are cleaned up shortly after a crash.
[Migration] Record separate schema and index migration version
The recorded migration version now indicates if only schemas or indexes were migrated, which
ensures an indexes migration will always run after a schemas migration.
[Migration] Make dropping the old workflow index concurrent when index-only migrations are enabled.
[Migration] Respect dynamic repo configuration during version checks.
[Migration] Add concurrent migration options.
The new only option makes it possible to split migrations into schema and index changes.
That allows for concurrent index creation to prevent locking the jobs table during migrations
for tables with a large number of retained jobs.
This removes the need to set concurrently manually, deriving instead from the only option
itself. It also ensures all down migrations are using the concurrently option as well.
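For example, a split upgrade might apply index changes concurrently in their own migration. This is a sketch using standard Ecto migration attributes; the module name is illustrative:

```elixir
defmodule MyApp.Repo.Migrations.UpgradeObanProIndexes do
  use Ecto.Migration

  # Concurrent index work can't run inside a transaction or under the
  # migration lock, so both are disabled for the index-only step.
  @disable_ddl_transaction true
  @disable_migration_lock true

  def up, do: Oban.Pro.Migration.up(only: :indexes)

  def down, do: Oban.Pro.Migration.down(only: :indexes)
end
```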
[Smart] Track partition keys on insert.
For queues without a backlog, but frequent inserts, partitioned jobs wouldn't be fetched immediately. Now partition keys are added to the available keys cache immediately on insert for the current node.
[Smart] Briefly retain tracked global keys.
Retaining tracked keys for several cycles beyond when they've had something to fetch helps compensate for partitions without a backlog of jobs.
[Smart] Lower partition keys cache TTL.
The TTL is reduced to 5s to reduce the lag before new partitions are processed in most systems. This setting is now documented as well, in the Smart engine documentation.
[Workflow] Correct logic for sub-workflow dependency checks.
Jobs with dependencies on sub-workflows wouldn't be moved to the available state in some
situations. This corrects the order of status checks, and merges staging with rescuing logic.
[Workflow] Ensure arg value is always set for cascade jobs.
The default nil value is stripped from the encoded args map. This changes the default to a
non-nil value so it's retained for all cascade jobs.
[Workflow] Add cascading workflow mode with context-aware functions.
Cascade mode simplifies building workflows that rely on context or output from previous functions. Cascade jobs use a strictly defined function signature and conventions to make expressive workflows that "cascade" context between functions according to dependencies.
[Workflow] Create and render subgraphs for sub-workflows.
Graphs for sub-workflows now correctly track dependencies between jobs and subs, and sub-graphs
are created for sub-workflows. That includes subgraph rendering for to_dot/1 and
to_mermaid/1 rendered output.
[Workflow] Enhance status/1 output with the workflow id, atom statuses, and sub-workflows.
Sub-workflows are now included in the output of the status/1 function, along with the workflow
id. The workflow status is now reported as an atom rather than a string, and there is a full
typespec for the return value.
[Worker] Increase the default recorded value limit to 64MB
The default recorded value limit of 32kB was highly conservative and a poor fit for decorated jobs or cascading workflows. The default is now much larger than any value that should be stored as a return value.
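If return values still need tuning, the limit can be adjusted per worker. This sketch assumes the recorded option accepts a :limit key with a byte value; check the Oban.Pro.Worker docs for the exact option shape:

```elixir
defmodule MyApp.BigOutputWorker do
  # Assumption: recorded: [limit: bytes] raises the per-job recorded limit.
  use Oban.Pro.Worker, recorded: [limit: 128_000_000]

  @impl true
  def process(job) do
    # The returned value is recorded in the job's meta for later retrieval
    {:ok, build_large_result(job.args)}
  end
end
```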
[Workflow] Fix grouping of workflow and sub-workflow in query.
The workflow query lacked the grouping parentheses required to select the correct partial index.
[DynamicCron] Remove double wrapping of cron_opt keyword
The cron_opt type changed to a keyword list, but it was still wrapped in a list. This removes
the double wrapping.
[Smart] Safely accept legacy rate limit windows.
The legacy windows format is a list of maps, not a single map. This adds a custom type to translate legacy values, and correctly handles those values in the limiter itself to prevent crashes.
[Smart] Safely handle legacy global tracking data.
During the initial upgrade, legacy producers will have a different shape of tracked data for partitioned and non-partitioned queues. This adds a translation step to prevent crashing new producers.
[Smart] Backport injecting partition keys into job meta.
This injects partition details into jobs as they're inserted. The partition keys will be used for a generated column in v1.6, and this eases the migration process.
[Workflow] Add add_workflow/4 for creating nested sub-workflows.
Sub-workflows simplify organizing and managing complex job dependencies by grouping related jobs:
Workflow.new()
|> Workflow.add_workflow(:sub, MyApp.SubWorkflow.new())
|> Workflow.add(:final, FinalWorker.new(%{}), deps: :sub)
Downstream dependencies can reference the sub-workflow as a whole, including any jobs that may be dynamically added during runtime.
[Workflow] Add add_many/4 function for creating fan-out sub-workflows.
This helper enables adding multiple jobs to a workflow with a single name:
email_jobs = Enum.map(users, &EmailWorker.new(%{user_id: &1.id}))
workflow =
Workflow.new()
|> Workflow.add_many(:emails, email_jobs)
|> Workflow.add(:report, ReportWorker.new(), deps: :emails)
[Workflow] Add sub-workflow metadata and query support.
Support retrieving sub-workflow jobs with the new with_subs option. This makes it possible to
fetch all sub-workflow jobs, including all recordings, with a single call.
Workflow.all_recorded(job, with_subs: true)
[Workflow] Add put_context/3 for sharing values between workflow jobs.
The helper inserts a completed job with a recorded value for all other jobs to fetch. This simplifies context sharing and eliminates the need to add the same values as args to all jobs in a workflow.
workflow = Workflow.put_context(workflow, %{id: 123, name: "Alice"})
def process(job) do
%{id: id, name: name} = Workflow.get_recorded(job, :context)
end
[Workflow] Add status/1 helper for getting workflow execution information.
The new helper simplifies gathering runtime information about a workflow, including the name, total jobs, overall state of the workflow, elapsed duration, state counts, and timestamps.
%{total: 5, counts: %{completed: 5}} = Workflow.status(workflow.id)
[Workflow] Add retry_jobs/2 for retrying workflow jobs.
The new helper function will retry jobs in the workflow and hold jobs with dependencies accordingly.
Workflow.retry_jobs(job)
[Workflow] Optimize workflow indexes and eliminate containment queries.
A new partial workflow index checks held jobs, workflow ID, and sub-workflow ID at once, eliminating the need for a GIN index on meta and args.
[Smart] Add burst mode for global partitioned queues.
Allows partitioned queues to exceed per-partition limits when capacity is available, while still respecting the overall concurrency limit:
config :my_app, Oban,
queues: [
exports: [
global_limit: [
allowed: 5,
burst: true,
partition: [args: :tenant_id]
],
local_limit: 100
]
]
[Smart] Allow using :meta in partition configuration.
Now queue partitioning can use values from the job's metadata, including composition values such
as workflow_id.
config :my_app, Oban,
queues: [media: [global_limit: [allowed: 1, partition: [meta: :workflow_id]]]]
[Smart] Overhaul queue partitioning for significant performance improvements.
Adds a generated partition_key column with a partial index to oban_jobs. Jobs are
pre-partitioned on insert based on queue configuration, which simplifies partition queries and
provides ~20x faster performance.
[Smart] Centralize producer refresh and cleanup.
Producer refreshing and cleanup is now centralized to reduce database load. Rather than one transaction per queue per node every 30 seconds, there is now one transaction per node every 30 seconds. This offers significant query savings in systems running numerous queues.
[Smart] Provide alternate map hashing to avoid collision.
The use of phash2/1 could cause collisions between values such as uuids, which led to false
positives for unique checks. This adds an alternate mechanism for hashing that avoids any such
collisions, but it is opt-in for backward compatibility.
To switch, set the following compile time config:
config :oban_pro, Oban.Pro.Utils, safe_hash: true
[Decorator] Add current_job/0 to decorated modules.
The current_job/0 function allows decorated jobs to access the underlying job for inspection
or to pass as context to other functions such as Backoff, or Workflow helpers.
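For instance, a decorated function might use current_job/0 to pass the job along to a workflow helper. The module and option values below are illustrative, and Workflow.get_recorded/2 assumes the job belongs to a workflow:

```elixir
defmodule MyApp.Decorated do
  use Oban.Pro.Decorator

  @job queue: :default
  def sync_account(account_id) do
    # current_job/0 exposes the underlying Oban.Job for the running function
    job = current_job()

    # Illustrative: fetch shared workflow context recorded by another job
    context = Workflow.get_recorded(job, :context)

    do_sync(account_id, context)
  end
end
```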
[Decorator] Support recording as a runtime option.
Any job may now be marked as recorded at runtime, including decorated jobs:
@job queue: :processing, recorded: true
def process_account(account_id) do
# ...
end
[DynamicPrioritizer] Expand options for finer control.
New :limit and :max_priority options add control over how many jobs are prioritized and their
maximum priority level.
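A configuration sketch using the new options (the values are illustrative, and the plugin's other options are unchanged):

```elixir
config :my_app, Oban,
  plugins: [
    # Illustrative values: reprioritize at most 5_000 jobs per cycle, and
    # never boost a job beyond priority level 1
    {Oban.Pro.Plugins.DynamicPrioritizer, limit: 5_000, max_priority: 1}
  ]
```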
[DynamicQueues] Preserve runtime updates without configuration changes.
Add automatic sync_mode to insert/update/delete queues based on configuration and prevent
overwriting runtime updates to queues unless the configuration changes.
[DynamicCron] Use optimized cron expression calculation.
The new last_at/2 and next_at/2 cron expression functions are vastly faster and more
efficient than the previous implementation. This improves cron job insertion performance in
guaranteed mode.
[Migration] Drop tables entirely during initial down migration
The presence of a uniq_key column prevents dropping the partitioned table after a rollback.
The specific sequence of migrating and backfilling is unlikely to be used in the real world.
[Smart] Use concat operator for workflow rescues.
The concat operator works with both array and json data, so appending to arrays works correctly with either data type. This is particularly important for CRDB systems because they don't currently support arrays of JSON.
[Smart] Provide alternate args/meta hashing to avoid collision.
The use of phash2/1 could cause collisions between values such as uuids, which led to false positives for unique checks. This adds an alternate mechanism for hashing that avoids any such collisions, but it is opt-in for backward compatibility.
To switch, set the following compile time config:
config :oban_pro, Oban.Pro.Utils, safe_hash: true
[Smart] Catch unique violations while fetching jobs
The fetch_jobs/3 function handles numerous state transitions that could trigger a unique
violation. To prevent producer crashes, those violations are caught and fixed before fetching
again.
[Worker] Ensure hook modules are loaded at runtime
In dynamic, lazy-loaded mode it was possible for hooks not to run because calling function_exported?/3 doesn't load the module. Now modules are loaded before checking for exported functions.
[Cron] Include configured timezone in cron job metadata.
Along with the cron expression, stored as cron_expr, the configured timezone is also recorded
as cron_tz in cron job metadata. This mirrors new functionality in Oban v2.19.2.
[DynamicQueues] Add get/2 for fetching a particular dynamic queue.
Using all/1 to fetch and filter jobs is inefficient, particularly when there are a large number
of queues.
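A sketch of fetching a single queue directly instead of filtering all/1. The exact signature of get/2 is an assumption here (taking the Oban instance name and the queue name):

```elixir
alias Oban.Pro.Plugins.DynamicQueues

# Assumed signature: get(oban_name, queue_name) — verify against the docs
queue = DynamicQueues.get(Oban, :media)
```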
[Smart] Further optimize unique checks for partitioned tables.
The previous attempt at optimizing unique checks for partitioned tables wasn't effective. Changes were made to enable fully index-backed queries.
[Decorator] Namespace decorator module attribute.
The decorated module attribute could conflict with module attributes outside of Oban.
[DynamicScaler] Avoid casting last_scaled_at for skipped events
When scaling info is handed off quickly without any scaling events the last_scaled_at
timestamp may be nil.
[Smart] Optimize fallback partitioned table unique check
The query used an ineffective combination of join: true and an or_where clause, which led to querying far too many rows for each duplicate check.
[Smart] Handle smart compile time config with access
Configuration is typically done with keyword lists, not maps. Using an Access call supports
either for backward compatibility.
[Migration] Detect partitioning with a backward compatible query
Prior to v1.3.4 the oban_jobs_incomplete table didn't exist, which breaks detecting
partitioned tables for the v1.5 upgrade. This changes detection to look for
oban_jobs_completed instead, which has always been created by the dynamic partitioning
migrations.
[Testing] Use real uuid to record draining attempts
The value "draining" can't be cast as a UUID, which made rehydrating the attempted_by field of
drained jobs impossible. This replaces it with a hard-coded, all-zero UUID.