resile

package module
v1.0.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 16, 2026 License: MIT Imports: 8 Imported by: 0

README

Resile: Ergonomic Execution Resilience for Go

Go Reference License Build Status Codecov Go Report Card

Resile is a production-grade execution resilience and retry library for Go, inspired by Python's stamina. It provides a type-safe, ergonomic, and highly observable way to handle transient failures in distributed systems.


Table of Contents


Installation

go get github.com/cinar/resile

Why Resile?

In distributed systems, transient failures are a mathematical certainty. Resile simplifies the "Correct Way" to retry:

  • AWS Full Jitter: Uses the industry-standard algorithm to prevent "thundering herd" synchronization.
  • Adaptive Retries: Built-in token bucket rate limiting to prevent "retry storms" across a cluster.
  • Generic-First: No interface{} or reflection. Full compile-time type safety.
  • Context-Aware: Strictly respects context.Context cancellation and deadlines.
  • Zero-Dependency Core: The core library only depends on the Go standard library.
  • Opinionated Defaults: Sensible production-ready defaults (5 attempts, exponential backoff).

Articles & Tutorials

Want to learn more about the philosophy behind Resile and advanced resilience patterns in Go? Check out these deep dives:

Examples

The examples/ directory contains standalone programs showing how to use Resile in various scenarios:


Common Use Cases

1. Simple Retries

Retry a simple operation that only returns an error.

err := resile.DoErr(ctx, func(ctx context.Context) error {
    return db.PingContext(ctx)
})
2. Value-Yielding Retries (Generics)

Fetch data with full type safety. The return type is inferred from your closure.

// val is automatically inferred as *User
user, err := resile.Do(ctx, func(ctx context.Context) (*User, error) {
    return apiClient.GetUser(ctx, userID)
}, resile.WithMaxAttempts(3))
3. Request Hedging (Speculative Retries)

Speculative retries reduce tail latency by starting a second request if the first one doesn't finish within a configured HedgingDelay. The first successful result is used, and the other is cancelled.

// For value-yielding operations
data, err := resile.DoHedged(ctx, action, 
    resile.WithMaxAttempts(3),
    resile.WithHedgingDelay(100 * time.Millisecond),
)

// For error-only operations
err := resile.DoErrHedged(ctx, action,
    resile.WithMaxAttempts(2),
    resile.WithHedgingDelay(50 * time.Millisecond),
)
4. Stateful Retries & Endpoint Rotation

Use DoState (or DoErrState) to access the RetryState, allowing you to rotate endpoints or fallback logic based on the failure history.

endpoints := []string{"api-v1.example.com", "api-v2.example.com"}

data, err := resile.DoState(ctx, func(ctx context.Context, state resile.RetryState) (string, error) {
    // Rotate endpoint based on attempt number
    url := endpoints[state.Attempt % uint(len(endpoints))]
    return client.Get(ctx, url)
})
5. Handling Rate Limits (Retry-After)

Resile automatically detects if an error implements RetryAfterError. It can override the jittered backoff with a server-dictated duration and can also signal immediate termination (pushback).

type RateLimitError struct {
    WaitUntil time.Time
}

func (e *RateLimitError) Error() string { return "too many requests" }
func (e *RateLimitError) RetryAfter() time.Duration {
    return time.Until(e.WaitUntil)
}
func (e *RateLimitError) CancelAllRetries() bool {
    // Return true to abort the entire retry loop immediately.
    return false 
}

// Resile will sleep exactly until WaitUntil when this error is encountered.
6. Aborting Retries (Pushback Signal)

If a downstream service returns a terminal error (like "Quota Exceeded") that shouldn't be retried, implement CancelAllRetries() bool to abort the entire retry loop immediately.

type QuotaExceededError struct{}
func (e *QuotaExceededError) Error() string { return "quota exhausted" }
func (e *QuotaExceededError) CancelAllRetries() bool { return true }

// Resile will stop immediately if this error is encountered,
// even if more attempts are remaining.
_, err := resile.Do(ctx, action, resile.WithMaxAttempts(10))
7. Fallback Strategies

Provide a fallback function to handle cases where all retries are exhausted or the circuit breaker is open. This is useful for returning stale data or default values.

data, err := resile.Do(ctx, fetchData,
    resile.WithMaxAttempts(3),
    resile.WithFallback(func(ctx context.Context, err error) (string, error) {
        // Return stale data from cache if the primary fetch fails
        return cache.Get(ctx, key), nil 
    }),
)
8. Layered Defense with Circuit Breaker

Combine retries (for transient blips) with a circuit breaker (for systemic outages).

import "github.com/cinar/resile/circuit"

cb := circuit.New(circuit.Config{
    FailureThreshold: 5,
    ResetTimeout:     30 * time.Second,
})

// Returns circuit.ErrCircuitOpen immediately if the downstream is failing consistently.
err := resile.DoErr(ctx, action, resile.WithCircuitBreaker(cb))
9. Macro-Level Protection (Adaptive Retries)

Prevent "retry storms" by using a token bucket that is shared across your entire cluster of clients. If the downstream service is degraded, the bucket will quickly deplete, causing clients to fail fast locally instead of hammering the service.

// Share this bucket across multiple executions/goroutines
bucket := resile.DefaultAdaptiveBucket()

err := resile.DoErr(ctx, action, resile.WithAdaptiveBucket(bucket))
10. Structured Logging & Telemetry

Integrate with slog or OpenTelemetry without bloating your core dependencies.

import "github.com/cinar/resile/telemetry/resileslog"

logger := slog.Default()
resile.Do(ctx, action, 
    resile.WithName("get-inventory"), // Name your operation for metrics/logs
    resile.WithInstrumenter(resileslog.New(logger)),
)
11. Panic Recovery ("Let It Crash")

Convert unexpected Go panics into retryable errors, allowing your application to reset to a known good state without a hard crash.

// val will succeed even if the first attempt panics
val, err := resile.Do(ctx, riskyAction, 
    resile.WithPanicRecovery(),
)
12. Fast Unit Testing

Never let retry timers slow down your CI. Use WithTestingBypass to make all retries execute instantly.

func TestMyService(t *testing.T) {
    ctx := resile.WithTestingBypass(context.Background())
    
    // This will retry 10 times instantly without sleeping.
    err := service.Handle(ctx)
}
13. Reusable Clients & Dependency Injection

Use resile.New() to create a Retryer interface for cleaner code architecture and easier testing.

// Create a reusable resilience strategy
retryer := resile.New(
    resile.WithMaxAttempts(3),
    resile.WithBaseDelay(200 * time.Millisecond),
)

// Use the interface to execute actions
err := retryer.DoErr(ctx, func(ctx context.Context) error {
    return service.Call(ctx)
})
14. Marking Errors as Fatal

Sometimes you know an error is terminal and shouldn't be retried (e.g., "Invalid API Key"). Use resile.FatalError() to abort the retry loop immediately.

err := resile.DoErr(ctx, func(ctx context.Context) error {
    err := client.Do()
    if errors.Is(err, ErrAuthFailed) {
        return resile.FatalError(err) // Stops retries immediately
    }
    return err
})
15. Resilient State Machines

Build self-healing state machines where every transition is protected by resilience policies, inspired by Erlang's gen_statem.

// Create a state machine: initialState, initialData, transitionFunc, opts
sm := resile.NewStateMachine(Disconnected, data, transition, resile.WithMaxAttempts(3))

// Transitions are protected by retries, circuit breakers, etc.
err := sm.Handle(ctx, Connect)
16. Custom Error Filtering

Control which errors trigger a retry using WithRetryIf (for exact matches) or WithRetryIfFunc (for custom logic like checking status codes).

err := resile.DoErr(ctx, action,
    // Only retry if the error is ErrConnReset
    resile.WithRetryIf(ErrConnReset),
    
    // OR use a custom function for complex logic
    resile.WithRetryIfFunc(func(err error) bool {
        return errors.Is(err, ErrTransient) || isTimeout(err)
    }),
)

Configuration Reference

Option Description Default
WithName(string) Identifies the operation in logs/metrics. ""
WithMaxAttempts(uint) Total number of attempts (initial + retries). 5
WithBaseDelay(duration) Initial backoff duration. 100ms
WithMaxDelay(duration) Maximum possible backoff duration. 30s
WithBackoff(Backoff) Custom backoff algorithm (e.g. constant). Full Jitter
WithHedgingDelay(duration) Delay before speculative retries. 0
WithRetryIf(error) Only retry if errors.Is(err, target). All non-fatal
WithRetryIfFunc(func) Custom logic to decide if an error is retriable. nil
WithCircuitBreaker(cb) Attaches a circuit breaker state machine. nil
WithAdaptiveBucket(b) Attaches a token bucket for adaptive retries. nil
WithInstrumenter(inst) Attaches telemetry (slog/OTel/Prometheus). nil
WithFallback(f) Sets a generic fallback function. nil
WithFallbackErr(f) Sets a fallback function for error-only actions. nil
WithPanicRecovery() Enables "Let It Crash" panic handling. false

Architecture & Design

Resile is built for high-performance, concurrent applications:

  • Memory Safety: Uses time.NewTimer with proper cleanup to prevent memory leaks in long-running loops.
  • Context Integrity: Every internal sleep is a select between the timer and ctx.Done().
  • Zero Allocations: Core execution loop is designed to be allocation-efficient.
  • Errors are Values: Leverage standard errors.Is and errors.As for all policy decisions.

Acknowledgements

  • AWS Architecture Blog: For the definitive Exponential Backoff and Jitter algorithm (Full Jitter).
  • Stamina & Tenacity: For pioneering ergonomic retry APIs in the Python ecosystem that inspired the design of Resile.

License

Resile is released under the MIT License.

Copyright (c) 2026 Onur Cinar.
The source code is provided under MIT License.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Do

func Do[T any](ctx context.Context, action func(context.Context) (T, error), opts ...Option) (T, error)

Do executes an action with retry logic using the provided options. This generic function handles functions returning (T, error).

func DoErr

func DoErr(ctx context.Context, action func(context.Context) error, opts ...Option) error

DoErr executes an action with retry logic using the provided options. This function handles functions returning only error.

func DoErrHedged

func DoErrHedged(ctx context.Context, action func(context.Context) error, opts ...Option) error

DoErrHedged executes an action using speculative retries (hedging).

func DoErrState

func DoErrState(ctx context.Context, action func(context.Context, RetryState) error, opts ...Option) error

DoErrState executes a stateful action with retry logic using the provided options.

func DoErrStateHedged

func DoErrStateHedged(ctx context.Context, action func(context.Context, RetryState) error, opts ...Option) error

DoErrStateHedged executes a stateful action with speculative retries (hedging).

func DoHedged

func DoHedged[T any](ctx context.Context, action func(context.Context) (T, error), opts ...Option) (T, error)

DoHedged executes an action using speculative retries (hedging). It starts multiple attempts concurrently if previous ones take too long.

func DoState

func DoState[T any](ctx context.Context, action func(context.Context, RetryState) (T, error), opts ...Option) (T, error)

DoState executes a stateful action with retry logic using the provided options. The RetryState is passed to the closure, allowing it to adapt to failure history.

func DoStateHedged

func DoStateHedged[T any](ctx context.Context, action func(context.Context, RetryState) (T, error), opts ...Option) (T, error)

DoStateHedged executes a stateful action with speculative retries (hedging).

func FatalError

func FatalError(err error) error

FatalError wraps an error to indicate that the retry loop should terminate immediately.

func WithTestingBypass

func WithTestingBypass(ctx context.Context) context.Context

WithTestingBypass returns a new context that signals the retry loop to skip all sleep delays. This is intended for use in unit tests to prevent CI pipelines from being slowed down by backoff.

Types

type AdaptiveBucket

type AdaptiveBucket struct {
	// contains filtered or unexported fields
}

AdaptiveBucket implements a client-side token bucket rate limiter for retries. It protects downstream services from thundering herds by depleting tokens when retries occur and refilling them on successful responses. This allows a fleet of clients to quickly cut off traffic to a degraded system, providing macro-level protection.

func DefaultAdaptiveBucket

func DefaultAdaptiveBucket() *AdaptiveBucket

DefaultAdaptiveBucket creates a new AdaptiveBucket with standard defaults (max capacity: 500, retry cost: 5, success refill: 1).

func NewAdaptiveBucket

func NewAdaptiveBucket(maxCapacity, retryCost, successRefill float64) *AdaptiveBucket

NewAdaptiveBucket creates a new AdaptiveBucket with the specified configuration. A common AWS-style configuration is maxCapacity=500, retryCost=5, successRefill=1.

func (*AdaptiveBucket) AcquireRetryToken

func (b *AdaptiveBucket) AcquireRetryToken() bool

AcquireRetryToken attempts to consume a token for a retry. Returns true if a token was successfully consumed, false if the bucket is empty.

func (*AdaptiveBucket) AddSuccessToken

func (b *AdaptiveBucket) AddSuccessToken()

AddSuccessToken adds a fraction of a token back to the bucket upon a successful response.

type Backoff

type Backoff interface {
	// Next calculates the duration to wait before the specified attempt.
	// The attempt parameter is 0-indexed.
	Next(attempt uint) time.Duration
}

Backoff defines the interface for temporal distribution of retries.

func NewFullJitter

func NewFullJitter(base, cap time.Duration) Backoff

NewFullJitter returns a Backoff implementation using the AWS Full Jitter algorithm. The base duration dictates the initial delay, and the cap defines the absolute maximum.

type Config

type Config struct {
	Name           string
	MaxAttempts    uint
	BaseDelay      time.Duration
	MaxDelay       time.Duration
	HedgingDelay   time.Duration
	Backoff        Backoff
	Policy         *retryPolicy
	Instrumenter   Instrumenter
	CircuitBreaker *circuit.Breaker
	Fallback       any
	AdaptiveBucket *AdaptiveBucket
	RecoverPanics  bool
}

Config represents the configuration for the retry execution.

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns a reasonable production-grade configuration.

func (*Config) Do

func (c *Config) Do(ctx context.Context, action func(context.Context) (any, error)) (any, error)

Do satisfies the Retryer interface. Note: returns any for interface compliance.

func (*Config) DoErr

func (c *Config) DoErr(ctx context.Context, action func(context.Context) error) error

DoErr satisfies the Retryer interface.

func (*Config) DoErrHedged

func (c *Config) DoErrHedged(ctx context.Context, action func(context.Context) error) error

DoErrHedged satisfies the Retryer interface.

func (*Config) DoHedged

func (c *Config) DoHedged(ctx context.Context, action func(context.Context) (any, error)) (any, error)

DoHedged satisfies the Retryer interface.

type Instrumenter

type Instrumenter interface {
	// BeforeAttempt is called before each execution attempt.
	// It can return a new context (e.g., to inject trace spans).
	BeforeAttempt(ctx context.Context, state RetryState) context.Context
	// AfterAttempt is called after each execution attempt.
	AfterAttempt(ctx context.Context, state RetryState)
}

Instrumenter defines the lifecycle hooks for monitoring retry executions. It is a zero-dependency interface to allow custom implementations for logging, metrics, and tracing.

type Option

type Option func(*Config)

Option defines a functional option for configuring a retry execution.

func WithAdaptiveBucket

func WithAdaptiveBucket(bucket *AdaptiveBucket) Option

WithAdaptiveBucket sets a token bucket for adaptive retries. The bucket should be shared across multiple executions to protect downstream services globally.

func WithBackoff

func WithBackoff(backoff Backoff) Option

WithBackoff sets a custom backoff algorithm.

func WithBaseDelay

func WithBaseDelay(delay time.Duration) Option

WithBaseDelay sets the initial delay for the backoff algorithm.

func WithCircuitBreaker

func WithCircuitBreaker(cb *circuit.Breaker) Option

WithCircuitBreaker integrates a circuit breaker into the retry execution.

func WithFallback

func WithFallback[T any](f func(context.Context, error) (T, error)) Option

WithFallback sets a function to be called if all retries are exhausted or if the circuit breaker is open. T must match the return type of the retry action.

func WithFallbackErr

func WithFallbackErr(f func(context.Context, error) error) Option

WithFallbackErr sets a fallback function for operations that only return an error.

func WithHedgingDelay

func WithHedgingDelay(delay time.Duration) Option

WithHedgingDelay sets the delay for speculative retries (hedging). If a response doesn't arrive within this delay, another attempt is started concurrently.

func WithInstrumenter

func WithInstrumenter(instr Instrumenter) Option

WithInstrumenter sets a telemetry instrumenter.

func WithMaxAttempts

func WithMaxAttempts(attempts uint) Option

WithMaxAttempts sets the maximum number of execution attempts.

func WithMaxDelay

func WithMaxDelay(delay time.Duration) Option

WithMaxDelay sets the maximum delay for the backoff algorithm.

func WithName

func WithName(name string) Option

WithName sets the name for the operation. This is used in telemetry labels.

func WithPanicRecovery added in v1.0.1

func WithPanicRecovery() Option

WithPanicRecovery enables recovering from panics during execution. If a panic occurs, it is converted into a PanicError and treated as a retryable error.

func WithRetryIf

func WithRetryIf(target error) Option

WithRetryIf sets a specific error to trigger a retry.

func WithRetryIfFunc

func WithRetryIfFunc(f func(error) bool) Option

WithRetryIfFunc sets a custom function to determine if an error should be retried.

type PanicError added in v1.0.1

type PanicError struct {
	Value      any
	StackTrace string
}

PanicError represents a recovered panic during execution.

func (*PanicError) Error added in v1.0.1

func (p *PanicError) Error() string

Error implements the error interface.

type RetryAfterError

type RetryAfterError interface {
	error
	RetryAfter() time.Duration
	CancelAllRetries() bool
}

RetryAfterError is implemented by errors that specify how long to wait before retrying. This is commonly used with HTTP 429 (Too Many Requests) or 503 (Service Unavailable) to respect Retry-After headers. It also supports pushback signals to cancel all retries.

type RetryState

type RetryState struct {
	// Name is the optional identifier for the operation being retried.
	Name string
	// Attempt is the current 0-indexed retry iteration.
	Attempt uint
	// MaxAttempts is the maximum number of attempts allowed.
	MaxAttempts uint
	// LastError is the error encountered in the previous attempt.
	LastError error
	// TotalDuration is the cumulative time spent across all attempts and sleeps.
	TotalDuration time.Duration
	// NextDelay is the duration to be slept before the next attempt.
	NextDelay time.Duration
}

RetryState encapsulates the current state of a retry execution.

type Retryer

type Retryer interface {
	// Do executes a function that returns a value and an error.
	Do(ctx context.Context, action func(context.Context) (any, error)) (any, error)
	// DoHedged executes a function using speculative retries (hedging).
	DoHedged(ctx context.Context, action func(context.Context) (any, error)) (any, error)
	// DoErr executes a function that returns only an error.
	DoErr(ctx context.Context, action func(context.Context) error) error
	// DoErrHedged executes a function using speculative retries (hedging).
	DoErrHedged(ctx context.Context, action func(context.Context) error) error
}

Retryer defines the interface for executing actions with resilience.

func New

func New(opts ...Option) Retryer

New returns a new Retryer pre-configured with the provided options. This is useful for dependency injection and reusable resilience clients.

type StateMachine added in v1.0.2

type StateMachine[S any, D any, E any] struct {
	// contains filtered or unexported fields
}

StateMachine represents a resilient state machine inspired by Erlang's gen_statem. Every state transition is protected by the configured resilience policies.

func NewStateMachine added in v1.0.2

func NewStateMachine[S any, D any, E any](initialState S, initialData D, transition TransitionFunc[S, D, E], opts ...Option) *StateMachine[S, D, E]

NewStateMachine creates a new resilient state machine with the provided initial state, data, and transition function.

func (*StateMachine[S, D, E]) GetData added in v1.0.2

func (sm *StateMachine[S, D, E]) GetData() D

GetData returns the current data of the state machine.

func (*StateMachine[S, D, E]) GetState added in v1.0.2

func (sm *StateMachine[S, D, E]) GetState() S

GetState returns the current state of the state machine.

func (*StateMachine[S, D, E]) Handle added in v1.0.2

func (sm *StateMachine[S, D, E]) Handle(ctx context.Context, event E) error

Handle processes an event and performs a state transition. The transition is executed within a resilience envelope (Retries, Circuit Breakers, etc.). If the transition succeeds, the state machine updates its internal state and data.

type TransitionFunc added in v1.0.2

type TransitionFunc[S any, D any, E any] func(ctx context.Context, state S, data D, event E, rs RetryState) (S, D, error)

TransitionFunc defines the signature for a state transition function. It takes the current state, data, and event, and returns the next state and data. It also receives the current RetryState to allow transitions to adapt to failure history.

Directories

Path Synopsis
examples
adaptiveretry command
basic command
circuitbreaker command
fallback command
hedging command
http command
panicrecovery command
pushback command
stateful command
statemachine command
telemetry

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL