Resile is a production-grade execution resilience and retry library for Go, inspired by Python's stamina. It provides a type-safe, ergonomic, and highly observable way to handle transient failures in distributed systems.
- Installation
- Why Resile?
- Examples
- Common Use Cases
- Configuration Reference
- Architecture & Design
- License
```shell
go get github.com/cinar/resile
```

In distributed systems, transient failures are inevitable. Resile makes the correct way to retry the default:
- AWS Full Jitter: Uses the industry-standard algorithm to prevent "thundering herd" synchronization.
- Generic-First: No `interface{}` or reflection. Full compile-time type safety.
- Context-Aware: Strictly respects `context.Context` cancellation and deadlines.
- Zero-Dependency Core: The core library only depends on the Go standard library.
- Opinionated Defaults: Sensible production-ready defaults (5 attempts, exponential backoff).
The examples/ directory contains standalone programs showing how to use Resile in various scenarios:
- Basic Retry: Simple `Do` and `DoErr` calls.
- HTTP with Rate Limits: Respecting `Retry-After` headers and using `slog`.
- Fallback Strategies: Returning stale data when all attempts fail.
- Stateful Rotation: Rotating API endpoints using `RetryState`.
- Circuit Breaker: Layering defensive strategies.
Retry a simple operation that only returns an error.
```go
err := resile.DoErr(ctx, func(ctx context.Context) error {
	return db.PingContext(ctx)
})
```

Fetch data with full type safety. The return type is inferred from your closure.
```go
// user is inferred as *User from the closure's return type.
user, err := resile.Do(ctx, func(ctx context.Context) (*User, error) {
	return apiClient.GetUser(ctx, userID)
}, resile.WithMaxAttempts(3))
```

Use `DoState` to access the `RetryState`, so you can rotate endpoints or adjust fallback logic based on the failure history.
```go
endpoints := []string{"api-v1.example.com", "api-v2.example.com"}
data, err := resile.DoState(ctx, func(ctx context.Context, state resile.RetryState) (string, error) {
	// Rotate endpoint based on attempt number
	url := endpoints[state.Attempt%uint(len(endpoints))]
	return client.Get(ctx, url)
})
```

Resile automatically detects if an error implements `RetryAfterError` and overrides the jittered backoff with the server-dictated duration.
```go
type RateLimitError struct {
	WaitUntil time.Time
}

func (e *RateLimitError) Error() string { return "too many requests" }

func (e *RateLimitError) RetryAfter() time.Duration {
	return time.Until(e.WaitUntil)
}

// Resile will sleep exactly until WaitUntil when this error is encountered.
```

Provide a fallback function to handle cases where all retries are exhausted or the circuit breaker is open. This is useful for returning stale data or default values.
```go
data, err := resile.Do(ctx, fetchData,
	resile.WithMaxAttempts(3),
	resile.WithFallback(func(ctx context.Context, err error) (string, error) {
		// Return stale data from cache if the primary fetch fails
		return cache.Get(ctx, key), nil
	}),
)
```

Combine retries (for transient blips) with a circuit breaker (for systemic outages).
```go
import "github.com/cinar/resile/circuit"

cb := circuit.New(circuit.Config{
	FailureThreshold: 5,
	ResetTimeout:     30 * time.Second,
})

// Returns circuit.ErrCircuitOpen immediately if the downstream is failing consistently.
err := resile.DoErr(ctx, action, resile.WithCircuitBreaker(cb))
```

Integrate with `slog` or OpenTelemetry without bloating your core dependencies.
```go
import "github.com/cinar/resile/telemetry/resileslog"

logger := slog.Default()
resile.Do(ctx, action,
	resile.WithName("get-inventory"), // Name your operation for metrics/logs
	resile.WithInstrumenter(resileslog.New(logger)),
)
```

Never let retry timers slow down your CI. Use `WithTestingBypass` to make all retries execute instantly.
```go
func TestMyService(t *testing.T) {
	ctx := resile.WithTestingBypass(context.Background())
	// Retries inside service.Handle run instantly, without sleeping between attempts.
	if err := service.Handle(ctx); err != nil {
		t.Fatalf("Handle failed: %v", err)
	}
}
```

| Option | Description | Default |
|---|---|---|
| `WithName(string)` | Identifies the operation in logs/metrics. | `""` |
| `WithMaxAttempts(uint)` | Total number of attempts (initial + retries). | `5` |
| `WithBaseDelay(duration)` | Initial backoff duration. | `100ms` |
| `WithMaxDelay(duration)` | Maximum possible backoff duration. | `30s` |
| `WithRetryIf(error)` | Only retry if `errors.Is(err, target)`. | All non-fatal |
| `WithRetryIfFunc(func)` | Custom logic to decide if an error is retriable. | `nil` |
| `WithCircuitBreaker(cb)` | Attaches a circuit breaker state machine. | `nil` |
| `WithInstrumenter(inst)` | Attaches telemetry (slog/OTel/Prometheus). | `nil` |
| `WithFallback(f)` | Sets a generic fallback function. | `nil` |
| `WithFallbackErr(f)` | Sets a fallback function for error-only actions. | `nil` |
Resile is built for high-performance, concurrent applications:
- Memory Safety: Uses `time.NewTimer` with proper cleanup to prevent memory leaks in long-running loops.
- Context Integrity: Every internal sleep is a `select` between the timer and `ctx.Done()`.
- Zero Allocations: Core execution loop is designed to be allocation-efficient.
- Errors are Values: Leverage standard `errors.Is` and `errors.As` for all policy decisions.
- AWS Architecture Blog: For the definitive Exponential Backoff and Jitter algorithm (Full Jitter).
- Stamina & Tenacity: For pioneering ergonomic retry APIs in the Python ecosystem that inspired the design of Resile.
Resile is released under the MIT License.
Copyright (c) 2026 Onur Cinar.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.