Fixing retries with token buckets and circuit breakers
Very interesting writeup from Marc Brooker detailing a new (to me) retry strategy, the “retry token bucket”:
Adaptive Retries (aka the retry token bucket). When a client wants to make a call, it makes that call as normal. If it succeeds, it drops part of a token into a limited-size token bucket. If the call fails, retry up to N times as long as there are (whole) tokens in the bucket. For example, each success could deposit 0.1 tokens, and each retry could consume 1 token.
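To make the mechanics concrete, here is a minimal Python sketch of the bucket as described in the quote. It is not Brooker's code: the class name `RetryTokenBucket`, the parameter names, the default capacity of 10, and the choice to start with a full bucket are all assumptions made for illustration. It is also not thread-safe and retries immediately with no backoff. Tokens are tracked as `Fraction`s so that ten deposits of 0.1 add up to exactly one whole token, which floats would not guarantee.

```python
from fractions import Fraction


class RetryTokenBucket:
    """Illustrative retry token bucket; names and defaults are assumptions."""

    def __init__(self, capacity=10, deposit=Fraction(1, 10), retry_cost=1,
                 max_retries=3, initial_tokens=None):
        self.capacity = Fraction(capacity)      # the "limited-size" bucket
        self.deposit = Fraction(deposit)        # tokens earned per success
        self.retry_cost = Fraction(retry_cost)  # tokens spent per retry
        self.max_retries = max_retries          # the "N" in "up to N times"
        # Assumption: start full unless told otherwise.
        start = capacity if initial_tokens is None else initial_tokens
        self.tokens = Fraction(start)

    def call(self, fn):
        """Run fn(), retrying on exception while whole tokens remain."""
        retries = 0
        while True:
            try:
                result = fn()
            except Exception:
                # Out of retries or out of tokens: give up, surface the error.
                if retries >= self.max_retries or self.tokens < self.retry_cost:
                    raise
                self.tokens -= self.retry_cost
                retries += 1
                continue
            # Success: drop part of a token into the bucket, up to capacity.
            self.tokens = min(self.capacity, self.tokens + self.deposit)
            return result
```

With these defaults, a client that is mostly succeeding can retry freely, but during a sustained outage the bucket drains and retries are limited to roughly one per ten successful calls, which caps the extra load the client adds to a struggling service.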
(tags: circuit-breakers distributed-systems distcomp retries retrying token-buckets algorithms reliability marc-brooker)