Handling billions of invocations – best practices from AWS Lambda

  • Handling billions of invocations – best practices from AWS Lambda

    Good write-up from AWS on how to horizontally scale a multi-tenant async API service. I found this shuffle-sharding-based technique to be a particularly excellent idea:

    Drawing inspiration from the “The Power of Two Random Choices” paper, the Lambda team explored the shuffle-sharding technique for its asynchronous invocations processing. Using this technique, you shuffle-shard tenants into several randomly assigned queues. Upon receiving an asynchronous invocation, you place the message in the queue with the smallest backlog to optimize load distribution. This approach helps to minimize the likelihood of assigning tenants to a busy queue. [....]
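    To make the quoted idea concrete, here is a minimal sketch of my own, not Lambda's actual implementation. It assumes in-memory queues, a hash-seeded shard assignment, and made-up values for `NUM_QUEUES` and `SHARD_SIZE`; the helper names `shard_for` and `enqueue` are hypothetical. Each tenant is mapped to a small, stable subset of queues, and each message goes to whichever of those queues currently has the smallest backlog:

```python
import hashlib
import random
from collections import deque

NUM_QUEUES = 8   # assumed total number of queues (illustrative value)
SHARD_SIZE = 2   # assumed number of queues assigned to each tenant

# In-memory stand-ins for the real queues.
queues = [deque() for _ in range(NUM_QUEUES)]


def shard_for(tenant_id: str) -> list[int]:
    """Deterministically pick SHARD_SIZE distinct queue indexes for a tenant."""
    # Seed a private RNG from a hash of the tenant ID so the same tenant
    # always gets the same pseudo-random subset of queues.
    digest = hashlib.sha256(tenant_id.encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return sorted(random.Random(seed).sample(range(NUM_QUEUES), SHARD_SIZE))


def enqueue(tenant_id: str, message: str) -> int:
    """Append the message to the least-backlogged queue in the tenant's shard."""
    target = min(shard_for(tenant_id), key=lambda i: len(queues[i]))
    queues[target].append((tenant_id, message))
    return target
```

    A noisy tenant can only fill the queues in its own shard, so a tenant that shares just one queue with it still has another queue to fall back on.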

    The shuffle-sharding technique proved remarkably effective. By distributing tenants across shards, the approach ensures that only a very small subset of tenants could be affected by a noisy neighbor. The potential impact is also minimized since each affected tenant maintains access to unaffected queues. As your workloads grow, increasing the number of queues enhances resilience and further reduces the probability of multiple tenants being assigned to the same shard. This significantly lowers the risk of a single point of failure, making shuffle sharding a robust strategy for workload isolation and fault tolerance.
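    A quick back-of-the-envelope check of the "more queues, less overlap" claim, again as my own sketch rather than anything from the article. It assumes each tenant's shard is an independent, uniformly random subset of `shard_size` queues out of `num_queues`, with `shard_size <= num_queues`; the function names are hypothetical:

```python
from math import comb


def p_same_shard(num_queues: int, shard_size: int) -> float:
    """Probability that two tenants are assigned exactly the same shard."""
    return 1 / comb(num_queues, shard_size)


def p_any_overlap(num_queues: int, shard_size: int) -> float:
    """Probability that two tenants share at least one queue."""
    # Complement of the chance that the second tenant's shard is drawn
    # entirely from the queues the first tenant does not use.
    disjoint = comb(num_queues - shard_size, shard_size)
    return 1 - disjoint / comb(num_queues, shard_size)


for n in (8, 32, 100):
    print(n, p_same_shard(n, 2), p_any_overlap(n, 2))
```

    With 2 queues per tenant, 8 queues give 28 possible shards and 100 queues give 4950, so under these assumptions the chance of two tenants landing on an identical shard drops from 1 in 28 to 1 in 4950.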

    Automated Isolation, covered in the next section, is also a neat trick. (via Last Week In AWS)

    Tags: via:lwia sharding architecture services horizontal-scaling shuffle-sharding algorithms load-balancing async queues aws multitenant