
Unraveling AWS Lambda: Exploring Scalability and Applicability

Does AWS Lambda have scalability issues? Read on to explore its strengths and potential bottlenecks, see how applicable serverless architecture is to different scenarios, and learn how to monitor it effectively.

In our previous blog, we shared our firsthand experience of implementing a tracing collector API using serverless components. Drawing parallels with Amazon Prime Video’s architectural redesign, we discussed the challenges we encountered, such as cold-start delays and increased costs, which prompted us to transition to a non-serverless architecture for more efficient solutions.  

In this final part of our blog series on microservices and serverless architectures, we probe deeper into one of the central questions that arose from our experience: does AWS Lambda have scalability issues? We'll examine the intricacies of AWS Lambda's scalability, its strengths, and its potential bottlenecks. We will also consider the applicability of serverless architectures to different scenarios, and highlight the importance of monitoring in any architectural setup and how you can achieve it in a serverless world. Let's dive in.

Does AWS Lambda Have Scalability Issues?  

AWS Lambda applies two types of concurrency limits: a general account-level limit and a function-level limit.  

  • Account-level Concurrency Limit: This is the total concurrent executions limit across all functions within a single region for an AWS account. The default limit is 1000 concurrent executions per region, but this can be increased to 10K by requesting a limit increase from AWS.
  • Function-level Concurrency Limit (or Reserved Concurrency): You can set a specific concurrency limit for a single function within your account. This is useful for ensuring that a particular function does not consume all of your account's available concurrency. By reserving concurrency for a function, you ensure it always has the capacity to run a specific number of executions simultaneously (see the sketch after this list).
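To make this concrete, here is a minimal sketch using boto3 of how you might reserve concurrency for a single function. The function name and the limit of 100 are assumptions for illustration, not values from any particular deployment:

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve 100 concurrent executions for this function. The function can
# never exceed this number, and the account's unreserved pool shrinks
# by the same amount.
lambda_client.put_function_concurrency(
    FunctionName="my-function",  # hypothetical function name
    ReservedConcurrentExecutions=100,
)

# Inspect the current setting.
response = lambda_client.get_function_concurrency(FunctionName="my-function")
print(response.get("ReservedConcurrentExecutions"))
```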

Additionally, AWS offers a feature known as "Provisioned Concurrency". It keeps functions initialized and ready to respond immediately, avoiding cold-start delays (Lambda container initialization plus Lambda function user code initialization). This is particularly helpful for applications with predictable traffic patterns.
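If your traffic is predictable, Provisioned Concurrency can be configured with a single API call. Here is a minimal sketch using boto3; the function name, alias, and instance count are assumptions:

```python
import boto3

lambda_client = boto3.client("lambda")

# Provisioned Concurrency applies to a published version or an alias,
# never to $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",          # hypothetical function name
    Qualifier="prod",                    # hypothetical alias
    ProvisionedConcurrentExecutions=50,  # instances kept initialized
)
```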

All of this means that if you need to handle more than 10K concurrent executions, AWS Lambda might not fit your case, and you may need to consider containers instead – but first, decide whether you genuinely need concurrency this high. The two main approaches to reducing the concurrency requirements of a single serverless application are:

  • Consider batch processing, especially if you have an event-driven architecture (for example, trigger Lambda from a Kinesis stream with batches of up to 10K records, or from an SQS queue with batches of up to 10K messages; see the sketch after this list). The trade-off is slightly more complex logic and slightly delayed processing. You can also add logic to minimize delays, especially if the traffic patterns are unpredictable.
  • If batching is not an option in your architecture, you may consider running AWS Lambda functions in multiple regions or even multiple AWS accounts. This option is more of a workaround – it doesn't provide the clean scalability that an application with that many users probably needs.
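As an illustration of the batching approach, here is a minimal sketch using boto3 that maps an SQS queue to a Lambda function with large batches; the queue ARN, function name, and batching window are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")

# One invocation now handles up to 10K messages instead of one each.
# For SQS, a BatchSize above 10 requires a maximum batching window.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue",  # placeholder ARN
    FunctionName="my-function",                                    # placeholder name
    BatchSize=10000,                    # up to 10K messages per invocation
    MaximumBatchingWindowInSeconds=30,  # caps the extra latency batching adds
)
```

Note the trade-off in the window setting: a longer window yields fuller batches and lower concurrency, while a shorter one keeps the added processing delay small.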

In addition to concurrency limits, AWS Lambda uses two types of concurrency controls for your function: "reserved concurrency" and "unreserved concurrency". The unreserved concurrency pool, which has an account-level limit, is shared among functions that don't have a specified level of reserved concurrency. Additionally, AWS controls burst concurrency depending on regional capacity. Even with these limits, Lambda functions scale much better than other compute options on AWS.

Check out this blog post to understand how AWS Lambda's scaling speed compares to other AWS compute services: Scaling containers on AWS in 2022.

Does Serverless Make Sense for Every Case?

While AWS Lambda is a powerful tool for many applications, there are certain situations where it might not be the most suitable choice:  

  • Long-Running Processes: AWS Lambda functions have a maximum execution limit (currently 15 minutes). Therefore, tasks that require longer processing times may not be suitable for Lambda and might be better handled by an EC2 instance or a container on ECS or EKS.
  • Stateful Applications: AWS Lambda is designed for stateless computing. If your application requires a persistent state between invocations or long-lived connections, it may not be a good fit for Lambda.
  • High-Performance Computing: Lambda may not be the best fit if your workload requires high-performance computing with powerful processing capabilities, such as graphics-intensive applications.
  • Large File Processing: AWS Lambda has a limit on the deployment package size and also on the ephemeral disk capacity (“/tmp” space). If your application needs to process huge files that exceed these limits, it might be better suited to a different compute service.
  • Real-Time, Multiplayer Gaming: AWS Lambda may not be the best choice for real-time, multiplayer games which require persistently open connections and low latency communications.

Remember to evaluate your specific use case and requirements before deciding whether AWS Lambda or another service is the best fit, because there are no silver bullets. As economist Thomas Sowell wrote in A Conflict of Visions: Ideological Origins of Political Struggles, "There are no solutions. There are only trade-offs."

Monitoring serverless applications  

Whenever designing any application, ask yourself these questions from the start: when something goes wrong, how will I know that something is wrong? How will I know what is wrong? How will I know what to do to fix it? And how can I verify that the fix worked? Remember that potential issues can be part of your application ("bugs"), can be caused by an interaction between your application and another application or service (an overloaded upstream queue? an unhandled exception from a service provider?), or can be caused by a service you may not even know you rely on (an ISP failure between your application and your users?).

At Catchpoint, we always start monitoring by measuring what the customer experiences. This part doesn't change between serverless and traditional architectures – a web browser connecting to an application still makes a DNS lookup, a request to the application URL, individual requests for each component, and so on. With monolithic applications, a single parent service may handle all of the requests, so this service can keep track of them all. What's different with serverless is that the individual requests may each be processed by an entirely different function. The monitoring mechanism needs to ensure that a request handled by one function can be correlated with a request handled by another function.
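One simple way to achieve this correlation is to mint an ID at the edge and pass it through every invocation. The sketch below is an illustration under assumptions (the event shape and the downstream function name are hypothetical), not a prescribed pattern:

```python
import json
import uuid
import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    # Reuse the caller's correlation ID, or mint one at the edge.
    correlation_id = event.get("correlation_id") or str(uuid.uuid4())

    # Include the ID in every log line so a monitoring backend can
    # stitch together requests handled by different functions.
    print(json.dumps({"correlation_id": correlation_id, "msg": "processing"}))

    # Pass the same ID along when invoking the next function in the chain.
    lambda_client.invoke(
        FunctionName="downstream-function",  # hypothetical downstream function
        InvocationType="Event",              # asynchronous invocation
        Payload=json.dumps({"correlation_id": correlation_id}),
    )
    return {"correlation_id": correlation_id}
```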

One nice thing about monitoring serverless applications is that the architecture is already broken down into small components – so tracing the performance of a request through each component is simply a matter of finding the start and stop time of each function. This is the same promise that was made by microservices: they break the overall architecture into pieces so that it's easier to understand what each piece does. Just like the overall system architecture – whether you use functions, microservices, containers, or traditional compute – monitoring should be analyzed from the beginning and planned into the design.
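Capturing those start and stop times can be as lightweight as a decorator around each handler. A minimal sketch, assuming your structured logs are collected and aggregated by a monitoring backend:

```python
import functools
import json
import time

def timed(handler):
    """Log the wall-clock duration of a Lambda handler as structured JSON."""
    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.time()
        try:
            return handler(event, context)
        finally:
            print(json.dumps({
                "function": getattr(context, "function_name", "unknown"),
                "duration_ms": round((time.time() - start) * 1000, 2),
            }))
    return wrapper

@timed
def handler(event, context):
    ...  # business logic goes here
```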

Catchpoint advances microservices and serverless monitoring  

Earlier this year, we announced the acquisition of the assets of Thundra.io, a pioneer in cloud monitoring, a move that strengthens Catchpoint's Application Experience Solution with advanced microservices and API monitoring capabilities, both critical aspects of Internet Performance Monitoring (IPM). Catchpoint engineering teams are currently in the process of seamlessly incorporating Thundra's core features into our industry-leading IPM platform. Soon, Catchpoint users will be able to observe the entire customer experience, from the user to the API call to the server trace.

Navigating an ever-evolving landscape

We’ve conducted a comprehensive exploration of the benefits, trade-offs, and real-world applications of microservices and serverless architectures throughout this blog series. From understanding the rationale behind Amazon Prime Video's strategic architectural redesign to navigating our own journey with the tracing collector API, we unraveled the complexities and lessons learned from integrating serverless components into applications.  

As we conclude this series, we acknowledge that the world of microservices and serverless architectures is ever-evolving, and there are no silver bullets. Each architectural choice comes with its unique set of trade-offs and challenges. By adopting a thoughtful and informed approach, understanding the specific requirements of your applications, and leveraging the power of monitoring and optimization, you can build robust, scalable, and reliable systems that drive superior performance and deliver exceptional user experiences.

Happy building!

