
Navigating the Serverless Landscape: Lessons from our Tracing Collector API Journey

Drawing parallels with the Prime Video case study, we share our experience, the challenges we faced, and the lessons we learned while implementing our tracing collector API.

In the previous blog in this series, we delved into the redesigned architecture of Amazon Prime Video and how they integrated different architectural styles for optimal performance and cost efficiency. We also discussed the impact of Amazon’s decision on the concept of a “serverless-first” mindset, highlighting the importance of considering alternative architectural approaches based on specific use cases and requirements.  

In this installment, we turn the spotlight on our journey with the implementation of our tracing collector API, drawing parallels with the Prime Video case study. We will share our experience, the challenges we faced, and the lessons we learned. Furthermore, we aim to address critical questions around the scalability of AWS Lambda, the applicability of serverless architectures in different scenarios, and the importance of monitoring in any architectural setup.  

Our Experience with Serverless Components: The Tracing Collector API Case Study  

Our journey with implementing our tracing collector API bears a striking resemblance to Prime Video’s experience. The initial implementation, based on AWS Lambda and AWS API Gateway, allowed us to deploy the first version swiftly and gather user feedback promptly.  

As we worked to productize this initial version, we ran into some challenges:  

  • We developed the collector Lambda function in Java, running on the JVM, and encountered high cold-start delays. As the number of users and incoming requests grew, cold starts became rare relative to the total number of requests, but adding 3-4 seconds of latency to user telemetry requests was unacceptable, even if it happened rarely. Today, AWS Lambda offers features such as "Provisioned Concurrency" and "SnapStart" to address this problem, but they didn't exist at the time. Instead, we developed an in-house tool (available as open source) that keeps Lambda instances from being destroyed by periodically triggering the functions with fake, empty requests; a sketch of this pattern appears after this list. With this tool, we reduced the incidence of cold starts by several orders of magnitude. They still occasionally happened, however, so the result wasn't satisfactory for our platform, though it might be for other applications.
  • Another problem we encountered was cost. As our user base grew, so did the number of requests hitting our collector API, and with it the AWS API Gateway bill, since API Gateway charges per request. In addition, the collector often performs I/O-bound operations (for example, sending telemetry data to a Kinesis stream), so a significant part of each Lambda execution was spent waiting for I/O to finish. This matters because a Lambda instance handles only one request at a time: even though the CPU in the instance was idle, we were paying for compute time while waiting for the I/O response. The second sketch below illustrates this pattern.
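For illustration, here is a minimal sketch of the warm-up pattern described in the first bullet, assuming a Java handler that recognizes a synthetic "warmup" flag in the payload. The class name and the flag are hypothetical; this is not the actual open-source tool, just the shape of the trick:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

// Hypothetical collector handler: a scheduler invokes the function with a
// synthetic {"warmup": true} payload, and the handler returns early, so the
// JVM container stays alive without touching any downstream services.
public class CollectorHandler implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // Short-circuit keep-alive pings before any real work happens.
        if (event != null && Boolean.TRUE.equals(event.get("warmup"))) {
            return "warmed";
        }

        // ... real telemetry collection logic would run here ...
        return "ok";
    }
}
```

Because each warm-up invocation finishes in milliseconds, the keep-alive traffic costs very little compared with the cold starts it prevents.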
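The second bullet's cost problem can be sketched the same way. In the hypothetical snippet below, a synchronous put to Kinesis blocks the single in-flight request, so billed Lambda duration accumulates while the CPU sits idle waiting on the network round-trip. The stream name and partition key are placeholders:

```java
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;

// Illustrative forwarder: Lambda bills for the whole handler duration,
// including the time this thread spends blocked on the Kinesis call.
public class TelemetryForwarder {
    private final KinesisClient kinesis = KinesisClient.create();

    public void forward(byte[] telemetryPayload) {
        // The thread blocks here until Kinesis responds; the CPU is idle,
        // but the function is still accruing billed compute time.
        kinesis.putRecord(PutRecordRequest.builder()
                .streamName("telemetry-stream")   // placeholder name
                .partitionKey("collector")        // placeholder key
                .data(SdkBytes.fromByteArray(telemetryPayload))
                .build());
    }
}
```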

For these reasons (cold-start delay, API Gateway request cost, and idle-CPU cost in AWS Lambda), we decided to migrate our collector API from a serverless to a non-serverless architecture. We redesigned the collector API by replacing AWS API Gateway with an AWS Application Load Balancer (ALB) and AWS Lambda with AWS Elastic Beanstalk.

After performing this transformation, the rest of our platform (especially the asynchronous processing parts, which weren't directly user-facing) remained serverless for years and scaled well with no cost issues.

Starting the collector with a serverless architecture allowed us to release our product earlier, which was an essential requirement, and to gather crucial early feedback from customers. If we encounter a similar scenario again, we will probably follow the serverless-first mindset and start with a serverless architecture once more, redesigning it later if necessary.

Key takeaways  

Our journey with implementing our tracing collector API illuminated various challenges, including cold-start delays and increased costs associated with using AWS API Gateway and AWS Lambda. These challenges were compounded as our user base and, consequently, the volume of incoming requests to our collector API expanded. Despite the undoubted advantages of a serverless architecture, we recognized that, in our particular case, transitioning to a non-serverless architecture provided more efficient solutions to these challenges. This led us to ask an important question: Does AWS Lambda have scalability issues? Answering that question will form the basis for the next installment in this series.  
