How to avoid DoS and design resilient serverless applications is one of the most common topics we hear when discussing AWS Lambda security with organizations that are in the process of adopting serverless architectures.
In this blog post, I’ll cover the different methods for invoking AWS Lambda functions, why it’s important to be aware of things such as retry behavior and concurrency limits, how attackers can leverage poor application and software design to cause Denial of Service, and what the recommended mitigation strategies are.
The first thing that usually comes to mind when we think of the word “serverless” is scale. One of the biggest advantages of going serverless is that you don’t need to worry about scale or capacity planning anymore. The cloud provider does all the “heavy lifting” for you.
In reality, this is only partially true. When designed correctly, serverless applications are indeed much more resilient to traffic spikes and can easily scale to handle high throughput. However, there are certain limitations you need to be aware of, and best practices you must follow, for that to happen as planned. Otherwise, serverless applications can be just as vulnerable to Denial of Service attacks as any other application.
Lambda functions can be invoked either synchronously or asynchronously. To be clear, a synchronous invocation means that the service or API that invoked the Lambda function is going to wait for the function to finish running. On the other hand, when a Lambda function is invoked asynchronously, the invoker does not wait for a result.
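When you manually invoke a Lambda function (using either the AWS CLI or an AWS SDK), you can specify which invocation type to use: RequestResponse (synchronous, the default), Event (asynchronous) or DryRun (validates the parameters and permissions without running the function).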
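To make the difference concrete, here is a minimal Python sketch built around the boto3 Lambda client, whose invoke call accepts these InvocationType values. The helpers build_invoke_args and invoke, and the function name my-function used below, are hypothetical; the client is passed in as a parameter so the sketch can be exercised without AWS credentials.

```python
import json

VALID_TYPES = ("RequestResponse", "Event", "DryRun")


def build_invoke_args(function_name, payload, invocation_type="RequestResponse"):
    """Build keyword arguments for a boto3 Lambda client's invoke() call.

    RequestResponse: synchronous, wait for the function's result.
    Event: asynchronous, Lambda queues the event and returns immediately.
    DryRun: only validate parameters and permissions.
    """
    if invocation_type not in VALID_TYPES:
        raise ValueError(f"unknown invocation type: {invocation_type}")
    return {
        "FunctionName": function_name,
        "InvocationType": invocation_type,
        "Payload": json.dumps(payload).encode("utf-8"),
    }


def invoke(client, function_name, payload, invocation_type="RequestResponse"):
    """Invoke through a Lambda client, e.g. boto3.client("lambda").

    Returns the HTTP status code: 200 for RequestResponse,
    202 for Event and 204 for DryRun.
    """
    response = client.invoke(**build_invoke_args(function_name, payload, invocation_type))
    return response["StatusCode"]
```

With real credentials, invoke(boto3.client("lambda"), "my-function", {"n": 1}, "Event") should return 202 right away, while the default RequestResponse call blocks until the function finishes and returns 200.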
However, when you use an AWS service as a trigger, the invocation type is predetermined for each service. You have no control over the invocation type that these event sources use when they invoke your Lambda function. Below is a summary table, describing the different services, their invocation types and their behavior upon throttling:
“If the function is invoked synchronously and is throttled, Lambda returns a 429 error and the invoking service is responsible for retries. The ThrottledReason error code explains whether you ran into a function level throttle (if specified) or an account level throttle (see note below). Each service may have its own retry policy.” (AWS Documentation)
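Because the invoker owns the retries, a client that calls a function synchronously usually needs its own backoff logic. Below is a generic Python sketch of capped exponential backoff with full jitter. ThrottledError is a hypothetical stand-in for the 429 error (boto3 surfaces it as TooManyRequestsException, and botocore also applies some retries of its own), and the attempt and delay values are arbitrary assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for Lambda's 429 throttling error."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to our own caller
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

Jitter spreads the retries out so that throttled callers do not all come back at the same instant, and the attempt cap keeps a flood of 429s from turning into an even larger flood of retries.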
Let’s look at API Gateway events as an example of synchronous invocations.
An attacker who can control the number of requests sent to API Gateway can cause throttling and, as a result, Denial of Service. Applications that use synchronous invocations are easier for an attacker to target, since the feedback is immediate and the attacker quickly learns whether the attack is succeeding.
To demonstrate this, let’s run a small test. I created a very simple Lambda function that waits 5 seconds before returning a response, and then used a simple Bash script to run 3 batches of concurrent executions (50, 100 and 150).
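The post does not show the function’s code; a Python handler along these lines would behave as described. The SLEEP_SECONDS environment variable is a hypothetical addition so the same code can later sleep for 5 minutes, and the response follows the API Gateway proxy integration format (statusCode and body). The function’s timeout must be configured longer than the sleep, since Lambda’s default timeout is 3 seconds.

```python
import json
import os
import time


def handler(event, context):
    """Test function: wait, then return an API Gateway proxy-style response.

    SLEEP_SECONDS is a hypothetical environment variable (defaults to 5).
    """
    delay = float(os.environ.get("SLEEP_SECONDS", "5"))
    # Sleeping keeps this execution, and its concurrency slot, busy.
    time.sleep(delay)
    return {
        "statusCode": 200,
        "body": json.dumps({"slept": delay}),
    }
```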
I set a limit of 100 concurrent executions using the reserved concurrency feature (when it is not set, the function can consume the entire account-level limit). As you can see in the metrics below, the third batch of 150 concurrent executions exceeded that limit and was throttled.
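The reserved limit itself can be set from boto3 with put_function_concurrency(FunctionName=..., ReservedConcurrentExecutions=100). To see why only the third batch is throttled without touching AWS, the local Python model below replaces Lambda with a semaphore of 100 slots: an invocation that finds no free slot fails immediately, like a synchronous 429. Throttled, make_limited_function and run_batch are hypothetical names, and the duration is shortened from 5 seconds to 0.5 seconds to keep the run quick.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttled(Exception):
    """Hypothetical stand-in for Lambda's 429 throttling error."""


def make_limited_function(limit, duration):
    """Model a function with `limit` reserved concurrency that runs for `duration` seconds."""
    slots = threading.BoundedSemaphore(limit)

    def invoke(_request_id):
        # No free execution slot: reject immediately, like a synchronous 429.
        if not slots.acquire(blocking=False):
            raise Throttled()
        try:
            time.sleep(duration)  # simulated work holding the slot
        finally:
            slots.release()

    return invoke


def run_batch(n, invoke):
    """Fire n invocations concurrently; return (succeeded, throttled)."""
    def one(i):
        try:
            invoke(i)
            return True
        except Throttled:
            return False

    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(one, range(n)))
    succeeded = sum(results)
    return succeeded, n - succeeded
```

Batches of 50 and 100 fit within the limit and always succeed; in the batch of 150, at least 100 invocations succeed and the overflow (up to 50) is rejected, mirroring the throttles in the metrics.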
The same idea applies to other event sources in the same category. An attacker can take advantage of Cognito PreAuthentication triggers, or mount an attack against a chatbot application by causing throttling through its Lex intent integration.
Asynchronous Invocations
“If your Lambda function is invoked asynchronously and is throttled, AWS Lambda automatically retries the throttled event for up to six hours, with delays between retries. For example, CloudWatch Logs retries the failed batch up to five times with delays between retries. Remember, asynchronous events are queued before they are used to invoke the Lambda function. You can configure a Dead Letter Queue (DLQ) to investigate why your function was throttled” (AWS Documentation)
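A DLQ can be attached with boto3’s update_function_configuration call and its DeadLetterConfig parameter, which takes the ARN of an SQS queue or SNS topic. The Python sketch below only builds and sends that request. dlq_config_args and attach_dlq are hypothetical helpers, the ARN check is a simplification, and the function’s execution role also needs permission to deliver to the target (sqs:SendMessage or sns:Publish).

```python
def dlq_config_args(function_name, target_arn):
    """Build update_function_configuration kwargs that attach a dead-letter queue."""
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = target_arn.split(":")
    if len(parts) < 6 or parts[0] != "arn" or parts[2] not in ("sqs", "sns"):
        raise ValueError("DLQ target must be an SQS queue or SNS topic ARN")
    return {
        "FunctionName": function_name,
        "DeadLetterConfig": {"TargetArn": target_arn},
    }


def attach_dlq(client, function_name, target_arn):
    """Attach the DLQ through a Lambda client, e.g. boto3.client("lambda")."""
    return client.update_function_configuration(
        **dlq_config_args(function_name, target_arn)
    )
```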
Let’s take Amazon S3 as an example. An application in which the user controls how frequently objects are uploaded to the bucket, and therefore how many concurrent executions of the Lambda function are triggered, can potentially be throttled.
I repeated the previous test, this time with the Lambda function triggered by S3: the same scenario, a function that sleeps for 5 seconds, with 3 batches of concurrent events (50, 100 and 150).
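Again, the post does not include the code; a Python handler for the S3 trigger could look like the sketch below. It reads the bucket name and object key from each record of the S3 notification (keys arrive URL-encoded, hence unquote_plus) and then sleeps. s3_handler and SLEEP_SECONDS are hypothetical names. Since S3 invokes the function asynchronously, the return value is not delivered back to S3 and is only useful for local testing.

```python
import os
import time
from urllib.parse import unquote_plus


def s3_handler(event, context):
    """Test function for S3 triggers: collect the uploaded objects, then wait.

    SLEEP_SECONDS is a hypothetical environment variable (defaults to 5).
    """
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 notifications (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"{bucket}/{key}")
    time.sleep(float(os.environ.get("SLEEP_SECONDS", "5")))
    return objects
```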
Have a look at the results. When I executed 150 concurrent S3 events while the function’s limit was 100, all of the events were processed successfully!
That’s the power of the AWS Lambda retry mechanism. The metrics also show 71 throttles; since only 50 of the 150 events could exceed the limit of 100, this means the Lambda service retried some events more than once.
I then ran another test, similar to the third batch of 150 events, but with a sleep time of 5 minutes instead of 5 seconds. Let’s see what happened.
The results show that at first, only 100 events were successfully processed. After 5 minutes another 46 events, and after another 5 minutes, the last 4 events were processed successfully as well. This really demonstrates how events are being retried when the concurrency limit is reached.
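AWS states that Lambda “automatically retries the throttled event for up to six hours…”, which means that a sufficiently long Denial of Service attack can eventually cause data loss: events that are still being throttled when that window runs out are discarded (or sent to the DLQ, if one is configured).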
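The retry behavior behind these results can be approximated with a small discrete-time model in Python. It is a simplification, not Lambda’s actual algorithm: all events arrive at once, a throttled event is retried after a fixed delay (Lambda’s real delays vary), and an event is dropped once its next retry would fall outside the retry window. simulate_async and its parameters are hypothetical, and the model does not reproduce the exact 100/46/4 split observed in the test.

```python
import heapq


def simulate_async(n_events, limit, duration, retry_delay, max_age):
    """Toy model of asynchronous invocations under a concurrency limit (times in seconds).

    Returns (processed, dropped, throttles).
    """
    # (next attempt time, event id, arrival time); every event arrives at t=0.
    queue = [(0, i, 0) for i in range(n_events)]
    heapq.heapify(queue)
    running = []  # min-heap of finish times of in-flight invocations
    processed = dropped = throttles = 0
    while queue:
        now, event_id, arrived = heapq.heappop(queue)
        # Free the slots of invocations that have finished by now.
        while running and running[0] <= now:
            heapq.heappop(running)
        if len(running) < limit:
            heapq.heappush(running, now + duration)
            processed += 1
            continue
        throttles += 1
        if now + retry_delay - arrived > max_age:
            dropped += 1  # retry window exhausted: the event is lost (or goes to a DLQ)
        else:
            heapq.heappush(queue, (now + retry_delay, event_id, arrived))
    return processed, dropped, throttles
```

With a 5-minute function (duration=300), a 60-second retry delay and the full six-hour window (max_age=6 * 3600), all 150 events are eventually processed after 250 throttles. Shrink the window to 2 minutes (max_age=120) and the 50 events over the limit are dropped, which is the data-loss scenario described above.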
Another possible danger with asynchronous invocations, besides throttling, is unexpected application behavior caused by the retry mechanism. If our Lambda functions are invoked more than once for the same event, but we designed and planned for a single execution, the application flow might break.
Poll Based & Stream Based Invocations
“AWS Lambda polls your stream and invokes your Lambda function. When your Lambda function is throttled, Lambda attempts to process the throttled batch of records until the time the data expires. This time period can be up to seven days for Amazon Kinesis. The throttled request is treated as blocking per shard, and Lambda doesn’t read any new records from the shard until the throttled batch of records either expires or succeeds. If there is more than one shard in the stream, Lambda continues invoking on the non-throttled shards until one gets through” (AWS Documentation)
The potential victims here are applications with DynamoDB Streams or Kinesis Streams triggers. An attacker can send a malformed batch of events to the stream (meaning events that will trigger an