Set Retention For CloudWatch Logs Using A Lambda Function

In one of my earlier posts, I demonstrated how to set retention for CloudWatch Logs using a CLI command that can be run from cron or a CI/CD server such as Jenkins.

A user asked if the same could be done from the console or a function.

I am not sure how one would go about setting the retention period from the console for all Log Groups in one go, but a Lambda Function can certainly be used to accomplish this.

In this post, I am going to show how we can set retention for CloudWatch Logs using a Lambda function. A CloudWatch Events rule will be used to trigger the Lambda function.

Background

Amazon provides us with CloudWatch Logs to monitor, store and access logs from various AWS services such as EC2 instances, Route 53, RDS, AWS CloudTrail, AWS VPC Flow Logs, Lambda and many others.

By default, CloudWatch Logs are kept indefinitely and never expire. We are allowed to set a retention period, and at present it can be set to one of a fixed set of values ranging from one day to 10 years.
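Because the retention period has to be one of a fixed set of values, a small helper can round a requested number of days up to the next accepted one. This is only a sketch: the list below is what I understand the PutRetentionPolicy API reference to list at the time of writing, and the helper name is my own, so please check the current documentation before relying on it.

```python
# Retention values (in days) accepted by CloudWatch Logs. This list is an
# assumption taken from the PutRetentionPolicy API reference at the time of
# writing; verify it against the current documentation.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def nearest_allowed_retention(days):
    """Round `days` up to the next accepted value, capped at the maximum.

    Hypothetical helper, not part of boto3 or the AWS CLI.
    """
    if days < 1:
        raise ValueError("retention must be at least one day")
    for allowed in ALLOWED_RETENTION_DAYS:
        if allowed >= days:
            return allowed
    # Anything longer than the largest value is capped at roughly ten years.
    return ALLOWED_RETENTION_DAYS[-1]
```

For example, asking for 10 days would be rounded up to 14, while 7 days is accepted as is.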

One of the big users of CloudWatch Logs is the Lambda service. All logging statements from Lambda are written to CloudWatch Logs. As Lambda usage grows in an account, so will the amount of logs in CloudWatch Logs.

By default, every Lambda function creates a Log Group in CloudWatch Logs that matches its name. As an example, a HelloWorld Lambda function will create a Log Group /aws/lambda/HelloWorld with the retention period set to ‘Never Expire’.
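The naming convention described above is easy to express in code, which is handy if you later want to restrict the retention change to Lambda log groups only. The helper names here are my own and purely illustrative.

```python
# Prefix that Lambda uses for its default log groups.
LAMBDA_LOG_GROUP_PREFIX = "/aws/lambda/"


def lambda_log_group_name(function_name):
    """Return the default log group name for a Lambda function."""
    return LAMBDA_LOG_GROUP_PREFIX + function_name


def is_lambda_log_group(log_group_name):
    """Return True if the log group follows the Lambda naming convention."""
    return log_group_name.startswith(LAMBDA_LOG_GROUP_PREFIX)
```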

In an AWS account with a lot of work going on in the Serverless space, one could end up with many log groups which retain logs indefinitely.

Amazon has given us the ability to change the retention period, so as admins we should periodically review the log groups and set their retention accordingly. However, in a multi-region account, doing this manually every day or every week can be a burden.

Requirements

Make sure you have installed the Serverless Framework on your workstation.

Ensure that the workstation from which you are going to run the Serverless Framework has all the necessary permissions to publish your Lambda function. The workstation should have an ‘IAM Role’ or ‘Access Keys’ configured correctly.

In my case I am using an AWS Profile called ‘automation’ that I have configured previously.

Solution

To create the template for our work, I run the commands as shown below.


sls create -v --template aws-python3 --path LogRetention
Serverless: Generating boilerplate...
Serverless: Generating boilerplate in "/data/sbali/workarea/aws/lambda/LogRetention"
 _______                             __
|   _   .-----.----.--.--.-----.----|  .-----.-----.-----.
|   |___|  -__|   _|  |  |  -__|   _|  |  -__|__ --|__ --|
|____   |_____|__|  \___/|_____|__| |__|_____|_____|_____|
|   |   |             The Serverless Application Framework
|       |                           serverless.com, v1.72.0
 -------'

Serverless: Successfully generated boilerplate for template: "aws-python3"

cd LogRetention

Here you will see two files, serverless.yml and handler.py. We are going to update these files with our code as described below.

The file serverless.yml contains information about our Lambda function and the IAM role that will be needed by the function.

File handler.py is where we will write the code which accomplishes our task.

Let us take a look at the serverless.yml file.

Provider Section

  • The Cloud Provider and runtime
  • The profile that will be used to publish this function. Instead of a profile you can set the AWS Access Keys on the shell.
  • Tags associated with my Lambda Function.
  • The IAM role that will be associated with the Lambda Function when it runs.

Function Section

  • memorySize for the Lambda function.
  • The timeout period for the Lambda Function. If the function takes longer than 10 seconds, it will time out.
  • The CloudWatch Events schedule that runs our function twice a day at 00:01 and 12:01 GMT.
  • enabled: true means that the rule is enabled and the function will run as specified above.
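To make the two sections above more concrete, here is a rough sketch of the same settings as a Python dict that can be written out as JSON (the Serverless Framework also reads a serverless.json file). The service name, stage, region, function name, memory size and timeout come from this post. The runtime version, the tag, the handler name and the exact IAM actions are my assumptions for illustration, so treat this as a starting point and not as the actual file.

```python
import json

# Sketch of the settings described above. Values marked "assumed" or
# "hypothetical" are not taken from the post and should be adjusted.
serverless_config = {
    "service": "cwl",
    "provider": {
        "name": "aws",
        "runtime": "python3.8",  # assumed runtime version
        "stage": "prd",
        "region": "us-east-1",
        "profile": "automation",  # AWS profile used to publish the function
        "tags": {"project": "log-retention"},  # hypothetical tag
        "iamRoleStatements": [
            {
                "Effect": "Allow",
                # Assumed minimum permissions for listing log groups and
                # setting their retention.
                "Action": ["logs:DescribeLogGroups", "logs:PutRetentionPolicy"],
                "Resource": "*",
            }
        ],
    },
    "functions": {
        "logretention": {
            "handler": "handler.lambda_handler",  # hypothetical handler name
            "memorySize": 128,
            "timeout": 10,  # seconds
            "events": [
                # Minute 1 of hours 0 and 12, every day (times are in UTC).
                {"schedule": {"rate": "cron(1 0,12 * * ? *)", "enabled": True}}
            ],
        }
    },
}


def render_config(config):
    """Return the configuration as pretty-printed JSON text."""
    return json.dumps(config, indent=2)
```

Writing the result of render_config(serverless_config) to serverless.json would give the framework an equivalent of the YAML file, although I would normally keep the YAML version since that is what the template generates.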

The Lambda function code is in handler.py and is quite self-explanatory.

A paginator is set up for ‘describe_log_groups’, which retrieves all the Log Groups in your account in the region in which the function is running.

I have set PageSize to 10, but you can increase this; the DescribeLogGroups API returns at most 50 log groups per call.

If a retrieved log group already has a ‘retentionInDays’ key, I ignore it; otherwise I call ‘put_retention_policy’ for the log group and set it to 7 days.
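The logic described above can be sketched roughly as follows. This is a minimal illustration and not necessarily identical to my handler.py: the function names and the returned dictionary are assumptions, while ‘get_paginator’, ‘describe_log_groups’ and ‘put_retention_policy’ are the boto3 calls mentioned in this post. The client is passed in as a parameter so the helper can be exercised without touching AWS.

```python
RETENTION_DAYS = 7  # retention applied to log groups that have none
PAGE_SIZE = 10      # number of log groups requested per API call


def set_missing_retention(logs_client, retention_days=RETENTION_DAYS,
                          page_size=PAGE_SIZE):
    """Set retention on every log group that has none.

    Returns the names of the log groups that were changed.
    """
    updated = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(PaginationConfig={"PageSize": page_size}):
        for group in page.get("logGroups", []):
            if "retentionInDays" in group:
                continue  # a retention period is already set, leave it alone
            logs_client.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=retention_days,
            )
            updated.append(group["logGroupName"])
    return updated


def lambda_handler(event, context):
    """Entry point for Lambda (hypothetical handler name)."""
    # Imported here so the helper above can be tested without boto3 installed.
    import boto3

    updated = set_missing_retention(boto3.client("logs"))
    for name in updated:
        print(name)  # print output ends up in the function's own log group
    return {"updated": updated}
```

Keeping the loop in a separate function from the handler is a design choice on my part; it makes it easy to try the logic locally with a stub client before deploying.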

The code listed here is very basic and meant to be a running example. You should enhance it for running in a production environment.

Deploy

If you are ready to deploy the function, go ahead and run the command as shown next.


sls deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
........
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service cwl.zip file to S3 (639 B)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
........................
Serverless: Stack update finished...
Service Information
service: cwl
stage: prd
region: us-east-1
stack: cwl-prd
resources: 8
api keys:
  None
endpoints:
  None
functions:
  logretention: cwl-prd-logretention
layers:
  None
Serverless: Run the "serverless" command to setup monitoring, troubleshooting and testing.

If there are no errors, you should see output similar to the above.

Most of the time, any errors will be related to access or AWS profile issues. I have listed some links at the end of the article which may help you resolve them.

Testing

You can either wait for the CloudWatch Events rule to trigger your function, or you can go to the Lambda console and create a test event to manually run your function.
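If you prefer a script over the console, the function can also be invoked once with boto3. This is just a sketch of an alternative to the console test event: the function name is the one shown in the deploy output above, and the helper name is my own.

```python
import json

FUNCTION_NAME = "cwl-prd-logretention"  # name shown in the deploy output


def invoke_once(lambda_client, function_name=FUNCTION_NAME):
    """Invoke the function synchronously; return (status code, response body)."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps({}).encode("utf-8"),  # empty test event
    )
    body = response["Payload"].read().decode("utf-8")
    return response["StatusCode"], body
```

To run it against a real account, pass in a Lambda client created from your own profile, for example boto3.Session(profile_name="automation", region_name="us-east-1").client("lambda").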

See my other posts to get an idea of how to do a test run of a Lambda function.

Here is an image showing ‘Never Expire’ setting for the Log Groups.

Here is the output of my test run.


START RequestId: xxxxxxxxx Version: $LATEST
token  None
/aws/lambda/cwl-prd-logretention
token  /aws/lambda/cwl-prd-logretention
END RequestId: 8153a0ba-8bc8-4e1a-a230-29e6f58be40b
REPORT RequestId: xxxxxxxxx	Duration: 395.88 ms	Billed Duration: 400 ms	Memory Size: 128 MB	Max Memory Used: 71 MB	Init Duration: 359.66 ms

The following image shows that the function ran and set the log retention period for log groups which did not have any retention period.

Remove Resources

You may incur AWS charges if you keep this function enabled.

If you no longer wish to keep this function, you can remove the resources that were created by running this command.


sls remove

Further Reading And Improvements

Photo Credit

Chris Ried (Unsplash)
