CloudTrail Monitoring with CloudWatch

Setup with Terragrunt and Terraform

While studying AWS, I challenged myself to build the following CloudTrail monitoring setup with Terraform.

Basically, we create a trail that tracks management events and forwards them to CloudWatch Logs. From the log group, we create metrics for security-related events, such as user creation and deletion and security group changes. We then create CloudWatch alarms with simple thresholds on those metrics, send the alarms to SNS, and notify the user by email.

Tooling and directory structure

Because we first create the base AWS resources and then, in a second step, test by creating and deleting other resources, I use Terragrunt, a thin wrapper around Terraform, to organize the work.

The directory structure looks roughly like this:

The cloudtrail-n-cloudwatch directory is where the audit trail flow is set up. The users and security-groups directories are for testing in step two.

The root terragrunt.hcl is listed below:

The inputs block defines the input values for the access key, secret key, and region. The generate block automatically creates a provider.tf file in each directory that includes the root file; that provider.tf initializes the AWS provider and declares the access key, secret key, and region variables that receive the input values.
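A minimal sketch of a root terragrunt.hcl matching this description is shown here. It is not necessarily the exact file used in this project: the variable names follow the prose, while reading the values from environment variables with get_env (so keys are never committed) and the us-east-1 default region are assumptions.

```hcl
# Root terragrunt.hcl (sketch). Values are read from environment variables
# (an assumption) instead of being hardcoded in the repository.
inputs = {
  access_key = get_env("AWS_ACCESS_KEY_ID")        # fails fast if unset
  secret_key = get_env("AWS_SECRET_ACCESS_KEY")
  region     = get_env("AWS_REGION", "us-east-1")  # assumed default region
}

# Writes provider.tf into every child module that includes this file.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
variable "access_key" { type = string }
variable "secret_key" { type = string }
variable "region"     { type = string }

provider "aws" {
  access_key = var.access_key
  secret_key = var.secret_key
  region     = var.region
}
EOF
}
```

Terragrunt passes the inputs map to Terraform as TF_VAR_* environment variables, which is why the generated file has to declare matching variables.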

In each child directory, create a terragrunt.hcl file with the following content:

It looks for a terragrunt.hcl file in the parent folders, includes it, and then auto-generates the provider.tf file. This initializes the provider configuration without copying the file into every directory by hand.
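A child terragrunt.hcl can be as small as a single include block. This sketch assumes Terragrunt v0.32 or later, which supports labeled include blocks; the label root is an arbitrary name.

```hcl
# terragrunt.hcl in cloudtrail-n-cloudwatch/, users/ and security-groups/ (sketch)
include "root" {
  # Searches upward from this directory's parent and includes the first
  # terragrunt.hcl it finds, pulling in the root inputs and generate block.
  path = find_in_parent_folders()
}
```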

Trail Dependency — S3 Bucket and Log Group

To set up a trail, we first need the S3 bucket it delivers logs to. The s3.tf file creates the bucket and a bucket policy that allows CloudTrail to write to it.
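A sketch of such an s3.tf, following the bucket policy CloudTrail requires: s3:GetBucketAcl on the bucket, and s3:PutObject under AWSLogs/<account-id>/ with the bucket-owner-full-control ACL condition. The bucket name and resource labels are hypothetical placeholders, and force_destroy = true is an assumption that lets terragrunt destroy remove the bucket even when it already holds logs.

```hcl
data "aws_caller_identity" "current" {}

resource "aws_s3_bucket" "cloudtrail" {
  bucket        = "my-cloudtrail-logs-example" # hypothetical; must be globally unique
  force_destroy = true                         # allow destroy even if logs exist
}

data "aws_iam_policy_document" "cloudtrail_bucket" {
  # CloudTrail checks the bucket ACL before it delivers log files.
  statement {
    sid       = "AWSCloudTrailAclCheck"
    effect    = "Allow"
    actions   = ["s3:GetBucketAcl"]
    resources = [aws_s3_bucket.cloudtrail.arn]
    principals {
      type        = "Service"
      identifiers = ["cloudtrail.amazonaws.com"]
    }
  }

  # CloudTrail writes log files under AWSLogs/<account-id>/.
  statement {
    sid       = "AWSCloudTrailWrite"
    effect    = "Allow"
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.cloudtrail.arn}/AWSLogs/${data.aws_caller_identity.current.account_id}/*"]
    principals {
      type        = "Service"
      identifiers = ["cloudtrail.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "s3:x-amz-acl"
      values   = ["bucket-owner-full-control"]
    }
  }
}

resource "aws_s3_bucket_policy" "cloudtrail" {
  bucket = aws_s3_bucket.cloudtrail.id
  policy = data.aws_iam_policy_document.cloudtrail_bucket.json
}
```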

We also forward the logs to a CloudWatch log group, defined as follows:
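The log group itself needs only a name. In this sketch the name and the 7-day retention are assumptions suited to a short-lived lab; without retention_in_days, CloudWatch keeps the logs indefinitely.

```hcl
resource "aws_cloudwatch_log_group" "cloudtrail" {
  name              = "cloudtrail-logs" # hypothetical name
  retention_in_days = 7                 # assumption: short retention for a lab setup
}
```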

Trail Setup

Before we create the trail, we need an IAM role that allows the trail to send events to the CloudWatch log group.

In the role.tf file above, we create a variable that holds the AssumeRole (trust policy) document, then create an IAM role named cloudtrail-roles with that document. After that, we create the permission policy from its JSON content. Lastly, we use the aws_iam_role_policy_attachment resource to attach the policy to the role.

Note that the log group ARN has “:*” appended so that the permission covers all log streams in the log group.
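The role setup described above could be sketched as follows. The role name cloudtrail-roles comes from the text. The variable name, the policy name, and the log group label cloudtrail are hypothetical. Using jsonencode for the permission policy instead of a raw JSON string is a stylistic choice; it produces the same JSON.

```hcl
# Trust policy that lets the CloudTrail service assume the role.
variable "cloudtrail_assume_role_policy" {
  type    = string
  default = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "cloudtrail.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role" "cloudtrail" {
  name               = "cloudtrail-roles"
  assume_role_policy = var.cloudtrail_assume_role_policy
}

# Permission to create log streams and write events into the log group.
resource "aws_iam_policy" "cloudtrail_to_cloudwatch" {
  name = "cloudtrail-to-cloudwatch" # hypothetical name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["logs:CreateLogStream", "logs:PutLogEvents"]
      Resource = "${aws_cloudwatch_log_group.cloudtrail.arn}:*" # all log streams
    }]
  })
}

resource "aws_iam_role_policy_attachment" "cloudtrail" {
  role       = aws_iam_role.cloudtrail.name
  policy_arn = aws_iam_policy.cloudtrail_to_cloudwatch.arn
}
```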

Now we can create the trail:

The depends_on argument makes sure the S3 bucket policy exists before the trail is created, because CloudTrail checks the bucket permissions when the trail is created.
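A sketch of the trail resource, assuming the hypothetical labels cloudtrail for the bucket, bucket policy, log group, role, and policy attachment. The trail name, is_multi_region_trail = true (so global IAM events such as CreateUser are captured regardless of the trail's region), and the extra dependency on the role policy attachment are assumptions beyond what the text describes.

```hcl
resource "aws_cloudtrail" "this" {
  name                          = "management-events-trail" # hypothetical name
  s3_bucket_name                = aws_s3_bucket.cloudtrail.id
  include_global_service_events = true # IAM is a global service
  is_multi_region_trail         = true # assumption, see lead-in

  # CloudTrail expects the ":*" log-stream wildcard on the log group ARN.
  cloud_watch_logs_group_arn = "${aws_cloudwatch_log_group.cloudtrail.arn}:*"
  cloud_watch_logs_role_arn  = aws_iam_role.cloudtrail.arn

  # Record all management events (read and write).
  event_selector {
    read_write_type           = "All"
    include_management_events = true
  }

  depends_on = [
    aws_s3_bucket_policy.cloudtrail,
    aws_iam_role_policy_attachment.cloudtrail, # role must be able to write logs
  ]
}
```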

CloudWatch Metric Filters and Alarms

We use a declarative approach to create the metric filters and alarms in CloudWatch.

First, create the following locals block defining the metric and alarm attributes.

The log events that CloudTrail forwards to the log group are in JSON format, so we can match events with a filter pattern such as { $.eventName = DeleteUser }. Each matching event increments the corresponding metric. The alarm settings for each metric use a simple threshold comparison: the alarm fires when the sum of matching events within a 5-minute period exceeds the threshold.
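A sketch of such a locals block follows. The metric names, the namespace, and the alarm names are hypothetical, chosen to echo the alarm names used later in the testing section. The security-group pattern lists the usual security group API calls. A threshold of 0 combined with a "greater than" comparison means a single matching event within 5 minutes is enough to trigger the alarm.

```hcl
locals {
  # Namespace for the custom metrics (hypothetical name).
  metric_namespace = "CloudTrailMetrics"

  # One entry per security event to watch. Every entry has the same keys and
  # types so the list can later be converted into a map for for_each.
  metrics = [
    {
      name       = "new-user-created"
      pattern    = "{ $.eventName = CreateUser }"
      alarm_name = "new-user-created-alarm"
      period     = 300 # seconds (5 minutes)
      threshold  = 0   # alarm when the 5-minute sum is greater than 0
    },
    {
      name       = "user-deleted"
      pattern    = "{ $.eventName = DeleteUser }"
      alarm_name = "user-deleted-alarm"
      period     = 300
      threshold  = 0
    },
    {
      name       = "security-group-changed"
      pattern    = "{ ($.eventName = CreateSecurityGroup) || ($.eventName = DeleteSecurityGroup) || ($.eventName = AuthorizeSecurityGroupIngress) || ($.eventName = RevokeSecurityGroupIngress) || ($.eventName = AuthorizeSecurityGroupEgress) || ($.eventName = RevokeSecurityGroupEgress) }"
      alarm_name = "security-group-changed"
      period     = 300
      threshold  = 0
    },
  ]
}
```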

The metric filters are created with the following block:

Here the for_each meta-argument is used: the metrics list defined in locals is first converted into a map, and each.value is then used to create the corresponding metric filter.
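The for_each pattern described above can be sketched as follows. It assumes a locals block with a metrics list (each entry having name and pattern keys) and a metric_namespace value, plus a log group resource labeled cloudtrail; all of these names are hypothetical.

```hcl
resource "aws_cloudwatch_log_metric_filter" "this" {
  # Turn the list into a map keyed by metric name; for_each needs a map or set.
  for_each = { for m in local.metrics : m.name => m }

  name           = each.value.name
  pattern        = each.value.pattern
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name

  metric_transformation {
    name      = each.value.name
    namespace = local.metric_namespace
    value     = "1" # each matching log event adds 1 to the metric
  }
}
```

Keying the map by name (rather than by list index) keeps resource addresses stable when entries are added or reordered.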

Similarly, the CloudWatch alarms can be created with:

The alarm_actions argument forwards the alert to SNS, which leads to the next resource.
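A matching alarm sketch reuses the same map and the same hypothetical locals (alarm_name, period, and threshold keys). It assumes an SNS topic labeled alerts. treat_missing_data = "notBreaching" is an assumption so that periods with no matching events do not leave the alarm in INSUFFICIENT_DATA.

```hcl
resource "aws_cloudwatch_metric_alarm" "this" {
  for_each = { for m in local.metrics : m.name => m }

  alarm_name          = each.value.alarm_name
  namespace           = local.metric_namespace
  metric_name         = each.value.name # must match the metric filter's metric name
  statistic           = "Sum"
  period              = each.value.period # 300 seconds = 5 minutes
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = each.value.threshold
  treat_missing_data  = "notBreaching"

  # Send the ALARM state transition to the SNS topic.
  alarm_actions = [aws_sns_topic.alerts.arn]
}
```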

SNS and Email Notification

The SNS topic and its email subscription are defined as follows:
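A minimal sketch of the topic and email subscription. The topic name, the alerts label, and the alert_email variable are hypothetical; the email address would be supplied as an input.

```hcl
variable "alert_email" {
  description = "Email address that receives alarm notifications"
  type        = string
}

resource "aws_sns_topic" "alerts" {
  name = "cloudtrail-alerts" # hypothetical name
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email # SNS sends a confirmation email to this address
}
```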

Create the CloudTrail and CloudWatch Resources

Change to the cloudtrail_n_cloudwatch directory and create the resources with Terragrunt:

Confirm that the resources are created successfully.

Verify that the trail is created.

Verify that the following S3 bucket and folders are created:

Verify that the CloudWatch log group and metric filters are created. The alarms are created as shown below:

Check your email and confirm the SNS topic subscription. SNS does not deliver notifications to an email endpoint until the subscription is confirmed.

Testing

In the users directory, create the IAM users with the following:

Change to this directory and run terragrunt run-all apply to create five users.
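A sketch of a users configuration that creates five IAM users with count. The user names are hypothetical. force_destroy = true is an assumption that lets the later destroy step succeed even if keys or other credentials were added to the users.

```hcl
resource "aws_iam_user" "test" {
  count = 5

  name          = "test-user-${count.index}" # test-user-0 ... test-user-4 (hypothetical)
  force_destroy = true
}
```

Each user creation produces a CreateUser event, so applying this should push the CreateUser metric above the threshold.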

Watch the alarm “new-user-created-alarm” go into the ALARM state; the email notification is then received.

If we create a security group, another alarm, “security-group-changed”, fires.
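A security-groups configuration for this test could be as small as the sketch below. The name and rule are hypothetical, the CIDR is a documentation-only range, and omitting vpc_id means the group is created in the region's default VPC (an assumption that such a VPC exists). Creating it emits CreateSecurityGroup and AuthorizeSecurityGroupIngress events.

```hcl
resource "aws_security_group" "test" {
  name        = "monitoring-test-sg" # hypothetical name
  description = "Used to trigger the security group change alarm"

  ingress {
    description = "SSH from a documentation-only range"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.0/24"]
  }
}
```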

If we destroy the users by running terragrunt run-all destroy, the user-deleted alarm is triggered.

Cloud explorer