Cloud Logging Tips & Tricks: Getting the most value out of your cloud logs

In the cloud, logs are often the only way to get real-time visibility into what's happening, making them critical to any cloud detection and response program.


Cloud logging is a critical piece of any cloud detection and response program. Logs have always been an important source of information for the security operations center, but in the cloud, where the shared responsibility model often limits the access that organizations have to managed services and where infrastructure is managed through centralized APIs, logs are often the only way to get real-time visibility into what’s happening in the cloud environment. 

Managing cloud logs is essential for cloud security, but understanding the world of logging can be complex. Each cloud provider offers different log categories, each with unique names, formats, and configuration options, and ensuring all necessary logs are enabled can be quite difficult. Understanding which logs you need to collect and the relevant use cases for each is often a real challenge for teams.

We’re here to help!  In this guide, we'll introduce you to different log types and unveil some clever tricks we've picked up over the years to optimize logging configuration without straining your budget. So, fasten your seatbelts and join us on this enlightening expedition into cloud logging!

Our cloud logging framework: categorizing and prioritizing log collection 

To make cloud logging more approachable, we’ve developed a framework that categorizes cloud logs by their security use case. The framework takes two approaches. First, we’ve mapped each cloud log source into categories: logs related to the control plane, data access, network, secrets, and compute. Each category includes the most relevant logs for preparing for a cyber attack targeting the resources in that category. For example, to detect data exfiltration targeting S3 buckets, organizations would need to monitor S3 data events in CloudTrail.

Logs by Category

The second approach uses the MITRE ATT&CK cloud matrix, a widely recognized framework for describing attacker tactics and techniques. We map each log category to the relevant ATT&CK techniques. For example, to detect attackers using valid accounts techniques, organizations should ensure they’re monitoring Microsoft Entra logs. By understanding these categories, you can determine which logs are crucial for preparing for or investigating each phase of an attack.

MITRE chart color-coded by log category

And now, let’s dive into each category to better understand why each is useful, the types of logs that are included in each, and practical tips to optimize your log collection.

Category 1: Control

AWS CloudTrail, Azure Activity logs, Microsoft Entra sign-in and audit logs, GCP Audit Logs, Google Workspace audit logs

If you have to prioritize which logs to collect, logs from the control category are the prime choice. These logs are often referred to as management logs, and are invaluable for capturing a wide spectrum of potential threats along the cyber kill chain. While other logs give insight into specific parts of the kill chain, management logs offer the most comprehensive coverage and give insights into all parts of a cyber attack, from initial access to impact.

What do these logs contain? 

They cover control plane identity and resource management activities such as resource and user creation, deletion, and modification. Additionally, they record user and app sign-ins and read activities on resources, including listing all resources and retrieving metadata about specific resources. In all cloud providers, these logs are available by default in the portal/console.

Why should you collect them? 

The management category offers the broadest coverage, providing visibility into multiple aspects of the kill chain. Most significantly, these logs excel at capturing cloud-native initial access and discovery techniques, enabling proactive detection of potential attackers.

What can you detect using the logs?

  • Sign-ins from unusual locations or providers

  • Password brute force

  • Creation of users, roles, VMs, or functions for persistence

  • Enumeration activity across multiple services

  • Unusual resource creations

  • Data destruction

Configuration Tips and Tricks

AWS – CloudTrail logs

  • Use an organization trail. Logs are retained by default in the CloudTrail console for 90 days, and you can create a trail to extend retention or send the logs to a different location. We recommend the organization trail option, which automatically applies to new accounts, ensuring comprehensive logging across all accounts within your organization. It's surprising how easy it is to lose track of newly created accounts lacking any logs, and organization trails are the solution to ensure you maintain visibility (see the sketch after this list).

  • The first trail is free. You can deliver one copy of your ongoing management events to an S3 bucket free of charge using a trail.
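To make the organization-trail tip concrete, here is a minimal boto3 sketch. The trail and bucket names are hypothetical, it must run from the organization's management (or delegated administrator) account, and the bucket needs a policy that allows CloudTrail delivery:

import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="org-trail",                  # hypothetical trail name
    S3BucketName="my-org-cloudtrail-logs",  # hypothetical, pre-existing bucket
    IsMultiRegionTrail=True,           # cover every region, including unused ones
    IsOrganizationTrail=True,          # automatically applies to new member accounts
    EnableLogFileValidation=True,      # detect tampering with delivered log files
)
cloudtrail.start_logging(Name="org-trail")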

Azure – Subscription activity logs and Entra sign-in and audit logs

  • Extend retention. The default retention period for subscription activity logs in the portal is 90 days, while Entra ID audit and sign-in logs are retained for 7 to 30 days, depending on the license. In practice, longer retention is often needed when investigating threats. We highly recommend creating a diagnostic setting to send the logs to an additional destination where they can be retained longer (see the sketch after this list).

  • No read activity. Azure control logs lack visibility into most control plane read activity, making them insufficient for detecting discovery attempts. Although access to data resources can be found in resource-level logs, actions such as listing resources, users, and policies are not logged anywhere.
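As a sketch of the retention tip, the following uses the azure-mgmt-monitor SDK to export subscription activity logs to a Log Analytics workspace. All names and IDs are placeholders, and Entra sign-in/audit logs are exported through a similar tenant-level diagnostic setting:

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
workspace_id = ("/subscriptions/00000000-0000-0000-0000-000000000000"
                "/resourceGroups/rg-logs/providers/Microsoft.OperationalInsights"
                "/workspaces/security-logs")  # placeholder workspace resource ID

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# A diagnostic setting at subscription scope exports the activity log.
client.diagnostic_settings.create_or_update(
    resource_uri=f"/subscriptions/{subscription_id}",
    name="export-activity-log",
    parameters={
        "workspace_id": workspace_id,
        "logs": [
            {"category": "Administrative", "enabled": True},
            {"category": "Security", "enabled": True},
        ],
    },
)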

GCP

  • Audit Logs are highly centralized. Unlike the other providers, GCP centralizes most log data within a single Audit Log instead of separating data into different log streams. Organizations can simply enable different event types within the single audit log stream.

  • Write Activity Logs are free. Audit Logs encompass several types: Admin Activity, Data Access, System Event, and Policy Denied. Admin Activity logs, which cover control plane write activity, are enabled by default and retained for 400 days, which is typically sufficient for most use cases.

  • Enable Read activity. The Admin Activity log only includes control plane write operations. To capture control plane read operations, enable Data Access “ADMIN_READ” logs for all services (see the sketch after this list).

  • Additional Write Activity. It's worth noting that specific changes to resource configurations are logged in the Data Access "DATA_WRITE" logs, but these logs are not enabled by default. We will cover those logs in the upcoming sections.
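Enabling ADMIN_READ for all services is done by editing the auditConfigs section of the project IAM policy. Here is a hedged sketch using the Google API client; the project ID is a placeholder, and note that this read-modify-write replaces existing auditConfigs, so merge rather than overwrite in real use:

from googleapiclient import discovery

PROJECT_ID = "my-project"  # placeholder

crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()

# Turn on control plane read logging for every service.
# (In production, merge with any existing auditConfigs instead of replacing.)
policy["auditConfigs"] = [{
    "service": "allServices",
    "auditLogConfigs": [{"logType": "ADMIN_READ"}],
}]

# updateMask must name auditConfigs, otherwise the field is ignored.
crm.projects().setIamPolicy(
    resource=PROJECT_ID,
    body={"policy": policy, "updateMask": "auditConfigs,bindings,etag"},
).execute()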

Google Workspace

The five types of Google Workspace logs (Admin, Login, SAML, OAuth, Groups) are enabled by default and retained for 6 months in the Admin console.

  • Stream logs to GCP. We recommend streaming Google Workspace logs into GCP audit logs. Streaming logs is available for free and allows the security team to consume them in a single location within the organization.

  • No logs for errors. Note that no log type except Login includes error events. This omission can complicate detecting unauthorized attempts to modify identity resources.

Here is a partial Azure user sign-in log:

{
  "appDisplayName": "Azure Portal",
  "appId": "c44b4083-3bb0-49c1-b47d-974e53cbdf3c",
  "resourceDisplayName": "Windows Azure Service Management API",
  "resourceId": "797f4846-ba00-4fd7-ba43-dac1f8f63013",
  "authenticationRequirement": "multiFactorAuthentication",
  "userDisplayName": "User Name",
  "userId": "11111111-1111-1111-1111-111111111111",
  "userPrincipalName": "user@company.co",
  "ipAddress": "12.23.34.45",
  "location": {
    "city": "New York",
    "countryOrRegion": "US",
    "geoCoordinates": {
      "altitude": null,
      "latitude": 40.74839,
      "longitude": -73.9856
    },
    "state": "New York"
  },
  "signInEventTypes": [
    "interactiveUser"
  ],
  "status": {
    "additionalDetails": null,
    "errorCode": 0,
    "failureReason": "Other."
  },
  "userAgent": "Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion",
  ...

An effective method for detecting compromised user accounts is to analyze sign-in logs like the one above, particularly by identifying sign-ins from unusual countries or providers. Azure simplifies this process by including location information directly within the “location” object in the log.
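To illustrate, here is a minimal Python sketch (not a production detection) that builds a per-user baseline of countries from historical sign-in events shaped like the log above, then flags successful sign-ins from countries never seen for that user:

from collections import defaultdict

def build_country_baseline(events):
    """Map each user to the set of countries seen in successful historical sign-ins."""
    baseline = defaultdict(set)
    for e in events:
        country = e.get("location", {}).get("countryOrRegion")
        # errorCode 0 marks a successful sign-in in Entra ID logs
        if country and e.get("status", {}).get("errorCode") == 0:
            baseline[e["userId"]].add(country)
    return baseline

def flag_unusual_sign_ins(new_events, baseline):
    """Return sign-ins originating from a country not in the user's baseline."""
    flagged = []
    for e in new_events:
        country = e.get("location", {}).get("countryOrRegion")
        if country and country not in baseline.get(e["userId"], set()):
            flagged.append((e["userPrincipalName"], country, e["ipAddress"]))
    return flagged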

Category 2: Data Access

A significant proportion of cloud-focused attacks in recent years have targeted data, including data destruction, exfiltration, and ransomware, as seen in Peach Sandstorm’s password spray campaigns and LAPSUS$’s operations. Therefore, prioritizing efforts to protect company data becomes imperative. To achieve this goal, it's essential to familiarize yourself with the various logs related to data access.

What do these logs contain? 

All cloud providers offer numerous data-related services, such as S3 and RDS in AWS, storage accounts and SQL databases in Azure, Cloud Storage buckets and Cloud SQL in GCP, and more. Each of these services has its own method of collecting data access information. For instance, services used for file storage typically log file retrievals, while database services typically record SQL query executions.

Why should you collect them? 

Monitoring these logs serves as a crucial defense against a multitude of attack scenarios targeting data, including data exfiltration, manipulation, and ransomware attacks. The importance of these logs also becomes clear when investigating the consequences of a misconfiguration or a cyberattack: they play a crucial role in determining whether exposed company data was actually exfiltrated, and, if so, identifying the specific data that was compromised.

What can you detect using the logs?

  • Unusually large amounts of data retrieval, copying, or deletion requests.

  • Abnormalities based on geographic locations.

  • Unusual access from anonymous users.

  • Data destruction.

Configuration Tips and Tricks

AWS

  • Activate CloudTrail data events for data resources. CloudTrail data events cover multiple data-related services, including S3 buckets, DynamoDB, and EMR (they also include some non-data resources, like Lambda functions). They record access to stored data in detail and are configured on a trail using event selectors. Data events can be very high volume; to optimize collection and reduce unnecessary storage costs, use advanced event selectors to collect data events only from chosen resources (see the sketch after this list).

  • Avoid common S3 bucket log duplication. You can also use S3 access logs to track requests made to an Amazon S3 bucket. However, avoid enabling both S3 data events and S3 access logs for the same bucket, especially if you're concerned about costs: the two provide similar information, and the duplicate data rarely justifies the added expense. Generally, we recommend enabling only data events, because they contain richer AWS identity information, are easier to configure, and store all the details needed to understand access to the bucket.

  • Collect Redshift and RDS logs. Redshift logs are enabled per cluster and can be directed to either CloudWatch or S3 for storage. Configuring RDS logs is more complex: they are enabled per cluster or instance using Parameter Groups or Option Groups (depending on the type of RDS). These logs are gathered within the resource and can be sent to CloudWatch for centralized viewing.
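Here is a hedged boto3 sketch of the advanced-event-selector tip from the first bullet: collecting S3 data events only for one sensitive bucket. The trail and bucket names are hypothetical:

import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="org-trail",  # hypothetical trail name
    AdvancedEventSelectors=[{
        "Name": "S3 data events for the sensitive bucket only",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            {"Field": "resources.ARN",
             "StartsWith": ["arn:aws:s3:::customer-data-bucket/"]},  # hypothetical
        ],
    }],
)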

Azure

  • Enable data logs using diagnostic settings. Azure data logs are configured for each resource separately using its Diagnostic settings.

  • Turn on Entra authorization for storage accounts. In storage accounts, if Microsoft Entra authorization isn’t the default authentication type, actions made by Entra ID users appear in the logs with SAS token IDs instead of user names, which makes it extremely hard to attribute actions during an investigation. Enabling Entra authorization by default is also advised, as it ties user actions to their permissions.

GCP

  • Turn on Data Access logs. GCP Data access logs are configured individually for each service, or universally using the "all services" parameter. While enabling logging for all services can be convenient, it's important to note that it can also incur significant costs.
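To scope GCP Data Access logs to a single service rather than "all services," the auditConfigs entry can name the service directly. A short sketch of such an entry, applied with the same setIamPolicy flow shown earlier; the service choice here is illustrative:

# Example auditConfigs entry enabling data access logs for Cloud Storage only.
storage_audit_config = {
    "service": "storage.googleapis.com",
    "auditLogConfigs": [
        {"logType": "DATA_READ"},
        {"logType": "DATA_WRITE"},
    ],
}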

Optimize logging cost for public resources

Another way to optimize data events cost across cloud providers is to disable data read events for public resources. These events tend to be data-intensive and offer lower security value. For example, your organization probably doesn’t need to be collecting read events for an S3 bucket storing static files for a public site. Remember that enabling Write events for these resources remains crucial.
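One way to express "writes only" with CloudTrail advanced event selectors is to filter on the readOnly field. A sketch of such a selector (passed to put_event_selectors as above), with a hypothetical public bucket name:

# Collect only write (non-read) S3 data events for a public static-site bucket.
public_bucket_selector = {
    "Name": "Write-only data events for the public site bucket",
    "FieldSelectors": [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
        {"Field": "readOnly", "Equals": ["false"]},
        {"Field": "resources.ARN",
         "StartsWith": ["arn:aws:s3:::public-site-assets/"]},  # hypothetical
    ],
}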

Partial AWS CloudTrail Data Event: S3 Log Object Retrieval

{
  "awsRegion": "us-east-1",
  "eventCategory": "Data",
  "eventName": "GetObject",
  "eventSource": "s3.amazonaws.com",
  "eventTime": "2023-03-19T17:35:57Z",
  "managementEvent": false,
  "readOnly": true,
  "recipientAccountId": "012345678910",
  "requestParameters": {
    "Host": "aws-cloudtrail-logs.s3.amazonaws.com",
    "bucketName": "aws-cloudtrail-logs",
    "key": "AWSLogs/012345678910/CloudTrail/ImportantLogFile.json.gz"
  },
  "sourceIPAddress": "12.23.34.45",
  "userAgent": "Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion",
  "userIdentity": {
    "accessKeyId": "ASIAABCDEFJHIGKLMN",
    "accountId": "012345678910",
    "arn": "arn:aws:sts::012345678910:assumed-role/AWSRole/BadUser",
  ...

Monitoring buckets containing sensitive company information is crucial, but other buckets also deserve attention. For instance, buckets storing logs are highly valuable to attackers, who can use them to learn the environment and mask their activities as normal behavior. Restrict access to log buckets strictly and monitor them closely for any signs of unusual access.
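As an illustration, a minimal sketch that scans S3 data events shaped like the record above for reads of a log bucket by principals outside an allowlist; the bucket name and role ARN are hypothetical:

LOG_BUCKET = "aws-cloudtrail-logs"  # hypothetical log bucket name
# Hypothetical allowlist: roles that legitimately read the logs (e.g. SIEM ingestion).
ALLOWED_PRINCIPAL_PREFIXES = (
    "arn:aws:sts::012345678910:assumed-role/SiemIngestRole/",
)

def suspicious_log_bucket_reads(data_events):
    """Flag GetObject calls on the log bucket by principals not on the allowlist."""
    hits = []
    for e in data_events:
        if (e.get("eventName") == "GetObject"
                and e.get("requestParameters", {}).get("bucketName") == LOG_BUCKET):
            arn = e.get("userIdentity", {}).get("arn", "")
            if not arn.startswith(ALLOWED_PRINCIPAL_PREFIXES):
                hits.append((arn, e.get("sourceIPAddress"), e.get("eventTime")))
    return hits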

Category 3: Network

Flow logs, DNS logs and Firewall logs

What do these logs contain? 

Network logs provide insight into all communication between resources within the cloud environment and with external entities. Unlike packet capture, these logs primarily store metadata about the communication rather than its content: source and destination IPs, ports, transfer sizes, external domains, and more. Across cloud providers, flow logs capture network communications between virtual machines, DNS logs record DNS queries, and firewall logs record communication permitted or blocked by the firewall.

Why should you collect them? 

Monitoring network logs serves as a critical tool for detecting compute-focused attack attempts. With on-premises attack techniques already well-established among threat actors, many of these methods are now being employed on compute resources hosted in the cloud as well. Communication logs play a significant role in detecting these threats. 

What can you detect using the logs?

  • Password brute force on VMs.

  • Port and IP scans.

  • Communication using unusual ports.

  • Communication with known malicious IPs and Domains.

  • Internal compute resources discovery.

Configuration tips and tricks

AWS

  • Enable VPC Flow Logs. You can create a flow log for a VPC, a subnet, or a network interface and send it to destinations such as S3 buckets, CloudWatch Logs, or Firehose. Although it's advisable to enable flow logs for all VPCs, in practice this can be quite costly, so prioritize critical or internet-facing VPCs to balance monitoring coverage against cost.

  • Add additional fields to Flow Logs. The default VPC flow log format, version 2, lacks valuable information that can be added as custom fields. We recommend adding the following fields to the default field set (a sketch follows the list):

Flow-direction, instance-id, pkt-dst-aws-service, pkt-dstaddr, pkt-src-aws-service, pkt-srcaddr, region, subnet-id, tcp-flags, traffic-path, type, vpc-id
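A hedged boto3 sketch that creates VPC flow logs with the recommended extra fields in a custom log format; the VPC ID and destination bucket are illustrative:

import boto3

ec2 = boto3.client("ec2")

# Custom format: the version-2 defaults plus the recommended extra fields.
log_format = (
    "${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} "
    "${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} "
    "${action} ${log-status} ${flow-direction} ${instance-id} "
    "${pkt-src-aws-service} ${pkt-dst-aws-service} ${pkt-srcaddr} ${pkt-dstaddr} "
    "${region} ${subnet-id} ${tcp-flags} ${traffic-path} ${type} ${vpc-id}"
)

ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],          # hypothetical VPC
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::my-flow-logs-bucket",  # hypothetical bucket
    LogFormat=log_format,
)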

Azure

  • Enable flow logs through Network Watcher. Azure NSG flow logs (and the newer virtual network flow logs) are configured per region through Network Watcher and delivered to a storage account, from which they can be forwarded for analysis.

GCP  

  • Turn on Flow Logs. When you enable VPC Flow Logs, you enable logging for all VMs in a subnet. You can use filters to enable logs for specific VMs.
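A sketch using the google-cloud-compute client to enable flow logs on one subnet; the project, region, and subnet names are placeholders, and the sampling values are just examples of cost tuning:

from google.cloud import compute_v1

PROJECT, REGION, SUBNET = "my-project", "us-central1", "prod-subnet"  # placeholders

client = compute_v1.SubnetworksClient()
subnet = client.get(project=PROJECT, region=REGION, subnetwork=SUBNET)

# Enable flow logs; sampling and aggregation can be tuned to control cost.
subnet.log_config = compute_v1.SubnetworkLogConfig(
    enable=True,
    flow_sampling=0.5,
    aggregation_interval="INTERVAL_5_SEC",
)

# patch() requires the current fingerprint, which get() returned above.
client.patch(project=PROJECT, region=REGION,
             subnetwork=SUBNET, subnetwork_resource=subnet)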

Category 4: Secrets

AWS Secrets Manager and KMS logs, Azure Key Vault logs, and GCP Secret Manager and Cloud KMS logs

What do these logs contain?

Logs in the secrets category contain information about access to secrets and changes to their permissions.

Why should you collect them? 

Secrets are often considered a treasure sought after by attackers, granting them the means to move laterally to other resources. Therefore, monitoring for unusual access to secrets is crucial. Analyzing logs for access to secrets allows you to spot instances of secrets exfiltration. Additionally, monitoring encryption attempts or key deletions can aid in identifying ransomware attacks.

What can you detect using the logs?

  • Mass encryption for ransomware.

  • Deletion of encryption keys for ransomware.

  • Unusual secret retrieval.

  • Key vault configuration changes.

Configuration Tips and Tricks

AWS

  • Default logging using CloudTrail. KMS and Secrets Manager activity is logged by default in the CloudTrail management logs. When configuring the trail, make sure the “Exclude AWS KMS events” option is not selected. Note that Decrypt events are the second most common event in CloudTrail, potentially resulting in significant log volumes and increased logging costs.
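As a starting point for hunting unusual secret retrieval, a sketch that pulls recent GetSecretValue events from CloudTrail with boto3. Note that lookup_events only searches recent management events, so querying the trail's delivered logs (e.g. in a SIEM) is the durable approach:

import boto3
from datetime import datetime, timedelta, timezone

cloudtrail = boto3.client("cloudtrail")

# Pull GetSecretValue calls from the last 24 hours of management events.
paginator = cloudtrail.get_paginator("lookup_events")
pages = paginator.paginate(
    LookupAttributes=[{"AttributeKey": "EventName",
                       "AttributeValue": "GetSecretValue"}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
)

for page in pages:
    for event in page["Events"]:
        # Each event carries the caller identity and timestamp for triage.
        print(event["EventTime"], event.get("Username"), event["EventName"])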

Azure

  • Enable Key Vault logs. Key Vault logs are configured for each vault separately using diagnostic settings.

GCP

  • Enable Data Access logs for secret usage. Admin operations on Secret Manager and Cloud KMS appear in the default Admin Activity audit logs, but actual secret and key usage (such as accessing secret versions or decrypt calls) is recorded in Data Access logs, which must be enabled for these services.


Category 5: Compute

What do these logs contain?

Compute logs refer to the logs of the compute resources in the environment, such as virtual machines, functions and containers. Some of them are OS logs, collected by the cloud provider from inside the instances/VMs using agents. Function logs relate to function manipulation and activation, and Container logs contain information about control plane activity within the cluster, including resource orchestration and editing.

Why should you collect them? 

Threat actors targeting the cloud have found value in reusing techniques honed during years of operations in on-prem networks, since many of these techniques are just as applicable to virtual machines and containers as they are to physical devices. Furthermore, attackers frequently leverage functions for execution and persistence within cloud environments.

What can you detect using the logs?

  • Malware file upload and execution

  • Command execution on VMs

  • Credential retrieval via the metadata service

  • Abuse of Lambda functions for persistence

  • Containers created with excessive permissions

Configuration tips and tricks

AWS

  • Enable instance logs. All control plane activity related to instances is logged in CloudTrail management events by default, and system-level logs can be collected by CloudWatch agents installed on the instances, which send them to CloudWatch.

  • Lambda logs. Lambda activity is collected as CloudTrail data events using event selectors; to collect data events for specific functions only, use the advanced event selectors option.

  • EKS logs. There are five types of EKS logs (API server, Audit, Authenticator, Controller manager, Scheduler), that are enabled separately for each cluster and then sent to CloudWatch.
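Enabling the five EKS control plane log types can be scripted; a hedged boto3 sketch with a hypothetical cluster name:

import boto3

eks = boto3.client("eks")

# Enable all five control plane log types; they are delivered to CloudWatch Logs.
eks.update_cluster_config(
    name="prod-cluster",  # hypothetical cluster
    logging={
        "clusterLogging": [{
            "types": ["api", "audit", "authenticator",
                      "controllerManager", "scheduler"],
            "enabled": True,
        }]
    },
)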

Azure

  • VM logs. VM logs are gathered by Azure Monitor agents installed within the VMs, which collect multiple log types, including Windows Event Logs and Linux Syslog.

  • Function, App Service, and AKS logs. These are configured for each resource using diagnostic settings.

GCP

  • VM, GKE, and function logs. Guest OS logs from Compute Engine VMs are collected by the Ops Agent installed inside the instances, GKE control plane audit logs are part of Cloud Audit Logs, and Cloud Functions execution logs flow to Cloud Logging automatically.

One final "champion" tip:

In many cases, empty regions do not generate logs, so enabling logging in these regions will incur minimal costs. Attackers often exploit unused regions, as they are typically less monitored, but once you have logs enabled, malicious activity in relatively quiet regions is easier to spot, since there's less noise to blend in with. Therefore, we recommend enabling logs such as CloudTrail and Flow Logs in unused regions, as it is a cost-effective measure to enhance security.
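A small sketch of how you might audit which enabled regions lack VPC flow logs, under the assumption that every region's VPCs (including default ones in otherwise unused regions) should be covered:

import boto3

ec2 = boto3.client("ec2")
regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]

for region in regions:
    regional = boto3.client("ec2", region_name=region)
    vpcs = regional.describe_vpcs()["Vpcs"]
    flow_logs = regional.describe_flow_logs()["FlowLogs"]
    covered = {fl["ResourceId"] for fl in flow_logs}
    # Simplification: only checks VPC-level flow logs, not subnet/ENI-level ones.
    missing = [v["VpcId"] for v in vpcs if v["VpcId"] not in covered]
    if missing:
        print(f"{region}: VPCs without flow logs: {missing}")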

Conclusion

Effective monitoring of cloud logs is essential for rapid threat detection and response. In cloud environments, log data is often the only way that organizations can gain real-time insights into their cloud environments, identify anomalies, and react swiftly to potential incidents. This guide has provided practical steps to help you optimize your logging configuration, from selecting the right log types to collect to implementing cost-effective strategies to ensure you’re getting the most value out of the logs you gather.

Cloud logging is a dynamic process. As your cloud environment evolves, so too should your logging strategy. Continuously assess your logging practices to ensure they remain aligned with your evolving detection and response needs.
