New EKS Access Management and Pod Identity features: a security analysis
The Wiz research team unpacks the security implications of the new EKS access and identity management features and recommends best practices when using them.
On November 26, Amazon introduced the EKS Pod Identity feature — a new way for cluster workloads to interact with cloud resources. About three weeks later, Amazon presented another related feature — EKS access management. Together, these two features close the Cloud-Cluster authentication and identity loop that used to be a pain point for cloud and K8s operators. In this blog post we inspect the changes from a security perspective and answer questions such as:
How do they affect existing security controls?
Are there new security pain points the new features introduce?
What are the security best practices when using the new features?
Overview of the new features
Introduction
In the next two sections we dive deeper into the implementation; here we introduce the features at a higher level. In the cloud-to-cluster direction, EKS cluster access management is a new way for IAM users and IAM roles to connect to and manage EKS clusters and resources via the EKS API. It includes two new entities — “access entries” and “access policies” — which control AWS principals' access to, and permissions inside, the EKS cluster respectively. In the cluster-to-cloud direction, EKS pod identity adds an alternative to the existing but somewhat rigid IRSA (IAM Roles for Service Accounts) solution.
Following is a high-level schema of the Cloud-to-Cluster-to-Cloud permissions and identity flow with changes to the flow highlighted in pink:
In addition, AWS has revamped the cluster Access tab to include all the new features in one place, which is very handy. In the same tab, you can perform CRUD operations on cluster access entries/policies and on the pod identities:
Before turning to the security implications, we briefly outline the main implementation points of both features.
Cloud to Cluster
After a user creates a cluster, they automatically become the owner and are added to the group of cluster administrators (system:masters). Until now, adding other cluster users was a tedious and non-IAM-native process of modifying the aws-auth ConfigMap, which maps IAM principals onto pre-existing K8s RBAC groups. With the addition of the cluster access management feature, cluster access can be controlled through the EKS API without the need for K8s API access. aws-auth becomes a secondary way to authenticate new users; the new and preferred way is through access entries and access policies.
Authentication mode. To manage these two authentication mechanisms, Amazon has introduced an “Authentication Mode” setting with three possible values (a CLI sketch for switching modes follows the list):
CONFIG_MAP — the old way, and the only option for clusters pre-v1.23
API_AND_CONFIG_MAP — both ways working together
API — API authentication only
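A minimal sketch of switching an existing cluster to the combined mode with the AWS CLI (the cluster name is a placeholder; note that AWS treats this change as one-way: an API mode cannot be switched back to CONFIG_MAP):

aws eks update-cluster-config \
  --name my-cluster \
  --access-config authenticationMode=API_AND_CONFIG_MAP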
Access entries. With the new authentication modes, an access entry is a manifestation of an AWS identity and in effect represents an “entry” to the cluster, similar to a record in aws-auth. It is impossible to create an entry without a principal reference or to switch principals after the entry is created.
Access policies. Access policies essentially duplicate a big part of Kubernetes RBAC. A principal with an access entry but without any associated access policy is equivalent to an anonymous user in its level of access. There is a built-in set of predefined access policies that includes the following four policies based on the existing user-facing roles:
AmazonEKSClusterAdminPolicy — equivalent to cluster-admin
AmazonEKSAdminPolicy — equivalent to admin
AmazonEKSEditPolicy — equivalent to edit
AmazonEKSViewPolicy — equivalent to view
We will explore these policies in a future blog. A nice feature to note: Amazon ensures that deleting an access entry also deletes any associated access policies. In addition, for more granular permissions, operators can map IAM principals to existing K8s groups, similar to the mappings in aws-auth. For now, AWS leaves the decision whether to use access entries/policies or aws-auth to the operator. Nevertheless, we expect a gradual deprecation of the CONFIG_MAP mode — as supported by the advice in the EKS best practices guide.
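A minimal sketch of granting read-only, cluster-wide access to an IAM role through the new API with the AWS CLI (account ID, role, and cluster names are placeholders):

aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/eks-auditor

aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/eks-auditor \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
  --access-scope type=cluster

The same association call with type=namespace (plus a list of namespaces) scopes the policy to specific namespaces instead of the whole cluster.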
Cluster to Cloud
Before EKS pod identity, there were several ways to provision permissions to a pod service account, each with its own disadvantages:
Via the worker node identity, which lacks granularity and results in overprovisioning.
Using IRSA, with its complex management.
By distributing AWS credentials across application workloads, risking secret exposure.
EKS pod identity is here to solve the shortcomings of the above approaches. It enables granular per-workload permission management (as opposed to over-provisioned worker node identities), and it simplifies IAM on the cloud side by making the SA-role association more flexible.
The visualized implementation appears as follows:
It is based on the eks-pod-identity-agent running on every worker node (as a DaemonSet) and serving as an intermediary: it retrieves the necessary credentials from EKS Auth (using the worker node identity) and returns them to the requesting container. In addition, the pod-identity-webhook is responsible for augmenting the pod spec with the necessary environment variables and token mount.
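A minimal sketch of wiring a service account to an IAM role with the AWS CLI (names and ARNs are placeholders), along with the variables the webhook injects into matching pods:

aws eks create-pod-identity-association \
  --cluster-name my-cluster \
  --namespace default \
  --service-account my-sa \
  --role-arn arn:aws:iam::111122223333:role/my-pod-role

# Injected by the pod-identity-webhook into pods that use my-sa:
# AWS_CONTAINER_CREDENTIALS_FULL_URI=http://169.254.170.23/v1/credentials
# AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE=/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token

Recent AWS SDKs pick these variables up automatically via their container credential providers.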
Security implications and recommendations
While both EKS cluster access management and EKS pod identity are welcome new features, they represent a substantial change to the existing mechanisms, and it is important to understand their security implications. Amazon did an excellent job updating the EKS security best practices guide with relevant recommendations, but we want to take it a step further and dig deeper into some of the topics.
Effective permission matrix calculation
The new authentication modes add complexity when auditing the resulting permissions. API_AND_CONFIG_MAP is probably the most notable mode, since most clusters fall into this category (see the version distribution numbers in our 2023 Kubernetes security report).
When assessing the number of principals with cluster access and the magnitude of this access, both aws-auth and the EKS API should be taken into account. The final access matrix is the union of the two methods for each IAM principal. Next, the audit should account for multiple access policies per access entry: the effective permission matrix is, again, a union of the associated access policies. Note that associating an access policy with an already-associated access entry does not update the existing association but rather adds an additional policy. Finally, in terms of in-cluster authorization, it is vital to understand the changed flow of K8s authorizers:
There are no DENY rules in the K8s authorization flow; therefore, for a principal with access provisioned from both aws-auth and access entries, the resulting permission set is again a union of the authorization rules from the bound RBAC roles and the associated access policies.
As we can see, the auditor (be it a tool or a person) needs to consider the authentication mode of the cluster and calculate permissions accordingly. Moreover, K8s-level API access alone is not enough: the auditor also needs EKS API access permissions. Here is our suggested flow for calculating the effective permission matrix for a specific IAM principal on a cluster:
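The raw inputs for such an audit can be gathered with the AWS CLI and kubectl; a minimal sketch (cluster name and principal ARN are placeholders):

# Cloud side: access entries and their associated access policies
aws eks list-access-entries --cluster-name my-cluster
aws eks list-associated-access-policies \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::111122223333:role/eks-auditor

# Legacy side: the aws-auth mappings (requires K8s API access)
kubectl get configmap aws-auth -n kube-system -o yaml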
We also recommend periodic audits of policy semantics to detect associated policies that don’t make sense. For example, the same IAM principal being associated with multiple redundant access policies (e.g. AmazonEKSViewPolicy and AmazonEKSEditPolicy) might be the sign of a failed policy update attempt, signaling a misunderstanding of effective permission calculation.
Protecting the identity token
The traditional vectors of lateral movement in a cluster are still relevant in EKS pod identity scenarios. Consider an escaped pod with filesystem access, or even a pod with a sensitive hostPath mount and read-only filesystem access to /var/lib/kubelet/pods. In this classic scenario, a malicious actor can easily read the eks-pod-identity-token and abuse it by calling the pod-identity-agent endpoint directly to gain an STS identity, like so:
curl -s 169.254.170.23/v1/credentials -H "Authorization: STOLEN_TOKEN" | jq .
{
"AccessKeyId": "ASIA…",
"SecretAccessKey": "IAYH…",
"Token": "…Ka",
"AccountId": "4…6",
"Expiration": "2024-01-03T00:00:57Z"
}
Cluster operators should review existing RBAC permissions in light of potential token stealing. Consider namespace separation for workloads with different risk profiles if not already employed. Review your Admission Controller rules to disallow access to /var/lib/kubelet. Better yet, consider splitting nodes according to the risk profile and permission impact of the workloads (check out this guide). This can prevent neighbor token stealing.
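For the admission control piece, here is a minimal sketch of a policy blocking hostPath volumes under /var/lib/kubelet, assuming Kyverno as the admission controller (any policy engine with equivalent semantics will do):

kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-kubelet-hostpath
spec:
  validationFailureAction: Enforce
  rules:
  - name: block-var-lib-kubelet
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "hostPath volumes under /var/lib/kubelet are not allowed"
      pattern:
        spec:
          =(volumes):
          - =(hostPath):
              # every hostPath volume must point outside /var/lib/kubelet
              path: "!/var/lib/kubelet*"
EOF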
Detection considerations
Association of a Kubernetes principal with a high-privilege role is a significant security event that should typically be monitored. In the old order of things, events such as the creation of a new powerful ClusterRole or the binding of a new principal to cluster-admin with a (Cluster)RoleBinding could be detected or prevented via the traditional cluster detection sources — the Admission Controller and the kube-audit log. With the introduction of EKS access management, these detection sources are no longer sufficient because they are blind to access entry creation and access policy association events. In this landscape, prevention with an Admission Controller is not possible, and a proper detection solution should incorporate the AWS CloudTrail log for full coverage (see, for example, Datadog's examples of cloud log-based detections; a CloudTrail hunting sketch follows the list below). Wiz has introduced several new detections in this area:
AWS principal granted cluster-admin role across an entire EKS cluster
AWS principal granted admin role across an entire EKS cluster
AWS principal granted cluster-admin role across multiple EKS clusters in a short period of time
AWS principal granted admin role across multiple EKS clusters in a short period of time
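As a quick starting point before building full detections, the relevant EKS control-plane events can be hunted in CloudTrail; a minimal sketch with the AWS CLI, assuming the event names follow the EKS API action names:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateAccessEntry
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=AssociateAccessPolicy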
On a system level, we can imagine several potential privilege escalation scenarios. These scenarios are quite similar to K8s token stealing, but they do require detection adjustments. One such vector is an attacker stealing the eks-pod-identity-token by accessing the file; another is an internal network connection to the local node's pod identity server. The Wiz sensor team has introduced several new sensor detection rules:
Process used AWS EKS CLI to discover pod identities
Process used AWS EKS CLI to create or modify pod identities
Unrecognized access to AWS EKS pod identity token
Process used AWS EKS pod identity to assume role
Unrecognized connection to EKS pod identity agent credentials
Usage of default service account
Using the default service account (SA) for application workloads is a security anti-pattern. Consider a scenario where different pods have different access levels to an S3 bucket. The corresponding service accounts are default (read access) and full-access-sa (read/write access). When another, unrelated pod without a specific service account is created, it is automatically assigned the default SA and, with it, read access to the S3 bucket:
[cloudshell-user@ip-10-130-85-108 ~]$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
pod-full-bucket-access   1/1     Running   0          8s
pod-read-bucket-access   1/1     Running   0          15s
[cloudshell-user@ip-10-130-85-108 ~]$ kubectl run ubuntu --image ubuntu:latest
pod/ubuntu created
[cloudshell-user@ip-10-130-85-108 ~]$ kubectl get pods ubuntu -o json | grep pod-identity-token
"value": "/var/run/secrets/pods.eks.amazonaws.com/serviceaccount/eks-pod-identity-token"
Here we can see that the ubuntu pod has the eks-pod-identity-token mounted. This flow violates the principle of least privilege. We recommend not using EKS pod identities on default SAs and/or opting out of the service account automounting feature by setting the automountServiceAccountToken: false flag in pod specs.
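A minimal sketch of the pod-level opt-out (pod name and image are illustrative):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: no-automount-demo
spec:
  automountServiceAccountToken: false
  containers:
  - name: app
    image: ubuntu:latest
    command: ["sleep", "infinity"]
EOF

The same flag can also be set on the ServiceAccount object itself, making the opt-out the default for every pod that uses that SA.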
Increased attack surface
Cluster attack surface is a function of exposed functionality. A publicly exposed data plane workload adds to the attack surface, but so do the various cluster plugins/addons, albeit in an indirect way. As we showed in our “Cluster Grey Zone” research, middleware components can be a handy way for an attacker to move laterally or to elevate privileges given an initial foothold. Pod-identity-webhook is a new webhook in the cluster control plane, and Amazon EKS pod identity is based on the eks-pod-identity-agent DaemonSet. The DaemonSet deploys pods that have several problematic configurations, such as a shared hostNetwork with open ports and a privileged init container.
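Operators can quickly check these settings on their own clusters; a minimal sketch, assuming the addon's default DaemonSet name in kube-system:

kubectl -n kube-system get daemonset eks-pod-identity-agent -o yaml \
  | grep -E "hostNetwork|privileged|hostPort"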
We will show how we can exploit this setup in our follow-up blog.
These recent features from Amazon represent a significant step forward in the cohesive management of cloud/cluster access and permissions. Their security impact cannot be overstated and will probably yield new attack avenues as time goes by. With this blog post we hope to contribute to secure Kubernetes usage in the cloud by calling out the problematic points and providing clear recommendations. Let’s summarize:
Monitor security posture: New features will not protect from bad security design and misconfigurations — traditional best practices and recommendations still apply.
Audit: Review your audit practices and adjust them to consider the cluster authentication mode and pod identity mappings.
Privilege escalation: Treat new access entry and policy management permissions as high-privileged permissions. New features and complexities add various escalation avenues that were previously impossible. We will expand on that in a follow-up blog.
Detection: There are multiple opportunities for abusing the new features; extend your current detection coverage to account for them. Both the EKS and K8s APIs need to be covered.