In Kubernetes version 1.21, Pod Security Policies (PSP) were officially deprecated and replaced with Pod Security Admission (PSA). PSA implements Pod Security Standards (PSS), a set of policies describing various security-related characteristics of the workloads in a Kubernetes cluster. With version 1.25, PSA became a stable feature and PSP was completely removed. In this blog, we will discuss PSP-to-PSA migration strategies, offer guidance to help transition from Pod Security Policies to Pod Security Standards, and point out potential migration restrictions and limitations.
Background
In Kubernetes, an admission controller is a crucial security component that intercepts API server requests and applies a specific policy to authorize or monitor them. Pod Security Policy is a Kubernetes feature that enables administrators to define security constraints for the creation and deployment of pods, such as restricting privileged access and sensitive host path mounts. However, PSPs were deprecated as of Kubernetes v1.21 in favor of the newer Pod Security Standards, which provide similar functionality with simpler controls.
Pod Security Standards can be used to define security policies at three levels (privileged, baseline, and restricted) for pods at a cluster-wide or namespace level. There are two approaches a cluster administrator may take to enforce Pod Security Standards: using the built-in Pod Security Admission Controller or relying on third-party alternatives. These third-party alternatives validate pod creation requests against the defined policies to ensure that only pods that meet the specified security requirements are deployed. As for Pod Security Admission, it is a built-in validating admission controller applying the policy specified by the cluster admin. The cluster admin can choose to assign one of the three levels to different namespaces, providing limited flexibility. For example, the kube-system namespace can operate at the privileged level, whereas the production app namespace can operate at the restricted level.
Wiz Research investigated hundreds of cloud environments to understand and quantify the usage of PSPs, PSA, and external admission controllers in clusters. To start with, we have calculated the version distribution numbers across all customers, cluster flavors, and cloud environments:
According to the pie chart, the vast majority of environments are capable of using both PSPs and PSSs (76%). The relatively minimal adoption of v1.25 (2%) suggests that now is the optimal time to migrate the policies.
Furthermore, we looked closer at the adoption policies on a per-version basis:
The numbers above show that PSP utilization increases with every version, yet PSA adoption does not rise symmetrically. There are two non-exclusive explanations: first, users migrate from PSP to external admission controllers (we see some evidence of this); second, users delay PSS adoption because of its perceived complexity. The following guide attempts to prevent the latter. In fact, the low adoption of v1.25 and above indicates that there is still time to perform a proper migration.
Migration scenarios
When it comes to applying PSA, there are four scenarios in which users can find themselves:
Migrating brand-new workloads directly to PSA.
Migrating existing workloads that are not under any policy to PSA.
Migrating existing workloads with simple PSPs to PSA.
Migrating existing workloads with elaborate PSPs to an external admission controller.
We discuss scenarios (1) and (2) in the “Onboarding of new and policy-free workloads” section, and scenario (3) in the “Migration of existing workloads” section. The fourth scenario pertains to customers with a complicated PSP policy requiring more flexibility than PSS can offer. In this case, our recommendation is to use an external admission controller providing complex functionality, such as Wiz Admission Controller.
However, it is worth noting that you can always refer to the official Kubernetes migration guide, which describes the steps at a command-by-command level. Here we attempt to simplify and outline the overall process flow, provide additional recommendations for Wiz customers, describe the operational restrictions of PSA in managed clusters, and warn Kubernetes practitioners about potential hurdles in the process.
Onboarding of new and policy-free workloads
Whether you need to stage a brand-new cluster, add a new workload to an existing cluster, or migrate existing clusters or namespaces to PSA, this section will guide you through the process. When applying PSS to a new or existing PSP-free workload, we can use the following commands:
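A sketch of such a command, assuming a namespace named my-namespace and the restricted level (both illustrative, following the label syntax from the official migration guide):

```shell
# Server-side dry run: evaluates the restricted level against the namespace's
# existing pods without persisting the label
kubectl label --dry-run=server --overwrite ns my-namespace \
    pod-security.kubernetes.io/enforce=restricted

# The same check can be run across all namespaces at once with --all
kubectl label --dry-run=server --overwrite ns --all \
    pod-security.kubernetes.io/enforce=restricted
```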
Note the --dry-run=server flag: it instructs the API server to carry out all checks, including authentication, authorization, and admission, without persisting any changes. If the PSS level is suitable for the namespace's workloads, there will be no warnings in the output. Otherwise, kubectl will helpfully print a list of warnings detailing the specific problems:
In this example, the pod andy-dufresne violates three checks and consequently blocks the restricted policy application. At this point, the cluster admin must choose to either modify the workload, adjust the policy level to baseline or privileged, or ignore this namespace altogether (which is not recommended).
Finally, after the necessary changes, you can re-run the above command without the --dry-run flag and then verify the successful policy application with the following command:
$ kubectl describe ns default | grep pod-security
pod-security.kubernetes.io/enforce=restricted
Migration of existing workloads
Migrating existing workloads that actively use PSP requires more effort than applying PSA from scratch. Several issues need to be avoided when performing such a migration, including irrevocable breakage of running workloads, service disruptions, and failure to apply a policy to a workload. This migration should therefore be carried out in two stages: first, apply the policy in the non-blocking warn and audit modes; second, switch to enforce mode once all warnings have been resolved.
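The first stage can be sketched as follows (namespace name illustrative): the target level is applied in warn and audit modes, so violations are reported without blocking pod creation or affecting running pods:

```shell
# Non-enforcing stage: report violations via warnings and audit log annotations
kubectl label --overwrite ns my-namespace \
    pod-security.kubernetes.io/warn=restricted \
    pod-security.kubernetes.io/audit=restricted
```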
Once the system has processed the command, you can observe this output when trying to spin up a new pod that violates the policy:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: privpod
spec:
  containers:
  - image: alpine:latest
    command:
    - "sleep"
    - "3600"
    imagePullPolicy: IfNotPresent
    name: privpod
    securityContext:
      capabilities:
        add: ["NET_ADMIN", "SYS_ADMIN"]
      runAsUser: 0
  restartPolicy: Never
  hostIPC: true
  hostNetwork: true
  hostPID: true
EOF
Warning: would violate PodSecurity "restricted:latest": host namespaces
(hostNetwork=true, hostPID=true, hostIPC=true), allowPrivilegeEscalation != false
(container "privpod" must set securityContext.allowPrivilegeEscalation=false),
unrestricted capabilities (container "privpod" must set
securityContext.capabilities.drop=["ALL"]; container "privpod" must not include
"NET_ADMIN", "SYS_ADMIN" in securityContext.capabilities.add), runAsNonRoot != true
(pod or container "privpod" must set securityContext.runAsNonRoot=true), runAsUser=0
(container "privpod" must not set runAsUser=0), seccompProfile (pod or container
"privpod" must set securityContext.seccompProfile.type to "RuntimeDefault" or
"Localhost")
pod/privpod created
Several things to note:
- Despite the warning, the pod was successfully spun up.
- There are no warnings on the existing workloads that violate the warn policy.
Because of the above, we recommend applying the monitored policy in both warn and audit modes in order to have an additional means of observing violations.
Enforcing application
The absence of warnings indicates the namespace is ready for the final, enforcing application. The same command from the previous section will suffice; note the --overwrite flag, which is needed to update an existing level or mode:
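A sketch of the enforcing step, using the same illustrative namespace and level:

```shell
# Enforcing stage: pods violating the policy are now rejected at admission
kubectl label --overwrite ns my-namespace \
    pod-security.kubernetes.io/enforce=restricted
```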
Because PSA and PSP are separate features, cluster operators are encouraged to leave PSP active until PSA is enabled in enforcing mode. This is only possible in clusters running version 1.24 or below; to avoid a potential gap in coverage, perform the migration before upgrading clusters to v1.25.
Treatment of problematic workloads
The most difficult situation arises when a workload must run with privileges that violate the baseline or restricted profiles. The cluster admin has the following options:
If only a minority of pods in the namespace require special privileges, consider splitting the namespace so that the problematic workloads do not block the broader migration.
Apply exemptions to problematic workloads. You can exempt workloads initiated by a specific user, or those created with a specific RuntimeClassName. You can even exempt an entire namespace, although the latter option is equivalent to not applying PSA at all.
Even if nothing can be done within the specific namespace, we recommend setting the PSA level to privileged rather than omitting it entirely. This will reflect that there was a deliberation behind the decision.
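Exemptions are defined in the PodSecurity admission plugin configuration passed to the API server via --admission-control-config-file, which generally makes them available only on self-managed control planes. A sketch of such a configuration (all exemption values are illustrative):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    # Cluster-wide defaults applied to namespaces without explicit labels
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
    exemptions:
      usernames: ["system:serviceaccount:ci:deployer"]  # illustrative
      runtimeClasses: ["gvisor"]                        # illustrative
      namespaces: ["kube-system"]                       # illustrative
```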
Post-deployment steps
To maintain cluster security hygiene post migration, we recommend the following actions:
Only allow the creation of explicitly labelled namespaces.
Review which identities have permission to label namespaces and thereby modify PSS levels. You can run the label command with the --v=8 flag, which shows the actual API requests kubectl sends:
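A sketch of what this looks like (namespace name illustrative, output heavily abbreviated): the verbose log reveals that the label change is an ordinary PATCH against the namespace object:

```shell
$ kubectl label ns my-namespace \
    pod-security.kubernetes.io/enforce=restricted --v=8
# Abbreviated, illustrative verbose output:
# PATCH https://<api-server>/api/v1/namespaces/my-namespace
# Request Body: {"metadata":{"labels":{"pod-security.kubernetes.io/enforce":"restricted"}}}
```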
This means that every Kubernetes user/service account with patch/update permissions in a namespace can effectively remove the label and relax the policy.
Restrictions and limitations
Managed namespaces
All managed clusters ship with a default configuration and begin with the minimal set of workloads the CSP deems necessary, such as monitoring, logging, and networking infrastructure. These infrastructure workloads typically require above-average privileges and are deployed as part of kube-system or another namespace exempted from pod security enforcement:
$ kubectl label ns kube-system pod-security.kubernetes.io/enforce=restricted
Warning: namespace "kube-system" is exempt from Pod Security, and the policy
(enforce=restricted:latest) will be ignored
These namespaces include kube-system and kube-node-lease in AKS and EKS, and gatekeeper-system in AKS.
Image-level settings
An important thing to remember is that PSA is an admission controller and thus is susceptible to admission controller workflow bypasses. For example, the restricted PSS level requires a pod to run as a non-root user. A pod YAML might declare this intention with the setting spec.securityContext.runAsUser: 1000. However, if the container image itself is built to run as root, it effectively bypasses the admission-time check and is caught only by an additional runtime check at container start-up.
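The runtime backstop can be sketched as follows (image name illustrative): setting runAsNonRoot asks the kubelet to verify the container's effective user at start-up, so an image whose default user is root fails with CreateContainerConfigError instead of silently running as root:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-check
spec:
  containers:
  - name: app
    image: nginx:latest       # illustrative; this image runs as root by default
    securityContext:
      runAsNonRoot: true      # kubelet verifies the effective user at start-up;
                              # a root-only image fails with CreateContainerConfigError
```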
Completed workloads
A privileged workload that shows as Completed when listing pods will still trigger warnings during the label application, as demonstrated below:
$ kubectl run alpine-test --image alpine -n test
pod/alpine-test created
$ kubectl get pods -n test
NAME          READY   STATUS      RESTARTS     AGE
alpine-test   0/1     Completed   1 (3s ago)   4s
$ kubectl label namespace test pod-security.kubernetes.io/enforce=restricted
Warning: existing pods in namespace "test" violate the new PodSecurity enforce level
"restricted:latest"
Warning: alpine-test (and 1 other pod): allowPrivilegeEscalation != false, unrestricted
capabilities, runAsNonRoot != true, seccompProfile
namespace/test labeled
Completed pods and jobs are kept by default in order to report their success/failure status. The correct way to control how long finished workloads are retained is via the TTL-after-finished controller (stable since v1.23). However, to facilitate the PSS application, a cluster admin can detect and remove these workloads manually with this command:
$ kubectl delete pod $(kubectl get pods | grep -Ei "(Completed|CrashLoopBackOff|Terminating)" | awk '{print $1}')
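The TTL-after-finished approach can be sketched as follows (job name and TTL value are illustrative): a Job with ttlSecondsAfterFinished set is garbage-collected automatically, so stale Completed pods never accumulate to block a later policy application:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo
spec:
  ttlSecondsAfterFinished: 120   # delete the Job and its pods 2 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: alpine:latest
        command: ["sleep", "5"]
```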
Takeaways
What is the current state of security policy adoption?
Data shows that migration from PSP to PSA has been slow. In the worst-case scenario, users who currently rely on PSP will stop using any policy at all after upgrading to v1.25.
What can I do about the transition to PSA/PSS?
There is more than one way to facilitate the migration, but you should start before upgrading to version 1.25. Hopefully, this guide can serve as a starting point.
What should I expect?
To demonstrate what you should expect when attempting to apply PSS to common workloads, we have compiled a table specifying which PSS levels are expected:
| Popular add-on/extension/app | Managed environment | Default PSS level |
| --- | --- | --- |
| Airflow | GKE | Privileged |
| Airflow | AKS | Baseline |
| ActiveMQ | GKE | Baseline |
| Grafana | GKE | Baseline |
| Consul | GKE | Privileged |
| Consul | AKS | Baseline |
| Elasticsearch | AKS | Privileged |
| Logstash | AKS | Baseline |
| Kubecost | EKS | Baseline |
Two patterns emerge: (1) similar applications can require different PSS levels across CSPs, so multi-cloud Kubernetes users should expect the migration process to vary between providers; and (2) none of these applications can operate at the restricted level, which is, after all, rather demanding.
Protecting your environment
Wiz offers its customers a series of functionalities to aid with the migration process:
A built-in Cloud Configuration Rule to identify workloads running without an assigned PSP at the cluster level. See all the namespaces without an assigned PSP in the image below:
Built-in Pod Security Standards frameworks to assess clusters and namespaces without the need for dry runs on each namespace, allowing you to identify the most suitable Pod Security level (baseline or restricted):
Moreover, the following steps are helpful post deployment:
1. Use Wiz Admission Controller to protect yourself by default with these two rules: