Strategies for performing security migrations

Advice for tackling and completing these major projects, including metrics, alerts, and prevention strategies.

Security teams often get involved in a type of larger-scale migration project where there isn’t a clear CVE. These projects tend to involve changing the way something has always been done. Even though they can reduce some big risks, something else usually has to go wrong before the risk being mitigated could be abused. That makes them harder to prioritize and harder to get across the finish line, so here is some advice for tackling them. Examples include migrating AWS environments from IMDSv1 to IMDSv2 or getting rid of IAM user access keys. There are many similar projects, but I’ll use these two as examples because I’ve been directly involved with them. (If you want to see the specifics of one implementation of this strategy, Slack recently described their journey of migrating to IMDSv2.)

Get metrics 

The first step is to understand how bad the problem is, and to have a way to generate these metrics regularly so you can track your progress over time. It is also helpful to collect additional data, such as which AWS accounts the findings are present in and how many AWS accounts you have in total. You may find that the best metric for your needs is the percentage of AWS accounts with the issue, or some other calculation from this data, rather than a raw count of findings.
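As a minimal sketch, assume your scanner's findings have been exported into simple records with an account ID and a resource ID. The `Finding` type and `summarize` function below are hypothetical. Under that assumption, the raw count and the account-level percentage can be computed from the same snapshot:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record exported from whatever scanner you use."""
    account_id: str   # AWS account the finding lives in
    resource_id: str  # e.g. an EC2 instance ID or IAM access key ID


def summarize(findings, all_account_ids):
    """Compute raw and account-level metrics for one snapshot of findings."""
    all_accounts = set(all_account_ids)
    per_account = Counter(f.account_id for f in findings)
    # Only count accounts that are part of the known account inventory.
    affected = len(per_account.keys() & all_accounts)
    total = len(all_accounts)
    return {
        "total_findings": len(findings),
        "affected_accounts": affected,
        "total_accounts": total,
        "pct_accounts_affected": round(100 * affected / total, 1) if total else 0.0,
        # Accounts with the most findings are good candidates for outreach.
        "top_accounts": per_account.most_common(3),
    }
```

Running something like this on a schedule and storing each result gives you the trend over time. The `top_accounts` list is also a natural starting point for deciding whom to talk to first.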

Divide and conquer 

Understand the use cases behind this issue so you can plan how to attack it. For example, with IAM user access keys, some IAM users might be used by humans, some by vendor solutions, and some for other specific use cases. Each of these use cases might be solved differently. You might be able to identify some of them from logs or other APIs, but you should also talk to people. If you have 10,000 findings, you will want to randomly sample from this set to understand them rather than investigating each one.
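One way to combine bucketing with random sampling is a stratified sample. You classify each finding into a use case, then draw a few random examples from each bucket to investigate by hand. This is a sketch only. The `has_console_password` field and the `vendor` tag used as heuristics are assumptions about how your inventory is structured, and real classification will lean on logs and conversations.

```python
import random
from collections import defaultdict


def classify(user):
    """Hypothetical heuristics for bucketing an IAM user by use case."""
    if user.get("has_console_password"):
        return "human"    # console access suggests a person uses it
    if "vendor" in user.get("tags", {}):
        return "vendor"   # assumes vendor users were tagged when created
    return "unknown"      # needs logs or a conversation to classify


def bucket_and_sample(users, per_bucket=5, seed=None):
    """Group users by use case and draw a random sample from each group."""
    buckets = defaultdict(list)
    for user in users:
        buckets[classify(user)].append(user)
    rng = random.Random(seed)
    return {
        use_case: {
            "count": len(members),
            # Never ask for more samples than the bucket contains.
            "sample": rng.sample(members, min(per_bucket, len(members))),
        }
        for use_case, members in buckets.items()
    }
```

Passing a fixed `seed` makes the sample reproducible, which helps when several people are reviewing the same set.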

You may also have certain teams that you have a better relationship with, and you can divide the work based on that criterion as well. I usually focus first on the places where I can make progress most easily, as this helps build momentum.

I’ve written previously about getting rid of IAM user access keys, where I discuss some ways of prioritizing findings based on the risk they present. This includes finding keys that aren’t in use and keys with access to sensitive resources. Part 1 of that guide covers this in more detail.
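Here is a sketch of that prioritization. It assumes each key record carries its owning user and a last-used timestamp, which IAM exposes through the `GetAccessKeyLastUsed` API and the credential report. The 90-day staleness threshold, the `sensitive_users` set, and the `prioritize_keys` helper are illustrative assumptions you would replace with your own definitions:

```python
from datetime import datetime, timedelta, timezone


def prioritize_keys(keys, sensitive_users, now=None, stale_days=90):
    """Hypothetical helper: split access keys into work queues by risk.

    Each key is a dict with "key_id", "user", and "last_used" (a timezone-aware
    datetime, or None if the key has never been used).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=stale_days)
    queues = {"unused": [], "sensitive": [], "remaining": []}
    for key in keys:
        if key["last_used"] is None or key["last_used"] < cutoff:
            queues["unused"].append(key["key_id"])     # low risk to deactivate
        elif key["user"] in sensitive_users:
            queues["sensitive"].append(key["key_id"])  # highest impact if leaked
        else:
            queues["remaining"].append(key["key_id"])
    return queues
```

Unused keys come first because they can usually be deactivated with little risk. Keys belonging to users with sensitive access come next because they carry the most impact if leaked.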

Develop paved roads 

Before you start asking people to change something, you should document what they should change to and how. You not only want to make this easier for them; you also want to ensure that things end up the way you intend.

Ideally, this will involve new Infrastructure-as-Code (IaC) modules that use the preferred solution. If the new module offers enough benefits over the old way, you may be lucky and find that people migrate to it naturally once it exists. You’ll also want to search your internal documentation for how to do things, as you may find older guidance that advocates the old way of doing things.
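The same idea applies outside of IaC. For teams that launch instances directly from Python with boto3, a small helper can bake in the preferred configuration so callers never have to think about it. This is a hypothetical sketch: `launch_instance` is not an existing library function. However, `run_instances` and its `MetadataOptions` parameter (with `HttpTokens` set to `required`) are the real EC2 API for requiring IMDSv2:

```python
# Secure defaults owned by the paved road, not by each caller.
IMDSV2_METADATA_OPTIONS = {"HttpTokens": "required", "HttpEndpoint": "enabled"}


def launch_instance(ec2_client, image_id, instance_type, **overrides):
    """Hypothetical paved-road helper around EC2 RunInstances.

    `ec2_client` is expected to be a boto3 EC2 client. Callers may pass other
    RunInstances parameters, but metadata options are always IMDSv2-only.
    """
    if "MetadataOptions" in overrides:
        raise ValueError("MetadataOptions is managed by this helper")
    params = {"MinCount": 1, "MaxCount": 1, **overrides}
    params.update(
        ImageId=image_id,
        InstanceType=instance_type,
        MetadataOptions=dict(IMDSV2_METADATA_OPTIONS),
    )
    return ec2_client.run_instances(**params)
```

Rejecting a caller-supplied `MetadataOptions` rather than silently overriding it is a deliberate choice. The helper stays the one place where that setting is decided, and anyone who needs an exception has to come and talk to you.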

Be aware that a surprisingly common way of “fixing” something is simply to delete it. I’ve seen an alert for a single misconfigured resource result in the engineer deciding to delete the entire AWS account because it was no longer needed.

Engage with vendors 

You may find that some of the findings for the issue you are tackling come from vendor solutions that you can’t change directly. Some vendors may already support an improved way of doing things that your company hadn’t realized it could use. In other cases, you may need to ask them to implement the improvement. I recommend treating this like any other feature request to a vendor: make the request privately and ask for regular status updates. You may also get better traction by reaching out to a security contact at the vendor. The people who receive that message may be more motivated and more directly responsible for ultimately making the change.

Engage with providers 

There may also be changes you can request from cloud providers or from the makers of IaC solutions. Some features may need to be supported, and in other cases better defaults could be set. On AWS, IMDSv2 became the default over time. It started as the default for all EC2 instances created through the web console, then for all instances launched from certain AMIs, and finally a regional setting made it possible to enable it by default for all new EC2 instances. Similarly, the AWS web console has added more friction to the process of creating IAM user access keys over time.
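For reference, that regional default is exposed through the EC2 `ModifyInstanceMetadataDefaults` API (`modify_instance_metadata_defaults` in boto3). Here is a minimal sketch that applies it across regions. It assumes you pass in a factory that returns an EC2 client for each region, and the `require_imdsv2_by_default` name is hypothetical:

```python
def require_imdsv2_by_default(client_for_region, regions):
    """Hypothetical helper: set the account-level IMDS default per region.

    `client_for_region` is any callable returning a boto3 EC2 client for a
    region, e.g. lambda r: boto3.client("ec2", region_name=r).
    """
    results = {}
    for region in regions:
        ec2 = client_for_region(region)
        # Account-level metadata defaults are regional, so set each one.
        response = ec2.modify_instance_metadata_defaults(HttpTokens="required")
        results[region] = bool(response.get("Return"))
    return results


# Example usage (requires AWS credentials allowed to call this API):
# import boto3
# regions = [r["RegionName"] for r in boto3.client("ec2").describe_regions()["Regions"]]
# require_imdsv2_by_default(lambda r: boto3.client("ec2", region_name=r), regions)
```

This should only affect instances launched afterwards. Existing instances keep their current metadata settings, and options specified explicitly at launch can still take precedence, so this complements fixing existing findings rather than replacing it.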

There may also be popular online tutorials or other reference sources that engineers at your company might see which use an older way of doing things. In the article “A security community success story of mitigating a misconfiguration”, you can see how a request to change a popular online tutorial played a role in eradicating a misconfiguration. Your mileage may vary when reaching out to the authors of blog articles that may be years old. A related strategy is to write your own blog article that shows the preferred way of doing something, in the hope that search engines rank it more highly.

Detect and alert 

You eventually want to detect and alert engineers every time they follow the old way of doing things. This is also necessary for fixing existing issues. The two biggest problems you’ll encounter here are getting the alert to the correct person and getting it to them as quickly as possible. You’ll want to perform this detection in two places. The first is scanning the cloud environment (or log events) after the misconfiguration has been made. The second is scanning IaC before the misconfiguration is deployed. IaC scanning can run as a pull request is being made, so a warning or other alert reaches the developer directly while they are working on the change. However, in many environments some changes are made by clicking in the web console or otherwise happen outside the preferred (and monitored) pathways, so you should also scan the cloud environment or cloud log events.
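As an example of scanning IaC before deployment, here is a sketch of a check against Terraform’s JSON plan output. That output is produced by `terraform show -json` on a saved plan and lists planned changes under `resource_changes`. The check flags `aws_instance` resources that would be created or updated without `metadata_options.http_tokens` set to `required`. Treating an unset value as a finding is a conservative assumption, and the function names are hypothetical:

```python
import json


def find_imdsv1_instances(plan):
    """Hypothetical CI check: return addresses of aws_instance resources in a
    Terraform JSON plan that would be created or updated without IMDSv2
    explicitly required."""
    offenders = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        change = rc.get("change", {})
        # Replacements appear as ["delete", "create"], so check membership.
        if not {"create", "update"} & set(change.get("actions", [])):
            continue
        after = change.get("after") or {}
        # Nested blocks are represented as lists in the JSON plan format.
        blocks = after.get("metadata_options") or [{}]
        if blocks[0].get("http_tokens") != "required":
            offenders.append(rc["address"])
    return offenders


def check_plan_file(path):
    """Load `terraform show -json` output and return offending addresses."""
    with open(path, encoding="utf-8") as f:
        return find_imdsv1_instances(json.load(f))
```

In a pull request pipeline, you might run `terraform plan -out=tfplan`, then `terraform show -json tfplan > plan.json`. You would then fail the check or post a comment whenever `check_plan_file("plan.json")` returns anything. Instances that aren’t changing are not flagged here, which is why scanning the cloud environment is still needed.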

Apply preventions 

As you work toward exterminating a problem in your environment, you should consider applying prevention mechanisms. These are sometimes referred to as ratchets because they ensure that things move in only one direction. A ratchet strap can be tightened further and further without loosening, and in the same way, prevention mechanisms can ensure that a problem is eventually eradicated by preventing the misconfiguration from occurring again. On AWS, this takes the form of service control policies (SCPs). For example, with an SCP you can prevent new IAM user access keys (or IAM users altogether) from being created. Existing IAM user access keys will continue to function, but no new ones can be created once the SCP is applied. Eventually, as the existing IAM user access keys are replaced with IAM roles, you’ll get rid of all of them.
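Here is a minimal sketch that generates such an SCP as JSON. The `iam:CreateUser` and `iam:CreateAccessKey` actions and the `ec2:MetadataHttpTokens` condition key are real IAM identifiers. The statement IDs and helper names are illustrative:

```python
import json


def deny(sid, actions, resource="*", condition=None):
    """Illustrative helper: build a single Deny statement for an SCP."""
    statement = {"Sid": sid, "Effect": "Deny", "Action": actions, "Resource": resource}
    if condition:
        statement["Condition"] = condition
    return statement


def ratchet_scp():
    """Illustrative SCP blocking new IAM users/access keys and IMDSv1 launches."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Existing keys keep working; only creating new ones is blocked.
            deny("DenyNewIamUsersAndKeys", ["iam:CreateUser", "iam:CreateAccessKey"]),
            # Refuse to launch instances unless IMDSv2 is required.
            deny(
                "RequireImdsv2AtLaunch",
                ["ec2:RunInstances"],
                resource="arn:aws:ec2:*:*:instance/*",
                condition={"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}},
            ),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ratchet_scp(), indent=2))
```

Keep in mind that SCPs don’t restrict principals in the organization’s management account, so that account needs separate attention.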

An alternative to SCPs is auto-remediation. In addition to detecting the issue and alerting engineers, an action is performed automatically to fix or delete the resource on the engineer’s behalf.
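Here is a sketch of auto-remediation for new IAM user access keys. It assumes CloudTrail events are delivered to a handler, for example through an EventBridge rule that triggers a Lambda function (that wiring is not shown). The field layout follows CloudTrail’s `CreateAccessKey` record, where the new key appears under `responseElements.accessKey`, but it’s worth verifying against events from your own environment. `update_access_key` is the boto3 IAM call for changing a key’s status. The handler name is hypothetical:

```python
def remediate_new_access_key(event, iam_client):
    """Hypothetical handler: deactivate a new IAM user access key.

    `event` may be an EventBridge "AWS API Call via CloudTrail" event (record
    under "detail") or a raw CloudTrail record. `iam_client` is expected to be
    a boto3 IAM client. Returns the key ID acted on, or None.
    """
    record = event.get("detail", event)
    if (record.get("eventSource") != "iam.amazonaws.com"
            or record.get("eventName") != "CreateAccessKey"):
        return None
    # Failed API calls have no responseElements, so there is nothing to fix.
    access_key = (record.get("responseElements") or {}).get("accessKey") or {}
    user_name = access_key.get("userName")
    key_id = access_key.get("accessKeyId")
    if not user_name or not key_id:
        return None
    # Deactivate rather than delete, so the action is reversible.
    iam_client.update_access_key(UserName=user_name, AccessKeyId=key_id, Status="Inactive")
    return key_id
```

Deactivating instead of deleting keeps the action reversible, which matters if the key was created during an emergency. Because IAM is a global service, EventBridge rules for its CloudTrail events are typically created in us-east-1.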

You do need to be careful with preventative measures like these. Make sure you aren’t blocking existing workloads before people have a chance to fix them. People must still be able to handle emergencies, such as rotating an IAM user access key during an incident when they may not yet be ready to convert it to an IAM role. You also need to avoid spending too much time handling exceptions. So these measures should be applied only once you are sure that the count of misconfigurations will only be going in one direction. Consider applying preventions to specific environments where the problem has already been eradicated instead of waiting until this can be done company-wide.
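One way to address both the emergency and the gradual-rollout concerns is an SCP that exempts designated break-glass roles through the `aws:PrincipalArn` condition key, attached only to organizational units (OUs) that are already clean. `create_policy` and `attach_policy` are boto3 Organizations client calls. The `BreakGlass` role name, the OU IDs, and the helper names are hypothetical:

```python
import json


def key_creation_guardrail(break_glass_role_arns):
    """Illustrative SCP denying new IAM user access keys except for
    break-glass roles (e.g. a hypothetical "arn:aws:iam::*:role/BreakGlass")."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyNewAccessKeysExceptBreakGlass",
            "Effect": "Deny",
            "Action": "iam:CreateAccessKey",
            "Resource": "*",
            # The deny applies only when the caller matches none of these ARNs.
            "Condition": {"ArnNotLike": {"aws:PrincipalArn": list(break_glass_role_arns)}},
        }],
    }


def apply_to_clean_ous(org_client, policy, ou_ids, name="deny-new-access-keys"):
    """Hypothetical helper: create the SCP once and attach it only to OUs
    that are already clean. `org_client` is a boto3 Organizations client."""
    created = org_client.create_policy(
        Content=json.dumps(policy),
        Description="Ratchet: block new IAM user access keys",
        Name=name,
        Type="SERVICE_CONTROL_POLICY",
    )
    policy_id = created["Policy"]["PolicySummary"]["Id"]
    for ou_id in ou_ids:
        org_client.attach_policy(PolicyId=policy_id, TargetId=ou_id)
    return policy_id
```

With `ArnNotLike`, the deny applies only when the caller matches none of the listed ARNs, so the exception list stays explicit and easy to audit. As more OUs are cleaned up, you attach the same policy to them.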

Get your hands dirty 

There may be cases where progress stalls and you’ll need to dive in and provide additional assistance. This might reveal unexpected use cases that need new solutions, bugs or oversights in how alerts are sent to engineers, or other problems where you’ll realize in hindsight that your involvement really was necessary.

Conclusion  

All these techniques will likely be needed to push a large security project like this across the finish line. Speaking personally, it's a great feeling as a security engineer to completely eradicate a particular risk from your environment ... that is, until your company makes an acquisition, and you have to start over again. 
