Get visibility into your AI pipelines, detect pipeline misconfigurations, and uncover attack paths to your AI services, so you can securely introduce AI into your environment.
Wiz Experts Team
7 minutes read
Adversarial artificial intelligence (AI), or adversarial machine learning (ML), is a type of cyberattack where threat actors corrupt AI systems to manipulate their outputs and functionality. These malicious interventions cause enterprise AI and machine learning systems to generate incorrect, controversial, and dangerous information.
According to McKinsey, 65% of companies surveyed regularly leverage generative AI (GenAI) technologies. Furthermore, Wiz research reveals that 7 out of 10 companies use managed AI services. All of these AI/GenAI technologies and services are potential attack vectors for cybercriminals looking to conduct adversarial AI attacks. While whispers of adversarial AI have been around since 2004, the rapid proliferation of AI and ML in mission-critical enterprise contexts makes it a pressing security threat.
Adversarial attacks are particularly dangerous because they don’t draw too much attention to themselves. Instead, they subtly interfere with the internal logic of AI and ML systems and allow threat actors to bypass the parameters and guardrails of machine learning models, even those ready for deployment.
To help organizations understand the nuances and risks of adversarial AI, MITRE released the Adversarial Threat Landscape for Artificial Intelligence Systems (ATLAS). MITRE ATLAS is a comprehensive knowledge base of adversarial techniques and tactics that malicious actors use to attack AI systems.
How do adversarial AI attacks work?
Unlike many traditional cyber threats that try to circumvent the capabilities of their target, adversarial AI attacks weaponize AI's inherent capabilities. AI systems make autonomous decisions and generate output based on training data and prompts, and that’s exactly what adversarial AI attacks take advantage of.
Below are some steps typically used by threat actors in an adversarial AI attack:
Scoping the victim’s AI and machine learning systems: Threat actors study their victim’s systems to find weaknesses and vulnerabilities in machine learning models, deep neural networks, guardrails, and other underlying infrastructure. To do this, adversaries may use techniques ranging from traditional research to reverse engineering.
Designing malicious inputs: Once threat actors understand the intricacies of their victim’s AI infrastructure and machine learning algorithms, they carefully craft malicious inputs. These malicious inputs can severely compromise systems that rely on high degrees of accuracy, such as natural language processing (NLP), image detection, and fraud-detection systems.
Corrupting AI and ML systems: Adversaries deliver malicious inputs into AI and ML environments, compromising the integrity of these systems. In a best-case scenario, the corruption of these AI and ML systems may result in minor and invisible inconveniences for enterprises. However, malicious inputs can also result in large-scale damage stemming from unreliable AI systems and reputational harm caused by dangerous outputs.
Escalating the attack: Depending on the severity of the adversarial AI attack, businesses are either unaware that their AI systems are malfunctioning or are scrambling to remediate the fallout from malfunctioning mission-critical AI and ML infrastructure. During this vulnerable and stressful period, threat actors can escalate attacks in myriad ways to further weaken the organization. For example, they might exploit other vulnerabilities, conduct new cyberattacks, move laterally, or spread AI-generated disinformation.
Remember that adversarial AI attacks can affect the entire AI and ML model journey, from development to deployment.
There are two broad categories of adversarial AI attacks: white box attacks and black box attacks. White box attacks describe scenarios where threat actors possess a deep understanding of their victim’s AI systems. Conversely, black box attacks are those where adversaries aren’t thoroughly familiar with their victim’s AI and ML systems.
Let’s take a look at some specific types of adversarial AI attacks:
Evasion attacks: Cybercriminals interfere with AI/ML models by altering or editing input files. There are two subcategories of evasion attacks: nontargeted and targeted attacks. In nontargeted attacks, threat actors manipulate AI systems to make them generate wrong information. However, in these attacks, the exact nature of the malicious output doesn't matter. In targeted attacks, threat actors focus on making AI systems generate very specific malicious output.
Poisoning attacks: Adversaries inject inaccurate and malicious data into training datasets, affecting the system’s learning process and influencing future autonomous decisions and outputs.
Transfer attacks: Cybercriminals design adversarial AI models for a certain victim and then use that model to compromise the AI and ML systems of other potential targets.
Model extraction attacks: Threat actors steal proprietary machine learning algorithms to create their own illegal duplicates quickly and inexpensively.
Byzantine attacks: Threat actors cripple a distributed ML system by feeding various models and components with manipulative and contradictory data and inputs.
Trojan AI attacks: Malicious actors weave a trigger into the AI or ML model during the training phase. As a result, the AI model will function normally for the most part and only fully initiate the attack when triggered.
Model inversion attacks: Threat actors analyze the outputs of an AI/ML model to infer details of its training data. (This typically occurs in scenarios where threat actors cannot access training datasets.)
Membership inference attacks: Threat actors study AI and ML models to determine whether exploitable sensitive information about individuals or institutions exists within training data.
The methods that threat actors use to carry out the above adversarial AI and ML attacks depend on their objectives, the type of attack, and the victim’s infrastructure.
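To make the evasion category concrete, here is a minimal white box sketch in Python. It is an illustration under stated assumptions, not a method taken from this article: the "fraud detector" is a hypothetical logistic regression with made-up weights, and the perturbation is a one-step sign-gradient change in the style of the fast gradient sign method (FGSM), a well-known technique swapped in for demonstration. All names (`WEIGHTS`, `BIAS`, `fgsm_perturb`) are hypothetical.

```python
import math

# Hypothetical toy "fraud detector": logistic regression with made-up weights.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5


def predict_proba(x):
    """Probability that input x belongs to class 1 ("fraud")."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(x, true_label, epsilon):
    """Shift each feature by at most epsilon in the direction that raises the loss.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to the input is (p - y) * w, so no autodiff library is needed.
    """
    p = predict_proba(x)
    grad = [(p - true_label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


original = [1.0, 0.5, 0.2]  # the model scores this input above 0.5 (class 1)
adversarial = fgsm_perturb(original, true_label=1, epsilon=0.4)

print("original score:   ", round(predict_proba(original), 3))
print("adversarial score:", round(predict_proba(adversarial), 3))
```

Each feature moves by no more than 0.4, yet the score drops below the 0.5 decision threshold. This counts as a white box attack because the attacker reads the weights directly; a black box attacker would have to estimate the gradient from queries instead.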
What are some real-world examples of adversarial AI?
Now that we know the different types of adversarial AI attacks, let’s take a look at three real-world examples.
1. MadRadar: Engineers at Duke University hacked the radar systems of autonomous vehicles and made them hallucinate other cars. A potential scenario where hackers make vehicles perceive phantom cars could cause major accidents.
2. Google’s Search Generative Experience: Although the exact reasons behind its malicious outputs are unclear, Google’s new AI search engine sometimes misdirects users to malicious links containing malware, suggesting some form of adversarial AI. The most concerning aspect of this example is how realistic and believable the AI search engine is when presenting dangerous information.
3. Tesla: In 2019, in a controlled experiment, researchers interfered with the autonomous AI-powered capabilities of self-driving Tesla vehicles to make them drive into oncoming traffic, hallucinate lane markings, and start windshield wipers at the wrong time. In the hands of hackers, the ability to manipulate AI systems in autonomous vehicles can be a dangerous weapon.
Best practices to mitigate adversarial AI
The following are some important recommendations that can help businesses protect themselves from the mounting threat of adversarial AI.
24/7 monitoring and detection
Most cloud environments feature mission-critical AI systems. As such, businesses must implement round-the-clock surveillance (with no blind spots) to keep their cloud-based AI infrastructure secure and rapidly investigate and remediate any signs of adversarial AI.
Enterprises should analyze and collect examples of adversarial AI and integrate them into training data for their AI and ML models. By doing so, enterprises can enhance AI and ML model robustness and ensure that they can accurately identify anomalous patterns that might point to an adversarial AI attack.
Adversarial training involves incorporating adversarial examples into the model's training data. This helps the model learn to recognize and correctly classify manipulated inputs. The process typically involves:
Generating adversarial examples using known attack techniques
Adding these examples to the training dataset
Retraining the model on the augmented dataset
While effective against known attacks, this approach may slightly decrease performance on clean data and requires regular updates to defend against new attack methods.
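The three steps above can be sketched in a few lines of Python. This is a toy illustration under assumptions, not a production recipe: the model is a small logistic regression trained with batch gradient descent, the dataset is made up, and the adversarial examples come from a one-step sign-gradient (FGSM-style) attack chosen only as one example of a "known attack technique". All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, x):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, epochs=100, lr=0.1):
    """Fit logistic regression with batch gradient descent."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in samples:
            err = predict_proba((weights, bias), x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(model, x, y, epsilon):
    """Perturb x by epsilon per feature in the direction that raises the loss."""
    weights, _ = model
    err = predict_proba(model, x) - y
    return [xi + epsilon * sign(err * w) for xi, w in zip(x, weights)]


def accuracy(model, samples):
    hits = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in samples)
    return hits / len(samples)


# Made-up, linearly separable training set: (features, label).
clean = [([2.0, 2.0], 1), ([2.0, 3.0], 1), ([3.0, 2.0], 1), ([1.5, 2.5], 1),
         ([-2.0, -2.0], 0), ([-2.0, -3.0], 0), ([-3.0, -2.0], 0), ([-1.5, -2.5], 0)]

baseline = train(clean)
# Step 1: generate adversarial examples against the current model.
adversarial = [(fgsm(baseline, x, y, epsilon=0.5), y) for x, y in clean]
# Step 2: add them, with their correct labels, to the training set.
augmented = clean + adversarial
# Step 3: retrain on the augmented dataset.
robust = train(augmented)

print("clean accuracy:      ", accuracy(robust, clean))
print("adversarial accuracy:", accuracy(robust, adversarial))
```

Because this toy dataset is easy to separate, the sketch only shows the workflow. It does not reproduce the clean-data accuracy trade-off mentioned above, which tends to appear with more complex models and data.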
Strengthen AI and ML development environments
By reinforcing critical components of AI/ML development environments, businesses can strengthen their overall AI security posture. Some essential security practices include sanitizing training data, developing or leveraging more robust ML algorithms (like Byzantine-resistant algorithms), using AI to write or validate ML algorithms and reduce human error, and integrating security into AI pipelines as early as possible.
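As one illustration of what a Byzantine-resistant algorithm can look like, the Python sketch below compares plain averaging with a coordinate-wise median when combining model updates from several workers. The article does not prescribe a specific algorithm, so the coordinate-wise median is an assumption chosen for simplicity, and the update values and function names are made up.

```python
from statistics import mean, median


def aggregate_mean(updates):
    """Average each coordinate across workers (not robust to outliers)."""
    return [mean(column) for column in zip(*updates)]


def aggregate_median(updates):
    """Take the median of each coordinate across workers.

    The result stays close to the honest values as long as fewer than
    half of the workers send manipulated updates.
    """
    return [median(column) for column in zip(*updates)]


# Hypothetical gradient updates from four honest workers...
honest = [[0.9, -0.2], [1.0, -0.1], [1.1, -0.3], [1.0, -0.2]]
# ...and one compromised worker sending an extreme update.
byzantine = [[100.0, 50.0]]
updates = honest + byzantine

print("mean:  ", aggregate_mean(updates))
print("median:", aggregate_median(updates))
```

A single manipulated update drags the mean far away from the honest values, while the median remains within the range the honest workers reported.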
Optimize the architectures of AI/GenAI services
Enterprises should be very deliberate with what tenant architecture model they use for GenAI-incorporating services. The three fundamental types of GenAI tenant architectures are multi-tenant, single-tenant, and hybrid architectures. Businesses should spread some components of their GenAI services across multiple tenants and provide others with dedicated tenants. This is a critical means of reducing large-scale damage and facilitating a swift response during active adversarial AI attacks.
Input preprocessing
Applying preprocessing techniques to inputs before feeding them into the model can help detect and mitigate adversarial perturbations:
Feature squeezing: Reducing the precision of input features to remove adversarial noise
Input transformation: Applying random resizing, padding, or other transformations to disrupt carefully crafted adversarial inputs
Anomaly detection: Using statistical methods to identify inputs that deviate significantly from expected patterns
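Two of the techniques above, feature squeezing and anomaly detection, can be sketched with the Python standard library alone. This is a simplified illustration under assumptions: inputs are normalized to the range 0 to 1, squeezing is done by reducing bit depth, and the anomaly check is a basic z-score test. The sample values, the threshold, and the function names are all hypothetical.

```python
from statistics import mean, stdev


def squeeze_bit_depth(values, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in values]


def is_anomalous(value, reference, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the reference mean."""
    mu = mean(reference)
    sigma = stdev(reference)
    return abs(value - mu) > threshold * sigma


# Feature squeezing: small perturbations collapse onto the same coarse levels.
clean = [0.1, 0.4, 0.8]
perturbed = [0.12, 0.38, 0.83]  # clean input plus small adversarial noise
print(squeeze_bit_depth(clean, bits=2) == squeeze_bit_depth(perturbed, bits=2))

# Anomaly detection: compare an incoming feature with historical values.
history = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
print(is_anomalous(10.8, history))  # within the normal range
print(is_anomalous(25.0, history))  # far outside the normal range
```

Squeezing trades input precision for robustness, so the bit depth has to be tuned against the model's accuracy on legitimate inputs. A z-score test is only a starting point, and real deployments typically need detectors suited to their data.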
Implement AI security posture management (AI-SPM)
AI security can’t be an afterthought. To secure themselves from adversarial AI attacks, businesses must choose a unified cloud security tool in which AI-SPM is a central capability. If companies adopt a robust AI-SPM tool, adversarial AI attacks will have little to no effect on their AI adoption and implementation journey.
How Wiz can help address the threat of adversarial AI
Wiz AI-SPM provides full-stack visibility into AI pipelines and resources through its agentless AI-BOM (Bill of Materials) capabilities. This allows organizations to:
Discover all AI services, technologies, and SDKs in use across their environment
Detect shadow AI projects that may have been introduced without proper oversight
Gain a holistic view of the AI attack surface
Misconfiguration Detection
The solution enforces AI security best practices by:
Detecting misconfigurations in AI services like OpenAI and Amazon Bedrock using built-in rules
Extending security checks to the development pipeline through Infrastructure-as-Code (IaC) scanning
This helps prevent vulnerabilities that adversaries could exploit to compromise AI systems.
Attack Path Analysis
Wiz AI-SPM extends attack path analysis capabilities to AI resources, allowing organizations to:
Detect potential attack paths to AI models and services
Assess risks across vulnerabilities, identities, network exposures, data access, and more
Proactively remove critical AI attack paths before they can be exploited
Data Security for AI
To protect against data poisoning and other adversarial attacks on training data, Wiz offers:
Automatic detection of sensitive AI training data
Out-of-the-box Data Security Posture Management (DSPM) controls for AI