Exposure Report: 65% of Leading AI Companies Found with Verified Secret Leaks

How secure are top private AI companies? Find out from our scans and disclosures.

Overview

AI companies are racing ahead, but many are leaving their secrets behind. We looked at 50 leading AI companies and found that 65% had leaked verified secrets on GitHub. Think API keys, tokens, and sensitive credentials, often buried deep in deleted forks, gists, and developer repos most scanners never touch. Some of these leaks could have exposed organizational structures, training data, or even private models. For teams building the future of AI, speed and security have to move together.

Hypothesis and Target Population

This is the second post in our series on AI-driven secret leaks. 

Check out the companion talk presented at OWASP AppSec Global USA on November 6th!

In our previous blog, we started from the assumption that any company with a big enough GitHub footprint has exposed secrets. Our results showed the prevalence of AI secrets and new leak vectors.

This blog flips the script to analyze the security practices of prominent AI startups. Our new hypothesis? Any AI company with a big enough GitHub footprint DEFINITELY has exposed secrets.

We focused our attention on the private AI companies included in the Forbes AI 50, because it's one of the most respected benchmarks for innovation in AI. This list consistently highlights the companies shaping what's next, from established leaders like Anthropic to emerging players like Glean and Crusoe. It’s a “who’s who” of companies disrupting the market in new and exciting ways, making it an ideal lens to explore how security fits in. 

Methodology - How We Scanned GitHub for Exposures

Traditional secrets scans against the relevant GitHub organizations weren’t going to cut it here. That’s a commoditized approach we felt would be redundant in the face of: (1) scans from GitHub’s integrated secrets scanner; (2) scans from corporate security tools; and (3) commodity scans by third-party companies that perform automated scans for marketing purposes.

To identify differentiated attack surface, we focused on three dimensions: Depth, Perimeter, and Coverage.

Secrets leakage has often been described as an iceberg: a set of known risks exposed publicly in GitHub organizations, but also deeper risks below the surface in commit history, deleted forks, workflow logs, and so on. We believe "topology" is also relevant: there is a difference between "secrets at the summit" (the main GitHub org) and those buried at the edges (i.e. public repos of org members), which carry a lower, yet still non-zero, probability of impact.

Depth (searching for new sources): Regular GitHub search only captures "secrets on the surface." Our deep scan includes full commit history, commit history on forks, deleted forks, workflow logs, and gists (which can also have forks!). We’ve expanded our research scanning tools to support all of these secret sources, uncovering the secrets that are traditionally left “under the water surface.”
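
To make the depth dimension concrete, below is a minimal sketch of history-aware scanning: it walks a cloned repository's entire commit history across all branches and greps every patch for candidate secrets. The regex patterns are illustrative placeholders (not our production rules), and GitHub-specific sources like deleted forks, workflow logs, and gists would each need their own collectors:

```python
# A minimal sketch of "depth" scanning: walk a cloned repository's entire
# commit history (every branch, every patch), not just the files at HEAD.
# The patterns below are illustrative placeholders, not production rules.
import re
import subprocess

CANDIDATE_PATTERNS = [
    re.compile(r"hf_[A-Za-z0-9]{30,}"),   # HuggingFace-style token (assumed shape)
    re.compile(r"sk-[A-Za-z0-9]{20,}"),   # generic "sk-"-prefixed API key
]

def scan_full_history(repo_path: str) -> list[tuple[str, str]]:
    """Return (commit_sha, candidate_secret) pairs found anywhere in history."""
    # `git log --all -p` emits every patch on every ref, surfacing secrets
    # that were committed once and later removed from the working tree.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--format=commit %H"],
        capture_output=True, text=True, check=True,
    ).stdout

    hits, sha = [], "unknown"
    for line in log.splitlines():
        if line.startswith("commit "):
            sha = line.split()[1]
            continue
        for pattern in CANDIDATE_PATTERNS:
            hits.extend((sha, match) for match in pattern.findall(line))
    return hits

if __name__ == "__main__":
    for sha, secret in scan_full_history("./cloned-repo"):
        print(f"{sha[:12]}  {secret}")
```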

Perimeter (expanding to adjacent discovery): Beyond the core organization, organization members and contributors can inadvertently check company-related secrets into their own public repositories and gists. 

How can we find these org members? Well, we start with the public Organization Members, and then fan outward by identifying “candidate members” through:

1 - Organization followers

2 - Searching for accounts referencing the organization name in their metadata (e.g., johndoe-companyname accounts)

3 - Code contributors, including using the GHArchive to collate activity

4 - Correlations in related networks like HuggingFace and npm

Once “candidate members” have been identified, they can be triaged and confirmed through manual and automated methods.
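
For illustration, here is a minimal sketch of the starting point (public org members) and the second signal (accounts referencing the organization name), using the standard GitHub REST API. Pagination, rate limiting, authentication, follower enumeration, and the GHArchive/HuggingFace/npm correlations are omitted for brevity:

```python
# A minimal sketch of "perimeter" expansion: gathering candidate members for
# an organization from public GitHub signals. The /orgs/{org}/members and
# /search/users endpoints are standard REST API routes; everything else
# (triage, scanning candidates' repos and gists) happens downstream.
import requests

API = "https://api.github.com"
HEADERS = {"Accept": "application/vnd.github+json"}  # add an auth token for real use

def public_org_members(org: str) -> set[str]:
    """Public members the org itself exposes."""
    resp = requests.get(f"{API}/orgs/{org}/members", headers=HEADERS)
    resp.raise_for_status()
    return {member["login"] for member in resp.json()}

def accounts_referencing_org(org: str) -> set[str]:
    """Accounts whose login/metadata mention the org name (e.g. johndoe-companyname)."""
    resp = requests.get(f"{API}/search/users", params={"q": org}, headers=HEADERS)
    resp.raise_for_status()
    return {user["login"] for user in resp.json().get("items", [])}

def candidate_members(org: str) -> set[str]:
    # Union of signals; each candidate still needs manual/automated triage
    # before their personal repos and gists are scanned for company secrets.
    return public_org_members(org) | accounts_referencing_org(org)

if __name__ == "__main__":
    print(sorted(candidate_members("example-org")))  # placeholder org name
```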

Coverage (a.k.a. new secret types): In the first blog, we compiled a table of AI-related secret types that are often missed by traditional scanners:

Prevalence  | Platform
Most common | Perplexity, WeightsAndBiases, Groq, NVIDIA API
Less common | Tavily, Langchain, NVIDIA-NGC, Cohere, Pinecone, Clarifai, Gemini, AI21 Labs, IBM Watsonx AI, Cerebras, FriendliAI, FireworksAI, TogetherAI
AI Tigers   | Zhipu AI, Moonshot AI, Baichuan Intelligence, 01.AI, StepFun, MiniMax

We continue to see success by finding specific secret types missed by alternative tools. 
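
As a sketch of what extended coverage can look like, here are a few illustrative detection rules. The “hf_” prefix matches HuggingFace's token format; the other patterns are assumptions for illustration and would need validation against each vendor's actual key shape:

```python
# A minimal sketch of extending detection coverage with AI-platform secret
# types. Only the "hf_" prefix is a known token format; the remaining
# patterns are assumed shapes used here for illustration.
import re

AI_SECRET_RULES = {
    "huggingface": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),
    "groq":        re.compile(r"\bgsk_[A-Za-z0-9]{20,}\b"),   # assumed prefix
    "perplexity":  re.compile(r"\bpplx-[A-Za-z0-9]{40,}\b"),  # assumed prefix
    # Assumed 40-hex shape; collides with git SHAs, so matches need verification.
    "wandb":       re.compile(r"\b[a-f0-9]{40}\b"),
}

def detect_ai_secrets(text: str) -> list[tuple[str, str]]:
    """Return (platform, candidate_secret) pairs found in a blob of text."""
    findings = []
    for platform, rule in AI_SECRET_RULES.items():
        for match in rule.findall(text):
            findings.append((platform, match))
    return findings

# Quick smoke test with a synthetic token.
print(detect_ai_secrets("token = 'hf_" + "a" * 34 + "'"))
```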

Findings and Analysis: Hidden Exposures Across the AI50

After scanning the Forbes AI 50 companies, minus the few without a GitHub presence, we got a stark result: 

Almost two-thirds of the AI companies analyzed had a verified secrets leak.

In total, the companies with verified secret leaks are valued at over $400B.

Among the companies with verified leak instances, the smallest footprint belonged to a company with zero public repositories and 14 organization members. This shows how our methodology can surface hidden risk even for companies without an obvious public footprint.

Conversely, the largest footprint without an exposed secret belonged to a company with 60 public repos and 28 organization members. Does this mean that if you have fewer than 60 public repos you don’t need a secret scanner? Not really. In our opinion, the more probable explanation is that this company already has a solid secrets management strategy in place. It’s a positive indicator that this is a preventable issue, not an inevitable artifact of scale.

The overall secret-type distribution among AI companies was similar to the general findings in Part 1, with AI-related secrets such as WeightsAndBiases, ElevenLabs, and HuggingFace among the most common impactful secrets.

Disclosures and Interesting Leak Cases

While leaks in major AI companies like ElevenLabs and LangChain were disclosed and promptly fixed, the overall disclosure landscape is challenging.

Almost half of disclosures either failed to reach the target or received no response. Many companies lacked an official disclosure channel, failed to reply, and/or failed to resolve the issue.

On a more positive note, many of the leaks we reported were acknowledged and addressed promptly. Here are a few example cases:

LangChain - multiple LangSmith API keys in .py, .ipynb, and .env files, including organization-level enterprise_legacy tier keys with org:manage and org:read permissions to the LangChain Inc organization. Beyond the functional impact (access to the org’s observability platform), LangSmith org API keys allow listing of organization members, information that threat actors consider highly valuable.

ElevenLabs – an enterprise-tier ElevenLabs API key in a plaintext mcp.json file. This speaks to the relationship between vibe coding and secrets leakage that we identified in the previous blog.

AI50 Company (no disclosure permission) - a HuggingFace token in a deleted fork allowing access to about 1K private models. In addition, we found multiple WeightsAndBiases API keys belonging to the organization’s employees that exposed the training data for many private models.
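
For context on how the blast radius of such a token can be assessed during triage, here is a minimal sketch using the official huggingface_hub client. The token and author are placeholders, and whether private repos actually appear depends on the scopes the leaked token carries:

```python
# A minimal sketch of assessing a leaked HuggingFace token's blast radius
# during disclosure triage, using the official huggingface_hub client.
# The token below is a placeholder, not a real credential.
from huggingface_hub import HfApi

LEAKED_TOKEN = "hf_placeholder"  # hypothetical leaked token

api = HfApi(token=LEAKED_TOKEN)

# whoami() identifies the user/org behind the credential.
identity = api.whoami()
owner = identity.get("name")
print("Token belongs to:", owner)

# When authenticated, model listings include private repos the token can
# read; counting them approximates the impact of the leak.
models = list(api.list_models(author=owner))
private = [m for m in models if m.private]
print(f"{len(private)} private models reachable out of {len(models)} total")
```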

Conclusions

To conclude, we were not able to find a leaked secret in every AI50 company. However, we believe the following takeaways are vital, especially for AI companies at the beginning of their journey: 

  • Mandate Public VCS Secret Scanning: If you use a public Version Control System (VCS), deploy secret scanning now (a minimal enablement sketch follows this list). This is your immediate, non-negotiable defense against easy exposure. Even companies with the smallest footprints can be exposed to secret leaks, as we have just shown.

  • Prepare for Disclosure: Disclosure channels are an essential element of a security program, and for AI innovators they're especially necessary from inception. (On that note, we can recommend this blog post suggesting staffing guidelines for startups, which undoubtedly apply to AI startups as well.)

  • Consider Proprietary Secret Detection: AI service providers must prioritize detection for their own secret types. Too many shops leak their own API keys while "eating their own dog food." If your secret format is new, proactively engage vendors and the open source community to add support.
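
As promised above, here is a minimal enablement sketch for the first takeaway, assuming the standard GitHub REST API: it programmatically turns on secret scanning and push protection for a single repository. The repo slug and token are placeholders, and org-wide rollout plus error handling are left out:

```python
# A minimal sketch of enabling GitHub secret scanning and push protection on
# one repository via the REST API (PATCH /repos/{owner}/{repo}). The token
# must have admin rights on the target repo; values below are placeholders.
import requests

GITHUB_TOKEN = "ghp_placeholder"  # hypothetical admin-scoped token
REPO = "example-org/example-repo"

resp = requests.patch(
    f"https://api.github.com/repos/{REPO}",
    headers={
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "security_and_analysis": {
            "secret_scanning": {"status": "enabled"},
            "secret_scanning_push_protection": {"status": "enabled"},
        }
    },
)
resp.raise_for_status()
print("Secret scanning and push protection enabled for", REPO)
```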

In addition, for all companies we strongly recommend:

  • Treat your employees as part of your company’s attack surface and your VCS org members and contributors as an extension of your SDLC infrastructure. We recommend creating a VCS member policy to apply during the onboarding process (e.g., create a new GitHub user without revealing the name of the employer, use MFA for personal orgs, and keep all personal activity in personal accounts).

  • Be ready to adjust your scanning policy as AI use cases develop, covering new file types and secret vectors. Continuously revisit and extend your scanner’s secret-type coverage to include the new generation of AI platform tokens; as more secret types reach the market, the scanner must be easily extendable.

While modern secret scanning has elevated the “defense waterline,” our investigation clearly shows that threats lurk deep below the surface - in deleted forks, gists, and developer repos. For AI innovators, the message is clear: speed cannot compromise security. We urge the industry to adopt the “Depth, Perimeter, and Coverage” mindset detailed here to decisively raise the defense standard and secure the next generation of AI.
