A large-scale analysis of public GitLab Cloud repositories has revealed more than 17,000 exposed credentials and API keys that are still valid and usable. The findings highlight how widespread hardcoded secrets in source code remain and how easily they can be abused to compromise corporate infrastructure, cloud services, and production data.
Scale of the GitLab Cloud secrets exposure
The research examined 5.6 million public GitLab Cloud projects. Within this dataset, investigators identified 17,430 confirmed, currently active secrets belonging to more than 2,800 unique domains. In this context, “secrets” include API keys, passwords, access tokens, database connection strings and similar sensitive data that should be stored only in dedicated secret management systems, never directly in source code.
For comparison, a similar study of 2.6 million Bitbucket repositories uncovered 6,212 active secrets. The leak density (the number of secrets per repository) was reported to be about 35% higher in GitLab Cloud, which may point to weaker secret management practices or less mature security controls in parts of the GitLab ecosystem.
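The density comparison can be reproduced from the figures quoted above. This is a minimal sketch that assumes the rounded repository counts given in the text (5.6 million and 2.6 million); the exact counts behind the reported percentage are not stated here.

```python
# Figures quoted in the text; repository counts are rounded to 0.1 million.
gitlab_secrets, gitlab_repos = 17_430, 5_600_000
bitbucket_secrets, bitbucket_repos = 6_212, 2_600_000


def leak_density(secrets: int, repos: int) -> float:
    """Confirmed active secrets per repository."""
    return secrets / repos


gitlab_density = leak_density(gitlab_secrets, gitlab_repos)
bitbucket_density = leak_density(bitbucket_secrets, bitbucket_repos)

# Relative difference of GitLab density over Bitbucket density.
relative_increase = gitlab_density / bitbucket_density - 1

print(f"GitLab:    {gitlab_density * 1000:.2f} secrets per 1,000 repositories")
print(f"Bitbucket: {bitbucket_density * 1000:.2f} secrets per 1,000 repositories")
print(f"Relative difference: {relative_increase:.0%}")
```

With these rounded inputs the difference comes out at roughly 30%, so the reported figure presumably rests on unrounded repository counts. Either way, the direction of the comparison is the same.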
From a threat perspective, each valid secret represents a direct path into internal systems. Attackers do not need to exploit a software vulnerability if they can simply authenticate as a legitimate user by abusing an exposed key.
How GitLab repositories were scanned for exposed secrets
The investigation relied on TruffleHog, an open-source tool specifically designed to detect secrets hidden in Git history and file contents. TruffleHog looks for high-entropy strings, known API key formats and other patterns that indicate the presence of credentials or tokens.
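To illustrate what "high-entropy" means in this context, the sketch below computes the Shannon entropy of a string and flags long, random-looking tokens. It is not TruffleHog's implementation; the function names, the minimum length of 20 and the threshold of 4.0 bits per character are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_random(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Heuristic: long tokens with high per-character entropy resemble keys.

    min_length and threshold are illustrative values, not tuned defaults.
    """
    return len(token) >= min_length and shannon_entropy(token) >= threshold


# Both strings are made up for this example; neither is a real credential.
print(looks_random("password_password_password"))  # repetitive, low entropy
print(looks_random("k9Xv2LqP7mZr4TbY8sNc1HdW"))    # 24 distinct characters
```

Entropy alone produces false positives (hashes, UUIDs, minified code), which is why scanners typically combine it with known key formats and, where possible, a live validity check.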
Automated cloud-based scanning architecture
To process millions of repositories at scale, the researcher built a fully automated pipeline in AWS. A Python script used the public GitLab API to enumerate projects by ID and enqueue scan tasks into Amazon Simple Queue Service (SQS). For each queued repository, an AWS Lambda function was invoked to clone the project, run TruffleHog, and log any detected secrets.
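The enqueue step of such a pipeline might look roughly like the sketch below. It is not the researcher's script: the function names and the message body layout are assumptions. The only AWS call used is `send_message_batch`, which in boto3 takes `QueueUrl` and `Entries` and accepts at most 10 entries per request.

```python
import json
from typing import Iterable, Iterator, List

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def batched(ids: Iterable[int], size: int = SQS_MAX_BATCH) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` project IDs."""
    batch: List[int] = []
    for project_id in ids:
        batch.append(project_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def enqueue_scan_tasks(sqs_client, queue_url: str, project_ids: Iterable[int]) -> int:
    """Send one scan task per project ID; return how many messages were accepted.

    `sqs_client` is expected to behave like boto3.client("sqs").
    The message body format is hypothetical.
    """
    sent = 0
    for batch in batched(project_ids):
        entries = [
            {"Id": str(pid), "MessageBody": json.dumps({"project_id": pid})}
            for pid in batch
        ]
        response = sqs_client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        sent += len(response.get("Successful", []))
    return sent
```

Passing the client in as a parameter keeps the function testable without AWS access. In a real run it would be created with `boto3.client("sqs")`, and a worker on the other side of the queue would clone each project and run the scanner.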
The full scan of 5.6 million repositories took a little over a day and cost approximately USD 770 in compute resources. This demonstrates that large-scale secret hunting is no longer limited to governments or large corporations. Any sufficiently motivated actor—from independent researchers to criminal groups—can perform similar scans across public code hosting platforms.
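A quick back-of-the-envelope calculation shows how cheap this is per repository. The sketch assumes exactly 24 hours of runtime; since the scan took "a little over a day", the throughput it prints is an upper bound.

```python
total_cost_usd = 770        # approximate compute cost quoted in the text
repositories = 5_600_000    # rounded repository count quoted in the text
assumed_runtime_s = 24 * 3600  # assumption: exactly one day of runtime

cost_per_thousand = total_cost_usd / repositories * 1000
repos_per_second = repositories / assumed_runtime_s

print(f"Cost per 1,000 repositories: USD {cost_per_thousand:.2f}")
print(f"Average throughput: about {repos_per_second:.0f} repositories per second")
```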
What kinds of secrets were exposed in GitLab
Most of the exposed secrets appeared in commits from 2018 onwards, but the scan also identified credentials created as far back as 2009 that were still valid. Such long-lived secrets often belong to legacy systems that are poorly monitored and rarely updated, making them particularly attractive to attackers.
The most frequently exposed credentials were Google Cloud Platform (GCP) keys, with more than 5,200 separate instances. In addition, the analysis found:
— MongoDB credentials and connection strings. These can enable direct access to production databases, including sensitive customer or operational data.
— Tokens for Telegram bots. Compromised bot tokens can be used to send phishing messages or malicious content from what appear to be legitimate channels.
— Keys for OpenAI and other AI platforms. Besides financial impact through abuse of paid APIs, such keys can be used to exfiltrate proprietary prompts or training data.
— More than 400 GitLab-related secrets. These may enable unauthorized access to GitLab APIs, project automation, or even administrative functionality depending on scope.
In many cases, the combination of cloud credentials, database access and messaging bot tokens is sufficient to fully compromise an organization’s environment—bypassing traditional perimeter defenses and endpoint security controls.
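Several of the credential types listed above have recognizable formats, which is what makes pattern-based detection practical. The patterns below are approximations of commonly documented formats, written for illustration; they are not the scanner's actual detectors, and `PATTERNS` and `find_secrets` are hypothetical names.

```python
import re
from typing import List, Tuple

# Approximate, illustrative patterns. Real detectors are stricter and
# also verify candidates against the issuing service.
PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "telegram_bot_token": re.compile(r"\b[0-9]{8,10}:[A-Za-z0-9_\-]{35}"),
    "mongodb_uri": re.compile(r"mongodb(?:\+srv)?://[^\s:@/]+:[^\s@/]+@[^\s/]+"),
    "gitlab_pat": re.compile(r"glpat-[0-9A-Za-z_\-]{20}"),
}


def find_secrets(text: str) -> List[Tuple[str, str]]:
    """Return (kind, matched_text) pairs for every pattern hit in `text`."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)))
    return hits


# Placeholder values built for the example; none is a real credential.
sample = (
    'API_KEY = "' + "AIza" + "A" * 35 + '"\n'
    'DB = "mongodb+srv://user:pass@cluster0.example.net/app"\n'
)
for kind, value in find_secrets(sample):
    print(kind, "->", value[:12] + "...")
```

A pattern hit only shows that a string has the right shape. The 17,430 figure in the research refers to secrets confirmed as active, which requires an additional validation step.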
Notification efforts, bug bounty programs and remaining risk
Because the leaked secrets were tied to 2,804 distinct domains, manually contacting all affected organizations was impractical. To streamline notification, the researcher used web search, custom Python scripts and automated message generation via a large language model. This semi-automated approach made it possible to alert a significant number of impacted entities.
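The grouping step of such a notification workflow can be sketched as follows. The finding fields (`domain`, `secret_type`, `repo_url`) and both function names are hypothetical, and a fixed template stands in for the language-model drafting described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_domain(findings: Iterable[dict]) -> Dict[str, List[dict]]:
    """Collect findings per owning domain so each organization gets one notice."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding["domain"].lower()].append(finding)
    return dict(grouped)


def draft_notice(domain: str, findings: List[dict]) -> str:
    """Build a plain-text notice from a fixed template (stand-in for LLM drafting)."""
    lines = [
        f"Subject: Exposed credentials in public repositories linked to {domain}",
        "",
        f"We found {len(findings)} exposed secret(s) that appear to belong to {domain}:",
    ]
    lines += [f"- {f['secret_type']} in {f['repo_url']}" for f in findings]
    lines.append("Please revoke and rotate these credentials as soon as possible.")
    return "\n".join(lines)


# Made-up findings for illustration.
findings = [
    {"domain": "Example.com", "secret_type": "MongoDB URI", "repo_url": "https://gitlab.example/a"},
    {"domain": "example.com", "secret_type": "GCP key", "repo_url": "https://gitlab.example/b"},
    {"domain": "example.org", "secret_type": "Telegram bot token", "repo_url": "https://gitlab.example/c"},
]
for domain, items in group_by_domain(findings).items():
    print(draft_notice(domain, items), end="\n\n")
```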
Many organizations responded quickly by revoking exposed keys, rotating credentials and reinforcing internal policies. Some incidents were submitted through bug bounty programs, generating approximately USD 9,000 in rewards for the researcher. However, a substantial proportion of secrets remain publicly accessible, underscoring the systemic nature of the problem and the lack of consistent secret management across development teams.
DevSecOps lessons: how to prevent secret leakage in Git repositories
1. Prohibit storing secrets in code repositories. Production credentials, access tokens and API keys should never be committed to Git. Instead, use dedicated secret management solutions—such as cloud provider secret stores or HashiCorp Vault—and reference them via environment variables or identifiers in code.
2. Integrate automated secret scanning into CI/CD. Tools like TruffleHog, Gitleaks and similar scanners should be part of the continuous integration pipeline. Every commit and merge request should be checked for secrets before being merged into main branches.
3. Enforce regular key rotation and short-lived tokens. Even if a secret is accidentally exposed, strict time-to-live (TTL) policies and automated rotation significantly reduce the attack window. Where possible, use ephemeral credentials and fine-grained access scopes.
4. Train developers in secure coding and secret hygiene. Many leaks originate from quick experiments, test code or debugging shortcuts. Development teams must understand that public repositories are never suitable for “temporary” keys, and private repositories offer limited protection if accounts or access tokens are compromised.
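The first three recommendations can each be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production control: all function names are hypothetical, the `SUSPICIOUS` pattern is a deliberately simple stand-in for a real scanner such as TruffleHog or Gitleaks, and the TTL check assumes timezone-aware timestamps.

```python
import os
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional


# Recommendation 1: read secrets from the environment, not from the source tree.
def get_secret(name: str) -> str:
    """Return the named secret, failing loudly if it was not injected."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Required secret {name!r} is not set")
    return value


# Recommendation 2: flag added lines in a merge request that look like secrets.
# Illustrative pattern only: a quoted value of 8+ characters assigned to a
# name containing api_key, secret, token or password.
SUSPICIOUS = re.compile(
    r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
)


def added_lines_with_secrets(unified_diff: str) -> List[str]:
    """Return added lines ('+' prefix, excluding '+++' headers) that match."""
    flagged = []
    for line in unified_diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            if SUSPICIOUS.search(line):
                flagged.append(line[1:].strip())
    return flagged


# Recommendation 3: treat a credential as unusable once its TTL has elapsed.
def is_expired(issued_at: datetime, ttl: timedelta,
               now: Optional[datetime] = None) -> bool:
    """True if `ttl` has passed since `issued_at` (timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    return now >= issued_at + ttl
```

In a CI job, a non-empty result from `added_lines_with_secrets` would fail the pipeline before the merge. The expiry check shows why short TTLs matter: a leaked credential that has already expired gives an attacker nothing to authenticate with.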
The state of public GitLab Cloud repositories revealed by this research is a clear signal that secret management must be a core element of every DevSecOps program. Organizations that rely on cloud services, databases and third-party APIs should regularly audit their repositories, automate secret scanning, and establish robust playbooks for revoking and rotating credentials. The sooner these practices are embedded into everyday development workflows, the lower the likelihood that the next public report on exposed secrets will involve their own infrastructure.