HackerOne AI Dispute Highlights How Bug Bounty Platforms Handle Vulnerability Data

CyberSecureFox 🦊

One of the world’s largest bug bounty platforms, HackerOne, has become the focus of an industry‑wide debate on how artificial intelligence (AI) and researcher data should be used in cybersecurity. The launch of a new AI‑driven service triggered concerns in the bug bounty community that vulnerability reports might be used as training data for AI models without explicit researcher consent.

Agentic PTaaS: AI‑driven pentesting and questions about training data

The discussion was sparked by HackerOne’s new product, Agentic PTaaS (Penetration Testing as a Service). The company promotes it as “continuous security testing with autonomous AI agents and human expertise”. According to public descriptions, these AI agents rely on a proprietary exploit knowledge base accumulated over years of testing real enterprise systems.

The reference to a multi‑year knowledge base immediately raised a key question: what exactly is inside this knowledge base, and does it include private bug bounty reports submitted by researchers? Several bug hunters publicly voiced fears that their work could have been incorporated into AI training pipelines without a clear opt‑in/opt‑out mechanism or explicit consent.

Researchers note that when “white‑hat” hackers start doubting whether legal frameworks and platform terms of service protect their interests, participation in bug bounty programs becomes less attractive. In the worst case, valuable expertise may shift away from responsible disclosure and towards more opaque markets.

HackerOne’s official position: no generative AI training on researcher reports

In response to mounting criticism, HackerOne CEO Kara Sprague published a detailed statement on LinkedIn. She emphasized that HackerOne does not train generative AI models—neither in‑house nor third‑party—on researcher reports or on customers’ confidential data.

Sprague also clarified that such data is not used for fine‑tuning existing AI models. According to her, contracts with AI vendors explicitly prohibit them from storing or reusing researcher and customer data to train their own models. This aligns with current best practices for enterprise use of large language models (LLMs), where major cloud providers increasingly offer “no‑training” modes designed to keep customer inputs out of model training corpora.

Within its broader AI strategy, HackerOne is developing an agent‑based system called HackerOne Hai. The company states that Hai is intended to accelerate report triage, assist in drafting remediation guidance, and streamline payments, while preserving the integrity and confidentiality of bug bounty submissions.

Competitors respond: AI transparency as a trust differentiator

Intigriti: researcher ownership and limited AI use

The controversy prompted rival platforms to publicly clarify their own AI and data policies. Intigriti founder and CEO Stijn Jans stressed that the company treats the output of security researchers as their intellectual property and reiterated to the community that “your work belongs to you”.

According to Jans, Intigriti uses AI tools primarily to improve report handling efficiency and communication between researchers and clients, not to build closed, proprietary AI models based on community findings. This positioning is meant to reassure bug hunters that automation will support, rather than appropriate, their expertise.

Bugcrowd: contractual bans on AI training with researcher data

Another leading platform, Bugcrowd, has already codified its stance in its official terms of use. Its terms prohibit third parties from training AI models, including LLMs, on researcher or customer data. At the same time, researchers themselves are required to use generative AI responsibly: fully auto‑generated, unverified vulnerability reports are not accepted.

This approach illustrates a broader trend in the bug bounty market: platforms aim to capture the benefits of AI—faster triage, better matching of reports to stakeholders, automated duplicate detection—while preserving community trust through strict limits on data reuse.
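To make the duplicate‑detection benefit concrete, here is a minimal, hypothetical Python sketch of how near‑duplicate submissions could be flagged by comparing report text with TF‑IDF cosine similarity. The sample reports, the threshold, and the technique itself are illustrative assumptions, not a description of any platform's actual pipeline.

```python
# Illustrative sketch only: flag likely duplicate vulnerability reports by text similarity.
# The reports, threshold, and method are hypothetical, not any vendor's real triage logic.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

existing_reports = [
    "Stored XSS in the profile bio field allows script execution for other users.",
    "IDOR on /api/invoices/{id} exposes invoices belonging to other accounts.",
]
incoming_report = "Stored cross-site scripting in the profile bio runs JavaScript in victims' browsers."

# Vectorize all reports together so they share one vocabulary.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(existing_reports + [incoming_report])

# Compare the new submission against every existing report.
scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
DUPLICATE_THRESHOLD = 0.35  # hypothetical cut-off; a real system would tune this carefully

for report, score in zip(existing_reports, scores):
    status = "possible duplicate" if score >= DUPLICATE_THRESHOLD else "likely distinct"
    print(f"{score:.2f}  {status}: {report[:60]}")
```

In practice, production systems would likely combine such text similarity with structured signals (affected asset, vulnerability class, endpoint) before routing a report to a human triager.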

Why training AI on vulnerability reports is so controversial

Bug bounty reports frequently contain high‑impact vulnerabilities, detailed information about corporate infrastructure, log excerpts, and source code snippets. Feeding such material into AI training sets creates several layers of risk.

First, there is the problem of knowledge leakage through models. Academic studies on LLMs have shown that models can sometimes memorize and regurgitate sensitive fragments from their training data. For zero‑day vulnerabilities—previously unknown flaws—such leakage can be particularly damaging if hostile actors gain indirect access to that knowledge.

Second, using extensive security data for AI training can conflict with the principle of data minimization, which is embedded in regulations such as the GDPR and reflected in the EU AI Act. Organizations must be able to clearly justify why they collect, retain, and process sensitive data, especially when it relates to security posture or personal information.

Third, there is a community and incentive dimension. If researchers feel their highly specialized work is being commercialized as training data without appropriate control, transparency, or compensation, their motivation to participate in coordinated disclosure can erode. Industry reports such as the Verizon Data Breach Investigations Report (DBIR) and ENISA Threat Landscape repeatedly show that attackers heavily exploit known but unpatched vulnerabilities. Reduced engagement from ethical hackers could therefore translate into a tangible drop in organizations’ cyber resilience.

Amid the discussion, HackerOne has announced upcoming updates to its Terms and Conditions to formalize the assurances given in public statements and address residual concerns. Clear contractual language around AI use is becoming essential to avoid reputational and legal risk.

The debate around HackerOne underlines a broader shift: in the era of AI‑assisted security, transparency about how vulnerability data is collected, stored, and used is emerging as a core competitive factor for bug bounty platforms. Organizations and researchers alike should carefully review AI policies and privacy sections in platform terms, ask direct questions about data flows and model training, and insist on explicit consent for any use of reports in AI systems. Platforms that prioritize clear, enforceable rules today are likely to become the most trusted partners in safeguarding digital infrastructure tomorrow.
