Microsoft has introduced two open-source tools — RAMPART and Clarity — designed to test the security of AI agents directly in the development process. RAMPART enables engineers to write and run tests for agents’ resilience to attacks such as cross-prompt injections and data exfiltration, while Clarity helps teams identify design risks even before any code is written. Both tools are available on GitHub and are aimed at developers building autonomous AI systems.
What RAMPART is
RAMPART (Risk Assessment and Measurement Platform for Agentic Red Teaming) is a security testing framework built on top of Pytest. It allows developers to create test scenarios that simulate attacks on AI agents and evaluate the results. The stated testing categories include:
- Cross-prompt injections — situations where untrusted data enters the AI system indirectly through processed sources such as email, files, or web pages;
- Unintended behavioral regressions — cases where the agent starts behaving in ways that were not intended;
- Data exfiltration — unauthorized extraction of information from the system.
An adapter is required to connect the agent under test to the framework. RAMPART evaluates test results and generates a report, covering both adversarial and normal-operation scenarios.
Its positioning relative to its predecessor — PyRIT (Python Risk Identification Tool), which Microsoft released more than two years ago — is particularly important. According to the company, PyRIT is optimized for black-box research of already built systems, whereas RAMPART is created for engineers working on a system during its construction. In effect, RAMPART is built on top of PyRIT, turning exploratory red teaming findings into reproducible engineering tests.
The role of Clarity in the development lifecycle
Clarity tackles a different task: it is a tool for structured analysis of design decisions at the early stages. Microsoft describes it as “an AI-powered thinking partner that pushes back” — it guides the team through refining the problem, exploring solution options, analyzing possible failures, and recording the decisions made.
The idea is to capture why a particular decision was made — for example, what tool access an agent receives — before the system is built. As noted in the announcement, the goal is to give product managers and engineers the ability to validate their assumptions at the start of the project, when changing course is still inexpensive.
Strategic context: from one-off checks to a continuous process
The key idea behind both tools is the shift from one-off AI security audits to a set of “living artifacts” that accompany the system throughout its entire lifecycle. Microsoft states that it intends to make incidents reproducible and remediation measures verifiable, scaling the knowledge gained from red teaming into executable engineering assets.
This reflects a broader trend: as AI agents gain access to real tools and data, traditional security approaches — checks at the final stage — are becoming insufficient. Agent-specific threats such as cross-prompt injections are not covered by standard application security testing tools.
Practical recommendations
- For teams developing AI agents: assess whether RAMPART can be integrated into your existing CI/CD pipeline. A Pytest-based framework lowers the adoption barrier for teams already using the Python ecosystem.
- At the design stage: use Clarity to document security-related assumptions before implementation begins — especially those concerning agents’ access to external tools and data sources.
- For existing PyRIT users: RAMPART complements rather than replaces PyRIT. Consider the combination: PyRIT for exploratory vulnerability discovery, RAMPART for regression testing of identified issues.
It should be taken into account that the stated capabilities of both tools are based solely on Microsoft’s description — at the time of publication, no independent evaluation of their effectiveness is available. Teams planning adoption are advised to start with pilot testing on non-critical agents, assess detection quality on their own scenarios, and only then scale usage to production systems.