Introducing Spec27: Spec-driven validation for AI applications and agents

By Steven Willmott

Launching in Early Access today!

Testing AI systems in production is a huge challenge. Manual testing quickly becomes overwhelming, there are often only vague definitions of what “good” looks like, and every type of agent seems to need a different testing framework.

At the same time, AI is moving rapidly into real workflows, customer interactions, and operational processes, increasing both the impact of failure and the difficulty of detecting it.

Today we’re introducing Spec27, an automated specification-driven validation platform for AI applications and agents, designed to address this gap.

Spec27 allows teams to define expected behaviour in a structured specification, then automatically generate and run tests against it both before deployment and for long term monitoring. These tests cover both robustness and security, ensuring systems perform reliably under real world variations, and are hardened against manipulation into unintended or harmful behaviour.

Spec27 takes existing test data and amplifies its coverage many times over by generating variants according to a wide range of adversarial methods. The platform also carries out all testing “outside-in” by integrating with the same access points users would access. This infrastructure independent approach tests the whole operational stack and avoids the need for privileged access to models or developer workflows.

Our specification-approach introduces a more principled approach to validation. Instead of relying on one-off checks or informal testing, teams can define a clear standard for behaviour, evaluate it automatically, and monitor it continuously. The specification becomes a durable source of truth for testing, benchmarking, and drift detection.

Key features include:

Specification-driven validation to define and enforce expected behaviour
Automated test generation to expand coverage without manual effort
Black-box validation to test both internal and third-party systems without code access
Combined security and functionality testing to ensure systems are both safe and effective
Ongoing monitoring to track behaviour as systems evolve

At Safe Intelligence, our mission is to build tools that enable safer deployment and use of AI systems. With Spec27 we’re excited to be bringing this knowledge and experience to AI applications and Agents for the first time.

Spec27 replaces manual, ad-hoc testing with a repeatable validation process, enables independent evaluation of opaque systems, and provides structured evidence for deployment decisions.

Find us this week at the AI.Engineer Europe Summit in London! We’re at booth G2!

Sign up for early access at Spec27.ai.

Introducing Spec27: Spec-driven validation for AI applications and agents

Launching in Early Access today!

Related news & resources

Related news & resources

Interested in knowing more?