
Feb 12, 2026

Building Trustworthy AI: Highlights from AAAI

As AI systems move from research labs into safety-critical deployment in areas such as autonomous driving and aviation, the question is no longer just “what can this model do?” but “can we trust it to do it reliably?”

This was the central theme for us at this year’s AAAI Conference on Artificial Intelligence. It is always one of the most significant events on the calendar, covering the full spectrum of AI research. For the Safe Intelligence team, it was an incredibly productive week of presenting our latest research, attending tutorials and poster sessions, and exchanging ideas with other researchers.

Here is a look at what we presented and the key takeaways that inspired us.

Our Contributions: Certifying Robustness in the Real World

We presented two papers this year, both addressing the same underlying challenge: making computer vision models robust to the messy, unpredictable nature of the real world.

Certified Background-Invariant Training

(Presented at the Workshop on AI for Air Transportation)

In object detection tasks, models often cheat by relying on background context rather than the object itself. In safety-critical fields like aviation, this is dangerous: a plane is still a plane regardless of the weather behind it, and a detector should recognise it either way. We proposed a new method that combines Variational Autoencoders (VAEs) with certified training to induce provable robustness against background variations. Our experiments on aircraft detection showed that this approach improves generalisation and guides the network to focus on the object, not the noise.
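To make the idea concrete, here is a minimal sketch of certified training in the interval bound propagation (IBP) style, where only pixels flagged by a hypothetical `background_mask` are allowed to vary. It illustrates the general technique only; our actual method defines the perturbation set through a VAE that models realistic background variation.

```python
# Minimal IBP-style certified training sketch against background-only
# perturbations. `background_mask` (1 on background pixels, 0 on the object)
# and the epsilon interval are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBPLinear(nn.Linear):
    def forward_bounds(self, lo, hi):
        # Interval arithmetic through an affine layer: split W into +/- parts.
        w_pos, w_neg = self.weight.clamp(min=0), self.weight.clamp(max=0)
        new_lo = lo @ w_pos.t() + hi @ w_neg.t() + self.bias
        new_hi = hi @ w_pos.t() + lo @ w_neg.t() + self.bias
        return new_lo, new_hi

class SmallNet(nn.Module):
    def __init__(self, in_dim=3 * 32 * 32, hidden=256, classes=10):
        super().__init__()
        self.fc1, self.fc2 = IBPLinear(in_dim, hidden), IBPLinear(hidden, classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

    def forward_bounds(self, lo, hi):
        lo, hi = self.fc1.forward_bounds(lo, hi)
        lo, hi = F.relu(lo), F.relu(hi)          # ReLU is monotone
        return self.fc2.forward_bounds(lo, hi)

def certified_loss(net, x, y, background_mask, eps=0.3):
    # Only background pixels may vary; object pixels stay fixed.
    lo = (x - eps * background_mask).clamp(0, 1)
    hi = (x + eps * background_mask).clamp(0, 1)
    out_lo, out_hi = net.forward_bounds(lo.flatten(1), hi.flatten(1))
    # Worst-case logits: true class at its lower bound, others at their upper bound.
    idx = torch.arange(len(y))
    worst = out_hi.clone()
    worst[idx, y] = out_lo[idx, y]
    return F.cross_entropy(net(x.flatten(1)), y) + F.cross_entropy(worst, y)
```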

Defending Models Against Input Blurring

(Presented at the Workshop on AI for Cyber Security)

Real-world cameras shake, vibrate, and blur. Standard models often fail when faced with this motion blur, yet traditional defences such as adversarial training lack formal guarantees. We introduced a novel certified training approach that leverages an efficient encoding of convolutional perturbations. The result? A model that achieves over 80% robust accuracy against motion blur on CIFAR-10, outperforming standard adversarial training while maintaining high standard accuracy.
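For intuition on why motion blur is a convolutional perturbation, the sketch below applies a small family of line kernels to a batch of images: each blurred image is a linear function of the clean one. The kernel lengths and orientations are arbitrary choices, and the snippet says nothing about our certified encoding itself.

```python
# Motion blur as a convolution: a normalised line kernel applied depthwise.
import torch
import torch.nn.functional as F

def motion_blur_kernel(length=5, horizontal=True):
    # A 1-pixel-wide line of `length` taps, normalised to preserve intensity.
    k = torch.zeros(length, length)
    mid = length // 2
    if horizontal:
        k[mid, :] = 1.0
    else:
        k[:, mid] = 1.0
    return (k / k.sum()).view(1, 1, length, length)

def apply_motion_blur(images, kernel):
    # images: (B, C, H, W); the same kernel is applied to every channel.
    c = images.shape[1]
    weight = kernel.repeat(c, 1, 1, 1)          # depthwise weights (C, 1, k, k)
    return F.conv2d(images, weight, padding=kernel.shape[-1] // 2, groups=c)

# The set of images reachable under such kernels is the perturbation family a
# certified-training method would need to bound.
x = torch.rand(8, 3, 32, 32)                    # stand-in for a CIFAR-10 batch
blurred = [apply_motion_blur(x, motion_blur_kernel(L)) for L in (3, 5, 7)]
```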

The VNN-COMP Tutorial

One of our highlights was the tutorial: “The Verification of Neural Networks Competition (VNN-COMP): A Lab for Benchmark Proposers, Verification Tool Participants, and the Broader AI Community.”

As neural networks enter automotive, medical, and aerospace domains, verifying them is non-negotiable. This tutorial was about lowering the barrier to entry for researchers to join that mission.

  • Taylor T Johnson kicked off the tutorial by motivating the verification of safety-critical systems.
  • Konstantin Kaulen provided an overview of the competition’s history and the results from the last competition (VNN-COMP ’25).
  • Matthew Daggitt and Edoardo Manino spoke about the current benchmark landscape in the competition and introduced the new VNN-LIB 2.0 standard. This evolution of the original VNN-LIB standard for specifying verification problems significantly improves expressiveness and ease of use.

The session was highly interactive, featuring hands-on demos for running verification tools and creating your own benchmarks, as well as a look at the AWS-based evaluation infrastructure. We found the tutorial extremely useful and hope to see many more industry-related benchmarks in this year’s competition. (Link to Tutorial Materials)
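For readers who have not seen a VNN-LIB query before, here is a rough sketch of how a robustness property is typically written out in the original VNN-LIB style (an SMT-LIB-like text format), generated from Python. VNN-LIB 2.0 revises this language, so treat the snippet purely as illustration rather than the new standard.

```python
# Illustrative generator for a VNN-LIB 1.x-style property file:
# "for all inputs in an eps-box around x, no other class overtakes `label`".
# If the generated assertion is satisfiable, the verifier has found a counterexample.
def write_vnnlib_box_property(x, eps, label, num_outputs, path):
    lines = []
    for i in range(len(x)):
        lines.append(f"(declare-const X_{i} Real)")
    for j in range(num_outputs):
        lines.append(f"(declare-const Y_{j} Real)")
    # Input constraints: an eps-box around the concrete input x.
    for i, xi in enumerate(x):
        lines.append(f"(assert (>= X_{i} {xi - eps}))")
        lines.append(f"(assert (<= X_{i} {xi + eps}))")
    # Unsafe output condition: some other class reaches the true class's score.
    disjuncts = " ".join(
        f"(>= Y_{j} Y_{label})" for j in range(num_outputs) if j != label
    )
    lines.append(f"(assert (or {disjuncts}))")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Example: a toy network with 4 inputs and 3 outputs, certifying class 0.
write_vnnlib_box_property([0.1, 0.2, 0.3, 0.4], eps=0.05, label=0,
                          num_outputs=3, path="toy_property.vnnlib")
```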

AI Robustness Highlights: What Caught Our Eye

The conference was packed with fascinating research, at both the poster sessions and the oral presentations. We noticed three distinct trends: the race to secure LLMs, the expansion of verification into the physical 3D world, and the constant sharpening of theoretical bounds.

1. The Battle for Safe LLMs

As Large Language Models (LLMs) are deployed in more domains, their robustness to attacks becomes increasingly relevant. While traditional robustness verification methods often struggle to scale to models of this size, several papers stood out for their novel approaches to defending LLMs and their realistic assessment of vulnerabilities.

  • AlignTree proposed a highly efficient defence using a random forest classifier to monitor LLM activations, detecting misaligned behaviour without the high computational cost of heavy guard models; a generic sketch of this activation-monitoring pattern follows after this list. (Link to Paper)
  • Adversarial Prompt Disentanglement (APD) took a semantic approach, using graph-based intent classification to isolate and neutralise malicious components of a prompt before they even reach the model. (Link to Paper)
  • CluCERT addressed the difficulty of certifying LLMs against synonym substitutions. By using a clustering-guided denoising method, they achieved tighter certified bounds and better efficiency than previous word-deletion methods. (Link to Paper)
  • However, the defence landscape is still evolving. The STACK paper provided a sobering look at current safeguard pipelines, demonstrating a “staged attack” that achieved a 71% success rate against state-of-the-art defences. This shows that we still have work to do when it comes to making LLMs safe. (Link to Paper)
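As a flavour of the activation-monitoring idea mentioned above, here is a generic sketch: pool hidden states from prompts labelled benign or harmful, then train a lightweight random forest monitor. The feature extraction, data, and hyperparameters are placeholders, not AlignTree’s actual pipeline.

```python
# Generic activation-monitoring sketch: a random forest flags misaligned
# behaviour from pooled hidden-state features. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def extract_features(hidden_states):
    # hidden_states: (num_tokens, hidden_dim) activations from one prompt.
    # Mean-pool over tokens as a crude summary; a real monitor would be more
    # careful about which layer and which tokens to use.
    return hidden_states.mean(axis=0)

# Placeholder data: activations for 200 prompts with benign (0) / harmful (1) labels.
rng = np.random.default_rng(0)
activations = [rng.normal(size=(32, 768)) for _ in range(200)]
labels = rng.integers(0, 2, size=200)

X = np.stack([extract_features(h) for h in activations])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

monitor = RandomForestClassifier(n_estimators=100, random_state=0)
monitor.fit(X_tr, y_tr)
print("held-out accuracy:", monitor.score(X_te, y_te))
# At deployment, monitor.predict_proba(features) would gate or flag generations.
```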

2. Verification Meets the Physical World

Despite the recent focus on LLMs, AI is also being deployed in many other fields: it operates robots and drones, as well as the self-driving taxis that Waymo is bringing to the UK this year.

  • Phantom Menace was a standout study on Vision-Language-Action (VLA) models. The authors developed a “Real-Sim-Real” framework to simulate physical sensor attacks (like attacking a robot’s microphone or camera), revealing critical vulnerabilities in how these models integrate multi-modal data. (Link to Paper)
  • In the realm of 3D perception, FreqCert proposed a way to certify 3D point cloud recognition, a task made challenging by the irregular structure of the input data. By shifting the analysis to the frequency domain (spectral similarity), they created a defence that is robust against the geometric distortions that often fool spatial-domain defences; a generic illustration of this spectral view follows below. (Link to Paper)
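Here is a generic illustration of what a “frequency domain” view of a point cloud can look like: build a k-NN graph over the points, take its Laplacian, and project the coordinates onto the low-frequency eigenvectors. This is a textbook graph-spectral transform, not FreqCert’s actual construction or certification procedure.

```python
# Spectral summary of a point cloud via the graph Laplacian of a k-NN graph.
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph

def spectral_coefficients(points, k=8, num_freqs=16):
    # points: (N, 3) array of xyz coordinates.
    adj = kneighbors_graph(points, n_neighbors=k, mode="connectivity")
    adj = 0.5 * (adj + adj.T)                      # symmetrise the k-NN graph
    L = laplacian(adj.toarray(), normed=True)      # normalised graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)           # "frequencies" and basis
    # Low-index eigenvectors vary smoothly over the shape (low frequencies).
    return eigvecs[:, :num_freqs].T @ points       # (num_freqs, 3) coefficients

cloud = np.random.default_rng(0).normal(size=(256, 3))   # placeholder point cloud
print(spectral_coefficients(cloud).shape)                # (16, 3) spectral summary
```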

3. Pushing the Boundaries of Verification

Finally, we saw excellent work on the fundamental mathematics of verification.

  • Neural network verifiers often consider inputs in their queries that are either out of distribution or could never occur in the real world. VeriFlow proposed a fix: modelling the data distribution with a flow-based model, which allows verifiers to restrict their search to the data distribution of interest. (Link to Paper)
  • Ghost Certificates served as an important warning: it is possible to “spoof” certificates. The authors focused on models certified using Randomized Smoothing (including state-of-the-art diffusion-based defences like DensePure), demonstrating that they can be manipulated by specialised, region-based attacks. These adversarial inputs are crafted to remain imperceptible while tricking the certification mechanism into issuing a large robustness radius for a completely incorrect class. This highlights a subtle but critical flaw in how we interpret guarantees from probabilistic certification frameworks: a valid certificate ensures stability around an input, but it does not guarantee the semantic correctness of the prediction itself (a minimal sketch of such a smoothing certificate follows after this list). (Link to Paper)
  • We also saw specialised advances for specific architectures, including DeepPrism for tighter RNN verification (Link to Paper) and new Parameterised Abstract Interpretation methods for Transformers that can verify instances where existing methods fail. (Link to Paper)
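For context on the Ghost Certificates result, here is a minimal sketch of a randomised-smoothing certificate in the style of Cohen et al.: predict by majority vote under Gaussian noise, then derive an L2 radius from a lower confidence bound on the top-class probability. A faithful implementation separates the samples used for class selection and probability estimation, so this is a simplification; `model` is assumed to map an image batch to logits. The key point stands out in the code: the certificate guarantees stability of the prediction within the radius, not its correctness.

```python
# Simplified randomised-smoothing certification (Cohen et al. style).
import torch
from scipy.stats import beta, norm

def certify(model, x, sigma=0.25, n=1000, alpha=0.001):
    # Sample the base classifier under Gaussian input noise.
    noise = torch.randn(n, *x.shape) * sigma
    with torch.no_grad():
        preds = model(x.unsqueeze(0) + noise).argmax(dim=1)
    top_class = preds.mode().values.item()
    k = int((preds == top_class).sum())
    # Clopper-Pearson lower confidence bound on the top-class probability.
    p_lower = beta.ppf(alpha, k, n - k + 1)
    if p_lower <= 0.5:
        return None, 0.0                       # abstain: no certificate issued
    radius = sigma * norm.ppf(p_lower)         # certified L2 radius
    # Note: this radius certifies that the *prediction* is stable, whatever the
    # prediction is; nothing here checks that `top_class` is semantically correct.
    return top_class, radius
```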

Looking Ahead

AAAI showed us that the field of AI verification is maturing rapidly. We are moving from simple image classifiers to complex, multi-modal systems and LLMs, and the tools we use to verify them are becoming more sophisticated.

We are excited to integrate these insights into our work at Safe Intelligence and are looking forward to the next iteration of the VNN-COMP in 2026.