Introduction

Aurelius is a decentralized protocol for surfacing and verifying alignment failures in large language models. It transforms adversarial prompts, model outputs, scoring artifacts, and interpretability data into structured, reproducible datasets, all without relying on centralized oversight.

Built on the Bittensor network, Aurelius incentivizes a peer-to-peer ecosystem of adversarial prompters (miners), independent auditors (validators), and a dynamic rules layer known as the Tribunate. Together, these agents generate alignment pressure through contestation, not consensus, creating artifacts that can be used to train, fine-tune, or audit models in a reproducible and interpretable way.
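To ground these roles, the sketch below shows one plausible shape for the artifacts they produce: a record bundling a miner's adversarial prompt, the target model's output, independent validator scores, and provenance metadata. This is an illustrative Python sketch; the class and field names (AlignmentArtifact, validator_scores, and so on) are assumptions, not the protocol's actual schema.

    from dataclasses import dataclass, field

    # Hypothetical shape of an Aurelius artifact; all field names are
    # illustrative assumptions, not the protocol's real schema.
    @dataclass
    class AlignmentArtifact:
        prompt: str            # adversarial prompt submitted by a miner
        model_output: str      # response elicited from the target model
        reasoning_trace: str   # captured reasoning trace, if available
        validator_scores: dict[str, float] = field(default_factory=dict)
        # validator ID -> that validator's independent misalignment score
        provenance: dict[str, str] = field(default_factory=dict)
        # e.g. model version, scoring method, timestamps, commitment hashes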

Why This Matters

Modern AI systems often appear safe on the surface but fail to reason honestly under pressure. Existing alignment methods lean heavily on centralized oversight, fixed reward models, and shallow behavioral signals; they suppress disagreement and leave model internals unexamined. The result is alignment faking, brittle safety filters, and unverifiable outputs.

Aurelius challenges this paradigm by enabling any motivated agent to expose failure, verify it independently, and turn it into usable data, all while preserving reasoning, scoring methods, and provenance through cryptographic commitments.
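As an illustration of how a cryptographic commitment can preserve provenance, here is a minimal commit-and-reveal sketch using a SHA-256 hash. This is a simplifying assumption for clarity, not Aurelius's actual commitment scheme.

    import hashlib
    import json

    def commit(artifact: dict, nonce: bytes) -> str:
        # Bind to the artifact (plus a secret nonce) before revealing it.
        payload = json.dumps(artifact, sort_keys=True).encode() + nonce
        return hashlib.sha256(payload).hexdigest()

    def verify(artifact: dict, nonce: bytes, commitment: str) -> bool:
        # Anyone can check a revealed artifact against the prior commitment.
        return commit(artifact, nonce) == commitment

A validator could publish the commitment first and reveal the artifact and nonce later, letting anyone verify that scores and reasoning were not altered after the fact.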

What Aurelius Offers

1. A reproducible pipeline for surfacing misalignment under adversarial conditions.
2. A decentralized scoring system that incentivizes independent validators.
3. A way to capture reasoning traces and mechanistic interpretability artifacts.
4. An open, evolving dataset for training safer and more honest models.
5. A philosophical foundation rooted in structured disagreement and epistemic alignment.

Who It's For

- Model creators seeking reproducible failure data and external alignment pressure
- Researchers interested in adversarial prompting, interpretability, and Chain-of-Thought reasoning
- Auditors and tool builders who want real-world examples of model failures
- Red-teamers looking to be rewarded for high-signal discoveries