Aurelius is a decentralized protocol for red-teaming AI models. It incentivizes miners to discover alignment failures and validators to verify and score them. The protocol produces structured, reproducible datasets that can be used to evaluate model risk, improve safety, and support alignment research.
Today’s language models are often brittle under pressure. Most alignment testing is centralized, static, or narrow in scope. Aurelius creates an open, incentive-driven pipeline to continuously uncover, evaluate, and record misaligned behavior, especially in edge cases that escape traditional testing.
Miners earn rewards for surfacing high-value failures: novel, severe, and clearly documented examples of misalignment. Validators earn rewards for accurate, consensus-aligned scoring and helpful annotations. All emissions are distributed based on contribution quality and reproducibility, in accordance with Bittensor's token emission mechanics.
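As a rough illustration of how rubric scores could translate into emission weights, here is a minimal validator-side sketch. The dimension names follow the criteria above (novelty, severity, documentation), but the weightings, function names, and normalization are hypothetical, not protocol values; the actual split is governed by the Tribunate's rubric and Bittensor's on-chain emission logic.

```python
# Hypothetical validator-side scoring sketch. The rubric weights and
# normalization below are illustrative only, not the live protocol logic.
from dataclasses import dataclass


@dataclass
class Scores:
    novelty: float        # 0.0-1.0: is this failure mode new?
    severity: float       # 0.0-1.0: how harmful is the behavior?
    documentation: float  # 0.0-1.0: is the failure clearly reproducible?


def submission_score(s: Scores, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted rubric score; the weights are assumed, not protocol constants."""
    w_nov, w_sev, w_doc = weights
    return w_nov * s.novelty + w_sev * s.severity + w_doc * s.documentation


def miner_weights(scores_by_miner: dict[str, float]) -> dict[str, float]:
    """Normalize rubric scores into emission weights that sum to 1."""
    total = sum(scores_by_miner.values()) or 1.0
    return {miner: score / total for miner, score in scores_by_miner.items()}


# Example: a novel, severe, well-documented failure earns most of the weight.
weights = miner_weights({
    "miner_a": submission_score(Scores(0.9, 0.7, 1.0)),
    "miner_b": submission_score(Scores(0.2, 0.3, 0.5)),
})
```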
Aurelius initially targets open-source LLMs. Over time, it will support secure audits of closed-source models and offer interfaces for model developers who wish to benchmark their own systems. In principle, any model is compatible with Aurelius from day one, but protocol improvements are easiest to isolate on smaller, open-source models early on.
The Tribunate is the protocol’s governing logic layer. It defines scoring rubrics, configures incentive logic, monitors validator behavior, and evolves the rules of evaluation. Initially centralized, it will transition to a contributor-driven governance process as the protocol matures.
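To make the rubric-definition role concrete, a Tribunate-published rubric might resemble a versioned configuration that validators load. Everything below is a hypothetical sketch; the field names, values, and the consensus-tolerance mechanism are assumptions, not the protocol's actual format.

```python
# Hypothetical rubric configuration as the Tribunate might publish it.
# All fields and values are illustrative, not the live protocol rubric.
RUBRIC_V1 = {
    "version": "1.0",
    "dimensions": {
        "novelty":       {"weight": 0.4, "scale": [0.0, 1.0]},
        "severity":      {"weight": 0.4, "scale": [0.0, 1.0]},
        "documentation": {"weight": 0.2, "scale": [0.0, 1.0]},
    },
    # Assumed monitoring knob: validators whose scores drift beyond this
    # distance from consensus could be flagged for review.
    "consensus_tolerance": 0.15,
}
```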
Validated submissions, including prompts, responses, scores, tags, and reasoning traces, are compiled into structured, reproducible alignment datasets. These datasets support downstream use cases in research, evaluation, and model fine-tuning.
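A single validated record might serialize roughly as follows. The field names mirror the artifacts listed above (prompt, response, scores, tags, reasoning trace), but the exact schema and the metadata fields are assumptions for illustration.

```python
import json

# Hypothetical shape of one validated dataset record; the schema is
# inferred from the artifacts listed above, not a published format.
record = {
    "prompt": "Adversarial prompt text...",
    "response": "Model output exhibiting the failure...",
    "model": "example-open-model-7b",  # assumed metadata field
    "scores": {"novelty": 0.8, "severity": 0.9, "documentation": 1.0},
    "tags": ["jailbreak", "deception"],
    "reasoning_trace": "Validator notes on why this output is misaligned...",
    "reproducible": True,
}
print(json.dumps(record, indent=2))
```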
Yes. The protocol generates high-signal failure data that can be used to evaluate model risk, inform safety fine-tuning, and support alignment research.
Aurelius functions as an external, adversarial feedback loop for model refinement.
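For example, a downstream consumer might filter the dataset for severe, reproducible failures to assemble a safety fine-tuning or evaluation set. The loading step and the `records` structure (a list shaped like the record sketch above) are assumed:

```python
# Hypothetical downstream use: select severe, reproducible failures as
# negative examples for safety fine-tuning or evaluation benchmarks.
def select_training_cases(records: list[dict], min_severity: float = 0.7) -> list[dict]:
    return [
        {"prompt": r["prompt"], "bad_response": r["response"]}
        for r in records
        if r["reproducible"] and r["scores"]["severity"] >= min_severity
    ]
```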
The initial focus is on text-based, general reasoning models, but the architecture is flexible. Over time, Aurelius may support alignment evaluation for vision models, agents, or other generative systems.
Yes. The protocol code, validator interface, rubric logic, and documentation are all open source. Long-term governance will also move toward public participation and transparency.