Aurelius transforms alignment from a passive concern into an active market. It rewards agents not for saying the right thing, but for surfacing, verifying, and organizing evidence of when models fail.
By aligning emissions with epistemic value, the protocol creates an economy around adversarial discovery, reproducible verification, and continuous rubric governance.
Miners are rewarded for surfacing examples of misalignment: prompts that cause models to produce harmful, deceptive, biased, or otherwise unsafe behavior. Rewards scale with:
Severity — how significant is the failure?
Novelty — has this failure mode been seen before?
Signal Richness — does the submission include helpful tags, reasoning traces, or interpretability metadata?
Validator Agreement — did auditors confirm the output as meaningful misalignment?
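One way these four dimensions could combine into a single reward score is a weighted sum. The sketch below is illustrative only: the weights, the [0, 1] normalization, and the field names are assumptions, not protocol constants.

```python
# Illustrative sketch: folding the four reward dimensions into one score.
# The weights and the 0-1 normalization are assumptions, not protocol values.
WEIGHTS = {
    "severity": 0.35,            # how significant is the failure?
    "novelty": 0.30,             # has this failure mode been seen before?
    "signal_richness": 0.15,     # tags, reasoning traces, interpretability metadata
    "validator_agreement": 0.20, # did auditors confirm meaningful misalignment?
}

def submission_score(dims: dict) -> float:
    """Weighted sum of normalized dimension scores, each assumed in [0, 1]."""
    return sum(WEIGHTS[k] * dims[k] for k in WEIGHTS)

score = submission_score(
    {"severity": 0.9, "novelty": 0.7, "signal_richness": 0.5, "validator_agreement": 0.8}
)
```

Because the weights sum to 1 and each dimension is normalized, the combined score also stays in [0, 1], which keeps emissions comparable across submissions.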
Each miner submission is evaluated by multiple validators. If the output is:
Reproducible
Accurately scored
Confirmed as a failure
…then the miner receives:
Protocol emissions (e.g. dTAO)
Reputation gains
Priority access to future model evaluations
Low-quality, repetitive, or unverifiable submissions receive no rewards and may reduce miner reputation over time.
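The gate described above, where rewards flow only when all three conditions hold and bad submissions can erode reputation, can be sketched as follows. The field names and the fixed reputation increments are hypothetical.

```python
# Sketch of the reward gate: pay out only when a submission is reproducible,
# accurately scored, and confirmed as a failure. Field names and the +/-1
# reputation steps are illustrative assumptions.
def settle_submission(sub: dict, miner: dict) -> float:
    """Return the emission payout and adjust miner reputation in place."""
    if sub["reproducible"] and sub["accurately_scored"] and sub["confirmed_failure"]:
        miner["reputation"] += 1          # reputation gain for validated signal
        return sub["base_emission"]       # protocol emissions (e.g. dTAO)
    # Low-quality or unverifiable work earns nothing and may cost reputation.
    miner["reputation"] = max(0, miner["reputation"] - 1)
    return 0.0

miner = {"reputation": 5}
payout = settle_submission(
    {"reproducible": True, "accurately_scored": True,
     "confirmed_failure": True, "base_emission": 2.0},
    miner,
)
```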
Validators verify alignment failures. They earn rewards for faithfully auditing miner submissions using the rubric defined by the Tribunate.
Key incentive drivers:
Scoring accuracy — agreement with validator consensus
Tag quality — correct classification of failure types
Evaluation depth — use of supporting comments or rationale
Participation rate — consistent, timely auditing of miner submissions
Validators build reputation over time, which affects:
Their influence in consensus aggregation
Their payout multiplier
Their eligibility for rubric review roles
Validators who fail spot checks, miss clear failures, or deviate from consensus may face slashing.
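A minimal sketch of how validator reputation could track agreement with consensus and feed a payout multiplier. The exponential moving average, its decay factor, and the 2x multiplier cap are assumptions for illustration, not specified mechanics.

```python
# Illustrative: reputation as an exponential moving average of agreement
# with validator consensus. Decay factor and multiplier cap are assumptions.
def update_reputation(rep: float, agreed_with_consensus: bool, decay: float = 0.9) -> float:
    """Blend the latest audit outcome into a running reputation in [0, 1]."""
    return decay * rep + (1 - decay) * (1.0 if agreed_with_consensus else 0.0)

def payout_multiplier(rep: float) -> float:
    """Higher reputation scales payouts, capped at 2x in this sketch."""
    return 1.0 + min(rep, 1.0)

rep = update_reputation(0.5, agreed_with_consensus=True)
```

Under this scheme a validator who repeatedly misses clear failures sees reputation, and therefore payouts and consensus weight, decay toward zero, while slashing handles outright deviation.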
The Tribunate defines the reward logic that governs the entire system. Initially curated by the core team, it will transition into a semi-decentralized body responsible for:
Maintaining scoring rubrics and dimension weights
Defining validator scoring functions
Updating audit protocols and evaluation standards
Tribunate contributors may receive:
A share of protocol emissions
Recognition in governance dashboards and academic outputs
Long-term authority over how alignment is measured
Their primary incentive is epistemic integrity: ensuring that the reward function tracks truth, not trend.
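A rubric of the kind the Tribunate maintains might be published as versioned configuration, so that dimension weights, failure taxonomies, and audit thresholds can be updated and audited over time. Every field below is a hypothetical example, not a protocol schema.

```python
# Hypothetical versioned rubric record the Tribunate could publish.
# All field names and values here are illustrative assumptions.
rubric = {
    "version": "2025.1",
    "dimension_weights": {          # scoring rubric and dimension weights
        "severity": 0.35,
        "novelty": 0.30,
        "signal_richness": 0.15,
        "validator_agreement": 0.20,
    },
    "failure_tags": ["harmful", "deceptive", "biased", "unsafe"],
    "audit": {                      # audit protocol and evaluation standards
        "min_validators": 3,
        "consensus_threshold": 0.66,
    },
}

# Sanity check a rubric update would need to pass: weights stay normalized.
assert abs(sum(rubric["dimension_weights"].values()) - 1.0) < 1e-9
```

Versioning the rubric this way also supports the rotation of rubric logic described under safeguards: validators always score against an explicit, current version.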
dTAO emissions — paid to miners and validators for high-value contributions
Delegated staking — token holders may support top performers, earning yield while guiding emissions
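Delegated staking implies each emission is split between the performing agent and the token holders backing them. The 90/10 split below is purely an assumed example of how such yield could work.

```python
# Sketch: splitting one emission between a top performer and their delegators.
# The default 10% delegator share is an assumption, not a protocol parameter.
def split_emission(amount: float, delegator_share: float = 0.10) -> tuple:
    """Return (performer_payout, delegator_yield) for one emission event."""
    to_delegators = amount * delegator_share  # yield for stakers backing the agent
    return amount - to_delegators, to_delegators

performer, delegators = split_emission(100.0)
```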
Aurelius includes safeguards to protect against abuse:
Miner identity is hidden from validators
Prompt sampling prevents cherry-picking
Validator–miner collusion is flagged via statistical outlier detection
The Tribunate regularly rotates rubric logic and audit conditions
High-reputation agents are trusted more, but no one is immune to challenge.
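The statistical outlier check for collusion could, for example, flag validator–miner pairs whose average score sits far from the population mean. The z-score approach and its threshold below are assumptions; the protocol's actual detector is not specified here.

```python
# Sketch of outlier-based collusion detection: flag validator-miner pairs
# whose mean awarded score deviates far from the population. The z-score
# method and 2.0 threshold are illustrative assumptions.
from statistics import mean, pstdev

def collusion_outliers(pair_scores: dict, z_threshold: float = 2.0) -> list:
    """pair_scores maps (validator, miner) -> mean score that pair produced."""
    values = list(pair_scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # no variation, nothing to flag
    return [pair for pair, v in pair_scores.items()
            if abs(v - mu) / sigma > z_threshold]

flagged = collusion_outliers({
    ("v1", "m1"): 0.50, ("v2", "m1"): 0.50, ("v3", "m1"): 0.50,
    ("v4", "m2"): 0.52, ("v5", "m3"): 0.48,
    ("v6", "m9"): 5.00,  # suspiciously generous pair
})
```

Combined with hidden miner identity and sampled prompts, even a colluding pair cannot reliably avoid producing the statistical signature this check looks for.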
The protocol doesn’t reward output alone; it rewards impactful insight.
The system is designed to:
Direct effort toward high-risk failure domains
Encourage discovery of subtle or evolving failure modes
Sustain a robust validator class capable of critical judgment
Grow a high-signal dataset of validated alignment failures
Miners earn rewards for exposing model failures that are validated and reproducible
Validators earn rewards for confirming signal and maintaining evaluation quality
Tribunate members shape the incentive logic itself and are rewarded for maintaining alignment integrity
Reputation and slashing systems ensure long-term accountability
All rewards are linked to verifiable alignment signal, not popularity, not politics, not compliance
Aurelius turns alignment into an adversarial, decentralized market for truth, where incentives sharpen safety at scale.