Aurelius produces high-integrity alignment data by capturing adversarial prompts, model completions, and validator evaluations, all anchored by cryptographic hashes and enriched with structured metadata.
The resulting datasets are designed to support open research, reproducible evaluation, and practical alignment diagnostics.
Each validated record contains:

Prompt: The input used to elicit a model response.
Response: The model’s unaltered completion.
Validator Scores: Alignment evaluations across key dimensions.
Tags: Categorical labels (e.g., toxicity, bias, jailbreak).
Reasoning Traces: Optional justifications from miners and validators.
Mechanistic Metadata: When available, attention patterns, activation traces, or tool outputs.
Hash Commitments: SHA-256 checksums ensuring full reproducibility.
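To make the record structure concrete, here is a minimal sketch of a single validated record and its hash commitment. All field names and values are illustrative assumptions rather than the protocol’s published schema; only the use of SHA-256 is taken from the description above.

```python
import hashlib
import json

# Hypothetical record layout; field names and values are illustrative,
# not the protocol's published schema.
record = {
    "prompt": "Adversarial prompt text ...",
    "response": "Unaltered model completion ...",
    "validator_scores": {"toxicity": 0.02, "bias": 0.00, "jailbreak": 0.91},
    "tags": ["jailbreak"],
    "reasoning_traces": [],        # optional miner/validator justifications
    "rubric_version": "2024.1",    # assumed versioning scheme
}

# Hash commitment: SHA-256 over a canonical serialization of the
# prompt-response pair, so any later mutation of either is detectable.
canonical = json.dumps(
    {"prompt": record["prompt"], "response": record["response"]},
    sort_keys=True,
    separators=(",", ":"),
).encode("utf-8")
record["hash_commitment"] = hashlib.sha256(canonical).hexdigest()
```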
These artifacts form the foundation of the Aurelius Alignment Dataset, a living resource for alignment research and model fine-tuning.
The protocol will support multiple modes of data access, tailored to different levels of technical and analytical use:

Summary views: High-level summaries of alignment failures, validator agreement trends, and dataset growth and category frequency over time.
Dataset snapshots: Versioned exports of validated prompt–response pairs, available in formats suitable for ML workflows (e.g., JSONL, CSV, Parquet) and accompanied by rubric metadata and schema documentation.
Query access: Filtered retrieval of prompt/result pairs by tag, dimension, or rubric version, along with rubric history and validator consensus lookups.
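As an illustration of the snapshot and query workflows, the sketch below loads a hypothetical JSONL export and filters it by tag and rubric version. The file name and field names are assumptions for the example, not published interfaces.

```python
import json

# Path and schema are assumptions for illustration; real snapshots would
# ship with their own schema documentation.
with open("aurelius_snapshot_v1.jsonl", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

# Filtered access by tag and rubric version, mirroring the query modes
# described above.
jailbreaks = [
    r for r in records
    if "jailbreak" in r.get("tags", [])
    and r.get("rubric_version") == "2024.1"
]
print(f"{len(jailbreaks)} jailbreak records scored under rubric 2024.1")
```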
All public data will include:
Versioning identifiers for traceability
Attribution guidelines for academic or commercial use
Clearly marked rubric versions and scoring standards in force at the time of collection
Where appropriate, the protocol may adopt open data licenses that preserve integrity and ensure attribution without restricting research use.
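One way such provenance information might travel with the data is a small manifest attached to each snapshot. Every field name and value below is an assumption, shown only to illustrate the traceability and attribution guarantees listed above.

```python
# Hypothetical per-snapshot manifest; all fields are assumptions
# used only to illustrate the traceability guarantees above.
export_manifest = {
    "dataset_version": "v1.3.0",           # versioning identifier
    "rubric_version": "2024.1",            # rubric in force at collection
    "scoring_standard": "aurelius-core",   # assumed label, for illustration
    "license": "CC-BY-4.0",                # example open data license
    "attribution": "Cite the Aurelius Alignment Dataset in derived work.",
}
```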
For models under private or restricted evaluation:
Prompts and outputs may be encrypted or obfuscated
Model names, endpoints, and weights will not be exposed
Validator access will be restricted to essential scoring information
Audits will be conducted in isolated or secured compute environments
These safeguards protect model confidentiality while still producing alignment-relevant insights.
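As a sketch of how that confidentiality might be achieved, the example below encrypts the raw prompt and completion with symmetric encryption (the `cryptography` library’s Fernet scheme) while exposing only scoring-relevant fields to validators. The mechanism and field names are assumptions for illustration, not the protocol’s specification.

```python
from cryptography.fernet import Fernet

# Illustrative confidentiality layer for private evaluations. Fernet
# symmetric encryption is an assumed mechanism, not the protocol's spec.
key = Fernet.generate_key()  # held by the model owner, never published
cipher = Fernet(key)

private_record = {
    "prompt_enc": cipher.encrypt(b"proprietary system prompt ..."),
    "response_enc": cipher.encrypt(b"model completion ..."),
    # Validators receive only the scoring-relevant fields below,
    # never the plaintext encrypted above.
    "validator_scores": {"toxicity": 0.01, "jailbreak": 0.12},
}

# The model owner can decrypt locally, e.g. for an audit inside an
# isolated compute environment.
plaintext_prompt = cipher.decrypt(private_record["prompt_enc"])
```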
Aurelius is building a high-signal, high-integrity alignment dataset, not only for research but for long-term transparency and safety across the AI ecosystem.
All data is reproducible, cryptographically verified, and schema-consistent
Access methods are tailored for both human and programmatic use
Privacy protections are in place for sensitive or private model evaluations
The dataset evolves as adversarial discovery and rubric logic mature
Alignment is not just a score; it’s a record. And that record belongs to the world.