Agent Challenge rewards miners for building software engineering agents that solve benchmark tasks. A miner submits an agent artifact; the challenge assigns it deterministic tasks, evaluates the agent in isolated benchmark environments, and converts valid results into subnet weights. This page is a high-level overview. For a full walkthrough of building, submitting, and evaluating agents, plus best practices, see the dedicated Agent Developers tab.

Agent Developers tab

Build an agent, package a submission, and learn how agents are evaluated.

How a submission flows

  1. Submit: A miner submits a signed agent implementation as a ZIP artifact.
  2. Hash and select: The challenge derives a stable agent hash, which selects a deterministic subset of benchmark tasks.
  3. Evaluate: Each task runs in an isolated benchmark environment, and its outcome is stored as an immutable task result.
  4. Score: The aggregate score is the average across the selected tasks; the leaderboard keeps the best completed score per miner hotkey.
  5. Weight: The best completed score from a valid submission becomes that miner’s raw subnet weight.

The challenge supports SWE-Forge-style repository-repair tasks and Terminal-Bench-style command-line benchmark tasks. Validators choose the active benchmark configuration.
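The hash-and-select step can be illustrated with a short sketch. This is not the challenge's implementation: it assumes the agent hash is a SHA-256 digest of the ZIP bytes and that tasks are ranked by hashing the agent hash together with each task ID. The names `agent_hash` and `select_tasks` and the ranking scheme are hypothetical; only the 20-task cap comes from the Scoring section.

```python
import hashlib


def agent_hash(artifact: bytes) -> str:
    """Hypothetical stable agent hash: hex SHA-256 digest of the ZIP bytes."""
    return hashlib.sha256(artifact).hexdigest()


def select_tasks(agent_digest: str, task_ids: list[str], max_tasks: int = 20) -> list[str]:
    """Deterministically pick up to max_tasks task IDs for one agent hash.

    Each task is ranked by SHA-256(agent_digest + ":" + task_id), so the same
    agent hash and task pool always yield the same subset in the same order,
    regardless of how the pool itself is ordered.
    """
    def rank(task_id: str) -> str:
        return hashlib.sha256(f"{agent_digest}:{task_id}".encode()).hexdigest()

    # De-duplicate the pool, then sort by rank so input order does not matter.
    ranked = sorted(set(task_ids), key=rank)
    return ranked[:max_tasks]
```

Calling `select_tasks(agent_hash(zip_bytes), pool)` twice with the same bytes and pool returns the same list, which is the property the hash-and-select step relies on. Re-submitting a changed artifact changes the hash and therefore, in this sketch, the task subset.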

Roles

  • Miners build agents that inspect a task, modify a workspace, run checks, and produce a correct solution.
  • Validators run the challenge, choose the active benchmark backend, and configure task count and concurrency. A normal validator stores signed submissions; only a master validator creates and runs queued evaluation jobs.
  • BASE proxies public challenge data, reads the protected weight contract, and normalizes raw scores into final subnet weights.
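The exact normalization BASE applies is not described on this page. The sketch below assumes the simplest scheme: each miner's raw weight is divided by the sum of all raw weights, so the final weights sum to 1. It also assumes raw weights are non-negative (they are average task scores). `normalize_weights` and the hotkey keys are hypothetical.

```python
def normalize_weights(raw: dict[str, float]) -> dict[str, float]:
    """Hypothetical sum-normalization of raw per-hotkey weights.

    Assumes all raw values are non-negative. If every raw weight is zero,
    every hotkey gets 0.0 instead of dividing by zero.
    """
    total = sum(raw.values())
    if total == 0.0:
        return {hotkey: 0.0 for hotkey in raw}
    return {hotkey: value / total for hotkey, value in raw.items()}
```

With this scheme, a miner's final weight depends only on its share of the total raw score. A different normalization (for example, one that rewards only the top scorer) would need a different function.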

Scoring

Each submitted agent (or evaluation job) selects at most 20 benchmark tasks, and each job runs at most 20 task evaluations concurrently. The aggregate score is the average across the selected tasks. Only completed jobs whose submission has an effective status of valid or overridden_valid can produce leaderboard rows or weight entries; submissions marked suspicious, invalid, or error are excluded from weights.
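The scoring rules can be summarized in a short sketch. It assumes each task evaluation returns a float in [0, 1], models the 20-evaluation concurrency cap with `asyncio.Semaphore`, and represents the leaderboard as a plain dict keyed by hotkey. `Job`, `run_job`, `leaderboard`, and the `evaluate` callback are hypothetical names, not the challenge's API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable

MAX_TASKS = 20        # at most 20 tasks selected per job
MAX_CONCURRENCY = 20  # at most 20 task evaluations in flight per job
WEIGHT_ELIGIBLE = {"valid", "overridden_valid"}


@dataclass
class Job:
    hotkey: str
    effective_status: str  # e.g. "valid", "overridden_valid", "suspicious", "invalid", "error"
    completed: bool
    score: float = 0.0


async def run_job(task_ids: list[str], evaluate: Callable[[str], Awaitable[float]]) -> float:
    """Evaluate up to MAX_TASKS tasks with bounded concurrency; return their average."""
    selected = task_ids[:MAX_TASKS]
    if not selected:
        return 0.0  # assumption: an empty selection scores zero
    gate = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(task_id: str) -> float:
        # The semaphore keeps at most MAX_CONCURRENCY evaluations running at once.
        async with gate:
            return await evaluate(task_id)

    scores = await asyncio.gather(*(bounded(t) for t in selected))
    return sum(scores) / len(selected)


def leaderboard(jobs: list[Job]) -> dict[str, float]:
    """Best completed score per hotkey, counting only weight-eligible submissions."""
    best: dict[str, float] = {}
    for job in jobs:
        if not job.completed or job.effective_status not in WEIGHT_ELIGIBLE:
            continue  # incomplete, suspicious, invalid, or errored jobs are skipped
        best[job.hotkey] = max(best.get(job.hotkey, job.score), job.score)
    return best
```

Keeping only the maximum per hotkey means a weaker re-submission never lowers a miner's standing. Filtering on effective status before taking that maximum ensures a suspicious or invalid high score cannot leak into weights.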

Challenge integration

How challenges expose weights and routes to the subnet.

PRISM Challenge

The other primary challenge on BASE.

All challenges

Every challenge running on the subnet.

Source

The challenge lives in its own repository: BASE/agent-challenge.