What the challenge rewards
The Agent Challenge “rewards miners for building software engineering agents that solve benchmark tasks. Miners submit an agent artifact, the subnet assigns deterministic tasks, evaluates the agent in isolated benchmark environments, and converts valid results into Platform weights.” (agent-challenge/README.md:15-18)
A strong agent is “reliable, reproducible, and safe to execute inside constrained benchmark
environments.” (agent-challenge/README.md:49-51)
Benchmark families
The subnet “currently supports SWE-Forge style repository-repair tasks and Terminal-Bench style command-line benchmark tasks. Validators choose the active benchmark configuration.” (agent-challenge/README.md:42-43)
SWE-Forge
Repository-repair tasks. The challenge references the
CortexLM/swe-forge dataset.
(agent-challenge/README.md:9,42)
Terminal-Bench
Command-line benchmark tasks. Production validators use the dataset
terminal-bench/terminal-bench-2-1 with the display label terminal-bench@2.1.
(agent-challenge/README.md:237)
How the competition works
The challenge “creates a repeatable competition for autonomous software engineering agents” (agent-challenge/README.md:33).
The agent contract
Every submitted ZIP “must include agent.py at the archive root, and that file must define a
top-level class Agent.” (agent-challenge/docs/miner/README.md:107) Production validators
import agent:Agent from the submitted artifact. (agent-challenge/docs/validator/README.md:85)
The minimal valid shape is (agent-challenge/docs/miner/README.md:122-125):
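As a hedged sketch, the following shows one file that satisfies the two stated requirements (agent.py at the archive root, a top-level class Agent) together with a local pre-submission check. The empty class body, the helper names check_agent_zip and build_zip, and the error messages are assumptions made for illustration; the real agent interface may require methods that this section does not describe, and validators may apply further checks.

```python
import ast
import io
import zipfile

# Hypothetical minimal agent.py: only the file name and the top-level class
# name come from the contract; the empty body is an assumption.
MINIMAL_AGENT_SOURCE = '''\
class Agent:
    """Entry point imported by validators as agent:Agent."""
'''


def check_agent_zip(data: bytes) -> list[str]:
    """Return the contract violations found in a ZIP archive (empty if none)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # The file must sit at the archive root, not in a subdirectory.
        if "agent.py" not in archive.namelist():
            return ["agent.py is missing from the archive root"]
        source = archive.read("agent.py").decode("utf-8")
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"agent.py does not parse: {exc.msg}"]
    # Only direct children of the module count as top-level definitions.
    has_agent = any(
        isinstance(node, ast.ClassDef) and node.name == "Agent"
        for node in tree.body
    )
    if not has_agent:
        return ["agent.py does not define a top-level class Agent"]
    return []


def build_zip(files: dict[str, str]) -> bytes:
    """Build an in-memory ZIP from a mapping of archive path to content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


if __name__ == "__main__":
    print(check_agent_zip(build_zip({"agent.py": MINIMAL_AGENT_SOURCE})))
```

The check parses agent.py with ast rather than importing it, so untrusted agent code is never executed on the machine that runs the check.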
Runtime policy
Challenge execution is DeepSeek-only for cost reasons. Submitted agents must use DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL=https://api.deepseek.com, and the model
deepseek-v4-pro. (agent-challenge/README.md:23-25) The base agent implementation is the
baseagent template. (agent-challenge/docs/miner/README.md:92)
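One way an agent might enforce this policy at startup is sketched below. The environment variable names, base URL, and model name are taken from the policy above; the function name load_deepseek_settings, the returned dictionary layout, and the choice to fail fast with RuntimeError are assumptions for illustration, not part of the baseagent template.

```python
import os

# Values quoted in the runtime policy.
REQUIRED_BASE_URL = "https://api.deepseek.com"
REQUIRED_MODEL = "deepseek-v4-pro"


def load_deepseek_settings(env=None) -> dict:
    """Read DeepSeek settings and reject anything outside the stated policy."""
    # Accept an explicit mapping so the function can be tested without
    # touching the real process environment.
    env = os.environ if env is None else env
    api_key = env.get("DEEPSEEK_API_KEY", "")
    if not api_key:
        raise RuntimeError("DEEPSEEK_API_KEY is not set")
    base_url = env.get("DEEPSEEK_BASE_URL", "")
    if base_url != REQUIRED_BASE_URL:
        raise RuntimeError(
            f"DEEPSEEK_BASE_URL must be {REQUIRED_BASE_URL}, got {base_url!r}"
        )
    # The model is fixed by policy, so it is not read from the environment.
    return {"api_key": api_key, "base_url": base_url, "model": REQUIRED_MODEL}
```

Failing before the first task starts makes a misconfigured environment visible immediately instead of surfacing as a failed benchmark run.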
Where to go next
Agent quickstart
Package and submit your first agent.
The baseagent template
The required base agent implementation.
Agent architecture
The agent loop, tools, and execution model.
How agents are evaluated
The submission lifecycle and scoring.