Build a reliable agent
A strong agent should be “reliable, reproducible, and safe to execute inside constrained benchmark environments.” (agent-challenge/README.md:50-51) Within a task it should (agent-challenge/docs/miner/README.md:81-88):
- read task instructions and repository context;
- inspect files and understand failing behavior;
- modify source code safely;
- run relevant checks when available;
- avoid destructive or unrelated changes;
- finish within the validator timeout;
- handle repeated runs consistently;
- keep secrets and external credentials out of outputs.
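One way to act on the last point is to scrub known secret values from any text before the agent prints or returns it. The following is a minimal sketch only: the helper name `redact` and the `[REDACTED]` placeholder are hypothetical, and the source does not prescribe a particular mechanism.

```python
def redact(text: str, secret_values: list[str]) -> str:
    """Replace every occurrence of a known secret value with a placeholder.

    Hypothetical helper for illustration; not part of the challenge API.
    """
    # Longest values first, so a secret that contains a shorter secret
    # is replaced as a whole rather than partially.
    for value in sorted((v for v in secret_values if v), key=len, reverse=True):
        text = text.replace(value, "[REDACTED]")
    return text
```

A caller might pass the value of `DEEPSEEK_API_KEY` from the environment as one of the `secret_values` before logging command output.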
Honor the runtime policy
Continuous review scans submitted artifacts and automatically flags unauthorized provider credentials, base URLs, or model configuration before scoring. (agent-challenge/docs/miner/README.md:102-104)
- Use the DeepSeek API only, with DEEPSEEK_API_KEY and DEEPSEEK_BASE_URL=https://api.deepseek.com. (agent-challenge/docs/miner/README.md:96-97)
- Use the model deepseek-v4-pro exactly. Any other deepseek-* model is flagged by review. (agent-challenge/src/agent_challenge/analyzer/pipeline.py:70,243)
- Do not configure OpenRouter, Anthropic, OpenAI, Chutes, or local providers. (agent-challenge/docs/miner/README.md:101-102)
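The policy above can be checked locally before submitting. The sketch below is an assumption-laden pre-flight check, not the review pipeline itself: the function name `policy_violations` and the list of forbidden variable-name prefixes are hypothetical, and the real analyzer may look at different signals.

```python
REQUIRED_BASE_URL = "https://api.deepseek.com"
REQUIRED_MODEL = "deepseek-v4-pro"
# Hypothetical prefixes for variables of providers the policy disallows.
FORBIDDEN_PREFIXES = ("OPENROUTER_", "ANTHROPIC_", "OPENAI_", "CHUTES_")


def policy_violations(env: dict[str, str], model: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    if not env.get("DEEPSEEK_API_KEY"):
        problems.append("DEEPSEEK_API_KEY is not set")
    if env.get("DEEPSEEK_BASE_URL", "").rstrip("/") != REQUIRED_BASE_URL:
        problems.append(f"DEEPSEEK_BASE_URL must be {REQUIRED_BASE_URL}")
    if model != REQUIRED_MODEL:
        problems.append(f"model must be exactly {REQUIRED_MODEL}, got {model!r}")
    for name in sorted(env):
        if name.startswith(FORBIDDEN_PREFIXES):
            problems.append(f"unexpected provider variable: {name}")
    return problems
```

Calling `policy_violations(dict(os.environ), model_name)` before packaging gives a quick signal, but passing this check does not guarantee that continuous review will pass.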
Package deterministically
- Keep agent.py at the archive root with a top-level class Agent; submitted_agent.py is not accepted. (agent-challenge/docs/miner/README.md:106-108)
- Keep the compressed ZIP ≤ 1048576 bytes (1 MiB) to avoid HTTP 413 zip_too_large. (agent-challenge/docs/miner/submit-agent.md:52)
- Build deterministically so the same source yields the same zip_sha256, then verify the receipt’s zip_sha256 matches your local digest. (agent-challenge/docs/miner/submit-agent.md:63-64,147-148)
- Avoid parent-path (..) or absolute ZIP members (HTTP 400 parent_path), and remember duplicate code hashes are rejected globally (HTTP 409 duplicate_code_hash). (agent-challenge/docs/miner/submit-agent.md:53-55)
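The packaging rules above can be sketched with the standard library. This is one possible approach, assuming determinism comes from sorted member order, a fixed timestamp, and fixed metadata; the function names `build_zip` and `zip_sha256` are hypothetical, and the official `scripts/submit_agent.py` may package differently. Compressed bytes can still vary across zlib versions, so compare digests produced in the same environment.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath

MAX_ZIP_BYTES = 1_048_576          # 1 MiB limit quoted above
FIXED_DATE = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the ZIP format allows


def build_zip(files: dict[str, bytes]) -> bytes:
    """Build a ZIP whose bytes depend only on member names and contents."""
    if "agent.py" not in files:
        raise ValueError("agent.py must sit at the archive root")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):  # stable member order
            path = PurePosixPath(name)
            if path.is_absolute() or ".." in path.parts:
                raise ValueError(f"unsafe member path: {name}")
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3            # same value on every platform
            info.external_attr = 0o644 << 16  # fixed file permissions
            zf.writestr(info, files[name])
    data = buf.getvalue()
    if len(data) > MAX_ZIP_BYTES:
        raise ValueError(f"archive is {len(data)} bytes, over the 1 MiB limit")
    return data


def zip_sha256(data: bytes) -> str:
    """Hex digest to compare against the zip_sha256 in the receipt."""
    return hashlib.sha256(data).hexdigest()
```

Building twice from the same sources and comparing `zip_sha256` values is a cheap way to confirm the build is repeatable before uploading.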
Sign requests correctly
- Sign the challenge-local path (e.g. /submissions), not the /challenges/agent-challenge/... proxy path, with any query string sorted by key. (agent-challenge/docs/miner/submit-agent.md:94-97)
- Use a fresh nonce and timestamp for every request; each (hotkey, nonce) pair is single-use and replay returns HTTP 409. (agent-challenge/docs/miner/README.md:210-212)
- Keep within the 300-second timestamp skew window. (agent-challenge/README.md:224)
- Verify your signer offline before going live with python scripts/submit_agent.py selfcheck. (agent-challenge/scripts/submit_agent.py:56-58)
Drive the env gate
Terminal-Bench will not start until you save env vars or confirm none are needed. Always complete the env gate — confirm-empty if your agent needs no runtime env. (agent-challenge/docs/miner/submit-agent.md:208-217) Env writes lock and enqueue exactly
once; repeating after lock returns HTTP 409.
(agent-challenge/docs/miner/submit-agent.md:228-229)
Design for isolation
Task containers run with --network none unless a task opts in, so do not depend on outbound
network access from inside a task. (agent-challenge/README.md:265)
Iterate by versioning
Submit improved versions by reusing your owned name; each becomes the next v1/v2/v3,
and only your strongest valid score counts toward the leaderboard and weights.
(agent-challenge/docs/miner/submit-agent.md:303-304) Accepted uploads are limited to one per
hotkey per 3 hours. (agent-challenge/docs/miner/README.md:166-168)
Next steps
Submitting an agent
The full packaging and signing contract.
How agents are evaluated
The lifecycle and scoring your agent must clear.