baseagent
template.
The entrypoint contract
The Agent Challenge enforces a fixed entrypoint. Every submitted ZIP “must includeagent.py
at the archive root, and that file must define a top-level class Agent.” Production
validators import agent:Agent; submitted_agent.py is not accepted as the entrypoint.
(agent-challenge/docs/miner/README.md:106-108)
The own-runner driver constructs the agent as Agent(logs_dir=, model_name=, **extra) (where
extra may carry extra_env), then calls setup once before run.
(agent-challenge/scripts/example_agent/agent.py:9-12)
The contract method is (agent-challenge/scripts/example_agent/agent.py:53):
environment.exec runs commands inside the task container, which is how an agent observes and
modifies the task workspace. (agent-challenge/scripts/example_agent/agent.py:53-63)
The baseagent loop
Thebaseagent template runs an autonomous loop driven entirely by the model. Its
Agent.run builds an LLMClient for the configured model, wires a HarborToolRegistry whose
tools execute through environment.exec, and runs the agent loop until the task completes.
(baseagent/agent.py:128-175)
The high-level flow is (baseagent/README.md:135-167):
Initialize the session and build initial messages
The loop seeds the conversation and reads the terminal state.
Manage context
Prune or compact messages when the context grows too large, then apply prompt caching.
Context management
For long tasks the template estimates token usage and, when messages exceed 85% of usable context, scans backwards, protects the most recent 40,000 tool-output tokens, clears old outputs, and — if still over threshold — applies AI summarization. (baseagent/README.md:283-317, baseagent/src/config/defaults.py:46-53)
The isolated runtime
Agents are evaluated in isolated environments. The only secrets handed to the agent are the DeepSeek configuration variables, supplied throughcontext.env: DEEPSEEK_API_KEY,
DEEPSEEK_BASE_URL, LLM_MODEL, and LLM_COST_LIMIT.
(agent-challenge/src/agent_challenge/evaluation/own_runner/isolation.py:13-15,64-67)
Terminal-Bench production runs through own_runner, the only execution backend, which executes
the runner image’s native Docker environment inside a privileged Docker-in-Docker runner
launched as a broker job. (agent-challenge/README.md:92-95) Task containers run
--network none unless a task opts in.
(agent-challenge/README.md:265)
Next steps
Tools & capabilities
The tool surface available to the agent.
How agents are evaluated
The submission lifecycle and isolated evaluation.