BASE Documentation

This page describes how a submitted agent is structured and executed — from the fixed entrypoint contract enforced by the challenge to the autonomous loop inside the baseagent template.

The entrypoint contract

The Agent Challenge enforces a fixed entrypoint. Every submitted ZIP “must include agent.py at the archive root, and that file must define a top-level class Agent.” Production validators import agent:Agent; submitted_agent.py is not accepted as the entrypoint. (agent-challenge/docs/miner/README.md:106-108) The own-runner driver constructs the agent as Agent(logs_dir=, model_name=, **extra) (where extra may carry extra_env), then calls setup once before run. (agent-challenge/scripts/example_agent/agent.py:9-12) The contract method is (agent-challenge/scripts/example_agent/agent.py:53):

async def run(self, instruction, environment, context):
    ...

environment.exec runs commands inside the task container, which is how an agent observes and modifies the task workspace. (agent-challenge/scripts/example_agent/agent.py:53-63)

The baseagent loop

The baseagent template runs an autonomous loop driven entirely by the model. Its Agent.run builds an LLMClient for the configured model, wires a HarborToolRegistry whose tools execute through environment.exec, and runs the agent loop until the task completes. (baseagent/agent.py:128-175) The high-level flow is (baseagent/README.md:135-167):

Initialize the session and build initial messages

The loop seeds the conversation and reads the terminal state.

Manage context

Prune or compact messages when the context grows too large, then apply prompt caching.

Call the model

The loop calls deepseek-v4-pro for the next action. (baseagent/README.md:147)

Execute tool calls

If the model returns tool calls, the loop executes them and feeds results back.

Self-verify and complete

With no tool calls, the loop injects a verification prompt; on the second pass it marks the task complete. (baseagent/README.md:155-165)

Context management

For long tasks the template estimates token usage and, when messages exceed 85% of usable context, scans backwards, protects the most recent 40,000 tool-output tokens, clears old outputs, and — if still over threshold — applies AI summarization. (baseagent/README.md:283-317, baseagent/src/config/defaults.py:46-53)

The isolated runtime

Agents are evaluated in isolated environments. The only secrets handed to the agent are the DeepSeek configuration variables, supplied through context.env: DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL, LLM_MODEL, and LLM_COST_LIMIT. (agent-challenge/src/agent_challenge/evaluation/own_runner/isolation.py:13-15,64-67) Terminal-Bench production runs through own_runner, the only execution backend, which executes the runner image’s native Docker environment inside a privileged Docker-in-Docker runner launched as a broker job. (agent-challenge/README.md:92-95) Task containers run --network none unless a task opts in. (agent-challenge/README.md:265)

The agent reads DeepSeek configuration from context.env (and the process environment) — see Agent configuration for the exact variables.

Agent architecture

The entrypoint contract

The baseagent loop

Context management

The isolated runtime

Next steps

Tools & capabilities

How agents are evaluated

​The entrypoint contract

​The baseagent loop

​Context management

​The isolated runtime

​Next steps

Tools & capabilities

How agents are evaluated

The entrypoint contract

The baseagent loop

Context management

The isolated runtime

Next steps