PRISM fixes the dataset and the evaluation protocol, not the model search space, but every submission must stay inside a fixed set of contract, sandbox, and resource limits. A submission that violates any of them is rejected at static review, before any GPU work. Source: docs/submissions.md:1-10.

Two-script contract

A bundle must contain two distinct scripts: an architecture role exposing build_model and a training role exposing train. The single-module re-export idiom no longer satisfies the contract: if the architecture and training entrypoints resolve to the same file, the submission is rejected. Source: src/prism_challenge/evaluator/components.py:99-103.
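The distinct-files rule can be illustrated with a minimal sketch. The helper name check_two_script_contract and the idea of passing entrypoints as file paths are assumptions made for illustration; this is not the evaluator's code in components.py.

```python
from pathlib import Path


def check_two_script_contract(architecture_path: str, training_path: str) -> None:
    """Raise ValueError if both roles resolve to the same file (hypothetical check)."""
    arch = Path(architecture_path).resolve()
    train = Path(training_path).resolve()
    # Comparing resolved paths catches aliases such as "./model.py" vs "model.py".
    if arch == train:
        raise ValueError(
            f"architecture and training entrypoints resolve to the same file: {arch}"
        )
```

Resolving both paths first means a re-export through a differently spelled path to the same file is still treated as one script.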

Parameter cap

The realized model is capped at 150M parameters (max_parameters = 150_000_000). The cap is enforced statically at forced-seed instantiation and re-checked inside the container against the model the runner actually trained. Source: src/prism_challenge/evaluator/interface.py:26; src/prism_challenge/evaluator/container.py:1149-1176; docs/submissions.md:6.
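To self-check against the cap before submitting, a rough sketch like the following may help. It counts elements from parameter shapes in plain Python; the helper names and the toy shapes are hypothetical. With a real PyTorch model the equivalent count would be sum(p.numel() for p in model.parameters()).

```python
import math

MAX_PARAMETERS = 150_000_000  # max_parameters, as stated in the source


def count_parameters(shapes):
    """Sum the element counts of every parameter tensor shape."""
    return sum(math.prod(shape) for shape in shapes.values())


def within_cap(shapes, cap=MAX_PARAMETERS):
    """Return True if the total parameter count does not exceed the cap."""
    return count_parameters(shapes) <= cap


# Hypothetical toy model: a token embedding plus one square projection.
toy_shapes = {
    "embed.weight": (4096, 512),
    "proj.weight": (512, 512),
    "proj.bias": (512,),
}
```

This only mirrors the arithmetic of the cap; the authoritative check is the one the evaluator runs on the realized model.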

Token geometry

The context exposes a default token-id geometry the model must respect: vocab_size = 4096 and sequence_length = 128 (max_seq_len). Source: src/prism_challenge/evaluator/interface.py:23-24.
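A small guard for this geometry might look like the sketch below. It assumes token ids are integers in the half-open range [0, vocab_size) and that sequence_length is an upper bound on sequence length; validate_token_batch is a hypothetical helper, not part of the PRISM interface.

```python
VOCAB_SIZE = 4096   # default vocab_size from the source
MAX_SEQ_LEN = 128   # default sequence_length (max_seq_len) from the source


def validate_token_batch(batch, vocab_size=VOCAB_SIZE, max_seq_len=MAX_SEQ_LEN):
    """Raise ValueError if a sequence is too long or a token id is out of range."""
    for row, seq in enumerate(batch):
        if len(seq) > max_seq_len:
            raise ValueError(f"row {row}: length {len(seq)} exceeds {max_seq_len}")
        for tok in seq:
            if not 0 <= tok < vocab_size:
                raise ValueError(f"row {row}: token id {tok} outside [0, {vocab_size})")
```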

AST sandbox

Both scripts pass an AST sandbox of hard blocks before any GPU work. Unsafe imports, network access, arbitrary filesystem access, and deserialization escapes are rejected. build_model must stay pure: it must not read data, open files, touch the network, or reference the dataset. Source: docs/miner/README.md:39-41, :124-125; docs/submissions.md:24-26.
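To get a feel for what a static AST check of this kind does, here is a minimal sketch built on Python's ast module. The blocklists are hypothetical examples chosen for illustration; the actual sandbox rules live in the evaluator and are likely stricter.

```python
import ast

# Hypothetical blocklists for illustration only; not the evaluator's real rules.
BLOCKED_MODULES = {"socket", "subprocess", "pickle", "requests", "urllib"}
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}


def find_violations(source):
    """Return a list of human-readable violations found in the source text."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            # Compare the top-level package, so "urllib.request" matches "urllib".
            if name.split(".")[0] in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: import of {name}")
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

Because the check parses the source without executing it, it can run before any container or GPU is involved. A name-based scan like this is easy to evade, which is one reason to treat it as an illustration rather than a substitute for the real gate.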

Locked data, no network

The train split is exposed read-only at ctx.data_dir; the val/test splits are secret and never exposed to your script. The eval container runs with network=none, HF_HUB_OFFLINE=1, and HF_DATASETS_OFFLINE=1, so there is no network during training. Read raw text from ctx.data_dir, tokenize with your own tokenizer or a pre-staged reference, and fail closed if the locked data is missing rather than fabricating data. Source: docs/submissions.md:83-94; docs/miner/README.md:76-81.
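The fail-closed rule can be sketched as follows. The "*.txt" layout and the helper name load_train_texts are assumptions; the source does not specify how files are laid out under ctx.data_dir.

```python
from pathlib import Path


def load_train_texts(data_dir, pattern="*.txt"):
    """Read raw text files from the locked data dir, failing closed if absent."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"locked data directory missing: {root}")
    files = sorted(root.glob(pattern))
    if not files:
        # Never substitute synthetic data: stop the run instead.
        raise FileNotFoundError(f"no files matching {pattern!r} in {root}")
    return [path.read_text(encoding="utf-8") for path in files]
```

Raising instead of falling back to generated text keeps a misconfigured run from producing a score on data that was never part of the locked split.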

Single-node multi-GPU bounds

PRISM is single-node only. Runs use 1-8 GPUs on one node, and the official scored run uses torchrun --standalone --nnodes=1 --nproc-per-node=1 (the nproc=1 path, because the evaluation node has one physical GPU). Requests for more than 8 GPUs or for multiple nodes are rejected. A correct training.py:
  • calls init_process_group (nccl on GPU) and set_device(local_rank);
  • wraps the model with DDP or FSDP and shards data per-rank;
  • does rank-0-only logging and artifact writes;
  • all-reduces any reported metrics, then calls barrier() and destroy_process_group() on exit;
  • also works correctly at world_size=1.
Because only one physical GPU is available, multi-GPU correctness is validated without multi-GPU hardware: a static contract check plus a gloo multi-rank functional test (world sizes 2 and 4 on CPU). True 8-GPU scaling cannot be verified on a one-GPU node and is an accepted limitation. Source: docs/submissions.md:96-115; docs/scaling.md:22-46.
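The rank bookkeeping behind the checklist above can be sketched without any GPU. The example reads the RANK, LOCAL_RANK, and WORLD_SIZE environment variables that torchrun sets, defaults them so the code also works at world_size=1, and shards example indices per rank. The helper names are hypothetical, the 8-process bound assumes one process per GPU, and the actual init_process_group, DDP/FSDP, and all-reduce calls are deliberately left out.

```python
import os

MAX_GPUS = 8  # single-node upper bound from the source (assumes one process per GPU)


def dist_env(environ=None):
    """Return (rank, local_rank, world_size), defaulting to a single process."""
    env = os.environ if environ is None else environ
    rank = int(env.get("RANK", 0))
    local_rank = int(env.get("LOCAL_RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    if not 1 <= world_size <= MAX_GPUS:
        raise ValueError(f"world_size {world_size} outside 1..{MAX_GPUS}")
    return rank, local_rank, world_size


def shard_indices(num_examples, rank, world_size):
    """Strided shard: each rank sees a disjoint slice, and together they cover all."""
    return range(rank, num_examples, world_size)


def is_rank_zero(rank):
    """Only rank 0 should log and write artifacts."""
    return rank == 0
```

With world_size=1 the shard is simply every index, so the same training loop runs unchanged on the official nproc=1 path.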

Writable paths

ctx.artifacts_dir is the only writable path, and only rank 0 writes. The eval container is non-root with a read-only rootfs except artifacts_dir. Source: docs/submissions.md:66; docs/architecture.md:100-102.
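A rank-0-only write that stays inside the artifacts directory could be sketched like this. write_artifact is a hypothetical helper, and the containment check is an extra precaution added for the example, not something the source describes.

```python
from pathlib import Path


def write_artifact(artifacts_dir, name, text, rank):
    """Write text under artifacts_dir on rank 0 only; return the path or None."""
    if rank != 0:
        return None  # non-zero ranks never touch the filesystem
    root = Path(artifacts_dir).resolve()
    target = (root / name).resolve()
    # Refuse names that would escape the artifacts directory, e.g. "../x".
    if root not in target.parents:
        raise ValueError(f"artifact path escapes artifacts_dir: {name}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```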

Compute budget

The score is compute-normalized; wall-clock time is only a safety cap, enforced in three layers: a graceful budget that stops the loop and scores the partial stream, a hard watchdog above it, and an outer broker timeout. A faster or larger GPU configuration does not change the ranking; it only changes how much of the budget the run can use. Source: docs/scaling.md:48-59.
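The innermost layer, the graceful budget, can be approximated inside a training loop as in the sketch below. The function name, the step_fn callback, and the injectable clock are assumptions for illustration; the hard watchdog and the broker timeout sit outside the script and are not modeled here.

```python
import time


def train_with_budget(step_fn, max_steps, budget_seconds, clock=time.monotonic):
    """Run step_fn until max_steps or the wall-clock budget, whichever comes first.

    Returns the losses recorded so far, so a partial run still yields a stream.
    """
    start = clock()
    losses = []
    for step in range(max_steps):
        # Check the budget before each step so the loop exits cleanly.
        if clock() - start >= budget_seconds:
            break
        losses.append(step_fn(step))
    return losses
```

Stopping between steps rather than being killed mid-step is what lets the partial stream remain usable for scoring.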

Size and archive limits

| Limit | Value | Source |
| --- | --- | --- |
| Max submission code size | max_code_bytes = 200000 | config.example.yaml:12 |
| Epoch length | epoch_seconds = 21600 | config.example.yaml:16 |
| ZIP path traversal | rejected | docs/submissions.md:173-183 |
| ZIP symlinks | rejected | docs/submissions.md:173-183 |
| ZIP file count / total bytes | bounded | docs/submissions.md:173-183 |
| Allowed suffixes | approved text/code suffixes only | docs/submissions.md:173-183 |

ZIP submissions are extracted defensively. Unsupported or unsafe archives are rejected before evaluation.
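Defensive archive inspection along these lines can be sketched with the standard zipfile module. The numeric limits and the suffix allowlist below are hypothetical placeholders, since the source states only that these quantities are bounded; check_archive is likewise an illustrative name.

```python
import stat
import zipfile
from pathlib import PurePosixPath

# Hypothetical limits for illustration; the real bounds are not given in the source.
MAX_FILES = 64
MAX_TOTAL_BYTES = 200_000
ALLOWED_SUFFIXES = {".py", ".md", ".txt", ".json", ".yaml", ".toml"}


def check_archive(zf: zipfile.ZipFile) -> None:
    """Raise ValueError if the archive is unsafe; inspect metadata only."""
    infos = zf.infolist()
    if len(infos) > MAX_FILES:
        raise ValueError(f"too many files: {len(infos)}")
    total = 0
    for info in infos:
        path = PurePosixPath(info.filename)
        if path.is_absolute() or ".." in path.parts or "\\" in info.filename:
            raise ValueError(f"path traversal: {info.filename}")
        # The upper 16 bits of external_attr hold the Unix mode when present.
        if stat.S_ISLNK(info.external_attr >> 16):
            raise ValueError(f"symlink: {info.filename}")
        if info.is_dir():
            continue
        if path.suffix not in ALLOWED_SUFFIXES:
            raise ValueError(f"suffix not allowed: {info.filename}")
        total += info.file_size
        if total > MAX_TOTAL_BYTES:
            raise ValueError("archive exceeds total byte limit")
```

All checks run on the archive's directory entries before anything is extracted, so a hostile archive is rejected without writing a single file.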

LLM hard gate

A strong OpenRouter LLM reviews both scripts as a hard gate and can reject before any GPU work. A reject is terminal. Source: README.md:55; docs/miner/README.md:126. See Submitting to PRISM for the manifest and Scoring for how a valid run is scored.