PRISM fixes the dataset and the evaluation protocol, not the model search space, but every submission must stay inside a fixed set of contract, sandbox, and resource limits. A submission that violates any of them is rejected at static review, before any GPU work.
Source: docs/submissions.md:1-10.
Two-script contract
A bundle must contain two distinct scripts: an architecture role exposing build_model and a training role exposing train. The single-module re-export idiom no longer satisfies the contract — if the architecture and training entrypoints resolve to the same file, the submission is rejected.
Source: src/prism_challenge/evaluator/components.py:99-103.
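The distinct-file rule can be pictured with a small check. This is a minimal sketch, assuming both entrypoints are given as file paths; `check_two_script_contract` and the file name `architecture.py` are hypothetical and are not the evaluator's own code.

```python
from pathlib import Path


def check_two_script_contract(architecture_path: str, training_path: str) -> None:
    """Raise ValueError if both roles resolve to the same file (hypothetical check)."""
    arch = Path(architecture_path).resolve()
    train = Path(training_path).resolve()
    if arch == train:
        # A single module re-exporting both build_model and train lands here.
        raise ValueError(
            f"architecture and training entrypoints resolve to the same file: {arch}"
        )


# Two distinct files pass silently.
check_two_script_contract("architecture.py", "training.py")
```

Resolving the paths first means that two different spellings of the same file (for example `model.py` and `./model.py`) are still treated as one file.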
Parameter cap
The realized model is capped at 150M parameters (max_parameters = 150_000_000). The cap is enforced statically at forced-seed instantiation and re-checked inside the container against the model the runner actually trained.
Source: src/prism_challenge/evaluator/interface.py:26; src/prism_challenge/evaluator/container.py:1149-1176; docs/submissions.md:6.
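A quick way to stay under the cap is to count parameters before submitting. The sketch below works on plain tensor shapes so it runs without any ML framework; the toy shapes are hypothetical, and whether the real check counts all parameters or only trainable ones is an assumption not settled here.

```python
from math import prod

MAX_PARAMETERS = 150_000_000  # max_parameters as stated above


def count_parameters(shapes):
    """Total number of scalars across a list of tensor shapes."""
    return sum(prod(shape) for shape in shapes)


def within_cap(shapes, cap=MAX_PARAMETERS):
    """True if the summed parameter count does not exceed the cap."""
    return count_parameters(shapes) <= cap


# Hypothetical toy model: a 4096 x 512 token embedding plus one
# 512 x 512 projection with a bias vector.
toy_shapes = [(4096, 512), (512, 512), (512,)]
toy_total = count_parameters(toy_shapes)
toy_ok = within_cap(toy_shapes)
```

With a PyTorch model, the same total is usually obtained with `sum(p.numel() for p in model.parameters())`.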
Token geometry
The context exposes a default token-id geometry the model must respect: vocab_size = 4096 and sequence_length = 128 (max_seq_len).
Source: src/prism_challenge/evaluator/interface.py:23-24.
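As an illustration, a batch can be checked against this geometry before it reaches the model. This is a sketch only: it assumes token ids are zero-based integers in `[0, vocab_size)` and that `sequence_length` is an upper bound, and `check_token_ids` is a hypothetical helper.

```python
VOCAB_SIZE = 4096       # vocab_size from the context defaults
SEQUENCE_LENGTH = 128   # sequence_length (max_seq_len) from the context defaults


def check_token_ids(token_ids):
    """Validate one sequence of token ids against the default geometry."""
    if len(token_ids) > SEQUENCE_LENGTH:
        raise ValueError(
            f"sequence has {len(token_ids)} tokens, limit is {SEQUENCE_LENGTH}"
        )
    for position, token in enumerate(token_ids):
        if not 0 <= token < VOCAB_SIZE:
            raise ValueError(f"token id {token} at position {position} is out of range")
    return True
```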
AST sandbox
Both scripts pass an AST sandbox of hard blocks before any GPU work. Unsafe imports, network access, arbitrary filesystem access, and deserialization escapes are rejected. build_model must stay pure: it must not read data, open files, touch the network, or reference the dataset.
Source: docs/miner/README.md:39-41, :124-125; docs/submissions.md:24-26.
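To give a feel for what a static AST pass can catch, here is a minimal sketch built on Python's standard `ast` module. The blocklists are hypothetical examples chosen for illustration; the actual sandbox rules are defined by PRISM and are not reproduced here.

```python
import ast

# Hypothetical blocklists, for illustration only.
BLOCKED_MODULES = {"socket", "subprocess", "pickle", "requests", "urllib"}
BLOCKED_CALLS = {"open", "eval", "exec", "__import__"}


def find_violations(source: str) -> list[str]:
    """Return human-readable findings for blocked imports and calls."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            root = name.split(".")[0]  # "urllib.request" -> "urllib"
            if root in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: import of {root}")
        # Only direct calls by bare name are detected in this sketch.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

The source is parsed, never executed, so a check of this kind can run before any GPU work. A real sandbox would need to cover far more cases (aliased names, attribute access, dynamic imports) than this sketch does.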
Locked data, no network
The train split is exposed read-only at ctx.data_dir; the val/test splits are secret and never exposed to your script. The eval container runs with network=none, HF_HUB_OFFLINE=1, and HF_DATASETS_OFFLINE=1, so there is no network during training. Read raw text from ctx.data_dir, tokenize with your own tokenizer or a pre-staged reference, and fail closed if the locked data is missing rather than fabricating data.
Source: docs/submissions.md:83-94; docs/miner/README.md:76-81.
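Failing closed can be as simple as raising when the directory or its files are absent. The sketch below assumes the train split is a set of UTF-8 `*.txt` files directly under `ctx.data_dir`; that layout and the helper name `read_locked_text` are assumptions made for illustration.

```python
from pathlib import Path


def read_locked_text(data_dir: str, pattern: str = "*.txt") -> list[str]:
    """Read raw text files from the locked train split, or fail closed."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"locked data directory missing: {root}")
    files = sorted(root.glob(pattern))
    if not files:
        # Never substitute synthetic data when the locked split is absent.
        raise FileNotFoundError(f"no files matching {pattern} under {root}")
    return [path.read_text(encoding="utf-8") for path in files]
```

Sorting the file list keeps the read order deterministic across runs and ranks.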
Single-node multi-GPU bounds
PRISM is single-node only. Runs use 1 to 8 GPUs on one node. The official scored run uses torchrun --standalone --nnodes=1 --nproc-per-node=1, because the evaluation node has one physical GPU. Requests for more than 8 GPUs or for multiple nodes are rejected.
A correct training.py:
- calls init_process_group (nccl on GPU) and set_device(local_rank);
- wraps the model with DDP or FSDP and shards data per-rank;
- does rank-0-only logging and artifact writes;
- all-reduces any reported metrics, then calls barrier() and destroy_process_group() on exit;
- also works correctly at world_size=1.
Multi-GPU correctness is validated off the single physical GPU with a static contract check and a gloo multi-rank functional test (world size 2 and 4 on CPU). True 8-GPU scaling is an accepted, unverifiable limitation on a one-GPU node.
Source: docs/submissions.md:96-115; docs/scaling.md:22-46.
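Two pieces of this checklist can be sketched without any GPU or framework: reading the rank variables and sharding data per rank. torchrun exports `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` to each worker; the helper names and the strided sharding scheme below are illustrative assumptions, and treating `WORLD_SIZE` as the GPU count assumes one process per GPU.

```python
import os


def rank_info(env=os.environ):
    """Read the rank variables torchrun exports; defaults cover a plain single-process run."""
    rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    if not 1 <= world_size <= 8:
        # Mirrors the 1 to 8 GPU, single-node bound described above.
        raise ValueError(f"world_size {world_size} is outside the 1 to 8 range")
    return rank, local_rank, world_size


def shard_indices(num_examples, rank, world_size):
    """Strided split: each rank gets a disjoint slice, and the slices cover every example."""
    return list(range(rank, num_examples, world_size))
```

At `world_size=1` the shard is the full index range, so the same code path serves the nproc=1 scored run. The process-group setup, DDP or FSDP wrapping, metric all-reduce, and teardown from the list above are not shown here.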
Writable paths
ctx.artifacts_dir is the only writable path, and only rank 0 writes. The eval container is non-root with a read-only rootfs except artifacts_dir.
Source: docs/submissions.md:66; docs/architecture.md:100-102.
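A rank-0-only write under the artifacts directory might look like the following sketch. The file name `metrics.json` and the helper `write_metrics` are hypothetical; only the rule that rank 0 alone writes, and only under `ctx.artifacts_dir`, comes from the text above.

```python
import json
from pathlib import Path


def write_metrics(artifacts_dir, metrics, rank):
    """Write metrics.json under artifacts_dir from rank 0 only; other ranks do nothing."""
    if rank != 0:
        return None
    out = Path(artifacts_dir) / "metrics.json"  # hypothetical file name
    out.write_text(json.dumps(metrics, sort_keys=True), encoding="utf-8")
    return out
```

Because the rest of the root filesystem is read-only, writing anywhere else (a temp directory, the working directory) should be expected to fail inside the container.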
Compute budget
The score is compute-normalized; wall-clock is only a safety cap, enforced in layers — a graceful budget that stops the loop and scores the partial stream, a hard watchdog above it, and an outer broker timeout. A faster or larger GPU configuration does not change the ranking; it only changes how much of the budget the run can use.
Source: docs/scaling.md:48-59.
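The first layer, a graceful budget that stops the loop and keeps the partial stream, can be sketched as a deadline check at the top of each step. Whether this check lives in the runner or in the submitted train function is not stated here, so treat `run_with_budget` and its parameters as hypothetical.

```python
import time


def run_with_budget(step_fn, budget_seconds, max_steps, clock=time.monotonic):
    """Run step_fn until max_steps or the wall-clock budget, whichever comes first."""
    deadline = clock() + budget_seconds
    results = []
    for step in range(max_steps):
        if clock() >= deadline:
            break  # graceful stop: everything recorded so far is kept
        results.append(step_fn(step))
    return results
```

Using a monotonic clock avoids surprises from system clock adjustments, and the injectable `clock` argument makes the stopping behavior easy to test. The hard watchdog and the outer broker timeout sit outside the training process and are not modeled.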
Size and archive limits
| Limit | Value | Source |
|---|---|---|
| Max submission code size | max_code_bytes = 200000 | config.example.yaml:12 |
| Epoch length | epoch_seconds = 21600 | config.example.yaml:16 |
| ZIP path traversal | rejected | docs/submissions.md:173-183 |
| ZIP symlinks | rejected | docs/submissions.md:173-183 |
| ZIP file count / total bytes | bounded | docs/submissions.md:173-183 |
| Allowed suffixes | approved text/code suffixes only | docs/submissions.md:173-183 |
ZIP submissions are extracted defensively. Unsupported or unsafe archives are rejected before evaluation.
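The checks in the table can be run on an archive's metadata before anything is extracted. The sketch below uses Python's standard `zipfile` module. The file-count bound and the suffix list are hypothetical, and reusing `max_code_bytes` as the total-bytes bound is an assumption; the source only says these limits are "bounded".

```python
import stat
import zipfile
from pathlib import PurePosixPath

MAX_TOTAL_BYTES = 200_000  # assumption: reuses max_code_bytes as the total bound
MAX_FILES = 32             # hypothetical bound
ALLOWED_SUFFIXES = {".py", ".md", ".txt", ".json", ".yaml"}  # hypothetical list


def validate_archive(path):
    """Inspect ZIP metadata only; raise ValueError on any unsafe member."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if len(infos) > MAX_FILES:
            raise ValueError(f"too many entries: {len(infos)}")
        total = 0
        for info in infos:
            name = PurePosixPath(info.filename)
            if name.is_absolute() or ".." in name.parts or "\\" in info.filename:
                raise ValueError(f"path traversal: {info.filename}")
            # The upper 16 bits of external_attr carry the Unix mode, if set.
            if stat.S_ISLNK(info.external_attr >> 16):
                raise ValueError(f"symlink: {info.filename}")
            if info.is_dir():
                continue
            if name.suffix not in ALLOWED_SUFFIXES:
                raise ValueError(f"suffix not allowed: {info.filename}")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("uncompressed size exceeds the limit")
        return [info.filename for info in infos]
```

Summing the declared uncompressed sizes, rather than the compressed ones, is what guards against a small archive that expands into a very large payload.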
LLM hard gate
A strong OpenRouter LLM reviews both scripts as a hard gate and can reject before any GPU work. A reject is terminal.
Source: README.md:55; docs/miner/README.md:126.
See Submitting to PRISM for the manifest and Scoring for how a valid run is scored.