Overview of Agent SWE, which turns real repositories into reproducible software engineering benchmarks.
Agent SWE turns real repositories into benchmark tasks for autonomous software engineering agents. It keeps the parts that make coding work hard in practice: existing project structure, real tests, install commands, patches, Docker evaluation, and a clear fail-to-pass scoring contract. It supports tasks from real pull requests and from a synthetic feature-deletion pipeline.

Status: Secondary challenge. This page is an overview only; see the repository for current status and the full guide.
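
To make the fail-to-pass scoring contract mentioned above concrete, here is a minimal sketch in Python. It assumes a task names a set of tests that fail on the original repository state and must pass once the agent's patch is applied. The function name `fail_to_pass_resolved` and the dictionary-based result format are hypothetical, chosen for illustration only; they are not Agent SWE's actual API, and the repository's full guide defines the real contract.

```python
from typing import Dict, List


def fail_to_pass_resolved(
    fail_to_pass: List[str],
    before: Dict[str, bool],
    after: Dict[str, bool],
) -> bool:
    """Return True if every designated test failed before the patch and passes after it.

    `before` and `after` map test names to True (passed) or False (failed).
    This is an illustrative assumption about the result format, not the real schema.
    """
    if not fail_to_pass:
        # With no target tests there is nothing to verify, so do not count it as solved.
        return False
    for test in fail_to_pass:
        if before.get(test, False):
            # The test already passed before the patch, so it proves nothing.
            return False
        if not after.get(test, False):
            # The test is missing or still failing after the patch.
            return False
    return True


if __name__ == "__main__":
    targets = ["test_parse_header", "test_parse_body"]
    before = {"test_parse_header": False, "test_parse_body": False}
    after = {"test_parse_header": True, "test_parse_body": True}
    print(fail_to_pass_resolved(targets, before, after))
```

A test that is absent from the post-patch results is treated as failing here, which is a conservative choice: a patch that breaks test collection should not score as a success.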