category | title | date | poster |
---|---|---|---|
event | Towards Autonomous Language Model Systems | May 21, 2025 | assets/images/pt-day-cfp.png |

Date: May 21, 2025, 11AM PT / 2PM ET
Location: Online
Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (GitHub Copilot) or search (Google's AI Overviews). But can we build language model systems that autonomously complete entire tasks end-to-end?
In this talk, Ofir Press will discuss efforts to build autonomous LM systems, focusing on the software engineering domain. Ofir will present SWE-bench, a novel benchmark for measuring the ability of AI systems to fix real issues in popular software libraries. Ofir will then discuss SWE-agent, a system for solving SWE-bench tasks.
SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community toward building autonomous systems that can complete challenging tasks.
Ofir is a postdoc at Princeton University, where they mainly work with Karthik Narasimhan's lab. Ofir previously completed their PhD at the University of Washington in Seattle, advised by Noah Smith. During their PhD, Ofir spent two years at Facebook AI Research Labs on Luke Zettlemoyer's team.