autonomous-language-model-systems.md

| category | title | date | poster |
| --- | --- | --- | --- |
| event | Towards Autonomous Language Model Systems | May 21, 2025 | assets/images/pt-day-cfp.png |

Towards Autonomous Language Model Systems

Date: May 21, 2025, 11AM PT / 2PM ET
Location: Online

Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (GitHub Copilot) or search (Google's AI Overviews). But can we build language model systems that autonomously complete entire tasks end to end?

In this talk, Ofir Press will discuss efforts to build autonomous LM systems, focusing on the software engineering domain. Ofir will present SWE-bench, a benchmark that measures how well AI systems can fix real issues in popular software libraries. Ofir will then discuss SWE-agent, a system for solving SWE-bench tasks.

SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community toward building autonomous systems that can complete challenging tasks.

Ofir is a postdoc at Princeton University, where they mainly work with Karthik Narasimhan's lab. Ofir previously completed their PhD at the University of Washington in Seattle, advised by Noah Smith. During their PhD, Ofir spent two years at Facebook AI Research Labs on Luke Zettlemoyer's team.

Register Now
