How can we build generalist language agents that assist us in the digital and physical world? First, we will discuss OSWorld, a new interactive, executable testbed for generalist agents that follow natural language instructions to perform long-horizon, real-world tasks in virtual machines in real time. Second, we will examine recent and ongoing efforts to train generalist agents, including learning from both human and automatic language feedback. Third, we will introduce AgentArena, a newly launched dynamic evaluation platform for OS agents that challenges state-of-the-art foundation model agents. Finally, we will explore ongoing and future directions in evaluating and building generalist agents that leverage the AgentArena platform.
* This event is open to the public, with an emphasis on graduate students in machine learning, computer science, ECE, statistics, mathematics, linguistics, and medicine, as well as PhD-level data scientists in the GTA.