Session Details: Vector Institute Distinguished Lecture Series 2024-2025

Name

Feature learning and "the linear representation hypothesis" for monitoring and steering LLMs

Date & Time

Friday, May 9, 2025, 11:00 AM - 12:00 PM

Speakers

Mikhail (Misha) Belkin

Description

A trained Large Language Model (LLM) contains much of human knowledge. Yet it is difficult to gauge the extent or accuracy of that knowledge, as LLMs do not always "know what they know" and may even be unintentionally or actively misleading. In this talk, I will discuss feature learning and introduce Recursive Feature Machines, a powerful method originally designed for extracting relevant features from tabular data. I will demonstrate how this technique enables us to detect and precisely guide LLM behaviors toward almost any desired concept by manipulating a single fixed vector in the LLM activation space.
