
For over four decades, statistical physics has studied exactly solvable models of artificial neural networks. In this talk, we will explore how these models offer insights into deep learning and large language models. Specifically, we will examine a research strategy that trades generality in the data distribution for precise control over learning behavior in the high-dimensional limit. We will discuss several types of phase transitions that emerge in this limit, notably as a function of data quantity. In particular, we will highlight how discontinuous phase transitions are linked to algorithmic hardness and how they affect the behavior of gradient-based learning algorithms. Finally, we will review recent progress in learning from sequences and advances in understanding generalization in modern architectures, including the role of dot-product attention layers in transformers.
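For readers unfamiliar with the final topic, dot-product attention refers to the standard transformer operation (stated here only as background, not drawn from the talk itself):

    Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V,

where Q, K, and V are the query, key, and value matrices and d_k is the key dimension.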
This event is open to the public, with an emphasis on graduate students in machine learning, computer science, ECE, statistics, mathematics, linguistics, and medicine, as well as PhD-level data scientists in the GTA.