Gail Weiss: Thinking Like Transformers
03 March 2023
Room A1.02 - East Campus USI-SUPSI
Transformers, the purely attention-based neural network architecture, have emerged as a powerful tool in sequence processing. But how does a transformer think? When we discuss the computational power of RNNs, or consider a problem that they have solved, it is easy for us to think in terms of automata and their variants (such as counter machines and pushdown automata). But when it comes to transformers, no such intuitive model is available.
In this talk I will present a programming language, RASP (Restricted Access Sequence Processing), which we hope will serve the same purpose for transformers as finite state machines do for RNNs. In particular, we will identify the base computations of a transformer and abstract them into a small number of primitives, which are composed into a small programming language. We will go through some example programs in the language, and discuss how a given RASP program relates to the transformer architecture.
The Speaker
Gail Weiss is a postdoctoral fellow working in the NLP lab of Prof. Antoine Bosselut at EPFL. Before that, she was a PhD student at the Technion working with Prof. Eran Yahav and Prof. Yoav Goldberg. Her main research interest at the moment is in understanding sequential neural networks (such as RNNs and transformers) through the lens of formal language theory, whether in order to extract more interpretable models (such as deterministic finite automata from RNNs) or to understand what such interpretable models should even be (as in the case of transformers).