ML Sprint: Transformers Wiki!

Hi everybody!
As part of our first community hangout, we’re excited to be hosting a few sprints. This is one of them:

The plan with ML Sprints is to run week-long activities where our community will contribute to projects.

This is one of three wikis that we’re inviting you to contribute to! This wiki is meant to serve as a collection of the best resources to learn about Transformer models and their applications.

This is a wiki, which means all of you can edit it, so please do!

Papers:
Attention Is All You Need (2017) (see the attention sketch after this list)
End-to-End Object Detection with Transformers (2020)
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2021)
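
For a quick feel of the core mechanism these papers build on, here is a minimal PyTorch sketch of the scaled dot-product attention introduced in “Attention Is All You Need” (an illustration, not the paper’s reference code; the tensor shapes below are just a toy example):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k) tensors."""
    d_k = q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) so the softmax stays well-behaved.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ v                   # weighted sum of value vectors

# Toy usage: batch of 2, 4 heads, sequence length 5, head dim 8.
q = k = v = torch.randn(2, 4, 5, 8)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 5, 8])
```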

Blog posts:
The Annotated Transformer
Transformer Deep Dive
The Illustrated Transformer

Explanation videos:
Attention Is All You Need by Yannic Kilcher
GPT-2 by Yannic Kilcher
BERT by Yannic Kilcher
RoBERTa by Yannic Kilcher

Kaggle Notebooks:
Utilizing Transformer Representations Efficiently
On Stability of Few-Sample Transformer Fine-Tuning
Speeding up Transformer w/ Optimization Strategies


If you just want to get the hang of Transformers in one post, it would definitely be this one from Jay Alammar: The Illustrated Transformer


This playlist from Ms. Coffee Bean is informative as well.

The Transformer explained by Ms. Coffee Bean

I specifically like this diagram from Chris McCormick on what you need to know to understand Transformers:

  1. A great lecture by Dr. Rachel Thomas about the fundamental idea behind Transformers. YouTube link
  2. “Attention Is All You Need” paper read-through by Yannic Kilcher. YouTube link

These are some of the paper walk-throughs everyone should go through at least once:

  1. Attention Is All You Need by Yannic Kilcher
  2. GPT-2 by Yannic Kilcher
  3. BERT by Yannic Kilcher
  4. RoBERTa by Yannic Kilcher

These kernels are good for learning from the application point of view:

  1. Different ways to utilize transformer representations: Utilizing Transformer Representations Efficiently (a minimal pooling sketch follows this list)
  2. Stabilizing the training of transformer models: On Stability of Few-Sample Transformer Fine-Tuning
  3. Speeding up transformer training: Speeding up Transformer w/ Optimization Strategies
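
To make item 1 concrete, here is a hedged sketch of a few common ways to pool a transformer’s hidden states into a single sentence representation (it assumes the transformers library and the bert-base-uncased checkpoint; the kernel itself explores more variants):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("Transformers are fun!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Strategy 1: the [CLS] token of the last hidden layer.
cls_repr = outputs.last_hidden_state[:, 0]

# Strategy 2: mean-pool all tokens of the last layer, ignoring padding.
mask = inputs["attention_mask"].unsqueeze(-1)
mean_repr = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)

# Strategy 3: concatenate the [CLS] token from the last four layers.
last_four_cls = torch.cat([h[:, 0] for h in outputs.hidden_states[-4:]], dim=-1)

print(cls_repr.shape, mean_repr.shape, last_four_cls.shape)
# torch.Size([1, 768]) torch.Size([1, 768]) torch.Size([1, 3072])
```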