Implementing the self-attention layer | The AiEdge

Introduction to Transformers for Large Language Models

Introduction

Welcome (3:01)

The RNN Encoder Decoder Architecture

Intro (1:09)
The Sequence to Sequence Models (2:29)
The RNN Encoder-Decoder Architecture: Concepts (12:22)
Implementing in PyTorch (6:47)
Implementing The Encoder (5:26)
Implementing The Decoder (8:52)
A Toy Example (6:30)
Putting the Encoder and Decoder together (3:17)
Outro (0:23)

The Attention Mechanism Before Transformers

Intro (0:45)
The RNN Encoder-Decoder vs Attention mechanism (4:08)
The Attention Layer (6:25)
The Bahdanau Attention (4:40)
The Luong Attention (3:13)
Implementing in PyTorch (2:18)
Implementing the Bahdanau attention (9:32)
Implementing the Luong attention (7:42)
Implementing the Decoder (10:51)
Putting everything together (2:32)
Outro (0:37)

The Self-Attention Mechanism

Intro (1:11)
Bahdanau vs Self-Attention (4:57)
The self-attention layer (6:21)
The Multihead attention layer (4:04)
Implementing the self-attention layer (13:02)
Implementing the Multihead attention layer (7:56)
Visualizing Attentions (9:34)
Outro (0:19)

Understanding the Transformer Architecture

Intro (1:01)
The Overall Architecture (5:10)
The Position Embedding (13:07)
The Encoder (5:23)
The Decoder (6:36)
Implementing the Position Embedding (4:16)
Implementing the Position-Wise Feed-Forward Network (1:54)
Implementing the Encoder Block (2:44)
Implementing the Encoder (2:58)
Implementing the Decoder Block (3:56)
Implementing The Decoder (3:12)
Implementing the Transformer (2:35)
Testing the code (8:05)
Outro (0:48)

How do we create Tokens from Words

Intro (1:31)
Word-level VS Character-Level VS Subword-level embeddings (8:58)
The Byte Pair Encoding Strategy (10:29)
Special Tokens (8:13)
The Hugging Face Tokenizer (7:29)
Visualizing The attentions with the padding token (12:47)
Outro (0:37)

How LLMs Generate Text

Intro (0:47)
Greedy Search Generation (5:48)
Multinomial sampling generation (12:24)
Beam Search generation (15:26)
Contrastive Search generation (9:16)
Generating Text with the Transformers package (15:57)
Outro (0:39)

Beyond LLMs: The Vision Transformer

Intro (0:46)
Transformer Applications (4:04)
The Vision Transformer architecture (13:47)
Image classification with The Vision Transformer (10:01)
Outro (0:33)

Implementing the self-attention layer
