Train, Fine-tune and Deploy LLMs - Bootcamp
Welcome!
Curriculum
Tools
Join Discord
Tell me more about you!
Schedule
The Transformer Architecture
Intro (1:33)
The Overall Architecture (11:19)
The Self-Attention layer (12:20)
The Multihead Attention Layer (3:56)
The Position Embedding (13:07)
The Encoder (5:23)
The Decoder (6:36)
Implementing the Self-Attention Layer (15:24)
Implementing the Multihead Attention Layer (13:03)
Implementing the Position Embedding (4:16)
Implementing the Feed-Forward Network (1:54)
Implementing The Encoder Block (2:44)
Implementing the Encoder (2:58)
Implementing the Decoder Block (3:56)
Implementing the Decoder (3:12)
Implementing the Transformer (2:35)
Testing the Code (8:04)
Outro (0:47)
Homework 1
Homework 1 Feedback
Training LLMs to Follow Instructions
Intro (1:00)
The Overview (5:55)
Causal Language Modeling Pretraining (10:58)
Supervised Learning Fine-Tuning (7:46)
Reinforcement Learning with Human Feedback (15:09)
Implementing the Pretraining Step (21:54)
Implementing the Supervised Learning Fine-Tuning Step (9:53)
Implementing the Reinforcement Learning Fine-Tuning Step (28:25)
Outro (0:26)
Homework 2
Homework 2 Feedback
How to Scale Model Training
Intro (1:22)
CPU vs GPU vs TPU (5:49)
The GPU Architecture (8:03)
Distributed Training (2:38)
Data parallelism (3:59)
Model parallelism (7:38)
Zero Redundancy Optimizer Strategy (10:49)
Distributing Training with the Accelerate Package on AWS SageMaker (38:08)
Outro (0:27)
Homework 3
Homework 3 Feedback
How to Fine-Tune LLMs
Intro (1:05)
The Different Fine-tuning tasks (4:23)
Language Modeling (8:16)
Sequence Prediction (5:03)
Text Classification (4:16)
Text Encoding (5:17)
Multimodal Fine-tuning (2:39)
Catastrophic forgetting (1:47)
LoRA Adapters (11:35)
QLoRA (19:24)
LoRA and QLoRA with the PEFT Package (22:25)
Outro (0:36)
Homework 4
Homework 4 Feedback
How to Deploy LLMs
Intro (0:59)
Before Deploying (9:13)
The Deployment Strategies (8:59)
Multi-LoRA (2:51)
The Text Generation Layer (13:13)
Streaming Applications (5:25)
Continuous Batching (6:13)
KV-Caching (11:18)
Deploying with vLLM (9:58)
Outro (0:22)
Homework 5
Building the Application Layer
Intro (1:39)
What is the Application Layer (5:21)
The RAG Application (4:19)
Optimizing the Indexing Pipeline (6:10)
Optimizing the Query (4:52)
Optimizing the Retrieval (5:00)
Optimizing the Document Selection (8:56)
Optimizing the Context Creation (8:39)
Building a simple RAG Application (1:28)
Implementing the Indexing Pipeline (35:04)
Implementing the Retrieval API (24:19)
Homework 6
Homework 6 Feedback
Outro (0:32)