CPU vs GPU vs TPU | The AiEdge

Train, Fine-tune and Deploy LLMs - Bootcamp

Welcome!

Curriculum
Tools
Join Discord
Tell me more about you!
Schedule

The Transformer Architecture

Recorded Meeting #1 (202:17)
Recorded Meeting #2 (206:48)
Intro (1:33)
The Overall Architecture (11:19)
The Self-Attention layer (12:20)
The Multihead Attention Layer (3:56)
The Position Embedding (13:07)
The Encoder (5:23)
The Decoder (6:36)
Implementing the Self-Attention Layer (15:24)
Implementing the Multihead Attention Layer (13:03)
Implementing the Position Embedding (4:16)
Implementing the Feed-Forward Network (1:54)
Implementing The Encoder Block (2:44)
Implementing the Encoder (2:58)
Implementing the Decoder Block (3:56)
Implementing the Decoder (3:12)
Implementing the Transformer (2:35)
Testing the Code (8:04)
Outro (0:47)
Homework 1
Homework 1 Feedback

Training LLMs to Follow Instructions

Recorded Meeting #3 (233:58)
Recorded Meeting #4 (244:17)
Intro (1:00)
The Overview (5:55)
Causal Language Modeling Pretraining (10:58)
Supervised Learning Fine-Tuning (7:46)
Reinforcement Learning with Human Feedback (15:09)
Implementing the Pretraining Step (21:54)
Implementing the Supervised Learning Fine-Tuning Step (9:53)
Implementing the Reinforcement Learning Fine-Tuning Step (28:25)
Outro (0:26)
Homework 2
Homework 2 Feedback

How to Scale Model Training

Recorded Meeting #5 (259:18)
Recorded Meeting #6 (213:33)
Intro (1:22)
CPU vs GPU vs TPU (5:49)
The GPU Architecture (8:03)
Distributed Training (2:38)
Data parallelism (3:59)
Model parallelism (7:38)
Zero Redundancy Optimizer Strategy (10:49)
Distributing Training with the Accelerate Package on AWS Sagemaker (38:08)
Outro (0:27)
Homework 3
Homework 3 Feedback

How to Fine-Tune LLMs

Recorded Meeting #7 (250:04)
Recorded Meeting #8 (228:21)
Intro (1:05)
The Different Fine-tuning tasks (4:23)
Language Modeling (8:16)
Sequence Prediction (5:03)
Text Classification (4:16)
Text Encoding (5:17)
Multimodal Fine-tuning (2:39)
Catastrophic forgetting (1:47)
LoRA Adapters (11:35)
QLoRA (19:24)
LoRA and QLoRA with the PEFT Package (22:25)
Outro (0:36)
Homework 4
Homework 4 Feedback

How to Deploy LLMs

Recorded Meeting #9 (226:06)
Recorded Meeting #10 (199:38)
Intro (0:59)
Before Deploying (9:13)
The Deployment Strategies (8:59)
Multi-LoRA (2:51)
The Text Generation Layer (13:13)
Streaming Applications (5:25)
Continuous Batching (6:13)
KV-Caching (11:18)
Deploying with vLLM (9:58)
Outro (0:22)
Homework 5

Building the Application Layer

Recorded Meeting #11 (215:38)
Recorded Meeting #12 (200:11)
Intro (1:39)
What is the Application Layer (5:21)
The RAG Application (4:19)
Optimizing the Indexing Pipeline (6:10)
Optimizing the Query (4:52)
Optimizing the Retrieval (5:00)
Optimizing the Document Selection (8:56)
Optimizing the Context Creation (8:39)
Building a simple RAG Application (1:28)
Implementing the Indexing Pipeline (35:04)
Implementing the Retrieval API (24:19)
Homework 6
Homework 6 Feedback
Outro (0:32)

CPU vs GPU vs TPU
