LLMs - PyTorch - HuggingFace - LangChain
Hands-on Projects (May be subject to changes)
- Project 1: Implementing sparse attention mechanisms, SwiGLU, RMSNorm, MoE, and RoPE embeddings from scratch in PyTorch (a minimal RMSNorm sketch follows this project list)
- Project 2: Fine-tuning an LLM with PPO vs DPO vs ORPO using the PEFT package.
- Project 3: Training an LLM in a distributed manner with the Accelerate package on AWS SageMaker, using the Zero Redundancy Optimizer (ZeRO) strategy.
- Project 4: Fine-tuning a model with QLoRA to increase the context size.
- Project 5: Deploying a scalable LLM application API with streaming, KV-caching, continuous batching, and text generation layer capabilities.
- Project 6: Deploying a RAG application using LangChain, FastAPI, and LangServe.
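To give a taste of Project 1, here is a minimal RMSNorm sketch in PyTorch. The class name and tensor shapes are illustrative choices, not the project's reference solution:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root Mean Square layer normalization: scale by the RMS of the features."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the last dimension (no mean subtraction).
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

# Quick check on a dummy batch of token embeddings.
x = torch.randn(2, 8, 64)       # (batch, sequence, hidden)
print(RMSNorm(64)(x).shape)     # torch.Size([2, 8, 64])
```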
Welcome Video
Welcome to the Train, Fine-Tune, and Deploy Large Language Models Bootcamp!
6 Weeks of Intense Learning!
The Transformer Architecture (1 week)
The Transformer is the fundamental Neural Network architecture that enabled the evolution of Large Language Models as we know them now.
- The Self-Attention Mechanism
- The Multihead Attention
- The Encoder-Decoder Architecture
- The Position Embedding
- The Layer Normalization
- The Position-Wise Feed-Forward Network
- The Cross-Attention Layer
- The Language Modeling Head
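To give a flavor of the self-attention mechanism covered in this module, here is a minimal single-head scaled dot-product self-attention sketch in PyTorch. The module name and dimensions are illustrative assumptions, not the course implementation:

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Attention weights: softmax(Q K^T / sqrt(d)) applied to V.
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v

x = torch.randn(2, 10, 64)          # (batch, sequence, d_model)
print(SelfAttention(64)(x).shape)   # torch.Size([2, 10, 64])
```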
Training LLMs to Follow Instructions (1 week)
ChatGPT, Claude, and Gemini are LLMs trained to follow human instructions. We are going to learn how such models are trained from scratch:
- The Causal Language Modeling Pretraining Step
- The Supervised Learning Fine-Tuning Step
- The Reinforcement Learning Fine-Tuning Step
- Implementing those Steps with HuggingFace
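As an example of the pretraining step with HuggingFace, here is a minimal causal language modeling sketch. The `gpt2` checkpoint and the `corpus.txt` file are placeholder choices to keep the sketch small; the course projects use their own models and data:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # small placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a tiny text corpus; the model shifts the labels internally for next-token prediction.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM, not masked LM
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clm-pretrain",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```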
How to Scale Model Training (1 week)
More than ever, we need efficient hardware to accelerate the training process. We are going to explore how to distribute training computations across multiple GPUs using different parallelism strategies:
- CPU vs GPU vs TPU
- The GPU Architecture
- Distributed Training
- Data Parallelism
- Model Parallelism
- Zero Redundancy Optimizer Strategy
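As a preview of distributed training, here is a minimal HuggingFace Accelerate loop. The model, data, and loss are dummy stand-ins, and ZeRO-style sharding is typically enabled through the Accelerate/DeepSpeed configuration rather than code changes:

```python
import torch
from accelerate import Accelerator

# Dummy model and data; any nn.Module and DataLoader are handled the same way.
model = torch.nn.Linear(512, 512)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = torch.utils.data.TensorDataset(torch.randn(1024, 512))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

accelerator = Accelerator()  # reads the distributed setup (e.g. from `accelerate config`)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for (batch,) in dataloader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()   # dummy loss for illustration
    accelerator.backward(loss)          # replaces loss.backward() across devices
    optimizer.step()
```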
How to Fine-Tune LLMs (1 week)
Fine-tuning a model means continuing its training on a specialized dataset for a specific learning task. We are going to look at the different strategies to fine-tune LLMs:
- The different fine-tuning learning tasks
- Catastrophic forgetting
- LoRA Adapters
- QLoRA
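As a preview of LoRA with the PEFT package, here is a minimal sketch that attaches low-rank adapters to a small base model. The `gpt2` checkpoint and the `c_attn` target module are example choices; target module names vary per architecture:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

# Wrap the frozen base model with low-rank adapters on the attention projections.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                          # rank of the low-rank update matrices
    lora_alpha=16,                # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],    # GPT-2's attention projection layer
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```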
How to Deploy LLMs (1 week)
The most important part of machine learning model development is deployment! A model that is not in production costs money instead of generating money for the company. We are going to explore the different strategies to deploy LLMs:
- The Deployment Strategies
- Multi-LoRA
- The Text Generation Layer
- Streaming Applications
- Continuous Batching
- KV-Caching
- The Paged-Attention and vLLM
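As a preview of serving with vLLM, which implements continuous batching, KV-caching, and PagedAttention out of the box, here is a minimal offline-generation sketch. The `facebook/opt-125m` checkpoint is a small placeholder model:

```python
from vllm import LLM, SamplingParams

# vLLM handles KV-caching, continuous batching, and PagedAttention internally.
llm = LLM(model="facebook/opt-125m")  # small placeholder model id
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain KV-caching in one sentence.",
    "What does continuous batching do?",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```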
Building the Application Layer (1 week)
A deployed LLM on its own is not really useful. We are going to look at how we can build an agentic application on top of the model with LangChain:
- Implementing a Retrieval-Augmented Generation (RAG) pipeline with LangChain
- Optimizing the RAG pipeline
- Serving the pipeline with LangServe and FastAPI
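As a preview of the retrieval step of a RAG pipeline with LangChain, here is a minimal sketch. The exact import paths depend on your LangChain version, and the documents, embedding model, and query are illustrative:

```python
# Requires: langchain-community, faiss-cpu, sentence-transformers
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

docs = [
    "The KV-cache stores the attention keys and values of past tokens.",
    "Continuous batching interleaves requests at the token level.",
]

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

# Retrieve the most relevant chunk for a user question; the retrieved text would
# then be injected into the LLM prompt as context.
print(retriever.invoke("What does the KV-cache hold?"))
```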
What is included!
- 40+ hours of recorded lectures
- 6 hands-on projects
- Homework support
- Certification upon graduation
- Access to our online community
- Lifetime access to course content
Schedule
This Bootcamp is currently self-paced.
Who is this Bootcamp for?
This Bootcamp is meant for Engineers with experience in Data Science or Machine Learning Engineering who want to upgrade their skills in Large Language Modeling.
Be ready to learn!
This Bootcamp is not meant to be easy! Be ready to put in the time and effort to learn the subject so that the certificate means something.
I won't promise you that you will get a job after graduating (because it depends on you), but I can promise you that your understanding of LLMs will be at a completely different level!
Prerequisites
- Prior experience or knowledge of Machine Learning - at least 6 months. I expect people to feel comfortable with the concepts developed in the Machine Learning Fundamental Bootcamp.
- Proficiency in Python - at least 1 year experience.
Curriculum
- Thursday Aug 15th Meeting (202:17)
- Friday Aug 16th Meeting (206:48)
- Intro (1:33)
- The Overall Architecture (11:19)
- The Self-Attention layer (12:20)
- The Multihead Attention Layer (3:56)
- The Position Embedding (13:07)
- The Encoder (5:23)
- The Decoder (6:36)
- Implementing the Self-Attention Layer (15:24)
- Implementing the Multihead Attention Layer (13:03)
- Implementing the Position Embedding (4:16)
- Implementing the Feed-Forward Network (1:54)
- Implementing The Encoder Block (2:44)
- Implementing the Encoder (2:58)
- Implementing the Decoder Block (3:56)
- Implementing the Decoder (3:12)
- Implementing the Transformer (2:35)
- Testing the Code (8:04)
- Outro (0:47)
- Homework 1
- Homework 1 Feedback
- Thursday Aug 22nd Meeting (233:58)
- Friday Aug 23rd Meeting (244:17)
- Intro (1:00)
- The Overview (5:55)
- Causal Language Modeling Pretraining (10:58)
- Supervised Learning Fine-Tuning (7:46)
- Reinforcement Learning from Human Feedback (15:09)
- Implementing the Pretraining Step (21:54)
- Implementing the Supervised Learning Fine-Tuning Step (9:53)
- Implementing the Reinforcement Learning Fine-Tuning Step (28:25)
- Outro (0:26)
- Homework 2
- Homework 2 Feedback
- Thursday Aug 29th Meeting (259:18)
- Friday Aug 30th Meeting (213:33)
- Intro (1:22)
- CPU vs GPU vs TPU (5:49)
- The GPU Architecture (8:03)
- Distributed Training (2:38)
- Data Parallelism (3:59)
- Model Parallelism (7:38)
- Zero Redundancy Optimizer Strategy (10:49)
- Distributing Training with the Accelerate Package on AWS Sagemaker (38:08)
- Outro (0:27)
- Homework 3
- Homework 3 Feedback
- Thursday Sep 5th Meeting (250:04)
- Friday Sep 6th Meeting (228:21)
- Intro (1:05)
- The Different Fine-tuning tasks (4:23)
- Language Modeling (8:16)
- Sequence Prediction (5:03)
- Text Classification (4:16)
- Text Encoding (5:17)
- Multimodal Fine-tuning (2:39)
- Catastrophic forgetting (1:47)
- LoRA Adapters (11:35)
- QLoRA (19:24)
- LoRA and QLoRA with the PEFT Package (22:25)
- Outro (0:36)
- Homework 4
- Homework 4 Feedback
- Thursday Sep 12th Meeting (226:06)
- Friday Sep 13th Meeting (199:38)
- Intro (0:59)
- Before Deploying (9:13)
- The Deployment Strategies (8:59)
- Multi-LoRA (2:51)
- The Text Generation Layer (13:13)
- Streaming Applications (5:25)
- Continuous Batching (6:13)
- KV-Caching (11:18)
- Deploying with vLLM (9:58)
- Outro (0:22)
- Homework 5
- Thursday Sep 19th Meeting (215:38)
- Friday Sep 20th Meeting (200:11)
- Intro (1:39)
- What is the Application Layer (5:21)
- The RAG Application (4:19)
- Optimizing the Indexing Pipeline (6:10)
- Optimizing the Query (4:52)
- Optimizing the Retrieval (5:00)
- Optimizing the Document Selection (8:56)
- Optimizing the Context Creation (8:39)
- Building a simple RAG Application (1:28)
- Implementing the Indexing Pipeline (35:04)
- Implementing the Retrieval API (24:19)
- Homework 6
- Homework 6 Feedback
- Outro (0:32)
Meet Damien
Welcome, my name is Damien Benveniste! After a Ph.D. in theoretical physics, I started my career in Machine Learning and Data Science more than 10 years ago.
I have been a Data Scientist, Machine Learning Engineer, and Software Engineer. I have led Machine Learning projects in industry sectors as diverse as AdTech, market research, financial advising, cloud management, online retail, marketing, credit score modeling, data storage, healthcare, and energy valuation. Most recently, I was a Machine Learning Tech Lead at Meta, working on automating model optimization at scale for Ads ranking.
I am now focusing on a more entrepreneurial journey where I build tech businesses and teach my expertise.