This page contains the course material for CS 794 (Systems for Machine Learning). In this course, you will learn about the infrastructure that underpins modern deep learning systems. Topics include GPU architecture and networking, ML programming using CUDA and frameworks like PyTorch, hardware-aware performance optimizations, distributed training, and LLM inference optimizations.

Prerequisites: Students should have taken a basic course in machine learning and be familiar with concepts related to neural network training and inference. The course involves a significant programming component in CUDA and PyTorch, so students must be comfortable with C++ and Python programming in a Linux environment. Students are expected to use free GPU resources available on cloud platforms to solve the take-home assignments.

News: We are working on a textbook on Systems for Machine Learning. The draft chapters are listed below, with accompanying slides and programming assignments. Check back here for updates.
| Chapter# | Title | Chapter PDF | Slides | Programming Assignments |
| --- | --- | --- | --- | --- |
| 1 | Introduction to SysML | | | |
| 2 | Review of Deep Learning Concepts | | | |
| 3 | Programming AI Hardware | link | GitHub link | |
| 4 | Hardware-aware Optimizations | | | |
| 5 | Machine Learning Programming Frameworks | | | |
| 6 | Distributed Training | | | |
| 7 | Networking Optimizations | | | |
| 8 | LLM Inference Optimizations | | | |

The course material for the Spring 2026 offering of this course is archived below.

| Lecture# | Topics | Slides | References | Programming Assignments |
| --- | --- | --- | --- | --- |
| 0 | Introduction to the course | slides | | PA0: Kaggle setup |
| 1 | Overview of deep learning concepts | slides | | PA1: KV caching for GPT model in PyTorch |
| 2 | Hardware for AI acceleration | slides | | |
| 3 | Hardware-aware performance optimizations | slides | | |
| 4 | CUDA programming | slides | | PA2: Optimizations to Matrix Multiplication; PA3: Optimizations to a simple MLP; PA4: Flash Attention |
| 5 | High-level ML programming frameworks | slides | | |
| 6 | Distributed training | slides | | |
| 7 | LLM inference optimizations | slides | | |
| 8 | Networking optimizations | slides | | |

References: The following textbooks (available online) provide a good background on several topics covered in the course. I am grateful to the authors for permitting me to use content from these books in my slides for the Spring 2026 offering.