Systems for Machine Learning

Mythili Vutukuru
Department of Computer Science and Engineering, IIT Bombay


This page contains the course materials for CS 794: Systems for Machine Learning. This is a PG elective open to both CSE and non-CSE UG and PG students. The course covers the "systems" aspects of machine learning. By the end of the course, students will understand the advances at the infrastructure layer that have contributed to the development of modern deep learning architectures and the current AI/ML boom.

Pre-requisites: Students should have taken a basic course in machine learning, and be familiar with concepts related to neural network training and inference. The course will involve a significant programming component (in the form of take-home assignments and proctored lab exams) in CUDA and PyTorch, so students must be comfortable with C++ and Python programming in a Linux environment. Students are expected to use free GPU resources available on cloud platforms to solve the take-home assignments.

Grading: The evaluation will consist of the following components.

  • Proctored lab exams based on the take-home (ungraded) programming assignments. Students are expected to write code on their own without any external help during the lab exams.
  • In-class mini-quizzes, roughly one per week.
  • Mid-semester and end-semester exams.

References: Much of the course content is based on research papers and other online resources; pointers are provided in the slides. In addition, the following textbooks (available online) provide good background on several topics covered in the course. I am grateful to the authors for permitting me to use content from these books in my slides.




Lecture# | Topics                                   | Slides | References | Programming Assignments
0        | Introduction to the course               | slides |            | PA0: Kaggle setup
1        | Overview of deep learning concepts       | slides |            | PA1: KV caching for GPT model in PyTorch
2        | Hardware for AI acceleration             | slides |            |
3        | Hardware-aware performance optimizations | slides |            |
4        | CUDA programming                         | slides |            | CUDA programming exercises
         |                                          |        |            | PA2: Optimizations to Matrix Multiplication
         |                                          |        |            | PA3: Optimizations to a simple MLP
         |                                          |        |            | PA4: FlashAttention
5        | High-level ML programming frameworks     |        |            |
6        | Distributed training                     |        |            |
7        | LLM inference optimizations              |        |            |
8        | Advanced topics: TBD                     |        |            |