CS620: New Trends in IT (Formal Methods in Machine Learning)
Spring 2021


Instructor: Supratik Chakraborty, email: supratik [AT] cse ...

Announcements

  • This offering of CS620 is different from previous offerings. Please see Course Information for more details.
  • Grades are now available.

Course Information

Machine learning (ML) is increasingly being used in applications where formal guarantees are desirable. For example, self-driving vehicles use ML components, and wrong decisions by these components can lead to fatal accidents. The ACAS Xu collision-avoidance system for autonomous drones is another such example, where a neural network is required to satisfy properties rigorously specified in logic. Formal robustness guarantees for image classification networks require the class label to remain unchanged unless the image is perturbed beyond an acceptable threshold. For various historical reasons, mainstream ML research has focused on developing an extremely rich set of tools, techniques and models that can be used to solve an array of real-world problems, without formal guarantees of how they behave in all possible corner cases.

In this course, we will be primarily interested in the behaviour of ML components in these hard-to-find corner-case scenarios. We will study how techniques from formal methods (FM) can be used to provide rigorous guarantees about the behaviour of ML components such as deep neural networks (DNNs) and recurrent neural networks (RNNs). Formal methods have been successfully used in the past to provide rigorous guarantees for hardware and software systems, with related work leading to at least three Turing Awards. Applying FM to prove properties of modern ML components, however, presents new challenges, both because of the scale and size of these components and because of the specifics of their operation. Over the last decade, this has spurred a lot of interest from both the ML and FM research communities, and formal methods for machine learning is now a routine track in most ML/FM conferences.

This course will primarily focus on four threads of research in this area: (i) use of abstraction refinement for analyzing DNNs, (ii) use of specialized constraint-solving techniques for reasoning about DNNs, (iii) automata-theoretic analysis of RNNs, and (iv) formal methods for explainability and robustness of ML components. The course should be useful for students of both formal methods and machine learning, and lies at the intersection of these areas. Since the area is relatively new, the course material will be primarily based on research papers.
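As a concrete illustration (a standard formalization, paraphrased here rather than taken from any single paper), the local robustness requirement mentioned above can be written in logic as follows, where N is the classifier, x the given image, and epsilon the perturbation threshold:

    \forall x'.\; \lVert x' - x \rVert_\infty \le \epsilon
        \;\Rightarrow\; \operatorname{argmax}_i\, N(x')_i \,=\, \operatorname{argmax}_i\, N(x)_i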

Pre-requisites

UG students: CS228, CS213, CS337
PG students: Either CS433 or CS719 or CS771. CS725 is recommended but not essential.

Course Logistics

  • This is an online offering, with pre-recorded lectures and live interactions (only for clarifying doubts).
  • Students are required to watch all pre-recorded videos and read the required reading material in order to keep pace with this course.
  • Attending live interaction sessions is not mandatory. No new material will be covered in these sessions; they will be used strictly for clearing doubts after you have watched the pre-recorded videos and read the required reading material.
  • Moodle will be the primary platform for discussion and dissemination of material. Please subscribe to the IITB Moodle page for CS620.
  • MSTeams will be used for live interaction sessions. Please subscribe to the CS620 channel on MSTeams using your IITB LDAP login. If you need a code to join the team, send an email to the instructor.
  • In addition to pre-recorded lecture videos, recordings of interaction sessions will also be made available.
  • Video recordings and reading material will be posted on the course page and also on Moodle.

Lectures and Interactive Sessions

Pre-recorded lectures will be posted by Tue 5:30 pm (exceptions to be announced).
Interactive doubt-clearing sessions will be held on the MSTeams CS620 channel on Mon and Thu, 5:30-7:00 pm.

Evaluation and assessment

Midsem: 30%
Endsem: 30%
Paper critique presentation: 10%
Project: 20%
Quizzes (attention-recall type): 10%

  • All exams/quizzes will be open-book.
  • If you miss an exam for a genuine reason, inform the instructor immediately.
  • There will be no homework.
  • Each graded quiz will carry equal weightage.
  • Informal discussion among students for the purpose of clarification is strongly encouraged.
  • There will be zero tolerance for dishonest means such as copying solutions from others in any quiz, exam, project, or paper critique presentation. Anybody found indulging in such activities will be summarily awarded an FR grade. The onus will be on the alleged offender to prove his or her innocence.
  • If you are unsure about the distinction between copying and discussion, please read this.

Course project

  • The course project carries 20% of the total marks.
  • The ideal size for a project team is 2 students. If you want to do the course project alone, please contact the instructor by Apr 8, 2021.
  • Each group must choose a tool, say T, from among those studied in this course and do the following:
    • Choose a machine-learning model, such as a deep neural network N, for a specific task. Examples are the ACAS-Xu networks, MNIST networks, etc.
    • Decide on a property P that you would like to verify/analyze for the network.
    • Train the network on publicly available datasets for the chosen task.
    • Use the tool T to verify/analyze property P of the ML model N (a minimal illustrative sketch appears at the end of this section).
  • Submit a report of the exercise by Sat, May 8, 2021.
  • Important dates
    • Apr 17, 2021: Online meeting with the instructor to discuss your project plan
    • May 14, 2021 (rescheduled from May 8): Demo and presentation of your project
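For concreteness, here is a minimal sketch of the last project step, with tool T = the Z3 SMT solver, model N = a tiny 2-2-2 ReLU network, and property P = a simple output bound over a box of inputs. All weights, bounds, and names here are illustrative assumptions, not a prescription for your project.

    from z3 import Real, Solver, If, And, sat

    def relu(e):
        return If(e > 0, e, 0)

    # Hypothetical weights; a real project would use the weights of the trained network N.
    W1 = [[1.0, -1.0], [0.5, 0.5]];  b1 = [0.0, -0.25]
    W2 = [[1.0,  0.5], [-1.0, 1.0]]; b2 = [0.1,  0.0]

    # Encode N symbolically: inputs, hidden ReLU layer, outputs.
    x = [Real('x0'), Real('x1')]
    h = [relu(sum(W1[i][j] * x[j] for j in range(2)) + b1[i]) for i in range(2)]
    y = [sum(W2[i][j] * h[j] for j in range(2)) + b2[i] for i in range(2)]

    s = Solver()
    s.add(And(x[0] >= 0, x[0] <= 1, x[1] >= 0, x[1] <= 1))  # input region of P
    s.add(y[0] - y[1] > 10)  # negation of P: "y0 never exceeds y1 by more than 10"

    if s.check() == sat:
        print("P is violated; counterexample:", s.model())
    else:
        print("P holds on the entire input region")

If the negated property is unsatisfiable over the encoded network, P holds for every input in the region; this is the style of reasoning behind the encodings of Week 5 and the Reluplex procedure of Week 6.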

Readings for paper critique

  • Each student is required to read one of the papers from the list below and present a short critique consisting of:
    • Problem being solved in the paper
    • Positive points/strengths of the technique
    • Weaknesses of the technique
    • Points that were skipped, missed, or ignored, and that could have been taken into account to make the paper stronger.
  • This exercise has to be done by each student individually. It is OK for multiple students to choose the same paper to critique.
  • Each student is required to prepare a 7-10 slide presentation of her/his critique. The slides are to be submitted via Moodle for evaluation. Students who can are also encouraged to present their critique to the instructor.
  • The deadline for submission of the paper critique writeup/presentation is Fri, May 14, 2021 (rescheduled from Sat, May 8). If you want to schedule a presentation earlier, please let the instructor know.
List of papers (mostly on the topic of formal methods for explainable AI) for the paper critique exercise. Each student must choose one paper.
  • Abduction-based explanations for machine learning models by Alexey Ignatiev, Nina Narodytska, and Joao Marques-Silva.
  • A SAT-based approach to learn explainable decision sets by Alexey Ignatiev, Filipe Pereira, Nina Narodytska, and João Marques-Silva.
  • Learning Optimal Decision Trees with SAT by Nina Narodytska, Alexey Ignatiev, Filipe Pereira, and Joao Marques-Silva.
  • Explaining AI Decisions using Efficient Methods for Learning Sparse Boolean Formulae by Susmit Jha, Tuhin Sahai, Vasumathi Raman, Alessandro Pinto, and Michael Francis.
  • Scalable Quantitative Verification for Deep Neural Networks by Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, and Prateek Saxena.
  • SAT-based Decision Tree Learning for Large Data Sets by Andre Schidler and Stefan Szeider.

Reading References

Required Reading List

Week 2
Week 3
Week 4
  • Catching-up week: Required readings are those of Weeks 2 and 3
Week 5
  • Chapter 2 ("Neural Networks as Graphs") and Chapter 5 ("Encodings of Neural Networks") of Verified Deep Learning by Aws Albarghouthi (work-in-progress version)
Week 6
Week 7
  • No additional reading. Midsem revision and exam
Weeks 8-9
Weeks 10-11
Week 12
Week 13
Week 14
  • No additional reading. Endsem revision.

Quizzes and Exams

Exam/Quiz        Posted on          Answers due by
Quiz 1           Jan 29, 11:59am    Jan 31, 11:59am (upload PDF file on Moodle)
Quiz 2           Feb 21, 3:00pm     Feb 22, 5:00pm (upload PDF file on Moodle)
Midsem Exam      Feb 24, 6:10pm     Feb 24, 8:10pm (upload PDF file on Moodle)
Quiz 3           Apr 18, 3:30pm     Apr 18, 7:30pm (upload PDF file on Moodle)
Endsem Exam      May 1, 10:00am     May 1, 4:00pm (upload PDF file on Moodle)
Endsem Re-exam   May 13, 3:30pm     May 13, 9:30pm (upload PDF file on Moodle)

Lectures and interactions

Week 1 (Jan 7)
Topic: Course Logistics
Interaction session recordings:
  • Part 1 (Lectures and timings)
  • Part 2 (What the course is/isn't about)
  • Part 3 (Expectations from students)
Slides: PDF slides

Week 2 (Jan 11 - 18)
Topic: What, Why and How of FM in ML
Pre-recorded lectures:
  • Part 1 (Safety in AI/ML -- ML perspective)
  • Part 2 (Safety in AI/ML -- FM perspective)
  • Part 3 (Safety in AI/ML -- Delving deeper from the FM perspective)
Salient points from the interaction session on Jan 11, 2021 are included in the videos above.
Slides: PDF slides

Week 3 (Jan 19 - 25)
Topic: Specifying Properties of Neural Networks
Pre-recorded lectures:
  • Part 1 (Hoare triples for specifying properties of neural networks)
  • Part 2 (Pitfalls of specifying properties of neural networks)
  • Part 3 (Non-perceptual and perceptual property specifications -- issues and challenges)
  • Part 4 (Specs for perceptual DNNs, writing multiple sub-specs and system-level specs)
Slides: PDF slides

Week 4 (Jan 26 - Feb 1)
Topic: Catching-up week (and Quiz 1)
No pre-recorded lectures; no PDF slides

Week 5 (Feb 1 - 8)
Topic: Modeling Neural Networks
Pre-recorded lectures:
  • Part 1 (Modeling nodes and edges as first-order constraints)
  • Part 2 (Compositional model building of the entire network)
  • Part 3 (Relational models and approximations)
  • Part 4 (Approximate systems of constraints)
Slides: PDF slides

Week 6 (Feb 9 - 16)
Topic: Specialized decision procedures for reasoning about deep neural networks
Pre-recorded lectures:
  • Talk on Reluplex by Guy Katz, the lead author of the Reluplex paper
No PDF slides

Week 7 (Feb 16 - Feb 22)
Topic: Catching-up week, revision for midsem and Quiz 2
No pre-recorded lectures; no PDF slides

Week 8 (Mar 4 - Mar 11)
Topic: Discussion on midsem; abstract interpretation for reasoning about neural networks
Interaction session recordings:
  • Part 1 (Discussion on midsem question 1)
  • Part 2 (Discussion on midsem question 2)
  • Part 3 (Glimpse of the next topic, discussion on projects)
Pre-recorded lectures:
  • Part 1 (Primer on Abstract Interpretation: Abstract domains)
  • Part 2 (Primer on Abstract Interpretation: Abstract state transitions)
Slides: PDF slides (primer on abstract interpretation)
(A minimal interval-propagation sketch in the spirit of this primer appears after this table.)

Week 9 (Mar 12 - Mar 18)
Topic: Continuing discussion on DeepPoly: polyhedral abstract interpretation for neural networks
Interaction session recordings: Edited recording coming soon
Pre-recorded lectures:
  • Talk on DeepPoly by Martin Vechev, co-author of the DeepPoly paper
No PDF slides

Weeks 10-11 (Mar 19 - Apr 1)
Topic: Training networks that are verifiably "correct"
Pre-recorded lectures

Week 12 (Apr 2 - Apr 9)
Topic: Recurrent neural networks and automata
Pre-recorded lectures

Week 13 (Apr 10 - Apr 15)
Topic: Safety in Reinforcement Learning
Pre-recorded lectures

Week 14 (Apr 16 - Apr 23)
Topic: Endsem revision and Quiz 3
Pre-recorded lectures
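The Week 8 primer above introduces abstract domains. The following is a minimal Python sketch of the simplest such domain, intervals (boxes), pushed through one affine-plus-ReLU layer. The weights are hypothetical, and this is the plain box domain, not the relational DeepPoly domain discussed in Week 9.

    import numpy as np

    def interval_affine(lo, hi, W, b):
        # Sound bounds for y = W @ x + b when x lies in the box [lo, hi]:
        # positive weights take the same-side bound, negative weights the opposite one.
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

    def interval_relu(lo, hi):
        # ReLU is monotone, so applying it to the bounds is exact in the box domain.
        return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

    # Hypothetical 2x2 layer; the input box is [0,1]^2.
    W = np.array([[1.0, -2.0], [0.5, 1.0]])
    b = np.array([0.0, -0.5])
    lo, hi = interval_relu(*interval_affine(np.zeros(2), np.ones(2), W, b))
    print("output box:", lo, hi)  # every concrete output is guaranteed to lie in this box

Verifying a property then reduces to checking that the output box lies inside the region the property allows; domains like DeepPoly tighten these bounds by additionally tracking relations between neurons.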