CS620: New Trends in IT (Formal Methods in Machine Learning)
Spring 2021


Instructor: Supratik Chakraborty, email: supratik [AT] cse ...

Announcements

  • This offering of CS620 is different from previous offerings. Please see Course Information for more details.
  • Grades are now available.

Course Information

Machine learning (ML) is increasingly being used in applications where formal guarantees are desirable. For example, self-driving vehicles use ML components, and wrong decisions by these components can lead to fatal accidents. The ACAS Xu collision-avoidance system for autonomous drones is another such example, where a neural network is required to satisfy properties rigorously specified in logic. Formal robustness guarantees for image classification networks require the class label to remain unchanged unless the image is perturbed beyond an acceptable threshold. For various historical reasons, mainstream ML research has focused on developing an extremely rich set of tools, techniques and models that can be used to solve an array of real-world problems, without formal guarantees of how they behave in all possible corner cases.

In this course, we will be primarily interested in the behaviour of ML components in these hard-to-find corner-case scenarios. We will study how techniques from formal methods (FM) can be used to provide rigorous guarantees about the behaviour of ML components such as deep neural networks (DNNs) and recurrent neural networks (RNNs). Formal methods have been successfully used in the past to provide rigorous guarantees for hardware and software systems, with related work leading to at least three Turing Awards. Applying FM to prove properties of modern ML components, however, presents new challenges, both because of the scale and size of these components and because of the specifics of their operation. Over the last decade, this has spurred a lot of interest from both the ML and FM research communities, and formal methods for machine learning is now a routine track in most ML/FM conferences.

This course will primarily focus on four threads of research in this area: (i) use of abstraction refinement for analyzing DNNs, (ii) use of specialized constraint-solving techniques for reasoning about DNNs, (iii) automata-theoretic analysis of RNNs, and (iv) formal methods for explainability and robustness of ML components. The course should be useful for students of both formal methods and machine learning, and lies at the intersection of these areas. Since the area is relatively new, the course material will be primarily based on research papers.
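As a concrete illustration (a standard formalization, paraphrased here rather than taken from any single paper), the local robustness requirement mentioned above can be written in logic as follows, where N is the classifier, x the given image, and epsilon the perturbation threshold:

    \forall x'.\; \lVert x' - x \rVert_\infty \le \epsilon
        \;\Rightarrow\; \operatorname{argmax}_i\, N(x')_i \,=\, \operatorname{argmax}_i\, N(x)_i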

Pre-requisites

UG students: CS228, CS213, CS337
PG students: Either CS433 or CS719 or CS771. CS725 is recommended but not essential.

Course Logistics

  • This is an online offering, with pre-recorded lectures and live interactions (only for clarifying doubts).
  • Students are required to watch all pre-recorded videos and read the required reading material in order to keep pace with this course.
  • Attending live interaction sessions is not mandatory. No new material will be covered in these sessions; they will be used strictly for clearing doubts after you have watched the pre-recorded videos and read the required reading material.
  • Moodle will be the primary platform for discussion and dissemination of material. Please subscribe to the IITB Moodle page for CS620.
  • MSTeams will be used for live interaction sessions. Please subscribe to the CS620 channel on MSTeams using your IITB LDAP login. If you need a code to join the team, send an email to the instructor.
  • In addition to pre-recorded lecture videos, recordings of interaction sessions will also be made available.
  • Video recordings and reading material will be posted on the course page and also on Moodle.

Lectures and Interactive Sessions

Pre-recorded lectures will be posted by Tue 5:30 pm (exceptions to be announced).
Interactive doubt-clearing sessions will be held on the MSTeams CS620 channel on Mon and Thu, 5:30-7:00 pm.

Evaluation and assessment

Midsem: 30%
Endsem: 30%
Paper critique presentation: 10%
Project: 20%
Quizzes (attention-recall type): 10%

  • All exams/quizzes will be open-book.
  • If you miss an exam for a genuine reason, inform the instructor immediately.
  • There will be no homework.
  • Each graded quiz will carry equal weightage.
  • Informal discussion among students for the purpose of clarification is strongly encouraged.
  • There will be zero tolerance for dishonest means such as copying solutions from others in any quiz, exam, project, or paper critique presentation. Anybody found indulging in such activities will be summarily awarded an FR grade. The onus will be on the alleged offender to prove his or her innocence.
  • If you are unsure about the distinction between copying and discussion, please read this.

Course project

  • The course project carries 20% of the total marks.
  • The ideal size for a project team is 2 students. If you want to do the course project alone, please contact the instructor by Apr 8, 2021.
  • Each group must choose a tool, say T, from among those studied in this course and do the following:
    • Choose a machine-learning model, such as a deep neural network N, for a specific task. Examples are the ACAS-Xu networks, MNIST networks, etc.
    • Decide on a property P that you would like to verify/analyze for the network.
    • Train the network on publicly available datasets for the chosen task.
    • Use the tool T to verify/analyze property P of the ML model N (a minimal illustrative sketch appears at the end of this section).
  • Submit a report of the exercise by Sat, May 8, 2021.
  • Important dates
    • Apr 17, 2021: Online meeting with the instructor to discuss your project plan
    • May 14, 2021 (rescheduled from May 8): Demo and presentation of your project
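For concreteness, here is a minimal sketch of the last project step, with tool T = the Z3 SMT solver, model N = a tiny 2-2-2 ReLU network, and property P = a simple output bound over a box of inputs. All weights, bounds, and names here are illustrative assumptions, not a prescription for your project.

    from z3 import Real, Solver, If, And, sat

    def relu(e):
        return If(e > 0, e, 0)

    # Hypothetical weights; a real project would use the weights of the trained network N.
    W1 = [[1.0, -1.0], [0.5, 0.5]];  b1 = [0.0, -0.25]
    W2 = [[1.0,  0.5], [-1.0, 1.0]]; b2 = [0.1,  0.0]

    # Encode N symbolically: inputs, hidden ReLU layer, outputs.
    x = [Real('x0'), Real('x1')]
    h = [relu(sum(W1[i][j] * x[j] for j in range(2)) + b1[i]) for i in range(2)]
    y = [sum(W2[i][j] * h[j] for j in range(2)) + b2[i] for i in range(2)]

    s = Solver()
    s.add(And(x[0] >= 0, x[0] <= 1, x[1] >= 0, x[1] <= 1))  # input region of P
    s.add(y[0] - y[1] > 10)  # negation of P: "y0 never exceeds y1 by more than 10"

    if s.check() == sat:
        print("P is violated; counterexample:", s.model())
    else:
        print("P holds on the entire input region")

If the negated property is unsatisfiable over the encoded network, P holds for every input in the region; this is the style of reasoning behind the encodings of Week 5 and the Reluplex procedure of Week 6.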

Readings for paper critique

  • Each student is required to read one of the papers from the list below and present a short critique consisting of:
    • Problem being solved in the paper
    • Positive points/strengths of the technique
    • Weaknesses of the technique
    • Points that were skipped, missed, or ignored, and that could have been taken into account to make the paper stronger.
  • This exercise has to be done by each student individually. It is OK for multiple students to choose the same paper to critique.
  • Each student is required to prepare a 7-10 slide presentation of her/his critique. The slides are to be submitted via Moodle for evaluation. Students who can are also encouraged to present their critique to the instructor.
  • The deadline for submission of the paper critique writeup/presentation is Fri, May 14, 2021 (rescheduled from Sat, May 8). If you want to schedule a presentation earlier, please let the instructor know.
List of papers (mostly on the topic of formal methods for explainable AI) for the paper critique exercise. Each student must choose one paper.
  • Abduction-based explanations for machine learning models by Alexey Ignatiev, Nina Narodytska, and Joao Marques-Silva.
  • A SAT-based approach to learn explainable decision sets by Alexey Ignatiev, Filipe Pereira, Nina Narodytska, and João Marques-Silva.
  • Learning Optimal Decision Trees with SAT by Nina Narodytska, Alexey Ignatiev, Filipe Pereira, and Joao Marques-Silva.
  • Explaining AI Decisions using Efficient Methods for Learning Sparse Boolean Formulae by Susmit Jha, Tuhin Sahai, Vasumathi Raman, Alessandro Pinto, and Michael Francis.
  • Scalable Quantitative Verification for Deep Neural Networks by Teodora Baluta, Zheng Leong Chua, Kuldeep S. Meel, and Prateek Saxena.
  • SAT-based Decision Tree Learning for Large Data Sets by Andre Schidler and Stefan Szeider.

Reading References

Required Reading List

Week 2
Week 3
Week 4
  • Catching-up week: Required readings are those of Weeks 2 and 3
Week 5
  • Chapter 2 ("Neural Networks as Graphs") and Chapter 5 ("Encodings of Neural Networks") of Verified Deep Learning by Aws Albarghouthi (work-in-progress version)
Week 6
Week 7
  • No additional reading. Midsem revision and exam
Weeks 8-9
Weeks 10-11
Week 12
Week 13
Week 14
  • No additional reading. Endsem revision.

Quizzes and Exams

Exam/Quiz        Posted on          Answers due by
Quiz 1           Jan 29, 11:59am    Jan 31, 11:59am (upload PDF file on Moodle)
Quiz 2           Feb 21, 3:00pm     Feb 22, 5:00pm (upload PDF file on Moodle)
Midsem Exam      Feb 24, 6:10pm     Feb 24, 8:10pm (upload PDF file on Moodle)
Quiz 3           Apr 18, 3:30pm     Apr 18, 7:30pm (upload PDF file on Moodle)
Endsem Exam      May 1, 10:00am     May 1, 4:00pm (upload PDF file on Moodle)
Endsem Re-exam   May 13, 3:30pm     May 13, 9:30pm (upload PDF file on Moodle)

Lectures and interactions

Week 1 (Jan 7)
Topic: Course Logistics
Interaction session recordings:
  • Part 1 (Lectures and timings)
  • Part 2 (What the course is/isn't about)
  • Part 3 (Expectations from students)
Slides: PDF slides

Week 2 (Jan 11 - 18)
Topic: What, Why and How of FM in ML
Pre-recorded lectures:
  • Part 1 (Safety in AI/ML -- ML perspective)
  • Part 2 (Safety in AI/ML -- FM perspective)
  • Part 3 (Safety in AI/ML -- Delving deeper from the FM perspective)
Salient points from the interaction session on Jan 11, 2021 are included in the videos above.
Slides: PDF slides

Week 3 (Jan 19 - 25)
Topic: Specifying Properties of Neural Networks
Pre-recorded lectures:
  • Part 1 (Hoare triples for specifying properties of neural networks)
  • Part 2 (Pitfalls of specifying properties of neural networks)
  • Part 3 (Non-perceptual and perceptual property specifications -- issues and challenges)
  • Part 4 (Specs for perceptual DNNs, writing multiple sub-specs and system-level specs)
Slides: PDF slides

Week 4 (Jan 26 - Feb 1)
Topic: Catching-up week (and Quiz 1)
No pre-recorded lectures; no PDF slides

Week 5 (Feb 1 - 8)
Topic: Modeling Neural Networks
Pre-recorded lectures:
  • Part 1 (Modeling nodes and edges as first-order constraints)
  • Part 2 (Compositional model building of the entire network)
  • Part 3 (Relational models and approximations)
  • Part 4 (Approximate systems of constraints)
Slides: PDF slides

Week 6 (Feb 9 - 16)
Topic: Specialized decision procedures for reasoning about deep neural networks
Pre-recorded lectures:
  • Talk on Reluplex by Guy Katz, the lead author of the Reluplex paper
No PDF slides

Week 7 (Feb 16 - Feb 22)
Topic: Catching-up week, revision for midsem and Quiz 2
No pre-recorded lectures; no PDF slides

Week 8 (Mar 4 - Mar 11)
Topic: Discussion on midsem; abstract interpretation for reasoning about neural networks
Interaction session recordings:
  • Part 1 (Discussion on midsem question 1)
  • Part 2 (Discussion on midsem question 2)
  • Part 3 (Glimpse of the next topic, discussion on projects)
Pre-recorded lectures:
  • Part 1 (Primer on Abstract Interpretation: Abstract domains)
  • Part 2 (Primer on Abstract Interpretation: Abstract state transitions)
Slides: PDF slides (primer on abstract interpretation)
(A minimal interval-propagation sketch in the spirit of this primer appears after this table.)

Week 9 (Mar 12 - Mar 18)
Topic: Continuing discussion on DeepPoly: polyhedral abstract interpretation for neural networks
Interaction session recordings: Edited recording coming soon
Pre-recorded lectures:
  • Talk on DeepPoly by Martin Vechev, co-author of the DeepPoly paper
No PDF slides

Weeks 10-11 (Mar 19 - Apr 1)
Topic: Training networks that are verifiably "correct"
Pre-recorded lectures

Week 12 (Apr 2 - Apr 9)
Topic: Recurrent neural networks and automata
Pre-recorded lectures

Week 13 (Apr 10 - Apr 15)
Topic: Safety in Reinforcement Learning
Pre-recorded lectures

Week 14 (Apr 16 - Apr 23)
Topic: Endsem revision and Quiz 3
Pre-recorded lectures
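The Week 8 primer above introduces abstract domains. The following is a minimal Python sketch of the simplest such domain, intervals (boxes), pushed through one affine-plus-ReLU layer. The weights are hypothetical, and this is the plain box domain, not the relational DeepPoly domain discussed in Week 9.

    import numpy as np

    def interval_affine(lo, hi, W, b):
        # Sound bounds for y = W @ x + b when x lies in the box [lo, hi]:
        # positive weights take the same-side bound, negative weights the opposite one.
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

    def interval_relu(lo, hi):
        # ReLU is monotone, so applying it to the bounds is exact in the box domain.
        return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

    # Hypothetical 2x2 layer; the input box is [0,1]^2.
    W = np.array([[1.0, -2.0], [0.5, 1.0]])
    b = np.array([0.0, -0.5])
    lo, hi = interval_relu(*interval_affine(np.zeros(2), np.ones(2), W, b))
    print("output box:", lo, hi)  # every concrete output is guaranteed to lie in this box

Verifying a property then reduces to checking that the output box lies inside the region the property allows; domains like DeepPoly tighten these bounds by additionally tracking relations between neurons.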