IITB Machine Learning Reading Group

IITB AI/ML Reading Group

The goal of this reading group is to learn about new developments and foundational concepts in artificial intelligence (AI) and machine learning (ML).

Time and place: TBA
Compulsory: Fill this form before attending the reading session. It should not take more than 5 minutes if you have skimmed through the paper.

If you would like to discuss a paper with the group, please fill this form https://forms.gle/TAtNFnwAFELPJ9N46.

Updates: If you would like to be notified of upcoming talks, please subscribe to the IITB AI/ML mailing list and/or calendar below.

Upcoming talks

Feb 17, 2022. Presenter: Indradyumna Roy
Backurs et al., Scalable Nearest Neighbor Search for Optimal Transport, ICML '20
Indyk et al., Fast Image Retrieval via Embeddings

Past talks

2019

July 19, 2019. Presenter: Rahul Mitra
Title: Learning Correspondence from the Cycle-consistency of Time [slides]

We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation to be useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation – without finetuning – across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
Additional Paper: Temporal Cycle-Consistency Learning (if time permits)
July 12th, 2019. Presenter: Vishal Kaushal
Title: Discussion on two papers from CVPR 2019 [slides]
I will be presenting the following two papers (from the recently concluded CVPR 2019) tomorrow. Looking forward to an engaging discussion.
1. AutoAugment: Learning Augmentation Strategies From Data
2. Rethinking the Evaluation of Video Summaries
July 5th, 2019. Presenter: Prathamesh Deshpande

[slides]

This week I am going to present our work on Adaptive Recurrent Units (ARU). ARU is a parameter-free local model that can be used to locally adapt a global RNN-based model. Unlike existing methods of adaptation that are either memory-intensive or non-responsive after training, ARUs require only fixed-sized state and adapt to streaming data via an easy RNN-like update operation. The core principle driving ARU is simple — maintain sufficient statistics of conditional Gaussian distributions and use them to compute local parameters in closed form. The contribution of the paper is in embedding such local linear models in globally trained deep models while allowing end-to-end training on the one hand, and easy RNN-like updates on the other. Across several datasets, ARU is more effective than recently proposed local adaptation methods that tax the global network to compute local parameters.
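The principle of keeping fixed-size sufficient statistics and solving for local parameters in closed form can be sketched as follows. This is not the ARU update from the paper: it is a generic streaming linear-Gaussian (ridge regression) example, and the class name `StreamingLinearModel`, the `ridge` parameter, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


class StreamingLinearModel:
    """Hypothetical illustration: linear model fit from running sufficient statistics."""

    def __init__(self, dim, ridge=1e-3):
        # Fixed-size state: sum of x x^T and sum of y * x, independent of stream length.
        self.sxx = np.zeros((dim, dim))
        self.sxy = np.zeros(dim)
        self.ridge = ridge

    def update(self, x, y):
        # Cheap per-example update of the statistics, no gradient step needed.
        self.sxx += np.outer(x, x)
        self.sxy += y * x

    def weights(self):
        # Closed-form parameters: solve (Sxx + ridge * I) w = Sxy.
        dim = self.sxy.shape[0]
        return np.linalg.solve(self.sxx + self.ridge * np.eye(dim), self.sxy)


# Synthetic stream (assumed data): y = true_w . x + small Gaussian noise.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
model = StreamingLinearModel(dim=2)
for _ in range(500):
    x = rng.standard_normal(2)
    y = true_w @ x + 0.01 * rng.standard_normal()
    model.update(x, y)

w_hat = model.weights()
print(w_hat)
```

The state stays the same size however long the stream runs, which is the property the abstract contrasts with memory-intensive adaptation methods.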

June 28th, 2019. Presenter: Prof. Anshumali Shrivastava (Rice University)
Title: Hashing Meets Statistical Estimation and Inference: Adaptive Sampling at the Cost of Random Sampling [slides]

Sampling is one of the fundamental hammers in machine learning (ML) for reducing the size and scale of the problem at hand. Many ML applications demand adaptive sampling for faster convergence. However, the cost of sampling itself is prohibitive, which creates a fundamental barrier. In this talk, I will demonstrate how hashing algorithms naturally break this barrier, leading to efficient machine learning algorithms.

I will discuss some of my recent and surprising findings on the use of hashing algorithms for large-scale estimations. Locality Sensitive Hashing (LSH) is a hugely popular algorithm for sub-linear near neighbor search. However, it turns out that fundamentally LSH is a constant time (amortized) adaptive sampler from which efficient near-neighbor search is one of the many possibilities. LSH offers a unique capability to do smart sampling and statistical estimations at the cost of a few hash lookups. Our observation bridges data structures (probabilistic hash tables) with efficient unbiased statistical estimations. I will demonstrate how this dynamic and efficient sampling breaks the computational barriers in adaptive estimations, where it is possible that we pay roughly the cost of uniform sampling but get the benefits of adaptive sampling. I will demonstrate the power of a straightforward idea for a variety of problems: 1) Adaptive Gradient Estimations for efficient SGD, 2) Efficient Deep Learning, 3) Anomaly Detection, and 4) The first possibility of sub-linear sketches for near-neighbor queries.
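The view of LSH as an adaptive sampler can be illustrated with a small sketch. It is not code from the talk: it assumes sign random projections (SimHash) as the hash family, and the function name `collision_rate`, the 4-bit codes, and the toy 2-D points are illustrative choices. A point lands in the query's bucket with a probability that grows as its angle to the query shrinks, so drawing from that bucket samples nearby points more often than distant ones.

```python
import numpy as np


def collision_rate(query, point, n_tables=2000, n_bits=4, seed=0):
    """Fraction of independent SimHash tables where `point` shares the query's bucket."""
    rng = np.random.default_rng(seed)
    # One set of n_bits random hyperplanes per table.
    planes = rng.standard_normal((n_tables, n_bits, query.shape[0]))
    # A code is the sign pattern of the projections onto the hyperplanes.
    q_codes = (planes @ query) > 0
    p_codes = (planes @ point) > 0
    # A collision means all bits agree within a table.
    return float(np.mean(np.all(q_codes == p_codes, axis=1)))


query = np.array([1.0, 0.0])
near = np.array([np.cos(0.2), np.sin(0.2)])  # 0.2 rad away from the query
far = np.array([np.cos(1.5), np.sin(1.5)])   # 1.5 rad away from the query

near_rate = collision_rate(query, near)
far_rate = collision_rate(query, far)

# For SimHash, the per-table collision probability is (1 - angle / pi) ** n_bits.
print(near_rate, (1 - 0.2 / np.pi) ** 4)
print(far_rate, (1 - 1.5 / np.pi) ** 4)
```

Because the collision probability has a known closed form, a bucket sample can be reweighted by its inverse to obtain unbiased estimates, which is the bridge between hash tables and statistical estimation that the abstract describes.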
June 21, 2019. Presenter: Rishabh Dabral
Human pose estimation under noisy supervision [slides]
I'll be discussing the following two papers in this week's discussion. Both the papers come from the 3D human pose/shape estimation literature. Recently, the literature has undergone a surge in papers that attempt to solve human pose/shape with either weak supervision or no supervision at all. Both the papers are in the same line of work. I'll first introduce the broad problem statement and then discuss briefly some of the pre-requisites before placing the two papers up for detailed discussion.
1. Human Mesh Recovery, Kanazawa et al.
2. Neural Scene Decomposition, Rhodin et al.
June 14, 2019. Presenter: Abhijeet Awasthi
Robustness of Machine Learning Models [slides]
This week I will be discussing the following two papers which are related to the robustness of machine learning models.
1. Using Pre-Training Can Improve Model Robustness and Uncertainty [ICML 2019]
  It has been observed that training the model on task-specific data for a longer time can yield similar performance in comparison to pretraining followed by fine-tuning. Apart from faster convergence, what else does pre-training offer in the cases where it does not yield improvement on traditional accuracy metrics? This paper conducts several experiments to show that pre-training leads to robustness to label corruption, class imbalance and adversarial perturbations. They also show that pre-trained models are better calibrated and do better in the task of out-of-distribution detection.
2. Investigating Robustness and Interpretability of Link Prediction via Adversarial Modifications [NAACL 2019]
  How would link prediction in a knowledge graph (KG) be affected by adding some fake links or removing some existing links? Since knowledge graphs are always noisy and incomplete, a desirable property of link prediction is robustness to such changes. This paper proposes an effective way to estimate the change in link prediction score with respect to perturbations to the KG. They also propose an efficient method to search a very large space for fake links which could potentially deteriorate the link prediction performance. Identifying such links allows incorporating them as negative instances during training, thus making the link prediction model more robust to KG perturbations.
June 7, 2019. Presenter: Vihari Piratla
Representation learning. [slides].
1. "Learning Independent Causal Mechanisms", ICML 2018 -- http://proceedings.mlr.press/v80/parascandolo18a.html
2. "ORDERED NEURONS: INTEGRATING TREE STRUCTURES INTO RECURRENT NEURAL NETWORKS", ICLR 2019 -- https://openreview.net/pdf?id=B1l6qiR5F7
There has been a lot of interest in learning representations that disentangle all factors of variation in the input, referred to as disentangled representations. We will look at what these are, their advantages and how they are related to causal models. We will discuss in detail a 2018 ICML paper titled "Learning Independent Causal Mechanisms", which attempts to identify and learn independent causal processes on the MNIST dataset. This modular design of causal models holds the promise of easier generalization and was demonstrated to generalize to unseen Omniglot examples. Finally, we will conclude with a short survey on techniques for disentangled representations.

The quest to build better language models (predict what you are going to say before you say it) has been an active interest for a long time now. At the recently concluded ICLR 2019 conference, one of the best paper awards went to a method titled "ORDERED NEURONS: INTEGRATING TREE STRUCTURES INTO RECURRENT NEURAL NETWORKS" that improves Neural Language Models (NLM). They employ a clean trick to identify the latent syntactic tree structure of the sentence and thereby extend the linear chain LSTM to pay attention to the hierarchical structure of the tree. Apart from improving the language model perplexity scores on standard datasets, they also demonstrate state-of-the-art results on unsupervised inference of syntactic trees. We will conclude with a discussion on scope for further improvement in NLMs.
May 31, 2019. Screening of a pre-recorded talk. Presenter: Swabha Swayamdipta
Title: Learning Challenges in Natural Language Processing
As the availability of data for language learning grows, the role of linguistic structure is under scrutiny. At the same time, it is imperative to closely inspect patterns in data which might present loopholes for models to obtain high performance on benchmarks. In a two-part talk, I will address each of these challenges.
First, I will introduce the paradigm of scaffolded learning. Scaffolds enable us to leverage inductive biases from one structural source for prediction of a different, but related structure, using only as much supervision as is necessary. We show that the resulting representations achieve improved performance across a range of tasks, indicating that linguistic structure remains beneficial even with powerful deep learning architectures.
In the second part of the talk, I will showcase some of the properties exhibited by NLP models in large data regimes. Even as these models report excellent performance, sometimes claimed to beat humans, a closer look reveals that predictions are not a result of complex reasoning, and the task is not being completed in a generalizable way. Instead, this success can be largely attributed to exploitation of some artifacts of annotation in the datasets. I will discuss some questions our findings raise, as well as directions for future work.
May 24, 2019.
Screening of two pre-recorded talks. Theme: Incorporating expert knowledge into learning systems for learning in low-resource settings such as medical imaging or robotics.
- Presenter: Max Welling. Title: Making the case for using more inductive bias in deep learning
- Presenter: Anima Anandkumar. Title: Infusing Structure Into Machine Learning
  Standard deep learning algorithms are based on a function-fitting approach that does not exploit any domain knowledge or constraints. This has several shortcomings: high sample complexity, and lack of robustness and generalization, especially under domain or task shifts. I will show several ways to infuse structure and domain knowledge to overcome these limitations, viz., tensors, graphs, symbolic rules, physical laws, and simulations.
May 17, 2019.
Screening of talks from ICLR 2019 workshop: "Deep Generative Models for Highly Structured Data".
- Presenter: Yoshua Bengio. Title: Meta-transfer learning for factorizing representations and knowledge for AI
  Whereas machine learning theory has focused on generalization to examples from the same distribution as the training data, better understanding of the transfer scenarios where the observed distribution changes often in the lifetime of the learning agent is important, both for robust deployment and to achieve a more powerful form of generalization which humans seem able to enjoy and which seems necessary for learning agents. Whereas most machine learning algorithms and architectures can be traced back to assumptions about the training distributions, we also need to explore assumptions about how the observed distribution changes. We propose that sparsity of change in distribution, when knowledge is represented appropriately, is a good assumption for this purpose, and we claim that if that assumption is verified and knowledge represented appropriately, it leads to fast adaptation to changes in distribution, and thus that the speed of adaptation to changes in distribution can be used as a meta-objective which can drive the discovery of knowledge representation compatible with that assumption. We illustrate these ideas in causal discovery: is some variable a direct cause of another? How can raw data be mapped to a representation space where different dimensions correspond to causal variables for which a clear causal relationship exists? What generative model of the data can be quickly adapted to interventions in the agent's environment? We propose a large research program in which this non-stationarity assumption and meta-transfer objective are combined with other closely related assumptions about the world embodied in a world model, such as the consciousness prior (the causal graph is captured by a sparse factor graph) and the assumption that the causal variables are often those agents can act upon (the independently controllable factors prior), both of which should be useful for agents which plan, imagine and try to find explanations for what they observe.
- Presenter: Yulia Tsvetkov. Title: Continuous-Output Language Generation
  The softmax layer is used as the output layer of nearly all existing models for language generation. However, softmax is the computational bottleneck of these systems: it is the slowest layer to compute, and it has a huge memory footprint; to reduce time- and memory-complexity, many language generation systems limit their output vocabulary to a few tens of thousands of most frequent words, sacrificing the linguistic diversity and completeness of outputs. Finally, generating language using generative adversarial networks (GANs) is a notoriously hard task specifically due to the softmax layer. In this talk I'll introduce continuous-output generation—a general modification to the seq2seq models for generating text sequences which does away with the softmax layer, replacing it by the embedding layer. I will also describe ongoing work which explores alternative losses for continuous-output generation, approaches to efficient decoding, and continuous-output GANs for text.

Autumn 2017

November 1, 2017. Presenter: Shiv Shankar. Paper:
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks, Danfei Xu, Suraj Nair, Yuke Zhu, Julian Gao, Animesh Garg, Li Fei-Fei, Silvio Savarese, 2017.
October 25, 2017. Presenter: Manoj Gopalkrishnan.
Talk Title: "Algorithmic Biology: Are biochemical reactions networks implementing machine learning algorithms?"
August 30, 2017. Presenter: Siddhartha Chaudhuri. Papers:
1. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, CVPR 2017.
2. OctNet: Learning Deep 3D Representations at High Resolutions, Gernot Riegler, Ali Osman Ulusoy, Andreas Geiger, ICML 2017.
3. O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis, Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, Xin Tong, SIGGRAPH 2017.
August 9, 2017. Presenter: Sunita Sarawagi. Papers:
1. Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs, M. Gygli, M. Norouzi, A. Angelova, ICML 2017.
2. End-to-End Learning for Structured Prediction Energy Networks, D. Belanger, B. Yang, A. McCallum, ICML 2017.
July 26, 2017. Presenter: Sarath Chandar, Ph.D. student at University of Montreal.
Talk Title: "Memory Augmented Neural Networks". Presented a version of his tutorial that appeared at EMNLP 2017.

Spring 2017

Apr 12, 2017. Presenter: Shivaram Kalyanakrishnan. Paper: "Label-Free Supervision of Neural Networks with Physics and Domain Knowledge", Russell Stewart and Stefano Ermon, AAAI 2017.
Mar 29, 2017. Presenter: Siddhartha Chaudhuri. Papers:
1. Multi-view Convolutional Neural Networks for 3D Shape Recognition, Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller, ICCV 2015.
2. 3D Shape Segmentation with Projective Convolutional Networks, Evangelos Kalogerakis, Melinos Averkiou, Subhransu Maji and Siddhartha Chaudhuri, CVPR 2017.
Mar 15, 2017. Presenter: Vihari Piratla. Slides are here. Papers:
1. On large-batch training for deep learning: Generalization gap and sharp minima, N. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, P. Tang, ICLR 2017.
2. Understanding deep learning requires re-thinking generalization, C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals, ICLR 2017.
Mar 1, 2017. Presenter: Saketh Nath. (Part II) Paper: "Kernel Embeddings of Conditional Distributions", Le Song, Kenji Fukumizu, Arthur Gretton, IEEE Signal Processing Magazine, 30(4), 2013.
Feb 8, 2017. Presenter: Saketh Nath. (Part I) Paper: "Kernel Embeddings of Conditional Distributions", Le Song, Kenji Fukumizu, Arthur Gretton, IEEE Signal Processing Magazine, 30(4), 2013.
Jan 11, 2017. Presenter: Vishal Kaushal. Paper: "You Only Look Once: Unified, Real-Time Object Detection", Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, CVPR 2016.

Page maintained by: Pavan Kalyan