Tutorial: Vision-Language Models: Evolution, Applications, and Challenges in Bridging the Gap Between Visual and Textual Data

Time and Venue

Venue: ICON 2023
Time: 2 PM - 5 PM IST, Sunday, December 17th, 2023

Abstract

In recent years, Vision-Language (VL) models have emerged as a groundbreaking field of research, offering new ways to understand, interpret, and generate both visual and textual data. This tutorial offers a concise exploration of Vision-Language models, tracing their evolution, highlighting real-world applications, and addressing the challenges of harmonizing visual and textual data. We will cover applications of Image-to-Text, Text-to-Image, and Video-to-Text models. But it's not all smooth sailing: we will also delve into critical challenges, such as data bias and ethical considerations, to equip participants with a holistic understanding of vision-language models. Attendees will gain insights into the past, present, and future of vision-language models, enabling them to navigate this exciting interdisciplinary field and harness its potential while understanding the crucial obstacles that remain to be overcome.

Tutorial Prerequisite

Attendees should be familiar with the fundamentals of linear algebra and dimensionality reduction. Basic NLP and CV knowledge would be beneficial but is not required, as we intend to make the tutorial self-contained. Tutorial materials, such as the slides and video recordings, will be made publicly available for later reference.

Instructors' Bio

Nihar Ranjan Sahoo is a PhD student in the Computer Science department of the Indian Institute of Technology Bombay, supervised by Prof. Pushpak Bhattacharyya. His research interests lie in ethical AI, social biases and toxicity in language, fairness in ML, and explainability in NLP. He has worked as a teaching assistant for undergraduate and graduate courses in AI, ML, and Deep Learning for NLP, and has taught a tutorial on the end-to-end NLP pipeline. He co-authored a computer vision paper published at BMVC 2021 that received the Best Student Paper Award (Runner-up). He has published papers on bias detection at conferences such as LREC, CoNLL, and ACL.
Abisek R K is a Master's student in the Computer Science department of the Indian Institute of Technology Bombay, supervised by Prof. Pushpak Bhattacharyya. His research interests lie in vision and language understanding and in rare language phenomena such as metaphors and hyperboles. He has won the "Dr. Winifred A. Fernandes Research Excellence Award" from the CSE department of IIT Bombay for his work on vision-language understanding and rare language phenomena, and has published two papers on these topics at ACL 2023.
Dr. Pushpak Bhattacharyya is a Professor of Computer Science and Engineering at IIT Bombay. Educated in the IIT system (B.Tech IIT Kharagpur, M.Tech IIT Kanpur, PhD IIT Bombay), Dr. Bhattacharyya has done extensive research in Natural Language Processing and Machine Learning. He has published more than 350 research papers, has authored or co-authored 6 books including a textbook on machine translation, and has guided more than 350 students through their PhD, Master's, and undergraduate theses. He has received many research excellence awards, including the Manthan Award from the Ministry of IT, the H. H. Mathur and P. K. Patwardhan awards from IIT Bombay, and the VNMM Award from IIT Roorkee, as well as substantial research grants from government and industry. Prof. Bhattacharyya holds the Bhagat Singh Rekhi Chair Professorship of IIT Bombay, and is a Fellow of the National Academy of Engineering, an Abdul Kalam National Fellow, a Distinguished Alumnus of IIT Kharagpur, a past Director of IIT Patna, and a past President of the Association for Computational Linguistics.

Tutorial Slides

Slides

GitHub Page and Demo

GitHub