Tutorial: Vision-Language Models: Evolution, Applications, and Challenges in Bridging the Gap Between Visual and Textual Data
Time and Venue
-
Venue: ICON 2023
-
Time: 2 PM - 5 PM IST, Sunday, December 17th, 2023
Abstract
In recent years, Vision-Language (VL) models have emerged as a groundbreaking field of research, offering new ways to understand, interpret, and generate both visual and textual data. This tutorial offers a concise exploration of VL models, tracing their evolution, highlighting real-world applications, and addressing the challenges of harmonizing visual and textual data. We will cover applications of image-to-text, text-to-image, and video-to-text models. We will also delve into critical challenges, such as data bias and ethical considerations, to give participants a holistic understanding of VL models. Attendees will gain insights into the past, present, and future of the field, enabling them to navigate this interdisciplinary area and harness its potential while understanding the obstacles that still need to be overcome.
Tutorial Prerequisite
Attendees should be familiar with the fundamentals of linear algebra and dimensionality reduction. Basic knowledge of NLP and CV would be beneficial but is not required, as we intend to make the tutorial self-contained. The tutorial materials, such as the slides and video recordings, will be made publicly available for later reference.
Instructors' Bio
-
Nihar Ranjan Sahoo is a PhD student in the Computer Science department of the Indian Institute of Technology Bombay, supervised by Prof. Pushpak Bhattacharyya. His research interests lie in ethical AI, social biases and toxicity in language, fairness in ML, and explainability in NLP. He has worked as a teaching assistant for undergraduate and graduate courses in AI, ML, and Deep Learning for NLP, and has taught a tutorial on the end-to-end NLP pipeline. He co-authored a computer vision paper published at the BMVC 2021 conference that received the Best Student Paper (Runner-up) award. He has published papers on bias detection at conferences such as LREC, CoNLL, and ACL.
-
Abisek R K is a Master's student in the Computer Science department of the Indian Institute of Technology Bombay, supervised by Prof. Pushpak Bhattacharyya. His research interests lie in vision and language understanding and in understanding rare language phenomena such as metaphors and hyperboles. He won the "Dr. Winifred A. Fernandes Research Excellence Award" from the CSE department of IIT Bombay for his work on vision-language understanding and rare language phenomena. He has published two papers on these topics at ACL 2023.
-
Dr. Pushpak Bhattacharyya is Professor of Computer Science and Engineering at IIT Bombay. Educated in the IIT system (B.Tech IIT Kharagpur, M.Tech IIT Kanpur, PhD IIT Bombay), Dr. Bhattacharyya has done extensive research in Natural Language Processing and Machine Learning. He has published more than 350 research papers, has authored or co-authored 6 books, including a textbook on machine translation, and has guided more than 350 students for their PhD, Master's, and undergraduate theses. He has received many research excellence awards, including the Manthan award from the Ministry of IT, the H.H. Mathur and P.K. Patwardhan awards from IIT Bombay, and the VNMM award from IIT Roorkee, as well as substantial research grants from government and industry. Prof. Bhattacharyya holds the Bhagat Singh Rekhi Chair Professorship of IIT Bombay, and is a Fellow of the National Academy of Engineering, an Abdul Kalam National Fellow, a Distinguished Alumnus of IIT Kharagpur, a past Director of IIT Patna, and a past President of the Association for Computational Linguistics.