Soumen Chakrabarti
Computer Science and Engineering
Indian Institute of Technology Bombay
तमसो मा ज्योतिर्गमय (From darkness, lead me to light)

Contact information

I am SOUMEN CHAKRABARTI, an anagram of ANARCHISM OUTBREAK, and a faculty member in the Department of Computer Science and Engineering.

If you are from industry and looking for consultation, please see our research and development site, my informal notes, and a sample mutual NDA.

If you are looking to join CSE@IITB as a PhD scholar, please read about the standard operating procedure and the PhD Qualifier model being adopted by the department, and contact the department office directly. PhD admissions are centrally coordinated at the department level.

I do not offer short-term projects or summer internships to students not enrolled at IIT Bombay. Such emails will be discarded.

If you are an IIT student looking for a project or seminar within the scope of your program (BTech, DD, MTech), please read these guidelines first. You can check my calendar for free slots and, if you have permission, propose a meeting here or by email.

The best way to contact me is to send mail to (please note that I am on a low-spam diet). Please use only email to initiate a conversation with me if we haven't communicated before. Only in case of an emergency, you can call me at +91-22-2576-7716 or fax me at +91-22-2572-0022. If you are visiting, here are directions to my office.

Education and career

Current research interests

Representation learning for graph search
We are exploring how to go beyond shallow graph neural networks to represent nodes, edges and graphs, both for better link prediction and for searching a corpus of graphs with a query graph under trainable notions of subgraph isomorphism.
Better embedding representation for entities, types, relations, and time
We are studying how to embed entities, types, relations and time to infer new edges in regular and temporal knowledge graphs, and their application to (temporal) question answering.
Complex multi-modal question answering
With IBM Research, I am exploring how to translate complex queries involving knowledge base access, arithmetic and logical operations into structured programs with memory.
Code-switched text analysis
Indian languages borrow heavily from English, resulting in "code switching" languages like Hindlish, Benglish, etc., the lingua franca of social media. We are investigating how to improve standard NLP tasks by generating synthetic code-switched text, and designing multi-task low-supervision recurrent networks.
[The World Wide Web is] the only thing I know of whose shortened form — WWW — takes three times longer to say than what it's short for.
Douglas Adams

Past projects

Searching the annotated Web with entities, types and relations
We built CSAW, a search system that integrates type and role annotations with keyword matches, thereby exploiting lexical ontologies and entity taggers within an information retrieval system.
Graph conductance search
Rich connections between random walks, graph eigensystems, and electrical networks make them attractive for ranking nodes. PageRank is a prominent example of this paradigm. In PageRank, the edge weights are fixed and we have to compute the steady-state probabilities of nodes. What if we have something like the opposite problem? And how can we make this fast at query time? Supported by IBM and Microsoft (2007, 2008).
Integrating IR with databases
In the BANKS project, we proposed new paradigms of keyword search in graphs that can represent text embedded in relational or XML-like data.
The effect of search engines on the Web graph and page popularity
Search engines are influenced by the (in)degree of Web pages, but their ranked lists modulate page popularity and eventually page (in)degree, setting up a feedback loop. Might the evolution of the Web graph be influenced substantially by the existence of search engines? Is there a need to regulate monopolies? What are healthy economic objectives, and how can they be optimized?
Focused crawlers to build topic-specific portals
A focused crawler collects a topic-specific subgraph of the Web by coupling classifiers and reinforcement learners with crawlers. An open-source focused crawler project was started at the Lab. for Intelligent Internet Research and is available.
Mining hypertext to estimate topics and popularity
I built a hypertext classifier that uses the text in and links around a given Web page to label it with a topic. This was an early application of Markov networks to Web analysis. As a member of the IBM Clever Project, I worked on algorithms to analyze the links around a web page and the text in pages that cite the given page to assign it a measure of popularity.
Compiling and running parallel scientific programs
In a previous life, my PhD thesis was on the design and implementation of compilers and runtime systems for distributed memory multiprocessors. Seems like distributed parallel computing is hot again, thanks to "Big Data"!
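For readers unfamiliar with the PageRank setting mentioned under graph conductance search above (fixed edge weights, steady-state node probabilities), here is a minimal sketch using plain power iteration. It is only an illustration, not code from any of the projects listed; the function name `pagerank`, the damping value 0.85, and the toy graph are all assumptions made for this example.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate steady-state visit probabilities of a random surfer.

    `edges` maps each node to a dict {neighbor: fixed positive edge weight}.
    The damping factor and stopping tolerance are illustrative defaults.
    """
    nodes = sorted(set(edges) | {v for nbrs in edges.values() for v in nbrs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleportation: with probability (1 - damping), jump to a uniform node.
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = edges.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its probability mass uniformly.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
            else:
                # Follow out-edges in proportion to their fixed weights.
                for v, w in out.items():
                    new_rank[v] += damping * rank[u] * w / total
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-node graph, used only to exercise the function.
toy_graph = {"a": {"b": 1.0, "c": 1.0}, "b": {"c": 1.0}, "c": {"a": 1.0}}
ranks = pagerank(toy_graph)
for node, score in sorted(ranks.items()):
    print(node, round(score, 4))
```

The scores sum to one, and node "c", which receives links from both "a" and "b", ends up with a higher score than "b".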
Recent papers are listed below with accompanying git repo links. Some older software can be found here.

Professional activity

Journal editorship
Conference/workshop organization
Conference/journal committee/reviewing
NeurIPS 2023 (area chair), EMNLP 2022, ARR 2020-, TACL 2020-2022, WSDM 2021 (senior PC), NeurIPS 2020, EMNLP 2020, ACL 2020, IJCAI 2020 (senior PC), AAAI 2020, EMNLP 2019, IJCAI 2019, ICML 2019, NeurIPS 2018, ICML 2018, NAACL 2018, WSDM 2018 (test of time awards), SIGIR 2017 (awards), SIGKDD 2017 (awards), WSDM 2017 (awards), NIPS 2017, ACL 2017; NIPS 2016, SIGIR 2016; CIKM 2014, ISWC 2014, SIGIR 2014, ACL 2014, WSDM 2014 (senior PC); SIGKDD 2013 (senior PC), WSDM 2013 (senior PC and awards committee); EMNLP 2012, SIGKDD 2012 (senior PC), WWW 2012; NIPS 2011, ICML 2011 (PC and invited applications talks committee), WWW 2011; SIGKDD 2010; NIPS 2009, WWW 2009, WSDM 2009 (senior PC); SIGKDD 2008 (senior PC), SIGIR 2008 (senior PC), WWW 2008; WWW 2007, SIGMOD 2007; SIGKDD 2006 (senior PC); EMNLP/HLT 2005, SIGKDD 2005, WWW 2005 (panel), SIGMOD 2005; SIGKDD 2004, SIGIR 2004, VLDB 2004, WWW 2004, ICDE 2004; SIGIR 2003, SIGKDD 2003, VLDB 2003 (IIS), SODA 2003; SIGIR 2002, ICDE 2002; SIGIR 2001, WWW 2001; WWW 2000; SIGKDD 1999; SIGKDD 1998.
  • Web Search and Data Mining (WSDM) steering committee member, 2008–2013.
  • ACM SIGKDD Curriculum Committee Member.


But the power of instruction is seldom of much efficacy, except in those happy dispositions where it is almost superfluous.
Edward Gibbon,
The Decline And Fall Of The Roman Empire
Volume 1, Chapter 4
... your work is to keep cranking the flywheel that turns the gears
that spin the belt in the engine of belief that keeps you and your desk in midair
Annie Dillard,
The Writing Life

Publications: Google Scholar, DBLP, arXiv, ResearchGate, Semantic Scholar

Upcoming and past talks and meetings


Links in areas of interest

Content with URLs that have the current URL as a prefix has been hosted in accordance with fair use principles, for academic and non-profit purposes. By downloading the contents of this page, you agree to bring possible violation of fair use to my notice before taking legal recourse.