Soumen Chakrabarti
Associate Professor
Computer Science and Engineering
Indian Institute of Technology Bombay

Contact information

I am SOUMEN CHAKRABARTI, an anagram of ANARCHISM OUTBREAK, and a faculty member in the Department of Computer Science and Engineering.

If you are from industry looking for consultation, please read the section titled Consultative practice rules and norms (1996) herein, and my informal notes.

If you are looking to join CSE@IITB as a PhD scholar, please read about the PhD Qualifier model being adopted by the department, and contact the department office directly. PhD admissions are coordinated centrally at the department level.

I do not offer short-term projects or summer internships to students not enrolled at IIT Bombay. Emails requesting them will be discarded.

If you are an IIT student looking for a project or seminar within the scope of your program (BTech, DD, MTech), please read these guidelines first. You can check my calendar for free slots and, if you have permission, propose a meeting here or by email.

The best way to contact me is to send mail to (please note that I am on a low-spam diet). Please use only email to initiate a conversation with me if we haven't communicated before. Only in case of an emergency, you can call me at +91-22-2576-7716 or fax me at +91-22-2572-0022. If you are visiting, here are directions to my office.

Education and career

Research interests

Searching the annotated Web with entities, types and relations
We are building CSAW, a new search system that integrates type and role annotations with keyword matches, thereby exploiting lexical ontologies and entity taggers. Supported by Yahoo!, HP Labs, Google, Microsoft, SAP and NetApp.
Graph conductance search
Rich connections between random walks, graph eigensystems, and electrical networks make them attractive for ranking nodes. PageRank is a prominent example of the paradigm. In PageRank, the edge weights are fixed and we have to compute steady-state probabilities of nodes. What if we have something like the opposite problem? And how can we make this fast at query time? Supported by IBM and Microsoft (2007, 2008).
Integrating IR with databases
In the BANKS project, we proposed new paradigms of keyword search in graphs that can represent text embedded in relational or XML-like data.
The effect of search engines on the Web graph and page popularity
Search engines are influenced by the (in)degree of Web pages, but their ranked lists modulate page popularity and eventually (in)degree, setting up a feedback loop. Might the evolution of the Web graph be influenced substantially by the existence of search engines? Is there a need to regulate monopolies? What are healthy economic objectives, and how can they be optimized?
Focused crawlers to build topic-specific portals
A focused crawler collects a topic-specific subgraph of the Web by coupling classifiers and reinforcement learners with crawlers. An open-source focused crawler project was started at the Lab. for Intelligent Internet Research and is available.
Mining hypertext to estimate topics and popularity
I built a hypertext classifier that uses the text in and links around a given Web page to label it with a topic. This was an early application of Markov networks to Web analysis. As a member of the IBM Clever Project, I worked on algorithms to analyze the links around a web page and the text in pages that cite the given page to assign it a measure of popularity.
Compiling and running parallel scientific programs
In a previous life, my PhD thesis was on the design and implementation of compilers and runtime systems for distributed memory multiprocessors. Seems like distributed parallel computing is hot again, thanks to "Big Data"!
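
As a small illustration of the PageRank setting mentioned under graph conductance search above (fixed edge weights, compute steady-state node probabilities), here is a minimal power-iteration sketch. It is not code from any of the projects listed here. The function name pagerank, the damping value 0.85, the uniform teleport, and the toy graph are all assumptions made for the example.

```python
def pagerank(edges, num_nodes, damping=0.85, tol=1e-10, max_iter=1000):
    """Steady-state probabilities of a random walk with uniform teleport.

    edges: list of (source, target, weight) with positive weights (assumed).
    Nodes are integers in range(num_nodes). All names here are illustrative.
    """
    # Total outgoing weight per node, used to turn weights into probabilities.
    out_weight = [0.0] * num_nodes
    for u, _, w in edges:
        out_weight[u] += w

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        # Probability mass sitting on nodes without out-edges is spread uniformly.
        dangling = sum(rank[u] for u in range(num_nodes) if out_weight[u] == 0.0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # Each node passes its rank along out-edges in proportion to edge weight.
        for u, v, w in edges:
            new_rank[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph (hypothetical): 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1, unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 1, 1.0)]
scores = pagerank(edges, 3)
```

In this toy graph node 1 receives links from both other nodes, so it ends up with the largest score. The "opposite problem" would start from desired node orderings and ask about the edge weights, which this sketch does not attempt.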

Professional activity

Journal editorship
Conference organization
Conference committee/reviewing
ISWC 2014, SIGIR 2014, ACL 2014, WSDM 2014 (senior PC), SIGKDD 2013 (senior PC), WSDM 2013 (senior PC and awards committee), EMNLP 2012, SIGKDD 2012 (senior PC), WWW 2012, NIPS 2011, ICML 2011 (PC and invited applications talks committee), WWW 2011, SIGKDD 2010, NIPS 2009, WWW 2009, WSDM 2009 (senior PC), SIGKDD 2008 (senior PC), SIGIR 2008 (senior PC), WWW 2008, WWW 2007, SIGMOD 2007, SIGKDD 2006 (senior PC), EMNLP/HLT 2005, SIGKDD 2005, WWW 2005 (panel), SIGMOD 2005, SIGKDD 2004, SIGIR 2004, VLDB 2004, WWW 2004, ICDE 2004, SIGIR 2003, SIGKDD 2003, VLDB 2003 (IIS), SODA 2003, SIGIR 2002, ICDE 2002, SIGIR 2001, WWW 2001, WWW 2000, SIGKDD 1999, AAAI SIGKDD 1998.
Other
  • Web Search and Data Mining (WSDM) steering committee member, 2008--2013.
  • ACM SIGKDD Curriculum Committee Member.

Courses

... your work is to keep cranking the flywheel that turns the gears
that spin the belt in the engine of belief that keeps you and your desk in midair
---Annie Dillard, in The Writing Life.

Representative publications (DBLP, Google Scholar)

Upcoming and recent talks and travel

Patents

Links in areas of interest

Content with URLs that have the current URL as a prefix has been hosted in accordance with fair use principles, for academic and non-profit purposes. By downloading the contents of this page, you agree to bring possible violation of fair use to my notice before taking legal recourse.