Contact information
I am SOUMEN CHAKRABARTI (an anagram of ANARCHISM
OUTBREAK), a faculty member in the Department of Computer Science.
If you are from industry looking for consultation,
please visit
our research and
development site, my informal
notes, and a sample
mutual NDA.
If you are looking to join
CSE@IITB as a
PhD scholar, please
read about the
standard operating
procedure and the
PhD
Qualifier model being adopted by the department,
and contact the department office directly.
PhD admissions are centrally coordinated at the department
level.
I do not offer short-term projects
or summer internships to students not enrolled at IIT
Bombay. Such emails will be discarded.
If you are an IIT student looking for a
project or seminar within the scope of
your program (BTech, DD, MTech), please read
these guidelines first.
You can check my calendar for
free slots and, if you have permission,
propose a meeting here
or by email.
The best way to contact me is to send mail to
(please note that I am on a
low-spam diet). Please use
only email to initiate a conversation with me if we
haven't communicated before. Only in an emergency should you
call me at +91-22-2576-7716 or fax me at +91-22-2572-0022. If you are
visiting, here are directions to my
office.
Education and career
- Don Bosco School, Park Circus, Calcutta, 1975–1987 (memoirs).
- Indian Institute of Technology, Kharagpur, 1987–1991.
- University of California, Berkeley, 1991–1996.
- IBM Almaden Research Center, 1996–1999.
- IIT Bombay, 1999–present.
- Carnegie Mellon University, Spring 2004.
- Google, Mountain View, 2014–2016.
Current research interests
- Representation learning for graph search
- We are exploring how to go beyond shallow graph neural networks
to represent nodes, edges and graphs, both for better link prediction
and for searching a corpus of graphs using a query graph, with
trainable notions of subgraph isomorphism.
- Better embedding representations for entities, types,
relations, and time
- We are studying how to embed entities, types, relations and
time to infer new edges in regular and temporal knowledge graphs,
and their application to (temporal) question answering.
- Complex multi-modal question answering
- With IBM Research, I am exploring how to translate complex
queries involving knowledge base access, arithmetic and logical
operations into structured programs with memory.
- Code-switched text analysis
- Indian languages borrow heavily from English, resulting in
"code switching" languages like Hinglish, Benglish, etc., the
lingua franca of social media. We are investigating how to
improve standard NLP tasks by generating synthetic code-switched
text, and by designing multi-task low-supervision recurrent
networks.
[The World Wide Web is] the only thing I know of whose shortened form
— WWW — takes three times longer to say than what it's short for.
—Douglas Adams
Past projects
- Searching the annotated Web with entities, types and relations
- We built CSAW, a search system
that integrates type and role annotations with keyword matches,
thereby exploiting lexical ontologies and entity taggers
within an information retrieval system.
- Graph conductance search
- Rich connections between random walks, graph eigensystems, and
electrical networks make it attractive to apply them to ranking
nodes. PageRank is a prominent example of the paradigm. In PageRank,
the edge weights are fixed and we have to compute steady-state
probabilities of nodes. What if we have
something like the opposite problem, where edge weights must be
learned from preferences over node rankings?
And how can we make this fast at query time?
Supported by
IBM
and Microsoft (2007, 2008).
- Integrating IR with databases
- In the BANKS project,
we proposed new paradigms of keyword search in graphs that can
represent text embedded in relational or XML-like data.
- The effect of search engines on the Web graph and page popularity
- Search engines are influenced by the (in)degree of Web pages, but
their ranked lists modulate page popularity and eventually their
(in)degree, setting up a feedback loop. Might the evolution
of the Web graph be influenced substantially by the existence of
search engines? Is there a need to regulate monopolies? What are
healthy economic objectives, and how can we optimize them?
- Focused crawlers to build topic-specific portals
- A focused crawler collects a topic-specific
subgraph of the Web by coupling classifiers and reinforcement learners
with crawlers. An open-source focused crawler project was started at
the Lab. for Intelligent
Internet Research and is available.
- Mining hypertext to estimate topics and popularity
- I built a hypertext
classifier that uses the text in and links around a given Web
page to label it with a topic. This was an early application of
Markov networks to Web analysis. As a member of the
IBM Clever
Project, I worked on
algorithms
to analyze the links around a web page and the text in pages that
cite the given page to assign it a measure of popularity.
- Compiling and running parallel scientific programs
- In a previous life, my PhD thesis
was on the design and implementation of compilers and
runtime
systems for distributed memory multiprocessors.
Seems like distributed parallel computing is hot again,
thanks to "Big Data"!
- Downloads
- Recent papers are listed below with accompanying git repo links.
Some older software can be
found here.
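As a point of reference for the graph conductance search project above, the forward problem it contrasts with (PageRank: edge weights fixed, solve for steady-state node probabilities) can be sketched with power iteration. This is a minimal illustration, not code from CSAW, BANKS, or the conductance-search work; the function name `pagerank`, the dictionary-of-successors graph format, the damping factor 0.85, and the uniform handling of teleportation and dangling nodes are all assumptions chosen for simplicity.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on {node: [successor, ...]} (illustrative sketch).

    Each node splits its rank equally over its out-links (fixed edge weights).
    With probability 1 - damping the random surfer teleports to a uniformly
    random node; rank held by dangling nodes (no out-links) is also spread
    uniformly, so the total rank stays 1.
    """
    nodes = sorted(out_links)
    # Assumption: every successor must also appear as a key of out_links.
    for u in nodes:
        for v in out_links[u]:
            if v not in out_links:
                raise ValueError(f"successor {v!r} of {u!r} is not a key")
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Teleportation mass plus redistributed dangling-node mass.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Push each node's damped rank equally along its out-links.
        for u in nodes:
            succ = out_links[u]
            if succ:
                share = damping * rank[u] / len(succ)
                for v in succ:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # L1 change small enough: treat as converged
            break
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 4))
```

On the toy graph, `c` ends up with the highest score because it collects rank from both other nodes. The learning problem described under graph conductance search would instead treat the edge weights as parameters to fit from ranking preferences, which this sketch does not attempt.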
Professional activity
- Journal editorship
- Conference/workshop organization
- Linguistics Meets Image and Video Retrieval. Workshop
at ICCV 2019.
Co-organizer.
- IJCAI 2019, area chair.
- WWW 2017, poster track co-chair with Mounia Lalmas
and Wei Chen.
- CIKM 2014, area chair for text and Web data mining.
- EMNLP 2013,
area chair for information retrieval and question answering.
- WWW 2013, track chair for
search, systems and applications.
- SIGIR 2011,
area chair for Web IR and social media search.
- WWW 2010, program co-chair with
Juliana Freire.
- SIGIR
2010, senior PC member.
- Web Search APIs:
The Next Generation — A panel discussion at
WWW 2009.
Panel slides.
- SIGIR 2009,
Area Chair, Machine Learning for IR.
- WSDM 2008 ("wisdom"),
Program Co-chair with
Andrei Broder.
- VLDB 2007,
Tutorial Co-Chair.
- ECML-PKDD 2006,
Area Chair, Track for mining links, graphs, trees and
high-dimensional data.
- WWW 2006,
Deputy Chair, Data Mining track.
- COMAD 2005b,
Associate Program Chair.
- WWW 2003,
Vice Chair, Searching and Mining track.
- ICDE 2003,
Vice Chair, Data, Text and Web Mining track.
- WWW 2002, Deputy Chair,
Searching, Querying and Indexing
track (CFP).
- Conference/journal committee/reviewing
- NeurIPS 2024 (area chair),
NeurIPS 2023 (area chair),
EMNLP 2022,
ARR 2020–,
TACL 2020–2022,
WSDM 2021 (senior PC),
NeurIPS 2020,
EMNLP 2020,
ACL 2020,
IJCAI 2020 (senior PC),
AAAI 2020,
EMNLP 2019,
IJCAI 2019,
ICML 2019,
NeurIPS 2018,
ICML 2018,
NAACL 2018,
WSDM 2018 (test of time awards),
SIGIR 2017 (awards),
SIGKDD 2017 (awards),
WSDM 2017 (awards),
NIPS 2017,
ACL 2017;
NIPS 2016,
SIGIR 2016;
CIKM 2014,
ISWC 2014,
SIGIR 2014,
ACL 2014,
WSDM 2014 (senior PC);
SIGKDD 2013 (senior PC),
WSDM 2013 (senior PC and
awards committee);
EMNLP 2012,
SIGKDD 2012 (senior PC),
WWW 2012;
NIPS 2011,
ICML 2011
(PC and invited applications talks committee),
WWW 2011;
SIGKDD 2010;
NIPS 2009,
WWW 2009,
WSDM 2009 (senior PC);
SIGKDD 2008 (senior PC),
SIGIR 2008 (senior PC),
WWW 2008;
WWW 2007,
SIGMOD 2007;
SIGKDD 2006 (senior PC);
EMNLP/HLT 2005,
SIGKDD 2005,
WWW 2005
(panel),
SIGMOD 2005;
SIGKDD 2004,
SIGIR 2004,
VLDB 2004,
WWW 2004,
ICDE 2004;
SIGIR 2003,
SIGKDD 2003,
VLDB 2003 (IIS),
SODA 2003;
SIGIR 2002,
ICDE 2002;
SIGIR 2001,
WWW 2001;
WWW 2000;
SIGKDD 1999;
SIGKDD 1998.
- Other
- Web Search and Data Mining (WSDM) steering committee member, 2008–2013.
- ACM SIGKDD
Curriculum Committee Member.
Courses
But the power of instruction is seldom of much efficacy,
except in those happy dispositions where it is
almost superfluous.
—Edward Gibbon,
The Decline And Fall Of The Roman Empire
Volume 1, Chapter 4.
- To reduce administrative overhead, we will continue to use the existing course codes CS635 (Autumn) and CS728 (Spring), but since ChatGPT came out in late 2022, we have again revamped the courses with new content and removed some outdated material. CS635 is a soft prerequisite for CS728; it is not enforced. Offerings:
2023.1S.CS728,
2023.2A.CS635,
2024.1S.CS728.
- Web Search and Mining has been expanded to a two-semester
sequence, abbreviated WMa (CS635, Autumn) and WMb (CS728, Spring).
WMa retains the old course code, but has been planned from scratch.
WMb will be largely about information extraction and integration,
and querying over semistructured and graphical data representations.
WMa Autumn 2009,
WMb Spring 2010,
WMa Autumn 2010,
WMb Spring 2011,
WMa Autumn 2011,
WMa Spring 2013,
WMa Autumn 2013,
WMb Spring 2014,
WMa Autumn 2016,
WMb Spring 2017,
WMa Autumn 2017,
WMb Spring 2018,
WMb Spring 2019,
WMa Autumn 2019,
WMb Spring 2020 (partly online),
WMa Autumn 2020 (online),
WMb Spring 2021 (online),
WMa Autumn 2021 (online),
WMb Spring 2022 prereq reading (online/hybrid),
WMa Autumn 2022 (in-person),
WMb Spring 2023.
- Statistical Foundations of Machine Learning:
Autumn 2005,
Autumn 2006,
Autumn 2007,
Autumn 2008.
- Web Search and Mining (earlier called
Information Retrieval and Mining for Hypertext and the Web):
Spring 2001,
Spring 2002,
Spring 2003,
Spring 2005,
Spring 2006
(new improved),
Spring 2007,
Spring 2008,
Spring 2009.
- Undergraduate Programming Languages,
Spring 2000,
Autumn 2000,
Autumn 2001,
Autumn 2002,
Autumn 2003,
Autumn 2004.
- Computer programming and utilization aka
CS101,
Spring 2012.
- Undergrad
software lab: Autumn 2018.
- Graduate
software lab: Autumn 1999,
Autumn 2000.
... your work is to keep cranking the flywheel that turns the gears
that spin the belt in the engine of belief that keeps you and your desk
in midair
—Annie Dillard,
The Writing Life.
Publications
- How to think step-by-step: A
mechanistic understanding of chain-of-thought reasoning.
With Subhabrata Dutta, Joykirat Singh, Tanmoy Chakraborty.
TMLR 2024.
- Frugal LMs Trained to Invoke
Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning.
With Subhabrata Dutta, Joykirat Singh, Ishan Pandey, Sunny Manchanda and
Tanmoy Chakraborty. AAAI 2024.
code
- CRUSH4SQL:
Collective Retrieval Using Schema Hallucination For Text2SQL.
With Mayank Kothyari, Dhruva Dhingra, and Sunita Sarawagi.
EMNLP 2023.
code
- Small Language Models
Fine-Tuned for Decomposition and Solution
Improve Complex Reasoning.
With Gurusha Juneja, Subhabrata Dutta, Sunny Manchanda
and Tanmoy Chakraborty. EMNLP 2023.
- Locality Sensitive
Hashing in Fourier Frequency Domain For Soft Set Containment Search.
With Indradyumna Roy, Rishi Agarwal, Anirban Dasgupta,
and Abir De. NeurIPS 2023.
- mOKB6: A Multilingual Open
Knowledge Base Completion Benchmark.
With Shubham Mittal, Keshav Kolluru and Mausam. ACL 2023.
- Entropy-guided Vocabulary
Augmentation of Multilingual Language Models for
Low-resource Tasks. With Arijit Nag, Bidisha Samanta,
Animesh Mukherjee, and Niloy Ganguly.
ACL Findings, 2023.
- Multi-Row, Multi-Span Distant
Supervision For Table+Text Question Answering.
With Vishwajeet Kumar, Saneem Chemmengath, Yash Gupta, Jaydeep Sen, Samarth Bharadwaj
and Feifei Pan. ACL 2023.
- TwiRGCN: Temporally Weighted Graph Convolution for Question Answering over Temporal Knowledge Graphs. With Aditya Sharma, Apoorv Saxena, Chitrank Gupta, Seyed Mehran Kazemi, and Partha Talukdar. EACL 2023.
- Structured Case-based Reasoning for Inference-time Adaptation of Text-to-SQL parsers. With Abhijeet Awasthi and Sunita Sarawagi. AAAI 2023.
- Joint Completion and Alignment of
Multilingual Knowledge Graphs.
With Harkanwar Singh, Shubham Lohiya, Prachi Jain and Mausam.
EMNLP 2022. A preliminary version appeared in
AKBC 2021.
arXiv version.
code
- Maximum Common
Subgraph Guided Graph Retrieval: Late and Early Interaction Networks.
With Indradyumna Roy and Abir De.
NeurIPS 2022.
- Neural
Estimation of Submodular Functions with Applications to
Differentiable Subset Selection.
With Abir De. NeurIPS 2022.
- Transfer Learning
for Low Resource Multilingual Relation Classification.
With Arijit Nag, Bidisha Samanta, Animesh Mukherjee and Niloy Ganguly.
TALLIP 2022. A preliminary version appeared in CoNLL 2021.
data
- VarScene:
A Deep Generative Model for Realistic Scene Graph Synthesis.
With Tathagat Verma, Abir De, Yateesh Agrawal, and Vishwa Vinay.
ICML 2022.
- Incomplete Gamma
Integrals for Deep Cascade Prediction using Content, Network,
and Exogenous Signals.
With Subhabrata Dutta, Shravika Mittal, Dipankar Das, and
Tanmoy Chakraborty. IEEE TKDE 2022.
- AIT-QA:
Question Answering Dataset over Complex
Tables in the Airline Industry.
With Yannis Katsis, Saneem Ahmed Chemmengath,
Vishwajeet Kumar, Samarth Bharadwaj, Mustafa Canim,
Michael Glass, Alfio Gliozzo, Feifei Pan, Jaydeep Sen, and
Karthik Sankaranarayanan. NAACL 2022.
data
- Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction.
With Keshav Kolluru, Muqeeth M, Shubham Mittal, and Mausam.
ACL 2022.
- Interpretable
Neural Subgraph Matching for Graph Retrieval.
With Indradyumna Roy, Venkata Sai Velugoti and Abir De. AAAI 2022.
- Semi-supervised stance
detection of tweets via distant network supervision.
With Subhabrata Dutta, Samiya Caur, and Tanmoy Chakraborty.
WSDM 2022.
- Active Assessment of
Prediction Services as Accuracy Surface Over Attribute Combinations.
With Vihari Piratla and Sunita Sarawagi. NeurIPS 2021.
code
- Redesigning the
Transformer Architecture with Insights
from Multi-particle Dynamical Systems.
With Subhabrata Dutta, Tanya Gautam, and Tanmoy Chakraborty.
NeurIPS 2021.
- T3QA: Topic
Transferable Table Question Answering.
With Saneem Chemmengath, Vishwajeet Kumar, Samarth Bharadwaj,
Jaydeep Sen, Mustafa Canim, Alfio Gliozzo and Karthik Sankaranarayanan.
EMNLP 2021.
- A Data Bootstrapping Recipe
for Low-Resource Multilingual Relation Classification.
With Arijit Nag, Bidisha Samanta, Animesh Mukherjee and Niloy Ganguly.
CoNLL 2021.
- Multilingual
Knowledge Graph Completion With Joint Relation and
Entity Alignment. With Harkanwar Singh, Prachi Jain,
Sharod Roy Choudhury, and Mausam.
AKBC 2021.
- Integrating Transductive and
Inductive Embeddings Improves Link Prediction Accuracy. With
Chitrank Gupta, Yash Jain, and Abir De. CIKM 2021.
- Question Answering
over Temporal Knowledge Graphs.
With Apoorv Saxena and Partha Talukdar. ACL 2021.
code
trackback
- Select, Substitute, Search:
A New Benchmark for Knowledge-Augmented Visual Question Answering.
With Aman Jain, Mayank Kothyari, Vishwajeet Kumar, Preethi Jyothi,
and Ganesh Ramakrishnan.
SIGIR 2021.
code
- Joint Autoregressive and
Graph Models for Software and Developer Social Networks. With
Rima Hazra, Hardik Aggarwal, Pawan Goyal, and
Animesh Mukherjee. ECIR 2021.
(Data.)
- Adversarial Permutation
Guided Node Representations for Link Prediction.
With Indradyumna Roy and Abir De. AAAI 2021.
- Differentially Private
Link Prediction With Protected Connections.
With Abir De. AAAI 2021.
- Temporal Knowledge
Base Completion: New Algorithms and Evaluation Protocols.
With Prachi Jain, Sushant Rathi, and Mausam. EMNLP 2020.
code
- OpenIE6: Iterative Grid
Labeling and Coordination Analysis for Open Information Extraction.
With Keshav Kolluru, Vaibhav Adlakha, Samarth Aggarwal and Mausam.
EMNLP 2020. code
- NLP Service APIs and Models
for Efficient Registration of New Clients.
With Sahil Shah, Vihari Piratla, and Sunita Sarawagi.
EMNLP Findings 2020.
- Deep Exogenous and
Endogenous Influence Combination for Social Chatter
Intensity Prediction.
With Subhabrata Dutta, Sarah Masud and Tanmoy Chakraborty.
SIGKDD 2020.
- Deep Neural
Matching Models for Graph Retrieval.
With Utkarsh Gupta, Kunal Goyal, and Abir De. SIGIR 2020.
- Interpretable
complex question answering. WebConf 2020.
- IMOJIE: Iterative
Memory-Based Joint Open Information Extraction.
With Keshav Kolluru, Samarth Aggarwal,
Vipul Rathore, and Mausam. ACL 2020.
code
- Neural Architecture
for Question Answering Using a Knowledge Graph and Web Corpus.
With Uma Sawant, Saurabh Garg, and Ganesh Ramakrishnan.
Information Retrieval Journal, 2019. Presented at ECIR 2020.
- Analysis of
reference and citation copying in evolving bibliographic networks.
With Pradumn Kumar Pandey, Mayank Singh, Pawan Goyal and Animesh Mukherjee.
Journal of Informetrics, 2020.
- On
Computing Entity Relatedness in Wikipedia, with Applications.
With Marco Ponza and Paolo Ferragina. Knowledge-Based Systems, 2020.
code,
data
- Learning
Linear Influence Models in Social Networks from Transient Opinion
Dynamics. With Abir De, Sourangshu Bhattacharya, Parantapa
Bhattacharya, and Niloy Ganguly. ACM TWEB
2019. Preliminary
version in CIKM 2014.
- Neural
Program Induction for KBQA Without Gold Programs or Query
Annotations. With Ghulam Ahmed Ansari, Amrita Saha, Vishwajeet
Kumar, Mohan Bhambhani and Karthik Sankaranarayanan. IJCAI
2019. code
- A Deep Generative
Model for Code-Switched Text. With Bidisha Samanta, Sharmila
Reddy, Hussain Jagirdar and Niloy Ganguly. IJCAI 2019.
code
- Improved
Sentiment Detection via Label Transfer from Monolingual
to Synthetic Code-Switched Text.
With Bidisha Samanta and Niloy Ganguly. ACL 2019.
- Topic
Sensitive Attention on Generic Corpora Corrects Sense Bias
in Pretrained Embeddings.
With Vihari Piratla and Sunita Sarawagi. ACL 2019.
- Complex Program Induction for Querying Knowledge
Bases in the Absence of Gold Programs. With Amrita Saha,
Ahmed Ansari, Abhishek Laddha and Karthik Sankaranarayanan.
TACL 2019.
- Multi-task
Learning for Target-dependent Sentiment Classification.
With Divam Gupta, Kushagra Singh, and Tanmoy
Chakraborty. PAKDD 2019.
- Automated Early
Leaderboard Generation From Comparative Tables. With Mayank
Singh, Rajdeep Sarkar, Atharva Vyas, Pawan Goyal, and Animesh
Mukherjee. ECIR 2019.
- GIRNet: Interleaved
Multi-Task Recurrent State Sequence Models. With Divam
Gupta and Tanmoy Chakraborty. AAAI 2019. code
- Type-Sensitive
Knowledge Base Inference Without Explicit Type Supervision.
With Prachi Jain, Pankaj Kumar, and Mausam. ACL 2018.
code
- Mitigating
the Effect of Out-of-Vocabulary Entity Pairs in Matrix Factorization
for KB Inference. With Prachi Jain, Shikhar Murty, and Mausam.
IJCAI 2018.
code
- New Embedded
Representations and Evaluation Protocols for Inferring
Transitive Relations. With Sandeep
Subramanian. SIGIR
2018.
- Open-domain
question answering using a knowledge graph and Web corpus.
With Uma Sawant and Ganesh Ramakrishnan. ACM SIGWEB Newsletter
(invited), 2018.
- Generalizing
Across Domains via Cross-Gradient Training. With Shiv
Shankar, Vihari Piratla, Siddhartha Chaudhuri, Preethi Jyothi,
and Sunita Sarawagi. ICLR
2018.
- Task-Specific
Representation Learning for Web-scale Entity Disambiguation.
With Rijula Kar, Susmija Reddy, Sourangshu Bhattacharya and
Anirban
Dasgupta. AAAI
2018. code
- A Two-Stage
Framework for Computing Entity Relatedness in Wikipedia. With
Marco Ponza and Paolo Ferragina.
CIKM 2017.
code,
data
- Relay-Linking Models for
Prominence and Obsolescence in Evolving Networks.
With Mayank Singh, Rajdeep Sarkar, Pawan Goyal, and
Animesh Mukherjee. SIGKDD 2017.
video
- Earth Mover Distance Pooling
over Siamese LSTMs for Automatic Short Answer Grading.
With Sachin Kumar and Shourya Roy. IJCAI 2017.
- Collective
Entity Resolution with Multi-Focal Attention.
With Amir Globerson, Nevena Lazic, Amarnag Subramanya, Michael Ringgaard
and Fernando Pereira. ACL 2016.
- Discriminative Link Prediction using Local, Community, and Global Signals.
With Abir De, Sourangshu Bhattacharya, Sourav Sarkar and Niloy Ganguly.
IEEE TKDE Journal, 2016.
- Knowledge
Graph and Corpus Driven Segmentation and
Answer Inference for Telegraphic Entity-seeking Queries.
With Mandar Joshi and Uma Sawant.
EMNLP 2014.
- Quantity
Queries on Web Tables: Annotation, Response and Consensus Models.
With Sunita Sarawagi.
SIGKDD 2014.
code
- Discriminative Link
Prediction using Local Links, Node Features and Community
Structure. With Abir De and Niloy Ganguly. ICDM 2013.
- Joint
Bootstrapping of Corpus Annotations and Entity Types.
With Siddhanth Jain and Hrushikesh Mohapatra.
EMNLP 2013.
- Web-scale Entity Annotation Using MapReduce.
With Shashank Gupta and Varun Chandramouli.
HiPC 2013.
- Learning Joint Query Interpretation
and Response Ranking. With Uma Sawant.
WWW 2013.
- Compressed Data Structures for Annotated
Web Search. With
Sasidhar Kasturi, Bharath Balakrishnan,
Ganesh Ramakrishnan, and Rohit Saraf. WWW 2012.
- Diversity in ranking via
resistive graph centers.
With Avinava Dubey and Chiru Bhattacharyya.
SIGKDD 2011. (Source code
is available; contact Avinava Dubey for usage details.)
- SCAD: Collective Discovery of Attribute Values.
With Anton Bakalov, Ariel Fuxman, and Partha Talukdar.
WWW 2011.
- Index Design and Query Processing for Graph Conductance Search. With Amit Pathak and Manish Gupta.
VLDB Journal, 2010.
- Annotating and Searching Web Tables Using Entities, Types and
Relationships. With Girija Limaye and Sunita Sarawagi.
VLDB 2010.
- Conditional Models for
Non-smooth Ranking Loss Functions.
With Avinava Dubey, Jinesh Machchhar, and Chiru Bhattacharyya.
ICDM 2009, Miami.
- Learning to rank for
quantity consensus queries.
With Somnath Banerjee and Ganesh Ramakrishnan.
SIGIR 2009, Boston.
- Collective annotation of
Wikipedia entities in Web text.
With Sayali Kulkarni, Amit Singh and Ganesh Ramakrishnan.
SIGKDD 2009, Paris.
- Text search enhanced with types and entities. Chapter in
Text Mining: Theory, Application, and Visualization,
Srivastava and Sahami, eds., 2008.
- New
closed form bounds on the partition function.
With Dvijotham Krishnamurthy and Subhasis Chaudhuri.
ECML/PKDD 2008, Antwerp.
- Structured Learning
for Non-Smooth Ranking Losses.
With Rajiv Khanna, Uma Sawant and Chiru Bhattacharyya.
SIGKDD 2008, Las Vegas.
- Learning to rank in vector spaces and social networks.
Internet
Mathematics, 2008.
- Focused Web Crawling. Entry in the
Encyclopedia of
Database Systems, 2008.
- The influence of search engines on preferential attachment.
With Alan Frieze and Juan Vera.
Internet Mathematics, volume 3, number 3 (2006–2007), pages 361–381.
A preliminary version
appeared in SODA 2005.
- Learning Random Walks to Rank
Nodes in Graphs. With Alekh Agarwal.
ICML 2007,
Oregon.
- Dynamic Personalized Pagerank
in Entity-Relation Graphs.
WWW 2007, Banff.
- Accelerating Newton optimization for
log-linear models through feature redundancy. With Arpit Mathur.
IEEE ICDM 2006,
Hong Kong.
- Learning parameters in entity-relationship
graphs from ranking preferences. With Alekh Agarwal.
ECML-PKDD 2006,
Berlin.
- Learning to rank networked entities.
With Alekh Agarwal and Sunny Aggarwal.
SIGKDD Conference 2006,
Philadelphia.
- Optimizing
Scoring Functions and Indexes for Proximity Search in Type-annotated
Corpora. With Kriti Puniyani and Sujatha Das.
WWW 2006, Edinburgh.
- Enhanced
Answer Type Inference from Questions using Sequential Models.
With Vijay Krishnan and Sujatha Das.
EMNLP/HLT 2005,
Vancouver.
- Bidirectional Expansion For Keyword Search on Graph Databases.
With Varun Kacholia, Shashank Pandit, S. Sudarshan,
Rushi Desai and Hrishikesh Karambelkar. VLDB 2005.
- Shuffling a Stacked Deck: The Case for Partially Randomized
Ranking of Search Engine Results.
With Sandeep Pandey, Sourashis Roy, Chris Olston, and Junghoo Cho.
VLDB 2005.
- Is question answering an
acquired skill?
With Ganesh Ramakrishnan, Deepa Paranjpe, and
Pushpak Bhattacharyya.
WWW 2004,
New York City.
- Fast and accurate text classification
via multiple linear discriminant projections.
With Shourya Roy and Mahesh Soundalgekar.
VLDB Journal, 12(2), pages 170–185
[conference version, talk slides].
- Cross-Training:
Learning Probabilistic Mappings Between Topics.
With Sunita Sarawagi and Shantanu Godbole.
SIGKDD Conference 2003,
Washington D.C.
- Monitoring the Dynamic Web
to respond to Continuous Queries.
With Sandeep Pandey and Krithi Ramamritham.
WWW 2003,
Budapest, Hungary, May 2003.
(talk slides.)
- Accelerated focused
crawling through online relevance feedback.
With Kunal Punera and Mallela Subramanyam.
WWW 2002, Hawaii.
(Local copy.)
- The structure of
broad topics on the Web.
With Mukul Joshi, Kunal Punera, and David M. Pennock.
WWW 2002, Hawaii.
(Local copy.)
- Keyword
Searching and Browsing in Databases using BANKS.
With Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, and S. Sudarshan.
In ICDE 2002. Also see the BANKS
home page.
- Enhanced
topic distillation using text, markup tags, and hyperlinks.
With Mukul M. Joshi and Vivek B. Tawde.
In SIGIR 2001
(talk slides).
- Integrating the
Document Object Model with hyperlinks for enhanced
topic distillation and information extraction.
In the 10th International World Wide Web
Conference, Hong Kong, May 2001.
- Memex: A browsing assistant
for collaborative archiving and mining of surf trails.
With Sandeep Srivastava, Mallela Subramanyam and Mitul Tiwari.
Demo at VLDB 2000.
- Data mining for hypertext:
A tutorial survey.
SIGKDD
Explorations, 1(2), pages 1–11, 2000.
- Using
Memex to archive and mine community Web browsing experience.
With Sandeep Srivastava, Mallela Subramanyam and Mitul Tiwari.
In the 9th International World Wide Web
Conference, Amsterdam, May 2000.
Talk slides.
Social
bookmarking companies founded long after this paper:
HistorySE,
Delicious,
Digg,
StumbleUpon,
Reddit,
Furl, Simpy, Citeulike, etc., and finally,
Mozilla Pocket!
- Mining
the Link Structure of the World Wide Web. With Byron E. Dom, S. Ravi Kumar,
Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson,
and Jon Kleinberg. In IEEE Computer, vol. 32, no. 8, August 1999.
- Distributed Hypertext Resource
Discovery Through Examples.
With Martin van den Berg and Byron Dom.
VLDB 1999, Edinburgh, Scotland.
Talk slides.
- Hypersearching
the Web. With
Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan,
Andrew Tomkins, Jon M. Kleinberg, and David Gibson.
Invited paper in Scientific American,
June 1999.
- Surfing
the Web Backwards. With D. A. Gibson and K. S. McCurley.
In WWW 1999.
- Focused crawling: A
new approach to topic-specific Web resource discovery. With M. van
den Berg and B. Dom. WWW 1999,
Toronto, May 1999.
Upcoming and past talks and meetings
- Deep Knowledge Graph Representation Learning
for Completion, Alignment, and Question Answering.
Tutorial at SIGIR 2022.
- Temporal Knowledge Graph Representation and Question Answering.
ACSS 2021,
FIRE 2021.
- The future of search and recommendation: Beyond web search. Panel discussion at Microsoft Research Summit 2021.
- A brief history of question answering. Invited talk at The Future of the Web track, WebConf 2021.
- Graph Neural Networks and Knowledge Graph Completion. Distinguished lecture at the Kohli Center on Intelligent Systems, IIIT Hyderabad, 2021/03/30. Distinguished seminar at ConcertAI, 2021/03/10.
- Knowledge Base Completion: The Role of Types and Time. Amazon Research Days, 2020.
- Learning New Type Representations from Knowledge Graphs.
Keynote talk at KG4IR
2018 (video).
- Tutorial on knowledge extraction and inference from text.
Subset of CIKM 2017 tutorial, at
SIGIR
2018.
- Answering questions: The shallow and the deep.
TIFR STCS seminar, April 2018. Flipkart
Blue Sky seminar, June 2018. Interview.
- Tutorial
with Partha Talukdar
at CIKM 2017 on Knowledge
Extraction and Inference from Text.
- Keynote talk at CoDS
2017, Chennai, March 2017.
- Keynote talk at
CIKM 2014
Industry Track, Nov 2014.
- Keynote talk at
COMSNETS 2014, Bangalore, Jan 2014.
- Tutorial on Query Interpretation and
Representation for Searching the Web of Objects at
WWW 2013, Rio de Janeiro.
- WWW 2010 Conference, NC, April 2010.
- Keynote
talk at WSDM 2010, NYC, February 2010.
[Talk slides.]
- WWW 2010 PC meeting, Salt Lake City, Utah,
January 2010.
- WWW 2009
tutorial
and panel,
April 2009.
- SIGIR 2008 PC meeting, University of Maryland, March 2008.
- WSDM 2008, Stanford University, February 2008.
- Tutorial on Learning to rank in vector
spaces and social networks at WWW
2007, Banff.
- Keynote talk at WAW
and a short
course at Banff, Nov 2006.
- Invited talk at the
International Workshop
on Intelligent Information Access, Helsinki, July 2006.
- Invited talk at the ICML 2005 workshop on Learning in Web Search.
- Invited talk at the ICML 2005 workshop on
Learning and Extending Lexical Ontologies
by using Machine Learning Methods.
- Panel discussion on exploiting dynamic
networking effects in Web advertising at
WWW 2005.
- Invited talk and position paper at
ECML/PKDD
in Pisa, Sept. 2004.
- Short course on
machine learning for hypertext applications at
ADFOCS
in Saarbrücken, Sept. 2004.
- Graph
structures in data mining. A tutorial presented at
SIGKDD
2004 with Christos
Faloutsos.
- Text search for
fine-grained semi-structured data.
A tutorial presented at VLDB 2002.
- Beyond hubs and authorities: spreading out and zooming in.
Invited talk at
ICDT International Workshop
on Web Dynamics, London, Jan. 2001.
- Data Mining and Learning on the Web. NIPS Workshop, Denver,
Dec. 2000. By invitation.
- Nurturing
content-based collaborative communities on the Web.
Invited talk at the Joint
SIGDAT
Conference on Empirical Methods in Natural Language Processing and
Very Large Corpora
(EMNLP/VLC), Hong Kong, Oct. 7–8, 2000.
- Hypertext data mining:
A tutorial presented at the
SIGKDD
Conference, Boston, August 2000.
- Hypertext databases and hypertext data mining.
SIGMOD 1999 Tutorial.
Patents
- Determining NCCs and/or using the NCCs to adapt performance of computer-based action(s).
- US8447766: Method and system for searching unstructured textual data for quantitative answers to queries.
- US6112221: System and method for scheduling web servers with a quality-of-service guarantee for each user.
- US6418433: System and method for focussed web crawling.
- US6389436: Enhanced hypertext categorization using hyperlinks.
- US6336112: Method for interactively creating an information database including preferred information elements, such as, preferred-authority, world wide web pages.
- US6334131: Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures.
- Method and system for filtering of information entities.
- Method and system for distributed autonomous maintenance of bidirectional hyperlink metadata on the web and similar hypermedia repository.
- Feature diffusion across hyperlinks.
- US6189005: System and method for mining surprising temporal patterns.
- System and method for dynamic index-probe optimizations for high-dimensional similarity search.
- US6233575: Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values.
Links in areas of interest