CSAW: Curating and Searching the Annotated Web

Our ambition is to annotate mentions of named entities and quantities on billions of Web pages, improve query analysis, and thus enable searching with entities and relationships at unprecedented quality and scale.

Papers and Talk Slides

Neural Architecture for Question Answering Using a Knowledge Graph and Web Corpus. Uma Sawant, Saurabh Garg, Soumen Chakrabarti, and Ganesh Ramakrishnan. Information Retrieval Journal.
Open-Domain Question Answering Using a Knowledge Graph and Web Corpus. Uma Sawant, Soumen Chakrabarti, Ganesh Ramakrishnan. SIGWEB Newsletter, 2018.
Task-Specific Representation Learning for Web-scale Entity Disambiguation. Rijula Kar, Susmija Reddy, Sourangshu Bhattacharya, Anirban Dasgupta, Soumen Chakrabarti. AAAI 2018. [Code]
Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. Mandar Joshi, Uma Sawant and Soumen Chakrabarti. EMNLP 2014.
Quantity Queries on Web Tables: Annotation, Response and Consensus Models. Sunita Sarawagi and Soumen Chakrabarti. SIGKDD 2014.
Joint Bootstrapping of Corpus Annotations and Entity Types. Siddhanth Jain, Hrushikesh Mohapatra and Soumen Chakrabarti. EMNLP 2013.
Web-scale Entity Annotation Using MapReduce. Shashank Gupta, Varun Chandramouli and Soumen Chakrabarti. HiPC 2013.
Learning Joint Query Interpretation and Response Ranking. Uma Sawant and Soumen Chakrabarti. WWW 2013.
Compressed Data Structures for Annotated Web Search. Soumen Chakrabarti, Sasidhar Kasturi, Bharath Balakrishnan, Ganesh Ramakrishnan, and Rohit Saraf. WWW 2012.
Annotating and Searching Web Tables Using Entities, Types and Relationships. Girija Limaye, Sunita Sarawagi and Soumen Chakrabarti. VLDB 2010.
Collective Annotation of Wikipedia Entities in Web Text. Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. SIGKDD 2009. [Talk slides] [Supplementary material]
Learning to Rank for Quantity Consensus Queries. Somnath Banerjee, Soumen Chakrabarti and Ganesh Ramakrishnan. SIGIR 2009. [Talk slides]

Demo, poster, press, etc.

Web-scale Entity-Relation Search Architecture. Devshree Sane, Ganesh Ramakrishnan, Soumen Chakrabarti. Poster, WWW 2011.
Curating and Searching the Annotated Web. Amit Singh, Sayali Kulkarni, Somnath Banerjee, Ganesh Ramakrishnan, and Soumen Chakrabarti. SIGKDD 2009.
Search market to get another engine. Business Standard, Thursday, Aug 27, 2009. (Disclaimer: We are definitely not in the engine market.)

Data

Syntax-poor translations of TREC-INEX and WebQuestions queries accompanying the Joshi et al. paper (EMNLP 2014).
Additional labeled data used to train some modules in AQQUCN.
TMI data from the EMNLP 2013 paper.
Additional analysis associated with the WWW 2012 paper.
Annotation data from the SIGKDD 2009 paper.
Quantity search data from the SIGIR 2009 paper.

Code

We have some ancient Java code on SVN that we can share on request. More recent code is here:

Project members

(In approximate order of recency) Soumen Chakrabarti, Saurabh Garg, Uma Sawant, Ganesh Ramakrishnan, Mandar Joshi, Shashank Gupta, Siddhanth Jain, Hrushikesh Mohapatra, Sasidhar Kasturi, Devshree Sane, Apoorv Sharma, Amit Singh, Sayali Kulkarni, Somnath Banerjee.

Support

Partly supported by grants from IBM, nVidia, Google, HP Labs, Yahoo, Microsoft Research, NetApp and SAP.