CSAW: Curating and Searching the Annotated Web
Our ambition is to annotate mentions of named entities and
quantities on billions of Web pages,
improve query analysis, and thus
enable searching with entities and relationships at
unprecedented quality and scale.
Papers and Talk Slides
- Neural Architecture
for Question Answering Using a Knowledge Graph and Web Corpus.
With Uma Sawant, Saurabh Garg, and Ganesh Ramakrishnan.
Information
Retrieval Journal.
- Open-Domain
Question Answering Using a Knowledge Graph and Web
Corpus. Uma Sawant, Soumen Chakrabarti, Ganesh Ramakrishnan.
SIGWEB Newsletter, 2018.
- Task-Specific Representation Learning for Web-scale Entity
Disambiguation. Rijula Kar, Susmija Reddy, Sourangshu Bhattacharya,
Anirban Dasgupta, Soumen Chakrabarti. AAAI 2018.
[Code]
- Knowledge
Graph and Corpus Driven Segmentation and
Answer Inference for Telegraphic Entity-seeking Queries.
Mandar Joshi, Uma Sawant and Soumen Chakrabarti.
EMNLP 2014.
- Quantity Queries on Web Tables: Annotation, Response and Consensus Models. Sunita Sarawagi and Soumen Chakrabarti. SIGKDD 2014.
- Joint Bootstrapping
of Corpus Annotations and Entity Types.
Siddhanth Jain, Hrushikesh Mohapatra and Soumen Chakrabarti. EMNLP 2013.
- Web-scale Entity Annotation Using MapReduce.
Shashank Gupta, Varun Chandramouli and Soumen Chakrabarti.
HiPC 2013.
- Learning Joint Query Interpretation
and Response Ranking. Uma Sawant and Soumen Chakrabarti.
WWW 2013.
- Compressed Data Structures for Annotated
Web Search. Soumen Chakrabarti, Sasidhar Kasturi, Bharath Balakrishnan,
Ganesh Ramakrishnan, and Rohit Saraf. WWW 2012.
- Annotating
and Searching Web Tables Using Entities, Types and
Relationships. By Girija Limaye, Sunita Sarawagi and Soumen
Chakrabarti. In VLDB
2010.
- Collective Annotation of Wikipedia Entities in Web Text,
by Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti,
in SIGKDD 2009. Talk slides,
supplementary material.
- Learning to Rank for Quantity Consensus Queries,
by Somnath Banerjee, Soumen Chakrabarti and Ganesh Ramakrishnan,
in SIGIR 2009.
Talk slides.
Demo, poster, press, etc.
- Web-scale Entity-Relation Search Architecture. By Devshree Sane,
Ganesh Ramakrishnan, Soumen Chakrabarti. Poster
in WWW 2011.
- Curating and Searching the Annotated Web,
by Amit Singh, Sayali Kulkarni, Somnath Banerjee,
Ganesh Ramakrishnan, and Soumen Chakrabarti, in SIGKDD 2009.
- Search market to
get another engine. Business Standard, Thursday, Aug 27, 2009. (Disclaimer: We are definitely not in the engine market.)
Data
Code
We have some ancient Java code on SVN that we can share on request.
More recent code is here:
Related projects, services, products, links
Project members
(In approximate order of recency) Soumen Chakrabarti,
Saurabh Garg, Uma Sawant, Ganesh Ramakrishnan,
Mandar Joshi,
Shashank Gupta, Siddhanth Jain, Hrushikesh Mohapatra, Sasidhar
Kasturi, Devshree Sane, Apoorv Sharma,
Amit Singh, Sayali Kulkarni, Somnath Banerjee.
Support
Partly supported by grants from IBM, NVIDIA, Google, HP Labs, Yahoo,
Microsoft Research, NetApp and SAP.