CSAW: Curating and Searching the Annotated Web
Our ambition is to annotate mentions of named entities on billions
of Web pages with IDs, thus linking them to entity nodes in Wikipedia.
This will enable searching with entities and relationships at an
unprecedented scale. The project has two parts: annotating token
segments on Web pages with Wikipedia entity IDs, and building a new
aggregated search mechanism for quantities.
Papers and Talk Slides
- Joint Bootstrapping of Corpus Annotations and Entity Types.
Siddhanth Jain, Hrushikesh Mohapatra and Soumen Chakrabarti. EMNLP 2013.
- Web-scale Entity Annotation Using MapReduce.
Shashank Gupta, Varun Chandramouli and Soumen Chakrabarti.
- Learning Joint Query Interpretation and Response Ranking.
Uma Sawant and Soumen Chakrabarti.
- Compressed Data Structures for Annotated
Web Search. Soumen Chakrabarti, Sasidhar Kasturi, Bharath Balakrishnan,
Ganesh Ramakrishnan, and Rohit Saraf. WWW 2012.
- Annotating and Searching Web Tables Using Entities, Types and
Relationships. By Girija Limaye, Sunita Sarawagi and Soumen
Chakrabarti. In VLDB
- Collective Annotation of Wikipedia Entities in Web Text,
by Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti,
in SIGKDD 2009. Talk slides.
- Learning to Rank for Quantity Consensus Queries,
by Somnath Banerjee, Soumen Chakrabarti and Ganesh Ramakrishnan,
in SIGIR 2009.
Demo, poster, press, etc.
- Web-scale Entity-Relation Search Architecture. By Devshree Sane,
Ganesh Ramakrishnan, Soumen Chakrabarti. Poster
in WWW 2011.
- Curating and Searching the Annotated Web,
by Amit Singh, Sayali Kulkarni, Somnath Banerjee,
Ganesh Ramakrishnan, and Soumen Chakrabarti, in SIGKDD 2009.
- Search market to get another engine. Business Standard, Thursday,
Aug 27, 2009. (Disclaimer: We are definitely not in the engine market.)
- To get an SVN account, send me email.
Your login ID will be your full email address.
- You will be mailed back a temporary password that you should
- Once you get an account, use the following SVN base URLs to access
We might integrate these more closely at some point and you
may need to relocate your working copy.
- In case hostname soumen.cse.iitb.ac.in does not work, you can try
soumen.in although the certificate hostname will not match and you
will get a warning. You will get a warning anyway because the
certificate is self-signed.
Related projects, services, products, links
(In approximate order of recency) Soumen Chakrabarti, Uma Sawant,
Shashank Gupta, Siddhanth Jain, Hrushikesh Mohapatra, Sasidhar
Kasturi, Devshree Sane, Ganesh Ramakrishnan, Apoorv Sharma, Amit
Singh, Sayali Kulkarni, Somnath Banerjee.
Partly supported by grants from Google, HP Labs, Yahoo,
Microsoft Research, NetApp and SAP.