Talks & Seminars
Title: Clustering Queries with Similar Intent at Web Scale
Dr. Manoj Agarwal, Microsoft India
Date & Time: March 27, 2019 15:00
Venue: Department of Computer Science and Engineering, Room No. 109, 1st Floor, New CSE/CC Building
Abstract:
Identifying dense clusters in graphs has many applications, such as data summarization, data analysis and visualization, and recommendation. However, classical graph clustering methods do not scale as graphs grow. In this talk, we present a novel method to identify dense clusters in massive graphs in a scalable, incremental and distributed manner. We consider the weighted Query-URL (QU) bipartite graph, generated from real user data of a commercial search engine, with the objective of clustering queries with similar intent. Clustering such queries is important for search engines, both to disambiguate user intent and to help users reformulate their queries. These graphs are massive: hundreds of millions of unique user queries are generated in a single day.

We adopt a hierarchical clustering approach to handle graphs of this size. Clusters are identified in a distributed manner by first partitioning the input graph along the timeline and then joining the partitions so that queries with similar intent end up in the same cluster. The clusters are updated incrementally as the QU graph receives more data.

We conducted experiments on real data showing that our method is highly scalable: it handles input QU graphs with up to two billion nodes and produces high-quality clusters. More than 98% of queries occur in only a single cluster, indicating high cluster quality even in a highly distributed setting. To the best of our knowledge, ours is the first method for clustering nodes of a massive dynamic graph in a distributed, incremental and hierarchical manner. The technique is generic: it can handle bipartite graphs generated by different applications, and different node similarity measures can be plugged in seamlessly to identify clusters with different objectives.
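The abstract does not specify the similarity measure or the join procedure, so the following is only a minimal Python sketch of the general idea under our own assumptions. It is not the speaker's method. It assumes each time partition is a list of hypothetical (query, url, clicks) edges, scores query pairs by cosine similarity of their URL click vectors, merges pairs above an assumed threshold of 0.5, and keeps a single union-find structure across partitions. Because of that shared structure, clusters from different partitions that share a query are joined, and a new partition can be folded in incrementally. The names `UnionFind`, `cluster_partition` and `clusters` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


class UnionFind:
    """Disjoint-set forest; kept across partitions so clusters can be updated incrementally."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cosine(u, v):
    """Cosine similarity of two sparse click vectors (dict: url -> weight)."""
    dot = sum(w * v[url] for url, w in u.items() if url in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_partition(edges, uf, threshold=0.5):
    """Cluster one time partition (assumed edge format) and merge it into the shared union-find."""
    vectors = defaultdict(dict)  # query -> {url: total clicks in this partition}
    by_url = defaultdict(set)    # url -> queries that clicked it
    for query, url, clicks in edges:
        vectors[query][url] = vectors[query].get(url, 0.0) + clicks
        by_url[url].add(query)
    for query in vectors:
        uf.find(query)  # register every query, including singletons
    # Only compare queries that share at least one URL (a common blocking step).
    candidates = set()
    for queries in by_url.values():
        candidates.update(combinations(sorted(queries), 2))
    for a, b in candidates:
        if cosine(vectors[a], vectors[b]) >= threshold:
            uf.union(a, b)


def clusters(uf):
    """Return the current clusters as sorted lists of queries."""
    groups = defaultdict(set)
    for query in list(uf.parent):
        groups[uf.find(query)].add(query)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    uf = UnionFind()
    day1 = [("cheap flights", "https://example.com/a", 5),
            ("cheap airfare", "https://example.com/a", 3),
            ("python list sort", "https://example.com/docs", 4)]
    day2 = [("cheap airfare", "https://example.com/b", 2),
            ("low cost flights", "https://example.com/b", 2)]
    cluster_partition(day1, uf)  # first partition
    cluster_partition(day2, uf)  # incremental update joins via "cheap airfare"
    print(clusters(uf))
    # [['cheap airfare', 'cheap flights', 'low cost flights'], ['python list sort']]
```

The pairwise comparison inside each URL bucket is quadratic in bucket size. At the scale described in the abstract (up to two billion nodes), a real system would need a distributed and far more selective candidate-generation step.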
Speaker Profile:
Manoj Agarwal is a Principal Applied Scientist in the AI & Research team at Microsoft India, Hyderabad. He received his B.Tech. from IIT Roorkee, his master's degree from the University of Texas at Austin, and his PhD from IIT Bombay. His PhD thesis, titled “Data as Graph: Search, Discovery, Retrieval”, received the ACM India Doctoral Dissertation Award (Honorable Mention) for 2017. Before joining Microsoft in 2015, he worked at IBM Research for close to 14 years. His research interests are web mining, graph mining, pattern recognition, data mining and information retrieval. He has filed more than 25 patents and has published close to 25 research papers in reputed journals and conferences. He has twice won the Best Paper Award at prestigious international conferences, and has received various recognitions from IBM Research and Microsoft AI & Research over the years.