Title: Graph-based Large-scale Knowledge Harvesting from Text
Dr. Partha Pratim Talukdar, Carnegie Mellon University
Date & Time: February 21, 2013 15:00
Venue: Conference Room, 1st Floor, C Block, Department of Computer Science and Engineering, Kanwal Rekhi Building
The spectacular growth of the Internet and the World Wide Web has resulted in an explosion of data, a significant portion of which is stored as unstructured text in webpages, blogs, emails, tweets and other media. The need for automatic tools to intelligently process such large and diverse sources in order to organize, categorize and make them suitable for human consumption is now greater than ever before. Existing techniques fall short of achieving this goal as they often require significant amounts of human supervision, which is expensive and time-consuming to obtain. To overcome this challenge, in this talk I shall provide evidence that weak supervision in conjunction with coupled inference, especially over graphs, results in effective and scalable methods for harvesting knowledge from large data. I shall demonstrate the effectiveness of this approach on two important knowledge harvesting subtasks: (1) assigning semantic types to entities (e.g., determining that Pittsburgh is a City); and (2) determining the temporal validity of entity-relation-entity facts (e.g., Bill Clinton-PresidentOf-USA was true only during 1993-2001). I shall conclude the talk with a discussion of future work and open challenges.
Partha Pratim Talukdar is a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University, working with Tom Mitchell on the Never Ending Language Learning (NELL) project. Partha received his PhD (2010) in CIS from the University of Pennsylvania, working under the supervision of Fernando Pereira, Zack Ives, and Mark Liberman. Partha is broadly interested in Machine Learning, Natural Language Processing, Data Integration, and Cognitive Science. His dissertation introduced novel graph-based weakly-supervised methods for Information Extraction and Integration. His past industrial research affiliations include HP Labs, Google Research, and Microsoft Research. Partha was a co-organizer of the NAACL-HLT 2012 joint workshop on Web-scale Knowledge Extraction (AKBC-WEKEX 2012), and an area chair for the EMNLP-CoNLL 2012 conference. He is currently co-authoring a book on graph-based semi-supervised learning. More details are available at http://talukdar.net.
