Talks & Seminars
Efficient inference algorithms for information extraction using graphical models
Sunita Sarawagi,
Date & Time: August 22, 2007 10:45
Venue: F. C. Kohli Auditorium
The success of any automated entity extraction method depends on how effectively it unifies diverse clues in the unstructured source. Graphical models provide an elegant framework for combining such diverse clues in a principled framework. They transform the entity extraction problem to an inference problem of finding the highest scoring labeling out of the exponentially many possible labelings of the nodes of a graph. While much research exists on exact and approximate algorithms for general graphical models, the graphs that arise here still present novel optimization opportunities. The typical graph for extraction is a chain, for which a simple forward-backward algorithm gives the optimal labeling. As we exploit the clue that repeated words in a document are likely to get the same label, the chain transforms to a general graph on which exact inference is hard and existing approximation schemes quadratic or cubic. We present new approximate sub-quadratic algorithms for such graphs. Next, as we exploit clues arising out of close matches to existing entities in a database, the chain labeling problem becomes a segmentation problem. We present efficient algorithms for combining the search for close matches in a database with the search for the highest scoring segmentation, leading to significant performance improvements for database-driven information extraction tasks. Joint work with Amit Chandel, A A Diwan, Rahul Gupta, and P C Nagesh.
Speaker Profile:
List of Talks


Faculty CSE IT
Forgot Password
    [+] Sitemap     Feedback