In this report, we consider the problem of multiple lightweight devices monitoring unstructured data on the web. To serve these clients effectively, we propose a three-tier architecture for a consolidated system that monitors changes in unstructured web data on their behalf. Our clients typically operate under constrained resources such as memory, network bandwidth, and processing power. This limits their ability to run applications that require timely and relevant web data, such as news readers, offline/online search engines, directories, or general web-page monitoring tools. The proposed system delivers relevant and timely data to its clients by adapting to their profiles and usage statistics, which it uses to crawl and monitor the web effectively.
Our layered architecture includes modules to manage novelty detection, workload, monitoring and scheduling, client profiles, client usage statistics, data packaging and delivery, and client connections. In this report, we show how relevant ideas from related literature apply to the monitoring and crawling side of our system. We develop a new way of delivering content to lightweight devices from the middle layer of our system under bandwidth constraints, based on the novelty of the content. We investigate various novelty detection algorithms and how they can be used in strategies for choosing the best documents from the server.
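As a rough illustration of novelty-based delivery under a bandwidth constraint, the sketch below greedily picks the most novel documents that still fit a byte budget. It is not the method developed in the report: the token-overlap novelty score, the greedy strategy, and the names `tokens`, `novelty`, and `select_documents` are all assumptions made for this example.

```python
def tokens(text):
    """Lowercased whitespace tokens of a document, as a set."""
    return set(text.lower().split())


def novelty(text, seen_tokens):
    """Fraction of the document's tokens the client has not seen yet.

    This is an assumed, deliberately simple novelty measure.
    """
    words = tokens(text)
    if not words:
        return 0.0
    return len(words - seen_tokens) / len(words)


def select_documents(candidates, delivered, budget_bytes):
    """Greedily choose novel documents that fit within budget_bytes.

    candidates: documents available on the server (strings)
    delivered:  documents the client already holds (strings)
    """
    seen = set()
    for text in delivered:
        seen |= tokens(text)

    remaining = list(candidates)
    chosen = []
    while remaining:
        # Only consider documents that still fit in the remaining budget.
        fitting = [d for d in remaining if len(d.encode("utf-8")) <= budget_bytes]
        if not fitting:
            break
        best = max(fitting, key=lambda d: novelty(d, seen))
        if novelty(best, seen) == 0.0:
            break  # nothing new is left to send
        chosen.append(best)
        budget_bytes -= len(best.encode("utf-8"))
        seen |= tokens(best)  # later picks are scored against this one too
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    delivered = ["stocks fall on rate fears"]
    candidates = [
        "stocks fall on rate fears",
        "new comet spotted near jupiter",
        "comet spotted near jupiter tonight",
    ]
    for doc in select_documents(candidates, delivered, budget_bytes=100):
        print(doc)
```

Re-scoring after each pick means a near-duplicate of a document already chosen drops in rank, so a tight budget is not spent on redundant content.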
I am also interested in economic theory, algorithms, web search, and formal methods. Overall, still a toddler.
M.Tech. First Year seminar.
M.Tech. Foundations Lab project.