Talks & Seminars
Duplicate-Insensitive Computation of Aggregates over Massive Data Streams
Prof. Srikanta Tirthapura, Iowa State University
Date & Time: August 10, 2005 15:30
Venue: Lecture Hall, 'B' Block, 2nd Floor, KReSIT
Today's computer networks, whether IP networks, content delivery networks, or sensor networks, have to transmit and process massive amounts of data. Such data usually takes the form of a "stream", which has to be processed in a single pass, using limited memory and processing power. In many cases the data is distributed: each node in the network receives only a local data stream, and no node can observe the entire data set. Further, real data streams may contain duplicate data, which has to be handled carefully to avoid over-counting when computing statistics.

In this talk, I will present new algorithms to estimate certain key aggregate functions on massive data streams in both centralized and distributed contexts. I will focus on a problem called duplicate-insensitive sum: given a stream of positive numbers from a finite domain (for example, temperature readings from sensors), compute the sum of all the distinct numbers in the stream. This basic problem has applications in diverse areas such as data aggregation in sensor networks and IP network monitoring. Our algorithm combines two novel techniques: adaptive random sampling and range sampling. The resulting algorithm yields significantly better performance bounds than previous work. Further, the techniques developed here seem to be more generally applicable; for example, they yield the best currently known performance guarantees for the computation of the max-dominance norm of multiple streams.
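To make the problem statement concrete, here is a minimal Python sketch of a duplicate-insensitive sum estimator built on hash-based adaptive sampling. It is an illustration only, not the algorithm presented in the talk: the class name `DistinctSumSketch`, the `capacity` parameter, and the use of SHA-256 as the hash function are all assumptions, and the range sampling technique mentioned above is not reproduced. The sketch keeps a value only if its hash has at least `level` trailing zero bits, raises `level` whenever the sample outgrows its memory budget, and scales the sampled sum by 2^level.

```python
import hashlib


class DistinctSumSketch:
    """Illustrative estimator for the sum of distinct values in a stream."""

    def __init__(self, capacity=64):
        self.capacity = capacity  # maximum number of values kept in memory
        self.level = 0            # current sampling level (probability 2**-level)
        self.sample = set()       # sampled distinct values

    @staticmethod
    def _hash_level(value):
        # Number of trailing zero bits in a 64-bit hash of the value.
        digest = hashlib.sha256(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        if h == 0:
            return 64
        return (h & -h).bit_length() - 1

    def _shrink(self):
        # Raise the level until the sample fits in the memory budget.
        while len(self.sample) > self.capacity:
            self.level += 1
            self.sample = {v for v in self.sample
                           if self._hash_level(v) >= self.level}

    def add(self, value):
        # A duplicate hashes to the same level, so it is either already
        # in the sample or rejected again; it is never counted twice.
        if self._hash_level(value) >= self.level:
            self.sample.add(value)
            self._shrink()

    def merge(self, other):
        # Combine sketches built at different nodes of a network.
        merged = DistinctSumSketch(min(self.capacity, other.capacity))
        merged.level = max(self.level, other.level)
        merged.sample = {v for v in self.sample | other.sample
                         if self._hash_level(v) >= merged.level}
        merged._shrink()
        return merged

    def estimate(self):
        # Each sampled value stands in for about 2**level distinct values.
        return sum(self.sample) * (2 ** self.level)


sketch = DistinctSumSketch(capacity=64)
for reading in [3, 5, 3, 7, 5]:
    sketch.add(reading)
print(sketch.estimate())  # 15: exact here, since the sample never overflows
```

Because the sample depends only on the set of distinct values seen, repeating or reordering the stream leaves the estimate unchanged, and `merge` lets nodes that each see only a local stream combine their sketches. The accuracy of this simplified version degrades when a few large values dominate the sum, since it samples all values with equal probability.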
Speaker Profile:
Srikanta Tirthapura is currently an assistant professor of Computer Engineering at Iowa State University. He received his Ph.D. from Brown University in 2002 and his B.Tech. from IIT Madras in 1996, both in Computer Science. His research interests include distributed data processing and distributed coordination in wired and wireless networks. His work has appeared in leading conferences and journals in distributed computing.
