1. Graph Mining Techniques and Their Applications

Duration: 3 hrs
Date and Time: December 17th, 11:30-13:00 and 14:30-16:00
Presenter: Sharma Chakravarthy
Univ. of Texas at Arlington
Session Chair: P. Sreenivasa Kumar

Abstract :
In this tutorial, we present graph mining techniques and their relevance to a number of applications. Most of the currently used mining approaches assume transactional and other forms of data. However, there are a large number of applications for which relationships among data objects are extremely important. For these applications, the use of conventional approaches results in a loss of information that critically affects the knowledge discovered. Mining techniques that preserve and exploit the domain characteristics are therefore extremely important, and graph mining is one such general-purpose technique: its graph representation can capture complex relationships directly.
Graph mining, as opposed to transaction mining (association rules, decision trees and others), is suitable for mining structural data. Complex relationships that exist between entities can be faithfully represented using graphs. Associations between objects in a complex structure are easy to understand when represented graphically. Most importantly, the representation in graph format preserves structural information.
In this tutorial, we give an overview of transactional mining techniques, contrast them with the requirements of applications, and introduce graph mining as an alternative approach for a large class of applications. In the first half of the tutorial, we present details of several graph mining approaches, such as Subdue, FSG, AGM, and gSpan. In the second half of the tutorial, we present scalability issues of graph mining and how SQL-based approaches can handle graph mining on very large data sets. Finally, we present a novel application of graph mining for classifying documents (email, web, etc.).
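As a rough illustration of the information loss described above, the following sketch encodes two small structures both as a transaction (a set of item labels) and as a labeled graph. It is not taken from the tutorial; the toy data and the helper names `to_transaction` and `shared_edges` are hypothetical. The two structures are indistinguishable as transactions but differ as graphs.

```python
# Hypothetical toy data: two structures built from the same labeled entities.
# Each edge is a (source label, relationship, target label) triple.
graph_a = {("Alice", "emails", "Bob"), ("Bob", "emails", "Carol")}
graph_b = {("Alice", "emails", "Carol"), ("Carol", "emails", "Bob")}


def to_transaction(edges):
    """Flatten a graph into a transaction: keep the entity labels, drop the edges."""
    return frozenset(label for src, _, dst in edges for label in (src, dst))


def shared_edges(g1, g2):
    """Return the relationships (edges) that appear in both graphs."""
    return g1 & g2


# As transactions, both structures contain exactly the same items.
same_as_transactions = to_transaction(graph_a) == to_transaction(graph_b)

# As graphs, they describe different relationships among those items.
same_as_graphs = graph_a == graph_b
common = shared_edges(graph_a, graph_b)
```

A transactional miner would treat the two inputs as identical records, whereas a graph representation keeps who is related to whom available for mining.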

Sharma Chakravarthy is a Professor in the Computer Science and Engineering Department at The University of Texas at Arlington, Texas. He established the Information Technology Laboratory at UT Arlington in Jan 2000 and currently heads it. Sharma Chakravarthy also established the NSF-funded Distributed and Parallel Computing Cluster (DPCC@UTA) at UT Arlington in 2003. He is the recipient of the university-level “Creative Outstanding Researcher” award for 2003 and the department-level senior outstanding researcher award in 2002.
He is well known for his work on semantic query optimization, multiple query optimization, active databases (HiPAC project at CCA and Sentinel project at the University of Florida, Gainesville), and more recently scalability issues in graph mining and its applications. His group at UTA has developed DBSubdue and DB-FSG – scalable versions of corresponding approaches for graph mining, and InfoSift – a classification system for text, email, and web that uses graph mining techniques. His current research includes web technologies, stream data processing, complex event processing, mining and knowledge discovery – association, graph and text, push/pull technologies, web content monitoring, and information integration. He has published over 140 papers in refereed international journals and conference proceedings. He has given tutorials on a number of database topics, such as graph mining, database mining, active, real-time, distributed, object-oriented, and heterogeneous databases in North America, Europe, and Asia. He is listed in Who's Who Among South Asian Americans and Who's Who Among America's Teachers.
Prior to joining UTA, he was with the University of Florida, Gainesville. Prior to that, he worked as a Computer Scientist at the Computer Corporation of America (CCA) and as a Member, Technical Staff at Xerox Advanced Information Technology, Cambridge, MA. Sharma Chakravarthy received the B.E. degree in Electrical Engineering from the Indian Institute of Science, Bangalore and the M.Tech. from IIT Bombay, India. He worked at TIFR (Tata Institute of Fundamental Research), Bombay, India for a few years. He received the M.S. and Ph.D. degrees from the University of Maryland, College Park, in 1981 and 1985, respectively.

2. Uncertain Clustering: Models, Methods and Applications (Slides)

Duration: 1.5 hrs
Date and Time: December 18th, 11:00-12:30
Presenters: Zhenjie Zhang and Anthony K. H. Tung
National University of Singapore
Session Chair: Sunita Sarawagi

Abstract :
Clustering analysis is a well-studied topic in computer science with many applications in data mining, information retrieval, and electronic commerce. However, traditional clustering methods can only be applied to data sets with exact information. With the emergence of web-based applications in the last decade, such as distributed relational databases, traffic monitoring systems, and sensor networks, uncertain data are becoming ubiquitous in many real applications. No trivial solution to the clustering problem over such uncertain data can be obtained simply by extending conventional methods.
In this tutorial, we discuss some new studies on uncertain clustering, from theory to applications. Several basic computational models for uncertain clustering will be presented. The models satisfy the requirements of different applications, and are all independent of the clustering criterion and the underlying calculation algorithm. Based on the models, we will show how they can be incorporated into some popular clustering algorithms, such as the k-means algorithm. Given the methods above, a concrete example will be presented on how to monitor k-means clustering over moving objects with less communication cost. Finally, we will discuss some extensions from the k-means algorithm to more complicated clustering methods, such as the EM algorithm.
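One simple way to pair an uncertainty model with k-means can be sketched as follows. The assumptions here are illustrative and not taken from the tutorial itself: each uncertain object is represented by a finite list of equally likely sample points, and an object is assigned to the center with the smallest expected squared Euclidean distance. Because the expected squared distance equals the squared distance from the object's mean plus a variance term that does not depend on the center, each center is updated to the average of its members' means. All function names and the toy data are hypothetical.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expected_sq_dist(samples, center):
    """Expected squared distance from an uncertain object to a center,
    assuming every sample of the object is equally likely."""
    return sum(sq_dist(s, center) for s in samples) / len(samples)


def uncertain_kmeans(objects, k, iterations=20, seed=0):
    """Lloyd-style k-means where each object is a list of sample points."""
    rng = random.Random(seed)
    # Start from the means of k randomly chosen objects.
    centers = [mean(o) for o in rng.sample(objects, k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest center in expected squared distance.
        assignment = [
            min(range(k), key=lambda j: expected_sq_dist(o, centers[j]))
            for o in objects
        ]
        # Update step: average the means of the objects in each cluster.
        for j in range(k):
            member_means = [mean(o) for o, a in zip(objects, assignment) if a == j]
            if member_means:  # keep the old center if a cluster is empty
                centers[j] = mean(member_means)
    return assignment, centers


# Hypothetical toy data: four uncertain objects with three samples each.
objects = [
    [(0, 0), (0, 1), (1, 0)],
    [(1, 1), (1, 2), (2, 1)],
    [(10, 10), (10, 11), (11, 10)],
    [(11, 11), (12, 11), (11, 12)],
]
labels, centers = uncertain_kmeans(objects, k=2)
```

On this toy input the first two objects end up in one cluster and the last two in the other. Richer uncertainty models (for example, bounding regions for moving objects) would change only how `expected_sq_dist` and the update step are computed.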