Tutorials

1. Using Weak Isolation Levels in Database Applications

Duration: 1.5 hours
Presenter: Prof. Alan Fekete
School of Information Technologies
University of Sydney, Australia
Session Chair: Bala Iyer (IBM)

This tutorial will explain the issues and present recent research advances relating to the properties of weak isolation levels, such as Read Committed or Snapshot Isolation. These levels are used extensively in practice because strict two-phase locking carries an excessive performance cost. Every common DBMS provides, as its default behaviour, something less safe than serializable isolation! Indeed, in some of the most widespread products, even declaring an application as "SERIALIZABLE" does not guarantee serializable execution. However, many application programmers and DBAs are unaware of the impact of these weak mechanisms on data consistency. We show some ways for these IT professionals to reason about their application's logic, to determine whether or not a particular set of transactions can run safely when only weak isolation is provided.

Biography: Prof. Alan Fekete has a BSc from the University of Sydney in Australia, and a PhD from the Math Department of Harvard University. He is Associate Professor in the School of Information Technologies at University of Sydney. He is the co-author of a monograph "Atomic Transactions" (with Lynch, Merritt and Weihl, published by Morgan Kaufmann). His recent work has focussed on the theme of "Consistency with Weak Mechanisms". Fekete's other research has been in the Theory of Distributed Systems, including fault-tolerance, replicated data and process group membership algorithms. He has served on program committees for ICDE, VLDB, PODS and CIKM.

2. Probabilistic Queries and Uncertain Data Management

Duration: 1.5 hours
Presenter: Prof. Sunil Prabhakar
Dept. of Computer Sciences
Purdue University, USA
Session Chair: Amarnath Gupta (UCSD)

While databases have traditionally been designed to handle exact values and queries typically return precise answers, there is a broad set of applications for which queries return probabilistic answers. These include applications where the data is inherently imprecise or uncertain. Examples of these applications include sensor and biological databases with inherent uncertainty in recorded values, queries over text databases, data that has been cleaned to improve quality, and obfuscated data for protecting privacy. There has been a renewed interest in handling uncertain data and probabilistic queries in order to address the growing needs of these applications. This tutorial aims to present the issues related to probabilistic queries and uncertain data management, summarize key research results in the area, and highlight some outstanding challenges.

Biography: Prof. Sunil Prabhakar is an Associate Professor of Computer Sciences at Purdue University. He received the Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Delhi in 1990, and the M.S. and Ph.D. in Computer Science from the University of California, Santa Barbara in 1998. His research interests are in uncertain databases, sensor and stream databases, data privacy, and biological databases. He is a recipient of the NSF CAREER award. He is a senior member of the IEEE and serves on the editorial board of the Distributed and Parallel Databases journal.

3. From DNA to Tissue: Mining the Data In the Process

Duration: 3 hours
Presenter: Prof. Anthony Tung
Dept. of Computer Science
National Univ. of Singapore, Singapore
Session Chair: Sanjay Chawla (U. Sydney)

Recent developments have led to large amounts of diverse biological data being generated, including DNA/protein sequences, RNA/protein structures, proteomic/metabolic mass spectrometry data, gene expression data, and protein-protein interaction graphs. In this tutorial, we will look at how these data are used in charting biological pathways and how data mining techniques are used to extract the relevant knowledge for that purpose. The tutorial is targeted at an audience that is new to the field and would like a good overview of the role of various biological datasets in medical studies.

Biography: Prof. Anthony K. H. Tung is currently an Assistant Professor in the Department of Computer Science, National University of Singapore (NUS). He received his B.Sc. (2nd Class Honours) and M.Sc. in Computer Science from the National University of Singapore in 1997 and 1998 respectively. In 2001, he received his Ph.D. in Computer Science from Simon Fraser University (SFU). His research interests involve various aspects of databases and data mining.

4. Indexing Methods for Biological Sequences

Duration: 3 hours
Presenter: Prof. Srinivas Aluru
Dept. of Electrical and Computer Engineering
Iowa State University, USA
Session Chair: Vikram Pudi (IIIT, Hyderabad)

This tutorial will focus on indexing methods and query algorithms for biological sequence data. The tutorial begins with an introduction to various types of biological sequence data including DNA, cDNA, RNA, EST and protein sequences. A number of publicly available biological databases will be introduced and query requirements unique to biological sequences will be discussed. Several indexing methods will be presented including suffix trees, suffix arrays, SB-trees and indexing methods developed for approximate matching. The current state of biological database implementations will be reviewed and topics for further investigation will be identified. No background in biology is required to attend the tutorial.

Biography: Prof. Srinivas Aluru is a professor and associate chair for research in the Dept. of Electrical and Computer Engineering at Iowa State University. He also chairs Iowa State's Bioinformatics and Computational Biology graduate program. Earlier, he held faculty positions at Syracuse University and New Mexico State University. Prof. Aluru's research interests include parallel computing, computational biology, scientific computing and applied algorithms. He received the NSF CAREER award in 1997, an IBM Faculty Award in 2002, and the Young Engineering Faculty Research Excellence Award from Iowa State in 2002. He is an IEEE Computer Society Distinguished Visitor.

5. Text Mining: Techniques and Applications

Duration: 1.5 hours
Presenter: Prof. Rohini Srihari
Dept. of Computer Science and Engineering
State Univ. of New York, Buffalo, USA
Session Chair: Sunita Sarawagi (IIT, Bombay)

The past decade has seen tremendous advances in the area of text mining, the process of deriving usable intelligence from text. The earliest approaches were based on information retrieval techniques and the underlying bag-of-words representation. Later improvements to these basic techniques included latent semantic analysis, on which many commercial text mining tools are based. Recent approaches to text mining have focused on more granular and sophisticated techniques based on natural language processing, and on information extraction (IE) in particular. IE systems have become the tools of choice for text mining in the intelligence community. Typically, IE tools are interfaced with tools such as link analysis and visualization in order to achieve information discovery goals. This tutorial will cover all major approaches to text mining, which span the disciplines of information retrieval (search), machine learning, and natural language processing. It will include a discussion of applications in diverse domains such as counterterrorism, bioinformatics, education and business analytics. It will conclude with some recent text mining models that combine IE and IR techniques; these use probabilistic graph models to find connections between concepts across multiple documents.

Biography: Dr. Rohini K. Srihari is an Associate Professor in the Department of Computer Science & Engineering at the State University of New York at Buffalo. Dr. Srihari's current research focus is on advanced information retrieval and text mining. Dr. Srihari also works in the area of computational linguistics and information extraction. Much of this work has been through her role as founder and chief scientist at Cymfony Inc., and more recently as founder and CEO of Janya Inc., a company specializing in text analytics solutions. She received a B. Math. from the University of Waterloo, Canada and a Ph.D. in Computer Science from SUNY at Buffalo in 1992. She is the author of numerous publications in the areas of multimedia information retrieval, information extraction and text mining.

6. Adaptive Query Processing

Duration: 1.5 hours
Presenters:
Dr. Vijayshankar Raman
IBM Almaden Research Center, USA
Dr. Amol Deshpande
University of Maryland
Session Chair: P. Sreenivasa Kumar (IIT Madras)

As databases grow in size, variety, and target environments, numerous limitations have surfaced in traditional cost-based query optimization: poor cost models, data correlations, changing system resources, changing data distributions, etc. One promising technique for tackling these limitations is to abandon the optimize-then-execute model of query processing and instead to interleave optimization and execution in an adaptive fashion. There has been a flurry of papers on such adaptive query processing in recent years. Some propose evolutionary solutions such as changing query plans mid-flight, while others propose to do away with query plans altogether and instead route tuples adaptively. In this tutorial we will introduce these adaptive query processing ideas and seek to put them in a unifying context: what the problems with query optimization are, which methods address which problems, and in which environments each method is appropriate. Since the field is still in its infancy, there are numerous unsolved problems, and a goal of this tutorial is to state these problems crisply and outline topics for future research. We will assume basic familiarity with relational algebra, dynamic programming optimization, and standard join algorithms.

Biography: Dr. Vijayshankar Raman is a research staff member at IBM's Almaden Research Center working on a variety of data management and query processing problems. His current focus is on Autonomic and Grid Computing and on Adaptive Query Processing. He is also interested in data cleaning and data integration. At IBM, Vijayshankar has developed the Progressive Optimization technique for changing query plans during execution. He has also developed a statistics collection and mining tool for DB2 8.2. Before joining IBM, Vijayshankar received a Ph.D. in Computer Science from the University of California at Berkeley, and a B.Tech. from IIT Madras. In his dissertation, Vijayshankar developed several new interactive and adaptive query processing algorithms. One of these, Online Reordering, was prototyped in the Informix Dynamic Server DBMS and was selected as one of the best papers at VLDB 99. Vijayshankar has also developed the data cleaning tool Potter's Wheel A-B-C.

Prof. Amol Deshpande is an Assistant Professor at the University of Maryland at College Park. He received his BTech degree from IIT Bombay in 1998, and his PhD from UC Berkeley in 2004. In his dissertation, he worked on handling estimation errors in database query processing through a variety of techniques such as sophisticated synopsis structures and adaptive query processing. His current research interests also include management of probabilistic, imprecise and incomplete databases, and query processing in sensor networks. His recent paper on applying probabilistic modeling techniques for data acquisition in sensor networks won the best paper award at the VLDB 2004 conference.