1. Using Weak Isolation Levels in Database Applications
Duration: 1.5 hours
Presenter: Prof. Alan Fekete
School of Information Technologies
University of Sydney, Australia
Session Chair: Bala Iyer (IBM)
This tutorial will explain the issues and present the recent research
advances relating to the properties of weak isolation levels, such as
Read Committed or Snapshot Isolation. These levels are extensively
used in practice, due to the excessive performance impact of strict
two-phase locking. Every common DBMS provides, as its default
behaviour, something less safe than serializable isolation! Indeed,
in some of the most widespread products, even declaring an application
as "SERIALIZABLE" does not guarantee serializable execution. However,
many of the application programmers and DBAs are unaware of the
impact of these weak mechanisms on data consistency. We show some
ways for these IT professionals to reason about their application's
logic, to determine whether or not a particular set of transactions
can run safely when weak isolation is provided.
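As a rough illustration of the kind of anomaly at stake, the sketch
below simulates "write skew" under snapshot isolation. It is a toy
model only: no real DBMS is involved, and the function names, the
on-call rule and the data are all hypothetical assumptions made for
the example.

```python
# Toy simulation of write skew under snapshot isolation.
# Illustrative assumption: all names and the on-call rule are hypothetical.

def run_snapshot_isolation(db, transactions):
    """Run transactions concurrently against one shared snapshot.

    A transaction aborts only if it writes a key that an earlier
    committer already wrote (first-committer-wins)."""
    snapshot = dict(db)                       # every transaction reads the same start state
    write_sets = [txn(snapshot) for txn in transactions]
    committed = dict(db)
    written = set()
    for writes in write_sets:
        if written.intersection(writes):      # write-write conflict: abort this one
            continue
        committed.update(writes)
        written.update(writes)
    return committed

def make_go_off_call(doctor):
    """Build a transaction that takes `doctor` off call if the rule allows it.

    Application rule: at least one doctor must remain on call."""
    def txn(snapshot):
        on_call = [d for d, is_on in snapshot.items() if is_on]
        if len(on_call) >= 2:                 # the check uses the (possibly stale) snapshot
            return {doctor: False}
        return {}
    return txn

db = {"alice": True, "bob": True}
result = run_snapshot_isolation(
    db, [make_go_off_call("alice"), make_go_off_call("bob")]
)
print(result)
```

Each transaction is correct when run alone, and the two write sets do
not overlap, so neither aborts; yet together they leave nobody on
call. A serial execution would have let only one of them proceed.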
Biography:
Prof. Alan Fekete has a BSc from the University of Sydney
in Australia, and a PhD from the Math Department of Harvard
University. He is Associate Professor in the School of Information
Technologies at University of Sydney. He is the co-author of a
monograph "Atomic Transactions" (with Lynch, Merritt and Weihl,
published by Morgan Kaufmann). His recent work has focussed on
the theme of "Consistency with Weak Mechanisms". Fekete's other
research has been in the Theory of Distributed Systems, including
fault-tolerance, replicated data and process group membership
algorithms. He has served on program committees for ICDE, VLDB,
PODS and CIKM.
2. Probabilistic Queries and Uncertain Data Management
Duration: 1.5 hours
Presenter: Prof. Sunil Prabhakar
Dept. of Computer Sciences
Purdue University, USA
Session Chair: Amarnath Gupta (UCSD)
While databases have traditionally been designed to handle exact
values and queries typically return precise answers, there is a
broad set of applications for which queries return probabilistic
answers. These include applications where the data is inherently
imprecise or uncertain. Examples of these applications include
sensor and biological databases with inherent uncertainty in
recorded values, queries over text databases, data that has been
cleaned to improve quality, and obfuscated data for protecting
privacy. There has been a renewed interest in handling uncertain
data and probabilistic queries in order to address the growing needs
of these applications. This tutorial aims to present the issues
related to probabilistic queries and uncertain data management,
summarize key research results in the area, and highlight some
outstanding challenges.
Biography:
Prof. Sunil Prabhakar is an Associate Professor of Computer Sciences
at Purdue University. He received the Bachelor of Technology in
Electrical Engineering from the Indian Institute of Technology,
Delhi in 1990, and M.S. and Ph.D. in Computer Science from the
University of California, Santa Barbara in 1998. His research
interests are in uncertain databases, sensor and streams databases,
data privacy, and biological databases. He is a recipient of the NSF
CAREER award. He is a senior member of the IEEE and serves on the
editorial board of the Distributed and Parallel Databases Journal.
3. From DNA to Tissue: Mining the Data In the Process
Duration: 3 hours
Presenter: Prof. Anthony Tung
Dept. of Computer Science
National Univ. of Singapore, Singapore
Session Chair: Sanjay Chawla (U. Sydney)
Recent developments have led to large amounts of diverse biological
data being generated, including DNA/protein sequences,
RNA/protein structures, proteomic/metabolic mass spectrometry data,
gene expression data, protein-protein interaction graphs, etc. In
this tutorial, we will look at how these data are involved in
charting the biological pathway and how data mining techniques
are being used in extracting the relevant knowledge for that
purpose. This tutorial is targeted at attendees who are new to the
field and who would like a good overview of the role of various
biological datasets in medical studies.
Biography:
Prof. Anthony K. H. Tung is currently an Assistant Professor
in the Department of Computer Science, National University of
Singapore (NUS). He received both his B.Sc. (2nd Class Honours)
and M.Sc. in computer sciences from the National University of
Singapore in 1997 and 1998 respectively. In 2001, he received the
Ph.D. in computer sciences from Simon Fraser University (SFU).
His research interests involve various aspects of databases and
data mining.
4. Indexing Methods for Biological Sequences
Duration: 3 hours
Presenter: Prof. Srinivas Aluru
Dept. of Electrical and Computer Engineering
Iowa State University, USA
Session Chair: Vikram Pudi (IIIT, Hyderabad)
This tutorial will focus on indexing methods and query algorithms
for biological sequence data. The tutorial begins with an
introduction to various types of biological sequence data
including DNA, cDNA, RNA, EST and protein sequences. A number
of publicly available biological databases will be introduced
and query requirements unique to biological sequences will be
discussed. Several indexing methods will be presented including
suffix trees, suffix arrays, SB-trees and indexing methods
developed for approximate matching. The current state of biological
database implementations will be reviewed and topics for further
investigation will be identified. No background in biology is
required to attend the tutorial.
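To give a flavour of one of the indexing methods named above, the
following is a minimal suffix array sketch. It assumes a naive
construction that simply sorts all suffixes, which is far slower than
the specialised algorithms used in practice; the function names and
the sample sequence are hypothetical.

```python
# Minimal suffix array sketch for exact pattern matching.
# Illustrative assumption: naive construction, hypothetical names and data.

def build_suffix_array(text):
    """Return the start positions of all suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Return every start position of `pattern` in `text`, in ascending order.

    Suffixes sharing the prefix `pattern` form one contiguous run in the
    suffix array, so two binary searches locate the run's boundaries."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                            # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                            # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])

sequence = "ACGTACGA"
sa = build_suffix_array(sequence)
hits = find_occurrences(sequence, sa, "ACG")
print(sa, hits)
```

Once the array is built, each query costs a logarithmic number of
string comparisons in the sequence length, which is what makes such
indexes attractive for large sequence collections.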
Biography:
Prof. Srinivas Aluru is a professor and associate chair for
research in the Dept. of Electrical and Computer Engineering at
Iowa State University. He also chairs Iowa State's Bioinformatics
and Computational Biology graduate program. Earlier, he held
faculty positions at Syracuse University and New Mexico State
University. Prof. Aluru's research interests include parallel
computing, computational biology, scientific computing and applied
algorithms. He received the NSF CAREER Award in 1997,
an IBM Faculty Award in 2002, and the Young Engineering Faculty
Research Excellence Award from Iowa State in 2002. He is an IEEE
Computer Society Distinguished Visitor.
5. Text Mining: Techniques and Applications
Duration: 1.5 hours
Presenter: Prof. Rohini Srihari
Dept. of Computer Science and Engineering
State Univ. of New York, Buffalo, USA
Session Chair: Sunita Sarawagi (IIT, Bombay)
The past decade has seen tremendous advances in the area of text
mining, the process of deriving usable intelligence from text.
The earlier approaches were based on information retrieval
techniques and the underlying bag-of-words representation.
Improvements to these basic techniques involved latent semantic
analysis, and many commercial text mining tools are based on this.
Recent approaches to text mining have focused on more granular and
sophisticated techniques based on natural language processing,
and information extraction (IE) in particular. IE systems have
become the tools of choice for text mining in the intelligence
community. Typically, IE tools are interfaced with tools such as
link analysis and visualization in order to achieve the information
discovery goals. This tutorial will cover all major approaches to
text mining which span the disciplines of information retrieval
(search), machine learning, and natural language processing.
It will include a discussion of applications in diverse domains
such as counterterrorism, bioinformatics, education and business
analytics. It will conclude with some recent text mining models
which combine IE and IR techniques; these use probabilistic graph
models for finding connections between concepts across multiple
documents.
Biography:
Dr. Rohini K. Srihari is an Associate Professor in the Department
of Computer Science & Engineering at the State University of New
York at Buffalo. Dr. Srihari's current research focus is on
advanced information retrieval and text mining. Dr. Srihari also
works in the area of computational linguistics and information
extraction. Much of this work has been through her role as
founder and chief scientist at Cymfony Inc., and more recently,
as founder and CEO of Janya Inc., a company specializing in text
analytics solutions. She received a B. Math. from the University
of Waterloo, Canada and a Ph.D. in Computer Science from SUNY at
Buffalo in 1992. She is the author of numerous publications in the
area of multimedia information retrieval, information extraction
and text mining.
6. Adaptive Query Processing
Duration: 1.5 hours
Presenters:
Dr. Vijayshankar Raman
IBM Almaden Research Center, USA
Dr. Amol Deshpande
University of Maryland
Session Chair: P. Sreenivasa Kumar (IIT Madras)
As databases grow in size, variety, and target environments,
numerous limitations have surfaced in traditional cost-based query
optimization: poor cost models, data correlations, changing system
resources, changing data distributions, etc. One promising technique
to tackle these limitations is to abandon the optimize-then-execute
model of query processing and instead interleave optimization and
execution in an adaptive fashion. There has been a flurry of such
adaptive query processing papers in recent years. Some propose
evolutionary solutions such as changing query plans mid-flight,
while others propose to do away with query plans altogether and
instead route tuples adaptively.
In this tutorial we will introduce these adaptive query processing
ideas and seek to put them in a unifying context: what are
the problems with query optimization, which methods address which
problems, and in what environments is each method appropriate. Since
the field is still in its infancy there are numerous unsolved
problems, and a goal of this tutorial is to crisply state these
problems and outline topics for future research. We will assume
basic familiarity with relational algebra, dynamic programming
optimization, and standard join algorithms.
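The idea of routing tuples adaptively rather than fixing a plan in
advance can be sketched as follows. This is a heavily simplified
illustration, not any published algorithm: it assumes a query made
only of filter predicates, and the function name and the routing
heuristic (try first the predicate with the highest observed
rejection rate) are hypothetical.

```python
# Simplified sketch of adaptive tuple routing over filter predicates.
# Illustrative assumption: hypothetical names and routing heuristic.

def adaptive_filter(tuples, predicates):
    """Return the tuples passing all predicates, plus the evaluation count.

    The predicate order is chosen per tuple from statistics gathered so
    far, instead of being fixed before execution starts."""
    seen = [0] * len(predicates)        # evaluations of each predicate
    rejected = [0] * len(predicates)    # tuples each predicate has rejected
    output = []
    evaluations = 0
    for t in tuples:
        # Smoothed rejection-rate estimate; highest estimate goes first.
        order = sorted(
            range(len(predicates)),
            key=lambda i: -(rejected[i] + 1) / (seen[i] + 2),
        )
        for i in order:
            seen[i] += 1
            evaluations += 1
            if not predicates[i](t):
                rejected[i] += 1
                break                   # tuple dropped; skip remaining predicates
        else:
            output.append(t)            # tuple passed every predicate
    return output, evaluations

rows = list(range(100))
checks = [lambda x: x % 2 == 0, lambda x: x < 10]
passed, cost = adaptive_filter(rows, checks)
print(passed, cost)
```

The result set is the same whatever order is chosen; only the number
of predicate evaluations changes as the routing adapts to the data.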
Biography:
Dr. Vijayshankar Raman is a research staff member at IBM's Almaden
Research Center working on a variety of data management and query
processing problems. His current focus is on Autonomic and Grid
Computing and on Adaptive Query Processing. He is also interested
in data cleaning and data integration. At IBM, Vijayshankar has
developed the Progressive Optimization technique for changing
query plans during execution. He has also developed a statistics
collection and mining tool for DB2 8.2. Before joining IBM,
Vijayshankar finished a Ph.D. in Computer Science at the University
of California at Berkeley, and a B.Tech. from IIT Madras. In
his dissertation, Vijayshankar developed several new interactive
and adaptive query processing algorithms. One of these, on Online
Reordering, was prototyped in the Informix Dynamic Server DBMS, and
was selected as one of the best papers at VLDB 99. Vijayshankar
has also developed the data cleaning tool Potter's Wheel A-B-C.
Prof. Amol Deshpande is an Assistant Professor at the University of
Maryland at College Park. He received his BTech degree from IIT Bombay
in 1998, and his PhD from UC Berkeley in 2004. In his dissertation, he
worked on handling estimation errors in database query processing through
a variety of techniques such as sophisticated synopsis structures and
adaptive query processing. His current research interests also include
management of probabilistic, imprecise and incomplete databases, and query
processing in sensor networks. His recent paper on applying probabilistic
modeling techniques for data acquisition in sensor networks won the best
paper award at the VLDB 2004 conference.