Tutorials




Tutorial 1


Title: Pattern Recognition Algorithms for Analysing Biological Data


Duration: 2 hours.


Speaker: Prof. Sanghamitra Bandyopadhyay

Sanghamitra Bandyopadhyay is a Professor in the Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India. She obtained her B. Sc. degree in Physics from Presidency College, Kolkata, B. Tech. in Computer Science and Engineering, Kolkata, M. Tech. in Computer Science and Technology from Indian Institute of Technology (IIT), Kharagpur and Ph.D. in Computer Science from Indian Statistical Institute, Kolkata in 1988, 1991, 1993 and 1998, respectively. She has worked at Los Alamos National Laboratory (Los Alamos, USA), University of New South Wales (Sydney, Australia), University of Texas (Arlington, USA), University of Maryland Baltimore County (Baltimore, USA), Fraunhofer Institute (Sankt Augustin, Germany), Tsinghua University (Beijing, China) and La Sapienza University of Rome (Rome, Italy). Sanghamitra has also visited Nice University (Nice, France), Monash University (Melbourne, Australia), University of Illinois (Chicago, USA), Imperial College (London, UK), NUS and NTU (Singapore), University of Aizu (Japan), Open University, Universiti Kebangsaan Malaysia (Kuala Lumpur, Malaysia) and ICTP, Italy. She has served on the program committees of several international workshops and conferences, and has delivered tutorials in a number of places. She has more than 170 technical publications in international journals and in the proceedings of conferences, workshops and symposia. Sanghamitra was the Program Co-Chair of the First International Conference on Pattern Recognition and Machine Intelligence, PReMI'05, held at ISI during December 18-22, 2005. She has edited one conference proceedings and three volumes, and has authored one book. She has served as the guest editor of special issues of the IEEE Transactions on Systems, Man and Cybernetics - B (Special issue on Distributed and Mobile Data Mining), and IETE Journal of Research (Special issue on Evolutionary Computation in Engineering Sciences).
Sanghamitra is the recipient of several prestigious awards, including the Humboldt Fellowship, the Swarnajayanti Fellowship in Engineering Sciences from DST, the Young Scientist Medal from both the Indian National Science Academy and the Indian Science Congress, the Young Engineers Award from the Indian National Academy of Engineering, the Dr. Shanker Dayal Sharma Gold Medal and the Institute Silver Medal from IIT Kharagpur, and the Prof. A.K. Chowdhury Memorial Award from Calcutta University. She is a Senior Member of IEEE.

Brief Description:
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize and index the data, and for specialized tools to view and analyze the data. Bioinformatics can be viewed as the use of computational methods to make biological discoveries.  It is an interdisciplinary field involving biology, computer science, mathematics and statistics to analyze biological sequence data, genome content and arrangement, and to predict the function and structure of macromolecules.

Many problems in bioinformatics require searching through biological data to extract patterns from it. Hence pattern recognition algorithms have found widespread application in bioinformatics. Typical algorithms include those for feature selection, classification and clustering. In this tutorial, we will first outline the basic principles of molecular biology. Different types of biological data will be mentioned briefly. This will be followed by a description of some typical research areas where pattern recognition and computational intelligence methods have been successfully applied. Finally, two pattern recognition algorithms for analyzing biological data will be described: one in the supervised framework and the other in the unsupervised framework.

Outline of the presentation:
1. Basic molecular biology
2. Some biological data
3. Applications of some PR algorithms in Bioinformatics
4. Results
5. Summary and future work


Affiliation:

Professor
Machine Intelligence Unit
Indian Statistical Institute
Kolkata 700018.





Tutorial 2




Title: Filtering: Bilateral and Wavelet-Based.


Duration: 2 Hours


Speaker: Prof. Sharat Chandran



http://www.cse.iitb.ac.in/~sharat


Sharat Chandran holds a doctorate in Computer Science from the University of Maryland (1989) and an undergraduate degree in Electrical and Electronics Engineering from the Indian Institute of Technology, Bombay (1984). He has held academic or engineering appointments at the Indian Institute of Technology, Bombay; University of Maryland, College Park (USA); Stanford University (USA), Oracle Corporation, Redwood Shores (USA), NTT (Japan), and Schlumberger Corporation, Austin (USA).


Dr. Chandran's research interests are in computer graphics and vision, and in parallel computation. In these fields he has made noteworthy contributions in the form of archival journal articles and peer reviewed conference publications. About 35 students have graduated at the Masters level or higher under his supervision; a similar number of students have graduated with an undergraduate "honors" degree.


Brief Description
Image filtering is a fundamental operation that has been studied in great detail over the years. The purpose of this tutorial is NOT to present any new innovative filtering, nor indeed to claim that *this* is the way to do filtering. Rather, it is to expose three fundamental tools that may also be useful elsewhere. These tools come from the recent study of bilateral filtering, wavelets and fast Gaussian filtering. The tutorial is intended for the novice; if it is successful, the audience should be able to take away some fundamental concepts in filtering.
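As a rough illustration of the first of these tools, the following is a minimal sketch of a bilateral filter applied to a 1-D signal. It is the textbook form of the idea, not necessarily the formulation presented in the tutorial; the function name, the parameter names (`radius`, `sigma_space`, `sigma_range`) and the toy signal are all assumptions made for this example.

```python
import math

def bilateral_filter_1d(signal, radius=2, sigma_space=1.0, sigma_range=0.5):
    """Smooth a 1-D signal while preserving large jumps (edges).

    Each output sample is a weighted average of its neighbours. A neighbour's
    weight is the product of a spatial Gaussian (how far away it is) and a
    range Gaussian (how different its value is from the centre sample).
    """
    out = []
    for i, center in enumerate(signal):
        total = 0.0
        weight_sum = 0.0
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        for j in range(lo, hi):
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_range ** 2))
            w = w_space * w_range
            total += w * signal[j]
            weight_sum += w
        # weight_sum is at least 1 because the centre sample has weight 1
        out.append(total / weight_sum)
    return out

# Hypothetical noisy step: small fluctuations around 0, then around 10
noisy_step = [0.0, 0.1, -0.1, 0.05, 10.0, 10.1, 9.9, 10.05]
smoothed = bilateral_filter_1d(noisy_step)
```

Because samples on the far side of the step differ greatly in value, their range weight is nearly zero, so the small fluctuations are averaged out while the jump itself is not blurred. A plain Gaussian filter corresponds to dropping the `w_range` factor.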


Affiliation:
Professor,
Indian Institute of Technology,
Bombay.


Tutorial 3



Title: Soft Computing and Pattern Recognition for Data Mining


Duration: 3-4 hours.


Speakers: Roberto Baragona and Ujjwal Maulik


 

Affiliation:


Dr. Roberto Baragona
Professor
University La Sapienza of Rome
Rome, ITALY


Dr. Baragona was a junior researcher (1973 - 1974) with a grant from the National Research Council, Italy, and held a work contract (1975) with the University La Sapienza of Rome, Institute of Probability. He was in charge of the Bureau for National Transportation Statistics (1976 - 1982) at the Department for Transport, Italy. From 1982 he was a senior researcher with the University of Rome, Department of Statistics, Probability and Applied Statistics. He was a visiting fellow in 1991 at the Department of Statistics and Computational Mathematics, Liverpool University (UK). He was an associate professor (1992 - 1994) at the University of Trieste, Italy, Faculty of Economics, with the Department of Economic and Statistical Sciences, and later (1995) again at the University of Rome, with the Department of Sociology, now the Department of Sociology and Communication. Since 2000 he has been a full professor at the University La Sapienza of Rome. He participated in the Summer School “Evolution: from Biology to Statistical Modeling”, July 3-9, 2004, at the Ettore Majorana Center, Erice (Italy), where he gave a lecture on Genetic algorithms for time series analysis. During his sabbatical leave in 2006/2007 he visited the Department of Statistics of Venice University and the “European Center for Living Technology” Laboratory, Venice, Italy. He is a member of the COMISEF research project (Computational Optimization Methods in Statistics, Econometrics and Finance), a MARIE CURIE Research Training Network (RTN) that involves 12 academic institutions in the European Union (the project started in December 2006). During the tutorial “Advanced Econometrics” held in Rome in June 2008, he gave a lecture on Meta-heuristic and evolutionary methods in time series. He is a member of the teaching staff of the PhD on Social theory and research.
He is an Associate Editor of the Journal of Statistical Methods and Applications. His research interests include time series analysis, multivariate statistics, meta-heuristic methods and evolutionary computing, where he has published extensively in the last fifteen years.


Dr. Ujjwal Maulik
Professor
Jadavpur University
Kolkata, INDIA


Dr. Ujjwal Maulik received his Masters and Ph.D. in Computer Science in 1991 and 1997, respectively. He is currently a professor in the Department of Computer Science and Technology, Jadavpur University. He served as the Head of the Computer Science and Technology Department of Kalyani Government Engineering College during 1996-1999. Dr. Maulik has worked at the Center for Adaptive Systems Application and Los Alamos National Laboratory, Los Alamos, New Mexico, USA, in 1997, University of New South Wales, Sydney, Australia in 1999, University of Texas at Arlington, USA in 2001, University of Maryland at Baltimore County, USA in 2004, the AIS laboratory at Fraunhofer Institute in 2005, Tsinghua University, China in 2007, University of Rome, Italy in 2008, and the German Cancer Research Center (DKFZ) and University of Heidelberg in 2009. He received the postdoctoral BOYSCAST fellowship from the Dept. of Science and Technology, Govt. of India in 2001. Dr. Maulik is a Fellow of the Institution of Electronics and Telecommunication Engineers (IETE), India as well as the Institute of Engineers (IE), India, and a senior member of the Institute of Electrical and Electronics Engineers (IEEE), USA. He has co-authored/edited several books and around 150 technical articles in international journals, book chapters and conference/workshop proceedings. He has served on the program committees of several international conferences, and has delivered many invited talks and tutorials around the world. His research interests include Soft Computing, Pattern Recognition, Data Mining, Bioinformatics, and Parallel and Distributed Systems.


Brief Description:

      Soft computing is a consortium of methodologies that work synergistically and provide, in one form or another, flexible information processing capabilities for handling ambiguous real-life situations. Its aim, unlike that of conventional (hard) computing, is to exploit the tolerance for imprecision, uncertainty, approximate reasoning and partial truth in order to achieve tractability, robustness, low solution cost, and close resemblance to human-like decision-making. At this juncture, Fuzzy Sets (FS), Artificial Neural Networks (ANN), Evolutionary Algorithms (EAs) (including genetic algorithms (GAs), genetic programming (GP), evolutionary strategies (ES)), Support Vector Machines (SVM), Wavelets, Rough Sets (RS), Simulated Annealing (SA), Swarm Optimization (SO), Memetic Algorithms (MA), Ant Colony Optimization (ACO), Tabu Search (TS), Chaos Theory and Case Based Reasoning (CBR) are the major components of Soft Computing.


          Pattern recognition and machine learning form a major area of research and development activity that encompasses the processing of pictorial and other non-numerical information obtained from the interaction between science, technology and society. Machine recognition of patterns can be viewed as a two-fold task: learning the invariant properties of a set of samples characterizing a class, and deciding that a new sample is a possible member of the class by noting that it has properties common to those of the set of samples. The tasks required for developing and implementing a decision rule can be described as a transformation from the measurement space (M) to the feature space (F) and finally to the decision space (D), i.e., M --> F --> D. Here the mapping delta: F --> D is the decision function, and the elements d in D are termed decisions. A typical pattern recognition system consists of three phases: data acquisition, feature extraction and classification. In the data acquisition phase, depending on the environment within which the objects are to be classified, data are gathered using a set of sensors. These are then passed on to the feature extraction phase, where the dimensionality of the data is reduced by measuring/retaining only some characteristic features or properties. In a broader perspective, this stage significantly influences the entire recognition process. Finally, in the classification phase, the extracted features are passed on to the classifier, which evaluates the incoming information and makes a final decision. This phase basically establishes a transformation between the features and the classes. Depending on the availability or unavailability of labeled data, classification can be either supervised, e.g., k-NN rule, Bayes classifier, or unsupervised, e.g., k-means, single linkage or graph based clustering.
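The chain M --> F --> D can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the feature extractor (mean and range of each measurement vector), the toy data, and the function names `extract_features` and `knn_decide` are hypothetical and not taken from the tutorial. The decision function uses the k-NN rule mentioned above as one supervised choice.

```python
import math
from collections import Counter

def extract_features(measurement):
    """M --> F: reduce a raw measurement vector to two features (mean, range)."""
    mean = sum(measurement) / len(measurement)
    spread = max(measurement) - min(measurement)
    return (mean, spread)

def knn_decide(train_features, train_labels, query, k=3):
    """F --> D: majority vote among the k training samples nearest to the query."""
    ranked = sorted(
        (math.dist(f, query), label)
        for f, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled measurements (data acquisition phase)
measurements = [
    [1.0, 1.2, 0.9, 1.1],
    [0.8, 1.0, 1.1, 0.9],
    [1.1, 0.9, 1.0, 1.2],
    [5.0, 7.5, 4.0, 6.5],
    [6.0, 4.5, 7.0, 5.5],
    [4.5, 6.5, 7.5, 5.0],
]
labels = ["low", "low", "low", "high", "high", "high"]

# Feature extraction phase, then classification of a new sample
features = [extract_features(m) for m in measurements]
decision = knn_decide(features, labels, extract_features([5.5, 6.0, 4.8, 7.1]))
```

Here `decision` plays the role of an element d in D. Swapping `knn_decide` for another rule (for instance a Bayes classifier) changes only the mapping delta: F --> D, leaving the feature extraction step untouched.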


          The growth in the amount of data collected and generated has exploded in recent times with the widespread automation of various day-to-day activities, advances in high-level scientific and engineering research and development of efficient data collection tools. This has given rise to the need for automatically analyzing the data in order to extract knowledge from it, thereby making the data potentially more useful. Knowledge discovery and data mining (KDD) is the process of identifying valid, novel, potentially useful and ultimately understandable patterns from massive data repositories. It is a multi-disciplinary topic, drawing from several fields including expert systems, pattern recognition, machine learning and soft computing, intelligent databases, knowledge acquisition, case based reasoning and statistics.


          In this talk we will first describe the basic principles of some soft computing and pattern recognition techniques. Subsequently we will discuss how these techniques can be utilized for data mining and knowledge discovery. In this regard, particular emphasis will be placed on evolutionary algorithms (both single- and multi-objective), fuzzy theory and neural networks. Real-life applications in the domain of satellite image classification, as well as the analysis of time series data including microarray gene expression data, will be discussed in detail.


Outline of the presentation:

1.    Soft Computing techniques, including Genetic Algorithms (both single- and multi-objective), Neural Networks and Fuzzy Sets. We will also discuss other techniques such as Simulated Annealing and Differential Evolution.


2.    Pattern Recognition techniques, both supervised and unsupervised. Here we will discuss techniques such as K-NN and Bayesian Learning, and different clustering techniques such as K-Means, Fuzzy C-Means, Hierarchical and Graph Based Clustering. We will also discuss advanced clustering techniques such as Multiobjective Clustering, as well as techniques such as Support Vector Machine (SVM).


3.    We will also discuss some other Data Mining issues, such as Association Rules.


4.    Real-life applications in the domain of Satellite Image Classification, as well as the analysis of time series data including Gene Microarray data, will be discussed.
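To give a flavour of the unsupervised techniques listed in item 2 of the outline, the following is a minimal sketch of K-Means clustering in plain Python. It is a textbook version written for illustration, not the speakers' implementation; the deterministic initialisation, the fixed iteration count and the toy points are assumptions.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points into k groups by alternating assignment and update steps."""
    # Assumption: initialise centroids with k points spaced evenly through the list
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment

# Hypothetical 2-D data forming two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, assignment = kmeans(points, k=2)
```

Unlike the supervised K-NN rule, no labels are supplied: the grouping emerges from the distances alone. Fuzzy C-Means follows the same alternating scheme but replaces the hard `assignment` with graded membership values.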