The Eighth Indian Conference on
Vision, Graphics and Image Processing

16th to 19th December, 2012

Plenary Talks

Peter Belhumeur | William T Freeman | Katsushi Ikeuchi | Manik Varma

  • Peter Belhumeur Columbia University, USA

    Peter N. Belhumeur is currently a Professor in the Department of Computer Science at Columbia University and the Director of the Laboratory for the Study of Visual Appearance (VAP LAB). He received a Sc.B. in Information Sciences from Brown University in 1985.

    He received his Ph.D. in Engineering Sciences from Harvard University under the direction of David Mumford in 1993. He was a postdoctoral fellow at the University of Cambridge's Isaac Newton Institute for Mathematical Sciences in 1994. He was made Assistant, Associate and Professor of Electrical Engineering at Yale University in 1994, 1998, and 2001, respectively. He joined Columbia University as a Professor of Computer Science in 2002.

    His research focus lies somewhere in the mix of computer vision, computer graphics, and computational photography. He is a recipient of the Presidential Early Career Award for Scientists and Engineers (PECASE) and the National Science Foundation Career Award. He won both the Siemens Best Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition and the Olympus Prize at the European Conference on Computer Vision.

    Title: Describable Visual Attributes for Computer Vision


    Many recent advances in computer vision borrow from the field of taxonomy and its centuries-old techniques for describing and classifying organisms. New computer vision methods have been developed for automatically labeling objects within images with a wide range of describable visual attributes. These visual attributes can then be used for numerous applications such as identification, image search, and photo editing. This talk will discuss the advantages (and disadvantages) of visual attributes and their success in real-world domains. In addition, this talk will present new work on discovering a dictionary of part-based attributes, which focus on possibly unnameable visual attributes at object part locations. We demonstrate the usefulness of these attributes with new state-of-the-art results on verifying the identity of human faces in uncontrolled settings, identifying bird species in natural images, and determining dog breeds in iPhone photographs.

    Session Chair: Subhasis Chaudhuri, IIT Bombay, India

  • William T Freeman Massachusetts Institute of Technology, USA

    William T. Freeman is Professor of Electrical Engineering and Computer Science at MIT, working in the Computer Science and Artificial Intelligence Laboratory (CSAIL). He joined the faculty in 2001.

    His current research interests include machine learning applied to computer vision, Bayesian models of visual perception, and computational photography. He received outstanding paper awards at computer vision or machine learning conferences in 1997, 2006, 2009 and 2012. Previous research topics include steerable filters and pyramids, the generic viewpoint assumption, color constancy, computer vision for computer games, and bilinear models for separating style and content.

    He is active in the program or organizing committees of computer vision, graphics, and machine learning conferences. He was the program co-chair for ICCV 2005, and is program co-chair for CVPR 2013.

    Title: Seeing things that are hard to see


    I will describe two recent projects that both seek to reveal things in images or videos that are otherwise difficult to see. Video magnification amplifies small motion or color changes in videos, allowing a real-time "microscope" to view otherwise invisible changes. (Joint work with Michael Rubinstein, Hao-Yu Wu, Neal Wadhwa, Eugene Shih, John Guttag, and Fredo Durand.) The second project, the accidental camera, reveals information about structures outside the frame of a photograph or a video through the relatively common formation of accidental cameras. (Joint work with Antonio Torralba.)


    Session Chair: Andrew Zisserman, University of Oxford, UK

  • Katsushi Ikeuchi University of Tokyo

    Katsushi Ikeuchi is a Professor at the Institute of Industrial Science, the University of Tokyo, Tokyo, Japan. He received the Ph.D. degree in Information Engineering from the University of Tokyo, Tokyo, Japan, in 1978. After working at the Artificial Intelligence Laboratory, MIT for three years, the Electrotechnical Laboratory, MITI for five years, and the School of Computer Science, Carnegie Mellon University for ten years, he joined the University of Tokyo in 1996.

    He has served as the program/general chairman of several international conferences, including 1995 IEEE-IROS (General), 1996 IEEE-CVPR (Program), 1999 IEEE-ITSC (General) and 2003 ICCV (Program). He is an Editor-in-Chief of the International Journal of Computer Vision. He has been a Fellow of the IEEE since 1998, and was selected as a Distinguished Lecturer of the IEEE Signal Processing Society for the period 2000-2001.

    He has received several awards, including the David Marr Prize in computational vision and the IEEE Robotics and Automation Society K. S. Fu Memorial Best Transactions Paper Award. In addition, in 1992, his paper, "Numerical Shape from Shading and Occluding Boundaries," was selected as one of the most influential papers to have appeared in the Artificial Intelligence Journal within the past ten years.

    Title: E-Heritage, Cyber-Archaeology, and Cloud Museum


    We have been conducting the e-Heritage project, which converts assets of our cultural heritage into digital forms by using computer vision and computer graphics technologies. We hope to utilize such forms 1) for preservation in digital form of our irreplaceable treasures for future generations, 2) for planning and physical restoration, using digital forms as basic models from which we can manipulate data, 3) for cyber archaeology, i.e., investigation of digitized data through computer analysis, and 4) for education and promotion through multimedia content based on the digital data. This talk briefly overviews our e-Heritage projects underway in Italy, Cambodia and Japan. We will explain what hardware and software issues have arisen, how to overcome them by designing new sensors using recent computer vision technologies, as well as how to process the data using computer graphics technologies. We will also explain how to use such data for archaeological analysis and review new findings. Finally, we will discuss a new way to display such digital data by using mixed reality systems, i.e., head-mounted displays on site, connected to cloud computers.

    Session Chair: P J Narayanan, IIIT Hyderabad, India

  • Manik Varma Microsoft Research, India

    Manik Varma received a bachelor's degree in Physics from St. Stephen's College, University of Delhi in 1997 and another one in Computation from the University of Oxford in 2000 on a Rhodes Scholarship. He then stayed on at Oxford on a University Scholarship and obtained a DPhil in Engineering in 2004. Before joining Microsoft Research India, he was a Post-Doctoral Fellow at MSRI Berkeley.

    He has been an Adjunct Professor at the Indian Institute of Technology (IIT) Delhi in the Computer Science and Engineering Department since 2009 and jointly in the School of Information Technology since 2011.

    His research interests lie in the areas of machine learning, computational advertising and computer vision. He has been invited to serve as an Area Chair for machine learning and computer vision conferences such as NIPS and ICCV.

    Title: On Internet Computer Vision


    We develop computer vision and machine learning techniques for tackling vision applications on the Web. Focussing on keyword-based image search, we discuss how limitations of existing image search engines can be overcome by leveraging various types of meta-data. In particular, we demonstrate how click data can be used to go from a purely text-based query, which restricts current image search engines to primarily textual features, to a visual representation allowing computer vision techniques to be brought to bear. We also show how contextual information and Web corpora can be used to diversify image search results and produce a visually rich depiction covering various aspects of the query. Finally, we tackle the online search advertising problem of recommending a set of phrases relevant to a given advertisement by posing the task as multi-label learning with millions of categories. We develop an extremely efficient classifier, whose prediction cost is logarithmic in the number of categories, which leverages click data and the ad's HTML context, thereby allowing us to analyse visually rich ads that are beyond the pale of state-of-the-art natural language processing techniques.

    Session Chair: Bill Triggs, CNRS, France