Live Gesture Recognition

Aamod Kore, Nisheeth Lahoti, S S Kausik and Kshitij Singh Mehlawat
Computer Science and Engineering Department, IIT Bombay, Mumbai, MH 400076, INDIA

May - June 2012
This is our ITSP project, a live gesture recognition software that can detect gestures through a webcam with good accuracy. The recognition system uses an algorithm that detects the trajectories of motion of nearby objects and reduces stray noise to extract the path of the gesture. It uses the OpenCV library for basic utilities like streaming input from the webcam. The project was displayed at IIT Bombay's Tech Exhibition and was covered on a national news channel. It received the best coding project award under the ITSP scheme.

Project Abstract



We wish to build a Linux-based application for live motion gesture recognition using webcam input, written in C++. This project is a combination of live motion detection and gesture identification. The application uses the webcam to detect gestures made by the user and perform basic operations accordingly. The user performs a particular gesture; the webcam captures this, the application identifies the gesture, recognises it against a set of known gestures, and performs the action corresponding to it. This application can run in the background while the user runs other programs and applications, which makes it very useful as a hands-free approach. While it may not be of great use for browsing the web or writing a text document, it is useful in a media player and while reading documents or files. A simple gesture could pause or play a movie or increase the volume even while sitting away from the computer screen. One could easily scroll through an eBook or a presentation even while having lunch.

The project essentially consists of four parts:

We have used the OpenCV library for handling and manipulating input from the webcam. The difference between subsequent frames helps detect motion. We included further modifications to eliminate noise, so that only the moving object (hand or finger) is interpreted. The interpreted gesture is scanned against a set of known gestures to find which gesture matches best. The action corresponding to that gesture, either a system command or a keystroke, is then performed.
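As a rough illustration of the motion-detection step, the sketch below applies frame differencing and colour quantisation to raw RGB buffers. This is a minimal stand-alone version, not the project's code: the actual implementation works on OpenCV image frames, and the threshold value of 128 here is an assumed parameter.

```cpp
#include <cstddef>
#include <vector>

// One 8-bit RGB pixel.
struct Pixel { unsigned char r, g, b; };

// Per-channel absolute difference between two frames of equal size:
// moving regions yield large values, static background stays near zero.
std::vector<Pixel> frameDiff(const std::vector<Pixel>& prev,
                             const std::vector<Pixel>& cur) {
    std::vector<Pixel> out(cur.size());
    for (std::size_t i = 0; i < cur.size(); ++i) {
        int dr = int(cur[i].r) - int(prev[i].r);
        int dg = int(cur[i].g) - int(prev[i].g);
        int db = int(cur[i].b) - int(prev[i].b);
        out[i].r = (unsigned char)(dr < 0 ? -dr : dr);
        out[i].g = (unsigned char)(dg < 0 ? -dg : dg);
        out[i].b = (unsigned char)(db < 0 ? -db : db);
    }
    return out;
}

// Snap each channel to 0 or 255, reducing the image to eight colours
// (black, red, green, blue, cyan, magenta, yellow, white). Pixels that
// are dark in all channels come out black, i.e. the neutral "grey"
// pixels that get discarded as background.
Pixel octachrome(Pixel p, unsigned char thresh = 128) {
    Pixel q;
    q.r = p.r >= thresh ? 255 : 0;
    q.g = p.g >= thresh ? 255 : 0;
    q.b = p.b >= thresh ? 255 : 0;
    return q;
}
```

For example, a strong difference pixel such as (140, 10, 0) quantises to pure red (255, 0, 0), while a small difference like (10, 5, 0) becomes black and is treated as static background.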


Project Team


Piyush Kumar

Team Members

  1. Aamod Kore
  2. Kshitij Singh
  3. Nisheeth Lahoti
  4. S S Kausik


The code continuously streams video from the webcam and processes the frames of the video to recognise the gesture. This is done by subtracting the RGB values of the pixels of the previous frame from the RGB values of the pixels of the current frame. This difference image is then converted to octachrome (8 colours only: red, blue, green, cyan, magenta, yellow, white, black), which makes most of the pixels neutral or grey. This is followed by the greying of those pixels not surrounded by 20 non-grey pixels, in the function crosshair(IplImage* img1, IplImage* img2). The non-grey pixels that remain represent proper motion, and noise is eliminated.

A database is provided with the code which contains a set of points for each gesture. As the user performs the gesture, a set of points is generated (using the average of the x and y coordinates of non-grey pixels in each frame), which is matched against the gestures in the database to find the best match. To match the gestures, the points are appropriately scaled as per their standard deviation, and then the corresponding points of the user's gesture and those from the database are compared. The gesture which has the least sum of squares of the differences between the corresponding points is returned as the match. According to the gesture recognised, a certain set of commands is executed, like executing a keystroke or a particular system command.
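The matching step can be sketched in plain C++ as follows. This is an illustrative reconstruction, not the project's actual code: it assumes the user's trajectory and each database gesture have already been resampled to the same number of points, and the function names (normalise, bestMatch) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Shift a trajectory to zero mean and scale it by the standard
// deviation of its coordinates, so matching is invariant to where on
// screen and how large the gesture was drawn.
std::vector<Point> normalise(const std::vector<Point>& pts) {
    double mx = 0, my = 0;
    for (const Point& p : pts) { mx += p.x; my += p.y; }
    mx /= pts.size(); my /= pts.size();
    double vx = 0, vy = 0;
    for (const Point& p : pts) {
        vx += (p.x - mx) * (p.x - mx);
        vy += (p.y - my) * (p.y - my);
    }
    double sx = std::sqrt(vx / pts.size());
    double sy = std::sqrt(vy / pts.size());
    if (sx == 0) sx = 1;  // guard against degenerate (flat) trajectories
    if (sy == 0) sy = 1;
    std::vector<Point> out(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i)
        out[i] = { (pts[i].x - mx) / sx, (pts[i].y - my) / sy };
    return out;
}

// Sum of squared differences between corresponding points of two
// equally long, already-normalised trajectories.
double ssd(const std::vector<Point>& a, const std::vector<Point>& b) {
    double s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double dx = a[i].x - b[i].x, dy = a[i].y - b[i].y;
        s += dx * dx + dy * dy;
    }
    return s;
}

// Index of the stored gesture with the smallest SSD to the user's.
std::size_t bestMatch(const std::vector<Point>& user,
                      const std::vector<std::vector<Point>>& db) {
    std::vector<Point> u = normalise(user);
    std::size_t best = 0;
    double bestScore = ssd(u, normalise(db[0]));
    for (std::size_t i = 1; i < db.size(); ++i) {
        double score = ssd(u, normalise(db[i]));
        if (score < bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```

Because of the normalisation, a diagonal stroke drawn twice as large in a different corner of the frame still matches the stored diagonal gesture exactly, with an SSD of zero.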


Usage and Implementation

Getting the code running is very simple. However, sometimes the executable files do not run correctly, in which case the code has to be compiled before running. The packages required for compiling the code are gcc, opencv-doc, libcv2.1, libhighgui2.1, libcvaux2.1, libcv-dev, libcvaux-dev, libhighgui-dev, libx11-dev and libxtst-dev. These packages can be collectively installed from the Synaptic Package Manager or using individual system commands:

		$ sudo apt-get install [package-name]

After installing all the packages, download the file, uncompress it, go to the directory and run the installer:

		$ tar -zxvf glive.tar.gz
		$ cd glive
		/glive$ ./installer

Running the installer in turn compiles all the other required files, provided the required libraries are installed correctly and are up-to-date.

Alternatively, if the file 'installer' does not run correctly, you can compile all the files individually using the command:

		$ g++ `pkg-config opencv --cflags` [filename].cpp -o [filename] `pkg-config opencv --libs` -lX11 -lXtst

The files to be compiled are: initialize.cpp, main.cpp, gesture.cpp, addgesture.cpp, checkgesture.cpp and delgesture.cpp.

Before beginning, run the file initialize, then start the main program gesture:

		$ ./initialize
		$ ./gesture

Already existing gestures and their functions can be checked by running the command:
		$ ./checkgesture m
		 Load Succesfull : scmmd.bin
		 Gesture : m
		 Command : firefox &

		 ... Press ESC to continue ...

New gestures can be added by the command:

		$ ./addgesture [gesture-character] [system-command]

			 for example:
		$ ./addgesture z google-chrome

The gesture character must be a single character like 'w' or 'n', i.e. something like 'star' will not do. Also, the system command must be a valid system command. If a gesture already exists for the character, it will be overwritten. Some characters have gestures fixed by default, like 'u', 'd', 'l', 'r', 's', 'm', 'o' etc., and these cannot be overwritten.

For a description of individual gestures and further details, read the User Manual before using the program.



NIL. It's completely a coding project.

Future Prospects and Modifications

This project has a vast arena of development, notably the SixthSense project of Pranav Mistry, which completely revolutionises the digital world. The code can be extended to incorporate mouse movements as well as still gestures. Further tweaks can be made to increase the efficiency of the gesture recognition process. The code can be improved for better interpretation and recognition of gestures, and newer gestures may be incorporated for more functionality. The user interface for adding and checking gestures, as well as for running the program, can be improved greatly, e.g. by providing an interactive GUI rather than using terminal commands.
