Our method builds on this concept: we detect lead stars by examining the important scenes of a video. To reduce false positives and false negatives, our method clusters the faces for each important scene separately and then combines the results. Unlike other methods, ours provides a natural segmentation for clustering. Our method considerably reduces the computation time of the previously mentioned state-of-the-art method for computing lead stars in motion pictures, by a factor of 50. We apply the method to sports videos to identify the player of the match, to motion pictures to find heroes and heroines, and to TV shows to detect guests and hosts.

The first step is to find important scenes, which are marked by audio highlights. Once such scenes are identified, they are examined for potential faces. When a potential face is found in a frame, subsequent frames are analyzed, using concepts from tracking, to weed out false alarms. At this point, several image regions have been identified as faces. These confirmed faces are grouped into clusters to identify the lead stars.

Audio Highlight Detection

The intensity of a segment of an audio signal is summarized by its root-mean-square (RMS) value. The audio track of the video is divided into windows of equal size, and the RMS value is computed for each window. From the resulting RMS sequence, the RMS ratio is computed between successive items in the sequence.

The RMS ratio is compared against a user-defined threshold; in our implementation, we use 5 as the threshold, and the video frames corresponding to windows that cross this threshold are considered "important".
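A minimal sketch of this windowed RMS-ratio computation, assuming the audio samples arrive as a NumPy array; the function name and the choice of flagging windows where the ratio exceeds the threshold (a sudden rise in loudness) are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def rms_ratio_highlights(samples, window_size, threshold=5.0):
    """Split an audio track into fixed-size windows, compute the RMS of
    each window, and flag windows whose RMS ratio crosses the threshold."""
    n_windows = len(samples) // window_size
    windows = samples[:n_windows * window_size].reshape(n_windows, window_size)
    rms = np.sqrt(np.mean(windows.astype(np.float64) ** 2, axis=1))
    # Ratio of each window's RMS to the previous window's RMS.
    ratio = rms[1:] / np.maximum(rms[:-1], 1e-12)
    # Indices of windows whose ratio exceeds the threshold ("important").
    return np.flatnonzero(ratio > threshold) + 1
```

Flagged window indices can then be mapped back to the corresponding video frames.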

Finding & Tracking Potential People

Once important scenes are marked, we seek to identify people in the corresponding video segments. Fortunately, there are well-understood algorithms for detecting faces in an image frame. We select a random frame within the window and detect faces using the Viola-Jones face detector.

Every face detected in the current frame is then put up for confirmation by attempting to track it in the subsequent frames of the window. Confirmed faces are stored in a data matrix for the next processing step: the confirmed faces from each highlight i are stored in the corresponding data matrix Di, as illustrated in the Figure.

Face Dictionary Formation

In this step, the confirmed faces are grouped based on their features. There is a variety of algorithms for dimensionality reduction and subsequent grouping; we observe that Principal Component Analysis (PCA) has been used successfully for face recognition. We use PCA to extract feature vectors from Di, and we use the k-means algorithm for clustering. The number of clusters is chosen to minimize the mean squared error. Representative faces from the clusters of all highlights are clustered again to obtain the final set of clusters, and the representative faces of these clusters form the face dictionary.
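The PCA-plus-k-means step can be sketched with scikit-learn as follows; the component count, the range of k swept, and the use of per-sample inertia as the mean squared error are illustrative assumptions, since the paper does not spell out its model-selection rule:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_faces(D, n_components=10, k_range=range(2, 8)):
    """Project the rows of data matrix D (one flattened face per row)
    onto their principal components, then run k-means for each k in
    k_range and keep the clustering with the smallest mean squared
    error (inertia per sample)."""
    n_components = min(n_components, *D.shape)
    feats = PCA(n_components=n_components).fit_transform(D)
    best = None
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
        mse = km.inertia_ / len(feats)
        if best is None or mse < best[0]:
            best = (mse, km)
    return best[1].labels_
```

A representative face for each cluster (e.g. the one closest to the cluster center) can then feed the second-level clustering across highlights.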

At this point, we have a dictionary of faces, but not all of them belong to lead actors. We use the following cues to shortlist the faces that correspond to lead stars.

  1. The number of faces in the cluster. If a cluster (presumably of the same face) has a large cardinality, we give it a high weight.
  2. Position of the face with respect to the center of the image. Lead stars are expected to appear near the center of the image.
  3. Size of the detected face. Again, lead stars typically occupy a significant portion of the image.
  4. Duration for which the faces in the cluster occur in the current window, as a fraction of the window size.
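One way to combine the four cues into a single cluster score, assuming each confirmed face carries its bounding box and frame index; the dictionary field names, the normalizations, and the equal weighting of the cues are illustrative assumptions, as the paper does not give the weights:

```python
def score_cluster(faces, frame_w, frame_h, window_len):
    """Score a face cluster by the four cues: cardinality, centrality,
    face size, and duration. `faces` is a list of dicts, each holding a
    box (x, y, w, h) and the frame index at which the face occurred.
    Every term is normalized to [0, 1] and the terms are averaged."""
    n = len(faces)
    cardinality = min(n / window_len, 1.0)  # cue 1: cluster size
    cx, cy = frame_w / 2, frame_h / 2
    max_dist = (cx ** 2 + cy ** 2) ** 0.5
    centrality = sum(  # cue 2: distance of face center from image center
        1 - (((f["x"] + f["w"] / 2 - cx) ** 2 +
              (f["y"] + f["h"] / 2 - cy) ** 2) ** 0.5) / max_dist
        for f in faces) / n
    size = sum(f["w"] * f["h"] for f in faces) / (n * frame_w * frame_h)  # cue 3
    duration = len({f["frame"] for f in faces}) / window_len  # cue 4
    return (cardinality + centrality + size + duration) / 4
```

Clusters scoring above a cutoff (or the top few clusters) would then be kept as lead stars.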

The face dictionary formed for the movie Titanic is shown in the Figure below. Our method has successfully detected the lead actors in the movie. As can be seen, along with the lead stars, there are a few patches that have been misclassified as faces.