AutoMontage - Photo Sessions Made Easy

Taking pictures is no longer the preserve of professional photographers. Almost everyone has a good camera and takes a lot of photographs. Group photographs taken as a photo session at reunions, conferences, weddings, and so on are de rigueur. It is difficult, however, for novice photographers to capture good expressions at the right time and produce a single acceptable picture.

A video shoot of the same scene ensures that expressions are not missed. Sharing the video, however, may not be the best solution. Besides the obvious bulk of the video, poor expressions (false positives) are also captured willy-nilly and might prove embarrassing. A good compromise is to produce a mosaiced photograph that assembles the good expressions and discards the poor ones. This can be achieved by cumbersome manual editing; in this paper, we provide an automated solution, illustrated in the Figure. The photo shown was created from a random YouTube video excerpt and does not exist in any frame of the original video.

Technical Contributions

The technical contributions of this work include:

A frame analyzer that uses camera panning motion to generate candidate frames
An expression analyzer for detected faces
A photo mosaicer that enables seamless placement of faces for group photos

Similar work on photographs is presented in [1], where the user selects the part they want from each photo and the parts are merged using graph cut to create a final photo. Our work differs from [1] in how the interesting part is selected from different frames: we use facial expressions to select the faces used in the final photo montage. Hence the complete process is automated.

Methodology

The first step is to identify the frames that can be merged to create a mosaic. We merge frames that share the same camera movement, so that mosaicing is effective. We then measure the facial expression of every face and select the faces with the best expression. After alignment, the selected faces are substituted into the mosaic image using graph cut and blending techniques. The following Figure illustrates these steps.

The steps involved in photo montage creation are:

Photo Mosaic Creation
Facial Expression Measurement
Final Montage Creation

Photo Mosaic Creation

In our work, we first detect the direction of camera movement by tracking the positions and sizes of the faces. For each shot, we select frames until the camera direction changes and then mosaic the selected frames. This frame selection ensures that the mosaic is not fuzzy. The mosaic is then processed further to create the final photo montage.
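The frame selection step could be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a face tracker already supplies one horizontal face-centre coordinate per frame, it ignores face sizes, and the function name `split_by_camera_direction` and the `min_shift` jitter threshold are hypothetical.

```python
import numpy as np

def split_by_camera_direction(face_x_centers, min_shift=1.0):
    """Split frame indices into runs with a consistent horizontal drift.

    face_x_centers: one tracked face-centre x coordinate per frame
                    (assumed to come from an external face tracker).
    min_shift:      hypothetical threshold; smaller shifts count as jitter.
    Returns a list of (first_frame, last_frame) index pairs, inclusive.
    """
    x = np.asarray(face_x_centers, dtype=float)
    if x.size == 0:
        return []
    segments = []
    start = 0
    direction = 0  # 0 = unknown, +1 = drifting right, -1 = drifting left
    for i in range(1, len(x)):
        shift = x[i] - x[i - 1]
        if abs(shift) < min_shift:
            continue  # ignore jitter, keep the current run going
        step = 1 if shift > 0 else -1
        if direction == 0:
            direction = step
        elif step != direction:
            # The drift reversed: close the current run, start a new one.
            segments.append((start, i - 1))
            start, direction = i, step
    segments.append((start, len(x) - 1))
    return segments
```

Each returned run is a candidate set of frames to pass to the mosaicing stage.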

Facial Expression Measurement

Measuring facial expression plays an important role, as a good facial expression yields a good photo summary. Facial expression is measured as the deviation from a neutral expression, as illustrated in the Figure. We manually collected around one hundred neutral-expression faces to train our system. All these faces are aligned using their facial features, which are detected from the non-skin regions within the face.

Neutral expression is learned as follows:

Lighting compensation is applied to the neutral images by mapping normalized image values in the range [0.3, 1] to the range [0, 1].
The mean image is computed and subtracted from the neutral faces.
Principal Component Analysis is performed and the top vectors are selected to represent the neutral expression. In our experiments, we chose the top thirty vectors.
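The lighting compensation in the first step can be read as a linear stretch of the intensity range. The sketch below is one such reading. The text does not say what happens to values below 0.3, so clipping them to 0 is an assumption, and the function name `compensate_lighting` is hypothetical.

```python
import numpy as np

def compensate_lighting(image, low=0.3, high=1.0):
    """Linearly map normalized intensities in [low, high] to [0, 1].

    Values below `low` are clipped to 0 (an assumption; the source
    only specifies the mapping for the range [0.3, 1]).
    """
    img = np.asarray(image, dtype=float)
    return np.clip((img - low) / (high - low), 0.0, 1.0)
```

For example, an input intensity of 0.65 lies halfway through the input range and is mapped to 0.5.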

To compute the Eigen face matrix, all P neutral faces of dimension m x n are first converted to one-dimensional arrays of length mn and stacked to form a face matrix of dimension P x mn, as shown in the following Figure.

Principal Component Analysis (PCA) is performed on the face matrix of dimension P x mn to get an Eigen vector matrix of dimension mn x mn. The top Q Eigen vectors are selected to form the Principal Component vectors (top Eigen vectors), of dimension mn x Q. This is illustrated in the following Figure.

As shown in the following Figure, the face matrix is then projected onto this Eigen vector space to get the Eigen Face matrix of dimension P x Q.
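The matrix shapes in this procedure can be checked with a short NumPy sketch. It is an illustration under stated assumptions rather than the authors' code: it obtains the top Eigen vectors from a singular value decomposition of the mean-subtracted face matrix instead of forming the mn x mn matrix explicitly, and the function name `learn_neutral_space` is hypothetical.

```python
import numpy as np

def learn_neutral_space(faces, q=30):
    """Learn a neutral-expression subspace from aligned neutral faces.

    faces: array of shape (P, m, n), already lighting-compensated.
    q:     number of top Eigen vectors to keep (thirty in the paper).
    Returns (mean_face [mn], components [mn x Q], eigen_faces [P x Q]).
    """
    faces = np.asarray(faces, dtype=float)
    p = faces.shape[0]
    face_matrix = faces.reshape(p, -1)        # P x mn
    mean_face = face_matrix.mean(axis=0)      # mn
    centered = face_matrix - mean_face        # subtract the mean image
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value (SVD used here as an assumption, see lead-in).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:q].T                     # mn x Q
    eigen_faces = centered @ components       # P x Q
    return mean_face, components, eigen_faces
```

With P = 100 neutral faces and Q = 30, `components` has one column per retained Eigen vector and `eigen_faces` has one row per training face.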

Once the top Eigen vectors and the Eigen Face matrix have been calculated, the facial expression measure is computed for each detected face as follows:

Lighting compensation is applied to the face by mapping normalized image values in the range [0.3, 1] to the range [0, 1].
The Eigen face is computed by projecting the one-dimensional array of the face into the Eigen vector space. This yields an Eigen face of dimension $1 \times Q$. This is illustrated in the following Figure.
The Euclidean distance between the Eigen face and the mean of the Eigen Face matrix is computed. This is used as one measure of facial expression.
The Euclidean distance between the Eigen face and each face of the Eigen Face matrix is computed, and the minimum value is used as a second measure of facial expression.
The final facial expression measure is computed as a weighted sum of the two distance measures. In our experiments, we give equal weight to both.

As faces with a full facial expression show more skin area, the amount of skin area in the face is also calculated as an additional measure and given a slight weight.
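The expression measure can be sketched as below. This is a minimal illustration with several assumptions: the face is already lighting-compensated and flattened, the mean face is subtracted before projection, the skin term is passed in as a precomputed fraction, and the names `expression_measure`, `skin_fraction` and `w_skin` (with its value 0.1) are hypothetical. The paper states only that the two distances are weighted equally and that the skin area receives a slight weight.

```python
import numpy as np

def expression_measure(face_vec, mean_face, components, eigen_faces,
                       w_mean=0.5, w_nearest=0.5,
                       skin_fraction=0.0, w_skin=0.1):
    """Score how far a face deviates from the learned neutral faces.

    face_vec:    flattened face of length mn.
    mean_face:   mean neutral face of length mn.
    components:  top Eigen vectors, shape mn x Q.
    eigen_faces: projected neutral faces, shape P x Q.
    """
    face_vec = np.asarray(face_vec, dtype=float)
    # Project the face into the Eigen vector space (1 x Q).
    projected = (face_vec - mean_face) @ components
    # Distance to the mean of the neutral Eigen faces.
    d_mean = np.linalg.norm(projected - eigen_faces.mean(axis=0))
    # Distance to the closest neutral Eigen face.
    d_nearest = np.linalg.norm(eigen_faces - projected, axis=1).min()
    # Weighted sum; the skin term is an assumed additive bonus.
    return w_mean * d_mean + w_nearest * d_nearest + w_skin * skin_fraction
```

A face that coincides with one of the training faces gets a nearest-neighbour distance of zero, so its score depends only on its distance from the neutral mean.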

Photo Montage Creation

For each shot, a photo montage is created by placing the best-expression faces into the mosaic created for that shot. We use a placement strategy that inserts the faces into the mosaic image so that artifacts are minimal. This is illustrated in the following Figure.

The photo montage is created as follows:

The coordinates of each detected face are mapped to the corresponding coordinates in the mosaic image.
Faces that share the same mosaic coordinates are grouped, and the face with the maximum facial expression measure is selected.
The selected faces are placed into the mosaic image using the following placement strategy:
1. Each selected face is aligned with the corresponding face in the mosaic image using cross-correlation. This is needed because the face position may shift between frames; otherwise the face might be slightly displaced from the body.
2. On each side of the face, graph cut is used to find the boundary of the new face. This reduces rectangular artifacts around the face boundaries, as illustrated in the following Figure.
3. Around the graph-cut boundary, the mosaic image and the selected face are blended to get a smooth transition.
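The alignment in step 1 could look like the following brute-force search. It is a sketch under assumptions, not the authors' method in detail: it uses normalized cross-correlation on grayscale patches, searches only a small window of integer shifts, and expects the mosaic region to be padded by `max_shift` pixels on every side. The name `best_offset` and the parameter `max_shift` are hypothetical.

```python
import numpy as np

def best_offset(face, region, max_shift=3):
    """Find the (dy, dx) shift that best aligns `face` inside `region`.

    face:   grayscale patch of shape (h, w).
    region: mosaic patch of shape (h + 2*max_shift, w + 2*max_shift),
            centred on the expected face position.
    """
    face = np.asarray(face, dtype=float)
    region = np.asarray(region, dtype=float)
    h, w = face.shape
    f = face - face.mean()
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y0, x0 = dy + max_shift, dx + max_shift
            patch = region[y0:y0 + h, x0:x0 + w]
            p = patch - patch.mean()
            denom = np.linalg.norm(f) * np.linalg.norm(p)
            # Normalized cross-correlation; flat patches are skipped.
            score = (f * p).sum() / denom if denom > 0 else -np.inf
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

The returned offset is added to the mapped mosaic coordinates of the face before the graph cut and blending steps.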

We use the graph cut technique developed by Yuri Boykov et al. [2]. The corresponding region of the mosaic image is extracted, and graph cut is performed on each side of the face boundary. The boundary pixels are tied to the source node with a high weight ($\infty$). Similarly, the inner side of the face is tied to the destination node with a high weight ($\infty$). The in-between nodes are assigned the absolute difference of gradient level. Once the graph cut is computed, the images are blended around the cut to give a smooth transition.
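The graph construction for one side of the face can be illustrated on a small strip of pixels. The sketch below is not the algorithm of [2]: it substitutes a plain Edmonds-Karp max-flow, which is adequate only for tiny strips. It also assumes that the per-pixel cost is a precomputed absolute gradient difference between the mosaic and the new face, and that the capacity between two neighbouring pixels is the sum of their costs. The name `seam_labels` is hypothetical.

```python
from collections import deque

def seam_labels(cost):
    """Label each pixel of a strip as mosaic side (True) or face side (False).

    cost: H x W list of non-negative per-pixel costs, W >= 2. The left
    column is tied to the source and the right column to the sink,
    both with infinite capacity.
    """
    h, w = len(cost), len(cost[0])
    if w < 2:
        raise ValueError("strip must be at least two pixels wide")
    src, sink = h * w, h * w + 1
    cap = [dict() for _ in range(h * w + 2)]

    def add_edge(u, v, c):
        # Undirected edge: the same capacity in both directions.
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v][u] = cap[v].get(u, 0) + c

    for y in range(h):
        add_edge(src, y * w, float("inf"))
        add_edge(y * w + w - 1, sink, float("inf"))
        for x in range(w):
            if x + 1 < w:
                add_edge(y * w + x, y * w + x + 1, cost[y][x] + cost[y][x + 1])
            if y + 1 < h:
                add_edge(y * w + x, (y + 1) * w + x, cost[y][x] + cost[y + 1][x])

    def bfs():
        # Breadth-first search over edges with remaining capacity.
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == sink:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Pixels still reachable from the source lie on the mosaic side.
    return [[(y * w + x) in parent for x in range(w)] for y in range(h)]
```

The boundary between True and False labels is the seam along which the new face is joined to the mosaic; blending is then applied around that seam.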