GPU-based Global Illumination for Point Models using Fast Multipole Method

Home Publications Academics Upcoming conferences & more Work-Experience Personal Blog & Links

ABSTRACT

Advances in scanning technologies and rapidly growing complexity of geometric objects motivated the use of point-based geometry as an alternative surface representation, both for efficient rendering and for flexible geometry processing of highly complex 3D-models. Based on their fundamental simplicity, points have motivated a variety of research on topics such as shape modeling, object capturing, simplification, rendering and hybrid point-polygon methods.

Global Illumination for point models is an upcoming and an interesting problem to solve. We use the Fast Multipole Method (FMM), a robust technique for the evaluation of the combined effect of pairwise interactions of $n$ data sources, as the light transport kernel for inter-reflections, in point models, to compute a description -- illumination maps -- of the diffuse illumination. FMM, by itself, exhibits high amount of parallelism to be exploited for achieving multi-fold speed-ups.

Graphics Processing Units (GPUs), traditionally designed for performing graphics specific computations, now have fostered considerable interest in doing computations that go beyond computer graphics; general purpose computation on GPUs, or "GPGPU". GPUs may be viewed as data parallel compute co-processors that can provide significant improvements in computational performance especially for algorithms which exhibit sufficiently high amount of parallelism. One such algorithm is the Fast Multipole Method (FMM). This report describes in detail the strategies for parallelization of all phases of the FMM and discusses several techniques to optimize its computational performance on GPUs.

The heart of FMM lies in its clever use of its underlying data structure, the Octree. We present two novel algorithms for constructing octrees in parallel on GPUs and discuss their performance based on memory efficiency and running time. These algorithms can eventually be combined with the GPU-based parallel FMM framework.

Correct global illumination results for point models require knowledge of mutual point-pair visibility. Visibility Maps (V-maps) have been designed for the same. Parallel implementation of V-map on GPU offer considerable performance improvements and has been detailed in this report.

A complete global illumination solution for point models should cover both diffuse and specular (reflections, refractions, and caustics) effects. Diffuse global illumination is handled by generating illumination maps. For specular effects, we use the Splat-based Ray Tracing technique for handling reflections and refractions in the scene and generate Caustic Maps using optimized Photon generation and tracing algorithms. We further discuss a time-efficient kNN query solver required for fast retrieval of caustics photons while ray-traced rendering.

[report] [presentation]
MOTIVATION and PROBLEM DEFINITION

Photorealistic computer graphics attempts to match as closely as possible the rendering of a virtual scene with an actual photograph of the scene had it existed in the real world. Of the several techniques that are used to achieve this goal, physically-based approaches (i.e. those that attempt to simulate the actual physical process of illumination) provide the most striking results. My thesis is on the problem of producing global illumination (GI) solution for point models, which happens to be a photorealistic, physically-based approach central to computer graphics. For example, these three-dimensional point models may be scans of cultural heritage structures (see Figure above) and we may wish to view them in a virtual museum under various lighting conditions. Note that the point clouds of interest are not solitary models, but may consist of hard to segment entities thereby inhibiting a surface reconstruction algorithm that produces meshes.

PROBLEM DEFINITION: Capturing interreflection effects in a scene when the input is available as point samples.
WORK DONE
- We compute diffuse inter-reflections using the Fast Multipole Method}(FMM) (Done)
- Computing a mutual visibility solution for point pairs is one major and a necessary step for achieving good and correct global illumination effects (Done)
- CPU-based implementations of visibility and FMM algorithms were quite time consuming and hence did not have a high practical usage. Parallel implementation of visibility and FMM algorithms on Graphics Processing Units(GPUs) using CUDA was done so as to achieve multi-fold speedups for generating the diffuse global illumination solution (Done)
- We have three different implementations of a parallel octree construction algorithm on GPU which can be used with the GPU-based parallel FMM algorithm (Done)
- Inter-reflection effects include both diffuse (Done) and specular effects like reflections, refractions, and caustics. Capturing specular reflections is a part of work to be done in the coming year, which essentially, when combined with the diffuse inter-reflection implementation, will give a complete global illumination package for point models. Methods like Splat-based ray-tracing and Selective Photon Tracing will be used for the same.
PUBLICATIONS
TECHNICAL REPORTS
1. GPU-based, Parallel Octree Construction Algorithms, Rhushabh Goradia, Prekshu Ajmera, Sharat Chandran, Srinivas Aluru.

RESULTS

CPU-based V-map Construction (Sample Videos)

		Validates the proposed method of visibility map for point clouds
		Asserts the correctness of a visibility map in the context of global illumination

CPU-based Global Illumination for Point Models using FMM

We converge to the final Global Illumination solution shown in the figure below by performing the FMM algorithm (Upward and then Downlward passes) for all colors (RGB) three times. We can see that the solution converges to a good extent in 3 iterations.
GPU-based V-map Construction

Quality Results

Figure on the left shows a point model of an empty Cornell room with some artificial lighting to make the model visible. Note the default colors of the walls. We now introduce a bluish white Stanford bunny in the second figure, the eye (w.r.t. which visibility is being computed) is on the floor, marked with a cyan colored dot. The violet (purple) color indicates those portions of the room that are visible to this eye. Notice the "shadow" of the bunny on the back wall. The same idea is repeated with the eye (marked in cyan) placed at different locations for various different point models (all bluish white in color) of the Buddha, and an Indian Goddess Satyavathi. We found that an octree of height 8 gave us accurate visibility results, however, it is more prudent to go to higher depths such as 9 or 10. The results are visually similar to the ones generated by our CPU-based algorithm

Timing Results

Figure shows the running time of our implementation. Each graph in the figure refers to a particular model, and shows the only-CPU, and CPU-GPU combo running time for various octree levels (5-10). For example., Figure on the left (first row) shows results for the Stanford bunny in the Cornell room while the second figure shows the same for Buddha. The table also shows the number of leaves (at various depths) in the various adaptive octrees. This is an important parameter on which the degree of parallelism indirectly depends. The running times tabulated, also depends on the number of threads per block. A block size of 16x16 gives the best results with 256 threads per block.

The CPU and GPU have almost identical run times if the model has octree height of 6 or below. For best throughput from the GPU we need at least 16384 leaf-pairs to be considered concurrently which is generally not the case for octrees built till level 6. The GPU starts out performing the CPU for octrees with greater heights. However, the quality of the visibility solution is below par for octrees with heights 7 or below (refer the figure on the left below). Thus, to get a good acceptable accuracy in results and throughput from GPU, we use octree heights of at least 8. Speedup is also given in the tables. We achieve an average speed-up (across all models) of 15 when the input models are divided with octree of height 10. Thus, we see that the CUDA implementation of V-map construction algorithm is efficient and fast. Once constructed, they allow for an interactive walkthrough of the point model scene.

Dragon viewed from the floor (cyan dot). The quality is unacceptable for octrees of heights of 7 (left) or less. The figure on the right is for an octree of height 9
GPU-based Global Illumination for Point Models using FMM

Quality Results

First we compare the results obtained on the GPU for quality with the corresponding implementation on the CPU. To compare CPU and GPU implementations we use 3-d point models of bunny and Ganesha in a Cornell room each having four light sources on the ceiling. As we can see in the figure, the CPU-GPU results look identical. Global illumination effects like color bleeding and soft shadows are also clearly visible. Note that for FMM the visual quality of result does not depend on the kind of GPU used (NVIDIA 8800 GTS or Quadro FX 3700). The GPU should just support CUDA and have enough memory (more than 256Mbs).

Timing Results

The timing calculations are done on a machine having a dual core AMD Opteron 2210 processor with 2 Gbs of RAM, NVIDIA GeForce 8800 GTS with 320 Mbs of memory and Fedora Core 7 (x86\_64) installed on it. The total time taken by the upward and downward passes of the FMM algorithm for all 3 iterations and all 3 colors RGB is shown in the results below. The time taken by each iteration is approximately same.
GPU-based Parallel Adaptive Octree Construction (Top-Down)

We compare our implementation of octree on the GPU with the corresponding implementation on the CPU based on running time. We use 3-d points models of bunny and Ganesha in a Cornell room as inputs to create the octree.

We see that the GPU outperforms the CPU at higher levels. We implemented the top-down GPU-based parallel octree construction algorithm using the latest NVIDIA GPUs featuring support for atomic operations like atomic increment/decrement etc. These GPUs have G92 architecture. The machine used has a Intel Core 2 Duo 1.86 GHz with 2 Gbs of RAM, NVIDIA Quadro FX 3700 with 512 Mbs of memory and Fedora Core 7 installed on it.

The GPU-based parallel octree constructed in this fashion can be used with GPU-based parallel FMM frame-work.

CONCLUSION

Point-sampled geometry has gained significant interest due to their simplicity. The lack of connectivity touted as a plus, however, creates difficulties in many operations like generating global illumination effects. This becomes especially true when we have a complex scene consisting of several models, the data for which is available as hard to segment aggregated point-based models. Inter-reflections in such complex scenes requires knowledge of visibility between point pairs. Computing visibility for point models becomes all the more difficult, than for polygonal models, since we do not have any surface or object information.

Point-to-Point Visibility is arguably one of the most difficult problems in rendering since the interaction between two primitives depends on the rest of the scene. One way to reduce the difficulty is to consider clustering of regions such that their mutual visibility is resolved at a group level. Most scenes admit clustering, and the Visibility Map data structure we propose enables efficient answer to common rendering queries. We extended, in this report, the novel, provably efficient, hierarchical, visibility determination scheme for point based models to the highly parallel structures of modern day GPUs. By viewing this visibility map as a `preprocessing' step, photo-realistic global illumination rendering of complex point-based models have been shown. By extending the V-map construction algorithm on the GPU, efficient speed-ups have also been reported.

Further, we have used the Fast Multipole Method (FMM) as the light transport kernel for inter-reflections, in point models, to compute a description -- illumination maps -- of the diffuse illumination. Parallel implementation of FMM is a difficult task with data decomposition and communication efficiency being the major challenges. We discussed one such algorithm which uses only a static data decomposition on octrees and offers communication efficiency. We exploited the parallel computing power of GPUs for implementation of the Fast Multipole Method based radiosity kernel as well as the point-pair visibility determination algorithm using Visibility Maps to provide an efficient, fast inter-visibility and global illumination solution for point models. Different implementations of parallel octree construction on GPU were also presented which are to be eventually merged with the GPU-based parallel FMM algorithm.

A complete global illumination solution for point models should cover both diffuse and specular (reflections, refractions, and caustics) effects. Diffuse global illumination is handled by generating illumination maps. We, thus, further saw in the report how various algorithms from the literature were combined under a single domain to get us a time-efficient system designed to generate the desired specular effects for point models. We now aim to implement these algorithms, merge them together and get the specular effects solution for point models.

We, thus, will have a two--pass global illumination solver for point models. The input to the system will be a scene consisting of both diffuse and specular point models. First pass will calculate the diffuse illumination maps, followed by the second pass for specular effects. Finally, the scene will be rendered using splat-based ray-tracing technique. However, a question remains that since we are parting the diffuse and specular effect calculations for the scene, how would we handle specular objects (and their effects on diffuse objects) while calculating only diffuse global illumination (This issue is very well handled in Photon Mapping) in the first pass of the global illumination solver. This important issue needs to be investigated thoroughly.

PRESENTATION

[Poster(pdf)] [Poster(pptx)]

More information on research in computer vision, graphics and image processing at ViGIL, IIT Bombay can be found at ViGIL: Vision, Imaging and Graphics Laboratory

Last update: 25 Aug, 2008

Rhushabh Goradia