CG Acceleration Research

Abstract



The proposed research concerns hardware acceleration of interactive rendering of realistic, complex images using programmable graphics hardware. Computing images with global illumination is very expensive; the edge-and-point rendering system renders such images by sparsely sampling the scene and reconstructing the image from these sparse samples (points) while respecting perceptually important discontinuities (edges). The system produces high-quality anti-aliased images at interactive rates.


Each frame, the system finds visually important discontinuities (edges) such as silhouettes and shadows. Shading samples (points) are computed by parallel processors that ray trace the scene, including complex shading effects such as shadows and global illumination. The edge-and-point system combines edges and points through several operations: edge rasterization, point reprojection, and edge-respecting interpolation, among others. These computations are parallelizable, have high arithmetic density, and follow simple control flow, so we propose to move them to the GPU, freeing the CPU to perform other useful tasks.


Introduction



The main focus of this research is to help achieve real-time global illumination through hardware acceleration. Rendering images of complex scenes with expensive shading effects such as global illumination is slow. The edge-and-point system, introduced at SIGGRAPH 2003, renders such scenes by reconstructing pixel color from sparsely sampled points, achieving interactive performance. An important aspect of the system is that interpolation between shading samples is never done across perceptual discontinuities such as silhouettes and shadows. The system can thus render sharp, high-quality images of complex scenes at interactive rates of 8-14 frames per second.


The edge-and-point system achieves interactive speeds because image reconstruction from samples (points) and discontinuities (edges) is separated from the computation of the shading samples themselves. That is, image reconstruction is done synchronously each frame, while shading samples are computed asynchronously, as fast as possible. Since complex shading is slow, it is distributed across parallel processors that each trace a ray through the scene and compute the global illumination along that ray. Tracing a ray can be slow, because each ray may spawn many more rays to compute shading. However, we do not propose to research shading acceleration: ray tracing is extremely parallelizable and scales well with additional CPUs, and accelerating parallel ray tracing is already an active area of research.
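

To make the decoupling concrete, below is a minimal C++ sketch of this structure, assuming a shared point buffer; the names (PointSample, traceRay, reconstructFrame) are hypothetical stand-ins rather than the system's actual interfaces.

    #include <atomic>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct PointSample { float x, y, depth, rgb[3]; };

    std::mutex bufferLock;
    std::vector<PointSample> pointBuffer;  // filled asynchronously by samplers
    std::atomic<bool> running{true};

    PointSample traceRay() { return {}; }  // stand-in for expensive GI shading
    void reconstructFrame(const std::vector<PointSample>&) {}  // per-frame EPI work

    // Sampler thread: produces shading samples as fast as possible,
    // with no per-frame deadline.
    void samplerThread() {
        while (running) {
            PointSample s = traceRay();
            std::lock_guard<std::mutex> g(bufferLock);
            pointBuffer.push_back(s);
        }
    }

    int main() {
        std::thread sampler(samplerThread);        // asynchronous shading
        for (int frame = 0; frame < 100; ++frame) {
            std::vector<PointSample> snapshot;
            {
                std::lock_guard<std::mutex> g(bufferLock);
                snapshot = pointBuffer;            // whatever samples exist so far
            }
            reconstructFrame(snapshot);            // synchronous, once per frame
        }
        running = false;
        sampler.join();
        return 0;
    }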


Image reconstruction from edges and points, however, is done entirely on a single processor and could be accelerated by exploiting programmable graphics hardware, because the per-frame computations performed to reconstruct an image are highly parallelizable and have high arithmetic density with simple control flow. By carefully designing vertex and fragment shaders, it is possible to program the GPU to perform these per-frame computations efficiently, independently of the CPU. As a result, the CPU is freed to perform more complex tasks. By balancing the load between the CPU and GPU, it may be possible to collect more samples from the scene, improving both performance and image quality.


Proposed Research Project



In this research, we will address the issue of achieving real-time global illumination by using GPUs to evaluate the main components of the algorithm: finding discontinuities, combining the samples (points) and discontinuities (edges) with depth culling, determining which samples are reachable from a pixel (reachability), interpolating reachable samples, and anti-aliasing.


In the edge-and-point system, an intermediate display image, called the edge-and-point image (EPI), is used to interpolate between sparsely sampled points while respecting edges. The EPI is created by the following computational steps:


- Silhouette and shadow discontinuities are first found using hierarchical interval-based trees.


- Then, the discontinuity edges are rasterized onto the EPI with sub-pixel accuracy, and each pixel records where these edges cross it. In the process, occluded edges are culled.


- Finally, returned shading samples are reprojected onto the image. When this process is complete, both edge and sample (point) information is compactly represented in the EPI; a sketch of such a per-pixel record follows this list.
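

As a concrete illustration of the EPI's contents, here is a minimal C++ sketch of the kind of per-pixel record it might hold; the field names are hypothetical and the system's actual encoding may differ.

    #include <cstdint>

    // One EPI pixel: sub-pixel edge-crossing information plus the shading
    // sample (point) reprojected into this pixel, if any.
    struct EpiPixel {
        uint8_t crossingMask;    // which of the four pixel sides an edge crosses
        float   crossing[4];     // sub-pixel crossing position along each side
        bool    hasSample;       // was a shading sample reprojected here?
        float   sampleDepth;     // depth, used to cull occluded or stale samples
        float   sampleColor[3];  // the sample's shaded color
    };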


Once the EPI is constructed, it is used to reconstruct the final output image, as follows:


- A reachability map is created. The reachability map stores whether or not each sample in a 5x5 pixel neighborhood is reachable from a given pixel, where a sample is reachable if no discontinuity edge separates it from the pixel. This information is used to interpolate the sparsely sampled points to find the color of a pixel while respecting perceptual discontinuities (see the interpolation sketch after this list).


- Because discontinuity information is already present, antialiasing can be performed relatively cheaply.
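

To illustrate edge-respecting interpolation, the following C++ sketch assumes the reachability map stores a 25-bit mask per pixel over its 5x5 neighborhood (bit i set means neighbor i is reachable). The names and the simple distance-based weights are illustrative, not the system's exact scheme.

    #include <cstdint>

    struct Color { float r, g, b; };

    Color interpolatePixel(int px, int py, int width, int height,
                           const uint32_t* reachability,  // one 5x5 mask per pixel
                           const bool* hasSample,          // per-pixel sample flags
                           const Color* sampleColor) {     // per-pixel sample colors
        Color sum{0, 0, 0};
        float wsum = 0;
        uint32_t mask = reachability[py * width + px];
        int bit = 0;
        for (int dy = -2; dy <= 2; ++dy)
            for (int dx = -2; dx <= 2; ++dx, ++bit) {
                int x = px + dx, y = py + dy;
                if (x < 0 || y < 0 || x >= width || y >= height) continue;
                if (!(mask & (1u << bit))) continue;  // separated by an edge
                int i = y * width + x;
                if (!hasSample[i]) continue;          // no sample in that pixel
                float w = 1.0f / (1.0f + dx * dx + dy * dy);  // distance falloff
                sum.r += w * sampleColor[i].r;
                sum.g += w * sampleColor[i].g;
                sum.b += w * sampleColor[i].b;
                wsum += w;
            }
        if (wsum > 0) { sum.r /= wsum; sum.g /= wsum; sum.b /= wsum; }
        return sum;  // edge-respecting interpolated pixel color
    }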


All of the operations outlined above are currently performed on the CPU. However, they can be accelerated on the GPU. We have already implemented silhouette edge finding on the GPU and found that it performs these operations much faster.
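

For reference, the per-edge silhouette test itself is simple. The C++ sketch below uses the standard definition: an edge is a silhouette when one adjacent face is front-facing and the other back-facing with respect to the eye. The types and names are illustrative; on the GPU, the normals would be read from an edge texture by a fragment program.

    struct Vec3 { float x, y, z; };

    static float dot(const Vec3& a, const Vec3& b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    // n1, n2: normals of the two faces adjacent to the edge.
    // toEye:  vector from a point on the edge toward the eye.
    bool isSilhouette(const Vec3& n1, const Vec3& n2, const Vec3& toEye) {
        // The signs differ exactly when one face is front-facing
        // and the other is back-facing.
        return dot(n1, toEye) * dot(n2, toEye) < 0.0f;
    }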


Objective



The primary objective is to move the remaining per-frame computations onto the GPU. This can be done as described below:


- Object silhouettes change as the view changes, so every edge must be checked each frame to see whether it is a silhouette (the test itself is sketched above). Since edges and their adjacent face normals do not change (unless the scene is edited), we can encode the edge data into textures and store it on the GPU. Each frame, we can then set up a rendering pass so that a fragment program executes the silhouette test on each edge stored in the texture.


- Shadows are cast by the light source and are view-independent, so we need to compute shadow discontinuities only when an object is moved. Analytically finding shadow discontinuities is quite complex and is not well suited to the GPU, so shadow edge finding should remain on the CPU while the GPU works on other computations.


- Edge rasterization can also be performed on the GPU. For every edge, we can generate fragments for the pixels it touches; a fragment program then performs intersection tests to record edge crossings with sub-pixel accuracy (see the crossing-test sketch after this list).


- We are not yet certain how the reachability map can be computed on the GPU. However, it is important to note that the performance of reachability finding, interpolation, and antialiasing depends on the image size. While the CPU's cost grows with image size, GPUs scale much better in this respect, so moving reachability finding to the GPU should be very beneficial.
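

To make the edge-rasterization step concrete, here is a minimal C++ sketch of the kind of crossing test a fragment program would perform for each covered pixel; it handles one vertical pixel side, and the other three sides are analogous. All names are illustrative.

    struct Vec2 { float x, y; };

    // Returns true if the screen-space edge segment (a, b) crosses the
    // vertical pixel side x = sideX within [yMin, yMax], writing the
    // sub-pixel crossing height to *yOut.
    bool crossesVerticalSide(Vec2 a, Vec2 b,
                             float sideX, float yMin, float yMax,
                             float* yOut) {
        if (a.x == b.x) return false;  // degenerate: parallel to the side
        if ((a.x - sideX) * (b.x - sideX) > 0.0f)
            return false;              // both endpoints on the same side
        float t = (sideX - a.x) / (b.x - a.x);  // parametric crossing point
        float y = a.y + t * (b.y - a.y);
        if (y < yMin || y > yMax) return false; // misses this pixel side
        *yOut = y;                     // sub-pixel accurate crossing position
        return true;
    }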


The secondary objective is to optimize for performance. The goal is to offload work from the CPU to the GPU to help achieve interactive rendering rates.


Challenges



Programmable GPUs are relatively new; in particular, writing shaders to perform custom, general-purpose computations on the GPU is a new idea. There will therefore be challenges in learning the specific details of the GPU and its new programming model.


There are also various engineering problems associated with optimizing performance. For instance, communication between the CPU and the GPU (over the AGP bus) incurs a significant performance penalty. It may therefore be necessary to keep as much data as possible on the GPU and communicate with the CPU only when absolutely necessary.


Bibliography



K. Bala, B. Walter, and D. Greenberg. Combining Edges and Points for Interactive High-Quality Rendering. Proc. SIGGRAPH '03, 2003.



