In this post we will look into a particular optimization for the raytracing algorithm, namely data structures. Once you got your raytracer up and running on the GPU, you will check for every ray, which triangles it intersects. In the last post we saw that checking all triangles for every ray is simply impossible. In order to save some time, we store the triangles in a spatial data structure, which is easier to query for a ray.
There are many possible data structures to store a set of triangles. The most frequently used ones for raytracing are: k-d trees, octrees and bounding volume hierarchies (BVH). Any one of them is going to speed up the process a LOT. If your ray would only hit the background, in both the octree and the BVH there would only be a single ray-box intersection and then we're done. That's already saving n triangle checks!
But this post wouldn't be interesting to just point to some data structures. In this post I'm going more in depth on the BVH, specifically using bounding boxes. Why bounding boxes? Because there is a very simple, computationally efficient test for ray-box intersections. No if-statements, so it's a very solid algorithm for massive parallel execution as well. Fits right in our GPU ray tracer! Now there are basically two types of BVHs. The LBVH (linear BVH) can be built really fast, but has less tracing performance than SAH (Surface Area Heuristic) BVHs. The SAH splits the set of triangles so that the surface areas of the two child spaces, weighted by the number of objects in each child, are equal.
Tero Karras from Nvidia research came up with a very cool algorithm to build LBVHs in parallel on the GPU. On the Nvidia blog you can find this and this post. It explains how you can traverse a BVH on the GPU, and some great performance tips to speed it up. Part three of the tutorial shows how you can build the LBVH on the GPU making use of the massive parallelism available. It's worth checking out the research paper on the subject, but below is an easier and even faster version. The basic principle of the paper is building a radix tree of a set of objects which are sorted using morton codes. This creates a spatial code by interleaving positional vectors in one integer or long. From this radix tree, you can precalculate where the tree would split, and thus process the whole tree in parallel.
This LBVH creation method is really fast: about 16ms for a million triangles is not uncommon. Even though, the algorithm could be way easier, as described in Fast and Simple Agglomerative LBVH Construction (download link). This paper shows that the same restrictions on the radix tree can be used to build the tree bottom-up instead of using an unnecessary top-down iteration. It shows that by comparing values with the neighbor, you can decide the parent node. It's even a bit faster than the method above, and a lot simpler to implement.
This is as fast as it's going to get for building times for BVHs. But how about tracing times? As mentioned above, the SAH is a good heuristic for ray tracing performance. This graph is an awesome comparison:
This graph is from another Nvidia paper: Fast Parallel Construction of High-Quality Bounding Volume Hierarchies. This is another paper on building LBVHs really really fast. They also mention an optimization in tracing speed by pre-splitting some large triangles. There is a lot more in depth information about building times and tracing times as well. They mention a lot of methods, but the main point from this image is to show that (H)LBVH has about 60~70% of the tracing speed in comparison with SAH BVHs. The building time is about 10~20 times slower than that of LBVHs though.
So how do we build these awesome BVHs using the SAH? Ingo Wald wrote a paper on building these kind of BVHs very fast. This algorithm is performed on the CPU in a top-down fashion. There are a lot of possibilities for performance boost which are also explained in the paper. The building times could be about 3 times as slow as building a LBVH by trading in some tracing performance for build optimizations. But, as shown in the graph above, this method still gives the best raytracing performance.
This image below shows a visualization of rays intersecting bounding boxes with 8 stanford dragon models. You can find the source code for this BVH in part 2.1.
I hope to have informed you enough about the possibilities of optimizing raytracing by using spatial data structures. There is some speculation about the creation of BVHs on the GPU, since the GPU will be working on tracing rays all the time anyway. More on this discussion can be read on ompf2.com (where you can also find an implementation of most of these methods).
Part 2.1: code sample of Fast and Simple Agglomerative LBVH Construction