Cycles supports multiple ray-tracing acceleration structures, depending on the device. When rendering with multiple devices, a different BVH may be built for each.
- Embree for CPUs
- OptiX BVH for hardware ray-tracing on NVIDIA GPUs
- Custom BVH implementation for everything else, like CUDA and HIP devices
Long term, we would like to remove the custom BVH, as we believe the CPU and GPU vendors can do a better job of optimizing ray-tracing for their devices. For now we have to keep it as a fallback.
Our custom BVH has poor performance with (fast moving) motion blur compared to the acceleration structures used on other devices, as we only support motion blur at the primitive level, not at intermediate nodes. This is another reason to replace it.
Rays are traced for various purposes, and each has its own behavior.
- Closest hit intersections, for camera and indirect light rays. One intersection is recorded, the closest one to the ray starting position.
- Shadow intersections. If an intersection is found with a primitive known to have an opaque material, traversal is stopped immediately and the ray is considered fully occluded. If an intersection with a (potentially) transparent primitive is found, we record up to N closest transparent intersections, to have their shaders evaluated. If more than N transparent intersections are found, the ray will be traced again from a further starting point.
- Subsurface scattering intersections. Similar to closest hit intersections, but only intersections within the same object are considered. For this reason, objects with subsurface scattering are always considered "instanced", such that they have a dedicated BVH that can be intersected directly.
- Ambient occlusion shader node intersections. Same as subsurface scattering intersections, finds a single closest hit.
- Bevel shader node intersections. Finds up to N intersections within the same object, near the current shading point. If more than N intersections are found, reservoir sampling is used to probabilistically pick a subset.
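The reservoir sampling mentioned for bevel intersections can be illustrated with a minimal sketch. This is the textbook "Algorithm R", not the Cycles kernel code; the function name `reservoir_sample` and its parameters are hypothetical, and the actual implementation may differ in how it draws random numbers and stores hits.

```python
import random


def reservoir_sample(hits, max_hits, rng):
    """Keep a uniform random subset of at most max_hits items from a stream."""
    reservoir = []
    for num_seen, hit in enumerate(hits, start=1):
        if len(reservoir) < max_hits:
            # The first max_hits items are always stored.
            reservoir.append(hit)
        else:
            # Keep the new item with probability max_hits / num_seen,
            # overwriting a uniformly chosen stored item.
            slot = rng.randrange(num_seen)
            if slot < max_hits:
                reservoir[slot] = hit
    return reservoir


# Illustrative use: 10 candidate intersections, room for only 4.
rng = random.Random(0)
picked = reservoir_sample(range(10), 4, rng)
```

The appeal of this approach is that the number of intersections does not need to be known in advance, and memory use stays fixed at N entries regardless of how many hits traversal finds.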
Rays starting from a triangle or curve may self-intersect. To avoid this problem, the ray start position gets a small offset along the geometric normal, similar to "A Fast and Robust Method for Avoiding Self-Intersection". However, due to precision issues with instancing and curves, we've had to increase this offset quite significantly, which can cause artifacts in other places.
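As a rough illustration of the idea, the sketch below pushes the ray origin off the surface along the geometric normal, on the side the ray is heading toward. It uses a fixed epsilon for simplicity, which is an assumption made here; the referenced method scales the offset with the magnitude of the position, and the Cycles code differs again. The names `offset_ray_origin`, `P`, `Ng`, `D` and `eps` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def offset_ray_origin(P, Ng, D, eps=1e-4):
    """Move origin P by eps along geometric normal Ng, toward the side of direction D."""
    # Flip the offset when the ray leaves through the back side of the surface.
    sign = 1.0 if dot(Ng, D) >= 0.0 else -1.0
    return tuple(p + sign * eps * n for p, n in zip(P, Ng))


# A reflected ray leaving upward and a transmitted ray leaving downward.
up = offset_ray_origin((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
down = offset_ray_origin((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0))
```

A larger `eps` is more robust against self-intersection but moves the origin further from the true surface, which is the source of the artifacts described above.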
When primitives are intersected by a ray from a large distance, precision may be poor. To improve this, we re-intersect triangles from a closer distance, which gives a more precise point of intersection. This helps avoid some cases of self-intersection.
Primitives have a visibility bitmask, matched against a ray bitmask to determine if they should be intersected.
This is used to implement ray visibility settings on objects. It is also used for the shadow catcher, where synthetic objects are excluded for certain rays.
For instancing and dynamic updates, we build a two-level BVH. The BVHs are built independently for each mesh, and then a top-level object BVH instances these meshes. This reduces tree quality, but for instances it leads to lower memory usage, and for dynamic updates it gives faster rebuilds as objects are transformed, added or removed.
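The bitmask test itself is a single bitwise AND. The sketch below shows the principle only; the flag names and bit values are hypothetical and do not correspond to the constants used in the Cycles source.

```python
# Hypothetical ray-type bits; names and values are illustrative only.
RAY_CAMERA = 1 << 0
RAY_DIFFUSE = 1 << 1
RAY_SHADOW = 1 << 2
ALL_RAYS = RAY_CAMERA | RAY_DIFFUSE | RAY_SHADOW


def is_visible(prim_visibility, ray_visibility):
    """A primitive is intersected only if it shares at least one bit with the ray."""
    return (prim_visibility & ray_visibility) != 0


# An object hidden from camera rays that still casts shadows.
hidden_from_camera = ALL_RAYS & ~RAY_CAMERA
```

With this setup, `is_visible(hidden_from_camera, RAY_CAMERA)` is false, so camera rays skip the object, while shadow rays still test it.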
With offline rendering, the triangles of non-instanced objects are transformed and placed in the top level of the tree.
If no new objects or triangles are added, rather than rebuilding the BVH entirely, we refit it with new coordinates. This means we keep the same tree structure, and only update the bounding boxes. As the coordinates deviate further from the original, the tree quality goes down.
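Refitting can be sketched as a bottom-up pass that recomputes bounding boxes while leaving the tree topology untouched. The `Node` class, the `refit` function and the tuple-based box representation below are assumptions made for illustration, not the data layout used by Cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    bounds: Optional[tuple] = None   # ((min_x, min_y, min_z), (max_x, max_y, max_z))
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prim: Optional[int] = None       # primitive index, set on leaves only


def union(a, b):
    """Smallest box enclosing boxes a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def refit(node, prim_bounds):
    """Recompute bounds bottom-up; the tree structure is not modified."""
    if node.prim is not None:
        node.bounds = prim_bounds[node.prim]
    else:
        node.bounds = union(refit(node.left, prim_bounds),
                            refit(node.right, prim_bounds))
    return node.bounds


# Two leaves under one root; primitive 1 has moved since the tree was built.
root = Node(left=Node(prim=0), right=Node(prim=1))
moved = [((0, 0, 0), (1, 1, 1)), ((4, 0, 0), (5, 2, 1))]
refit(root, moved)
```

Because the split decisions were made for the original coordinates, boxes of sibling nodes may start to overlap heavily after large deformations, which is why tree quality degrades over time.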
Custom BVH Implementation
The code is based on an implementation from NVIDIA under the Apache license, with code adapted from Embree as well. This code includes spatial splits to make it more competitive with kd-trees. On top of that, we added support for instancing and motion blur, as well as dynamic updates through refitting and a two-level BVH.
This BVH used to support SIMD instructions, but since it is now only used for GPUs, that support was removed. This BVH can still be used on the CPU for debugging.
The BVH is built based on the surface area heuristic (SAH) and spatial splits. Build performance is optimized with binning and multithreading. For traversal, the nodes from two levels are still packed into a single array. The BVH traversal algorithm is implemented in such a way that on the GPU, BVH node intersections can be performed coherently even when two threads are on different BVH levels.
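To make the SAH and binning terms concrete, here is a minimal sketch of a binned SAH split search along one axis. It evaluates the usual cost, primitive count times surface area for each side, at the bin boundaries only. It is a simplification under stated assumptions: no spatial splits, no traversal cost constants, no multithreading, and the names `best_binned_split`, `side_cost` and `num_bins` are hypothetical rather than taken from the Cycles builder.

```python
def union(a, b):
    """Smallest box enclosing boxes a and b."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def surface_area(box):
    dx, dy, dz = (box[1][i] - box[0][i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def side_cost(bins):
    """SAH term for one side: primitive count times area of the merged box."""
    count, box = 0, None
    for n, b in bins:
        if n:
            count += n
            box = b if box is None else union(box, b)
    return count * surface_area(box) if count else None


def best_binned_split(prim_bounds, axis, num_bins=8):
    """Return (cost, first_right_bin) of the cheapest split, or None."""
    centroids = [0.5 * (b[0][axis] + b[1][axis]) for b in prim_bounds]
    lo, hi = min(centroids), max(centroids)
    if lo == hi:
        return None  # all centroids coincide, no split on this axis
    # Each bin stores [primitive count, merged bounds].
    bins = [[0, None] for _ in range(num_bins)]
    for b, c in zip(prim_bounds, centroids):
        i = min(int(num_bins * (c - lo) / (hi - lo)), num_bins - 1)
        bins[i][0] += 1
        bins[i][1] = b if bins[i][1] is None else union(bins[i][1], b)
    best = None
    for k in range(1, num_bins):
        left, right = side_cost(bins[:k]), side_cost(bins[k:])
        if left is None or right is None:
            continue  # skip splits that leave one side empty
        if best is None or left + right < best[0]:
            best = (left + right, k)
    return best


# Two clusters of unit cubes far apart along the x axis.
boxes = [((0, 0, 0), (1, 1, 1)), ((1, 0, 0), (2, 1, 1)),
         ((10, 0, 0), (11, 1, 1)), ((11, 0, 0), (12, 1, 1))]
best = best_binned_split(boxes, axis=0)
```

Binning limits the number of candidate splits to `num_bins - 1` per axis instead of one per primitive, which is what makes the build fast enough to be practical on large scenes.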