Note: This is an archived version of the Blender Developer Wiki. The current and active wiki is available on wiki.blender.org.

Results of my GSoC

Figure: Volume optimizations (~10% speedup)

Here you can find an overview of my Cycles work during GSoC 2014, which focused on performance and memory optimizations.

All changes are already in master and will be available in Blender 2.72. You can find more details in the commit logs.

Volume optimizations

I optimized the volume rendering code, especially for heterogeneous volumes. I also fixed up the code so that it can be compiled for the GPU again.

  • Optimization of Heterogeneous Volume Shadows. (5af00a3d12cb)
  • Optimize Equi-Angular sampling using binary range search. (32a5313b4170)
  • Several fixes for the GPU, so that volume rendering can be enabled again when compiling for CUDA. This seems to work fine but, as with Subsurface Scattering, enabling it slows down the GPU kernel. It is therefore unclear when GPU volume rendering can be enabled in an official build. (5aec61f8493f, 5fefc8478355)
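To make the equi-angular item above more concrete, here is a minimal, generic sketch of equi-angular distance sampling along a ray towards a point light, followed by a binary search that finds which ray-marching step contains the sampled distance. This is not the code from the commit. The names `Vec3`, `equiangular_sample`, `find_step` and the `step_end` array of cumulative step distances are assumptions made for this example, and the binary search is only one plausible reading of "binary range search".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal vector type, only for this sketch.
struct Vec3 {
    float x, y, z;
};

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Equi-angular sampling of a distance in [t_min, t_max] along a ray.
// ray_d must be normalized, xi is a uniform random number in [0, 1].
// Returns the sampled distance and writes its probability density to *pdf.
float equiangular_sample(Vec3 ray_o, Vec3 ray_d, float t_min, float t_max,
                         Vec3 light_p, float xi, float *pdf)
{
    // Ray parameter of the point on the ray that is closest to the light.
    const float delta = dot(sub(light_p, ray_o), ray_d);
    const Vec3 closest = {ray_o.x + delta * ray_d.x,
                          ray_o.y + delta * ray_d.y,
                          ray_o.z + delta * ray_d.z};
    const Vec3 offset = sub(closest, light_p);
    // Distance from the light to the ray; assumed non-zero in this sketch.
    const float D = std::sqrt(dot(offset, offset));
    assert(D > 0.0f);

    // Angles subtended by the segment end points, as seen from the light.
    const float theta_a = std::atan2(t_min - delta, D);
    const float theta_b = std::atan2(t_max - delta, D);

    // Pick an angle uniformly, then map it back to a distance on the ray.
    const float theta = theta_a + xi * (theta_b - theta_a);
    const float t = D * std::tan(theta);
    *pdf = D / ((theta_b - theta_a) * (D * D + t * t));
    return delta + t;
}

// Binary search over ascending cumulative step end distances.
// Returns the index of the first step whose end distance is >= t,
// or step_end.size() if t lies beyond the last step.
std::size_t find_step(const std::vector<float> &step_end, float t)
{
    std::size_t lo = 0, hi = step_end.size();
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (step_end[mid] < t) {
            lo = mid + 1;
        }
        else {
            hi = mid;
        }
    }
    return lo;
}
```

Compared with walking the steps one by one, the search needs a number of comparisons that grows logarithmically, not linearly, with the number of steps, which matters for rays that cross many steps of a heterogeneous volume.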

AVX2 CPU Kernel

An AVX2 kernel was added, which is compiled with the AVX2, FMA3, and BMI compiler flags. At the moment only Intel Haswell benefits from this, but future AMD CPUs will support these instructions as well. It gives a speedup of a few percent on these CPUs. (866c7fb6e63d)
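As a small illustration of what compiling with these flags means, the sketch below reports which of the three instruction sets were enabled for the translation unit at compile time. It assumes a GCC or Clang style compiler, where flags such as `-mavx2`, `-mfma` and `-mbmi` define the macros `__AVX2__`, `__FMA__` and `__BMI__`. The function name `enabled_simd_features` is made up for this example, and the sketch does not show how Cycles selects its kernel.

```cpp
#include <string>

// Lists the instruction set extensions this file was compiled with.
// Other compilers may not define all of these macros.
std::string enabled_simd_features()
{
    std::string out;
#if defined(__AVX2__)
    out += "AVX2 ";
#endif
#if defined(__FMA__)
    out += "FMA ";
#endif
#if defined(__BMI__)
    out += "BMI ";
#endif
    if (out.empty()) {
        out = "none";
    }
    return out;
}
```

Code built this way only runs on CPUs that actually support the enabled instructions, which is why such code is kept in a separate kernel.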

Figure: XYZ nodes

Calculate Face Normal on the Fly

The face normal can be calculated on the fly, with a simple cross product of the triangle's edge vectors, which are derived from the vertex coordinates. This way we save 3 float variables (12 bytes at 4 bytes per float) per mesh triangle, which in practice can be even more due to memory fragmentation and data alignment. The impact on performance is negligible (~1%). (49df707496e5)
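The computation described above can be sketched as follows. This is a generic illustration and not the Cycles kernel code: `Vec3` and `triangle_face_normal` are names invented for the example, and the winding order (counter-clockwise gives the outward normal) is an assumption.

```cpp
#include <cmath>

// Hypothetical minimal vector type, only for this sketch.
struct Vec3 {
    float x, y, z;
};

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

static Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Face normal of the triangle (v0, v1, v2), computed from the vertex
// positions alone, so no per-triangle normal has to be stored.
// Returns the zero vector for a degenerate (zero-area) triangle.
Vec3 triangle_face_normal(Vec3 v0, Vec3 v1, Vec3 v2)
{
    const Vec3 n = cross(sub(v1, v0), sub(v2, v0));
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len == 0.0f) {
        return {0.0f, 0.0f, 0.0f};
    }
    return {n.x / len, n.y / len, n.z / len};
}
```

The trade-off is a few extra arithmetic operations per lookup in exchange for not storing the normal at all.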

Support for uchar attributes

Cycles' attribute system has been expanded to support uchar attributes. This is currently used for vertex colors, which saves some memory (4 unsigned chars instead of 4 floats per vertex). (0ce3a755f83b)
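To illustrate where the saving comes from, here is a minimal sketch that packs a 4-float color into 4 bytes and unpacks it again. The types `Color4f` and `Color4b` and the functions `pack_color` and `unpack_color` are hypothetical names for this example, not the Cycles attribute API. The rounding and clamping scheme is likewise an assumption.

```cpp
#include <algorithm>
#include <cstdint>

struct Color4f {
    float r, g, b, a;
};

struct Color4b {
    std::uint8_t r, g, b, a;
};

// On common platforms the byte version is a quarter of the size.
static_assert(sizeof(Color4f) == 16, "expected 4-byte floats");
static_assert(sizeof(Color4b) == 4, "expected tightly packed bytes");

// Clamp to [0, 1] and round to the nearest of 256 levels.
// NaN inputs are not handled in this sketch.
static std::uint8_t float_to_byte(float v)
{
    const float clamped = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);
}

static float byte_to_float(std::uint8_t v)
{
    return static_cast<float>(v) / 255.0f;
}

Color4b pack_color(Color4f c)
{
    return {float_to_byte(c.r), float_to_byte(c.g),
            float_to_byte(c.b), float_to_byte(c.a)};
}

Color4f unpack_color(Color4b c)
{
    return {byte_to_float(c.r), byte_to_float(c.g),
            byte_to_float(c.b), byte_to_float(c.a)};
}
```

The price is precision: each channel is quantized to 256 levels, which is usually enough for vertex colors that were painted with 8 bits per channel in the first place.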

Separate / Combine XYZ nodes

Two new nodes have been added to separate and combine vectors. This was already possible via the RGB counterparts, but now it is available natively in the UI, which is more logical. The internal implementation was refactored, so no overhead is added here. (3de3987ea190, 0c1b4c35cdf5)

Miscellaneous

  • Improve performance for scenes without transparent shaders. (14be4b506ac8)


Work in Progress

Apart from the improvements mentioned above, I also worked on more experimental things, which are either unfinished or did not turn out to be useful. They are only available in the soc-2014-cycles branch.

  • Optimize sin/cos calculations by using the sincos() function. This improved performance a bit with gcc, but failed with clang for some reason; msvc and CUDA also seem to have their own implementations of it. I will probably try again in the future. (4f96edd254c6)
  • Quad BVH. I looked into a QBVH implementation, based on some older code from the early Cycles days. It improved performance in some files (Cornell Box was ~7% faster), but other scenes, like the BMW one, were slower. I don't think it is useful in its current state; instead, we should look into more sophisticated algorithms, like the QBVH traversal code from Embree. (590aae22a6cb)
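For the sincos() item above, the portability problem can be sketched with a small wrapper. It uses the GNU extension `sincosf()` where glibc exposes it and falls back to separate `sin` and `cos` calls elsewhere. The wrapper name `sincos_portable` and the preprocessor check are assumptions made for this example; this is not the code from the branch.

```cpp
#include <cmath>

// Computes sin(x) and cos(x) together where a combined routine is known to
// be available, otherwise with two separate calls.
inline void sincos_portable(float x, float *s, float *c)
{
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    // GNU extension: one call that returns both results.
    ::sincosf(x, s, c);
#else
    // Portable fallback: same results, but the work may be done twice.
    *s = std::sin(x);
    *c = std::cos(x);
#endif
}
```

A combined call can share the argument reduction between the sine and the cosine, which is where any speedup would come from. Whether it pays off depends on the compiler and math library in use.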