User:OmarSquircleArt/GSoC2019/Documentation/Code Repetition: Performance Readability And Portability

While working on improving and extending procedural textures, I found it challenging to write performant, readable, and portable code while also reducing code repetition. This document outlines and discusses those challenges.

Portability in this document denote the facility of porting an implementation from one backend to another. In particular, the facility of writing code for all three backends GLSL, SVM, and OSL.

Problem
Most of the procedural textures we write can operate in 1D, 2D, 3D, or 4D dimensions. In most cases, the implementation is very similar and can be implemented using a Generic Programming approach. Unfortunately, non of the backends support generic programming. So we end up duplicating the same code four times, one time for each dimension. For instance, in GLSL and OSL, since the `noise` function is overloaded, the `noise_tubulance` function is exactly the same for all dimensions, the only difference is that the input vector `p` have a different type:

Moreover, SVM does not support function overloading, so the `noise` function will also have to be changed to `noise_[1|2|3|4]d`. This code duplication can be seen in `node_texture.h`. Similarly, the situation is the same for the Musgrave texture code, as can be seen in `node_musgrave_texture.osl`.

Solution
One solution to this problem would be to utilize the preprocessor to simulate generic programming. For instance, writing the `noise_turbulence` function as a macro as follows:

Alternatively, the implementation can be done in a separate file that uses preprocessor constants instead of types and function names. Then the implementation file can be included multiple times with different constant definitions. For instance, for SVM, the implementation file can be as follows:

Which can be included as follows:

The second approach is easier, cleaner, and more flexible, but it is harder to setup.

Eventually, I decided to duplicate the code for simplicity and readability. I also prepended the code with the following comment to make refactoring by other developers more optimal:

Macro Redefinitions
The SSE version of the Jenkins hash have a slightly different bit-rotate macro, while the `mix` and `final` macros are exactly the same. This forces us to redefine `mix` and `final` after redefining `rot`. This can be seen in `util_hash.h`.

Different Voronoi Dimensions
The Voronoi texture also have similar code for different dimensions, however, unlike the Musgrave and Noise functions, the Voronoi functions have different control structures for different dimensions. In particular, the 1D case have a single loop, the 2D case have 2 nested loops, the 3D case have 3 nested loops, and the 4D case have 4 nested loops.

Personally, I don't view this as code repetition as it seems unavoidable.

Different Voronoi Features
Currently, we support five Voronoi features. Initially, I tried to reduce code repetition by implementing all features in a single function. This didn't work out for the following reasons:


 * Search Kernel: The search kernel at the heart of the Voronoi algorithm was not constant. Smooth Voronoi required larger kernels, Distance To Voronoi Edges required two passes of big kernels, and N-Sphere Radius required two passes of small kernels. Moreover, GLSL compilers prefer constant expressions in loops for the purpose of compile-time optimizations and validation, in fact, the OpenGL ES specification doesn't even allow dynamic expressions in loops. So trying to dynamically control the loops based on required feature may not be the best thing to do.
 * Branching: The code branched a lot, which is bad for GPUs. Some branching is turned into conditional assignments like in the case of F1 voronoi, but more complicated branching can impact performance. (I am unsure if this issue make sense, since all wavefronts will choose the same branch, and thus there shouldn't be any stalling ...)
 * Compile Time: It appears shaders are recompiled every time a `GPU_constant` is changed (Which shouldn't happen?), having a big function would mean slower compile time.
 * Readability: The code became somewhat unreadable and not easy to refactor.

So, I eventually chose the simple approach of implementing everything independently.