The kernel contains the core rendering routines, where most of the render time is spent. The code here can be compiled as both C++ and CUDA, with support for OpenCL planned. In order to support this, we must be careful to only use language features that are supported for all targets, with a few macros thrown in to smooth over the language differences.
How this will evolve is uncertain; for best performance we will likely need to start optimizing code for different targets. Currently we get away with using a single "megakernel" for path tracing, without complex memory writes, which keeps things fairly simple, but this is unlikely to be optimal or even possible for other algorithms.
In short, this currently means:
- C syntax
- Vector types like float3 or uchar4, with common operators
- OpenCL like qualifiers
- Constant memory for small amounts of fixed parameters
- Texture memory for most read-only data
- No call stack, no recursive functions
- No dynamic memory allocation
- No C++ features like classes, templates or references
- No doubles, only floats
The vector types are the same as OpenCL: float2, float3, float4, and similar for int, uint, uchar. Common operators (add, multiply, etc.) work as expected. For construction, use the make_*() functions, for example:
float3 v = make_float3(0.0f, 1.0f, 0.0f);
For OpenCL, these vector types are built into the language. For C++ and CUDA we define the necessary classes and operator overloading to implement them.
Qualifiers
- __device: for functions; all kernel functions should use this
- __global: for pointers to global memory (mostly function parameters)
- __local: explicitly places stack variables in local memory
- __shared: shared memory (not used yet)
- __constant: constant memory
These are defined as macros for each target. Some targets may define some qualifiers as empty, or may accept them only in particular contexts, so it's best to test compilation on all targets.
Constant Memory and Textures
Small, fixed-size data is stored in constant memory. KernelData kernel_data contains all constant memory, and is available as a global variable everywhere.
All large read-only data is stored as textures. These textures are accessible as global variables, but must be accessed through the kernel_tex_* functions. We use textures because texture reads are cached on all GPU hardware; on the CPU these become simple array lookups.