The project focuses on optimizations that help with typical animation production scenes that might be rendered in Gooseberry.
We've left out things like bidirectional/MLT because we should be avoiding such difficult light setups (it's typically direct light + a few bounces), and things like irradiance caching because it's difficult to make it flicker free and interact well with glossy shaders.
Further notes can be found here: Notes
To Do: gather about 3 self-contained production scenes on download.blender.org that we can use for benchmarking (Caminandes, Tears of Steel files?), and which are representative of what might be Gooseberry scenes.
It would be good to cover the following cases:
- Outdoor scene
- Indoor scene
- Many lights
- Furry characters
- Subsurface scattering
- Smoke/fire volume
They shouldn't be too heavy in memory usage, and they don't need the full sample count required to render them noise free, just enough that the scene is useful for benchmarking.
Some benchmarks can be found here: Cycles benchmarks. They can be checked out with SVN.
There are a few different ways to approach optimizations:
- Low level optimizations: SSE/AVX, memory prefetching, ...
- Reduce memory usage: to fit bigger scenes and avoid cache misses
- Algorithmic optimizations: better BVH, handling many lights, shader optimization
- Sampling: avoid unneeded light paths, skip empty background areas, branched path tracer tweaks, ...
- Tricks: skip some light paths, blur background for diffuse light, alpha threshold for hair, ...
- Shaders: make a good set of shader groups for various common materials with production tricks.
Low Level Optimization
- Find places where we can use SSE/AVX instructions
- See if we can benefit from memory prefetching anywhere
- See if we can use fast inverse sqrt instructions (watch out for precision issues)
- Look at the latest Embree code
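To illustrate the precision concern with fast inverse sqrt, here is a minimal Python sketch. It is not Cycles code: the well-known bit-trick estimate stands in for a hardware reciprocal sqrt instruction (an assumption made only so the example runs anywhere), and the function names are hypothetical. The point is that a rough estimate followed by one Newton-Raphson step recovers most of the precision.

```python
import struct

def rsqrt_estimate(x):
    """Rough 1/sqrt(x) for positive x, using the classic float bit trick.

    Stand-in for a low-precision hardware estimate (assumption).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = 0x5F3759DF - (bits >> 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def rsqrt_refined(x, steps=1):
    """Refine the estimate with Newton-Raphson steps: y = y * (1.5 - 0.5 * x * y * y)."""
    y = rsqrt_estimate(x)
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y

def relative_error(approx, x):
    """Relative error of approx against the exact 1/sqrt(x)."""
    exact = x ** -0.5
    return abs(approx - exact) / exact
```

The raw estimate is off by a few percent, which is likely too coarse for normalizing vectors; whether one refinement step is precise enough would need testing on real scenes.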
- Add second level BVH traversal function so hair and motion blur don't slow down traversal everywhere.
- Add multithreaded spatial splits builder so it becomes more usable, might be good as default then
- Use SIMD to intersect multiple triangles at once
- Store smooth normals only when used
- Store quads instead of triangles
- Hair segments storage needs to be separated from triangles to reduce memory usage
- Hair needs to be more tightly bounded so we can avoid intersections (spatial splits?)
- Use SIMD to intersect multiple hairs at once
- Try doing minimum width from camera POV before rendering instead of in the BVH traversal
- Faster traversal/intersection, and other renderers do it this way as well apparently
- For (shadow) rays from fur, it may be faster to intersect against triangles first, and then hairs.
- About half of the shadow rays will likely hit the base surface anyway.
- Transparency cutoff so we can stop shading after most of the light is blocked (for all types of rays)
- The hair BSDF gets replaced by a transparent BSDF for backfacing curve points. Shouldn't we skip the intersection altogether, and if so, where exactly? Furthermore, it uses a fixed 1,1,1 weight for each BSDF, which will blow up into fireflies if multiple such BSDFs are mixed.
- Minimum hair width: currently it uses stochastic termination based on the thickness of the hair rather than transparency when the hair is enlarged. This may be quite good for path tracing: if we used transparency we would stochastically continue or scatter anyway, and this avoids having to do multiple scene intersection calls or shader setups, instead doing it right in the intersection function. Another problem is that the stochastic termination does not currently use a QMC sequence; it's not entirely clear if that will work well or how to get it working. Some noise from this stochastic termination is hard to get rid of.
- For branched path tracing with fewer AA samples this may not be ideal, though to beat it we may need to record all intersections, and perhaps add smarter behavior for hairs that are nearly fully covered by others. Each AA sample would have less variance but also be more costly, so it's tricky to find the right tradeoff.
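The stochastic termination idea behind minimum hair width can be sketched as follows. This is a simplified Python model, not the Cycles implementation: the function names are hypothetical, and it assumes a hair thinner than the minimum width is enlarged to that width and each hit is then accepted with probability true width / minimum width, so the expected coverage stays equal to the true width.

```python
import random

def accept_hair_hit(true_width, min_width, u):
    """Decide whether a hit on an enlarged hair counts.

    u is a uniform random number in [0, 1). Hairs already at least
    min_width wide are never enlarged, so every hit is accepted.
    """
    if true_width >= min_width:
        return True
    return u < true_width / min_width

def estimate_coverage(true_width, min_width, num_rays, rng):
    """Average covered width over num_rays rays that all hit the enlarged hair."""
    rendered_width = max(true_width, min_width)
    accepted = sum(
        accept_hair_hit(true_width, min_width, rng.random())
        for _ in range(num_rays)
    )
    return rendered_width * accepted / num_rays
```

With a pseudorandom u, as assumed here, the accept/reject decision adds its own noise on top of the path tracing noise, which matches the concern above about not using a QMC sequence.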
- Add object attribute system so we can reduce object memory usage (for very high number of instances)
- Don't store vectors/colors in float4
- Convert CORNER to VERTEX attributes by splitting.
- A workaround for the terminator problem would reduce noise due to fewer rays going below the surface, some options:
- Flip ray direction above surface when it is below.
- Ignore backfaces of the same object (correct for closed meshes)
- Somehow keep rays above the surface by remapping them, and blend smoothly when near the surface.
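As a rough illustration of the first option, flipping a ray direction that ends up below the surface amounts to mirroring it about the plane of the geometric normal. The helper names below are hypothetical, and the sketch assumes a unit-length geometric normal; it says nothing about whether the result looks acceptable in practice.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def flip_above_surface(direction, geometric_normal):
    """Mirror direction about the surface plane if it points below it.

    geometric_normal is assumed to be normalized.
    """
    d = dot(direction, geometric_normal)
    if d >= 0.0:
        return direction  # already above the surface
    return tuple(di - 2.0 * d * ni for di, ni in zip(direction, geometric_normal))
```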
- Slightly increase the glossiness for camera rays based on ray differentials to avoid noisy sharp highlights
- Or increase it a lot for depth of field and motion blur?
- Tweak the formulas and magic values used for Filter Glossy to see if we can get it to behave better
- Option to disable glossy for indirect light on given BSDFs, or some factor to control the amount for direct and indirect.
- Add constant folding for nodes where it is commonly useful
- Mipmapping and OIIO texture cache support
- More compact storage of image textures with fewer than 4 channels.
- Add frequency clamping or other ways to use ray differentials to filter Perlin noise.
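Constant folding for nodes, mentioned above, can be sketched on a toy expression tree. The tuple-based node representation and the `fold` function are assumptions made for illustration and do not reflect how Cycles stores shader graphs: any math node whose inputs are all constants is replaced by a single constant before rendering.

```python
import operator

# Hypothetical set of foldable math node types.
FOLDABLE = {"add": operator.add, "mul": operator.mul}

def fold(node):
    """Return an equivalent tree with constant-only subtrees pre-evaluated.

    Nodes are tuples: ("const", value), ("input", name),
    or (op, left, right) with op in FOLDABLE.
    """
    kind = node[0]
    if kind in ("const", "input"):
        return node
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", FOLDABLE[kind](left[1], right[1]))
    return (kind, left, right)
```

For example, `fold(("mul", ("add", ("const", 1.0), ("const", 2.0)), ("input", "uv")))` leaves the multiply in place but collapses the add into one constant, so the add is no longer evaluated per sample.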
- Design a set of shader groups with production tricks, like
- Simpler texture or fixed color for indirect light or shadows for faster shader executions
- Shader that replaces glossy by diffuse for indirect light
- Fake shadows for glass to avoid caustics
- Hairs without transparency for indirect light
- Smoother light falloff to avoid fireflies for geometry near light sources
- For MIS of background, shader that makes area below horizon black to avoid unnecessarily sampling there.
Random Number Sequences
- Test correlated multi-jittered sampling on more complex scenes to see if it helps
- Can we get more memory coherence and fewer cache misses by evaluating pixels and their samples in some other order?
- Test if using the power heuristic instead of balance heuristic helps for combining BSDFs
- Test using the power heuristic for combining SSS falloffs
- If a MIS weight is near 0 or 1, can we round it to avoid the sample?
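For reference, the two MIS heuristics being compared can be written down in a few lines. This is a generic sketch for two sampling strategies with one sample each, with hypothetical function names; the power heuristic uses the common exponent of 2.

```python
def balance_heuristic(pdf_a, pdf_b):
    """MIS weight for strategy a: pdf_a / (pdf_a + pdf_b)."""
    total = pdf_a + pdf_b
    return pdf_a / total if total > 0.0 else 0.0

def power_heuristic(pdf_a, pdf_b):
    """MIS weight for strategy a with exponent 2: pdf_a^2 / (pdf_a^2 + pdf_b^2)."""
    a2, b2 = pdf_a * pdf_a, pdf_b * pdf_b
    total = a2 + b2
    return a2 / total if total > 0.0 else 0.0
```

The power heuristic pushes weights closer to 0 or 1 when one pdf dominates (with pdfs 3 and 1 the weight for the first strategy is 0.9 instead of 0.75), which is also the situation where rounding the weight to skip a sample might be considered.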
- Make light tree for quick lookup of lights that have some influence when there are many lights
- SSS rays could be optimized by ensuring the object is instanced and tracing only the sub-BVH for the object.
- Hair: look into other BSDF importance sampling papers (e.g. paper)
- Hair: dual scattering could be used as an approximation for faster indirect light
- Adaptive number of (AA) samples: unsure if this can work reliably, needs tests on real world scenes.
- Hair shading could cache and interpolate shading results at curve vertices
- Probably use a per-thread, least recently used cache with fixed number of entries
- This does not work for indirect glossy shaders, however, only for camera rays and diffuse, but optimized shaders could skip the glossy here or turn it into diffuse
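A least recently used cache with a fixed number of entries, as suggested for hair shading, could look roughly like this. The class name, the cache key and the `compute` callback are hypothetical; in a renderer there would be one instance per render thread so no locking is needed, and interpolation between cached curve vertices is left out.

```python
from collections import OrderedDict

class ShadingCache:
    """Fixed-size LRU cache mapping a key (e.g. a curve vertex id) to a shading result."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key, compute):
        """Return the cached result for key, calling compute(key) on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = compute(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value
```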
- Option for lower resolution viewport render, or fewer bounces, etc.
- On retina or high DPI displays, it may be good to use lower resolution by default
- Rather than progressively rendering sample by sample, have a mode where it renders at progressively higher resolution
- Use one of the recent filtering techniques to denoise the image. These give artifacts in final renders but can give a better preview with few samples.