Note: This is an archived version of the Blender Developer Wiki (archived 2024). The current developer documentation is available on developer.blender.org/docs.

User:KevinDietrich/Unfinished And Abandonned Work

Unfinished and abandoned work

This is a list of things I worked on during my Development Fund grants which were not completed for various reasons, some of which I never shared or talked about in rendering meetings. Since I am shy and not too confident in myself, I tend not to mention work that is unfinished or has not reached a milestone. Some of this work was never discussed with other Blender developers. My shyness also prevents me from simply asking questions, and sometimes I just do something for the sake of doing something, even if it is not productive at all. Other times, I just waste time going in the wrong direction.

Normals Compression

This uses octahedral compression to reduce memory usage for normals in the Cycles kernel. Memory is reduced to 4 bytes per normal vector, instead of the 16 bytes currently required (only 12 bytes are used for the data; the remaining 4 bytes are wasted on alignment to make the code more portable). It is a technique used in game engines, in data transfer libraries such as Google's Draco, and in some production render engines such as Solid Angle's Arnold.

  • Status: the panorama dicing test is failing, most likely due to the lossy nature of the compression. I tried revisiting the maths, which improved the situation but did not fix the failing test.
  • Code: private branch
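A minimal self-contained sketch of the technique, not the actual kernel code: the unit normal is projected onto the octahedron |x| + |y| + |z| = 1, the lower hemisphere is folded over the diagonals, and the two remaining coordinates are quantized to 16 bits each, for 4 bytes per normal. The round trip is lossy, which is consistent with the failing dicing test mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// 4-byte compressed normal: two signed 16-bit octahedral coordinates.
struct Oct16 { int16_t x, y; };

static int16_t quantize(float f) {
  return (int16_t)std::lround(f * 32767.0f);
}

Oct16 oct_encode(float x, float y, float z) {
  // Project onto the octahedron (L1 normalization).
  float inv_l1 = 1.0f / (std::fabs(x) + std::fabs(y) + std::fabs(z));
  float u = x * inv_l1, v = y * inv_l1;
  if (z < 0.0f) {  // fold the lower hemisphere over the diagonals
    float pu = (1.0f - std::fabs(v)) * (u >= 0.0f ? 1.0f : -1.0f);
    float pv = (1.0f - std::fabs(u)) * (v >= 0.0f ? 1.0f : -1.0f);
    u = pu;
    v = pv;
  }
  return {quantize(u), quantize(v)};
}

void oct_decode(Oct16 o, float &x, float &y, float &z) {
  float u = o.x / 32767.0f, v = o.y / 32767.0f;
  z = 1.0f - std::fabs(u) - std::fabs(v);
  if (z < 0.0f) {  // unfold the lower hemisphere
    float pu = (1.0f - std::fabs(v)) * (u >= 0.0f ? 1.0f : -1.0f);
    float pv = (1.0f - std::fabs(u)) * (v >= 0.0f ? 1.0f : -1.0f);
    u = pu;
    v = pv;
  }
  float len = std::sqrt(u * u + v * v + z * z);
  x = u / len;
  y = v / len;
  z /= len;
}
```

Because the quantization is lossy, a decoded normal only matches the original to within the 16-bit precision, which is plenty for shading but can matter for tests with strict tolerances.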

Partial device buffer updates

When updating the scene, the device buffers are either recreated or partially filled with new data; either way, the entire buffers are sent to the device. When the data is only partially updated, we could instead send only the modified portion, which reduces data transfers to the devices and improves update times. This would also behave as if each Geometry had its own buffer instead of a single shared one, and might have allowed cleaning up the code to make it more obvious that a given class member exists only to fill the shared buffer at a given position.

  • Status: this did not work out so well, as the Multi-Device is not thread-safe. Essentially, the partial buffer would have a pointer to its "parent" buffer, which holds the right device pointer. When doing the partial update, we would take the parent's device pointer and add the child offset to it so that the data is copied at the right place. However, in the Multi-Device, the parent buffer has its device pointer overwritten, which invalidates the child device data. To fix this, the map from device_memory to device_pointer in the Multi-Device should perhaps be moved to device_memory, so that each device_memory knows its actual address on every device.
  • Code: a WIP patch was shared, but no reviewers were set as it was not finished (D10515), also present in the cycles_procedural_api branch
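To illustrate the parent/child scheme described in the status note, here is a minimal host-side sketch; the names DeviceBuffer, SubBuffer, and partial_update are illustrative, not the actual Cycles API, and a plain byte vector stands in for the device allocation. The child keeps an offset into its parent's allocation, and a partial update copies only its own slice.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

struct DeviceBuffer {
  std::vector<unsigned char> device;  // stands in for the device allocation
};

struct SubBuffer {
  DeviceBuffer *parent;
  size_t offset;  // byte offset of this geometry's slice in the parent buffer
};

// Send only the modified slice instead of re-uploading the whole buffer.
void partial_update(SubBuffer &sub, const void *data, size_t size) {
  std::memcpy(sub.parent->device.data() + sub.offset, data, size);
}
```

The failure mode described above corresponds to the parent's allocation being replaced behind the child's back, which would leave `parent` pointing at stale device memory in the real Multi-Device case.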

Delta Compression

This compares the previous values for the vertices and curve points against the new ones, and only sends the difference to the device.

  • Status: for this to be useful, we need to precompute the deltas and cache them on disk, as computing them during scene updates slows the updates down. Also, precomputation might remove the possibility of having displacement shaders or tessellation, and might conflict with Cycles applying the object transformation to the data before rendering if the deltas were computed without an applied transformation, or if the object's transformation changes between frames.
  • Mentioned in: part of the main development grant project
  • Code: a WIP patch was shared, but no reviewers were set as it was not finished (D10516), also present in the cycles_procedural_api branch

Fuzzer

The fuzzer works by generating a random scene and asking for it to be rendered. The purpose was to automate some testing, and to discover new bugs that may be hard to create or reproduce in a production environment (e.g. some parameters that no one thought of using together which end up producing a bug). Cycles does not know about the fuzzer, and the fuzzer does not know about Cycles' internals. It was made to emulate someone using Cycles in their own software, so the fuzzer is only allowed access to Cycles' public API.

The fuzzer uses a procedural to generate the random nodes in the Cycles scene. It is thanks to this that I found out that procedurals defined by external software did not have their shaders compiled and rendered, which led to adding reference counting to the Cycles Nodes as a way to ensure that shaders are always compiled if they have at least one user (2577e31889, f9bc8c8ac5).

  • Status: it is not really useful as is, as it generates absolutely random data. To be more useful it would need access to a repository of Nodes and Shaders, and to randomly assemble them in order to create random but valid scenes. This might require some serialization for the Nodes, as well as extending the introspection data on Nodes to also include some information about the valid range of some data, and what the dependencies between sockets are.

Reference counting for arrays

For the Alembic procedural, we build caches for every socket of the Nodes that we created, and later copy the value from the cache to the Node socket. This copy could be avoided by sharing the array between the Node socket and the cache. Copying the array, instead of just swapping the pointers, is somewhat required so that the cache always remains in a valid state and we can safely partially update the cached data during live edits (e.g. requesting an attribute during look development which was not loaded yet). Reference counting was meant to prevent accidentally freeing some data: when passing array data to a Node socket, the socket takes ownership of the data and frees whatever data previously existed there, and we don't want it to free an array from the cache. Maybe we could also do some bookkeeping in the socket itself with respect to the origin of the data.

  • Status: the reference counting logic was not robust, leading to a lot of crashes.
  • Code: private branch
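A minimal sketch of the intended sharing, using std::shared_ptr as the reference-counting mechanism (the branch used its own counting; Socket is an illustrative name): the cache and the socket share one allocation, and replacing the socket's value only frees the array once nobody references it anymore.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// A reference-counted float array shared between a cache and a node socket.
using SharedFloatArray = std::shared_ptr<std::vector<float>>;

struct Socket {
  SharedFloatArray data;

  // Taking the array by shared pointer means the socket no longer "owns"
  // the storage outright: releasing the old value only frees it if the
  // cache (or anyone else) no longer holds a reference.
  void set(SharedFloatArray value) { data = std::move(value); }
};
```

This is exactly the property the plain ownership model lacked: with raw ownership, assigning a cached array to the socket and later replacing it would free the cache's data out from under it.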

Constant value compression

There are several places in Cycles data structures where arrays store the same value repeated, or simply store a single value. For example, on Meshes (or Volume bounding meshes) with only a single shader, we store a shader index, the same index, for every triangle, and we also store a single pointer to the shader in the used-shaders array. Both of those memory allocations are useless. The goal was to avoid such allocations, and maybe improve cache coherency, when only a single value is stored in an array, and to only allocate the array when another, different value is added to it.

  • Status: using the array class for both the Node sockets and the shader compilation would cause crashes. The logic for allocating arrays only when needed was not robust and made shader compilation fail.
  • Code: private branch
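A minimal sketch of the intended behavior, with illustrative names: the array pretends to hold `count` copies of a single value, and only allocates real storage once a diverging value is pushed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// An int array that stays allocation-free while all elements are equal.
struct ConstantArray {
  int constant = 0;
  size_t count = 0;
  std::vector<int> values;  // only allocated once values diverge

  void push_back(int v) {
    if (values.empty()) {
      if (count == 0 || v == constant) {
        constant = v;  // still a run of one repeated value
        count++;
        return;
      }
      values.assign(count, constant);  // materialize on first divergence
    }
    values.push_back(v);
    count++;
  }

  int operator[](size_t i) const {
    return values.empty() ? constant : values[i];
  }

  bool is_constant() const { return values.empty(); }
};
```

For the single-shader-index case above, such an array never allocates at all, while still presenting the same indexed interface to the rest of the code.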

OptiX BVH build temporary memory optimization

When creating BVHs we use a lot of temporary memory. Those allocations are really costly, and the temporary memory can be reused from one BVH build to the next, growing the allocation only when necessary.

  • Status: this was removed from the cycles_procedural_api branch in January, as the BVH building process was refactored to allow using both OptiX and the CPU for rendering. This change made the OptiX device no longer thread-safe, so reusing the memory from different threads was a bit tricky. However, this could have been reintroduced after a bug fix which enforced that OptiX BVHs are only built one at a time, as multi-threading was consuming too much memory.
  • Mentioned in: N/A
  • Code: removed from the cycles_procedural_api branch
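A minimal host-side sketch of the grow-only scratch buffer (illustrative names, not the OptiX device code): successive builds ask for whatever temporary size they need, and an allocation only happens when a build needs more than any previous one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Temporary memory reused across BVH builds; it only ever grows.
struct ScratchBuffer {
  std::vector<unsigned char> storage;

  void *reserve(size_t size) {
    if (size > storage.size())
      storage.resize(size);  // grow only when a build needs more
    return storage.data();
  }

  size_t capacity() const { return storage.size(); }
};
```

With builds serialized (as the bug fix above enforced), a single such buffer can safely serve every build in turn; concurrent builds would each need their own, which is where the thread-safety trouble came from.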

OptiX BVH build memory optimization

When creating BVHs (this is not the same as the previous point), we copy vertices and triangles to the device. However, we also copy the same data for path tracing, through the kernel buffers. This change would make geometry data packing happen prior to the BVH build, so that the OptiX device could reuse the memory from the geometry data instead of making its own temporary copies.

  • Status: this would only work when no motion blur is used. For the motion blur case, we could optimize the data transfer for the motion vertices, but not for the vertices and triangles, as those are stored differently. The main problem is the lack of thread safety in the Multi-Device case.
  • Mentioned in: N/A
  • Code: part of this remains in the cycles_procedural_api branch

Alembic procedural cache preloading

This would preload the next N frames from disk while the previous N frames were being rendered, to reduce overall memory usage and speed up data updates.

  • Status: there were some design issues, leading to thread safety issues, so the code was removed from the cycles_procedural_api branch to make sure that the rest of the code stays nice and robust.
  • Mentioned in: part of the main development grant project
  • Code: removed from the cycles_procedural_api branch, some improvements were made in a separate private branch
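A minimal sketch of the preloading scheme using a background task (illustrative names; the real code would read Alembic archives from disk instead of fabricating frame data): while the current window of N frames is rendered, the next window is loaded asynchronously, so the render never waits on disk for a window that was already prefetched.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Stands in for reading a window of frames from an Alembic archive on disk.
std::vector<int> load_frames(int start, int count) {
  std::vector<int> frames;
  for (int i = 0; i < count; i++)
    frames.push_back(start + i);
  return frames;
}

// Render frames in windows of `window` frames, prefetching the next window
// in the background while the current one is "rendered".
std::vector<int> render_with_prefetch(int frame_count, int window) {
  std::vector<int> rendered;
  std::vector<int> current = load_frames(0, std::min(window, frame_count));
  for (int start = 0; start < frame_count; start += window) {
    int next = start + window;
    int count = std::max(0, std::min(window, frame_count - next));
    // Kick off loading of the next window before rendering the current one.
    auto pending = std::async(std::launch::async, load_frames, next, count);
    for (int frame : current)
      rendered.push_back(frame);  // "render" the frame
    current = pending.get();      // swap in the prefetched window
  }
  return rendered;
}
```

The design issue mentioned above is the hard part this sketch glosses over: the loader and the renderer touch the same cached scene data, so swapping windows safely during live edits needs real synchronization.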

Cycles API

Improvements to the Cycles API.

  • Status: this was taken over by Brecht Van Lommel for some reason. My last patch for it was one to encapsulate member access across the public API. The next patches would have been for moving header files to a public folder, and then for simplifying and improving the API itself; this is the order in which I wished things had been done.
  • Mentioned in: part of the main development grant project
  • Code: patch was for the encapsulation (D10082), other works are in private branches

Cycles Nodes

This was to make use of Nodes for pretty much everything in Cycles.

  • Status: Device and Image Nodes are not finished
  • Mentioned in: part of the main development grant project
  • Code: patches were shared (device node D8750, image node D8649), code revision is in private branches

Cycles Node Definition Language

I wanted to make the node definitions more robust and extend the introspection data to include information such as socket ranges, tooltips, etc., so that external software can more easily integrate Cycles without needing to come up with their own definitions for those. Dedicated page on the wiki.

Faster RNA updates

While profiling Cycles I noticed that the RNA was slow as well. Namely, type refinement is performed for every element in arrays, when it could be done only for the first one and remembered for the rest. This type refinement is the main bottleneck, and avoiding it could make the code up to 10x faster.

  • Status: some changes were made for data arrays on Meshes (polygons, vertices), but for custom data layers, this was a bit trickier, and is simply unfinished.
  • Mentioned in: N/A
  • Code: was briefly in the cycles_procedural_api branch, removed and placed in a private branch
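A minimal sketch of the intended optimization, with made-up types standing in for the RNA machinery (Element, refine_type, and the type names are all illustrative): refine the first element once and reuse the result for the whole homogeneous array, so the expensive lookup runs once instead of once per element.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Element {
  int kind;
};

static int refine_calls = 0;

// Stands in for RNA type refinement, the expensive per-element lookup.
std::string refine_type(const Element &elem) {
  refine_calls++;
  return elem.kind == 0 ? "MeshVertex" : "MeshPolygon";
}

// Refine only the first element and reuse the result for the rest,
// assuming the array is homogeneous (which mesh data arrays are).
std::vector<std::string> types_cached(const std::vector<Element> &elems) {
  std::vector<std::string> result;
  if (elems.empty())
    return result;
  const std::string type = refine_type(elems[0]);
  result.assign(elems.size(), type);
  return result;
}
```

The homogeneity assumption is what makes custom data layers trickier, as noted in the status: their element types are not as uniform as plain vertex or polygon arrays.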

Blender optimizations

Blender has a lot of slow code paths; over the months I profiled and tried to optimize some of them. These optimizations range from avoiding the computation of some data to simply inlining functions where the cost of the function call exceeds that of the function body.

  • Status: some of it could be shared, I think
  • Code: private branches, or deleted