Weekly Reports 2021
Week 1-2 4 - 15 January 2021
- Fixed remaining known issues in optimization patch (D9555), to be landed in master
- Started working on improvements to OpenSubDiv in Blender and Cycles, based on features present in Alembic
- face holes (faces that are deleted after the subdivision was made, not sure how useful, will need user feedback)
- vertex creasing
- studied how rendering subdivision could be done
- Reviewed D9961, to be landed
- Started working on generating more complicated files (with higher resolution objects) for testing the optimization work, and perhaps do a second round of optimization.
Week 3 18 - 22 January 2021
Kept working on making test files for the optimization work with more complicated scenes and higher polygon count objects, and continued working on improving subdivision tools and rendering.
- Cycles: optimize device updates (bbe6d44928)
- Subdivision: add support for vertex creasing (D10145) (rework based on quick review in progress, but might need a redesign of how subdivision is handled in Blender to properly support such a feature)
Week 4 25 - 29 January 2021
- Continued profiling and trying to optimize Cycles updates and the Alembic procedural, defined some areas to look at next week.
- Continued work on subdivision a little bit.
- Worked on proxies for the Alembic procedural (D10197, patch to be updated)
- Bug fixes :
Week 5 1 - 5 February 2021
- Cycles: investigated ways to reduce memory usage for socket and attribute data (reference counting, constant value optimization), but this made ccl::array complex and slower, might retry in a different way
- Alembic procedural: added support for instancing
- Alembic procedural: started working on a cache policy
- Worked on documentation for the Cycles Node API, and the Cycles Scene in general
- Bug fixes :
- (unreported) Alembic procedural: fix crash when cancelling a render during synchronization (55c88e5582)
Week 5 8 - 12 February 2021
- Cycles: investigated ways to reduce data transfer to the GPU
- Alembic Procedural: continued work and testing for the cache policy
- Finalized D10197: Cycles: experimental integration of Alembic procedural in viewport rendering
- Bug fixes :
Week 6 15 - 19 February 2021
This week I continued working on memory usage and data transfer optimizations. Patches are mostly ready to be extracted for code review.
Patches will be for:
- partial updates of the device arrays, to only retransfer to the GPU(s) the parts of the buffer that have been modified (in my files data transfers were cut by 60% at best, more can be achieved depending on the file).
- delta compression of the vertices and curve keys: this computes the difference in positions from the last update, stores it in 16 bits (instead of 32 bits) and sends that to the GPU, where it is applied to the original memory (thus saving 50% of the data transfer for that data).
- octahedron compression of the normals: this projects the normals onto a unit octahedron and stores the result in 8 bits, achieving a 75% memory saving. The normals are decompressed as needed in the kernel. This is a lossy compression, so slight render differences (unnoticeable to the eye) may occur; one test is failing, so I am going to try a different approach.
- lazy array allocation: this only allocates memory for arrays if we store more than one distinct value in them. Arrays are used throughout the code base; however, the following arrays are allocated on the heap even when they may hold a single constant value:
- Hair.radius (stores a float for every curve key)
- Hair.curve_shader (stores an integer for every curve)
- Hair.used_shaders (stores the list of shaders used by the object, useless heap allocation if only one shader is used)
- Mesh.shader (stores an integer per triangle, redundant if all triangles use the same shader)
- Mesh.smooth (stores a bool per triangle, redundant if all triangles are smooth)
- Mesh.used_shaders (stores the list of shaders used by the object, useless heap allocation if only one shader is used)
- Mesh.subd_shader (stores an integer per polygon, redundant if all polygons are using the same shader)
- Mesh.subd_smooth (stores a bool per polygon, redundant if all polygons are smooth)
- ImageHandle.tile_slots (stores the UDIM tiles used by the Image, allocating on the heap is a bit much if only using a single tile)
In spring.blend, this saves about 140 MB of memory (I forgot to note down the total amount of memory, so I cannot give a percentage).
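To illustrate the lazy allocation idea, here is a minimal Python sketch. It is not the actual ccl::array change: the `LazyArray` class and its members are hypothetical names, and the real code is C++. The container keeps a single constant while every appended value is identical, and only allocates real storage once a different value shows up.

```python
class LazyArray:
    """Hypothetical container: holds one constant until a differing value arrives."""

    def __init__(self):
        self._constant = None  # shared value while all elements are identical
        self._count = 0        # logical number of elements
        self._values = None    # real storage, allocated only when needed

    def append(self, value):
        if self._values is not None:
            # Already expanded: behave like a regular array.
            self._values.append(value)
        elif self._count == 0 or value == self._constant:
            # Still constant: remember the value, allocate nothing.
            self._constant = value
        else:
            # First differing value: expand the constant into real storage.
            self._values = [self._constant] * self._count
            self._values.append(value)
        self._count += 1

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not 0 <= index < self._count:
            raise IndexError(index)
        if self._values is not None:
            return self._values[index]
        return self._constant

    @property
    def is_allocated(self):
        return self._values is not None
```

With something like this, an array such as Mesh.smooth would stay unallocated for a fully smooth mesh, but any code reading the raw storage directly would need to know about the constant case.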
Week 7 22 - 25 February 2021
This week I mainly revised some of the patches I worked on last week before sending them to code review (although I haven't set any reviewers yet, so I can tackle some remaining todos).
The patch for normals compression I talked about in my last report was not revised nor sent to code review yet.
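For reference, octahedron compression of a normal can be sketched as below. This is a generic Python illustration of the technique, not the code from my patch: the function names are hypothetical, packing into two 8-bit values is an assumption, and the input is assumed to be a non-zero vector.

```python
import math


def _sign(v):
    # Sign that treats zero as positive, so that folding is well defined.
    return 1.0 if v >= 0.0 else -1.0


def encode_normal(n):
    """Project a normal onto the unit octahedron and quantize to two bytes."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals of the square.
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    # Map [-1, 1] to [0, 255].
    return (round((x * 0.5 + 0.5) * 255.0), round((y * 0.5 + 0.5) * 255.0))


def decode_normal(e):
    """Inverse of encode_normal, returns a unit-length normal (lossy)."""
    x = e[0] / 255.0 * 2.0 - 1.0
    y = e[1] / 255.0 * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        # Undo the fold for the lower hemisphere.
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)
```

The decoded normal is close to the original but not bit-identical, which is where the slight render differences come from.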
I decided to stop working on lazy allocation for the time being, as there are too many edge cases, especially when compiling shaders. Using lazy allocation there makes shader execution fail because some data is missing: the array only allocates once a value added to it differs from the first one.
I also did some profiling and tried to optimize Blender a bit, as opening/loading large files takes a while. Some of the hotspots cannot be optimized easily or at all (like computing Bézier curves, or tessellation), but inlining more maths functions does help.
- Bug Fixes:
Week 8 1 - 5 March 2021
- Improvements to the Alembic procedural (fixed missing updates 00f218602d, and infinite update loop ac4d45dbf1)
- Finished first implementation of cache prefetching for the Alembic procedural, this uses a secondary cache to preload data for the next N frames, to avoid stalls when playing back animations in the viewport
- Cycles: made delta compression more robust so that it handles potential overflows gracefully; however, it seems to slow down device updates, which I will need to investigate
- Also did some bug triaging, no fixes though.
Week 9 8 - 12 March 2021
This week was mostly spent making the Alembic procedural work for objects with dynamic topologies (e.g. fluid simulations). This included handling changes in data size and reducing memory usage, as the procedural would otherwise load redundant data in this case: if the topology is static between frames 1 and 25 with a sudden change at frame 26, we would load the same data 25 times, which is not nice.
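The redundant-data problem can be sketched roughly as follows, in Python and with hypothetical names (`FrameCache`, `add`, `get`). This is not the procedural's actual cache: it simply stores a data block once and lets consecutive frames with identical data point to the same slot. A real implementation would likely compare a hash or key of the data rather than the data itself.

```python
class FrameCache:
    """Hypothetical per-attribute cache that shares identical consecutive frames."""

    def __init__(self):
        self._unique = []         # distinct data blocks, each stored once
        self._frame_to_slot = {}  # frame number -> index into _unique

    def add(self, frame, data):
        # Only store the data if it differs from the most recent block.
        if not self._unique or self._unique[-1] != data:
            self._unique.append(data)
        self._frame_to_slot[frame] = len(self._unique) - 1

    def get(self, frame):
        return self._unique[self._frame_to_slot[frame]]

    def unique_count(self):
        return len(self._unique)
```

For the example above (static topology on frames 1 to 25, a change at frame 26), such a cache would hold two blocks instead of 26.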
Other work included starting to refactor the delta compression support: the initial implementation was simplistic and optimized; it worked, but it did not handle potential integer overflows in the compressed representation. After making it more robust to those overflows, it slowed down the device update (we want to be faster), the culprit being the computation of the deltas. Sending deltas to the GPU and unpacking them there is still faster than sending uncompressed data. So the idea now is to put the deltas in an attribute, cache that in the Alembic procedural, and simply copy and unpack them on the device during updates. That should remove the cost of computing the deltas and speed up device updates again.
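As a rough illustration of the delta compression and of the overflow handling, here is a small Python sketch. The function names, the `max_delta` parameter and the "return None and fall back to a full upload" policy are all assumptions made for the example; the actual Cycles code is C++ and runs the unpacking on the device.

```python
import struct

INT16_MAX = 32767


def compress_deltas(previous, current, max_delta=1.0):
    """Quantize (current - previous) to 16-bit integers.

    Returns None if any delta falls outside [-max_delta, max_delta],
    in which case the caller would send the uncompressed data instead.
    """
    scale = INT16_MAX / max_delta
    deltas = []
    for old, new in zip(previous, current):
        d = new - old
        if abs(d) > max_delta:
            return None  # would overflow the 16-bit representation
        deltas.append(round(d * scale))
    return deltas


def apply_deltas(previous, deltas, max_delta=1.0):
    """Unpack the deltas and add them to the previous values."""
    scale = max_delta / INT16_MAX
    return [old + q * scale for old, q in zip(previous, deltas)]


def packed_size(deltas):
    """Size in bytes of the deltas once packed as 16-bit integers."""
    return len(struct.pack(f"<{len(deltas)}h", *deltas))
```

Note that in this sketch the rounding error of each update is not fed back into the next one, so repeated updates could drift; how a real implementation deals with that is outside the scope of the example.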
Week 10 15 - 19 March 2021
Somewhat slow week, which was mostly spent refactoring the delta compression support based on the stakeholder's concerns and desires. I also did a lot of benchmarks to check whether it is indeed faster, which it is if we exclude the unconditional host memory update that I was doing. Generalized the compression formula so ranges outside of [-1.0, 1.0] are also supported.
Pre-computing the deltas is done in the procedural and stored in an attribute, however this has drawbacks: we lose the ability to have deltas and displacement or subdivision at the same time, but also we need to take care that the deltas are not applied on transformed data. Cycles will pre-apply the transformations in some cases, so we might need to also transform the deltas, or avoid applying the transformation if deltas can be done.
As an aside, I also did some profiling and exploratory work to speed up edit mode. It seems the bottleneck is in the data transfers to the GPU, where we upload more data than necessary. The Blender modeling code might need to become smarter so that the render engine knows exactly what was modified. This would be similar to what I did in Cycles. The modeling team is quite busy, so I might pick this up, as it is tied to rendering and the community is desperately asking for improvements in this area.
Week 11 22 - 26 March 2021
This week was primarily spent fixing and improving the Alembic procedural caching logic based on the stakeholder's feedback and code review. The cache prefetching mechanism did not handle a lot of edge cases: essentially, it assumed that data access was linear in time and that no jump in frames greater than the number of cached frames could occur. This would make us access data at out-of-bounds indices, or produce empty geometries, making Cycles crash. A setting to disable caching was also added.
Some more work on delta compression to take min/max motion ranges into account, allowing objects with motion changes > 1.0 to also have deltas.
- Alembic procedural: deduplicate cached data across frames (781f41f633)
- Procedural branch:
- Fix missing shader update when adding shaders to Alembic objects
- Add a way to ignore subdivision data from Alembic, those will be loaded as regular polygon meshes
- Make deltas an attribute, although I think they should be a standard socket as we also have sockets for min/max deltas
- Bug fixes:
Week 12 29 March - 2 April 2021
This week was mostly spent maintaining the Alembic procedural, fixing some bugs and design issues reported by the stakeholder, as well as looking into some more device optimizations for Cycles.
I started refactoring the cache system to be more robust. However, after some discussion, it was decided to remove cache prefetching for the time being, as it is a bit complicated to make work reliably, and instead focus on making the current procedural nice and robust. The cache controls are now a simple boolean to enable caching and a parameter to set the maximum cache size in megabytes; if the procedural cannot fit the entire file in memory within the limit, the render is aborted.
I started refactoring the data reading routines to be more generic and handle edge cases. This has revealed a number of flaws in the system: shader assignments were ignoring changes in topology, and subdivision objects were not reading UVs and were using more memory than necessary. This work is almost done, and will also allow loading custom attributes in a nice and generic manner.
We also had a discussion about making time handling more robust. Alembic is time based, which means that data has to be looked up using the time in seconds, instead of the frame number. This requires knowing the correct frame rate (frames per second) used to create the file. This knowledge is not present in the Alembic archives, and so for now both the Blender importer and the Cycles procedural require an explicit FPS to be specified. Usually, this comes from the Scene settings.
However, it is possible to use different archives with different frame rates in the same scene, so using a single setting for all of them is wrong. We could have a setting on the Cache File in Blender to set this, but this requires knowing in advance which FPS was used, and given that productions may reuse old files, this knowledge may have been lost. It should be possible though to detect this from the archive.
Outside of the procedural, I am also starting to refactor the attribute system in Cycles to be more generic, and allow for more granular device updates. Currently when adding or removing an attribute of some type (e.g. float2 for UVs) on a Geometry we also update other attribute types (e.g. the float3 attributes for generated coordinates). Another quirk is that the attributes' device data is also updated even when the attribute is not stored there (which is for now only the case for normals).
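The frame rate dependency can be shown with a tiny Python sketch. Both functions are hypothetical helpers, not Alembic or Blender API, and `estimate_fps` assumes the archive was written with exactly one uniformly spaced sample per frame, which will not hold for every file.

```python
def frame_to_seconds(frame, fps):
    """Convert a frame number to the time in seconds used for the lookup."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame / fps


def estimate_fps(sample_times):
    """Guess the frame rate from the spacing of the first two sample times.

    Returns None when there is not enough information to make a guess.
    """
    if len(sample_times) < 2:
        return None
    step = sample_times[1] - sample_times[0]
    if step <= 0:
        return None
    return round(1.0 / step, 3)
```

For example, frame 48 maps to 2.0 seconds at 24 FPS but to 1.6 seconds at 30 FPS, so using the wrong frame rate looks up the wrong sample.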
Next week, I shall continue working on those refactors, and improving the Blender Alembic importer to better support reloading a Cache File, and to change the Cache File to a different version (with a different file path).
- Bug fixes:
Week 13 5 - 9 April 2021
This week I continued the Alembic procedural refactor mentioned in last week's report; it is almost ready, although it needs proper testing. There is already some better support for UV and vertex color attributes for subdivision meshes, and custom attributes should be somewhat supported for regular and subdivided meshes.
I also spent some time debugging some issues reported by the project's main stakeholder. Mainly those were about race conditions in the multi device code, so we decided to remove some of my work until those race conditions are fixed.
One exploratory piece of work I did to refocus on the API was to create a fuzzer for Cycles. Simply put, the fuzzer generates random scenes and asks for a render. For now, I only got it to execute a device update from the Scene, so actual renders are not performed. It is designed to be some sort of client application which uses Cycles, so it should not have access to or knowledge about Cycles' internals, which I think makes it a good candidate to study the API and its quirks. This is not high priority for me, so I don't know when I am going to resume working on this, as it is more of a distraction from the project, but for the few hours I spent on it, it did make me focus on the API and the documentation, and helped me find some corner cases. One issue I found is that shaders only used by procedurals defined by external applications are not compiled, as the ShaderManager compiles shaders before the procedurals can generate their data. I already have a fix for that.
On another front, I added an operator on the MeshSequenceCache modifier to update the Scene frame range based on information found in the archive. This is useful to update the Scene frame range when reloading an Alembic cache which has a different animation timing.
Week 14 12 - 16 April 2021
This week I finished refactoring the data reading for the Alembic procedural. It will be committed once the patch is cleaned up a little and all my test files pass. So far so good: attribute reading is now nice and mostly generic, although it still needs a nice solution to handle index mapping between Alembic and Cycles data structures for Curves/Hair, but that needs example files.
For the Alembic procedural, some more data is precomputed and cached (mostly normals) as requested by the project's stakeholder in order to avoid computing this data in the device update and speed updates up a little bit.
I also improved the fix for shader detection for external Procedurals mentioned in last week's report and sent it to code review.
For subdivision rendering in the viewport, I investigated how to possibly render this in a compute pass, as discussed in the module meeting. Nothing came out of this yet due to time constraints; I will start actually coding things next week.
- D10965 Cycles: use reference counting to detect used shaders.
Week 15 & 16 19 - 30 April 2021
The last two weeks were spent working on OpenSubdiv for the viewport, doing some tests in Cycles, and bringing the current state of the cycles_procedural_api branch to master by updating current patches or starting to extract patches for code review.
For OpenSubdiv, it appears to be a bit tricky to use the proper indices for the vertex buffer (which should be computed by the library) so I guess I would need to separate subdivision data into a dedicated GPUBatch; I am currently overwriting the regular surface batch. Normals also would have to be computed somehow in the geometry or vertex shader, as well as keeping the polygon/shader associations.
- Code Reviews
- Requested Changes D11127: Fix T87929: Cycles 'Indirect Only' collection property missing update.
Week 17, 18 & 19 - 3 - 22 May 2021
The past 3 weeks were mostly spent wrapping up and updating patches for the first installment of the Alembic procedural and associated Cycles optimizations. I also did some work on the Node API documentation, although it is in French and I would need to translate it. I am not much of an English writer, so I prefer to write such specs in French before translating them. Hopefully translation from my native language will make for a better read. No real progress was made for viewport subdivision. My current attempt to implement it is not working, and I am still not sure where the issue (or issues) is, most likely because it is too "shallow"; it might need a deeper integration into the drawing code with a dedicated surface batch.
Since I forgot to do some weekly reports, I may have forgotten quite a few details.
- General :
- Bug Fixes :
- Patches :
- Updated D10197: Cycles: experimental integration of Alembic procedural in viewport rendering.
- D11154: Cycles: allow Optix BVH refit for background rendering.
- D11156: Alembic: operator to set the Scene frame range from the Archive time information.
- D11162: Alembic Procedural: setting to ignore subdivision.
- D11163: Alembic Procedural: basic cache control settings.
- D11373: Cycles: optimize attributes device updates.
Week 20 & 21 - 26 May - 4 June 2021
The past two weeks were spent revising some patches in code review and bringing the GPU implementation of OpenSubDiv to fruition.
For OpenSubDiv, I initially tried to rewrite the implementation using a custom batch cache for subdivision data as connectivity is not preserved in the current Mesh batch cache, however, a much simpler solution was found: adding an extra VBO to the current GPUBatches. This also made subdivision in edit mode a lot simpler to handle, as the cage (that which artists modify) can simply be created and drawn using the current code.
Until now I was using a geometry shader to create triangles from the subdivision data. This was removed and is currently being replaced by a compute shader, since this is the direction that we want to go in (as geometry shaders complicate the shading code). Normals will have to be computed somehow in this compute shader and I am not sure how this would be done yet (maybe using a couple of passes for smooth normals). For now, I am using a CPU based triangulation and normal computation, which requires copying the data from the GPU and processing it before reuploading it, which degrades performance compared to the geometry shader.
Overall the implementation will need to be reviewed. Until now I preferred implementing my own version of the subdivision routines, disregarding the currently available Blender OpenSubDiv API, to make sure that any bug is my own and to fully understand what is needed or not. Reusing the Blender OSD API and extending it with a GPUEvaluator is not so complicated and should be quick now that the code works as expected.
- Updated Patches :
- Plans for next week :
- finish the compute shader
- make use of the Blender OSD API and extend it
- implement the logic to generate triangles from an adaptive patch
- make UVs and vertex colors work
- fix missing subdivision updates and data corruption in edit mode
Week 22 - 4 - 11 June 2021
Last week was mostly spent revising patches and working on new Alembic features/improvements. For viewport subdivision, caching appears to be tricky: the cache needs a safe place to be stored, but storing it in the Mesh runtime data does not appear to work, as it gets cleared with every single update. Maybe there is a way I don't know about yet to keep it around as long as the topology or subdivision settings stay the same.
- Revised Patches:
- New Patches: