Weekly Reports 2021
Week 1-2 4 - 15 January 2021
- Fixed remaining known issues in optimization patch (D9555), to be landed in master
- Started working on improvements to OpenSubDiv in Blender and Cycles, based on features present in Alembic
- face holes (faces that are deleted after the subdivision was made, not sure how useful, will need user feedback)
- vertex creasing
- studied how rendering subdivision could be done
- Reviewed D9961, to be landed
- Started working on generating more complicated files (with higher resolution objects) for testing the optimization work, and perhaps do a second round of optimization.
Week 3 18 - 22 January 2021
Kept working on making test files for the optimization work with more complicated scenes and higher polygon count objects, and continued working on improving subdivision tools and rendering.
- Cycles: optimize device updates (bbe6d44928)
- Subdivision: add support for vertex creasing (D10145) (rework based on quick review in progress, but might need a redesign of how subdivision is handled in Blender to properly support such a feature)
Week 4 25 - 29 January 2021
- Continued profiling and trying to optimize Cycles updates and the Alembic procedural, defined some areas to look at next week.
- Continued work on subdivision a little bit.
- Worked on proxies for the Alembic procedural (D10197, patch to be updated)
- Bug fixes :
Week 5 1 - 5 February 2021
- Cycles: investigated ways to reduce memory usage for socket and attribute data (reference counting, constant value optimization), but this made ccl::array complex and slower, might retry in a different way
- Alembic procedural: added support for instancing
- Alembic procedural: started working on a cache policy
- Worked on documentation for the Cycles Node API, and the Cycles Scene in general
- Bug fixes :
- (unreported) Alembic procedural: fix crash when cancelling a render during synchronization (55c88e5582)
Week 5 8 - 12 February 2021
- Cycles: investigated ways to reduce data transfer to the GPU
- Alembic Procedural: continued work and testing for the cache policy
- Finalized D10197: Cycles: experimental integration of Alembic procedural in viewport rendering
- Bug fixes :
Week 6 15 - 19 February 2021
This week I continued working on memory usage and data transfer optimizations. Patches are mostly ready to be extracted for code review.
Patches will be for:
- partial updates of the device arrays, to retransfer to the GPU(s) only the parts of the buffer that have been modified (in my files data transfers were cut by 60% at best, more can be achieved depending on the file).
- delta compression of the vertices and curve keys: this computes the difference in positions from the last update, stores it in 16 bits (instead of 32 bits) and sends that to the GPU, where it is applied to the original memory (thus saving 50% of data transfer for those data).
- octahedron compression of the normals: this projects the normals onto a unit octahedron and stores the results in 8 bits, achieving a 75% memory saving. The normals are decompressed as needed in the kernel (this is a lossy compression, so slight render differences, unnoticeable to the eye, may occur; one test is failing so I am going to try a different approach).
- lazy array allocation, this only allocates memory for arrays if we store more than one value in them. Arrays are used throughout the code base, however, the following arrays are allocated on the heap even when they may hold a single constant value:
- Hair.radius (stores a float for every curve key)
- Hair.curve_shader (stores an integer for every curve)
- Hair.used_shaders (stores the list of shaders used by the object, useless heap allocation if only one shader is used)
- Mesh.shader (stores an integer per triangle, redundant if all triangles use the same shader)
- Mesh.smooth (stores a bool per triangle, redundant if all triangles are smooth)
- Mesh.used_shaders (stores the list of shaders used by the object, useless heap allocation if only one shader is used)
- Mesh.subd_shader (stores an integer per polygon, redundant if all polygons are using the same shader)
- Mesh.subd_smooth (stores a bool per polygon, redundant if all polygons are smooth)
- ImageHandle.tile_slots (stores the UDIM tiles used by the Image, allocating on the heap is a bit much if only using a single tile)
In spring.blend, this saves about 140 MB of memory (I forgot to note down the total amount of memory, so I cannot give a percentage).
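To illustrate the octahedron compression of normals mentioned in the list above, here is a minimal Python sketch of the general technique. It is not the code from the patch: the function names are hypothetical, and it assumes that each of the two octahedral coordinates is quantized to one 8-bit value, which may differ from the storage layout actually used in Cycles.

```python
import math

def _sign(value):
    # Sign that never returns 0, so the fold below stays well defined.
    return 1.0 if value >= 0.0 else -1.0

def encode_octahedral(normal):
    """Project a unit normal onto the octahedron and quantize to two 8-bit values.

    Hypothetical helper, assuming one byte per octahedral coordinate.
    """
    x, y, z = normal
    s = abs(x) + abs(y) + abs(z)
    u, v = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals of the unit square.
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    # Map [-1, 1] to the integer range [0, 255].
    return (int(round((u * 0.5 + 0.5) * 255.0)),
            int(round((v * 0.5 + 0.5) * 255.0)))

def decode_octahedral(encoded):
    """Recover an approximate unit normal from the two 8-bit values (lossy)."""
    u = encoded[0] / 255.0 * 2.0 - 1.0
    v = encoded[1] / 255.0 * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Undo the fold for normals that pointed to the lower hemisphere.
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)
```

Because of the quantization, the decoded normal is only close to the original one, which is consistent with the small render differences noted above.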
Week 7 22 - 25 February 2021
This week I mainly revised some of the patches I worked on last week before sending them to code review (although I haven't set any reviewers yet, so I can tackle some remaining todos).
The patch for normals compression I talked about in my last report has been neither revised nor sent to code review yet.
I decided to stop working on lazy allocation for the time being, as there are too many edge cases, especially when compiling shaders. Using lazy allocation there makes shader execution fail because some data is missing: the array is only allocated once a value added to it differs from the first one.
I also did some profiling and tried to optimize Blender a bit, as opening/loading large files takes a while. Some of the hotspots cannot be optimized easily or at all (like computing Bézier curves, or tessellation), but inlining more math functions does help.
- Bug Fixes:
Week 8 1 - 5 March 2021
- Improvements to the Alembic procedural (fixed missing updates 00f218602d, and infinite update loop ac4d45dbf1)
- Finished first implementation of cache prefetching for the Alembic procedural, this uses a secondary cache to preload data for the next N frames, to avoid stalls when playing back animations in the viewport
- Cycles: made delta compression more robust and handle potential overflows gracefully; however, it seems to slow down the device update, so I will need to investigate
- Also did some bug triaging, no fixes though.
Week 9 8 - 12 March 2021
This week was spent mostly making the Alembic procedural work for objects with dynamic topologies (e.g. fluid simulations). This included handling changes in data size and reducing memory usage, as the procedural in this case will load redundant data: if the topology is static between frames 1 and 25 with a sudden change at frame 26, we would load the same data 25 times, which is not nice.
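The redundancy problem described above can be sketched as follows. This is only an illustration under my own assumptions, not the procedural's actual data structure: the class and method names are hypothetical, and the data itself is used as the deduplication key, whereas a real implementation would more likely compare a digest of the sample.

```python
class DedupCache:
    """Hypothetical cache that stores each distinct data block only once."""

    def __init__(self):
        self.blocks = []           # unique data blocks, in insertion order
        self.frame_to_block = {}   # frame number -> index into self.blocks
        self._key_to_index = {}    # deduplication key -> index into self.blocks

    def add(self, frame, data):
        # Assumption: the data is small enough to be used directly as a key.
        key = tuple(data)
        index = self._key_to_index.get(key)
        if index is None:
            # First time this data is seen: store it.
            index = len(self.blocks)
            self.blocks.append(list(data))
            self._key_to_index[key] = index
        # Every frame only records which block it refers to.
        self.frame_to_block[frame] = index

    def get(self, frame):
        return self.blocks[self.frame_to_block[frame]]


# Topology is static for frames 1 to 25 and changes at frame 26.
cache = DedupCache()
for frame in range(1, 26):
    cache.add(frame, [0, 1, 2, 2, 3, 0])
cache.add(26, [0, 1, 2, 2, 3, 4, 4, 0, 2])
```

With this layout, the 25 static frames share a single stored block instead of 25 copies.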
Other work included starting to refactor the delta compression support: the initial implementation was simplistic and optimized; it worked, but it did not handle potential integer overflows in the compressed representation. After making it more robust to those overflows, it slowed down the device update (we want to be faster), the culprit being the computation of the deltas. Sending deltas to the GPU and unpacking there is still faster than sending uncompressed data. So now the idea is to put the deltas in an attribute, cache that in the Alembic procedural, and simply copy them to the device and unpack them there during the updates. That should remove the cost of computing the deltas, and speed up device updates again.
Week 10 15 - 19 March 2021
Somewhat slow week, which was mostly spent refactoring the delta compression support based on the stakeholder's concerns and desires. I also ran a lot of benchmarks to check whether it is indeed faster, which it is if we exclude the unconditional host memory update that I was doing. I generalized the compression formula so that ranges outside of [-1.0, 1.0] are also supported.
Pre-computing the deltas is done in the procedural and the result is stored in an attribute. However, this has drawbacks: we lose the ability to have deltas and displacement or subdivision at the same time, and we also need to take care that the deltas are not applied on transformed data. Cycles will pre-apply the transformations in some cases, so we might need to also transform the deltas, or avoid applying the transformation if deltas can be used.
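As an illustration of the 16-bit delta compression discussed above, here is a minimal Python sketch. It is not the Cycles implementation: the function names are hypothetical, and the scaling by a single symmetric maximum delta is my assumption of how a range outside of [-1.0, 1.0] could be supported.

```python
INT16_MAX = 32767

def compress_deltas(previous, current):
    """Quantize the per-value change between two updates to signed 16-bit integers.

    Returns the packed deltas and the maximum absolute delta used as the scale.
    """
    deltas = [c - p for p, c in zip(previous, current)]
    max_delta = max((abs(d) for d in deltas), default=0.0)
    if max_delta == 0.0:
        # Nothing moved: all deltas are zero and no scale is needed.
        return [0] * len(deltas), 0.0
    scale = INT16_MAX / max_delta
    packed = []
    for d in deltas:
        q = int(round(d * scale))
        # Clamp so that rounding can never overflow the 16-bit range.
        packed.append(max(-INT16_MAX, min(INT16_MAX, q)))
    return packed, max_delta

def apply_deltas(previous, packed, max_delta):
    """Rebuild the current values from the previous ones (done on the device in practice)."""
    step = max_delta / INT16_MAX
    return [p + q * step for p, q in zip(previous, packed)]


previous = [0.0, 1.0, 2.0]
current = [0.5, -2.0, 2.0]
packed, max_delta = compress_deltas(previous, current)
restored = apply_deltas(previous, packed, max_delta)
```

The restored values differ from the originals by at most one quantization step, so the scheme is lossy and the error can accumulate if deltas are applied over many updates without refreshing the base data.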
As an aside, I also did some profiling and exploratory work to speed up edit mode. It seems that the bottleneck is in the data transfers to the GPU, where we upload more data than necessary. The Blender modeling code might need to become smarter so that the render engine can know exactly what was modified. This would be similar to what I did in Cycles. The modeling team is quite busy so I might pick this up, as it is tied to rendering and the community is desperately asking for improvements in this area.
Week 11 22 - 26 March 2021
This week was primarily spent fixing and improving the Alembic procedural caching logic based on the stakeholder's feedback and code review. The cache prefetching mechanism did not handle a lot of edge cases: essentially, it assumed that data access was linear in time, and that no jumps in frame greater than the number of cached frames could occur. This would make us access data at out-of-bounds indices, or produce empty geometries, making Cycles crash. A setting to disable caching was also added.
Some more work on delta compression to take min/max motion ranges into account, allowing objects with motion changes > 1.0 to also have deltas.
- Alembic procedural: deduplicate cached data across frames (781f41f633)
- Procedural branch:
- Fix missing shader update when adding shaders to Alembic objects
- Add a way to ignore subdivision data from Alembic, those will be loaded as regular polygon meshes
- Make deltas an attribute, although I think they should be a standard socket as we also have sockets for min/max deltas
- Bug fixes:
Week 12 29 March - 2 April 2021
This week was mostly spent maintaining the Alembic procedural, fixing some bugs and design issues reported by the stakeholder, as well as looking into some more device optimizations for Cycles.
I started refactoring the cache system to be more robust. However, after some discussion, it was decided to remove cache prefetching for the time being, as it is a bit complicated to make it work reliably, and to focus instead on making the current procedural nice and robust. The cache controls are now a simple boolean to enable caching, and a parameter to set the maximum cache size in megabytes; if the procedural cannot fit the entire file in memory within the limit, the render is aborted.
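The two cache controls described above (an enable flag and a maximum size in megabytes, with an abort when the limit is exceeded) could behave roughly as in the following Python sketch. All names are hypothetical and the byte accounting is simplified; it only shows the intended policy, not the procedural's code.

```python
class CacheLimitExceeded(RuntimeError):
    """Raised when the data does not fit within the configured cache size."""


class FrameCache:
    """Hypothetical cache with an enable flag and a size limit in megabytes."""

    def __init__(self, use_cache, max_size_mb):
        self.use_cache = use_cache
        self.max_bytes = max_size_mb * 1024 * 1024
        self.used_bytes = 0
        self.frames = {}

    def store(self, frame, data):
        if not self.use_cache:
            # Caching disabled: nothing is kept around.
            return
        if self.used_bytes + len(data) > self.max_bytes:
            # The caller is expected to abort the render on this error.
            raise CacheLimitExceeded(
                f"frame {frame} does not fit in the {self.max_bytes} byte cache")
        self.frames[frame] = data
        self.used_bytes += len(data)


cache = FrameCache(use_cache=True, max_size_mb=1)
cache.store(1, bytes(512 * 1024))
cache.store(2, bytes(512 * 1024))
try:
    cache.store(3, bytes(1))
    aborted = False
except CacheLimitExceeded:
    aborted = True
```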
I started refactoring the data reading routines to be more generic and handle edge cases. This has revealed a number of flaws in the system: shader assignments were ignoring changes in topology, and subdivision objects were not reading UVs and were using more memory than necessary. This work is almost done, and will also allow loading custom attributes in a nice and generic manner.
We also had a discussion to make time handling more robust. Alembic is time based, which means that data has to be looked up using the time in seconds, instead of the frame number. This requires knowing the correct frame rate (frames per second) used to create the file. This knowledge is not present in the Alembic archives, and so for now both the Blender importer and the Cycles procedural require an explicit FPS to be specified. Usually, this comes from the Scene settings.
However, it is possible to use different archives with different frame rates in the same scene, so using a single setting for all of them is wrong. We could have a setting on the Cache File in Blender to set this, but this requires knowing in advance which FPS was used; given that productions may reuse old files, this knowledge may have been lost. It should be possible, though, to detect this from the archive.
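The time handling discussed above can be illustrated with a small Python sketch. The function names are hypothetical, and the detection step is only a guess at one possible approach: it assumes that the sample times stored in the archive are available as a list of seconds and that the file was written with exactly one sample per frame, which is not true for files with sub-frame samples.

```python
def frame_to_time(frame, fps):
    """Convert a frame number to the time in seconds used for the lookup."""
    return frame / fps

def detect_fps(sample_times, tolerance=1e-6):
    """Guess the frame rate from evenly spaced sample times (in seconds).

    Returns None when there are too few samples or the spacing is irregular.
    """
    if len(sample_times) < 2:
        return None
    steps = [b - a for a, b in zip(sample_times, sample_times[1:])]
    first = steps[0]
    if first <= 0.0 or any(abs(s - first) > tolerance for s in steps):
        return None
    return 1.0 / first


# Hypothetical archive written at 24 frames per second, frames 1 to 5.
times = [frame / 24.0 for frame in range(1, 6)]
fps = detect_fps(times)
```

Looking up frame 48 of that archive then uses `frame_to_time(48, fps)`, that is a time of about 2 seconds, regardless of the frame rate set on the Scene.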
Outside of the procedural, I am also starting to refactor the attribute system in Cycles to be more generic, and allow for more granular device updates. Currently when adding or removing an attribute of some type (e.g. float2 for UVs) on a Geometry we also update other attribute types (e.g. the float3 attributes for generated coordinates). Another quirk is that the attributes' device data is also updated even when the attribute is not stored there (which is for now only the case for normals).
Next week, I shall continue working on those refactors, and improving the Blender Alembic importer to better support reloading a Cache File, and to change the Cache File to a different version (with a different file path).
- Bug fixes:
Week 13 5 - 9 April 2021
This week I continued the Alembic procedural refactor mentioned in last week's report; it is almost ready, although it needs proper testing. There is already some better support for UV and vertex color attributes for subdivision meshes, and custom attributes should be somewhat supported for regular and subdivided meshes.
I also spent some time debugging some issues reported by the project's main stakeholder. Mainly those were about race conditions in the multi device code, so we decided to remove some of my work until those race conditions are fixed.
One piece of exploratory work I did to refocus on the API was to create a fuzzer for Cycles. Simply put, the fuzzer will generate random scenes and ask for the render. For now, I only got it to execute a device update from the Scene, so actual renders are not performed. It is designed to be some sort of client application which uses Cycles, so it should not have access to or knowledge about Cycles' internals, which I think makes it a good candidate to study the API and its quirks. This is not high priority for me, so I don't know when I am going to resume working on this, as it is more of a distraction from the project, but it did, for the few hours I spent on it, make me focus on the API and the documentation, and helped me find some corner cases. One issue I found is that shaders only used by procedurals defined by external applications are not compiled, as the ShaderManager compiles shaders before the procedurals can generate their data. I already have a fix for that.
On another front, I added an operator on the MeshSequenceCache modifier to update the Scene frame range based on information found in the archive. This is useful to update the Scene frame range when reloading an Alembic cache which has a different animation timing.
Week 14 12 - 16 April 2021
This week I finished refactoring the data reading for the Alembic procedural. It will be committed once the patch is cleaned up a little, and all my test files pass. So far so good, attribute reading is now nice and mostly generic, although it still needs a nice solution to handle index mapping between Alembic and Cycles data structures for Curves/Hair, but that needs example files.
For the Alembic procedural, some more data is precomputed and cached (mostly normals) as requested by the project's stakeholder in order to avoid computing this data in the device update and speed updates up a little bit.
I also improved the fix for shader detection for external Procedurals mentioned in last week's report and sent it to code review.
For subdivision rendering in the viewport, I investigated how to possibly render this in a compute pass as discussed in the module meeting. Nothing came out of this yet due to time constraints; I will start actually coding things next week.
- D10965 Cycles: use reference counting to detect used shaders.
Week 15 & 16 19 - 30 April 2021
The last two weeks were spent working on OpenSubdiv for the viewport, doing some tests in Cycles, and bringing the current state of the cycles_procedural_api branch to master by updating current patches or starting to extract patches for code review.
For OpenSubdiv, it appears to be a bit tricky to use the proper indices for the vertex buffer (which should be computed by the library) so I guess I would need to separate subdivision data into a dedicated GPUBatch; I am currently overwriting the regular surface batch. Normals also would have to be computed somehow in the geometry or vertex shader, as well as keeping the polygon/shader associations.
- Code Reviews
- Requested Changes D11127: Fix T87929: Cycles 'Indirect Only' collection property missing update.
Week 17, 18 & 19 - 3 - 22 May 2021
The past 3 weeks were mostly spent wrapping up and updating patches for the first installment of the Alembic procedural and associated Cycles optimizations. I also did some work on the Node API documentation, although it is in French and I would need to translate it. I am not much of an English writer, so I prefer to write such specs in French before translating them. Hopefully translation from my native language will make for a better read. No real progress was made on viewport subdivision. My current attempt to implement it is not working, and I am still not sure where the issue or issues are; most likely it is too "shallow", and it might need a deeper integration into the drawing code with a dedicated surface batch.
Since I forgot to do some weekly reports, I may have forgotten quite a few details.
- General :
- Bug Fixes :
- Patches :
- Updated D10197: Cycles: experimental integration of Alembic procedural in viewport rendering.
- D11154: Cycles: allow Optix BVH refit for background rendering.
- D11156: Alembic: operator to set the Scene frame range from the Archive time information.
- D11162: Alembic Procedural: setting to ignore subdivision.
- D11163: Alembic Procedural: basic cache control settings.
- D11373: Cycles: optimize attributes device updates.
Week 20 & 21 - 26 May - 4 June 2021
The past two weeks were spent revising some patches in code review and bringing the GPU implementation of OpenSubDiv to fruition.
For OpenSubDiv, I initially tried to rewrite the implementation using a custom batch cache for subdivision data as connectivity is not preserved in the current Mesh batch cache, however, a much simpler solution was found: adding an extra VBO to the current GPUBatches. This also made subdivision in edit mode a lot simpler to handle, as the cage (that which artists modify) can simply be created and drawn using the current code.
Until now I was using a geometry shader to create triangles from the subdivision data. This was removed and is currently being replaced by a compute shader, since this is the direction that we want to go in (as geometry shaders complicate the shading code). Normals will have to be computed somehow in this compute shader and I am not sure how this would be done yet (maybe using a couple of passes for smooth normals). For now, I am using a CPU-based triangulation and normal computation, which requires copying the data from the GPU and processing it before reuploading it, which degrades performance compared to the geometry shader.
Overall the implementation will need to be reviewed. Until now I preferred implementing my own version of the subdivision routines, disregarding the currently available Blender OpenSubDiv API, to make sure that any bug is my own and to fully understand what is needed or not. Reusing the Blender OSD API and extending it with a GPUEvaluator is not so complicated and should be quick now that the code works as expected.
- Updated Patches :
- Plans for next week :
- finish the compute shader
- make use of the Blender OSD API and extend it
- implement the logic to generate triangles from an adaptive patch
- make UVs and vertex colors work
- fix missing subdivision updates and data corruption in edit mode
Week 22 - 7 - 11 June 2021
Last week was mostly spent revising patches, and working on new Alembic features/improvements. For viewport subdivision, caching appears to be tricky: the cache needs a safe place to be stored, but storing it in the Mesh runtime data does not appear to work, as it gets cleared with every single update. Maybe there is a way I don't know about yet to keep it around as long as the topology or subdivision settings stay the same.
- Revised Patches:
- New Patches:
Week 23 - 14 - 18 June 2021
The week was mostly spent revising the OpenSubdiv GPU acceleration work. After some issues with face varying data, most likely due to mixing up face corner indices, resulting in face varying patches being scrambled over the faces, I decided to align the implementation with what the CPU side is doing: we generate uniform patch coordinates over the coarse faces and evaluate the subdivided patches to find the limit value. Since this can go in master pretty quickly, contrary to subdivision settings in the mesh datablock, the subdivision is now again performed on the GPU if there is a subdivision surface modifier at the end of the modifier list. Builds were shared with the community, and after some testing there are bugs to fix.
- Alembic: support reading per-vertex UV sets (3385c04598)
- Revised Patches:
- D10197: Cycles: experimental integration of Alembic procedural in viewport rendering.
Week 24 - 21 - 25 June 2021
This week was spent partly on furthering the OpenSubdiv GPU acceleration work, and doing some bug fixes.
- Added support for multiple UV maps
- Started implementing the various edit mode visualization options (wireframes, show on cage, etc.)
- Support per-face smooth flags
- Bug Fixes:
- OptiX: select BVH build options from Scene params (cd39e3dec1)
- Revised Patches:
- D10145: Subdivision: add support for vertex creasing.
Week 25 - 26, 28 June - 9 July 2021
The past two weeks were mostly spent improving the GPU OpenSubDiv project. I fixed multiple drawing issues when in edit mode, and finalized the implementation of show_on_cage for the subsurf modifier. Some memory usage improvements were achieved, although there is still room for more. I also implemented a GPU version of the PatchMap, which makes it possible to reduce memory usage by looking up the final patch coordinates on the GPU based on the patch coordinate description built on the CPU side.
- Bug Fixes
- Fix T70615: Cycles ignores normal inputs when fed by nodes without inputs from other nodes (e7fc15e2ef)
- Fix T87194: Attribute Node not working with Cycles Volume (a5ed075110)
- Fix T89455: Crash when using Cycles preview or render - custom data layer / auto smooth (2dbb492268)
Week 27 - 28, 12 - 23 July 2021
The last three weeks were spent improving performance of the OpenSubdiv GPU implementation to better match user expectations. Improvements and benchmarks were made against files provided by users, as well as files available from the Blender Cloud.
Another area of work was feature completeness and bug fixing of the already implemented features. This includes support for displaying loose vertices and edges, which frequently appear during modeling, x-ray mode, multiple UV maps, selection in edit mode, per-face material assignments, and better optimal display.
Vertex normals have to be manually computed when adaptive patches (limit surfaces) are not used, as artefacts appeared at patch boundaries.
The patch evaluation kernel was moved to the Blender side, instead of using OpenSubdiv's black box evaluator. This is so that we can better control its behaviour, but also reduce memory usage. Prior to this change, we would use temporary arrays to store evaluated vertices and derivatives, which would use 12 bytes per subdivided vertex. This was because it was not possible to fill the various draw buffers directly from the evaluation shader; we had to use a level of indirection. Now, the buffers (for positions, normals, UVs, and edit mode face center display) are filled as we evaluate patches.
Week 29-33, 26 July - 27 August 2021
As I forgot to do a few weekly reports, I also forgot most of the details of what was done. I also got quite ill in the past couple of weeks which put me behind in my work, however, I can still do some overtime to make up for it.
Most of the remaining Cycles Alembic Procedural patches for the UI and cache controls were at last committed. A draft of the article on the Facebook collaboration on Cycles optimizations was made; it will be sent for review after translating it (I am not that good of an English writer, so I made the draft in French).
The OpenSubDiv GPU acceleration project is nearing the end. The last tricky thing to do is to get the dependency graph part right, which would let other objects in the scene use the subdivision result on the CPU side while still using the GPU for acceleration. An implementation of it was made, with good results. For example, it was possible to use a subdivided mesh as an operand in the boolean modifier with correct results while transforming the base mesh (i.e. the mesh on which the boolean modifier is applied). However, this implementation was not quite correct, and led to some crashes. A new version is in the works and should be ready this week; then the patch will be sent to code review.
Other work done includes bug fixes, code review, and gathering some ideas for improving support for the volume object in Blender.
- Cycles: avoid copying vertex normals attribute twice to the devices (b8ecdbcd96)
- Cycles: use object coordinates when generated coordinates are missing (6b041ad3d0)
- Alembic import: option to always add a cache reader (5b97c00e9f)
- Cycles: experimental integration of Alembic procedural in viewport rendering (51862c8445)
- Alembic Procedural: only subdivide if subsurf modifier is present (f8637cd8af)
- Alembic Procedural: basic cache control settings (9bfc47c933)
- Bug Fixes:
- Fix T82336: Cycles standard attributes missing in displacement shaders (60d6333b80)
- Fix T77307: Particle Info Node Does Not Consider Time Remapping (23132fcdc1)
- Fix T90776: Cycles normal map node produces artifacts (c0dd6f1164)
- Fix T90854: Cycles, normal map fails with applied transformations. (bffa168157)
- Code review:
- D12305: Modifiers: export motion blur velocity through attribute. (Requesting changes, I still need to look into the cause of the issues)
Week 34-36, 31 August - 17 September 2021
The past few weeks were mostly spent revising the OpenSubDiv patch based on code review comments. There are some divergent points of view between reviewers on some architectural decisions I made for this feature regarding how to pass GPU buffers between the OpenSubDiv API and Blender, but I think we reached common ground. There is still some review work to do, so I am not sure if it can make it into Blender 3.0. I also spent some time debugging and improving the patch and its functionality based on user tests.
Some other time was spent rewriting some Alembic patches. Some of them were drafts already shared with interested users and developers (D11592, D11591) but put aside while working on OpenSubDiv. One of them is for supporting custom attribute import. I had to restart the implementation, as I was not happy with some hardcoded decisions, and, based on user suggestions, I parameterized them. For example, some software exports vector data as flat arrays of floats, instead of a more common structure: an array of 3d vectors might be written as an array of floats 3 times the size of the intended domain (vertex, face, etc.). Instead of trying to detect these cases manually, which is rather tricky, I added some UI to add mappings from source data to destination data. This will also make it a bit easier to handle and convert vertex colors and UVs to Blender.
Related, I added some way to specify which attributes to load, so we do not load every single attribute from the files, which might be time consuming and memory heavy.
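To make the flat-array case described above concrete, here is a minimal Python sketch of regrouping such data once the user has stated how it should be interpreted. The function name and its parameters are hypothetical and only stand in for the mapping set up through the UI; this is not the importer's code.

```python
def regroup_flat_array(values, domain_size, components=3):
    """Interpret a flat list of floats as `domain_size` vectors of `components` floats.

    Raises ValueError when the sizes do not match the requested mapping.
    """
    if len(values) != domain_size * components:
        raise ValueError(
            f"expected {domain_size * components} floats, got {len(values)}")
    return [tuple(values[i * components:(i + 1) * components])
            for i in range(domain_size)]


# Two vertices whose 3d vectors were written as six consecutive floats.
flat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
vectors = regroup_flat_array(flat, domain_size=2)
# The same helper with 2 components covers UV-like data.
uvs = regroup_flat_array([0.0, 0.5, 1.0, 0.25], domain_size=2, components=2)
```

Checking the size against the domain is what makes an explicit mapping safer than guessing: a mismatch is reported instead of silently producing scrambled vectors.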
Other patches for Alembic include outputting geometry sets, and file layers. The latter allows for non-destructive workflows where data from one layer overrides data from layers below it. A layer is basically an Alembic archive. The only real change in Blender for supporting this feature is to modify CacheFile so that it stores a list of file paths instead of a single one.
Finally, I also gathered some ideas for work on the Volume object, especially for data processing, which includes attribute transfers between meshes and volumes and simulation retiming (via temporal volumes, which can also be useful for motion blur rendering). It would also be nice to have OpenVDB AX at some point to let artists build their own volume processing tools.
- Code reviews:
- D12305: Modifiers: export motion blur velocity through attribute.
- D12406: OpenSubDiv: add support for an OpenGL evaluator.
- D12411: Cleanup: move ABC sequence length computation to Cachefile.
- D11156: Alembic: operator to set the Scene frame range from the Archive time information.
- (to be updated) D11592: use geometry sets to import data [WIP].
- (to be updated) D11591: import arbitrary attributes [WIP]