Note: This is an archived version of the Blender Developer Wiki. The current and active wiki is available on wiki.blender.org.

Threaded Dependency Graph: Results

Locked Interface

GSoC-DepsGraph-LockInterface.png

An option to lock the interface during rendering was added. When this option is enabled, only image editors are redrawn and only image navigation operators (panning, zooming) are allowed. The rest of the interface stays completely locked: no buttons can be pressed and no interface redraw happens outside of image editors.

Note: Some experiments are currently under way with freeing all the data used by the viewport when rendering in locked mode. This might cause some slowdown after rendering is over (because the whole frame needs to be re-evaluated from scratch). It might also lead to some objects not being updated properly (because of dependency graph lag).

Curve Texture Space

GSoC-DepsGraph-MatchTextureSpace.png

As was mentioned here, a curve's bounding box and texture space are approximated from the positions and radii of its points. In some cases it is still useful to have the texture space match the tessellated curve.

For these cases an operator was added that matches the texture space to the tessellated curve. The operator can be found in the Texture Space panel in the Curve buttons.

Derived Render

Modifiers

Blender has a long-standing issue: if the operand of a boolean modifier has a subdivision surface modifier, the boolean modifier uses the operand's preview subdivision settings even when rendering. This is demonstrated in the next picture (Blender 2.68a was used to render the scene).

Issue with boolean modifier in current Blender version

This was solved by adding a so-called derivedRender field to the object. This field stores the result of the object evaluation that happens for render purposes. If a modifier requires another object, it uses that object's derivedRender while rendering.

This might sound crazy to an unprepared artist, but it simply means that in the example above the renderer uses the operand's render settings for the boolean.
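
The idea can be sketched in a few lines of Python. This is only an illustration and not Blender code: the `Obj` class, the `derived_final` slot for the viewport result and the face-count arithmetic are all assumptions made for the example. The only thing taken from the text above is that an object keeps a separate render-time result, and that a modifier asking for another object picks that result while rendering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    """Hypothetical stand-in for an object; not the real Blender struct."""
    name: str
    base_faces: int
    preview_levels: int                    # subdivision levels in the viewport
    render_levels: int                     # subdivision levels when rendering
    derived_final: Optional[int] = None    # viewport result (assumed name)
    derived_render: Optional[int] = None   # render result, as in the text


def evaluate(ob: Obj, for_render: bool) -> int:
    """Evaluate the object and store the result in the matching slot."""
    levels = ob.render_levels if for_render else ob.preview_levels
    faces = ob.base_faces * 4 ** levels  # each level splits a quad into four
    if for_render:
        ob.derived_render = faces
    else:
        ob.derived_final = faces
    return faces


def boolean_operand_faces(operand: Obj, for_render: bool) -> int:
    """Pick the operand result that matches the purpose of the evaluation."""
    if for_render:
        if operand.derived_render is None:
            evaluate(operand, for_render=True)
        return operand.derived_render
    if operand.derived_final is None:
        evaluate(operand, for_render=False)
    return operand.derived_final


operand = Obj("Cutter", base_faces=6, preview_levels=1, render_levels=3)
viewport_faces = boolean_operand_faces(operand, for_render=False)
render_faces = boolean_operand_faces(operand, for_render=True)
print(viewport_faces, render_faces)
```

Note that after both calls the operand holds two evaluated results at the same time, which is where the extra memory during export comes from.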

Render result with Blender built from depsgraph_mt branch

Unfortunately, this approach has a downside: increased memory usage when exporting the scene to the renderer.

Note: this was only tested with Blender Internal. External engines, such as Cycles, will still have the issue.

Constraints

Constraints such as FollowTrack and Shrinkwrap also require another object's derived mesh, and these constraints use the viewport's derived mesh when rendering.

Internally, the constraint code is now aware of this issue and will use derivedRender when evaluating for rendering.

However, this is not noticeable yet because it was disabled. The reason is that the viewport and the renderer use the same object matrix to store the object's transformation, which leads to serious conflicts between the viewport and the renderer.

As soon as the object's data and state are separated internally, constraints will work nicely as well.

Threaded Object Update

GSoC-DepsGraph-ThreadedCPULoad.png

Objects are now updated from multiple threads. This mainly applies to playback, but some other cases might also utilize more than one thread.

For example, when there are lots of rather complex objects that depend on the same rig and one is editing the rig, the objects will be updated in separate threads as well (as long as they don't depend on each other).

Ideally, all the cores are used during scene update, and in theory this could give an Nx speedup, where N is the number of threads.

In practice the speedup is smaller, mainly because viewport drawing is not threaded at all yet, and it might take a considerable amount of time just to draw complex objects on the screen.
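
A rough sketch of the scheduling idea, written in Python rather than Blender's C: an object is handed to a worker thread only once everything it depends on has finished updating, so independent objects run side by side. The function name `threaded_update`, the `deps` mapping and the object names are hypothetical. The sketch relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and does not describe how the depsgraph_mt branch actually schedules work.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def threaded_update(deps, update, num_threads=4):
    """Call update(name) for every object, parents before children.

    deps maps each object name to the set of names it depends on.
    Returns the names in the order their updates completed.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on cyclic dependencies
    order = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        running = {}  # future -> object name
        while sorter.is_active():
            # Everything whose dependencies are done can start right away.
            for name in sorter.get_ready():
                running[pool.submit(update, name)] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                future.result()    # re-raise an error from the worker, if any
                sorter.done(name)  # may unlock objects that depend on it
                order.append(name)
    return order


# Hypothetical scene: two objects hang off the same rig, one hangs off "Hair".
deps = {
    "Rig": set(),
    "Body": {"Rig"},
    "Hair": {"Rig"},
    "Hat": {"Hair"},
}
order = threaded_update(deps, update=lambda name: None)
print(order)
```

Here "Body" and "Hair" can be updated at the same time once "Rig" is done, while "Hat" has to wait for "Hair".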

Threaded Render Database

Blender Internal's render database filling now calculates derivedRender in multiple threads, using the same approach as the regular threaded update.

This gives a noticeable speedup of the "Preparing Scene Data" stage of Blender Internal rendering. The speedup does not scale with the number of cores, because apart from derivedRender creation quite a lot of other operations are involved here, and those are not so simple to thread.
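
The limit described above is the usual Amdahl's law argument: if only a fraction p of the work runs in parallel, N threads give a speedup of 1 / ((1 - p) + p / N). The small Python sketch below evaluates that formula. The fraction 0.6 is purely illustrative and was not measured on any Blender scene.

```python
def amdahl_speedup(parallel_fraction, num_threads):
    """Theoretical speedup when only part of the work can be threaded."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_threads)


# Illustrative only: assume 60% of the stage is threaded derivedRender creation.
for threads in (1, 2, 4, 8):
    print(threads, round(amdahl_speedup(0.6, threads), 2))
```

With any serial part left, adding threads gives diminishing returns: the speedup can never exceed 1 / (1 - p).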

Faster Shape Key Blend Group

When a shape key uses a vertex group for blending, it could be rather slow in cases where multiple key blocks use the same group. In this case the weights array was filled in for every block individually, which is not efficient.

Now the code is rearranged so that the weights array for a given group is filled in only once. This gave around 100% speedup on the famous Chinchilla test file when updating objects in multiple threads. The speedup for single-threaded update might not be as dramatic.
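
The rearrangement amounts to caching the dense weights array per vertex group. A minimal Python sketch, assuming one-dimensional coordinates and a simple relative blend against the basis. The data layout and the names `group_weights` and `blend_shape_keys` are made up for the example and do not mirror Blender's key code.

```python
def group_weights(vertex_groups, group_name, num_verts):
    """Expand a sparse {vertex index: weight} group into a dense array."""
    weights = [0.0] * num_verts
    for index, weight in vertex_groups[group_name].items():
        weights[index] = weight
    return weights


def blend_shape_keys(basis, key_blocks, vertex_groups):
    """Blend key blocks onto the basis, building each group's weights once.

    Returns the blended coordinates and how many weight arrays were built.
    """
    result = list(basis)
    cache = {}  # group name -> dense weights array, filled on first use
    for block in key_blocks:
        group = block["group"]
        if group not in cache:
            cache[group] = group_weights(vertex_groups, group, len(basis))
        weights = cache[group]
        for i, (base, target) in enumerate(zip(basis, block["coords"])):
            result[i] += block["value"] * weights[i] * (target - base)
    return result, len(cache)


basis = [0.0, 0.0, 0.0]
vertex_groups = {"Face": {0: 1.0, 2: 0.5}}
key_blocks = [
    {"group": "Face", "value": 1.0, "coords": [2.0, 2.0, 2.0]},
    {"group": "Face", "value": 0.5, "coords": [4.0, 4.0, 4.0]},
]
blended, arrays_built = blend_shape_keys(basis, key_blocks, vertex_groups)
print(blended, arrays_built)
```

Both key blocks use the "Face" group, yet the dense array is built a single time and reused for the second block.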

New Guarded Allocator

Some issues were discovered with the guarded allocator that Blender had been using for ages:

  • It requires a thread lock when allocating or freeing a memory block, which becomes a real bottleneck when multiple threads are allocating memory.
  • It uses a rather large MemHead, which is inefficient from the point of view of CPU cache utilization.

Some tweaks were made to the existing guarded allocator, so it now uses as few locks as possible, but it still does use locks and the memory overhead is still there.

In addition, a new guarded allocator was implemented, which replaces the old one by default. This allocator doesn't have any thread locks and has a really small memory overhead. This makes it friendly to the cache, to memory utilization and to threading.

As a downside, the new allocator is not able to list the non-freed datablocks; it is only able to detect that memory has leaked. So if one sees a message about unfreed memory, Blender needs to be run with the --debug or --debug-memory command line argument, which switches to the old guarded allocator, which keeps full track of allocated blocks.
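
The trade-off between the two allocators can be modelled conceptually as follows. This Python sketch is not how the guarded allocators are implemented: the class names are hypothetical, and the plain integer counter stands in for whatever lock-free bookkeeping the real allocator uses (a real lock-free version would need atomic operations, which this single-threaded model does not attempt). It only shows why counter-style bookkeeping can report that something leaked, while a tracked list can also say what leaked.

```python
import threading


class CountingAllocator:
    """Counter-only bookkeeping: knows how many blocks leaked, not which."""

    def __init__(self):
        self.blocks_in_use = 0  # assumption: a bare counter, no block list

    def alloc(self, size, name):
        self.blocks_in_use += 1  # the name is not stored anywhere
        return bytearray(size)

    def free(self, block):
        self.blocks_in_use -= 1

    def report(self):
        return f"{self.blocks_in_use} unfreed block(s)"


class TrackingAllocator:
    """Full bookkeeping: remembers every live block, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = {}  # id(block) -> (name, size)

    def alloc(self, size, name):
        block = bytearray(size)
        with self._lock:
            self._live[id(block)] = (name, size)
        return block

    def free(self, block):
        with self._lock:
            del self._live[id(block)]

    def report(self):
        with self._lock:
            return sorted(self._live.values())


def leak_demo(allocator):
    """Allocate two blocks, free only one, and return the leak report."""
    verts = allocator.alloc(16, "verts")
    weights = allocator.alloc(8, "weights")
    allocator.free(verts)
    report = allocator.report()
    del weights  # drop the reference without telling the allocator
    return report


print(leak_demo(CountingAllocator()))
print(leak_demo(TrackingAllocator()))
```

The first report only gives a count, while the second one names the leaked block, at the cost of a lock and per-block bookkeeping on every allocation.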

Benchmarks with both the shape key weights optimization and the new guarded allocator still need to be made.