User:Yiming/LineArt Further Improvements
LineArt Further Improvements
Just started another grant working on line art! Proposal here
- Working on temp-lineart-embree to get occlusion query using embree (for now just perspective camera).
- temp-lineart-contained branch to latest.
Embree line art: Progress as of 0319:
- The branch temp-lineart-embree is now runnable.
- Performance is mostly on par with the temp-lineart-contained branch, considering optimization is not enabled. The calculation works mostly correctly, and the necessary data/flags are all registered as they would be in legacy line art.
- No need to load an additional mesh structure; all embree-related callbacks now use geometry from the loaded lineart data (also: remove that additional mesh).
Problems so far / stuff to be done:
- The triangle index in Mesh is [supposedly] different from the index in BMesh, so the triangles in the collide func (where I need the triangle data structure) and in the bounds func (where it's just the plain mesh for embree) don't match. But even so, it seems to add way fewer "potential virtual pairs" than it needs to, or maybe not, depending on the mesh layout; I just saw it using the default cube and it only added 5 pairs. So I'll look into it later.
- Hangs on larger files, and still seems to take a few seconds to build the BVH and everything before the actual occlusion query. So I don't know what's going on there.
- Precision issue regarding the internal triangle isect function, prominent in the default cube (algorithmically it's due to the lack of special treatment of triangles that share one vertex).
- Still copies data to float for the internal triangle isect; need to get rid of that and use line art's own function (needs some modification because we don't want to add geometry in the callback).
- Need to take care of discarded triangles and lines.
- Try out a 3d bounds callback for geometry used for intersection, but use 2d for occlusion only. Need to see if building two different BVH trees would take away the benefit of a faster intersection stage using 3d bounds.
Basically this week I have been trying out different ways of optimizing the line art embree core.
- Changed the tri-tri intersection call for virtual triangles (for both the occlusion stage and the intersection stage) to my own one instead of using blender's internal math function. A tiny little speed up; the performance bottleneck is mostly on the locks.
- Jacques suggested using EnumerableThreadSpecific (TLS) to do storage per thread, so we don't need to lock the result array when worker threads add into it. It indeed improved performance; the bottleneck then mostly became the "occlusion cutting" stage, where multiple threads try to cut edges that share memory, so a lot of locking is going on.
TODO: See if there's a thread local allocator instead of MEM_malloc() so it's gonna be faster in the threads. See below.
- Tried spreading out locks by assigning 100 locks incrementally to all edges, in the hope that the cutting function doesn't collide that often, but it turns out the memory allocator is shared, so it's not improving much.
- Tried two ways of pre-checking potential triangle intersections (in the occlusion stage). The first way is to check whether the triangles intersect using the internal tri-tri function; it does filter out a lot of non-intersecting ones, but that stage costs a lot of time. The second way is to skip that part entirely and directly feed potential intersection pairs into the line-triangle occlusion call. The performance mostly stayed the same for these two. (Which is generally slightly faster than current master but still slower than
- Technically I could do a pre-check using "if line crosses the triangle in 2d", but that's essentially the first step inside the actual occlusion call, so it's not gonna be very useful.
- A theory for the performance being this way is that embree only does bound box checks, while the line art grid acceleration method puts triangles in a denser & adaptive grid. So embree gives more potentially intersecting triangle pairs than line art would have, because two triangles slanted in such a way that their bound boxes overlap could still very likely be in separate grid tiles. I'm not sure which way is better now.
- Another reason for it being a bit slower than expected is that the line art legacy algorithm records intersection verts that have already been found onto the triangle, but in the embree method we need to calculate that again for the same edge, once for each side of it, so that nearly doubles the work there?
- Basically removed the "intersection record" and did the intersection calculation directly in the embree IntersectionCollide() callback. Reduced memory usage (supposedly, because I left those variables in place for convenience of testing...), and also increased performance a little bit (because the result points are recorded directly rather than copied again). So there are some improvements. However, the generation of points still suffers from the memory allocation lock issue mentioned above; need to find a solution for that. See below.
- Memory leaks fixed. (Just need to be careful with newed objects from C++ and use a wrapper to properly take care of them.)
- Fixed Sebastian's mesh loading code with totedge==0 handling, further sped up the whole loading rendering. (Still some minor crashes, due to reduced edge_hash==NULL and I'm not sure what caused it because
- This code works on both embree branch and legacy branch.
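The "100 locks" experiment from the list above can be illustrated with a minimal lock striping sketch. It uses only the C++ standard library; the Edge struct and the names kLockCount, lock_index_for_edge and cut_edges_striped are hypothetical and are not taken from the line art code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a line art edge; only the cut list matters here.
struct Edge {
  std::vector<double> cuts;
};

constexpr std::size_t kLockCount = 100;  // Same count as in the experiment.

// Locks are assigned incrementally, so edge i uses lock i % 100.
inline std::size_t lock_index_for_edge(std::size_t edge_index)
{
  return edge_index % kLockCount;
}

// Every worker cuts every edge once. Two workers only wait on each other
// when they touch edges that map to the same lock.
inline void cut_edges_striped(std::vector<Edge> &edges, int thread_count)
{
  assert(thread_count > 0);
  std::array<std::mutex, kLockCount> locks;
  std::vector<std::thread> workers;
  for (int t = 0; t < thread_count; t++) {
    workers.emplace_back([&edges, &locks, t]() {
      for (std::size_t i = 0; i < edges.size(); i++) {
        std::lock_guard<std::mutex> guard(locks[lock_index_for_edge(i)]);
        // push_back may allocate memory while the lock is held.
        edges[i].cuts.push_back(0.1 * (t + 1));
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

Striping only reduces contention on the edge data itself; any cost inside the allocation call stays the same, which would match the observation that this change did not help much.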
On the topic of thread-friendly memory management [IMPORTANT]:
- Turns out the MEM_mallocN stuff internally uses jemalloc, which is already optimized for multithreading. So now I need to take advantage of this by giving each thread a local mem pool (now understandably, using thread local storage) so we don't lock anything for allocation, which would greatly increase the performance of line art. (Thanks Hans for clearing up that confusion for me)
The temp-lineart-embree branch has this code path, which I found to be the fastest up till now:
- Directly record intersection results in IntersectionCollide(). Use thread local storage and combine the results afterwards to avoid locks.
- Directly calculate occlusion cutting and only record cutting positions in OcclusionCollide(), and later in occlusion_worker apply all cuts in parallel. Still using locks, need TLS or something like that.
- Do not use any pre-checks for potential "virtual triangle" intersections.
- Only set up a basic 10x10 acceleration grid for the chaining code (which depends on that). (Note/TODO 0329: Well I checked afterwards, at some point the code in master became 4x4 again. I'm not sure if there's a merging issue or I never updated master for that 10x10 change, so that's slowing stuff a bit)
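The "record per thread, combine afterwards" idea in the first item can be sketched as follows. This is a simplified stand-in that uses plain std::thread and one vector per worker rather than Blender's EnumerableThreadSpecific; IntersectionResult and collect_pairs are hypothetical names, and the recorded values are dummy data.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical result record; the real code stores intersection data.
struct IntersectionResult {
  int tri_a;
  int tri_b;
};

// Each worker appends to its own vector, so no lock is needed while recording.
// The per-thread vectors are merged once, after all workers have joined.
inline std::vector<IntersectionResult> collect_pairs(int pair_count, int thread_count)
{
  assert(thread_count > 0);
  std::vector<std::vector<IntersectionResult>> per_thread(thread_count);
  std::vector<std::thread> workers;
  for (int t = 0; t < thread_count; t++) {
    workers.emplace_back([&per_thread, pair_count, thread_count, t]() {
      // Strided split of the work; thread t only touches per_thread[t].
      for (int i = t; i < pair_count; i += thread_count) {
        per_thread[t].push_back({i, i + 1});
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }

  std::size_t total = 0;
  for (const auto &local : per_thread) {
    total += local.size();
  }
  std::vector<IntersectionResult> combined;
  combined.reserve(total);  // One allocation for the merged result.
  for (const auto &local : per_thread) {
    combined.insert(combined.end(), local.begin(), local.end());
  }
  return combined;
}
```

The merge is single threaded, but it runs once and copies contiguous memory, so it is cheap compared with taking a lock for every recorded result.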
Also some other progress on GPencil:
- Made fading support for build modifier, some back and forth for UI and some hidden algorithm issue.
- Cyclic option for dot dash modifier to satisfy a weirder look.
- A little bit fix for curvature weight modifier.
- Tried the Möller algorithm for tri-tri intersection speed up, but it turns out it doesn't give correct results. Not sure about the reason; maybe I need to try copying the original data into float and try again. But from the look of it I suspect it's the nature of this algorithm that it doesn't have good stability when triangles become quite small.
- Vector::reserve() for getting combined occlusion result.
- Corrected crease loading, now faster object loading code is basically finished, need to test a bit more to see if there are hidden issues.
- Tested the edge_hash bug but can't reproduce.
- Tested the lineart-shadow branch for correct intersection filtering logic (for whatever reason the logic was not merged from master changes).
- Fixed https://developer.blender.org/T94888
- Closed https://developer.blender.org/T96846
- Changed line art final edge list into an array and further sped up
- Finished up edge/face mark filtering logic under new object loading code and tested to work correctly.
- Feature line filtering by shadow region now working correctly.
- Shadow region enclosed shape support is now working, but the light contour didn't go into re-projection; it needs further fixes to make the result look great.
Generic GPencil stuff:
- Global scale compensation for sample modifier. https://developer.blender.org/D14544
- Shadow contour re-projection logic is fixed, now the generated light/shadow shapes are guaranteed to be fully enclosed.
- Object loading code patch: https://developer.blender.org/D14627 Pending review.
- Sebastian also suggested a new way of building adjacent edges without using EdgeHash (https://youtu.be/z5oWopN39OU?t=191), will look into it, and if that turns out to be faster then the object loading patch should be updated to include that.
- Tried setting embree build quality to HIGH but it still didn't speed up that much.
- We kinda decided that if everything fails, we still go with the tile solution but leave embree for intersection, because it's faster than the line art tile method in that stage (and then we are not gonna need to do intersection in 2d tiles, which would save a lot of time on locks).
- Implemented an experimental CAS tree for the line art legacy tile algorithm. It's not completely working, yet it doesn't feel "very fast" either. It's in the temp-lineart-contained branch if anyone is interested in testing. Could be me still including the intersection stuff inside the tile adding process... Need further testing.
- CAS tree is producing correct results except it doesn't free any memory.
- Index-sorting based edge adjacent lookup is working correctly atm for old object loading code, needs to be migrated to new object loading code.
- It's also working in new object loading code right now, but about 25% slower than that, probably due to qsort performance.
- Trying to keep threads working by slicing add_triangles into smaller chunks instead of using each object as a chunk, so any single "huge" object would be split across different worker threads instead of being worked on by one thread.
- Well, it did keep all threads working, but it also introduced a lot of conflicts in tile operations, so it ended up much slower.
- The object loading code is done and awaits review :D . Currently using index ordering to find out adjacent triangles and only adding loose edge with
- Silhouette group feature implemented and running correctly. (The algorithm is based on top of shadow cast calculation) Which means the goal for shadow support is basically finished.
- Silhouette works out of box but it introduces ambiguity with lit/shade regions. Currently I break the silhouette up to match this setting, and most of the time it's good enough. In the future this needs to be improved (Probably with node or some more logic stuff, or with more intuitive presets).
- Intersection lit/shade info is not registered, need to take care of that.
- Fixed edge cutting function for erroneous cuts in the last segment (not registering correct silhouette group).
- CAS tree without reallocating storage arrays, not succeeded yet.
- Fixed lit/shade cutting for intersection lines (But expectedly slow)
- Object loading code committed into master :D
- Progress about CAS tree acceleration experiment:
- Without reallocating is now a success. A little bit faster than traditional algorithm when no intersection line is involved.
- With embree intersection the whole performance just about to catch up with traditional algorithm but still not quite.
- Fixed Object loading iterator so it won't crash on stuff like particles.
- Committed Better smooth tolerance handling, now a greater value of smooth tolerance won't reduce the entire contour loop into a single line.
- cas method work correctly with the use of
- Updated 7 more patches on the Lineart task, pending review.
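The index-sorting based adjacent edge lookup mentioned in the notes above (the alternative to EdgeHash) can be sketched roughly like this. It is a standalone illustration, not the committed object loading code: EdgeRecord and find_adjacent_triangles are hypothetical names, std::sort is used where the notes mention qsort, and the input is assumed to be manifold (at most two triangles per edge).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record: one triangle side with its vertex indices ordered.
struct EdgeRecord {
  int v_low;
  int v_high;
  int triangle;
};

// Returns pairs of triangle indices that share an edge. A record without a
// matching neighbor after sorting belongs to a boundary edge.
inline std::vector<std::pair<int, int>> find_adjacent_triangles(
    const std::vector<std::array<int, 3>> &triangles)
{
  std::vector<EdgeRecord> records;
  records.reserve(triangles.size() * 3);
  for (int t = 0; t < static_cast<int>(triangles.size()); t++) {
    for (int side = 0; side < 3; side++) {
      const int a = triangles[t][side];
      const int b = triangles[t][(side + 1) % 3];
      assert(a != b);  // Degenerate triangles are not handled in this sketch.
      records.push_back({std::min(a, b), std::max(a, b), t});
    }
  }
  // Sorting by vertex indices puts both sides of a shared edge next to each other.
  std::sort(records.begin(), records.end(), [](const EdgeRecord &x, const EdgeRecord &y) {
    if (x.v_low != y.v_low) {
      return x.v_low < y.v_low;
    }
    if (x.v_high != y.v_high) {
      return x.v_high < y.v_high;
    }
    return x.triangle < y.triangle;
  });
  std::vector<std::pair<int, int>> adjacent;
  for (std::size_t i = 0; i + 1 < records.size(); i++) {
    if (records[i].v_low == records[i + 1].v_low &&
        records[i].v_high == records[i + 1].v_high) {
      adjacent.emplace_back(records[i].triangle, records[i + 1].triangle);
      i++;  // Skip the partner record (manifold assumption).
    }
  }
  return adjacent;
}
```

For a quad split into triangles {0, 1, 2} and {0, 2, 3}, the only shared edge is (0, 2), so the function reports a single adjacent pair. The sort dominates the cost, which fits the note that sort performance affects the loading time.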
Not particularly productive.
- Fixed two bugs related to line art crashes.
- CAS patch committed (But got reverted for some atomic-related issues, new patch is being reviewed)
- The way line art iterates objects when loading is unsafe in depsgraph, New method is being researched:
- Some minor fixes in shadow branch for getting the reference assigning correct under new object loading code.
- Fixed sample modifier behavior of the last vert: https://developer.blender.org/D15005
- "Speed up quad tree building" patch is finally fully polished and accepted into master (Yay!). Eventually we did not go with the cas algorithm as it involves busy waiting, which is not preferred from the point of view of OS thread scheduling.
- Committed some minor fixes for line art that has not made into
lineart-shadow patch, writing documentation and preparing for code review.
- Polished shadow patch more for consistency and removing irrelevant changes.
- LineartData and reorganized variables for better clarity. (https://developer.blender.org/D15172)
Otherwise nothing substantial is happening :thinking:
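The busy waiting concern behind dropping the cas approach can be shown with a small compare-and-swap sketch. This is not the reverted patch or the actual tile structure; Tile, add_triangle_spinning and fill_tile are hypothetical names, and the example only demonstrates how a CAS retry loop keeps a thread spinning instead of letting it sleep.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical tile guarded by a CAS flag; not the actual line art structure.
struct Tile {
  std::atomic<bool> busy{false};
  int triangle_count = 0;
};

inline void add_triangle_spinning(Tile &tile)
{
  bool expected = false;
  // Busy waiting: the thread retries the compare-and-swap in a loop and keeps
  // using CPU time, so the OS scheduler never sees it as blocked.
  while (!tile.busy.compare_exchange_weak(expected, true, std::memory_order_acquire)) {
    expected = false;  // A failed exchange overwrites `expected`, so reset it.
  }
  tile.triangle_count++;
  tile.busy.store(false, std::memory_order_release);
}

inline int fill_tile(int thread_count, int adds_per_thread)
{
  assert(thread_count > 0);
  Tile tile;
  std::vector<std::thread> workers;
  for (int t = 0; t < thread_count; t++) {
    workers.emplace_back([&tile, adds_per_thread]() {
      for (int i = 0; i < adds_per_thread; i++) {
        add_triangle_spinning(tile);
      }
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  return tile.triangle_count;
}
```

The count comes out correct, but a waiting thread burns a core while it spins; a mutex would let the scheduler put that thread to sleep instead.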
- Made a new model specifically for testing line art shadow functionality in one go, which is available here: https://developer.blender.org/D15109 , it demonstrates:
- Cast shadow and light contour.
- Cast shadow over transparent materials.
- Silhouette (wires)
- Selection of lit/shaded regions.
- Intersection priority grouping.
- During the making of that model I found a few more bugs and fixed them in that patch:
- Threading issue regarding the cast function, which I reverted back to single thread; I'll design a better threading model for it in the future. (But since the entire shadow stage is pretty fast, it won't have much impact)
- Added another 4 bytes in LineartEdge to store the light contour target_reference for both adjacent triangles because t2 is not applicable for them; now light contour adjacency doesn't have any ambiguity, which is much better.
- Various stability improvements
- Also did some more variable name clean ups in master line art.
- Polishing shadow patch.
- Cleaned up a bunch of the UI logic, as well as removing a few bugs introduced by some typo.
- Updated the patch to include the "Object Silhouette Group" functionality. When it is selected, every object has its own silhouette, but an object and other objects in the same silhouette group aren't combined (e.g. when two monkeys overlap each other, their shapes are separated, but their inner features are removed).
- Writing and making demonstration illustrations for manual updates.