Note: This is an archived version of the Blender Developer Wiki. The current and active wiki is available on wiki.blender.org.

Week 52: 1st - 7th January

  • Info

We decided that we need to focus on fixing bugs in the OGL render and paying off some crucial technical debt.

  • General development
    • Made OGL render output linear floating-point HDR data (better precision, no banding artifacts). cbe9098bf980 There is an issue with Clay and other object mode overlays, which are not color managed.
    • Fixed glitchy SSR in OGL render. 4df11e3c7068 205978a48927
    • Fixed problem with alpha blending: bbf810f96976 b4ad0151c336
    • Fixed sampling problems in OGL render: OGL renders now look less noisy. Noise depends on the number of AA samples used for OGL render. d73f74793ef5
  • Next week

I started implementing a fix for non-mesh instancing but still have to finish it. Then I'll also have to fix motion blur for renders.

Week 53: 8th - 14th January

  • Info

I finished the instancing refactoring. I went for the big bottlenecks first and some crucial showstopping bugs.

  • General development
    • Reworked draw manager instancing code: This makes the whole thing faster and enables the display of non mesh dupli objects. aa0097ad5e80 377915b08144
    • Support lamps from dupli-objects in Eevee (without shadow support, unfortunately)
    • Fixes: Eevee: Fix AO in planar reflections. a0655ed487a8
    • Eevee: Fix Planar probe refresh. 8aaf7bc438ad
    • Eevee: Lamps: Optimize lamps CPU/Memory usage: Scene with high number of object/lamps should be faster now. 014226450892
    • Eevee: Depth Of Field: Use 32bit framebuffer when doing OpenGL render to fix color artifacts with high Circle of Confusion radii. 5ef2be5f5939
    • Eevee: Fix Armature instances not drawing. da97b6930b47
    • Eevee: Fix Motion blur not working in render. 2bbc287af165 rB49d51a1e6246
  • Next week

I will still focus on polishing and fixing technical debt.

Week 54: 15th - 21st January

Left old sun, Right new sun lamp with LTC lighting
New diffuse lighting (right) has a smoother transition when the surface intersects the light radius.
  • Info

In an effort to simplify, since we do have temporal super sampling/denoising (though no temporal reprojection yet), I decided to remove the sample parameters from the AO and SSR panels. This greatly reduces the complexity of the shaders. It also led to fixes for some sampling problems. You should use a fairly high number of samples (~128) if you expect a perfectly noise-free image. Note that a very high sample count (~256) will produce color drifting artifacts. I also added a few usability fixes (Probe intensity) and performance fixes (Global UBO, Clay without AO codepath), and reworked the LTC area light code. The new LTC algorithm lets me implement the sun lamp as an area light. The implementation is a bit different from Cycles: the bigger the sun gets, the less bright it gets (instead of remaining fairly constant as in Cycles). This is a technical limitation.
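To illustrate why more accumulated samples give a cleaner image, here is a minimal Python sketch of a running average over noisy samples. It is not Eevee's shader code: the function names, the uniform noise model and the values used are assumptions made up for the illustration.

```python
import random

def accumulate(samples):
    """Running average: mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n."""
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        mean += (x - mean) / n
    return mean

def noisy_pixel(rng, true_value=0.5, noise=0.2):
    """Hypothetical noisy estimate of one pixel value (uniform noise)."""
    return true_value + rng.uniform(-noise, noise)

if __name__ == "__main__":
    rng = random.Random(0)
    for count in (1, 16, 128):
        estimate = accumulate(noisy_pixel(rng) for _ in range(count))
        # The error tends to shrink as more samples are accumulated.
        print(count, abs(estimate - 0.5))
```

Each new sample only nudges the stored average, so a single accumulation buffer is enough; there is no need to keep every sample around.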

  • Next week

I will start the implementation of the new F12 render for OpenGL engines like Eevee, outputting multiple passes in high quality, useful for compositing.

Week 55: 22nd - 28th January

  • Info

I faced some issues finding a good way to handle the F12 (final) render. We are currently limited by OpenGL not being thread friendly. As of now we cannot create a 2nd OpenGL context and expect it to magically work (see T51736), so I had to make it work with one OpenGL context. Unfortunately, this means we must freeze the UI until the rendering is finished. While it's not a big deal for still images (a still render usually takes 1-2 sec), it's becoming a problem for animation rendering. On the other hand, this will work in background mode (render on a headless renderfarm). The base implementation for the final Eevee render is here. Passes will be added one by one. However, we will not support all Cycles passes.

  • General development
    • DRW: Add "hardcoded" stipples for sun ray display. 01a62515cb2b
    • DRW / Render: Add support for render pipeline in drawmanager. b6dbd8723c1b
    • Eevee: Initial Final Render support. ba9a4deddad5
  • Next week

I will continue on Final "Offline" Render.

Week 56: 29th - 5th February

128 samples/pixel. Old AA (left) vs. new filtered AA.
  • Info

This week was focused on adding pass support and other features to the F12 render. I also added proper pixel filtering for AA jittering, which gives better results. The filter function is hardcoded to Blackman-Harris for now. Use a filter size of 1.5 to get the same default as Cycles.
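As a rough illustration of what such a pixel filter does, here is a Python sketch that weights jittered samples with the standard 4-term Blackman-Harris window. The coefficients are the usual textbook ones. The radial distance and the `filtered_average` helper are assumptions for the example, not a copy of the Eevee code.

```python
import math

def blackman_harris(distance, width=1.5):
    """Weight of a sample at `distance` (in pixels) from the pixel center."""
    if abs(distance) >= width / 2.0:
        return 0.0  # outside the filter footprint
    v = 2.0 * math.pi * (distance / width + 0.5)
    return (0.35875
            - 0.48829 * math.cos(v)
            + 0.14128 * math.cos(2.0 * v)
            - 0.01168 * math.cos(3.0 * v))

def filtered_average(samples, width=1.5):
    """Weighted average of (offset_x, offset_y, value) jittered samples."""
    total = 0.0
    weight_sum = 0.0
    for dx, dy, value in samples:
        w = blackman_harris(math.hypot(dx, dy), width)
        total += w * value
        weight_sum += w
    return total / weight_sum if weight_sum > 0.0 else 0.0

if __name__ == "__main__":
    # A centered sample counts far more than one near the filter edge.
    print(filtered_average([(0.0, 0.0, 1.0), (0.7, 0.0, 0.0)]))
```

Samples close to the pixel center dominate the result, while samples near the edge of the filter fade out smoothly instead of being cut off abruptly as with a box filter.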

  • General development
    • Eevee: Add new "render samples" properties. 8cce3391316.
    • Eevee: Add support for TAA/SuperSampling for final render. f107af35198.
    • Eevee: Add Z pass render result. 376d42304b7.
    • Eevee: Fix crash when Rendering (F12) using camera mapping. 85d3de94c62.
    • Eevee: Fix Crash when rendering using Render Border. 251fd91064d.
    • Eevee: Render: Add Normal pass output. 55a238edd60.
    • Eevee: Render: Fix crash when using a sun lamps with shadow. afaca68ea86.
    • Eevee: Render: Add Subsurface Pass support. 253b412acef.
    • Eevee: Render: Fix Normals of refraction shader. c95f3a36166.
    • Eevee: Render: Add mist pass support. e52c5bcdb56.
    • Eevee: Render: Add Transparent Background option. ab5f86a04e1.
    • Eevee: Render: Add support for multiview. 00f1bc16850.
    • Eevee: Render: Add ambient occlusion pass support. 36b259fa889.
    • Eevee: Perf: Put transparent sorting before the render loop. e530d0ccaa5.
    • Eevee: Render: Make sure all probes are refreshed before rendering. 4820c7400fb.
    • Eevee: Render: Fix hashed-alpha testing. 226685d3a0c.
    • Eevee: Render: Fix black reflections in 1st sample. 07e1212e341.
    • Eevee: Render: Fix black normals on blended material in the normal pass. f61bcc70e11.
    • Eevee: Render: Fix Ao pass background contamination. c8e87edccbd.
    • Eevee: Render: Reset winmat before jittering it again. 143b0ab52ae.
    • Eevee: AA: Add Blackmann-Harris pixel filter distribution. cc1e88b37a3.
  • Next week

There is hope for the multi-window & non-blocking render! I will give it a try and see if it is workable (only non-blocking may fail, multi-window will be done for sure).

Week 57: 6th - 11th February

  • Info

I came up with a solution for the multi-window & non-blocking render: use a separate OpenGL context for the draw engine. An initial implementation is up on the temp-drawcontext branch. Support for X11 (Linux), Windows (thanks to Germano Cavalcante), and OS X (thanks to Brecht Van Lommel) has been added.

  • General development
    • GPU: Remove Mesa + Vega hack. 708ef19d885.
    • Clay: Small refactoring of matcap_colors and put ubos into sldata. 25c8b5046fa.
    • DRW: Fix memory leak with dupli objects. af425f3f7a0.
  • Next week

While implementing the new offscreen context, some very deep problems concerning Gawain and the Drawmanager came to my attention. I think I'll reserve part of next week for finishing this long overdue refactor, which will make things cleaner, lift limitations and improve performance.

Week 58: 12th - 18th February

  • Info

The Gawain and Drawmanager refactor took longer than I expected. I tried to implement something to fix multi-window and VAO usage at the same time, but it became too complex. I changed it to only address VAO management and will fix the multi-window issue with the separate context idea. The new VAO manager should be committed in the next few days, improving CPU performance.

  • General development
    • GWN: Refactor: Support long attribs. a5afe13e1c4 df86e9cab54.
    • GWN: Refactor: Draw instance without additional batches 0f3bc636c82.
    • GWN: Refactor: Add GWN_batch_draw_procedural 1e9ef2a25e0.
    • DRW: Refactor: Make use of the new Gawain long attrib support. 01244df0077.
    • DRW: Refactor: Add instance buffer manager. 629a8748176.
    • DRW: Refactor: Less feature duplication with Gwn. 0ef981f603a.
    • Eevee: Fix broken AO and Contact shadows on certain platform. 2464dcef37f.
  • Next week

I need to finish the VAO manager, then fix the separate context bugs to make multi-window and non-blocking F12 work.

Week 59: 19th - 25th February

  • Info

I've committed the VAO manager, improving CPU performance quite a bit. I've spent most of the time working out the bugs in the separate context approach and it seems stable now. It just needs final approval and we will merge it to 2.8. I also identified and fixed a serious performance problem.

  • General development
    • Gawain: Add new context/vao manager. 1b3f9ecd0d0 c5eba46d7f4.
    • Gawain: Refactor: VAOs caching AND use new VAOs manager.
    • DRW/GWN: Bypass glUseProgram: Leads to a significant performance improvement on SOME systems. 241c90c92d8.
  • Next week

Merging temp-drawcontext into 2.8 and focusing on some performance work / frustum culling.

Week 60: 26th - 4th March

  • Info

I committed the frustum culling code. This greatly reduces the render time of scenes with lots of objects. The cost of the culling itself is not very significant. We can still try to multi-thread this frustum culling but I don't know if we will really get better performance out of it.
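For readers unfamiliar with the technique, a minimal Python sketch of frustum culling with bounding spheres follows. It assumes planes stored as (nx, ny, nz, d) with unit normals pointing inside the frustum. The function names and the box-shaped "frustum" are made up for the example and do not mirror the DRW code.

```python
def sphere_in_frustum(center, radius, planes):
    """Return False only if the sphere is fully behind at least one plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -radius:
            return False  # completely outside this plane: cull
    return True  # inside or intersecting: keep (conservative test)

def cull(objects, planes):
    """Keep only the names of (name, center, radius) objects that may be visible."""
    return [name for name, center, radius in objects
            if sphere_in_frustum(center, radius, planes)]

# Hypothetical box-shaped view volume spanning -1..1 on every axis.
BOX_PLANES = [
    (1.0, 0.0, 0.0, 1.0), (-1.0, 0.0, 0.0, 1.0),
    (0.0, 1.0, 0.0, 1.0), (0.0, -1.0, 0.0, 1.0),
    (0.0, 0.0, 1.0, 1.0), (0.0, 0.0, -1.0, 1.0),
]

if __name__ == "__main__":
    scene = [("cube", (0.0, 0.0, 0.0), 0.5), ("far_lamp", (3.0, 0.0, 0.0), 0.5)]
    print(cull(scene, BOX_PLANES))
```

The test is a handful of dot products per object, which is consistent with the culling itself being cheap compared to the draw calls it avoids.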

  • General development
    • DRW: Refactor / Cleanup Builtin uniforms. ec0ecbe7952.
    • DRW: Refactor & Split draw_manager.c into multiple files. 0df21e2504e.
    • DRW: Merge calls_generate pool with calls pool & add DRWCallState pool. 1ba96857d1e.
    • DRW: Reuse DRWCallState for the same object. 64e35f6fd21.
    • DRW: Codestyle: Remove DRWCallHeader and DRWCallGenerate 725112cce73.
    • DRW: Initial implementation of Frustum culling. 68015f9d397.
    • Clay: Perf: Early out of SSAO if there is no need for it. 1c12e1a2ebc.
    • Object Mode: Make use of optimized DRW_shgroup_call_object_add 62390527b2e.
    • Clay: Make use of optimized DRW_shgroup_call_object_add. dee2efb968a.
    • DRW: Refactor simple instancing. d63829117c2.
    • DRW: Add DRWMatrixState to manage all matrices together. 5e730974fe3.
    • Eevee: Make use of culling when rendering the shadowmaps. c43d51c1c2c.
  • Next week

I'll focus on lazy (or deferred) shader compilation. First tests are promising: files load faster and the interface stays interactive despite some freezes (probably due to one of the compilation steps not being threaded by the driver).

Week 61: 5th - 11th March

  • Info

This week's focus was on shader compilation improvements (lazy/deferred compilation + shader caching) and some Eevee bugfixes / polish (especially on Planar Probes and volumetrics). I also polished Eevee's F12 render a bit: it now supports mid-render stop and a progress bar.
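The idea behind deferred compilation can be sketched in Python as a background worker that compiles queued shaders while the caller keeps drawing with a fallback. This is only an illustration under stated assumptions: `DeferredCompiler`, `compile_fn` and the `"default_shader"` placeholder are hypothetical names, and the real implementation relies on a wmJob, not on this class.

```python
import queue
import threading

class DeferredCompiler:
    """Compile shaders on a worker thread; callers get a fallback until ready."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn      # hypothetical slow compile step
        self._pending = queue.Queue()
        self._ready = {}
        self._lock = threading.Lock()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def request(self, name, source):
        """Queue a shader for compilation and return immediately."""
        self._pending.put((name, source))

    def get(self, name, fallback="default_shader"):
        """Return the compiled shader, or the fallback while it is not ready."""
        with self._lock:
            return self._ready.get(name, fallback)

    def wait(self):
        """Block until every queued shader has been processed."""
        self._pending.join()

    def _run(self):
        while True:
            name, source = self._pending.get()
            try:
                compiled = self._compile_fn(source)
                with self._lock:
                    self._ready[name] = compiled
            finally:
                self._pending.task_done()

if __name__ == "__main__":
    compiler = DeferredCompiler(lambda src: src.upper())
    compiler.request("metal", "void main() {}")
    print(compiler.get("metal"))   # may still be the fallback at this point
    compiler.wait()
    print(compiler.get("metal"))
```

The caller never blocks on `request`, which is what keeps the interface interactive; the trade-off is that objects are drawn with a placeholder material for a few frames.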

  • General development
    • DRW: Deferred compilation initial implementation. 3a209c28575.
    • DRW: Shader Deferred compilation: Use a wmJob for threading. 94fadd00d88.
    • DRW: Fix culling with inverted view (planar reflections) 45ec962f68f.
    • DRW: Culling: Fix precision error. f043365c38c.
    • DRW: Culling: Expose & Add culling functions to engines. 7c31edb385f.
    • DRW: Put all view-only dependant uniform in a UBO. 8444aaaa693.
    • DRW: Change clip planes API. dfd8a52cd2e.
    • DRW: Change UBOs binding logic. 41abbc271c5.
    • Eevee: Update to support shader deferred compilation. f8b63b564de.
    • Eevee: More use of DRW_viewport_matrix_override_set_all a6e6d7e0221.
    • Eevee: Make use of the new view matrix UBO. 82957cfec88.
    • Eevee: Probes: Fix probes not working after a world update. 4e7d9b7a984.
    • Eevee: Shadows: Fix Cascaded shadowmap setup. d5ecadd643c.
    • Eevee: Volumes: Fix crash with volumetrics + default mat + alpha blend faf70e1e64b.
    • Eevee: Volumes: Fix garbage on the first frames when enabling volumetrics. aa07660201e.
    • Eevee: Volumes: Fix volume rendering glitches. cfba75a21af.
    • Eevee: Render: Add progress. 872df463f62.
    • Eevee: Render: Add cancel support c962f7ef775.
    • Eevee: Planar Probe: Add new clipping UBO. 4f55ee5a3cb.
    • Eevee: Planar Probe: Add supersampling jitter. e697c1da422.
    • Eevee: Planar Probe: Add transparent objects. 9cd09fee6ae.
    • Eevee: Planar Probe: Add refraction support for reflected objects. 92c2e2f3865.
    • Eevee: Planar Probe: Fix corrupted results in downsampling step. f3161bd2abe.
    • Eevee: Planar Probe: Add culling. 13b99b7bbba.
    • Eevee: Planar Probe: Fix last planar reflections remaining after deletion. 4540bd226dc.
    • GPUMaterial: Add Material shader cache. 765d7242d5a.
    • GPUImage: Add back garbage collection for the new viewport pipeline. 7fed3ad32bf.
    • Screen: Fix screen layout preview render. b7414d357af.
  • Next week

I still have some problems to fix with the Deferred/Lazy compilation among other issues. So it will be a polish/bugfix week and I will start looking at the next big challenge : OpenSubdiv.


Week 62: 12th - 18th March

  • Info

Lots of fixes, and some improvements to Eevee's F12 render. You can now see progress, cancel renders, and see which objects are being synced. Another big piece of news is that the Clay engine now uses a deferred shading path when AO is needed for a material. This means high poly scenes will be faster because they will not need to render all meshes twice. I also did some optimisation on Gawain and fixed sculpt updates.

  • Next week

I've started to look into OSD but I need more time to evaluate what the best approach is. I'll focus on some needed refactors while still looking into this.

Week 63: 19th - 25th March

  • Info

The refactor took longer than expected but should allow us to do more optimisation down the road. After a more in depth look at OSD it seems it's too early and too much work to do for now.

  • Next week

Some users have reported really bad UI performance compared to 2.79, so we decided we should address this point now before adding more features. 2.8 is supposed to be faster than 2.79, not slower!

Week 64: 26th - 1st April

Text colored by drawcall, before optimisation.
Text colored by drawcall, after optimisation. Icons follow a similar optimization scheme.
  • Info

This week of general UI optimisation is a clear success. Let me detail some points here:

    • All the UI is drawn using immediate mode (our emulation of it using Gawain). It was easier for us to use IMM instead of modern OpenGL when porting the majority of the UI drawing code. However, this method produces a lot of draw calls, which are costly.
    • The UI drawing code (and in general all OpenGL drawing) is single threaded (only uses one CPU core). So CPU usage matters (more than GPU usage) a lot when doing UI and app-driver interactions are what defines what is
    • IMM uses glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT which is still creating latency on most systems (even glBufferSubData is faster on some systems). See this video for an explanation of the problem. So using IMM extensively in performance critical parts is not something we want to keep doing. Note that using persistent mapping is not an option for OpenGL 3.3 core compliant systems (persistent map is core only since OpenGL 4.4).
    • Basically every UI element is doing MULTIPLE draw calls. We are doing more than 4000 drawcalls for the UI itself (on a default layout setup).

So how do we fix this? Answer: Batching and shaders.

    • 80% of the UI is using a widget base (every button and slider uses it). Rendering this base took up to 35 drawcalls (antialiasing was done by creating one drawcall for each jitter sample). Each drawcall was calling IMM and glMapBufferRange, hence the slowdown. Porting this to shader-based drawing completely eliminated the need for IMM and glMapBufferRange for these calls and reduced the number of calls to ... 1 per widget base.
    • Then, pretty much all of the rest of the UI is using icons and/or text. To my knowledge these never overlap each other (well, if the UI is well designed!). So since the order between icons and text is not important, we can group multiple draw calls together. Instead of producing one drawcall per icon and per text string, we batch icons in groups of 16 (a technical choice, see the related commit below), and text glyphs in groups of at most 4096 as long as the font does not change.
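The grouping described above can be sketched in Python as follows. The batch sizes (16 icons, 4096 glyphs) come from the text. The function names and the (font, glyph) tuples are assumptions for the example and say nothing about how the actual buffers are laid out.

```python
def batch_icons(icons, batch_size=16):
    """Split a list of icons into groups drawn with one drawcall each."""
    return [icons[i:i + batch_size] for i in range(0, len(icons), batch_size)]

def batch_glyphs(glyphs, max_batch=4096):
    """Group (font, glyph) pairs; flush when the font changes or a batch is full."""
    batches = []
    current = []
    current_font = None
    for font, glyph in glyphs:
        if current and (font != current_font or len(current) >= max_batch):
            batches.append((current_font, current))
            current = []
        current_font = font
        current.append(glyph)
    if current:
        batches.append((current_font, current))
    return batches

if __name__ == "__main__":
    # 40 icons need 3 drawcalls instead of 40.
    print(len(batch_icons(list(range(40)))))
    # One font change means two text drawcalls instead of one per string.
    print(batch_glyphs([("sans", "a"), ("sans", "b"), ("mono", "c")]))
```

The number of drawcalls then scales with the number of batches, not with the number of icons or strings on screen.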

BLF (our text rendering API) was also a pretty huge bottleneck by itself. Sending less info (6x less data) to the GPU and creating a kerning table (instead of calling FT_Get_Kerning every time) fixed this issue (at least for ASCII strings).
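A kerning table of this kind can be sketched in Python as a precomputed 128x128 lookup with a slow-path fallback for non-ASCII characters. `KerningCache` and the `lookup` callback are hypothetical. The callback merely stands in for an expensive per-pair query such as FT_Get_Kerning and is not the FreeType API.

```python
class KerningCache:
    """Precompute kerning for all ASCII pairs; fall back to the slow lookup otherwise."""

    ASCII_COUNT = 128

    def __init__(self, lookup):
        self._lookup = lookup  # hypothetical slow per-pair kerning query
        self._table = [
            [lookup(chr(left), chr(right)) for right in range(self.ASCII_COUNT)]
            for left in range(self.ASCII_COUNT)
        ]

    def kerning(self, left, right):
        l, r = ord(left), ord(right)
        if l < self.ASCII_COUNT and r < self.ASCII_COUNT:
            return self._table[l][r]       # fast path: plain table read
        return self._lookup(left, right)   # slow path: non-ASCII pair

    def string_kerning(self, text):
        """Sum of the kerning adjustments between consecutive characters."""
        return sum(self.kerning(a, b) for a, b in zip(text, text[1:]))

if __name__ == "__main__":
    # Toy kerning rule: tighten the pair "AV" by one unit.
    cache = KerningCache(lambda a, b: -1 if (a, b) == ("A", "V") else 0)
    print(cache.string_kerning("AVATAR"))
```

The table is paid for once per font and size, after which drawing an ASCII string no longer queries the font library for every glyph pair.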

Simple performance test on my system (no viewport, layout similar to the screenshot, Phenom II X6 2.8 GHz):

    • 2.8 (opti): Draw Window and Swap: ~17 ms.
    • 2.8 (opti): Draw Window: ~12 ms.
    • 2.79 : Draw Window and Swap: ~37 ms.
    • 2.79 : Draw Window: ~33 ms.

Remember that 60 fps means a budget of about 16.7 ms per frame, and that less time spent on the UI means more time for drawing the viewport.
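To put the numbers above in perspective, here is a small Python worked example converting the quoted "Draw Window and Swap" timings into frame rates and comparing them to the 60 fps budget. Only the two timings come from the measurements above. The helper names are made up.

```python
def frame_rate(frame_ms):
    """Frames per second reachable if every frame takes `frame_ms` milliseconds."""
    return 1000.0 / frame_ms

BUDGET_60FPS_MS = 1000.0 / 60.0  # about 16.67 ms per frame

# "Draw Window and Swap" timings quoted above, in milliseconds.
timings_ms = {"2.8 (opti)": 17.0, "2.79": 37.0}

speedup = timings_ms["2.79"] / timings_ms["2.8 (opti)"]

if __name__ == "__main__":
    for build, ms in timings_ms.items():
        ratio = ms / BUDGET_60FPS_MS
        print(f"{build}: {frame_rate(ms):.1f} fps, {ratio:.2f}x the 60 fps budget")
    print(f"speedup: {speedup:.2f}x")
```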

Note that these optimisations have likely introduced some regressions. Report them to ME (hypersomniac) on IRC, NOT IN THE BUG TRACKER.

  • General development
    • BLF: Use Batch API instead of IMM. ab9adf9cdc3.
    • BLF: Add Batching capabilities. 963e48e1dfb.
    • BLF: Perf: Divide by 6 the amount of verts sent to the GPU. 7144fdf2856.
    • BLF: Perf: Add a kerning cache table for ascii chars. f9691bae840.
    • BLF: Perf: Do not call FT_Set_Char_Size every time. 4dc0c923fb7.
    • DRW: Opti: Make cursor use batch instead of immediate API. 7a94d4362af.
    • GPUFramebuffer: Fix assert triggering another assert. fb1463ff2bc.
    • GPUShader: Add specialized widget base shader. ba9c2746b6c.
    • GPUShader: Cleanup: Remove unused uniform_interface. 3bb720a7de4.
    • GWN: VertBuff: Add GWN_vertbuf_vertex_count_set. 8568d38f1b0.
    • GWN: Imm: Add immVertex4f. c48b6fae9ae.
    • GWN: Batch: Add GWN_batch_uniform_4fv_array 3c48a21833f.
    • GWN: Perf: Bypass glUseProgram(0) 8b74741b9e1.
    • UI: Fix: Center vertical scrollbar circles. 72c57a755e6.
    • UI: Perf: Add BLF batching for File browser and UI blocks. f44d3e83cc0.
    • UI: Perf: Batch icons drawcalls together. c77870fc78f.
    • UI: Perf: Batch Trias with widgets. e1d6e524b3c.
    • UI: Perf: Do not use implicit Attrib fill. ddbde6d1c0c 873c23456b4.
    • UI: Perf: Group drawcalls inside ui_draw_panel_dragwidget 0acf655f9de.
    • UI: Perf: Group fill/border/emboss batches together. d93e7e64303.
    • UI: Perf: Make icon_draw_texture use GWN_draw_primitive. 637993fafe0.
    • UI: Perf: Optimize widgetbase_draw. 2cbd7cc269f.
    • UI: Perf: widgetbase: Replace imm usage by a batch cache. f6ad5380409.
    • Clay: Remove warning. 205fe8afd70.
    • EEVEE: Fix bad framebuffer configuration 8301b264524.
  • Next week

I'm still not satisfied with the performance of the nodetree drawing, which could be at least 2x or 3x faster (it gets slow pretty quickly, and having animated values in it with a viewport open can lead to serious slowdowns). Other important area types could benefit from special treatment. Widget bases could also be batched together, which would easily reduce the drawcall count drastically again. I'm moving to Amsterdam for the code quest so I'll need some preparation time too.

Week 65: 2nd - 8th April

  • Info

Moved to Amsterdam, got sick, and continued on the UI optimisation side. I think all the low-hanging fruit has been picked and the UI is more than twice as fast as it was in 2.79 (it depends on the config but it's a win everywhere I could test).

  • Next Week

Super secret meetings :)

Week 66: 9th - 15th April

  • Info

This week was focused on planning the codequest. For Eevee I plan to focus on usability and reliability first. Detailed planning is yet to come.

Codewise, I just fixed a few bugs and minor perf issues. I also removed the one-pass shadow rendering method in Eevee. It was much slower than just rendering individual faces.

  • General development
    • BLF: Fix assert when drawing very small chars. 020c4e19f2b.
    • BLF: Fix broken shadows on certain hardware. a74e782f5b3.
    • GPU Codegen: Fix assert caused by GC of failled shaders. 3cb42e59174.
    • DRW: Hair: Opti: Use GWN_PRIM_LINE_STRIPS instead of LINES 3cfca15b506.
    • WM: Fix a crash (assert) when creating a new window. 04e363376ba.
    • UI: Fix some drawing order issues. 3dce5b2ef9f.
    • DRW: Deferred Shader Compilation: Don't recreate ogl context. d7aa51a50c6.
    • GPUSelect: Remove glFinish() that was causing bad perf issue. bf854b28515.
    • Eevee: Shadows: Transition to individual face rendering. dd6fcd2f210d8d1f637b1b 61a22262d19.
  • Next week

Define priorities and week by week planning then start working on things.