Note: This is an archived version of the Blender Developer Wiki (archived 2024). The current developer documentation is available on developer.blender.org/docs.

User:Jbakker/projects/CyclesOpenCL2019

Cycles OpenCL Compiler Optimization

Current situation

Cycles currently supports OpenCL, and the support is in a workable state.

On AMD OpenCL devices, compiling the Cycles kernel can take a minute, even on a decent machine.

When rendering on an OpenCL device, the kernel is compiled at the moment the user starts rendering, and rendering only begins once compilation has completed. This process is also blocking, which is annoying. During this project we will optimize this process.

Approach

  • Make a baseline for OpenCL compilation and rendering. All modifications will be compared to this baseline.
  • Research how LuxRender and ProRender organize their OpenCL kernels and source code, and compare this to Cycles.
  • Research technologies such as SPIR/SPIR-V for offline compilation and optimization.
  • Make a final list of experiments to perform
  • Perform the experiments
  • Decide which experiments we want to integrate into Cycles.
  • Design how to implement the selected experiments
  • Review design
  • Implementation of design
  • Code review
   Note: Experiments will not be production-quality code, as they are only used to prove whether a path leads to performance improvements.
   The final implementation will be production-quality code.
   Note: Results of the comparisons (commit hash, timings and image results) will be stored (candidates are PostgreSQL and IPFS). This will
   be scripted so we can execute and re-execute tests (including the baseline). This will warn us about performance changes coming from master.

Experiments

This is an incomplete list of experiments. During the project this list will be finalized.

  • Experiments on source code organization, for example:
    • Reshuffling OpenCL functions
      • Combine split_shadow_blocked_dl with split_shadow_blocked_ao. Both use the function shadow_blocked, but with different parameters. Expected reduction of compilation time: between 4 and 8 seconds.
      • Refactor world-based AO as material-based AO.
      • Single kernel for shader evaluation.
    • Prefilter non-OpenCL code
    • Organize source to be kernel-specific (minifying)
      • By default, split kernels into separate programs? (revert change https://developer.blender.org/rC626bc0971b5af2fac1eefefc6331194c3d70f14d), or add blacklisting/whitelisting on top of it.
      • Use a program split driven by per-kernel compile directives, so that static kernels are not recompiled.
      • Bundle static kernels into a separate program.
      • Use a filter for unrolling include statements per kernel type. Maintainability?
    • Reshuffling shader nodes ([1], [2])
    • Bisearch nodes experiment ([3])
    • ... others
  • Experiment with other technologies
    • SPIR/SPIR-V online/offline compilation
    • ... others
  • Multi-threaded compilation ([4])

Hardware

This is the hardware setup that will be used for initial testing:

  • AMD Radeon Pro WX 7100 Graphics (using AMD RADEON PRO drivers)
  • AMD Ryzen 1700
  • 16 GB Memory

Scope

  • We concentrate on the OpenCL split kernel. Blender has a debug option for the mega kernel; it works, but is unusably slow on AMD hardware.
  • We use SingleFile OpenCL compilation. OpenCL supports include statements, but they are not reliable on all platforms due to character encoding issues.

Selected Scenes

  • empty scene (a scene with only a camera)
  • bmw
  • fishycat
  • barbershop
  • victor
  • classroom
  • koro
  • pabellon_barcelona
 NOTE: The files have been packed so we can add them to IPFS.
 IPFS is used to make sure that the input files are not altered
 between tests.