Cycles OpenCL Compiler Optimization

Current situation

Cycles currently supports OpenCL, and the implementation is in a workable state.

On AMD OpenCL devices, compiling the Cycles kernel can take a minute on a decent machine.

When a user renders with an OpenCL device, the kernel is compiled at the moment rendering is started. The user must wait until compilation has finished before rendering begins. This process is also blocking, which is annoying. During this project we will optimize this process.

Approach

Establish a baseline for OpenCL compilation and rendering. All modifications will be compared to this baseline.
Research how LuxRender and ProRender organize their OpenCL kernels and source code, and compare this to Cycles.
Research technologies like SPIR/SPIR-V for offline compilation and optimization.
Make a final list of experiments to perform
Perform the experiments
Decide on experiments we want to integrate into Cycles.
Design of how to implement the experiments
Review design
Implementation of design
Code review

   Note: Experiments will not be production-quality code, as they are only used to prove whether a path leads to performance improvements.
   The final implementation will be production-quality code.

   Note: Results of the comparison (commit hash, timings and image results) will be stored (current ideas are PostgreSQL and IPFS). This will
   be scripted so we can execute and re-execute tests (including the baseline). This will also warn us about performance changes coming from master.
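
The baseline comparison described in the note above could be scripted roughly as follows. This is a minimal sketch, not the project's actual tooling: the `Measurement` record, the `regressions` helper and the 5% tolerance are assumptions made for illustration, and storage in PostgreSQL or IPFS is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One test run (hypothetical record layout)."""
    commit_hash: str
    scene: str
    compile_seconds: float
    render_seconds: float


def relative_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate against baseline; positive means slower."""
    return (candidate - baseline) / baseline


def regressions(baseline: Measurement, candidate: Measurement, tolerance: float = 0.05) -> dict:
    """Return the metrics that got slower than the baseline by more than `tolerance`."""
    if baseline.scene != candidate.scene:
        raise ValueError("measurements must be taken on the same scene")
    report = {}
    for metric in ("compile_seconds", "render_seconds"):
        change = relative_change(getattr(baseline, metric), getattr(candidate, metric))
        if change > tolerance:
            report[metric] = change
    return report
```

Running such a check for every commit on master would flag a change in compile or render time as soon as it exceeds the chosen tolerance.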

Experiments

This is an incomplete list of experiments. During the project this list will be finalized.

Experiments on source code organization, for example:
- Reshuffling OpenCL functions
  - Combine split_shadow_blocked_dl with split_shadow_blocked_ao. Both use the function shadow_blocked, but with different parameters. Expected reduction of compilation time: between 4 and 8 seconds.
  - Refactor World based AO as material based AO.
  - Single kernel for shader evaluation.
- Prefilter non-OpenCL code
- Organize source to be kernel specific (minifying)
  - Split kernels into separate programs by default? (revert change https://developer.blender.org/rC626bc0971b5af2fac1eefefc6331194c3d70f14d) or add blacklisting/whitelisting on top of it.
  - Use program split supported by compile directives per kernel; this will make sure that recompilation of static kernels will not happen.
  - Bundle static kernels into a separate program.
  - Use a filter for unrolling include statements per kernel type. Maintainability?
- Reshuffling shader nodes ([1], [2])
- Bisearch nodes experiment ([3])
- ... others
Experiment with other technologies
- Spir/Spir-V online/offline compilation
- ... others
Multi-threaded compilation ([4])
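
One of the ideas above is to split kernels into separate programs driven by compile directives, so that static kernels are not recompiled. A minimal sketch of how that decision could be keyed, assuming a hypothetical cache indexed by a hash of the program source and its directives. The program names and `-D` flags used with it are placeholders, not Cycles' actual build options.

```python
import hashlib


def program_cache_key(source, directives):
    """Hash the program source together with its compile directives."""
    digest = hashlib.sha256()
    digest.update(source.encode("utf-8"))
    # Sort so that the order of the directives does not change the key.
    for directive in sorted(directives):
        digest.update(b"\0")
        digest.update(directive.encode("utf-8"))
    return digest.hexdigest()


def programs_to_compile(programs, cache):
    """Return the names of programs whose key is not in the cache yet.

    `programs` maps a hypothetical program name to (source, directives);
    `cache` is a set of keys of programs that were compiled before.
    """
    return [
        name
        for name, (source, directives) in programs.items()
        if program_cache_key(source, directives) not in cache
    ]
```

With this keying, a program whose source and directives are unchanged keeps its key and is skipped, while a program whose directives change (for example because a scene enables another feature) gets a new key and is compiled again.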

Hardware

This is the hardware setup that will be used for initial testing:

AMD Radeon Pro WX 7100 Graphics (using AMD RADEON PRO drivers)
AMD Ryzen 1700
16 GB Memory

Scope

We concentrate on the split kernel for OpenCL. Blender has a debug option for the mega kernel; it works, but it is unusably slow on AMD hardware.
We use SingleFile OpenCL compilation. OpenCL supports include statements, but they are not reliable on all platforms due to character encoding issues.
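
To illustrate what single-file compilation implies, the sketch below flattens quoted `#include` statements into one source string before it would be handed to the OpenCL compiler. It is not the Cycles implementation: the function name is hypothetical, includes are assumed to resolve relative to a single root directory, and every file is assumed to be needed only once (include-guard semantics).

```python
import re
from pathlib import Path

# Matches lines of the form: #include "some/file.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"\s*$')


def inline_includes(path, root, seen=None):
    """Return the source of `path` with quoted includes replaced by file contents."""
    seen = set() if seen is None else seen
    path = Path(path).resolve()
    if path in seen:
        # Already inlined once; skip it (assumes include-guard semantics).
        return ""
    seen.add(path)
    lines = []
    for line in path.read_text(encoding="utf-8").splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            lines.append(inline_includes(Path(root) / match.group(1), root, seen))
        else:
            lines.append(line)
    return "\n".join(lines)
```

Reading every file with an explicit encoding in one place is also where platform-specific character encoding problems can be dealt with, instead of relying on the driver's include handling.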

Selected Scenes

empty scene (a scene with only a camera)
bmw
fishycat
barbershop
victor
classroom
koro
pabellon_barcelona

 NOTE: The files have been packed so we can add them to IPFS.
 IPFS is used to make sure that the input files are not altered
 between tests.
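
The same kind of integrity check can be sketched locally. IPFS addresses files by their content, so an altered file no longer matches its identifier. The example below uses a plain SHA-256 digest as a stand-in; it does not compute an actual IPFS content identifier, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path, expected_digest):
    """True when the file content still matches the recorded digest."""
    return file_digest(path) == expected_digest
```

Recording the digest of each packed scene file next to the test results makes it possible to check, before a run, that the input is the same file the baseline was measured with.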
