From BlenderWiki

 Please visit the new OpenCL page

OpenCL Status

Cycles has had a split OpenCL kernel since Blender release 2.75. It is an alternative to the approach used on the CPU (the so-called megakernel). The idea behind splitting the kernel is to have multiple smaller kernels, which are not only simpler and faster to compile but also perform better. The initial split kernel patch was written by AMD, and further work was also funded by AMD.
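As a rough illustration of the difference, here is a toy Python sketch. It is not Cycles code: the "paths" are plain dictionaries and the three stage functions (intersect, shade, write_result) are made-up stand-ins. It only shows the structural idea of one large per-path function versus several small kernels that are each run over all paths.

```python
# Toy illustration only: this is NOT Cycles code. Paths are plain dicts and
# the three stages are hypothetical stand-ins for real rendering steps.

def intersect(path):
    # Pretend that two out of every three rays hit something.
    path["hit"] = path["seed"] % 3 != 0

def shade(path):
    # Only paths that hit something get a (fake) shading result.
    if path["hit"]:
        path["color"] = path["seed"] * 0.5

def write_result(path):
    # Paths that missed fall back to a background value of 0.0.
    path["pixel"] = path.get("color", 0.0)

def megakernel(paths):
    # One large function: every stage runs for one path before the next path.
    for path in paths:
        intersect(path)
        shade(path)
        write_result(path)

def split_kernel(paths):
    # Several small kernels: each one is run over all paths in turn.
    for kernel in (intersect, shade, write_result):
        for path in paths:
            kernel(path)

def render(driver, count=8):
    # Create fresh paths, run the chosen driver, and collect the pixels.
    paths = [{"seed": i} for i in range(count)]
    driver(paths)
    return [p["pixel"] for p in paths]
```

Both drivers produce the same pixels in this toy; the split form simply breaks the work into smaller units, each of which would be compiled separately on a GPU.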

The OpenCL split kernel now supports nearly all features. Baking works, but uses the megakernel. Volumetrics, SSS, branched path tracing, HDR lighting and denoising are fully supported.

With current drivers, all production files from the official Cycles benchmark pack, including the huge one from Gooseberry, now render quite fast.

[Image: Timings.png]

Other differences with CUDA

With the latest drivers, AMD cards using OpenCL can use system memory to render scenes that are bigger than GPU memory. Baking is slower because it still uses the megakernel rather than the split kernel. Also, the first render of a scene may incur a kernel compile time, depending on the features used. Offering pre-compiled kernels, as is done for CUDA, is being discussed.

Testing

Activate OpenCL rendering

  • For AMD cards on Windows and Linux you just need to select your GPU under File -> User Preferences -> System. The split kernel will be used by default to give the best performance.
  • To use OpenCL on other platforms, launch Blender with --debug-value 256 (either on the command line or by adding it to your shortcut). This adds a "debug" section to the render options panel. There you can choose the kernel (split or mega) and the platform (set to "all" to enable OpenCL for Intel and NVIDIA).

Then choose GPU as the render device in the render options panel.
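For those who prefer a script to a shortcut, the launch step above could be sketched in Python as follows. The helper names and the default executable name "blender" are assumptions (the executable must be on your PATH or be given as a full path); only the --debug-value 256 option comes from the text above.

```python
import subprocess

def blender_debug_command(blender_path="blender", debug_value=256):
    # Build the argument list; blender_path is an assumed executable name.
    return [blender_path, "--debug-value", str(debug_value)]

def launch_blender_with_debug(blender_path="blender", debug_value=256):
    # Start Blender with the debug value set and wait until it exits.
    return subprocess.call(blender_debug_command(blender_path, debug_value))
```

Calling launch_blender_with_debug() opens Blender with the extra debug section available; the kernel and platform still have to be chosen in the render options panel as described above.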

Benchmarking

To test the performance of your computer, you can download the official Cycles Benchmark files from https://download.blender.org/demo/test/cycles_benchmark_20160228.zip. They include production files for films, archviz (exterior and interior), comics, etc.
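One possible way to time a benchmark render from a script is sketched below. It assumes Blender is started in background mode with its -b and -f command-line options; the helper names, the "blender" executable name and the example file name are placeholders, not part of the benchmark pack.

```python
import subprocess
import time

def render_command(blend_file, frame=1, blender_path="blender"):
    # -b runs Blender without the UI, -f renders the given frame number.
    return [blender_path, "-b", blend_file, "-f", str(frame)]

def time_command(cmd):
    # Run the command, wait for it to finish, and measure wall-clock time.
    start = time.perf_counter()
    returncode = subprocess.call(cmd)
    elapsed = time.perf_counter() - start
    return returncode, elapsed
```

For example, time_command(render_command("some_benchmark_scene.blend")) returns the exit code and the elapsed seconds for one frame. The figure includes scene loading and, on the first run, any kernel compile time.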

How to get the best performance

  • Tile sizes from 64x64 up to 256x256 give the best performance.
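To get a feel for what these tile sizes mean for a given frame, the following small sketch counts how many square tiles are needed to cover an image. The 1920x1080 resolution is just an assumed example, and the helper name is made up for illustration.

```python
import math

def tile_grid(width, height, tile_size):
    # Number of tile columns and rows needed to cover the image;
    # tiles at the right and bottom edges may be only partially filled.
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols, rows, cols * rows

for size in (64, 128, 256):
    print(size, tile_grid(1920, 1080, size))
```

Larger tiles mean far fewer tiles per frame, which is one reason to try several sizes within the recommended range on your own scenes.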

Compare your results

Current issues and limitations

There are some known issues which are common to all kernels and platforms:

  • Use the latest buildbot build to benefit from all speedups and new functionality.

Supported AMD Cards

For now those limitations are considered a TODO rather than a bug.

AMD on OS X (outdated section, to be updated)

The AMD team working on OS X drivers for El Capitan (OS X 10.11) did really nice work on improving the driver, which is now capable of compiling and running the OpenCL megakernel.

OpenCL on other platforms

OpenCL works fine on NVIDIA cards, but performance is noticeably lower (up to 2x slower) than with CUDA, so it isn't really worth using OpenCL on NVIDIA cards at the moment.

Intel OpenCL works reasonably well. It's even possible to use OpenCL to combine GPU and CPU to render at the same time, until a more proper solution is implemented.

OpenCL in Blender outside of Cycles

Cycles is not the only area in Blender that benefits or can benefit from OpenCL acceleration. The Blender compositor also optionally uses OpenCL for some operations, and there are plans to increase its usage. Physics simulation, especially Bullet, also offers opportunities to use more OpenCL.

As a technology, OpenCL aligns very well with graphics and, as a result, with Blender. Our mission reads “We want to build a free and open source complete 3D creation pipeline for artists and small teams.” Affordable high-performance GPGPU products like AMD discrete graphics cards and APUs fit really well with that. Now let's make that happen.

Q & A:

Q: Why only talk about AMD’s OpenCL? There are a lot of other implementations out there.

A: While this is true, there is no real user-visible benefit to using these. On NVIDIA hardware CUDA outperforms OpenCL, and NVIDIA is stuck on OpenCL 1.1. Intel GPUs are getting more powerful but are not a good target yet, and CPU-based OpenCL provides little or no benefit over CPU-based Cycles.


Q: Why don’t you just split up Cycles so it can run better on AMD hardware?

A: While this would likely help, it is not a trivial matter to split up Cycles in this way. Also, it is not clear that it will help, or by how much. For a resource-constrained open-source project, this will most likely not be a top priority.


Q: There seem to be things that Blender can do to make things better. Why point to AMD for this?

A: It is absolutely true that Cycles could be made to run better on any and all other OpenCL implementations, but there is no compelling user-visible reason to do so. Making it run better on NVIDIA’s OpenCL is of no use as we have CUDA there. Making it better on AMD’s or Intel’s CPU-based OpenCL implementations is of no real use as we have a compiled C++ kernel on those platforms.


Q: What about LuxRender? Why does it not have a problem running on AMD's GPUs?

A: While it is true that LuxRender does run (and has excellent performance) on AMD's hardware, this is not by accident. It is my understanding that LuxRender is developed targeting the AMD OpenCL runtime and as a result keeps within the limits it imposes. The Cycles kernel is a valid OpenCL program, but not all OpenCL implementations can compile and run valid programs. One could attribute this to hardware insufficiencies, and while this might be true for AMD's VLIW4 architecture and maybe for Intel's Iris, AMD's GCN architecture should be able to support kernels the size of Cycles. According to the AMD GCN ISA docs ("The SALU also can perform operations directly on the Program Counter, allowing the program to create a call stack in SGPRs"), it should be possible to have proper function calls and thus run (maybe slowly) large programs.