From BlenderWiki

Current state of OpenCL in Blender (May 2015) (WIP)


Cycles was included in Blender with the release of 2.61 in December 2011. The release notes mention: “Cycles has two GPU rendering modes, through CUDA, which is the preferred method for NVidia graphics cards, and OpenCL, which is intended to support rendering on AMD/ATI graphics cards”. Ever since, OpenCL support in Cycles, or the lack thereof, has been a topic of debate. In April 2015 AMD contributed a set of patches to improve the situation.

Supported AMD devices

Name                  Code name       Cores     GCN CU count
Radeon HD 7730        Cape Verde LE   384       6
Radeon HD 7750        Cape Verde PRO  512       8
Radeon HD 7770 GHz    Cape Verde XT   640       10
Radeon HD 7790        Bonaire XT      896       14
Radeon HD 7850        Pitcairn PRO    1024      16
Radeon HD 7870 GHz    Pitcairn XT     1280      20
Radeon HD 7870 XT     Tahiti LE       1536      24
Radeon HD 7950        Tahiti PRO      1792      28
Radeon HD 7950 Boost  Tahiti PRO2     1792      28
Radeon HD 7970        Tahiti XT       2048      32
Radeon HD 7970 GHz    Tahiti XT2      2048      32
Radeon HD 7990        New Zealand     2048 x 2  32 x 2
Radeon HD 8570        Oland           384       6
Radeon HD 8670        Oland           384       6
Radeon HD 8760        Cape Verde XT   640       10
Radeon HD 8770        Bonaire XT      896       14
Radeon HD 8860        Pitcairn XT     1280      20
Radeon HD 8950        Tahiti Pro      1792      28
Radeon HD 8970        Tahiti XT2      2048      32
Radeon HD 8990        Malta           2048 x 2  32 x 2
Radeon R5 240         Oland           320       5
Radeon R7 240         Oland PRO       320       5
Radeon R7 250         Oland XT        384       6
Radeon R7 250X        Cape Verde XT   640       10
Radeon R7 260         Bonaire         768       12
Radeon R7 260X        Bonaire XTX     896       14
Radeon R7 265         Curaçao PRO     1024      16
Radeon R9 270         Curaçao PRO     1280      20
Radeon R9 270X        Curaçao XT      1280      20
Radeon R9 280         Tahiti PRO3     1792      28
Radeon R9 280X        Tahiti XT2      2048      32
Radeon R9 285         Tonga PRO       1792      28
Radeon R9 290         Hawaii PRO      2560      40
Radeon R9 290X        Hawaii XT       2816      44
Radeon R9 295X2       Vesuvius        2816 x 2  44 x 2
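
The two numeric columns in the table above are related: each GCN compute unit (CU) contains 64 stream processors, so the core count is 64 times the CU count. A minimal Python sketch that checks this for a few rows copied from the table (the names `CORES_PER_CU`, `ROWS` and `cu_count` are illustrative and not taken from Blender or AMD tooling):

```python
# Each GCN compute unit has 4 SIMD units of 16 lanes each: 64 stream processors.
CORES_PER_CU = 64

# (name, cores, CU count) for a few rows copied from the table above.
ROWS = [
    ("Radeon HD 7730", 384, 6),
    ("Radeon HD 7970", 2048, 32),
    ("Radeon R5 240", 320, 5),
    ("Radeon R9 290X", 2816, 44),
]


def cu_count(cores: int) -> int:
    """Return the number of compute units for a given core count."""
    if cores % CORES_PER_CU != 0:
        raise ValueError("core count is not a multiple of 64")
    return cores // CORES_PER_CU


# Verify that the listed CU counts match the listed core counts.
for name, cores, cus in ROWS:
    assert cu_count(cores) == cus, name
```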

Support status

Device                                    Operating system  Driver / Toolkit  Version  Status
Radeon HD 7000 series (Southern Islands)  Windows 7 x64     Catalyst Beta     15.04    Works (not all features)
Radeon HD 7000 series (Southern Islands)  Linux x64         Catalyst Beta     15.04    Works (not all features)
Radeon HD 7000 series (Southern Islands)  Mac OS X 10.9.2   Apple             10.10.1  Not usable
Intel Iris Pro                            Mac OS X 10.9.2   Apple             10.10.1  Crash
Nvidia GTX 400/500 (Fermi)                All               Nvidia            all      Works (CUDA is faster)
Nvidia GTX 600/700 (Kepler)               All               Nvidia            all      Works (CUDA is faster)
Nvidia GTX 750 (Maxwell)                  All               Nvidia            all      Works (CUDA is faster)
Intel Core / Xeon                         Windows x64       Intel SDK         2013     Works (C++ is faster)
Intel Core / Xeon                         Linux x64         Intel SDK         2013     Works (C++ is faster)
Intel or AMD CPU                          Mac OS X 10.9.2   Apple             10.9.2   Works (C++ is faster)

Current status of Windows / Linux AMD OpenCL drivers

In the past year AMD’s implementation has improved a lot. The Cycles kernel (the part that gets loaded onto the GPU) is very large compared to a typical GPU workload. A year ago one had to disable most of what makes Cycles good to be able to compile it at all. As of a few months ago, with a recent AMD card and the latest driver, you can get away with disabling only some parts. While this is good news and shows progress, it gives no indication of how long the remaining issues will take to resolve.

As it stands now we are missing 2 crucial things:

  • An OpenCL compiler tool-chain that can compile the whole of Cycles without having to disable any parts. This may or may not include register spill improvements. Cycles will grow, and while spilling is very undesirable for the typical GPU workload, it is unavoidable for us.
  • User control over when to spill. By default a good OpenCL compiler will do everything it can to avoid spilling and will use all available resources before doing so. We need it to spill registers into the slow main memory before it reaches the resource limit. This is needed to run enough parallel threads to get the kind of performance our users want from Cycles and expect when comparing their AMD product to a comparable Nvidia product. (We rely on this to get enough parallel threads on Nvidia-based hardware.)
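
To make the second point concrete, here is a rough sketch of how the number of registers a kernel uses per thread limits how many wavefronts can be resident at once. The hardware limits below (256 vector registers per SIMD lane, at most 10 wavefronts per SIMD, allocation in blocks of 4) are assumptions based on AMD's public GCN documentation, and `waves_per_simd` is a hypothetical helper, not part of Cycles or any driver:

```python
# Assumed GCN limits (per SIMD unit); treat them as illustrative.
VGPRS_PER_SIMD_LANE = 256   # vector registers available to each lane
MAX_WAVES_PER_SIMD = 10     # hardware cap on resident wavefronts
VGPR_GRANULARITY = 4        # registers are allocated in blocks of 4


def waves_per_simd(vgprs_per_thread: int) -> int:
    """Wavefronts that fit on one SIMD for a given per-thread register count."""
    if not 1 <= vgprs_per_thread <= VGPRS_PER_SIMD_LANE:
        raise ValueError("register count out of range")
    # Round the request up to the allocation granularity.
    blocks = -(-vgprs_per_thread // VGPR_GRANULARITY)
    allocated = blocks * VGPR_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)


for regs in (24, 64, 128, 256):
    print(regs, "VGPRs per thread ->", waves_per_simd(regs), "wavefronts per SIMD")
```

Under these assumptions a kernel that keeps 128 registers live per thread leaves only two wavefronts per SIMD to hide memory latency. Capping the register count, and spilling the rest to memory, is what raises that number. For comparison, CUDA's nvcc exposes such a cap through its `--maxrregcount` option.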

These 2 do not include the usual workarounds and hacks that come with any language that has multiple implementations. Those are expected and will not cause serious problems. Once we have both, we could get to a situation where it makes economic sense to buy AMD products for Cycles GPU rendering.

Current state of OS X AMD OpenCL drivers

For Apple’s OS X the situation is a bit different but generally comparable.

OpenCL Blender / Cycles roadmap:

  • Improve texture lookup / interpolation. We are currently doing our own interpolation and lookup; on platforms that provide enough textures we could switch to the platform's own texture lookup instead.
  • Implement device fission (partitioning one device into sub-devices). This would be especially useful when the user does not have a dedicated GPU/accelerator.
  • Use more OpenCL 1.2 features. Nvidia’s OpenCL is still at 1.1, but it is not a serious target because CUDA is the better option there anyway.
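
As an illustration of the first roadmap item, "doing our own interpolation and lookup" amounts to something like the hand-written bilinear filter below, which a hardware texture unit would otherwise perform. This is a generic sketch in Python, not Cycles' kernel code; `bilinear_lookup`, the texel-center convention and the clamp-to-edge addressing are all assumptions made for the example:

```python
import math


def bilinear_lookup(image, u, v):
    """Sample a 2D grid of floats at normalized coordinates (u, v) in [0, 1].

    `image` is a list of rows. Texel centers sit at half-integer positions,
    and lookups outside the image are clamped to the nearest edge texel.
    """
    height, width = len(image), len(image[0])
    # Map normalized coordinates to texel space, centered on texels.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):
        # Clamp-to-edge addressing.
        ix = min(max(ix, 0), width - 1)
        iy = min(max(iy, 0), height - 1)
        return image[iy][ix]

    # Blend horizontally on the two neighbouring rows, then vertically.
    top = (1 - fx) * texel(x0, y0) + fx * texel(x0 + 1, y0)
    bottom = (1 - fx) * texel(x0, y0 + 1) + fx * texel(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


checker = [[0.0, 1.0],
           [1.0, 0.0]]
center = bilinear_lookup(checker, 0.5, 0.5)  # equal blend of all four texels
```

Each lookup costs four memory reads plus the blending arithmetic, which is the work that native texture support would take over.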

OpenCL in Blender outside of Cycles

Cycles is not the only area in Blender that benefits or can benefit from OpenCL acceleration. The Blender compositor also optionally uses OpenCL for some operations, and there are plans to increase its usage. Physics simulation, and especially Bullet, also offers opportunities to use more OpenCL.

As a technology OpenCL aligns very well with graphics and, as a result, with Blender. Our mission reads “We want to build a free and open source complete 3D creation pipeline for artists and small teams.” Affordable high-performance GPGPU products like AMD discrete graphics cards and APUs fit really well with that. Now let's make that happen.

Q & A:

Q: Why only talk about AMD’s OpenCL? There are a lot of other implementations out there.

A: While this is true, there is no real user-visible benefit to using these. On Nvidia hardware CUDA outperforms OpenCL, and Nvidia is stuck on OpenCL 1.1. Intel GPUs are getting more powerful but are not a good target yet, and CPU-based OpenCL provides little or no benefit over CPU-based Cycles.

Q: Why don’t you just split up Cycles so it can run better on AMD hardware?

A: While this might help, splitting up Cycles in this way is not a trivial matter, and it is not clear whether it would help or by how much. For a resource-constrained open-source project this will most likely not be a top priority.

Q: There seem to be things that Blender can do to make things better. Why point to AMD for this?

A: While it is absolutely true that Cycles could be made to run better on any and all other OpenCL implementations, there is no compelling user-visible reason to do so. Making it run better on Nvidia’s OpenCL is of no use as we have CUDA there. Making it better on AMD’s or Intel’s CPU-based OpenCL implementations is of no real use as we have a compiled C++ kernel on those platforms.

Q: What about LuxRender? Why does that not have a problem running on AMD's GPUs?

A: While it is true that LuxRender does run (and has excellent performance) on AMD's hardware, this is not by accident. It is my understanding that LuxRender is developed targeting the AMD OpenCL runtime and as a result keeps within the limits it imposes. The Cycles kernel is a valid OpenCL program, but not all OpenCL implementations can compile and run every valid program. One could attribute this to hardware insufficiencies, and while this might be true for AMD's VLIW4 architecture and maybe for Intel's Iris, AMD's GCN architecture should be able to support kernels the size of Cycles. According to the AMD GCN ISA docs it should be possible to have proper function calls and thus run large programs (maybe slowly): “The SALU also can perform operations directly on the Program Counter, allowing the program to create a call stack in SGPRs”.