OpenCL support for AMD/NVIDIA GPU rendering is currently on hold. Only a small subset of the rendering kernel can be compiled, which leaves OpenCL rendering at the prototype stage. We will need major driver or hardware improvements to get full Cycles support on AMD hardware. On NVIDIA hardware CUDA is still faster, and Intel integrated GPUs are unlikely to give any speed improvement over CPU rendering.
In Blender 2.65, OpenCL is not available as a choice in the UI by default. Defining the environment variable CYCLES_OPENCL_TEST makes it show up, which can be useful for developers who want to test it. The OpenCL kernel is located in 2.65/scripts/addons/cycles/kernel. Because the OpenCL kernel is compiled from source at runtime, specific functionality can be enabled or disabled for testing in the file kernel_types.h without recompiling Blender.
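As a minimal sketch of the step above, the variable can be defined for a single Blender session from a small Python launcher. The helper names `opencl_test_env` and `launch_blender`, the value "1", and the default executable name "blender" are assumptions made for illustration; the text only says the variable has to be defined, so any value is assumed to work.

```python
import os
import subprocess


def opencl_test_env(base_env=None, value="1"):
    """Return a copy of the environment with CYCLES_OPENCL_TEST defined.

    The value "1" is an assumption: the variable only needs to be defined.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CYCLES_OPENCL_TEST"] = value
    return env


def launch_blender(blender_path="blender"):
    """Start Blender as a child process with the test variable set.

    "blender" is a placeholder; pass the path to your own executable.
    """
    return subprocess.Popen([blender_path], env=opencl_test_env())
```

Calling `launch_blender()` with the path to a Blender 2.65 executable starts it with the variable set for that process only, so the regular environment stays untouched.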
The path tracing kernel is currently a single big kernel, much bigger than typical OpenCL code. There are about 40 shading nodes, 10 BSDFs, etc.
Splitting the kernel up into smaller parts may help, but even then compiling only the shading node execution code fails, and that code would be quite difficult to split up further. An alternative would be to compile a kernel for each material in the scene, but I don't have much faith in complex node setups compiling reliably in that case either, and scene startup time would increase considerably.
If at all possible, I would like to avoid splitting up the kernel into many pieces, mainly because it makes extending the code much harder (we're only getting started in terms of number of features), and also because I haven't really seen this demonstrated working efficiently in other renderers yet, e.g. NVIDIA OptiX also uses a single kernel.
The immediate issue you run into when trying OpenCL is that compilation takes a long time, or the compiler crashes after running out of memory. We can successfully compile a subset of the rendering kernel (thanks to the work of developers at AMD improving the driver), but not enough to consider this usable in practice beyond a demo.