From BlenderWiki


The Decklink branch

The Decklink branch contains a series of developments that aim at mixing BGE scenes with a live 3D video stream, with the lowest possible latency on the video stream.

Several solutions have been tested, which explains the variety of features present in this branch. All of them, however, can be used in other types of applications:

1. Transparent background

The first attempt was to use the compositing feature that exists in Windows and Linux.

In Linux (a similar feature exists in Windows), if you create an OpenGL context with an alpha channel in the pixel format, the background pixels of the window will be transparent and the desktop underneath will be visible. The idea was to display the video stream in fullscreen on a passive 3D monitor and at the same time run the BGE player in fullscreen, stereo and transparent background, hoping that the two video streams would mix nicely on the display.

It didn't work well because of the poor efficiency of OS compositing and the long latency introduced by existing video players. The feature, however, is still present and can be used to create BGE games with a transparent background:

Use the '-a' option on the player to request an OpenGL context with an alpha channel in Linux and Windows. In Windows, the compositing feature needs to be explicitly requested when creating the OpenGL context. The code exists in Blender but is currently compiled out. To activate it, you need to define 'WIN32_COMPOSITING' and recompile Blender.

2. Video capture with BMD DeckLink cards

The second attempt was to capture the 3D video stream using fast video capture cards, send the video to the BGE and mix it with the scene to produce a stereo composition on the display.

The choice was made to use DeckLink cards from Black Magic Design (https://www.blackmagicdesign.com/), hence the name of the branch.

It didn't work well because there was still too much latency in the DeckLink card when capturing a FullHD video stream over HDMI. The overall latency, including the video transfer and compositing delay, was between 100 and 160 ms; still too high for this type of application.

The branch nevertheless has a full-featured API for sending high-speed video streams to the BGE. This is implemented in VideoTexture as a new object, 'bge.texture.VideoDeckLink'. The Python API is described in the BGE API documentation, but a more detailed document with design details is also available: File:Manual-BGE-VIdeoTexture-Decklink-API.pdf.

Notes:

  • There is a full range of DeckLink cards, from low-cost USB devices to very high-end PCIe cards. The development was done with the 'DeckLink 4K Extreme' card, but testing has been done with a variety of other cards. In general, all cards in the range should be supported.
  • Before a DeckLink card can be used, you must install the 'Desktop Video' software package, version 10.4 or above (the implementation has been tested with 10.4 and 10.5.2). This is described in the DeckLink documentation. The BGE links dynamically with the DeckLink driver at runtime; no compile-time linking is needed.
  • The video frames captured by the card are usually not straight RGBA images: they have instead a 'compact' pixel format that reduces the bit rate but complicates the decoding. No decoding is done in VideoTexture because that would be too slow. The video frames are sent unchanged to the GPU and the decoding must be done in a shader. The shader code for 3 popular pixel formats is listed in the VideoTexture documentation: '2vuy' (=8BitYUV), 'v210' (=10BitYUV) and 'R10l' (=RGB of various sizes). You will need to adapt the shaders to your personal needs (e.g. set the alpha channel based on the pixel color).
  • 3D video frames are sent to the GPU as a single image with the right eye above and the left eye below. When using this texture in stereo mode, the shader must sample the part of the image that corresponds to the eye being rendered. A new type of uniform is introduced in bge.types.BL_Shader to make this possible: setUniformEyef(name). The uniform takes the value 0.0 for the left eye, 0.5 for the right eye, and 1.0 in non-stereo mode.
  • If you run Windows and you have a recent nVidia Quadro GPU, you can benefit from the 'GPUDirect' library to speed up the video transfer: the frames are sent directly to the GPU by DMA. The installation of GPUDirect is described in File:Manual-BGE-VIdeoTexture-Decklink-API.pdf.
  • If you have an AMD card and a Catalyst driver that supports the 'pinned memory' extension, you get the same type of speed up. The extension is automatically recognized and used.

Using VideoDeckLink, it is possible to send a FullHD 3D video stream to the BGE in real time.
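As a rough sketch of how the pieces fit together (the 'HD1080p60/2vuy' mode string, the card index, and the object setup are illustrative assumptions; check the VideoDeckLink documentation for the exact format syntax on your signal), a capture script might look like this. The `stereo_v` helper shows the texture-coordinate convention implied by setUniformEyef for the top/bottom 3D image:

```python
# Sketch only: assumes a DeckLink card at index 0 receiving an HD1080p60
# HDMI signal in the '2vuy' pixel format. Names and the mode string are
# assumptions, not values taken from the branch documentation.
try:
    import bge
    from bge import texture
except ImportError:
    bge = texture = None  # allows reading/testing this sketch outside the BGE


def start_capture(controller):
    """Attach the DeckLink video stream to the owner object's first texture."""
    obj = controller.owner
    if not hasattr(bge.logic, "video"):
        tex = texture.Texture(obj, 0)
        # "mode/pixelformat" string and capture card index (assumed syntax).
        tex.source = texture.VideoDeckLink("HD1080p60/2vuy", 0)
        tex.source.play()
        bge.logic.video = tex
    # Fetch the latest captured frame on every logic tick.
    bge.logic.video.refresh(True)


def stereo_v(v, eye):
    """Map a v texture coordinate into the half of the combined 3D image
    for the current eye, following the setUniformEyef convention:
    0.0 = left eye (bottom half), 0.5 = right eye (top half)."""
    return v * 0.5 + eye
```

Remember that the frame arrives in the card's compact pixel format, so the material would also need a fragment shader (such as the '2vuy' decoder listed in the VideoTexture documentation) to convert it to RGB.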

3. Video keying with BMD DeckLink cards

The third attempt was to try using the keying feature of the DeckLink cards.

Keying is the operation of alpha-blending an image, known as the key frame, with a video stream. The advantage of keying is that the transit delay on the video is minimal, less than a frame in theory, because the video stream is processed locally on the card. It turned out, however, that the DeckLink cards exhibit a high transit delay on the HDMI port. There was also a color space bug with 3D keying that made the feature unusable for the project.

The feature still exists in the branch in the form of a bge.texture.DeckLink object that can be used to send images to the DeckLink card, with or without keying.

Notes:

  • Only the immediate mode is supported (as opposed to the scheduled mode): the key frame is rendered as quickly as possible and stays active until another key frame is sent.
  • DeckLink output is compatible with all types of VideoTexture image sources, but it is optimized for the render-to-buffer source (see next section), as it allows direct buffer transfer between the GPU and the DeckLink card.
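A minimal sketch of the output path, assuming a card at index 0 and an 'HD1080p60' display mode (the constructor parameters and attribute names should be checked against the branch API documentation; they are assumptions here):

```python
# Sketch only: sends the BGE viewport to a DeckLink card with internal
# keying enabled. The mode string 'HD1080p60' and the exact constructor
# signature are assumptions to verify against the branch documentation.
try:
    import bge
    from bge import texture
except ImportError:
    bge = texture = None  # allows reading this sketch outside the BGE


def setup_decklink_output():
    """Create a DeckLink output that keys the BGE viewport over the
    card's incoming video stream."""
    dl = texture.DeckLink(cardIdx=0, format="HD1080p60")
    dl.source = texture.ImageViewport()  # any VideoTexture source works
    dl.keying = True                     # alpha-blend over the video input
    dl.level = 255                       # full key opacity
    return dl


def send_frame(dl):
    # Immediate mode: refresh the source and push the frame to the card;
    # it stays active until the next frame is sent.
    dl.refresh(True)
```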

4. Generic render to buffer

After failing with the DeckLink cards, it was decided to implement a generic method for sending custom BGE renders to external devices as fast as possible without the need to implement the device API in Blender. The solution comprises the following elements:

  • The ImageViewport and ImageRender refresh() method now accepts a generic argument: any Python object that implements the buffer protocol. The render is sent directly to the buffer without an intermediate copy in VideoTexture. Typical use is as follows: using the API of an external video device, write a Python wrapper that exposes the frame buffer of the device via the buffer protocol and pass it to ImageRender.refresh(). On return, the buffer contains the BGE render and can be sent to the device.
  • The native format of the render is RGBA and bottom-up. If the video device requires a BGRA and/or top-down image, two options are provided for that:
    • optional format argument to refresh(): "RGBA" or "BGRA". This triggers the pixel conversion in OpenGL, which is faster than on the host.
    • a negative Y scale on the camera is now supported: it flips the render vertically with no other effect (previously, this caused a nasty back/front face swap).
  • Offscreen render is possible via the new RASOffScreen object. Create it with bge.render.offScreenCreate(width, height) to match exactly the frame size of the video device, and pass it to the ImageRender constructor: the render goes to an FBO instead of the viewport. Multisample FBOs are supported via RenderBuffers, which are more widely supported than multisample textures.
  • The default BGE viewport render can be disabled with bge.logic.setRender(False): the game logic is still executed but the render step is skipped, which eliminates VSync and allows the BGE loop to be synchronized with another source (e.g. the external video device).
  • Deferred render is possible with the new ImageRender render() method: all the render instructions are sent to the GPU except the final pixel read. The method returns immediately while the render progresses on the GPU. The pixels can be read later with refresh(), even on the next frame. Typical use is to create two ImageRenders on the same camera but with different FBOs, and to use them in a ping-pong cycle: reading frame n-1 while frame n is being rendered.

Altogether, these methods make it possible to synchronize the BGE with an external video device and to send BGE scenes to it as efficiently as possible.