Note: This is an archived version of the Blender Developer Wiki. The current and active wiki is available on wiki.blender.org.

Text Engine Reflections

This might be a nice 2.7/2.8 topic! ;)

See also User:Dfelinto#UTF8 and complex-scripting.

Rationale

Currently, Blender handles text rendering in a simplistic way. It works (mostly) perfectly for simple scripts (writing systems), like those of “European” languages. However, it fails in several respects for more complex scripts, like RTL ones (Arabic, Hebrew, Farsi, etc.) or the “horrible” Indic ones (Indian languages: Devanagari, Tamil, etc.), and probably for eastern ones as well (at least partially):

  • Layout: this is the process of finding which glyphs to actually draw, and where to draw them. It fails for RTL languages, but also for Indic ones (modifier chars are not handled properly at all!).
  • User interaction/UI dynamics/Text editing: all the interactions between unicode chars, glyphs and graphemes (see definitions below), such as where to place the cursor, which chars form a single grapheme, etc.

Definitions

Unicode (Text)
  • (unicode) char: also known as 'codepoint', it’s a basic element, encoded as a single uint32 value.
  • direction: reading direction, usually LTR or RTL, but can also be TTB (classical eastern languages) and BTT.
  • grapheme: a “user” or “visual” char, i.e. a set of chars that form a single character from the user's point of view. Complex scripts often use two or more chars for a single grapheme. Cursor navigation, text selection, etc. should be based on graphemes, and not on chars.
  • glyphs: representation of the elements that are actually drawn (i.e. elements of a font).
  • logical order: order in which chars are placed into a unicode string. This order usually matches the order in which the chars are typed, and may differ from the order in which the user expects to read them.
  • script: a set of unicode chars that share the same layout rules (i.e. the rules used to transform them into something ready for drawing). Broadly speaking, it represents a writing system.
  • style: not a unicode topic; the set of settings that define the appearance of the rendered text: font, size, kerning, decorations, effects, colors, etc.
  • unicode string: an array of chars.
  • (unicode string) item: a subset of a unicode string where all graphemes share the same style, script and direction. In other words, a piece of text that the layout engine can render 'as a whole', using the same rules and data.
  • visual order: order in which glyphs are drawn. In complex scripts, it often differs from the logical order, either globally (RTL languages) or locally (some vowel modifiers in Indic languages are drawn before the base char, while they always come after it in the unicode string).
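
To make the char/grapheme and logical/visual distinctions above a bit more concrete, here is a small Python sketch that only uses the standard `unicodedata` module. The helper name `char_info` and the sample strings are mine, purely for illustration; nothing here is Blender code.

```python
import unicodedata


def char_info(text):
    """List (code point, general category, bidi class) for each char of text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.bidirectional(ch))
        for ch in text
    ]


# One grapheme, two chars: "e" followed by COMBINING ACUTE ACCENT.
e_acute = "e\u0301"

# Devanagari "ki": LETTER KA, then VOWEL SIGN I. The vowel sign comes second
# in logical order, but a layout engine draws it to the left of KA.
ki = "\u0915\u093F"

# Hebrew "shalom", stored in logical (typing) order; every char is RTL.
shalom = "\u05E9\u05DC\u05D5\u05DD"

for sample in (e_acute, ki, shalom):
    # len() counts chars (code points), not graphemes.
    print(len(sample), char_info(sample))
```

The first two samples each contain two chars but should behave as a single grapheme for cursor movement and selection, which is exactly what a char-based engine gets wrong.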
Text engine elements
  • Breaker: a tool that "parses" the unicode string to determine line, word and/or grapheme boundaries.
  • Itemizer: a tool that cuts a given unicode string into chunks sharing the same properties (on a 'style' basis: font, size, etc., and/or on a 'unicode' basis: same direction and script). Cuts should not break graphemes!
  • Layout engine: a tool that takes a unicode string item as input, and outputs a list of glyphs with positioning info. It needs access to style data.
  • Renderer: a tool that takes a list of glyphs with positioning info as input, and actually draws them (on screen, in a bitmap, as 3D objects, whatever).

How to Improve Blender Text Engine

We do not have many options!

We can choose an integrated engine like Pango (or ICU):

Pros
  • Supposed to be easy to use (but see cons below!).
  • All elements of the engine are already written and integrated, so there is little code to write.
  • Adds a single dependency to Blender.
  • They are quite well documented!
Cons
  • Those engines are relatively heavy, as they often have features we do not need (like a renderer).
  • Those engines often have heavy dependencies as well (glib…).
  • Those engines may (will!) be tricky to compile…
  • Those engines are not actually easy to use when you need advanced control over the layout process.

Or we can build our own engine, based on existing tools, and write the missing parts ourselves:

Pros
  • High flexibility and control over the code.
  • The tools are smaller and simpler, so compilation and dependencies should be much nicer to handle.
  • Lighter solution (we only have what we need!).
Cons
  • More work! Well, in theory at least.

Current Proposed Solution

I would rather follow the second path. So we need to find or create the tools that provide the various parts of our text engine:

Breaker
I found libunibreak, which already does line and word breaking. Grapheme breaking would have to be added (GB rules in http://www.unicode.org/reports/tr29/).
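
As a rough idea of what the missing grapheme breaking involves, here is a deliberately simplified Python sketch. It is not libunibreak's API, and it is not a complete implementation of the GB rules: it only approximates GB1, GB3, GB4 and GB9/GB9a, and it guesses the "Extend" and "SpacingMark" classes from the general category (Mn/Me/Mc) instead of reading the real Grapheme_Cluster_Break property. Hangul, emoji sequences, regional indicators and Indic conjuncts are ignored. The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"


def grapheme_breaks(text):
    """Return the indices at which a new grapheme starts (simplified UAX #29)."""
    breaks = []
    for i, ch in enumerate(text):
        if i == 0:
            breaks.append(0)  # GB1: always break at the start of text.
            continue
        prev = text[i - 1]
        if prev == "\r" and ch == "\n":
            continue  # GB3: never break between CR and LF.
        if unicodedata.category(prev) in ("Cc", "Zl", "Zp"):
            breaks.append(i)  # GB4 (approximation): break after controls.
            continue
        if ch == ZWJ or unicodedata.category(ch) in ("Mn", "Me", "Mc"):
            continue  # GB9/GB9a (approximation): marks stick to their base.
        breaks.append(i)  # Otherwise, break everywhere.
    return breaks


def graphemes(text):
    """Split text into graphemes using grapheme_breaks()."""
    starts = grapheme_breaks(text)
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


print(grapheme_breaks("e\u0301a"))     # base + combining accent, then "a"
print(graphemes("\u0915\u093F\u0915"))  # Devanagari "ki" followed by "ka"
```

A real breaker would be driven by the Unicode data tables rather than by general categories, but the shape of the loop (look at the chars on each side of a candidate boundary, apply the rules in order) stays the same.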
Itemizer
Found none so far, to my surprise. It should not be that hard to code, though. Basically, it should run over the whole string and:
  • Cut:
    • on custom limits (from style data);
    • on script changes;
    • on RTL/LTR changes (libfribidi?);
  • Do not cut:
    • inside unicode graphemes (“visual chars”, cf. grapheme breaker);
  • For each grapheme, store:
    • Style data;
    • Script;
    • Direction;
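
The rules above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the script is guessed from the first word of the Unicode character name (the standard library does not expose the real Script property), direction comes from the bidi class of each char with neutrals simply inheriting from what precedes them (a real implementation would run the full bidi algorithm, e.g. through libfribidi), and style is a plain integer per char. All names (`Item`, `itemize`, ...) are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Item:
    start: int      # index of the first char of the item
    end: int        # index one past the last char of the item
    script: str
    direction: str  # "LTR" or "RTL"
    style: int


def _script(ch):
    # ASSUMPTION: first word of the char name stands in for the Script property.
    # Non-letters (spaces, digits, marks) return None and inherit a script.
    name = unicodedata.name(ch, "") if ch.isalpha() else ""
    return name.split(" ")[0] if name else None


def _direction(ch):
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "RTL"
    return "LTR" if bidi == "L" else None  # None: neutral, inherits.


def _resolve(values, default):
    """Replace None by the previous known value (or the first known one)."""
    current = next((v for v in values if v is not None), default)
    resolved = []
    for value in values:
        current = value if value is not None else current
        resolved.append(current)
    return resolved


def _starts_grapheme(text, i):
    # Simplified: combining marks and ZWJ never start a grapheme.
    mark = text[i] == "\u200D" or unicodedata.category(text[i]) in ("Mn", "Me", "Mc")
    return i == 0 or not mark


def itemize(text, styles, base_direction="LTR"):
    """Cut text into items sharing the same script, direction and style."""
    if len(styles) != len(text):
        raise ValueError("need exactly one style per char")
    scripts = _resolve([_script(ch) for ch in text], "COMMON")
    directions = _resolve([_direction(ch) for ch in text], base_direction)
    items = []
    for i in range(len(text)):
        key = (scripts[i], directions[i], styles[i])
        last = items[-1] if items else None
        same = last is not None and key == (last.script, last.direction, last.style)
        if last is not None and (same or not _starts_grapheme(text, i)):
            last.end = i + 1  # Extend the current item; never cut inside a grapheme.
        else:
            items.append(Item(i, i + 1, *key))
    return items


mixed = "abc \u05E9\u05DC\u05D5\u05DD \u0915\u093F"  # Latin, Hebrew, Devanagari
for item in itemize(mixed, [0] * len(mixed)):
    print(item)
```

Note that a style change falling in the middle of a grapheme is ignored here: the whole grapheme keeps the style of its first char, which is one possible way to honor the “do not cut inside graphemes” rule.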
Layout Engine
The obvious solution is HarfBuzz and FreeType! Not much to say here; the only work will be input/output adaptation (and perhaps compilation work :P).
Rendering
Well, it should mostly reuse current Blender code, with some adaptations...

The first three elements would be gathered into a single 'intern' lib, whose use should be optional.

Blender Internal Changes

To match that text engine, we may/will have to change some parts of Blender code. For now I foresee:

UI items, i.e. uiHandleButtonData
Each UI item should have a set of runtime data containing:
  • unicode string;
  • arrays of possible line, word and grapheme breaks;
  • global direction (LTR and RTL should be enough for now!);
  • final glyphs array with positioning info (and implied data, such as bounding box…);
  • mapping data between unicode string indices and glyph indices;
  • maybe something like a transform matrix? Still have to investigate that part of the existing code further!
  • Style: probably using the existing style stuff is enough here.
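
Here is a rough sketch of what that runtime data could look like, written in Python for brevity (the real thing would be a C struct). All type and field names are hypothetical. The string/glyph mapping is done by storing, for each glyph, the index of the first char of the grapheme it renders; HarfBuzz reports a comparable 'cluster' value for each output glyph. Glyph ids and positions in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Glyph:
    glyph_id: int   # index of the glyph in the font
    x: float        # pen position of the glyph
    y: float
    advance: float  # horizontal advance
    cluster: int    # index of the first char of the grapheme it renders


@dataclass
class TextRuntime:
    text: str
    direction: str = "LTR"  # global direction, "LTR" or "RTL"
    line_breaks: List[int] = field(default_factory=list)
    word_breaks: List[int] = field(default_factory=list)
    grapheme_breaks: List[int] = field(default_factory=list)
    glyphs: List[Glyph] = field(default_factory=list)  # in visual order

    def glyph_indices_for_char(self, char_index: int) -> List[int]:
        """String index -> indices of the glyphs drawing its grapheme."""
        starts = [g.cluster for g in self.glyphs if g.cluster <= char_index]
        if not starts:
            return []
        cluster = max(starts)
        return [i for i, g in enumerate(self.glyphs) if g.cluster == cluster]

    def char_index_for_glyph(self, glyph_index: int) -> int:
        """Glyph index -> index of the first char of its grapheme."""
        return self.glyphs[glyph_index].cluster

    def horizontal_extent(self) -> Tuple[float, float]:
        """Crude stand-in for a bounding box: min and max x covered."""
        if not self.glyphs:
            return (0.0, 0.0)
        return (
            min(g.x for g in self.glyphs),
            max(g.x + g.advance for g in self.glyphs),
        )


# Devanagari "ki": two chars, one grapheme, two glyphs drawn in visual order
# (vowel sign first, then the base consonant).
ki_runtime = TextRuntime(
    text="\u0915\u093F",
    grapheme_breaks=[0],
    glyphs=[Glyph(1, 0.0, 0.0, 4.0, 0), Glyph(2, 4.0, 0.0, 9.0, 0)],
)
print(ki_runtime.glyph_indices_for_char(1), ki_runtime.horizontal_extent())
```

With such a mapping, placing the cursor or drawing a selection only needs the grapheme breaks and the glyph positions, never the raw char indices.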
3D Text object
Here, one of the main changes would be to modify the style handling (which should be item-based, i.e. string-chunk-based, instead of char-based). The other one is obviously to adapt all the “unicode string to textcurve object” code to use the new text engine!
Optionally, extras like vertical layout handling would be nice as well… Not high priority, though.
Text Editor and Console
Did not check any of those yet. However, we could probably leave them untouched at first, as monospace rendering is yet another topic for complex scripts!

Notes

  • Afaik, the "input" part of this topic (i.e. creating unicode strings from user input) is already nicely handled in Ghost…
  • A secondary but important task would be to get a good font handling system for our UI/styling system too!