User:JacquesLucke/Documents/BasicDataStructures

= Basic Data Structures =

This document first discusses the importance of easy-to-use and efficient data structures. It then compares outsourcing data structure development with building them ourselves. Finally, it presents a few specific data structures that are ready to be used in more places in Blender.

== Motivation ==
Choosing the right data structure for every task is of utmost importance for several reasons.
 * They can automatically maintain invariants (e.g. every element has a different name). This removes whole categories of bugs.
 * Wrong data structures can greatly reduce performance.
 * They improve code readability, because the code that uses them can focus on the actual business logic.

Unfortunately, in practice the choice of data structure is often based on what is easiest to use rather than on what is best for the given task. The best example is the use of linked lists in Blender. They are used almost everywhere, but are rarely the right choice because they require many allocations, have poor cache performance and are hard to debug. In the case of `ListBase` in Blender we also lose type safety.

I believe that developers would prefer to use other, more suitable data structures, if they were as easy to use. Fortunately, C++ can make the use of data structures much more comfortable than C.

== Outsourcing versus self-made ==
There are many existing libraries that implement commonly used data structures, most notably the standard library.

Using `std::vector` is already much better than using `ListBase`. However, I don't think it should be the default everywhere. That is mainly because it (like other standard library containers) makes guarantees that we do not need in most cases and that negatively impact performance. I think we need a list data structure with small object optimization. That is, it does not call `malloc` unless the number of elements exceeds a certain threshold.

We could also evaluate other existing libraries, but I believe it makes most sense for Blender to use its own data structures. The benefits are potentially fewer external dependencies and much more control over what happens under the hood. I guess the same decision was made for the C data structures at some point, so I'm not sure this is worth discussing more thoroughly again.

== Basic Data Structures ==
During the last couple of months, I've been working on some fundamental data structures that are ready to be used in more places in Blender. All of them already have unit tests. Some of them are inspired by the LLVM code base. The code can be found in `blenlib` in the `functions` branch.

=== Containers with Small Object Optimization ===
The most fundamental data structure is `BLI::SmallVector`. On top of that, there are `BLI::SmallMap`, `BLI::SmallSet`, `BLI::SmallSetVector` and `BLI::SmallStack`.

All of them can store up to `N` elements before calling `malloc` for the first time. That property makes them the ideal data structure for most use cases.
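The small object optimization can be illustrated with a minimal sketch. This is not the `BLI::SmallVector` implementation: the class name `InlineVector`, its methods and the doubling growth policy are assumptions made for illustration, and the sketch is restricted to trivially copyable element types to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical sketch of a vector with an inline buffer for up to N elements.
// Not the BLI::SmallVector implementation.
template<typename T, std::size_t N> class InlineVector {
  static_assert(N > 0, "the inline buffer needs at least one slot");
  static_assert(std::is_trivially_copyable<T>::value,
                "this sketch only supports trivially copyable types");

  T inline_buffer_[N];        // storage that lives inside the object itself
  T *data_ = inline_buffer_;  // points to the inline buffer or to heap memory
  std::size_t size_ = 0;
  std::size_t capacity_ = N;

 public:
  InlineVector() = default;
  // Copying is left out to keep the sketch short.
  InlineVector(const InlineVector &) = delete;
  InlineVector &operator=(const InlineVector &) = delete;

  ~InlineVector()
  {
    if (!is_inline()) {
      std::free(data_);
    }
  }

  void append(const T &value)
  {
    if (size_ == capacity_) {
      grow();
    }
    data_[size_++] = value;
  }

  const T &operator[](std::size_t index) const
  {
    assert(index < size_);
    return data_[index];
  }

  std::size_t size() const { return size_; }

  // True while no heap allocation has happened yet.
  bool is_inline() const { return data_ == inline_buffer_; }

 private:
  // Called only when the current buffer is full; this is the only place that calls malloc.
  void grow()
  {
    const std::size_t new_capacity = capacity_ * 2;
    T *new_data = static_cast<T *>(std::malloc(new_capacity * sizeof(T)));
    if (new_data == nullptr) {
      std::abort();
    }
    std::memcpy(new_data, data_, size_ * sizeof(T));
    if (!is_inline()) {
      std::free(data_);
    }
    data_ = new_data;
    capacity_ = new_capacity;
  }
};

// Usage: with N = 4, the first four appends use the inline buffer,
// the fifth one triggers the first allocation.
inline bool spills_on_fifth_append()
{
  InlineVector<int, 4> values;
  for (int i = 0; i < 4; i++) {
    values.append(i);
  }
  const bool was_inline = values.is_inline();
  values.append(4);
  return was_inline && !values.is_inline();
}
```

The trade-off in this sketch is that every instance is larger, because the inline buffer is part of the object even after the data has moved to the heap.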

=== References to Containers ===
Many functions work with arrays of data but never need to, for example, extend the array. They do not care whether the data is stored in a `std::vector`, a `BLI::SmallVector`, a `std::initializer_list` or just a plain C array. Instead of doing conversions or using templates, it is best to have a special data structure that can wrap arbitrary buffers that are owned by another structure. For that purpose there are `BLI::ArrayRef`, `BLI::MappedArrayRef`, `BLI::StringRef` and `BLI::StringRefNull`.

A major benefit of using these structures is that they reduce coupling. Furthermore, they make functions more comfortable to use.
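The idea of a non-owning reference to a buffer can be sketched as follows. This is not the `BLI::ArrayRef` implementation: the name `ArrayView`, its constructors and the helper `sum_values` are hypothetical and only meant to show how one non-template function can accept data from different containers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a read-only, non-owning view on a contiguous buffer.
// Not the BLI::ArrayRef implementation.
template<typename T> class ArrayView {
  const T *data_ = nullptr;
  std::size_t size_ = 0;

 public:
  ArrayView() = default;

  // Wrap an arbitrary buffer given by pointer and length.
  ArrayView(const T *data, std::size_t size) : data_(data), size_(size) {}

  // Implicit conversion from std::vector, so callers do not have to convert manually.
  ArrayView(const std::vector<T> &vec) : data_(vec.data()), size_(vec.size()) {}

  // Implicit conversion from a plain C array of known size.
  template<std::size_t N> ArrayView(const T (&array)[N]) : data_(array), size_(N) {}

  const T *begin() const { return data_; }
  const T *end() const { return data_ + size_; }
  std::size_t size() const { return size_; }

  const T &operator[](std::size_t index) const
  {
    assert(index < size_);
    return data_[index];
  }
};

// A regular (non-template) function that does not care where the data is stored.
inline int sum_values(ArrayView<int> values)
{
  int total = 0;
  for (int value : values) {
    total += value;
  }
  return total;
}

// Usage: the same function is called with a std::vector, a C array and a sub-range.
inline int sum_from_different_sources()
{
  std::vector<int> from_vector = {1, 2, 3};
  int from_c_array[4] = {10, 20, 30, 40};
  return sum_values(from_vector) + sum_values(from_c_array) +
         sum_values(ArrayView<int>(from_c_array, 2));
}
```

Because the view does not own its data, the caller has to make sure the underlying buffer outlives the view. In this sketch the view is passed by value, since it is only a pointer and a size.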