
= Functions System =

This document is the next step after the initial planning document. Since then, I've been refining the architecture. The current implementation of the aspects described below is in the `functions` branch.

== Goals ==
The function system is the first step towards the vague goal named "Everything Nodes". Its main purpose is to provide a way to combine different functions at runtime. Every function should live in a black box as much as possible and only communicate with its environment in predefined ways. Other than that, it should be free of side effects.

Not everything Blender can do fits into this concept. However, a lot of functionality does. For example, of the hundreds of nodes in Animation Nodes, at least 90% can work within this very constrained environment. Due to these constraints, it becomes very easy to use the generated functions in contexts that have the same or fewer constraints. Some examples of such contexts are modifiers, constraints, drivers and compositor nodes.

== Core Architecture ==
The relevant code for this section can be found in `source/blender/functions/core/`. The two most important classes are `FN::Type` and `FN::Function`. Both have a similar structure in that they serve as containers for backend-specific data.

A type object is heap allocated and reference counted. Each type object represents a type like "Float", "Vector" or "Geometry". To compare types, it is enough to compare their pointers. Each type has a name, but this name does not serve as an identifier. Besides that, a type can store type extensions that are implemented by different backends. Those can be added dynamically.

A function object is heap allocated and reference counted as well. It has a name that can't be used as an identifier. It has an immutable signature that is assigned on construction. This signature object contains the input and output parameter types and names of the function. Furthermore, a function can have multiple bodies for different backends.

The type extensions and function bodies are owned by the container they are in. So when e.g. a function is freed, all its bodies are freed as well.
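The ownership rules above can be illustrated with a small self-contained sketch. It is not the code from the `functions` branch: `std::shared_ptr` stands in for the reference counting, and the names `TypeExtension`, `add_extension`, `extension` and `SizeInfo` are assumptions made for this example only.

```cpp
#include <memory>
#include <string>
#include <typeindex>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins; these are NOT the real FN::Type classes.
struct TypeExtension {
  virtual ~TypeExtension() = default;
};

class Type {
 public:
  explicit Type(std::string name) : name_(std::move(name)) {}

  // The name is informational only; identity is the object's address.
  const std::string &name() const { return name_; }

  // The type takes ownership of the extension. Returns false when an
  // extension of the same class has already been added.
  template<typename T> bool add_extension(std::unique_ptr<T> ext) {
    return extensions_.emplace(std::type_index(typeid(T)), std::move(ext)).second;
  }

  // Returns nullptr when no backend has added this kind of extension.
  template<typename T> T *extension() const {
    auto it = extensions_.find(std::type_index(typeid(T)));
    return it == extensions_.end() ? nullptr : static_cast<T *>(it->second.get());
  }

 private:
  std::string name_;
  // Extensions are freed together with the type that contains them.
  std::unordered_map<std::type_index, std::unique_ptr<TypeExtension>> extensions_;
};

// Assumption: shared_ptr replaces the hand-written reference counting.
using SharedType = std::shared_ptr<Type>;

// Example extension that a backend might provide.
struct SizeInfo : TypeExtension {
  explicit SizeInfo(int bytes) : bytes(bytes) {}
  int bytes;
};
```

Two `SharedType` handles refer to the same type only if they point to the same object, even when both types are named "Float". A function with multiple bodies could follow the same container pattern.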

A third, less fundamental but still very important class is `FN::DataFlowGraph`. This can be thought of as a node tree. Every node in this graph wraps a function. Every input socket in this graph has to be connected to some output socket. Also, every link has to be between two sockets of the exact same type. This graph is heap allocated and reference counted as well. A graph can be frozen; afterwards, it is not possible to change it anymore. This way, the graph can be used in multiple different places without having to copy it.

Closely related is the `FN::FunctionGraph` class. It has a reference to a data flow graph. Furthermore, it stores a list of input and output sockets in that graph. This way it is possible to define multiple functions based on the same data flow graph.
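The relationship between the two graph classes can be sketched as follows. This is a simplified illustration, not the actual implementation: types are represented by strings, `std::shared_ptr` replaces the reference counting, and all member names are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for FN::DataFlowGraph and FN::FunctionGraph.
struct Node {
  std::string function_name;  // stands in for the wrapped function
  std::vector<std::string> input_types;
  std::vector<std::string> output_types;
};

struct Socket {
  int node;   // index of the owning node
  int index;  // position within that node's inputs or outputs
};

struct Link {
  Socket from;  // an output socket
  Socket to;    // an input socket
};

class DataFlowGraph {
 public:
  int add_node(Node node) {
    assert(!frozen_);
    nodes_.push_back(std::move(node));
    return static_cast<int>(nodes_.size()) - 1;
  }

  // Rejects links between sockets of different types.
  bool add_link(Socket from, Socket to) {
    assert(!frozen_);
    if (nodes_[from.node].output_types[from.index] != nodes_[to.node].input_types[to.index]) {
      return false;
    }
    links_.push_back({from, to});
    return true;
  }

  // After freezing, the graph must not be modified anymore.
  void freeze() { frozen_ = true; }
  bool is_frozen() const { return frozen_; }
  std::size_t link_count() const { return links_.size(); }

 private:
  std::vector<Node> nodes_;
  std::vector<Link> links_;
  bool frozen_ = false;
};

// Several function graphs can share one frozen data flow graph.
struct FunctionGraph {
  std::shared_ptr<const DataFlowGraph> graph;
  std::vector<Socket> inputs;
  std::vector<Socket> outputs;
};
```

Because a function graph only stores socket lists next to a shared pointer, defining a second function on the same graph costs no graph copy.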

== Backends ==
The classes introduced so far can't do much on their own. They mostly serve as containers for the actual implementations provided by different backends.

=== Tuple Call ===
This is the most significant backend for Blender. The term "tuple call" describes the calling convention used to call a generated function from C or C++. Since the signature of a function is not known when Blender is compiled, we can't use the same calling conventions that normal C code uses.

==== Calling Convention ====
Instead of passing in parameters one by one, all parameters are put into a tuple data structure. The output of the function is put into a new tuple. This is similar to how Python functions are called.

The main difficulty here is to implement the `Tuple` data structure. There are a couple of requirements:
 * It has to hold values of different types.
 * It has to be possible to allocate it on the stack.
 * It has to be possible to wrap it using LLVM with little overhead.
 * It has to keep track of which values are initialized.
 * It has to keep references to some type objects, so that they don't get freed.
 * When types are known statically, access should be very fast and comfortable.
 * When types are not known statically, it must still be possible to work with the tuple (insert, remove, copy, ...).

After some experimentation, I decided to split the class into two parts: `Tuple` and `TupleMeta`. The meta object stores references to the types in the tuple as well as their sizes. A tuple itself only stores a reference to the meta object and two buffers. One buffer contains the actual data; every value is stored at an offset specified by the meta object. The other buffer stores a bool for every value that indicates whether the corresponding buffer segment is initialized.
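The split between the two classes can be sketched roughly as below. This is a deliberately reduced illustration under several assumptions: it stores sizes instead of references to type objects, it only handles trivially copyable values, it allocates its buffers on the heap, and all method names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Shared description of a tuple layout: element sizes and their offsets.
class TupleMeta {
 public:
  explicit TupleMeta(std::vector<std::size_t> sizes) : sizes_(std::move(sizes)) {
    std::size_t offset = 0;
    for (std::size_t size : sizes_) {
      offsets_.push_back(offset);
      offset += size;
    }
    total_size_ = offset;
  }

  std::size_t element_count() const { return sizes_.size(); }
  std::size_t size(std::size_t index) const { return sizes_[index]; }
  std::size_t offset(std::size_t index) const { return offsets_[index]; }
  std::size_t total_size() const { return total_size_; }

 private:
  std::vector<std::size_t> sizes_;
  std::vector<std::size_t> offsets_;
  std::size_t total_size_ = 0;
};

// A tuple only holds a reference to its meta object and two buffers.
class Tuple {
 public:
  explicit Tuple(std::shared_ptr<const TupleMeta> meta)
      : meta_(std::move(meta)),
        data_(meta_->total_size()),
        initialized_(meta_->element_count(), false) {}

  template<typename T> void set(std::size_t index, const T &value) {
    static_assert(std::is_trivially_copyable<T>::value, "sketch handles trivial types only");
    assert(sizeof(T) == meta_->size(index));
    // memcpy avoids alignment problems in the packed data buffer.
    std::memcpy(data_.data() + meta_->offset(index), &value, sizeof(T));
    initialized_[index] = true;
  }

  template<typename T> T get(std::size_t index) const {
    static_assert(std::is_trivially_copyable<T>::value, "sketch handles trivial types only");
    assert(initialized_[index]);
    assert(sizeof(T) == meta_->size(index));
    T value;
    std::memcpy(&value, data_.data() + meta_->offset(index), sizeof(T));
    return value;
  }

  bool is_initialized(std::size_t index) const { return initialized_[index]; }

 private:
  std::shared_ptr<const TupleMeta> meta_;
  std::vector<unsigned char> data_;  // all values, packed at the meta offsets
  std::vector<bool> initialized_;    // one flag per value
};
```

Since the layout lives in the meta object, many tuples with the same signature can share one `TupleMeta` and only pay for their two buffers.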

==== Type Extension ====
To make this work, we also need some runtime type information. This information is implemented by a type extension currently called `CPPTypeInfo`. An instance of this class implements a couple of functions, for example `size_of_type`, `construct_default(void *)`, `copy_to_uninitialized(void *src, void *dst)` and `relocate_to_initialized(void *src, void *dst)`. For most types, these functions are built automatically by a templated subclass. However, with this approach, new types can also be generated at runtime if necessary.

In addition to the input and output tuple, a tuple call body also gets an `ExecutionContext` passed into it. This currently only contains some information about the current call stack. In the future, it might also contain hints for which outputs don't have to be computed to save computation time.

==== Function Graph Execution ====
Now we know how a single function that has a tuple call body is executed. To be able to combine multiple functions into one, a function graph has to be turned into a tuple call body. This is currently done using the `void fgraph_add_TupleCallBody(SharedFunction &fn, FunctionGraph &fgraph)` function.

I have not yet found an execution mechanism that meets all my requirements. Those are:
 * Low setup cost (especially important when files with large function graphs have to be loaded).
 * Low performance overhead during execution.
 * Low memory overhead for intermediate results.
 * Only execute the sub-functions (wrapped by individual nodes) that have to be executed.
 * Only compute the outputs of functions, that are required.
 * Support lazy execution of certain inputs.
 * No deep recursion. This is especially important when the data flow graph is a long chain of nodes.
 * Avoid copies of data as much as possible.
 * Support for maintaining call stack information.

For testing and learning purposes, I have implemented three different mechanisms for now. They are correct in the sense that they calculate the right result, but none of them meets all the requirements above, so another one is necessary.
 * 1) Recursive input computation: This implementation computes the output sockets one by one, without reusing any previously computed values. This has very low setup cost and low memory overhead. However, the same sub-function might be computed many times, resulting in high performance overhead. Also currently this implementation is recursive.
 * 2) Byte code interpreter: This implementation has higher setup cost, because the byte code has to be generated. Currently the byte code generation can have high recursion depth. During execution, no recursion is used. The current implementation does more copies than necessary.
 * 3) Lazy Evaluation: I started experimenting with a function body type called `LazyInTupleCallBody`. This is similar to the normal tuple call body, but also supports deferred computation of some socket inputs (depending on other inputs). This function graph evaluation implementation supports using such function bodies. Unfortunately, it is not very good in other aspects.
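The trade-off of the first mechanism can be made concrete with a toy model. The sketch below is not the tuple based implementation; it assumes a miniature graph in which every node has exactly one float output, and the names `Node`, `Evaluator` and `make_example` are hypothetical.

```cpp
#include <functional>
#include <vector>

// Hypothetical miniature graph: every node has exactly one float output.
struct Node {
  std::function<float(const std::vector<float> &)> function;
  std::vector<int> inputs;  // indices of the nodes that feed this node
};

struct Evaluator {
  std::vector<Node> nodes;
  int evaluations = 0;  // counts how often a node function was called

  // Recursive input computation: recompute everything the requested
  // output depends on, without caching intermediate results.
  float compute(int node_index) {
    const Node &node = nodes[node_index];
    std::vector<float> arguments;
    for (int input : node.inputs) {
      arguments.push_back(compute(input));
    }
    evaluations++;
    return node.function(arguments);
  }
};

// Builds the chain: value -> square -> add(square, square).
Evaluator make_example() {
  Evaluator evaluator;
  evaluator.nodes.push_back({[](const std::vector<float> &) { return 2.0f; }, {}});
  evaluator.nodes.push_back({[](const std::vector<float> &a) { return a[0] * a[0]; }, {0}});
  evaluator.nodes.push_back({[](const std::vector<float> &a) { return a[0] + a[1]; }, {1, 1}});
  return evaluator;
}
```

Calling `compute(2)` on the example needs no setup and stores no intermediate results, but the three nodes cause five function calls, because the square node and everything upstream of it is evaluated once per use. The recursion depth also grows with the length of the node chain.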

=== LLVM ===
The previously described `TupleCallBody` is great for many kinds of functions whose execution time is large compared to the overhead. However, some functions only add individual numbers or do other small operations. In those cases, the overhead of using tuples is large. A better approach is to compile such functions at run time. Using LLVM to do optimizations and conversions to machine code is the logical choice, because it is production proven and is already integrated in Blender's build system.

The LLVM function backend provides another body type currently called `LLVMBuildIRBody`. Instances of it implement a single function that generates the corresponding LLVM IR (intermediate representation).

The backend also provides another type extension called `LLVMTypeInfo`. An instance of it can generate code for certain operations like relocation, copy, ...

Building the LLVM IR for a whole function graph is relatively straightforward. The overhead of the IR generation is not extremely important, because most time is spent during the optimization/compilation process anyway. So for these kinds of functions, the setup cost is high, but the final execution time will be low.

=== Conversions ===
Functions that only have an `LLVMBuildIRBody` cannot be executed directly from C or C++ code. Also, functions that are implemented as `TupleCallBody` cannot simply be integrated with the IR generation. Therefore, it is necessary to define conversion functions between the different body types. Currently, there are multiple such conversions. The `derive_TupleCallBody_from_LLVMBuildIRBody` function is especially important, because the actual compilation happens in it.

== Frontends ==
So far, only backends have been presented. To allow users to specify their own functions, frontends have to be created. A frontend converts some user representation of a function into an actual `Function` instance that can be used elsewhere. Currently, only a single frontend exists, but many more are possible.

=== Data Flow Nodes ===
This frontend allows the user to use nodes to create new functions. The frontend consists of two parts:
 * User interface: This part describes what nodes look like and how the user interacts with them. Currently, this part is implemented in Python.
 * Function generation: The heart of this part is the conversion from the `bNodeTree` instance to a `FN::FunctionGraph` instance.

==== User Interface ====
The relevant code is currently placed in `release/scripts/startup/function_nodes` but can easily be moved somewhere else. The main architectural difficulty is that nodes should not be static (in contrast to the shader/compositing nodes that already exist). Instead, it should be easy to change the number and types of sockets. Furthermore, nodes change not only when settings are modified, but also when certain links are made.

Sometimes new sockets are created when a link is made (similar to how group input and output nodes work). At other times, only some socket types change based on the connected sockets. This is commonly called type inference.

When developing Animation Nodes, I learned that systems which enforce that the node tree stays in a valid state greatly improve usability. That also includes the removal of links between types that are not the same and can't be converted implicitly. Allowing complex behavior like this is difficult when every node manages its own sockets. A better approach is to let the node declare what inputs and outputs it wants to have. The framework is then responsible for creating and changing the actual sockets.

This declarative approach covers several cases: a simple node that only has a static set of sockets, a more complex node that changes according to the connected types, and a simple function input node that can have a variable number of outputs.

Within the frontend (but not in the core), data types are identified by their name. More importantly, they are not identified by the `bl_idname` of the sockets. Different data types can use the same underlying socket type. Currently, the available data types are defined in `release/scripts/startup/function_nodes/types.py`.

==== Function Generation ====
The function generation is complicated by the fact that the user-generated node tree does not match the data flow graph exactly. That is because a single node in the UI is allowed to expand to multiple nodes in the backend. This helps because the implementation of the same node might be very different depending on some setting. For the user, however, these different implementations belong to the same node.

Another aspect is implicit type conversion. For example, users are allowed to connect integer and float sockets. However, in the data flow graph, only links between matching types are allowed (to simplify further processing). For that reason, a link might expand into a node that does the conversion.

Yet another aspect is unlinked input sockets in the user interface. Those are not allowed in the data flow graph, for the same reason that implicit conversions are forbidden. So, every unlinked input must be converted into a node that outputs the value of the socket.

All three aspects are handled similarly. There is a `GraphInserter` that contains information on how to insert nodes, links and sockets. Individual node/socket/link types have to register an inserter function.

===== Node Inserters =====
A node inserter is identified by the `bl_idname` of the node it inserts. The inserter is a function that currently takes three arguments.
 * A `Builder` that simplifies building a `DataFlowGraph` and also contains a mapping between the original sockets and their corresponding sockets in the generated graph.
 * A `BuilderContext` that e.g. knows the node tree that is currently used.
 * The actual `bNode` instance.

The inserter has two tasks:
 * Insert one or more nodes using the `builder` and link them appropriately.
 * Let the builder know which newly generated sockets correspond to which original sockets.
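The registration and lookup of node inserters can be sketched as below. All types here are simplified, hypothetical stand-ins: `UINode` replaces `bNode`, the `Builder` only records function names, and the `example_ClampNode` identifier and the `insert_clamp_node` function are invented for this illustration.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real bNode, Builder and BuilderContext look different.
struct UINode {
  std::string idname;
  std::string name;
};

struct BuilderContext {
  std::string tree_name;
};

struct Builder {
  std::vector<std::string> inserted_functions;
  // original socket identifier -> index of the generated node that owns it
  std::map<std::string, int> socket_map;

  int insert_function(const std::string &function_name) {
    inserted_functions.push_back(function_name);
    return static_cast<int>(inserted_functions.size()) - 1;
  }

  void map_socket(const std::string &original_socket, int generated_node) {
    socket_map[original_socket] = generated_node;
  }
};

using NodeInserter = std::function<void(Builder &, const BuilderContext &, const UINode &)>;

class GraphInserters {
 public:
  void register_node_inserter(const std::string &idname, NodeInserter inserter) {
    node_inserters_[idname] = std::move(inserter);
  }

  // Returns false when no inserter is registered for the node's idname.
  bool insert_node(Builder &builder, const BuilderContext &context, const UINode &node) const {
    auto it = node_inserters_.find(node.idname);
    if (it == node_inserters_.end()) {
      return false;
    }
    it->second(builder, context, node);
    return true;
  }

 private:
  std::map<std::string, NodeInserter> node_inserters_;
};

// Example inserter: one UI node expands to two backend functions.
void insert_clamp_node(Builder &builder, const BuilderContext &, const UINode &node) {
  int max_node = builder.insert_function("maximum");
  int min_node = builder.insert_function("minimum");
  builder.map_socket(node.name + ".Value", max_node);
  builder.map_socket(node.name + ".Result", min_node);
}
```

The example inserter performs both tasks from the list above: it inserts two backend nodes for a single UI node, and it tells the builder which generated node each original socket belongs to.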

In some cases, the internal function has exactly the same signature as the node in the UI. In this case, a utility can be used instead of writing the inserter by hand.

===== Socket Inserters =====
A socket inserter is identified by a data type name. I'm still not absolutely sure what the socket inserter should do, because it depends on how the system will be integrated with the depsgraph. Currently, a socket inserter just loads a value from the socket and stores it in a tuple.

This part of the frontend is currently specialised for the tuple call backend. It should be easy to replace it with something else in the future.

===== Link Inserters =====
Custom link inserters only have to be used for implicit conversions.

== Getting Started ==
Everything described above is very much a work in progress and will probably change soon. However, I'd still like to invite other developers to try to work with the framework. Below are guides that describe roughly what needs to be done to implement certain things.

=== Add New Node ===
Three separate things have to be implemented for a new node:
 * 1) The user interface (`release/scripts/startup/function_nodes/nodes`).
 * 2) The actual function in the backend (`source/blender/functions/functions`).
 * 3) The conversion from the UI node to the actual function(s) (`source/blender/functions/frontends/data_flow_nodes/inserters/nodes`).

Example implementations for all three parts can be found in the directories listed above.

=== Add New Type ===
A new type has to be added in three conceptual places as well:
 * 1) The user interface (`release/scripts/startup/function_nodes/types.py`).
 * 2) The type backend (`source/blender/functions/types`).
 * 3) The conversion (`source/blender/functions/frontends/data_flow_nodes/inserters/sockets`).

=== Use a function from C ===
Currently, the C interface to the function system is defined in `FN-C.h`. The steps for using a user-generated function are as follows:
 * 1) Get the `FnFunction` object based on a node tree.
 * 2) Get the `FnTupleCallBody` from that function.
 * 3) Allocate the input and output `FnTuple` objects.
 * 4) Insert the input values into the input tuple.
 * 5) Call the function.
 * 6) Read the output values.
 * 7) Destruct the allocated tuples.
 * 8) Free the function.

For the time being, only functions that have exactly the signature you need can be used. In the future, an adapter could change the function slightly, so that functions with different signatures can be used uniformly.

A function can be generated from a node tree by asking for a specific signature. The types that describe this signature are borrowed, because then they don't have to be freed again. For these fundamental types, this is possible because another reference to them is kept elsewhere at all times. If the node tree does not have the signature you asked for, NULL will be returned.

The tuples can be stack or heap allocated. Note that stack allocated tuples still have to be destructed.

== Usage ==
There are a few places in Blender that can use functions currently (in the `functions` branch). The ones that are easiest to test are the "Function Points" and "Function Deform" modifiers. Those will probably be removed later, but they make testing simple node trees easy.

Additionally, a function can be used in the Displace modifier instead of a vertex group. Lastly, there is a new driver variable type, that can evaluate a function (but I haven't tested it in a while, might not work anymore).

In the node editor, use the Ctrl+A shortcut to open a search menu that contains all nodes.

== Next Steps ==

 * Improved problems/errors panel that can show warnings generated by functions.
 * Function adapter that allows using functions with slightly different signatures than expected.
 * Easier creation of new functions from the place where the function will be used.
 * Extend the signature with usage hints. These could e.g. tell the caller that a certain output should be passed as an input the next time.
 * Add more types and nodes (especially find a good way to define a Geometry/Mesh type).