User:Ankitm/GSoC 2020/Proposal IO Perf

= Improving IO Performance For Big Files =

Name
Ankit Meel

Contact
@ankitm on devtalk, d.b.o & chat

Synopsis
Among the available 3D formats, some are simple in theory yet effective for many different use cases and supported by a multitude of software in the industry. The challenge they pose is the sheer number of elements to iterate over. A Stanford PLY file, for example, quickly exceeds a million vertices. STL, being a lossy format, has to be stored with extra detail, which makes the files enormous. The aim of this project is to import such models faster and to do so within memory limits.

Benefits
It will cut import time severalfold, improving the user experience. It also enables baseline machines with 4 GB of RAM (see the requirements page) to process huge models without running out of memory. For Blender, it provides a basic structure that makes it easier to implement other file formats in the future, instead of addons being written from scratch in Python again.

Deliverables
 * Working importers and exporters for OBJ, PLY & STL (for ASCII formats).
 * A cross-platform way to assess the performance of both the C++ and the Python code.
 * Documentation of the performance at various iterations in the logs on the wiki.
 * Since there is no change on the user interface side, no additional user documentation is needed. However, sufficient external documentation and internal code comments are expected.
 * If possible, and if decided after discussion with the UI team, a progress bar or an entry in the logger in Blender.

As an extended goal, if time permits:
 * PLY and STL both in Binary format.

Project Details
Please find examples of all three file types in the appendix.

I applied the subdivision modifier with Catmull-Clark nine times (6+3) to the default cube, with factory settings, and exported it to PLY, STL-Binary and STL-ASCII. Here are the stats. Some graphs are also available in the doc on Google Drive.

The biggest time penalty in the process comes from the loops, which become very time-consuming when there are 6,291,456 vertices and 1,572,864 faces. I used py-spy for profiling.
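The face count quoted above can be checked with a little arithmetic: each Catmull-Clark level splits every quad into four, so the 6 faces of the default cube become 6 × 4^9 after nine levels. The sketch below reproduces that figure. The observation that 6,291,456 equals four corners per quad, and the binary STL size estimate (which assumes every quad is exported as two triangles), are illustrative assumptions and not measured stats; the function names are made up for this example.

```python
# Back-of-the-envelope check of the element counts for the test model.
# Assumption: every Catmull-Clark level splits each quad into four quads.

CUBE_FACES = 6
LEVELS = 9  # 6 + 3 subdivision levels


def subdivided_quads(base_faces: int, levels: int) -> int:
    """Number of quads after `levels` rounds of Catmull-Clark subdivision."""
    return base_faces * 4 ** levels


def binary_stl_size(triangles: int) -> int:
    """Size in bytes of a binary STL file: 80-byte header, 4-byte triangle
    count, then 50 bytes per triangle (12 floats plus a 2-byte attribute)."""
    return 80 + 4 + 50 * triangles


quads = subdivided_quads(CUBE_FACES, LEVELS)
corners = 4 * quads    # one entry per face corner
triangles = 2 * quads  # assumption: each quad is exported as two triangles

print(quads)                       # 1572864
print(corners)                     # 6291456
print(binary_stl_size(triangles))  # 157286484
```

Any loop that does per-element work in the interpreter therefore runs millions of times for this one model, which is consistent with loops dominating the profile.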

Following the precedent of multiple scientific libraries that are written in C++ and use Cython to link a Python wrapper, writing all the IO operations in C++ is feasible. The  already contains the Alembic, AVI, Collada, and USD files, so the newer ones will also be put there. Also,  will keep the operators' linkage and handle the per-file-format preferences that are shown in the file browser.

I am reading the current approaches to iterating over meshes, textures, colors, etc., in the files mentioned above, so I expect to keep things uniform and thus maintainable. The endian property in binary files would be handled similar to that in. In week 7, during refactoring, the Python addon is to be removed, keeping everything in one language and thus easing debugging and further improvements.
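As a generic illustration of handling the endian property (not the approach of the Blender code referred to above), a binary PLY header declares its byte order on a `format` line, and that choice can be applied once and reused for every element that is decoded. The sketch is in Python for brevity, and `ply_byte_order` and `unpack_vertex` are hypothetical helper names.

```python
import struct

# PLY headers declare the encoding on a line such as
# "format binary_little_endian 1.0"; map it to a struct byte-order prefix.
BYTE_ORDER = {
    "binary_little_endian": "<",
    "binary_big_endian": ">",
}


def ply_byte_order(header_lines):
    """Return the struct prefix for a binary PLY header, or None for ASCII."""
    for line in header_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "format":
            if parts[1] == "ascii":
                return None
            return BYTE_ORDER[parts[1]]
    raise ValueError("PLY header has no 'format' line")


def unpack_vertex(buffer, offset, byte_order):
    """Decode three consecutive 32-bit floats (x, y, z) from `buffer`."""
    return struct.unpack_from(byte_order + "3f", buffer, offset)


header = ["ply", "format binary_big_endian 1.0", "element vertex 1"]
order = ply_byte_order(header)
payload = struct.pack(">3f", 1.0, 2.0, 3.0)
print(unpack_vertex(payload, 0, order))  # (1.0, 2.0, 3.0)
```

Deciding the byte order once, outside the per-vertex loop, keeps the hot path free of branching on the file's endianness.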

Since Valgrind won’t work on macOS 10.14, I will use Instruments.app. If necessary, a high-performance C++ profiler would be used for finer details.
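Alongside platform-specific profilers, the cross-platform assessment listed in the deliverables could start from a small timing harness like the sketch below. It uses only the Python standard library; `measure` and `build_vertices` are hypothetical names, and the workload is a stand-in for a real importer call. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used inside C++ code would need a different tool.

```python
import time
import tracemalloc


def measure(func, *args, repeats=3):
    """Time `func(*args)` and record its peak Python-level memory use.

    Returns (best wall-clock seconds, peak traced bytes, last result).
    `repeats` is assumed to be at least 1.
    """
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        timings.append(time.perf_counter() - start)

    # Separate traced run: tracing slows execution, so keep it out of timings.
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return min(timings), peak, result


def build_vertices(count):
    """Stand-in workload; a real run would call an importer instead."""
    return [(float(i), 0.0, 0.0) for i in range(count)]


seconds, peak_bytes, verts = measure(build_vertices, 100_000)
print(f"{seconds:.4f} s, peak {peak_bytes / 1e6:.1f} MB, {len(verts)} vertices")
```

Taking the best of several runs reduces noise from disk caches and background processes, which matters when comparing two importers on the same file.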

Optimization plans:

 * Reading the file in chunks using streams, instead of all at once, and looping over all the lines only once.
 * Minimising copying of variables and passing them around by pointer.
 * Using knowledge of the format to read the data, instead of reading it once and doing conversions later.
 * Minimising flush operations from the stream to the disk.
 * Keeping lower-level file reading operations in a separate layer for easy experimentation.
 * Using a minimal, bare-bones data structure to store one vertex, face, or other property, so that the per-element overhead doesn't add up to a much bigger number later.
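A few of the points above (a single pass over the lines, converting values as they are read, and compact storage) can be sketched as follows. This is written in Python for brevity, while the proposal targets C++; `read_ascii_stl_vertices` is a hypothetical name and the sample text is made up for the example.

```python
import io
from array import array


def read_ascii_stl_vertices(stream):
    """Single pass over an ASCII STL text stream.

    Returns a flat array of C floats (x0, y0, z0, x1, ...), which avoids
    keeping one Python object per coordinate.
    """
    coords = array("f")
    for line in stream:  # file objects yield lines lazily from a buffer
        parts = line.split()
        if len(parts) == 4 and parts[0] == "vertex":
            # The format tells us these three tokens are floats, so convert
            # them right away instead of storing strings for a later pass.
            coords.extend((float(parts[1]), float(parts[2]), float(parts[3])))
    return coords


sample = io.StringIO(
    "solid plane\n"
    " facet normal 0 0 1\n"
    "  outer loop\n"
    "   vertex 0 0 0\n"
    "   vertex 1 0 0\n"
    "   vertex 1 1 0\n"
    "  endloop\n"
    " endfacet\n"
    "endsolid plane\n"
)
vertices = read_ascii_stl_vertices(sample)
print(len(vertices) // 3)  # 3
```

Because the function accepts any iterable of lines, the low-level reading strategy (plain file, larger buffer, memory map) can be swapped without touching the parsing loop, which is the point of keeping file reading in its own layer.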

As for memory mapping, it isn’t a magic pill that improves performance in all cases. Many modern SSDs and networks provide read speeds high enough that reading is no longer the bottleneck in the import process. Whether to use it has to be decided only after actual profiling, rather than by simply applying a memory map to the problem and making the bare minimum task more complex. If the bottleneck turns out to be mesh processing, not the disk, I will look into distributing the file/line reading process across multiple cores.
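To make that profiling decision concrete, the two read paths can be written side by side and timed on the same file. The sketch below, in Python for brevity, reads binary STL triangles once with buffered reads and once through a read-only memory map. It assumes the conventional little-endian binary STL layout (80-byte header, 32-bit triangle count, 50 bytes per triangle); all function names are hypothetical and the sample file holds dummy data.

```python
import mmap
import os
import struct
import tempfile

HEADER_SIZE = 80                   # binary STL: 80-byte header, then uint32 count
TRIANGLE = struct.Struct("<12fH")  # normal + 3 vertices + attribute = 50 bytes


def triangles_buffered(path):
    """Read triangles with ordinary buffered file reads."""
    with open(path, "rb") as f:
        f.seek(HEADER_SIZE)
        (count,) = struct.unpack("<I", f.read(4))
        return [TRIANGLE.unpack(f.read(TRIANGLE.size)) for _ in range(count)]


def triangles_mmap(path):
    """Read the same triangles through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            (count,) = struct.unpack_from("<I", m, HEADER_SIZE)
            first = HEADER_SIZE + 4
            return [
                TRIANGLE.unpack_from(m, first + i * TRIANGLE.size)
                for i in range(count)
            ]


def write_sample(path, count):
    """Write a tiny binary STL file with `count` dummy triangles."""
    with open(path, "wb") as f:
        f.write(b"\0" * HEADER_SIZE)
        f.write(struct.pack("<I", count))
        for i in range(count):
            x = float(i)
            f.write(TRIANGLE.pack(0.0, 0.0, 1.0,      # normal
                                  x, 0.0, 0.0,        # vertex 1
                                  x + 1.0, 0.0, 0.0,  # vertex 2
                                  x, 1.0, 0.0,        # vertex 3
                                  0))                 # attribute byte count


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.stl")
    write_sample(sample, 1000)
    same = triangles_buffered(sample) == triangles_mmap(sample)
    print(same)  # True
```

Both functions return identical data, so any difference a profiler reports between them on a large file is attributable to the read strategy alone.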

Project Schedule
The best time for me to work is right now, and I am using it to read the existing code for modifiers, iteration over meshes, and the previous attempt. The college is closed and will likely remain so for at least 4-5 weeks. If it reopens, that will overlap with the community bonding period, which I’ve already done (-: so it will not interfere with the rest of the timeline. The order of tasks, week by week, is expected to remain as:

I will further improve any remaining tasks after GSoC is over. After that, I intend to stay to help with bug triaging and fixing, and to keep learning new things in Blender.

Bio
I am Ankit Meel, a pre-final year undergraduate student in Electrical Engineering at the Indian Institute of Technology, Kanpur. You can see some of my hobby projects in photography and illustration on my Bēhance profile. It hasn’t been updated lately; I hope the next update will be a Blender product.

I was introduced to C and C++ in my second semester, three years ago, and have been using them since. Using Python, I’ve completed multiple assignments in machine learning and signal processing; I also made a facial expression classifier for images as a summer project. Other than that, I have done front-end development, server setup in NodeJS, socket programming using Python, and numerical analysis methods in Octave.

I’ve also gotten some exposure to Objective-C while doing a paper cut for alias redirection on macOS,. In, I made an attempt for icon theming support. I have been active in Blender for over eight months now, am a member of the moderators group, and coordinate with the bug triaging team and almost all other developers while triaging reports.

Interesting reads

 * https://link.springer.com/article/10.1007/s41095-015-0021-5
 * https://accademia.stanford.edu/mich/
 * http://graphics.stanford.edu/data/3Dscanrep/
 * https://web.archive.org/web/20161204152348/http://www.dcs.ed.ac.uk/teaching/cs4/www/graphics/Web/ply.html

Appendix

 * PLY ASCII : Plane.
 * STL ASCII : Plane (a, b, c, d in brackets added)
 * STL Binary : Plane

https://en.wikipedia.org/wiki/STL_(file_format)#Binary_STL