From BlenderWiki

Jump to: navigation, search

Optimizing Startup times with blender

Since blender 2.5, many scripts are loaded at startup, which is one of the reasons blender 2.5x is noticeably slower then 2.4x which cached menu headers from python scripts.

Startup Times Compared (2.56a-svn vs 2.49b)

Release builds, stripped binary.

  • 2.5x cold start: 8.548sec
  • 2.5x cached start:0.413sec
  • 2.49b cold start: 3.642sec
  • 2.49b cached start: 0.083sec

So who to blame? - lets check 2.56 without python

  • 2.5x cold start with no python compiled: 3.449 sec
  • 2.5x cached start with no python compiled: 0.112 sec

tested with...

time blender -b

Module Loading

To better understand how to optimize this I tried profiling blenders startup by profiling BPy_init_modules(), which loads all python scripts.

kcachegrind on a debug build of blender didnt show any 1 function which should be optimized for speedups, mostly time was divided between many functions, %5-10 each, PyEval_EvalFrameEx/PyEval_EvalCodeEx took the most, overall %24.

The ui/ dir contains 43 modules, total 649kb in size.

So I looked into speeding up startup by manipulating the /ui/ directory. Here are my results, all while files were in disk cache (second run):

  • 0.090 sec, ui/ dir unchanged.
  • 0.077 sec, ui/ combine files into a single file.
  • 0.064 sec, ui/ function body removed.

Some other tests...

  • 0.089 sec, ui/ dir unchanged, but no register calls.
  • 0.089 sec, ui/ dir unchanged, env variable set: PYTHONOPTIMIZE=1

I was considering automatically generating stub classes which would swap in the real modules when called however from these tests it is clear that there are only minor speed gains to be had from manipulating modules.

Python Startup

So I tested the time it took to initialize python (Py_Initialize(); PyEval_InitThreads();)

  • 0.0091 without site-packages
  • 0.042 with site-packages

I've noticed site-packages imports sysconfig which intern parses a 1200 line header file with regex, It would be good if python could eventually build this into python rather then parsing files on load.

nevertheless the times here are all quite small. SO. I ran some tests without disk cache.

Cold Startup Times (loading ui/)

Using this command before startup.

sync ; echo 3 > /proc/sys/vm/drop_caches

With pre-compiled bytecode (default)

  • 1.852 sec, ui/ dir unchanged.
  • 0.818 sec, ui/ combine files into a single file.
  • 0.535 sec, ui/ function body removed.

Without pre-compiled bytecode (removed *.pyc)

  • 1.868 sec, ui/ dir unchanged.
  • 0.669 sec, ui/ combine files into a single file.
  • 0.469 sec, ui/ function body removed.

Without pre-compiled bytecode and PYTHONDONTWRITEBYTECODE=1

  • 1.727 sec, ui/ dir unchanged.
  • 0.628 sec, ui/ combine files into a single file.
  • 0.536 sec, ui/ function body removed.


Interestingly its faster to do cold startups with no precompiled bytecode, assume checking both files is slower then it is to generate new butecode

Python (Py_Initialize(); PyEval_InitThreads();)

  • 0.904 without site-packages
  • 1.030 with site-packages

Other tests

Cold Startup with Naive Prefetch

#!/bin/sh
# clear cache
sync ; echo 3 > /proc/sys/vm/drop_caches

# prefetch modules
cat `find  release/scripts/ui -name "*.py*"` > /dev/null &
cat `find  release/scripts/modules/bpy -name "*.py*"` > /dev/null &

# start blender
./blender -b

The time I got for this was 1.4 second compared to 1.852 sec.

However there is probably some better way to prefetch scripts I don't know about :)

Python 3.2 vs 3.1

Some stats (best of 3), standard r35253.

sync ; echo 3 > /proc/sys/vm/drop_caches ; time ./bin/blender -b

times:

Py3.2 cold start 7.744
Py3.1 cold start 7.065
Py3.2 warm start 0.287
Py3.1 warm start 0.301

calls to open: Py3.1, 2347 calls to open(), 1932 of them fail. Py3.2, 4756 calls to open(), 4331 of them fail.

sys.path search overhead

I did some more tests by prepending "/usr/bin" 'sys.path' to see if adding more directories makes it slower, it did but only slightly for warm and cold startups. each time I added dir like /usr/lib or /usr/bin (each with over 3k files in it) it was approx 5-10% slower.

I assume this starts to depend on the filesystem and os drivers, assume once the dirs cached further calls to open a file are not all that slow.

Incidentally, the order of the sys.path is important, if the blender paths are added to the end of sys.path there are 6445/6019(failed) calls to open() rather then 4756/4331.

Useing zipimport

Zipping the dir to ui.zip, this helps to avoid slow disk reads being a bottleneck on a cold start.

Cached: 0.223 sec (vs 0.090)
Cold Start: 1.033 sec (vs 1.852)

Conclusions

  • Python scripts are the main problem with blender startup times (~2.5x slower cold, ~3.5x cached)
  • Only modest speed gains can be made to optimizing our api calls.
  • Loading modules at startup is quite fast (when in disk cache), time lost is accumulated from many modules, re-arranging modules has only minor gains, any significant speed gains here are probably only to skip loading some modules on startup.
  • main slowdown comes from reading un-cached files (cold startup).
  • zipimport could be used to avoid slow cold start (~1.8x faster), but is ~2.5x slower once loaded into memory.

note, all tests done with r35132, optimized blender, python 3.2
AMD Phenom(tm) II X6 1055T Processor stepping 00
, disk model number ST3750640AS

Module load times

Useful to include here, this is from a cold start.

1.9886 sec netrender
0.1584 sec space_userpref
0.0954 sec properties_data_armature
0.0907 sec add_mesh_torus
0.0662 sec properties_game
0.0635 sec animsys_update
0.0548 sec properties_data_modifier
0.0535 sec properties_material
0.0513 sec keyingsets_builtins
0.0506 sec properties_particle
0.0485 sec properties_data_bone
0.0466 sec wm
0.0422 sec properties_texture
0.0410 sec properties_data_empty
0.0401 sec space_filebrowser
0.0397 sec properties_animviz
0.0368 sec properties_object_constraint
0.0357 sec properties_data_camera
0.0345 sec properties_data_curve
0.0337 sec properties_data_metaball
0.0319 sec space_dopesheet
0.0279 sec space_view3d
0.0266 sec properties_data_mesh
0.0225 sec object
0.0217 sec image
0.0203 sec properties_object
0.0167 sec console_python
0.0165 sec fcurve_euler_filter
0.0156 sec properties_world
0.0149 sec properties_data_lattice
0.0143 sec uvcalc_follow_active
0.0125 sec space_sequencer
0.0118 sec space_logic
0.0116 sec properties_render
0.0115 sec properties_data_lamp
0.0113 sec console_shell
0.0092 sec properties_scene
0.0086 sec properties_physics_softbody
0.0082 sec space_console
0.0081 sec space_info
0.0079 sec object_randomize_transform
0.0067 sec object_align
0.0064 sec uv
0.0059 sec space_image
0.0056 sec space_view3d_toolbar
0.0056 sec nla
0.0051 sec properties_physics_smoke
0.0032 sec properties_physics_field
0.0028 sec space_nla
0.0024 sec uvcalc_smart_project
0.0018 sec presets
0.0017 sec space_text
0.0017 sec space_graph
0.0016 sec sequencer
0.0015 sec properties_physics_fluid
0.0015 sec mesh
0.0013 sec space_node
0.0013 sec properties_physics_cloth
0.0012 sec space_time
0.0011 sec space_outliner
0.0010 sec screen_play_rendered_anim
0.0008 sec vertexpaint_dirt
0.0000 sec space_userpref_keymap
0.0000 sec properties_physics_common