From BlenderWiki
Description
Network renderer from inside Blender
Download
Available in the 2.5 svn branch.
Instructions
Note that all these instructions are for the beta test version and are bound to change A LOT.
- On one machine, start a Master server.
  - Start Blender and switch the render engine to Network Render using the dropdown in the Info window header (next to Scene).
  - Select Master from the render mode dropdown (Scene properties, Network Settings panel, probably at the bottom).
  - Optional: in Scene settings, specify the IP address of the interface to listen on, as well as the port. Leave at [default] if you want the server to listen on all network interfaces.
  - Press Start (it will open a blank render window). The render status line will reflect the actions of the server.
- On other machines, start render slaves.
  - Start Blender and switch the render engine to Network Render.
  - Select Slave from the render mode dropdown.
  - Optional: in Scene settings, specify the IP address of the master server, as well as the port. Leave at [default] if you want the slaves to automatically detect the master from the broadcast.
  - Press Start (it will open a blank render window). The render status line will reflect the actions of the slave.
- To send a job to the cluster, on your workstation:
  - Open the blend file to be rendered. Confirm your render settings (size, etc.).
  - Save the file (it sends the last saved file at this point).
  - Select the Network Render engine.
  - Select Client from the render mode dropdown.
  - Press Send Job to server to dispatch the animation job.
  - Whenever you want, press Anim to gather the finished frames. Finished frames will "appear" automatically, while playback pauses on frames that are still rendering.
  - You can also hit Render on any frame of the animation and it will fetch the result from the cluster.
- In the simplest example, you can just press "Animation on network" and wait for the frames to come in. Total render time should be close to inversely proportional to the number of nodes (not counting transfer times).
- You can stop server and node instances by pressing Esc, as if cancelling a normal render.
- Full multilayer render results are used, so the final results should be exactly the same as a local render. You don't have to specify this as output in the original file; it's done on the nodes automagically.
More info for testers here: http://blenderartists.org/~theeth/temp/netrender_log.txt
Testers are invited to contact me on IRC (#blendercoders) or by email.
Notes and Known Bugs
- Hit Esc or close Blender to shut down the master or slaves.
- No shared network space required between nodes
- You can dispatch many different files; all results can be retrieved independently (save the file after the dispatch if you want to close it and retrieve the results later).
- There is very little network error management, so if you close the master first, stuff will break. Same if you enter an invalid address.
YES, I know the current workflow is far from ideal, especially from a professional render farm point of view. I expect Matt to whip me and suggest better stuff. Optimally, I'd like it if users could just press "Anim on network" and have it automatically dispatch to the network and wait for results, like a local render. All "pro" features should be optional.
Load Balancing
Primary balancing is done by calculating each job's usage of the cluster every 10 s, averaged over time. The next job dispatched is the one with the lowest usage (the one using the fewest slaves). The priority of a job acts as a divisor on its usage, so a job of priority 2 uses a share of the cluster as if it were 2 jobs and not just one (i.e. a job of priority 1 and one of priority 2 sharing slaves will use roughly 33% and 67% of the processing power, respectively). On top of that, there's a set of exceptions and first-priority rules.
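The dispatch rule and the priority divisor can be illustrated with a short Python sketch. It is not netrender's actual code: the names `weighted_usage`, `pick_next_job` and `cluster_shares` are hypothetical, and the time-averaging of usage is left out for brevity.

```python
def weighted_usage(slaves_used, total_slaves, priority):
    """Fraction of the cluster a job occupies, divided by its priority."""
    return (slaves_used / total_slaves) / priority


def pick_next_job(jobs, total_slaves):
    """Return the name of the job with the lowest priority-weighted usage.

    `jobs` maps a job name to a (slaves_used, priority) tuple.
    """
    return min(
        jobs,
        key=lambda name: weighted_usage(jobs[name][0], total_slaves, jobs[name][1]),
    )


def cluster_shares(priorities):
    """Steady-state share per job if weighted usage ends up equal for all jobs."""
    total = sum(priorities.values())
    return {name: prio / total for name, prio in priorities.items()}


# Job "b" holds more slaves, but its priority of 2 halves its weighted usage.
jobs = {"a": (3, 1), "b": (4, 2)}
next_job = pick_next_job(jobs, total_slaves=9)   # "b": (4/9)/2 < (3/9)/1
shares = cluster_shares({"a": 1, "b": 2})        # a gets 1/3, b gets 2/3
print(next_job, shares)
```

Under this reading, a higher priority lets a job keep receiving frames even while it already holds more slaves than its neighbours.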
Exceptions:
- A single job cannot use more than N% of total slaves, unless it's the only job. That prevents slow jobs from starving faster ones. This is set at 75% for now, but should be customizable.
First priorities: jobs that fit any of the following criteria are dispatched first.
- Fewer than N frames dispatched (prioritize new jobs). The goal of this is to catch errors early.
- More than N minutes since last dispatch. This prevents high-priority jobs from starving others.
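A rough sketch of how the exception and the first-priority rules could combine with the usage rule is shown here. Everything in it is an assumption made for illustration: the `Job` fields, the function names, the two N thresholds (`NEW_JOB_FRAMES`, `STALE_MINUTES`) and the order in which the rules are applied (cap first, then first priorities, then lowest usage). Only the 75% cap comes from the text.

```python
from dataclasses import dataclass

MAX_SHARE = 0.75       # cap on a single job's share of slaves (from the text)
NEW_JOB_FRAMES = 5     # hypothetical N for "fewer than N frames dispatched"
STALE_MINUTES = 10.0   # hypothetical N for "more than N minutes since last dispatch"


@dataclass
class Job:
    name: str
    priority: int
    slaves: int                    # slaves currently rendering this job
    frames_dispatched: int
    minutes_since_dispatch: float


def usage(job, total_slaves):
    """Share of the cluster in use by the job, divided by its priority."""
    return job.slaves / total_slaves / job.priority


def is_first_priority(job):
    """True for new jobs and for jobs that have waited too long."""
    return (job.frames_dispatched < NEW_JOB_FRAMES
            or job.minutes_since_dispatch > STALE_MINUTES)


def next_job(jobs, total_slaves):
    """Pick the job that should receive the next free slave, or None."""
    if len(jobs) > 1:
        # Exception: one more slave must not push a job past the cap.
        candidates = [j for j in jobs
                      if (j.slaves + 1) / total_slaves <= MAX_SHARE]
    else:
        candidates = list(jobs)    # the only job may use the whole cluster
    if not candidates:
        return None
    first = [j for j in candidates if is_first_priority(j)]
    return min(first or candidates, key=lambda j: usage(j, total_slaves))


big = Job("big", 1, 6, 100, 0.5)
small = Job("small", 1, 1, 50, 1.0)
new = Job("new", 1, 1, 0, 0.0)
print(next_job([big, small, new], 8).name)   # the new job goes first
print(next_job([big, small], 8).name)        # big is capped, so small
print(next_job([big], 8).name)               # a lone job ignores the cap
```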
To do
- Send job from memory
- Don't depend on render engine choice for visibility
- "Expert" render manager
- Better defined communication protocol
- The option to calculate simulations (cloth, smoke, ...) on a node which would then send point cache to server for dispatch to render
- Pack textures on upload
- Dispatch single frame as tiles
Technical Details
Uses HTTP (hackishly) as application protocol
Header values
- job-id: id of a job group
- job-frame: frame in a job group (can be "frame" or "frame_start:frame_end" when announcing a job)
- job-result: result of a frame render (constants exist for DONE or ERROR)
- job-chunks: number of frames to dispatch to a slave at once
- slave-blacklist: slave ids separated by whitespace. When announcing a job, don't send it to those slaves.
- slave-id: used to identify the slave when it does a request
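As an illustration of the header values listed above, here is a small Python sketch that builds and parses them. The helper names `job_headers` and `parse_job_frame` are hypothetical, and whether `job-chunks` is actually sent with a job announcement is an assumption; only the header names and the "frame" / "frame_start:frame_end" format come from the list.

```python
def job_headers(job_id, frame_start, frame_end=None, chunks=1, blacklist=()):
    """Build the header dict for a job, using the header names listed above."""
    if frame_end is None:
        frame = str(frame_start)                 # single frame: "frame"
    else:
        frame = f"{frame_start}:{frame_end}"     # range: "frame_start:frame_end"
    headers = {
        "job-id": job_id,
        "job-frame": frame,
        "job-chunks": str(chunks),               # assumption: sent on announce
    }
    if blacklist:
        # Slave ids are separated by whitespace.
        headers["slave-blacklist"] = " ".join(blacklist)
    return headers


def parse_job_frame(value):
    """Turn "12" or "1:250" into an inclusive (start, end) tuple."""
    start, _, end = value.partition(":")
    return (int(start), int(end) if end else int(start))


headers = job_headers("abc123", 1, 250, chunks=5, blacklist=["slave-a", "slave-b"])
print(headers)
print(parse_job_frame(headers["job-frame"]))
```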
HEAD
- status (job-id, job-frame) returns an HTTP error if the frame doesn't exist (if it's been cancelled). Used by slaves to check for cancellation while rendering
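The cancellation check a slave performs could look roughly like the sketch below, written with Python's standard `http.client`. The `/status` path, the function names and the rule "any status of 400 or above means cancelled" are assumptions; the text only says that an HTTP error is returned when the frame no longer exists.

```python
import http.client


def is_cancelled(status_code):
    """Read any HTTP error status as 'the frame no longer exists'."""
    return status_code >= 400


def check_cancelled(host, port, job_id, frame):
    """Send HEAD status for one frame and report whether it was cancelled."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request(
            "HEAD",
            "/status",  # assumed path for the "status" resource
            headers={"job-id": job_id, "job-frame": str(frame)},
        )
        return is_cancelled(conn.getresponse().status)
    finally:
        conn.close()


# The decision logic can be exercised without a running master.
print(is_cancelled(200), is_cancelled(404))
```

A slave would call something like `check_cancelled` periodically during a render and abort the frame when it returns True.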
GET
- version returns the version of the server (exit client if version mismatch)
- status (job-id) returns status for a particular job, or for all jobs if the id is empty or unspecified
- slave returns a list of all slaves and their status (JSON-like reply)
- job returns a new job to a node (job-id and job-frame). Body is the path to the files needed
- file returns the file corresponding to a job-id
- render (job-id, job-frame) returns the render result if DONE. http.NO_CONTENT if not done, http.EXPECTATION_FAILED if error on render
- log (job-id, job-frame, job-result) returns a log file for the specific frame. Same errors as render.
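On the client side, the status codes returned by the render request can be mapped to frame states along these lines. This is a sketch under assumptions: the state names, the function names and the idea that a finished frame comes back with a plain 200 are not stated in the text; only NO_CONTENT and EXPECTATION_FAILED are. The `fetch_status` callable stands in for the actual GET request.

```python
import time
from http import HTTPStatus


def render_state(status_code):
    """Map the HTTP status of a GET render request to a frame state."""
    if status_code == HTTPStatus.OK:                   # assumed code for DONE
        return "done"
    if status_code == HTTPStatus.NO_CONTENT:           # frame not rendered yet
        return "pending"
    if status_code == HTTPStatus.EXPECTATION_FAILED:   # error on render
        return "error"
    return "unknown"


def wait_for_frame(fetch_status, attempts=5, delay=0.0):
    """Poll until the frame is no longer pending or attempts run out.

    `fetch_status` is a hypothetical callable that performs the GET
    and returns the HTTP status code as an int.
    """
    for _ in range(attempts):
        state = render_state(fetch_status())
        if state != "pending":
            return state
        time.sleep(delay)
    return "pending"


# Simulated master: two "not done yet" replies, then the finished frame.
replies = iter([204, 204, 200])
result = wait_for_frame(lambda: next(replies))
print(result)
```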
POST
- job (job-id, job-frame, slave-blacklist) announces a job to the server. Body is a series of file paths for the files needed by the job. If the server doesn't have access to them, it will return an error and the client has to reply with "PUT file" to upload the files
- slave announces a new slave connection to the server. Body is the name of the slave on the first line, stats on the second.
- cancel (job-id) cancels a job, or all jobs if job-id is empty
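The announce-then-upload exchange described for the job request can be sketched as follows. The transport is injected as a `send` callable so the flow can be followed without a server. The function name, the `/job` and `/file` paths, the newline-separated body and the choice to upload every file after an error are all assumptions, not the documented wire format.

```python
def announce_job(send, job_id, frame_range, paths, read_file=None):
    """Announce a job; upload the files if the server reports an error.

    `send(method, path, headers, body)` is a hypothetical transport callable
    that returns the HTTP status code as an int.
    """
    if read_file is None:
        def read_file(path):
            with open(path, "rb") as handle:
                return handle.read()

    headers = {"job-id": job_id, "job-frame": frame_range}
    # Assumption: one path per line; the text does not give the separator.
    body = "\n".join(paths).encode("utf-8")
    status = send("POST", "/job", headers, body)

    uploaded = []
    if status >= 400:
        # The server cannot reach the files itself, so push each one with PUT file.
        for path in paths:
            send("PUT", "/file", headers, read_file(path))
            uploaded.append(path)
    return uploaded


# Dry run with a fake transport that rejects the announcement.
calls = []


def fake_send(method, path, headers, body):
    calls.append((method, path))
    return 404 if method == "POST" else 200


uploaded = announce_job(fake_send, "abc123", "1:250", ["scene.blend"],
                        read_file=lambda path: b"fake bytes")
print(calls, uploaded)
```

When the master can already read the paths (for example on a shared drive), the POST succeeds and nothing is uploaded.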
PUT
- file (job-id, job-frame) sends a job file to the server (job-frame can be "frame" or "frame_start:frame_end")
- render (job-id, job-frame, job-result) sends a render result to the server
- log (job-id, job-frame, job-result) sends a log file to the server
job-id is the hash of the blend file (therefore unique per file)
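Deriving an id from the file contents might look like the sketch below. The hash algorithm is not named in the text, so MD5 here is purely an assumption, as is the function name `job_id_for`.

```python
import hashlib
import io


def job_id_for(stream, chunk_size=65536):
    """Hash a binary stream in chunks and return the hex digest.

    MD5 is an assumption; the text only says "the hash of the blend file".
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In practice the stream would be an open .blend file; bytes stand in here.
first = job_id_for(io.BytesIO(b"blend file contents"))
same = job_id_for(io.BytesIO(b"blend file contents"))
changed = job_id_for(io.BytesIO(b"blend file contents, edited"))
print(first == same, first == changed)
```

With a content hash, identical files map to the same job-id, while saving a modified file produces a different one.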
Need to add notes here about where files are saved on master and node and how balancing algo works (or doesn't).
Feature List
- DONE: support paths instead of files
- DONE: client-server-slave: restrict job to specific nodes
- DONE: client-server-slave: view node machine stats
- DONE: client-server-slave: reporting error logs back to manager (all stdout and stderr from nodes)
- DONE: Cancel jobs
- DONE: Restart error frame
- DONE: Disable crash report on windows
- DONE: Dispatch more than one frame at once (a sequence of frames)
- DONE: Blacklist slave that errors on frame after reset
- DONE: Multiple paths on job announce
- DONE: Delay job until all files accounted for
- DONE: Frame range restrictions (i.e. send point cache files only when needed for the range of frames)
- DONE: Send partial logs to master
- Set slaves to copy results on network path
- client-server: archive job (copy source files and results)
API Feature Wishlist
This is a list of Blender code I would need to make netrender better. Some items are bugs, some are features that should (hopefully) eventually be there.
- RNA subtypes in Python (PROP_DIRPATH would be especially nice)
- API access to jobs, to be able to run masters and slaves in the background as well as render job notifiers on the client.
- Render result from multilayer image in memory







