From BlenderWiki

Introduction

You may need an offline copy of the Blender Manual wiki pages, the more up to date the better. Maybe you don't have a cheap internet connection where you use or learn Blender, or maybe you want to read the Blender Manual on a train with your notebook.

If so, one way is to "mirror" the wiki locally on your system with a utility such as wget. This approach lets you "sync" with the wiki web site when you are online, and browse a copy of the entire site offline. At the time of writing, there are more than 250 separate web pages to read through...

But most of the time it would be better to have a single "book" instead of a few hundred HTML files. A book-style document, probably a single PDF, lets you read the Blender Manual as a regular book, print it out on paper, and share all that content easily, since it's a single file.

I needed such a document, and couldn't find anything that was up to date or easy to update. So I worked out a method to produce it, and shared the result with the Blender community. So far I have tried to update it about once a month, and you can find a link to the latest PDF on this page, in the small section on the right; it's named something like "PDF Manual (NEW!)" :)

I am reporting the result of my "efforts" here (it was really fun, after all :)) for anyone who wishes to learn how to do this, and hopefully improve it.

The HTMLdoc method

I came across HTMLdoc (http://www.htmldoc.org/): this great GPL software can load HTML files or download web pages and convert them to a single PDF, PS or HTML file, and it lets you set up the conversion in many ways. With this software, you have to set up a "book" file, which contains all the needed settings (quality, options, etc.) and an ordered list of HTML files, or links to web pages, to convert into a single output document.
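
To make the idea of "settings plus an ordered list of pages" concrete, here is a minimal Python sketch that builds an HTMLdoc command line instead of a "book" file. It is not the setup I actually use: the function names and file names are hypothetical, and it assumes the htmldoc executable is on your PATH. Only the basic options --book, -t and -f are used.

```python
import subprocess


def build_htmldoc_command(pages, output_pdf):
    """Return the argument list for one HTMLdoc run over `pages`, in order.

    `pages` is an ordered list of local HTML files (or URLs); the order
    given here is the order of the chapters in the resulting PDF.
    """
    if not pages:
        raise ValueError("need at least one page to convert")
    # --book: structured "book" output, -t pdf: output format,
    # -f FILE: output file name. Everything after that is input.
    return ["htmldoc", "--book", "-t", "pdf", "-f", output_pdf, *pages]


def run_htmldoc(pages, output_pdf):
    """Run HTMLdoc; assumes the `htmldoc` executable is installed."""
    subprocess.run(build_htmldoc_command(pages, output_pdf), check=True)


if __name__ == "__main__":
    # Hypothetical file names, only to show the resulting command line.
    print(" ".join(build_htmldoc_command(["intro.html", "interface.html"],
                                         "manual.pdf")))
```

Keeping the command in a list (rather than one long string) avoids quoting problems when a file name contains spaces.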

It was exactly what I was looking for!

This software did a good job, but since the HTML files I fed it were wiki pages, there was a lot of stuff not related to the content: headers, footers, side "utility" sections, links to the previous/next page, and so on. That stuff is perfect if you read the manual online (or on a mirror), but it is not "real content", and so the PDF output of HTMLdoc contained a lot of useless material.

So I ended up writing a small and hackish script in PHP (http://www.php.net), a language I'm used to. It mainly does some preprocessing to remove everything that is NOT manual content. With that done, HTMLdoc does a better job and the PDF output is quite nice.
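
The PHP script itself is not reproduced here, but the core of the preprocessing can be sketched in a few lines of Python. This sketch assumes the wiki wraps the real content between two HTML comment markers; the marker strings below are an assumption, so check the source of a downloaded page and adjust them. The function names are hypothetical.

```python
def extract_content(html,
                    start_marker="<!-- start content -->",
                    end_marker="<!-- end content -->"):
    """Keep only the text between the two markers.

    The default markers are an assumption about the wiki skin. If either
    marker is missing, the page is returned unchanged so nothing is lost.
    """
    start = html.find(start_marker)
    if start == -1:
        return html
    body_start = start + len(start_marker)
    end = html.find(end_marker, body_start)
    if end == -1:
        return html
    return html[body_start:end].strip()


def wrap_page(title, body):
    """Put the extracted content back into a minimal HTML page."""
    return ("<html><head><title>" + title + "</title></head><body>\n"
            + body + "\n</body></html>\n")


if __name__ == "__main__":
    page = ("<div id='header'>navigation</div>"
            "<!-- start content --><p>Hello</p><!-- end content -->"
            "<div id='footer'>footer</div>")
    print(wrap_page("Example", extract_content(page)))
```

Returning the page untouched when a marker is missing is a deliberate choice: a page with some leftover navigation is better than a page that silently loses its content.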

The last issue I had to face was the lack of bookmarks in the PDF. I found a solution in Jpdfbookmarks (http://jpdfbookmarks.altervista.org/), a free Java tool with both a CLI and a GUI for importing, exporting and editing PDF bookmarks.
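
Since the bookmarks can be exported to a plain text file, part of the fixing can be scripted. The Python sketch below shifts the page number of every bookmark from a given page onwards, which is what you need when a section grows or shrinks. It assumes each line of the text file looks like "Title/page", optionally followed by comma-separated attributes and indented with tabs for nesting; that layout is an assumption, so compare it with a real dump before relying on it. The function name is hypothetical.

```python
import re

# Assumed line layout: <tabs><title>/<page>[,<attributes>]
_BOOKMARK = re.compile(r"^(?P<head>.*)/(?P<page>\d+)(?P<tail>,.*)?$")


def shift_bookmarks(lines, from_page, delta):
    """Add `delta` to the page of every bookmark at or after `from_page`.

    `lines` are bookmark lines without trailing newlines (for example
    from str.splitlines()). Lines that do not match the assumed layout
    are passed through unchanged.
    """
    shifted = []
    for line in lines:
        match = _BOOKMARK.match(line)
        if match and int(match.group("page")) >= from_page:
            page = int(match.group("page")) + delta
            line = "{}/{}{}".format(match.group("head"), page,
                                    match.group("tail") or "")
        shifted.append(line)
    return shifted


if __name__ == "__main__":
    old = ["Introduction/1", "\tInterface/12", "Modeling/40"]
    # Three pages were added before page 10 in the new PDF.
    for line in shift_bookmarks(old, 10, 3):
        print(line)
```

New and deleted pages still have to be handled by hand; this only removes the repetitive renumbering.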

In the next section you will find details about the method, and all you need to update the PDF yourself. Hopefully :)

Details

The workflow, as I do it now, can be described as:

  • grab the URL of the main wiki page, where there's a link to every page of the manual
  • with a PHP script, parse the text of that page to collect all the URLs in the right order, and save them in a simple local text file (links.txt), one URL per line.
  • with a PHP script, parse that links.txt file and download the HTML file from each URL
  • after downloading, the PHP script parses the text of each HTML page looking for "known patterns" that allow us to "delete unneeded text", "find and replace" strings, and "fix" small things which could cause problems later (more details below)
  • after this, the PHP script saves the "fixed" HTML pages locally in a defined folder, and also creates the "instructions file" for HTMLdoc, which now needs to know the location of the local (!) HTML files from which to build a better PDF.
  • then I run HTMLdoc, feeding it the "instructions file" just created by the PHP script while downloading. It takes a while because HTMLdoc still has to download every image the HTML pages refer to!
  • when HTMLdoc finishes, and the PDF is produced, in a target folder/filename, there's one more (optional?) step: add PDF bookmarks.
  • This is where Jpdfbookmarks comes in. It's a bit difficult to explain, but with this handy utility I can edit single bookmarks, add or remove them, and change the page they point to if needed. I can also "apply" a whole structure of PDF bookmarks (as a simple text file) to a PDF which has no bookmarks, and "dump" the whole bookmark structure of a PDF (which already has them) out to a simple text file.
    The first time, I had to "build" a bookmarks text file and make every bookmark point to its page nearly by hand, and it was sooo boring :). But now when I update the PDF I "dump" the bookmarks text file from the previous release of the PDF and "apply" it to the new PDF. Then I only have to "fix" the bookmarks for new pages, deleted pages, longer or shorter sections, and so on. Still boring, but the result is great!
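
The first two steps of the workflow above (collect the URLs in order, save them to links.txt) can be sketched in Python as follows. This is not the PHP script I use: the class and function names are hypothetical, the URLs in the example are placeholders, and the sketch assumes that every manual page shares a common URL prefix which can be used to filter out unrelated links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a href> targets in document order, without duplicates."""

    def __init__(self, base_url, prefix):
        super().__init__()
        self.base_url = base_url
        self.prefix = prefix  # assumed common prefix of all manual pages
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Make the link absolute and drop any "#section" fragment.
        url = urljoin(self.base_url, href).split("#")[0]
        if url.startswith(self.prefix) and url not in self.links:
            self.links.append(url)


def collect_links(index_html, base_url, prefix):
    """Return the manual page URLs found in `index_html`, in page order."""
    parser = LinkCollector(base_url, prefix)
    parser.feed(index_html)
    return parser.links


if __name__ == "__main__":
    # Placeholder index page and URLs, only for illustration.
    index = ('<a href="/Doc/Manual/Intro">Intro</a> '
             '<a href="/Other">Other</a> '
             '<a href="/Doc/Manual/Modeling">Modeling</a>')
    links = collect_links(index, "http://wiki.example.org/",
                          "http://wiki.example.org/Doc/Manual/")
    with open("links.txt", "w") as handle:
        handle.write("\n".join(links) + "\n")  # one URL per line
```

Keeping the links in a list rather than a set preserves the order of the index page, which later becomes the chapter order of the PDF.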