
Lip Sync

Synopsis

The main goal of this project is to improve the lip sync animation workflow for users by supporting them with modern automation algorithms and a revised workflow. It builds on the audio engine, but the main focus is on the user interface and usability. It should be a feature that artists actually use, so it's very important to work together with artists.

Benefits for Blender

Lip sync is an important part of computer animation. It is crucial that animators have all the necessary tools to animate the facial expressions of characters to match what the characters say. This project intends to provide artists with all the necessary tools.

Deliverables

  • Basic UI and workflow implementation: Lip Sync Animation via Audio window
  • Testing files (rig and sound files) and user documentation, working together with artists
  • Evaluation of speech libraries
  • Using a library or implementing our own phoneme detection algorithm
  • Constant improvements on the UI and workflow based on artist input
  • Coarticulation improvements (optional)
  • Emotion detection (optional)

Project Details

The top priority for this project is to help the artists working on lip sync. In contrast to previous applications, this project will start with the user interface before any phoneme detection algorithms are developed. It's very important to work together with artists, so I'll try to get at least two artists as supporters for the project to give constant feedback on the user interface and workflow. I especially want to assist Project Gooseberry artists.

User interface wise, I thought about something similar to the interface of most common lip sync tools, which would bring back an advanced audio window with a wave display and a new phoneme editor. After talking to several artists and other developers, it seems best to link the phonemes to the visemes via (driver) bones with a range between 0 (viseme completely off) and 1 (viseme completely on) and a specific name per phoneme. However, some had concerns that this might be too restrictive a workflow, so we will try to get it working in a way that any style of facial rigging can use the feature.

One possible workflow might end up like this: First, the animator creates the viseme poses for the given (fixed or dynamically configurable) set of phonemes and the driver bones to control them, with a specific name per phoneme. Then he opens the audio window, which is split into a wave display and a phoneme editor. There he can run the phoneme detection algorithm (this will be developed after the general workflow works, though) and edit its result, or start editing directly (add, change, move, remove, set strength, etc.). The rig is then automatically animated via these phonemes. I want to make the system as convenient as possible, so that the artist can, for example, run the phoneme detection algorithm again without losing his changes.
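To make the driver-bone idea more concrete, here is a minimal Python sketch of how detected phonemes could be keyframed onto per-phoneme bone properties. The rig name, the bone naming scheme, the property name and the detected phoneme list are all placeholders; the actual conventions still have to be agreed on with the artists.

    import bpy

    # Hypothetical output of the phoneme detection step:
    # (phoneme, start time in seconds, strength between 0 and 1)
    detected = [("AA", 0.10, 1.0), ("M", 0.32, 1.0), ("EH", 0.48, 0.8)]

    scene = bpy.context.scene
    rig = bpy.data.objects["Rig"]              # assumed rig object name
    fps = scene.render.fps

    for phoneme, start, strength in detected:
        frame = int(start * fps)
        bone = rig.pose.bones["viseme_" + phoneme]   # assumed bone naming scheme
        # A custom property in the 0..1 range that drives the viseme pose/shape key.
        bone["influence"] = strength
        bone.keyframe_insert(data_path='["influence"]', frame=frame)
        # Reset the viseme a bit later so that mouth shapes don't pile up.
        bone["influence"] = 0.0
        bone.keyframe_insert(data_path='["influence"]', frame=frame + int(0.2 * fps))

A real implementation would of course write proper F-curves and respect existing keys instead of blindly inserting and resetting keyframes like this.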

I've talked to Tom (aka LetterRip) a bit about this. Just about every lip sync tool out there has a similar UI: you always have at least a timeline with the audio and the phonemes next to it. For Blender I think it might also be nice to show a spectrum analysis and curves with the strength of the visemes. A vertical interface (most are horizontal) is also interesting and can be implemented too, so the user can choose what he prefers.

Another thing Tom showed me is the book "Stop Staring", which has very useful info on facial animation. A part of it is coarticulation (phonemes/visemes influencing each other), which leads to grouping and collapsing phonemes into visemes. For example, if there are three phonemes in sequence and the middle one is normally spoken with a closed mouth while the others are spoken with the mouth open, the mouth stays open for the normally closed one too. How Blender and the interface could help with this still has to be researched, and it will certainly be easier once the basic UI and workflow already work. This feature is quite advanced and might be considered at a later point in the project, if there is time.
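Just to illustrate what such a collapsing rule could look like, here is a deliberately naive Python sketch that suppresses very short closed-mouth visemes between open-mouth neighbours. The viseme groups and the length threshold are assumptions for illustration only; the actual rules still have to be researched together with the artists.

    # Naive coarticulation sketch: drop very short closed-mouth visemes
    # when both neighbours are open-mouth visemes, so the mouth stays open.
    OPEN = {"AA", "AE", "EH", "IY", "OW"}      # assumed open-mouth group
    CLOSED = {"M", "B", "P"}                   # assumed closed-mouth group

    def collapse(segments, min_length=0.06):
        """segments: list of (viseme, start, end) tuples in seconds, sorted by start."""
        result = []
        for i, (viseme, start, end) in enumerate(segments):
            too_short = (end - start) < min_length
            prev_open = i > 0 and segments[i - 1][0] in OPEN
            next_open = i + 1 < len(segments) and segments[i + 1][0] in OPEN
            if viseme in CLOSED and too_short and prev_open and next_open:
                continue  # let the surrounding open visemes carry through
            result.append((viseme, start, end))
        return result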

Originally I planned to implement phoneme detection algorithms on my own. However, it's not always necessary to reinvent the wheel, so the first step is to have a look at existing libraries, like CMUsphinx:

http://cmusphinx.sourceforge.net/

I already talked to a CMUsphinx developer and he told me that it's no problem to get phoneme and timing information out of the recognition. I also already got pocketsphinx with audaspace running in a Qt test application: http://pastebin.com/1saq7KR8
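For reference, here is a minimal Python sketch of how pocketsphinx could be queried for phonemes with timing (the test application above uses C++ with Qt and audaspace instead). It assumes the standard pocketsphinx Python bindings and their "allphone" phoneme decoding mode; the model and file paths are placeholders.

    from pocketsphinx import Decoder

    # Placeholder paths: an acoustic model plus a phonetic language model
    # are needed for allphone (phoneme) decoding.
    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')
    config.set_string('-allphone', 'model/en-us-phone.lm.bin')
    config.set_float('-lw', 2.0)
    decoder = Decoder(config)

    decoder.start_utt()
    with open('speech.raw', 'rb') as f:    # 16 kHz, 16 bit, mono PCM
        while True:
            buf = f.read(1024)
            if not buf:
                break
            decoder.process_raw(buf, False, False)
    decoder.end_utt()

    # Each segment is a recognized phone with start/end frames
    # (100 frames per second by default), which is exactly the
    # phoneme and timing information the phoneme editor needs.
    for seg in decoder.seg():
        print(seg.word, seg.start_frame / 100.0, seg.end_frame / 100.0)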

Using a library for the phoneme detection should enable me to spend more time on the interface.

In case no library is really suitable for phoneme detection, there are two papers about phoneme detection in audio files which target automated lip sync directly, rather than general phoneme detection as the open source speech recognition libraries do:

If the project happens to be finished early, there is the possibility of extracting emotional information from the sound file and adding this as a feature for animation as well. This also has to be researched in more detail, but I consider it a bonus task for the end of the project, and only if the workflow for lip sync is already very good and all the artists are already happy (which is unlikely to happen :D).

Project Schedule

As summer holidays start at the beginning of July in Austria, I'll be busy with university at the beginning of the official GSoC time, but as every year I'll start working right away in the community bonding period. When the summer holidays begin, I will start working full time on the project.

A basic user interface and workflow should be ready by the beginning of July when the summer holidays start here, but at the latest by the midterm evaluation. After that I'll work on phoneme detection, while constantly improving the workflow based on artist feedback.

I have no clue how long the implementation of the phoneme detection algorithm would really take if we have to do it ourselves. If a library can be used, that should be easier, and from a first evaluation/use of CMUsphinx this already looks promising. Anyway, you should already know that I prefer to finish things I start, so even if GSoC were over I'd finish the project.


Game Engine Audio

Synopsis

In the last audio projects for Blender, starting from version 2.5, Blender's audio system got modernized, regaining all the features from the 2.4x and previous series and adding some more, like placing speakers in the 3D scene for 3D audio animations. I recently started separating the audio library audaspace from Blender to release it as a standalone library. The hope behind this step is to get a more widespread user and especially developer base for the library, as I have been the only audio developer for Blender for 5 years now. The benefit for Blender is obvious: all features that are added to the external library will also be available in Blender, and I will always keep an eye on Blender's special audio needs. The current status of the external library is that the major refactor is done, and the reintegration into Blender is already done and working on Linux. Jens Verwiebe and Jürgen Hermann are currently helping out to also get the library working on Mac and Windows. As the big refactor is done, it is now time to add features again, and this is what I would like to do this summer.

Benefits for Blender

Blender's audio system is ready to be taken to the next level. Game developers will benefit from a lot more features that ease the integration of awesome sound scenery into their games.

Deliverables

  • Native Audio Backends: We keep having problems with the audio backends, so the plan is to implement new native backends for each operating system: ALSA for Linux, Microsoft's Core Audio for Windows and Apple's Core Audio for Mac. This should also resolve recent bug reports that OpenAL or SDL prevent the computer from sleeping on Windows and Mac, as these are bugs/missing features inside those libraries.
  • Audio Input: There is currently no way to access the microphone or other sound sources from the Python API; this feature should add proper interfaces to access them.
  • Effects API: This API enables adding and removing effects to/from playing sounds, like fading, filtering or similar.
  • Play Groups: Play Groups represent categories like background music, speech, sound effects, etc. where the volume and possibly other settings can be adjusted separately for a group of sounds.
  • Dynamic Music: A feature to adapt the background music of the game based on what's currently happening, or on flags set via the API. For example: a game has some random background music while the player runs around; then he starts fighting against a bunch of enemies and the music gets more exciting. This interface should use predefined loopable music samples and transition nicely between them, changing the mood of the background music according to the action that is currently going on.
  • Random Sounds: Repeating the same sound over and over sounds boring and is easily noticed by the player (for example footsteps), so this functionality should handle a list of sounds to be played instead of a single one, with the ability to choose sounds randomly or sequentially when it is triggered (a small sketch follows after this list).
  • Reverberation Engine: A powerful reverberation engine allows the simulation of different environments, like the echoes in hallways or bathrooms with tiles on the walls.
  • Environmental Audio: Enables the game engine to have different sound modifications (filters, reverberation, etc.) which depend spatially on the sound source and listener positions. For example, a gunshot in a tiled bathroom sounds different than in a room with sound-absorbing walls.
  • Binaural Audio: This is a technique using HRTFs (head-related transfer functions) to get 3D positional audio effects with headphones.
  • Resource Manager: A python class to handle loading of sounds for those games that don't use the sound actuators directly. While smaller samples that are played back frequently should be stored uncompressed in memory, music should be live streamed from a file. This class should handle these use cases and prevent samples from being loaded multiple times.
  • Adaptive Filters: Filters like echo cancellation and similar that can be implemented using LMS-style algorithms (see the NLMS sketch after this list). These are useful for in-game voice communication or for cancelling sounds that are played back while recording audio from the microphone.
  • Speech Recognition: The newest generation of consoles uses speech recognition as part of the user interface. Open source libraries like pocketsphinx do speech recognition as well and can be added as another input device for games in the BGE.
  • Text To Speech: The opposite of speech recognition is TTS, another feature that might come in handy for game developers, especially in the open source scene where it's not always possible to hire professional speakers for all spoken samples. A promising library to use is espeak.
  • Language Settings: Speech samples should be chosen based on the language the game is played in, so this feature adds localization support to the audio engine.
  • Spectral Analysis: Adding FFT functionality to the engine allows for spectral analysis of sounds, which in turn enables more complex functionalities, like equalization.
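To illustrate the Random Sounds item above, here is a small Python sketch of a sound pool that avoids playing the same sample twice in a row. The file names are placeholders, and the aud call in the comment refers to Blender's current Python audio API, which may change with the new library.

    import random

    class RandomSoundPool:
        """Picks a sound path from a pool each time it is triggered,
        avoiding the same sample twice in a row."""

        def __init__(self, paths):
            self.paths = list(paths)
            self.last = None

        def next(self):
            candidates = [p for p in self.paths
                          if p != self.last or len(self.paths) == 1]
            self.last = random.choice(candidates)
            return self.last

    footsteps = RandomSoundPool(["step1.ogg", "step2.ogg", "step3.ogg"])
    # On each footstep trigger something like the following could be played:
    # aud.device().play(aud.Factory(footsteps.next()))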
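And as a rough illustration of the Adaptive Filters item, here is a minimal normalized LMS (NLMS) sketch in Python/NumPy for cancelling a known reference signal (e.g. game audio leaking into the microphone). The filter length and step size are arbitrary example values, and a real implementation would of course run block-wise inside the audio engine in C++.

    import numpy as np

    def nlms_cancel(reference, microphone, taps=256, mu=0.5, eps=1e-8):
        """Remove the part of `microphone` that is a filtered copy of
        `reference`, using the normalized LMS algorithm. Returns the
        error signal, i.e. the microphone signal with the echo removed."""
        w = np.zeros(taps)                      # adaptive filter coefficients
        out = np.zeros(len(microphone))
        for n in range(taps, len(microphone)):
            x = reference[n - taps:n][::-1]     # most recent reference samples
            y = np.dot(w, x)                    # estimated echo
            e = microphone[n] - y               # echo-cancelled output sample
            w += (mu / (eps + np.dot(x, x))) * e * x   # NLMS coefficient update
            out[n] = e
        return out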

Project Details

I added details to the above deliverables already, as it is easier to just have one feature list.

Project Schedule

As summer holidays start at the beginning of July in Austria, I'll be busy with university at the beginning of the official GSoC time, but as every year I'll start working right away in the community bonding period. When the summer holidays begin, I will start working full time on the project.

Most of the features above are independent of each other; some are easier and some are more difficult to implement. The order of the feature list is one possible schedule, but it can be rearranged based on the wishes of the community and mentor(s). I don't think that all of the features can be implemented within the GSoC time frame, but we can choose which features are mandatory for the success of the project.

Bio

My name is Jörg Müller, I'm 25 years old, from Austria and a master's student of telematics at the Graz University of Technology. You can contact me via mail nexyon [at] gmail [dot] com or directly talk to me on irc.freenode.net, nickname neXyon.

In 2001, shortly after getting internet at home, I started learning web development: HTML, CSS, JavaScript, PHP and such stuff. Soon after that, in 2002, I got into hobbyist game programming and thereby into 2D and later 3D graphics programming. During my higher technical college education I got a quite professional and practical education in software development, including project management and projects for external companies. During this time I also entered the Linux and open source world (http://golb.sourceforge.net/). Still into hobbyist game development, I stumbled upon the Blender game engine and was annoyed that audio didn't work on my system. That's how I got into Blender development; I have been an active Blender developer since summer 2009, where I mainly worked on the audio system as the (unfortunately) only developer.

In my spare time apart from that I do sports (swimming, running, other gymnastics), play music (piano, drums) and especially like playing video games and going to the cinema.

  • 09/2003 - 06/2008 Higher technical college, department electronic data processing and business organisation
  • 07/2008 - 12/2008 Military service
  • 01/2009 - 02/2010 Human medicine at the Medical University of Graz, while already taking Telematics courses at the Graz University of Technology
  • 03/2010 - 01/2013 Telematics bachelor at the Graz University of Technology
  • 01/2013 - today Telematics master at the Graz University of Technology