A real-time lip-sync system, the indie way
We are working on our first project, a first-person 3D narrative exploration game set in a solarpunk world. I'm going to outline the four major creative decisions that led to the need for a lip-sync system in our game.
The first choice we made, and a key difference from most narrative exploration games out there, is to have NPCs that you can talk to.
The second decision was wanting unique, diverse, stylized characters. Original characters are a lot of work. You have to concept, model, texture, rig, integrate, and ensure they work with your locomotion system, and that's just the surface of it: there is a lot of iteration and back-and-forth between these steps, and it all takes time.
The third choice is that we chose to give our characters faces. Seems innocuous enough, you know, people have faces.
The final choice is the ambition to have fully voiced dialogue. While that can be expensive, we feel that an actor's contribution can really bring a character to life and make the experience more immersive for the player.
Those are pretty hard choices, perhaps even ill-advised ones, for a micro-studio, as they immediately trigger the need for many creative steps and tech systems.
The combination of NPCs that have faces and are fully voiced creates the need for an automated lip-sync solution, as it is pretty unnerving to hear a character talking without moving their lips. So you have to find the workflows and pipelines that will speed up and automate those steps. I'll go through them next.
At a high level, the system is fairly simple: we look at the dialogue audio the character is playing at any given time, measure the frequency and amplitude of the audio, and map that frequency to specific visemes.
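The core of that mapping can be sketched as a small function. This is a minimal illustration rather than our actual code; the frequency bands, viseme labels, and silence threshold below are hypothetical placeholder values you would tune per voice.

```cpp
#include <string>
#include <utility>
#include <vector>

// Pick a viseme from the dominant frequency and overall amplitude of the
// current audio window. The band edges and silence threshold are made-up
// example values -- tune them for your own voice recordings.
std::string PickViseme(float dominantFrequencyHz, float amplitude)
{
    const float kSilenceThreshold = 0.02f; // below this, mouth stays closed
    if (amplitude < kSilenceThreshold)
        return "Closed";

    // Each band maps to one mouth shape from the 12-shape viseme model.
    static const std::vector<std::pair<float, std::string>> kBands = {
        { 400.0f,  "O" },  // low formants -> rounded shapes
        { 1200.0f, "A" },  // mid range    -> open shapes
        { 2500.0f, "E" },  // higher range -> spread shapes
    };
    for (const auto& band : kBands)
        if (dominantFrequencyHz < band.first)
            return band.second;
    return "S"; // high-frequency content, e.g. sibilants
}
```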
To build the viseme map for our characters we used this website for reference, https://wolfpaulus.com/lipsynchronization/, and chose the viseme model with 12 mouth shapes, so we needed to get those mouth shapes onto our characters quickly. Time to get rigging.
BONES AND FACE RIG
An indie studio with limited resources will have a hard time finding a rigger who is 100% dedicated to creating amazing facial rigs. But if you were careful to create half-decent topology for the face, with edge loops where they should be (there are plenty of references and tutorials out there on doing this properly), you should be in a great position to automate this with some easy-to-use plugins.
We build our models in Blender, since it's a great, open-source, and best of all FREE DCC, with tons of documentation and a thriving community. For the face rig, we felt the best option would be a tool that gives us blendshapes usable both for animating and for capture with Live Link.
Fortunately, there is such a tool, and it's called FACEIT. It can give you ARKit-compatible blendshapes in minutes. Below you can see the actual process; the video is not sped up in any way, and we go from model to animation in under 3 minutes.
You'll just need to hit Bake ARKit Shape Keys, and you'll end up with a neat set of shape keys.
For the skeleton, we opted for Auto-Rig Pro to create rigs and animate our characters. You can easily create Unreal Engine-compatible rigs, and it doesn't require a huge amount of rigging knowledge (it doesn't replace the value of having an amazing rigger on your team, but it gets you a good-enough result out of the box). It's fairly well documented and not hugely expensive. When you use Auto-Rig Pro, you'll need to export your model through its interface to make sure you get the proper UE4 setup.
When you import your model into Unreal, you need to make sure you are importing Morph Targets.
That ensured all the blendshapes created in Blender by the FACEIT plugin were brought into Unreal.
You can then manipulate these Morph Target weights and see the effects on the character.
For the lip-sync tool, we then tweak these values until we get each of the mouth shapes we need for our viseme map (this is a bit laborious).
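The result of that manual tweaking is essentially a table from viseme name to morph target weights. A minimal sketch of such a table is below; the weight values are hypothetical examples (the morph target names follow Apple's ARKit blendshape naming, which is what FACEIT bakes), and in practice you dial each set in by hand in the editor.

```cpp
#include <map>
#include <string>

// Each viseme is stored as a set of morph target weights in [0, 1].
// Morph target names follow the ARKit blendshape convention; the
// weights here are illustrative, not production values.
using MorphWeights = std::map<std::string, float>;

const std::map<std::string, MorphWeights>& VisemeMap()
{
    static const std::map<std::string, MorphWeights> kMap = {
        { "A",      { { "jawOpen", 0.70f },
                      { "mouthStretchLeft", 0.20f },
                      { "mouthStretchRight", 0.20f } } },
        { "O",      { { "jawOpen", 0.40f },
                      { "mouthPucker", 0.80f } } },
        { "Closed", { { "mouthClose", 1.00f } } },
    };
    return kMap;
}
```

At runtime you look up the active viseme and drive each listed morph target toward its stored weight.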
Getting the Audio Data
Once we had mapped each character's visemes, we just needed to find out when they should blend into each mouth shape, and we do that by sampling the audio in real time at 100 ms intervals.
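For reference, converting the 100 ms interval into a sample window and measuring its loudness is straightforward; here is a small standalone sketch (not our engine code) using RMS as the amplitude measure:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Number of mono samples covered by one analysis window,
// e.g. 100 ms at 48 kHz -> 4800 samples.
std::size_t WindowSamples(int sampleRate, float windowSeconds)
{
    return static_cast<std::size_t>(sampleRate * windowSeconds);
}

// Root-mean-square amplitude of one window of samples in [-1, 1].
float WindowAmplitude(const std::vector<float>& samples)
{
    if (samples.empty())
        return 0.0f;
    double sum = 0.0;
    for (float s : samples)
        sum += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sum / samples.size()));
}
```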
If you are using Unreal's internal audio engine to play your sound cues, UE4 comes with a nifty audio visualization plugin that you just need to enable to get that data.
Which gives you these neat nodes.
But alas, we are using FMOD for our dialogue, which speeds up how we handle the thousands of dialogue files, so that didn't help us in this case. We needed to get that information directly from FMOD, and that is not available out of the box.
If you are an indie studio working 99% in Blueprint, you're in for a ride, because now you need to build that functionality yourself. A little Google search and voilà.
So into Visual Studio land we went, and we managed to build the functionality we needed in C++.
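Once you have the FFT magnitude bins back from FMOD (its FFT DSP exposes the spectrum as an array of per-bin magnitudes), finding the dominant frequency is a simple scan. This is a generic sketch of that step under the usual FFT convention (bin i covers i * sampleRate / fftSize Hz, with fftSize twice the number of magnitude bins), not our exact integration code:

```cpp
#include <cstddef>
#include <vector>

// Given FFT magnitude bins (as read back from an FFT DSP such as
// FMOD's), return the frequency in Hz of the loudest bin.
// Bin i corresponds to i * sampleRate / fftSize, where fftSize is
// twice the number of magnitude bins.
float DominantFrequencyHz(const std::vector<float>& magnitudes, int sampleRate)
{
    if (magnitudes.empty())
        return 0.0f;
    std::size_t loudest = 0;
    for (std::size_t i = 1; i < magnitudes.size(); ++i)
        if (magnitudes[i] > magnitudes[loudest])
            loudest = i;
    const std::size_t fftSize = magnitudes.size() * 2;
    return static_cast<float>(loudest) * sampleRate / fftSize;
}
```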
With the spectrum data and the amplitude of the wave, we managed to map the values to visemes. And there it is: our characters can talk!
I hope this can be helpful to anyone looking to implement a similar solution; if you want to know more, reach out.
Thanks to Thomas Viñas for the inspiration for this.