Version: 5.x
Supported on
Snapchat

2D Animated TTS

This guide explains how to use the phoneme info returned by Text To Speech (TTS) and walks through two examples: 2D Animated TTS Face and 2D Animated TTS Conversation.

Both examples include helper scripts that animate 2D mouth images based on which phoneme is spoken at what time. With the Automatic Voice Style Selector (Auto Style Selector), the voice style changes based on the context.

The main asset used for both examples is the Text To Speech Module. You can find it in the Asset Browser panel.

Currently TTS supports US English with two voices, six different voice styles for both voices and the ability to tweak the pace of TTS speech playback.

Animated TTS Face

The 2D Animated TTS Face asset is available in the Lens Studio Asset Library. Import the asset into your project, create a new Orthographic camera and place the prefab under it.

Phoneme Controller

The phoneme animation is driven by the main helper script, PhonemeController, which is attached to the Lips With Eyes TTS [EDIT_ME] scene object. A Text To Speech Module and an Audio Component are connected to the Phoneme Controller script component through its inputs.

TTS generates an AudioTrackAsset, which can be attached to an Audio Component as the Audio Track asset to play. For more information about the Audio Component, please check out Audio Component and Audio Component API.

With Use Audio Output enabled, TTS audio will play with Audio Output. Click on Helpers and select Audio Output. With the Sample Rate dropdown list, you can then adjust the Sample Rate of the TTS audio.

To learn more about Audio Output, please visit the Audio Output example guide.

Selecting Voice Style

The Phoneme Controller script lets you configure several options for the TTS voice: Voice Name, Voice Style and Voice Pace.

With Auto Style Selector enabled, TTS audio will return with a voice style based on the context.

  • Voice Name - TTS supports two voices: Sasha and Sam.
  • Voice Style - TTS supports six voice styles for Sasha and Sam.
  • Voice Pace - TTS supports four playback speeds: 0.75X, 1.0X, 1.25X and 1.5X.
  • Input Mode - There are two input modes for the animations:
    • Image - An image whose base texture is replaced.
    • Mesh - A material whose base texture is replaced.

For this example, we set Input Mode to Image and use the Mouth Screen Image scene object as the Image input.
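As a rough illustration of how the voice settings could be gathered and validated before synthesis, the sketch below builds a plain options object. The helper name buildVoiceOptions, the field names, the numbering of styles from 1 to 6 and the conversion of pace to a percentage (1.0X becomes 100) are all assumptions made for this example; the asset's own getOptions function may work differently.

```javascript
// Hypothetical helper, not part of the asset: validates the voice settings
// described above and returns them as a plain object.
var SUPPORTED_VOICES = ['Sasha', 'Sam'];
var SUPPORTED_PACES = [0.75, 1.0, 1.25, 1.5];

function buildVoiceOptions(voiceName, voiceStyle, paceMultiplier) {
    if (SUPPORTED_VOICES.indexOf(voiceName) === -1) {
        throw new Error('Unsupported voice name: ' + voiceName);
    }
    // Assumption: the six styles are numbered 1 through 6.
    if (voiceStyle < 1 || voiceStyle > 6 || voiceStyle % 1 !== 0) {
        throw new Error('Unsupported voice style: ' + voiceStyle);
    }
    if (SUPPORTED_PACES.indexOf(paceMultiplier) === -1) {
        throw new Error('Unsupported voice pace: ' + paceMultiplier);
    }
    return {
        voiceName: voiceName,
        voiceStyle: voiceStyle,
        // Assumption: pace is stored as a percentage, so 1.25X becomes 125.
        voicePace: Math.round(paceMultiplier * 100)
    };
}
```

Rejecting unsupported values up front keeps a misconfigured input from reaching the synthesis call.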

Animation Textures

Currently Phoneme supports twelve different mouth shapes. You can replace the textures with your own assets to see different results.

Let's take a look at the main functions for generating and playing the TTS audio and phoneme animation.

To do so, unpack the imported asset in the Asset Browser and open the PhonemeController.js script.

Speak: pass input text to generate TTS.

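function speak(text) {
    // Voice options configured on the script component.
    var options = getOptions();
    script.tts.synthesize(
        text,
        options,
        phonemeTTSHandler, // called when audio generation completes
        phonemeTTSErrorHandler // called if synthesis fails
    );
}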

phonemeTTSHandler: will be called once the audio generation is completed, and receives four parameters: Audio Track Asset, WordInfos, PhonemeInfos and Voice Style.

  • Audio Track Asset: the TTS audio file.
  • WordInfos: timing details for how each word is pronounced.
  • PhonemeInfos: phoneme timing details used to animate the textures.
  • Voice Style: the TTS voice style.

function phonemeTTSHandler(audioTrackAsset, wordInfos, phonemeInfos, voiceStyle) {
    ...
}

Once phonemeTTSHandler is called, it will play the TTS audio as well as sync the different textures based on the phoneme info and timeline with the UpdateEvent.
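The lookup at the heart of this sync step can be sketched as a small function that returns the phoneme active at a given playback time. This is a minimal sketch, not the asset's code: the function name findActivePhoneme, the entry fields (phoneme, startTime, endTime), the millisecond unit and the sample phoneme labels are assumptions for illustration.

```javascript
// Assumption: each entry looks like { phoneme: 'AY', startTime: 80, endTime: 300 },
// with times in milliseconds measured from the start of the audio.
function findActivePhoneme(phonemeInfos, elapsedMs) {
    for (var i = 0; i < phonemeInfos.length; i++) {
        var info = phonemeInfos[i];
        // A phoneme is active from its start time up to, but not including, its end time.
        if (elapsedMs >= info.startTime && elapsedMs < info.endTime) {
            return info.phoneme;
        }
    }
    // No phoneme is active, for example after the audio has finished.
    return null;
}

// Hypothetical sample data for illustration only.
var samplePhonemes = [
    { phoneme: 'HH', startTime: 0, endTime: 80 },
    { phoneme: 'AY', startTime: 80, endTime: 300 }
];
var current = findActivePhoneme(samplePhonemes, 100);
```

Inside an UpdateEvent callback, you would add each frame's elapsed time to a running total, pass that total in as elapsedMs, and swap the mouth texture whenever the returned phoneme changes.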

phonemeTTSErrorHandler is called if there is an error, and receives the error code and its description.

function phonemeTTSErrorHandler(error, description) {
    ...
}
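One simple use of these two parameters is to combine them into a single readable log line. The helper below is a hypothetical sketch (formatTTSError is not part of the asset), and the sample code and description are made up for illustration.

```javascript
// Hypothetical helper: turns the error code and description into one log message.
function formatTTSError(error, description) {
    return 'TTS error ' + error + ': ' + description;
}

// Made-up values for illustration only.
var message = formatTTSError('SAMPLE_CODE', 'sample description');
```

The resulting string could then be logged from inside the error handler.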

Setting Text with Keyboard

The example includes some UI elements to allow user input.

The TTS audio and mouth animation will auto play on Lens start (or when Preview panel is reset). Click the Tap to edit button to edit the text with the keyboard. Click the Tap to play button to play the TTS audio as well as the mouth animation.

The UI Controller script, attached to the UI Controller [EDIT_ME] scene object, handles the UI interaction.

To learn more about Keyboard, please visit the Native Keyboard guide.

To hide the Tap to edit button for image or video capture, use the SnapImageCaptureEvent and SnapRecordStartEvent.

script.createEvent('SnapImageCaptureEvent').bind(function () {
    script.keyboardButtonImage.getSceneObject().enabled = false;
});

script.createEvent('SnapRecordStartEvent').bind(function () {
    script.keyboardButtonImage.getSceneObject().enabled = false;
});
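If you also want the button to come back once a recording ends, the same pattern can be wrapped in a helper. This is a sketch under assumptions: hideDuringCapture is a hypothetical name, and it assumes a SnapRecordStopEvent is available to mark the end of a recording. The script and scene object are passed in as arguments so the logic is easy to follow in isolation.

```javascript
// Hypothetical helper: hides a scene object during capture and shows it
// again when recording stops.
function hideDuringCapture(script, sceneObject) {
    script.createEvent('SnapImageCaptureEvent').bind(function () {
        sceneObject.enabled = false;
    });
    script.createEvent('SnapRecordStartEvent').bind(function () {
        sceneObject.enabled = false;
    });
    // Assumption: SnapRecordStopEvent fires when the user stops recording.
    script.createEvent('SnapRecordStopEvent').bind(function () {
        sceneObject.enabled = true;
    });
}
```

In the example above, it could be called as hideDuringCapture(script, script.keyboardButtonImage.getSceneObject()).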

Blink Animation is achieved with Lips Eye Material and Tween Helper Scripts. You can find the Lips Eye Material in the Asset Browser.

Left-click the material to select it. The Open Close field represents the open or closed state of the eyes. Change the value from 1 to 0 and the eyes appear closed in the Preview panel. With this material, only the upper half of the texture is replaced by the Eye_Close texture.

You can check the Material Editor page to learn more details about the material.

With the Tween Value script located on the Mouth Screen Image object, you can set up a ping pong loop that moves the Open Close value between 0 and 1 to create the blink animation.
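The Tween Value script handles this loop for you, but the underlying math is easy to sketch. The function below is an illustration only (pingPong and its parameters are hypothetical names): it maps elapsed time to a value that rises linearly from 0 to 1 and falls back to 0, repeating indefinitely.

```javascript
// Hypothetical illustration of a ping pong loop between 0 and 1.
// halfCycleSeconds is the time taken to travel from 0 to 1 (or from 1 back to 0).
function pingPong(timeSeconds, halfCycleSeconds) {
    var cycle = halfCycleSeconds * 2;
    var t = timeSeconds % cycle; // position within the current cycle
    if (t <= halfCycleSeconds) {
        return t / halfCycleSeconds; // rising half: 0 to 1
    }
    return (cycle - t) / halfCycleSeconds; // falling half: 1 to 0
}
```

Feeding the result into the Open Close field each frame would open and close the eyes in a continuous loop.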

Animated TTS Conversation Example

The 2D Animated TTS Conversation Example asset is available in the Lens Studio Asset Library. Import the asset to your project and place the prefab under the Camera Object.

In this example, you can modify the lines for the two robot characters. Based on the context, the characters will speak with different voice styles. Click the Tap to play button to play the conversation.

You can edit the lines in the Conversation [EDIT_ME] JavaScript file. Unpack the imported asset if needed, locate the script file and double-click it to edit.

global.characters = ['Character1', 'Character2'];
global.messages = [
    new Message(global.characters[0], 'What a bummer'),
    new Message(global.characters[1], 'Cheer up my friend! What a lovely day.'),
    new Message(
        global.characters[0],
        "No the weather is really bad here. It's raining again"
    ),
    new Message(
        global.characters[1],
        "Don't worry the sun will shine on us again and it will be great!"
    ),
];
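The Message type is defined in the asset and is not shown here. To make the data shape concrete, the sketch below uses a hypothetical stand-in with assumed field names (character and text) and a small helper, createConversation, that hands out the lines one at a time in order. None of these names are confirmed by the asset.

```javascript
// Hypothetical stand-in for the asset's Message type; field names are assumptions.
function Message(character, text) {
    this.character = character;
    this.text = text;
}

// Hypothetical helper: walks through the messages in order.
function createConversation(messages) {
    var index = 0;
    return {
        // Returns the next message, or null once the conversation is over.
        next: function () {
            return index < messages.length ? messages[index++] : null;
        }
    };
}

var characters = ['Character1', 'Character2'];
var conversation = createConversation([
    new Message(characters[0], 'What a bummer'),
    new Message(characters[1], 'Cheer up my friend! What a lovely day.')
]);
var firstLine = conversation.next();
```

Each message returned by next identifies both the speaker and the line, which is the information needed to decide which character should speak it.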

This example uses the same PhonemeController as the previous example. The script is attached to the Pipes TTS [EDIT_ME] scene objects.

The characters are attached to the user's head, and their limbs are dynamically simulated with real-world physics effects. The characters have a Physics Body Component and the user's head has a Physics Collider Component. With the Physics System, the characters are bound to the user's head with a Physics Constraint Component in the Physics Objects -> Robot Right -> Body -> Constraint object.

To learn more about Physics, check out the Physics page.

Previewing Your Lens

You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.

Don’t forget to turn on the sound on your device!
