Version: 5.x

Text To Speech

Text To Speech is available in the Lens Studio Asset Library. Import the asset into your project, create a new Orthographic Camera, and place the prefab under it.

The Text To Speech template demonstrates how you can use Text To Speech in Lenses. The template contains several helpers that you can use to create Text To Speech experiences.

Guide

The template has three examples that show how to use Text To Speech:

  • Greeting Example: Generating a simple greeting Text To Speech audio based on location and weather information.
  • Feeling Example: Generating a Text To Speech audio based on the button selection.
  • Automatic Voice Style Example: Generating a Text To Speech audio with different voice styles based on the context.

When we open the template, we can find the examples in the Scene Hierarchy panel.

Text To Speech Module

The main asset used for Text To Speech is Text To Speech Module. We can find it in the Asset Browser panel.

Currently, TTS supports US English with two voices, six voice styles for each voice, and the ability to adjust the pace of TTS speech playback.

TTS Controller

Text To Speech is driven by the main helper script, TTSController, under the TTS Controller object. We attach the Text To Speech Module and an Audio Component to the TTSController Script Component to generate and play TTS Audio in Lens Studio.

TTS generates an AudioTrackAsset, which can be attached to an Audio Component as its Audio Track asset to play. For more information about the Audio Component, please check out Audio Component and Audio Component API.

With TTS Controller, you can choose different options for TTS Voice: Auto Voice Style Selector, Voice Name, Voice Style and Voice Pace.

Auto Voice Style Selector

With Auto Voice Style Selector enabled, the TTS Audio will have different voice styles based on the context.

Voice Name

TTS supports two voices: Sasha and Sam.

Voice Style

TTS supports six voice styles for Sasha and Sam.

Voice Pace

TTS supports four playback speeds: 0.75X, 1.0X, 1.25X, and 1.5X.
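The supported paces form a small fixed set, so any speed you compute elsewhere has to be mapped onto one of them. The sketch below shows one way to do that. `snapToSupportedPace` is a hypothetical helper, not part of the template, and it only picks a value; setting the pace is still done through the TTS Controller.

```javascript
// Playback speeds listed in this guide.
var SUPPORTED_PACES = [0.75, 1.0, 1.25, 1.5];

// Hypothetical helper: returns the supported pace closest to the requested one.
function snapToSupportedPace(requested) {
    var best = SUPPORTED_PACES[0];
    for (var i = 1; i < SUPPORTED_PACES.length; i++) {
        if (Math.abs(SUPPORTED_PACES[i] - requested) < Math.abs(best - requested)) {
            best = SUPPORTED_PACES[i];
        }
    }
    return best;
}
```

For example, a requested speed of 1.1 would be snapped to 1.0, and anything above 1.5 would be snapped to 1.5.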

Preview TTS

You can preview TTS voice by enabling the Preview TTS checkbox and filling in the Preview Text. The preview text will automatically be spoken on Lens start.

Greeting Example

Now let’s take a look at the greeting example. In this example, we can generate Text To Speech Audio from a Screen Text Object. In addition, we will animate the text based on the timing details for word pronunciation.

TTS Input Text

In the Scene Hierarchy panel, click on the Orthographic Camera -> Greeting UI -> TTS Input Text. We can see here the text input is “Hello there! Now it is WeatherCondition in City!”. Here we are using Dynamic Text as the text input. The WeatherCondition and City will change based on the location and weather information.

Feel free to try different inputs for TTS! Text Input supports English text only. Non-English characters will be stripped.

To get real time data for Dynamic Text, push your Lens to Snapchat.

Greeting Controller

Click on the TTS Examples -> Greeting object. In the Inspector panel, we can see the main script for this example, the GreetingController Script Component.

  • TTS Input Text: uses the TTS Input Text Component as the TTS text input.
  • Tap to Speech Behavior: disables the Tap Behavior script while TTS Audio is playing.
  • Speaker Hint: starts the tween animation when TTS Audio starts playing.

Now let’s take a look at the main functions for generating and playing the TTS Audio.

Speak: pass input text to the getTTSResults global function in the TTS Controller.

function speak(text) {
    global.getTTSResults(text, TTSCompleteHandler, TTSErrorHandler);
}

OnTTSCompleteHandler: called once audio generation is completed. It receives five parameters: Audio Track Asset, WordInfos, Audio Component, PhonemeInfos and Voice Style.

  • Audio Track Asset: the TTS Audio file.
  • WordInfos: timing details for the pronunciation of each word.
  • Audio Component: a reference to the Audio Component in the TTS Controller.
  • PhonemeInfos: phoneme infos that can be used to animate textures.
  • Voice Style: the TTS Voice Style.

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
}

OnTTSErrorHandler: called if there is an error. It receives the error code and a description of the error.

function TTSErrorHandler(error, description) {
    ...
}
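To see how speak and the two handlers fit together, here is a minimal sketch that can run outside Lens Studio. It assumes a stand-in for global.getTTSResults called `fakeGetTTSResults` (hypothetical). The error code, the 300 ms word duration and the "neutral" voice style are made-up values; the real ones come from the TTS Controller.

```javascript
// Hypothetical stand-in for global.getTTSResults, for illustration only.
// It fails on empty text and otherwise reports one fake word info per word.
function fakeGetTTSResults(text, onComplete, onError) {
    if (!text) {
        onError("EMPTY_TEXT", "No text was provided"); // made-up error values
        return;
    }
    var words = text.split(" ");
    var wordInfos = [];
    for (var i = 0; i < words.length; i++) {
        // Assume every word lasts 300 ms; real timings come from the service.
        wordInfos.push({ word: words[i], startTime: i * 300, endTime: (i + 1) * 300 });
    }
    // No real audio track or Audio Component exists here, so pass null.
    onComplete(null, wordInfos, null, [], "neutral");
}

var log = [];

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeInfos, voiceStyle) {
    log.push("complete: " + wordInfos.length + " words");
}

function TTSErrorHandler(error, description) {
    log.push("error: " + error + " (" + description + ")");
}

function speak(text) {
    fakeGetTTSResults(text, TTSCompleteHandler, TTSErrorHandler);
}

speak("Hello there"); // reaches TTSCompleteHandler
speak("");            // reaches TTSErrorHandler
```

Exactly one of the two handlers runs per request, so any state you change before calling speak (such as disabling a tap script) should be restored in both of them.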

Tap to Speech Behavior

In the Scene Hierarchy panel, click on the TTS Examples -> Greeting -> Tap to Speech Behavior object. In the Inspector panel, we can see that we are calling the speakWithTTSInput function in the GreetingController to generate TTS Audio when the screen is tapped.

With the speakWithTTSInput function, we disable the Tap Behavior script while TTS Audio is playing, check that the TTS Input Text is valid, and pass the TTS Input Text to the speak function to generate the TTS Audio.

function speakWithTTSInput() {
    script.tapBehavior.enabled = false;
    if (script.ttsText.text == '' || script.ttsText.text.includes('Unknown')) {
        speak("Hello there! I can't get the weather information");
        return;
    }
    speak(script.ttsText.text);
    script.ttsText.text = 'Generating TTS audio...';
}
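The validity check can also be pulled out into a small pure function, which makes it easy to try different inputs without a scene. This is only a sketch: `chooseSpeechText` and `FALLBACK_TEXT` are hypothetical names, and unlike the snippet above it also treats null and undefined as invalid.

```javascript
var FALLBACK_TEXT = "Hello there! I can't get the weather information";

// Hypothetical helper: returns the text to speak. The input is used when it
// looks valid; otherwise the fallback sentence is returned.
function chooseSpeechText(inputText) {
    if (!inputText || inputText.indexOf("Unknown") !== -1) {
        return FALLBACK_TEXT;
    }
    return inputText;
}
```

With this helper, speakWithTTSInput could call speak(chooseSpeechText(script.ttsText.text)) in a single branch.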

Play TTS Audio  

When audio generation is completed, call playTTSAudio. We pass Audio Track Asset to the Audio Component and play the audio.

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
    playTTSAudio(audioTrackAsset, audioComponent);
}

function playTTSAudio(audioTrackAsset, audioComponent) {
    audioComponent.audioTrack = audioTrackAsset;
    audioComponent.play(1);
    ...
}

Set up Speaker Hint UI when TTS Audio starts playing

In the Scene Hierarchy panel, click on the Orthographic Camera -> TTS UI -> Speaker Hint Screen Image. Here we can see a TweenScreenTransform Script Component.

function playTTSAudio(audioTrackAsset, audioComponent) {
    ...
    global.tweenManager.startTween(script.SpeakerHint, "AUDIOPLAYTWEEN");
    ...
}

Animate Text Component when playing TTS Audio

When audio generation is completed, we save the Word Infos to the timeline array.

var timeline = [];

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    for (var i = 0; i < wordInfos.length; i++) {
        timeline[i] = {time: wordInfosTimeToSecond(wordInfos[i].startTime), word: wordInfos[i].word};
    }
    ...
}

function wordInfosTimeToSecond(time) {
    return time / 1000;
}

Each Word Info contains a word that the synthesized audio was generated for. Because the text might be expanded during synthesis, there can be slight variations between the input text and the words returned.

The Start Time and End Time of a Word Info give the times, in milliseconds, at which the word starts and ends in the audio.
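As a minimal sketch of that conversion, the function below turns a list of Word Infos into timeline entries in seconds. `buildTimeline` is a hypothetical name and the sample data is made up; it only assumes the `word`, `startTime` and `endTime` fields used in the snippets in this guide.

```javascript
// Converts Word Infos (times in milliseconds) into timeline entries
// (times in seconds). Hypothetical helper.
function buildTimeline(wordInfos) {
    var result = [];
    for (var i = 0; i < wordInfos.length; i++) {
        result.push({
            time: wordInfos[i].startTime / 1000,
            word: wordInfos[i].word
        });
    }
    return result;
}

// Made-up sample data in the shape described above.
var sampleWordInfos = [
    { word: "Hello", startTime: 0, endTime: 400 },
    { word: "there", startTime: 450, endTime: 900 }
];
var sampleTimeline = buildTimeline(sampleWordInfos);
```

Because the function returns a fresh array on every call, entries from a previous sentence cannot linger in the result.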

We also set the ttsStart boolean to true and reset the TTS Input Text to an empty string.

var ttsStart = false;

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
    playTTSAudio(audioTrackAsset, audioComponent);
}

function playTTSAudio(audioTrackAsset, audioComponent) {
    ...
    script.ttsText.text = "";
    ...
    ttsStart = true;
}

When ttsStart is true, we animate the text in the UpdateEvent based on the Word Infos stored in the timeline, and reset the variables once the last word has been shown.

var updateTime = 0;
var currentCount = 0;
script.createEvent("UpdateEvent").bind(function(eventData) {
    ...
    if (!ttsStart) {
        return;
    }
    updateTime = updateTime + eventData.getDeltaTime();

    // Show the next word once its start time has been reached.
    // The length check avoids reading past the end of an empty timeline.
    if (currentCount < timeline.length && updateTime >= timeline[currentCount].time) {
        script.ttsText.text = script.ttsText.text + " " + timeline[currentCount].word;
        currentCount++;
    }

    // All words have been shown: reset the state and re-enable tapping.
    if (currentCount >= timeline.length) {
        ttsStart = false;
        currentCount = 0;
        updateTime = 0;
        timeline = [];
        script.tapBehavior.enabled = true;
    }
});
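The update step can also be written as a pure function, which makes the word-by-word reveal easy to reason about. The sketch below is a hypothetical variant, not the template's code: `advanceTimeline` takes the current state and the frame's delta time in seconds, reveals every word whose start time has passed (possibly several after a long frame), and reports when the timeline is finished.

```javascript
// Hypothetical pure version of the update step.
function advanceTimeline(state, deltaTime) {
    var next = {
        timeline: state.timeline,
        elapsed: state.elapsed + deltaTime,
        index: state.index,
        text: state.text,
        done: false
    };
    // Reveal every word whose start time has been reached. The loop does
    // nothing for an empty timeline.
    while (next.index < next.timeline.length &&
           next.elapsed >= next.timeline[next.index].time) {
        next.text = next.text + " " + next.timeline[next.index].word;
        next.index++;
    }
    next.done = next.index >= next.timeline.length;
    return next;
}

var state = {
    timeline: [{ time: 0, word: "Hello" }, { time: 0.45, word: "there" }],
    elapsed: 0,
    index: 0,
    text: "",
    done: false
};
state = advanceTimeline(state, 0.1); // elapsed 0.1 s: reveals "Hello"
var afterFirstFrame = state.text;
state = advanceTimeline(state, 0.5); // elapsed 0.6 s: reveals "there"
```

As in the snippet above, each word is appended with a leading space, so the text starts with a space unless you trim it.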

Feeling Example

Now let’s take a look at the second example. Enable the Feeling Example in the Scene Hierarchy panel and reset the Preview!

With this example, we can use 3D buttons to choose subjects and emotions that make up a simple sentence, and pass the sentence as the text input to generate TTS audio. Once the TTS Audio has been generated, we can play, pause, and resume it with the play button.

Player UI Controller

In the Scene Hierarchy panel, click on the TTS Examples -> Feeling -> UI Object -> Player UI object. When the Play button is pressed and the player state is the Reset State, we will call the speak function to generate the TTS Audio based on the text result from the Words Combination Controller’s button selections.

function buttonPressed(text) {
    if (playerState == PlayerState.RESET) {
        ...
        speak(text);
    }
    ...
}

Here we also set up button press animation and the Loading Icon UI for Loading State.

function buttonPressed(text) {
    if (playerState == PlayerState.RESET) {
        setButtonPress();
        setLoading();
        ...
    }
    ...
}

As in the first example, we pass the text to the getTTSResults global function in the TTS Controller.

function speak(text) {
    global.getTTSResults(text, TTSCompleteHandler, TTSErrorHandler);
}

OnTTSCompleteHandler: called once audio generation is completed. It receives five parameters: Audio Track Asset, WordInfos, Audio Component, PhonemeInfos and Voice Style.

  • Audio Track Asset: the TTS Audio file.
  • WordInfos: timing details for the pronunciation of each word.
  • Audio Component: a reference to the Audio Component in the TTS Controller.
  • PhonemeInfos: phoneme infos that can be used to animate textures.
  • Voice Style: the TTS Voice Style.

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
}

OnTTSErrorHandler: called if there is an error. It receives the error code and a description of the error.

function TTSErrorHandler(error, description) {
    ...
}

Play TTS Audio

When audio generation is completed and the total audio time from the Word Infos is valid, we call setTimeline. We pass it the Audio Track Asset and the total time of the TTS Audio, and it assigns the track to the Audio Component and plays it.

var ttsAudio;

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ttsAudio = audioComponent;
    var totalTime = wordInfosTimeToSecond(wordInfos[wordInfos.length - 1].endTime);
    if (totalTime) {
        setTimeline(audioTrackAsset, totalTime);
    }
}

function setTimeline(audioTrackAsset, totalTime) {
    ttsAudio.audioTrack = audioTrackAsset;
    ttsAudio.play(1);
    ...
}
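The total time is read from the last Word Info, which would fail if the list were ever empty. A defensive variant could look like the sketch below. `getTotalTimeSeconds` is a hypothetical helper, and whether the service can actually return an empty list is not stated in this guide.

```javascript
// Hypothetical helper: returns the end time of the last word in seconds,
// or 0 when no Word Infos are available.
function getTotalTimeSeconds(wordInfos) {
    if (!wordInfos || wordInfos.length === 0) {
        return 0;
    }
    return wordInfos[wordInfos.length - 1].endTime / 1000;
}
```

Since 0 is falsy, the existing if (totalTime) check would skip setTimeline in the empty case.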

Set up UI when TTS Audio starts playing

Here we also set up Speaker Hint, timeline text, timeline animation and the Playing icon UI for Playing State.

function setTimeline(audioTrackAsset, totalTime) {
    ...
    script.timelineText.text = getTimeFormat(totalTime) + " || 00:00";
    global.tweenManager.startTween(script.SpeakerHint, "AUDIOPLAYTWEEN");
    currentTotalTime = totalTime;
    setPlaying();
}
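The getTimeFormat helper is not shown in this guide. Judging by the "00:00" placeholder, it probably formats a duration in seconds as minutes and seconds; the sketch below is one possible implementation under that assumption, not the template's actual code.

```javascript
// Hypothetical implementation: formats a duration in seconds as "mm:ss".
// Fractions of a second are dropped and negative values are treated as 0.
function getTimeFormat(totalSeconds) {
    var whole = Math.max(0, Math.floor(totalSeconds));
    var minutes = Math.floor(whole / 60);
    var seconds = whole % 60;
    return (minutes < 10 ? "0" : "") + minutes + ":" +
           (seconds < 10 ? "0" : "") + seconds;
}
```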

Animate Timeline Object when playing TTS Audio

When audio generation is completed, we set currentTotalTime to the total time of the TTS Audio, taken from the Word Infos.

var currentTotalTime = 0;

function setTimeline(audioTrackAsset, totalTime) {
    ...
    currentTotalTime = totalTime;
    setPlaying();
}

While playerState is the Playing State, we animate the timeline object in the UpdateEvent, and reset the player when the audio ends.

var updateTime = 0;
script.createEvent("UpdateEvent").bind(function(eventData) {
    ...
    if (playerState !== PlayerState.PLAYING) {
        return;
    }
    updateTime = updateTime + eventData.getDeltaTime();
    script.timelineMaterial.mainPass.progressBar = map(updateTime, 0, currentTotalTime, 0, 1);
    if (updateTime >= currentTotalTime) {
        resetPlayer();
    }
});
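The map helper is not shown in this guide. From the way it is called, it remaps the elapsed time from the range 0 to currentTotalTime onto a progress value from 0 to 1. The sketch below is one possible implementation; the clamping and the zero-length-range guard are assumptions, not confirmed template behavior.

```javascript
// Hypothetical implementation of map: linearly remaps value from
// [inMin, inMax] to [outMin, outMax], clamping to the output range.
function map(value, inMin, inMax, outMin, outMax) {
    if (inMax === inMin) {
        return outMin; // avoid dividing by zero for a zero-length range
    }
    var t = (value - inMin) / (inMax - inMin);
    t = Math.min(1, Math.max(0, t));
    return outMin + t * (outMax - outMin);
}
```

Clamping keeps the progress bar at 1 if the last frame overshoots the total time slightly.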

Pause/Resume the Audio when playing TTS Audio

If the player button is pressed while playerState is the Playing State, we pause the audio. If it is pressed while the audio is paused, we resume the audio.

function buttonPressed(text) {
    if (playerState == PlayerState.RESET) {
        ...
    } else if (playerState == PlayerState.PLAYING) {
        setPause();
    } else if (playerState == PlayerState.PAUSED) {
        setResume();
    }
}

function setResume() {
    if (ttsAudio.isPaused()) {
        ttsAudio.resume();
        setPlaying();
    }
}

function setPause() {
    if (ttsAudio.isPlaying()) {
        playerState = PlayerState.PAUSED;
        ttsAudio.pause();
        script.playerButton.mainPass.baseTex = script.playTexture;
        setButtonRelease();
    }
}
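The button logic amounts to a small state machine. The sketch below models only the transitions, without any audio or UI calls. The numeric values of `PlayerState`, the `LOADING` entry and the choice to ignore presses while loading are assumptions made for illustration.

```javascript
// Assumed state values; the template defines its own PlayerState object.
var PlayerState = { RESET: 0, LOADING: 1, PLAYING: 2, PAUSED: 3 };

// Hypothetical helper: returns the state the player moves to when the
// button is pressed in the given state.
function nextStateOnPress(state) {
    switch (state) {
        case PlayerState.RESET:
            return PlayerState.LOADING; // speak() was called, audio is being generated
        case PlayerState.PLAYING:
            return PlayerState.PAUSED;
        case PlayerState.PAUSED:
            return PlayerState.PLAYING;
        default:
            return state; // assumption: presses are ignored while loading
    }
}
```

The move from the Loading State to the Playing State is not triggered by the button; it happens when TTSCompleteHandler calls setTimeline.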

For more information related to Audio Component, please check out Audio Component.

Automatic Voice Style Example

Now let’s take a look at the third example. Enable the Auto Voice Style Example in the Scene Hierarchy panel and Reset the Preview!

With this example, we can swipe left and right to pass different sentences as the text input to generate TTS audio. The audio will be returned with different voice styles based on the content. Different voice style icons will light up based on the voice style result.

UI Discrete Picker

With the UI Discrete Picker under the Orthographic Camera -> Auto Voice Style Example -> UI Panel -> UI Discrete Picker object, you can swipe to change the text option used as the TTS input.

Auto Voice Style Controller

As in the first two examples, we pass the text to the getTTSResults global function in the TTS Controller. In OnTTSCompleteHandler, we receive the Voice Style of the TTS Audio.

function TTSCompleteHandler(
    audioTrackAsset,
    wordInfos,
    audioComponent,
    phonemeInfo,
    voiceStyle
) {
    for (var i = 0; i < wordInfos.length; i++) {
        timeline[i] = {
            time: wordInfosTimeToSecond(wordInfos[i].startTime),
            word: wordInfos[i].word,
        };
    }

    if (script.IconController.api.onVoiceStyleDetected) {
        script.IconController.api.onVoiceStyleDetected(voiceStyle);
    }

    playTTSAudio(audioTrackAsset, audioComponent);
}

Icon Controller

Then we pass the Voice Style to Icon Controller under Orthographic Camera -> Auto Voice Style Example -> Icon Panel object to change the base color of the icon.
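The Icon Controller's own code is not shown in this guide. As a rough sketch of the idea, the function below decides which icon should light up for a detected voice style. `getIconHighlights` and the style identifiers are placeholders; the real values are whatever the voiceStyle argument of TTSCompleteHandler contains.

```javascript
// Hypothetical helper: given the voice style assigned to each icon and the
// detected style, returns one boolean per icon (true = highlighted).
function getIconHighlights(iconStyles, detectedStyle) {
    var highlights = [];
    for (var i = 0; i < iconStyles.length; i++) {
        highlights.push(iconStyles[i] === detectedStyle);
    }
    return highlights;
}

// Placeholder style identifiers, for illustration only.
var iconStyles = ["style1", "style2", "style3"];
var highlights = getIconHighlights(iconStyles, "style2");
```

In a Lens, each boolean would then drive the base color of the matching icon, for example a bright color when true and a dimmed one when false.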

Previewing Your Lens

You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.

Don’t forget to turn on the sound on your device!
