Text To Speech
Text To Speech is available in the Lens Studio Asset Library. Import the asset into your project, create a new Orthographic camera, and place the prefab under it.
The Text To Speech template demonstrates how to use Text To Speech (TTS) in Lenses. The template contains several helpers that you can use to create text-to-speech experiences.
Guide
The template has three examples that show how to use Text To Speech:
- Greeting Example: Generating a simple greeting Text To Speech audio based on location and weather information.
- Feeling Example: Generating a Text To Speech audio based on the button selection.
- Automatic Voice Style Example: Generating a Text To Speech audio with different voice styles based on the context.
When we open the template, we can find the examples in the Scene Hierarchy panel.
Text To Speech Module
The main asset used for Text To Speech is the Text To Speech Module. We can find it in the Asset Browser panel.
Currently, TTS supports US English with two voices, six voice styles for each voice, and the ability to adjust the pace of TTS speech playback.
TTS Controller
Text To Speech is driven by the main helper script, TTSController, under the TTS Controller object. We attach the Text To Speech Module and an Audio Component to the TTSController Script Component to generate and play TTS Audio in Lens Studio.
TTS generates an AudioTrackAsset, which can be attached to an Audio Component as the Audio Track asset to play. For more information about the Audio Component, please check out Audio Component and Audio Component API.
With TTS Controller, you can choose different options for TTS Voice: Auto Voice Style Selector, Voice Name, Voice Style and Voice Pace.
Auto Voice Style Selector
With Auto Voice Style Selector enabled, the TTS Audio will have different voice styles based on the context.
Voice Name
TTS supports two voices: Sasha and Sam.
Voice Style
TTS supports six voice styles for Sasha and Sam.
Voice Pace
TTS supports four playback speeds: 0.75X, 1.0X, 1.25X, and 1.5X.
Preview TTS
You can preview TTS voice by enabling the Preview TTS checkbox and filling in the Preview Text. The preview text will automatically be spoken on Lens start.
Greeting Example
Now let’s take a look at the greeting example. In this example, we can generate Text To Speech Audio from a Screen Text Object. In addition, we will animate the text based on the timing details for word pronunciation.
TTS Input Text
In the Scene Hierarchy panel, click on Orthographic Camera -> Greeting UI -> TTS Input Text. Here the text input is “Hello there! Now it is WeatherCondition in City!”. We are using Dynamic Text as the text input. The WeatherCondition and City placeholders will change based on the location and weather information.
Feel free to try different inputs for TTS! Text Input supports text in English only. Non-English characters will be stripped.
To get real time data for Dynamic Text, push your Lens to Snapchat.
Greeting Controller
Click on the TTS Examples -> Greeting object. We can see the main script for this example, the GreetingController Script Component, in the Inspector panel.
- TTS Input Text: the Text Component used as the TTS text input.
- Tap to Speech Behavior: the Tap Behavior script, which is disabled while TTS Audio is playing.
- Speaker Hint: the object whose tween animation starts when TTS Audio starts playing.
Now let’s take a look at the main functions for generating and playing the TTS Audio.
Speak: passes the input text to the getTTSResults global function in the TTS Controller.
function speak(text) {
    global.getTTSResults(text, TTSCompleteHandler, TTSErrorHandler);
}
OnTTSCompleteHandler: called once the audio generation is completed. It receives five parameters:
- Audio Track Asset: the TTS audio file.
- WordInfos: word infos with timing details for word pronunciation.
- Audio Component: a reference to the Audio Component in the TTS Controller.
- PhonemeInfos: phoneme infos to animate textures.
- Voice Style: the TTS voice style.
function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
}
OnTTSErrorHandler: called if there is an error. It receives the error code and its description.
function TTSErrorHandler(error, description) {
    ...
}
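To see how the two handlers fit together outside Lens Studio, the sketch below swaps global.getTTSResults for a hypothetical stand-in named fakeGetTTSResults. The stand-in, its empty-text error values, and the made-up word timings are assumptions for illustration only; the real function synthesizes audio through the Text To Speech Module.

```javascript
// Hypothetical stand-in for global.getTTSResults, for illustration only.
// It calls the error handler for empty text and otherwise calls the
// complete handler with made-up word timings (300 ms per word).
function fakeGetTTSResults(text, onComplete, onError) {
    if (typeof text !== "string" || text.trim() === "") {
        onError("EMPTY_TEXT", "No text was provided."); // assumed error values
        return;
    }
    var words = text.trim().split(/\s+/);
    var wordInfos = words.map(function (word, i) {
        return { word: word, startTime: i * 300, endTime: (i + 1) * 300 };
    });
    // The remaining handler arguments are placeholders in this sketch.
    onComplete(null, wordInfos, null, [], "placeholder-style");
}

var handlerLog = [];

function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeInfos, voiceStyle) {
    handlerLog.push("complete: " + wordInfos.length + " words");
}

function TTSErrorHandler(error, description) {
    handlerLog.push("error: " + error + " - " + description);
}

fakeGetTTSResults("Hello there", TTSCompleteHandler, TTSErrorHandler);
fakeGetTTSResults("", TTSCompleteHandler, TTSErrorHandler);
```

Exactly one of the two handlers runs per request, so any UI that was put into a waiting state should be restored in both of them.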
Tap to Speech Behavior
In the Scene Hierarchy panel, click on the TTS Examples -> Greeting -> Tap to Speech Behavior object. In the Inspector panel, we can see that tapping on the screen calls the speakWithTTSInput function in the GreetingController to generate TTS Audio.
In the speakWithTTSInput function, we disable the Tap Behavior script while TTS Audio is playing, check that the TTS Input Text is valid, and pass the TTS Input Text to the speak function to generate the TTS Audio.
function speakWithTTSInput() {
    // Ignore further taps until playback has finished.
    script.tapBehavior.enabled = false;
    // Fall back to a fixed sentence when the Dynamic Text is empty or unresolved.
    if (script.ttsText.text == '' || script.ttsText.text.includes('Unknown')) {
        speak("Hello there! I can't get the weather information");
        return;
    }
    speak(script.ttsText.text);
    script.ttsText.text = 'Generating TTS audio...';
}
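The validity check above can be pulled out into a small pure function, which makes it easy to try outside Lens Studio. This is a minimal sketch: resolveTTSInput is a hypothetical helper name, not part of the template, and it also treats null or undefined text as empty, which the template code does not do.

```javascript
// Fallback sentence taken from the template's speakWithTTSInput function.
var FALLBACK_TEXT = "Hello there! I can't get the weather information";

// Hypothetical helper: returns the text that should be sent to TTS.
// Empty text or an unresolved "Unknown" placeholder triggers the fallback.
function resolveTTSInput(text) {
    if (!text || text.indexOf("Unknown") !== -1) {
        return FALLBACK_TEXT;
    }
    return text;
}
```

With this helper, speakWithTTSInput could call speak(resolveTTSInput(script.ttsText.text)) on a single code path.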
Play TTS Audio
When audio generation is completed, we call playTTSAudio. We pass the Audio Track Asset to the Audio Component and play the audio.
function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
    playTTSAudio(audioTrackAsset, audioComponent);
}
function playTTSAudio(audioTrackAsset, audioComponent) {
    audioComponent.audioTrack = audioTrackAsset;
    // Play the track once.
    audioComponent.play(1);
    ...
}
Set up Speaker Hint UI when TTS Audio starts playing
In the Scene Hierarchy panel, click on Orthographic Camera -> TTS UI -> Speaker Hint Screen Image. Here we can see a TweenScreenTransform Script Component. The controller starts its tween, named AUDIOPLAYTWEEN, when playback begins.
function playTTSAudio(audioTrackAsset, audioComponent) {
    ...
    global.tweenManager.startTween(script.SpeakerHint, "AUDIOPLAYTWEEN");
    ...
}
Animate Text Component when playing TTS Audio
When audio generation is completed, we save the Word Infos to the timeline array.
var timeline = [];
function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    for (var i = 0; i < wordInfos.length; i++) {
        timeline[i] = {time: wordInfosTimeToSecond(wordInfos[i].startTime), word: wordInfos[i].word};
    }
    ...
}
function wordInfosTimeToSecond(time) {
    return time / 1000;
}
Each Word Info contains the word the synthesized audio was generated for. Because text might be expanded during synthesis, the returned words can differ slightly from the input text.
The Start Time and End Time give the times, in milliseconds, at which the word starts and ends in the audio.
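The conversion from Word Infos to a timeline can be tried outside Lens Studio with plain objects. In this sketch, buildTimeline is a hypothetical helper that returns a new array instead of filling the shared timeline variable, and the sample word infos are made up in the shape described above (word, startTime, and endTime in milliseconds).

```javascript
// Convert a millisecond timestamp to seconds, as in the template.
function wordInfosTimeToSecond(time) {
    return time / 1000;
}

// Hypothetical helper: build the timeline without mutating shared state.
function buildTimeline(wordInfos) {
    return wordInfos.map(function (info) {
        return { time: wordInfosTimeToSecond(info.startTime), word: info.word };
    });
}

// Made-up word infos for illustration.
var sampleWordInfos = [
    { word: "Hello", startTime: 0, endTime: 400 },
    { word: "there", startTime: 450, endTime: 900 }
];
var sampleTimeline = buildTimeline(sampleWordInfos);
```

Returning a fresh array avoids leftover entries from a previous, longer sentence, which can happen when a shared array is overwritten index by index.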
We also set the ttsStart boolean to true and reset the TTS Input Text to empty.
var ttsStart = false;
function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
    playTTSAudio(audioTrackAsset, audioComponent);
}
function playTTSAudio(audioTrackAsset, audioComponent) {
    ...
    script.ttsText.text = "";
    ...
    ttsStart = true;
}
With the ttsStart boolean set to true, we start to animate the text in the UpdateEvent, based on the Word Infos stored in the timeline, and reset the variables once the last word has been shown.
var updateTime = 0;
var currentCount = 0;
script.createEvent("UpdateEvent").bind(function(eventData) {
    ...
    if (!ttsStart) {
        return;
    }
    updateTime = updateTime + eventData.getDeltaTime();
    // Append the next word once its start time has been reached.
    // The length check avoids reading from an empty timeline.
    if (currentCount < timeline.length && updateTime >= timeline[currentCount].time) {
        script.ttsText.text = script.ttsText.text + " " + timeline[currentCount].word;
        currentCount++;
    }
    // All words shown: reset the state and re-enable tapping.
    if (currentCount == timeline.length) {
        ttsStart = false;
        currentCount = 0;
        updateTime = 0;
        timeline = [];
        script.tapBehavior.enabled = true;
    }
});
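The word-by-word reveal can be simulated without Lens Studio by feeding frame times into a small object. This is a sketch under assumptions: createWordRevealer is a hypothetical helper, not template code. Unlike the template, it uses a while loop so that several words can appear during one long frame, and it joins words with single spaces rather than prepending a space to each word.

```javascript
// Returns a revealer whose update(deltaTime) advances time by deltaTime
// seconds and returns the text visible so far. Hypothetical helper.
function createWordRevealer(timeline) {
    var elapsed = 0;
    var index = 0;
    var shown = [];
    return {
        update: function (deltaTime) {
            elapsed += deltaTime;
            // A while loop lets several words appear in one long frame.
            while (index < timeline.length && elapsed >= timeline[index].time) {
                shown.push(timeline[index].word);
                index++;
            }
            return shown.join(" ");
        },
        isDone: function () {
            return index >= timeline.length;
        }
    };
}

var revealer = createWordRevealer([
    { time: 0, word: "Hello" },
    { time: 0.5, word: "there" },
    { time: 0.6, word: "friend" }
]);
var afterFirstFrame = revealer.update(0.1); // only "Hello" is due
var afterLongFrame = revealer.update(1.0);  // both remaining words are due
```

An empty timeline is reported as done immediately, so a caller can reset its state without reading any timeline entry.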
Feeling Example
Now let’s take a look at the second example. Enable the Feeling Example in the Scene Hierarchy panel and reset the Preview!
With this example, we can use 3D buttons to choose subjects and emotions that make up a simple sentence, and pass the sentence as the text input to generate TTS audio. Once the TTS Audio is successfully generated, we can play, pause, and resume the audio with the play button.
Player UI Controller
In the Scene Hierarchy panel, click on the TTS Examples -> Feeling -> UI Object -> Player UI object.
When the Play button is pressed and the player state is the Reset State, we call the speak function to generate the TTS Audio based on the text result from the Words Combination Controller’s button selections.
function buttonPressed(text) {
    if (playerState == PlayerState.RESET) {
        ...
        speak(text);
    }
    ...
}
Here we also set up the button press animation and the Loading Icon UI for the Loading State.
function buttonPressed(text) {
    if (playerState == PlayerState.RESET) {
        setButtonPress();
        setLoading();
        ...
    }
    ...
}
As in the first example, we pass the text to the getTTSResults global function in the TTS Controller.
function speak(text) {
    global.getTTSResults(text, TTSCompleteHandler, TTSErrorHandler);
}
OnTTSCompleteHandler: called once the audio generation is completed. It receives five parameters:
- Audio Track Asset: the TTS audio file.
- WordInfos: word infos with timing details for word pronunciation.
- Audio Component: a reference to the Audio Component in the TTS Controller.
- PhonemeInfos: phoneme infos to animate textures.
- Voice Style: the TTS voice style.
function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ...
}
OnTTSErrorHandler: called if there is an error. It receives the error code and its description.
function TTSErrorHandler(error, description) {
    ...
}
Play TTS Audio
When audio generation is completed and the Total Audio Time from the Word Infos is valid, we call setTimeline. We pass the Audio Track Asset and the Total Time for the TTS Audio to setTimeline, which assigns the track to the Audio Component and plays it.
var ttsAudio;
function TTSCompleteHandler(audioTrackAsset, wordInfos, audioComponent, phonemeinfo, voiceStyle) {
    ttsAudio = audioComponent;
    // The end time of the last word is used as the total audio time.
    var totalTime = wordInfosTimeToSecond(wordInfos[wordInfos.length - 1].endTime);
    if (totalTime) {
        setTimeline(audioTrackAsset, totalTime);
    }
}
function setTimeline(audioTrackAsset, totalTime) {
    ttsAudio.audioTrack = audioTrackAsset;
    // Play the track once.
    ttsAudio.play(1);
    ...
}
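Reading the last Word Info directly would throw if the Word Infos array were ever empty. The sketch below shows one way to guard that lookup; getTotalTimeSeconds is a hypothetical helper, and whether the service can actually return an empty array is an assumption, not something the guide states.

```javascript
// Hypothetical helper: total audio length in seconds, or 0 when no
// usable word timing is available.
function getTotalTimeSeconds(wordInfos) {
    if (!wordInfos || wordInfos.length === 0) {
        return 0;
    }
    var endTime = wordInfos[wordInfos.length - 1].endTime;
    // Treat missing or non-positive end times as "no valid total time".
    return typeof endTime === "number" && endTime > 0 ? endTime / 1000 : 0;
}
```

Because the helper returns 0 for every invalid case, the existing if (totalTime) check keeps working unchanged.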
Set up UI when TTS Audio starts playing
Here we also set up the Speaker Hint, timeline text, timeline animation, and the Playing icon UI for the Playing State.
function setTimeline(audioTrackAsset, totalTime) {
    ...
    script.timelineText.text = getTimeFormat(totalTime) + " || 00:00";
    global.tweenManager.startTween(script.SpeakerHint, "AUDIOPLAYTWEEN");
    currentTotalTime = totalTime;
    setPlaying();
}
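The guide does not show getTimeFormat. Judging by the " || 00:00" suffix in the timeline text, it probably produces a zero-padded minutes and seconds string, but that is an assumption. The hypothetical formatSeconds below is one way such a helper could look; it is not the template's implementation.

```javascript
// Hypothetical stand-in for the template's getTimeFormat helper, which the
// guide does not show. Formats a duration in seconds as zero-padded MM:SS.
function formatSeconds(totalSeconds) {
    var whole = Math.max(0, Math.floor(totalSeconds));
    var minutes = Math.floor(whole / 60);
    var seconds = whole % 60;
    function pad(n) {
        return n < 10 ? "0" + n : String(n);
    }
    return pad(minutes) + ":" + pad(seconds);
}
```

Fractional seconds are dropped rather than rounded, so the displayed time never runs ahead of the audio.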
Animate Timeline Object when playing TTS Audio
When audio generation is completed, we set currentTotalTime to the Total Time for the TTS Audio, taken from the Word Infos.
var currentTotalTime = 0;
function setTimeline(audioTrackAsset, totalTime) {
    ...
    currentTotalTime = totalTime;
    setPlaying();
}
With playerState set to the Playing State, we start to animate the timeline object in the UpdateEvent, and reset the player when the audio ends.
var updateTime = 0;
script.createEvent("UpdateEvent").bind(function(eventData) {
    ...
    if (playerState !== PlayerState.PLAYING) {
        return;
    }
    updateTime = updateTime + eventData.getDeltaTime();
    // Map the elapsed time to a 0..1 progress value for the timeline material.
    script.timelineMaterial.mainPass.progressBar = map(updateTime, 0, currentTotalTime, 0, 1);
    if (updateTime >= currentTotalTime) {
        resetPlayer();
    }
});
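The map helper used for the progress bar is not shown in the guide. A linear remap is the usual meaning of such a function, so the sketch below assumes that behavior. mapRange is a hypothetical name, and the clamping is an addition for illustration, not something the template is known to do.

```javascript
// Hypothetical stand-in for the template's map helper. Linearly remaps
// value from [inMin, inMax] to [outMin, outMax] and clamps the result so
// the progress bar never overshoots.
function mapRange(value, inMin, inMax, outMin, outMax) {
    if (inMax === inMin) {
        return outMin; // avoid dividing by zero for a zero-length input range
    }
    var t = (value - inMin) / (inMax - inMin);
    t = Math.min(1, Math.max(0, t));
    return outMin + t * (outMax - outMin);
}
```

For example, one second into a four-second clip, mapRange(1, 0, 4, 0, 1) gives a progress value of 0.25.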
Pause/Resume the Audio when playing TTS Audio
With playerState set to the Playing State, pressing the player button pauses the audio. If the audio is paused, pressing the button resumes it.
function buttonPressed(text) {
    if (playerState == PlayerState.RESET) {
        ...
    } else if (playerState == PlayerState.PLAYING) {
        setPause();
    } else if (playerState == PlayerState.PAUSED) {
        setResume();
    }
}
function setResume() {
    if (ttsAudio.isPaused()) {
        ttsAudio.resume();
        setPlaying();
    }
}
function setPause() {
    if (ttsAudio.isPlaying()) {
        playerState = PlayerState.PAUSED;
        ttsAudio.pause();
        script.playerButton.mainPass.baseTex = script.playTexture;
        setButtonRelease();
    }
}
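The player logic above amounts to a small state machine. The sketch below models only the transitions, with no audio or UI calls. The LOADING value is inferred from the Loading State mentioned earlier, and nextPlayerState and its event names are hypothetical and not part of the template.

```javascript
// States named in the guide; LOADING is assumed from "Loading State".
var PlayerState = { RESET: 0, LOADING: 1, PLAYING: 2, PAUSED: 3 };

// Hypothetical pure transition function: given the current state and an
// event name, returns the next state. Unknown combinations keep the state.
function nextPlayerState(state, event) {
    switch (event) {
        case "buttonPressed":
            if (state === PlayerState.RESET) { return PlayerState.LOADING; }
            if (state === PlayerState.PLAYING) { return PlayerState.PAUSED; }
            if (state === PlayerState.PAUSED) { return PlayerState.PLAYING; }
            return state; // ignore presses while loading
        case "audioReady":
            return state === PlayerState.LOADING ? PlayerState.PLAYING : state;
        case "audioEnded":
            return state === PlayerState.PLAYING ? PlayerState.RESET : state;
        default:
            return state;
    }
}
```

Keeping the transitions in one function makes it easy to check that, for instance, a second button press while the audio is still being generated does not start another request.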
For more information related to Audio Component, please check out Audio Component.
Automatic Voice Style Example
Now let’s take a look at the third example. Enable the Auto Voice Style Example in the Scene Hierarchy panel and reset the Preview!
With this example, we can swipe left and right to pass different sentences as the text input to generate TTS audio. The audio will be returned with different voice styles based on the content. Different voice style icons will light up based on the voice style result.
UI Discrete Picker
With the UI Discrete Picker under the Orthographic Camera -> Auto Voice Style Example -> UI Panel -> UI Discrete Picker object, you can swipe to change the text option used as TTS input.
Auto Voice Style Controller
As in the first two examples, we pass the text to the getTTSResults global function in the TTS Controller. In OnTTSCompleteHandler, we get the Voice Style for the TTS Audio.
function TTSCompleteHandler(
    audioTrackAsset,
    wordInfos,
    audioComponent,
    phonemeInfo,
    voiceStyle
) {
    for (var i = 0; i < wordInfos.length; i++) {
        timeline[i] = {
            time: wordInfosTimeToSecond(wordInfos[i].startTime),
            word: wordInfos[i].word,
        };
    }
    // Notify the Icon Controller only if it exposes the callback.
    if (script.IconController.api.onVoiceStyleDetected) {
        script.IconController.api.onVoiceStyleDetected(voiceStyle);
    }
    playTTSAudio(audioTrackAsset, audioComponent);
}
Icon Controller
Then we pass the Voice Style to the Icon Controller under the Orthographic Camera -> Auto Voice Style Example -> Icon Panel object to change the base color of the icon.
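The guide does not show the Icon Controller code. As a rough sketch of the selection logic only, the hypothetical getIconStates below returns one flag per icon, assuming that the voice style arrives as a number from 1 to the number of icons. The actual type and range of the Voice Style value are not stated in this guide, so treat both as assumptions.

```javascript
// Hypothetical helper: returns one flag per icon, true only for the icon
// that matches the detected style. Assumes styles are numbered 1..iconCount.
function getIconStates(voiceStyle, iconCount) {
    var states = [];
    for (var i = 1; i <= iconCount; i++) {
        states.push(i === voiceStyle);
    }
    return states;
}
```

A controller could then loop over the flags and set a highlighted base color on the icon whose flag is true and a dimmed color on the rest. An unrecognized style leaves every icon dimmed.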
Previewing Your Lens
You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.
Don’t forget to turn on the sound on your device!