Voice UI
This sample project is available on the Lens Studio Home Page.
The Voice UI template demonstrates how you can use Speech Recognition to incorporate voice navigation command detection, based on basic natural language understanding, into Lenses. The template contains several helpers that you can use to create voice experiences.
To learn more about Speech Recognition, please check out the Speech Recognition Guide for more detailed explanations of the concepts and scripting. We recommend going through the Speech Recognition Template first to get a general understanding before diving into this template.
Guide
The template shows how to use voice navigation command detection with Speech Recognition. The Voice Enabled UI Example detects a list of in-Lens navigation commands based on basic natural language understanding on top of transcription.
Voice Navigation Command List
This is the list of in-Lens navigation commands we currently support: "next", "back", "left", "right", "up", "down", "different", "first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth", "tenth".
A Voice Navigation Command is different from a Keyword: we don’t have to say the exact word to trigger the voice command. Take “Back” as an example. We can say “go back”, “go to the previous one”, “the one before”, etc. To learn more about keyword detection, please check the Speech Recognition Template.
- Transcription is currently available only for the English language; its limitations include, for example, new names for things, slang words, and acute accents.
- Do not play sound or speech from the Lens while the microphone is active and capturing sound.
- Try to avoid background noise and keep the device close while the microphone is active and capturing sound.
- If the microphone is muted for more than two minutes, transcription won't continue after unmuting, and you’ll need to reset the Preview panel to re-enable it.
- If you were previously logged in to MyLenses and are having trouble seeing the preview in Lens Studio, log out of MyLenses and log in again.
Tip: Here is how to log out of and log in to MyLenses.
VoiceML Module
The main asset used for Speech Recognition is the VoiceML Module. We can find it in the Asset Browser panel. We attach it to the scripts in each example to configure settings for Speech Recognition.
At the bottom of the Preview panel, click the microphone. Test with your voice to see the blue vertical volume meter in action and ensure you are not muted. Then try saying anything to see the transcription and the scene objects react to the voice events.
This template comes with a preview video named Preview with Audio. This preview video contains audio for Lens Studio to run VoiceML on, so that you can test the template without using your microphone. However, the audio itself will not be played through your computer's speaker.
Voice Enabled User Interface Example
In this example, we can use both touch gestures and voice commands to choose buttons from the User Interface in the Lens.
If the voice event On Listening Enabled is successfully called, we can see the Listening Icon. Now try speaking into the microphone.
Click on Voice Enabled UI Example in the Scene Hierarchy panel. In the Inspector panel, we can see that we are using SpeechRecognition.js in the Script Component. This is the main script we are going to use for this example. Now let’s go through the details.
Notice that in the first section, we attach the VoiceML Module to the Speech Recognition Script Component to configure Speech Recognition and provide voice input in Lens Studio.
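In scripting terms, the module is exposed to the script as an input field so it can be assigned in the Inspector. A minimal sketch (the field name vmlModule is an assumption, not necessarily the exact name used in SpeechRecognition.js):

//@input Asset.VoiceMLModule vmlModule

// The same reference is later used to configure and start Speech Recognition,
// for example: script.vmlModule.startListening(options);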
Basic Settings for Transcription
Navigation Command Detection is based on the transcription from Speech Recognition. Let’s go through some basic settings for transcription next.
- Transcription: the final transcription.
- Live Transcription: a live and slightly less accurate transcription, delivered before we get the final, more accurate transcription.
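In scripting, both kinds of transcription arrive through the module's listening update event. A minimal sketch that distinguishes them (assuming a vmlModule input as above):

//@input Asset.VoiceMLModule vmlModule

// onListeningUpdate fires repeatedly while the microphone is active.
// eventArgs.isFinalTranscription separates the quick live results from the final one.
script.vmlModule.onListeningUpdate.add(function (eventArgs) {
    if (eventArgs.isFinalTranscription) {
        print('Final transcription: ' + eventArgs.transcription);
    } else {
        print('Live transcription: ' + eventArgs.transcription);
    }
});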
Transcription Text
Here you can output the transcription result to a Screen Text by enabling Transcription Text. With Transcription Text enabled, we can see the transcription result on screen. The transcription result will be used for navigation command detection.
In this example, the Screen Text object is under Orthographic Camera -> Voice Enabled UI -> Transcription Text.
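Under the hood, displaying the transcription simply means writing the result into a Text component. A rough sketch (the input names are illustrative, not the template's exact fields):

//@input Asset.VoiceMLModule vmlModule
//@input Component.Text transcriptionText

// Mirror every transcription update into the on-screen text.
script.vmlModule.onListeningUpdate.add(function (eventArgs) {
    script.transcriptionText.text = eventArgs.transcription;
});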
Speech Context
We can also add speech contexts to the transcription and boost certain words for specific transcription scenarios. Use this for words which are rarer and aren’t picked up well enough by the Lens. The higher the boost value, the more likely the word is to appear in the transcription.
With the useSpeechContext setting enabled, we can then attach the Speech Contexts object to it.
In the Scene Hierarchy panel, click on the Voice Enabled UI Example -> Speech Contexts object. In the Inspector panel, we can see that there is a Speech Context Script Component attached to the object.
Notice that here we are using a different list of Speech Contexts, boosting words for all the commands which will be used in Voice Navigation Command Detection. Feel free to add or remove phrases as needed.
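The Speech Context component ultimately feeds phrase and boost pairs into the listening options. In script form this looks roughly like the following sketch (assuming the standard VoiceML ListeningOptions API):

//@input Asset.VoiceMLModule vmlModule

var options = VoiceML.ListeningOptions.create();
options.shouldReturnAsrTranscription = true;

// Boost the navigation command words so they are more likely to appear in the transcription.
// The second argument is the boost value (1-10); 5 is a reasonable starting point.
options.addSpeechContext(['next', 'back', 'left', 'right', 'up', 'down'], 5);

script.vmlModule.onListeningEnabled.add(function () {
    script.vmlModule.startListening(options);
});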
Add New Phrase to Speech Context
With the Speech Context Script Component, we can add words to the phrases and set a boost value for them. To add a new word, click on the Add Value field and input the word you want to add.
Note that the phrases should be made of lowercase a-z letters and should be within the vocabulary.
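If you build phrases in script, a small character check can catch obviously invalid input before it reaches the Speech Context. This helper is purely illustrative and not part of the template; note that it only validates characters, while vocabulary membership is decided by the recognizer, as the OOV example below shows.

// Returns true only for phrases made of lowercase a-z words separated by single spaces.
function isValidPhrase(phrase) {
    return /^[a-z]+( [a-z]+)*$/.test(phrase);
}

print(isValidPhrase('go back')); // true
print(isValidPhrase('Az Zj 3')); // false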
Out of Vocabulary
When an OOV (out of vocabulary) phrase is added to the Speech Context, the voice event On Error Triggered will be triggered, and we will see the error message in the Logger. Here we take the random word “az zj” as an example.
Reset the Preview. Speak with the Microphone button enabled in the Preview panel. We can then see the error message in the Logger panel.
Add New Speech Context
Alternatively, we can add a new Speech Context Script Component with a different boost value.
The boost value ranges from 1 to 10. We recommend starting with 5 and adjusting if needed (the higher the value, the more likely the word will appear in the transcription).
Use Command
Now go back to the Speech Recognition Script Component. Here we can see that Use Command is enabled for the VUI.
CommandHandler
With Use Command enabled, we attach the Command Handler object to the CommandHandler field. In the Scene Hierarchy panel, click on the Voice Enabled UI Example -> CommandHandler object.
When the debug boolean is enabled, we will see the detected command printed in the Logger.
Now let’s take a look at the Command Handler Script. Here, different functions will be called when different commands are detected.
//This function will be called when next command is detected
function nextCommand() {
...
}
//This function will be called when back command is detected
function previousCommand() {
...
}
//This function will be called when any number command is detected
//"first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth", "tenth"
function numberCommand(number) {
...
}
//This function will be called when left command is detected
function leftCommand() {
...
}
//This function will be called when right command is detected
function rightCommand() {
...
}
//This function will be called when up command is detected
function upCommand() {
...
}
//This function will be called when down command is detected
function downCommand() {
...
}
//This function will be called when different command is detected
function differentCommand() {
...
}
In this example, we pass different command results to different VUI Controller functions, so the Script Component also has a field for the VUI Controller.
//@input Component.ScriptComponent vUIController {"label":"VUI Controller"}

//This function will be called when next command is detected
function nextCommand() {
    ...
    if (script.vUIController.api.setNextSelection) {
        script.vUIController.api.setNextSelection();
    }
}
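How the detected command string reaches these functions depends on the script wiring. As a rough, hypothetical illustration (handleCommand, commandHandlers, and numberWords are not the template's actual names), the detected command could be dispatched like this:

// Map each detected command string to its handler.
var commandHandlers = {
    next: nextCommand,
    back: previousCommand,
    left: leftCommand,
    right: rightCommand,
    up: upCommand,
    down: downCommand,
    different: differentCommand
};

var numberWords = ['first', 'second', 'third', 'fourth', 'fifth',
    'sixth', 'seventh', 'eighth', 'ninth', 'tenth'];

function handleCommand(command) {
    if (commandHandlers[command]) {
        commandHandlers[command]();
    } else if (numberWords.indexOf(command) >= 0) {
        // Pass the 1-based index to the number command handler.
        numberCommand(numberWords.indexOf(command) + 1);
    }
}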
Error Codes for Command Responses
There are a few error codes which the NLP models (either keyword or command detection) might return:
- #SNAP_ERROR_INCONCLUSIVE: two or more different commands were found.
- #SNAP_ERROR_INDECISIVE: no command was detected.
- #SNAP_ERROR_NONVERBAL: we don’t think the audio input was really a human talking.
- #SNAP_ERROR_SILENCE: the silence was too long.
- Anything else starting with #SNAP_ERROR_: errors that are not currently defined in this document and should be ignored.
The Lens can only detect one voice command at a time.
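A command response handler can filter these error codes out before dispatching. A minimal, hypothetical guard (handleCommand refers to the illustrative dispatcher sketched earlier):

// Ignore any response that is an error code rather than a real command.
function isCommandError(command) {
    return command.indexOf('#SNAP_ERROR_') === 0;
}

function onCommandResponse(command) {
    if (isCommandError(command)) {
        print('Command error: ' + command);
        return;
    }
    handleCommand(command);
}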
Reset the Preview. With the Microphone button enabled in the Preview panel, try to say a command which is not in the command list, like “Snap”. We can then see the command error messages in the Logger.
VUI Controller
In the Scene Hierarchy panel, click on the Orthographic Camera -> VUI Controller object. In the Inspector panel, we have the VUI Controller Script Component. With the VUI Controller, we can map the voice navigation commands to UI functions. In the VUI Controller Script Component, we use the UI Parent field to create references to the UI Discrete Pickers.
Let’s take a look at the setNextSelection function as an example. Here we are mapping the “Next” command to the set selection functions in the UIDiscretePicker.
function setNextSelection() {
    for (var i = 0; i < enabledUiParent.length; i++) {
        var count = counts[i];
        var layoutType = layoutTypes[i];
        if (
            enabledUiParent[i].api.getCurrentSelection &&
            enabledUiParent[i].api.setCurrentSelection
        ) {
            var currentIndex = enabledUiParent[i].api.getCurrentSelection();
            if (currentIndex < count - 1) {
                // Move to the next button.
                enabledUiParent[i].api.setCurrentSelection(currentIndex + 1);
            } else {
                if (layoutType == 3) {
                    // This layout type wraps around to the first button.
                    enabledUiParent[i].api.setCurrentSelection(0);
                } else {
                    print('Warning: This button is the last button');
                }
            }
        }
    }
}
UI Discrete Picker
The voice commands can be used with the UI Discrete Pickers. The picker is under Orthographic Camera -> VUI Controller -> UI Panel - Widget Example -> UI Discrete Picker Carousel.
We have five different layouts. Try to change the layout from the dropdown list to test the touch gesture and voice navigation commands!
- UI Discrete Picker Vertical
- UI Discrete Picker Horizontal
- UI Discrete Picker Grid
- UI Discrete Picker Circle
- UI Discrete Picker Carousel
For some of the UI Discrete Pickers, not all the voice commands are available. For example, the UI Discrete Picker Vertical doesn’t have a left button. In that case, a warning message will be printed in the Logger.
Event Callbacks for UI Discrete Picker
With the UI Discrete Picker, we can enable Edit Event Callbacks for the buttons. Here we are using a Custom Function to enable different glasses. When the button selection is changed with either touch input or a voice command, we call the Custom Function setSelectionObject in the EnableChildByIndex Script Component to sync the glasses selection with the button selection.
We can find the EnableChildByIndex Script Component under Camera -> Effects -> Head Binding -> Glasses Object.
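The idea behind EnableChildByIndex is straightforward: enable only the child whose index matches the current selection. A rough sketch of such a script (not the template's exact implementation):

//@input SceneObject parentObject

// Enable only the child at the given index and disable all the others.
function setSelectionObject(index) {
    for (var i = 0; i < script.parentObject.getChildrenCount(); i++) {
        script.parentObject.getChild(i).enabled = (i === index);
    }
}

// Expose the function so the UI Discrete Picker event callback can call it.
script.api.setSelectionObject = setSelectionObject;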
Voice Events
Now let’s go back to the Speech Recognition Script Component and take a look at Edit Behaviors. Notice that if we enable Edit Behaviors, we can then attach Behavior Scripts to different voice events and trigger different visuals. Here we have the following voice events:
- On Listening Enabled: triggered when the microphone is enabled.
- On Listening Disabled: triggered when the microphone is disabled.
- On Listening Triggered: triggered when changed back to listening mode.
- On Error Triggered: triggered when there is an error in the transcription.
- On Final Transcription Triggered: triggered when there is a full result or partial transcription.
- On Command Detected: triggered when any command is detected.
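Behind the Behavior hookups, most of these events correspond to callbacks exposed by the VoiceML Module itself. A hedged sketch of subscribing to a few of them directly in script (assuming the standard VoiceML API; the template routes them through Behavior Scripts instead):

//@input Asset.VoiceMLModule vmlModule

script.vmlModule.onListeningEnabled.add(function () {
    print('Listening enabled - the microphone is active');
});

script.vmlModule.onListeningDisabled.add(function () {
    print('Listening disabled - the microphone is off');
});

script.vmlModule.onListeningError.add(function (eventErrorArgs) {
    print('Listening error: ' + eventErrorArgs.error + ' - ' + eventErrorArgs.description);
});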
With Use Command and Edit Behaviors enabled, a new field will be added to the Voice Event Callbacks: On Command Detected. As with the other Voice Events, we can add a list of Behavior Scripts, which will be triggered when any command is detected.
In the Lens Studio Preview, the microphone button is not simulated on the screen. When we reset the preview, On Listening Enabled will be triggered automatically once Speech Recognition is initialized. To learn how to use the microphone button, try previewing the Lens in Snapchat! Please check the details in Previewing Your Lens.
Debug Message for Voice Events
Notice that if we enable Debug in the Voice Event Callbacks, we can then see the Voice Event being printed out in the Logger when it gets triggered.
Send Behavior Triggers with Voice Events
For each Voice Event Callback, we can assign multiple Behavior Scripts. Click on the Add Value field to attach new Behavior Scripts.
Behavior and Tween
In the example, when a voice navigation command is detected, you might notice that we also trigger two tween animations:
- TWEEN_SETSCREENTRANS_DETECTED - Orthographic Camera -> Voice Enabled UI -> NLP Command Text Object -> First Script Component.
- TWEEN_SETCOLOR_DETECTED - Orthographic Camera -> Voice Enabled UI -> NLP Command Text Object -> Second Script Component.
- We also pause the Listening Icon animation.
Previewing Your Lens
You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.
Once the Lens is pushed to Snapchat, you will see the hint: TAP TO TURN ON VOICE CONTROL. Tap the screen to start VoiceML, and OnListeningEnabled will be triggered. Tap the screen again to stop VoiceML, and OnListeningDisabled will be triggered.
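On device, the tap hint simply toggles listening on and off. A simplified sketch of that behavior (assuming the vmlModule input used earlier, not the template's exact script):

//@input Asset.VoiceMLModule vmlModule

var isListening = false;

// Toggle Speech Recognition each time the user taps the screen.
script.createEvent('TapEvent').bind(function () {
    if (isListening) {
        script.vmlModule.stopListening(); // OnListeningDisabled will be triggered
    } else {
        var options = VoiceML.ListeningOptions.create();
        options.shouldReturnAsrTranscription = true;
        script.vmlModule.startListening(options); // OnListeningEnabled will be triggered
    }
    isListening = !isListening;
});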