Keyword Detection
This template allows you to detect certain keywords in the audio input and trigger effects when they are detected. The template generates a mel spectrogram image from the audio and predicts the probability of each keyword with a machine learning model.
Creating a Model
While the template comes with two ML models, each with a different set of available keywords, you can detect any keywords you like by importing your own machine learning model. We'll go through an example of what this might look like below.
To learn more about Machine Learning and Lens Studio, take a look at the ML Overview page.
Prerequisites
To create a model, you'll need:
- Machine learning training notebook: Download our example notebook here
- Data set: in this example we are using the torch.utils.data.Dataset version of the SPEECHCOMMANDS dataset
This dataset comes with 35 different keywords, from which you can pick a subset.
Training Your Model
There are many different ways you can train your model. For our example, we will use Google Colaboratory. To see other ways of training, take a look at the ML Frameworks page of the guide section.
Head over to Google Colaboratory, select the Upload tab, and drag the Python notebook into the upload area.
The notebook is well documented with information about what each section of the code is doing. Take a look at the notebook to learn more about the training process itself!
Once your notebook has been opened, you can choose which labels (keywords) you want the model to detect. Then, you can run the code by choosing Runtime > Run All in the menu bar. This model takes about 2 hours to train.
Downloading your Model
You can scroll to the Train Loop section of the notebook to see how your machine learning model is coming along.
Once you are happy with the result, you can download your .onnx model!
When using a data set to train your model, make sure that you adhere to the usage license of that dataset.
Keyword Detection Controller
The keyword detection is driven by the Keyword Detection Controller object. This object contains the AudioSpectogram script, which provides a function to calculate the MelSpectogram of the audio and generates a spectrogram texture. (You can find more information about these terms in the training notebook.) The KeywordDetectionController script then takes the generated texture and passes it into an ML model.
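To make that flow concrete, here is a minimal sketch (not the template's actual code) of how a script can pass a spectrogram texture into an MLComponent and read back per-keyword probabilities. The placeholder names "input" and "probs", and the script inputs themselves, are assumptions for illustration; the template's controller already handles this for you.

// Minimal sketch: feed a spectrogram texture into an MLComponent and read
// back one probability per keyword. Placeholder names are assumptions.
//@input Component.MLComponent mlComponent
//@input Asset.Texture spectrogramTexture

script.mlComponent.onLoadingFinished = function() {
    // Bind the generated spectrogram texture to the model's input placeholder
    script.mlComponent.getInput("input").texture = script.spectrogramTexture;
    // Run the model synchronously
    script.mlComponent.runImmediate(true);
    // The output data holds one probability per keyword label
    var probabilities = script.mlComponent.getOutput("probs").data;
    print("Probability of the first keyword: " + probabilities[0]);
};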
Choosing your Keywords
There are two different ML models provided in the template, depending on which keywords you are looking to use. In the Resources panel, you will find a folder called ML with a Models and a Labels folder. Each model has a corresponding set of labels (the keywords that you can use).
To find the list of keywords you can detect with each model, double click on one of the scripts in the Labels folder. Once you've decided which keywords you want to use, we can set up the KeywordDetectionController to use them.
First, we will set the model we want to use by changing the Model field.
Then, we'll modify the Labels object and choose the labels corresponding to that model.
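Under the hood, the controller compares the model's per-keyword probabilities against this label list. As a rough illustration (the label names and the 0.7 threshold below are assumptions, not the template's actual values), the matching logic can look like this:

// Hedged sketch: pick the most likely label, but only report it when the
// model is confident enough. Labels and threshold are placeholder values.
var labels = ["one", "two", "three"];
var threshold = 0.7;

function findDetectedKeyword(probabilities) {
    var bestIndex = 0;
    for (var i = 1; i < probabilities.length; i++) {
        if (probabilities[i] > probabilities[bestIndex]) {
            bestIndex = i;
        }
    }
    // Return the matching label only when it passes the confidence threshold
    return (probabilities[bestIndex] >= threshold) ? labels[bestIndex] : null;
}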
Choosing audio to analyze
With the detection set up, we can next choose the Audio Track we want to analyze. By default, the template comes with an audio clip of a person saying “one two three”. You can select the Audio Track field in the KeywordDetectionController script to change what the Lens is listening to.
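For reference, an Audio Track field like this comes from an audio track input on the script; a minimal sketch (the input name "audioTrack" is an assumption) looks like this:

// Minimal sketch: an Audio Track input similar to the field on KeywordDetectionController.
//@input Asset.AudioTrackAsset audioTrack
print("Analyzing audio track: " + script.audioTrack.name);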
Select the Audio from Microphone asset to start processing the user's voice.
Then, make sure to enable Microphone input in the bottom left corner of the Preview panel so that your voice will be passed into the Lens.
Responding to Detected Keywords
Responding with Behavior
At the bottom of the KeywordDetectionController, you'll find a checkbox to Use Behavior. This checkbox tells the KeywordDetectionController to send a Custom Behavior Trigger when a keyword is detected.
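In other words, when a keyword is detected, the controller fires a custom trigger by name, and any Behavior set up to respond to that trigger will react. A hedged sketch of that idea (the keyword-to-trigger mapping here is an assumption for illustration):

// Hedged sketch: fire a custom Behavior trigger when a keyword is detected.
// The template's controller does this for you when Use Behavior is checked.
function onKeywordDetected(keyword) {
    if (keyword === "one") {
        global.behaviorSystem.sendCustomTrigger("KEYWORD_ONE");
    }
}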
With this trigger in mind, let’s see an example of how we can respond to a detected keyword.
Take a look at the example animations One, Two, and Three under the Example Numbers object to see how each animation is played when the user says the related keyword.
For example, in the One object, you can see that an animation is played in response to the trigger KEYWORD_ONE.
There’s a lot more that Behavior can do, such as playing sounds, animating textures, enabling objects, and more! Take a look at the Behavior guide for more information.
If you want to respond with your own script, you can listen for the custom trigger:
global.behaviorSystem.addCustomTriggerResponse(triggerName, callback);
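For example, a script could respond to the template's KEYWORD_ONE trigger like this (the callback body is just an illustration):

// Respond to the KEYWORD_ONE trigger with your own logic
global.behaviorSystem.addCustomTriggerResponse("KEYWORD_ONE", function() {
    print("The user said 'one'!");
});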
Alternatively, if you want to use the Visual Script Editor, you can use the provided custom node AddCustomTriggerResponse under Scripts > Behavior > Behavior Nodes. Simply drag it into your script graph and connect the response you want to trigger.