Version: 5.x

Audio Classification Template

Audio Classification is available in the Lens Studio Asset Library. Import the asset into your project, create a new Orthographic Camera, and place the prefab under it.

The Audio Classification template allows you to classify audio input from the device's microphone into one or several classifications out of a total of 112 classes.

Animation showcasing the Audio Classification detecting multiple different classes

Some of the top classes returned in the model include:

  • Human Sounds
  • Music Sounds
  • Animal Sounds
  • Natural sounds
  • Sounds of things

You can find the list of available classes in the Labels.js file. The image below shows the Rooster and Chicken classes, which the Behavior script example later in this guide will use.

Guide

When opening the Template, click on the Audio Classification Controller [EDIT ME] Scene Object in the Scene Hierarchy panel to view each component attached to it.

If you have already learned about the Audio Classification template, you can skip ahead to learn how to quickly set up this template using Behavior scripts.

Audio Spectrogram script

The Audio Spectrogram script reads data from an audio track and generates a spectrogram from the audio samples captured while the Lens is running.

To modify the spectrogram settings, enable the Enable Advanced checkbox.

You can find more information about this in the Keyword Detection Template [LINK].
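As a rough illustration of what the spectrogram script reads, the sketch below pulls raw audio samples from an audio track each frame. It assumes the track uses the microphone provider; maxFrameSize and getAudioFrame come from the standard audio provider API, and the exact shape returned by getAudioFrame is an assumption here. The spectrogram math itself is handled by the template's Audio Spectrogram script.

// Minimal sketch: reading raw audio samples from an input audio track.
// @input Asset.AudioTrackAsset audioTrack

var control = script.audioTrack.control;
var audioFrame = new Float32Array(control.maxFrameSize);

script.createEvent("UpdateEvent").bind(function () {
    // getAudioFrame fills the array and returns the shape of the data read;
    // shape.x is assumed to be the number of valid samples for this frame.
    var shape = control.getAudioFrame(audioFrame);
    if (shape.x > 0) {
        // Hand the samples to the spectrogram builder here.
    }
});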

Audio Classification Controller

The Audio Classification Controller script configures and runs the Machine Learning (ML) model by passing the spectrogram data as input. This script is what drives the main experience of the template and you will use it to set up different responses.

To learn more about MLComponents, please visit MLComponent [LINK]

Image of the Audio Classification Controller script
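To give a sense of what the controller does under the hood, here is a minimal sketch of configuring an MLComponent with the model and running it on spectrogram data. The placeholder names ("input", "probs") and the classify helper are illustrative assumptions; the template's controller uses the names defined by the actual model asset. The 64*64*1 input and 1*1*112 output sizes come from the model description below.

// Sketch: binding the model, feeding spectrogram data, and reading class scores.
// @input Asset.MLAsset model

var mlComponent = script.getSceneObject().createComponent("Component.MLComponent");
mlComponent.model = script.model;
mlComponent.onLoadingFinished = onLoadingFinished;
mlComponent.build([]);

var inputPlaceholder;
var outputPlaceholder;

function onLoadingFinished() {
    inputPlaceholder = mlComponent.getInput("input");   // expects 64*64*1 floats
    outputPlaceholder = mlComponent.getOutput("probs"); // yields 1*1*112 scores
}

function classify(spectrogramData) {
    // spectrogramData: Float32Array of length 64*64*1 from the spectrogram script
    inputPlaceholder.data.set(spectrogramData);
    mlComponent.runImmediate(true);   // run synchronously this frame
    return outputPlaceholder.data;    // Float32Array of 112 class scores
}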

Audio Classification Controller inputs

Listed below are the inputs that are used in the Audio Classification Controller script.

  • Model Settings: Allows you to set up the ML model settings and inputs.
    • Model: The ML model asset. In this template, it is a remote asset: a predefined model created by the VoiceML team that takes a float array of size 64*64*1 as input and outputs an array of size 1*1*112.
    • Input Audio: The audio track to read data from; can be either Microphone Audio [LINK] or an Audio File.
    • Labels: The Script Component with the Labels.js file, which exposes the script.api.labels property.
    • Extended: When enabled, the detected classes are extended with their ancestor (parent) classes (for example, for "Guitar" the extended result would also include "Plucked string instrument", "Musical instrument", and "Music").
  • Responses: This section allows you to set up responses when a certain class is detected. Responses can use the Use Behavior and Prefix fields to help define the response.
    • Use Behavior: Allows you to send a custom Behavior trigger [LINK] when a certain class is detected.
    • Prefix: A string prepended to the class name; the custom trigger name consists of prefix + className. Can be left empty.
    • Print Result To: When enabled, prints the resulting array of classes to the Text Component.
    • Class Text: The Text Component to set the text on.
    • Placeholder Text: The text to display if no class is detected.
    • Call Api Function: When enabled, allows you to call an API function from your custom script that accepts an array of class names as a parameter.
    • Script With Api: The Script Component containing your script.
    • Function name: The name of the function to call.
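For reference, the sketch below shows the general idea behind the Labels input and how the 1*1*112 output scores can be mapped to class names. The threshold value and function name are illustrative, not the template's exact implementation.

// Labels.js (simplified): exposes the list of 112 class names.
// script.api.labels = ["Speech", "Music", "Animal", /* ... */];

// Sketch: converting the output scores into a list of detected class names.
function getDetectedClasses(scores, labels, threshold) {
    var detected = [];
    for (var i = 0; i < scores.length; i++) {
        if (scores[i] >= threshold) {
            detected.push(labels[i]);
        }
    }
    return detected;
}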

You can plug in different audio files for testing, but don't forget to set Input Audio back to Microphone Audio when publishing your Lens.

Example script with API

// Example script with API: called with the array of detected class names
script.api.onClassDetected = function (classes) {
    print('Result : ' + classes.join(','));
};
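With Call Api Function enabled, the controller calls the function named in Function name (here, onClassDetected) on the script assigned to Script With Api, passing the array of detected class names each time a classification result is produced.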

The template comes with an extended script created for this example. The UIControllerScript controls the color of several screen images based on the detected class and changes the text color correspondingly.

If you open the script in Script Editor, you can see the details of implementation.

Image showing each of the details of how the template is implemented
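The actual UIControllerScript ships with the template; the simplified sketch below shows the same idea, reacting to detected classes by recoloring an image and a text component. The inputs, colors, and placeholder string are illustrative assumptions.

// Simplified sketch of a UI controller reacting to detected classes.
// @input Component.Image indicatorImage
// @input Component.Text classText

var activeColor = new vec4(0.2, 0.9, 0.3, 1.0);
var idleColor = new vec4(1.0, 1.0, 1.0, 1.0);

script.api.onClassDetected = function (classes) {
    var detected = classes.length > 0;
    script.indicatorImage.mainPass.baseColor = detected ? activeColor : idleColor;
    script.classText.textFill.color = detected ? activeColor : idleColor;
    script.classText.text = detected ? classes.join(', ') : 'Listening...';
};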

Set up with Behavior script

Now that you understand how the Audio Classification Controller functions, you can set up some quick interactive behaviors for when a certain class is detected. For this template, you will set up visual responses that trigger when a rooster cluck (or your imitation of one) is detected.

  1. In the Scene Hierarchy panel, delete the Orthographic camera Scene Object.

  2. Delete the UI example Scene Objects.

  3. Add a new FaceImage in the Scene Hierarchy panel.

    Image showing a new FaceImage attached to the camera

  4. Locate a series of images or items in the Asset Library to use.

  5. Click on +, navigate to Helper Script and select Behavior.

  6. Set up the Audio Classification Controller [EDIT ME] Scene Object to call a Behavior trigger once a class is detected.

  7. Configure the Behavior script to do something once a class is detected (a minimal script-based equivalent is sketched after this list).

    Image that shows that the classification has been set up to listen for rooster and chicken classes

  8. Disable previously created images in the Scene Hierarchy panel.
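When Use Behavior is enabled, the controller sends a custom trigger named prefix + className for each detected class. Instead of (or in addition to) a Behavior response, a script can listen for that trigger directly, as sketched below. This assumes a Behavior script is present in the project (it registers global.behaviorSystem), that Prefix is left empty so the trigger name is just the class name, and that "Rooster" stands in for the actual class label, which may differ in Labels.js.

// Sketch: reacting to the custom trigger sent when the "Rooster" class is detected.
// @input SceneObject roosterImage

global.behaviorSystem.addCustomTriggerResponse("Rooster", function () {
    // Show the rooster image when the trigger fires.
    script.roosterImage.enabled = true;
});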

Previewing Your Lens

You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.
