Version: 5.x
Supported on: Snapchat, Spectacles
This feature may have limited compatibility and may not perform optimally.

Audio Classification

The Audio Classification asset enables you to classify audio input from the device's microphone into one or more of 112 supported classes.

Animation showcasing the Audio Classification detecting multiple different classes

Some of the top-level class categories in the model include:

  • Human Sounds
  • Music Sounds
  • Animal Sounds
  • Natural Sounds
  • Sounds of Things

You can find the full list of available classes in the Labels.js file. The Behavior script example later in this guide uses these classes, as shown in the image below.

Image showing the Rooster and Chicken classes

Guide

Importing the Audio Classification Asset

Audio Classification is available in Lens Studio via the Asset Library. To get started:

  1. Import the Audio Classification asset from the Asset Library into your project.
  2. Once imported, locate the imported asset package Audio Classification in the Asset Browser.
  3. Find the main prefab Audio Classification__PUT_IN_ORTHO_CAM and drag the Prefab into your Scene Hierarchy under an Orthographic camera. If you don't have an Orthographic camera, you can add one by clicking the + button in the Scene Hierarchy and typing in Orthographic Camera, then selecting it.
Audio Classification Imported Assets Hierarchy

Audio Spectrogram Script

The AudioSpectrogram script reads data from the audio track and generates a spectrogram from the captured audio samples when the Lens is running.

To modify spectrogram settings, enable the Advanced checkbox.
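
You normally do not need to modify this script. As a rough illustration of what it does, the minimal sketch below reads raw samples from an audio track in a Lens Studio script; the input name, the fixed buffer size, and the exact shape returned by getAudioFrame are assumptions, and the bundled AudioSpectrogram script additionally handles windowing and the spectrogram transform for you.

// Minimal sketch: reading raw samples from an audio track (names are illustrative)
// @input Asset.AudioTrackAsset inputAudio

var control = script.inputAudio.control;
control.sampleRate = 44100;

// Buffer for one frame of samples; 4096 is an arbitrary size for this sketch.
var audioFrame = new Float32Array(4096);

script.createEvent('UpdateEvent').bind(function () {
    // Assumption: getAudioFrame fills the buffer and returns a shape whose
    // x component is the number of samples read this frame.
    var shape = control.getAudioFrame(audioFrame);
    if (shape.x > 0) {
        // The real AudioSpectrogram script accumulates these samples and
        // converts them into the spectrogram the ML model expects.
        print('Read ' + shape.x + ' samples');
    }
});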

Audio Classification Controller

The AudioClassificationController script configures and runs the Machine Learning (ML) model by passing the spectrogram data as input. This script powers the main experience of the asset and allows you to set up different responses based on the detected audio classes.

Image of the Audio Classification Controller script

Audio Classification Controller Inputs

Listed below are the inputs used in the AudioClassificationController script:

  • Model Settings: Configure ML model settings and inputs.
  • Model: ML model asset. In this asset, the model is remote and preconfigured by the VoiceML team. It takes a float array of size 64*64*1 as input and outputs an array of size 1*1*112.
  • Input Audio: Audio track to read data from, either Microphone Audio or an Audio File.
  • Labels: Script component containing the Labels.js file, which provides the script.api.labels object property.
  • Extended: When enabled, detected classes are extended with their ancestor (parent) classes (for example, for "Guitar", the extended result might include "Plucked string instrument", "Musical instrument", "Music").
  • Responses: Set up responses triggered when a certain class is detected. The Use Behavior and Prefix fields help define a custom response.
    • Use Behavior: Triggers a custom behavior when a specific class is detected.
    • Prefix: A string added to the detected class name (forming prefix + className). This can be left empty.
  • Print Result To: When enabled, prints the result array of classes to the assigned Text Component.
    • Class Text: The Text Component where the detection result is displayed.
    • Placeholder Text: The text to display if no classes are detected.
  • Call Api Function: When enabled, allows you to call an API function from your custom script, with the detected class names array as parameters.
    • Script With Api: The Script Component that contains your custom script.
    • Function Name: The name of the API function to be called.

You can plug in different audio files for testing, but remember to set Input Audio to Audio From Microphone before publishing your Lens.

Example Script with API

// Example script with API.
// The function name must match the Function Name input on the Audio Classification Controller.
script.onClassDetected = function (classes) {
    // classes is the array of detected class names passed in by the controller.
    print('Result : ' + classes.join(','));
};
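
To wire this up, set Script With Api to the Script Component that contains this function and set Function Name to onClassDetected. As a variation, the sketch below also writes the detected classes into a Text component; the resultText input is illustrative and not part of the asset.

// Variation: show detected classes in a Text component (resultText is a hypothetical input)
// @input Component.Text resultText

script.onClassDetected = function (classes) {
    // classes is the array of detected class names passed in by the controller.
    script.resultText.text = classes.join(', ');
};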

The asset also includes a more extensive example script, UIController, which changes the color of several screen images based on the detected class and adjusts the text color accordingly.

Image showing the implementation details of the asset

Use Behavior Script to Trigger Custom Responses

The Audio Classification Controller automatically sends custom triggers based on detected audio classes. You can set up a Behavior script to react to these triggers—performing actions such as showing or hiding a Face Image—without writing any additional code.

Step 1: Prepare and Clean Up the Scene

  1. Remove the UI Example objects from the Scene Hierarchy since you will replace them with interactive elements.
  2. In the Scene Hierarchy panel, click the + button, navigate to the Face category, and select Face Image.
  3. Optional: Browse the Asset Library to choose the images or assets you want to use and import them into your project if needed, then drag them into the Material of Face Image to replace the default image.

Step 2: Configure the Audio Classification Controller

  1. Select the Audio Classification Controller object in your Scene Hierarchy.
  2. In the Inspector under the Responses section:
    • Check the Use Behavior option. This tells the controller to send custom trigger messages.
    • In the Prefix field, enter a string that will prefix your audio class names. For example, entering AUDIO_DETECTION_ means that when the classifier detects the "Chicken, rooster" class, it will send a custom trigger named AUDIO_DETECTION_Chicken, rooster.

The list of available audio classes can be found in the provided Labels.js file. The controller automatically appends the corresponding class label to the prefix.

Step 3: Set Up the Behavior Script

  1. In the Scene Hierarchy, click the + button, navigate to Scripts, and select Behavior. A new Scene Object with the Behavior script attached will be created.
  2. In the Behavior script’s Inspector:
    • Set the Triggering Event Type to On Custom Trigger.
    • In the fields that appear, enter the exact name of the custom trigger you want to respond to. For example, if you expect the Audio Classification Controller to trigger AUDIO_DETECTION_Chicken, rooster when it detects the "Chicken, rooster" sound, enter AUDIO_DETECTION_Chicken, rooster here. (If you expect multiple custom triggers, you can enable list mode and add each expected trigger.)
    • Next, set up the desired Action (for example, toggling the visibility of your Face Image or swapping to a different image) that should occur when the trigger is received.

Step 4: Finalize and Test Your Setup

  1. Review your Scene Hierarchy and disable the Face Image object to ensure it starts hidden.
  2. Run your scene. When the Audio Classification Controller detects an audio class that matches one of the labels, it automatically sends the corresponding custom trigger (prefix + class name).
    • For example, if a chicken sound is detected and the prefix is AUDIO_DETECTION_, the controller sends AUDIO_DETECTION_Chicken, rooster.
  3. The Behavior script is configured to listen for that custom trigger and will then perform the pre-configured action.
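
Alternatively, if you prefer to react to these custom triggers from a script instead of (or in addition to) a Behavior action, the Behavior system exposes a global API for listening to custom triggers. Below is a minimal sketch, assuming the controller sends its triggers through the Behavior custom trigger system and that at least one Behavior script is present in the scene (which registers global.behaviorSystem); the faceImageObject input is illustrative.

// Minimal sketch: reacting to the controller's custom trigger from a script
// @input SceneObject faceImageObject

global.behaviorSystem.addCustomTriggerResponse('AUDIO_DETECTION_Chicken, rooster', function () {
    // Show the Face Image when the "Chicken, rooster" class is detected.
    script.faceImageObject.enabled = true;
});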

Audio Classification Setup

Previewing Your Lens

You’re now ready to preview your Lens! To test your Lens on Snapchat, follow the Pairing to Snapchat guide.
