Object Detection
The Object Detection Template lets you instantiate and place UI elements on the screen based on the bounding boxes of objects of a certain class, as detected by a machine learning model.
Guide
If you already have an object detection model, you can skip down to the Importing Your Model section below. You can skip to the Customizing Your Lens Experience section if you’d like to use the example car or food detection.
Creating a Model
While the template comes with example car detection and food detection models for the ML Component, you can detect any kind of object by importing your own machine learning model. We'll walk through an example of what this might look like below.
To learn more about Machine Learning and Lens Studio, take a look at the ML Overview page.
Prerequisites
To create a model, you will need:
- Machine learning training code: code that describes how the model is trained (sometimes referred to as a notebook). Please find our example notebook here.
- Data set: a collection of data that the code will learn from (in this case, the COCO data set).
This data set comes with a couple of example classes that you can swap in. The provided training notebook also uses generalized classes, each made up of a couple of more specific classes, in order to perform better on this particular data set.
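For example, a generalized class can group several related COCO categories under a single detection label. The grouping below is purely illustrative; the actual classes and groupings are defined in the example notebook.

```python
# Illustrative only: a "generalized" class groups several COCO categories
# under one detection label. The real groupings live in the example notebook.
GENERALIZED_CLASSES = {
    "car": ["car", "truck", "bus"],          # vehicles treated as one class
    "food": ["pizza", "sandwich", "donut"],  # food items treated as one class
}

# Invert the mapping so each COCO category name resolves to its generalized label.
COCO_TO_GENERALIZED = {
    coco_name: label
    for label, coco_names in GENERALIZED_CLASSES.items()
    for coco_name in coco_names
}

print(COCO_TO_GENERALIZED["truck"])  # -> "car"
```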
Training Your Model
There are many different ways you can train your model. For our example, we will use Google Colaboratory. To see other ways of training, take a look at the ML Frameworks page for more information.
Head over to Google Colaboratory, select the Upload tab, and drag the Python notebook into the upload area.
The provided example uses the COCO data set to train the model. Running the notebook installs all the necessary libraries and mounts Google Drive.
You can configure your training by editing parameters such as the iteration count. The notebook also lists all available COCO data set classes.
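The exact parameter names depend on the notebook you are using; a configuration cell might look something like the following (all names and values here are hypothetical):

```python
# Hypothetical configuration cell - the real notebook defines its own names and defaults.
SELECTED_CLASSES = ["car", "truck", "bus"]  # COCO classes to detect
MAX_ITERATIONS = 20000                      # more iterations -> longer training, usually better results
BATCH_SIZE = 16
LEARNING_RATE = 1e-3
```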
With the files added, you can run the code by choosing Runtime > Run All in the menu bar. This process may take a while, as training a model is computationally intensive.
When using a data set to train your model, make sure that you adhere to the usage license of that dataset.
Downloading your Model
You can scroll to the Train Loop section of the notebook to see how your machine learning model is coming along. Once you are happy with the result, you can download your .onnx model!
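The notebook handles the export for you. For reference, if you ever need to export a PyTorch model to ONNX yourself, a minimal sketch looks like this; the model definition, input shape, and output names below are placeholders, not the template's actual values:

```python
import torch
import torch.nn as nn

# Stand-in for your trained detector; replace with the model produced by the notebook.
class DummyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
        self.loc_head = nn.Conv2d(8, 4, kernel_size=1)  # box coordinates per location
        self.cls_head = nn.Conv2d(8, 2, kernel_size=1)  # class scores per location

    def forward(self, x):
        features = self.backbone(x)
        return self.loc_head(features), self.cls_head(features)

model = DummyDetector().eval()
dummy_input = torch.randn(1, 3, 256, 128)  # batch, channels, height, width

torch.onnx.export(
    model,
    dummy_input,
    "detector.onnx",
    input_names=["input"],
    output_names=["loc", "cls"],  # placeholder names; match the Output Loc / Output Cls settings in Lens Studio
    opset_version=11,
)
```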
Importing your Model
Now that we have our model, we’ll import it into Lens Studio.
You can drag and drop your .onnx file into the Resources panel to bring it into Lens Studio.
Setting up MLComponent
If you are using the built in ML models, you can skip this section.
Next, we'll tell the template to use this model. In the Objects panel, select the ML Component. Then, in the Inspector panel, click the field next to Model and, in the pop-up window, choose your newly imported model.
Next, we'll set up the ML Component's input so that the image passed in matches how the model was trained.
The model that comes with the template uses the following input settings:
- Input shape is 128 x 256 with 3 channels (RGB), using the default model settings.
- DeviceCameraTexture is used as the input texture; that is, we pass the camera feed to the model.
- Input Transformer settings:
  - Stretch is turned off, because the detector works better if objects in the input texture preserve their original proportions.
  - Horizontal and Vertical alignments are set to Center.
  - Rotation is set to None.
  - Fill color is set to black.
- These transform settings take the original input texture and add padding where needed, depending on the device, to fit the aspect ratio of the input placeholder (in this case 128 x 256, aspect = 0.5). A sketch of this preprocessing is shown below.
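These Input Transformer settings amount to a letterbox-style resize. Lens Studio performs this for you; the minimal Python sketch below (using PIL and NumPy purely as illustrative tools) is only for intuition about what the transform does.

```python
import numpy as np
from PIL import Image

def letterbox(image: Image.Image, target_w: int = 128, target_h: int = 256) -> np.ndarray:
    """Resize without stretching, center the result, and pad with black (the fill color)."""
    scale = min(target_w / image.width, target_h / image.height)
    new_w, new_h = round(image.width * scale), round(image.height * scale)
    resized = image.resize((new_w, new_h))

    canvas = np.zeros((target_h, target_w, 3), dtype=np.uint8)  # black fill
    x0 = (target_w - new_w) // 2   # horizontal alignment: Center
    y0 = (target_h - new_h) // 2   # vertical alignment: Center
    canvas[y0:y0 + new_h, x0:x0 + new_w] = np.asarray(resized.convert("RGB"))
    return canvas
```

For example, a 720 x 1280 camera frame is scaled to 128 x 228 and padded with 14 rows of black at the top and bottom.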
As for the outputs, we will keep all the default settings, since the MLController script will process the raw output data.
Trying Example Models
Although the default ML model is set up for car detection, the template also comes with a food detection model. You can swap in the food detection model found under Example Assets/Food Detection [TRY_SWAPPING] in the Resources panel by setting it as the ML model in the Model field on the MLController [EDIT_ME] object.
MLController
If you are using the built in ML models, you can skip this section.
The MLController [EDIT_ME] object contains the ML Component and the MLController script, which controls the ML Component and processes its output.
MLController Script
By default, all you need to do is link your ML Component to the MLController script, but you can find more model-specific parameters by ticking Advanced.
The Output Cls and Output Loc fields need to have the same names as the corresponding outputs in your ML model. Leave them as they are if you are using the provided notebook for training.
Output Loc - the name of the MLComponent output that provides unprocessed detection locations.
Output Cls - the name of the MLComponent output that returns the scores (probabilities) of the detections.
Output Loc and Output Cls should have the same names as the outputs in your ML model; you can find the output names on the ML Component.
Confidence Threshold - defines the minimum score an unprocessed detection must have to be taken into account. Detections scoring below this threshold are skipped.
TopK - the number of highest-scoring detections to keep.
Loader - the UI element shown while the ML model is being loaded.
The machine learning model proposes bounding boxes of a certain class based on the input image. This script applies a Non-Maximum Suppression (NMS) algorithm to filter and post-process those detections: if the Intersection over Union (IoU) of two detected boxes is higher than a threshold, they are considered the same box.
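To make the post-processing concrete, here is a minimal NumPy sketch of score filtering, top-K selection, and NMS. It assumes boxes in (x1, y1, x2, y2) format and only illustrates the algorithm; it is not the template's actual MLController script.

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Intersection over Union between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def postprocess(boxes, scores, confidence_threshold=0.5, top_k=10, iou_threshold=0.45):
    """Filter by score, keep the top-K, then suppress overlapping boxes (NMS)."""
    keep = scores >= confidence_threshold          # drop low-confidence proposals
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort(scores)[::-1][:top_k]       # highest-scoring detections first
    boxes, scores = boxes[order], scores[order]

    selected = []
    while len(boxes) > 0:
        selected.append((boxes[0], scores[0]))     # keep the best remaining box
        overlap = iou(boxes[0], boxes[1:])
        remaining = overlap < iou_threshold        # boxes overlapping too much count as "the same box"
        boxes, scores = boxes[1:][remaining], scores[1:][remaining]
    return selected
```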
Customizing Your Lens Experience
Object Detection Controller
The Object Detection Controller contains the ObjectDetectionController script, which takes the processed detection boxes from the MLController script, instantiates the corresponding number of detection boxes, and controls their Screen Transform components.
Counter - the Text Component used to display the number of objects detected at the current moment.
Object To Copy - the object to duplicate. It must have a Screen Transform Component. By default it is set to the Detection Box [EDIT_CHILDREN] scene object.
Smoothing - determines the smoothing applied to the detection box anchor positions. The higher the number, the more smoothly and slowly the detection screen transforms move. A value of 0.0 means no smoothing.
Hint Controller - the HintController script that controls the hint displayed when no object is detected.
You can optionally fine-tune additional settings that help smooth detection box positions on the screen by ticking the Advanced checkbox:
Matching Threshold - sets the minimum ratio of the intersection of two processed detection boxes over their union (IoU) at which the two boxes are considered the same detection rather than different ones.
Lost Frame Threshold - determines how many frames the instantiated visual element is kept after the current detection is lost. A larger value means it takes longer for a box to be removed after an object is lost; a smaller value gives more instantaneous updates, while a larger value produces a smoother result. The sketch below shows how these settings can interact.
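For intuition, here is a simplified Python sketch of how IoU-based matching, smoothing, and the lost-frame counter can work together. It is an illustration only, not the template's ObjectDetectionController script; the data structures and default values are assumptions.

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_and_smooth(tracked, detections, matching_threshold=0.5,
                     smoothing=0.8, lost_frame_threshold=5):
    """One frame of tracking: match detections to tracked boxes by IoU,
    smooth matched positions, and drop boxes lost for too many frames.
    `tracked` is a list of {"box": [x1, y1, x2, y2], "lost": int} dicts."""
    updated, unmatched = [], list(detections)
    for track in tracked:
        best, best_iou = None, matching_threshold
        for det in unmatched:
            overlap = box_iou(track["box"], det)
            if overlap >= best_iou:
                best, best_iou = det, overlap
        if best is not None:
            unmatched.remove(best)
            # Exponential smoothing: a higher `smoothing` value -> slower, smoother motion.
            track_box = [smoothing * old + (1 - smoothing) * new
                         for old, new in zip(track["box"], best)]
            updated.append({"box": track_box, "lost": 0})
        elif track["lost"] + 1 < lost_frame_threshold:
            # Not matched this frame: keep the box until it has been lost for too long.
            updated.append({"box": track["box"], "lost": track["lost"] + 1})
    # Detections that matched nothing become newly tracked boxes.
    updated.extend({"box": list(det), "lost": 0} for det in unmatched)
    return updated
```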
Detection Box
Detection Box [EDIT_CHILDREN] is the object that is duplicated by the ObjectDetectionController script for each detected object.
The object detection model provides the positions of the detection boxes in screen space, so the Detection Box object uses a Screen Transform Component.
Using a Screen Transform lets you create a complex layout of 2D elements (Screen Images and Screen Text) inside it: the anchors of the Detection Box's Screen Transform are driven by a script, and all the child Screen Transforms adapt according to their own setup.
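As an illustration of what "driving the anchors" means, the sketch below converts a detection box given in normalized screen coordinates (0 to 1, top-left origin) to Screen Transform anchor values in the -1 to 1 range. The coordinate conventions here are assumptions for illustration; the template's own script handles this conversion for you.

```python
def box_to_anchors(x1, y1, x2, y2):
    """Map a detection box in normalized screen space (0..1, y increasing downward)
    to Screen Transform anchors (-1..1, y increasing upward from the center).
    The conventions here are illustrative assumptions."""
    left   = x1 * 2.0 - 1.0
    right  = x2 * 2.0 - 1.0
    top    = 1.0 - y1 * 2.0
    bottom = 1.0 - y2 * 2.0
    return {"left": left, "right": right, "top": top, "bottom": bottom}

# A box covering the right half of the screen, vertically centered:
print(box_to_anchors(0.5, 0.25, 1.0, 0.75))
# -> {'left': 0.0, 'right': 1.0, 'top': 0.5, 'bottom': -0.5}
```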
To see how your detection box will respond to objects of different sizes, click on the Detection Box [EDIT_CHILDREN] object and manipulate its anchors to see how they affect the children:
The Detection Box object provided in the template has several child objects:
- Small Hint (swap the texture on this object's Image Component). This image uses the Pin To Edge and Fix Size options so that its size and position stay constant on the screen.
- Frame. The frame is the visual surrounding the detection boxes. It is built from 8 parts - one for each edge and one for each corner - each using a different combination of Pin To Edge settings.
Refer to the Screen Transform guide to set up your custom children layout.
You can also swap the textures used for the frame: in the Resources panel, right-click a texture and select Relink to New Source. You can also modify the size of each frame part by changing the Padding setting of its Screen Transform Component.
Hint
To customize the hint shown when an object is not detected, select the Hint [EDIT_CHILDREN] object in the Objects panel.
The HintController script exposes an API that lets other scripts call functions to show and hide the HintSceneObject. To keep the hint from constantly popping up when detections are a bit noisy and disappear for a couple of frames, you can set the MinLostFrames parameter. If no detections are found for this number of frames, the hint will show up.
Enable the HideOnCapture checkbox if you don't want the hint to appear on the final Snap.
Expand the Hint [EDIT_CHILDREN] object's hierarchy to modify what the hint displays.
Select the Big Hint [EDIT_ME] object to change the image displayed: swap the Texture parameter of the Image Component of the Big Hint [EDIT_ME] object in the Inspector panel.
Similarly, change the text of the Text Component of the Hint Text [EDIT_ME] object in the Inspector panel.
Previewing Your Lens
You’re now ready to preview your Lens! To preview your Lens in Snapchat, follow the Pairing to Snapchat guide.
Related Guides
Please refer to the guides below for additional information: