MobileNet SSD Examples

These examples demonstrate how to deploy a MobileNet SSD V1 model using the NPU of an embedded platform such as a Maivin. The examples below are split into two parts: model inference on a single image and a model inference publisher. These examples have been tested with the TFLite files found in the SSD-TFLite repository and in ML-Zoo.

Note

In a future release of the EdgeFirst middleware, the Model Service will be able to run MobileNet SSD V1 models natively, using the examples above for testing.

Image Inference

This example runs inference with a MobileNet SSD V1 model on a sample picture. It has been modified from the Python script provided in the "ssd-tflite" repository to deploy the model with the OpenVX delegate so that it runs on the NPU. Minimal dependencies are used to ensure no additional venv/pip installations are needed; the only dependencies required are tflite_runtime, numpy, and pillow, which should already come pre-installed in the Maivin's BSP. Lastly, the model outputs are drawn onto the image for visualization.
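For reference, the imports the script relies on look roughly like the following; this is a minimal sketch, and the exact layout in run-tflite.py may differ slightly.

    import numpy as np
    from PIL import Image
    # tflite_runtime provides the interpreter and the external-delegate loader
    from tflite_runtime.interpreter import Interpreter, load_delegate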

For a quick demonstration, go to the "ssd-tflite" repository and download the following files.

  1. MobileNet SSD Model ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_18.tflite
  2. Sample Image

Download our Python Script for running the example.

Once the files have been downloaded, copy them to the embedded platform with scp.

Run the script with the command python3 run-tflite.py. The script should print the inference time in milliseconds and the model detections as follows.

$ python3 run-tflite.py
Vx delegate: allowed_cache_mode set to 0.
Vx delegate: device num set to 0.
Vx delegate: allowed_builtin_code set to 0.
Vx delegate: error_during_init set to 0.
Vx delegate: error_during_prepare set to 0.
Vx delegate: error_during_invoke set to 0.
WARNING: Fallback unsupported op 32 to TfLite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
W [HandleLayoutInfer:281]Op 162: default layout inference pass.
Time: 8 ms
Found objects:
   2 bicycle 0.74609375 [0.22455344 0.15004507 0.7796775  0.79142797]
   3 car 0.72265625 [0.13997436 0.603076   0.29882038 0.910733  ]
   18 dog 0.6328125 [0.34737465 0.17417745 0.9401951  0.4078675 ]
   3 car 0.60546875 [0.11568993 0.57076305 0.27453592 0.84135157]

Warning

When running the script, you may see some warning messages.

WARNING: Fallback unsupported op 32 to TfLite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
This is because the NMS embedded in the model falls back to the CPU. However, most of the backbone operations are still performed on the NPU.

This can be quickly verified by removing the delegate specification when loading the model. The model then runs entirely on the CPU, which is roughly 30x slower.
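For example, the CPU-only run can be reproduced by constructing the interpreter without the experimental_delegates argument; a minimal sketch, reusing the model_path variable from the breakdown below.

    # no external delegate is passed, so the model runs entirely on the CPU
    ip = Interpreter(model_path=model_path)
    ip.allocate_tensors()

The CPU run produces output similar to the following.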

INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Time: 225 ms
Found objects:
    2 bicycle 0.74609375 [0.23085383 0.1440031  0.77203083 0.7495213 ]
    3 car 0.72265625 [0.14111352 0.603076   0.29995954 0.910733  ]
    18 dog 0.6328125 [0.34737465 0.17417745 0.9401951  0.4078675 ]
    3 car 0.60546875 [0.11568993 0.57076305 0.27453592 0.84135157]

Box Format

The bounding boxes should be in the normalized format [ymin, xmin, ymax, xmax].

Furthermore, a new image should be saved img_vis.jpg showing the model output visualizations.

Model Inference

The following breakdown describes each step of the model inference performed by the script.

  1. Load the model specifying the external delegate to use the device's NPU.

    model_path = "ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_18.tflite"
    delegate = "/usr/lib/libvx_delegate.so"
    
    ext_delegate = load_delegate(delegate, {})
    ip = Interpreter(model_path=model_path, experimental_delegates=[ext_delegate])
    

    OpenVX Delegate

    The OpenVX delegate is specified with experimental_delegates=[ext_delegate]. To use the CPU, remove this specification.

  2. Call allocate_tensors() to allocate memory and set up the input/output tensor bindings.

    ip.allocate_tensors()
    
  3. Call invoke() once at the start as a model warmup since the first call may take up to 9 seconds to run.

    ip.invoke()
    
  4. Preprocess the input image by resizing it to the model's input shape and casting the values to the data type required by the model's input.

    image_path = "dog.jpg"
    
    input_det = ip.get_input_details()[0]
    _, height, width, _ = input_det.get("shape")
    img = np.array(Image.open(image_path).resize((width, height)))
    
    # is TFLite quantized int8 model
    int8 = input_det["dtype"] == np.int8
    # is TFLite quantized uint8 model
    uint8 = input_det["dtype"] == np.uint8
    if int8 or uint8:
        img = img.astype(np.uint8) if uint8 else img.astype(np.int8)
    else:
        img = img.astype(np.float32)
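    # add a batch dimension (HWC -> NHWC) as expected by the model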
    img = np.array([img]) 
    
  5. Query inputs from the model's input details and set the input tensor.

    inp_id = ip.get_input_details()[0]["index"]
    ip.set_tensor(inp_id, img)
    
  6. Run model inference by calling the invoke() function.

    ip.invoke()
    
  7. Query model outputs from the model's output details.

    out_det = ip.get_output_details()
    out_id0 = out_det[0]["index"]
    out_id1 = out_det[1]["index"]
    out_id2 = out_det[2]["index"]
    out_id3 = out_det[3]["index"]
    
    boxes = ip.get_tensor(out_id0).squeeze()
    classes = ip.get_tensor(out_id1).squeeze()
    scores = ip.get_tensor(out_id2).squeeze()
    num_det = ip.get_tensor(out_id3).squeeze()
    

These are the postprocessed model outputs, which can then be visualized.
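As a rough sketch of that visualization step, the normalized boxes can be scaled to pixel coordinates and drawn with Pillow's ImageDraw; the actual drawing code in run-tflite.py may differ, and the label names (e.g. bicycle, car) come from a COCO label map that is omitted here.

    from PIL import ImageDraw

    img_out = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img_out)
    w, h = img_out.size

    for i in range(int(num_det)):
        # boxes are normalized [ymin, xmin, ymax, xmax]; scale them to pixels
        ymin, xmin, ymax, xmax = boxes[i]
        rect = [xmin * w, ymin * h, xmax * w, ymax * h]
        draw.rectangle(rect, outline="red", width=2)
        draw.text((rect[0], rect[1]), f"{int(classes[i])}: {scores[i]:.2f}", fill="red")

    img_out.save("img_vis.jpg")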

Output Decoding

Output decoding, including NMS postprocessing, is already embedded inside the model. The model outputs a maximum of 10 detections. Currently there is no option to set NMS parameters such as the IoU and score thresholds with this model.
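Because these thresholds cannot be changed inside the model, any stricter filtering has to be done after inference; a minimal sketch, assuming a hypothetical 0.5 score cutoff.

    # keep only the reported detections that clear a chosen score cutoff
    n = int(num_det)
    keep = scores[:n] >= 0.5
    boxes, classes, scores = boxes[:n][keep], classes[:n][keep], scores[:n][keep]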

Publisher Server

Additionally, this can all be integrated to simulate the model service using any pre-defined SSD model; for this example we will use the pretrained TFLite SSD model. This can be done with the provided Publisher Script.

This script must be run on the target, as it uses the DMA Buffer topic to provide the images for the model. It also needs to be run with sudo, since it has to access the file descriptor used to get the DMA buffer and cannot do so without elevated privileges.

  1. Stop the current model service with the following command:

    sudo systemctl stop model
    
  2. You can then run the script using the following invocation:

    sudo -E python3 boxes2d_publisher.py --model model.tflite --threshold 0.5 --shape 300,300
    
    • the --model argument is the path to the SSD model used to perform inference and return boxes.
    • the --threshold argument sets the minimum score a box must have to be published by the server.
    • the --shape argument is the comma-delimited input height,width of the model (a sketch of how these flags might be parsed is shown after this list).
  3. The /rt/model/boxes2d topic will now be published once again and can be subscribed to by any other example.
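For illustration, the command-line flags above could be parsed along these lines; this is a hypothetical sketch and not an excerpt of boxes2d_publisher.py.

    import argparse

    parser = argparse.ArgumentParser(description="Publish SSD detections as boxes2d")
    parser.add_argument("--model", required=True, help="path to the SSD TFLite model")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="minimum score for a box to be published")
    parser.add_argument("--shape", default="300,300", help="model input as height,width")
    args = parser.parse_args()

    height, width = (int(v) for v in args.shape.split(","))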

Once you stop the publisher server, you should restart the model service with the following command.

sudo systemctl restart model