
Running Quantized ModelPack on Target

If you have a quantized ModelPack in TFLite format, you can follow the instructions below to run the model on a target such as an i.MX 8M Plus EVK using a simple Python script.

You can download this Python script, this sample image IMG_9004.png, and this sample TFLite model to run the example on the target using the command below.

Specify model and image paths

If you have a specific model and a specific image, modify the paths to these files in the script.

model_path = "coffeecup-modelpack-multitask-t-1f54.tflite"
image_path = "IMG_9004.png"
# python3 run-tflite.py
INFO: Vx delegate: allowed_cache_mode set to 0.
INFO: Vx delegate: device num set to 0.
INFO: Vx delegate: allowed_builtin_code set to 0.
INFO: Vx delegate: error_during_init set to 0.
INFO: Vx delegate: error_during_prepare set to 0.
INFO: Vx delegate: error_during_invoke set to 0.
ERROR: Int64 output is not supported
ERROR: Int64 input is not supported
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
warning at CreateOutputsTensor, #90
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
Time: 37 ms
Found objects:
   1 label 0.6562237 [0.46047086 0.19032796 0.6446592  0.4359124 ]

A new image, img_vis.jpg, should be saved showing the model output visualizations.

Model Inference
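
The visualization is produced by run-tflite.py itself; the snippet below is only a minimal sketch of how detections could be drawn and saved as img_vis.jpg with Pillow, assuming normalized [xmin, ymin, xmax, ymax] boxes and the boxes, scores and classes arrays produced in step 8 of the walkthrough below.

    # Minimal sketch, not the exact visualization code in run-tflite.py
    from PIL import Image, ImageDraw

    def draw_detections(image_path, boxes, scores, classes, out_path="img_vis.jpg"):
        # Assumes boxes are normalized [xmin, ymin, xmax, ymax]
        image = Image.open(image_path).convert("RGB")
        draw = ImageDraw.Draw(image)
        for box, score, cls in zip(boxes, scores, classes):
            xmin, ymin, xmax, ymax = box
            # Scale normalized coordinates to pixel coordinates
            rect = (xmin * image.width, ymin * image.height,
                    xmax * image.width, ymax * image.height)
            draw.rectangle(rect, outline="red", width=3)
            draw.text((rect[0], rect[1]), f"{cls} {score:.2f}", fill="red")
        image.save(out_path)

After the postprocessing in step 8, such a helper could be called as draw_detections(image_path, boxes, scores, classes).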

Walkthrough of the run-tflite.py script

The run-tflite.py script executes the following steps to run the TFLite model on the i.MX 8M Plus OpenVX NPU.

  1. Load the model specifying the external delegate to use the device's NPU.

    # Interpreter and load_delegate are provided by the TFLite runtime
    # (tflite_runtime is assumed here; tf.lite exposes the same APIs)
    from tflite_runtime.interpreter import Interpreter, load_delegate

    model_path = "coffeecup-modelpack-multitask-t-1f54.tflite"
    delegate = "/usr/lib/libvx_delegate.so"
    
    ext_delegate = load_delegate(delegate, {})
    ip = Interpreter(model_path=model_path, experimental_delegates=[ext_delegate])
    

    OpenVX delegate

    The OpenVX delegate is specified with experimental_delegates=[ext_delegate]. To run on the CPU instead, remove the experimental_delegates argument.

  2. Allocate tensors, which allocates memory and sets up the input/output tensor bindings.

    ip.allocate_tensors()
    
  3. Call invoke() once at the start as a model warm-up, since the first call may take up to 9 seconds to run; a sketch of timing the warm-up against later calls is shown after this list.

    ip.invoke()
    
  4. Preprocess the input image by resizing it to the model's input shape and casting the values to the model's input data type.

    import numpy as np
    from PIL import Image

    image_path = "IMG_9004.png"
    
    input_det = ip.get_input_details()[0]
    _, height, width, _ = input_det.get("shape")
    image = Image.open(image_path)
    size = (image.height, image.width)  # original size, used later to resize the mask
    img = np.array(image.resize((width, height)))
    
    # is TFLite quantized int8 model
    int8 = input_det["dtype"] == np.int8
    scale, zp = input_det["quantization"]
    if int8:
        zp = abs(zp)
        img = (img.astype(np.int16) - zp).astype(np.int8)
    img = np.array([img])  # add the batch dimension
    
  5. Query the input index from the model's input details and set the input tensor.

    inp_id = ip.get_input_details()[0]["index"]
    ip.set_tensor(inp_id, img)
    
  6. Run model inference by calling the invoke() function.

    ip.invoke()
    
  7. Query and dequantize the model outputs.

    # nc (the number of classes) is defined elsewhere in run-tflite.py
    out_det = ip.get_output_details()

    box_id, score_id, mask_id = None, None, None
    outputs = []
    for i, out in enumerate(out_det):
        x = ip.get_tensor(out["index"])
    
        # Output Dequantization
        scale, zero_point = out["quantization"]
        if x.dtype != np.float32 and scale > 0:
            x = (x.astype(np.float32) - zero_point) * scale  # re-scale
        outputs.append(x)
    
        # Identify the box, score and mask outputs by their shapes
        shape = out.get("shape")
        if len(shape) == 4 and shape[-1] == 4:
            box_id = i
        elif len(shape) == 3:
            if shape[-1] == nc:
                score_id = i
            else:
                mask_id = i
    
  8. Postprocess the outputs and apply NMS; the numpy_nms and resize_mask helpers used below are sketched after this list.

    boxes = outputs[box_id][0]  # shape (n, 4)
    scores = outputs[score_id][0]  # shape (n, num_classes)
    masks = outputs[mask_id][0]  # shape (h, w)
    
    boxes = np.reshape(boxes, (-1, 4))
    scores = np.reshape(scores, (boxes.shape[0], -1))
    classes = np.argmax(scores, axis=1).astype(np.int32)
    
    # Prefilter boxes and scores by minimum score
    max_scores = np.max(scores, axis=1)
    filt = max_scores >= score_threshold
    
    # Prefilter the boxes, scores and classes IDs.
    scores = max_scores[filt]
    boxes = boxes[filt]
    classes = classes[filt]
    
    keep = numpy_nms(boxes, scores, iou_threshold=iou_threshold)
    boxes = boxes[keep]
    classes = classes[keep]
    scores = scores[keep]
    masks = resize_mask(masks, size)
    
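As noted in step 3, the first invoke() call is much slower than later ones because the graph is prepared for the NPU on first use. The following is a minimal sketch of measuring this, assuming the interpreter ip has been created and allocate_tensors() has been called as in steps 1 and 2.

    import time

    # The first invoke() includes one-time graph preparation on the NPU
    start = time.perf_counter()
    ip.invoke()
    print(f"Warm-up invoke: {time.perf_counter() - start:.2f} s")

    # Subsequent calls reflect steady-state inference latency
    start = time.perf_counter()
    ip.invoke()
    print(f"Second invoke: {time.perf_counter() - start:.2f} s")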

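The numpy_nms and resize_mask helpers referenced in step 8 are defined inside run-tflite.py. The sketches below show one way such helpers could be implemented, assuming [xmin, ymin, xmax, ymax] boxes and a single-channel mask; they are illustrations rather than the exact implementations from the script.

    import numpy as np
    from PIL import Image

    def numpy_nms(boxes, scores, iou_threshold=0.45):
        """Greedy non-maximum suppression; returns the indices of the boxes to keep."""
        x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]  # indices sorted by descending score
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # Intersection of the highest-scoring box with the remaining boxes
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
            # Keep only the boxes that overlap less than the threshold
            order = order[1:][iou < iou_threshold]
        return np.array(keep, dtype=np.int64)

    def resize_mask(mask, size):
        """Resize a 2-D mask to the original image size given as (height, width)."""
        height, width = size
        mask = np.asarray(mask, dtype=np.float32).squeeze()
        resized = Image.fromarray(mask).resize((width, height), Image.NEAREST)
        return np.array(resized)
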
Next Steps

In this section you have seen how to use a simple Python script to run model inference on a single input image. For an example of deploying the model on target with a live camera feed, proceed to Deploying Quantized Models in the EVK or Maivin.