Running Quantized ModelPack on Target
If you have a quantized ModelPack in TFLite format, you can follow the instructions below to run the model on a target such as an i.MX 8M Plus EVK using a simple Python script.
You can download this Python script, this sample image IMG_9004.png, and this sample TFLite model to run the example on the target using the command shown below.
Specify model and image paths
If you are using your own model and image, modify the paths to these files in the script.
```python
model_path = "coffeecup-modelpack-multitask-t-1f54.tflite"
image_path = "IMG_9004.png"
```
```
# python3 run-tflite.py
INFO: Vx delegate: allowed_cache_mode set to 0.
INFO: Vx delegate: device num set to 0.
INFO: Vx delegate: allowed_builtin_code set to 0.
INFO: Vx delegate: error_during_init set to 0.
INFO: Vx delegate: error_during_prepare set to 0.
INFO: Vx delegate: error_during_invoke set to 0.
ERROR: Int64 output is not supported
ERROR: Int64 input is not supported
Error in cpuinfo: prctl(PR_SVE_GET_VL) failed
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
warning at CreateOutputsTensor, #90
W [HandleLayoutInfer:332]Op 162: default layout inference pass.
Time: 37 ms
Found objects:
1 label 0.6562237 [0.46047086 0.19032796 0.6446592 0.4359124 ]
```
A new image, img_vis.jpg, should be saved showing the model's output visualizations.
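The visualization step is handled inside run-tflite.py. As a rough, non-authoritative sketch only (the helper name draw_detections, the normalized [x1, y1, x2, y2] box order, and the use of PIL are assumptions, not the script's actual API), the overlay could be produced along these lines:

```python
from PIL import Image, ImageDraw  # PIL is assumed here; the script may draw differently

def draw_detections(image_path, boxes, classes, scores, out_path="img_vis.jpg"):
    """Draw normalized [x1, y1, x2, y2] boxes with class/score labels and save the result."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    w, h = image.size
    for box, cls, score in zip(boxes, classes, scores):
        x1, y1, x2, y2 = box
        # Scale normalized coordinates to pixels before drawing
        draw.rectangle([x1 * w, y1 * h, x2 * w, y2 * h], outline="red", width=3)
        draw.text((x1 * w, max(0, y1 * h - 12)), f"{cls}: {score:.2f}", fill="red")
    image.save(out_path)
```

Verify the box ordering and scaling against the actual script before reusing a sketch like this; the coordinates in the console output above appear to be normalized to the image size.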
Walkthrough of the Run Model script
The run-tflite.py script executes the following steps to run the TFLite model on the i.MX 8M Plus OpenVX NPU.
- Load the model, specifying the external delegate to use the device's NPU.

  ```python
  model_path = "coffeecup-modelpack-multitask-t-1f54.tflite"
  delegate = "/usr/lib/libvx_delegate.so"
  ext_delegate = load_delegate(delegate, {})
  ip = Interpreter(model_path=model_path, experimental_delegates=[ext_delegate])
  ```

  OpenVX delegate
  The OpenVX delegate is specified with `experimental_delegates=[ext_delegate]`. To use the CPU, remove this argument.
- Allocate tensors, which reserves memory and sets up the input/output tensor bindings.

  ```python
  ip.allocate_tensors()
  ```
- Call invoke() once at the start as a model warmup, since the first call may take up to 9 seconds to run.

  ```python
  ip.invoke()
  ```
- Preprocess the input image by resizing it to the model's input shape and casting the values to the input data type required by the model.

  ```python
  image_path = "IMG_9004.png"
  input_det = ip.get_input_details()[0]
  _, height, width, _ = input_det.get("shape")
  image = Image.open(image_path)
  size = (image.height, image.width)
  img = np.array(image.resize((width, height)))

  # Shift values when the model expects a quantized int8 input
  int8 = input_det["dtype"] == np.int8
  scale, zp = input_det["quantization"]
  if int8:
      zp = abs(zp)
      img = (img.astype(np.int16) - zp).astype(np.int8)

  # Add the batch dimension
  img = np.array([img])
  ```
- Query the input from the model's input details and set the input tensor.

  ```python
  inp_id = ip.get_input_details()[0]["index"]
  ip.set_tensor(inp_id, img)
  ```
- Run model inference by calling the invoke() function.

  ```python
  ip.invoke()
  ```
- Query and dequantize the model outputs, then identify the boxes, scores, and masks outputs by their shapes (nc is the number of classes).

  ```python
  out_det = ip.get_output_details()

  box_id, score_id, mask_id = None, None, None
  outputs = []
  for i, out in enumerate(out_det):
      x = ip.get_tensor(out["index"])

      # Output dequantization
      scale, zero_point = out["quantization"]
      if x.dtype != np.float32 and scale > 0:
          x = (x.astype(np.float32) - zero_point) * scale  # re-scale
      outputs.append(x)

      # Identify each output by its shape
      shape = out.get("shape")
      if len(shape) == 4 and shape[-1] == 4:
          box_id = i
      elif len(shape) == 3:
          if shape[-1] == nc:
              score_id = i
          else:
              mask_id = i
  ```
- Postprocess the outputs and apply NMS.

  ```python
  boxes = outputs[box_id][0]     # shape (n, 4)
  scores = outputs[score_id][0]  # shape (n, num_classes)
  masks = outputs[mask_id][0]    # shape (h, w)

  boxes = np.reshape(boxes, (-1, 4))
  scores = np.reshape(scores, (boxes.shape[0], -1))
  classes = np.argmax(scores, axis=1).astype(np.int32)

  # Prefilter the boxes, scores, and class IDs by minimum score
  max_scores = np.max(scores, axis=1)
  filt = max_scores >= score_threshold
  scores = max_scores[filt]
  boxes = boxes[filt]
  classes = classes[filt]

  # Non-maximum suppression, then resize the masks to the original image size
  keep = numpy_nms(boxes, scores, iou_threshold=iou_threshold)
  boxes = boxes[keep]
  classes = classes[keep]
  scores = scores[keep]
  masks = resize_mask(masks, size)
  ```
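The numpy_nms and resize_mask helpers are defined elsewhere in run-tflite.py and are not reproduced here. For reference only, a minimal greedy NMS in NumPy could look like the sketch below; it assumes boxes in [x1, y1, x2, y2] order, which may differ from the script's actual helper.

```python
import numpy as np

def numpy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes.

    Returns the indices of the boxes to keep, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the highest-scoring box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        # Drop boxes that overlap the selected box too much
        order = order[1:][iou <= iou_threshold]
    return np.array(keep, dtype=np.int32)
```

Because the boxes here are normalized to [0, 1], the IoU computation works the same as it would in pixel coordinates.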
Next Steps
In this section, you have seen how to use a simple Python script to run model inference on a single input image. For an example of deploying the model on target with a live camera feed, proceed to Deploying Quantized Models in the EVK or Maivin.