
LiteRT / TFLite

LiteRT (formerly TensorFlow Lite) is Google's high-performance inference runtime purpose-built for on-device AI. Using the LiteRT delegate integrated into AI Engine Direct, you can run existing quantized LiteRT models on the NPU of Dragonwing devices with a single line of code (Python and C++ are supported).

Quantized models

The NPU only supports uint8/int8 quantized models. Unsupported models or unsupported layers automatically fall back to the CPU. You can quantize a LiteRT model using quantization-aware training or post-training quantization; make sure to follow the steps for full integer quantization.

Info

Don't want to quantize models yourself? You can download a range of pre-quantized models from Qualcomm AI Hub, or use Edge Impulse to quantize an existing or new model.
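
For reference, below is a minimal post-training full integer quantization sketch using the TensorFlow converter API. The saved_model_dir path and the representative_dataset generator are placeholders; replace them with your own model and a few hundred real input samples:

import numpy as np
import tensorflow as tf

# Placeholder: yield representative samples shaped like your real model input.
def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full integer quantization so that every op is expressed in int8/uint8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_w8a8.tflite", "wb") as f:
    f.write(converter.convert())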

Running models on the NPU (Python)

To offload a model to the NPU, simply load the LiteRT delegate and pass it to the interpreter. For example:

from ai_edge_litert.interpreter import Interpreter, load_delegate

qnn_delegate = load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})
interpreter = Interpreter(
    model_path=...,
    experimental_delegates=[qnn_delegate]
)
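
If the delegate library cannot be loaded (for example, when developing on a machine without the QNN SDK installed), load_delegate raises an exception and you may want to fall back to the CPU. A minimal sketch of that pattern, with a make_interpreter helper that is only for illustration:

from ai_edge_litert.interpreter import Interpreter, load_delegate

def make_interpreter(model_path):
    # Try the QNN delegate first; fall back to CPU execution if it cannot be loaded.
    delegates = []
    try:
        delegates.append(load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"}))
    except (ValueError, RuntimeError, OSError) as e:  # exact exception type depends on the runtime build
        print(f"QNN delegate unavailable, running on CPU: {e}")
    interpreter = Interpreter(model_path=model_path, experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter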

Running models on the NPU (C++)

To offload a model to the NPU, first add the following build flags:

CFLAGS += -I${QNN_SDK_ROOT}/include
LDFLAGS += -L${QNN_SDK_ROOT}/lib/aarch64-ubuntu-gcc9.4 -lQnnTFLiteDelegate

Then instantiate the LiteRT delegate and pass it to the LiteRT interpreter:

// == Includes ==
#include "QNN/TFLiteDelegate/QnnTFLiteDelegate.h"

// == Application code ==

// Get your interpreter...
tflite::Interpreter *interpreter = ...;

// Create QNN Delegate options structure.
TfLiteQnnDelegateOptions options = TfLiteQnnDelegateOptionsDefault();

// Set the mandatory backend_type option. All other options have default values.
options.backend_type = kHtpBackend;

// Instantiate delegate. Must not be freed until interpreter is freed.
TfLiteDelegate* delegate = TfLiteQnnDelegateCreate(&options);

TfLiteStatus status = interpreter->ModifyGraphWithDelegate(delegate);
// Check that status == kTfLiteOk

Python example

Prerequisites

  • The Ubuntu operating system has been flashed.
  • Terminal access with appropriate permissions.
  • If you have not installed the PPA packages before, follow the steps below to install them:
    git clone -b ubuntu_setup --single-branch https://github.com/rubikpi-ai/rubikpi-script.git
    cd rubikpi-script
    ./install_ppa_pkgs.sh
  • Open a terminal on the board, or establish an SSH session, then do the following:
    Create a new virtual environment (venv) and install the LiteRT runtime and Pillow:
    python3 -m venv .venv-litert-demo --system-site-packages
    source .venv-litert-demo/bin/activate
    pip3 install ai-edge-litert==1.3.0 Pillow
    pip3 install opencv-python
  • Install the required python3 and GTK packages.
    sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
    sudo apt install python3-venv python3-full
    sudo apt install -y pkg-config cmake libcairo2-dev
    sudo apt install libgirepository1.0-dev gir1.2-glib-2.0
    sudo apt install build-essential python3-dev python3-pip pkg-config meson

Vision Transformers

The following shows how to run a Vision Transformer model (downloaded from AI Hub) on the CPU and the NPU using the LiteRT delegate.

1️⃣ Create inference_vit.py and add the following reference code:

import numpy as np
from ai_edge_litert.interpreter import Interpreter, load_delegate
from PIL import Image
import os, time, sys
import urllib.request

def curr_ms():
    return round(time.time() * 1000)

use_npu = True if len(sys.argv) >= 2 and sys.argv[1] == '--use-npu' else False

# Path to your quantized TFLite model and test image (will be downloaded automatically)
MODEL_PATH = "vit-vit-w8a8.tflite"
IMAGE_PATH = "boa-constrictor.jpg"
LABELS_PATH = "vit-vit-labels.txt"

if not os.path.exists(MODEL_PATH):
    print("Downloading model...")
    model_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/models/vit-vit-w8a8.tflite'
    urllib.request.urlretrieve(model_url, MODEL_PATH)

if not os.path.exists(LABELS_PATH):
    print("Downloading labels...")
    labels_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/models/vit-vit-labels.txt'
    urllib.request.urlretrieve(labels_url, LABELS_PATH)

if not os.path.exists(IMAGE_PATH):
    print("Downloading image...")
    image_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/examples/boa-constrictor.jpg'
    urllib.request.urlretrieve(image_url, IMAGE_PATH)

with open(LABELS_PATH, 'r') as f:
    labels = [line for line in f.read().splitlines() if line.strip()]

experimental_delegates = []
if use_npu:
    experimental_delegates = [load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})]

# Load TFLite model and allocate tensors
interpreter = Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=experimental_delegates
)
interpreter.allocate_tensors()

# Get input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Load and preprocess image
def load_image(path, input_shape):
    # Expected input shape: [1, height, width, channels]
    _, height, width, channels = input_shape

    img = Image.open(path).convert("RGB").resize((width, height))
    img_np = np.array(img, dtype=np.uint8)  # quantized models expect uint8
    img_np = np.expand_dims(img_np, axis=0)
    return img_np

input_shape = input_details[0]['shape']
input_data = load_image(IMAGE_PATH, input_shape)

# Set tensor and run inference
interpreter.set_tensor(input_details[0]['index'], input_data)

# Run once to warmup
interpreter.invoke()

# Then run 10x
start = curr_ms()
for i in range(0, 10):
    interpreter.invoke()
end = curr_ms()

# Get prediction
q_output = interpreter.get_tensor(output_details[0]['index'])
scale, zero_point = output_details[0]['quantization']
f_output = (q_output.astype(np.float32) - zero_point) * scale

# Image classification models in AI Hub are missing a Softmax() layer at the end of the model, so add it manually
def softmax(x, axis=-1):
    # subtract max for numerical stability
    x_max = np.max(x, axis=axis, keepdims=True)
    e_x = np.exp(x - x_max)
    return e_x / np.sum(e_x, axis=axis, keepdims=True)

# show top-5 predictions
scores = softmax(f_output[0])
top_k = scores.argsort()[-5:][::-1]
print("\nTop-5 predictions:")
for i in top_k:
    print(f"Class {labels[i]}: score={scores[i]}")

print('')
print(f'Inference took (on average): {(end - start) / 10}ms. per image')

2️⃣ Run the model on the CPU:

python3 inference_vit.py

# INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
#
# Top-5 predictions:
# Class boa constrictor: score=0.6264431476593018
# Class rock python: score=0.047579940408468246
# Class night snake: score=0.006721484009176493
# Class mouse: score=0.0022421202156692743
# Class pick: score=0.001942973816767335
#
# Inference took (on average): 391.1ms. per image

3️⃣ Run the model on the NPU:

python3 inference_vit.py --use-npu

# INFO: TfLiteQnnDelegate delegate: 1382 nodes delegated out of 1633 nodes with 27 partitions.
#
# INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
#
# Top-5 predictions:
# Class boa constrictor: score=0.6113042235374451
# Class rock python: score=0.038359832018613815
# Class night snake: score=0.011630792170763016
# Class mouse: score=0.002294909441843629
# Class lens cap: score=0.0018960189772769809
#
# Inference took (on average): 132.7ms. per image

As you can see, the model runs significantly faster on the NPU, but its output changes slightly. In addition, not all layers of this model can run on the NPU ("1382 nodes delegated out of 1633 nodes with 27 partitions").
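
To quantify how much the NPU output drifts from the CPU output, you can run the same input through a CPU interpreter and an NPU-delegated interpreter and compare the dequantized scores. A rough sketch, assuming it is appended to inference_vit.py so that MODEL_PATH, input_data, softmax, Interpreter and load_delegate are already in scope:

def dequantized_output(interpreter, data):
    # Run one inference and return the dequantized output tensor.
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp['index'], data)
    interpreter.invoke()
    scale, zero_point = out['quantization']
    return (interpreter.get_tensor(out['index']).astype(np.float32) - zero_point) * scale

cpu = Interpreter(model_path=MODEL_PATH)
npu = Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=[load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})]
)

cpu_scores = softmax(dequantized_output(cpu, input_data)[0])
npu_scores = softmax(dequantized_output(npu, input_data)[0])
print(f"Max absolute score difference (CPU vs NPU): {np.abs(cpu_scores - npu_scores).max():.4f}")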