LiteRT / TFLite
LiteRT, formerly known as TensorFlow Lite, is Google's high-performance runtime for on-device AI. You can run existing quantized LiteRT models (from Python or C++) on the NPU of Dragonwing devices with a single line of code, using the LiteRT delegates that ship with AI Engine Direct.
Quantizing models
The NPU only supports uint8/int8 quantized models. Unsupported models or unsupported layers automatically fall back to the CPU. You can use quantization-aware training or post-training quantization to quantize your LiteRT models; make sure you follow the steps for "Full integer quantization".
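For reference, here's what post-training full integer quantization can look like with the TensorFlow Lite converter. This is a minimal sketch: the SavedModel path, input shape, and representative dataset below are placeholders, not values from this guide.
import numpy as np
import tensorflow as tf

# Representative dataset: yield a handful of batches that look like real model inputs,
# so the converter can calibrate activation ranges
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]  # placeholder input shape

converter = tf.lite.TFLiteConverter.from_saved_model("my_saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force full integer quantization so weights, activations and I/O are all int8/uint8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())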
Don't want to quantize yourself? You can download a range of pre-quantized models from Qualcomm AI Hub, or use Edge Impulse to quantize new or existing models.
Running a model on the NPU (Python)
To offload a model to the NPU, you just need to load the LiteRT delegate and pass it into the interpreter. For example:
from ai_edge_litert.interpreter import Interpreter, load_delegate
qnn_delegate = load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})
interpreter = Interpreter(
    model_path=...,
    experimental_delegates=[qnn_delegate]
)
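The rest of the workflow is identical to any other LiteRT interpreter. A minimal sketch, assuming a quantized model at model.tflite (placeholder path) with a single uint8 input tensor:
import numpy as np
from ai_edge_litert.interpreter import Interpreter, load_delegate

qnn_delegate = load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})
interpreter = Interpreter(model_path="model.tflite", experimental_delegates=[qnn_delegate])
interpreter.allocate_tensors()

# Feed a dummy uint8 tensor that matches the model's input shape
input_details = interpreter.get_input_details()
interpreter.set_tensor(input_details[0]['index'],
                       np.zeros(input_details[0]['shape'], dtype=np.uint8))
interpreter.invoke()

# Read back the (still quantized) output
output_details = interpreter.get_output_details()
print(interpreter.get_tensor(output_details[0]['index']).shape)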
Running a model on the NPU (C++)
To offload a model to the NPU, you'll first need to add the following compile flags:
CFLAGS += -I${QNN_SDK_ROOT}/include
LDFLAGS += -L${QNN_SDK_ROOT}/lib/aarch64-ubuntu-gcc9.4 -lQnnTFLiteDelegate
Then, you instantiate the LiteRT delegate and pass it to the LiteRT interpreter:
// == Includes ==
#include "QNN/TFLiteDelegate/QnnTFLiteDelegate.h"
// == Application code ==
// Get your interpreter...
tflite::Interpreter *interpreter = ...;
// Create QNN Delegate options structure.
TfLiteQnnDelegateOptions options = TfLiteQnnDelegateOptionsDefault();
// Set the mandatory backend_type option. All other options have default values.
options.backend_type = kHtpBackend;
// Instantiate delegate. Must not be freed until interpreter is freed.
TfLiteDelegate* delegate = TfLiteQnnDelegateCreate(&options);
TfLiteStatus status = interpreter->ModifyGraphWithDelegate(delegate);
// Check that status == kTfLiteOk
Python Examples
Prerequisites
- Ubuntu OS flashed to the device
- Terminal access with appropriate permissions
- If you haven’t previously installed the PPA packages, please run the following steps to install them.
git clone -b ubuntu_setup --single-branch https://github.com/rubikpi-ai/rubikpi-script.git
cd rubikpi-script
./install_ppa_pkgs.sh
- Open a terminal on your development board (or an SSH session to it), then create a new venv and install the LiteRT runtime, Pillow, and OpenCV (an optional sanity check is sketched after this list):
python3 -m venv .venv-litert-demo --system-site-packages
source .venv-litert-demo/bin/activate
pip3 install ai-edge-litert==1.3.0 Pillow
pip3 install opencv-python
- Install the necessary Python 3 and GTK packages:
sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
sudo apt install python3-venv python3-full
sudo apt install -y pkg-config cmake libcairo2-dev
sudo apt install libgirepository1.0-dev gir1.2-glib-2.0
sudo apt install build-essential python3-dev python3-pip pkg-config meson
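Optionally, verify the setup before moving on. Below is a minimal sanity-check sketch (the file name check_setup.py is illustrative, not part of the official setup scripts) that confirms the LiteRT runtime is installed in the venv and the QNN delegate from AI Engine Direct can be loaded:
# check_setup.py - optional sanity check, run inside the activated venv
from ai_edge_litert.interpreter import load_delegate

try:
    # Loading the QNN delegate confirms AI Engine Direct is installed and on the library path
    load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})
    print("QNN delegate loaded - NPU offload should be available")
except Exception as e:
    print(f"Could not load the QNN delegate: {e}")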
- Vision Transformers
- Image Classification
- Object Detection
Vision Transformers
Here's how you can run a Vision Transformer model (downloaded from AI Hub) on both the CPU and the NPU using the LiteRT delegates.
1️⃣ Create inference_vit.py and add the following reference code:
import numpy as np
from ai_edge_litert.interpreter import Interpreter, load_delegate
from PIL import Image
import os, time, sys
import urllib.request
def curr_ms():
    return round(time.time() * 1000)
use_npu = len(sys.argv) >= 2 and sys.argv[1] == '--use-npu'
# Path to your quantized TFLite model and test image (will be downloaded automatically)
MODEL_PATH = "vit-vit-w8a8.tflite"
IMAGE_PATH = "boa-constrictor.jpg"
LABELS_PATH = "vit-vit-labels.txt"
if not os.path.exists(MODEL_PATH):
    print("Downloading model...")
    model_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/models/vit-vit-w8a8.tflite'
    urllib.request.urlretrieve(model_url, MODEL_PATH)
if not os.path.exists(LABELS_PATH):
    print("Downloading labels...")
    labels_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/models/vit-vit-labels.txt'
    urllib.request.urlretrieve(labels_url, LABELS_PATH)
if not os.path.exists(IMAGE_PATH):
    print("Downloading image...")
    image_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/examples/boa-constrictor.jpg'
    urllib.request.urlretrieve(image_url, IMAGE_PATH)
with open(LABELS_PATH, 'r') as f:
    labels = [line for line in f.read().splitlines() if line.strip()]
experimental_delegates = []
if use_npu:
    experimental_delegates = [load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})]
# Load TFLite model and allocate tensors
interpreter = Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=experimental_delegates
)
interpreter.allocate_tensors()
# Get input and output tensor details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Load and preprocess image
def load_image(path, input_shape):
    # Expected input shape: [1, height, width, channels]
    _, height, width, channels = input_shape
    img = Image.open(path).convert("RGB").resize((width, height))
    img_np = np.array(img, dtype=np.uint8)  # quantized models expect uint8
    img_np = np.expand_dims(img_np, axis=0)
    return img_np
input_shape = input_details[0]['shape']
input_data = load_image(IMAGE_PATH, input_shape)
# Set tensor and run inference
interpreter.set_tensor(input_details[0]['index'], input_data)
# Run once to warmup
interpreter.invoke()
# Then run 10x
start = curr_ms()
for i in range(0, 10):
    interpreter.invoke()
end = curr_ms()
# Get prediction
q_output = interpreter.get_tensor(output_details[0]['index'])
scale, zero_point = output_details[0]['quantization']
f_output = (q_output.astype(np.float32) - zero_point) * scale
# Image classification models from AI Hub lack a Softmax() layer at the end of the model, so apply it manually
def softmax(x, axis=-1):
    # subtract max for numerical stability
    x_max = np.max(x, axis=axis, keepdims=True)
    e_x = np.exp(x - x_max)
    return e_x / np.sum(e_x, axis=axis, keepdims=True)
# show top-5 predictions
scores = softmax(f_output[0])
top_k = scores.argsort()[-5:][::-1]
print("\nTop-5 predictions:")
for i in top_k:
    print(f"Class {labels[i]}: score={scores[i]}")
print('')
print(f'Inference took (on average): {(end - start) / 10}ms. per image')
2️⃣ Run the model on the CPU:
python3 inference_vit.py
# INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
#
# Top-5 predictions:
# Class boa constrictor: score=0.6264431476593018
# Class rock python: score=0.047579940408468246
# Class night snake: score=0.006721484009176493
# Class mouse: score=0.0022421202156692743
# Class pick: score=0.001942973816767335
#
# Inference took (on average): 391.1ms. per image
3️⃣ Run the model on the NPU:
python3 inference_vit.py --use-npu
# INFO: TfLiteQnnDelegate delegate: 1382 nodes delegated out of 1633 nodes with 27 partitions.
#
# INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
#
# Top-5 predictions:
# Class boa constrictor: score=0.6113042235374451
# Class rock python: score=0.038359832018613815
# Class night snake: score=0.011630792170763016
# Class mouse: score=0.002294909441843629
# Class lens cap: score=0.0018960189772769809
#
# Inference took (on average): 132.7ms. per image
As you can see, this model runs significantly faster on the NPU, but there's a slight change in the model's output. You can also see that not all layers of this model can run on the NPU ("1382 nodes delegated out of 1633 nodes with 27 partitions"); the remaining nodes fall back to the CPU.
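If you want to quantify that difference for your own model, a small sketch (not part of the example above; the helper below is illustrative) can run the same preprocessed input through a CPU interpreter and an NPU interpreter and compare the dequantized outputs:
import numpy as np
from ai_edge_litert.interpreter import Interpreter, load_delegate

def dequantized_output(delegates, model_path, input_data):
    # Build an interpreter with the given delegates, run once, and dequantize the output
    interpreter = Interpreter(model_path=model_path, experimental_delegates=delegates)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp['index'], input_data)
    interpreter.invoke()
    scale, zero_point = out['quantization']
    return (interpreter.get_tensor(out['index']).astype(np.float32) - zero_point) * scale

# MODEL_PATH and input_data as in inference_vit.py above
cpu_out = dequantized_output([], MODEL_PATH, input_data)
npu_out = dequantized_output([load_delegate("libQnnTFLiteDelegate.so", options={"backend_type": "htp"})],
                             MODEL_PATH, input_data)

print("max abs difference:", np.max(np.abs(cpu_out - npu_out)))
print("top-1 agreement:", np.argmax(cpu_out) == np.argmax(npu_out))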
GTK-Based Image Classification App
Here's how you can use a GTK-based desktop application to run an image classification model (downloaded from AI Hub) on both the CPU and the NPU, using the LiteRT delegates from AI Engine Direct.
The GoogLeNet_w8a8.tflite model, sourced from AI Hub, leverages TensorFlow Lite with QNN delegate acceleration to enable efficient on-device inference.
Now follow the steps below to create the image classification application.
Summary
- Type: Desktop GUI application
- Functionality: Image classification using TFLite
- Modes: CPU and QNN delegate
- Interface: GTK-based GUI
- Output: Top predictions with confidence bars
Environment Setup & Imports
In this step, the script sets up display-related environment variables (for Linux systems) and imports the necessary libraries: OpenCV, NumPy, GTK, and the LiteRT (TensorFlow Lite) interpreter.
import cv2, numpy as np, os, time
import gi
gi.require_version("Gtk", "3.0")  # select GTK 3 before importing from gi.repository
from gi.repository import Gtk, GLib, GdkPixbuf
import ai_edge_litert.interpreter as tflite
- GTK is used for the GUI, OpenCV for image handling, and TensorFlow Lite for inference.
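The display-related environment variables mentioned above are not shown in this excerpt. A minimal sketch of what such setup might look like, assuming an X11 session on the device (the values are illustrative):
import os

# Illustrative only: make sure GTK can find a display when the app is launched over SSH
os.environ.setdefault("DISPLAY", ":0")
os.environ.setdefault("GDK_BACKEND", "x11")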
Configuration Constants
These constants define paths to the model, label file, and delegate library.
TF_MODEL = "/home/ubuntu/GoogLeNet_w8a8.tflite"
LABELS = "/etc/labels/imagenet_labels.txt"
DELEGATE_PATH = "libQnnTFLiteDelegate.so"
DEVICE_OS = "Ubuntu"
Download the TFLite Model
This script checks if a TensorFlow Lite model file exists locally, and if not, downloads it from a specified Hugging Face URL.
import urllib.request
if not os.path.exists(TF_MODEL):
    print("Downloading model...")
    model_url = 'https://huggingface.co/qualcomm/GoogLeNet/resolve/main/GoogLeNet_w8a8.tflite'
    urllib.request.urlretrieve(model_url, TF_MODEL)
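If /etc/labels/imagenet_labels.txt is not present on your image, one option (a suggestion, not part of the original script, and assuming the same ImageNet-1k label ordering) is to reuse the label file from the ViT example above:
# Fallback: reuse the ImageNet label file from the ViT example if the system copy is missing
if not os.path.exists(LABELS):
    LABELS = "vit-vit-labels.txt"
    if not os.path.exists(LABELS):
        print("Downloading labels...")
        labels_url = 'https://cdn.edgeimpulse.com/qc-ai-docs/models/vit-vit-labels.txt'
        urllib.request.urlretrieve(labels_url, LABELS)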
Helper Functions
This step sets up the core logic and interface for image classification using LiteRT and GTK.
Softmax Calculation
Ensures numerical stability when converting logits to probabilities.
def stable_softmax(logits):
    logits = logits.astype(np.float32)
    shifted_logits = np.clip(logits - np.max(logits), -500, 500)
    exp_scores = np.exp(shifted_logits)
    return exp_scores / np.sum(exp_scores)
Label Loader
Loads class labels from a text file.
def load_labels(label_path):
    with open(label_path, 'r') as f:
        return [line.strip() for line in f.readlines()]
Image Preprocessing
Prepares the image for model input: resizing, color conversion, and reshaping.
def preprocess_image(image_path, input_shape, input_dtype):
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (input_shape[2], input_shape[1]))
    img = img.astype(input_dtype)
    return np.expand_dims(img, axis=0)
Inference Execution
This function:
- Loads the model (with or without the delegate)
- Prepares the input
- Runs inference
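A minimal sketch of what such a function might look like, built from the helpers and constants above (the name run_inference and the exact structure are illustrative, not the original implementation):
def run_inference(image_path, use_npu=False):
    # Load the model, attaching the QNN delegate when NPU mode is selected
    delegates = []
    if use_npu:
        delegates = [tflite.load_delegate(DELEGATE_PATH, options={"backend_type": "htp"})]
    interpreter = tflite.Interpreter(model_path=TF_MODEL, experimental_delegates=delegates)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Prepare the input and run inference
    input_data = preprocess_image(image_path, input_details[0]['shape'], input_details[0]['dtype'])
    interpreter.set_tensor(input_details[0]['index'], input_data)
    start = time.time()
    interpreter.invoke()
    elapsed_ms = (time.time() - start) * 1000

    # Convert raw outputs to probabilities and return the top-5 predictions
    probs = stable_softmax(interpreter.get_tensor(output_details[0]['index'])[0])
    labels = load_labels(LABELS)
    top_k = probs.argsort()[-5:][::-1]
    return [(labels[i], float(probs[i])) for i in top_k], elapsed_ms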