AI Developer Workflow
The AI/ML developer workflow on Qualcomm Linux has two major steps:
Step 1:
Compile and optimize a model
- Compile and optimize the model from a third-party AI framework so that it runs efficiently on RUBIK Pi 3. For example, a TensorFlow model can be exported to a TFLite model.
- Optionally, quantize the model and fine-tune its performance and accuracy using hardware-specific customizations.
Step 2:
Build an application to use the optimized model to run on-device inference
- Integrate the AI model into the use case pipeline in a chi-cdk node.
- Alternatively, use the Android native SDK to call the Qualcomm AI SDKs.
Overview
Developers can bring models from ONNX, PyTorch, TensorFlow, or TFLite and run them efficiently on Qualcomm AI hardware – HTP (NPU), GPU, and CPU – using the Qualcomm AI SDKs.
AI hardware
- Qualcomm Kryo™ CPU – Best-in-class CPU with high performance and remarkable power efficiency.
- Qualcomm Adreno GPU – Suitable for executing AI workloads with balanced power and performance. AI workloads are accelerated with OpenCL kernels. The GPU can also be used to accelerate model pre/post-processing.
- Qualcomm Hexagon Tensor Processor (HTP) – Also known as NPU/DSP/HMX, suitable for executing AI workloads with low power and high performance. For optimized performance, pretrained models need to be quantized to one of the supported precisions.
AI software
The AI stack contains SDKs that harness the power of the AI hardware accelerators. Developers can use the SDK of their choice to deploy AI workloads. Pretrained models (except TFLite models) need to be converted to an executable format with the selected SDK before they can be run. TFLite models can be run directly using a TFLite delegate.
- TFLite
TFLite models can be executed natively on RUBIK Pi 3 with acceleration using the following Delegates.
Delegate | Acceleration |
---|---|
AI Engine Direct Delegate (QNN Delegate) | CPU, GPU and HTP |
XNNPACK Delegate | CPU |
GPU Delegate | GPU |
- Qualcomm Neural Processing Engine (SNPE) SDK
Qualcomm Neural Processing Engine (SNPE) is a software-accelerated runtime for the execution of deep neural networks. SNPE offers tools to convert and quantize neural networks and to accelerate them on hardware accelerators, including the CPU, GPU, and HTP.
- Qualcomm AI Engine Direct (QNN)
Qualcomm AI Engine Direct is a software architecture for AI/ML use cases on Qualcomm chipsets and AI acceleration cores. The architecture is designed to provide a unified API and modular, extensible per-accelerator libraries, which form a reusable basis for full-stack AI solutions. It provides support for runtimes such as the Qualcomm Neural Processing SDK and the TFLite AI Engine Direct Delegate.
Compile and optimize model
Users can take one of two paths to compile and optimize their models:
- AI Hub
- AI Software Stack
AI Hub
For quick prototyping of models on Qualcomm AI hardware, AI Hub provides a way to optimize, validate, and deploy machine learning models on-device for vision, audio, and speech use cases.
Refer to the AI Hub documentation on setup and getting started.
Setup
- Set up your Python environment.
Install miniconda on your machine.
Windows: When the installation finishes, open the Anaconda Prompt from the Start menu.
macOS/Linux: When the installation finishes, open a new shell window.
Set up a Python virtual environment for Qualcomm AI Hub:
source <path>/miniconda3/bin/activate
conda create python=3.8 -n qai_hub
conda activate qai_hub
- Install the AI Hub Python client.
pip3 install qai-hub
pip3 install "qai-hub[torch]"
- Sign in to AI Hub.
Go to AI Hub and sign in with your Qualcomm ID to view information about jobs you create. Once signed in, navigate to Account > Settings > API Token. This should provide an API token that you can use to configure your client.
- Configure the client with your API token using the following command in your terminal.
qai-hub configure --api_token <INSERT_API_TOKEN>
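To verify that the client is configured, you can list the devices available to your account from Python. The following is a minimal sketch using the qai-hub client; it only prints device names and attributes.

import qai_hub as hub

# Simple configuration check: list the devices visible to your AI Hub account.
for device in hub.get_devices():
    print(device.name, device.attributes)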
AI Hub workflow
Try a preoptimized model
Navigate to AI Hub Model Zoo to access preoptimized models available for RUBIK Pi 3.
Click Download model to begin the model download. The downloaded model is already preoptimized and ready for the Develop your own application step.
Bring your own model
- Select a pretrained model in PyTorch or ONNX format.
- Submit the model to AI Hub for compilation or optimization using the Python APIs.
When submitting a compilation job, you must select a device or chipset and the target runtime to compile the model for. For RUBIK Pi 3, the TFLite runtime is supported.
Chipset | Runtime | CPU | GPU | HTP |
---|---|---|---|---|
QCS6490 | TFLite | INT8, FP16, FP32 | FP16, FP32 | INT8, INT16 |
On submission, AI Hub generates a unique ID for the job. You can use this job ID to view job details.
- AI Hub optimizes the model based on your device and runtime selections.
Optionally, you can submit a job to profile the optimized model or run inference with it (using the Python APIs) on a real device provisioned from a device farm.
- Profiling: Benchmarks the model on a provisioned device and provides statistics, including average inference times at the layer level, runtime configuration, etc.
- Inference: Performs inference using the optimized model on data submitted as part of the inference job, by running the model on a provisioned device.
- Each submitted job will be available for review in the AI Hub portal. A submitted compilation job will provide a downloadable link to the optimized model. This optimized model can then be deployed on a local development device like RUBIK Pi 3.
The following is an example of the described workflow taken from the AI Hub documentation. In this example, a MobileNet V2 pretrained model from PyTorch is uploaded to AI Hub and compiled to an optimized TFLite model to run on a RUBIK Pi 3 target.
import qai_hub as hub
import torch
from torchvision.models import mobilenet_v2
import numpy as np

# Using pre-trained MobileNet
torch_model = mobilenet_v2(pretrained=True)
torch_model.eval()

# Trace model (for on-device deployment)
input_shape = (1, 3, 224, 224)
example_input = torch.rand(input_shape)
traced_torch_model = torch.jit.trace(torch_model, example_input)

# Compile and optimize the model for a specific device
compile_job = hub.submit_compile_job(
    model=traced_torch_model,
    device=hub.Device("QCS6490 (Proxy)"),
    input_specs=dict(image=input_shape),
    # compile_options="--target_runtime tflite",
)

# Profiling Job
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=hub.Device("QCS6490 (Proxy)"),
)

# Inference Job
sample = np.random.random((1, 3, 224, 224)).astype(np.float32)
inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    device=hub.Device("QCS6490 (Proxy)"),
    inputs=dict(image=[sample]),
)

# Download model
compile_job.download_target_model(filename="/tmp/mobilenetv2.tflite")
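Once these jobs complete, their results can also be pulled back for inspection. The following is a minimal sketch that reuses the profile_job and inference_job objects from the example above; the method names follow the AI Hub Python client, but check the AI Hub API reference for the exact result-retrieval calls in your client version.

# Retrieve results from the jobs created above (assumes profile_job and
# inference_job from the previous example are still in scope).
profile_data = profile_job.download_profile()       # per-layer timing and runtime details
output_data = inference_job.download_output_data()  # outputs produced on the provisioned device
print(type(profile_data), type(output_data))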
To deactivate a previously activated qai_hub environment, use the following command.
conda deactivate
Once the model is downloaded, it is ready for developing your own application. For more details about the AI Hub workflow and APIs, refer to the AI Hub Documentation.
TFLite
TensorFlow Lite (TFLite) is an open-source deep learning framework for on-device inference. TFLite helps developers run their models on mobile, embedded, and edge platforms by optimizing the model for latency, model size, power consumption, etc. Qualcomm supports executing TFLite models natively on Qualcomm Linux hardware through TFLite Delegates as listed below.
Delegate | Acceleration |
---|---|
AI Engine Direct Delegate (QNN Delegate) | CPU, GPU and HTP |
XNNPACK Delegate | CPU |
GPU Delegate | GPU |
Reference Guide | API Reference |
---|---|
Reference Guide | C/C++ |
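As an illustration of how a delegate is attached at runtime, the sketch below loads a TFLite model with an external delegate through the tflite_runtime Python API. The delegate library name (libQnnTFLiteDelegate.so) and the backend_type option are assumptions about how the AI Engine Direct Delegate is packaged; consult the reference guide above for the exact library name and options on your build.

import numpy as np
import tflite_runtime.interpreter as tflite

# Load the QNN (AI Engine Direct) delegate; library name and option are assumptions.
qnn_delegate = tflite.load_delegate("libQnnTFLiteDelegate.so",
                                    options={"backend_type": "htp"})

# Attach the delegate to the interpreter so supported ops run on the accelerator.
interpreter = tflite.Interpreter(model_path="mobilenetv2.tflite",
                                 experimental_delegates=[qnn_delegate])
interpreter.allocate_tensors()

# Run one inference with random input data shaped like the model input.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_data = np.random.random_sample(tuple(input_details[0]["shape"])).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]).shape)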
Qualcomm Neural Processing Engine (SNPE)
Qualcomm Neural Processing Engine (also known as Snapdragon Neural Processing Engine or SNPE) is an all-in-one SDK to port ML models to run on Qualcomm hardware accelerators. SNPE offers tools to convert and quantize models trained in PyTorch and TensorFlow and offers a runtime to execute these models on CPU, GPU, and HTP. Learn more about SNPE SDK here.
Setup | Customize | Run Inference | API Reference |
---|---|---|---|
Model deployment | C/C++ |
Install Qualcomm Neural Processing Engine SDK
Qualcomm Neural Processing Engine SDK requires an Ubuntu 22.04 host machine.
If the host machine uses Windows or macOS operating system software, install a virtual machine following the steps provided here. Subsequent steps must be run in the virtual machine running Ubuntu 22.04 LTS.
The Qualcomm Neural Processing SDK workflow has been validated with Ubuntu 22.04 running on bare metal or inside a Virtual Machine in the following configurations.
Ubuntu 22.04 running inside a VM on macOS on Arm architecture is not supported.
Download Method | Prerequisites | Version available |
---|---|---|
Direct Download | No prerequisites for download | v2.22.6 (SDK updated every quarter) |
Qualcomm Package Manager | A valid Qualcomm ID and the Qualcomm Package Manager tool are needed. | SDK updated every month |
Direct Download
Qualcomm Neural Processing Engine SDK is available for direct download here. Once downloaded, extract or unzip the SDK.
The SDK version hosted at the above link is updated quarterly.
unzip 2.22.6.240515.zip
cd qairt/2.22.6.240515/
export SNPE_ROOT=`pwd`
Qualcomm Package Manager
Qualcomm Neural Processing Engine SDK is available for download via Qualcomm Package Manager (QPM). This section demonstrates how to download Qualcomm Neural Processing Engine SDK using QPM.
To download Qualcomm Neural Processing SDK from Qualcomm Package Manager, ensure that you have registered for a Qualcomm ID. If you do not have a Qualcomm ID, you will be prompted to register. Then follow the instructions below to download and install the SDK.
- Go to Qualcomm Package Manager and log in using your Qualcomm ID.
- Navigate to the Tools tab and search for AI Stack in the left pane. From the filtered results, click the arrow next to Qualcomm® AI Stack to expand the list of available AI Software SDKs and select Qualcomm Neural Processing SDK from the list.
- Click Qualcomm Neural Processing SDK to navigate to the next page. Choose Linux and Version 2.22.6.240515 from the drop-down lists, and then click Download to download the Qualcomm Neural Processing SDK installer.
- These instructions use version 2.22.6.240515 as an example.
- If you are using the Qualcomm Package Manager desktop tool, an Extract button appears instead of Download; extracting automatically installs the SNPE SDK.
- Install the downloaded installer (.qik) using the QPM CLI tool.
- Log in using the QPM CLI.
qpm-cli --login <username>
- Activate your SDK license.
qpm-cli --license-activate qualcomm_neural_processing_sdk
- Extract and install the SDK.
qpm-cli --extract <path to downloaded .qik file>
Qualcomm Neural Processing SDK will be installed at: /opt/qcom/aistack/qairt/2.22.6.240515
Set up Qualcomm Neural Processing SDK
Prerequisites:
- Host OS: Ubuntu 22.04 LTS
If the host machine uses Windows or macOS, install a virtual machine following the steps provided here. Subsequent steps must be run in the virtual machine running Ubuntu 22.04 LTS.
- Enable SSH in Permissive mode to securely log in to the host device. For instructions, see How to SSH?
- For the rest of the document, the environment variable SNPE_ROOT represents the full path to the Qualcomm® Neural Processing SDK root directory.
If installed via the direct download method, ${SNPE_ROOT} is the path to the unzipped SDK.
unzip v2.22.6.240515.zip
cd ~/qairt/2.22.6.240515
export SNPE_ROOT=`pwd`
If installed via QPM, the SDK is installed under /opt/qcom/aistack/qairt/<version>
export SNPE_ROOT=/opt/qcom/aistack/qairt/2.22.6.240515/
- Python: v3.10
If Python is already installed, ensure that the environment path is updated with the Python 3.10 path. If Python 3.10 is not installed on your system, you can install it with the following commands:
sudo apt-get update
sudo apt-get install python3.10 python3-distutils libpython3.10
- System Dependencies:
Run the following command as administrator/root to install system libraries.
sudo bash ${SNPE_ROOT}/bin/check-linux-dependency.sh
- Virtual environment (VENV)
<venv_path> is the path for the new virtual environment.
sudo apt-get install python3.10-venv
python3.10 -m venv "<venv_path>"
source <venv_path>/bin/activate
Run the following script to check and install missing dependencies:
python3 -m pip install --upgrade pip
${SNPE_ROOT}/bin/check-python-dependency
Set up ML Frameworks
To convert ML models trained on different frameworks into intermediate representations consumable by the Qualcomm® Neural Processing SDK, you may need to download and install the corresponding frameworks on your host machine.
This Qualcomm® Neural Processing SDK release is verified to work with the following versions of the ML training frameworks:
Framework | Version |
---|---|
TensorFlow | v2.10.1 |
TFLite | v2.3.0 |
PyTorch | v1.13.1 |
ONNX | v1.12.0 |
# Install tensorflow
pip install tensorflow==2.10.1
# Install tflite
pip install tflite==2.3.0
# Install PyTorch
pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu
# Install ONNX
pip install onnx==1.12.0 onnxruntime==1.17.1 onnxsim==0.4.36
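After installation, you can quickly confirm that the framework versions in the virtual environment match the table above. The following sketch only imports the packages and prints their reported versions.

# Confirm the installed framework versions match the verified versions above.
import tensorflow, torch, onnx, onnxruntime
print("TensorFlow  :", tensorflow.__version__)
print("PyTorch     :", torch.__version__)
print("ONNX        :", onnx.__version__)
print("ONNX Runtime:", onnxruntime.__version__)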
Set up SNPE SDK Environment
Source the environment setup script provided by SNPE SDK to ensure all necessary tools and libraries are available in $PATH for the workflow.
source ${SNPE_ROOT}/bin/envsetup.sh
Model Porting using SNPE
Model Conversion
A pretrained 32-bit floating-point model from PyTorch, ONNX, TensorFlow, or TFLite is input to the SNPE converter tools (snpe-<framework>-to-dlc), which convert the model to a Qualcomm-specific intermediate representation called a deep learning container (DLC).
In addition to the input model from a source framework, the converters require additional details about the input model, such as the input node name, its corresponding input dimensions, and any output tensor names (for models with multiple outputs).
Refer to converters for all available configurable parameters, or see the command-line help by running snpe-<framework>-to-dlc --help.
required arguments:
  -d INPUT_NAME INPUT_DIM, --input_dim INPUT_NAME INPUT_DIM
      The names and dimensions of the network input layers, specified in the format
      [input_name comma-separated-dimensions], for example:
          'data' 1,224,224,3
      Note that the quotes should always be included in order to handle special
      characters, spaces, etc. For multiple inputs, specify multiple --input_dim
      arguments on the command line, like:
          --input_dim 'data1' 1,224,224,3 --input_dim 'data2' 1,50,100,3
  --out_node OUT_NAMES, --out_name OUT_NAMES
      Names of the graph's output tensors. Multiple output names should be provided
      separately, like:
          --out_name out_1 --out_name out_2
  --input_network INPUT_NETWORK, -i INPUT_NETWORK
      Path to the source framework model.
The following example uses an ONNX model (inception_v3_opset16.onnx) downloaded from the ONNX Model Zoo. Download the model as inception_v3.onnx to your workspace. In this example, we download the model to the ~/models directory.
Run the following command to generate the inception_v3.dlc model.
${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-onnx-to-dlc \
    --input_network ~/models/inception_v3.onnx \
    --output_path ~/models/inception_v3.dlc \
    --input_dim 'x' 1,3,299,299
Model Quantization
To run a model on Hexagon Tensor Processor (HTP), the converted DLC must be quantized. SNPE offers a tool (snpe-dlc-quant) to quantize a DLC model to INT8/INT16 DLC using its own quantization algorithm. More information about SNPE quantization is available here.
The quantization process in SNPE requires two steps:
- Quantization of weights and biases within the model.
Quantization of weights and biases is a static step, i.e., no additional input data is required from the user.
- Quantization of activation layers (or layers with no weights).
- Quantizing activation layers requires a set of input images from a training dataset as calibration data.
- These calibration dataset images are input as a list of preprocessed image files in .raw format. The file sizes of these input .raw files must match the input size of the model.
Inputs to snpe-dlc-quant are a converted DLC model and a plain text file with the paths to the calibration dataset images. This input list holds paths to preprocessed images saved as NumPy arrays in .raw format. The size of the preprocessed image must match the input resolution of the model.
The output of the snpe-dlc-quant tool is a quantized DLC.
[ --input_dlc=<val> ]
      Path to the dlc container containing the model for which fixed-point encoding metadata should be generated. This argument is required.
[ --input_list=<val> ]
      Path to a file specifying the trial inputs. This file should be a plain text file, containing one or more absolute file paths per line. These files will be taken to constitute the trial set. Each path is expected to point to a binary file containing one trial input in the 'raw' format, ready to be consumed by the tool without any further modifications. This is similar to how input is provided to the snpe-net-run application.
[ --output_dlc=<val> ]
      Path at which the metadata-included quantized model container should be written. If this argument is omitted, the quantized model will be written at <unquantized_model_name>_quantized.dlc.
Use the Netron graph visualization tool to identify the model's input/output layer dimensions.
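Alternatively, the input and output shapes can be read programmatically with the onnx Python package installed earlier; the short sketch below inspects the inception_v3.onnx model used in this example (adjust the path to your workspace).

# Print input/output tensor names and shapes as an alternative to Netron.
import onnx

model = onnx.load("inception_v3.onnx")  # adjust the path to your downloaded model
for tensor in list(model.graph.input) + list(model.graph.output):
    shape = [d.dim_value or d.dim_param for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, shape)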
For demo purposes, we can evaluate the quantization process with random input files. The input files can be generated using the simple Python script shown below for the inception_v3.onnx model. Save the script as generate_random_input.py in your workspace (~/models/) and run it on your host machine using python ~/models/generate_random_input.py.
The following example Python code creates an input_list that holds paths to calibration dataset images used to quantize the model.
import os
import numpy as np

input_path_list = []
BASE_PATH = "/tmp/RandomInputsForInceptionV3"
if not os.path.exists(BASE_PATH):
    os.mkdir(BASE_PATH)

# generate 10 random inputs and save them as raw binary files
NUM_IMAGES = 10
for img in range(NUM_IMAGES):
    filename = "input_{}.raw".format(img)
    randomTensor = np.random.random((1, 299, 299, 3)).astype(np.float32)
    filename = os.path.join(BASE_PATH, filename)
    randomTensor.tofile(filename)
    input_path_list.append(filename)

# save the generated paths as the input_list text file
with open("input_list.txt", "w") as f:
    for path in input_path_list:
        f.write(path)
        f.write('\n')
The above script generates 10 sample input files saved in the /tmp/RandomInputsForInceptionV3 directory and an input_list.txt file that contains the path to each sample generated.
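For meaningful calibration, the same layout would be produced from real images in the training dataset rather than random tensors. The sketch below shows one way to turn an image into a .raw calibration file for this model; it assumes Pillow is installed, and the simple 0-1 scaling is illustrative only and should be replaced by the preprocessing your model was actually trained with.

# Sketch: convert one real image into a .raw calibration file for inception_v3.
# Assumes Pillow is available; the 0-1 scaling is an illustrative placeholder.
import numpy as np
from PIL import Image

img = Image.open("sample.jpg").convert("RGB").resize((299, 299))
tensor = (np.asarray(img).astype(np.float32) / 255.0).reshape(1, 299, 299, 3)
tensor.tofile("/tmp/RandomInputsForInceptionV3/sample_input.raw")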
Now that all needed inputs to the snpe-dlc-quant tool are available, the model can be quantized.
${SNPE_ROOT}/bin/x86_64-linux-clang/snpe-dlc-quant \
    --input_dlc ~/models/inception_v3.dlc \
    --output_dlc ~/models/inception_v3_quantized.dlc \
    --input_list ~/models/input_list.txt
This generates a quantized inception_v3 DLC model (inception_v3_quantized.dlc). By default, the model is quantized for INT8 bit width.
Developers can customize the quantization to use 16-bit precision instead of the default INT8 by specifying the [--act_bw 16] and/or [--weight_bw 16] options to the snpe-dlc-quant tool.
Refer to the snpe-dlc-quant tool documentation, or run snpe-dlc-quant --help, to view all available customizations, including quantization modes, optimizations, etc.