The proliferation of intelligent devices running on ARM64-based System-on-Chips (SoCs) has made on-device Machine Learning (ML) a necessity. However, the path from a trained model developed on a high-end workstation to high-speed, low-power inference on an embedded target is fragmented and complex.
This session addresses the deployment bottleneck faced by developers: every hardware vendor (from leaders such as NXP, MediaTek, and Qualcomm to common hobbyist platforms) supplies a unique AI pipeline tied to its own hardware. This heterogeneity forces developers to understand vendor-specific drivers, whether proprietary Delegates or specialized Execution Providers (EPs), in order to make optimal use of the dedicated on-chip accelerators (GPU and NPU).
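As a minimal sketch of the Delegate route (assuming a LiteRT/TFLite-compatible model and a vendor-supplied delegate shared library; the file names below are purely illustrative, and the tflite_runtime Interpreter is used as a stand-in for the equivalent LiteRT API):

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# "libvendor_npu_delegate.so" is a placeholder for the vendor's delegate library.
delegate = load_delegate("libvendor_npu_delegate.so")
interpreter = Interpreter(model_path="model.tflite",
                          experimental_delegates=[delegate])
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
# Feed a dummy tensor; supported operators are offloaded to the NPU/GPU delegate.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```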
We will introduce a pragmatic, standardized deployment workflow by focusing on the core technical choices that govern performance at the edge. The session will critically compare the two dominant inference runtimes, LiteRT and ONNX Runtime (ORT).
Dealing with hardware constraints is a key aspect: deep integration with device-native Delegates is not an easy task, but it can be very rewarding in terms of performance. On the other hand, a standards-based approach offers cross-platform reliability and a wider operational range, which makes it a sweet spot for interoperability.
We highlight that modern SoC platforms are moving beyond legacy Delegates by shipping highly optimized, ONNX- or LiteRT-native Execution Providers, bridging the gap between portability and dedicated hardware acceleration.
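A hedged sketch of how such an Execution Provider is selected in ONNX Runtime (the provider name is vendor-specific; Qualcomm's "QNNExecutionProvider" is used only as an example, and the CPU provider remains the portable fallback):

```python
import numpy as np
import onnxruntime as ort

# Keep only the providers actually compiled into this build, preferring the vendor EP.
preferred = ["QNNExecutionProvider", "CPUExecutionProvider"]
providers = [p for p in preferred if p in ort.get_available_providers()]

session = ort.InferenceSession("model.onnx", providers=providers)

# Build a dummy input from the model's declared shape (float32 is assumed here;
# symbolic/dynamic dimensions are replaced by 1 for illustration).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
```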
As a further step of performance customization for the target embedded system, a multi-platform exchange format such as ONNX can be converted into a dedicated embedded inference framework, for example Tencent's NCNN or the promising Eclipse Aidge.
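A sketch of that conversion step, assuming the onnx2ncnn converter from the ncnn toolchain is on the PATH and that the ncnn Python bindings are installed; all file names are placeholders:

```python
import subprocess
import ncnn  # Python bindings for Tencent's ncnn

# Convert the portable ONNX graph into ncnn's param/bin pair.
subprocess.run(["onnx2ncnn", "model.onnx", "model.param", "model.bin"], check=True)

# Load the converted network with the ncnn runtime.
net = ncnn.Net()
net.load_param("model.param")
net.load_model("model.bin")
```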
Using a high-performance Automatic Speech Recognition (ASR) pipeline as a practical example, attendees will gain a clear understanding of these trade-offs and learn how to select the right runtime and acceleration strategy to evaluate and deploy it across diverse vendor architectures.



