Fri. Apr 24th, 2026

Building real-world on-device AI with LiteRT and NPU

Users benefit from instant AI features like real-time video effects, automatic speech recognition (ASR), and motion capture in their mobile apps. For developers, however, running sophisticated models on-device means balancing unique challenges: managing device thermals, preserving battery life, and preventing frame drops. To deliver fast, responsive AI experiences without compromising performance, LiteRT unlocks Neural Processing Units (NPUs), hardware built specifically for these workloads.

LiteRT is a cross-platform, production-ready framework for on-device AI, offering CPU, GPU, and NPU acceleration across mobile, desktop, and IoT platforms. Designed for performance and scalability, LiteRT simplifies the deployment of high-speed AI features through a unified API that abstracts the complexity of integrating with multiple NPU SDKs, allowing developers to target diverse silicon without writing vendor-specific code.
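To give a rough sense of what a unified API buys you, the sketch below mimics the accelerator-fallback pattern such a runtime enables: prefer the NPU, degrade gracefully to GPU and then CPU. The `Accelerator` enum and `select_accelerator` helper are hypothetical names for illustration, not LiteRT's actual API.

```python
from enum import Enum

class Accelerator(Enum):
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"

def select_accelerator(available: set) -> Accelerator:
    """Prefer the NPU, fall back to GPU, then CPU (always present).

    A unified runtime makes this choice internally, so application code
    never has to branch on vendor-specific NPU SDKs.
    """
    for preferred in (Accelerator.NPU, Accelerator.GPU, Accelerator.CPU):
        if preferred in available:
            return preferred
    return Accelerator.CPU

# On a device without NPU drivers, the same call transparently uses the GPU:
print(select_accelerator({Accelerator.GPU, Accelerator.CPU}).value)  # gpu
```

The point of the pattern is that the fallback order lives in one place inside the runtime, so the same app binary runs correctly on devices with or without an NPU.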

Translating NPU performance into meaningful experiences

LiteRT is already hardened across Google products, popular apps, and even SDKs, and is utilized by industry leaders including Google Meet, Epic Games, and Argmax Inc. Here is what NPU acceleration looks like in real-world production apps.

Google Meet: By leveraging the mobile NPU, Google Meet successfully deployed an Ultra-HD segmentation model 25x larger than previous versions – without sacrificing inference speed. Crucially, it maintains a consistent power footprint, creating the thermal headroom necessary to deliver higher-quality background replacement throughout a typical 20-30 minute session.

Epic Games, Inc: High-fidelity, real-time animation experiences demand exceptional efficiency. Epic’s Live Link Face (Beta) app for Android enables creators to capture performances from a single camera, then generate and stream real-time MetaHuman facial animation directly from their devices into Unreal Engine.

Real-time facial solving is computationally intensive and requires consistently low latency. By using LiteRT on the NPU, Epic unlocks dedicated on-device acceleration on supported Android devices, enabling up to 30 FPS performance for real-time MetaHuman animation.

Real-time MetaHuman facial animation in Unreal Engine with NPU

Argmax Inc recently launched the Argmax Pro SDK for Android, delivering on-device speech recognition built in collaboration with the LiteRT team. By combining LiteRT with AI Pack feature delivery via Google Play, Argmax brought its top-tier accuracy and real-time speed to Android while respecting app size constraints. Crucially, they leveraged LiteRT’s Ahead-Of-Time (AOT) compilation to eliminate costly on-device compilation steps, enabling frontier speech models like NVIDIA Parakeet TDT 0.6B v2 to run with industry-leading latency.

In performance testing across Google Tensor, MediaTek, and Qualcomm Technologies SoCs, the Argmax Pro SDK showed that upgrading from GPU to NPU delivers over a 2x speedup. Beyond raw speed, the power efficiency of NPUs enabled Argmax SDK Enterprise customers like Heidi Health to run reliable on-device live transcription for extended sessions while minimizing impact on battery life. Finally, by offloading runtime libraries and models to on-demand downloads via Play’s AI Packs, each device dynamically obtains the model optimized for its specific NPU.
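The per-SoC delivery step described above can be pictured as a simple lookup: the app reports its silicon, and the store serves the matching AOT-compiled artifact, with a generic build as the fallback. The variant names and mapping below are purely illustrative, not Argmax's or Google Play's actual identifiers.

```python
# Hypothetical mapping from SoC vendor to an AOT-compiled model variant
# delivered on demand (e.g., via an AI Pack) instead of bundled in the APK.
AOT_VARIANTS = {
    "Google Tensor": "parakeet_tdt_0.6b_tensor.aot",
    "MediaTek": "parakeet_tdt_0.6b_mtk.aot",
    "Qualcomm": "parakeet_tdt_0.6b_qc.aot",
}

# Generic build for devices whose NPU has no precompiled variant.
FALLBACK_VARIANT = "parakeet_tdt_0.6b_generic.tflite"

def pick_model_variant(soc_vendor: str) -> str:
    """Return the artifact to download for this device's silicon."""
    return AOT_VARIANTS.get(soc_vendor, FALLBACK_VARIANT)
```

Because only the matching variant is downloaded, the app avoids shipping every per-NPU binary in its base package, which is what keeps the install size in check.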

Argmax’s Kotlin-first SDK brings top-tier accuracy and real-time speed to Android, with seamless NPU and GPU acceleration by Google LiteRT.

Google AI Edge Gallery App: To help developers test and validate the performance of NPU acceleration, we are happy to announce that the Google AI Edge Gallery App now features NPU support for select Gemma models and built-in benchmarking tools. Available on Android, AI Edge Gallery lets you quickly see the true potential of AI performance on mobile hardware. Developers can also access the Google AI Edge Gallery on GitHub to build their own experiences.

Explore various on-device LLM use cases with Google AI Edge Gallery

Scaling performance across the hardware spectrum

While the performance gains in speech, animation, and video are clear, the path to the NPU has historically been difficult for developers to unlock due to fragmented, vendor-specific SDKs and toolchains. By providing a streamlined workflow and cross-platform support, LiteRT enables developers to deploy advanced models, from mobile phones to industrial IoT and AI PCs, without sacrificing performance or portability.

Cross-platform NPU support

As highlighted in the recent Google AI Edge Gemma 4 blog post, LiteRT extends NPU acceleration beyond mobile, allowing you to deploy your models across a range of hardware using a single framework. For the industrial edge, LiteRT supports platforms like the Qualcomm Dragonwing™ IQ8 Series, which also powers the Arduino VENTUNO Q, enabling high-reliability use cases like robotics and smart manufacturing with models like Gemma 4. For desktop, LiteRT is preparing for AI PCs through OpenVINO™ integration with Intel® Core™ Ultra Series 2 and 3 processors, delivering significant power savings and responsiveness for local GenAI workloads.

Performance validation at scale

Google AI Edge Portal provides a benchmark service across 100+ of the most popular mobile phones with insights on ML workloads across devices, accelerators and configurations. Developers can now make data-driven deployment decisions, such as whether to use AOT or JIT, that best suit their use cases and their target devices. To use the latest Portal NPU features, sign up for our private preview here.
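One data-driven decision benchmark numbers can feed is the AOT-vs-JIT choice: if on-device (JIT) compilation of a model would blow an app's cold-start budget on a given phone, ship an AOT artifact for it instead. The threshold and helper below are hypothetical, sketched only to show the shape of that decision, not a Portal API.

```python
def choose_compilation(first_inference_ms: float,
                       cold_start_budget_ms: float = 500.0) -> str:
    """Pick a compilation strategy from measured first-inference latency.

    first_inference_ms: benchmarked time-to-first-inference on the target
    device when compiling just-in-time (includes compilation cost).
    If that exceeds the app's cold-start budget, precompile ahead of time;
    otherwise JIT avoids shipping per-device artifacts.
    """
    return "AOT" if first_inference_ms > cold_start_budget_ms else "JIT"

# A device where JIT compilation takes ~1.2 s warrants an AOT artifact:
print(choose_compilation(1200.0))  # AOT
```

In practice the budget would come from the app's own startup requirements, and the latency from per-device benchmark runs rather than a single number.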

Google AI Edge Portal Benchmarking Results

Get started with your NPU journey

With our production-ready NPU integrations, LiteRT provides a unified workflow that abstracts away low-level complexities across both Just-In-Time (JIT) and Ahead-Of-Time (AOT) deployment.

Dive into our documentation and start your journey with NPU acceleration today.

Let us know your feedback and feature requests by opening an issue on our GitHub repository. We can’t wait to see what you build!

Acknowledgements

Google: Akshat Sharma, Alice Zheng, Andrew Zhang, Ashley Lin, Byungchul Kim, Changming Sun, Charlie Xu, Chenchen Tang, Chunlei Niu, Cormac Brick, Derek Bekebrede, Fabian Bergmark, Fengwu Yao, Gerardo Carranza, Gregory Karpiak, Jae Yoo, Jing Jin, Jingjiang Li, Julius Kammerl, Jun Jiang, Lu Wang, Maria Lyubimtseva, Mariana Quesada, Marissa Ikonomidis, Matt Kreileder, Matthias Grundmann, Meghna Johar, Na Li, Ping Yu, Renjie Wu, Rishika Sinha, Sachin Kotwani, Salil Tambe, Siargey Pisarchyk, Somdatta Banerjee, Steven Toribio, Suleman Shahid, Terry Heo, Wai Hon Law, Weiyi Wang, Xiaoming Hu

Partners: Alen Huang, Ankit Kapoor, Arda Atahan Ibis, Atila Orhon, Brian Keene, Chen Cen, Cheng-Dao Lee, Cheng-Yen Lin, Chun-Ting Lin (Graham), Code Lin, Deep Yap, Dylan Angus, Felix Baum, HungChun Liu, Jhih-Kuan Lin, Jiun-Kai Yang (Kelvin), Kedar Gharat, Ken Sieger, Laxmi Rayapudi, Lei Chen, Mike Tremaine, Ming-Che Lin (Vincent), Poyuan Jeng, MetaHuman Team, Vinesh Sukumar, Waimun Wong, Yi-Ru Chen, Yu-Ting Wan, Zach Nagengast
