lightweight, standalone C++ inference engine for Google's Gemma models
Alibaba's high-performance LLM inference engine for diverse apps
Fast Multimodal LLM on Mobile Devices
Fast inference engine for Transformer models
Low-latency AI inference engine optimized for mobile devices
Mooncake is the serving platform for Kimi
High-speed Large Language Model Serving for Local Deployment
QVAC Fabric: cross-platform LLM inference and fine-tuning
A GPU-accelerated library containing highly optimized building blocks
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
The AI-Native Search Database
Lightweight inference library for ONNX files, written in C++
Deep learning inference framework optimized for mobile platforms
Open-source embedded speech-to-text engine