A library for accelerating Transformer models on NVIDIA GPUs
LM Studio Apple MLX engine
A real-time inference engine for temporal logic specifications
DeepEP: an efficient expert-parallel communication library
High-performance Bayesian inference engine based on reactive message passing
Ling is a MoE LLM provided and open-sourced by InclusionAI
A high-throughput and memory-efficient inference and serving engine
A 950-line, minimal, extensible LLM inference engine built from scratch
Open-source large language model family from Tencent Hunyuan
A high-performance inference engine for AI models
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI
A lightweight vLLM implementation built from scratch
Deep learning optimization library: makes distributed training easy
Jlama is a modern LLM inference engine for Java
Blazing fast, instant realtime GraphQL APIs on your DB
Lightweight, standalone C++ inference engine for Google's Gemma models
Alibaba's high-performance LLM inference engine for diverse apps
RGBD video generation model conditioned on camera input
Code for running inference and finetuning with the SAM 3 model
Code for running inference with the SAM 3D Body model (3DB)
High-performance inference framework for large language models
A powerful native multimodal model for image generation
OCR expert VLM powered by Hunyuan's native multimodal architecture
Offline inference engine for state-of-the-art, real-time voice conversations
Inference Llama 2 in one file of pure C