Inference of Llama 2 in one file of pure C
Port of Facebook's LLaMA model in C/C++
Run models such as Kimi-K2.5, GLM-5, DeepSeek, gpt-oss, Gemma, Qwen, and more
MiniMax M2.1, a state-of-the-art model for real-world development & agents
Run local LLMs on any device. Open source
TT-NN operator library and TT-Metalium low-level kernel programming
Distribute and run LLMs with a single file
Emscripten: An LLVM-to-WebAssembly Compiler
Distributed LLM and StableDiffusion inference
Next-gen AI+IoT framework for T2/T3/T5AI/ESP32, and more
The media player for language learning, with dual subtitles
C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3 & GLM4(V)
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
C#/.NET binding of llama.cpp, including LLaMa/GPT model inference
AI-powered bridge connecting LLMs and advanced AI agents
The easiest way to use Ollama in .NET
Fast Multimodal LLM on Mobile Devices
Integrate cutting-edge LLM technology quickly and easily into your app
Mooncake is the serving platform for Kimi
LLM training in simple, raw C/CUDA
Alibaba's high-performance LLM inference engine for diverse apps
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
Run PyTorch LLMs locally on servers, desktop and mobile
An Easy-to-Use and High-Performance AI Deployment Framework
High-speed Large Language Model Serving for Local Deployment