A Pythonic framework to simplify AI service building
Port of Facebook's LLaMA model in C/C++
Superduper: Integrate AI models and machine learning workflows
AI interface for tinkerers (Ollama, Haystack RAG, Python)
Integrate, train, and manage any AI models and APIs with your database
Simplifies the local serving of AI models from any source
Open-Source AI Camera. Empower any camera/CCTV with state-of-the-art AI
Run Local LLMs on Any Device. Open-source and available for commercial use
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
Phi-3.5 for Mac: Locally-run Vision and Language Models
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
State-of-the-art diffusion models for image and audio generation
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Operating LLMs in production
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Bring the notion of Model-as-a-Service to life
20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale
Neural Network Compression Framework for enhanced OpenVINO inference
Sparsity-aware deep learning inference runtime for CPUs
A library to communicate with ChatGPT, Claude, Copilot, Gemini
Build your chatbot within minutes on your favorite device
Official inference library for Mistral models
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed
Replace OpenAI GPT with another LLM in your app by changing a single line of code
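A common thread among the serving tools above is that many expose an OpenAI-compatible HTTP API, so an application can switch from hosted GPT to a locally served model by changing only the base URL and model name. A minimal sketch of that idea (the helper function, port, and model name are illustrative assumptions, not any specific project's API):

```python
import json

def build_chat_request(base_url, model, prompt):
    """Build the URL and JSON body for an OpenAI-style chat completion call.

    Servers that implement the OpenAI-compatible interface accept the same
    request shape at /v1/chat/completions, which is what makes the swap a
    one-line configuration change in the calling app.
    """
    url = f"{base_url.rstrip('/')}/v1/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Point at a hypothetical local server instead of api.openai.com:
url, body = build_chat_request("http://localhost:8000", "llama-3-8b", "Hello!")
```

The request would then be sent with any HTTP client; only `base_url` and `model` differ between backends.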