OpenAI-style API for open large language models (see the querying sketch after this list)
Run local LLMs on any device; open source
The Triton Inference Server provides an optimized cloud and edge inferencing solution
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
FlashInfer: Kernel Library for LLM Serving
A high-throughput and memory-efficient inference and serving engine for LLMs
Ready-to-use OCR with 80+ supported languages
The official Python client for the Hugging Face Hub
A library for accelerating Transformer models on NVIDIA GPUs
Operating LLMs in production
GPU environment management and cluster orchestration
Everything you need to build state-of-the-art foundation models
Simplifies the local serving of AI models from any source
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Large Language Model Text Generation Inference
Official inference library for Mistral models
The easiest and laziest way to build multi-agent LLM applications
Optimizing inference proxy for LLMs
Neural Network Compression Framework for enhanced OpenVINO inference
Phi-3.5 for Mac: Locally-run Vision and Language Models
LLM training code for MosaicML foundation models
Bring the notion of Model-as-a-Service to life
Uncover insights, surface problems, monitor, and fine-tune your LLM
Efficient few-shot learning with Sentence Transformers
Adversarial Robustness Toolbox (ART): a Python library for ML security
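Several of the serving tools above (the OpenAI-style API, the vLLM-style engine, Text Generation Inference) expose OpenAI-compatible HTTP endpoints, so one client snippet covers them all. Below is a minimal sketch using the official `openai` Python SDK; the base URL, port, API key, and model name are assumptions to adjust for whatever your local server actually exposes.

```python
# Minimal sketch: querying a locally served, OpenAI-compatible endpoint.
# Assumptions: a server is listening on localhost:8000 with a /v1 API,
# and "my-local-model" is the name it registered (both hypothetical).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local server address
    api_key="not-needed",                 # local servers often ignore the key
)

response = client.chat.completions.create(
    model="my-local-model",               # hypothetical model identifier
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Because the request shape is identical across these servers, switching engines is usually just a matter of changing `base_url` and the model name.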