Run Local LLMs on Any Device. Open-source and available for commercial use
A high-throughput and memory-efficient inference and serving engine for LLMs
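This description matches the vLLM project. Assuming that is the engine meant, here is a minimal offline-generation sketch; the model id is a placeholder, not a recommendation:

```python
from vllm import LLM, SamplingParams

# Load a small model for illustration (any Hugging Face model id works).
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```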
Ready-to-use OCR with 80+ supported languages
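This tagline matches EasyOCR. Assuming that library, the whole workflow is two calls; the image path below is a placeholder:

```python
import easyocr

# The Reader downloads detection and recognition models on first use.
reader = easyocr.Reader(['en'])  # list of language codes
for bbox, text, confidence in reader.readtext('sign.png'):  # placeholder image
    print(text, round(confidence, 2))
```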
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Uncover insights, surface problems, monitor, and fine-tune your LLM
The official Python client for the Hugging Face Hub
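Assuming this refers to the huggingface_hub package, a short sketch of its two most common tasks, downloading a file and searching the Hub:

```python
from huggingface_hub import hf_hub_download, list_models

# Fetch a single file from a repo (cached locally on later calls).
config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
print(config_path)

# List a few models matching a search term.
for model in list_models(search="whisper", limit=5):
    print(model.id)
```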
Everything you need to build state-of-the-art foundation models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs
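A minimal sketch of LMDeploy's high-level pipeline API; the model id is an example and can be any supported Hugging Face id or local path:

```python
from lmdeploy import pipeline

# Build an inference pipeline around a chat model (placeholder id).
pipe = pipeline("internlm/internlm2-chat-1_8b")
responses = pipe(["What is model quantization?"])
print(responses[0].text)
```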
A set of Docker images for training and serving models in TensorFlow
Replace OpenAI GPT with another LLM in your app
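The usual pattern behind this kind of drop-in replacement is to point the stock OpenAI client at an OpenAI-compatible local server; the URL, key, and model name below are placeholders:

```python
from openai import OpenAI

# Swap the backend by changing base_url; the rest of the app is unchanged.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
reply = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)
```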
Multilingual Automatic Speech Recognition with word-level timestamps
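This tagline matches the whisper-timestamped project. Assuming that library, a sketch of extracting per-word timings; the audio file is a placeholder:

```python
import whisper_timestamped as whisper

audio = whisper.load_audio("speech.wav")  # placeholder audio file
model = whisper.load_model("tiny")
result = whisper.transcribe(model, audio, language="en")

# Each segment carries word-level start/end times.
for segment in result["segments"]:
    for word in segment["words"]:
        print(word["text"], word["start"], word["end"])
```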
The easiest and laziest way to build multi-agent LLM applications
Operating LLMs in production
State-of-the-art Parameter-Efficient Fine-Tuning
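Assuming this is the Hugging Face peft library, a minimal LoRA sketch; the base model id and target module names are examples chosen to match OPT's attention layers:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a base model with LoRA adapters; only the adapters are trainable.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()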
Library for OCR-related tasks powered by Deep Learning
Optimizing inference proxy for LLMs
Low-latency REST API for serving text embeddings
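Servers in this category typically expose an OpenAI-compatible /embeddings route; the URL, port, and model name in this sketch are assumptions, not part of the original description:

```python
import requests

# POST one or more texts and read back their embedding vectors.
resp = requests.post(
    "http://localhost:7997/embeddings",  # placeholder endpoint
    json={"model": "BAAI/bge-small-en-v1.5", "input": ["hello world"]},
)
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))
```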
Standardized Serverless ML Inference Platform on Kubernetes
The Triton Inference Server provides an optimized cloud and edge inferencing solution
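A minimal client-side sketch using Triton's HTTP client; the model name and tensor names are placeholders for whatever model is deployed on the server:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton server and send one inference request.
client = httpclient.InferenceServerClient(url="localhost:8000")
inp = httpclient.InferInput("INPUT0", [1, 4], "FP32")  # placeholder tensor
inp.set_data_from_numpy(np.ones((1, 4), dtype=np.float32))
result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```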
Library for serving Transformers models on Amazon SageMaker
Single-cell analysis in Python
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed
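A sketch of the MII pipeline API; the model id is a placeholder:

```python
import mii

# mii.pipeline loads the model with DeepSpeed inference kernels.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is"], max_new_tokens=64)
print(responses[0])
```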
20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale
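This tagline matches LitGPT. Assuming that library, a sketch of its Python generation API; the checkpoint id is an example from its supported list:

```python
from litgpt import LLM

# Download (on first use) and load a supported checkpoint.
llm = LLM.load("microsoft/phi-2")
print(llm.generate("What do llamas eat?", max_new_tokens=50))
```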
GPU environment management and cluster orchestration
Training and deploying machine learning models on Amazon SageMaker
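Assuming this refers to the SageMaker Python SDK, a sketch of launching a training job; the role ARN, entry script, and S3 URI are placeholders that must exist in a real AWS account:

```python
from sagemaker.pytorch import PyTorch

# Configure a managed PyTorch training job (all identifiers are placeholders).
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.1",
    py_version="py310",
)
estimator.fit({"training": "s3://my-bucket/train"})
```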