Replace OpenAI GPT with another LLM in your app
Official inference library for Mistral models
Large Language Model Text Generation Inference
The Triton Inference Server provides an optimized cloud and edge inferencing solution
Library for serving Transformers models on Amazon SageMaker
A high-throughput and memory-efficient inference and serving engine for LLMs (usage sketch after the list)
FlashInfer: Kernel Library for LLM Serving
Optimizing inference proxy for LLMs
Deep learning optimization library that makes distributed training and inference easy, efficient, and effective
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
MII makes low-latency and high-throughput inference possible
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models
Standardized Serverless ML Inference Platform on Kubernetes
Ready-to-use OCR with 80+ supported languages (usage sketch after the list)
Low-latency REST API for serving text-embeddings
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions (usage sketch after the list)
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Easiest and laziest way to build multi-agent LLM applications
Uplift modeling and causal inference with machine learning algorithms
Everything you need to build state-of-the-art foundation models
Single-cell analysis in Python (usage sketch after the list)
The official Python client for the Hugging Face Hub (usage sketch after the list)
Bring the notion of Model-as-a-Service to life
A library for accelerating Transformer models on NVIDIA GPUs
Training and deploying machine learning models on Amazon SageMaker
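
The "high-throughput and memory-efficient inference and serving engine" tagline above matches vLLM; assuming that is the project, here is a minimal offline-generation sketch using its documented `LLM`/`SamplingParams` API (the model name is only an example):

```python
# Minimal vLLM offline-generation sketch (assumes: pip install vllm, a CUDA GPU).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # any HF causal LM; choice is illustrative
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)                       # generated continuation per prompt
```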
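The "80+ supported languages" OCR entry reads like EasyOCR's tagline; assuming so, a minimal sketch (the image path `sign.jpg` is a placeholder):

```python
# Minimal EasyOCR sketch (assumes: pip install easyocr; 'sign.jpg' is a placeholder path).
import easyocr

reader = easyocr.Reader(["en"])                      # downloads detection/recognition models on first run
for bbox, text, confidence in reader.readtext("sign.jpg"):
    print(f"{confidence:.2f}  {text}")               # recognized text with its confidence score
```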
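DoWhy is named outright in its entry; its documented model → identify → estimate flow, shown here on one of its built-in synthetic datasets:

```python
# DoWhy's model -> identify -> estimate flow on a synthetic dataset (pip install dowhy).
import dowhy.datasets
from dowhy import CausalModel

data = dowhy.datasets.linear_dataset(
    beta=10, num_common_causes=3, num_samples=5000, treatment_is_binary=True
)
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.propensity_score_matching")
print(estimate.value)                                # should be close to the true effect beta=10
```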
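"Single-cell analysis in Python" is Scanpy's tagline; assuming that is the project, a standard preprocessing-and-embedding sketch on its bundled PBMC 3k dataset:

```python
# Typical Scanpy pipeline on the bundled PBMC 3k dataset (pip install scanpy).
import scanpy as sc

adata = sc.datasets.pbmc3k()                         # downloads a small public 10x dataset
sc.pp.filter_cells(adata, min_genes=200)             # drop near-empty cells
sc.pp.normalize_total(adata, target_sum=1e4)         # library-size normalization
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)                               # kNN graph for downstream embedding
sc.tl.umap(adata)
sc.pl.umap(adata)                                    # plot the 2D embedding
```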
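The Hub client entry refers to the `huggingface_hub` package; two of its core documented calls, downloading a file and querying model metadata:

```python
# File download and metadata lookup with huggingface_hub (pip install huggingface_hub).
from huggingface_hub import HfApi, hf_hub_download

path = hf_hub_download(repo_id="gpt2", filename="config.json")   # cached local path
print(path)

api = HfApi()
info = api.model_info("gpt2")                        # metadata for a public model repo
print(info.downloads, info.pipeline_tag)
```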