GPU environment management and cluster orchestration
MII makes low-latency and high-throughput inference possible
The easiest and laziest way to build multi-agent LLM applications
PyTorch domain library for recommendation systems
A set of Docker images for training and serving models in TensorFlow
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Replace OpenAI GPT with another LLM in your app
PyTorch extensions for fast R&D prototyping and Kaggle farming
Libraries for applying sparsification recipes to neural networks
Multilingual Automatic Speech Recognition with word-level timestamps
Lightweight Python library for adding real-time multi-object tracking (tracking sketch after this list)
Neural Network Compression Framework for enhanced OpenVINO inference
OpenAI-style API for open large language models (client sketch after this list)
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference (request sketch after this list)
Superduper: Integrate AI models and machine learning workflows
A high-performance ML model serving framework that offers dynamic batching
A library for accelerating Transformer models on NVIDIA GPUs
Efficient few-shot learning with Sentence Transformers (training sketch after this list)
PyTorch library of curated Transformer models and their components
Fast inference engine for Transformer models
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
Simplifies the local serving of AI models from any source
Official inference library for Mistral models
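
The multi-object tracking library above exposes a small update-per-frame API. A minimal sketch, assuming a detector that yields one point per object; the coordinates and distance threshold here are illustrative, not library defaults:

```python
import numpy as np
from norfair import Detection, Tracker

# Distance between a new detection and an existing track's current estimate.
def euclidean(detection, tracked_object):
    return np.linalg.norm(detection.points - tracked_object.estimate)

tracker = Tracker(distance_function=euclidean, distance_threshold=30)

# Per video frame: wrap detector output as Detection objects, then update.
detections = [Detection(points=np.array([[100.0, 200.0]]))]  # illustrative point
for obj in tracker.update(detections=detections):
    print(obj.id, obj.estimate)
```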
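An "OpenAI-style API" means the stock OpenAI client works unchanged once its base URL points at the local server. A minimal sketch; the port and model name are assumptions about the deployment:

```python
from openai import OpenAI

# Assumed local endpoint and hosted model name; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="chatglm3-6b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```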
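Text Generation Inference serves completions over a REST endpoint. A minimal sketch using `requests`, assuming a server already running locally on port 8080:

```python
import requests

# /generate returns the full completion in one response (streaming is also available).
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64},
    },
)
print(resp.json()["generated_text"])
```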
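Few-shot learning with SetFit needs only a handful of labeled examples. A minimal sketch using the `SetFitTrainer` API (newer releases expose a `Trainer` class instead); the dataset, labels, and base model are illustrative:

```python
from datasets import Dataset
from setfit import SetFitModel, SetFitTrainer

# A handful of labeled sentences stands in for a real few-shot dataset.
train_ds = Dataset.from_dict({
    "text": ["great film", "terrible plot", "loved it", "waste of time"],
    "label": [1, 0, 1, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-MiniLM-L6-v2")
trainer = SetFitTrainer(model=model, train_dataset=train_ds)
trainer.train()

print(model(["a masterpiece", "utterly boring"]))  # predicted labels
```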