A set of Docker images for training and serving models in TensorFlow
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Replace OpenAI GPT with another LLM in your app
PyTorch extensions for fast R&D prototyping and Kaggle farming
Libraries for applying sparsification recipes to neural networks
Multilingual Automatic Speech Recognition with word-level timestamps
Lightweight Python library for adding real-time multi-object tracking
Neural Network Compression Framework for enhanced OpenVINO
OpenAI-style API for open large language models
Sparsity-aware deep learning inference runtime for CPUs
Large Language Model Text Generation Inference
Superduper: Integrate AI models and machine learning workflows
A high-performance ML model serving framework offering dynamic batching
PyTorch library of curated Transformer models and their components
A library for accelerating Transformer models on NVIDIA GPUs
Efficient few-shot learning with Sentence Transformers
Trainable models and NN optimization tools
Probabilistic reasoning and statistical analysis in TensorFlow
Simplifies the local serving of AI models from any source
Official inference library for Mistral models
20+ high-performance LLMs with recipes to pretrain and finetune at scale
Data manipulation and transformation for audio signal processing
Standardized Serverless ML Inference Platform on Kubernetes
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Phi-3.5 for Mac: Locally-run Vision and Language Models