Standardized Serverless ML Inference Platform on Kubernetes
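This is KServe's tagline; a deployed InferenceService is typically called over the V1 REST protocol. A minimal sketch with `requests`, assuming a hypothetical host and model name:

```python
import requests

# Hypothetical host and model name; KServe exposes the V1 predict route
# at /v1/models/<name>:predict on each InferenceService.
url = "http://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict"
payload = {"instances": [[6.8, 2.8, 4.8, 1.4]]}

resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [1]}
```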
Large Language Model Text Generation Inference
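Text Generation Inference (TGI) servers expose a `/generate` route; a minimal client sketch, assuming a TGI container already listening on localhost:8080 (address and sampling parameters are illustrative):

```python
import requests

# Assumes a running TGI server; adjust the address to your deployment.
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is speculative decoding?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```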
Sparsity-aware deep learning inference runtime for CPUs
Neural Network Compression Framework for enhanced OpenVINO inference
Efficient few-shot learning with Sentence Transformers
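This describes SetFit; a minimal fine-tuning sketch, assuming setfit >= 1.0 and a tiny illustrative dataset:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Tiny illustrative dataset; real few-shot setups use ~8 examples per class.
train_ds = Dataset.from_dict({
    "text": ["great product", "terrible service", "loved it", "awful"],
    "label": [1, 0, 1, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-MiniLM-L3-v2")
args = TrainingArguments(batch_size=4, num_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()

print(model.predict(["not bad at all"]))
```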
Data manipulation and transformation for audio signal processing
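This reads like torchaudio's tagline; a minimal sketch of loading, resampling, and featurizing a clip ("speech.wav" is a placeholder path):

```python
import torchaudio
import torchaudio.functional as F

# Load a waveform and resample it to 16 kHz.
waveform, sample_rate = torchaudio.load("speech.wav")
waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16_000)

# Compute a mel spectrogram for downstream models.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_mels=80)(waveform)
print(mel.shape)  # (channels, n_mels, frames)
```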
The Triton Inference Server provides an optimized cloud and edge inferencing solution
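A minimal client sketch using the `tritonclient` package, assuming a server on localhost:8000; the model, input, and output names are hypothetical:

```python
import numpy as np
import tritonclient.http as httpclient

# Model, input, and output names are placeholders for your deployment.
client = httpclient.InferenceServerClient(url="localhost:8000")

data = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("INPUT", data.shape, "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT")]

result = client.infer(model_name="densenet", inputs=inputs, outputs=outputs)
print(result.as_numpy("OUTPUT").shape)
```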
Bring the notion of Model-as-a-Service to life
Libraries for applying sparsification recipes to neural networks
OpenAI-style API for open large language models
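Because such servers mimic the OpenAI REST surface, the official client can be pointed at them; a sketch with placeholder base URL and model name:

```python
from openai import OpenAI

# base_url and model are placeholders for a self-hosted, OpenAI-compatible server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="qwen-7b-chat",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```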
Unified Model Serving Framework
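This is BentoML's tagline; a minimal service sketch, assuming the class-based API introduced in BentoML 1.2 (the truncation logic is a stand-in for a real model call):

```python
import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        # Placeholder for an actual model invocation.
        return text[:100]
```

Run with `bentoml serve` to expose `summarize` as an HTTP endpoint.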
A Unified Library for Parameter-Efficient Learning
Lightweight Python library for adding real-time multi-object tracking to any detector
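This describes Norfair; a minimal tracking sketch with hand-made centroids standing in for detector output (the threshold and delay values are illustrative):

```python
import numpy as np
from norfair import Detection, Tracker

# Euclidean matching; distance_threshold is in pixels, and
# initialization_delay=0 surfaces tracked objects from the first frame.
tracker = Tracker(distance_function="euclidean", distance_threshold=30,
                  initialization_delay=0)

# Per-frame centroids that would normally come from any detector.
for frame in [np.array([[100.0, 100.0]]), np.array([[104.0, 102.0]])]:
    tracked = tracker.update(detections=[Detection(points=frame)])
    print([(obj.id, obj.estimate) for obj in tracked])
```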
Integrate, train, and manage any AI model or API directly with your database
Library for serving Transformers models on Amazon SageMaker
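Such toolkits are commonly driven via the SageMaker Python SDK's `HuggingFaceModel`; a deployment sketch where the role ARN, model ID, and framework versions are illustrative:

```python
from sagemaker.huggingface import HuggingFaceModel

model = HuggingFaceModel(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder ARN
    transformers_version="4.37",
    pytorch_version="2.1",
    py_version="py310",
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",
    },
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
print(predictor.predict({"inputs": "I love this library!"}))
```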
A unified framework for scalable computing
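If this refers to Ray, its core primitive is turning a plain function into a distributed task with `@ray.remote`:

```python
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x: int) -> int:
    return x * x

# Fan out eight tasks across workers and gather the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```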
An easy-to-use LLM quantization package with user-friendly APIs
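This describes AutoGPTQ; a minimal 4-bit quantization sketch adapted from its quick-start pattern (the model ID and single calibration example are illustrative):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # small model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)

# One calibration example; real runs use a representative sample set.
examples = [tokenizer("AutoGPTQ quantizes weights with the GPTQ algorithm.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-4bit")
```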
A toolkit for optimizing Keras & TensorFlow ML models for deployment
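This is the TensorFlow Model Optimization Toolkit; a minimal pruning sketch, assuming `tensorflow-model-optimization` is installed:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap a Keras model so low-magnitude weights are pruned during training.
base = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10),
])
pruned = tfmot.sparsity.keras.prune_low_magnitude(base)
pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Training must include the pruning callback:
# pruned.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])
```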
OpenMMLab Model Deployment Framework
Framework dedicated to making neural data processing pipelines simple and fast
Database system for building simpler and faster AI-powered applications
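This reads like EvaDB; a minimal sketch of its embedded Python client, where the table name and SQL flavour are illustrative:

```python
import evadb

# Connect to an embedded EvaDB instance and issue SQL-flavoured queries.
cursor = evadb.connect().cursor()
cursor.query("CREATE TABLE IF NOT EXISTS reviews (comment TEXT(1000));").df()
cursor.query("INSERT INTO reviews (comment) VALUES ('fast shipping');").df()
print(cursor.query("SELECT * FROM reviews;").df())
```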
Framework for Accelerating LLM Generation with Multiple Decoding Heads
Toolkit for inference and serving with MXNet on Amazon SageMaker