Run local LLMs on any device. Open-source.
FlashInfer: Kernel Library for LLM Serving
A library for accelerating Transformer models on NVIDIA GPUs
20+ high-performance LLMs with recipes to pretrain and finetune at scale
A high-throughput and memory-efficient inference and serving engine
Large Language Model Text Generation Inference
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods
Standardized Serverless ML Inference Platform on Kubernetes
PyTorch library of curated Transformer models and their components
PyTorch domain library for recommendation systems
Simplifies the local serving of AI models from any source
Tensor search for humans
Unified Model Serving Framework
State-of-the-art diffusion models for image and audio generation
Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion
High quality, fast, modular reference implementation of SSD in PyTorch
OpenMMLab Model Deployment Framework
A computer vision framework to create and deploy apps in minutes
Framework dedicated to neural data processing
Database system for building simpler and faster AI-powered applications
Serve machine learning models within a Docker container
Framework for Accelerating LLM Generation with Multiple Decoding Heads
OpenMMLab Video Perception Toolbox
CPU/GPU inference server for Hugging Face transformer models