A library for accelerating Transformer models on NVIDIA GPUs
A high-throughput and memory-efficient inference and serving engine
High-performance inference framework for large language models
A 950-line, minimal, extensible LLM inference engine built from scratch
A lightweight vLLM implementation built from scratch
Low-latency AI inference engine optimized for mobile devices
Code for running inference and finetuning with the SAM 3 model
Pruna is a model optimization framework built for developers
RGBD video generation model conditioned on camera input
Offline inference engine for art and real-time voice conversations
Inference Llama 2 in one file of pure C
Parallax is a distributed model serving framework
Universal LLM Deployment Engine with ML Compilation
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
Tensor search for humans
Superduper: Integrate AI models and machine learning workflows
Supercharge Your LLM with the Fastest KV Cache Layer
Multi-Agent daTa geneRation Infra and eXperimentation framework
Effortless data labeling with AI support from Segment Anything
Running large language models on a single GPU
Toolbox of models, callbacks, and datasets for AI/ML researchers
A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator
Real-Time State-of-the-art Speech Synthesis for TensorFlow 2
Auto-diff neural network library for high-dimensional sparse tensors
Open source embedded speech-to-text engine