A lightweight vLLM implementation built from scratch
High-Resolution Image Synthesis with Latent Diffusion Models
Everything you need to build state-of-the-art foundation models
Inference Llama 2 in one file of pure C
Uplift modeling and causal inference with machine learning algorithms
Official inference framework for 1-bit LLMs
The official Python client for the Hugging Face Hub
Bring the notion of Model-as-a-Service to life
AirLLM: 70B inference with a single 4GB GPU
Unified Model Serving Framework
Gaussian processes in TensorFlow
Jupyter notebook tutorials for OpenVINO
A set of Docker images for training and serving models in TensorFlow
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Training and deploying machine learning models on Amazon SageMaker
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Faster Whisper transcription with CTranslate2
Sparsity-aware deep learning inference runtime for CPUs
Code for running inference and finetuning with the SAM 3 model
A library for accelerating Transformer models on NVIDIA GPUs
Operating LLMs in production
A course on LLM inference serving on Apple Silicon
A Customizable Image-to-Video Model based on HunyuanVideo
Official inference repo for FLUX.1 models
Achieving 3x+ generation speedup on reasoning tasks