A lightweight vLLM implementation built from scratch
AirLLM 70B inference with a single 4GB GPU
Llama 2 inference in one file of pure C
Uplift modeling and causal inference with machine learning algorithms
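To make the uplift-modeling entry concrete: at its simplest, uplift is the difference in response rates between a treated group and a control group. The following is a minimal illustrative sketch in pure Python with made-up data; it is not code from any library listed here.

```python
def uplift(treated_outcomes, control_outcomes):
    """Estimate uplift as mean(treated) - mean(control).

    A real uplift model would predict this difference per individual
    (e.g. via two-model or transformed-outcome approaches); this sketch
    only shows the population-level quantity being estimated.
    """
    t = sum(treated_outcomes) / len(treated_outcomes)
    c = sum(control_outcomes) / len(control_outcomes)
    return t - c

# Hypothetical example: 60% of treated customers converted vs. 40% of controls.
treated = [1, 1, 1, 0, 0]   # conversions in the treated group
control = [1, 1, 0, 0, 0]   # conversions in the control group
print(round(uplift(treated, control), 2))  # 0.2
```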
DoWhy is a Python library for causal inference
Official inference framework for 1-bit LLMs
High-performance Inference and Deployment Toolkit for LLMs and VLMs
A set of Docker images for training and serving models in TensorFlow
The official Python client for the Hugging Face Hub
Unified Model Serving Framework
Gaussian processes in TensorFlow
HunyuanVideo: A Systematic Framework For Large Video Generative Models
Jupyter notebook tutorials for OpenVINO
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Faster Whisper transcription with CTranslate2
Sparsity-aware deep learning inference runtime for CPUs
Training and deploying machine learning models on Amazon SageMaker
Achieving a 3x+ generation speedup on reasoning tasks
A course on LLM inference serving on Apple Silicon
Official inference repo for FLUX.1 models
Bring the notion of Model-as-a-Service to life
Code for running inference and finetuning with the SAM 3 model
A Customizable Image-to-Video Model based on HunyuanVideo
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
Uncover insights, surface problems, monitor, and fine-tune your LLM