Tiny vision language model
Visual Causal Flow
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Moonshot's most powerful AI model
LTX-Video Support for ComfyUI
Recovering the Visual Space from Any Views
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Python inference and LoRA trainer package for the LTX-2 audio–video model
Wan2.1: Open and Advanced Large-Scale Video Generative Model
VMZ: Model Zoo for Video Modeling
Let's make video diffusion practical
Pure C inference for the Flux 2 image generation model
Clean and efficient FP8 GEMM kernels with fine-grained scaling
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Foundational Models for State-of-the-Art Speech and Text Translation
OCR expert VLM powered by Hunyuan's native multimodal architecture
Inference script for Oasis 500M
Official code for Style Aligned Image Generation via Shared Attention
llama.go is like llama.cpp in pure Golang
GLIDE: a diffusion-based text-conditional image synthesis model
Vision-language-action model for robot control via images and text