Tiny vision-language model
Visual Causal Flow
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
LTX-Video Support for ComfyUI
Recovering the Visual Space from Any Views
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Python inference and LoRA trainer package for the LTX-2 audio–video
Let's make video diffusion practical
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
VMZ: Model Zoo for Video Modeling
OCR expert VLM powered by Hunyuan's native multimodal architecture
Inference script for Oasis 500M
Official code for Style Aligned Image Generation via Shared Attention
GLIDE: a diffusion-based text-conditional image synthesis model