Accurate × Fast × Comprehensive
Contexts Optical Compression
Visual Causal Flow
Awesome multilingual OCR toolkits based on PaddlePaddle
OCR expert VLM powered by Hunyuan's native multimodal architecture
Easy Docker setup for Stable Diffusion with user-friendly UI
Official inference repo for FLUX.1 models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Reference PyTorch implementation and models for DINOv3
An experimental version of the DeepSeek model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Models for object and human mesh reconstruction
Audio foundation model excelling in audio understanding
ChatGPT interface with better UI
DeepSeek Coder: Let the Code Write Itself
GLM-4-Voice | End-to-End Chinese-English Conversational Model
The official PyTorch implementation of Google's Gemma models
Fast-stable-diffusion + DreamBooth
A PyTorch library for implementing flow matching algorithms
Research code artifacts for Code World Model (CWM)
Open-weight, large-scale hybrid-attention reasoning model
High-Resolution Image Synthesis with Latent Diffusion Models
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation