Tiny vision language model
State-of-the-art Image & Video CLIP and Multimodal Large Language Models
Foundation model for image generation
Reference PyTorch implementation and models for DINOv3
State-of-the-art TTS model under 25MB
State-of-the-art text-to-video pre-trained model
OCR expert VLM powered by Hunyuan's native multimodal architecture
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Official inference repo for FLUX.2 models
A SOTA open-source image editing model
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Programmatic access to the AlphaGenome model
Industrial-level controllable zero-shot text-to-speech system
RGBD video generation model conditioned on camera input
Qwen3-Coder is the code version of Qwen3
Open-source deep-learning framework
Video understanding codebase from FAIR for reproducing video models
A series of math-specific large language models based on Qwen2
Open-source industrial-grade ASR models
Multimodal embedding and reranking models built on Qwen3-VL
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI
GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning
New family of code large language models (LLMs)
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 benchmark