LLM-based reinforcement learning audio editing model
GLM-4 series: Open Multilingual Multimodal Chat LMs
Tiny vision language model
State-of-the-art (SoTA) text-to-video pre-trained model
The official PyTorch implementation of Google's Gemma models
Stable Diffusion WebUI Forge is a platform on top of Stable Diffusion
Math-specific large language models from our Qwen2 series
Diversity-driven optimization and large-model reasoning ability
CodeGeeX2: A More Powerful Multilingual Code Generation Model
Chinese and English multimodal conversational language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Open Source Speech Language Model
Open-source industrial-grade ASR models
Foundation model for image generation
Hunyuan Translation Model Version 1.5
Multimodal embedding and reranking models built on Qwen3-VL
Implementation of "MobileCLIP" (CVPR 2024)
Official implementation of Watermark Anything with Localized Messages
High-resolution models for human tasks
Video understanding codebase from FAIR for reproducing video models
Tool for exploring and debugging transformer model behaviors
CLIP: predict the most relevant text snippet given an image
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Multimodal Diffusion with Representation Alignment