Chat & pretrained large vision-language model
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
MedicalGPT: Training Your Own Medical GPT Model with a ChatGPT Training Pipeline
Open-source multi-speaker long-form text-to-speech model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A SOTA open-source image editing model
Tool for exploring and debugging transformer model behaviors
Capable of understanding text, audio, vision, and video
Qwen3-TTS is an open-source series of TTS models
Qwen-Image is a powerful image generation foundation model
OpenTinker is an RL-as-a-Service infrastructure for foundation models
Generate Any 3D Scene in Seconds
OCR expert VLM powered by Hunyuan's native multimodal architecture
A Family of Open Foundation Models for Code Intelligence
Open-weight, large-scale hybrid-attention reasoning model
FAIR Sequence Modeling Toolkit 2
GLM-4 series: Open Multilingual Multimodal Chat LMs
Qwen3-Omni is a natively end-to-end, omni-modal LLM
An AI-powered security review GitHub Action using Claude
Renderer for the harmony response format to be used with gpt-oss
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
State-of-the-art (SoTA) text-to-video pre-trained model
Open-source framework for intelligent speech interaction
Block Diffusion for Ultra-Fast Speculative Decoding