PyTorch library of curated Transformer models and their components
A high-throughput and memory-efficient inference and serving engine
State-of-the-art Parameter-Efficient Fine-Tuning
Multilingual sentence & image embeddings with BERT
MobileLLM: Optimizing Sub-billion Parameter Language Models
Operating LLMs in production
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Low-code framework for building custom LLMs and neural networks
Replace OpenAI GPT with another LLM in your app
Qwen3-Coder is the code version of Qwen3
Framework and no-code GUI for fine-tuning LLMs
Gemma open-weight LLM library, from Google DeepMind
Toolkit for conversational AI
Unified KV Cache Compression Methods for Auto-Regressive Models
Math-specific large language models in the Qwen2 series
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Designed for text embedding and ranking tasks
Capable of understanding text, audio, vision, and video
A state-of-the-art open visual language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Database system for building simpler and faster AI-powered applications
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)