Let's make video diffusion practical
Long-form streaming TTS system for multi-speaker dialogue generation
Advancing Open-source World Models
Recovering the Visual Space from Any Views
LTX-Video Support for ComfyUI
Inference code for scalable emulation of protein equilibrium ensembles
ChatGPT interface with better UI
An experimental version of the DeepSeek model
Repo for SeedVR2 & SeedVR
A Powerful Native Multimodal Model for Image Generation
gpt-oss-120b and gpt-oss-20b are two open-weight language models
PyTorch code and models for DINOv2 self-supervised learning
Designed for text embedding and ranking tasks
GLM-4.5: Open-source LLM for intelligent agents by Z.ai
GLM-4 series: Open Multilingual Multimodal Chat LMs
CLIP: Predict the most relevant text snippet given an image
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Collection of Gemma 3 variants that are trained for performance
Repo of the Qwen2-Audio chat & pretrained large audio language models
Large Multimodal Models for Video Understanding and Editing
Block Diffusion for Ultra-Fast Speculative Decoding
4M: Massively Multimodal Masked Modeling
The official PyTorch implementation of Google's Gemma models
OCR expert VLM powered by Hunyuan's native multimodal architecture
The ChatGPT Retrieval Plugin lets you easily find personal documents