New set of lightweight state-of-the-art, open foundation models
LTX-Video Support for ComfyUI
Repo for SeedVR2 & SeedVR
GLM-4 series: Open Multilingual Multimodal Chat LMs
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Collection of Gemma 3 variants that are trained for performance
Repo of Qwen2-Audio chat & pretrained large audio language model
Block Diffusion for Ultra-Fast Speculative Decoding
tiktoken is a fast BPE tokeniser for use with OpenAI's models
Large Multimodal Models for Video Understanding and Editing
4M: Massively Multimodal Masked Modeling
The official PyTorch implementation of Google's Gemma models
OCR expert VLM powered by Hunyuan's native multimodal architecture
The ChatGPT Retrieval Plugin lets you easily find personal documents
Production-tested AI infrastructure tools
Instructions on how to use the Realtime API on Microcontrollers
High-Fidelity and Controllable Generation of Textured 3D Assets
Global weather forecasting model using graph neural networks and JAX
Pretrained time-series foundation model developed by Google Research
Inference script for Oasis 500M
Official implementation of DreamCraft3D
Implementation of the Surya Foundation Model for Heliophysics
Code for Mesh R-CNN, ICCV 2019
LLM-based reinforcement learning audio editing model
A SOTA open-source image editing model