Long-form streaming TTS system for multi-speaker dialogue generation
Collection of Gemma 3 variants that are trained for performance
LTX-Video Support for ComfyUI
Repo of Qwen2-Audio chat & pretrained large audio language model
The official PyTorch implementation of Google's Gemma models
Inference script for Oasis 500M
Diffusion Transformer with Fine-Grained Chinese Understanding
A Customizable Image-to-Video Model based on HunyuanVideo
Large Multimodal Models for Video Understanding and Editing
OCR expert VLM powered by Hunyuan's native multimodal architecture
Implementation of "MobileCLIP" (CVPR 2024)
Global weather forecasting model using graph neural networks and JAX
Pretrained time-series foundation model developed by Google Research
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Official implementation of DreamCraft3D
LLM-based reinforcement learning audio editing model
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
Code for Mesh R-CNN, ICCV 2019
The ChatGPT Retrieval Plugin lets you easily find personal documents
Implementation of the Surya Foundation Model for Heliophysics
A SOTA open-source image editing model
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Release for Improved Denoising Diffusion Probabilistic Models
Qwen2.5-Coder is the code version of Qwen2.5, the large language model
Open-source, high-performance Mixture-of-Experts large language model