Controllable and fast Text-to-Speech for over 7000 languages
Unified Multimodal Understanding and Generation Models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
PyTorch code and models for VJEPA2 self-supervised learning from video
Educational framework exploring multi-agent orchestration
A lightweight vision library for performing large-scale object detection
This repo contains the code for a 1D tokenizer and generator
Flexible Photo Recrafting While Preserving Your Identity
A SOTA open-source image editing model
Multi-Agent daTa geneRation Infra and eXperimentation framework
Build cross-modal and multimodal applications on the cloud
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Deep and online learning with spiking neural networks in Python
GUI Exploration Lab. One of the best GUI agent solutions
Large language model & vision-language model based on Linear Attention
Chat & pretrained large audio language model proposed by Alibaba Cloud
Did you say you like data?
An Efficient and Easy-to-use Federated Learning Framework
Run LLMs locally on Cloud Workstations
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Tool-integrated Reasoning LLM Agents
NOTICE OF CONSOLIDATION & PARTNERSHIP PENDING As of April 2026, the 20
Scientific Visualisation Made Easy
A Customizable Image-to-Video Model based on HunyuanVideo