Generating Immersive, Explorable, and Interactive 3D Worlds
Unifying 3D Mesh Generation with Language Models
A Unified Framework for Text-to-3D and Image-to-3D Generation
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
A text-to-speech, speech-to-text, and speech-to-speech library
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Generate Any 3D Scene in Seconds
HY-Motion model for 3D character animation generation
Official implementation of DreamCraft3D
State-of-the-art (SoTA) text-to-video pre-trained model
Implementation of Make-A-Video, a new SOTA text-to-video generator
Implementation of Video Diffusion Models
A Systematic Framework for Interactive World Modeling
Framework for building AI-powered interactive digital humans and agents
The data structure for multimodal data
Workflow and speech recognition app
Crafting engine for artists, designers, and filmmakers
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Framework for building neural networks
State-of-the-art diffusion models for image and audio generation
Build cross-modal and multimodal applications on the cloud
Amica is an open source interface for interactive communication
Generate 3D objects conditioned on text or images
Framework dedicated to neural data processing
CLIP + FFT/DWT/RGB = text to image/video