ComfyUI wrapper nodes for WanVideo and related models
Capable of understanding text, audio, vision, video
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Implementation of Phenaki Video, which uses MaskGIT
Generate high-definition story short videos with one click using AI
Director, Screenwriter, Producer, and Video Generator All-in-One
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Motion-controllable Video Generation via Latent Trajectory Guidance
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Let's make video diffusion practical
GPT4V-level open-source multi-modal model based on Llama3-8B
An unsupervised and free tool for image and video dataset analysis
Implementation of a U-net complete with efficient attention
Label Studio is a multi-type data labeling and annotation tool
Powerful open source team chat application
Recovering the Visual Space from Any Views
Generating Immersive, Explorable, and Interactive 3D Worlds
InvokeAI is a leading creative engine for Stable Diffusion models
A general fine-tuning kit geared toward image/video/audio diffusion
The most powerful and modular diffusion model GUI, API, and backend
Python data, Leaflet.js maps
Sharp Monocular Metric Depth in Less Than a Second
A Multi-Modal World Model for Reconstructing, Generating, Simulation
Dealing with all unstructured data, such as reverse image search
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA