A framework to enable multimodal models to operate a computer
Automated translation solution for visual novels
A simple tool for reading in poorly redacted documents
Tiny vision language model
Recovering the Visual Space from Any Views
A GRUB Theme in the style of Minecraft!
A state-of-the-art open visual language model
Visual tool for building, testing, and deploying AI agent workflows
Director, Screenwriter, Producer, and Video Generator All-in-One
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
SAPIEN Manipulation Skill Framework
Machine Learning, Criticism and Correction
A Python visual Flow Based Programming library
Turn WiFi signals into real-time human pose estimation and detection
Generating Immersive, Explorable, and Interactive 3D Worlds
Machine learning image inpainting task that removes watermarks
Unified Multimodal Understanding and Generation Models
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Book_4_Matrix Power | The Iris Book: From Addition, Subtraction
Python inference and LoRA trainer package for the LTX-2 audio–video
A neural network that transforms a design mock-up into static websites
Video Object and Interaction Deletion
VideoRAG: Chat with Your Videos
The most powerful Android RPA agent framework
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories