The most powerful and modular diffusion model GUI, API and backend
Recovering the Visual Space from Any Views
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
An unsupervised and free tool for image and video dataset analysis
PyTorch code and models for VJEPA2 self-supervised learning from video
Implementation of a U-Net complete with efficient attention
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
NBA Stats API via Basketball Reference
Voice Recognition to Text Tool
Advancing Open-source World Models
Lightweight Python library for adding real-time multi-object tracking
A general fine-tuning kit geared toward image/video/audio diffusion
Harmonized and Coherent Human Image Animation
Official code for StoryMem: Multi-shot Long Video Storytelling
Director, Screenwriter, Producer, and Video Generator All-in-One
Streaming Real-time Audio-Driven Avatar Generation
Convert AI papers to GUI
NVR with real-time local object detection for IP cameras
Python data, Leaflet.js maps
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Code and models for the ICML 2024 paper NExT-GPT
Visual intelligence for your home
Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop
Code for running inference and finetuning with the SAM 3 model
Sharp Monocular Metric Depth in Less Than a Second