Multimodal Diffusion with Representation Alignment
Visual intelligence for your home.
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Taming Stable Diffusion for Lip Sync
Extension of Google Research’s PaperBanana
All-in-one AI productivity platform with agents, workflows, and IM
Full-stack AI Red Teaming platform
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official implementation of Watermark Anything with Localized Messages
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
StarVector is a foundation model for SVG generation
Official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
Azure command-line interface
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Reference PyTorch implementation and models for DINOv3
Book 5: Statistics in Simplicity
Effortless data labeling with AI support from Segment Anything
The open-source C/C++ package manager
Master the fundamentals of machine learning, deep learning
VMZ: Model Zoo for Video Modeling
Agent S: an open agentic framework that uses computers like a human