LISA: Reasoning Segmentation via Large Language Model
Qwen2.5-VL is the multimodal large language model series
Gracefully face hCaptcha challenges with multimodal LLMs
Visual intelligence for your home.
A Python module to repair invalid JSON from LLMs
Driving with Graph Visual Question Answering
Open source demo platform where you can easily showcase your AI models
Chat & pretrained large vision-language model
AI agent that streamlines the entire process of data analysis