
LISA

LISA: Reasoning Segmentation via Large Language Model

...Instead of relying solely on predefined object categories, the model is capable of reasoning about complex textual queries and translating them into visual segmentation outputs. This approach allows the system to identify objects or regions in images based on semantic descriptions, contextual reasoning, and world knowledge. The model integrates multimodal capabilities by combining language understanding with visual perception so that text instructions guide the segmentation process. Researchers created a specialized task called reasoning segmentation, where the model must generate a mask for regions described in natural language instructions.

Downloads: 0 This Week

Last Update: 2026-03-06


Qwen-2.5-VL

Qwen2.5-VL is the multimodal large language model series

Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...

Downloads: 11 This Week

Last Update: 2026-01-30


hCaptcha Challenger

Gracefully face hCaptcha challenges with multimodal LLMs

...Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. The framework includes support for multiple types of captcha challenges such as object selection, drag-and-drop puzzles, and image labeling tasks. It implements an agent-style workflow where the system interprets the challenge prompt, selects the appropriate vision model, and generates the required interaction automatically.

Downloads: 4 This Week

Last Update: 2026-03-06


LLM Vision

Visual intelligence for your home.

LLM Vision is an open-source integration for Home Assistant that adds multimodal large language model capabilities to smart home environments. The project enables Home Assistant to analyze images, video files, and live camera feeds using vision-capable AI models. Instead of relying only on traditional object detection pipelines, it allows users to send prompts about visual content and receive contextual descriptions or answers about what is happening in camera footage. The system can process...

Downloads: 0 This Week

Last Update: 2026-03-09


JSON_REPAIR

A Python module to repair invalid JSON from LLMs

...The tool is particularly useful in scenarios where JSON output is generated by large language models or external services that may produce syntactically invalid responses. Instead of failing when encountering errors such as missing quotes, trailing commas, or incomplete objects, the library analyzes the malformed data and reconstructs it into valid JSON. The repair process can also be combined with optional JSON Schema validation to enforce structural constraints and ensure the output conforms to expected data types and formats. Developers can integrate the library into applications as a drop-in replacement for standard JSON parsing functions, allowing systems to tolerate imperfect structured data without crashing.
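The repair idea described above can be illustrated with a minimal standard-library sketch. This is not the library's own algorithm or API: the function name `tolerant_loads` is hypothetical, and the sketch handles only trailing commas, unterminated strings, and unclosed brackets (not missing quotes or schema validation).

```python
import json
import re


def tolerant_loads(text):
    """Parse JSON, attempting a few simple repairs if strict parsing fails.

    Hypothetical helper for illustration only; it is not the JSON_REPAIR API.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Scan once to find brackets left open and whether a string is unterminated.
    closers = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()

    repaired = text
    if in_string:
        repaired += '"'  # terminate a string that was cut off
    repaired += "".join(reversed(closers))  # close innermost brackets first

    # Drop trailing commas before a closing bracket or brace.
    # Caveat: this naive regex would also alter ",]" or ",}" inside a string value.
    repaired = re.sub(r",\s*([}\]])", r"\1", repaired)

    # Still raises json.JSONDecodeError if these repairs were not enough.
    return json.loads(repaired)


if __name__ == "__main__":
    print(tolerant_loads('{"a": 1, "b": [1, 2,],}'))
    print(tolerant_loads('{"name": "cat", "tags": ["pet", "animal"'))
```

Valid input takes the fast path through `json.loads` unchanged, so a helper like this can stand in for a strict parser wherever imperfect model output is expected.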

Downloads: 0 This Week

Last Update: 2 days ago


DriveLM

Driving with Graph Visual Question Answering

...The system includes DriveLM-Data, a dataset built on driving environments such as nuScenes and CARLA, where human-written reasoning steps connect different layers of driving tasks. This design allows models to learn relationships between objects, behaviors, and navigation decisions through graph-structured logic.

Downloads: 0 This Week

Last Update: 2026-03-09


InternGPT

Open source demo platform where you can easily showcase your AI models

...Unlike traditional chat systems that rely solely on text prompts, InternGPT allows users to interact with visual content using both language and nonverbal signals such as pointing or highlighting objects within images. The framework connects multiple specialized AI models that perform tasks such as object detection, segmentation, captioning, and visual editing while coordinating them through a central conversational interface. This architecture enables the system to plan actions, execute visual operations, and return results in a coherent dialogue with the user.

Downloads: 0 This Week

Last Update: 2026-03-05


Qwen-VL

Chat & pretrained large vision language model

...Qwen-VL supports multilingual input and conversation (e.g., Chinese and English) and targets tasks such as image captioning, visual question answering (VQA, DocVQA), and grounding (locating objects or regions from textual queries).

Downloads: 1 This Week

Last Update: 2025-09-23


Streamline Analyst

AI agent that streamlines the entire process of data analysis

Streamline Analyst is an open-source data analysis application powered by Large Language Models (LLMs). This Data Analysis Agent automates tasks such as data cleaning and preprocessing, as well as more complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models for your data. It also handles visualization and evaluation of the results.

Downloads: 0 This Week

Last Update: 2024-09-23


Search Results for "objects"

Showing 9 open source projects for "objects"