image text input free download

Showing 2059 open source projects for "image text input"

1

Text-to-image Playground

A playground to generate images from any text prompt using SD

dalle-playground is an open-source web application that allows users to generate images from natural language text prompts using modern text-to-image generative models. Originally built around DALL-E Mini, the project later transitioned to using Stable Diffusion, enabling more detailed and higher-quality image synthesis. The system combines a backend machine learning service with a browser-based frontend interface that lets users experiment interactively with prompt engineering and generative AI. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
2

Qwen-Image

Qwen-Image is a powerful image generation foundation model

Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence.

1 Review

Downloads: 11 This Week

Last Update: 2026-02-10
See Project
3

GLM-Image

GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image

GLM-Image is an open-source generative AI model designed to create high-fidelity images from text prompts using a hybrid architecture that combines autoregressive semantic understanding with diffusion-based detail refinement. It excels at generating images that include complex layouts and detailed text content, making it especially useful for posters, diagrams, info-graphics, social media graphics, and visual content that requires precise text placement and semantic alignment. ...

Downloads: 3 This Week

Last Update: 2026-03-20
See Project
4

Image Toolbox

Image Toolbox is a powerful picture editor, which can crop

Image Toolbox is a powerful picture editor, which can crop, apply filters, add some drawings, erase background, edit EXIF, or even create a PDF file.

Downloads: 28 This Week

Last Update: 2026-04-09
See Project
5

Z-Image

Image generation model with single-stream diffusion transformer

...The project includes several variants: Z-Image-Turbo, a distilled version optimized for speed and low resource consumption; Z-Image-Base, the full-capacity foundation model; and Z-Image-Edit, fine-tuned for image editing tasks. Despite its compact size, Z-Image produces outputs that closely rival those from much larger models — including strong rendering of bilingual (English and Chinese) text inside images, accurate prompt adherence, and good layout and composition.

Downloads: 44 This Week

Last Update: 2026-02-09
See Project
6

LongCat-Image

Foundation model for image generation

...The model excels at both text-to-image generation and instruction-guided image editing, offering users versatile capabilities for creative and practical tasks—whether generating art, mockups, or adjusting existing visuals with fine control.

Downloads: 2 This Week

Last Update: 2026-04-15
See Project
7

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 91 This Week

Last Update: 2 days ago
See Project
8

Intervention Image

PHP Image Processing

Intervention Image is a PHP image handling and manipulation library. It provides an easy-to-use interface for performing common image operations such as resizing, cropping, and applying filters. It supports a variety of image formats and can be integrated into Laravel projects or used independently in other PHP applications. The library is highly customizable, allowing for simple image manipulation tasks, or more advanced image processing workflows.

Downloads: 1 This Week

Last Update: 2026-04-07
See Project
9

ascii from image

Literally just an image -> ascii image generator

Converts images and video to ASCII art.

Downloads: 0 This Week

Last Update: 2025-03-11
See Project
10

ComfyUI-HunyuanVideoWrapper

ComfyUI wrapper nodes for HunyuanVideo

The ComfyUI-HunyuanVideoWrapper project is a ComfyUI extension that integrates Hunyuan-based multimodal video generation models into node-based workflows. It allows users to generate or manipulate video content by combining text prompts with one or more input images, enabling flexible conditioning of outputs. The system introduces specialized nodes such as text-image encoders that allow multiple image inputs to be referenced directly within prompts. This makes it possible to guide generation using both visual and textual context simultaneously. The wrapper is designed to fit seamlessly into ComfyUI pipelines, enabling chaining with other nodes for advanced workflows. ...

Downloads: 2 This Week

Last Update: 6 days ago
See Project
11

FLUX.2

Official inference repo for FLUX.2 models

FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved.

Downloads: 44 This Week

Last Update: 2026-03-12
See Project
12

Readest

Readest is a modern, feature-rich ebook reader

...The design seems to prioritize flexible input formats, possibly OCR or uploaded documents, and interactive tools to navigate or annotate them.

Downloads: 37 This Week

Last Update: 2026-04-13
See Project
13

DeepSeek VL2

Mixture-of-Experts Vision-Language Models for Advanced Multimodal

DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to process visual inputs as context for downstream tasks. ...

Downloads: 6 This Week

Last Update: 2025-10-03
See Project
14

FireRed-Image-Edit

General-purpose image editing model that delivers high-fidelity

FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong instruction following and editing consistency. ...

Downloads: 0 This Week

Last Update: 2026-04-03
See Project
15

Tesseract OCR

Open Source OCR Engine

Tesseract is an open source optical character recognition (OCR) engine and command line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract focuses more on line recognition, but it still supports the legacy Tesseract OCR engine, which recognizes character patterns. Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF and more. ...

5 Reviews

Downloads: 3,117 This Week

Last Update: 2025-12-26
See Project
16

Mozc

Mozc - a Japanese Input Method Editor designed for multi-platform

Mozc is an open source Japanese Input Method Editor (IME) developed by Google, designed to provide Japanese text input across multiple operating systems including Android, macOS, Windows, GNU/Linux, and Chromium OS. The project originated as a subset of Google Japanese Input, released publicly under the BSD 3-Clause license for community use and development. Mozc offers core IME functionality such as text conversion, prediction, and dictionary-based input, enabling users to efficiently type and edit Japanese text. ...

Downloads: 6 This Week

Last Update: 7 hours ago
See Project
17

HunyuanCustom

Multimodal-Driven Architecture for Customized Video Generation

HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity reinforcement and modality-specific condition injection. It also includes a text-image fusion module based on LLaVA for improved multimodal understanding. ...

Downloads: 0 This Week

Last Update: 2025-10-15
See Project
18

Fooocus

Focus on prompting and generating

Fooocus is an open-source image generation software that simplifies the process of creating images from text prompts. Built on Gradio and leveraging Stable Diffusion XL, Fooocus eliminates the need for manual parameter tweaking, allowing users to focus solely on crafting prompts. It offers a user-friendly interface with minimal setup, making advanced image synthesis accessible to a broader audience.

Downloads: 259 This Week

Last Update: 2025-06-03
See Project
19

Qwen-Image-Layered

Qwen-Image-Layered: Layered Decomposition for Inherent Editability

...By combining text and structured image representations, it aims to facilitate tasks where both descriptive and structural understanding are important, such as detailed image QA, interactive image editing via prompt layers, and image-conditioned generation with structural control. The layered approach supports training signals that help the model learn how visual elements relate to each other and to textual context, rather than simply learning global image embeddings.

Downloads: 5 This Week

Last Update: 2026-01-05
See Project
20

jq

Lightweight and flexible command-line JSON processor

...Data in jq is represented as streams of JSON values - every jq expression runs for each value in its input stream, and can produce any number of values to its output stream. jq filters run on a stream of JSON data. The input to jq is parsed as a sequence of whitespace-separated JSON values which are passed through the provided filter one at a time. The output(s) of the filter are written to standard out, again as a sequence of whitespace-separated JSON data.

Downloads: 97 This Week

Last Update: 2025-07-01
See Project
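
The stream model described in the jq entry above can be sketched in a few lines of Python. This is only an illustration of the idea, not jq's implementation: the helper names `iter_json_values`, `run_filter`, and `names` are hypothetical, and the sample filter only loosely resembles a jq expression such as `.name // empty`.

```python
import json


def iter_json_values(text):
    """Yield each whitespace-separated JSON value in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def run_filter(text, filter_fn):
    """Pass each input value through filter_fn, one at a time.

    A filter may produce zero, one, or many outputs per input value.
    """
    for value in iter_json_values(text):
        yield from filter_fn(value)


def names(value):
    """Hypothetical filter: emit the "name" field when present, else nothing."""
    if isinstance(value, dict) and "name" in value:
        yield value["name"]


if __name__ == "__main__":
    stream = '{"name": "a"} {"id": 2}\n{"name": "b"}'
    # Outputs are written one JSON value per line, as a new stream.
    for out in run_filter(stream, names):
        print(json.dumps(out))
```

The second input value has no `name` field, so the filter emits nothing for it. This shows how a filter can produce any number of outputs per input.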
21

Paint.NET

Downloads for Paint.NET, such as installer EXEs and portable ZIPs

Every feature and user interface element was designed to be immediately intuitive and quickly learnable without assistance. In order to handle multiple images easily, it uses a tabbed document interface. The tabs display a live thumbnail of the image instead of a text description. This makes navigation very simple and fast. Extensive work has gone into making it the fastest image editor available. Starting the app is nearly instantaneous, and every feature has been thoroughly optimized to take advantage of the latest multicore CPUs, GPUs, and NVMe SSDs. The use of DXGI Flip Model ensures low input latency and reduced power consumption. ...

Downloads: 159 This Week

Last Update: 2026-03-08
See Project
22

Hunyuan3D 2.0

High-Resolution 3D Assets Generation with Large Scale Diffusion Models

The Hunyuan3D-2 model, developed by Tencent, is designed for generating high-resolution 3D assets using large-scale diffusion models. This model offers advanced capabilities for creating detailed 3D models, including texture enhancements, multi-view shape generation, and rapid inference for real-time applications. It is particularly useful for industries requiring high-quality 3D content, such as gaming, film, and virtual reality. Hunyuan3D-2 supports various enhancements and is available...

Downloads: 34 This Week

Last Update: 2025-10-28
See Project
23

Qwen3-Omni

Qwen3-omni is a natively end-to-end, omni-modal LLM

Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and 10 speech output languages. ...

Downloads: 2 This Week

Last Update: 2026-01-08
See Project
24

Qwen-VL

Chat & pretrained large vision language model

Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs and conversation (e.g. ...

Downloads: 4 This Week

Last Update: 2025-09-23
See Project
25

Tesseract.js

A pure JavaScript Multilingual OCR

Tesseract.js is a pure JavaScript port of the popular Tesseract OCR engine. The library supports more than 100 languages, automatic text orientation and script detection, and a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser or on a server with Node.js. Tesseract.js is a JavaScript library that gets words in almost any spoken language out of images. The main Tesseract.js functions (e.g. recognize, detect) take an image parameter, which should be something that is like an image. ...

Downloads: 18 This Week

Last Update: 2025-12-15
See Project