Search Results for "tesseract-ocr-w64-setup-v5.x.x.exe"

Sort By:

Showing 2650 open source projects for "tesseract-ocr-w64-setup-v5.x.x.exe"

View related business solutions

Linux Clear Filters & Widen Search

SoftCo: Enterprise Invoice and P2P Automation Software
For companies that process over 20,000 invoices per year

SoftCo Accounts Payable Automation processes all PO and non-PO supplier invoices electronically from capture and matching through to invoice approval and query management. SoftCoAP delivers unparalleled touchless automation by embedding AI across matching, coding, routing, and exception handling to minimize the number of supplier invoices requiring manual intervention. The result is 89% processing savings, supported by a context-aware AI Assistant that helps users understand exceptions, answer questions, and take the right action faster.

Learn More
MicroStation by Bentley Systems is the trusted computer-aided design (CAD) software built specifically for infrastructure design.
Microstation enables architects, engineers, and designers to create precise 2D and 3D drawings that bring complex projects to life.

MicroStation is the only computer-aided design software for infrastructure design, helping architects and engineers like you bring their vision to life, present their designs to their clients, and deliver their projects to the community.

Learn More
1

Tesseract OCR

Open Source OCR Engine

Tesseract is an open source OCR or optical character recognition engine and command line program. OCR is a technology that allows for the recognition of text characters within a digital image. With the latest version of Tesseract, there is a greater focus on line recognition, however it still supports the legacy Tesseract OCR engine which recognizes character patterns.

5 Reviews

Downloads: 3,423 This Week

Last Update: 2025-12-26
See Project
2

Tesseract.js

A pure Javascript Multilingual OCR

Tesseract.js is a pure Javascript port of the popular Tesseract OCR engine. Tesseract.js' library supports more than 100 languages, automatic text orientation and script detection, a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser and on a server with NodeJS. Tesseract.js is a javascript library that gets words in almost any spoken language out of images.

Downloads: 24 This Week

Last Update: 2025-12-15
See Project
3

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines.

Downloads: 52 This Week

Last Update: 2026-01-15
See Project
4

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...

Downloads: 20 This Week

Last Update: 6 days ago
See Project
Skillfully - The future of skills based hiring
Realistic Workplace Simulations that Show Applicant Skills in Action

Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.

Learn More
5

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines.

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
6

Zerox OCR

PDF to Markdown with vision models

A dead simple way of OCR-ing a document for AI ingestion. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. The vision models just make sense. ZeroX is an open-source machine learning framework designed for fast experimentation and production deployment, optimized for speed and ease of use.

Downloads: 6 This Week

Last Update: 2024-12-18
See Project
7

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...

Downloads: 2 This Week

Last Update: 2026-01-27
See Project
8

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...

Downloads: 9 This Week

Last Update: 2026-02-03
See Project
9

Setup IPsec VPN

Scripts to build your own IPsec VPN server with IPsec/L2TP

setup-ipsec-vpn is a set of automated scripts that allow you to deploy your own IPsec VPN server on Linux in just a few minutes. It supports multiple VPN protocols, including IPsec/L2TP, Cisco IPsec (XAuth), and IKEv2, providing strong encryption to protect network traffic. By encrypting all traffic between the client and server, the VPN prevents eavesdropping on unsecured networks such as public Wi-Fi in coffee shops, hotels, or airports.

Downloads: 26 This Week

Last Update: 3 days ago
See Project
Turn traffic into pipeline and prospects into customers
For account executives and sales engineers looking for a solution to manage their insights and sales data

Docket is an AI-powered sales enablement platform designed to unify go-to-market (GTM) data through its proprietary Sales Knowledge Lake™ and activate it with intelligent AI agents. The platform helps marketing teams increase pipeline generation by 15% by engaging website visitors in human-like conversations and qualifying leads. For sales teams, Docket improves seller efficiency by 33% by providing instant product knowledge, retrieving collateral, and creating personalized documents. Built for GTM teams, Docket integrates with over 100 tools across the revenue tech stack and offers enterprise-grade security with SOC 2 Type II, GDPR, and ISO 27001 compliance. Customers report improved win rates, shorter sales cycles, and dramatically reduced response times. Docket’s scalable, accurate, and fast AI agents deliver reliable answers with confidence scores, empowering teams to close deals faster.

Learn More
10
$Rapid LaTeX OCR$

Rapid LaTeX OCR

Formula recognition based on LaTeX-OCR and ONNXRuntime

Formula recognition based on LaTeX-OCR and ONNXRuntime. rapid_latex_ocr is a tool to convert formula images to latex format. The reasoning code in the repo is modified from LaTeX-OCR, the model has all been converted to ONNX format, and the reasoning code has been simplified, Inference is faster and easier to deploy. The repo only has codes based on ONNXRuntime or OpenVINO inference in onnx format and does not contain training model codes.

Downloads: 4 This Week

Last Update: 2024-11-03
See Project
11

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 98 This Week

Last Update: 2026-04-06
See Project
12

Extractous

Fast and efficient unstructured data extraction

...For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. It also supports OCR for images and scanned documents through Tesseract, making it useful for document ingestion pipelines that include image-based or scanned inputs.

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
13

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle

PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that can help users train better models and apply them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra lightweight OCR system, with multilingual recognition, digit recognition, vertical text recognition, as well as long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes: ultra lightweight ppocr_mobile series models, general ppocr_server series models, and ultra lightweight compression ppocr_mobile_slim series models. ...

Downloads: 64 This Week

Last Update: 2026-01-29
See Project
14

PaddleOCR-json

OCR offline image text recognition command line windows program

...This makes it practical for developers or system integrators who want reliable OCR output in JSON while avoiding the complexity of training or managing models by hand. Projects and wrappers built around PaddleOCR-json demonstrate how it can be integrated into other applications, such as desktop OCR utilities or language-specific bindings, because the JSON output is easy to parse and consume.

Downloads: 5 This Week

Last Update: 2026-01-15
See Project
15

MinerU

A high-quality tool for convert PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.

Downloads: 24 This Week

Last Update: 7 days ago
See Project
16

VietOCR

Provides optical character recognition (OCR) solutions for Vietnamese language.

24 Reviews

Downloads: 175 This Week

Last Update: 2026-01-17
See Project
17

Setup PHP in GitHub Actions

GitHub action to set up PHP with extensions, php.ini configuration

GitHub action to set up PHP with extensions, php.ini configuration, coverage drivers, and various tools. Setup PHP with required extensions, php.ini configuration, code-coverage support and various tools like composer in GitHub Actions. This action gives you a cross-platform interface to set up the PHP environment you need to test your application. Refer to Usage section and examples to see how to use this. Refer to the self-hosted setup to use the action on self-hosted runners. ...

Downloads: 5 This Week

Last Update: 2026-03-15
See Project
18

Agentic Coding Flywheel Setup

System tool for beginners wanting agentic engineering capabilities

Agentic Coding Flywheel Setup (ACFS) is a comprehensive environment bootstrap project that configures a full stack of tools for autonomous AI-assisted coding workflows. With a single shell installer, ACFS transforms a fresh compute environment into a ready-to-use development setup that includes modern shells, language runtimes, AI coding agents (like Claude Code, Codex CLI, and Gemini CLI), and a coordinated toolchain for orchestration and safety.

Downloads: 3 This Week

Last Update: 2026-02-02
See Project
19

Pot Desktop

A cross-platform software for text translation and recognition

Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. ...

Downloads: 59 This Week

Last Update: 2025-11-28
See Project
20

Scanopy

Clean network diagrams, One-time setup, zero upkeep

Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines that chain together transforms, filters, and exporters, enabling automation of tedious data preparation steps and accelerating insights with minimal code. ...

Downloads: 17 This Week

Last Update: 2026-04-02
See Project
21

Paperless-ngx

A community-supported supercharged version of paperless

Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Downloads: 8 This Week

Last Update: 2026-03-21
See Project
22

Umi-OCR

Free OCR Software: No internet required, easy to use.

Support screenshots/pasting/batch importing of images, paragraph layout/excluding watermarks, scanning/generating QR codes. No need for internet connection throughout the entire process, with built-in multi language recognition library. 支持截屏/粘贴/批量导入图片，支持段落排版/排除水印，扫描/生成二维码。全程无需联网，内置多国语言识别库。

Downloads: 608 This Week

Last Update: 2025-03-26
See Project
23

Scribe.js

JavaScript OCR and text extraction for images and PDFs

Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. ...

Downloads: 11 This Week

Last Update: 2026-03-14
See Project
24

Tesseract Animator

This program animates a shaded tesseract

This program animates a shaded tesseract in a GTK window.

Downloads: 0 This Week

Last Update: 6 days ago
See Project
25

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools.

Downloads: 0 This Week

Last Update: 6 days ago
See Project