Search Results for "document layout recognition"

Showing 139 open source projects for "document layout recognition"

1

MinerU

A high-quality tool for converting PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.

Downloads: 10 This Week

Last Update: 1 day ago
2

Docling

Get your documents ready for gen AI

Docling is an open-source document processing toolkit built to prepare diverse content types for modern generative AI and data workflows. The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. ...

Downloads: 8 This Week

Last Update: 4 days ago
3

dots.ocr

Multilingual Document Layout Parsing in a Single Vision-Language Model

dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks.

Downloads: 1 This Week

Last Update: 2026-03-24
4

docext

An on-premises, OCR-free unstructured data extraction toolkit

docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual and textual information directly from document images. ...

Downloads: 2 This Week

Last Update: 2026-03-12
5

DeepSeek-OCR

Contexts Optical Compression

DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...

Downloads: 7 This Week

Last Update: 2026-01-27
6

deepdoctection

A Repo For Document AI

deepdoctection is a Python library and document AI framework that applies deep learning to analyze and extract structured data from scanned documents, PDFs, and images. It orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself but lets you build pipelines from widely used libraries for object detection, OCR, and selected NLP tasks, and it provides an integrated framework for fine-tuning, evaluating, and running models. ...

Downloads: 2 This Week

Last Update: 2026-04-09
7

Open Semantic Search

Open source semantic search and text analytics for large document sets

...It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.

Downloads: 3 This Week

Last Update: 17 hours ago
8

Umi-OCR

OCR software, free and offline

...The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 54 This Week

Last Update: 2026-01-15
9

canvas-editor

Canvas-based WYSIWYG rich text editor with advanced layout tools

canvas-editor is a browser-based rich text editor that renders content using HTML5 Canvas and SVG instead of traditional DOM-based approaches. It is designed to provide a WYSIWYG editing experience similar to word processors, enabling precise control over layout, rendering, and document structure. canvas-editor supports a wide range of formatting and document features, including text styling, tables, images, and embedded elements, all managed through a structured data model. Its architecture is modular, allowing developers to extend functionality through plugins, custom commands, and event hooks. ...

Downloads: 6 This Week

Last Update: 8 hours ago
10

SILE

The SILE Typesetter — Simon’s Improved Layout Engine

SILE is a typesetting system; its job is to produce beautiful printed documents. Conceptually, SILE is similar to TeX (from which it borrows some concepts and even syntax and algorithms), but the similarities end there. Rather than being a derivative of the TeX family, SILE is a new typesetting and layout engine written from the ground up using modern technologies and borrowing some ideas from graphical systems such as InDesign.

Downloads: 4 This Week

Last Update: 2025-05-31
11

DocTR

Library for OCR-related tasks powered by Deep Learning

...Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performance on public document datasets, comparable with GoogleVision/AWS Textract. Easy integration (available templates for browser demo & API deployment). End-to-end OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in the word). ...

Downloads: 3 This Week

Last Update: 2026-02-04
12

Chandra

OCR model for complex documents with layout-aware structured outputs

Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. It is capable of handling over 40 languages and is optimized to read difficult inputs such as messy handwriting and multi-column layouts. ...

Downloads: 2 This Week

Last Update: 2026-03-18
13

Leku

Map location picker component for Android

Map location picker component for Android. Based on Google Maps. An alternative to Google Place Picker. Component library for Android that uses Google Maps and returns a latitude, longitude, and an address based on the location picked with the Activity provided. Note that voice_search_extra_language sets the language used for voice recognition. Replace it with the allowed voice recognition locale for your language. We encourage you to add these languages to this...

Downloads: 0 This Week

Last Update: 2026-01-13
14

Docspell

Assist in organizing your piles of documents

Docspell is a personal document organizer, sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is targeted at home use, i.e. families and households, and also at smaller groups and companies. You can associate tags, set correspondents, and add lots of other predefined and custom metadata. If your documents are...

Downloads: 3 This Week

Last Update: 2025-03-15
15

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...

Downloads: 11 This Week

Last Update: 2026-04-08
16

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...

Downloads: 0 This Week

Last Update: 2026-03-22
17

vitae

R Markdown Résumés and CVs

vitae is an R package that streamlines resume and CV creation via R Markdown. It includes a collection of LaTeX and HTML templates along with helper functions to dynamically populate content from data sources such as ORCID or spreadsheets.

Downloads: 0 This Week

Last Update: 2025-07-30
18

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It is designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. ...

Downloads: 1 This Week

Last Update: 2026-04-08
19

Pix2Text

Open-Source Python3 tool for recognizing layouts, tables, and math

An open-source Python 3 tool for recognizing layouts, tables, math formulas, and text in images and converting them into Markdown. Pix2Text (P2T) aims to be a free alternative to Mathpix, and it can already accomplish Mathpix's core functionality. 80+ languages are supported. Pix2Text (P2T) can recognize layouts, tables, images, text, and mathematical...

Downloads: 9 This Week

Last Update: 2026-02-07
20

DocStrange

Extract and convert data from any document: images, PDFs, Word docs

DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services.

Downloads: 2 This Week

Last Update: 2026-03-09
21

pdfly

CLI tool to extract (meta)data from PDF and manipulate PDF files

A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.

Downloads: 5 This Week

Last Update: 2025-10-13
22

OpenDataLoader PDF

PDF Parser for AI-ready data. Automate PDF accessibility

OpenDataLoader PDF is an open-source document processing system designed to convert complex PDF files into structured, AI-ready formats such as Markdown, JSON, and HTML while preserving layout, hierarchy, and semantic meaning. It focuses on enabling downstream use cases like retrieval-augmented generation (RAG), knowledge extraction, and document intelligence pipelines by maintaining accurate reading order and spatial metadata through bounding boxes.

Downloads: 9 This Week

Last Update: 2026-04-03
23

Collabora Online

Collabora Online is a collaborative online office suite

Collabora Online is a powerful online office suite that you can integrate into your own infrastructure or access via one of our trusted hosting Partners. Your digital sovereignty is our priority. We provide you with all the tools to keep your data secure, without compromising on features. Collabora Online’s text document editor provides a true WYSIWYG editing experience, making visualizing your document layout incredibly easy. Open any document, add comments and track changes from anywhere, with anyone. Format and style your pages with endless options. From simple spreadsheets and calculations to advanced formulas, Calc can do it all. Create giant spreadsheets with up to 16k columns, and add charts, sparklines, and hyperlinks. ...

Downloads: 4 This Week

Last Update: 2026-03-22
24

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 2 This Week

Last Update: 2025-07-01
25

GLM-4.5V

GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

...It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding, and long-document interpretation. GLM-4.5V emerged from a training framework that leverages scalable reinforcement learning (with curriculum sampling) to boost performance across tasks ranging from STEM problem solving to long-context reasoning, giving it broad applicability beyond narrow benchmarks. ...

Downloads: 1 This Week

Last Update: 2026-04-06