pdf free download - SourceForge

Showing 9 open source projects for "pdf"

Filters: OCR, Python

1

OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.

Downloads: 112 This Week

Last Update: 2026-04-06
2

MinerU

A high-quality tool for converting PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It uses OCR and layout analysis to preserve semantic structure and metadata, which makes it well suited to research and data science workflows.

Downloads: 13 This Week

Last Update: 9 hours ago
3

Umi-OCR

OCR software, free and offline

...It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 37 This Week

Last Update: 2026-01-15
4

Papermerge

Open Source Document Management System for Digital Archives

...Each user can be assigned different permissions to perform only a specific kind of action, for example viewing only documents from a specific folder. OCR technology is a vital part of Papermerge. It extracts text from scanned documents and from PDF, JPEG, and TIFF files.

Downloads: 18 This Week

Last Update: 2025-07-24
5

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...

Downloads: 8 This Week

Last Update: 2026-02-03
6

Paperless-ng

A supercharged version of paperless: scan, index and archive docs

Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...

Downloads: 0 This Week

Last Update: 2022-03-04
7

Linux-Intelligent-Ocr-Solution

Easy-OCR solution and Tesseract trainer for GNU/Linux

Linux-intelligent-ocr-solution (Lios) is free and open source software for converting print into text using either a scanner or a camera. It can also produce text from scanned images from other sources, such as a PDF, an image, a folder containing images, or a screenshot. The program is fully accessible to visually impaired users. A Tesseract Trainer GUI is also shipped with this package. Forum: https://groups.google.com/forum/#!forum/lios Video Tutorial: https://www.youtube.com/playlist?list=PLn29o8rxtRe1zS1r2-yGm1DNMOZCgdU0i Tesseract Training Tutorial (beta): https://www.youtube.com/watch?...

5 Reviews

Downloads: 10 This Week

Last Update: 2020-10-19
8

RadicalSpam Virtual Appliance

Virtual Appliance of RadicalSpam

RadicalSpam Virtual Appliance packages the full RadicalSpam Community Edition solution, pre-installed in an OVF (Open Virtualization Format) virtual machine compatible with the best virtualization platforms on the market, including VMware ESX Server. More information: http://www.radical-spam.org

Downloads: 0 This Week

Last Update: 2015-11-12
9

OCR Reader

The tool supports template-based parsing, allowing structured output i

OCR Reader is a lightweight Windows utility designed to extract text from PDF files and images using OCR (Tesseract engine). The tool supports template-based parsing, allowing structured output into CSV or TXT without manual coding. Core components: Tesseract OCR engine, Poppler (PDF rendering), and a template-based extraction system. Homepage: https://martan1484.github.io/OCR_Reader

Downloads: 0 This Week

Last Update: 18 hours ago