Showing 163 open source projects for "apache pdf"

  • 1
    PDF.js

    A PDF Reader in JavaScript

    PDF.js is a web standards-based platform for parsing and rendering Portable Document Format (PDF) files. Open source and built with HTML5, this PDF viewer is supported by a large community and Mozilla Labs. PDF.js works in both modern and older browsers and is built into Firefox 19 and later.
    Downloads: 80 This Week
  • 2
    Apache OpenOffice Extensions

    Hundreds of ready to use Apache OpenOffice extensions

    The official catalog of Apache OpenOffice extensions. You'll find extensions ranging from dictionaries to tools to import PDF files and to connect with external databases. Extensions can improve your productivity, and are easy to use.
    Downloads: 2,821 This Week
  • 3
    Apache OpenOffice

    The free and Open Source productivity suite

    ...OpenOffice can also export files in PDF format. Like Mozilla Firefox, OpenOffice supports extensions, making it easy to add new functionality to an existing OpenOffice installation.
    Downloads: 225,187 This Week
  • 4
    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go that supports encryption. It provides both an API and a CLI and supports all PDF versions up to PDF 1.7 (ISO-32000). The project is an effort to build a comprehensive PDF processing library in Go from the ground up. Over time, pdfcpu aims to support the standard range of PDF processing features, as well as any interesting use cases that come up along the way. The main focus is strong support for batch processing and scripting via a...
    Downloads: 13 This Week
  • 5
    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition, the library can be used to create simple PDF documents containing text and geometric shapes.
    Downloads: 7 This Week
  • 6
    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf lets users generate PDF documents from HTML content, with automatic flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The command-line tool is a stand-alone program that runs from the command line.
    Downloads: 3 This Week
  • 7
    HummusJS

    Node.js module for high performance creation and modification of PDFs

    PDFWriter's latest release (4.5.12) adds support for fonts that contain emojis. Notable emoji fonts include Windows' Segoe UI Emoji and Google's Noto font. As a result, text that includes emojis is rendered as colorful emojis rather than black-and-white representations. PDFHummus is a fast, free library for writing, parsing, and modifying PDFs.
    Downloads: 5 This Week
  • 8
    Vanilla.PDF

    Cross-platform SDK for creating and modifying PDF documents

    Vanilla.PDF is a modern, high-performance, open-source C++17 SDK designed for creating, editing, signing, and analyzing PDF documents across multiple platforms. It requires no external runtime dependencies, making it lightweight and ideal for embedding into desktop applications, servers, or automation pipelines. The SDK offers full cross-platform support including Windows, Linux, macOS, and Android, with builds available for major compilers and architectures. Vanilla.PDF supports advanced...
    Downloads: 3 This Week
  • 9
    Tesseract OCR

    Open Source OCR Engine

    Tesseract is an open source optical character recognition (OCR) engine and command-line program. OCR is a technology that recognizes text characters within a digital image. The latest version of Tesseract focuses more on line recognition, but it still supports the legacy Tesseract OCR engine, which recognizes character patterns. Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. It supports...
    Downloads: 3,236 This Week
  • 10
    GROBID

    Machine learning software for extracting information

    GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby, and the tool was released as open source in 2011. Work on GROBID has continued steadily as a side project since the beginning and is expected to continue that way. Header extraction and parsing from articles in PDF format. The...
    Downloads: 1 This Week
  • 11
    Asciidoc Editor based on JavaFX 20

    Asciidoc Editor and Toolchain written with JavaFX 19

    AsciidocFX is a WYSIWYG editor for the Asciidoc markup language. You can build PDF, EPUB, and HTML books, documents, and slides. Supported Operating Systems and Builds shows the available builds, with links for reference. If you are looking for the very latest version, visit the link in the note above to make sure you download the most recent release of AsciidocFX. AsciidocFX converts documents via the AsciidoctorJ library.
    Downloads: 3 This Week
  • 12
    MarkPDFDown

    A high-quality PDF to Markdown tool based on a large language model

    MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists. By producing Markdown rather than raw text,...
    Downloads: 0 This Week
  • 13
    MetaScreener

    AI-powered tool for efficient abstract and PDF screening

    MetaScreener is an open-source AI-assisted tool designed to streamline the screening process in systematic literature reviews and academic research workflows. The system helps researchers analyze large collections of academic abstracts and research papers to determine which studies are relevant for inclusion in evidence synthesis projects. Instead of manually reviewing hundreds or thousands of documents, researchers can use MetaScreener to apply machine learning techniques that assist with...
    Downloads: 2 This Week
  • 14
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. Use cases range from simply monitoring website pages for changes (such as watching prices and receiving restock notifications) to deep inspection, such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get restock alerts via Discord, Slack, email, and many other platforms when those products are back in stock. Using the...
    Downloads: 7 This Week
  • 15
    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) designed primarily for archiving and retrieving your digital documents. Instead of keeping piles of paper documents on your desk, in your office, or in drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize, and index scanned documents in PDF, JPEG, and TIFF formats. Instantly find relevant information using full-text, tag, and metadata-based search. Papermerge is free and...
    Downloads: 14 This Week
  • 16
    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 11 This Week
  • 17
    Resume-Matcher

    Improve your resumes with Resume Matcher

    Resume-Matcher is a command-line application that compares resumes against job descriptions using natural language processing. It provides a compatibility score based on keyword relevance and highlights areas where the resume aligns—or doesn't—with the target role. Designed for job seekers and HR professionals, it helps improve resume tailoring and streamlines candidate screening.
    Downloads: 1 This Week
  • 18
    dvisvgm

    A fast DVI, EPS, and PDF to SVG converter

    The command-line utility dvisvgm is a tool for TeX/LaTeX users. It converts DVI, EPS, and PDF files to SVG, the XML-based vector graphics format. Unlike bitmap graphics, vector graphics can be scaled arbitrarily without loss of quality. All modern web browsers support a large part of the current SVG 1.1 standard. SVG files can also be displayed with the Java-based Squiggle SVG browser, which is part of the Apache Batik project, and with the free vector graphics editor Inkscape.
    Downloads: 0 This Week
  • 19
    deepdoctection

    A Repo For Document AI

    deepdoctection is a document AI framework that applies deep learning techniques to analyze and extract structured data from scanned documents, PDFs, and images. The Python library orchestrates document extraction and document layout analysis tasks using deep learning models. It does not implement models itself; instead, it lets you build pipelines using well-established libraries for object detection, OCR, and selected NLP tasks, and provides an integrated framework for...
    Downloads: 4 This Week
  • 20
    docker-maven-plugin

    Maven plugin for running and creating Docker images

    This is a Maven plugin for building Docker images and managing containers for integration tests. It works with Maven 3.0.5 and Docker 1.6.0 or later.
    Downloads: 0 This Week
  • 21
    Visual Regression Tracker

    Backend and frontend application for tracking differences via image comparison

    An open source, self-hosted solution for visual testing and for managing visual testing results. The service receives images, performs pixel-by-pixel comparisons against the previously accepted baseline, and returns results immediately so that unexpected changes are caught. Use the provided libraries to integrate with existing automated suites by adding assertions based on image comparison. We provide native integration with automation libraries, a core SDK, and REST API interfaces that allow the system to...
    Downloads: 1 This Week
  • 22
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...
    Downloads: 2 This Week
  • 23
    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. ...
    Downloads: 0 This Week
  • 24
    PRDownloader

    A file downloader library for Android with pause and resume support

    A file downloader library for Android with pause and resume support. PRDownloader can be used to download any type of file, such as images, videos, PDFs, and APKs. The library supports pausing and resuming a download while it is in progress, and it supports large file downloads. It has a simple interface for making download requests, and you can check the download status for a given download ID. PRDownloader provides callbacks for everything, such as onProgress, onCancel, onStart,...
    Downloads: 0 This Week
  • 25
    QPDF

    PDF transformation/manipulation program + library

    QPDF is a C++ library and set of programs that inspect and manipulate the structure of PDF files. It can encrypt and linearize files, expose the internals of a PDF file, and do many other operations useful to end users and PDF developers.
    Downloads: 1,022 This Week
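The PdfPig entry mentions creating simple PDF documents containing text. PdfPig itself is a C# library and its API is not shown here; as a rough illustration of what such a writer has to emit, this Python sketch (the function name `build_minimal_pdf` is hypothetical) assembles a one-page PDF by hand: numbered objects, a cross-reference (xref) table of byte offsets, a trailer, and a `startxref` pointer. It assumes Latin-1 text and the standard Helvetica font.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page PDF showing `text` (hypothetical helper)."""
    # Backslashes and parentheses are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: begin text, select 24 pt font F1, move to (72, 720), show text.
    stream = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # where "N 0 obj" starts; the xref table records this
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    # Each xref entry is exactly 20 bytes: 10-digit offset, 5-digit generation, type, EOL.
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)
```

Writing the returned bytes to a `.pdf` file should produce a document that common viewers open; a real library additionally handles embedded fonts, compression, and Unicode text.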
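GROBID's entry describes turning raw PDFs into structured XML/TEI documents. This sketch does not reproduce GROBID's output; assuming header fields (title and authors) have already been extracted, it only shows how such fields might be wrapped in a minimal TEI header with Python's `xml.etree.ElementTree`. The helper name `header_to_tei` and the placeholder publication and source statements are assumptions.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def _tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"


def header_to_tei(title: str, authors: list[str]) -> str:
    """Wrap already-extracted header fields in a minimal TEI document (hypothetical helper)."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    root = ET.Element(_tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, _tei("teiHeader")), _tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, _tei("titleStmt"))
    ET.SubElement(title_stmt, _tei("title")).text = title
    for name in authors:
        ET.SubElement(title_stmt, _tei("author")).text = name
    # TEI's fileDesc also expects publicationStmt and sourceDesc; these are placeholders.
    ET.SubElement(ET.SubElement(file_desc, _tei("publicationStmt")), _tei("p")).text = "Unpublished"
    ET.SubElement(ET.SubElement(file_desc, _tei("sourceDesc")), _tei("p")).text = "Extracted from a PDF"
    return ET.tostring(root, encoding="unicode")
```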
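MarkPDFDown is described as preserving the hierarchy of headings, paragraphs, tables, and lists when it converts PDFs to Markdown. Its LLM-based extraction is not modeled here; assuming layout analysis has already produced typed blocks (the `Block` schema is hypothetical), the final rendering step might look like this:

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    kind: str                 # "heading", "paragraph", "list" or "table" (hypothetical schema)
    text: str = ""
    level: int = 1            # heading level; used only when kind == "heading"
    rows: list = field(default_factory=list)  # list items, or table rows (first row = header)


def blocks_to_markdown(blocks):
    """Render typed layout blocks as Markdown, keeping their structure."""
    out = []
    for b in blocks:
        if b.kind == "heading":
            out.append("#" * b.level + " " + b.text)
        elif b.kind == "paragraph":
            out.append(b.text)
        elif b.kind == "list":
            out.append("\n".join(f"- {item}" for item in b.rows))
        elif b.kind == "table":
            header, *body = b.rows
            lines = ["| " + " | ".join(header) + " |",
                     "|" + "|".join(" --- " for _ in header) + "|"]
            lines += ["| " + " | ".join(row) + " |" for row in body]
            out.append("\n".join(lines))
        else:
            raise ValueError(f"unknown block kind: {b.kind}")
    # Blank lines between blocks keep Markdown renderers from merging them.
    return "\n\n".join(out) + "\n"
```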
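changedetection.io monitors pages for changes and supports text triggers. This Python sketch is not its implementation; it illustrates one plausible approach for two text snapshots of a page: compare whitespace-normalized SHA-256 fingerprints, and only when they differ, use `difflib.ndiff` to list the added lines and check them against trigger phrases. The function names are illustrative.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    # Normalize whitespace so re-flowed but otherwise identical pages hash the same.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_change(previous: str, current: str, triggers=()):
    """Return (changed, added_lines, matched_triggers) for two snapshots of a page."""
    if fingerprint(previous) == fingerprint(current):
        return False, [], []
    # ndiff prefixes lines added in `current` with "+ ".
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    added = [line[2:] for line in diff if line.startswith("+ ")]
    # A trigger fires if it appears (case-insensitively) in any added line.
    matched = [t for t in triggers if any(t.lower() in line.lower() for line in added)]
    return True, added, matched
```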
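Papermerge's entry mentions full-text, tag, and metadata-based search. As a minimal sketch of the underlying idea (an inverted index plus a tag filter), not Papermerge's actual search backend, the hypothetical `DocumentIndex` class returns the documents that contain every query word, optionally restricted to one tag:

```python
import re
from collections import defaultdict


class DocumentIndex:
    """Tiny in-memory full-text and tag index (hypothetical; not Papermerge's code)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it
        self.tags = {}                    # document id -> set of tags

    def add(self, doc_id, text, tags=()):
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(doc_id)
        self.tags[doc_id] = set(tags)

    def search(self, query, tag=None):
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        # AND semantics: a document must contain every query word.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        if tag is not None:
            hits = {d for d in hits if tag in self.tags[d]}
        return hits
```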
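Resume-Matcher reports a compatibility score based on keyword relevance. Its actual NLP scoring is not reproduced here; this sketch swaps in a much simpler technique, plain keyword overlap: the percentage of job-description keywords (after dropping a small, assumed stopword list) that also appear in the resume, plus the matched and missing keywords.

```python
import re

# Assumed, deliberately small stopword list; a real tool would use a proper one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "with", "for", "on", "is", "are"}


def keywords(text):
    # Tokens of letters, digits, "+" and "#" so that "c++" and "c#" survive.
    return {w for w in re.findall(r"[a-z0-9+#]+", text.lower()) if w not in STOPWORDS}


def match_score(resume, job):
    """Return (percent of job keywords found in the resume, found, missing)."""
    job_kw = keywords(job)
    if not job_kw:
        return 0.0, set(), set()
    found = job_kw & keywords(resume)
    missing = job_kw - found
    return round(100 * len(found) / len(job_kw), 1), found, missing
```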
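Visual Regression Tracker compares each incoming image pixel by pixel against a previously accepted baseline. This Python sketch shows that comparison on images represented as nested lists of RGB tuples (an assumption made to keep it dependency-free); the per-channel `tolerance` parameter is also an illustrative addition, not a documented setting of the tool.

```python
def pixel_diff(baseline, candidate, tolerance=0):
    """Compare two same-sized images given as rows of (r, g, b) tuples.

    Returns the (x, y) coordinates of differing pixels and the mismatch ratio.
    """
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("images must have identical dimensions")
    diffs = []
    for y, (row_a, row_b) in enumerate(zip(baseline, candidate)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            # A pixel differs when any channel moves by more than `tolerance`.
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
                diffs.append((x, y))
    total = sum(len(row) for row in baseline)
    return diffs, (len(diffs) / total if total else 0.0)
```

A test run would then fail when the ratio exceeds some agreed threshold, and a reviewer would either fix the regression or accept the new image as the baseline.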
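PRDownloader is an Android library, and its API is not shown here. Pause-and-resume downloaders commonly rely on HTTP range requests, so, as an assumption about the general mechanism rather than a description of PRDownloader's internals, this Python sketch resumes a partial file by requesting `bytes=N-` and appends only if the server answers `206 Partial Content`. Both function names are hypothetical.

```python
import os
import urllib.request


def resume_request(url, partial_path):
    """Build a request asking only for the bytes not yet on disk."""
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if already:
        # HTTP Range request (RFC 9110): "bytes=N-" means "from byte N to the end".
        request.add_header("Range", f"bytes={already}-")
    return request, already


def download_with_resume(url, partial_path, chunk_size=64 * 1024):
    request, already = resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honored the range; 200 means it sent the whole file again.
        mode = "ab" if already and response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

"Pausing" in this model is simply stopping the loop; calling `download_with_resume` again later picks up from the current file size.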