Showing 2507 open source projects for "pdf tool python"

  • 1
    PDF Arranger

    Small python-gtk application to merge or split PDFs

    PDF Arranger is a small python-gtk application, which helps the user to merge or split PDF documents and rotate, crop and rearrange their pages using an interactive and intuitive graphical interface. It is a front end for pikepdf. PDF Arranger is a fork of Konstantinos Poulios’s PDF Shuffler (see Savannah or Sourceforge). It’s a humble attempt to make the project a bit more active.
    Downloads: 570 This Week
  • 2
    Malicious PDF Generator

    Generate a bunch of malicious PDF files with phone-home functionality

    Generate ten different malicious PDF files with phone-home functionality. Can be used with Burp Collaborator or Interact.sh. Used for penetration testing and/or red-teaming etc. I created this tool because I needed a third-party tool to generate a bunch of PDF files with various links.
    Downloads: 3 This Week
  • 3
    py-pdf-parser

    A Python tool to help extract information from structured PDFs

    py-pdf-parser is a Python tool designed to help extract information from structured PDFs. It provides a simple interface to define parsing rules and extract data from PDF documents.
    Downloads: 0 This Week
  • 4
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 3 This Week
  • 5
    MinerU

    A high-quality tool for converting PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 9 This Week
  • 6
    Nano PDF Editor

    Edit PDF files with Nano Banana

    Nano PDF Editor is a minimalist, portable PDF viewer and toolkit that focuses on simplicity, speed, and ease of integration for applications that need basic PDF rendering without heavy dependencies. It provides core functionality such as page navigation, zooming, text selection, and rendering directly to native graphics surfaces, making it suitable for lightweight PDF viewing scenarios on desktop or embedded platforms. Designed to be easily embedded into larger software projects, Nano-PDF...
    Downloads: 18 This Week
  • 7
    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily, with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The command-line tool is a stand-alone program.
    Downloads: 2 This Week
  • 8
    python-gitlab

    A python wrapper for the GitLab API

    python-gitlab is a Python package providing access to the GitLab server API. It supports the v4 API of GitLab and provides a CLI tool (gitlab). As of 3.0.0, python-gitlab is compatible with Python 3.7+.
    Downloads: 4 This Week
  • 9
    Instagram OSINT Tool

    Instagram OSINT tool for gathering profile data and public posts

    InstagramOSINT is an open source intelligence (OSINT) tool designed to collect publicly accessible information from Instagram profiles. It retrieves details that are not always easily visible when browsing an Instagram account normally, allowing investigators, researchers, and developers to gather structured data about a target profile. It works by scraping publicly available profile information and extracting metadata from Instagram pages using Python.
    Downloads: 24 This Week
  • 10
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 91 This Week
  • 11
    Toonily Downloader

    A python tool for downloading manga from Toonily

    ...It uses concurrent downloading techniques to significantly speed up the process and includes robust error handling to recover from interruptions or failed downloads. Additionally, the tool allows users to convert downloaded chapters into high-quality PDF files without re-encoding images, ensuring fidelity to the original content.
    Downloads: 14 This Week
  • 12
    pikepdf

    A Python library for reading and writing PDF, powered by QPDF

    pikepdf is a Python library allowing the creation, manipulation, and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF. Python + QPDF = “py” + “qpdf” = “pyqpdf”, which looks like a dyslexia test and is no fun to type. But say “pyqpdf” out loud, and it sounds like “pikepdf”. pikepdf is a library intended for developers who want to create, manipulate, parse, repair, and abuse the PDF format.
    Downloads: 6 This Week
  • 13
    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries, and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be to support the more common use cases over the rare ones.
    Downloads: 0 This Week
  • 14
    Ollama Python

    Ollama Python library

    ...This tool is ideal for those building AI-driven apps with local model deployment.
    Downloads: 0 This Week
  • 15
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 4 This Week
  • 16
    OpenAI Agents (Python)

    A lightweight, powerful framework for multi-agent workflows

    openai-agents-python is a library developed by OpenAI to simplify the process of creating and running agents that interact with tools and APIs using OpenAI models. It provides abstractions for tool usage, memory management, and agent workflows, enabling developers to define function-calling agents that reason through multi-step tasks. Ideal for building custom AI workflows, the library supports dynamic tool definitions and contextual memory handling.
    Downloads: 1 This Week
  • 17
    Blackbird

    OSINT tool for finding accounts across 600+ sites by username or email

    ...The tool operates primarily through a command line interface, allowing users to run automated searches and gather results from many platforms in a single process. Blackbird also includes an optional AI-powered profiling feature that analyzes discovered sites to generate behavioral and technical insights about a user’s online presence. Results from searches can be exported in formats such as PDF, CSV, or JSON for documentation or reporting purposes.
    Downloads: 29 This Week
  • 18
    fpdf2

    Simple PDF generation for Python

    fpdf2 is a library for simple and fast PDF document generation in Python. It is a fork and the successor of PyFPDF. Compared with other PDF libraries, fpdf2 is fast, versatile, and easy to learn and extend. It is also written entirely in Python and has very few dependencies: Pillow, defusedxml, and fontTools.
    Downloads: 6 This Week
  • 19
    Anthropic SDK Python

    Provides convenient access to the Anthropic REST API from any Python 3

    The anthropic-sdk-python repository is the official Python client library for interacting with the Anthropic (Claude) REST API. It is designed to provide a user-friendly, type-safe, and asynchronous/synchronous capable interface for making chat/completion requests to models like Claude. The library includes definitions for all request and response parameters using Python typed objects, automatically handles serialization and deserialization, and wraps HTTP logic (timeouts, retries, error...
    Downloads: 7 This Week
  • 20
    MarkPDFDown

    A high-quality PDF to Markdown tool based on large language models

    MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.
    Downloads: 0 This Week
  • 21
    Docling

    Get your documents ready for gen AI

    ...The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. ...
    Downloads: 8 This Week
  • 22
    WeebCentral Downloader

    A powerful manga downloader for WeebCentral with both GUI and CLI

    ...Users can select specific chapters, adjust download speed, and configure output formats such as PDF or CBZ, making it adaptable to different reading preferences. The tool also incorporates progress tracking and background worker threads to ensure a responsive experience during large downloads. Its modular structure separates scraping logic, interface components, and configuration management, making it maintainable and extensible.
    Downloads: 32 This Week
  • 23
    PyMuPDF

    Python bindings for MuPDF's rendering library.

    MuPDF is a lightweight PDF, XPS, and E-book viewer. MuPDF consists of a software library, command line tools, and viewers for various platforms. The renderer in MuPDF is tailored for high-quality anti-aliased graphics. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB,...
    Downloads: 12 This Week
  • 24
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 25 This Week
  • 25
    Portia SDK Python

    Portia Labs Python SDK for building agentic workflows

    portia-sdk-python is an open-source Python SDK by Portia Labs for creating reliable, stateful, authenticated multi-agent AI workflows. It supports tool-backed agents capable of real-world interactions, such as web browsing, API access, and human-in-the-loop clarifications, while maintaining transparency and auditability through structured plans and execution hooks.
    Downloads: 0 This Week