117 projects for "pdf python" with 1 filter applied:

  • The most trusted software in construction Icon
    The most trusted software in construction

    HCSS is the gold standard software solution for winning, planning, and managing construction projects by connecting the office to the field.

    HCSS provides easy-to-use software built for construction companies that want to win more work, work smarter, and boost profits. For nearly 40 years, we've helped heavy civil contractors, infrastructure builders, and utility companies improve operations, from estimating and project management to field tracking, equipment maintenance, and safety. Tools like HeavyBid, HeavyJob, and HCSS Safety are built for the field and designed to work together, giving your team real-time visibility, tighter cost control, and better job outcomes. With 45+ accounting integrations and customizable APIs, HCSS fits seamlessly into your tech stack. We regularly update our software based on feedback from real crews, ensuring it fits the way your team works. Backed by award-winning 24/7/365 support and a proven implementation process, HCSS helps reduce risk, cut inefficiencies, and deliver fast ROI. If you're ready to grow your business and gain a competitive edge, HCSS is the partner that gets you there.
    Learn More
  • EasySend is a no-code platform that transforms customer journeys Icon
    EasySend is a no-code platform that transforms customer journeys

    Defy form limits. 
Create digital experiences.

    Evolve forms into smart, AI-powered digital workflows that streamline your data intake and elevate customer experiences.
    Learn More
  • 1
    PyPDF

    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 2
    MarkPDFDown

    MarkPDFDown

    A high-quality PDF to Markdown tool based on large language model

    MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists. By producing Markdown rather than raw text,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    zpdf

    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches. It implements multiple PDF...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Blackbird

    Blackbird

    OSINT tool for finding accounts across 600+ sites by username or email

    Blackbird is an open source OSINT tool designed to search for user accounts across social networks and online platforms using a username or email address. The project focuses on helping investigators, researchers, and security professionals quickly discover where a specific identity appears on the internet. It performs reverse searches across more than 600 websites by leveraging data from the community-driven WhatsMyName project, which improves detection accuracy and reduces false positives....
    Downloads: 18 This Week
    Last Update:
    See Project
  • Effortlessly Manage Product Information Icon
    Effortlessly Manage Product Information

    OneTimePIM is a comprehensive Product Information Management System designed to streamline the import and distribution of product data.

    A single source of truth for all of your product information with easy ways to distribute that data to wherever it needs to go, including the most powerful e-commerce connectors in the industry.
    Learn More
  • 5
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 6
    Pysheeet

    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DeepSeek-OCR 2

    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 8
    PageIndex

    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    PageIndex is an innovative open-source framework that reimagines retrieval-augmented generation (RAG) by eliminating conventional vector similarity search and instead building hierarchical semantic indexes that mirror a document’s natural structure. Rather than chunking text and embedding it into a vector database, PageIndex constructs a tree-structured index — similar to a detailed, AI-enhanced table of contents — that a large language model can traverse to locate the most relevant sections...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Desktop Commander MCP

    Desktop Commander MCP

    AI-powered MCP server for desktop file and terminal automation

    Desktop Commander MCP is an advanced Model Context Protocol server designed to extend AI assistants with direct control over a user’s local machine, including the file system and terminal. It integrates with clients like Claude Desktop to enable AI-driven workflows such as editing files, executing commands, and automating development tasks from a single conversational interface. Desktop Commander MCP builds on top of an MCP filesystem server and enhances it with powerful search, replace, and...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Manage your hosting business with our vacation rental software Icon
    Manage your hosting business with our vacation rental software

    Empowering your short-term rental business to succeed

    Whether you’re a new or established business, you can rely on Lodgify’s vacation rental property management software for support through every step of your journey.
    Learn More
  • 10
    Jupyter Notebook Tools for Sphinx

    Jupyter Notebook Tools for Sphinx

    Sphinx source parser for Jupyter notebooks

    nbsphinx is a Sphinx extension that provides a source parser for *.ipynb files. Custom Sphinx directives are used to show Jupyter Notebook code cells (and of course their results) in both HTML and LaTeX output. Un-evaluated notebooks – i.e. notebooks without stored output cells – will be automatically executed during the Sphinx build process.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    myGPTReader

    myGPTReader

    AI Slack bot for reading, summarizing, and chatting with content

    myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    ArXiv MCP Server

    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Libros de Programación en Español

    Libros de Programación en Español

    List of programming books in Spanish for free

    Libros de Programación en Español is a curated list of free programming books in Spanish, organized by topic and technology so learners can find high-quality materials without cost. The README is structured as an index with general programming books, followed by sections for specific languages such as JavaScript, TypeScript, Python, Ruby, Rust, PHP, Haskell, Go, Kotlin, Java, and R.Each entry includes the book title, author, and a link to the official or legal free version (PDF, HTML, eBook, etc.), focusing on resources that are legitimately available. Beyond languages, the list also covers frameworks and libraries (like React and Qwik), tools (such as Git), and databases (SQL), grouping them in separate sections for easier browsing. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Extractous

    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Controllable-RAG-Agent

    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    LangChain Extract

    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents. Built using FastAPI and the LangChain framework, the application exposes a REST API that can process documents and return...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Scribus

    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Leader badge
    Downloads: 39,869 This Week
    Last Update:
    See Project
  • 20
    MComix

    MComix

    GTK+ comic book viewer.

    MComix is a user-friendly, customizable image viewer. It is specifically designed to handle comic books (both Western comics and manga) and supports a variety of container formats (including CBR, CBZ, CB7, CBT, LHA and PDF). MComix is a fork of Comix.
    Leader badge
    Downloads: 748 This Week
    Last Update:
    See Project
  • 21
    EasyABC

    EasyABC

    EasyABC is an open source ABC editor

    EasyABC allows the user to create, edit, view, play, convert music written in the ABC music notation language. The program was originally written in Python 2.7 and WxPython by Nils Liberg and runs on Windows, OSX, and Linux. Jan Wybren de Jong has converted to run on Python 3.8 or higher. Frédéric Aupépin has been supporting EasyABC on OSX. EasyABC depends upon other external programs like abc2midi, abcm2ps, fluidsynth. If you install the Windows or Mac executables most of these programs...
    Leader badge
    Downloads: 225 This Week
    Last Update:
    See Project
  • 22
    Apache OpenOffice

    Apache OpenOffice

    The free and Open Source productivity suite

    Free alternative for Office productivity tools: Apache OpenOffice - formerly known as OpenOffice.org - is an open-source office productivity software suite containing word processor, spreadsheet, presentation, graphics, formula editor, and database management applications. OpenOffice is available in many languages, works on all common computers, stores data in ODF - the international open standard format - and is able to read and write files in other formats, included the format used by the...
    Leader badge
    Downloads: 228,140 This Week
    Last Update:
    See Project
  • 23
    Small Python library with various things such as Configuration file parsing (in Python syntax), HTML and PDF parsing. Used in others of my projects.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    MediaWiki To LaTeX converts MediaWiki markup to LaTeX and generates a PDF. So it provides an export from MediaWiki to LaTeX. It works with any project running MediaWiki, especially Wikipedia and Wikibooks.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25

    littleutils

    Various small and useful command-line utilities

    The littleutils include duplicate file finders (repeats, repeats.pl, repeats.py), image optimizers (opt-jpg, opt-png, opt-gif, recomp-jpg), file rename tools (lowercase, uppercase, pren), archive recompressors (to-gzip, to-bzip2, to-bzip3, to-7zip, to-lzma, to-lzip, to-xz), a tempfile utility (tempname), file property tools (filedate, filemode, filenode, fileown, filesize, and lrealpath), and others. See the README file for more details.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB