Showing 273 open source projects for "pdf python"

  • 1
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient...
    Downloads: 2 This Week
  • 2
    Libros de Programación en Español

    List of free programming books in Spanish

    Libros de Programación en Español is a curated list of free programming books in Spanish, organized by topic and technology so learners can find high-quality materials without cost. The README is structured as an index with general programming books, followed by sections for specific languages such as JavaScript, TypeScript, Python, Ruby, Rust, PHP, Haskell, Go, Kotlin, Java, and R. Each entry includes the book title, author, and a link to the official or legal free version (PDF, HTML, eBook, etc.), focusing on resources that are legitimately available. Beyond languages, the list also covers frameworks and libraries (like React and Qwik), tools (such as Git), and databases (SQL), grouping them in separate sections for easier browsing. ...
    Downloads: 2 This Week
  • 3
    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 5 This Week
  • 4
    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.
    Downloads: 0 This Week
  • 5
    PdfBooklet
    PdfBooklet is a Python GTK application for making books or booklets from existing PDF files. It can also adjust margins, rotate, scale, merge files, or extract pages.
    Downloads: 191 This Week
  • 6
    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 1 This Week
  • 7
    myGPTReader

    AI Slack bot for reading, summarizing, and chatting with content

    myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...
    Downloads: 2 This Week
  • 8
    Khoj

    An AI personal assistant for your digital brain

    Get more done with your open-source AI personal assistant. Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as...
    Downloads: 3 This Week
  • 9
    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models. Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction...
    Downloads: 1 This Week
  • 10
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 1 This Week
  • 11
    pdf2wordx

    Convert "pdf" to ".docx" documents

    `pip install pdf2wordx` This project uses "Tkinter" V8.6 and "pdf2docx" V0.5.8 to convert PDF to DOCX. The program is easy to use: select the PDF file, browse to the folder where the DOCX document will be saved, and click the "Convertir" (Convert) button. The document is then converted and saved to the specified path.
    Downloads: 4 This Week
  • 12
    LLMStack

    No-code multi-agent framework to build LLM Agents, workflows

    LLMStack is a no-code platform for building generative AI agents, workflows and chatbots, connecting them to your data and business processes. Build tailor-made generative AI agents, applications and chatbots that cater to your unique needs by chaining multiple LLMs. Seamlessly integrate your own data, internal tools and GPT-powered models without any coding experience using LLMStack's no-code builder. Trigger your AI chains from Slack or Discord. Deploy to the cloud or on-premise.
    Downloads: 1 This Week
  • 13
    Little Book of Linear Algebra

    A concise, beginner-friendly introduction to the core ideas of linear

    This is a concise, beginner-friendly introduction to the fundamental concepts of linear algebra, intended to give readers intuition without overwhelming detail. The material is organized into chapters covering vectors, matrices, linear systems, vector spaces, eigenvalues/eigenvectors, and other central topics, each with worked examples and explanations. There is also a companion “LAB” section for hands-on exploration (e.g. using Python/NumPy) to help cement the connections between algebraic...
    Downloads: 3 This Week
  • 14
    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format...
    Downloads: 1 This Week
  • 15
    libvips

    A fast image processing library with low memory needs

    ...Images can have any number of bands. It supports a good range of image formats, including JPEG, JPEG2000, JPEG-XL, TIFF, PNG, WebP, HEIC, AVIF, FITS, Matlab, OpenEXR, PDF, SVG, HDR, PPM / PGM / PFM, CSV, GIF, Analyze, NIfTI, DeepZoom, and OpenSlide. It can also load images via ImageMagick or GraphicsMagick, letting it work with formats like DICOM. It comes with bindings for C, C++, and the command-line. Full bindings are available for Ruby, Python, PHP, C# / .NET, Go, and Lua.
    Downloads: 9 This Week
  • 16
    DocsGPT

    Private AI platform for agents, enterprise search and RAG pipelines

    DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models. Works with Qdrant, MongoDB, Elasticsearch, and more. Deploy via Docker or Kubernetes with full data sovereignty. Build...
    Downloads: 3 This Week
  • 17
    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...
    Downloads: 0 This Week
  • 18
    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
  • 19

    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application that adds watermarks (text or image) to PDF files, converting them to images and back to PDF, with options for OCR and compression.
    Downloads: 2 This Week
  • 20
    Downloads: 1 This Week
  • 21
    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents. Built using FastAPI and the LangChain framework, the application exposes a REST API that can process documents and return...
    Downloads: 0 This Week
  • 22
    pdf combiner merger converter splitter

    PDF Combiner is a user-friendly, GUI-based tool built in

    PDF Combiner is a free, open-source, GUI-based tool for combining and splitting PDF files, converting PDF to Excel or Word, converting images to PDF, zipping and unzipping, and annotating. It is easy to use, supports inserting, deleting, and processing multiple files, and lets you adjust the order of files before combining.
    Downloads: 6 This Week
  • 23
    bridgex

    Convert files like docx, xlsx, pptx, html, and more to Markdown

    Bridgex is an open‑source graphical interface for converting files to Markdown, built in Python and based on PySide6 (Qt for Python). Its objective is to simplify access to the MarkItDown library through a straightforward, modular visual experience. Features: cross‑platform graphical interface; efficient file‑to‑Markdown conversion; modularity (easy to adapt and extend); support for multiple input formats; lightweight editing prior to saving.
    Downloads: 6 This Week
  • 24
    dxf2gcode

    DXF2GCODE: converting 2D dxf drawings to CNC machine compatible G-Code

    DXF2GCODE is a tool for converting 2D (dxf, pdf, ps) drawings to CNC machine compatible G-Code. It runs on Windows, Linux, and Mac by using the Python scripting language.
    Downloads: 315 This Week
  • 25
    Scribus

    Powerful desktop publishing software

    Scribus is an Open Source program that brings professional page layout to Linux, BSD UNIX, Solaris, OpenIndiana, GNU/Hurd, Mac OS X, OS/2 Warp 4, eComStation, and Windows desktops with a combination of press-ready output and new approaches to page design. Underneath a modern and user-friendly interface, Scribus supports professional publishing features, such as color separations, CMYK and spot colors, ICC color management, and versatile PDF creation.
    Downloads: 41,477 This Week