Showing 349 open source projects for "document"

View related business solutions
  • Turn traffic into pipeline and prospects into customers Icon
    Turn traffic into pipeline and prospects into customers

    For account executives and sales engineers looking for a solution to manage their insights and sales data

    Docket is an AI-powered sales enablement platform designed to unify go-to-market (GTM) data through its proprietary Sales Knowledge Lake™ and activate it with intelligent AI agents. The platform helps marketing teams increase pipeline generation by 15% by engaging website visitors in human-like conversations and qualifying leads. For sales teams, Docket improves seller efficiency by 33% by providing instant product knowledge, retrieving collateral, and creating personalized documents. Built for GTM teams, Docket integrates with over 100 tools across the revenue tech stack and offers enterprise-grade security with SOC 2 Type II, GDPR, and ISO 27001 compliance. Customers report improved win rates, shorter sales cycles, and dramatically reduced response times. Docket’s scalable, accurate, and fast AI agents deliver reliable answers with confidence scores, empowering teams to close deals faster.
    Learn More
  • Inventory and Order Management Software for Multichannel Sellers Icon
    Inventory and Order Management Software for Multichannel Sellers

    Avoid stockouts, overselling, and losing control as your business grows.

    We are the most powerful inventory and order management platform for Amazon, Walmart, and multichannel product sellers. Centralize orders, product information, and fulfillment operations to run more efficiently, sell more products, and stay compliant with marketplace requirements so you can grow profitably.
    Learn More
  • 1
    MegaParse

    MegaParse

    File Parser optimised for LLM Ingestion with no loss

    MegaParse is a file parser optimized for Large Language Model (LLM) ingestion, ensuring no loss of information. It efficiently parses various document formats, such as PDFs, DOCX, and PPTX, converting them into formats ideal for processing by LLMs. This tool is essential for applications that require accurate and comprehensive data extraction from diverse document types.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Nano PDF Editor

    Nano PDF Editor

    Edit PDF files with Nano Banana

    ...Designed to be easily embedded into larger software projects, Nano-PDF has a small code footprint and straightforward APIs that developers can call from common languages, helping it fit into text editors, document explorers, or custom user interfaces with minimal effort. The viewer strives to comply with standard PDF features, like annotations and linked bookmarks, while avoiding the complexity of full-featured document suites that prioritize editing or creation.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 3
    DocETL

    DocETL

    A system for agentic LLM-powered data processing and ETL

    ...The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. Pipelines are typically defined using a low-code YAML interface, giving users full control over prompts and processing steps while still simplifying workflow creation.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    PDFSticher

    PDFSticher

    Code repository for PDFStitcher, a utility to stitch together PDFs

    The open source PDF stitching software for sewists, by sewists. PDFSticher is a utility for stitching together many PDF pages from one document into a single page. This is also called "N-Up" or page imposition. This program was created in order to convert sewing patterns into a convenient format for projecting, though it could be used to stitch together any PDF. Since version 0.4, it is also possible to select layers for inclusion/exclusion in the final output. Additionally, line properties can be modified for each layer if the input PDF is compatible.
    Downloads: 9 This Week
    Last Update:
    See Project
  • Premier Construction Software Icon
    Premier Construction Software

    Premier is a global leader in financial construction ERP software.

    Rated #1 Construction Accounting Software by Forbes Advisor in 2022 & 2023. Our modern SAAS solution is designed to meet the needs of General Contractors, Developers/Owners, Homebuilders & Specialty Contractors.
    Learn More
  • 5
    Semantra

    Semantra

    Multi-tool for semantic search

    ...The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs from the command line and automatically launches a local web interface where users can perform interactive searches and examine document passages related to a query. By relying on semantic embeddings and contextual analysis, the tool can identify passages that are relevant even when the query uses different wording than the source documents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Salt

    Salt

    Automate the management and configuration of infrastructures at scale

    ...What systems and infrastructure can be managed by a Salt Minion? Salt runs on and manages many versions of Linux, Windows, Mac OS X and UNIX. The Salt Supported Operating System document defines the specific operating systems that are fully supported and outlines the package creation policy for each operating system listed. The document also outlines the best-effort support policy for additional operating systems. Salt Bootstrap is a shell script that detects the target platform and selects the best installation method.
    Downloads: 34 This Week
    Last Update:
    See Project
  • 7
    RAG API

    RAG API

    ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

    ...It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Google Open Source Project Style Guide

    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    ...If the project you are modifying originates from Google, you may be directed to the English version of the project page to understand the style used by the project. The Chinese version of the project uses reStructuredText plain text markup syntax, and uses Sphinx to generate document formats such as HTML / CHM / PDF.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    PyMuPDF

    PyMuPDF

    Python bindings for MuPDF's rendering library.

    ...It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on the screen. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well). The command line tools allow you to annotate, edit, and convert documents to other formats such as HTML, SVG, PDF, and CBZ. You can also write scripts to manipulate documents using Javascript. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • Failed Payment Recovery for Subscription Businesses Icon
    Failed Payment Recovery for Subscription Businesses

    For subscription companies searching for a failed payment recovery solution to grow revenue, and retain customers.

    FlexPay’s innovative platform uses multiple technologies to achieve the highest number of retained customers, resulting in reduced involuntary churn, longer life span after recovery, and higher revenue. Leading brands like LegalZoom, Hooked on Phonics, and ClinicSense trust FlexPay to recover failed payments, reduce churn, and increase customer lifetime value.
    Learn More
  • 10
    llmware

    llmware

    Unified framework for building enterprise RAG pipelines

    ...The platform focuses on building secure and private AI workflows that can run locally on laptops, edge devices, or self-hosted servers without relying exclusively on cloud APIs. It provides a unified interface for constructing retrieval-augmented generation pipelines, agent workflows, and document intelligence applications. One of the framework’s defining characteristics is its collection of small specialized language models optimized for specific tasks such as summarization, classification, and document analysis. The system supports a wide range of inference backends including PyTorch, OpenVINO, ONNX Runtime, and other optimized runtimes, allowing developers to choose the most efficient execution environment for their hardware.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    RAGFlow

    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    ...The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. GLM-OCR integrates a comprehensive SDK and inference toolchain that makes it easy for developers to install, invoke, and embed into production pipelines with simple commands or APIs.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 13
    DeepCode

    DeepCode

    DeepCode: Open Agentic Coding

    ...The system description highlights an orchestration layer that plans, assigns subtasks, and adapts strategies as complexity changes, rather than relying on a single monolithic prompt. It also describes document parsing capabilities aimed at extracting algorithmic and mathematical details from technical materials, translating them into implementable specifications and code.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Claude Skills

    Claude Skills

    Public repository for Agent Skills

    ...Rather than relying on handcrafted prompts every time, Skills teach an AI agent procedural knowledge and task-specific workflows so it can apply that expertise reliably, whether the task involves document creation, data analysis, design generation, or technical automation. Each Skill lives in its own directory with a SKILL.md file containing metadata and instructions, and can include supplemental scripts or assets that the agent uses to perform complex operations when relevant.
    Downloads: 87 This Week
    Last Update:
    See Project
  • 16
    PageIndex

    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    ...This reasoning-driven retrieval aligns more naturally with how humans explore complex texts, improving relevance and traceability, especially in professional domains like financial reports, legal contracts, and technical manuals. The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    DocTR

    DocTR

    Library for OCR-related tasks powered by Deep Learning

    ...Seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performances on public document datasets, comparable with GoogleVision/AWS Textract. Easy integration (available templates for browser demo & API deployment). End-to-End OCR is achieved in docTR using a two-stage approach: text detection (localizing words), then text recognition (identify all characters in the word). ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 18
    PrivateGPT

    PrivateGPT

    Interact with your documents using the power of GPT

    PrivateGPT is a production-ready, privacy-first AI system that allows querying of uploaded documents using LLMs, operating completely offline in your own environment. It provides contextual generative AI capabilities without sending data externally. Now maintained under Zylon.ai with enterprise deployment options (air gapped, cloud, or on-prem).
    Downloads: 27 This Week
    Last Update:
    See Project
  • 19
    Flask RESTX

    Flask RESTX

    Fully featured framework for fast, easy and documented API development

    ...It provides a coherent collection of decorators and tools to describe your API and expose its documentation properly using Swagger. With Flask-RESTX, you only import the api instance to route and document your endpoints.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    BISHENG

    BISHENG

    BISHENG is an open LLM devops platform for next generation apps

    BISHENG is an open LLM application DevOps platform, focusing on enterprise scenarios. It has been used by a large number of industry-leading organizations and Fortune 500 companies. "Bi Sheng" was the inventor of movable type printing, which played a vital role in promoting the transmission of human knowledge. We hope that BISHENG can also provide strong support for the widespread implementation of intelligent applications. Everyone is welcome to participate.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    LongBench

    LongBench

    LongBench v2 and LongBench (ACL 25'&24')

    ...LongBench addresses this gap by providing datasets that require models to process and reason over long sequences of text across multiple tasks. The benchmark includes multiple categories such as single-document question answering, multi-document reasoning, summarization, long dialogue understanding, and code analysis. It supports bilingual evaluation in English and Chinese to assess multilingual capabilities across extended contexts. Newer versions of the benchmark introduce extremely long context windows ranging from thousands to millions of tokens, enabling researchers to test the limits of modern long-context models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    GPT Academic

    GPT Academic

    Research-oriented chatbot framework

    GPT Academic is a research-oriented chatbot framework designed to integrate large language models (LLMs) into academic workflows. It provides tools for structured document processing, citation management, and enhanced interaction with research papers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    pdfly

    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 24
    Chandra

    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. It is capable of handling over 40 languages and is optimized to read difficult inputs such as messy handwriting and multi-column layouts. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    Zeep

    Zeep

    A Python SOAP client

    ...Support for WS-Addressing headers. Support for WSSE (UserNameToken / x.509 signing) Support for asyncio via httpx. Experimental support for XOP messages. Zeep inspects the WSDL document and generates the corresponding code to use the services and types in the document. This provides an easy-to-use programmatic interface to a SOAP server. Parsing the XML documents is done by using the lxml library. This is the most performant and compliant Python XML library currently available. This results in major speed benefits when processing large SOAP responses. ...
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB