Showing 276 open source projects for "pdf search"

View related business solutions
  • Award-Winning Medical Office Software Designed for Your Specialty Icon
    Award-Winning Medical Office Software Designed for Your Specialty

    Succeed and scale your practice with cloud-based, data-backed, AI-powered healthcare software.

    RXNT is an ambulatory healthcare technology pioneer that empowers medical practices and healthcare organizations to succeed and scale through innovative, data-backed, AI-powered software.
    Learn More
  • Collect! is a highly configurable debt collection software Icon
    Collect! is a highly configurable debt collection software

    Everything that matters to debt collection, all in one solution.

    The flexible & scalable debt collection software built to automate your workflow. From startup to enterprise, we have the solution for you.
    Learn More
  • 1
    Open Semantic Search

    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    Open Semantic Search is an open source research and analytics platform designed for searching, analyzing, and exploring large collections of documents using semantic search technologies. It provides an integrated search server combined with a document processing pipeline that supports crawling, text extraction, and automated analysis of content from many different sources.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    PDFPatcher

    PDFPatcher

    A versatile toolkit for PDF manipulation

    PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning.
    Downloads: 29 This Week
    Last Update:
    See Project
  • 3
    rga

    rga

    rga: ripgrep, but also search in PDFs, E-Books, Office documents, etc.

    rga is a line-oriented search tool that allows you to look for a regex in a multitude of file types. rga wraps the awesome ripgrep and enables it to search in PDF, docx, sqlite, JPG, movie subtitles (mkv, mp4), etc.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Career-Ops

    Career-Ops

    AI-powered job search system built on Claude Code

    Career Ops is an open-source platform designed to help individuals manage their job search process with a structured, operations-style approach that treats career development like a pipeline. It provides a system for organizing job applications, tracking progress across different stages, and maintaining visibility into opportunities, much like a lightweight CRM tailored for job seekers. The project emphasizes clarity and accountability, enabling users to monitor applications, follow-ups, and...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Field Sales+ for MS Dynamics 365 and Salesforce Icon
    Field Sales+ for MS Dynamics 365 and Salesforce

    Maximize your sales performance on the go.

    Bring Dynamics 365 and Salesforce wherever you go with Resco’s solution. With powerful offline features and reliable data syncing, your team can access CRM data on mobile devices anytime, anywhere. This saves time, cuts errors, and speeds up customer visits.
    Learn More
  • 5
    ripgrep

    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ripgrep is a line-oriented search tool that actively searches the directory you're currently in for a regex pattern. By default, ripgrep will ignore your .gitignore and skip hidden files or directories and binary files automatically. ripgrep has first class support on Windows, macOS and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack and grep. ripgrep supports arbitrary input preprocessing filters which could be PDF text extraction, less supported decompression, decrypting, automatic encoding detection and so on. ...
    Downloads: 71 This Week
    Last Update:
    See Project
  • 6
    Memvid

    Memvid

    Video-based AI memory library. Store millions of text chunks in MP4

    Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.
    Downloads: 66 This Week
    Last Update:
    See Project
  • 7
    TikZ

    TikZ

    TikZ figures for concepts in physics/chemistry/ML

    Collection of 111 standalone TikZ figures for illustrating concepts in physics, chemistry, and machine learning. Check out janosh.github.io to search, sort, open in Overleaf, and download figures (PDF/SVG/PNG) from this collection.
    Downloads: 24 This Week
    Last Update:
    See Project
  • 8
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    PEASS-ng

    PEASS-ng

    Privilege Escalation Awesome Scripts SUITE

    ...Here you will find privilege escalation tools for Windows and Linux/Unix and MacOS. Find the latest versions of all the scripts and binaries in the releases page. Check the parsers directory to transform PEASS outputs to JSON, HTML and PDF.
    Downloads: 88 This Week
    Last Update:
    See Project
  • Iris Powered By Generali - Iris puts your customer in control of their identity. Icon
    Iris Powered By Generali - Iris puts your customer in control of their identity.

    Increase customer and employee retention by offering Onwatch identity protection today.

    Iris Identity Protection API sends identity monitoring and alerts data into your existing digital environment – an ideal solution for businesses that are looking to offer their customers identity protection services without having to build a new product or app from scratch.
    Learn More
  • 10
    Desktop Commander MCP

    Desktop Commander MCP

    AI-powered MCP server for desktop file and terminal automation

    ...It integrates with clients like Claude Desktop to enable AI-driven workflows such as editing files, executing commands, and automating development tasks from a single conversational interface. Desktop Commander MCP builds on top of an MCP filesystem server and enhances it with powerful search, replace, and code editing capabilities tailored for real-world development environments. It allows users to run terminal commands with streaming output, manage long-running processes, and even execute code in memory without saving files. It also supports working with structured and document formats such as Excel, PDF, and DOCX, enabling AI to read, modify, and generate these files directly.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    PageIndex

    PageIndex

    Document Index for Vectorless, Reasoning-based RAG

    ...This reasoning-driven retrieval aligns more naturally with how humans explore complex texts, improving relevance and traceability, especially in professional domains like financial reports, legal contracts, and technical manuals. The project includes example notebooks, scripts for tree generation and search, and support for multiple document formats including PDF and markdown, with tools designed to preserve context and semantic boundaries.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    PandaWiki

    PandaWiki

    AI-powered open source platform for building intelligent wiki bases

    PandaWiki is an open source knowledge base system designed to help users build intelligent documentation platforms powered by large language models. It combines traditional wiki functionality with modern AI capabilities, allowing teams and individuals to create and manage product documentation, technical manuals, FAQs, and blog-style knowledge resources. PandaWiki provides tools for managing knowledge bases through an administrative interface while also generating public-facing wiki sites...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 13
    kb

    kb

    A minimalist command line knowledge base manager

    kb is a minimalist command-line knowledge base manager that gives users a fast, organized way to collect, store, search, and retrieve notes, documents, cheatsheets, procedures, and other artifacts directly from the terminal. It was created to solve the common problem of having scattered text files or reference materials on disk that are hard to search or categorize, and it surfaces a simple CLI interface with intuitive commands for adding, viewing, editing, and deleting knowledge items. Each...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    Blackbird

    Blackbird

    OSINT tool for finding accounts across 600+ sites by username or email

    ...Blackbird also includes an optional AI-powered profiling feature that analyzes discovered sites to generate behavioral and technical insights about a user’s online presence. Results from searches can be exported in formats such as PDF, CSV, or JSON for documentation or reporting purposes.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 15
    Income Tax Portal

    Income Tax Portal

    An automated tool to fetch data from income tax websites

    ...It's designed to be used by Chartered Accountants, Tax Consultants, Large corporates. Basically, anyone who wants to view all the information about multiple PANs in one Unified Dashboard. Fast, intuitive search. All the reporting needs are covered. One-click data fetching from the Income tax portal for all PAN. Including all PDF files (i.e. Notices, Challans, Attachments). Super simple and easy-to-use interface to track Demand, e-Proceeding, Return Status, and Notices. Inbuilt help on the Jargons used by the income tax portal.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    ...Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open-source software which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. ...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 17
    Khoj

    Khoj

    An AI personal assistant for your digital brain

    Get more done with your open-source AI personal assistant. Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 18
    PasDoc

    PasDoc

    Documentation tool for ObjectPascal (Free Pascal, Lazarus, Delphi)

    PasDoc is a documentation tool for Pascal and Object Pascal source code. Documentation is generated from comments found in the source code or from external files. Many formatting @-tags are supported. Many output formats are supported, including HTML and LaTeX.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 19
    ArXiv MCP Server

    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    PaperQA2

    PaperQA2

    High accuracy RAG for answering questions from scientific documents

    PaperQA2 is a package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. See our recent 2024 paper to see examples of PaperQA2's superhuman performance in scientific tasks like question answering, summarization, and contradiction detection. In this example we take a folder of research paper PDFs, magically get their metadata - including citation counts and a retraction check, then parse and cache PDFs into a...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    Calibre-Web

    Calibre-Web

    Web app for browsing, reading and downloading eBooks stored in Calibre

    ...User Interface in Brazilian, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Khmer, Polish, Russian, simplified and traditional Chinese, Spanish, Swedish, Turkish, Ukrainian. Filter and search by titles, authors, tags, series and language. Support for editing eBook metadata and deleting eBooks from Calibre library. Support for converting eBooks through Calibre binaries. Restrict eBook download to logged-in users. Support for public user registration. Send eBooks to Kindle devices with the click of a button. Support for reading eBooks directly in the browser (.txt, .epub, .pdf, .cbr, .cbt, .cbz, .djvu).
    Downloads: 20 This Week
    Last Update:
    See Project
  • 22
    Outline

    Outline

    Fastest wiki and knowledge base for growing teams

    A modern team knowledge base for your internal documentation, product specs, support answers, meeting notes, onboarding, & more. An intuitive editor with markdown support, slash commands, rich embeds, and more. Beautiful documents, without even trying. Search and share documents without ever leaving your team chat. Nest documents in a hierachy, automatically build a network of backlinks and search across everything. Onboard new team members easily through internal guides, resources, and checklists. Give new team members a leg up getting to know your product, best practices, and culture. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    Laila.Pdf

    Laila.Pdf

    A .NET6 WPF Pdfium-based viewer control and printer object.

    Experience seamless PDF viewing, printing, and interaction with this .NET 6 Pdfium-powered solution! Enjoy: ✅ Ultra-smooth scrolling for effortless navigation ✅ Precision text selection & copying ✅ Powerful search capabilities to find what you need instantly ✅ Basic PDF form support for interactive documents ✅ Reliable .NET 6 PDF printing for crisp, professional output Built on an enhanced version of PDFiumSharp, featuring added PDF form support for a more complete document experience...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    DocsGPT

    DocsGPT

    Private AI platform for agents, enterprise search and RAG pipelines

    DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models. Works with Qdrant, MongoDB, and Elasticsearch and more. Deploy via Docker or Kubernetes with full data sovereignty. Build...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB