Showing 66 open source projects for "document search engine"

View related business solutions
  • Globalscape Enhanced File Transfer (EFT) is a best-in-class managed file transfer (MFT) solution Icon
    Globalscape Enhanced File Transfer (EFT) is a best-in-class managed file transfer (MFT) solution

    For Windows-Centric Organizations Looking for Secure File Transfer solutions

    Globalscape’s Enhanced File Transfer (EFT) platform is a comprehensive, user-friendly managed file transfer (MFT) software. Thousands of Windows-Centric Organizations trust Globalscape EFT for their mission-critical file transfers.
    Learn More
  • Complete Data Management for Nonprofits Icon
    Complete Data Management for Nonprofits

    Designed to fit with multi-level non-profit organization, across any sector

    NewOrg is a robust platform built with enhanced features to help non-profit organizations that capture and integrate the information from all of their operational areas to better manage volunteers, clients, programs, outcome reporting, activity sign-ups & scheduling, communications, surveys, fundraising activities and Development campaigns. NewOrg can truly deliver an intuitive product that will help manage your Committees, Donors, Events, and Memberships so that the organization runs efficiently.
    Learn More
  • 1
    Search with Lepton

    Search with Lepton

    Lightweight demo to build a conversational AI search engine quickly

    Search with Lepton is an open source demonstration project that shows how to build a conversational search engine using the Lepton AI framework. It combines traditional web search with large language models to provide natural language answers to user queries. It retrieves information from supported search engines and uses that context to generate responses through a retrieval-augmented generation approach.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    marqo

    marqo

    Tensor search for humans

    A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    SAG

    SAG

    SQL-Driven RAG Engine

    ...These vectors allow the system to identify relationships between concepts and construct a graph representation of knowledge at runtime. The architecture also includes a three-stage retrieval pipeline consisting of recall, expansion, and reranking steps to improve search accuracy. The engine integrates semantic vector similarity with traditional full-text search to improve both recall and precision. Because the knowledge graph is generated dynamically, the system can adapt to new information without requiring manual graph maintenance.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Paperless-AI

    Paperless-AI

    AI-powered document analysis and tagging for Paperless-ngx

    ...A key capability is its use of retrieval-augmented generation, which enables semantic search and natural language interaction across an entire document archive. Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Curtain LogTrace File Activity Monitoring Icon
    Curtain LogTrace File Activity Monitoring

    For any organizations (up to 10,000 PCs)

    Curtain LogTrace File Activity Monitoring is an enterprise file activity monitoring solution. It tracks user actions: create, copy, move, delete, rename, print, open, close, save. Includes source/destination paths and disk type. Perfect for monitoring user file activities.
    Learn More
  • 5
    WeKnora

    WeKnora

    LLM framework for document understanding and semantic retrieval

    ...This approach enables the system to provide more reliable answers by grounding model reasoning in the content of uploaded documents. WeKnora is designed with a modular architecture that separates components for document processing, search strategies, and model inference, allowing developers to customize or extend different parts of the pipeline. It supports knowledge base management and conversational question answering built on top of structured and unstructured documents.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    RAG API

    RAG API

    ID-based RAG FastAPI: Integration with Langchain and PostgreSQL

    rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers - you can quickly scan them and configure your scanner to directly upload to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 9
    MCP Server Qdrant

    MCP Server Qdrant

    An official Qdrant Model Context Protocol (MCP) server implementation

    The Qdrant MCP Server is an official Model Context Protocol server that integrates with the Qdrant vector search engine. It acts as a semantic memory layer, allowing for the storage and retrieval of vector-based data, enhancing the capabilities of AI applications requiring semantic search functionalities. ​
    Downloads: 9 This Week
    Last Update:
    See Project
  • Professional Streaming and Video Hosting - GDPR Compliant - 3Q Icon
    Professional Streaming and Video Hosting - GDPR Compliant - 3Q

    Secure hosting, scalable streaming, and easy integration for internal and external communications

    3Q offers a multifunctional video platform for hosting, managing and distributing video and audio content on all channels. Live and on-demand.
    Learn More
  • 10
    PaperAI

    PaperAI

    Semantic search and workflows for medical/scientific papers

    PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 11
    Cherche

    Cherche

    Neural Search

    Cherche allows the creation of efficient neural search pipelines using retrievers and pre-trained language models as rankers. Cherche's main strength is its ability to build diverse and end-to-end pipelines from lexical matching, semantic matching, and collaborative filtering-based models. Cherche provides modules dedicated to summarization and question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines. Search is...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 12
    RAGFlow

    RAGFlow

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

    RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 13
    SeaGOAT

    SeaGOAT

    local-first semantic code search engine

    SeaGOAT is an open-source semantic code search engine designed to help developers explore and understand large codebases more efficiently. Instead of relying solely on traditional keyword search, it uses vector embeddings to represent the meaning of code and queries, allowing users to perform semantic searches that find relevant code even when the exact keywords are not present.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 14
    Elasticsearch MCP Server

    Elasticsearch MCP Server

    A Model Context Protocol (MCP) server implementation

    This MCP server implementation provides interaction capabilities with Elasticsearch and OpenSearch, enabling functionalities such as document searching, index analysis, and cluster management through a set of tools. ​
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    Databend

    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    ...It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems. This architecture enables cost-efficient storage and elastic scaling for workloads that involve large datasets and complex queries. Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run both as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 16
    Sunfish

    Sunfish

    Sunfish: a Python Chess Engine in 111 lines of code

    sunfish is a minimalist yet surprisingly strong chess engine written in Python, designed to demonstrate how powerful algorithms can be implemented in a highly compact codebase. Despite being only around a hundred lines of core logic, the engine achieves competitive performance, reaching ratings above 2000 on online platforms. It implements classic chess engine techniques such as alpha-beta pruning and efficient board representation while maintaining readability and simplicity. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    Haystack

    Haystack

    Haystack is an open source NLP framework to interact with your data

    Apply the latest NLP technology to your own data with the use of Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords! ...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 18
    FlagEmbedding

    FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    Canopy

    Canopy

    Retrieval Augmented Generation (RAG) framework

    Canopy is an open-source retrieval-augmented generation (RAG) framework developed by Pinecone to simplify the process of building applications that combine large language models with external knowledge sources. The system provides a complete pipeline for transforming raw text data into searchable embeddings, storing them in a vector database, and retrieving relevant context for language model responses. It is designed to handle many of the complex components required for a RAG workflow,...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 20
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. ...
    Downloads: 16 This Week
    Last Update:
    See Project
  • 21
    Paperless-ngx

    Paperless-ngx

    A community-supported supercharged version of paperless

    Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 22
    huggingface_hub

    huggingface_hub

    The official Python client for the Huggingface Hub

    The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.
    Downloads: 16 This Week
    Last Update:
    See Project
  • 23
    vLLM

    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 42 This Week
    Last Update:
    See Project
  • 24
    LEANN

    LEANN

    Local RAG engine for private multimodal knowledge search on devices

    LEANN is an open source system designed to enable retrieval-augmented generation (RAG) and semantic search across personal data while running entirely on local devices. It focuses on dramatically reducing the storage overhead typically required for vector search and embedding indexes, enabling efficient large-scale knowledge retrieval on consumer hardware. LEANN introduces a storage-efficient approximate nearest neighbor index combined with on-the-fly embedding recomputation to avoid storing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Pathway AI Pipelines

    Pathway AI Pipelines

    Ready-to-run cloud templates for RAG

    Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB