Search Results for "document search engine"

Showing 66 open source projects for "document search engine"

1

Search with Lepton

Lightweight demo to build a conversational AI search engine quickly

Search with Lepton is an open source demonstration project that shows how to build a conversational search engine using the Lepton AI framework. It combines traditional web search with large language models to provide natural language answers to user queries. It retrieves information from supported search engines and uses that context to generate responses through a retrieval-augmented generation approach.

Downloads: 3 This Week

Last Update: 4 days ago
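
The retrieval-augmented generation step described above can be illustrated with a minimal sketch. This is not code from Search with Lepton: the `build_prompt` function, the snippet fields, and the sample results are all hypothetical, and the call to a language model is left out. It only shows how retrieved search snippets might be packed into a grounded prompt.

```python
def build_prompt(question, snippets, max_snippets=3):
    """Assemble a grounded prompt from retrieved search snippets (hypothetical helper)."""
    lines = ["Answer the question using only the numbered sources below.", ""]
    # Number each snippet so the model can cite it as [n].
    for i, snippet in enumerate(snippets[:max_snippets], start=1):
        lines.append(f"[{i}] {snippet['title']}: {snippet['text']}")
    lines += ["", f"Question: {question}", "Answer (cite sources as [n]):"]
    return "\n".join(lines)


# Made-up search results; a real system would fetch these from a search engine.
RESULTS = [
    {"title": "Doc A", "text": "Paris is the capital of France."},
    {"title": "Doc B", "text": "France is in Western Europe."},
]

prompt = build_prompt("What is the capital of France?", RESULTS)
print(prompt)
```

The resulting string would then be sent to whichever language model the application uses.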
2

marqo

Tensor search for humans

A tensor-based search and analytics engine that seamlessly integrates with your applications, websites, and workflows. Marqo is a versatile and robust search and analytics engine that can be integrated into any website or application. Due to horizontal scalability, Marqo provides lightning-fast query times, even with millions of documents. Marqo helps you configure deep-learning models like CLIP to pull semantic meaning from images.

Downloads: 5 This Week

Last Update: 2026-04-02
3

SAG

SQL-Driven RAG Engine

...These vectors allow the system to identify relationships between concepts and construct a graph representation of knowledge at runtime. The architecture also includes a three-stage retrieval pipeline consisting of recall, expansion, and reranking steps to improve search accuracy. The engine integrates semantic vector similarity with traditional full-text search to improve both recall and precision. Because the knowledge graph is generated dynamically, the system can adapt to new information without requiring manual graph maintenance.

Downloads: 0 This Week

Last Update: 2026-03-09
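
As a rough illustration of the recall, expansion, and reranking stages described above, here is a small sketch that blends vector similarity with keyword overlap. It is not SAG's implementation: the toy documents, the two-dimensional vectors, the `links` field standing in for graph edges, and the equal weighting are all assumptions made for the example, and a real reranker would typically use a stronger model than the recall scorer.

```python
import math

# Toy corpus (hypothetical): text, a made-up embedding, and links to related docs.
DOCS = {
    "d1": {"text": "postgres full text search index", "vec": [0.9, 0.1], "links": ["d3"]},
    "d2": {"text": "chess engine search tree", "vec": [0.1, 0.9], "links": []},
    "d3": {"text": "vector similarity in sql", "vec": [0.8, 0.3], "links": ["d1"]},
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the document text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_score(query, query_vec, doc, alpha=0.5):
    """Blend semantic similarity and full-text overlap (alpha is an assumed weight)."""
    return alpha * cosine(query_vec, doc["vec"]) + (1 - alpha) * keyword_score(query, doc["text"])


def retrieve(query, query_vec, recall_n=1, top_k=2):
    def score(doc_id):
        return hybrid_score(query, query_vec, DOCS[doc_id])

    # Stage 1, recall: keep the best-scoring documents.
    recalled = sorted(DOCS, key=score, reverse=True)[:recall_n]
    # Stage 2, expansion: add documents linked to the recalled ones.
    candidates = set(recalled)
    for doc_id in recalled:
        candidates.update(DOCS[doc_id]["links"])
    # Stage 3, rerank: order the expanded candidate set and truncate.
    return sorted(candidates, key=score, reverse=True)[:top_k]


result = retrieve("sql full text search", [1.0, 0.2])
print(result)
```

With `recall_n=1`, only `d1` survives the recall stage, so `d3` reaches the final list solely through the expansion step.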
4

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

...A key capability is its use of retrieval-augmented generation, which enables semantic search and natural language interaction across an entire document archive. Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.

Downloads: 5 This Week

Last Update: 2026-03-17
5

WeKnora

LLM framework for document understanding and semantic retrieval

...This approach enables the system to provide more reliable answers by grounding model reasoning in the content of uploaded documents. WeKnora is designed with a modular architecture that separates components for document processing, search strategies, and model inference, allowing developers to customize or extend different parts of the pipeline. It supports knowledge base management and conversational question answering built on top of structured and unstructured documents.

Downloads: 6 This Week

Last Update: 2 days ago
6

RAG API

ID-based RAG FastAPI: Integration with LangChain and PostgreSQL

rag_api is an open-source REST API for building Retrieval-Augmented Generation (RAG) systems using LLMs like GPT. It lets users index documents, search semantically, and retrieve relevant content for use in generative AI workflows. Designed for rapid prototyping, it is ideal for chatbot development, document assistants, and knowledge-based LLM apps.

Downloads: 3 This Week

Last Update: 2026-03-20
7

Semantra

Multi-tool for semantic search

Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. ...

Downloads: 1 This Week

Last Update: 2026-03-11
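
To make "retrieval by conceptual similarity" concrete, the sketch below ranks documents by cosine similarity between a query vector and document vectors. It is not Semantra's code: the file names and three-dimensional vectors are invented for illustration, and a real tool would obtain embeddings from a trained model rather than hard-coding them.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=2):
    """Return the ids of the top_k documents most similar to the query vector."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical pre-computed embeddings for three local files.
INDEX = {
    "treaty.pdf": [0.9, 0.1, 0.0],
    "recipe.txt": [0.0, 0.2, 0.9],
    "memo.txt": [0.7, 0.3, 0.1],
}

hits = search([1.0, 0.0, 0.0], INDEX)
print(hits)
```

Because ranking depends only on vector direction, a document can match a query without sharing any exact keywords with it.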
8

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of leaving piles of paper documents all over your desk, office, or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize, and index scanned documents in PDF, JPEG, and TIFF formats.

Downloads: 20 This Week

Last Update: 2025-07-24
9

MCP Server Qdrant

An official Qdrant Model Context Protocol (MCP) server implementation

The Qdrant MCP Server is an official Model Context Protocol server that integrates with the Qdrant vector search engine. It acts as a semantic memory layer, allowing for the storage and retrieval of vector-based data, enhancing the capabilities of AI applications requiring semantic search functionalities.

Downloads: 9 This Week

Last Update: 2025-12-10
10

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 9 This Week

Last Update: 2025-07-01
11

Cherche

Neural Search

Cherche allows the creation of efficient neural search pipelines using retrievers and pre-trained language models as rankers. Cherche's main strength is its ability to build diverse and end-to-end pipelines from lexical matching, semantic matching, and collaborative filtering-based models. Cherche provides modules dedicated to summarization and question answering. These modules are compatible with Hugging Face's pre-trained models and fully integrated into neural search pipelines. Search is...

Downloads: 9 This Week

Last Update: 2024-06-01
12

RAGFlow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, using large language models (LLMs) to provide truthful question-answering capabilities, backed by well-founded citations drawn from data in a variety of complex formats.

Downloads: 13 This Week

Last Update: 2026-02-10
13

SeaGOAT

local-first semantic code search engine

SeaGOAT is an open-source semantic code search engine designed to help developers explore and understand large codebases more efficiently. Instead of relying solely on traditional keyword search, it uses vector embeddings to represent the meaning of code and queries, allowing users to perform semantic searches that find relevant code even when the exact keywords are not present.

Downloads: 9 This Week

Last Update: 2026-03-09
14

Elasticsearch MCP Server

A Model Context Protocol (MCP) server implementation

This MCP server implementation provides interaction capabilities with Elasticsearch and OpenSearch, enabling functionalities such as document searching, index analysis, and cluster management through a set of tools.

Downloads: 6 This Week

Last Update: 2026-02-02
15

Databend

Cloud-native open source data warehouse for analytics and AI queries

...It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems. This architecture enables cost-efficient storage and elastic scaling for workloads that involve large datasets and complex queries. Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run either as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data.

Downloads: 21 This Week

Last Update: 2026-03-13
16

Sunfish

Sunfish: a Python Chess Engine in 111 lines of code

sunfish is a minimalist yet surprisingly strong chess engine written in Python, designed to demonstrate how powerful algorithms can be implemented in a highly compact codebase. Despite being only around a hundred lines of core logic, the engine achieves competitive performance, reaching ratings above 2000 on online platforms. It implements classic chess engine techniques such as alpha-beta pruning and efficient board representation while maintaining readability and simplicity. ...

Downloads: 5 This Week

Last Update: 2026-03-18
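
For readers unfamiliar with the alpha-beta pruning mentioned above, here is a minimal sketch on a toy game tree. It is not taken from Sunfish: the nested-list tree and its leaf scores are made up, and a chess engine would generate child positions from legal moves and score leaves with an evaluation function instead.

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax value of a nested-list game tree, skipping branches that cannot matter."""
    if isinstance(node, (int, float)):
        return node  # Leaf: a static evaluation score.
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # The minimizing player already has a better option elsewhere.
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # The maximizing player already has a better option elsewhere.
    return best


# Hypothetical two-ply tree: the root maximizes, its children minimize.
TREE = [[3, 5], [2, 9], [0, 1]]

value = alphabeta(TREE, float("-inf"), float("inf"), True)
print(value)
```

In this tree the leaves 9 and 1 are never visited: once the second and third subtrees each yield a value no better than 3, the remaining siblings cannot change the result.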
17

Haystack

Haystack is an open source NLP framework to interact with your data

Apply the latest NLP technology to your own data with the use of Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords! ...

Downloads: 13 This Week

Last Update: 2026-04-01
18

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

FlagEmbedding is an open-source toolkit for building and deploying high-performance text embedding models used in information retrieval and retrieval-augmented generation systems. The project is part of the BAAI FlagOpen ecosystem and focuses on creating embedding models that transform text into dense vector representations suitable for semantic search and large language model pipelines. FlagEmbedding includes a family of models known as BGE (BAAI General Embedding), which are designed to...

Downloads: 2 This Week

Last Update: 2026-03-04
19

Canopy

Retrieval Augmented Generation (RAG) framework

Canopy is an open-source retrieval-augmented generation (RAG) framework developed by Pinecone to simplify the process of building applications that combine large language models with external knowledge sources. The system provides a complete pipeline for transforming raw text data into searchable embeddings, storing them in a vector database, and retrieving relevant context for language model responses. It is designed to handle many of the complex components required for a RAG workflow,...

Downloads: 7 This Week

Last Update: 2026-03-10
20

abogen

Generate audiobooks from EPUBs, PDFs and text with captions

abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on the go, or for users who prefer audio over reading. The repository supports handling common ebook formats and generating outputs that combine audio plus caption metadata. ...

Downloads: 16 This Week

Last Update: 2026-02-06
21

Paperless-ngx

A community-supported supercharged version of paperless

Paperless-ngx is a community-supported open-source document management system that transforms your physical documents into a searchable online archive so you can keep, well, less paper.

Downloads: 15 This Week

Last Update: 2026-03-21
22

huggingface_hub

The official Python client for the Hugging Face Hub

The huggingface_hub library allows you to interact with the Hugging Face Hub, a platform democratizing open-source Machine Learning for creators and collaborators. Discover pre-trained models and datasets for your projects or play with the thousands of machine-learning apps hosted on the Hub. You can also create and share your own models, datasets, and demos with the community. The huggingface_hub library provides a simple way to do all these things with Python.

Downloads: 16 This Week

Last Update: 9 hours ago
23

vLLM

A high-throughput and memory-efficient inference and serving engine

vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.

Downloads: 42 This Week

Last Update: 2026-04-03
24

LEANN

Local RAG engine for private multimodal knowledge search on devices

LEANN is an open source system designed to enable retrieval-augmented generation (RAG) and semantic search across personal data while running entirely on local devices. It focuses on dramatically reducing the storage overhead typically required for vector search and embedding indexes, enabling efficient large-scale knowledge retrieval on consumer hardware. LEANN introduces a storage-efficient approximate nearest neighbor index combined with on-the-fly embedding recomputation to avoid storing...

Downloads: 0 This Week

Last Update: 2026-03-13
25

Pathway AI Pipelines

Ready-to-run cloud templates for RAG

Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...

Downloads: 0 This Week

Last Update: 2026-03-02

© 2026 Slashdot Media. All Rights Reserved.
