Semantic Search Tools

  • 1
    DOSE: a distributed platform for semantic elaboration that provides semantic services such as automatic annotation of web resources at the document substructure level, semantic search facilities, semantic annotation storage and retrieval.
  • 2
    Eigenfocus

    Self-Hosted - Project Management, Planning and Time Tracker

    Eigenfocus is an AI-powered personal knowledge management system that uses embeddings and semantic search to help users organize and retrieve ideas across documents. Designed for researchers and creatives, it enables deep linking between notes and supports querying based on meaning rather than keywords.
  • 3
    Kernel Memory

    Research project. A Memory solution for users, teams, and applications

    Kernel Memory is an open-source reference architecture developed by Microsoft to help developers build memory systems for AI applications powered by large language models. The project focuses on enabling applications to store, index, and retrieve information so that AI systems can incorporate external knowledge when generating responses. It supports scenarios such as document ingestion, semantic search, and retrieval-augmented generation, allowing language models to answer questions using contextual information from private or enterprise datasets. Kernel Memory can ingest documents in multiple formats, process them into embeddings, and store them in searchable indexes. Applications can then query these indexed data sources to retrieve relevant information and include it as context for AI responses.
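The ingest-then-query flow described above can be sketched in a few lines. This is a toy illustration using token-set overlap in place of real embedding vectors, and is not Kernel Memory's actual API:

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

class MemoryIndex:
    """Toy ingest-then-ask memory store (token overlap, not real embeddings)."""
    def __init__(self):
        self.docs = []

    def ingest(self, text):
        # A real pipeline would chunk the document and store embedding vectors.
        self.docs.append((text, tokens(text)))

    def context_for(self, question, k=1):
        # Rank stored documents by Jaccard overlap with the question.
        q = tokens(question)
        ranked = sorted(self.docs,
                        key=lambda d: len(q & d[1]) / len(q | d[1]),
                        reverse=True)
        return "\n".join(text for text, _ in ranked[:k])

m = MemoryIndex()
m.ingest("The warranty covers parts and labor for two years.")
m.ingest("Standard shipping takes five business days.")
ctx = m.context_for("How long does the warranty last?")
```

The retrieved context string is what a real system would prepend to the LLM prompt so the model can answer from private data.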
  • 4
    LEANN

    Local RAG engine for private multimodal knowledge search on devices

    LEANN is an open source system designed to enable retrieval-augmented generation (RAG) and semantic search across personal data while running entirely on local devices. It focuses on dramatically reducing the storage overhead typically required for vector search and embedding indexes, enabling efficient large-scale knowledge retrieval on consumer hardware. LEANN introduces a storage-efficient approximate nearest neighbor index combined with on-the-fly embedding recomputation to avoid storing large embedding vectors. By recomputing embeddings during queries and using compact graph-based indexing structures, LEANN can maintain high search accuracy while minimizing disk usage. It aims to act as a unified personal knowledge layer that connects different types of data such as documents, code, images, and other local files into a searchable context for language models.
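The storage trade-off LEANN describes — keeping only raw text plus a compact index and recomputing candidate embeddings at query time — can be illustrated with a toy sketch. The bucket-hash `embed()` here is a stand-in for a real neural encoder:

```python
import math

def embed(text, dim=8):
    # Toy "embedding": hash each token into a fixed-size bucket vector,
    # then L2-normalize. A real system would run an embedding model here.
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(map(ord, tok)) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def search(query, corpus, candidate_ids):
    # The "index" holds only document IDs; candidate vectors are rebuilt
    # on the fly instead of being stored on disk.
    qv = embed(query)
    return max(candidate_ids,
               key=lambda i: sum(a * b for a, b in zip(qv, embed(corpus[i]))))

corpus = ["red apples and green apples", "fast rust compiler"]
best = search("green apples", corpus, candidate_ids=[0, 1])   # → 0
```

Nothing but the raw corpus and the integer candidate list needs to persist, which is the source of the disk savings (at the cost of extra compute per query).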
  • 5
    Language Models

    Explore large language models in 512MB of RAM

    languagemodels is a lightweight Python library designed to simplify experimentation with large language models while maintaining extremely low hardware requirements. The project focuses on enabling developers and students to explore language model capabilities without needing expensive GPUs or large cloud infrastructures. By using small and optimized models, the library allows LLM inference to run in environments with limited resources, sometimes requiring only a few hundred megabytes of memory. The package provides simple APIs that allow developers to generate text, perform semantic search, classify text, and answer questions using local models. It is particularly useful for educational purposes, as it demonstrates the fundamental mechanics of language model inference and prompt-based applications. The repository includes multiple example applications such as chatbots, document question answering systems, and information retrieval tools.
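One common technique for fitting embedding vectors into small memory budgets like those described above — not necessarily the one languagemodels uses — is int8 quantization, which shrinks each float32 component to a single byte at a small cost in accuracy:

```python
from array import array

def quantize(vec):
    # Map floats in [-1, 1] to signed 8-bit ints: 4x smaller than float32.
    return array('b', (max(-127, min(127, round(x * 127))) for x in vec))

def dot_q(a, b):
    # Approximate dot product computed on the quantized vectors.
    return sum(x * y for x, y in zip(a, b)) / (127 * 127)

v, w = [0.6, -0.2, 0.8], [0.5, 0.1, 0.7]
approx = dot_q(quantize(v), quantize(w))
exact = sum(x * y for x, y in zip(v, w))      # ≈ 0.84
```

For semantic-search ranking, the small quantization error rarely changes which documents come out on top, which is why the trick is popular in low-memory settings.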
  • 6
    MindSearch

    An LLM-based Multi-agent Framework of Web Search Engine

    MindSearch is an AI-powered search engine based on large language models (LLMs) designed for deep semantic search and retrieval. It leverages InternLM's language model to understand complex queries and retrieve highly relevant answers from large datasets.
  • 7

    MyMedia Peer: Mobile P2P Media Services

    Jointly search, share and experience media in mobile P2P networks

    The MyMedia Peer supports search, sharing, and experiencing of semantically annotated media in unstructured P2P networks. The API provides interfaces for semantic service coordination in unstructured P2P networks. An implementation for mobile Android devices is provided, comprising the following components:
    - semantic service selector iSeM (1.1)
    - semantic service planner OWLS-XPlan 2
    - semantic search and replication in P2P networks: S2P2P and DSDR
    The MyMedia Peer, mobile service selector iSeM (1.1), S2P2P, and DSDR were developed by Patrick Kapahnke, Xiaoqi Cao, and PD Dr. Matthias Klusch at the German Research Center for Artificial Intelligence DFKI GmbH (http://www.dfki.de) in Saarbrücken, Germany. Copyright: DFKI, 2014, All Rights Reserved. For bug reports, technical problems, and feature requests, contact Patrick Kapahnke: patrick.kapahnke@dfki.de. For general scientific inquiries, contact PD Dr. Matthias Klusch: klusch@dfki.de.
  • 8
    MyScaleDB

    A @ClickHouse fork that supports high-performance vector search

    MyScaleDB is an open-source SQL vector database designed for building large-scale AI and machine learning applications that require both analytical queries and semantic vector search. The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. MyScaleDB enables developers to perform vector similarity searches using standard SQL syntax, eliminating the need to learn specialized vector database query languages. The database is optimized for high performance and scalability, allowing it to handle extremely large datasets and high query loads typical of production AI applications.
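Combining a structured filter with vector ranking — which MyScaleDB expresses in a single SQL statement — can be sketched in plain Python. The SQL in the comment is illustrative of the idea, not MyScaleDB's exact syntax:

```python
import math

rows = [
    {"id": 1, "category": "faq",  "vec": [0.9, 0.1]},
    {"id": 2, "category": "blog", "vec": [0.8, 0.2]},
    {"id": 3, "category": "faq",  "vec": [0.1, 0.9]},
]

def query(rows, category, qvec, limit=2):
    # Rough Python equivalent of:
    #   SELECT id FROM docs WHERE category = ?
    #   ORDER BY distance(vec, ?) LIMIT ?
    hits = [r for r in rows if r["category"] == category]
    hits.sort(key=lambda r: math.dist(r["vec"], qvec))
    return [r["id"] for r in hits[:limit]]

ids = query(rows, "faq", [1.0, 0.0], limit=1)   # → [1]
```

Holding metadata and vectors in one store is what lets this kind of filter-then-rank query run as a single operation rather than a join across two systems.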
  • 9
    Paul Graham GPT

    RAG on Paul Graham's essays

    Paul Graham GPT is a specialized AI-powered search and chat app built on a corpus of Paul Graham's essays, giving users the ability to query and discuss his writings conversationally. The repo stores the full text of the essays in chunks, uses embeddings (e.g., OpenAI embeddings) to enable semantic search over that corpus, and hosts a chat interface that combines retrieval results with LLM-based answering, enabling retrieval-augmented generation (RAG) over a fixed dataset. The app uses a Postgres database (with pgvector) hosted on Supabase as its embedding store, keeping the backend simple and accessible, while the frontend is built with Next.js/TypeScript for a modern, responsive UI. By pulling together search and chat, it is useful both for readers who want to revisit or explore Paul Graham's ideas thematically and for learners or researchers who want to query specific essays or concepts quickly.
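Chunking the essays before embedding, as described above, is commonly implemented as overlapping windows so a passage that straddles one chunk boundary still appears whole in the next chunk. A minimal sketch — the sizes are arbitrary, not the repo's actual settings:

```python
def chunk(text, size=200, overlap=40):
    # Overlapping character windows: consecutive chunks share `overlap`
    # characters so no idea is split across every chunk that contains it.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

essay = "x" * 500
chunks = chunk(essay)   # 3 chunks: [0:200], [160:360], [320:500]
```

Each chunk would then be embedded and stored as one pgvector row, so retrieval returns passages of a size an LLM prompt can comfortably absorb.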
  • 10
    This project provides cross-forge semantic search for the Qualipso Forge. It integrates the A4 AdvDoc prototype (semantic search GUI and engine) with A3 homogeneous and heterogeneous cross-forge semantic search capabilities. See Qualipso.org for details.
  • 11
    RAG from Scratch

    Demystify RAG by building it from scratch

    RAG From Scratch is an educational open-source project designed to teach developers how retrieval-augmented generation systems work by building them step by step. Instead of relying on complex frameworks or cloud services, the repository demonstrates the entire RAG pipeline using transparent and minimal implementations. The project walks through key concepts such as generating embeddings, building vector databases, retrieving relevant documents, and integrating the retrieved context into language model prompts. Each example is written with detailed explanations so that developers can understand the internal mechanics of semantic search and context-aware language generation. The repository emphasizes learning through direct implementation, allowing users to see how each component of the RAG architecture functions independently.
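In the spirit of the project, the pipeline it walks through — vectorize documents, retrieve the most relevant ones, splice them into the prompt — fits in a short self-contained sketch. TF-IDF stands in here for learned embeddings; the mechanics of indexing, scoring, and prompt assembly are the same:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Step 1: represent each document as a sparse TF-IDF vector
    # (a transparent stand-in for neural embeddings).
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    idf = {t: math.log(n / df[t]) + 1 for t in df}
    vecs = [{t: c * idf[t] for t, c in Counter(tokenize(d)).items()} for d in docs]
    return idf, vecs

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question, docs, idf, vecs, k=1):
    # Step 2: retrieve the top-k documents for the question.
    # Step 3: splice them into the context block an LLM would answer from.
    q = {t: idf.get(t, 0.0) for t in tokenize(question)}
    top = sorted(range(len(docs)), key=lambda i: cosine(q, vecs[i]), reverse=True)[:k]
    context = "\n".join(docs[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
]
idf, vecs = build_index(docs)
prompt = build_prompt("What is the capital of France?", docs, idf, vecs)
```

Swapping `build_index` for calls to an embedding model and sending `prompt` to an LLM turns this sketch into a complete RAG loop, which is exactly the progression the repository teaches.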
  • 12
    RSSE: Really Simple Search Engine is a C#.NET library that allows indexing of RSS feeds for use in a search engine. RSSE provides semantic search, custom weight functions for keywords, and Boolean operators in search queries (AND, OR)...
  • 13
    S3B - Social Semantic Search and Browsing - is a middleware that delivers a set of search and browsing components that can be used in J2EE web applications to deliver user-oriented features based on semantic descriptions and social networking
  • 14
    Started as a social bookmarking platform aimed at education professionals. The system allows posting of links and files, liking posts, adding tags for semantic search etc. Very early stages of development - probably quite a few holes in the code!
  • 15
    UForm

    Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

    UForm is a multi-modal inference package designed to encode multilingual texts, images, and, soon, audio, video, and documents into a shared vector space. It comes with a set of same-named pre-trained networks available on the HuggingFace portal and extends the transformers package to support mid-fusion models. Late-fusion models encode each modality independently, but into one shared vector space; because the encodings are independent, these models are good at capturing coarse-grained features but often neglect fine-grained ones. This type of model is well suited for retrieval in large collections; the most famous example is CLIP by OpenAI. Early-fusion models encode both modalities jointly, so they can take fine-grained features into account; they are usually used for re-ranking relatively small retrieval results. Mid-fusion models are the golden midpoint between the previous two types: they consist of two parts, unimodal and multimodal.
  • 16
    Use Vim as IDE

    use vim as IDE

    Use Vim As IDE is a comprehensive configuration repository (by YangYangWithGnu) that shows how to turn Vim into a full-fledged Integrated Development Environment (IDE). The project isn't a single plugin; it is a curated set of plugins, configuration tips, and workflow suggestions that enable syntax highlighting, smart code completion, project navigation, semantic search, file switching, build integration, undo history, templating, and more, particularly geared toward C/C++ development but with many ideas applicable more broadly. The documentation is long and detailed, walking users from the fundamentals of Vim configuration (.vimrc, plugin management) through higher-order capabilities like semantic navigation and project toolchain integration. The philosophy: Vim already offers "what you need when you need it; what you want when you want it," and this repo shows how to tap that potential.
  • 17

    askaitools-community-edition

    A cutting-edge search engine project tailored specifically for AI apps

    Our mission is to revolutionize the way users discover AI products by providing the most accurate, comprehensive, lightning-fast, and intelligent search experience. Developers can effortlessly integrate their own data on top of this framework, enabling them to swiftly build specialized vertical search engines or internal document search systems for their organizations. Under the hood, AskAITools employs a hybrid search engine architecture, seamlessly combining keyword search (full-text search) and semantic search (vector search/embedding search) capabilities. By leveraging statistical data and weighted fusion techniques, it achieves a balance between relevance and popularity.
    Project Architecture and Tech Stack:
    - Front-end: Next.js
    - Deployment: Vercel
    - Styling: Tailwind CSS
    - Database: Supabase
    - Keyword Search: PostgreSQL Full-Text Search Engine
    - Semantic Search: Pgvector Vector Database
    - Semantic Vector Generation: OpenAI text-embedding-3 model
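The weighted fusion of keyword and vector scores mentioned above can be sketched as min-max normalization followed by a weighted blend. The weighting here is illustrative, not AskAITools' actual formula:

```python
def fuse(keyword_scores, vector_scores, alpha=0.5):
    # Min-max normalize each score set to [0, 1], then blend with weight alpha.
    # Normalization matters: raw BM25 scores and cosine similarities live on
    # very different scales, so mixing them directly would let one dominate.
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}
    kw, vec = norm(keyword_scores), norm(vector_scores)
    docs = kw.keys() | vec.keys()
    return sorted(docs,
                  key=lambda d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0),
                  reverse=True)

ranking = fuse({"a": 12.0, "b": 3.0, "c": 0.5},     # keyword (full-text) scores
               {"a": 0.20, "b": 0.90, "c": 0.40})   # semantic (vector) scores
# "b" wins: moderate on keywords but strongest semantically; "a" is keyword-only.
```

Tuning `alpha` is how a hybrid engine trades exact-term matching against semantic relevance for a given corpus.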
  • 18
    bge-base-en-v1.5

    Efficient English embedding model for semantic search and retrieval

    bge-base-en-v1.5 is an English sentence embedding model from BAAI optimized for dense retrieval tasks, part of the BGE (BAAI General Embedding) family. It is a fine-tuned BERT-based model designed to produce high-quality, semantically meaningful embeddings for tasks like semantic similarity, information retrieval, classification, and clustering. This version (v1.5) improves retrieval performance and stabilizes similarity score distribution without requiring instruction-based prompts. With 768 embedding dimensions and a maximum sequence length of 512 tokens, it achieves strong performance across multiple MTEB benchmarks, nearly matching larger models while maintaining efficiency. It supports use via SentenceTransformers, Hugging Face Transformers, FlagEmbedding, and ONNX for various deployment scenarios. Typical usage includes normalizing output embeddings and calculating cosine similarity via dot product for ranking.
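The usage note above — normalize the embeddings, then rank by dot product — works because for unit-length vectors the dot product equals cosine similarity. A minimal illustration with hand-made vectors in place of real model output:

```python
import math

def normalize(vec):
    # Scale a vector to unit length (L2 norm of 1).
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = normalize([3.0, 4.0])    # [0.6, 0.8]
v = normalize([4.0, 3.0])    # [0.8, 0.6]
sim = dot(u, v)              # cosine similarity of the raw vectors: 0.96
```

Normalizing once at encoding time makes every later comparison a plain dot product, which is cheaper than recomputing both norms per query.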
  • 19
    bge-large-en-v1.5

    BGE-Large v1.5: High-accuracy English embedding model for retrieval

    BAAI/bge-large-en-v1.5 is a powerful English sentence embedding model designed by the Beijing Academy of Artificial Intelligence to enhance retrieval-augmented language model systems. It uses a BERT-based architecture fine-tuned to produce high-quality dense vector representations optimized for sentence similarity, search, and retrieval. This model is part of the BGE (BAAI General Embedding) family and delivers improved similarity distribution and state-of-the-art results on the MTEB benchmark. It is recommended for use in document retrieval tasks, semantic search, and passage reranking, particularly when paired with a reranker like BGE-Reranker. The model supports inference through multiple frameworks, including FlagEmbedding, Sentence-Transformers, LangChain, and Hugging Face Transformers. It accepts English text as input and returns normalized 1024-dimensional embeddings suitable for cosine similarity comparisons.
  • 20
    bge-small-en-v1.5

    Compact English sentence embedding model for semantic search tasks

    BAAI/bge-small-en-v1.5 is a lightweight English sentence embedding model developed by the Beijing Academy of Artificial Intelligence (BAAI) as part of the BGE (BAAI General Embedding) series. Designed for dense retrieval, semantic search, and similarity tasks, it produces 384-dimensional embeddings that can be used to compare and rank sentences or passages. This version (v1.5) improves similarity distribution, enhancing performance without the need for special query instructions. The model is optimized for speed and efficiency, making it suitable for resource-constrained environments. It is compatible with popular libraries such as FlagEmbedding, Sentence-Transformers, and Hugging Face Transformers. The model achieves competitive results on the MTEB benchmark, especially in retrieval and classification tasks. With only 33.4M parameters, it provides a strong balance of accuracy and performance for English-only use cases.
  • 21
    eagle-i
    eagle-i is an ontology-driven, RDF-based distributed platform for creating, storing and searching semantically rich data. eagle-i is built around semantic web technologies and adheres to linked open data principles.
  • 22
    finetuner

    Task-oriented finetuning for better embeddings on neural search

    Fine-tuning is an effective way to improve performance on neural search tasks. However, setting up and performing fine-tuning can be very time-consuming and resource-intensive. Jina AI’s Finetuner makes fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure in the cloud. With Finetuner, you can easily enhance the performance of pre-trained models, making them production-ready without extensive labeling or expensive hardware. Create high-quality embeddings for semantic search, visual similarity search, cross-modal text image search, recommendation systems, clustering, duplication detection, anomaly detection, or other uses. Bring considerable improvements to model performance, making the most out of as little as a few hundred training samples, and finish fine-tuning in as little as an hour.
  • 23
    hora

    Efficient approximate nearest neighbor search algorithm collections

    hora is an open-source high-performance vector similarity search library designed for large-scale machine learning and information retrieval systems. The project focuses on approximate nearest neighbor search, a fundamental technique used in modern AI applications such as recommendation systems, image search, and semantic search engines. Hora implements multiple efficient indexing algorithms that allow systems to rapidly search through high-dimensional vectors produced by machine learning models. These vectors are commonly generated by neural networks to represent images, text, audio, or other data types in a mathematical embedding space. The library is written in Rust and emphasizes performance, safety, and efficient memory management, making it suitable for production-grade applications requiring low latency and high throughput.
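The baseline that approximate-nearest-neighbor indexes improve on is exact k-NN search, which computes one distance per stored vector. A minimal exact version, for contrast with what libraries like hora do:

```python
import heapq
import math

def knn(query, points, k=2):
    # Exact k-nearest-neighbour search: O(n) distance computations per query.
    # ANN indexes trade a little recall for far fewer comparisons at scale.
    return heapq.nsmallest(k, range(len(points)),
                           key=lambda i: math.dist(query, points[i]))

points = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.1), (5.0, 5.0)]
nearest = knn((0.0, 0.0), points, k=2)   # → [0, 2] (indexes of closest points)
```

Brute force is fine for thousands of vectors; at millions of high-dimensional embeddings, graph- or tree-based approximate indexes become necessary, which is the niche hora targets.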
  • 24
    Adopting Web 2.0 wisdom-of-crowds principles to build a new generation of image semantic search engine, based on XML technology and an RDF knowledge warehouse.
  • 25
    rag-search

    RAG Search API

    rag-search is a lightweight Retrieval-Augmented Generation API service designed to provide structured semantic search and answer generation through a simple FastAPI backend. The project integrates web search, vector embeddings, and reranking logic to retrieve relevant context before passing it to a language model for response generation. It is built to be easily deployable, requiring only environment configuration and dependency installation to run a functional RAG service. The system supports configurable filtering, scoring thresholds, and reranking options, allowing developers to fine-tune retrieval quality. Its architecture is modular, separating handlers, services, and utilities to support customization and extension. Overall, rag-search serves as a practical starter backend for teams building AI search or question-answering applications on their own data.
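The configurable filtering, scoring thresholds, and reranking described above can be sketched as a small post-processing step over retrieval hits. The field names here are hypothetical, not rag-search's actual schema:

```python
def rerank(hits, threshold=0.3, top_k=3):
    # Drop low-confidence retrieval hits, then re-order the survivors by a
    # secondary (reranker) score when one is present, falling back to the
    # original retrieval score otherwise.
    kept = [h for h in hits if h["score"] >= threshold]
    kept.sort(key=lambda h: h.get("rerank_score", h["score"]), reverse=True)
    return kept[:top_k]

hits = [
    {"id": "a", "score": 0.9, "rerank_score": 0.4},
    {"id": "b", "score": 0.5, "rerank_score": 0.8},
    {"id": "c", "score": 0.1},                       # below threshold: dropped
]
ordered = [h["id"] for h in rerank(hits)]            # → ["b", "a"]
```

Exposing `threshold` and `top_k` as configuration is what lets a RAG service tune retrieval quality without code changes.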