Showing 72 open source projects for "text search"

View related business solutions
  • Hightouch is a data and AI platform for marketing and personalization. Icon
    Hightouch is a data and AI platform for marketing and personalization.

    Marketing needs data and AI. Give them Hightouch.

    Find insights, run real-time campaigns, and build AI agents with all your data.
    Learn More
  • Cloudbrink Personal SASE service Icon
    Cloudbrink Personal SASE service

    For companies looking for low maintenance, secure, high performance connectivity for hybrid and remote workers

    Cloudbrink’s Personal SASE is a high-performance connectivity and security service that delivers a lightning-fast, in-office experience to the modern hybrid workforce anywhere. Combining high-performance ZTNA with Automated Moving Target Defense (AMTD), and Personal SD-WAN all connections are ultra-secure.
    Learn More
  • 1
    Papermerge

    Papermerge

    Open Source Document Management System for Digital Archives

    ...OCR technology is vital part of Papermerge. It extracts text information from scanned documents, PDF, JPEG, TIFF files.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 2
    DocArray

    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Argilla

    Argilla

    The open-source data curation platform for LLMs

    Argilla is a production-ready framework for building and improving datasets for NLP projects. Deploy your own Argilla Server on Spaces with a few clicks. Use embeddings to find the most similar records with the UI. This feature uses vector search combined with traditional search (keyword and filter based). Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can use and combine your preferred...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    OpenViking

    OpenViking

    Context database designed specifically for AI Agents

    OpenViking is an open-source context database engineered for efficient indexing and retrieval of large amounts of unstructured or semi-structured context data used by AI applications. It’s primarily designed to serve as a high-performance, scalable backend for storing app context, embeddings, conversational histories, and other textual artifacts that need rapid lookup and semantic search, which makes it especially useful for systems like chatbots or memory-augmented agents. The project is...
    Downloads: 1 This Week
    Last Update:
    See Project
  • The most trusted software in construction Icon
    The most trusted software in construction

    HCSS is the gold standard software solution for winning, planning, and managing construction projects by connecting the office to the field.

    HCSS provides easy-to-use software built for construction companies that want to win more work, work smarter, and boost profits. For nearly 40 years, we've helped heavy civil contractors, infrastructure builders, and utility companies improve operations, from estimating and project management to field tracking, equipment maintenance, and safety. Tools like HeavyBid, HeavyJob, and HCSS Safety are built for the field and designed to work together, giving your team real-time visibility, tighter cost control, and better job outcomes. With 45+ accounting integrations and customizable APIs, HCSS fits seamlessly into your tech stack. We regularly update our software based on feedback from real crews, ensuring it fits the way your team works. Backed by award-winning 24/7/365 support and a proven implementation process, HCSS helps reduce risk, cut inefficiencies, and deliver fast ROI. If you're ready to grow your business and gain a competitive edge, HCSS is the partner that gets you there.
    Learn More
  • 5
    PaSa

    PaSa

    An advanced paper search agent powered by large language models

    ...Given a complex scholarly question (for example, “Which works focus on non-stationary reinforcement learning with UCB-based value methods?”), PaSa decomposes the task: the Crawler generates search queries, retrieves candidate papers (via search tools and citation expansion), then adds them to a “paper queue.” The Selector then reads abstracts or full text (depending on what’s available) and decides which papers are relevant.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    refinery

    refinery

    Open-source choice to scale, assess and maintain natural language data

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. Also, the makers...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Code-Graph-RAG

    Code-Graph-RAG

    The ultimate RAG for your monorepo

    ...It uses Tree-sitter to parse source code into abstract syntax trees, extracting relationships between functions, classes, and modules to build a graph-based representation of the entire codebase. This structured approach enables more accurate and context-aware querying compared to traditional text-based search methods, allowing users to ask natural language questions about code structure and functionality. The system integrates with graph databases such as Memgraph to store and manage relationships, enabling efficient querying and visualization of complex dependencies. It also supports AI-driven query translation, converting natural language into graph queries for deeper analysis and interaction.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 8
    shuyuan

    shuyuan

    Reading book source

    shuyuan is a project oriented around reading and knowledge consumption, especially targeting large-scale text content such as books, articles, or educational material. The name suggests “academy” or “study hall,” and the tool aims to help users ingest, organize, and manage reading content — possibly offering features like text parsing, annotation, metadata generation, translation, or storage for later reference. The repository is set up to support document ingestion, indexing, and maybe some...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    PyGPT

    PyGPT

    Open source personal AI Assistant for Linux, Windows and Mac

    ...It allows you to talk in chat mode and in completion mode, as well as generate images using DALL-E 2. PyGPT also adds access to the Internet for GPT via Google Custom Search API and Wikipedia API and includes voice synthesis using Microsoft Azure Text-to-Speech API. Moreover, the application has implemented context memory support, context storage, history of contexts, which can be restored at any time and e.g. continue the conversation from point in history, and also has a convenient and intuitive system of presets that allows you to quickly and pleasantly create and manage your prompts. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Software Defined Storage Icon
    Software Defined Storage

    The layered architecture of QuantaStor provides solution engineers with unprecedented flexibility and application design options.

    QuantaStor is a unified Software-Defined Storage platform designed to scale up and out to make storage management easy while reducing overall enterprise storage costs.
    Learn More
  • 10
    Pathway AI Pipelines

    Pathway AI Pipelines

    Ready-to-run cloud templates for RAG

    Pathway AI Pipelines is a collection of ready-to-deploy AI pipeline templates designed to help developers rapidly build production-grade retrieval-augmented generation and enterprise search applications. The project provides end-to-end examples that connect live data sources to LLM workflows, enabling applications to stay synchronized with continuously changing information. It supports numerous connectors including local files, Google Drive, SharePoint, Kafka, PostgreSQL, and real-time APIs,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MobileCLIP

    MobileCLIP

    Implementation of "MobileCLIP" CVPR 2024

    MobileCLIP is a family of efficient image-text embedding models designed for real-time, on-device retrieval and zero-shot classification. The repo provides training, inference, and evaluation code for MobileCLIP models trained on DataCompDR, and for newer MobileCLIP2 models trained on DFNDR. It includes an iOS demo app and Core ML artifacts to showcase practical, offline photo search and classification on iPhone-class hardware.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    ElevenLabs Python

    ElevenLabs Python

    The official Python SDK for the ElevenLabs API

    elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API into a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agents-related operations with simple method calls. It exposes ElevenLabs’ main models such as Eleven Multilingual v2, Eleven Flash v2.5, and Eleven Turbo v2.5, each targeting different trade-offs...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    ArXiv MCP Server

    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Generative AI for Beginners (Version 3)

    Generative AI for Beginners (Version 3)

    21 Lessons, Get Started Building with Generative AI

    Generative AI for Beginners is a 21-lesson course by Microsoft Cloud Advocates that teaches the fundamentals of building generative AI applications in a practical, project-oriented way. Lessons are split into “Learn” modules for core concepts and “Build” modules with hands-on code in Python and TypeScript, so you can jump in at any point that matches your goals. The course covers everything from model selection, prompt engineering, and chat/text/image app patterns to secure development...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 15
    AutoGluon

    AutoGluon

    AutoGluon: AutoML for Image, Text, and Tabular Data

    AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    OpenRecall

    OpenRecall

    OpenRecall is a fully open-source, privacy-first alternative

    OpenRecall is an open-source, privacy-first system designed to capture, index, and make searchable a user’s entire digital activity history, effectively acting as a personal memory layer for computing environments. It works by taking periodic screenshots of a user’s screen and applying local AI processing, including OCR and semantic analysis, to extract and structure information from both text and images. This data is then indexed into a searchable database, allowing users to retrieve past...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Flowly AI

    Flowly AI

    Flowly is 100x faster than OpenClaw

    Flowly is an open-source personal AI assistant that runs locally on your machine and connects to multiple communication platforms like Telegram, WhatsApp, Discord, and Slack. It acts as a centralized AI system that can perform tasks such as web browsing, file management, command execution, scheduling, and more—all while keeping your data private. Designed for flexibility, Flowly supports multiple AI providers and models through LiteLLM, allowing users to customize how their assistant...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    Claude Code Tools

    Claude Code Tools

    Practical productivity tools for Claude Code, Codex-CLI

    Claude Code Tools is an open-source collection of command-line utilities and productivity plugins designed to enhance developer workflows when using AI coding agents such as Claude Code and Codex-CLI. The project focuses on solving common problems encountered in AI-assisted development environments, including managing session history, automating terminal interactions, and maintaining context across multiple coding sessions. It includes tools that allow developers to search conversation logs...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    Controllable-RAG-Agent

    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector stores for fine-grained chunks, chapter summaries, and book quotes to support nuanced queries. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    MLJAR Studio

    MLJAR Studio

    Python package for AutoML on Tabular Data with Feature Engineering

    We are working on new way for visual programming. We developed a desktop application called MLJAR Studio. It is a notebook-based development environment with interactive code recipes and a managed Python environment. All running locally on your machine. We are waiting for your feedback. The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    MiniRAG

    MiniRAG

    Making RAG Simpler with Small and Open-Sourced Language Models

    MiniRAG is a lightweight retrieval-augmented generation tool designed to bring the benefits of RAG workflows to smaller datasets, edge environments, and constrained compute settings by simplifying embedding, indexing, and retrieval. It extracts text from documents, codes, or other structured inputs and converts them into embeddings using efficient models, then stores these vectors for fast nearest-neighbor search without requiring huge databases or separate vector servers. When a query is issued, MiniRAG retrieves the most relevant contexts and feeds them into a generative model to produce an answer that is grounded in the source material rather than hallucinated. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Universal Tool Calling Protocol (UTCP)

    Universal Tool Calling Protocol (UTCP)

    Official python implementation of UTCP. UTCP is an open standard

    The python-utcp repository is the official Python SDK implementation of the Universal Tool Calling Protocol (UTCP). UTCP is an open, modern standard designed to let AI agents call any tool or API directly—over HTTP, CLI, WebSocket, gRPC, and more—without the overhead of extra wrapper layers or middleware. It leverages a modular, plugin-based architecture built around Pydantic models and separates the core functionality into a lightweight client and extensible protocol plugins, enabling...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    GLM-4.6V

    GLM-4.6V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.6V represents the latest generation of the GLM-V family and marks a major step forward in multimodal AI by combining advanced vision-language understanding with native “tool-call” capabilities, long-context reasoning, and strong generalization across domains. Unlike many vision-language models that treat images and text separately or require intermediate conversions, GLM-4.6V allows inputs such as images, screenshots or document pages directly as part of its reasoning pipeline — and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Weights and Biases

    Weights and Biases

    Tool for visualizing and tracking your machine learning experiments

    Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production models. Quickly identify model regressions. Use W&B to visualize results in real time, all in a central dashboard. Focus on the interesting ML. Spend less time manually tracking results in spreadsheets and text files. Capture dataset versions with W&B Artifacts to identify how changing data affects your resulting models. Reproduce any model, with saved...
    Downloads: 6 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB