Showing 186 open source projects for "tesseract-ocr-w64-setup"

View related business solutions
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • Data management solutions for confident marketing Icon
    Data management solutions for confident marketing

    For companies wanting a complete Data Management solution that is native to Salesforce

    Verify, deduplicate, manipulate, and assign records automatically to keep your CRM data accurate, complete, and ready for business.
    Learn More
  • 1
    Agent Starter Pack

    Agent Starter Pack

    Ship AI Agents to Google Cloud in minutes, not months

    Agent Starter Pack is a production-focused framework that provides pre-built templates and infrastructure for rapidly developing and deploying generative AI agents on Google Cloud. It is designed to eliminate the complexity of moving from prototype to production by bundling essential components such as deployment pipelines, monitoring, security, and evaluation tools into a single package. Developers can create fully functional agent projects with a single command, generating both backend and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Lagent

    Lagent

    A lightweight framework for building LLM-based agents

    ...The system includes modular components that allow developers to connect different models and tools within the same agent architecture. Its design emphasizes simplicity and flexibility so that developers can experiment with different agent workflows without needing a complex infrastructure setup. Lagent can also be deployed as a web service to support distributed or multi-agent applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    MiniMax-MCP

    MiniMax-MCP

    Official MiniMax Model Context Protocol (MCP) server

    MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Flow Matching

    Flow Matching

    A PyTorch library for implementing flow matching algorithms

    flow_matching is a PyTorch library implementing flow matching algorithms in both continuous and discrete settings, enabling generative modeling via matching vector fields rather than diffusion. The underlying idea is to parameterize a flow (a time-dependent vector field) that transports samples from a simple base distribution to a target distribution, and train via matching of flows without requiring score estimation or noisy corruption—this can lead to more efficient or stable generative...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Premier Construction Software Icon
    Premier Construction Software

    Premier is a global leader in financial construction ERP software.

    Rated #1 Construction Accounting Software by Forbes Advisor in 2022 & 2023. Our modern SAAS solution is designed to meet the needs of General Contractors, Developers/Owners, Homebuilders & Specialty Contractors.
    Learn More
  • 5
    Code World Model (CWM)

    Code World Model (CWM)

    Research code artifacts for Code World Model (CWM)

    CWM (Code World Model) is a 32-billion-parameter open-weights language model. It is developed by Meta for enhancing code generation and reasoning about programs. It is explicitly trained on execution traces, action-observation trajectories, and agentic interactions in controlled environments. It has been developed to better capture how code, actions, and state interact over time. The repository provides inference code, reproducibility scripts, prompt guides, and more. It has model cards,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    PyTorch Geometric

    PyTorch Geometric

    Geometric deep learning extension library for PyTorch

    ...These packages come with their own CPU and GPU kernel implementations based on C++/CUDA extensions. We do not recommend installation as root user on your system python. Please setup an Anaconda/Miniconda environment or create a Docker image. We provide pip wheels for all major OS/PyTorch/CUDA combinations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    OpenAI-Compatible Edge-TTS API

    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    ChatTTS_colab

    ChatTTS_colab

    One-click deployment (including offline integration package)

    ChatTTS_colab is a wrapper project around the ChatTTS model that focuses on “one-click” deployment, especially in Google Colab. It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Recommenders

    Recommenders

    Best practices on recommendation systems

    ...Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Failed Payment Recovery for Subscription Businesses Icon
    Failed Payment Recovery for Subscription Businesses

    For subscription companies searching for a failed payment recovery solution to grow revenue, and retain customers.

    FlexPay’s innovative platform uses multiple technologies to achieve the highest number of retained customers, resulting in reduced involuntary churn, longer life span after recovery, and higher revenue. Leading brands like LegalZoom, Hooked on Phonics, and ClinicSense trust FlexPay to recover failed payments, reduce churn, and increase customer lifetime value.
    Learn More
  • 10
    Controllable-RAG-Agent

    Controllable-RAG-Agent

    This repository provides an advanced RAG

    Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    GELab-Zero

    GELab-Zero

    GUI Exploration Lab. One of the best GUI agent solutions

    GELab-Zero is an open-source “GUI Agent” framework aiming to automate interactions with graphical user interfaces (GUIs), combining both the agent model and all supporting infrastructure — including inference, input orchestration, and GUI automation logic — in a plug-and-play package that runs locally, without cloud dependencies. The idea is to let developers or users harness an AI agent that can simulate clicking, typing, reading UI elements, and interacting with apps in a human-like way...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    MiniMax-M1

    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to support a native context length of 1 million tokens while using far fewer FLOPs than comparable reasoning models for very long generations. The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended “think step” traces more feasible. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Agent Payments Protocol (AP2)

    Agent Payments Protocol (AP2)

    Building a Secure and Interoperable Future for AI-Driven Payments

    AP2 is a project released by Google’s “Agentic Commerce” initiative, focusing on a protocol and reference implementation for agent-driven or AI-mediated payments. In effect, AP2 aims to define a secure, interoperable protocol that allows software agents to act on behalf of users—making payments or shopping decisions autonomously—while preserving necessary security, auditability, and trust. The repository contains sample scenarios (in Python, Android, etc.) that illustrate how agents,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Darts

    Darts

    A python library for easy manipulation and forecasting of time series

    ...The ML-based models can be trained on potentially large datasets containing multiple time series, and some of the models offer a rich support for probabilistic forecasting. We recommend to first setup a clean Python environment for your project with at least Python 3.7 using your favorite tool (conda, venv, virtualenv with or without virtualenvwrapper).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    FlowLens MCP

    FlowLens MCP

    Open-source MCP server that gives your coding agent

    FlowLens MCP Server is an open-source tool designed to give AI-powered coding agents (like Claude Code, Cursor, GitHub Copilot / Codex, and others) full, replayable browser context to dramatically improve debugging, bug reporting, and regression testing for web applications. It works together with a companion browser extension: when a user reproduces a bug or a complicated UI interaction, the extension captures a rich session log, including screen/video recording, network traffic, console...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    bitfarm-Archiv Document Management - DMS
    bitfarm-Archiv is a powerful Document Management (DMS), Enterprise Content Management (ECM) and Knowledge Management System (KMS) with Workflow Components. Help us! As we live in the internet age, the best thing, you can help, is to write a short statement about your scenario and your use of the DMS, along with your experiences and put it on your own website or in a blog or forum. It would help us best, if you can also add a hyperlink to our site http://www.bitfarm-archiv.com. By this...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 17
    local-llm

    local-llm

    Run LLMs locally on Cloud Workstations

    local-llm is a development framework that enables developers to run large language models locally within Google Cloud Workstations or standard environments without requiring GPU hardware. It focuses on making generative AI development more accessible by leveraging quantized models and CPU-based execution, eliminating the dependency on expensive GPU infrastructure. The repository includes tools, Docker configurations, and command-line utilities that simplify the process of downloading,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18

    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application to add watermarks (text or image) to PDF files, converts them into image and back to PDF with options for OCR and compression.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    StyleTTS 2

    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    StyleTTS2 is a state-of-the-art text-to-speech system that aims for human-level naturalness by combining style diffusion, adversarial training, and large speech language models. It extends the original StyleTTS idea by introducing a style diffusion model that can sample rich, realistic speaking styles conditioned on reference speech, allowing highly expressive and diverse prosody. The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    OpenKYC - FaceOnLive Community Project

    OpenKYC - FaceOnLive Community Project

    FaceOnLive Open KYC: Streamlining Identity Verification with AI

    Immerse yourself in the groundbreaking realm of the FaceOnLive Open KYC Project, a trailblazing endeavor at the forefront of redefining identity verification paradigms. With a commitment to leveraging the latest advancements in biometric technology, our platform presents a comprehensive solution encompassing cutting-edge features such as face recognition, face liveness detection, and ID document recognition. By seamlessly integrating these powerful tools, we empower businesses across...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    Stable Diffusion

    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models....
    Downloads: 292 This Week
    Last Update:
    See Project
  • 22
    AnimateDiff

    AnimateDiff

    Plug-n-play module turning text-to-image models into animation

    AnimateDiff is an open-source project designed to enhance text-to-image diffusion models by adding animation capabilities. It allows users to turn static images generated by popular text-to-image models into animated sequences without requiring additional model training. This plug-and-play tool is compatible with a wide range of community models and facilitates the generation of animation directly from pre-existing text-to-image models. It supports various configurations to create animations...
    Leader badge
    Downloads: 26 This Week
    Last Update:
    See Project
  • 23
    EasyTTS

    EasyTTS

    Text to Speech Utility

    EasyTTS is a text to speech app for 64 bit Windows that offers online and offline text-to-speech, with settings for how fast the voice is. It supports languages other than English but only if you are connected to the Internet. These are Spanish, Portuguese, Russian, French, and Mandarin (?) Chinese.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    ...It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    DiffRhythm

    DiffRhythm

    Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation

    DiffRhythm is an open-source, diffusion-based model designed to generate full-length songs. Focused on music creation, it combines advanced AI techniques to produce coherent and creative audio compositions. The model utilizes a latent diffusion architecture, making it capable of producing high-quality, long-form music. It can be accessed on Huggingface, where users can interact with a demo or download the model for further use. DiffRhythm offers tools for both training and inference, and its...
    Downloads: 5 This Week
    Last Update:
    See Project