Search Results for "tesseract-ocr-w64-setup" - Page 5

Showing 186 open source projects for "tesseract-ocr-w64-setup"

Filters: Artificial Intelligence, Python

1

Agent Starter Pack

Ship AI Agents to Google Cloud in minutes, not months

Agent Starter Pack is a production-focused framework that provides pre-built templates and infrastructure for rapidly developing and deploying generative AI agents on Google Cloud. It is designed to eliminate the complexity of moving from prototype to production by bundling essential components such as deployment pipelines, monitoring, security, and evaluation tools into a single package. Developers can create fully functional agent projects with a single command, generating both backend and...

Downloads: 0 This Week

Last Update: 5 days ago
2

Lagent

A lightweight framework for building LLM-based agents

...The system includes modular components that allow developers to connect different models and tools within the same agent architecture. Its design emphasizes simplicity and flexibility so that developers can experiment with different agent workflows without needing a complex infrastructure setup. Lagent can also be deployed as a web service to support distributed or multi-agent applications.

Downloads: 0 This Week

Last Update: 2026-03-06
3

MiniMax-MCP

Official MiniMax Model Context Protocol (MCP) server

MiniMax-MCP is the official Model Context Protocol (MCP) server for accessing MiniMax’s multimodal generative APIs from MCP-compatible clients. It acts as a bridge between tools like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and the MiniMax platform, exposing capabilities such as text-to-speech, voice cloning, image generation, text-to-image, video generation, image-to-video, text-to-video, and music generation. The server is written in Python and distributed under the MIT license,...

Downloads: 1 This Week

Last Update: 2026-01-07
4

Flow Matching

A PyTorch library for implementing flow matching algorithms

flow_matching is a PyTorch library implementing flow matching algorithms in both continuous and discrete settings, enabling generative modeling via matching vector fields rather than diffusion. The underlying idea is to parameterize a flow (a time-dependent vector field) that transports samples from a simple base distribution to a target distribution, and train via matching of flows without requiring score estimation or noisy corruption—this can lead to more efficient or stable generative...

Downloads: 0 This Week

Last Update: 2026-01-05
5

Code World Model (CWM)

Research code artifacts for Code World Model (CWM)

CWM (Code World Model) is a 32-billion-parameter open-weights language model. It is developed by Meta for enhancing code generation and reasoning about programs. It is explicitly trained on execution traces, action-observation trajectories, and agentic interactions in controlled environments. It has been developed to better capture how code, actions, and state interact over time. The repository provides inference code, reproducibility scripts, prompt guides, and more. It has model cards,...

Downloads: 0 This Week

Last Update: 2025-09-26
6

PyTorch Geometric

Geometric deep learning extension library for PyTorch

...These packages come with their own CPU and GPU kernel implementations based on C++/CUDA extensions. We do not recommend installing as the root user into your system Python. Please set up an Anaconda/Miniconda environment or create a Docker image. We provide pip wheels for all major OS/PyTorch/CUDA combinations.

Downloads: 0 This Week

Last Update: 2025-10-14
7

OpenAI-Compatible Edge-TTS API

Free, high-quality text-to-speech API endpoint to replace OpenAI

OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...

Downloads: 1 This Week

Last Update: 2025-11-28
8

ChatTTS_colab

One-click deployment (including offline integration package)

ChatTTS_colab is a wrapper project around the ChatTTS model that focuses on “one-click” deployment, especially in Google Colab. It provides an integrated offline bundle and scripts for Windows and macOS so users can run ChatTTS locally without wrestling with complex environment setup. The repository includes Colab notebooks that launch a Gradio-based web UI and expose streaming TTS, making it possible to listen to generated audio as it is produced. A distinctive feature is the “voice gacha” system, which batch-generates many distinct voice timbres and allows users to save the ones they like into a curated voice library. ...

Downloads: 0 This Week

Last Update: 2025-11-28
9

Recommenders

Best practices on recommendation systems

...Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. Please see the setup guide for more details on setting up your machine locally, on a data science virtual machine (DSVM) or on Azure Databricks. Independent or incubating algorithms and utilities are candidates for the contrib folder. This will house contributions which may not easily fit into the core repository or need time to refactor or mature the code and add necessary tests.

Downloads: 0 This Week

Last Update: 2024-12-23
10

Controllable-RAG-Agent

This repository provides an advanced RAG

Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector...

Downloads: 0 This Week

Last Update: 2025-11-13
11

GELab-Zero

GUI Exploration Lab. One of the best GUI agent solutions

GELab-Zero is an open-source “GUI Agent” framework aiming to automate interactions with graphical user interfaces (GUIs), combining both the agent model and all supporting infrastructure — including inference, input orchestration, and GUI automation logic — in a plug-and-play package that runs locally, without cloud dependencies. The idea is to let developers or users harness an AI agent that can simulate clicking, typing, reading UI elements, and interacting with apps in a human-like way...

Downloads: 0 This Week

Last Update: 2026-01-23
12

MiniMax-M1

Open-weight, large-scale hybrid-attention reasoning model

MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to support a native context length of 1 million tokens while using far fewer FLOPs than comparable reasoning models for very long generations. The team emphasizes efficient scaling of test-time compute: at 100K-token generation lengths, M1 reportedly uses only about 25 percent of the FLOPs of some competing models, making extended “think step” traces more feasible. ...

Downloads: 0 This Week

Last Update: 2025-12-01
13

Agent Payments Protocol (AP2)

Building a Secure and Interoperable Future for AI-Driven Payments

AP2 is a project released by Google’s “Agentic Commerce” initiative, focusing on a protocol and reference implementation for agent-driven or AI-mediated payments. In effect, AP2 aims to define a secure, interoperable protocol that allows software agents to act on behalf of users—making payments or shopping decisions autonomously—while preserving necessary security, auditability, and trust. The repository contains sample scenarios (in Python, Android, etc.) that illustrate how agents,...

Downloads: 0 This Week

Last Update: 2025-09-18
14

Darts

A Python library for easy manipulation and forecasting of time series

...The ML-based models can be trained on potentially large datasets containing multiple time series, and some of the models offer rich support for probabilistic forecasting. We recommend first setting up a clean Python environment for your project with at least Python 3.7 using your favorite tool (conda, venv, virtualenv with or without virtualenvwrapper).

Downloads: 0 This Week

Last Update: 2026-03-23
15

FlowLens MCP

Open-source MCP server that gives your coding agent

FlowLens MCP Server is an open-source tool designed to give AI-powered coding agents (like Claude Code, Cursor, GitHub Copilot / Codex, and others) full, replayable browser context to dramatically improve debugging, bug reporting, and regression testing for web applications. It works together with a companion browser extension: when a user reproduces a bug or a complicated UI interaction, the extension captures a rich session log, including screen/video recording, network traffic, console...

Downloads: 0 This Week

Last Update: 2025-12-05
16

bitfarm-Archiv Document Management - DMS

bitfarm-Archiv is a powerful Document Management (DMS), Enterprise Content Management (ECM) and Knowledge Management System (KMS) with Workflow Components. Help us! As we live in the internet age, the best way you can help is to write a short statement about your scenario and your use of the DMS, along with your experiences, and post it on your own website or in a blog or forum. It would help us most if you also add a hyperlink to our site http://www.bitfarm-archiv.com. By this...

11 Reviews

Downloads: 11 This Week

Last Update: 6 days ago
17

local-llm

Run LLMs locally on Cloud Workstations

local-llm is a development framework that enables developers to run large language models locally within Google Cloud Workstations or standard environments without requiring GPU hardware. It focuses on making generative AI development more accessible by leveraging quantized models and CPU-based execution, eliminating the dependency on expensive GPU infrastructure. The repository includes tools, Docker configurations, and command-line utilities that simplify the process of downloading,...

Downloads: 1 This Week

Last Update: 2026-03-17
18

realwatermark

A Python application to add watermarks (text or image) to PDF files

A Python application that adds watermarks (text or image) to PDF files by converting them to images and back to PDF, with options for OCR and compression.

Downloads: 1 This Week

Last Update: 2025-01-27
19

StyleTTS 2

Towards Human-Level Text-to-Speech through Style Diffusion

StyleTTS2 is a state-of-the-art text-to-speech system that aims for human-level naturalness by combining style diffusion, adversarial training, and large speech language models. It extends the original StyleTTS idea by introducing a style diffusion model that can sample rich, realistic speaking styles conditioned on reference speech, allowing highly expressive and diverse prosody. The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide...

Downloads: 3 This Week

Last Update: 2025-11-28
20

OpenKYC - FaceOnLive Community Project

FaceOnLive Open KYC: Streamlining Identity Verification with AI

Immerse yourself in the groundbreaking realm of the FaceOnLive Open KYC Project, a trailblazing endeavor at the forefront of redefining identity verification paradigms. With a commitment to leveraging the latest advancements in biometric technology, our platform presents a comprehensive solution encompassing cutting-edge features such as face recognition, face liveness detection, and ID document recognition. By seamlessly integrating these powerful tools, we empower businesses across...

149 Reviews

Downloads: 5 This Week

Last Update: 2024-04-02
21

Stable Diffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models....

2 Reviews

Downloads: 292 This Week

Last Update: 2025-02-28
22

AnimateDiff

Plug-n-play module turning text-to-image models into animation

AnimateDiff is an open-source project designed to enhance text-to-image diffusion models by adding animation capabilities. It allows users to turn static images generated by popular text-to-image models into animated sequences without requiring additional model training. This plug-and-play tool is compatible with a wide range of community models and facilitates the generation of animation directly from pre-existing text-to-image models. It supports various configurations to create animations...

1 Review

Downloads: 26 This Week

Last Update: 2025-03-06
23

EasyTTS

Text to Speech Utility

EasyTTS is a text-to-speech app for 64-bit Windows that offers online and offline text-to-speech, with a setting for voice speed. Languages other than English are supported only while you are connected to the Internet: Spanish, Portuguese, Russian, French, and Mandarin (?) Chinese.

1 Review

Downloads: 2 This Week

Last Update: 2024-05-01
24

CSM (Conversational Speech Model)

A Conversational Speech Generation Model

...It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.

Downloads: 5 This Week

Last Update: 2025-03-19
25

DiffRhythm

Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation

DiffRhythm is an open-source, diffusion-based model designed to generate full-length songs. Focused on music creation, it combines advanced AI techniques to produce coherent and creative audio compositions. The model utilizes a latent diffusion architecture, making it capable of producing high-quality, long-form music. It can be accessed on Huggingface, where users can interact with a demo or download the model for further use. DiffRhythm offers tools for both training and inference, and its...

1 Review

Downloads: 5 This Week

Last Update: 2025-03-06