Search Results for "intelligent character recognition"

Showing 31 open source projects for "intelligent character recognition"

  • 1
    SimpleHTR

    Handwritten Text Recognition (HTR) system implemented with TensorFlow

    ...It also employs connectionist temporal classification (CTC) to align predicted character sequences with input images without requiring character-level segmentation. The repository provides code for training models, performing inference on handwritten text images, and evaluating recognition accuracy. SimpleHTR is commonly used as an educational example for understanding how modern handwriting recognition systems operate.
    Downloads: 0 This Week
  • 2
    PaddleOCR

    Awesome multilingual OCR toolkits based on PaddlePaddle

    PaddleOCR offers exceptional, multilingual, and practical Optical Character Recognition (OCR) tools that help users train better models and put them into practice. Inspired by PaddlePaddle, PaddleOCR is an ultra-lightweight OCR system with multilingual recognition, digit recognition, vertical text recognition, and long text recognition. It features a PPOCR series of high-quality pre-trained models, which includes ultra-lightweight ppocr_mobile series models, general ppocr_server series models, and ultra-lightweight compressed ppocr_mobile_slim series models. ...
    Downloads: 64 This Week
  • 3
    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    ...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.
    Downloads: 11 This Week
  • 4
    DeepSeek-OCR

    Contexts Optical Compression

    DeepSeek-OCR is an open-source optical character recognition solution built as part of the broader DeepSeek AI vision-language ecosystem. It is designed to extract text from images, PDFs, and scanned documents, and integrates with multimodal capabilities that understand layout, context, and visual elements beyond raw character recognition. The system treats OCR not simply as “read the text” but as “understand what the text is doing in the image”—for example distinguishing captions from body text, interpreting tables, or recognizing handwritten versus printed words. ...
    Downloads: 2 This Week
  • 5
    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines.
    Downloads: 52 This Week
  • 6
    Concordia

    Crowdsourcing platform for full text transcription and tagging

    ...It was developed by the Library of Congress so that volunteers of all backgrounds could transcribe and tag digitized images of manuscripts and typed materials from the Library’s collections that could not otherwise be transcribed by optical character recognition.
    Downloads: 7 This Week
  • 7
    Agently

    AI Agent Application Development Framework

    Build AI-agent-native applications with very little code. Interact with AI agents in code using structured data and chained-call syntax. Enhance an AI agent with plugins instead of rebuilding a whole new agent. Agently is a development framework that helps developers build AI-agent-native applications quickly, so you can use and build AI agents in your code in a simple way.
    Downloads: 0 This Week
  • 8
    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...
    Downloads: 20 This Week
  • 9
    Open-LLM-VTuber

    Open source AI VTuber platform with voice chat and Live2D avatars

    Open-LLM-VTuber is an open source platform designed to create AI-powered VTuber characters that can interact with users through voice and animated avatars. It enables hands-free conversations with large language models by combining speech recognition, language processing, and text-to-speech synthesis into a single system. Users can speak directly to the AI character, and the system can respond with a generated voice while animating a Live2D avatar to simulate a talking virtual personality. Open-LLM-VTuber is modular, allowing developers to swap or configure different language models, speech recognition engines, and voice synthesis systems depending on their needs. ...
    Downloads: 29 This Week
  • 10
    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 98 This Week
  • 11
    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...
    Downloads: 0 This Week
  • 12
    Operit AI

    Powerful Android AI agent with tools, automation, and Linux shell

    Operit is a full-featured AI assistant and agent platform designed specifically for Android devices, aiming to go far beyond traditional chat-based interfaces. It integrates deep system-level capabilities with a wide range of tools, allowing the AI to perform real tasks such as file management, automation, and system control directly on the device. A standout aspect of the project is its built-in Ubuntu 24 environment, which enables users to run Linux commands, scripts, and development tools...
    Downloads: 16 This Week
  • 13
    DeepSeek-OCR 2

    Visual Causal Flow

    DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. ...
    Downloads: 9 This Week
  • 14
    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. ...
    Downloads: 0 This Week
  • 15
    Unredact

    A simple tool for reading in poorly redacted documents

    Unredact is a specialized tool that attempts to reconstruct redacted or obscured text in images, PDFs, or screenshots using a combination of image processing and generative AI inference to suggest plausible completions of blurred, black-boxed, or jumbled content. Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. It accepts a variety of input formats, automatically identifies redacted regions, and then generates text suggestions that are presented alongside visual overlays so users can choose or refine outputs.
    Downloads: 16 This Week
  • 16
    CowAgent

    AI assistant based on large models that can actively think and plan

    CowAgent, based on the chatgpt-on-wechat project, is an open-source AI agent framework that integrates large language models into the WeChat ecosystem to create intelligent conversational assistants. It enables automated message handling by connecting WeChat accounts with AI models that can generate contextual replies, process voice messages, and produce images directly inside chats. The platform has evolved beyond a simple chatbot into a more autonomous agent capable of planning complex...
    Downloads: 9 This Week
  • 17
    docext

    An on-premises, OCR-free unstructured data extraction

    ...The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual and textual information directly from document images. This allows the system to detect and extract structured elements such as tables, signatures, key fields, and layout information while maintaining semantic understanding of the document content. ...
    Downloads: 3 This Week
  • 18
    SCAIL

    Towards Studio-Grade Character Animation via In-Context Learning of 3D

    SCAIL is a project developed by the ZAI Organization, focusing on AI-driven research initiatives. While specific documentation about SCAIL’s exact goals and implementation is limited from the repository context alone, the project appears to be part of a collection of machine learning and AI research tools that facilitate scalable model development, evaluation, or application workflows. Given its listing alongside other ZAI projects like speech recognition and text-to-speech systems, SCAIL...
    Downloads: 0 This Week
  • 19
    Best-of Machine Learning with Python

    A ranked list of awesome machine learning Python libraries

    This curated list contains 900 awesome open-source projects with a total of 3.3M stars, grouped into 34 categories. All projects are ranked by a project-quality score, which is calculated from various metrics automatically collected from GitHub and different package managers. If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! General-purpose machine learning and deep learning...
    Downloads: 1 This Week
  • 20
    FAY

    Framework for building AI-powered interactive digital humans and agent

    Fay is an open source framework designed to build and deploy interactive digital humans powered by large language models. It acts as a middleware layer that connects digital character technologies with conversational AI systems and business applications. Fay supports various types of digital humans, including 2.5D and 3D avatars, and can be integrated with applications running on mobile devices, PCs, web platforms, and embedded systems. Its architecture allows developers to combine different AI components such as speech recognition, text-to-speech, and large language models to create conversational digital agents. ...
    Downloads: 5 This Week
  • 21
    realwatermark

    A Python application to add watermarks (text or image) to PDF files

    A Python application that adds watermarks (text or image) to PDF files and converts them into images and back to PDF, with options for OCR and compression.
    Downloads: 1 This Week
  • 22
    PyCAPGE

    PyCAPGE - Python Classic Adventure Point and Click Game Engine

    ...Inspired by the golden age of SCUMM games, it features a customizable 9-verb interface and robust inventory management. Key features include a Scene Manager supporting parallax scrolling, walk-behind masks, and depth-based character scaling. It implements intelligent Pathfinding to navigate complex environments automatically. The engine natively supports Multi-Character gameplay, allowing dynamic switching between protagonists. Developers can build rich narratives using the branching Dialogue System and a Cutscene Manager for scripted events. The architecture is data-driven: texts and definitions are separated into YAML files for easy localization. ...
    Downloads: 6 This Week
  • 23
    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds off llama.cpp and adds many additional powerful features.
    Downloads: 415 This Week
  • 24
    PC_Workman_HCK

    AI-powered PC monitoring that explains, rather than just showing numbers and spikes.

    PC_Workman is what 680 hours of coding after warehouse shifts looks like. Built on a laptop hitting 94°C, this AI-powered monitoring tool does what Task Manager can't: it understands your system, not just measures it. Features:
      - Time travel monitoring: debug issues from hours ago
      - AI diagnostics with HCK_GPT
      - Custom fan curves with profiles
      - Floating always-on-top widget
      - 2D system map
      - Cross-GPU support (NVIDIA/AMD/Intel)
    Four complete rebuilds. 29 features killed....
    Downloads: 10 This Week
  • 25
    TextureAtlas Toolbox

    A powerful, free and open-source tool for TextureAtlases/Spritesheets

    TextureAtlas Toolbox is an all-in-one solution for working with texture atlases and sprite sheets. Extract sprites into organized frame collections and GIF/WebP/APNG animations, generate optimized atlases from individual frames, or convert between 15+ atlas formats. Perfect for game developers, modders, and anyone creating showcases of game sprites. Formerly known as TextureAtlas to GIFs and Frames. Licensed under AGPL-3.0. Third-party licenses: See...
    Downloads: 50 This Week