Showing 97 open source projects for "html source extractor"

  • 1
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitles (hardsubs) from videos

    Extracts hard-coded subtitles (hardsubs) from videos and generates SRT files. The tool is a deep learning-based video subtitle extraction framework that covers subtitle region detection and subtitle content extraction. Text recognition runs locally through OCR, so there is no need to apply for a third-party API or to access online OCR services such as Baidu...
    Downloads: 75 This Week
  • 2
    Tesseract OCR

    Open Source OCR Engine

    ...Tesseract can recognize over 100 languages out of the box and can be trained to recognize other languages. It supports various output formats, including plain text, HTML, PDF, and more. It also has Unicode (UTF-8) support.
    Downloads: 3,236 This Week
  • 3
    DeepCTR-Torch

    Easy-to-use, modular, and extendible package of deep-learning models

    DeepCTR-Torch is an easy-to-use, modular, and extendible package of deep-learning-based CTR models, along with many core component layers that can be used to build your own custom model easily. It is compatible with PyTorch. You can use any complex model with model.fit() and model.predict(). With the great success of deep learning, DNN-based techniques have been widely used in CTR estimation tasks. The data in the CTR estimation task usually includes highly sparse, high-cardinality categorical...
    Downloads: 0 This Week
  • 4
    WeChatMsg

    Project aimed at extracting, exporting, and analyzing chat records

    WeChatMsg is an open-source project aimed at extracting, exporting, and analyzing chat records from the WeChat messaging platform. It provides tools that read local WeChat database files and allow users to convert chat data into readable formats such as HTML, Word, and CSV, making it possible to inspect conversations outside the mobile app environment. Beyond simple export, the project includes mechanisms for analyzing chat histories and generating annual reports or visual summaries about messaging trends, interaction patterns, and more. ...
    Downloads: 221 This Week
  • 5
    Satori

    Enlightened library to convert HTML and CSS to SVG

    Enlightened library to convert HTML and CSS to SVG. Satori supports the JSX syntax, which makes it very straightforward to use. Satori will render the element into a 600×400 SVG, and return the SVG string. Under the hood, it handles layout calculation, fonts, typography, and more, to generate an SVG that matches how the same HTML and CSS render in a browser. Satori only accepts JSX elements that are pure and stateless. You can use a subset of HTML elements (see section below), or custom React...
    Downloads: 2 This Week
  • 6
    BudouX

    Standalone, small, language-neutral

    Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning-powered line break organizer tool. It is standalone: it works with no dependency on third-party word segmenters such as the Google Cloud Natural Language API. It is small: it takes only around 15 KB including its machine learning model, so it is reasonable to use even on the client side. It is language-neutral: you can train a model for any language by feeding a dataset to BudouX’s training...
    Downloads: 5 This Week
  • 7
    draw-a-ui

    Draw wireframe sketches and generate HTML with AI vision models

    draw-a-ui is an experimental open source application that converts hand-drawn interface wireframes into working HTML code using artificial intelligence. draw-a-ui combines the tldraw canvas drawing tool with a vision-capable language model to interpret user-created mockups and translate them into a single HTML layout styled with Tailwind CSS. When a user sketches a UI on the canvas, the application captures the current drawing as SVG, converts it into a PNG image, and sends that image to a vision model that generates the corresponding markup. ...
    Downloads: 1 This Week
  • 8
    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. It covers mapping visual designs to code constructs, producing code or UI layout (HTML, CSS, or markup), and example/demo scripts showing the image-to-UI-to-code workflow.
    Downloads: 2 This Week
  • 9
    Text Generation Web UI

    Oobabooga - The definitive Web UI for local AI, with powerful features

    A Gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS)....
    Downloads: 47 This Week
  • 10
    Automatic text summarizer

    Module for automatic summarization of text documents and HTML pages

    Sumy is an automatic text summarization library that provides multiple algorithms for extracting key content from documents and articles. It is a simple library and command-line utility for extracting summaries from HTML pages or plain text. The package also contains a simple evaluation framework for text summaries. The implemented summarization methods are described in the documentation. The author also maintains a list of alternative implementations of the summarizers in various programming languages.
    Downloads: 1 This Week
  • 11
    PasteMD

    Paste Markdown and AI responses into Word and Excel instantly

    PasteMD is a lightweight desktop utility designed to streamline the process of transferring formatted content from the clipboard into office applications such as Word, WPS, and Excel. It primarily targets users who frequently copy content from AI chat tools or web pages and encounter formatting issues, especially with Markdown, tables, and LaTeX formulas. PasteMD operates from the system tray and monitors clipboard content, automatically converting Markdown or HTML into properly formatted...
    Downloads: 2 This Week
  • 12
    DotVVM

    Open source MVVM framework for Web Apps

    DotVVM is an open-source framework for ASP.NET. It lets you create web apps using the MVVM pattern, with just C# and HTML. DotVVM can be used to build new ASP.NET Core web apps, or to modernize legacy ASP.NET apps and migrate them to .NET 5. Save time with GridView, FileUpload, and other components shipped with the framework, and skip building a separate API.
    Downloads: 1 This Week
  • 13
    ChatGPT Exporter

    Export and share your ChatGPT conversation history

    ChatGPT Exporter is a browser-based userscript tool designed to export ChatGPT conversations into multiple structured and shareable formats, enabling users to preserve, analyze, and reuse AI-generated content outside the ChatGPT interface. It integrates directly into the ChatGPT web environment, typically via tools like Tampermonkey, and adds export functionality without requiring backend services or complex setup. The tool supports a wide range of output formats including plain text, HTML,...
    Downloads: 4 This Week
  • 14
    Remotion

    Make videos programmatically with React

    Remotion is a cutting-edge library that lets developers create real videos programmatically using React components, transforming familiar UI paradigms into a flexible, code-driven video production workflow. Instead of traditional timeline editors, Remotion leverages HTML, CSS, and JavaScript to define video frames, animations, and transitions, which means developers can use states, props, loops, and component hierarchies to automate complex motion graphics. Because it integrates with the...
    Downloads: 4 This Week
  • 15
    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. It is capable of...
    Downloads: 2 This Week
  • 16
    Reader LLM

    Convert any URL to an LLM-friendly input with a simple prefix

    ...In addition to converting individual pages, the service can perform web searches and return relevant content that can be ingested directly by AI systems. The tool relies on specialized models and parsing techniques to handle complex HTML structures and extract meaningful content while preserving important context.
    Downloads: 2 This Week
  • 17
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects. LLM Scraper integrates...
    Downloads: 0 This Week
  • 18
    Pandas Profiling

    Create HTML profiling reports from pandas DataFrame objects

    pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends the pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. The report includes high-correlation warnings based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik) and the most common categories (uppercase, lowercase,...
    Downloads: 0 This Week
  • 19
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 5 This Week
  • 20
    DocStrange

    Extract and convert data from any document: images, PDFs, Word docs

    DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services.
    Downloads: 2 This Week
  • 21
    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    ...Configurable label formats let you customize the visual interface to meet your specific labeling needs. It supports multiple data types, including images, audio, text, HTML, time-series, and video.
    Downloads: 27 This Week
  • 22
    Groq AppGen

    Project showcasing Llama 3.3 70B HTML codegen abilities

    Groq AppGen is an interactive web application (built with Next.js and TypeScript) that uses Groq’s LLM API to generate or modify web application code based on natural-language prompts. Essentially, you tell the app what kind of web app or page you want (in plain English), and groq-appgen will produce HTML/JSX code scaffolding, layout, and optionally application logic accordingly. It supports iterative feedback: you can refine your prompt, adjust parameters or requirements, and have the app...
    Downloads: 0 This Week
  • 23
    pwa-asset-generator

    Automates PWA asset generation and image declaration

    Automates PWA asset generation and image declaration. Automatically generates icon and splash screen images, favicons, and mstile images. Updates manifest.json and index.html files with the generated images according to Web App Manifest specs and Apple Human Interface guidelines. When you build a PWA with the goal of providing native-like experiences on multiple platforms and stores, your PWA assets need to meet the criteria of those platforms and stores: icon sizes and splash...
    Downloads: 3 This Week
  • 24
    POML

    Prompt Orchestration Markup Language

    POML, or Prompt Orchestration Markup Language, is a structured markup language created to improve the organization and maintainability of prompts used in large language model applications. Traditional prompt engineering often relies on unstructured text, which can become difficult to manage as prompts grow more complex and incorporate dynamic data sources. POML addresses this issue by introducing an HTML-like syntax that allows developers to organize prompts into structured components such...
    Downloads: 0 This Week
  • 25
    DocsGPT

    Private AI platform for agents, enterprise search and RAG pipelines

    DocsGPT is an open-source AI platform for deploying private RAG pipelines, AI agents, and enterprise search on your own infrastructure. Connect any data source (PDFs, DOCX, CSV, Excel, HTML, audio, GitHub, databases, URLs) and get accurate, hallucination-free answers with source citations. Choose your LLM: OpenAI, Anthropic, Google Gemini, or local models.
    Downloads: 3 This Week