Showing 51 open source projects for "java html parser"

View related business solutions
  • The full-stack observability platform that protects your dataLayer, tags and conversion data Icon
    The full-stack observability platform that protects your dataLayer, tags and conversion data

    Stop losing revenue to bad data today. and protect your marketing data with Code-Cube.io.

    Code-Cube.io detects issues instantly, alerts you in real time and helps you resolve them fast. No manual QA. No unreliable data. Just data you can trust and act on.
    Learn More
  • Turn traffic into pipeline and prospects into customers Icon
    Turn traffic into pipeline and prospects into customers

    For account executives and sales engineers looking for a solution to manage their insights and sales data

    Docket is an AI-powered sales enablement platform designed to unify go-to-market (GTM) data through its proprietary Sales Knowledge Lake™ and activate it with intelligent AI agents. The platform helps marketing teams increase pipeline generation by 15% by engaging website visitors in human-like conversations and qualifying leads. For sales teams, Docket improves seller efficiency by 33% by providing instant product knowledge, retrieving collateral, and creating personalized documents. Built for GTM teams, Docket integrates with over 100 tools across the revenue tech stack and offers enterprise-grade security with SOC 2 Type II, GDPR, and ISO 27001 compliance. Customers report improved win rates, shorter sales cycles, and dramatically reduced response times. Docket’s scalable, accurate, and fast AI agents deliver reliable answers with confidence scores, empowering teams to close deals faster.
    Learn More
  • 1
    BudouX

    BudouX

    Standalone, small, language-neutral

    Standalone. Small. Language-neutral. BudouX is the successor to Budou, the machine learning-powered line break organizer tool. It is standalone. It works with no dependency on third-party word segmenters such as Google cloud natural language API. It is small. It takes only around 15 KB including its machine learning model. It's reasonable to use it even on the client-side. It is language-neutral. You can train a model for any language by feeding a dataset to BudouX’s training...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    LlamaParse

    LlamaParse

    Parse files for optimal RAG

    LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured, and semi-structured, to structured data (API's, PDFs, documents, SQL, etc.) Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL db providers.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    Agents-Flex

    Agents-Flex

    Agents-Flex is an elegant LLM Application Framework like LangChain

    Agents-Flex includes a variety of network protocols for connecting LLMs, such as HTTP, SSE and WS. Its simple and flexible design allows developers to easily connect to various LLMs, including OpenAI, LLama, and other AI. Agents-Flex provides a rich set of development templates and Prompt Frameworks, including FEW-SHOT, CRISPE, BROKE, and ICIO. Developers can also customize their own unique prompt templates. Agents-Flex has a very flexible Function Calling component. It supports local method...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Rezku Point of Sale Icon
    Rezku Point of Sale

    Designed for Real-World Restaurant Operations

    Rezku is an all-inclusive ordering platform and management solution for all types of restaurant and bar concepts. You can now get a fully custom branded downloadable smartphone ordering app for your restaurant exclusively from Rezku.
    Learn More
  • 5
    MyBox

    MyBox

    Easy Tools of PDF, Image, File, Network, Data, and Medias

    javafx-desktop-apps pdf image ocr icc barcode color-palette text bytes markdown html archive compress digest video audio editor converter media https://github.com/Mararsh/MyBox Self-contain packages need not java env nor installation. Jar packages need Java 16 or higher.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    chessPDFBrowser

    chessPDFBrowser

    Chess application whichs allows working with chess PDF books and PGNs.

    Chess application which allows working with PDFs and PGNs. You can work with the chess games of the PDF and edit their tree of variants. Graphical environment. Standard PGN TAGs. PGN comments. Ocr like (Fen string detection from chess board position images). Connection to Uci chess engines (like stockfish). Position analysis, full game analysis. You can now play games against uci engines. pdf2pgn command line command included. Detailed documentation. Multilanguage...
    Downloads: 39 This Week
    Last Update:
    See Project
  • 7
    OpenKM Document Management - DMS

    OpenKM Document Management - DMS

    Document Management System and Content Management System

    OpenKM Community Edition is a free Document Management System (DMS) that helps businesses control the production, storage, management and distribution of electronic documents, boosting effectiveness and productivity. It integrates document management, collaboration and advanced search into one easy-to-use solution, including administration tools for user roles, access control, security levels, activity logs and automation setup. With OpenKM Community Edition you can: Collect information...
    Leader badge
    Downloads: 455 This Week
    Last Update:
    See Project
  • 8
    Conversations

    Conversations

    App in java for chatting to a generative A.I. (involving tts and stt)

    ...operation=countAndForward&url=https%3A%2F%2Ffrojasg1.com%2Fdemos%2Faplicaciones%2Fchat%2F20240815.Demo.Chat.mp4%3Forigin%3Dsourceforge&origin=web More about it at this web site: https://www.frojasg1.com:8443/downloads_web/web/html/conversaciones.html?origin=sourceforge
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Iris Powered By Generali - Iris puts your customer in control of their identity. Icon
    Iris Powered By Generali - Iris puts your customer in control of their identity.

    Increase customer and employee retention by offering Onwatch identity protection today.

    Iris Identity Protection API sends identity monitoring and alerts data into your existing digital environment – an ideal solution for businesses that are looking to offer their customers identity protection services without having to build a new product or app from scratch.
    Learn More
  • 10
    TXM

    TXM

    Unicode XML TEI text analysis platform

    TXM is a free and open-source cross-platform Unicode & XML based text analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. DOWNLOAD LATEST VERSION OF TXM : http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerfull CQP...
    Leader badge
    Downloads: 16 This Week
    Last Update:
    See Project
  • 11
    pdf-extractor

    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    Pdf-extractor is a wrapper around pdf.js to generate images, svgs, html files, text files and json files from a pdf on node.js. A DOM Canvas is used to render and export the graphical layer of the pdf. Canvas exports *.png as a default but can be extended to export to other file types like .jpg. Pdf objects are converted to svg using the SVGGraphics parser of pdf.js. Pdf text is converted to HTML. This can be used as a (transparent) layer over the image to enable text selection. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    html2canvas

    html2canvas

    A JavaScript HTML screenshot renderer

    html2canvas is a JavaScript HTML renderer. The script provides you with the tools to take screenshots of webpages directly on the browser. The screenshot is based on the DOM and therefore, it may not be 100% accurate to the real representation, given that it is not an actual screenshot, but a type of screenshot built based on the available data and information of the page. The script renders such page as a canvas image, by reading the DOM and the different styles of the featured elements. It...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 13
    libpostal

    libpostal

    A C library for parsing/normalizing street addresses around the world

    A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data. libpostal is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of this project is to understand location-based strings in every language, everywhere. Addresses and the locations they represent are essential for any application dealing with maps (place search, transportation, on-demand/delivery services,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Leseratte is a Java parser for German written language. Currently, it contains a German lexicon (based on the Wiktionary), inflexion rules, a grammar and a parser. (Semantics component planned.) Usable as a Java library, also provides a graphical UI.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Provides a GUI interface to grammatical structure and relations (as parsed by the Stanford Parser) of any text.Contains grammatical relation editor to modify, import, export grammatical relation definitions (tregex patterns and features).
    Leader badge
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    perkun

    perkun

    two experimental AI languages + zubr

    Two experimental AI languages - Perkun and its successor Wlodkowic. Attempt to maximize the expected value of the payoff function by appropriate choosing the actions (output variables values). The package contains also a tool called zubr - a Java code generator based on Perkun. Take also a look at my blog: http://pawel-biernacki.blogspot.fi/ For Windows users there is an installer: http://www.pawelbiernacki.net/perkun.msi
    Downloads: 9 This Week
    Last Update:
    See Project
  • 17
    MIT Deep Learning Book

    MIT Deep Learning Book

    MIT Deep Learning Book in PDF format by Ian Goodfellow

    The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville. An MIT Press book Ian Goodfellow and Yoshua Bengio and Aaron Courville. Written by three experts in the field, Deep Learning is...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 18
    Command Line Parser GetPot

    Command Line Parser GetPot

    Tool to parse the command line and configuration files.

    Powerful command line and configuration file parsing for C++, Python, Ruby and Java (others to come). This tool provides many features, such as separate treatment for options, variables, and flags, unrecognized object detection, prefixes and much more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Panzer Combat II

    Panzer Combat II

    Computer-assisted miniature tank game.

    Panzer Combat II is a multi-player voice and webcam enabled computer-assisted distributed miniature wargame of World War II tank combat. Firing is done by placing a webcam behind the aiming unit. Distance to target is computed using computer vision. Action inside the tanks is performed on the computer screen while battlefield strategy is played on the miniature terrain. Both camps can use a different laptop or tablet, the game will interconnect. You can try it online :...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20

    smartblob

    tiny code, html webcam game or plug brain in each blob, java server

    You play a 2d blob that reshapes to grab bend bounce and swing on objects floating in midair (like a platformer game), except acrobaticly your view spins when your blob does. I'm planning a huge multiplayer world, some blobs played by people holding a bendable loop game controller (tape a 1 meter cut of thick extension cord into a loop) in front of webcam and bend it to bend your blob on screen, and other blobs controlled by AI. This is a game for general AI research in a fun way people can...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21

    Rootvole

    a text parsing library that matches text with concepts.

    For general processing of voice queries we developed a text parsing library named 'Rootvole' that can be used to match text with semantic concepts. The algorithm was implemented in Java and can be described as a form of a parsing expression grammar, where we generate the expressions to be detected beforehand by regular expressions and store them in a vocabulary. The central class is the parser class, which is instantiated as a series of vocabularies, simple text lists that describe tokens and synonyms, and value descriptors. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Intelligent Keyword Miner

    Intelligent Keyword Miner

    Intelligent SEO keyword miner and predicing tool

    THIS IS A NETBEANS 8.02 PROJECT ENGLISH ONLY This program was made to help me with the patent research. It simply generates the search keywords, based on your upvotes or a downvotes of the input parameters. It can accept a text or URL (text takes a prescedence over the URL). If you input URL, it goes to a page, and learns its text from HTML format. This program is intelligent as it predicts what you may want to search next, based on your personal trends. After searching the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Azul OS

    Azul OS

    Azul OS version dev(Linux) IA

    ... # Azul voice version windows Azul interface . Disponible # Azul dev rev 0.4.1 . Disponible [changelog] software added : php5-mysql gcc-c++ php5-gd php5-ctype perl-HTML-Tagset php5-zip php5-curl kernel-source mysql-connector-java php5-pear php5-mcrypt php5-ftp devel_C_C++ gimp gedit recode libreoffice MozillaFirefox wireshark audacity nano This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. #Blog : http://azul0.wordpress.com/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24

    MSTParser

    MSTParser is a non-projective dependency parser that searches for maxi

    MSTParser is a non-projective dependency parser that searches for maximum spanning trees over directed graphs. Models of dependency structure are based on large-margin discriminative training methods. Projective parsing is also supported. mstparser 0.5.1 is now available via Maven Central. If you use Maven as your build tool, then you can add it as a dependency in your pom.xml file: <dependency> <groupId>net.sourceforge.mstparser</groupId> <artifactId>mstparser</artifactId> ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB