Showing 46 open source projects for "java html parser"

View related business solutions
  • Polygon Software | Apparel Software | PLM and ERP Solutions Icon
    Polygon Software | Apparel Software | PLM and ERP Solutions

    Small to mid-sized sewn goods manufacturers and textile mills.

    PolyPM is an integrated enterprise resource planning (ERP) and product lifecycle management (PLM) solution developed by Polygon Software. Built for small to medium-sized apparel manufacturers, PolyPM enables businesses to integrate all aspects of the product development, supply chain and production processes, as well as instantly access all their style and manufacturing information anywhere in the world. This allows businesses to shorten time-to-market, incur lower development costs, and improve customer service and worker productivity.
    Learn More
  • Time tracking software for the global workforce Icon
    Time tracking software for the global workforce

    Teams of all sizes and in various industries that want the best time tracking and employee monitoring solution.

    It's easy with Hubstaff, a time-tracking and workforce management platform that automates almost every aspect of running or growing a business. Teams can track time to projects and to-dos using Hubstaff's desktop, web, or mobile applications. You'll be able to see how much time your team spends on different tasks, plus productivity metrics like activity rates and app usage through Hubstaff's online dashboard. Most of the available features are customizable on a per-user basis, so you can create the team management tool you need.
    Learn More
  • 1
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    ...Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    ...Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    OpenSearchServer Search Engine

    OpenSearchServer Search Engine

    An open source search engine with RESTFul API and crawlers

    OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on...
    Downloads: 12 This Week
    Last Update:
    See Project
  • Zendesk: The Complete Customer Service Solution Icon
    Zendesk: The Complete Customer Service Solution

    Discover AI-powered, award-winning customer service software trusted by 200k customers

    Equip your agents with powerful AI tools and workflows that boost efficiency and elevate customer experiences across every channel.
    Learn More
  • 5
    JuniCoder is a Java project that uses unicode as a base for decoding and encoding formats that invented workarounds to express characters not covered by ASCII. Decoders translate those inventions to unicode. Encoders encode to these inventions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the site mangastream.com and mangafox.me. It is written under the GNU-GPL license and uses an open source HTML parser - TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogenous document collections.

    NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Empower Your Workforce and Digitize Your Shop Floor Icon
    Empower Your Workforce and Digitize Your Shop Floor

    Benefits to Manufacturers

    Easily connect to most tools and equipment on the shop floor, enabling efficient data collection and boosting productivity with vital insights. Turn information into action to generate new ideas and better processes.
    Learn More
  • 10
    ChanDown - 4chan Image Downloader

    ChanDown - 4chan Image Downloader

    Auto Rescanning - Search Terms - Regularly Updated With New Features

    ========== NOTE: (AS OF 11/05/2015) 4chan html structure has changed, full images are downloaded as well as the thumbnail. Fix coming shortly (after my exams are over) to stop the thumbnails from downloading. ========== This is the first release of my 4chan image downloader. This downloader packs loads of great features such as the search ability. Check the features section and be sure to let me know if you want a feature added. Coming Soon: - Wiki, explaining in depth how to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Geoportal Server
    Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching files for plenty of formats (HTML,XML,doc(x),xls(x),ppt(x),oo,PDF,RTF,mp3,mp4,Java). A TagLibrary eases integrating search results in your JSP based web page.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 13
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, the extraction and the storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and xPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14

    ssSearchEngine

    keyword search engine for semi-structured data (Tables, lists,...)

    This application implement an approach for doing keyword based search over semi-structured data available in HTML pages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    A simple repository utilite that makes a html-page with the list of the directory.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    JavaPub is a one-click install BibTex-publications portal based on a simple java codebase. It features a drag-and-drop uploader module to upload BibTex files and a module that generates the html-index and entry-pages for publication listings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Web-as-corpus tools in Java. * Simple Crawler (and also integration with Nutch and Heritrix) * HTML cleaner to remove boiler plate code * Language recognition * Corpus builder
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Webstats Solr is an attempt to make Apache Access log easier to Data Mine. By adding a powerful Search Engine (SOLR) as a Backend and using Java Script and HTML and maybe PHP I hope to out date AWStats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    EasyGIS simplifies GIS data management, sharing, and publishing. REST interfaces (json, html views). Lucene based FTS searches. Thematic maps, business cartography. Integration with external GIS data providers - Google, OSM.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    HttpFinder is web content searching tool. It enables look for text content that matches given regular expression in html pages/scripts etc. All navigation is performed with use of other regexp which describes links to visit.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    JeCARS (Java Extendable Contents And Rights System) is a RESTful webservice which delivers pluggable output formats, e.g. Atom feeds or HTML. Third party applications can be plugged in. A JCR (JSR-170) repository (Jackrabbit) is used for storage.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Simple Porn Downloader is a tiny all Java based application that uses a list of keywords and starting urls to crawl webpages and branch out searching for specific media extensions which are downloaded and presented in an html page.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    The Informa library provides a convenient Java API for handling news channels and metadata about them. Different syntax formats (RSS 0.91, 1.0, 2.0 and Atom 0.3, 1.0) for feeds are supported. Also support for channel information descriptions (OPML) avail
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    WebNews Crawler is a specific web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site specific extraction to extract the actual news content only, filtering out the advertising and other cruft.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB