Showing 123 open source projects for "python web crawler"

  • 1
    THIS PROJECT IS DEAD. For the real appcelerator project, please visit http://www.appcelerator.com
    Downloads: 5 This Week
  • 2
    A J2EE web development framework with Struts-style MVC, JSF-like event-driven processing, and YUI-like Ajax-enabled client scripting. It offers fine-grained event binding, access to server variables from JavaScript and web pages, and easy integration with Struts, with no custom tags and no complex API.
    Downloads: 3 This Week
  • 3
    Retriever is a simple crawler packaged as a Java library that lets developers collect and manipulate documents reachable through a variety of protocols (e.g., HTTP, SMB). It can crawl documents shared on a LAN, on the Web, and from many other sources.
    Downloads: 0 This Week
  • 4
    YES Linux is a distribution focused on ease of use, user experience, and the Internet. In three screens, a secure server is installed and can then be administered from a browser. Users should not have to use the console, but can if they wish.
    Downloads: 0 This Week
  • 5
    Not Another Web Server is an extensible web server framework that provides a basic web server along with a large toolkit of services supporting BeanShell, Groovy, Python, email, LDAP, and much more.
    Downloads: 0 This Week
  • 6
    An HTTP web server written in Java, currently under active development. It supports PHP, Perl, and custom Java plugins; support for Python is planned.
    Downloads: 0 This Week
  • 7
    The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
    Downloads: 3 This Week
  • 8
    eDemPS is a dynamic web content management system built by integrating several open source projects. Its environment makes it an ideal tool for developing small or large community websites or portals.
    Downloads: 0 This Week
  • 9
    LogCrawler is an ANT task for automatic testing of web applications. Using an HTTP crawler, it visits all pages of a website and checks the server logfiles for errors. Use it as a smoke test with a CI system such as CruiseControl.
    Downloads: 0 This Week
  • 10
    JLink lets users author flowcharts based on ISO 5807 and IBM standards. Developers can use JLink to add flowcharts to applications, serve a flowchart over the web as PDF or PNG, or dynamically create a flowchart with JavaScript, Python, or Ruby scripts.
    Downloads: 1 This Week
  • 11
    Crow (Computational Representation Of Whatever) is a platform for the integration and mining of complex and distributed data. It represents cross-linked semantic web documents as a network of software objects and offers easy ways to filter and sort them.
    Downloads: 0 This Week
  • 12
    The goal of the zAutomation project is to design and implement hardware, firmware, and software for remote control and monitoring of physical objects using ZigBee technology and the Internet. Its fields of application are robotics and domotics.
    Downloads: 0 This Week
  • 13
    WebNews Crawler is a specialized web crawler (spider, fetcher) designed to acquire and clean news articles from RSS feeds and HTML pages. It can perform site-specific extraction to keep only the actual news content, filtering out advertising and other cruft.
    Downloads: 0 This Week
  • 14
    Course Crawler is an application that compiles term-definition pairs from multiple web glossaries into a centralized, stable, and searchable location.
    Downloads: 0 This Week
  • 15
    Crawl-By-Example runs a crawl that classifies the processed pages by subject and finds the best pages according to examples provided by the operator. It is a plugin for the Heritrix crawler and was developed as part of Google Summer of Code 2006 (GSoC06).
    Downloads: 0 This Week
  • 16
    CAjax is a library for building Ajax-style web applications. It consists of a JavaScript package for easily writing object-oriented Ajax logic and a set of server-side packages that handle unified communication details.
    Downloads: 0 This Week
  • 17
    J-Obey is a Java library that gives people writing their own crawlers a stable robots.txt parser. If you are writing a web crawler of some sort, you can use J-Obey to avoid the hassle of writing a robots.txt parser/interpreter yourself.
    Downloads: 0 This Week
  • 18
    deface-no-tnx is an anti-defacement system that monitors your web files and notifies you about unauthorized changes. It also replaces a defaced page with a standard "error" page, so that no offensive or joking content can be fraudulently added to your site.
    Downloads: 0 This Week
  • 19
    A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. Its three major components (Crawler, Analyzer, and Indexer) can also be used separately.
    Downloads: 0 This Week
  • 20
    XSDB: XML is to data as HTML is to documents. Publish and combine data as easily as HTML and web browsers let you publish and view documents. Implementations exist in Python, JavaScript, Java, and C#/.NET.
    Downloads: 0 This Week
  • 21
    A drop-in framework for adding tagging (folksonomy) capabilities to existing applications.
    Downloads: 0 This Week
  • 22
    Webitable is a free, open source content management system that lets you rapidly build websites and web applications with mouse clicks (high level), Java programming (low level), or anything in between. It is written in Java and may be rewritten in Python.
    Downloads: 0 This Week
  • 23
    SmartCrawler is a Java-based, fully configurable, multi-threaded, and extensible crawler that can fetch and analyze the contents of a website using dynamically pluggable filters.
    Downloads: 0 This Week
  • 24
    "Virtual Infrastructure for Applications and Services Over IP" (ViaSIP_NG) uses the latest OpenCloudComputing recommendations to develop scalable private-public cloud platforms. ViaSIP leverages ODS LinkedData, CouchDB, Eucalyptus, DatR.ws, and Web2Py.
    Downloads: 0 This Week
  • 25
    #ProgX is a light, bleeding-edge engine targeted at easy and rapid website building. Based on PHP, MySQL, XML, and XSLT, it provides the minimal requirements to create your website in a few minutes. Sources are only available on CVS.
    Downloads: 0 This Week
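The DeDuplicator entry in the list above describes reducing duplicate data across a series of snapshot crawls. The Python sketch below illustrates the general idea under one assumption: a page counts as a duplicate when a digest of its content matches a digest recorded in an earlier crawl. The `DigestIndex` class and `pages_to_store` function are hypothetical, keep everything in memory, and are not the plugin's actual Heritrix API.

```python
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple


class DigestIndex:
    """In-memory record of content digests seen in earlier snapshot crawls (hypothetical)."""

    def __init__(self) -> None:
        # digest -> URL under which that exact content was first stored
        self._seen: Dict[str, str] = {}

    def check_and_record(self, url: str, content: bytes) -> Optional[str]:
        """Return the URL of an earlier identical capture, or None if the content is new."""
        digest = hashlib.sha1(content).hexdigest()
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = url  # first time this content is seen: remember it
        return earlier


def pages_to_store(index: DigestIndex, crawl: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return the URLs from one snapshot crawl whose content actually needs storing."""
    return [url for url, body in crawl if index.check_and_record(url, body) is None]


if __name__ == "__main__":
    index = DigestIndex()
    first = [("https://example.com/", b"<h1>Home</h1>"), ("https://example.com/about", b"About us")]
    second = [("https://example.com/", b"<h1>Home</h1>"), ("https://example.com/about", b"About us, updated")]
    print(pages_to_store(index, first))   # both pages are new in the first snapshot
    print(pages_to_store(index, second))  # only the changed page is stored again
```

A persistent index (for example, on disk) would be needed in practice so that digests survive between separate crawl runs.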
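LogCrawler, listed above, crawls a site and then checks the server logfiles for errors. A minimal Python sketch of the log-checking half follows. It assumes the server writes an access log in Common Log Format and treats any 5xx status as an error; `server_errors` and the sample log lines are hypothetical, and the real tool is an ANT task written in Java.

```python
import re
from typing import Iterable, List, Tuple

# Common Log Format: host ident user [time] "METHOD path protocol" status size
CLF_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

# Hypothetical log lines written while a crawler visited the site.
SAMPLE_LOG = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /search?q=x HTTP/1.0" 500 512',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /missing HTTP/1.0" 404 -',
]


def server_errors(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (path, status) for every request the server answered with a 5xx status."""
    errors = []
    for line in log_lines:
        match = CLF_LINE.match(line)
        if match and int(match.group(3)) >= 500:
            errors.append((match.group(2), int(match.group(3))))
    return errors


if __name__ == "__main__":
    for path, status in server_errors(SAMPLE_LOG):
        print(f"server error {status} on {path}")
    # In a CI smoke test, a non-empty result would be treated as a failed build.
```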
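For the WebNews Crawler entry, the first step of acquiring news from RSS is listing each item's title and link. This sketch uses Python's standard `xml.etree.ElementTree` on an inline RSS 2.0 sample (a hypothetical feed, not fetched from anywhere); it does not attempt the site-specific content cleaning the project describes, and `feed_items` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical RSS 2.0 feed used in place of a downloaded one.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>https://news.example.com/1</link></item>
    <item><title>Second story</title><link>https://news.example.com/2</link></item>
    <item><title>No link here</title></item>
  </channel>
</rss>"""


def feed_items(rss_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) for each <item> that has a link, in feed order."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # an item without a link gives the crawler nothing to fetch
            items.append((title, link))
    return items


if __name__ == "__main__":
    for title, link in feed_items(SAMPLE_RSS):
        print(f"{title}: {link}")
```

Each returned link would then be fetched and passed to whatever extraction step strips advertising from the article page.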
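Crawl-By-Example ranks crawled pages against examples supplied by the operator. As an illustration only, the sketch below swaps in a plain technique, bag-of-words cosine similarity, and scores each page by its closest example; the plugin's actual classifier may work quite differently, and `rank_pages` and the sample pages are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_pages(pages: Dict[str, str], examples: List[str]) -> List[str]:
    """Order page URLs by their best similarity to any example text, highest first."""
    example_vectors = [bag_of_words(e) for e in examples]
    scored = []
    for url, text in pages.items():
        vector = bag_of_words(text)
        score = max((cosine(vector, ev) for ev in example_vectors), default=0.0)
        scored.append((score, url))
    scored.sort(reverse=True)
    return [url for _, url in scored]


if __name__ == "__main__":
    pages = {
        "https://example.com/tutorial": "a tutorial on writing a web crawler in python",
        "https://example.com/menu": "restaurant menu and opening hours",
    }
    print(rank_pages(pages, ["python web crawler tutorial"]))
```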
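J-Obey's purpose, giving a crawler a robots.txt parser, is covered in Python by the standard library's `urllib.robotparser` module. The sketch parses an inline robots.txt body (a hypothetical example, not a fetched file) and asks whether a given user agent may fetch a URL; the agent name `MyCrawler` is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real crawler would download /robots.txt first.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/index.html", "https://example.com/private/report.html"):
        print(url, "allowed" if rules.can_fetch("MyCrawler", url) else "disallowed")
    print("crawl delay:", rules.crawl_delay("MyCrawler"))
```

For a live site, `set_url()` followed by `read()` makes the parser download the file itself instead of parsing a string.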
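deface-no-tnx monitors web files for unauthorized changes. One common way to detect such changes, assumed here rather than taken from the project, is to record a cryptographic hash of every file and compare later snapshots against that baseline. The sketch covers detection only (replacing a defaced page with an error page is left out), and `snapshot`, `diff_snapshots`, and `demo` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(baseline: Dict[str, str], current: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, modified) relative paths between two snapshots."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    modified = sorted(k for k in baseline.keys() & current.keys() if baseline[k] != current[k])
    return added, removed, modified


def demo() -> Tuple[List[str], List[str], List[str]]:
    """Simulate a defacement in a temporary web root and report what changed."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.html").write_text("<h1>Welcome</h1>")
        (root / "about.html").write_text("<p>About</p>")
        baseline = snapshot(root)
        (root / "index.html").write_text("<h1>Hacked!</h1>")  # simulated defacement
        (root / "evil.php").write_text("<?php /* dropped file */ ?>")  # simulated new file
        return diff_snapshots(baseline, snapshot(root))


if __name__ == "__main__":
    added, removed, modified = demo()
    print("added:", added, "removed:", removed, "modified:", modified)
```

A real monitor would run the comparison periodically and keep the baseline somewhere an attacker who can write to the web root cannot modify.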
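SmartCrawler is described as a multi-threaded crawler with dynamically pluggable filters. The sketch below illustrates that design in Python under stated assumptions: filters are plain predicates on URLs, each breadth-first frontier level is fetched on a thread pool, and a hypothetical in-memory `SITE` dictionary stands in for real HTTP requests. None of these names come from SmartCrawler itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

UrlFilter = Callable[[str], bool]

# Hypothetical in-memory "site": URL -> outgoing links. Stands in for real HTTP fetching.
SITE: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b.pdf", "https://other.org/"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/c": [],
}


def fetch_links(url: str) -> List[str]:
    """Placeholder fetch: look the URL up in SITE instead of downloading it."""
    return SITE.get(url, [])


def same_prefix(prefix: str) -> UrlFilter:
    """Filter that keeps only URLs starting with prefix (e.g. one host)."""
    return lambda url: url.startswith(prefix)


def not_extension(ext: str) -> UrlFilter:
    """Filter that rejects URLs ending with the given extension."""
    return lambda url: not url.endswith(ext)


def crawl(seed: str, filters: List[UrlFilter], workers: int = 4) -> List[str]:
    """Breadth-first crawl; a link is followed only if every filter accepts it."""
    visited = {seed}
    order = [seed]
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # Fetch the whole frontier concurrently; map() yields results in input order.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in visited and all(f(link) for f in filters):
                        visited.add(link)
                        order.append(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return order


if __name__ == "__main__":
    print(crawl("https://example.com/", [same_prefix("https://example.com/"), not_extension(".pdf")]))
```

Because filters are ordinary callables passed in a list, new ones can be added or removed at run time without touching the crawl loop.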