Showing 123 open source projects for "python web crawler"

  • 1
    dynamide
    dynamide is a dynamic web application framework for handling the presentation and business layers in a traditional web app. See http://dynamide.com
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Constellio Enterprise Search engine

    Open source Search Engine and Enterprise Search

    Constellio is an enterprise search engine that allows companies to search all their organization's information through a single interface (Web, CRM, ERP, ECM, Mail, etc.). Constellio is based on Apache Solr and Google Search Appliance's connector. Constellio has a powerful web crawler.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    PACS VM

    ISO - Customized version of dcm4chee 2.17.3 for MySQL.

    1. Add JBoss Application Server 4.2.3.GA for JDK 6. 2. Cleanup for Windows and deprecated files. 3. Off CONSOLE records - http://forums.dcm4che.org/jiveforums/thread.jspa?messageID=4787
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Yet another web crawler? Yes, but this one uses the full power of regular expressions to accept or reject, examine or ignore, save or refuse pages. You can also use MIME types to do all this. Powerful and flexible.
    Downloads: 4 This Week
    Last Update:
    See Project
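
    The regex-and-MIME filtering idea described in entry 4 can be sketched roughly as follows. This is a minimal illustration in Python, not that project's code or API: the `PageFilter` class, its field names, and the example patterns and URLs are all hypothetical, and the rule that a reject pattern always wins is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PageFilter:
    """Hypothetical accept/reject rules for a crawler (illustrative only)."""
    accept_url: list = field(default_factory=list)   # URL regexes to allow
    reject_url: list = field(default_factory=list)   # URL regexes to refuse
    accept_mime: list = field(default_factory=list)  # MIME-type regexes to allow

    def allows(self, url: str, mime: str) -> bool:
        # Drop parameters such as "; charset=utf-8" before matching the MIME type.
        mime = mime.split(";")[0].strip().lower()
        # Assumption: a matching reject pattern always wins.
        if any(re.search(p, url) for p in self.reject_url):
            return False
        # If accept patterns are given, at least one must match the URL.
        if self.accept_url and not any(re.search(p, url) for p in self.accept_url):
            return False
        # Same for MIME types, matched against the whole type string.
        if self.accept_mime and not any(re.fullmatch(p, mime) for p in self.accept_mime):
            return False
        return True


# Example rules: stay on one (made-up) site, skip archives, keep HTML and images.
rules = PageFilter(
    accept_url=[r"^https?://example\.org/"],
    reject_url=[r"\.(zip|exe)$"],
    accept_mime=[r"text/html", r"image/.*"],
)
```

    A crawler loop would call `rules.allows(url, content_type)` after reading the response headers and skip any page for which it returns `False`.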
  • 5
    Screenshot Paste plugin for Trac

    A Trac plugin to allow pasting screenshots or images with one click

    A Trac plugin that lets you paste screenshots or other images captured or copied to the clipboard directly as attachments to tickets, Wiki pages, etc., without first saving them as image files and then uploading them. Once the plugin is installed in Trac, you can attach a screenshot or any image in the clipboard to a ticket or Wiki page with one click.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Seven-Labs

    Application Development

    This repository serves as our entire project space, which contains all of the open-source projects we've worked on: C/C++, C#/.NET, PHP, HTML5/CSS3.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 8
    ERP / BI / NFe
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    This is the Open Source RESTful client for the take.io platform.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 10
    I host the global virtual machine here! It is a virtual machine built on top of the JVM, which provides unified access to resources, including threads and files, on every node.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Ex-Crawler
    Ex-Crawler is divided into 3 subprojects (crawler daemon, distributed GUI client, and web search engine) which together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    A school project consisting of a crawler, a server and a searchpage.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    ItSucks
    This project is a Java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    Reporting engine library written in C. Create one XML file and generate PDF, HTML, TXT, and CSV reports based on queries. Has support for MySQL, PostgreSQL, ODBC. Bindings for PHP, Java, Python.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Freejack is an MVC framework for quickly building dynamic websites using Freemarker as the template engine and Python for the controllers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    MaxMind GeoIP is a set of APIs for looking up the location of an IP address, including the country, region, city, latitude, and longitude. Free GeoLite databases are available at http://www.maxmind.com/app/geolitecity
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    MuSE-CIR is a Multigram-based Search Engine and Collaborative Information Retrieval system. Written in Java/JSP, it supports any JDBC-connectable database (thoroughly tested only with OracleXE, and somewhat with MySQL) and runs JSP on Apache Tomcat 5.5.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Web-as-corpus tools in Java: a simple crawler (with Nutch and Heritrix integration), an HTML cleaner to remove boilerplate code, language recognition, and a corpus builder.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    JyWeb is a lightweight web server written in Jython. Jython itself is included in the archives for your convenience, and in the release folder in SVN, so you can build your own jars.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ZK Light has been renamed to ZKuery and moved to http://code.google.com/p/zkuery/. ZK Light is a client-only version of ZK; it supports Java, C, PHP, Python...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    jSEO -- Pluggable SEO (Search Engine Optimization) for dynamic JEE web applications
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    nxs crawler is a program to crawl the internet. The program generates random IP addresses and attempts to connect to the hosts. If a host answers, the result is saved in an XML file. After that, the crawler disconnects... Additionally you can
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 24
    Barding is the quest to find and interact with interesting people. It is like geocaching, but with humans. The official goal of the game is to find Bards and collect their stories, songs, poems and artistic expressions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MyOODB is a Database, Web and Application Framework. A holistic approach to software development. Bringing the power of Object-Oriented-Design back to Software Development (Java, Jython, and JavaScript/AJAX).
    Downloads: 0 This Week
    Last Update:
    See Project