Showing 10 open source projects for "web crawler source code"

View related business solutions
  • Quality and compliance software for growing life science companies Icon
    Quality and compliance software for growing life science companies

    Unite quality management, product lifecycle, and compliance intelligence to stay continuously audit-ready and accelerate market entry

    Automate gap analysis across FDA, ISO 13485, MDR, and 28+ regulatory standards. Cross-map evidence once, reuse across submissions. Get real-time risk alerts and board-ready dashboards, so you can expand into new markets with confidence
    Learn More
  • AI Powered Global HCM for the Evolving World of Work Icon
    AI Powered Global HCM for the Evolving World of Work

    For Start-ups, SME's, Large Enterprise

    Darwinbox is a new-age & disruptive mobile-first, cloud-based HRMS platform built for the large enterprises to attract, engage and nurture their most critical resource - talent. It is an end-to-end integrated HR system that aids in streamlining activities across the employee lifecycle (Hire to Retire). Our powerful enterprise product features are built with a clear focus on intuitiveness and scalability, with standards of best in class consumer apps. Darwinbox’s motto is to engage, empower, and inspire employees on one side in addition to automating and simplifying all HR processes for the enterprise on the other. Over 350+ leading enterprises with 850k users manage their entire employee lifecycle on this unified platform.
    Learn More
  • 1
    Framework (scripts, configuration, code) to build free and public services around travel and leisure data. That project makes an extensive use of already existing data sources such as Geonames and dbPedia, and adds some glue around those (eg, links).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    elk is a powerful open-source python based command-line web crawler that can recursively search for files and text on websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    "Filtered communication" is the source code for a website which facilitates collaborative filtering of information on the internet. Users can create "filters", criteria which are defined in English. Activity mode (http://bayleshanks.com/pamv1): aslee
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    FTP crawler is designed to provide an easy web interface to searching files on the FTP and a crawler to index files on FTP servers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Haystack is a modern, engaging, and intuitive intranet platform that employees actually use. Icon
    Haystack is a modern, engaging, and intuitive intranet platform that employees actually use.

    You Deserve the Best Intranet Experience

    With customizable iOS and Android mobile apps, Slack and Microsoft Teams integrations, and an intuitive design employees love, Haystack brings an outstanding digital employee experience to your entire workforce, no matter where their work takes them.
    Learn More
  • 5
    zSearch is a simple python based crawler and search engine. Raw HTML are stored in bzip2 archives, the index is created using pylucene, and twsited is used to provide internal http server. Results are sent back as XML over HTTP.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    A configurable knowledge management framework. It works out of the box, but it's meant mainly as a framework to build complex information retrieval and analysis systems. The 3 major components: Crawler, Analyzer and Indexer can also be used separately.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Webcomic Archive and News Generator (WANG) is a database driven PHP application built for both aspiring and existing web comics. Written with a focus on security and speed, the code is built to be easy to use for code novices and experts alike.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    PySMBSearch is a crawler and search engine for SMB shares. It consists of a crawler script, which creates an index and stores it in an SQL database, and a CGI script that can be used to extract queries from the database.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Frosttie (FROnt-end SchemaTron Text Internet Engine) takes XHTML pages and processes them with various user-definable filters such a W3C's WAI, Section 508 (US) web usability compliance, ad removal, etc. It can be used with zKnowMan.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Feroot AI automates website security with 24/7 monitoring Icon
    Feroot AI automates website security with 24/7 monitoring

    Trusted by enterprises, healthcare providers, retailers, SaaS platforms, payment service providers, and public sector organizations.

    Feroot unifies JavaScript behavior analysis, web compliance scanning, third-party script monitoring, consent enforcement, and data privacy posture management to stop Magecart, formjacking, and unauthorized tracking.
    Learn More
  • 10
    Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB