Showing 113 open source projects for "pentaho data integration"

  • 1
    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source Python initiative that extends the power of the Pandas library to AWS, connecting DataFrames with AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON, and Excel). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute common ETL tasks like loading and unloading data from data lakes, data warehouses, and databases. ...
    Downloads: 1 This Week
  • 2
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 3 This Week
  • 3
    fluentbit

    Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

    ...No more OOM errors! Integration with all your technology, cloud-native services, containers, streaming processors, and data backends. Fully event-driven design leverages the operating system API for performance and reliability. All operations to collect and deliver data are asynchronous.
    Downloads: 12 This Week
  • 4
    SkyCrypt

    A Hypixel skyblock stats website

    SkyCrypt is a web-based application that allows players of Hypixel SkyBlock to view and share detailed information about their in-game profiles through a visually rich interface. It aggregates data from the Hypixel API and presents it in an organized format, including player statistics, skills, equipment, and inventory details. The project is built with a Node.js-based stack and integrates additional technologies such as MongoDB and Redis to handle data storage and caching. SkyCrypt enhances...
    Downloads: 5 This Week
  • 5
    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments...
    Downloads: 3 This Week
  • 6
    Endroid QR Code

    QR Code Generator

    Endroid QR Code is a PHP library that allows developers to generate QR codes with customizable parameters. It supports creating QR codes in various formats, including PNG and SVG, and offers options for encoding URLs, text, or other data. The library is flexible and easy to integrate into applications that require QR code generation, such as ticketing systems or payment gateways.
    Downloads: 6 This Week
  • 7
    Open Web Analytics (OWA)

    Official repository for Open Web Analytics

    Open Web Analytics (OWA) is an open-source web analytics framework that tracks and analyzes visitor behavior on websites and applications. It provides insights into page views, user demographics, and engagement metrics. OWA can be self-hosted, giving users full control over their data. It is an alternative to commercial analytics platforms and supports integration with WordPress and other CMS platforms.
    Downloads: 0 This Week
  • 8
    SearXNG

    Free internet metasearch engine which aggregates results from multiple search services

    SearXNG is a free and open-source metasearch engine designed to aggregate results from multiple search engines while prioritizing user privacy and anonymity. Instead of maintaining its own index, it queries numerous external search providers and merges the results into a single interface, increasing coverage and diversity of information. One of its core principles is privacy, as it does not track users, store personal data, or create search profiles, making it a strong alternative to...
    Downloads: 13 This Week
  • 9
    EmDash

    EmDash is a full-stack TypeScript CMS based on Astro

    ...It emphasizes modularity and extensibility, allowing developers to define custom content types, editing experiences, and workflows tailored to their needs. The system likely includes a rich editing interface, enabling users to create complex documents with structured data and formatting. It separates content from presentation, making it easier to reuse and distribute content across different platforms. The architecture supports integration with APIs and external services, enabling it to function as part of a larger content ecosystem. It is particularly useful for applications that require dynamic content management and scalable publishing workflows. ...
    Downloads: 5 This Week
  • 10
    YourInfo

    Real-time browser fingerprinting demo with cross-browser tracking

    YourInfo is a personal information management tool designed to let users securely store, structure, and retrieve their key data — such as contacts, credentials, personal notes, and preferences — while also enabling AI-assisted queries or reminders using that data. The platform prioritizes privacy by focusing on local storage or user-controlled databases, ensuring sensitive data stays under the user’s control rather than in third-party servers. Users can define types of information, tag...
    Downloads: 0 This Week
  • 11
    ZetaJS

    JS wrapper for ZetaOffice in the browser

    The zeta.js library provides the facilities to run an instance of ZetaOffice integrated into your web site, allowing you to control it with JavaScript code via the LibreOffice UNO technology. Use cases range from an in-browser office suite that looks and feels just like its desktop counterpart, to fine-tuned custom text editing and spreadsheet capabilities embedded in your website, to a headless zetajs instance that does document conversion in the background.
    Downloads: 0 This Week
  • 12
    fess

    Open source enterprise search server for websites, files, and data

    Fess is an open source enterprise search server designed to provide powerful full-text search capabilities across multiple data sources. It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can...
    Downloads: 1 This Week
  • 13
    BrowserOS

    Agentic browser; privacy-first alternative to ChatGPT Atlas

    BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than just doing standard browsing, it places AI at the core: you can connect your own API keys (e.g., for OpenAI, Anthropic, or Google Gemini) or run local models (e.g., via Ollama) so that your browsing data and automation stay on your machine; privacy and control are emphasized throughout. The interface remains familiar to users of...
    Downloads: 16 This Week
  • 14
    Scira

    AI-powered search engine that helps you find information

    Scira is an open source AI-powered search and research assistant designed to provide fast, conversational answers grounded in web and knowledge sources. The project combines a modern web interface with retrieval-augmented generation techniques to deliver responses that are both natural language friendly and evidence oriented. It is built for developers who want to deploy their own Perplexity-style or AI search experience without relying on proprietary hosted services. Scira emphasizes speed,...
    Downloads: 2 This Week
  • 15
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based...
    Downloads: 0 This Week
  • 16
    Shynet

    Modern, privacy-friendly, and detailed web analytics

    Modern, privacy-friendly, and detailed web analytics that works without cookies or JS. There are a lot of web analytics tools. Unfortunately, most of them come with the following caveats: they require handing all of your visitors' info to a third-party company; they use cookies to track visitors across sessions, so you need those annoying cookie notices; they collect so much personal data that even the NSA is jealous; they are closed source and/or expensive, often with limited data...
    Downloads: 0 This Week
  • 17
    Scrapling

    An adaptive Web Scraping framework

    ...The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 3 This Week
  • 18
    Grafana Mimir

    Grafana Mimir provides long-term storage for Prometheus

    Grafana Mimir is an open-source, horizontally scalable, long-term storage solution for Prometheus metrics. Built by Grafana Labs, Mimir is designed to handle massive volumes of time-series data efficiently while maintaining high availability and reliability. It enables organizations to scale their Prometheus infrastructure without the typical limitations of single-server setups. Mimir is used to power Grafana Cloud Metrics and is built to be fully compatible with Prometheus, allowing easy integration into existing monitoring workflows.
    Downloads: 0 This Week
  • 19
    Synapse Machine Learning

    Simple and distributed Machine Learning

    ...SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, Cognitive Services, Vowpal Wabbit, and OpenCV. These tools enable powerful and highly scalable predictive and analytical models for a variety of data sources. SynapseML also brings new networking capabilities to the Spark ecosystem. ...
    Downloads: 0 This Week
  • 20
    CloudServer

    Zenko CloudServer open-source Node.js implementation of S3 protocol

    Zenko CloudServer is an open-source Node.js implementation of the Amazon S3 protocol on the front end, with backend storage capabilities spanning multiple clouds, including Azure and Google.
    Downloads: 1 This Week
  • 21
    GoatCounter

    Easy web analytics. No tracking of personal data

    GoatCounter is an open-source web analytics platform available as a hosted service (free for non-commercial use) or self-hosted app. It aims to offer easy-to-use and meaningful privacy-friendly web analytics as an alternative to Google Analytics or Matomo. Privacy-aware; doesn’t track users with unique identifiers and doesn't need a GDPR notice. Fine-grained control over which data is collected. Also see the privacy policy and GDPR consent notices. Lightweight and fast; adds just ~3.5KB of...
    Downloads: 0 This Week
  • 22
    AWS X-Ray Daemon

    The AWS X-Ray daemon listens for traffic on UDP port 2000

    The AWS X-Ray daemon listens for traffic on UDP port 2000, gathers raw segment data, and relays it to the AWS X-Ray API. The daemon works in conjunction with the AWS X-Ray SDKs and must be running so that data sent by the SDKs can reach the X-Ray service. The X-Ray SDK sends segment documents to the daemon to avoid making calls to AWS directly. You can send the segment/subsegment in JSON over UDP port 2000 to the X-Ray daemon, prepended by the daemon header.
    Downloads: 2 This Week
  • 23
    news-please

    Python tool for crawling and extracting structured data from news sites

    news-please is an open source news crawler and information extraction tool designed to collect and structure articles from online news websites. It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a...
    Downloads: 1 This Week
  • 24
    MyBatis Pagination

    A pagination plugin

    If you are using MyBatis, this pagination plugin is worth trying; its authors describe it as the most convenient pagination plugin available. PageHelper supports arbitrarily complex single-table and multi-table queries; for special cases, refer to the Important notes. Comprehensive sample code and tests explain the basic usage of create, read, update, and delete operations in MyBatis XML mode and annotation mode, and the application of dynamic SQL in different aspects and the...
    Downloads: 0 This Week
  • 25
    Surmon.me

    Personal website and blog

    Surmon.me is a full-featured personal website and blog platform built with Vue and designed as part of a larger ecosystem of interconnected applications and services. The project functions as a server-side rendered (SSR) web application that delivers content dynamically while maintaining performance and SEO optimization. It is powered by a dedicated backend service called NodePress, which provides RESTful APIs for content management, data retrieval, and system operations. The platform is not...
    Downloads: 6 This Week
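The spider_collection entry mentions proxy pool integration and encapsulation for crawling sites that enforce request limits. The listing does not show how that pool works, so the following is only a minimal, hypothetical sketch: the `ProxyPool` class, its methods, the proxy URLs, and the eviction rule (drop a proxy after `max_failures` consecutive failures) are all assumptions for illustration, not code from the project.

```python
from collections import deque


class ProxyPool:
    """Hypothetical round-robin proxy pool (not spider_collection's actual code).

    Proxies are handed out in rotation. A proxy is dropped from rotation after
    `max_failures` consecutive failures; a success resets its failure counter.
    """

    def __init__(self, proxies: list[str], max_failures: int = 3) -> None:
        unique = list(dict.fromkeys(proxies))  # de-duplicate while keeping order
        if not unique:
            raise ValueError("at least one proxy is required")
        self._rotation = deque(unique)
        self._failures = {proxy: 0 for proxy in unique}
        self.max_failures = max_failures

    def get(self) -> str:
        """Return the next proxy in round-robin order."""
        if not self._rotation:
            raise RuntimeError("all proxies have been evicted")
        proxy = self._rotation[0]
        self._rotation.rotate(-1)  # move the proxy just handed out to the back
        return proxy

    def report_success(self, proxy: str) -> None:
        if proxy in self._failures:
            self._failures[proxy] = 0

    def report_failure(self, proxy: str) -> None:
        if proxy not in self._failures:
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures and proxy in self._rotation:
            self._rotation.remove(proxy)  # evict after too many consecutive failures

    def __len__(self) -> int:
        return len(self._rotation)
```

A crawler would call `get()` before each request and then report the outcome, which keeps proxy bookkeeping out of the spider logic itself.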
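The SearXNG entry says the engine queries numerous external providers and merges their results into a single list. One common way to merge ranked lists is reciprocal-rank fusion (RRF), used here purely as an illustration: it is not SearXNG's own scoring formula, and the `merge_results` function, the engine names, and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def merge_results(engine_results: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine ranked URL lists with reciprocal-rank fusion (RRF).

    Each URL earns 1 / (k + rank) from every engine that returns it, so URLs
    ranked highly by several engines rise to the top of the merged list.
    """
    scores: dict[str, float] = defaultdict(float)
    for urls in engine_results.values():
        seen: set[str] = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:
                continue  # count each URL at most once per engine
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by URL for deterministic output.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

Because only ranks are used, engines with incomparable relevance scores can be combined without normalizing those scores first.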
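The AWS X-Ray Daemon entry describes its wire format: a segment document sent as JSON over UDP port 2000, prepended by a daemon header. Per the AWS X-Ray documentation, that header is the line `{"format": "json", "version": 1}` followed by a newline, and a trace ID has the form `1-<8 hex digits of epoch seconds>-<24 hex digits>`. The helper names below (`new_trace_id`, `build_segment`, `encode_for_daemon`, `send_segment`) are hypothetical, and the segment carries only the minimal fields (`name`, `id`, `trace_id`, `start_time`, `end_time`); in practice the X-Ray SDKs do this for you.

```python
import json
import os
import socket
import time
from typing import Optional

# Header line the X-Ray daemon expects before each JSON segment document.
DAEMON_HEADER = '{"format": "json", "version": 1}\n'


def new_trace_id(epoch_seconds: Optional[float] = None) -> str:
    """Return '1-' + 8 hex digits of epoch seconds + '-' + 24 random hex digits."""
    seconds = int(time.time() if epoch_seconds is None else epoch_seconds)
    return f"1-{seconds:08x}-{os.urandom(12).hex()}"


def build_segment(name: str, start_time: float, end_time: float) -> dict:
    """Build a minimal, already-completed segment document."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),  # 64-bit segment ID as 16 hex digits
        "trace_id": new_trace_id(start_time),
        "start_time": start_time,  # epoch seconds as a float
        "end_time": end_time,
    }


def encode_for_daemon(segment: dict) -> bytes:
    """Prepend the daemon header to the JSON-encoded segment."""
    return (DAEMON_HEADER + json.dumps(segment)).encode("utf-8")


def send_segment(segment: dict, host: str = "127.0.0.1", port: int = 2000) -> None:
    """Send one datagram; UDP gives no confirmation that the daemon received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_for_daemon(segment), (host, port))
```

With a daemon running locally, `send_segment(build_segment("my-app", start, time.time()))` (where `my-app` and `start` are placeholders) would hand one completed segment to it for relay to the X-Ray API.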
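The MyBatis Pagination (PageHelper) entry does not show the plugin's Java API, so this is not PageHelper code. It is a minimal Python sketch, using the standard-library `sqlite3` module, of the general technique a pagination plugin automates: turning a 1-based page number and page size into `LIMIT`/`OFFSET`, and wrapping the original query in a `COUNT(*)` subquery to get the total row count. The names `page_bounds` and `paginate` are hypothetical, and the sketch assumes `sql` is a single `SELECT` with no trailing semicolon and no `LIMIT` clause of its own.

```python
import sqlite3
from typing import Any, Sequence


def page_bounds(page_num: int, page_size: int) -> tuple[int, int]:
    """Convert a 1-based page number and page size into (limit, offset)."""
    if page_num < 1 or page_size < 1:
        raise ValueError("page_num and page_size must both be >= 1")
    return page_size, (page_num - 1) * page_size


def paginate(conn: sqlite3.Connection, sql: str, params: Sequence[Any],
             page_num: int, page_size: int) -> dict:
    """Run `sql` for one page and report the total rows and total pages."""
    limit, offset = page_bounds(page_num, page_size)
    # Count query: wrap the original SELECT so its joins and WHERE clause still apply.
    total = conn.execute(
        f"SELECT COUNT(*) FROM ({sql}) AS tmp_count", tuple(params)
    ).fetchone()[0]
    # Page query: append LIMIT/OFFSET as bound parameters.
    rows = conn.execute(
        f"{sql} LIMIT ? OFFSET ?", (*params, limit, offset)
    ).fetchall()
    pages = (total + page_size - 1) // page_size  # ceiling division
    return {"total": total, "pages": pages, "rows": rows}
```

The count query and the page query run separately, so rows inserted or deleted between the two calls can leave `total` and `rows` slightly out of step.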