Search Results for "data processing" - Page 5

Showing 1673 open source projects for "data processing"

  • 1
    Meetily

    Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper

    This project is a privacy-first AI meeting assistant that captures meeting audio, produces real-time transcripts, and generates summaries while keeping processing entirely on your own machine or infrastructure. It’s built for organizations that want meeting intelligence without sending recordings or transcripts to third-party cloud services, which helps address compliance and data sovereignty requirements. The app supports live transcription with local model options (including Whisper- and Parakeet-based workflows) and presents the transcript as the meeting happens, making it useful both for note-taking and accessibility. ...
    Downloads: 20 This Week
  • 2
    Kingfisher

    Lightweight, pure-Swift library for downloading images from the web

    Kingfisher is a powerful, pure-Swift library for downloading and caching images from the web. It lets you work with remote images in a pure-Swift way in your next app. Features include asynchronous image downloading and caching; loading images from either URLSession-based networking or locally provided data; built-in image processors and filters; a multi-layer hybrid cache for both memory and disk; fine control over cache behavior; and customizable expiration dates and size limits. ...
    Downloads: 0 This Week
  • 3
    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    ...The architecture is modular, allowing developers to build customizable processing pipelines that integrate with external tools and data extraction frameworks. Sparrow also includes workflow orchestration tools that allow multiple extraction tasks to be combined into automated pipelines for large-scale document processing.
    Downloads: 0 This Week
  • 4
    pg_analytics

    DuckDB-powered analytics for Postgres

    pg_analytics (formerly named pg_lakehouse) puts DuckDB inside Postgres. With pg_analytics installed, Postgres can query foreign object stores like AWS S3 and table formats like Iceberg or Delta Lake. Queries are pushed down to DuckDB, a high-performance analytical query engine. By transforming Postgres into a performant search and analytics engine, ParadeDB frees your team from the pain of scaling and syncing Elasticsearch.
    Downloads: 50 This Week
  • 5
    AionUi

    Free, local, open-source Cowork for Gemini CLI, Claude Code, Codex

    ...Instead of forcing users to work in separate terminals for each tool, AionUi automatically detects installed CLI tools and provides a central visual workspace where sessions can run in parallel, contexts are preserved, and conversations are saved locally without sending data to external servers. It enhances productivity by offering smart file management features like batch renaming, automatic organization, and intelligent file classification, thereby reducing manual overhead when working with large datasets or complex document structures. AionUi also supports a remote WebUI mode, allowing users to access their local AI tools securely over a network from other devices while keeping all processing and data on their own hardware.
    Downloads: 46 This Week
  • 6
    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. ...
    Downloads: 0 This Week
  • 7
    WebP Codec

    Library to encode and decode images in WebP format

    libwebp is the reference codec library for Google’s WebP image format, providing both encoding and decoding along with command-line tools. It supplies cwebp to compress images into WebP and dwebp to decompress them back, making it easy to test quality/size trade-offs across presets and tuning parameters. The GitHub repository is a mirror; the canonical source of truth lives on Chromium’s git, and developer docs are hosted on WebP’s portal. The project underpins WebP support across browsers,...
    Downloads: 29 This Week
  • 8
    CocoIndex

    ETL framework to index data for AI, such as RAG

    CocoIndex is an open-source framework designed for building powerful, local-first semantic search systems. It lets users index and retrieve content based on meaning rather than keywords, making it ideal for modern AI-based search applications. CocoIndex leverages vector embeddings and integrates with various models and frameworks, including OpenAI and Hugging Face, to provide high-quality semantic understanding. It’s built for transparency, ease of use, and local control over your search...
    Downloads: 1 This Week
  • 9
    Apache Flink

    Stream processing framework with powerful stream

    Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with.
    Downloads: 1 This Week
  • 10
    PHP Code Coverage

    Collection, processing, and rendering functionality for PHP code

    The php-code-coverage library, authored by Sebastian Bergmann, enables collection, processing, and rendering of PHP code coverage data. It integrates with PHPUnit or other testing frameworks to track which lines, methods, or classes are executed during tests. The library supports generating detailed reports in formats like HTML, Clover, or XML, helping teams understand test completeness and identify untested code paths.
    Downloads: 1 This Week
  • 11
    mediasoup

    Cutting Edge WebRTC Video Conferencing

    mediasoup is a Node.js library that provides a cutting-edge WebRTC server capable of handling real-time communications with efficient media routing and processing.
    Downloads: 2 This Week
  • 12
    Jimp

    An image processing library written entirely in JavaScript for Node

    An image processing library for Node written entirely in JavaScript, with zero native dependencies. If you're using this library with TypeScript, the method of importing differs slightly from JavaScript: instead of using require, you must import it with the ES6 default import scheme. If you're using a web bundler (webpack, rollup, parcel), you can benefit from using the module build of jimp. Using the module build will allow your bundler to understand your code better and exclude things you aren't...
    Downloads: 2 This Week
  • 13
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 3 This Week
  • 14
    Apache Sedona

    Cluster computing framework for processing large-scale geospatial data

    Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. According to our benchmark and third-party research papers, Sedona runs 2X - 10X faster than other Spark-based geospatial data systems on computation-intensive query workloads. ...
    Downloads: 0 This Week
  • 15
    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.
    Downloads: 1 This Week
  • 16
    Fast CSV

    CSV parser and formatter for node

    A high-performance Node.js library for parsing and formatting CSV data efficiently.
    Downloads: 1 This Week
  • 17
    Kestra

    Kestra is an infinitely scalable orchestration and scheduling platform

    Build reliable workflows, blazingly fast, deploy in just a few clicks. Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate...
    Downloads: 6 This Week
  • 18
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 0 This Week
  • 19
    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 3 This Week
  • 20
    ScreenPipe

    AI app store powered by 24/7 desktop history. Open source.

    Screenpipe is an AI app store powered by continuous desktop history recording. It operates entirely locally, offering developers a platform to build, distribute, and monetize AI applications that leverage comprehensive contextual data from users' desktop activities.
    Downloads: 35 This Week
  • 21
    DeepBI

    LLM-based data scientist, AI-native data application

    DeepBI is an AI-native data analysis platform. DeepBI leverages the power of large language models to explore, query, visualize, and share data from any data source. Users can use DeepBI to gain data insight and make data-driven decisions.
    Downloads: 0 This Week
  • 22
    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    ...The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 5 This Week
  • 23
    CBIG

    Computational Brain Imaging Group tools

    CBIG is a comprehensive toolkit maintained by Thomas Yeo’s Computational Brain Imaging Group, containing tools for processing and analyzing neuroimaging data, including fMRI preprocessing pipelines, brain parcellation algorithms, mental disorder subtyping models, fMRI dynamic models, registrations between brain spaces, and phenotypic prediction algorithms. After cloning or downloading this repository, see the README inside the setup directory for instructions on how to set up your local environment to be compatible with the repository. ...
    Downloads: 105 This Week
  • 24
    Nuclio

    High-Performance Serverless event and data processing platform

    Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. It offers real-time performance, running up to 400,000 function invocations per second, and is portable across low-power devices, laptops, edge, on-prem and multi-cloud deployments. It is the first serverless platform supporting GPUs for optimized utilization and sharing, with automated deployment to production in a few clicks from a Jupyter notebook. Deploy one of...
    Downloads: 8 This Week
  • 25
    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 0 This Week