Search Results for "data processing" - Page 5

Showing 1649 open source projects for "data processing"

  • 1
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 1 This Week
  • 2
    sharp

    High performance Node.js image processing module

    The typical use case for this high speed Node.js module is to convert large images in common formats to smaller, web-friendly JPEG, PNG, AVIF and WebP images of varying dimensions. Resizing an image is typically 4x-5x faster than using the quickest ImageMagick and GraphicsMagick settings due to its use of libvips. Colour spaces, embedded ICC profiles and alpha transparency channels are all handled correctly. Lanczos resampling ensures quality is not sacrificed for speed. As well as image...
    Downloads: 6 This Week
  • 3
    E2M

    E2M converts various file types (doc, docx, epub, html, htm, url

    E2M is a SourceForge mirror of the e2m open-source project, which focuses on providing tools or services designed to convert or process content between different formats or systems. Projects with similar naming conventions typically emphasize automation workflows where input data from one environment is transformed into another representation or output structure. The mirrored repository allows users to access the project’s codebase independently from its original hosting platform while preserving the development history and release artifacts. Systems like e2m often serve as middleware components that connect different software systems or facilitate data processing pipelines. ...
    Downloads: 0 This Week
  • 4
    JOSE JWT

    Ultimate JavaScript Object Signing and Encryption (JOSE)

    Minimalistic zero-dependency library for generating, decoding, and encrypting JSON Web Tokens. Supports the full suite of JSON Web Algorithms and JSON Web Keys. It is JSON-parsing agnostic, so any desired JSON processing library can be plugged in. Extensively tested for compatibility with jose.4.j, Nimbus-JOSE-JWT, and json-jwt libraries. JWE JSON Serialization cross-tested with JWCrypto.
    Downloads: 6 This Week
  • 5
    Jimp

    An image processing library written entirely in JavaScript for Node

    An image processing library for Node written entirely in JavaScript, with zero native dependencies. If you're using this library with TypeScript, the method of importing differs slightly from JavaScript. Instead of using require, you must import it with the ES6 default import scheme. If you're using a web bundler (webpack, rollup, parcel), you can benefit from using the module build of jimp. Using the module build will allow your bundler to understand your code better and exclude things you aren't...
    Downloads: 4 This Week
  • 6
    Logstash

    Centralize, transform and stash your data

    Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.
    Downloads: 6 This Week
  • 7
    Fast CSV

    CSV parser and formatter for node

    A high-performance Node.js library for parsing and formatting CSV data efficiently.
    Downloads: 2 This Week
  • 8
    TDengine

    Open-source time-series database with high performance and scalability

    TDengine enables efficient, real-time data ingestion, processing and monitoring of TB and even PB scale data per day, generated by billions of sensors and data collectors. TDengine can be widely applied to IoT, Industrial Internet, Connected Vehicles, DevOps, Energy, Finance and many other use cases. TDengine’s innovative design and purpose-built storage engine outperform other time-series databases for data ingestion, querying and data compression while significantly reducing storage and computing costs. ...
    Downloads: 0 This Week
  • 9
    Meetily

    Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper

    This project is a privacy-first AI meeting assistant that captures meeting audio, produces real-time transcripts, and generates summaries while keeping processing entirely on your own machine or infrastructure. It’s built for organizations that want meeting intelligence without sending recordings or transcripts to third-party cloud services, which helps address compliance and data sovereignty requirements. The app supports live transcription with local model options (including Whisper- and Parakeet-based workflows) and presents the transcript as the meeting happens, making it useful both for note-taking and accessibility. ...
    Downloads: 23 This Week
  • 10
    Kestra

    Kestra is an infinitely scalable orchestration and scheduling platform

    Build reliable workflows, blazingly fast, deploy in just a few clicks. Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate...
    Downloads: 8 This Week
  • 11
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 2 This Week
  • 12
    pg_analytics

    DuckDB-powered analytics for Postgres

    pg_analytics (formerly named pg_lakehouse) puts DuckDB inside Postgres. With pg_analytics installed, Postgres can query foreign object stores like AWS S3 and table formats like Iceberg or Delta Lake. Queries are pushed down to DuckDB, a high-performance analytical query engine. By transforming Postgres into a performant search and analytics engine, ParadeDB frees your team from the pain of scaling and syncing Elasticsearch.
    Downloads: 55 This Week
  • 13
    Search-Index

    A persistent, network-resilient, full-text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 5 This Week
  • 14
    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. ...
    Downloads: 0 This Week
  • 15
    PHP Code Coverage

    Collection, processing, and rendering functionality for PHP code

    The php-code-coverage library, authored by Sebastian Bergmann, enables collection, processing, and rendering of PHP code coverage data. It integrates with PHPUnit or other testing frameworks to track which lines, methods, or classes are executed during tests. The library supports generating detailed reports in formats like HTML, Clover, or XML, helping teams understand test completeness and identify untested code paths.
    Downloads: 1 This Week
  • 16
    Scanopy

    Clean network diagrams, one-time setup, zero upkeep

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 9 This Week
  • 17
    Apache Flink

    Stream processing framework with powerful stream

    Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with.
    Downloads: 1 This Week
  • 18
    Apache Sedona

    Cluster computing framework for processing large-scale geospatial data

    Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. According to our benchmark and third-party research papers, Sedona runs 2X - 10X faster than other Spark-based geospatial data systems on computation-intensive query workloads. ...
    Downloads: 0 This Week
  • 19
    AionUi

    Free, local, open-source Cowork for Gemini CLI, Claude Code, Codex

    ...Instead of forcing users to work in separate terminals for each tool, AionUi automatically detects installed CLI tools and provides a central visual workspace where sessions can run in parallel, contexts are preserved, and conversations are saved locally without sending data to external servers. It enhances productivity by offering smart file management features like batch renaming, automatic organization, and intelligent file classification, thereby reducing manual overhead when working with large datasets or complex document structures. AionUi also supports a remote WebUI mode, allowing users to access their local AI tools securely over a network from other devices while keeping all processing and data on their own hardware.
    Downloads: 43 This Week
  • 20
    WebP Codec

    Library to encode and decode images in WebP format

    libwebp is the reference codec library for Google’s WebP image format, providing both encoding and decoding along with command-line tools. It supplies cwebp to compress images into WebP and dwebp to decompress them back, making it easy to test quality/size trade-offs across presets and tuning parameters. The GitHub repository is a mirror; the canonical source of truth lives on Chromium’s git, and developer docs are hosted on WebP’s portal. The project underpins WebP support across browsers,...
    Downloads: 27 This Week
  • 21
    CBIG

    Computational Brain Imaging Group tools

    CBIG is a comprehensive toolkit maintained by Thomas Yeo’s Computational Brain Imaging Group containing tools for processing and analyzing neuroimaging data, including fMRI preprocessing pipelines, brain parcellation algorithms, mental disorder subtyping models, fMRI dynamic models, registrations between brain spaces, and phenotypic prediction algorithms. After cloning or downloading this repository, see the README inside the setup directory for instructions on how to set up your local environment to be compatible with the repository. ...
    Downloads: 121 This Week
  • 22
    DeepBI

    LLM-based data scientist, AI-native data application

    DeepBI is an AI-native data analysis platform. DeepBI leverages the power of large language models to explore, query, visualize, and share data from any data source. Users can use DeepBI to gain data insight and make data-driven decisions.
    Downloads: 0 This Week
  • 23
    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 0 This Week
  • 24
    ScreenPipe

    AI app store powered by 24/7 desktop history. Open source

    Screenpipe is an AI app store powered by continuous desktop history recording. It operates entirely locally, offering developers a platform to build, distribute, and monetize AI applications that leverage comprehensive contextual data from users' desktop activities.
    Downloads: 34 This Week
  • 25
    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    ...It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. It separates client-side media handling from backend AI processing, reducing data exposure while still enabling transcription and document generation. AI-Media2Doc supports flexible customization through prompts, allowing users to tailor output styles based on their needs. ...
    Downloads: 4 This Week