Showing 455 open source projects for "data processing"

  • 1
    Apache Bigtop

    Bigtop is an Apache Foundation project for Infrastructure Engineers

    ...Developers and operators can use Bigtop to assemble customized Hadoop distributions tailored to their infrastructure and workloads. Its focus on reproducibility and packaging reduces friction in deploying large-scale data processing systems and ensures that different components of the Hadoop ecosystem work well together.
    Downloads: 5 This Week
  • 2
    Thulite

    Web framework designed for speed, security, and SEO

    Thulite is an AI-powered search and recommendation engine that enhances search functionality in applications. It provides intelligent query processing, result ranking, and personalized recommendations.
    Downloads: 1 This Week
  • 3
    RuVector

    Self-Learning, Vector Graph Neural Network, and Database built in Rust

    RuVector is part of the broader rUv ecosystem of AI engineering tools and focuses on enabling advanced vector-based processing and intelligent system development within agentic and AI-driven pipelines. The project fits into a larger vision of modular, composable AI infrastructure designed to support autonomous agents, data retrieval, and intelligent automation workflows. It emphasizes extensibility and interoperability with modern AI stacks, allowing developers to integrate vector operations into search, reasoning, or generative systems. ...
    Downloads: 1 This Week
  • 4
    Llama Cloud Services

    Knowledge Agents and Management in the Cloud

    Llama Cloud Services is a suite of tools designed to facilitate the integration of large language models (LLMs) into applications. It offers components for parsing, extracting, and reporting on complex documents, streamlining the process of preparing data for LLM consumption.
    Downloads: 0 This Week
  • 5
    protoactor-go

    Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin

    Built on cloud-native technologies. Taking advantage of proven stability and performance. Asynchronous and distributed by design. High-level abstractions like Actors and Virtual Grains. Capable of millions of messages per second in cross-process communication. Write systems that self-heal using supervisor hierarchies. The Actor Model provides a higher level of abstraction for writing concurrent and distributed systems. It relieves the developer of having to deal with explicit locking and...
    Downloads: 3 This Week
  • 6
    MathPHP

    Powerful modern math library for PHP

    Math PHP is a library that brings advanced mathematical functions and data analysis capabilities to PHP applications. It covers a wide range of topics, including linear algebra, calculus, statistics, probability, and numerical analysis. Math PHP is designed for developers and data scientists who require precise and efficient mathematical computations in PHP, making it suitable for scientific computing and data processing.
    Downloads: 2 This Week
  • 7
    Colly

    Elegant Scraper and Crawler Framework for Golang

    Colly provides a clean interface to write any kind of crawler/scraper/spider. With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving. Clean API. Fast (>1k requests/sec on a single core). Manages request delays and maximum concurrency per domain. Automatic cookie and session handling. Sync/async/parallel scraping. Distributed scraping. Caching, automatic encoding of non-unicode responses. ...
    Downloads: 4 This Week
  • 8
    OfficeCLI

    OfficeCLI is the first and best command-line tool

    OfficeCLI is a command-line productivity tool designed to bring AI-powered automation into everyday office workflows, enabling users to perform tasks such as document generation, data processing, and communication management directly from the terminal. It focuses on simplifying repetitive business operations by translating natural language commands into structured actions. The system likely integrates with common office tools and formats, allowing seamless interaction with documents, spreadsheets, and communication platforms. ...
    Downloads: 27 This Week
  • 9
    h265web.js

    A HEVC/H.265 Web Player

    ...Its architecture separates parsing, decoding, and rendering, giving developers fine-grained control over how video data is handled and displayed. The system is designed to work with streaming data, allowing incremental feeding of video chunks and real-time decoding, which is useful for surveillance, streaming platforms, and custom media pipelines.
    Downloads: 0 This Week
  • 10
    AudioCraft

    Audiocraft is a library for audio processing and generation

    ...The repo provides inference scripts, checkpoints, and simple Python APIs so you can generate clips from prompts or incorporate the models into applications. It also contains training code and recipes, so researchers can fine-tune on custom data or explore new objectives without building infrastructure from scratch. Example notebooks, CLI tools, and audio utilities help with prompt design, conditioning on reference audio, and post-processing to produce ready-to-share outputs.
    Downloads: 4 This Week
  • 11
    AIOHTTP

    Asynchronous HTTP client/server framework for asyncio and Python

    ...A long-awaited new feature is tracing the client request life cycle to figure out when and why a client request spends time waiting for connection establishment, getting server response headers, etc. This is now possible by registering special signal handlers on every request processing stage. The main change is dropping yield from support and using async/await everywhere. Farewell, Python 3.4. You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. aiohttp allows you to provide these arguments as a dict, using the params keyword argument. aiohttp internally performs URL canonicalization before sending the request.
    Downloads: 55 This Week
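
The query-string behaviour described in the AIOHTTP entry above can be illustrated with a minimal sketch. It uses only Python's standard library (urllib.parse), not aiohttp itself, and the helper name build_url is hypothetical; with aiohttp, the same dict would be passed through the params keyword argument mentioned in the blurb.

```python
from urllib.parse import urlencode


def build_url(base: str, params: dict) -> str:
    """Append a dict of key/value pairs to `base` as a URL query string.

    Hypothetical helper for illustration only; an HTTP client that accepts
    a `params` dict performs an equivalent step before sending the request.
    """
    query = urlencode(params)  # {"key": "val"} -> "key=val"
    return f"{base}?{query}" if query else base


# The example URL from the blurb, rebuilt from a dict instead of by hand.
print(build_url("http://httpbin.org/get", {"key": "val"}))
```

Building the query from a dict also takes care of escaping, so values containing spaces or reserved characters do not have to be encoded by hand.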
  • 12
    Gson

    A Java serialization/deserialization library to convert Java Objects

    Gson is a Java library developed by Google that allows conversion between Java objects and JSON. It enables serialization and deserialization of Java classes to and from JSON format, handling complex and generic types, nulls, custom naming policies, and more. Gson is lightweight, easy to use, and does not require annotation-based configuration, making it a popular choice for JSON processing in Java applications.
    Downloads: 22 This Week
  • 13
    Oban

    Robust job processing in Elixir, backed by modern PostgreSQL

    ...It provides a simple and consistent API for scheduling and performing jobs, and it is built to be fault-tolerant and easy to monitor. Oban is fundamentally different from other background job processing tools because it retains job data for historic metrics and inspection. You can leave your application running indefinitely without worrying about jobs being lost or orphaned due to crashes.
    Downloads: 1 This Week
  • 14
    ANTLR

    Parser generator to read, process, or translate structured text

    ...The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 8 This Week
  • 15
    hash-wasm

    Lightning fast hash functions using hand-tuned WebAssembly binaries

    ...The library supports a wide range of algorithms, including MD5, SHA variants, BLAKE, Argon2, bcrypt, and xxHash, making it suitable for applications ranging from security to data processing. By compiling optimized C implementations into WebAssembly, hash-wasm achieves significantly better performance compared to pure JavaScript alternatives while maintaining portability across platforms. It supports both simple one-shot hashing and advanced streaming modes, allowing developers to process large datasets incrementally. ...
    Downloads: 1 This Week
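
The difference between one-shot and streaming hashing mentioned in the hash-wasm entry above can be sketched as follows. hash-wasm is a JavaScript/WebAssembly library; this sketch swaps in Python's standard hashlib module to show the same idea, and the function names are hypothetical.

```python
import hashlib


def hash_one_shot(data: bytes) -> str:
    """Hash a buffer that is already fully in memory."""
    return hashlib.sha256(data).hexdigest()


def hash_streaming(chunks) -> str:
    """Hash data incrementally, one chunk at a time.

    Only the current chunk needs to be held in memory, which is what makes
    streaming mode suitable for large datasets.
    """
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


data = b"example payload " * 1000
chunks = [data[i:i + 4096] for i in range(0, len(data), 4096)]

# Both modes produce the same digest for the same bytes.
print(hash_one_shot(data) == hash_streaming(chunks))
```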
  • 16
    Matter AI

    Matter AI is an open-source AI code reviewer agent

    Matter AI is an AI-powered platform designed to enhance productivity through automated content generation, data analysis, and decision support. It leverages machine learning models to process text, analyze patterns, and generate insights, making it suitable for businesses looking to optimize data-driven decision-making. Matter AI integrates with various data sources and provides customizable AI workflows tailored to different industries.
    Downloads: 0 This Week
  • 17
    ReductStore

    The fastest time series object store for Edge AI

    History storage and management of images, vibration data, text, labels, and more - all in one place with the highest performance. Merge blob and time series functionalities, reducing the need for multiple databases. Customize real-time data retention policies and replication strategies. Store billions of time-stamped blobs with AI labels and access them with low latency. Outperform other databases with a customized solution for time-series object data. Capture and access blob data as time...
    Downloads: 1 This Week
  • 18
    YoutubeExplode

    Abstraction layer over YouTube's internal API

    YoutubeExplode is a .NET library that provides a high-level abstraction for interacting with YouTube data, enabling developers to retrieve metadata and download media streams programmatically. The project exposes a clean API that allows applications to query videos, playlists, channels, and search results without relying on the official YouTube Data API. Under the hood, the library parses raw page data and leverages reverse-engineered internal endpoints to obtain structured information and stream manifests. ...
    Downloads: 3 This Week
  • 19
    JC

    CLI tool and python library

    ...This allows piping of output to tools like jq and simplifying automation scripts. jc JSONifies the output of many CLI tools and file types for easier parsing in scripts. This allows further command-line processing of output with tools like jq or jello by piping commands. The jc parsers can also be used as Python modules. In this case, the output will be a Python dictionary, or a list of dictionaries, instead of JSON. Two representations of the data are available. The default representation uses a strict schema per parser and converts known numbers to int/float JSON values. ...
    Downloads: 4 This Week
  • 20
    Argo Workflows

    Workflow engine for Kubernetes

    ...Model multi-step workflows as a sequence of tasks or capture the dependencies between tasks using a directed acyclic graph (DAG). Easily run compute-intensive jobs for machine learning or data processing in a fraction of the time using Argo Workflows on Kubernetes. Run CI/CD pipelines natively on Kubernetes without configuring complex software development products. Argo Workflows is the most popular workflow execution engine for Kubernetes. It can run 1000s of workflows a day, each with 1000s of concurrent tasks. Our users say it is lighter-weight, faster, more powerful, and easier to use. ...
    Downloads: 2 This Week
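
The idea of capturing task dependencies as a DAG, as described in the Argo Workflows entry above, can be sketched in a few lines. This is not Argo's workflow format or API; it is a generic illustration using Python's standard graphlib module (Python 3.9+), and the task names and the execution_waves helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean"},
    "publish": {"train", "report"},
}


def execution_waves(deps):
    """Group tasks into waves that respect the dependency graph.

    Every task in a wave has all of its dependencies finished, so the
    tasks within one wave could run concurrently.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(dependencies))
```

Here "train" and "report" land in the same wave because both depend only on "clean", which is the kind of parallelism a DAG makes explicit and a plain sequence of steps does not.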
  • 21
    Google Cloud Dataflow Template Pipelines

    Cloud Dataflow Google-provided templates for solving data tasks

    DataflowTemplates is the source repository for Google-provided Dataflow templates that are intended to solve large-scale in-cloud data processing tasks without requiring users to build everything from scratch in a full development environment. The repository is centered on templated pipelines powered by Google Cloud Dataflow and Apache Beam, making it easier to run common integration and movement jobs such as data import, export, backup, restore, and bulk API operations. ...
    Downloads: 2 This Week
  • 22
    FFTW.jl

    Julia bindings to the FFTW library for fast Fourier transforms

    This package provides Julia bindings to the FFTW library for fast Fourier transforms (FFTs), as well as functionality useful for signal processing. These functions were formerly a part of Base Julia. Users with a build of Julia based on Intel's Math Kernel Library (MKL) can use MKL for FFTs by setting a preference in their top-level project by either using the FFTW.set_provider!() method, or by directly setting the preference using Preferences.jl. Note that this choice will be recorded for...
    Downloads: 1 This Week
  • 23
    ElasticJob

    Distributed scheduled job framework

    ElasticJob is a distributed scheduling solution consisting of two separate projects, ElasticJob-Lite and ElasticJob-Cloud. ElasticJob-Lite is a lightweight, decentralized solution that provides distributed task sharding services. ElasticJob-Cloud uses Mesos to manage and isolate resources. Both projects share a unified job API, so developers only need to write a job once and can deploy it at will. Supports job sharding and high availability in distributed systems. Scale out for throughput and...
    Downloads: 1 This Week
  • 24
    Google Highway

    Performance-portable, length-agnostic SIMD with runtime dispatch

    ...This portability is achieved through dynamic or static dispatch mechanisms that select the best available instruction set at runtime or compile time. The library is designed for developers who need to maximize CPU performance in domains such as image processing, compression, cryptography, and scientific computing.
    Downloads: 0 This Week
  • 25
    XZ Utils

    Open-source compression utility and library

    xz is a widely used open-source compression utility and library that implements the high-ratio LZMA and LZMA2 compression algorithms. It provides both command-line tools and a reusable C library, enabling developers and system administrators to compress and decompress files efficiently across many environments. The project is known for delivering strong compression performance while maintaining reasonable memory usage, making it suitable for software distribution, backups, and archival...
    Downloads: 6 This Week
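
The compress and decompress round trip described in the XZ Utils entry above can be tried from Python, whose standard lzma module wraps the liblzma library from XZ Utils. This is a minimal sketch with made-up sample data, assuming a Python build that includes the lzma module.

```python
import lzma

# Made-up, highly repetitive sample data so the effect is easy to see.
original = b"backup data " * 10_000

# lzma.compress defaults to the .xz container format, which uses LZMA2.
compressed = lzma.compress(original)
restored = lzma.decompress(compressed)

# The round trip is lossless.
assert restored == original
print(len(original), "bytes before,", len(compressed), "bytes after")
```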