Showing 60 open source projects for "big data"

View related business solutions
  • Loan management software that makes it easy. Icon
    Loan management software that makes it easy.

    Ideal for lending professionals who are looking for a feature rich loan management system

    Bryt Software is ideal for lending professionals who are looking for a feature rich loan management system that is intuitive and easy to use. We are 100% cloud-based, software as a service. We believe in providing our customers with fair and honest pricing. Our monthly fees are based on your number of users and we have a minimal implementation charge.
    Learn More
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • 1
    data.table

    data.table

    Extends base R’s data for high-performance data manipulation

    data.table is an R package that extends base R’s data.frame for high-performance data manipulation. It offers concise syntax, blazing speed, and memory-efficient operations. It supports fast file reading/writing, joins, grouping, reshaping, and updates by reference. It is heavily used in large data workflows, big data in R, production pipelines, etc. Extremely efficient grouping/aggregation/summarization; can handle very large datasets (hundreds of millions to billions of rows) in memory (if available). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    FinMind

    FinMind

    Open Data, more than 50 financial data

    In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock related information and provide download, online analysis, and backtesting. Regardless of the program, you can download data through the api provided by FinMind, or you can download data directly from the website. After data is available, statistical analysis, regression analysis, time series analysis, machine learning, and deep learning can be performed. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 3
    JuiceFS

    JuiceFS

    JuiceFS is a distributed POSIX file system built on top of Redis

    ...Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility, availability, scalability and strong consistency for your data-intensive applications. Purposely built to serve big data scenarios such as self-driving model training, recommendation engine, and Next-generation Gene Sequencing, JuiceFS specializes in high performance and easier management of tens of billion of files management. We bring JuiceFS to developers with the hope that it will be easy to use, reliable, high-performance, and solve all your file storage problems in a cloud environment.
    Downloads: 38 This Week
    Last Update:
    See Project
  • 4
    Apache Bigtop

    Apache Bigtop

    Bigtop is an Apache Foundation project for Infrastructure Engineers

    Apache Bigtop is a project focused on building and packaging the Hadoop ecosystem and related big data components. It provides a consistent framework for testing, packaging, and deploying Hadoop distributions, including tools like HDFS, YARN, Spark, Hive, HBase, and more. By maintaining cross-platform builds (RPMs, DEBs, Docker images, and Kubernetes support), Bigtop makes it easier for organizations to deploy big data stacks in different environments. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Award-Winning Medical Office Software Designed for Your Specialty Icon
    Award-Winning Medical Office Software Designed for Your Specialty

    Succeed and scale your practice with cloud-based, data-backed, AI-powered healthcare software.

    RXNT is an ambulatory healthcare technology pioneer that empowers medical practices and healthcare organizations to succeed and scale through innovative, data-backed, AI-powered software.
    Learn More
  • 5
    XCharts

    XCharts

    A charting and data visualization library for Unity

    A charting and data visualization library for Unity. Unity data visualization chart plugin. A UGUIpowerful, easy-to-use, parameter-configurable data visualization chart plug-in. It supports ten built-in charts. A powerful, easy-to-use, configurable charting and data visualization library for Unity. Visual configuration of parameters, real-time preview of effects, and pure code drawing without additional resources. Support ten built-in charts such as line chart, column chart, pie chart, radar...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 6
    Apache InLong

    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    ...InLong was originally built at Tencent, which has served online businesses for more than 8 years, to support massive data (data scale of more than 80 trillion pieces of data per day) reporting services in big data scenarios. The entire platform has integrated 5 modules: Ingestion, Convergence, Caching, Sorting, and Management, so that the business only needs to provide data sources, data service quality, data landing clusters and data landing formats.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    Vespa

    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data, in real-time. At any scale, with unbeatable performance. Vespa is a full-featured text search engine and supports both regular text search and fast approximate vector search (ANN). This makes it easy to create high-performing search applications at any scale, whether you want to use traditional techniques or a modern vector-based approach. You can even combine both approaches efficiently in the same query, something no other engine can do....
    Downloads: 14 This Week
    Last Update:
    See Project
  • 8
    Blue Whale Configuration Platform

    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    Has accumulated experience in supporting hundreds of Tencent businesses, compatible with various complex system architectures, born in operation and maintenance, and proficient in operation and maintenance. From configuration management to job execution, task scheduling and monitoring self-healing, and then through operation and maintenance big data analysis to assist operational decision-making, it covers the full-cycle assurance management of business operations in a comprehensive manner. The open PaaS has a powerful development framework and scheduling engine, as well as a complete operation and maintenance development training system, which helps the rapid transformation and upgrading of operation and maintenance. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    ElasticJob

    ElasticJob

    Distributed scheduled job framework

    ElasticJob is a distributed scheduling solution consisting of two separate projects, ElasticJob-Lite and ElasticJob-Cloud. ElasticJob-Lite is a lightweight, decentralized solution that provides distributed task sharding services. ElasticJob-Cloud uses Mesos to manage and isolate resources. It uses a unified job API for each project. Developers only need code one time and can deploy at will. Support job sharding and high availability in distributed system. Scale out for throughput and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • The Most Powerful Software Platform for EHSQ and ESG Management Icon
    The Most Powerful Software Platform for EHSQ and ESG Management

    Addresses the needs of small businesses and large global organizations with thousands of users in multiple locations.

    Choose from a complete set of software solutions across EHSQ that address all aspects of top performing Environmental, Health and Safety, and Quality management programs.
    Learn More
  • 10
    .NET for Apache Spark

    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Fishing Funds

    Fishing Funds

    Fund, big market, stock, virtual currency status bar display for apps

    Display real-time trends of Chinese funds in the menubar. Fund, big market, stock, virtual currency status bar displays small applications, developed based on Electron, supports MacOS, Windows, Linux clients, data sources come from Tiantian Fund, Ant Fund, Love Fund, Tencent Securities, Sina Fund, etc. This project refers to electron-react-boilerplate-menubar, which is developed based on Electron React Boilerplate and menubar.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 12
    testng

    testng

    TestNG testing framework

    TestNG is a testing framework inspired from JUnit and NUnit but introduces some new functionalities that make it more powerful and easier to use. Run your tests in arbitrarily big thread pools with various policies available (all methods in their own thread, one thread per test class, etc...).
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Bacalhau

    Bacalhau

    Community-driven, simple, yet powerful framework

    Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. It’s especially useful for...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Volcano

    Volcano

    A Cloud Native Batch System (Project under CNCF)

    ...It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workload including machine learning/deep learning, bioinformatics/genomics, and other "big data" applications. These types of applications typically run on generalized domain frameworks like TensorFlow, Spark, Ray, PyTorch, MPI, etc, which Volcano integrates with. Volcano builds upon a decade and a half of experience running a wide variety of high-performance workloads at scale using several systems and platforms, combined with best-of-breed ideas and practices from the open-source community. ...
    Downloads: 184 This Week
    Last Update:
    See Project
  • 15
    Apache Spark

    Apache Spark

    A unified analytics engine for large-scale data processing

    ...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 16
    Ptah.sh

    Ptah.sh

    Self-hosted alternative to Heroku

    Ptah.sh is a Fair Source self-hosting deployment platform - an alternative to Heroku/Vercel and other Big Corp software. We believe that indie, startups, and small to medium businesses must not suffer from unpredicted billing or bare-metal/VPS configurations. Designed for indie developers and SMBs, Ptah.sh offers the simplest hosting experience.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    applied-ml

    applied-ml

    Papers & tech blogs by companies sharing their work on data science

    ...For someone designing—or planning to build—a production ML system, this repo provides patterns, precedents, and lessons learned from firms that operate at big scale.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    CubeFS

    CubeFS

    cloud-native file store

    CubeFS is a new generation cloud-native storage that supports access protocols such as S3, HDFS, and POSIX. It is widely applicable in various scenarios such as big data, AI/LLMs, container platforms, separation of storage and computing for databases and middleware, data sharing and protection, etc. Compatible with various access protocols such as S3, POSIX, HDFS, etc., and the access between protocols can be interoperable. Support replicas and erasure coding engines, users can choose flexibly according to business scenarios. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    xxHash

    xxHash

    Extremely fast non-cryptographic hash algorithm

    ...It is proposed in four flavors (XXH32, XXH64, XXH3_64bits and XXH3_128bits). The latest variant, XXH3, offers improved performance across the board, especially on small data. It successfully completes the SMHasher test suite which evaluates collision, dispersion and randomness qualities of hash functions. Code is highly portable, and hashes are identical across all platforms (little / big endian). Performance on large data is only one part of the picture. Hashing is also very useful in constructions like hash tables and bloom filters. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    MobX

    MobX

    A Simple, scalable state management

    MobX is a battle tested library that makes state management simple and scalable by transparently applying functional reactive programming (TFRP). Write minimalistic, boilerplate free code that captures your intent. Trying to update a record field? Use the good old JavaScript assignment. Updating data in an asynchronous process? No special tools are required, the reactivity system will detect all your changes and propagate them out to where they are being used. All changes to and uses of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Foundatio

    Foundatio

    Pluggable foundation blocks for building distributed apps

    Pluggable foundation blocks for building loosely coupled distributed apps. Includes implementations in Redis, Azure, AWS, RabbitMQ and in memory (for development). When building several big cloud applications we found a lack of great solutions (that's not to say there aren't solutions out there) for many key pieces to building scalable distributed applications while keeping the development experience simple. Wanted to build against abstract interfaces so that we could easily change...
    Downloads: 13 This Week
    Last Update:
    See Project
  • 22
    huihut interview

    huihut interview

    A summary of C/C++ technical interview basics

    ...It’s organized to be approachable whether you’re a student preparing for your first internship or an experienced engineer brushing up on fundamentals before a big interview round.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    zpdf

    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches. It implements multiple PDF decompression filters and handles common font encoding pathways, which are essential for turning raw PDF content streams into readable text. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Nano Events

    Nano Events

    Simple and tiny (107 bytes) event emitter library for JavaScript

    Nano Events is a minimalistic, high-performance event emitter library for JavaScript. Its goal is to provide the simplest possible API to add pub/sub capabilities (emitters and listeners) to any JS object or application, while keeping overhead and bundle size extremely small. Rather than offering many complex features, nanoevents focuses on the core primitives: creating an emitter, subscribing to named events, emitting events with arbitrary data, and unsubscribing. Because of its minimal API...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MOA - Massive Online Analysis

    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples, a data stream. Includes classification, regression, clustering, outlier detection and recommender systems. Related to the WEKA project, also written in Java, while scaling to adaptive large scale machine learning.
    Downloads: 54 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB