Showing 67 open source projects for "big data"

  • 1
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs such as Hadoop, Pig, Hive, Spark, Presto, Sqoop, and more. It also provides APIs for managing the metadata of many distributed processing clusters and of the commands and applications that run on them.
    Downloads: 0 This Week
  • 2
    Apache HBase

    Get random, realtime read/write access to your Big Data

    Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable, described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. ...
    Downloads: 7 This Week
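HBase follows Bigtable's data model: a sparse, sorted map from (row key, column, timestamp) to a value, with several versions kept per cell. The following is a minimal in-memory sketch of that model only. It assumes a single process and a plain dictionary in place of distributed storage; `VersionedTable`, its methods, and the sample row keys are hypothetical and are not the HBase client API.

```python
from collections import defaultdict


class VersionedTable:
    """Hypothetical in-memory model of a Bigtable/HBase-style table.

    Conceptually maps (row key, "family:qualifier", timestamp) -> value.
    Only cells that were written are stored, so the table is sparse.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep versions newest-first and discard those beyond max_versions.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column, timestamp=None):
        """Newest value written at or before `timestamp` (latest if None)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, latest cells) in sorted row-key order for [start, stop)."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {col: vs[0][1] for col, vs in self._rows[row].items()}


table = VersionedTable(max_versions=2)
table.put("com.example.www", "contents:html", 1, "<html>v1</html>")
table.put("com.example.www", "contents:html", 2, "<html>v2</html>")
print(table.get("com.example.www", "contents:html"))               # <html>v2</html>
print(table.get("com.example.www", "contents:html", timestamp=1))  # <html>v1</html>
```

Keeping rows in a sorted order is what makes range scans over row keys cheap in the real system; the sketch only imitates that by sorting at scan time.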
  • 3
    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data in real time, at any scale, with unbeatable performance. Vespa is a full-featured text search engine that supports both regular text search and fast approximate nearest neighbor (ANN) vector search. This makes it easy to create high-performing search applications at any scale, whether you use traditional techniques or a modern vector-based approach. You can even combine both approaches efficiently in the same query, something no other engine can do. ...
    Downloads: 23 This Week
  • 4
    HugeGraph

    A graph database that supports more than 100 billion data items

    ...HugeGraph supports fast import of graphs with more than 10 billion vertices and edges, millisecond-level OLTP queries, and integration with big data platforms such as Hadoop and Spark for OLAP analysis. Its main use cases include correlation search, fraud detection, and knowledge graphs. Besides the Gremlin graph query language and a RESTful API, it provides APIs for commonly used graph algorithms. To help users easily implement various queries and analyses, HugeGraph also offers a full range of accessory tools, and it supports distributed storage, data replication, horizontal scaling, and many built-in storage backends.
    Downloads: 1 This Week
  • 5
    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    ...InLong was originally built at Tencent, where it served online businesses for more than 8 years, supporting massive data reporting services (more than 80 trillion pieces of data per day) in big data scenarios. The platform integrates 5 modules (Ingestion, Convergence, Caching, Sorting, and Management), so a business only needs to provide its data sources, data service quality requirements, data landing clusters, and data landing formats.
    Downloads: 1 This Week
  • 6
    ODD Platform

    First open-source data discovery and observability platform

    Unlock the power of big data with the OpenDataDiscovery Platform. Get seamless end-to-end insights, powered by unprecedented observability and trust from ingestion to production, while building your ideal tech stack. Democratize data and accelerate insights: find data that fits your use case and discover hints left by your peers to leverage existing knowledge.
    Downloads: 1 This Week
  • 7
    Apache RocketMQ

    Distributed messaging and streaming platform with low latency

    ...Cross-language clients, such as Java, C/C++, Python, and Go. Pluggable transport protocols, such as TCP, SSL, and AIO. Built-in message tracing, with OpenTracing support. Versatile big data and streaming ecosystem integration. Message retroactivity by time or offset. Reliable FIFO and strictly ordered messaging in the same queue. Efficient pull and push consumption models. Million-level message accumulation capacity in a single queue. Multiple messaging protocols, such as JMS and OpenMessaging. Flexible distributed scale-out deployment architecture. ...
    Downloads: 6 This Week
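Strict ordering within one queue generally depends on the producer routing every message that shares a business key (for example, an order ID) to the same queue, which a single consumer then reads in FIFO order. The in-process simulation below illustrates only that idea. The queue count, the CRC32 key-hashing rule, and the `OrderedProducer` class are assumptions for illustration, not RocketMQ's client API.

```python
import zlib
from collections import deque


class OrderedProducer:
    """Hypothetical producer that pins every message key to one queue."""

    def __init__(self, num_queues):
        self.queues = [deque() for _ in range(num_queues)]

    def select_queue(self, key):
        # CRC32 is stable across runs, unlike Python's salted str hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.queues)

    def send(self, key, body):
        index = self.select_queue(key)
        self.queues[index].append((key, body))
        return index


def consume(queue):
    """Drain a single queue in FIFO order, as one consumer would."""
    while queue:
        yield queue.popleft()


producer = OrderedProducer(num_queues=4)
for step in ["created", "paid", "shipped"]:
    producer.send("order-42", step)
queue = producer.queues[producer.select_queue("order-42")]
print([body for _, body in consume(queue)])  # ['created', 'paid', 'shipped']
```

Ordering is guaranteed only per key: messages for different keys may land in different queues and be consumed in parallel, which is what lets this scheme scale.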
  • 8
    ElasticJob

    Distributed scheduled job framework

    ElasticJob is a distributed scheduling solution consisting of two separate projects, ElasticJob-Lite and ElasticJob-Cloud. ElasticJob-Lite is a lightweight, decentralized solution that provides distributed task sharding services. ElasticJob-Cloud uses Mesos to manage and isolate resources. Both projects use a unified job API, so developers only need to write code once and can deploy it at will. ElasticJob supports job sharding and high availability in distributed systems, and scales out for throughput and...
    Downloads: 3 This Week
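Job sharding splits one logical job into numbered shard items and assigns them across the running instances, so each instance processes only its share. The sketch below shows one simple even-allocation policy. The function name and the exact policy (contiguous blocks, with leftovers handed to the first instances) are assumptions made for illustration and are not taken from ElasticJob's API.

```python
def allocate_shards(instances, sharding_total_count):
    """Hypothetical average allocation of shard items 0..N-1 to instances.

    Each instance gets an equal contiguous block; leftover items are handed
    out one at a time starting from the first instance.
    """
    if not instances:
        return {}
    per_instance = sharding_total_count // len(instances)
    assignment = {
        inst: list(range(i * per_instance, (i + 1) * per_instance))
        for i, inst in enumerate(instances)
    }
    leftover = range(per_instance * len(instances), sharding_total_count)
    for offset, item in enumerate(leftover):
        assignment[instances[offset]].append(item)
    return assignment


print(allocate_shards(["host-a", "host-b", "host-c"], 8))
# {'host-a': [0, 1, 6], 'host-b': [2, 3, 7], 'host-c': [4, 5]}
```

When an instance joins or leaves, rerunning the allocation over the new instance list redistributes the shard items, which is the basis for scaling out and failover in this pattern.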
  • 9
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced "hoodie") stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem-compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow, old-school batch data processing with a powerful new incremental processing framework for low-latency, minute-level analytics. Hudi provides...
    Downloads: 0 This Week
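An upsert updates records whose key already exists and inserts the rest. One common way to resolve two versions of the same key is to keep the one with the larger ordering value, such as an update timestamp. The sketch below applies that rule to an in-memory list of dictionaries. The field names `id` and `ts` and the functions `upsert` and `delete` are hypothetical, and a real Hudi write also handles indexing, file layout, and commit metadata that this ignores.

```python
def upsert(table, incoming, key="id", order_by="ts"):
    """Hypothetical upsert: merge `incoming` rows into `table` by key.

    When both sides hold the same key, the row with the larger `order_by`
    value wins; ties keep the incoming row. Returns a new list sorted by key.
    """
    merged = {row[key]: row for row in table}
    for row in incoming:
        current = merged.get(row[key])
        if current is None or row[order_by] >= current[order_by]:
            merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


def delete(table, keys, key="id"):
    """Hypothetical delete: drop rows whose key is in `keys`."""
    drop = set(keys)
    return [row for row in table if row[key] not in drop]


table = [{"id": 1, "ts": 1, "city": "Oslo"}, {"id": 2, "ts": 1, "city": "Lima"}]
batch = [{"id": 2, "ts": 2, "city": "Quito"}, {"id": 3, "ts": 1, "city": "Pune"}]
table = upsert(table, batch)
print([row["city"] for row in table])  # ['Oslo', 'Quito', 'Pune']
table = delete(table, [1])
print([row["id"] for row in table])  # [2, 3]
```

Comparing an ordering field, rather than trusting arrival order, keeps a late-arriving stale update from overwriting newer data.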
  • 10
    Apache Polaris

    Apache Polaris, the interoperable, open source catalog

    Apache Polaris is an open-source metadata catalog and data management service designed to manage Apache Iceberg tables in modern data lakehouse environments. It provides a centralized catalog that allows multiple compute engines and analytics systems to interact with the same datasets through a standardized interface. By implementing the Iceberg REST catalog API, Polaris enables distributed data platforms to access shared table metadata without tightly coupling storage systems and query...
    Downloads: 6 This Week
  • 11
    Apache Iceberg

    Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables at the same time. The core Java library that tracks table snapshots and metadata is complete but still evolving. Current work focuses on adding row-level deletes and upserts, and on integration with new engines like Flink and Hive. ...
    Downloads: 4 This Week
  • 12
    testng

    TestNG testing framework

    TestNG is a testing framework inspired by JUnit and NUnit that introduces new functionality to make it more powerful and easier to use. Run your tests in arbitrarily big thread pools with various policies available (all methods in their own thread, one thread per test class, etc.).
    Downloads: 10 This Week
  • 13
    LakeSoul

    An end-to-end, realtime and cloud native Lakehouse framework

    LakeSoul is a high-performance, unified table storage framework for big data lakes, supporting both streaming and batch data in a single format. Built on top of Apache Spark and leveraging Apache Arrow and Parquet, LakeSoul provides ACID transactions, schema evolution, and time travel. It is designed for large-scale data lake architectures that require consistency, efficiency, and easy integration with modern data stacks.
    Downloads: 4 This Week
  • 14
    Planetiler

    Flexible tool to build planet-scale vector tilesets

    ...Planetiler packages tiles into an MBTiles (SQLite) or PMTiles file that can be served using tools like TileServer GL or Martin or even queried directly from the browser. See awesome-vector-tiles for more projects that work with data in this format. Planetiler works by mapping input elements to vector tile features, flattening them into a big list, and then sorting by tile ID to group them into tiles.
    Downloads: 10 This Week
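The map, flatten, and sort-by-tile-ID approach described for Planetiler can be sketched for point features as follows. This assumes longitude/latitude points, the standard Web Mercator ("slippy map") x/y tiling at a single zoom level, and a plain `(z, x, y)` tuple as the tile ID. The function names are hypothetical, and Planetiler's own tile ID encoding, geometry clipping, and any disk-based sorting it may use are not reproduced.

```python
import math
from itertools import groupby


def tile_for_point(lon, lat, zoom):
    """Web Mercator tile (z, x, y) that contains a lon/lat point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp: lon = 180 and extreme latitudes fall just outside the grid.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def build_tiles(elements, zoom, to_features):
    """Map elements to (tile_id, feature) pairs, sort by tile ID, then group."""
    pairs = [
        (tile_for_point(f["lon"], f["lat"], zoom), f)
        for element in elements
        for f in to_features(element)  # one element may yield several features
    ]
    pairs.sort(key=lambda p: p[0])  # sort on the tile ID only
    return {
        tile_id: [f for _, f in group]
        for tile_id, group in groupby(pairs, key=lambda p: p[0])
    }


elements = [
    {"name": "London", "lon": -0.13, "lat": 51.51},
    {"name": "Paris", "lon": 2.35, "lat": 48.86},
    {"name": "Sydney", "lon": 151.21, "lat": -33.87},
]
tiles = build_tiles(elements, zoom=1, to_features=lambda e: [e])
for tile_id, features in tiles.items():
    print(tile_id, [f["name"] for f in features])
# (1, 0, 0) ['London']
# (1, 1, 0) ['Paris']
# (1, 1, 1) ['Sydney']
```

Sorting once and then grouping consecutive runs means each tile's features can be encoded in a single pass, without a lookup structure keyed by tile.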
  • 15
    Angel

    A Flexible and Powerful Parameter Server for large-scale ML

    Angel is a high-performance distributed machine learning and graph computing platform based on the parameter server philosophy. It is tuned for performance on Tencent's big data workloads, is broadly applicable and stable, and shows an increasing advantage in handling higher-dimensional models. Angel is jointly developed by Tencent and Peking University, combining industry's need for high availability with academic innovation. With a model-centered core design, Angel partitions the parameters of complex models across multiple parameter-server nodes and implements a variety of machine learning and graph algorithms using efficient model-updating interfaces and functions, as well as a flexible consistency model for synchronization. ...
    Downloads: 8 This Week
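In the parameter-server pattern, a large model vector is partitioned across server nodes; workers pull parameters, compute gradients locally, and push updates back to whichever server owns each slice. The single-process sketch below illustrates range partitioning and a synchronous pull/push cycle. The class names, the even range split, and the plain SGD update are assumptions for illustration and do not reflect Angel's actual interfaces or its consistency models.

```python
class ParameterServer:
    """Hypothetical server node owning the slice [start, end) of the model."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.values = [0.0] * (end - start)

    def pull(self):
        return list(self.values)

    def push(self, grads, lr):
        # Plain SGD step applied to this node's slice only.
        for i, g in enumerate(grads):
            self.values[i] -= lr * g


class PartitionedModel:
    """Hypothetical client that routes pulls and pushes to owning servers."""

    def __init__(self, dim, num_servers):
        step = -(-dim // num_servers)  # ceiling division
        # Range partitioning; a small `dim` may yield fewer servers than asked.
        self.servers = [
            ParameterServer(s, min(s + step, dim)) for s in range(0, dim, step)
        ]

    def pull(self):
        return [v for server in self.servers for v in server.pull()]

    def push(self, grads, lr=0.1):
        for server in self.servers:
            server.push(grads[server.start:server.end], lr)


# Synchronous loop minimising sum((w_i - t_i)^2); the gradient is 2 * (w - t).
target = [float(i) for i in range(10)]
model = PartitionedModel(dim=10, num_servers=4)
for _ in range(100):
    w = model.pull()
    model.push([2 * (wi - ti) for wi, ti in zip(w, target)], lr=0.1)
print([(s.start, s.end) for s in model.servers])  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```

Because each server only touches its own slice, servers can apply updates independently; the consistency model then decides how stale a worker's pulled copy is allowed to be.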
  • 16
    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples, that is, a data stream. It includes classification, regression, clustering, outlier detection, and recommender systems. MOA is related to the WEKA project and, like WEKA, is written in Java, while scaling to adaptive, large-scale machine learning.
    Downloads: 43 This Week
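Stream learners of the kind MOA targets see each example once, in arrival order, and are often evaluated prequentially: predict on each example first, then train on it. The sketch below pairs that test-then-train loop with an online perceptron. The perceptron, the function names, and the synthetic stream are illustrative choices made here, not MOA's Java API or its bundled learners.

```python
import random


class OnlinePerceptron:
    """Incremental binary classifier; labels are +1 / -1."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def learn_one(self, x, y):
        # Update only on mistakes (classic perceptron rule).
        if self.predict(x) != y:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


def prequential_accuracy(stream, model):
    """Test-then-train: score each example before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += model.predict(x) == y
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


def synthetic_stream(n, seed=0):
    """Hypothetical stream: the label is the sign of x0 + x1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, (1 if x[0] + x[1] >= 0 else -1)


acc = prequential_accuracy(synthetic_stream(5000), OnlinePerceptron(n_features=2))
print(f"prequential accuracy: {acc:.3f}")
```

Because every example is scored before the model has seen it, prequential accuracy needs no separate holdout set, which suits unbounded streams.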
  • 17
    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 12 This Week
  • 18
    Parkiet

    Parquet format file GUI editor

    Parquet file viewer and editor written in Java and SWT. It uses the Apache Avro library for reading and writing edited Parquet files. Only Parquet files with simple data type columns are supported.
    Downloads: 2 This Week
  • 19
    Pentaho

    Pentaho offers a comprehensive data integration and analytics platform.

    Pentaho couples data integration with business analytics in a modern platform to easily access, visualize, and explore data that impacts business results. Use it as a full suite or as individual components that are accessible on-premise, in the cloud, or on the go (mobile). Pentaho enables IT and developers to access and integrate data from any source and deliver it to your applications, all from within an intuitive, easy-to-use graphical tool. The Pentaho Enterprise Edition Free Trial...
    Downloads: 1,598 This Week
  • 20
    DataSophon

    The next-generation, cloud-native big data management expert

    DataSophon aims to quickly deploy, manage, and monitor big data service components and nodes and to automate their operation and maintenance, helping you quickly build stable, efficient big data cluster services. The Three-Body Problem, winner of the Hugo Award, the highest honor in world science fiction, is known for its stunning "hard science fiction" style, and its author Liu Cixin is credited with "single-handedly raising Chinese science fiction to a world-class level". ...
    Downloads: 9 This Week
  • 21
    miRDeep*

    Please cite: An, J., Lai, J., Lehman, M.L. and Nelson, C.C. (2013) miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data. Nucleic Acids Res, 41, 727-737. We will create an index for you if you tell us your species of interest (j.an@qut.edu.au). To download the command-line version, "MDS_command_line_Vxx.zip", click "Browse All Files". For plant miRNA prediction, please see miRPlant on SourceForge.
    Downloads: 57 This Week
  • 22
    Advanced Trigonometry Calculator

    Precision Trigonometry: Advanced Calculator for Complex Math

    Advanced Trigonometry Calculator has a user-friendly interface that allows easy input of problems and instant computation. Professionals such as engineers who need to perform advanced trigonometric calculations in their work will find this tool extremely useful. ATC Online Alpha: https://advantrigoncalc.sourceforge.io/atc/ More info: https://advantrigoncalc.sourceforge.io/ Advanced Trigonometry Calculator was developed solely by...
    Downloads: 10 This Week
  • 23
    Alink

    Alink is the Machine Learning algorithm platform based on Flink

    Alink is Alibaba’s scalable machine learning algorithm platform built on Apache Flink, designed for batch and stream data processing. It provides a wide variety of ready-to-use ML algorithms for tasks like classification, regression, clustering, recommendation, and more. Written in Java and Scala, Alink is suitable for enterprise-grade big data applications where performance and scalability are crucial. It supports model training, evaluation, and deployment in real-time environments and integrates seamlessly into Alibaba’s cloud ecosystem.
    Downloads: 7 This Week
  • 24
    The Art of Programming

    A collection of practical tips can be found at the bottom of this page

    The Art of Programming (Second Edition) is a curated collection of programming problems and solutions originally derived from the Microsoft 100 Interview Questions blog series, later refined into a long-running tutorial and ultimately a published book. Created by the author July, the series began in 2010 and has since evolved into an in-depth exploration of algorithmic thinking, data structures, and coding interview preparation. The repository brings together 42 classic programming problems from the original series, enhanced with detailed explanations, formula derivations, and optimized solutions. In July 2023, work on the second edition was announced, which expands the project with updated content, new problems inspired by recent big-tech interviews, and introductions to modern machine learning techniques such as XGBoost, CNNs, RNNs, and LSTMs. ...
    Downloads: 2 This Week
  • 25
    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    ...It also had Hadoop (big data) support to move files to and from a Hadoop grid and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project (beta version) is being built at https://sourceforge.net/projects/restful-api-for-osdq/ and Apache Spark-based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 1 This Week