Showing 19 open source projects for "big data"

  • 1
    DATA SCIENCE ROADMAP

    Data Science Roadmap from A to Z

    DATA SCIENCE ROADMAP is an educational repository designed to guide learners through the process of becoming proficient in data science and machine learning. The project presents a structured roadmap that outlines the knowledge and skills required for different stages of a data science career. Topics typically include programming with Python, statistics, mathematics, machine learning algorithms, data visualization, and big data technologies. ...
    Downloads: 1 This Week
  • 2
    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python. marimo notebooks are reproducible, highly interactive, designed for collaboration (git-friendly!), deployable as scripts or apps, and fit for the modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots,...
    Downloads: 2 This Week
  • 3
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 0 This Week
  • 4
    ROOT

    Analyzing, storing and visualizing big data, scientifically

    ROOT is a unified software package for the storage, processing, and analysis of scientific data: from its acquisition to the final visualization in the form of highly customizable, publication-ready plots. It is reliable, performant and well supported, easy to use and obtain, and strives to maximize the quantity and impact of scientific results obtained per unit cost, both of human effort and computing resources. ROOT provides a very efficient storage system for data models, that...
    Downloads: 3 This Week
  • 5
    NeuroMatch Academy (NMA)

    NMA Computational Neuroscience course

    NMA Computational Neuroscience course. We have curated a curriculum that spans most areas of computational neuroscience (a hard task in an increasingly big field!). We will expose you to both theoretical modeling and more data-driven analyses. The Neuro Video Series is a series of 12 videos that covers basic neuroscience concepts and neuroscience methods. These videos are completely optional and do not need to be watched in a fixed order so you can pick and choose which videos will help you brush up on your knowledge. ...
    Downloads: 2 This Week
  • 6
    AlphaTree

    DNN && GAN && NLP && BIG DATA

    AlphaTree is an educational repository that provides a visual roadmap of deep learning models and related artificial intelligence technologies. The project focuses on explaining the historical development and relationships between major neural network architectures used in modern machine learning. It presents diagrams and documentation describing the evolution of models such as LeNet, AlexNet, VGG, ResNet, DenseNet, and Inception networks. The repository organizes these architectures into a...
    Downloads: 0 This Week
  • 7
    Angel

    A Flexible and Powerful Parameter Server for large-scale ML

    Angel is a high-performance distributed machine learning and graph computing platform based on the philosophy of Parameter Server. It is tuned for performance with big data from Tencent and has a wide range of applicability and stability, demonstrating an increasing advantage in handling higher-dimension models. Angel is jointly developed by Tencent and Peking University, taking account of both high availability in industry and innovation in academia. With a model-centered core design concept, Angel partitions the parameters of complex models into multiple parameter-server nodes and implements a variety of machine learning algorithms and graph algorithms using efficient model-updating interfaces and functions, as well as a flexible consistency model for synchronization. ...
    Downloads: 0 This Week
  • 8
    HPCC Systems

    End-to-end big data in a massively scalable supercomputing platform.

    Important: As of April 20, 2026, this project can now be found at https://github.com/hpcc-systems/HPCC-Platform/releases. HPCC Systems® (www.hpccsystems.com) from LexisNexis® Risk Solutions is a proven, open source solution for Big Data insights that can be implemented by businesses of all sizes. With HPCC Systems, developers can design applications with Big Data at their core, enabling businesses to better analyze and understand data at scale, improving business time to results and decisions. HPCC Systems offers a consistent data-centric programming language, two processing platforms and a single, complete end-to-end architecture for efficient processing.
    Downloads: 8 This Week
  • 9

    Faum

    Fast Autonomous Unsupervised Multidimensional Classification

    This is the proof-of-concept implementation of the FAUM clustering method. This implementation was used to produce the published results and is now released in the hope that it will be useful.
    Downloads: 0 This Week
  • 10
    Alink

    Alink is the Machine Learning algorithm platform based on Flink

    Alink is Alibaba’s scalable machine learning algorithm platform built on Apache Flink, designed for batch and stream data processing. It provides a wide variety of ready-to-use ML algorithms for tasks like classification, regression, clustering, recommendation, and more. Written in Java and Scala, Alink is suitable for enterprise-grade big data applications where performance and scalability are crucial. It supports model training, evaluation, and deployment in real-time environments and integrates seamlessly into Alibaba’s cloud ecosystem.
    Downloads: 1 This Week
  • 11
    SparrowRecSys

    A Deep Learning Recommender System

    SparrowRecSys is an open-source deep learning recommendation system framework designed to demonstrate the architecture and implementation of modern industrial-scale recommender systems. The project integrates multiple machine learning models and data processing pipelines to simulate how real-world recommendation platforms operate. It includes components for offline data processing, feature engineering, model training, real-time data updates, and online recommendation services. SparrowRecSys...
    Downloads: 0 This Week
  • 12
    surpriver

    Find big moving stocks before they move using machine learning

    surpriver is a machine learning project designed to identify unusual stock market activity that may precede large price movements. The system analyzes historical stock price and volume data to detect anomalies that could indicate potential trading opportunities. By applying machine learning techniques to market indicators, the tool attempts to identify patterns in trading behavior that deviate significantly from normal market activity. These anomalies are interpreted as signals that a stock...
    Downloads: 0 This Week
  • 13
    Easy Machine Learning

    Easy Machine Learning is a general-purpose dataflow-based system

    Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from being realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. ...
    Downloads: 0 This Week
  • 14
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    ...Of course you may specify JASP for advanced data editing and RapidMiner for advanced prediction modeling. DSTK is written in C#, Java and Python to interface with R, NLTK, and Weka. It can be expanded with plugins using R scripts. We have also created plugins for more statistical functions, and for Big Data analytics with Microsoft Azure HDInsight (Spark Server) with Livy.
    Downloads: 2 This Week
  • 15
    H2O-3

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning

    ...H2O-3 integrates with big data technologies such as Hadoop and Apache Spark, enabling organizations to run machine learning workflows on large-scale data infrastructure. The platform also includes a web-based interface called Flow that allows users to build models interactively through notebooks and visual tools.
    Downloads: 0 This Week
  • 16
    Spark Python Notebooks

    Apache Spark & Python (pySpark) tutorials for Big Data Analysis

    Spark Python Notebooks is a curated collection of example Jupyter notebooks designed to help developers and data engineers learn Apache Spark using Python in an interactive environment. Rather than only providing static code files, this project uses notebooks to teach practical data processing workflows, exposing users to real Spark programming patterns like working with RDDs, DataFrames, and distributed computations. These notebooks often demonstrate how to transform, analyze, and visualize large datasets using PySpark APIs, which mirrors many real-world big data use cases. ...
    Downloads: 0 This Week
  • 17

    Chordalysis

    Log-linear analysis (data modelling) for high-dimensional data

    ...However, due to its exponential nature, previous approaches did not allow scale-up to more than a dozen variables. We present here Chordalysis, a log-linear analysis method for big data. Chordalysis exploits recent discoveries in graph theory by representing complex models as compositions of triangular structures, also known as chordal graphs. Chordalysis makes it possible to discover the structure of datasets with thousands of variables on a standard desktop computer. Associated papers at ICDM 2013, ICDM 2014 and SDM 2015 can be found at http://www.francois-petitjean.com/Research/. YourKit is supporting the Chordalysis open source project with its full-featured Java Profiler. ...
    Downloads: 0 This Week
  • 18
    Flamingo Project

    Workflow Designer, Hive Editor, Pig Editor, File System Browser

    Flamingo is an open-source Big Data platform that combines an Ajax rich web interface, workflow engine, workflow designer, MapReduce, Hive editor, and Pig editor. 1. An easy tool for big data. 2. Comfortable to use with Hadoop ecosystem projects. 3. Licensed under GPL v3. It supports a Pig IDE, Hive IDE, HDFS browser, scheduler, Hadoop job monitoring, workflow engine, workflow designer, and MapReduce.
    Downloads: 0 This Week
  • 19
    pyIRDG

    IMDb Relational Dataset Generator

    pyIRDG is a Python program that generates relational datasets in Prolog format. It uses data from the Internet Movie Database, with IMDbPY as the backend. A graphical user interface written in PyQt lets the user link multiple entities together as a model for the generation process. The four main entities are Title, Person, Company and Character. Many attributes can be selected for inclusion in the output .pl file.
    Downloads: 0 This Week