Showing 13 open source projects for "python data analysis"

  • 1
    Deequ

    Deequ is a library built on top of Apache Spark

    ...It also includes a little domain-specific language called DQDL (Data Quality Definition Language) which allows declarative specification of quality rules. Users typically run Deequ before feeding data downstream (to ML pipelines, analytics, or production systems), enabling early detection and isolation of data errors. There is also a Python wrapper, PyDeequ, for users who prefer working from Python environments.
    Downloads: 10 This Week
  • 2
    Apache Spark

    A unified analytics engine for large-scale data processing

    ...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 4 This Week
  • 3
    X's Recommendation Algorithm

    Source code for the X Recommendation Algorithm

    The Algorithm is Twitter’s open source release of the core ranking system that powers the platform’s home timeline. It provides transparency into how tweets are selected, prioritized, and surfaced to users, reflecting Twitter’s move toward openness in recommendation algorithms. The repository contains the recommendation pipeline, which incorporates signals such as engagement, relevance, and content features, and demonstrates how they combine to form ranked outputs. Written primarily in...
    Downloads: 1 This Week
  • 4
    Spark NLP

    State of the Art Natural Language Processing

    Experience the power of large language models like never before, unleashing the full potential of Natural Language Processing (NLP) with Spark NLP, the open source library that delivers scalable LLMs. The full code base is open under the Apache 2.0 license, including pre-trained models and pipelines. The only NLP library built natively on Apache Spark. The most widely used NLP library in the enterprise. Spark ML provides a set of machine learning applications that can be built using two main...
    Downloads: 8 This Week
  • 5
    Synapse Machine Learning

    Simple and distributed Machine Learning

    SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...
    Downloads: 0 This Week
  • 6
    rocket-bi

    An open-source web-based self-service BI for analytical databases

    Rocket.BI is a free, open-source, web-based business intelligence solution specifically designed for analytical databases. It enables data analysts and business users alike to easily integrate different data sources, perform advanced and ad hoc data analysis, and more. With an easy-to-use editor, you can create personalized reports, build interactive business dashboards, and generate actionable business insights. Rocket.BI also supports collaboration, so you can work together with other people in the organization.
    Downloads: 0 This Week
  • 7
    CoolplaySpark

    Spark Cool Play: Spark source code analysis, Spark class library, etc.

    CoolplaySpark is a learning and practice repository designed to help users understand and work with Apache Spark. It serves as a companion resource for the book 深入理解Spark核心思想与源码分析 (In-Depth Understanding of Spark’s Core Concepts and Source Code Analysis). The project contains annotated examples, explanations, and exercises that guide learners through Spark’s architecture, execution model, and source code internals. It is particularly valuable for developers who want to strengthen their understanding of Spark by not only using it as a data processing engine but also exploring how its internals function. ...
    Downloads: 5 This Week
  • 8
    Cosmos DB Spark

    Apache Spark Connector for Azure Cosmos DB

    Azure Cosmos DB Spark is the official connector for Azure Cosmos DB and Apache Spark. The connector allows you to easily read from and write to Azure Cosmos DB via Apache Spark DataFrames in Python and Scala. It also allows you to easily create a lambda architecture for batch processing, stream processing, and a serving layer while being globally replicated and minimizing the latency involved in working with big data.
    Downloads: 0 This Week
  • 9

    Waterloo

    Java-based scientific graphics

    Java-based scientific graphics with support for Java, Groovy, MATLAB, Python, the R statistical environment, Scala and SciLab.
    Downloads: 4 This Week
  • 10

    Deem

    Analyze time-course data with significance tests, clustering, modeling

    Use statistical methods to analyze time-course data (gene expression microarray and RNA-seq data in particular, but not limited to these). Apply significance tests to keep only significant genes or time series. Cluster time series into similar groups. Generate network models, including linear or non-linear models. Variable selection and optimization routines included. Written in Scala and R. The application is a cross-platform desktop app with a simple GUI and is fully functional...
    Downloads: 0 This Week
  • 11
    Infinispan

    High performance distributed in-memory key/value store

    Infinispan is an open source, Java-based data grid platform. ***IMPORTANT*** Starting with Infinispan 5.0.0.FINAL, Infinispan releases are no longer hosted on SourceForge. They can now be found at www.jboss.org/infinispan/downloads
    Downloads: 2 This Week
  • 12
    Cane is a Data Manipulation Interface (DMI) based on code behaviour analysis. Cane enables developers to manipulate data in a way that is close to natural logic. Developers can finish their job in one continuous operation using Cane's chained operations.
    Downloads: 0 This Week
  • 13
    Java OpenCL Process Virtual Machine. A Spring IoC-based framework for complex data analysis with OpenCL computing.
    Downloads: 0 This Week