Showing 1057 open source projects for "python data analysis"

View related business solutions
  • Jscrambler: Pioneering Client-Side Protection Platform Icon
    Jscrambler: Pioneering Client-Side Protection Platform

    Jscrambler offers an exclusive blend of cutting-edge first-party JavaScript obfuscation and state-of-the-art third-party tag protection.

    Jscrambler is the leader in Client-Side Protection and Compliance. We were the first to merge advanced polymorphic JavaScript obfuscation with fine-grained third-party tag protection in a unified Client-Side Protection and Compliance Platform. Our integrated solution ensures a robust defense against current and emerging client-side cyber threats, data leaks, and IP theft, empowering software development and digital teams to innovate securely. With Jscrambler, businesses adopt a unified, future-proof client-side security policy all while achieving compliance with emerging security standards including PCI DSS v4.0. Trusted by digital leaders worldwide, Jscrambler gives businesses the freedom to innovate securely.
    Learn More
  • Managed File Transfer Software Icon
    Managed File Transfer Software

    Products to help you get data where it needs to go—securely and efficiently.

    For too many businesses, complex file transfer needs make it difficult to create, manage and support data flows to and from internal and external systems. Progress® MOVEit® empowers enterprises to take control of their file transfer workflows with solutions that help secure, simplify and centralize data exchanges throughout the organization.
    Learn More
  • 1
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    ...The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. The architecture is designed to be lightweight and easily deployable, making it suitable for both local installations and cloud environments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 101 This Week
    Last Update:
    See Project
  • 3
    Director

    Director

    AI video agents framework for next-gen video interactions

    Director is a video database management system designed to organize, search, and retrieve large collections of video content efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    HolmesGPT

    HolmesGPT

    CNCF Sandbox Project

    HolmesGPT is an open-source AI agent designed to help DevOps and site reliability engineering teams diagnose and resolve production incidents. The system aggregates signals from observability tools such as logs, metrics, alerts, and distributed traces, then analyzes them using large language models to identify potential root causes. Rather than requiring engineers to manually correlate large volumes of monitoring data, HolmesGPT automatically synthesizes evidence and presents explanations in...
    Downloads: 22 This Week
    Last Update:
    See Project
  • Cloud-Based Software Licensing - Zentitle by Nalpeiron Icon
    Cloud-Based Software Licensing - Zentitle by Nalpeiron

    The #1 Software Licensing Solution. Release new Software License Models fast with no engineering. Increase software sales and drive up revenues.

    1000’s software companies have used Zentitle to launch new software products fast and control their entitlements easily - many going from startup to IPO on our platform. Our software monetization infrastructure allows you to easily build or
    Learn More
  • 5
    Vulnhuntr

    Vulnhuntr

    AI tool for detecting complex vulnerabilities in Python codebases

    Vulnhuntr is an open source security tool that uses large language models to analyze codebases and identify remotely exploitable vulnerabilities. It focuses on Python projects and applies static code analysis combined with LLM reasoning to trace how user input flows through an application. Instead of scanning entire repositories at once, it builds call chains step by step, allowing deeper inspection of complex, multi-stage issues that traditional tools may miss. Vulnhuntr can generate detailed findings, including vulnerability explanations and potential exploit paths, helping developers and security teams understand risks faster. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 6
    chatd

    chatd

    Chat with your documents using local AI

    chatd is an open-source desktop application that allows users to interact with their documents through a locally running large language model. The software focuses on privacy and security by ensuring that all document processing and inference occur entirely on the user’s computer without sending data to external cloud services. It includes a built-in integration with the Ollama runtime, which provides a cross-platform environment for running large language models locally. The application...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 7
    Excel MCP Server

    Excel MCP Server

    A Model Context Protocol server for Excel file manipulation

    The Excel MCP Server is a Python-based implementation of the Model Context Protocol that provides Excel file manipulation capabilities without requiring Microsoft Excel installation. It enables workbook creation, data manipulation, formatting, and advanced Excel features.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 8
    NannyML

    NannyML

    Detecting silent model failure. NannyML estimates performance

    NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    PySyft

    PySyft

    Data science on data without acquiring a copy

    Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Deliver and Track Online and Live Training Fast and Easy with Axis LMS! Icon
    Deliver and Track Online and Live Training Fast and Easy with Axis LMS!

    Axis LMS targets HR departments for employee or customer training,

    Axis LMS enables you to deliver learning and training everywhere through a flexible and easy-to-use LMS that is designed to enhance your training, automate your workflows, and engage your learners.
    Learn More
  • 10
    MiroFish

    MiroFish

    A Simple and Universal Swarm Intelligence Engine

    MiroFish is a next-generation artificial intelligence prediction engine that leverages multi-agent technology and swarm-intelligence simulation to model, simulate, and forecast complex real-world scenarios. The system extracts “seed” information from sources such as breaking news, policy documents, and market signals to construct a high-fidelity digital parallel world populated by thousands of virtual agents with independent memory and behavior rules. Users can inject variables or conditions...
    Downloads: 1,477 This Week
    Last Update:
    See Project
  • 11
    BruteForceAI

    BruteForceAI

    Advanced LLM-powered brute-force tool combining AI intelligence

    BruteForceAI is an open-source security testing tool that applies large language models to the analysis of login forms and authentication flows in web applications. At a high level, the project uses AI to inspect HTML content, identify the relevant form elements, and automate selector discovery so that a tester does not need to hand-map every field before evaluation. It combines that analysis layer with automated credential testing workflows, framing itself as a more adaptive alternative to...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    TPOT

    TPOT

    A Python Automated Machine Learning tool that optimizes ML

    Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. TPOT stands for Tree-based Pipeline Optimization Tool. Consider TPOT your Data Science Assistant. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 13
    SuperDuperDB

    SuperDuperDB

    Integrate, train and manage any AI models and APIs with your database

    Build and manage AI applications easily without needing to move your data to complex pipelines and specialized vector databases. Integrate AI and vector search directly with your database including real-time inference and model training. Just using Python. A single scalable deployment of all your AI models and APIs which is automatically kept up-to-date as new data is processed immediately. No need to introduce an additional database and duplicate your data to use vector search and build on top of it. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 14
    Ploomber

    Ploomber

    The fastest way to build data pipelines

    Ploomber is an open-source framework designed to simplify the development and deployment of data science and machine learning pipelines. It allows developers to transform exploratory data analysis workflows into production-ready pipelines without rewriting large portions of code. The system integrates with common development environments such as Jupyter Notebook, VS Code, and PyCharm, enabling data scientists to continue working with familiar tools while building scalable workflows. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 16
    NLP

    NLP

    Open source NLP guide with models, methods, and real use cases

    NLP is an open source introductory resource for natural language processing, presented as a continuously updated book hosted on GitHub. It explains how machines process and understand human language, combining theory with practical examples. Its covers core NLP concepts such as text representation, feature extraction, and model evaluation, alongside hands-on implementations using tools like Word2Vec, TF-IDF, and FastText. It also introduces topic modeling with LDA, keyword extraction...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 17
    Materials Discovery: GNoME

    Materials Discovery: GNoME

    AI discovers 520000 stable inorganic crystal structures for research

    Materials Discovery (GNoME) is a large-scale research initiative by Google DeepMind focused on applying graph neural networks to accelerate the discovery of stable inorganic crystal materials. The project centers on Graph Networks for Materials Exploration (GNoME), a message-passing neural network architecture trained on density functional theory (DFT) data to predict material stability and energy formation. Using GNoME, DeepMind identified 381,000 new stable materials, later expanding the...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 18
    MLJAR Studio

    MLJAR Studio

    Python package for AutoML on Tabular Data with Feature Engineering

    We are working on new way for visual programming. We developed a desktop application called MLJAR Studio. It is a notebook-based development environment with interactive code recipes and a managed Python environment. All running locally on your machine. We are waiting for your feedback. The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyper-parameter tuning to find the best model. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 19
    Label Studio

    Label Studio

    Label Studio is a multi-type data labeling and annotation tool

    The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects on image, bboxes, polygons, circular, and keypoints supported. Partition image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...
    Downloads: 31 This Week
    Last Update:
    See Project
  • 20
    Fli

    Fli

    Google Flights MCP and Python Library

    Fli is a powerful Python library and command-line tool that provides direct programmatic access to Google Flights data through reverse-engineered API interactions rather than traditional web scraping. This approach enables faster, more reliable, and more stable access to flight information, avoiding the fragility associated with HTML parsing and UI changes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    FISSURE

    FISSURE

    The RF and reverse engineering framework for everyone

    FISSURE is an open-source radio frequency analysis and signal intelligence framework built to support software-defined radio research, wireless security experimentation, and protocol reverse engineering. The project brings together tools for capturing, inspecting, decoding, replaying, and analyzing RF signals across a wide range of wireless technologies. It is designed as a practical environment for researchers and operators who need to move from raw spectrum observation to structured...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 22
    Integuru v0

    Integuru v0

    The first AI agent that builds permissionless integrations

    Integuru is an open-source AI agent designed to automatically create integrations between software platforms by reverse-engineering their internal APIs. Instead of relying on official developer documentation or publicly available APIs, the system analyzes network traffic generated by user interactions within a web application. Developers capture browser requests and authentication data, which the agent then uses to infer the structure of the platform’s internal API endpoints. Based on this...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Toloka-Kit

    Toloka-Kit

    Toloka-Kit is a Python library for working with Toloka API

    ...Toloka entities are represented as Python classes. You can use them instead of accessing the API using JSON representations. There’s no need to validate JSON files and work with them directly. Support of both synchronous and asynchronous (via async/await) executions. Streaming support: build complex pipelines which send and receive data in real-time. For example, you can pass data between two related projects: one for data labeling, and another for its validation. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    Airweave

    Airweave

    Airweave lets agents search any app

    Airweave is an open-source platform that enables agents to semantically search across various applications, databases, and APIs. By transforming disparate data sources into a unified, searchable knowledge base, Airweave facilitates intelligent information retrieval through REST APIs or the MCP protocol. It's particularly useful for building AI agents that require access to structured and unstructured data across multiple platforms.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 25
    Magentic UI

    Magentic UI

    A research prototype of a human-centered web agent

    Magentic-UI is a research prototype developed by Microsoft that serves as a human-centered interface powered by a multi-agent system. It enables users to automate complex web tasks, such as browsing, form filling, and data analysis, while maintaining control over the process. The system emphasizes transparency and user involvement, making it suitable for tasks requiring both automation and human oversight.
    Downloads: 2 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB