Open Source Python Reinforcement Learning Frameworks

Browse free open source Python Reinforcement Learning Frameworks and projects below.

  • 1
    DeepSeek-V3

    Powerful AI language model (MoE) optimized for efficiency/performance

    DeepSeek-V3 is a robust Mixture-of-Experts (MoE) language model developed by DeepSeek, featuring a total of 671 billion parameters, with 37 billion activated per token. It employs Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to enhance computational efficiency. The model introduces an auxiliary-loss-free load balancing strategy and a multi-token prediction training objective to boost performance. Trained on 14.8 trillion diverse, high-quality tokens, DeepSeek-V3 underwent supervised fine-tuning and reinforcement learning to fully realize its capabilities. Evaluations indicate that it outperforms other open-source models and rivals leading closed-source models, achieving this with a training duration of 55 days on 2,048 Nvidia H800 GPUs, costing approximately $5.58 million.
    Downloads: 133 This Week
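The figures quoted in the DeepSeek-V3 description can be sanity-checked with a little arithmetic. The sketch below uses only the numbers from the listing; the implied per-GPU-hour rate is a derived value, not a figure stated in the source, and it assumes every GPU ran continuously for the full 55 days.

```python
# Figures quoted in the listing entry for DeepSeek-V3.
total_params_billion = 671
active_params_billion = 37
training_days = 55
gpu_count = 2048
reported_cost_usd = 5.58e6

# Share of the model's parameters activated for each token (MoE sparsity).
active_fraction = active_params_billion / total_params_billion

# Total GPU-hours, ASSUMING all GPUs ran 24 hours a day for the whole run.
gpu_hours = training_days * 24 * gpu_count

# Implied cost per GPU-hour (derived here, not stated in the listing).
implied_rate_usd = reported_cost_usd / gpu_hours

print(f"Active parameters per token: {active_fraction:.1%}")
print(f"GPU-hours under the continuous-use assumption: {gpu_hours:,}")
print(f"Implied cost per GPU-hour: ${implied_rate_usd:.2f}")
```

Under these assumptions, only a small fraction of the parameters is active per token, which is the efficiency argument the description makes for the MoE design.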
  • 2
    DeepSeek R1

    Open-source, high-performance AI model with advanced reasoning

    DeepSeek-R1 is an open-source large language model developed by DeepSeek, designed to excel in complex reasoning tasks across domains such as mathematics, coding, and language. DeepSeek-R1 offers unrestricted access for both commercial and academic use. The model employs a Mixture of Experts (MoE) architecture, comprising 671 billion total parameters with 37 billion active parameters per token, and supports a context length of up to 128,000 tokens. DeepSeek-R1's training centers on large-scale reinforcement learning (RL): the DeepSeek-R1-Zero variant was trained without supervised fine-tuning as a preliminary step, while DeepSeek-R1 itself adds cold-start data before RL, enabling the model to develop advanced reasoning capabilities. This approach has resulted in performance comparable to leading models like OpenAI's o1, while maintaining cost-efficiency. To further support the research community, DeepSeek has released distilled versions of the model based on architectures such as LLaMA and Qwen.
    Downloads: 120 This Week
  • 3
    TorchRL

    A modular, primitive-first, python-first PyTorch library

    TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch. TorchRL provides PyTorch and python-first, low and high-level abstractions for RL that are intended to be efficient, modular, documented, and properly tested. The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, such that researchers can easily swap components, transform them, or write new ones with little effort.
    Downloads: 44 This Week
  • 4
    LightZero

    [NeurIPS 2023 Spotlight] LightZero

    LightZero is an efficient, scalable, and open-source framework implementing MuZero, a powerful model-based reinforcement learning algorithm that learns to predict rewards and transitions without explicit environment models. Developed by OpenDILab, LightZero focuses on providing a highly optimized and user-friendly platform for both academic research and industrial applications of MuZero and similar algorithms.
    Downloads: 31 This Week
  • 5
    Agent S

    Agent S: an open agentic framework that uses computers like a human

    Agent S is an open-source agentic framework designed to enable autonomous computer use through an Agent-Computer Interface (ACI). Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. With optional local code execution, reflection mechanisms, and compositional planning, Agent S provides a scalable and research-driven framework for building advanced computer-use agents.
    Downloads: 11 This Week
  • 6
    Brax

    Massively parallel rigid body physics simulation

    Brax is a fast and fully differentiable physics engine for large-scale rigid body simulations, built on JAX. It is designed for research in reinforcement learning and robotics, enabling efficient simulations and gradient-based optimization.
    Downloads: 10 This Week
  • 7
    RWARE

    A multi-agent reinforcement learning environment

    robotic-warehouse is a simulation environment and framework for robotic warehouse automation, enabling research and development of AI and robotic agents to manage warehouse logistics, such as item picking and transport.
    Downloads: 9 This Week
  • 8
    Stable Baselines3

    PyTorch version of Stable Baselines

    Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines. You can read a detailed presentation of Stable Baselines3 in the v1.0 blog post or our JMLR paper. These algorithms will make it easier for the research community and industry to replicate, refine, and identify new ideas, and will create good baselines to build projects on top of. We expect these tools will be used as a base around which new ideas can be added, and as a tool for comparing a new approach against existing ones. We also hope that the simplicity of these tools will allow beginners to experiment with a more advanced toolset, without being buried in implementation details.
    Downloads: 9 This Week
  • 9
    TextWorld

    TextWorld is a sandbox learning environment for the training

    TextWorld is a learning environment designed to train reinforcement learning agents to play text-based games, where actions and observations are entirely in natural language. Developed by Microsoft Research, TextWorld focuses on language understanding, planning, and interaction in complex, narrative-driven environments. It generates games procedurally, enabling scalable testing of agents’ natural language processing and decision-making abilities.
    Downloads: 9 This Week
  • 10
    Weights and Biases

    Tool for visualizing and tracking your machine learning experiments

    Use W&B to build better models faster. Track and visualize all the pieces of your machine learning pipeline, from datasets to production models. Quickly identify model regressions. Use W&B to visualize results in real time, all in a central dashboard. Focus on the interesting ML. Spend less time manually tracking results in spreadsheets and text files. Capture dataset versions with W&B Artifacts to identify how changing data affects your resulting models. Reproduce any model, with saved code, hyperparameters, launch commands, input data, and resulting model weights. Set wandb.config once at the beginning of your script to save your hyperparameters, input settings (like dataset name or model type), and any other independent variables for your experiments. This is useful for analyzing your experiments and reproducing your work in the future. Setting configs also allows you to visualize the relationships between features of your model architecture or data pipeline and model performance.
    Downloads: 9 This Week
  • 11
    Alibi Explain

    Algorithms for explaining machine learning models

    Alibi is a Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-quality implementations of black-box, white-box, local and global explanation methods for classification and regression models.
    Downloads: 8 This Week
  • 12
    Cosmos-RL

    Cosmos-RL is a flexible and scalable Reinforcement Learning framework

    Cosmos-RL is a scalable reinforcement learning framework designed specifically for physical AI systems such as robotics, autonomous agents, and multimodal models. It provides a distributed training architecture that separates policy learning and environment rollout processes, enabling efficient and asynchronous reinforcement learning at scale. The framework supports multiple parallelism strategies, including tensor, pipeline, and data parallelism, allowing it to leverage large GPU clusters effectively. It is built with compatibility in mind, supporting popular model families such as LLaMA, Qwen, and diffusion-based world models, as well as integration with Hugging Face ecosystems. cosmos-rl also includes support for advanced RL algorithms, low-precision training, and fault-tolerant execution, making it suitable for large-scale production workloads.
    Downloads: 8 This Week
  • 13
    DI-engine

    OpenDILab Decision AI Engine

    DI-engine is a unified reinforcement learning (RL) platform for reproducible and scalable RL research. It offers modular pipelines for various RL algorithms, with an emphasis on production-level training and evaluation.
    Downloads: 8 This Week
  • 14
    BindsNET

    Simulation of spiking neural networks (SNNs) using PyTorch

    A Python package used for simulating spiking neural networks (SNNs) on CPUs or GPUs using PyTorch Tensor functionality. BindsNET is a spiking neural network simulation library geared towards the development of biologically inspired algorithms for machine learning. This package is used as part of ongoing research on applying SNNs to machine learning (ML) and reinforcement learning (RL) problems in the Biologically Inspired Neural & Dynamical Systems (BINDS) lab.
    Downloads: 7 This Week
  • 15
    Deep Reinforcement Learning for Keras

    keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and seamlessly integrates with the deep learning library Keras. Furthermore, keras-rl works with OpenAI Gym out of the box. This means that evaluating and playing around with different algorithms is easy. Of course, you can extend keras-rl according to your own needs. You can use built-in Keras callbacks and metrics or define your own. Even more so, it is easy to implement your own environments and even algorithms by simply extending some simple abstract classes. Documentation is available online.
    Downloads: 7 This Week
  • 16
    Multi-Agent Orchestrator

    Flexible and powerful framework for managing multiple AI agents

    Multi-Agent Orchestrator is an AI coordination framework that enables multiple intelligent agents to work together to complete complex, multi-step workflows.
    Downloads: 7 This Week
  • 17
    VectorizedMultiAgentSimulator (VMAS)

    VMAS is a vectorized differentiable simulator

    VectorizedMultiAgentSimulator is a high-performance, vectorized simulator for multi-agent systems, focusing on large-scale agent interactions in shared environments. It is designed for research in multi-agent reinforcement learning, robotics, and autonomous systems where thousands of agents need to be simulated efficiently.
    Downloads: 7 This Week
  • 18
    AgentUniverse

    agentUniverse is an LLM multi-agent framework

    AgentUniverse is a multi-agent AI framework that enables coordination between multiple intelligent agents for complex task execution and automation.
    Downloads: 6 This Week
  • 19
    AnyTrading

    The most simple, flexible, and comprehensive OpenAI Gym trading

    gym-anytrading is an OpenAI Gym-compatible environment designed for developing and testing reinforcement learning algorithms on trading strategies. It simulates trading environments for financial markets, including stocks and forex.
    Downloads: 6 This Week
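To illustrate what a "Gym-compatible" trading environment means in practice, here is a minimal sketch of the reset/step loop such environments expose. `ToyTradingEnv`, `run_episode`, and `momentum_policy` are hypothetical names invented for this example and are not part of gym-anytrading; the sketch uses the classic four-value `step` return, whereas newer Gym and Gymnasium releases return five values.

```python
class ToyTradingEnv:
    """Hypothetical Gym-style trading environment (NOT the gym-anytrading API)."""

    def __init__(self, prices, window=2):
        if len(prices) <= window:
            raise ValueError("need more prices than the observation window")
        self.prices = list(prices)
        self.window = window
        self._t = window

    def _observation(self):
        # The agent sees the last `window` prices before the current tick.
        return self.prices[self._t - self.window:self._t]

    def reset(self):
        self._t = self.window
        return self._observation()

    def step(self, action):
        # action: 1 = hold a long position over the next tick, 0 = stay flat.
        reward = action * (self.prices[self._t] - self.prices[self._t - 1])
        self._t += 1
        done = self._t >= len(self.prices)
        return self._observation(), reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the summed reward (profit in price units)."""
    obs = env.reset()
    total, done = 0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


def momentum_policy(obs):
    # Go long only if the most recent price rose.
    return 1 if obs[-1] > obs[-2] else 0


prices = [10, 11, 12, 11, 13]
always_long = run_episode(ToyTradingEnv(prices), lambda obs: 1)
momentum = run_episode(ToyTradingEnv(prices), momentum_policy)
print(always_long, momentum)
```

Because any policy only needs `reset` and `step`, the same loop works unchanged whether the policy is a hand-written rule or a trained reinforcement learning agent.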
  • 20
    Atropos

    Language Model Reinforcement Learning Environments frameworks

    Atropos is a comprehensive open-source framework for reinforcement learning (RL) environments tailored specifically to work with large language models (LLMs). Designed as a scalable ecosystem of environment microservices, Atropos allows researchers and developers to collect, evaluate, and manage trajectories (sequences of actions and outcomes) generated by LLMs across a variety of tasks—from static dataset benchmarks to dynamic interactive games and real-world scenario environments. It provides foundational tooling for asynchronous RL loops where environment services communicate with trainers and inference engines, enabling complex workflow orchestration in distributed and parallel setups. This framework facilitates experimentation with RLHF (Reinforcement Learning from Human Feedback), RLAIF, or multi-turn training approaches by abstracting environment logic, scoring, and logging into reusable components.
    Downloads: 6 This Week
  • 21
    CleanRL

    High-quality single file implementation of Deep Reinforcement Learning

    CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementation with research-friendly features. The implementation is clean and simple, yet we can scale it to run thousands of experiments using AWS Batch. CleanRL is not a modular library and therefore it is not meant to be imported. At the cost of duplicate code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code so it gives you great debugging experience and you don't have to do a lot of subclassing like sometimes in modular DRL libraries).
    Downloads: 6 This Week
  • 22
    Google Research Football

    Check out the new game server

    Google Research Football is a reinforcement learning environment simulating soccer matches. It focuses on learning complex behaviors such as team collaboration and strategy formation in competitive settings.
    Downloads: 6 This Week
  • 23
    H2O LLM Studio

    Framework and no-code GUI for fine-tuning LLMs

    Welcome to H2O LLM Studio, a framework and no-code GUI designed for fine-tuning state-of-the-art large language models (LLMs). You can also use H2O LLM Studio with the command line interface (CLI) and specify the configuration file that contains all the experiment parameters. To finetune using H2O LLM Studio with CLI, activate the pipenv environment by running make shell. With H2O LLM Studio, training your large language model is easy and intuitive. First, upload your dataset and then start training your model. Start by creating an experiment. You can then monitor and manage your experiment, compare experiments, or push the model to Hugging Face to share it with the community.
    Downloads: 6 This Week
  • 24
    MedicalGPT

    MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training

    MedicalGPT trains medical GPT models with a ChatGPT-style training pipeline that covers secondary pre-training, supervised fine-tuning, reward modeling, and reinforcement learning.
    Downloads: 6 This Week
  • 25
    PaLM + RLHF - Pytorch

    Implementation of RLHF (Reinforcement Learning with Human Feedback)

    PaLM-rlhf-pytorch is a PyTorch implementation of Pathways Language Model (PaLM) with Reinforcement Learning from Human Feedback (RLHF). It is designed for fine-tuning large-scale language models with human preference alignment, similar to OpenAI’s approach for training models like ChatGPT.
    Downloads: 6 This Week