Search Results for "8-puzzle reinforcement learning python" - Page 5

Showing 230 open source projects for "8-puzzle reinforcement learning python"

View related business solutions
  • ToogleBox: Simplify, Automate and Improve Google Workspace Functionalities Icon
    ToogleBox: Simplify, Automate and Improve Google Workspace Functionalities

    The must-have platform for Google Workspace

    ToogleBox was created as a solution to address the challenges faced by Google Workspace Super Admins. We developed a premium and secure Software-as-a-Service (SaaS) product completely based on specific customer needs. ToogleBox automates most of the manual processes when working with Google Workspace functionalities and includes additional features to improve the administrator experience.
    Learn More
  • Rev Your Digital Product Delivery Engine Icon
    Rev Your Digital Product Delivery Engine

    Enterprise-grade platform designed to connect strategy, planning, and execution across digital product development and software delivery

    Planview links your technology vision directly to teams' daily work, providing complete visibility and control over your digital product delivery ecosystem.
    Learn More
  • 1
    SetFit

    SetFit

    Efficient few-shot learning with Sentence Transformers

    SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers. It achieves high accuracy with little labeled data - for instance, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    LatentMAS

    LatentMAS

    Latent Collaboration in Multi-Agent Systems

    LatentMAS is an advanced framework for multi-agent reinforcement learning (MARL) that uses latent variable modeling to bridge perception and decision-making in environments where agents must coordinate under uncertainty. It provides mechanisms for agents to learn high-level latent representations of states, which simplifies complex sensory inputs into compact, actionable embeddings that facilitate both individual policy learning and inter-agent coordination. Using this latent space, the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    ML for Beginners

    ML for Beginners

    12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all

    ML-For-Beginners is a structured, project-driven curriculum that teaches foundational machine learning concepts with approachable math and lots of code. Organized as a multi-week course, it mixes short lectures with labs in notebooks so learners practice regression, classification, clustering, and recommendation techniques on real datasets. Each lesson aims to connect the algorithm to a relatable scenario, reinforcing intuition before diving into parameters, metrics, and trade-offs. The...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Agent S

    Agent S

    Agent S: an open agentic framework that uses computers like a human

    Agent S is an open-source agentic framework designed to enable autonomous computer use through an Agent-Computer Interface (ACI). Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Easily build robust connections between Salesforce and any platform Icon
    Easily build robust connections between Salesforce and any platform

    We help companies using Salesforce connect their data with a no-code Salesforce-native solution.

    Like having Postman inside Salesforce! Declarative Webhooks allows users to quickly and easily configure bi-directional integrations between Salesforce and external systems using a point-and-click interface. No coding is required, making it a fast and efficient and as a native solution, Declarative Webhooks seamlessly integrates with Salesforce platform features such as Flow, Process Builder, and Apex. You can also leverage the AI Integration Agent feature to automatically build your integration templates by providing it with links to API documentation.
    Learn More
  • 5
    MuJoCo Playground

    MuJoCo Playground

    An open source library for GPU-accelerated robot learning

    MuJoCo Playground, developed by Google DeepMind, is a GPU-accelerated suite of simulation environments for robot learning and sim-to-real research, built on top of MuJoCo MJX. It unifies a range of control, locomotion, and manipulation tasks into a consistent and scalable framework optimized for JAX and Warp backends. The project includes classic control benchmarks from dm_control, advanced quadruped and bipedal locomotion systems, and dexterous as well as non-prehensile manipulation setups....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Reflexion

    Reflexion

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Reflexion is a research-oriented AI framework that focuses on improving the reasoning and problem-solving capabilities of language model agents through iterative self-reflection and feedback loops. Instead of relying solely on a single-pass response, Reflexion enables agents to evaluate their own outputs, identify errors, and refine their reasoning over multiple iterations, leading to more accurate and reliable results. The framework introduces a mechanism where agents maintain a memory of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    GLM-4.5V

    GLM-4.5V

    GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning

    GLM-4.5V is the preceding iteration in the GLM-V series that laid much of the groundwork for general multimodal reasoning and vision-language understanding. It embodies the design philosophy of mixing visual and textual modalities into a unified model capable of general-purpose reasoning, content understanding, and generation, while already supporting a wide variety of tasks: from image captioning and visual question answering to content recognition, GUI-based agents, video understanding,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    GLM-4

    GLM-4

    GLM-4 series: Open Multilingual Multimodal Chat LMs

    GLM-4 is a family of open models from ZhipuAI that spans base, chat, and reasoning variants at both 32B and 9B scales, with long-context support and practical local-deployment options. The GLM-4-32B-0414 models are trained on ~15T high-quality data (including substantial synthetic reasoning data), then post-trained with preference alignment, rejection sampling, and reinforcement learning to improve instruction following, coding, function calling, and agent-style behaviors. The...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    minbpe

    minbpe

    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

    minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Accounting practice management software Icon
    Accounting practice management software

    Accountants, accounting firms, tax attorneys, tax professionals

    Canopy is a cloud-based practice management software for accounting and tax firms, offering tools for client engagement, document management, workflow automation, and time & billing. Its Client Engagement platform centralizes interactions with a secure portal, customizable branding, and email integration, while the Document Management system enables organized, paperless file storage. The Workflow module enhances visibility into tasks and projects through templates, task assignments, and automation, reducing human error. Additionally, the Time & Billing feature tracks billable hours, generates invoices, and processes payments, ensuring accurate financial management. With its comprehensive features, Canopy streamlines operations, reduces stress, and enhances client experiences.
    Learn More
  • 10
    Open X-Embodiment

    Open X-Embodiment

    Unified open dataset enabling cross-embodiment learning for robotics

    Open X-Embodiment is a large-scale collaborative initiative led by Google DeepMind to unify robotic learning datasets into a consistent and standardized format, simplifying access and usage across the robotics research community. Its primary goal is to make all available open-source robotic data interoperable by representing them using the RLDS (Reinforcement Learning Dataset Structure) episode format. This enables seamless integration for training, evaluation, and model development across...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    OpenManus

    OpenManus

    Open-source AI agent framework

    OpenManus is an open-source AI agent framework designed to autonomously execute complex, multi-step tasks by combining reasoning, planning, and tool use. It enables developers to build agents that can think, act, and iterate toward goals rather than simply responding to prompts. The platform emphasizes task decomposition, allowing agents to break down objectives into smaller steps and execute them sequentially or recursively. OpenManus supports integration with external tools, APIs, and...
    Downloads: 34 This Week
    Last Update:
    See Project
  • 12
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    PaSa

    PaSa

    An advanced paper search agent powered by large language models

    PaSa is an open-source “paper search agent” built around large language models (LLMs), designed to automate the process of academic literature retrieval with human-like decision making. Instead of simply translating a query into keywords and returning a flat list of matching papers, PaSa uses a dual-agent architecture (Crawler + Selector) that can iteratively search, read, analyze, and filter academic publications — simulating how a researcher might dig through citation networks, expand...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    MiniMax-M1

    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    openage

    openage

    Open source clone of the Age of Empires II engine

    openage is a free cross-platform RTS game engine that provides the mechanics of Age of Empires. Using modern technologies as C++17, OpenGL/GLSL, Python, Qt5 and CMake allows people using GNU/Linux, BSD, macOS or Windows to play the game natively. Our aim is to make openage a platform for the original Age of Empires games providing the same look and feel, but with more features for modding and multiplayer. openage uses an open API powered by our human-readable configuration language nyan. We...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 16
    RecBole

    RecBole

    A unified, comprehensive and efficient recommendation library

    A unified, comprehensive and efficient recommendation library. We design general and extensible data structures to unify the formatting and usage of various recommendation datasets. We implement more than 100 commonly used recommendation algorithms and provide formatted copies of 28 recommendation datasets. We support a series of widely adopted evaluation protocols or settings for testing and comparing recommendation algorithms. RecBole is developed based on Python and PyTorch for...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    MaxText

    MaxText

    A simple, performant and scalable Jax LLM

    MaxText is a high-performance, highly scalable open-source framework designed to train and fine-tune large language models using the JAX ecosystem. The project acts as both a reference implementation and a practical training library that demonstrates best practices for building and scaling transformer-based language models on modern accelerator hardware. It is optimized to run efficiently on Google Cloud TPUs and GPUs, enabling researchers and engineers to train models ranging from small...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Skywork-R1V4

    Skywork-R1V4

    Skywork-R1V is an advanced multimodal AI model series

    Skywork-R1V is an open-source multimodal reasoning model designed to extend the capabilities of large language models into vision-language tasks that require complex logical reasoning. The project introduces a model architecture that transfers the reasoning abilities of advanced text-based models into visual domains so the system can interpret images and perform multi-step reasoning about them. Instead of retraining both language and vision models from scratch, the framework uses a...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    The Alignment Handbook

    The Alignment Handbook

    Robust recipes to align language models with human and AI preferences

    The Alignment Handbook is an open-source resource created to provide practical guidance for aligning large language models with human preferences and safety requirements. The project focuses on the post-training stage of model development, where models are refined after pre-training to behave more helpfully, safely, and reliably in real-world applications. It provides detailed training recipes that explain how to perform tasks such as supervised fine-tuning, preference modeling, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    AReal

    AReal

    Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible

    AReaL is an open source, fully asynchronous reinforcement learning training system. AReal is designed for large reasoning and agentic models. It works with models that perform reasoning over multiple steps, agents interacting with environments. It is developed by the AReaL Team at Ant Group (inclusionAI) and builds upon the ReaLHF project. Release of training details, datasets, and models for reproducibility. It is intended to facilitate reproducible RL training on reasoning / agentic tasks,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 21
    fairseq2

    fairseq2

    FAIR Sequence Modeling Toolkit 2

    fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 22
    Step-Audio-EditX

    Step-Audio-EditX

    LLM-based Reinforcement Learning audio edit model

    Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Z80-μLM

    Z80-μLM

    Z80-μLM is a 2-bit quantized language model

    Z80-μLM is a retro-computing AI project that demonstrates a tiny language model (Z80-μLM) engineered to run on an 8-bit Z80 CPU by aggressively quantizing weights down to 2-bit precision. The repository provides a complete workflow where you train or fine-tune conversational models in Python, then export them into a format that can be executed on classic Z80 systems. A key deliverable is producing CP/M-compatible .COM binaries, enabling a genuinely vintage “chat with your computer” experience on real hardware or accurate emulators. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    FireRed-Image-Edit

    FireRed-Image-Edit

    General-purpose image editing model that delivers high-fidelity

    FireRed-Image-Edit is an open-source general-purpose image editing model and toolset designed to deliver high-fidelity, visually coherent edits across a wide range of editing tasks, from simple object modifications to complex enhancements like restoration and style preservation. It is built on a flexible text-to-image foundation model that has been extended with training paradigms including pretraining, supervised fine-tuning, and reinforcement learning to imbue the system with strong...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Ling-V2

    Ling-V2

    Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI

    Ling-V2 is an open-source family of Mixture-of-Experts (MoE) large language models developed by the InclusionAI research organization with the goal of combining state-of-the-art performance, efficiency, and openness for next-generation AI applications. It introduces highly sparse architectures where only a fraction of the model’s parameters are activated per input token, enabling models like Ling-mini-2.0 to achieve reasoning and instruction-following capabilities on par with much larger...
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB