Open Source Python Artificial Intelligence Software - Page 3

Python Artificial Intelligence Software

Browse free open source Python Artificial Intelligence Software and projects below. Use the toggles on the left to filter open source Python Artificial Intelligence Software by OS, license, language, programming language, and project status.

1

Memvid

Video-based AI memory library. Store millions of text chunks in MP4

Memvid encodes text chunks as QR codes within MP4 frames to build a portable “video memory” for AI systems. This innovative approach uses standard video containers and offers millisecond-level semantic search across large corpora with dramatically less storage than vector DBs. It's self-contained—no DB needed—and supports features like PDF indexing, chat integration, and cloud dashboards.

Downloads: 61 This Week

Last Update: 2026-03-13
2

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything

X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for tasks such as object detection, segmentation, classification, tracking, and pose estimation. The tool is built with an interactive graphical interface that simplifies annotation workflows and allows users to draw and edit labels directly on visual data. It also supports a wide range of export formats compatible with popular machine learning pipelines, making it easier to integrate with training frameworks.

Downloads: 61 This Week

Last Update: 2026-03-26
3

Determined

Determined, deep learning training platform

The fastest and easiest way to build deep learning models. Distributed training without changing your model code. Determined takes care of provisioning machines, networking, data loading, and fault tolerance. Build more accurate models faster with scalable hyperparameter search, seamlessly orchestrated by Determined. Use state-of-the-art algorithms and explore results with our hyperparameter search visualizations. Interpret your experiment results using the Determined UI and TensorBoard, and reproduce experiments with artifact tracking. Deploy your model using Determined's built-in model registry. Easily share on-premise or cloud GPUs with your team. Determined’s cluster scheduling offers first-class support for deep learning and seamless spot instance support. Check out examples of how you can use Determined to train popular deep learning models at scale.

Downloads: 57 This Week

Last Update: 2025-03-19
4

GPT-SoVITS

1 min voice data can also be used to train a good TTS model

GPT‑SoVITS is a state-of-the-art voice conversion and TTS system that enables zero‑shot and few‑shot synthesis based on a short vocal sample (e.g., 5 seconds). It supports cross‑lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It is powered by the VITS architecture, enhanced for few‑sample adaptation and real‑time usability.

Downloads: 57 This Week

Last Update: 2025-07-29
5

Lyrebird

Simple and powerful voice changer for Linux, written with Python & GTK

Downloads: 53 This Week

Last Update: 2024-06-27
6

FLUX.2

Official inference repo for FLUX.2 models

FLUX.2 is a state-of-the-art open-weight image generation and editing model released by Black Forest Labs aimed at bridging the gap between research-grade capabilities and production-ready workflows. The model offers both text-to-image generation and powerful image editing, including editing of multiple reference images, with fidelity, consistency, and realism that push the limits of what open-source generative models have achieved. It supports high-resolution output (up to ~4 megapixels), which allows for photography-quality images, detailed product shots, infographics or UI mockups rather than just low-resolution drafts. FLUX.2 is built with a modern architecture (a flow-matching transformer + a revamped VAE + a strong vision-language encoder), enabling strong prompt adherence, correct rendering of text/typography in images, reliable lighting, layout, and physical realism, and consistent style/character/product identity across multiple generations or edits.

Downloads: 51 This Week

Last Update: 2026-03-12
7

KaTrain

Improve your Baduk skills by training with KataGo

KaTrain is an advanced training and analysis tool for the board game Go that leverages the powerful KataGo AI engine to provide real-time feedback and in-depth game review capabilities. It is designed to help players of all skill levels improve by identifying mistakes, analyzing move efficiency, and offering alternative strategies based on AI evaluation. The application allows users to play against AI opponents with adjustable difficulty, including intentionally weakened versions of the engine that simulate human-like play styles. One of its key strengths is its ability to generate detailed post-game analyses, highlighting the moves that resulted in the greatest loss of points and suggesting improvements. KaTrain also includes interactive learning features such as retrying moves, exploring variations, and visualizing territory control probabilities.
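The post-game review idea described above, ranking moves by how many points they gave up, can be sketched in a few lines. This is only an illustration: the `(move_label, lead_before, lead_after)` tuple format and the `biggest_mistakes` helper are hypothetical and are not KaTrain's or KataGo's data model. Score leads are assumed to be expressed from the perspective of the player making the move.

```python
def biggest_mistakes(evaluations, top_n=3):
    """Rank moves by point loss.

    evaluations: list of (move_label, lead_before, lead_after) tuples,
    where both leads are from the mover's perspective (hypothetical format).
    """
    # Point loss is how much the mover's estimated lead dropped.
    losses = [(label, before - after) for label, before, after in evaluations]
    # Keep only moves that actually lost points.
    losses = [item for item in losses if item[1] > 0]
    return sorted(losses, key=lambda item: item[1], reverse=True)[:top_n]


# Made-up evaluations for a five-move sequence.
game = [
    ("B1", 0.5, 0.4),
    ("W2", -0.4, -0.4),
    ("B3", 0.4, -3.1),
    ("W4", 3.1, 1.0),
    ("B5", -1.0, -0.8),
]
worst = biggest_mistakes(game, top_n=2)
worst_labels = [label for label, _ in worst]
```

Here `worst_labels` lists the two costliest moves first, which is the kind of ordering a review screen could use to decide what to highlight.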

Downloads: 50 This Week

Last Update: 2026-03-19
8

FlashAttention

Fast and memory-efficient exact attention

FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations. The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. By improving both forward and backward pass efficiency, it enables training and inference of large language models with longer sequence lengths and higher throughput. The library integrates with PyTorch and supports various attention configurations, including causal masking, multi-query attention, and rotary embeddings.
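To illustrate why attention can be computed exactly without materializing the full score matrix, here is a minimal pure-Python sketch of block-wise attention with a running ("online") softmax. It is not FlashAttention's GPU kernel or its API; the function names `naive_attention` and `tiled_attention` and the `block_size` parameter are assumptions made for this example.

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds the full row of scores for each query."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Same result, but K and V are visited one block at a time."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    for q in Q:
        running_max = float("-inf")
        running_sum = 0.0
        acc = [0.0] * dv  # unnormalized weighted sum of V rows
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new running maximum.
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, v in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * x for a, x in zip(acc, v)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Only one block of scores exists at any time in `tiled_attention`, so the extra memory per query depends on `block_size` rather than on the sequence length, while the output matches the naive version up to floating-point rounding.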

Downloads: 49 This Week

Last Update: 2026-03-18
9

Mycroft

Mycroft Core, the Mycroft Artificial Intelligence platform

Mycroft is the world’s leading open source voice assistant. It is private by default and completely customizable. Our software runs on many platforms: on desktop, our reference hardware, a Raspberry Pi, or your own custom hardware. Our open-source, modular system can be ported to your device or environment, at any price point. Whether you make voice assistants, televisions, or microwaves, and whether you have a 5-room BnB or a 1000-room hotel, your customers will get access to all the necessities of a voice assistant. Our software and essential services are free (as in freedom) and also gratis (at no cost to you or them), and especially not at the cost of their (or your) privacy. Your customers will be able to upgrade their experience with premium content and services. The Mycroft open source voice stack can be freely remixed, extended, and deployed anywhere. Mycroft may be used in anything from a science project to a global enterprise environment.

Downloads: 48 This Week

Last Update: 2023-03-21
10

MarkItDown

Python tool for converting files and office documents to Markdown

MarkItDown is a lightweight Python utility developed by Microsoft for converting various files and office documents to Markdown format. It is particularly useful for preparing documents for use with large language models and related text analysis pipelines.

1 Review

Downloads: 47 This Week

Last Update: 2026-02-20
11

TorchRL

A modular, primitive-first, python-first PyTorch library

TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch. TorchRL provides PyTorch and python-first, low and high-level abstractions for RL that are intended to be efficient, modular, documented, and properly tested. The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, such that researchers can easily swap components, transform them, or write new ones with little effort.

Downloads: 47 This Week

Last Update: 2026-02-05
12

TurboQuant+

Implementation of TurboQuant (ICLR 2026)

TurboQuant Plus is an extended and enhanced version of quantization tooling aimed at improving neural network efficiency through advanced compression and optimization strategies. It builds upon the concept of reducing model precision to accelerate inference while attempting to maintain or recover accuracy through refined techniques. The project explores additional enhancements such as improved calibration, adaptive quantization, and potentially hybrid precision approaches that combine multiple levels of compression. It is designed to be used in conjunction with modern machine learning workflows, particularly those involving large models that require optimization for deployment. TurboQuant Plus focuses on experimentation and performance tuning, allowing developers to test different configurations and evaluate trade-offs. Its architecture supports extensibility, enabling further development of quantization methods and integration with existing ML pipelines.
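As a generic illustration of "reducing model precision", the sketch below applies plain symmetric 8-bit quantization with a max-abs calibration step. This is a textbook baseline, not the TurboQuant algorithm or this project's code; the helper names `quantize_int8` and `dequantize` are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    # Calibration: choose the scale from the largest magnitude observed.
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [x * scale for x in quantized]


weights = [0.02, -1.27, 0.635, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The round-trip error per value is bounded by about half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Comparing `max_error` against `scale / 2` is one simple way to evaluate the accuracy side of the trade-off mentioned above; more refined schemes typically change how the scale is chosen or use different precisions for different parts of a model.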

Downloads: 47 This Week

Last Update: 5 days ago
13

LMDeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs

LMDeploy is a toolkit designed for compressing, deploying, and serving large language models (LLMs). It offers tools and workflows to optimize LLMs for production environments, ensuring efficient performance and scalability. LMDeploy supports various model architectures and provides deployment solutions across different platforms.

Downloads: 45 This Week

Last Update: 6 days ago
14

Mistral Vibe CLI

Minimal CLI coding agent by Mistral

Mistral Vibe is an AI-powered “vibe-coding” command-line interface (CLI) and coding-assistant framework built by Mistral AI to let developers write, refactor, search, and manage code through natural language and context-aware automation, rather than manual typing only. It aims to take developers out of repetitive boilerplate and let them stay “in the flow”: you can ask the tool to generate functions, refactor code, search across the codebase, manipulate files, commit changes via Git, or run commands — all from a unified CLI interface. Behind the scenes, it leverages Mistral’s coding-optimized LLM stack (including models tuned for code understanding and generation), with project-wide context awareness: it scans your file structure, Git status, and recent history to inform suggestions so that generated code aligns with existing context.

Downloads: 44 This Week

Last Update: 5 hours ago
15

vLLM

A high-throughput and memory-efficient inference and serving engine

vLLM is a fast and easy-to-use library for LLM inference and serving. It offers high-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.

Downloads: 42 This Week

Last Update: 2026-04-03
16

Basic Pitch

A lightweight audio-to-MIDI converter with pitch bend detection

Basic Pitch is a Python library for Automatic Music Transcription (AMT), using a lightweight neural network developed by Spotify's Audio Intelligence Lab. It's small, easy to use, pip install-able, and npm install-able via its sibling repo. Basic Pitch may be simple, but it is far from "basic"! It is efficient and easy to use, and its multi-pitch support, its ability to generalize across instruments, and its note accuracy compete with much larger and more resource-hungry AMT systems. Provide a compatible audio file and Basic Pitch will generate a MIDI file, complete with pitch bends. Basic Pitch is instrument-agnostic and supports polyphonic instruments, so you can freely enjoy transcription of all your favorite music, no matter what instrument is used. Basic Pitch works best on one instrument at a time.
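For readers unfamiliar with how a detected pitch relates to a MIDI note plus a pitch bend, the standard equal-temperament conversion can be sketched as follows. This is general music arithmetic, not Basic Pitch's internal method; the function name `frequency_to_midi` and the choice to express the bend in cents are assumptions for this example.

```python
import math


def frequency_to_midi(freq_hz, a4_hz=440.0):
    """Return (note_number, bend_cents) for a frequency in hertz.

    MIDI note 69 is A4; each semitone is a factor of 2 ** (1 / 12).
    """
    exact = 69 + 12 * math.log2(freq_hz / a4_hz)
    note = round(exact)               # nearest MIDI note number
    bend_cents = (exact - note) * 100  # deviation from that note, in cents
    return note, bend_cents


concert_a = frequency_to_midi(440.0)
slightly_sharp_a = frequency_to_midi(445.0)
middle_c = frequency_to_midi(261.63)
```

A tone at 445 Hz still maps to note 69, but with a positive bend of a little under 20 cents, which is the kind of detail a plain note-on/note-off transcription would lose.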

Downloads: 40 This Week

Last Update: 2024-08-16
17

Frigate

NVR with realtime local object detection for IP cameras

Frigate is a complete and local NVR designed for Home Assistant with AI object detection. It uses OpenCV and TensorFlow to perform realtime object detection locally for IP cameras. Use of a Google Coral Accelerator is optional, but highly recommended. The Coral will outperform even the best CPUs and can process 100+ FPS with very little overhead.

Downloads: 40 This Week

Last Update: 2026-03-19
18

Kimi Code CLI

Kimi Code CLI is your next CLI agent

Kimi CLI is a command-line AI agent that brings an intelligent software development assistant directly into your terminal, helping you with coding tasks, shell operations, and workflow automation without leaving your command prompt. It supports an interactive shell-like user interface where you can chat with the agent, request code edits, run shell commands, and receive contextual suggestions as you work, creating a seamless blend of AI-augmented development and traditional terminal usage. The tool includes integration with Zsh so that users can activate AI assistance via a hotkey while staying within their favorite shell environment, and it can serve as an Agent Client Protocol (ACP) server to bridge AI functionality into compatible IDEs and editors. Its support for well-established MCP tool configuration conventions lets developers connect the CLI to external tools and services during workflows, expanding its capabilities beyond simple queries into orchestrated development tasks.

Downloads: 40 This Week

Last Update: 22 hours ago
19

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is an open-source Python package developed by Lightricks that provides inference and LoRA training for the LTX-2 audio-video model.

Downloads: 40 This Week

Last Update: 2026-03-30
20

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.

Downloads: 40 This Week

Last Update: 2026-01-15
21

DiffSinger

Singing Voice Synthesis via Shallow Diffusion Mechanism

DiffSinger is an open-source PyTorch implementation of a diffusion-based acoustic model for singing-voice synthesis (SVS) and also text-to-speech (TTS) in a related variant. The core idea is to view generation of a sung voice (mel-spectrogram) as a diffusion process: starting from noise, the model iteratively “denoises” while being conditioned on a music score (lyrics, pitch, musical timing). This avoids some of the typical problems of prior SVS models — like over-smoothing or unstable GAN training — and produces more realistic, expressive, and natural-sounding singing. The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.

Downloads: 39 This Week

Last Update: 2025-11-28
22

WanGP

AI video generator optimized for low-VRAM and older GPUs

Wan2GP is an open source AI video generation toolkit designed to make modern generative models accessible on consumer-grade hardware with limited GPU memory. It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and certain AMD GPUs. Wan2GP provides a full web-based interface that simplifies interaction with complex generative pipelines, making it easier to configure prompts, models, and rendering settings. It also integrates a wide range of utilities such as prompt enhancement, mask editing, motion design, and extraction tools for pose, depth, and flow data to support advanced video workflows.

Downloads: 39 This Week

Last Update: 5 days ago
23

Coqui TTS

A deep learning toolkit for Text-to-Speech, battle-tested in research

TTS is a library for advanced Text-to-Speech generation. Built on the latest research, it was designed to achieve the best trade-off among ease of training, speed, and quality. TTS comes with pre-trained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. It provides high-performance deep learning models for Text2Speech tasks: Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech), a speaker encoder to compute speaker embeddings efficiently, and vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN). Other features include fast and efficient model training; detailed training logs on the terminal and TensorBoard; support for multi-speaker TTS; an efficient, flexible, and lightweight but feature-complete Trainer API; released and ready-to-use models; tools to curate Text2Speech datasets under dataset_analysis; and utilities to use and test your models.

Downloads: 38 This Week

Last Update: 2023-12-12
24

EasyOCR

Ready-to-use OCR with 80+ supported languages

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic. EasyOCR is a Python module for extracting text from images. It is a general OCR that can read both natural scene text and dense text in documents. We currently support 80+ languages and are expanding. Second-generation models offer a size several times smaller, inference several times faster, additional characters, and accuracy comparable to the first-generation models. EasyOCR will choose the latest model by default, but you can also specify which model to use. Model weights for the chosen language will be automatically downloaded, or you can download them manually from the model hub. The idea is to be able to plug any state-of-the-art model into EasyOCR. There are a lot of geniuses trying to make better detection/recognition models, but we are not trying to be geniuses here. We just want to make their work quickly accessible to the public.

Downloads: 38 This Week

Last Update: 2024-09-24
25

edge-tts

Use Microsoft Edge's online text-to-speech service from Python

edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. It also supports generating subtitle files (such as SRT or VTT) alongside the speech, which is handy for video narration, e-learning, or accessibility workflows. From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. The library is asynchronous under the hood, which makes it efficient for batch jobs or web services that need to synthesize many utterances concurrently.
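The point about asynchronous batch synthesis can be illustrated with the standard library alone. The sketch below does not call edge-tts or any network service: `synthesize` is a hypothetical stand-in coroutine that only simulates latency, and `synthesize_batch` and `max_concurrent` are names invented for this example.

```python
import asyncio


async def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a network call to a TTS service."""
    await asyncio.sleep(0.01)    # simulate network latency
    return text.encode("utf-8")  # placeholder for real audio bytes


async def synthesize_batch(texts, max_concurrent=4):
    """Run many synthesis requests concurrently, capped by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(text):
        async with semaphore:
            return await synthesize(text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(t) for t in texts))


results = asyncio.run(synthesize_batch(["hello", "world", "again"]))
```

Because the requests overlap while each one waits on I/O, the batch finishes in roughly the time of the slowest request per concurrency slot rather than the sum of all requests. The semaphore keeps the number of in-flight requests bounded, which is usually sensible when talking to a remote service.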

Downloads: 38 This Week

Last Update: 2025-12-12