Showing 290 open source projects for "python voice synthesis"

  • 1
    ebook2audiobook

    Generate audiobooks from e-books, voice cloning & 1107+ languages

    ebook2audiobook is a tool to convert legally obtained eBooks (non-DRM) into fully narrated audiobooks, complete with chapters and metadata. It automates the pipeline: it reads the eBook file, splits it into appropriate segments (chapters, paragraphs), uses text-to-speech (TTS) models to synthesize audio, optionally applies voice cloning, and outputs a final audiobook — ideal for people who prefer listening over reading, or for accessibility purposes. The tool supports a wide array of...
    Downloads: 29 This Week
  • 2
    Sopro TTS

    A lightweight text-to-speech model with zero-shot voice cloning

    ...The model is designed to work with a small set of dependencies and to be accessible for developers who want offline TTS with customizable voice style, including options for streaming or non-streaming generation modes. Users can install it with standard Python tools, run a demo server locally, and experiment with CLI or Python API usage for producing synthetic speech.
    Downloads: 0 This Week
  • 3
    Rhino

    On-device Speech-to-Intent engine powered by deep learning

    Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of interest, in real time. Picovoice is an end-to-end platform for embedding private voice AI into any software in a few lines of code. Design with no limits on top of a modular platform. Create use-case-specific voice AI models in seconds. Develop voice features with a few lines of code using intuitive and cross-platform SDKs. Deliver voice AI everywhere: on-device, mobile, web browsers,...
    Downloads: 0 This Week
  • 4
    OpenMontage

    World's first open-source, agentic video production system

    OpenMontage is an open-source, agent-driven video production system that transforms AI coding assistants into fully automated multimedia creation pipelines. Instead of focusing on a single capability such as text-to-video generation, it treats video production as a structured, multi-stage workflow that mirrors how a real production team operates, including research, scripting, asset generation, editing, and final rendering. The system orchestrates a large collection of tools and models...
    Downloads: 53 This Week
  • 5
    Rasa

    Open source machine learning framework to automate text conversations

    ...Rasa uses Poetry for packaging and dependency management. If you want to build it from the source, you have to install Poetry first. By default, Poetry will try to use the currently activated Python version to create the virtual environment for the current project automatically.
    Downloads: 7 This Week
  • 6
    MetaVoice-1B

    Foundational model for human-like, expressive TTS

    MetaVoice — in the form of its source repository “metavoice-src” — is a large-scale text-to-speech (TTS) model. Specifically, the base model (MetaVoice-1B) uses around 1.2 billion parameters and has been trained on a massive dataset — reportedly around 100,000 hours of speech data. The goal is to provide human-like, expressive, and flexible TTS: able to generate natural-sounding speech that can handle diverse inputs and likely generalize over voice styles, intonation, prosody, and perhaps...
    Downloads: 0 This Week
  • 7
    OpenAI-Compatible Edge-TTS API

    Free, high-quality text-to-speech API endpoint to replace OpenAI

    OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...
    Downloads: 3 This Week
  • 8
    DragonianVoice

    C++ inference library for multiple SVC/TTS

    DragonianVoice is a C++ inference library that unifies multiple speech synthesis, voice conversion, and singing voice synthesis models under a single, high-performance ONNX-based framework. It focuses on being a reusable native library rather than a full UI product, with bindings for C, C++, and C# so it can be embedded into other applications or engines. The project supports a wide range of model families: TTS models such as Tacotron2, VITS, EmotionalVITS, BERTVits2, and GPT-SoVITS; SVC systems such as SoVitsSvc (v2/v3/v4), RVC, DiffSvc, DiffusionSvc, FishDiffusion, and ReflowSvc; singing systems such as DiffSinger; and related pitch/feature extractors such as FCPE and RMVPE. ...
    Downloads: 0 This Week
  • 9
    Bailing

    Bailing is a voice dialogue robot similar to GPT-4o

    Bailing is an open-source voice-dialogue assistant designed to deliver natural voice-based conversations by combining automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) in a single pipeline. Its goal is to offer a “voice-first” chat experience similar to what one might expect from a system like GPT-4o, but fully open and deployable by users. The project is modular: each core function — ASR, VAD, LLM, TTS — exists as a...
    Downloads: 0 This Week
  • 10
    OpenHome Abilities

    Open-source abilities for OpenHome agents

    OpenHome Abilities is an open-source repository of modular voice AI plugins created for OpenHome agents, giving developers a lightweight way to extend what an agent can do through spoken triggers. Each ability is intentionally simple in structure, centering on a single main.py file that contains the core Python logic, which lowers the barrier to building and sharing custom behaviors. The system is meant to support a wide range of voice-driven actions, from API calls and media playback to quiz flows, device control, and multi-turn conversations, so it functions as a practical extension framework rather than a narrow template library. ...
    Downloads: 1 This Week
  • 11
    ShortGPT

    AI framework for automated short video creation and editing tools

    ...It can automatically assemble videos by combining generated scripts, sourced media assets, captions, and synthesized voice narration. A modular editing system based on structured markup and JSON allows editing steps to be broken into manageable components that can be interpreted by language models.
    Downloads: 4 This Week
  • 12
    ACE-Step 1.5

    The most powerful local music generation model

    ACE-Step 1.5 is an advanced open-source foundation model for AI-driven music generation that pushes beyond traditional limitations in speed, musical coherence, and controllability by innovating in architecture and training design. It integrates cutting-edge generative techniques—such as diffusion-based synthesis combined with compressed autoencoders and lightweight transformer elements—to produce high-quality full-length music tracks with rapid inference times, capable of generating a...
    Downloads: 80 This Week
  • 13
    Hunyuan3D-2.1

    From Images to High-Fidelity 3D Assets

    ...It improves on prior versions by using a PBR (Physically Based Rendering) texture pipeline, which enables realistic material effects such as reflections and subsurface scattering, and by allowing community fine-tuning and extension. It supports both shape generation (mesh geometry) and texture generation modules. It runs on macOS, Windows, and Linux via Python / PyTorch, including diffusers-style APIs.
    Downloads: 14 This Week
  • 14
    PyGPT

    Open source personal AI Assistant for Linux, Windows and Mac

    ...It lets you talk in chat mode and in completion mode, and generate images using DALL-E 2. PyGPT also gives GPT access to the Internet via the Google Custom Search API and the Wikipedia API, and includes voice synthesis using the Microsoft Azure Text-to-Speech API. The application also supports context memory, context storage, and a history of contexts that can be restored at any time, for example to continue a conversation from an earlier point, and it has an intuitive system of presets for quickly creating and managing prompts. ...
    Downloads: 3 This Week
  • 15
    CodeGen

    Open-source model for program synthesis

    CodeGen is a family of open-source large language models designed specifically for program synthesis and code generation tasks. Developed by Salesforce Research, the models are trained on large datasets containing both natural language and programming language content. This allows them to translate natural language descriptions into functional code across a variety of programming languages. CodeGen supports multi-turn program synthesis, meaning it can generate complex programs through a...
    Downloads: 0 This Week
  • 16
    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications.
    Downloads: 19 This Week
  • 17
    Ultravox

    Fast multimodal LLM for real-time voice interaction and AI apps

    Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline. Internally, it leverages pretrained...
    Downloads: 1 This Week
  • 18
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning–based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. ...
    Downloads: 0 This Week
  • 19
    FastKoko

    Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

    FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple...
    Downloads: 5 This Week
  • 20
    WhisperX

    Automatic Speech Recognition with Word-level Timestamps

    WhisperX is an advanced speech recognition system built on top of OpenAI’s Whisper model, designed to improve transcription accuracy and timing precision for long-form audio. It addresses key limitations of standard Whisper implementations by introducing voice activity detection and forced alignment techniques to produce word-level timestamps. The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy. It is particularly effective for...
    Downloads: 15 This Week
  • 21
    Orpheus TTS

    Towards Human-Sounding Speech

    ...Inference is provided through a Python package that uses vLLM under the hood for high-throughput, low-latency generation, including streaming examples that show how to generate audio chunks in real time. The maintainers provide Colab notebooks, a standardized prompting format, and one-click deployment via Baseten for production-grade, FP8/FP16 optimized inference with ~200 ms streaming latency.
    Downloads: 3 This Week
  • 22
    SafeClaw

    Chat with it via text and voice

    SafeClaw is an open-source, entirely local alternative to cloud-based AI assistants like OpenClaw, enabling users to build a personal assistant that runs on their own machine without incurring API usage charges or exposing data to third-party services. It emphasizes privacy and predictability by using traditional programming, rule-based intent parsing, and established machine learning tools rather than large language models, meaning there are no per-token API costs and deterministic...
    Downloads: 5 This Week
  • 23
    TEN

    Open-source framework for conversational voice AI agents

    TEN (Transformative Extensions Network) is an open source framework designed to help developers build real-time multimodal AI agents capable of voice, video, text, image, and data-stream interaction with ultra-low latency. Its ecosystem includes components such as TEN Turn Detection, TEN Agent, and TMAN Designer, which let developers rapidly assemble human-like, responsive agents that can see, speak, hear, and interact. With support for languages like Python, C++, and Go, it offers flexible deployment on both edge and cloud environments. ...
    Downloads: 3 This Week
  • 24
    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports...
    Downloads: 82 This Week
  • 25
    MCP Server Home Assistant

    A Model Context Protocol Server for Home Assistant

    The Home Assistant MCP Server is an MCP server that integrates with Home Assistant, enabling AI assistants to interact with smart home devices and systems. It exposes Home Assistant voice intents through the Model Context Protocol for enhanced home control.
    Downloads: 0 This Week
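
Two of the entries above (OpenAI-Compatible Edge-TTS API and FastKoko) describe servers that emulate OpenAI's /v1/audio/speech endpoint. As a rough sketch of what "pointing an existing client at a local instance" could look like, the Python snippet below builds such a request with the standard library only. The field names (`model`, `input`, `voice`, `response_format`, `speed`) follow OpenAI's speech API; the base URL, port, API key, and default voice are placeholder assumptions, not values taken from either project, and a given server may ignore or require different fields.

```python
import json
import urllib.request

# Hypothetical local address: the real host and port depend on the deployment.
BASE_URL = "http://localhost:5050"


def build_speech_request(text, voice="alloy", audio_format="mp3",
                         speed=1.0, api_key="placeholder-key"):
    """Build (but do not send) a POST request for an OpenAI-style speech endpoint."""
    payload = {
        "model": "tts-1",                 # OpenAI model name; a local server may ignore it
        "input": text,                    # text to synthesize
        "voice": voice,                   # voice name, mapped by the server to its own voices
        "response_format": audio_format,  # requested audio container/codec
        "speed": speed,                   # playback speed multiplier
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the server accepts a bearer token like the OpenAI API does.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_speech_request("Hello from a local TTS server.")
print(request.get_method(), request.full_url)

# To actually synthesize audio, a running server is needed, for example:
# with urllib.request.urlopen(request) as response:
#     open("speech.mp3", "wb").write(response.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at any particular server.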