Showing 243 open source projects for "python voice synthesis"

  • 1
    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 1 This Week
  • 2
    PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model

    PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models. Via an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, including training, inference and testing modules, and the deployment process. It is easy to install, and a CLI, Server, and Streaming Server are available for a quick start. We provide...
    Downloads: 0 This Week
  • 3
    Pedalboard

    A Python library for audio

    pedalboard is a Python library for working with audio: reading, writing, rendering, adding effects, and more. It supports the most popular audio file formats and a number of common audio effects out of the box and also allows the use of VST3® and Audio Unit formats for loading third-party software instruments and effects. pedalboard was built by Spotify’s Audio Intelligence Lab to enable using studio-quality audio effects from within Python and TensorFlow.
    Downloads: 7 This Week
  • 4
    hls4ml

    Machine learning on FPGAs using HLS

    hls4ml is an open-source framework that enables machine learning models to be implemented directly on hardware such as FPGAs and ASICs using high-level synthesis techniques. The system converts trained neural network models from common machine learning frameworks into hardware description code suitable for ultra-low-latency inference. This approach allows machine learning algorithms to run directly on specialized hardware, making them suitable for applications that require extremely fast...
    Downloads: 0 This Week
  • 5
    DeepSearcher

    Open Source Deep Research Alternative to Reason and Search

    DeepSearcher is an open-source “deep research” style system that combines retrieval with evaluation and reasoning to answer complex questions using private or enterprise data. It is designed around the idea that high-quality answers require more than top-k retrieval, so it orchestrates multi-step search, evidence collection, and synthesis into a comprehensive response. The project integrates with vector databases (including Milvus and related options) so organizations can index internal...
    Downloads: 0 This Week
  • 6
    stt

    Voice Recognition to Text Tool

    stt is a standalone speech recognition tool that locally converts spoken content in audio or video files into textual formats without requiring internet access, giving users control over their data and reducing reliance on external APIs. It leverages open-source speech models such as Faster-Whisper to recognize and transcribe human speech into plain text, structured JSON objects, or subtitle files with time codes, making it suitable for both personal and professional transcription tasks. The...
    Downloads: 4 This Week
  • 7
    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained...
    Downloads: 3 This Week
  • 8
    CodeGeeX

    CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

    CodeGeeX is a large-scale multilingual code generation model with 13 billion parameters, trained on 850B tokens across more than 20 programming languages. Developed with MindSpore and later made PyTorch-compatible, it is capable of multilingual code generation, cross-lingual code translation, code completion, summarization, and explanation. It has been benchmarked on HumanEval-X, a multilingual program synthesis benchmark introduced alongside the model, and achieves state-of-the-art...
    Downloads: 16 This Week
  • 9
    Story Flicks

    Generate high-definition story short videos with one click using AI

    Story Flicks is an open-source project in the AI-assisted video generation and editing space, focused on creating short, story-style videos from script or prompt inputs. It aims to let users generate high-definition short movies or video stories with minimal manual effort, using AI models under the hood to assemble visuals, timing, and possibly narration or subtitles. For creators who want to produce narrative short-form content, whether for social media, storytelling, or prototyping...
    Downloads: 2 This Week
  • 10
    DeepCode

    DeepCode: Open Agentic Coding

    DeepCode is an agentic coding platform built around a multi-agent architecture that turns high-level inputs, including research papers, documents, and natural-language requirements, into working software artifacts. It positions itself as an “open agentic coding” system that can handle tasks like paper-to-code reproduction, frontend generation, and backend implementation by decomposing problems into structured steps and coordinating specialized agents. The system description highlights an...
    Downloads: 6 This Week
  • 11
    Whisper-WebUI

    A Web UI for easy subtitle generation using the Whisper model

    Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription...
    Downloads: 13 This Week
  • 12
    SoniTranslate

    Synchronized Translation for Videos

    SoniTranslate is a video translation and dubbing system that produces synchronized target-language audio tracks for existing video content. It provides a web UI built with Gradio, allowing users to upload a video, choose source and target languages, and then run a pipeline that handles transcription, translation and re-synthesis of speech. Under the hood, it uses advanced speech and diarization models to separate speakers, align audio with timecodes and respect subtitle timing, which lets...
    Downloads: 28 This Week
  • 13
    SenseVoice

    Multilingual speech recognition and audio understanding model

    SenseVoice is a speech foundation model designed to perform multiple voice understanding tasks from audio input. It provides capabilities such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection within a single system. SenseVoice is trained on more than 400,000 hours of speech data and supports over 50 languages for multilingual recognition tasks. It is built to achieve high transcription accuracy while maintaining efficient...
    Downloads: 6 This Week
  • 14
    Featuretools

    An open source Python library for automated feature engineering

    Featuretools is an open source Python framework for automated feature engineering. It automatically creates features from temporal and relational datasets using Deep Feature Synthesis (DFS). You can combine your raw data with what you know about your data to build meaningful features for machine learning and predictive modeling. Featuretools provides APIs to ensure only valid data is used for calculations, keeping your feature vectors safe from common label leakage problems....
    Downloads: 0 This Week
  • 15
    FAY

    Framework for building AI-powered interactive digital humans and agents

    Fay is an open source framework designed to build and deploy interactive digital humans powered by large language models. It acts as a middleware layer that connects digital character technologies with conversational AI systems and business applications. Fay supports various types of digital humans, including 2.5D and 3D avatars, and can be integrated with applications running on mobile devices, PCs, web platforms, and embedded systems. Its architecture allows developers to combine different...
    Downloads: 0 This Week
  • 16
    Step1X-3D

    High-Fidelity and Controllable Generation of Textured 3D Assets

    Step1X-3D is an open-source framework for generating high-fidelity textured 3D assets from scratch — both their geometry and surface textures — using modern generative AI techniques. It combines a hybrid architecture: a geometry generation stage using a VAE-DiT model to output a watertight 3D representation (e.g. TSDF surface), and a texture synthesis stage that conditions on geometry and optionally reference input (or prompts) to produce view-consistent textures using a diffusion-based...
    Downloads: 1 This Week
  • 17
    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
    Downloads: 9 This Week
  • 18
    AgenticSeek

    Fully Local Manus AI. No APIs, No $200 monthly bills

    AgenticSeek is a fully local autonomous AI assistant designed as a privacy-focused alternative to cloud-based agent platforms. It runs entirely on the user’s hardware and can autonomously browse the web, write code, and plan multi-step tasks without sending data to external services. The system is optimized for local reasoning models and emphasizes zero cloud dependency to maintain full user control. AgenticSeek includes intelligent agent selection, allowing it to determine the best internal...
    Downloads: 4 This Week
  • 19
    Open Interpreter

    A natural language interface for computers

    Open Interpreter is an open-source tool that provides a natural-language interface for interacting with your computer. It lets large language models (LLMs) run code locally (Python, JavaScript, shell, etc.), so you can ask your computer in plain language (“chat with your computer”) to do tasks such as data analysis, file manipulation, and browsing, with safeguards. It runs locally or via configured remote LLM servers/inference backends, giving flexibility to use models you trust or have...
    Downloads: 15 This Week
  • 20
    Paper2GUI

    Convert AI papers to GUI

    Convert AI papers to GUI, making it easy and convenient for everyone to use cutting-edge artificial intelligence technology. Paper2GUI: An AI desktop app toolbox for ordinary people. It can be used immediately without installation. It already supports 40+ AI models, covering AI painting, speech synthesis, video frame interpolation, video super-resolution, object detection, image stylization, OCR, and other fields. It supports Windows, Mac, and Linux systems. Paper2GUI:...
    Downloads: 10 This Week
  • 21
    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses as both text and natural speech in real-time streaming. It uses a “Thinker-Talker” architecture and introduces innovations for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 1 This Week
  • 22
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and to output text. There is also an instruction-tuned version called Qwen-Audio-Chat, which supports multi-round conversational interaction, audio plus text input, creative tasks, and reasoning over audio. It uses multi-task training over more than 30 different audio tasks and achieves strong performance across multiple benchmarks...
    Downloads: 0 This Week
  • 23
    ComfyUI-HunyuanVideoWrapper

    ComfyUI wrapper nodes for HunyuanVideo

    The ComfyUI-HunyuanVideoWrapper project is a ComfyUI extension that integrates Hunyuan-based multimodal video generation models into node-based workflows. It allows users to generate or manipulate video content by combining text prompts with one or more input images, enabling flexible conditioning of outputs. The system introduces specialized nodes such as text-image encoders that allow multiple image inputs to be referenced directly within prompts. This makes it possible to guide generation...
    Downloads: 1 This Week
  • 24
    LLM Council

    LLM Council works together to answer your hardest questions

    LLM Council is a creative open-source web application by Andrej Karpathy that lets you consult multiple large language models together to answer questions more reliably than querying a single model. Instead of relying on one provider, this application sends your query simultaneously to several LLMs supported via OpenRouter, collects each model’s independent response, and then orchestrates a multi-stage evaluation where the models critique and rank each other’s outputs anonymously. After this...
    Downloads: 1 This Week
  • 25
    Aider

    Aider is AI pair programming in your terminal

    Aider is an AI pair programming tool that runs directly in your terminal, helping developers build new projects or extend existing codebases faster and more confidently. It works alongside you like a coding partner, using powerful large language models to understand your code and implement precise changes. Aider creates a structured map of your entire repository, allowing it to handle large and complex projects effectively. It supports over 100 programming languages, making it flexible for...
    Downloads: 12 This Week