Showing 22 open source projects for "python voice synthesis"

  • 1
    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...
    Downloads: 2 This Week
  • 2
    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. The system...
    Downloads: 0 This Week
  • 3
    RunAnywhere

    Production-ready toolkit to run AI locally

    RunAnywhere SDKs are a set of cross-platform development tools that enable applications to run artificial intelligence models directly on user devices instead of relying on cloud infrastructure. The toolkit allows developers to integrate language models, speech recognition, and voice synthesis capabilities into mobile or desktop applications while keeping all computation local. By running models entirely on device, the platform eliminates network latency and protects user data because information does not leave the device. The SDK supports popular open-source models such as Llama, Mistral, and Qwen, enabling developers to build AI-powered features such as chat interfaces and voice assistants with minimal external dependencies. ...
    Downloads: 0 This Week
  • 4
    CodeGen

    Open-source model for program synthesis

    CodeGen is a family of open-source large language models designed specifically for program synthesis and code generation tasks. Developed by Salesforce Research, the models are trained on large datasets containing both natural language and programming language content. This allows them to translate natural language descriptions into functional code across a variety of programming languages. CodeGen supports multi-turn program synthesis, meaning it can generate complex programs through a...
    Downloads: 0 This Week
  • 5
    CodeLlama

    Inference code for CodeLlama models

    Code Llama is a family of Llama-based code models optimized for programming tasks such as code generation, completion, and repair, with variants specialized for base coding, Python, and instruction following. The repo documents the sizes and capabilities (e.g., 7B, 13B, 34B) and highlights features like infilling and large input context to support real IDE workflows. It targets both general software synthesis and language-specific productivity, offering strong performance among open models at release time. ...
    Downloads: 1 This Week
  • 6
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech and other sounds) and to perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive, voice-only input) and Audio Analysis (audio plus text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...
    Downloads: 0 This Week
  • 7
    Streamer-Sales

    LLM for live-stream sales hosts

    ...By analyzing product characteristics and marketing information, the model can produce engaging explanations that emphasize benefits, features, and emotional appeal to encourage viewers to make purchasing decisions. The system integrates multiple AI technologies including retrieval-augmented generation to incorporate product knowledge, speech synthesis to convert generated scripts into voice output, and digital human generation to create virtual hosts. It also supports automatic speech recognition and agent-based tools that can retrieve additional information such as logistics or product details during live sessions.
    Downloads: 0 This Week
  • 8
    MetaScreener

    AI-powered tool for efficient abstract and PDF screening

    MetaScreener is an open-source AI-assisted tool designed to streamline the screening process in systematic literature reviews and academic research workflows. The system helps researchers analyze large collections of academic abstracts and research papers to determine which studies are relevant for inclusion in evidence synthesis projects. Instead of manually reviewing hundreds or thousands of documents, researchers can use MetaScreener to apply machine learning techniques that assist with...
    Downloads: 2 This Week
  • 9
    Synthetic Data Generator

    SDG is a specialized framework

    Synthetic Data Generator is an open-source framework designed to generate high-quality synthetic tabular datasets that replicate the statistical characteristics of real data while avoiding privacy risks. The platform enables developers and data scientists to create artificial datasets that preserve important relationships between variables without containing sensitive personal information. This makes the generated data suitable for tasks such as machine learning model training, testing...
    Downloads: 1 This Week
  • 10
    DeepSearcher

    Open Source Deep Research Alternative to Reason and Search

    DeepSearcher is an open-source “deep research” style system that combines retrieval with evaluation and reasoning to answer complex questions using private or enterprise data. It is designed around the idea that high-quality answers require more than top-k retrieval, so it orchestrates multi-step search, evidence collection, and synthesis into a comprehensive response. The project integrates with vector databases (including Milvus and related options) so organizations can index internal...
    Downloads: 0 This Week
  • 11
    CodeGeeX

    CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)

    CodeGeeX is a large-scale multilingual code generation model with 13 billion parameters, trained on 850B tokens across more than 20 programming languages. Developed with MindSpore and later made PyTorch-compatible, it is capable of multilingual code generation, cross-lingual code translation, code completion, summarization, and explanation. It has been benchmarked on HumanEval-X, a multilingual program synthesis benchmark introduced alongside the model, and achieves state-of-the-art...
    Downloads: 13 This Week
  • 12
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat, which supports multi-round conversational interaction, audio plus text input, creative tasks, and reasoning over audio. It uses multi-task training over many different audio tasks (30+) and achieves strong performance across multiple benchmarks...
    Downloads: 1 This Week
  • 13
    LLM Council

    LLM Council works together to answer your hardest questions

    LLM Council is a creative open-source web application by Andrej Karpathy that lets you consult multiple large language models together to answer questions more reliably than querying a single model. Instead of relying on one provider, this application sends your query simultaneously to several LLMs supported via OpenRouter, collects each model’s independent response, and then orchestrates a multi-stage evaluation where the models critique and rank each other’s outputs anonymously. After this...
    Downloads: 2 This Week
  • 14
    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and to generate responses as both text and natural speech, streamed in real time. It uses a “Thinker-Talker” architecture and introduces innovations for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 1 This Week
  • 15
    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...
    Downloads: 3 This Week
  • 16
    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a...
    Downloads: 1 This Week
  • 17
    CogView4

    CogView4, CogView3-Plus and CogView3 (ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets,...
    Downloads: 2 This Week
  • 18
    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion-parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including...
    Downloads: 8 This Week
  • 19
    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation. The project explores how scaling autoregressive models and improving image...
    Downloads: 0 This Week
  • 20
    Magicoder

    Empowering Code Generation with OSS-Instruct

    Magicoder is an open-source family of large language models designed specifically for code generation and software development tasks. The project focuses on improving the quality and diversity of code generation by training models with a novel dataset construction approach known as OSS-Instruct. This technique uses open-source code repositories as a foundation for generating more realistic and diverse instruction datasets for training language models. By grounding training data in real...
    Downloads: 0 This Week
  • 21
    Tongyi DeepResearch

    Tongyi Deep Research, the Leading Open-source Deep Research Agent

    DeepResearch (Tongyi DeepResearch) is an open-source “deep research agent” developed by Alibaba’s Tongyi Lab for long-horizon, information-seeking tasks. It is built to act as a research agent: retrieving information from the web and documents, reasoning over it, synthesizing findings, and backing its outputs with evidence. The model has about 30.5 billion parameters in total, but only about 3.3B of them are active for any given token. It uses a mix of synthetic data generation, fine-tuning and...
    Downloads: 0 This Week
  • 22
    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems....
    Downloads: 0 This Week
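The LLM Council entry describes a multi-stage flow. First, one question is fanned out to several models. Then each answer is collected independently. Finally, the models critique and rank each other's answers anonymously.

The Python sketch below shows those stages under several assumptions:

- Models and reviewers are plain callables. These are hypothetical stand-ins for the project's OpenRouter calls.
- Each reviewer returns the anonymized labels ordered best-first.
- Rankings are combined by mean position. This aggregation rule is an assumption, not necessarily what the project does.

The names `council_round`, `Model`, and `Ranker` are illustrative only. The step after ranking is elided in the listing, so it is left out here.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical signatures: a "model" maps a prompt to an answer, and a "ranker"
# maps (question, {label: answer}) to labels ordered best-first. In LLM Council
# these would be calls to different LLMs through OpenRouter.
Model = Callable[[str], str]
Ranker = Callable[[str, Dict[str, str]], List[str]]


def council_round(
    question: str,
    models: Dict[str, Model],
    rankers: List[Ranker],
    seed: int = 0,
) -> List[Tuple[str, float, str]]:
    """Return (model_name, mean_rank, answer) tuples, best mean rank first."""
    # Stage 1: every model answers the same question independently.
    answers = {name: model(question) for name, model in models.items()}

    # Stage 2: hide model names behind shuffled labels so reviews are anonymous.
    names = list(answers)
    random.Random(seed).shuffle(names)
    label_to_name = {f"Response {chr(ord('A') + i)}": n for i, n in enumerate(names)}
    anonymized = {label: answers[n] for label, n in label_to_name.items()}

    # Stage 3: each reviewer ranks the anonymized answers; record 1-based positions.
    positions: Dict[str, List[int]] = {n: [] for n in names}
    for ranker in rankers:
        for pos, label in enumerate(ranker(question, anonymized), start=1):
            positions[label_to_name[label]].append(pos)

    # Assumed aggregation rule: mean position, lower is better; unranked answers sort last.
    scored = [(n, mean(p) if p else float("inf"), answers[n]) for n, p in positions.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy usage with canned answers and a reviewer that prefers the answer "4".
models = {"m1": lambda q: "4", "m2": lambda q: "5", "m3": lambda q: "four"}
prefer_four: Ranker = lambda q, anon: sorted(anon, key=lambda label: anon[label] != "4")
for name, rank, answer in council_round("What is 2 + 2?", models, [prefer_four, prefer_four]):
    print(name, rank, answer)
```

Shuffling the labels with a seeded `random.Random` keeps reviews anonymous while making runs reproducible. In a real deployment each reviewer would be a different LLM prompted to return its ranking.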
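The HumanEval entry explains that generated code is judged on functional correctness: a completion counts only if it passes the task's unit tests. The Python sketch below illustrates two things:

- **`passes_tests`:** checks one completion against a HumanEval-style test string. The test string is assumed to define `check(candidate)`.
- **`pass_at_k`:** computes the unbiased pass@k estimator from the accompanying paper, 1 − C(n−c, k) / C(n, k). Here n samples are generated per task, c of them pass, and the per-task values are then averaged over tasks.

Both function names are hypothetical; this is not the human-eval package's API. The real harness executes untrusted code in an isolated process with timeouts. The in-process `exec()` here is for illustration only.

```python
from math import comb


def passes_tests(program: str, test: str, entry_point: str) -> bool:
    """Run a candidate program against a HumanEval-style `check(candidate)` test.

    WARNING: exec() runs untrusted model output in-process. A real harness should
    sandbox it in a separate process with a timeout.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)  # define the candidate function
        exec(test, namespace)     # define check(candidate)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:             # failed assertion, crash, or missing function
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn without
    replacement from n generated samples (c of them correct) passes the tests."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Toy task: three sampled completions for an `add` function, one of them wrong.
test_code = (
    "def check(candidate):\n"
    "    assert candidate(2, 3) == 5\n"
    "    assert candidate(-1, 1) == 0\n"
)
samples = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a - b\n",  # incorrect completion
    "def add(a, b):\n    return b + a\n",
]
c = sum(passes_tests(s, test_code, "add") for s in samples)
print(c, pass_at_k(len(samples), c, 1), pass_at_k(len(samples), c, 2))
```

In this toy run, two of the three samples pass. That gives pass@1 = 1 − 1/3 ≈ 0.667 and pass@2 = 1.0.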