Showing 5 open source projects for "decoder"

  • 1
    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    ...It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning, meaning it can mimic a target speaker’s voice from a short reference sample, which makes it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), all of which are critical for production use.
    Downloads: 7 This Week
  • 2
    ESPnet

    End-to-end speech processing toolkit

    ...ESPnet provides many ready-to-run recipes for popular academic benchmarks, making it straightforward to reproduce published results or use them as baselines for new research. The toolkit also hosts numerous pretrained models and example configs, ranging from Transformer and Conformer architectures to various attention-based encoder-decoder models.
    Downloads: 2 This Week
  • 3
    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 5 This Week
  • 4
    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    ...The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.
    Downloads: 43 This Week
  • 5
    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    OpenSeq2Seq is a TensorFlow-based toolkit for efficient experimentation with sequence-to-sequence models across speech and NLP tasks. Its core goal is to give researchers a flexible, modular framework for building and training encoder–decoder architectures while fully leveraging distributed and mixed-precision training. The toolkit includes ready-made models for neural machine translation, automatic speech recognition, speech synthesis, language modeling, and additional NLP tasks such as sentiment analysis. It supports multi-GPU and multi-node data-parallel training, and integrates with Horovod to scale out across large GPU clusters. ...
    Downloads: 0 This Week