Showing 34 open source projects for "decoder"

View related business solutions
  • Loan management software that makes it easy. Icon
    Loan management software that makes it easy.

    Ideal for lending professionals who are looking for a feature rich loan management system

    Bryt Software is ideal for lending professionals who are looking for a feature rich loan management system that is intuitive and easy to use. We are 100% cloud-based, software as a service. We believe in providing our customers with fair and honest pricing. Our monthly fees are based on your number of users and we have a minimal implementation charge.
    Learn More
  • Field Sales+ for MS Dynamics 365 and Salesforce Icon
    Field Sales+ for MS Dynamics 365 and Salesforce

    Maximize your sales performance on the go.

    Bring Dynamics 365 and Salesforce wherever you go with Resco’s solution. With powerful offline features and reliable data syncing, your team can access CRM data on mobile devices anytime, anywhere. This saves time, cuts errors, and speeds up customer visits.
    Learn More
  • 1
    Pytorch-toolbelt

    Pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

    ...Extras for Catalyst library (Visualization of batch predictions, additional metrics). By design, both encoder and decoder produces a list of tensors, from fine (high-resolution, indexed 0) to coarse (low-resolution) feature maps. Access to all intermediate feature maps is beneficial if you want to apply deep supervision losses on them or encoder-decoder of object detection task.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    ...A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
    Downloads: 80 This Week
    Last Update:
    See Project
  • 3
    Step1X-Edit

    Step1X-Edit

    A SOTA open-source image editing model

    Step1X-Edit is a state-of-the-art open-source image editing model/framework that uses a multimodal large language model (LLM) together with a diffusion-based image decoder to let users edit images simply via natural-language instructions plus a reference image. You supply an existing image and a textual command — e.g. “add a ruby pendant on the girl’s neck” or “make the background a sunset over mountains” — and the model interprets the instruction, computes a latent embedding combining the image content and user intent, then decodes a new image implementing the edit. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    GLM-OCR

    GLM-OCR

    Accurate × Fast × Comprehensive

    GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • Award-Winning Medical Office Software Designed for Your Specialty Icon
    Award-Winning Medical Office Software Designed for Your Specialty

    Succeed and scale your practice with cloud-based, data-backed, AI-powered healthcare software.

    RXNT is an ambulatory healthcare technology pioneer that empowers medical practices and healthcare organizations to succeed and scale through innovative, data-backed, AI-powered software.
    Learn More
  • 5
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    x-transformers

    x-transformers

    A simple but complete full-attention transformer

    A simple but complete full-attention transformer with a set of promising experimental features from various papers. Proposes adding learned memory key/values prior to attending. They were able to remove feedforwards altogether and attain a similar performance to the original transformers. I have found that keeping the feedforwards and adding the memory key/values leads to even better performance. Proposes adding learned tokens, akin to CLS tokens, named memory tokens, that is passed through...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    ...It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice cloning — meaning it can mimic a target speaker’s voice from a short reference sample — making it versatile for multi-voice uses. Compared to many open-source TTS tools, IndexTTS emphasizes efficiency and controllability: it offers faster inference, simpler training pipelines, and controllable speech parameters (like duration, pitch, and prosody), which is critical for production use.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 8
    LLM Foundry

    LLM Foundry

    LLM training code for MosaicML foundation models

    Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Large language models (LLMs) are changing the world, but for those outside well-resourced industry labs, it can be extremely difficult to train and deploy...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 9
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ...ESPnet provides many ready-to-run recipes for popular academic benchmarks, making it straightforward to reproduce published results or serve as baselines for new research. The toolkit also hosts numerous pretrained models and example configs, ranging from Transformer and Conformer architectures to various attention-based encoder-decoder models.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Data management solutions for confident marketing Icon
    Data management solutions for confident marketing

    For companies wanting a complete Data Management solution that is native to Salesforce

    Verify, deduplicate, manipulate, and assign records automatically to keep your CRM data accurate, complete, and ready for business.
    Learn More
  • 10
    FireRedASR

    FireRedASR

    Open-source industrial-grade ASR models

    ...The project includes multiple model variants to meet different application needs, such as high-accuracy end-to-end interaction using an encoder-adapter-LLM framework and efficient real-time recognition using attention-based encoder-decoder architectures, giving developers flexibility in balancing performance and resource constraints. FireRedASR not only excels in traditional speech recognition tasks but also demonstrates strong capability in challenging scenarios like singing lyrics recognition, where accurate transcription is often difficult for conventional models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    TimesFM

    TimesFM

    Pretrained time-series foundation model developed by Google Research

    TimesFM is a pretrained time-series foundation model from Google Research built for forecasting tasks, designed to generalize across many domains without requiring extensive per-dataset retraining. It provides a decoder-only model approach to forecasting, aiming for strong performance even in zero-shot or low-data settings where traditional models often struggle. The project includes code and an inference API intended to make it practical to run forecasts programmatically, with options to use different backends such as Torch or Flax depending on your environment and performance needs. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Bard API

    Bard API

    The unofficial python package that returns response of Google Bard

    The Python package returns a response of Google Bard through the value of the cookie. This package is designed for application to the Python package ExceptNotifier and Co-Coder. Please note that the bardapi is not a free service, but rather a tool provided to assist developers with testing certain functionalities due to the delayed development and release of Google Bard's API. It has been designed with a lightweight structure that can easily adapt to the emergence of an official API....
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    DALL-E 2 - Pytorch

    DALL-E 2 - Pytorch

    Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis

    ...To train CLIP, you can either use x-clip package, or join the LAION discord, where a lot of replication efforts are already underway. Then, you will need to train the decoder, which learns to generate images based on the image embedding coming from the trained CLIP.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 15
    ConsistencyDecoder

    ConsistencyDecoder

    Consistency Distilled Diff VAE

    ConsistencyDecoder is a Python package from OpenAI that introduces an improved decoding method for variational autoencoders (VAEs) used in Stable Diffusion pipelines. Instead of relying solely on the standard GAN or VAE decoder, this approach leverages a Consistency Distilled Diff VAE, designed to produce higher-quality and more stable outputs from encoded latents. The project provides a simple API for encoding with a Stable Diffusion VAE and decoding using the new consistency model, allowing for side-by-side comparisons with traditional decoders. It demonstrates how consistency models can enhance visual fidelity while maintaining efficiency, reducing artifacts common in GAN-decoded outputs. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Basaran

    Basaran

    Basaran, an open-source alternative to the OpenAI text completion API

    ...The open source community will eventually witness the Stable Diffusion moment for large language models (LLMs), and Basaran allows you to replace OpenAI's service with the latest open-source model to power your application without modifying a single line of code. Stream generation using various decoding strategies. Support both decoder-only and encoder-decoder models. Detokenizer that handles surrogates and whitespace. Multi-GPU support with optional 8-bit quantization. Real-time partial progress using server-sent events. Compatible with OpenAI API and client libraries. Comes with a fancy web-based playground. Docker images are available on Docker Hub and GitHub Packages.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    OpenNMT-tf

    OpenNMT-tf

    Neural machine translation and sequence learning using TensorFlow

    ...Models are described with code to allow training custom architectures and overriding default behavior. For example, the following instance defines a sequence-to-sequence model with 2 concatenated input features, a self-attentional encoder, and an attentional RNN decoder sharing its input and output embeddings. Sequence to sequence models can be trained with guided alignment and alignment information are returned as part of the translation API.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    NÜWA - Pytorch

    NÜWA - Pytorch

    Implementation of NÜWA, attention network for text to video synthesis

    Implementation of NÜWA, state of the art attention network for text-to-video synthesis, in Pytorch. It also contains an extension into video and audio generation, using a dual decoder approach. It seems as though a diffusion-based method has taken the new throne for SOTA. However, I will continue on with NUWA, extending it to use multi-headed codes + hierarchical causal transformer. I think that direction is untapped for improving on this line of work. In the paper, they also present a way to condition the video generation based on segmentation mask(s). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Karlo

    Karlo

    Text-conditional image generation model based on OpenAI's unCLIP

    Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture with the improvement over the standard super-resolution model from 64px to 256px, recovering high-frequency details only in the small number of denoising steps. We train all components from scratch on 115M image-text pairs including COYO-100M, CC3M, and CC12M. In the case of Prior and Decoder, we use ViT-L/14 provided by OpenAI’s CLIP repository. Unlike the original implementation of unCLIP, we replace the trainable transformer in the decoder into the text encoder in ViT-L/14 for efficiency. In the case of the SR module, we first train the model using the DDPM objective in 1M steps, followed by additional 234K steps to fine-tune the additional component.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 20
    CPT

    CPT

    CPT: A Pre-Trained Unbalanced Transformer

    A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation. We replace the old BERT vocabulary with a larger one of size 51271 built from the training data, in which we 1) add missing 6800+ Chinese characters (most of them are traditional Chinese characters); 2) remove redundant tokens (e.g. Chinese character tokens with ## prefix); 3) add some English tokens to reduce OOV. Position Embeddings We extend the max_position_embeddings from 512 to 1024. We...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    DiffSinger

    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    ...The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.
    Downloads: 39 This Week
    Last Update:
    See Project
  • 22
    LaMDA-pytorch

    LaMDA-pytorch

    Open-source pre-training implementation of Google's LaMDA in PyTorch

    Open-source pre-training implementation of Google's LaMDA research paper in PyTorch. The totally not sentient AI. This repository will cover the 2B parameter implementation of the pre-training architecture as that is likely what most can afford to train. You can review Google's latest blog post from 2022 which details LaMDA here. You can also view their previous blog post from 2021 on the model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Mask2Former

    Mask2Former

    Code release for "Masked-attention Mask Transformer

    Mask2Former is a unified segmentation architecture that handles semantic, instance, and panoptic segmentation with one model and one training recipe. Its core idea is to cast segmentation as mask classification: a transformer decoder predicts a set of mask queries, each with an associated class score, eliminating the need for task-specific heads. A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial support. This leads to accurate masks with sharp boundaries and strong small-object performance while remaining efficient on high-resolution inputs. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Deep learning time series forecasting

    Deep learning time series forecasting

    Deep learning PyTorch library for time series forecasting

    ...Historically, this repository provided open-source benchmarks and codes for flash flood and river flow forecasting. Full transformer (SimpleTransformer in model_dict): The full original transformer with all 8 encoder and decoder blocks. Requires passing the target in at inference.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MAE (Masked Autoencoders)

    MAE (Masked Autoencoders)

    PyTorch implementation of MAE

    ...It trains a Vision Transformer (ViT) by randomly masking a high percentage of image patches (typically 75%) and reconstructing the missing content from the remaining visible patches. This forces the model to learn semantic structure and global context without supervision. The encoder processes only the visible patches, while a lightweight decoder reconstructs the full image—making pretraining computationally efficient. After pretraining, the encoder serves as a powerful backbone for downstream tasks like image classification, segmentation, and detection, achieving top performance with minimal fine-tuning. The repository provides pretrained models, fine-tuning scripts, evaluation protocols, and visualization tools for reconstruction quality and learned features.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB