Robust Speech Recognition via Large-Scale Weak Supervision
Provides code for running inference with the Segment Anything Model
A Foundation Model for the Language of Financial Markets
Accurate × Fast × Comprehensive
Industrial-level controllable zero-shot text-to-speech system
End-to-end speech processing toolkit
Pretrained time-series foundation model developed by Google Research
TorchMultimodal is a PyTorch library
AV1 Image File Format Specification - ISO-BMFF/HEIF derivative
A Conversational Speech Generation Model
Blazing fast and correct x86/x64 disassembler, assembler, decoder, etc.
Singing Voice Synthesis via Shallow Diffusion Mechanism
Code release for "Masked-attention Mask Transformer"
PyTorch implementation of MAE
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Adversarial Latent Autoencoders
Toolkit for efficient experimentation with Speech Recognition
A general-purpose encoder-decoder framework for TensorFlow
toneDetect