Framework for building real-time multimodal voice AI agent apps
Library for OCR-related tasks powered by Deep Learning
Easy-to-use and powerful NLP library with Awesome model zoo
Implementation of Phenaki Video, which uses MaskGIT
State-of-the-art TTS model under 25MB
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Open source no-code system for text annotation and building of text
Voice Recognition to Text Tool
Deep Research framework, combining language models with tools
AI-powered tool for generating, optimizing, and translating subtitles
Claude Code skill implementing Manus-style persistent planning
Speech-AI-Forge is a project developed around TTS generation models
The simplest, fastest repository for training/finetuning models
Free, high-quality text-to-speech API endpoint to replace OpenAI
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
A nearly-live implementation of OpenAI's Whisper
Automated translation solution for visual novels
Accurate × Fast × Comprehensive
High-Quality Voice Cloning TTS for 600+ Languages
An Open Source text-to-speech system built by inverting Whisper
Spark-TTS Inference Code
Industrial-level controllable zero-shot text-to-speech system
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Unifying 3D Mesh Generation with Language Models
A sound cloning tool with a web interface, using your voice