Framework for building realtime multimodal voice AI agent apps
Easy-to-use and powerful NLP library with an awesome model zoo
Library for OCR-related tasks powered by Deep Learning
Implementation of Phenaki Video, which uses MaskGIT
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Open source no-code system for text annotation and building of text
State-of-the-art TTS model under 25MB
Voice Recognition to Text Tool
Claude Code skill implementing Manus-style persistent planning
Speech-AI-Forge is a project developed around a TTS generation model
AI-powered tool for generating, optimizing, and translating subtitles
The simplest, fastest repository for training/finetuning models
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
Free, high-quality text-to-speech API endpoint to replace OpenAI
Deep Research framework, combining language models with tools
A nearly-live implementation of OpenAI's Whisper
Automated translation solution for visual novels
Accurate × Fast × Comprehensive
High-Quality Voice Cloning TTS for 600+ Languages
Spark-TTS Inference Code
An open source text-to-speech system built by inverting Whisper
Unifying 3D Mesh Generation with Language Models
Industrial-level controllable zero-shot text-to-speech system
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Controllable & emotion-expressive zero-shot TTS