Document (PDF, Word, PPTX, ...) extraction and parsing API
Hypernetworks that adapt LLMs for specific benchmark tasks
Practical productivity tools for Claude Code and Codex-CLI
Open source plain text editor designed for writing novels
Qwen3-TTS is an open-source series of TTS models
Text and image to video generation: CogVideoX and CogVideo
Awesome multilingual OCR toolkits based on PaddlePaddle
Generate audiobooks from EPUBs, PDFs and text with captions
Chat with it via text and voice
A TTS that fits in your CPU (and pocket)
A robust, efficient, low-latency speech-to-text library
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Python bindings for MuPDF's rendering library
PDF to Markdown with vision models
A Web UI for easy subtitle generation using the Whisper model
Reading book source
Tools to ease the creation of snippets, syntax definitions, etc.
A fast TTS architecture with conditional flow matching
Audiocraft is a library for audio processing and generation
Converts text to speech in realtime
Enhances Tesseract OCR output using LLMs (local or API)
Framework for building real-time voice and multimodal AI agents
Offline text-to-speech synthesis for Python
Edit PDF files with Nano Banana
Deep Research framework, combining language models with tools