Document (PDF, Word, PPTX, ...) extraction and parsing API
Hypernetworks that adapt LLMs for specific benchmark tasks
Practical productivity tools for Claude Code, Codex-CLI
Text and image to video generation: CogVideoX and CogVideo
Awesome multilingual OCR toolkits based on PaddlePaddle
Generate audiobooks from EPUBs, PDFs and text with captions
Qwen3-TTS is an open-source series of TTS models
A TTS that fits in your CPU (and pocket)
A robust, efficient, low-latency speech-to-text library
Chat with it via text and voice
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Reading book source
Framework for building real-time voice and multimodal AI agents
A web UI for easy subtitle generation using the Whisper model
World's first open-source, agentic video production system
Enhances Tesseract OCR output using LLMs (local or API)
Deep Research framework, combining language models with tools
A fast TTS architecture with conditional flow matching
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Offline text-to-speech synthesis for Python
Audiocraft is a library for audio processing and generation
Official inference repo for FLUX.1 models
Converts text to speech in real time
Context-aware desktop AI assistant that understands screen content
Persian NLP Toolkit