Document Image Parsing via Heterogeneous Anchor Prompting
An Open Source text-to-speech system built by inverting Whisper
The official Python library for the OpenAI API
Build multimodal language agents for fast prototyping and production
Generate audiobooks from e-books
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Official repository for LTX-Video
An opinionated CLI to transcribe audio files with Whisper on-device
Build AI-powered semantic search applications
Get your documents ready for gen AI
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Spring AI Alibaba examples for building and testing AI apps
Private AI platform for agents, enterprise search and RAG pipelines
Qwen3-TTS is an open-source series of TTS models
AI-powered tool for generating, optimizing, and translating subtitles
High-resolution models for human tasks
Build Vision Agents quickly with any model or video provider
A Telegram RSS bot that cares about your reading experience
A lightweight text-to-speech model with zero-shot voice cloning
StreamSpeech is a seamless model for offline speech recognition
Official PyTorch Implementation
Instill Core is a full-stack AI infrastructure tool for data
State-of-the-art diffusion models for image and audio generation
Improve human sleep through scientifically
An AI for Music Generation