Faster Whisper transcription with CTranslate2
Ready-to-use OCR with 80+ supported languages
Enhances Tesseract OCR output using LLMs (local or API)
Conversational voice AI agents
Voice Recognition to Text Tool
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
OCR expert VLM powered by Hunyuan's native multimodal architecture
StreamSpeech is a seamless model for offline speech recognition
Repo of Qwen2-Audio chat & pretrained large audio language model
Capable of understanding text, audio, vision, video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Chat & pretrained large vision language model
A framework to enable multimodal models to operate a computer
Visual Causal Flow
Framework for building real-time voice and multimodal AI agents
A very simple framework for state-of-the-art NLP
Advanced NLP with spaCy: A free online course
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Real-time voice interactive digital human
Fast multimodal LLM for real-time voice interaction and AI apps
High-Resolution Image Synthesis with Latent Diffusion Models
Shared repository for open-sourced projects from the Google AI Lang
Qwen3-ASR is an open-source series of ASR models
Models for the spaCy Natural Language Processing (NLP) library
Pre-trained Deep Learning models and demos