Robust Speech Recognition via Large-Scale Weak Supervision
Offline inference engine for art, real-time voice conversations
Official MiniMax Model Context Protocol (MCP) server
ChemCrow
Speakr is a personal, self-hosted web application
Build and run agents you can see, understand and trust
Synthesizing and manipulating 2048x1024 images with conditional GANs
Repo of Qwen2-Audio chat & pretrained large audio language model
A Systematic Framework for Interactive World Modeling
Refactoring ChatBot+LLM, GPT-3.5-turbo, ChatGPT Bot/Voice Assistant
AI assistant based on large models that can actively think and plan
A TTS model capable of generating ultra-realistic dialogue
An Open Source text-to-speech system built by inverting Whisper
AI-powered tool for efficient abstract and PDF screening
A fast TTS architecture with conditional flow matching
SDG is a specialized framework
A generative speech model for daily dialogue
A specialized Claude Code workspace for creating long-form
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Inference code for CodeLlama models
Sharp Monocular View Synthesis in Less Than a Second
Context-aware desktop AI assistant that understands screen content
Run a full local LLM stack with one command using Docker
Framework for building real-time multimodal voice AI agent apps
AI-Researcher: Autonomous Scientific Innovation