Synthesizing and manipulating 2048x1024 images with conditional GANs
A Systematic Framework for Interactive World Modeling
Build and run agents you can see, understand and trust
Repo of Qwen2-Audio chat & pretrained large audio language model
Refactoring ChatBot+LLM, Gpt-3.5-turbo, ChatGPT Bot/Voice Assistant
AI assistant based on large models that can actively think and plan
AI-powered tool for efficient abstract and PDF screening
A TTS model capable of generating ultra-realistic dialogue
An Open Source text-to-speech system built by inverting Whisper
A fast TTS architecture with conditional flow matching
SDG is a specialized framework
A generative speech model for daily dialogue
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
Inference code for CodeLlama models
A specialized Claude Code workspace for creating long-form
Context-aware desktop AI assistant that understands screen content
Run a full local LLM stack with one command using Docker
AI-Researcher: Autonomous Scientific Innovation
Framework for building real-time multimodal voice AI agent apps
Flowly is 100x faster than OpenClaw
PyTorch3D is FAIR's library of reusable components for deep learning
AI Slack bot for reading, summarizing, and chatting with content
MARS5 speech model (TTS) from CAMB.AI
Management of Yandex Station and other smart home devices
SoTA open-source TTS