StarVector is a foundation model for SVG generation
A text-to-speech, speech-to-text and speech-to-speech library
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training
Spark-TTS Inference Code
A high-performance ML model serving framework that offers dynamic batching
Python tool for browser-based interactive data apps in one file
A lightweight text-to-speech model with zero-shot voice cloning
Interface for OuteTTS models
A personal context-agent that learns how you work
Pokee Deep Research Model Open Source Repo
🐈 nanobot: The Ultra-Lightweight Clawdbot / OpenClaw
Retrieval Augmented Generation (RAG) framework
GPT-4V-level open-source multi-modal model based on Llama3-8B
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
Build Vision Agents quickly with any model or video provider
ChatGLM2-6B: An Open Bilingual Chat LLM
Build cross-modal and multimodal applications on the cloud
Get started with building Fullstack Agents using Gemini 2.5 & LangGraph
A TTS that fits in your CPU (and pocket)
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Private chat with local GPT with document, images, video, etc.
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Capable of understanding text, audio, vision, video
Repo of Qwen2-Audio chat & pretrained large audio language model