Multimodal embedding and reranking models built on Qwen3-VL
Free, high-quality text-to-speech API endpoint to replace OpenAI
Parse files for optimal RAG
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
State-of-the-art diffusion models for image and audio generation
Open-Sora: Democratizing Efficient Video Production for All
Fast-stable-diffusion + DreamBooth
Sample code and notebooks for Generative AI on Google Cloud
Unified Multimodal Understanding and Generation Models
Virtual AI anchor that combines state-of-the-art technology
Phi-3.5 for Mac: Locally-run Vision and Language Models
The data structure for multimodal data
Large language model and vision-language model based on linear attention
Official implementation of DreamCraft3D
Open source personal AI Assistant for Linux, Windows and Mac
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Open source libraries and APIs to build custom preprocessing pipelines
Pretrained model hub for Keras 3
Framework for building neural networks
Generate Any 3D Scene in Seconds
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Simplest working implementation of StyleGAN2
Open source demo platform where you can easily showcase your AI models