Official Python inference and LoRA trainer package
Handles all kinds of unstructured data, for tasks such as reverse image search
Free, high-quality text-to-speech API endpoint to replace OpenAI
A Multi-Modal World Model for Reconstructing, Generating, and Simulating
Python project template generator with batteries included
"Big Model" trains a multimodal vision-language model (VLM) with 26M parameters
Implementation of "MobileCLIP" (CVPR 2024)
A Systematic Framework for Interactive World Modeling
Implementation of 'lightweight' GAN, proposed at ICLR 2021
Parse files for optimal RAG
Multimodal embedding and reranking models built on Qwen3-VL
Unified Multimodal Understanding and Generation Models
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
State-of-the-art diffusion models for image and audio generation
Open-Sora: Democratizing Efficient Video Production for All
Fast-stable-diffusion + DreamBooth
Sample code and notebooks for Generative AI on Google Cloud
Multimodal AI chat app with dynamic conversation routing
Virtual AI anchor that combines state-of-the-art technology
Phi-3.5 for Mac: Locally-run Vision and Language Models
The data structure for multimodal data
Large-language-model & vision-language-model based on Linear Attention
Official implementation of DreamCraft3D
Open source personal AI Assistant for Linux, Windows and Mac
Extract one-time password (OTP) secrets from QR codes