Qwen3-Omni is a natively end-to-end, omni-modal LLM
High-Resolution Image Synthesis with Latent Diffusion Models
Code for running inference and finetuning with the SAM 3 model
An open source implementation of CLIP
AutoGluon: AutoML for Image, Text, and Tabular Data
Director, Screenwriter, Producer, and Video Generator All-in-One
Chinese and English multimodal conversational language model
Tensor search for humans
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
NLP Cloud serves high-performance pre-trained or custom models for NER
21 Lessons, Get Started Building with Generative AI
Deep Research framework, combining language models with tools
Accurate × Fast × Comprehensive
An Open Source text-to-speech system built by inverting Whisper
Integrate ChatGPT into your own Discord bot
Multilingual sentence & image embeddings with BERT
ComfyUI wrapper nodes for WanVideo and related models
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Official Python inference and LoRA trainer package
Dealing with all unstructured data, such as reverse image search
A Multi-Modal World Model for Reconstructing, Generating, Simulation
Implementation of 'lightweight' GAN, proposed in ICLR 2021
"Big Model" trains a visual multimodal VLM with 26M parameters
Implementation of "MobileCLIP" (CVPR 2024)
A Systematic Framework for Interactive World Modeling