A library for converting HTML into PDFs using ReportLab
Collection of Gemma 3 variants that are trained for performance
Automatically find issues in image datasets
Fast image augmentation library and an easy-to-use wrapper
Easily compute CLIP embeddings and build a CLIP retrieval system
A Pioneering Open-Source Alternative to GPT-4o
ImageBind: One Embedding Space to Bind Them All
Diffusion Transformer with Fine-Grained Chinese Understanding
Usable Implementation of "Bootstrap Your Own Latent" self-supervised
Let's make video diffusion practical
Gracefully face hCaptcha challenges with multimodal LLMs
AutoGluon: AutoML for Image, Text, and Tabular Data
An Inkscape extension: LaTeX/TeX editor for Inkscape
Recovering the Visual Space from Any Views
Capable of understanding text, audio, vision, video
GPT4V-level open-source multi-modal model based on Llama3-8B
RGBD video generation model conditioned on camera input
State-of-the-art diffusion models for image and audio generation
An unsupervised and free tool for image and video dataset analysis
AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim
Python data, Leaflet.js maps
YOLOv5 is the world's most loved vision AI
Contexts Optical Compression
21 Lessons, Get Started Building with Generative AI
A Unified Framework for Image Customization