Multilingual Document Layout Parsing in a Single Vision-Language Model
LongBench v2 and LongBench (ACL '25 & '24)
A system for agentic LLM-powered data processing and ETL
Flexible Photo Recrafting While Preserving Your Identity
Swift async text-to-image for SwiftUI apps using OpenAI
Large Multimodal Models for Video Understanding and Editing
Concatenate a directory full of files into a single prompt
Biomni: a general-purpose biomedical AI agent
Ollama JavaScript library
A memory upgrade for your coding agent
Implementation of "MobileCLIP" CVPR 2024
Dealing with all unstructured data, for tasks such as reverse image search
LLM-based agent for general purpose software engineering tasks
Official PyTorch Implementation
LLM training code for MosaicML foundation models
Open-source all-in-one platform for engineering AI products
WhatsApp tool for chatbots with advanced features
Advanced NLP with spaCy: A free online course
Stable Diffusion web UI
95% token savings. 155x faster queries. 16 languages
Framework for building neural networks
The visual feedback tool for agents
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
LLM-based Reinforcement Learning audio edit model
PPTAgent: Generating and Evaluating Presentations