Evaluate and compare LLM outputs, catch regressions, improve prompts
A gallery that showcases on-device ML/GenAI use cases
Local AI coding agent CLI with multi-agent orchestration tools
Evaluation and Tracking for LLM Experiments
The smallest, simplest JavaScript pixel-level image comparison library
A multi-platform desktop application to evaluate and compare LLMs
An easy-to-use & supercharged open-source experiment tracker
Framework and no-code GUI for fine-tuning LLMs
An open-source, modern-design AI training tracking and visualization tool
A reinforcement learning package for Julia
A Claude skill that writes accurate prompts for any AI tool
Open source codebase for Scale Agentex
Test and evaluate LLMs and model configurations
Lightweight Python library for adding real-time multi-object tracking
https://github.com/iterative/vscode-dvc
Open source platform for the machine learning lifecycle
Interactively analyze ML models to understand their behavior
A powerful Zotero AI and MCP plugin with ChatGPT, Gemini 3.1, Claude
The repository provides code for running inference with SAM 2
Debug, evaluate, and monitor your LLM apps, RAG systems, and agentic AI
Advanced RAG cookbooks for building accurate LLM applications
Audit, track usage, and compare your Claude Code skills
Tool for visualizing and tracking your machine learning experiments
Deploy and share agents with open infrastructure
Advanced techniques for RAG systems