Latitude
Latitude is an open-source prompt engineering platform designed to help product teams build, evaluate, and deploy AI models efficiently. It allows users to import and manage prompts at scale, refine them with real or synthetic data, and track the performance of AI models using LLM-as-judge or human-in-the-loop evaluations. With powerful tools for dataset management and automatic logging, Latitude simplifies the process of fine-tuning models and improving AI performance, making it an essential platform for businesses focused on deploying high-quality AI applications.
Vivgrid
Vivgrid is a development platform for AI agents that emphasizes observability, debugging, safety, and global deployment infrastructure. It gives developers full visibility into agent behavior by logging prompts, memory fetches, tool usage, and reasoning chains, so they can trace where things break or deviate. Teams can test, evaluate, and enforce safety policies (such as refusal rules or filters) and add human-in-the-loop checks before going live. Vivgrid supports orchestration of multi-agent systems with stateful memory, routing tasks dynamically across agent workflows. On the deployment side, it operates a globally distributed inference network aimed at low-latency (sub-50 ms) execution and exposes metrics such as latency, cost, and usage in real time. It aims to simplify shipping resilient AI systems by combining debugging, evaluation, safety, and deployment into one stack, so teams are not stitching together separate observability, infrastructure, and orchestration tools.
Weavel
Weavel offers Ape, which it bills as the first AI prompt engineer, equipped with tracing, dataset curation, batch testing, and evals. According to Weavel, Ape scores 93% on the GSM8K benchmark, compared with 86% for DSPy and 70% for base LLMs. It continuously optimizes prompts using real-world data, prevents performance regressions through CI/CD integration, and keeps a human in the loop through scoring and feedback. Ape works with the Weavel SDK to automatically log LLM generations and add them to your dataset as you use your application, so improvements are specific to your use case. For complex tasks, Ape auto-generates evaluation code and uses LLMs as judges. You can feed in scores and tips to guide its improvements.
Respan
Respan is an observability and evaluation platform built specifically for AI agents, which the vendor describes as self-driving. It enables teams to trace full execution flows, including messages, tool calls, routing decisions, memory usage, and outcomes. The platform connects observability, evaluations, and optimization into a continuous improvement loop. Metric-first evaluations allow teams to define performance standards such as accuracy, cost, reliability, and safety. Respan also includes capability and regression testing to protect stable behaviors while new ones are improved. An AI-powered evaluation agent analyzes failures, identifies root causes, and recommends next steps automatically. With compliance coverage spanning ISO 27001, SOC 2, GDPR, and HIPAA, Respan supports secure, large-scale AI deployments across industries.