Selene 1
Atla's Selene 1 API offers state-of-the-art AI evaluation models, enabling developers to define custom evaluation criteria and obtain precise judgments on their AI applications' performance. Selene outperforms frontier models on commonly used evaluation benchmarks, ensuring accurate and reliable assessments. Users can customize evaluations to their specific use cases through the Alignment Platform, allowing for fine-grained analysis and tailored scoring formats. The API provides actionable critiques alongside accurate evaluation scores, facilitating seamless integration into existing workflows. Pre-built metrics, such as relevance, correctness, helpfulness, faithfulness, logical coherence, and conciseness, are available to address common evaluation scenarios, including detecting hallucinations in retrieval-augmented generation applications or comparing outputs to ground truth data.
Learn more
GPT-5.2 Thinking
GPT-5.2 Thinking is the highest-capability configuration in OpenAI’s GPT-5.2 model family, engineered for deep, expert-level reasoning, complex task execution, and advanced problem solving across long contexts and professional domains. Built on the foundational GPT-5.2 architecture with improvements in grounding, stability, and reasoning quality, this variant applies more compute and reasoning effort to generate responses that are more accurate, structured, and contextually rich when handling highly intricate workflows, multi-step analysis, and domain-specific challenges. GPT-5.2 Thinking excels at tasks that require sustained logical coherence, such as detailed research synthesis, advanced coding and debugging, complex data interpretation, strategic planning, and sophisticated technical writing, and it outperforms lighter variants on benchmarks that test professional skills and deep comprehension.
Learn more
DataGemma
DataGemma represents a pioneering effort by Google to enhance the accuracy and reliability of large language models (LLMs) when dealing with statistical and numerical data. Launched as a set of open models, DataGemma leverages Google's Data Commons, a vast repository of public statistical data—to ground its responses in real-world facts. This initiative employs two innovative approaches: Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation (RAG). The RIG method integrates real-time data checks during the generation process to ensure factual accuracy, while RAG retrieves relevant information before generating responses, thereby reducing the likelihood of AI hallucinations. By doing so, DataGemma aims to provide users with more trustworthy and factually grounded answers, marking a significant step towards mitigating the issue of misinformation in AI-generated content.
Learn more
Ferret
An End-to-End MLLM that Accept Any-Form Referring and Ground Anything in Response.
Ferret Model - Hybrid Region Representation + Spatial-aware Visual Sampler enable fine-grained and open-vocabulary referring and grounding in MLLM.
GRIT Dataset (~1.1M) - A Large-scale, Hierarchical, Robust ground-and-refer instruction tuning dataset.
Ferret-Bench - A multimodal evaluation benchmark that jointly requires Referring/Grounding, Semantics, Knowledge, and Reasoning.
Learn more