The machine learning toolkit for time series analysis in Python
AI-powered Jupyter spreadsheet that converts workflows into Python
The open-source data curation platform for LLMs
Data and tools for generating and inspecting OLMo pre-training data
Extract schema, statistics and entities from datasets
Code for running inference and finetuning with the SAM 3 model
Synthetic data curation for post-training and data extraction
A Model Context Protocol (MCP) server that enables AI assistants
Personal AI, On Personal Devices
Data Infrastructure providing an approach to multimodal AI workloads
Test Suites for validating ML models & data
Supercharge Your Model Training
Synthetic data generators for tabular and time-series data
Efficient Triton Kernels for LLM Training
High-level training, data augmentation, and utilities for PyTorch
⚡ Building applications with LLMs through composability ⚡
PandasAI is a Python library that integrates generative AI
The data structure for multimodal data
Parse files for optimal RAG
A curated list of data mining papers about fraud detection
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
Making Enterprise Data Intelligent and Responsive for AI
PaddlePaddle End-to-End Development Toolkit
Open-source deep-learning framework
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models