A framework for real-life data science
Extensible, parallel implementations of t-SNE
Master the essential skills needed to recognize and solve problems
The fastest way to build data pipelines
Detecting silent model failure. NannyML estimates performance
Training data (data labeling, annotation, workflow) for all data types
A modular, primitive-first, Python-first PyTorch library
A high performance implementation of HDBSCAN clustering
Uncover insights, surface problems, monitor, and fine-tune your LLM
Label Studio is a multi-type data labeling and annotation tool
Python Optimal Transport
The RF and reverse engineering framework for everyone
A Python Automated Machine Learning tool that optimizes ML
Data science on data without acquiring a copy
Functional Machine Learning
Python package for AutoML on Tabular Data with Feature Engineering
MLOps simplified. From ML Pipeline ⇨ Data Product without the hassle
Machine learning tutorials (mainly in Python 3)
Helps scientists define testable, modular, self-documenting dataflow
A curated list of data mining papers about fraud detection
Topic Modelling for Humans
AutoGluon: AutoML for Image, Text, and Tabular Data
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine
Effortless data labeling with AI support from Segment Anything