Easy DataSet is an open-source tool for creating high-quality datasets for large language model (LLM) fine-tuning, retrieval-augmented generation (RAG), and evaluation. It aims to make this process as simple and automated as possible by combining an intuitive interface with document parsing, segmentation, and labeling tools. It ingests domain-specific documents in a range of formats (PDF, Markdown, DOCX, EPUB, and plain text) and segments, cleans, and structures their content into datasets suited to downstream LLM training. The system includes automated question generation, hierarchical label trees, and answer-generation pipelines that call LLM APIs to produce question-answer pairs from customizable templates. Beyond dataset creation, Easy DataSet provides a built-in evaluation system with model-testing and blind-test features, helping teams validate model performance against curated test sets.
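
The segmentation step can be pictured with a minimal JavaScript sketch. It is not Easy DataSet's actual algorithm, which the description calls "intelligent" without detailing. Instead it swaps in plain fixed-size character chunking with overlap, and the function name chunkText and its default sizes are hypothetical.

```javascript
// Hypothetical sketch: fixed-size character chunking with overlap.
// NOT Easy DataSet's segmentation algorithm; it only illustrates the idea
// of cutting a long document into pieces small enough for an LLM prompt.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far each new chunk advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // → [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a little shared context at each boundary, so a question generated from one chunk is less likely to depend on text cut off at its edge. Structure-aware splitters usually cut on headings, paragraphs, or sentences rather than raw character offsets.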

Features

  • Document ingestion and intelligent parsing (PDF, DOCX, and more)
  • Automatic dataset generation for fine-tuning
  • Question and answer generation using LLMs
  • Built-in model evaluation and testing systems
  • Multiple export formats (JSON/JSONL, Hugging Face)
  • Support for diverse dataset types (dialogue, image QA)
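
The JSONL export format can be illustrated with a short JavaScript sketch that serializes question-answer pairs one JSON object per line. The Alpaca-style field names (instruction, input, output) and the helper name toJsonl are assumptions for illustration; Easy DataSet's actual export schema may differ.

```javascript
// Hypothetical sketch: serialize question-answer pairs as JSONL.
// The record shape (instruction/input/output) is an assumed Alpaca-style
// layout, not necessarily what Easy DataSet writes.
function toJsonl(pairs) {
  return pairs
    .map(({ question, answer }) =>
      JSON.stringify({ instruction: question, input: "", output: answer }))
    .map((line) => line + "\n") // JSONL: one object per line, newline-terminated
    .join("");
}

const pairs = [
  { question: "Which formats can be ingested?", answer: "PDF, Markdown, DOCX, EPUB, and plain text." },
  { question: "Which license does the project use?", answer: "MIT." },
];
process.stdout.write(toJsonl(pairs));
```

Because each line is an independent JSON document, a JSONL file can be streamed or appended to without re-parsing the whole dataset.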

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

JavaScript

Related Categories

JavaScript Large Language Models (LLM)

Registered

2026-02-04