Datasets is a library for easily accessing and sharing datasets and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use its data processing methods to quickly get the dataset ready for training a deep learning model. Datasets is backed by the Apache Arrow format: all datasets are memory-mapped through an efficient backend with zero serialization cost, so large datasets can be processed with zero-copy reads and are not limited by available RAM. The library is also deeply integrated with the Hugging Face Hub, making it easy to load a dataset and share it with the wider NLP community. More than 2,658 datasets and more than 34 metrics are currently available. Smart caching means you never have to wait for the same data to be processed more than once.
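The memory-mapping idea mentioned above can be illustrated with a small sketch. This is not how Datasets or Apache Arrow is implemented; it swaps in Python's standard `mmap` and `struct` modules and a made-up fixed-width record layout purely to show why a memory-mapped file lets you read one record without loading the whole file into RAM. The names `write_records`, `read_record`, and `demo` are hypothetical helpers invented for this example.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout for this sketch: one little-endian 64-bit integer per record.
RECORD = struct.Struct("<q")


def write_records(path, values):
    """Write each integer as a fixed-width 8-byte record."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_record(path, index):
    """Read a single record through a memory map instead of reading the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # unpack_from decodes directly from the mapped buffer at the offset;
            # the operating system pages in only the part of the file touched.
            (value,) = RECORD.unpack_from(mm, index * RECORD.size)
            return value


def demo():
    """Write 100,000 records to a temporary file and read one back."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_records(path, range(100_000))
        return read_record(path, 54_321)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because records have a fixed width, the byte offset of any record is just `index * RECORD.size`, so random access costs the same regardless of file size. Columnar formats such as Arrow are considerably more elaborate, but they rely on the same property of being readable in place from a mapped file.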

Features

  • Learn the basics and become familiar with loading, accessing, and processing a dataset
  • Practical guides to help you achieve a specific goal
  • Solve real-world problems
  • Technical descriptions of how Datasets classes and methods work
  • High-level explanations that build a better understanding of important topics such as the underlying data format
  • Find your dataset today on the Hugging Face Hub

License

Apache License V2.0
