The nyc-taxi-data repository is a dataset and exploratory analysis project built around New York City taxi trip records. It collects and preprocesses large-scale trip data (fares, pickup and dropoff locations, timestamps, passenger counts) for analysis, modeling, and visualization. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing of large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups and dropoffs). It also contains example analyses, including maps, time-series plots, and hotspot detection, that highlight demand patterns, peak times, and geospatial distributions. The repository is often used as a benchmark dataset and as a teaching and demonstration example in the data science and urban analytics communities.
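
As a rough illustration of the cleaning and trips-per-hour aggregation described above, the following is a minimal sketch and not code from the repository. It is written in Python with only the standard library for portability, even though the project itself is listed as R. The column names (pickup_datetime, passenger_count, fare_amount), the validity rules, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample standing in for a real trip file; column names are assumptions.
SAMPLE_CSV = """pickup_datetime,passenger_count,fare_amount
2015-01-01 08:15:00,1,12.5
2015-01-01 08:45:00,2,7.0
2015-01-01 09:05:00,0,5.0
2015-01-01 09:30:00,1,-3.0
2015-01-01 09:50:00,3,20.0
"""


def is_valid(row):
    """Keep trips with at least one passenger and a non-negative fare (assumed rules)."""
    try:
        return int(row["passenger_count"]) > 0 and float(row["fare_amount"]) >= 0
    except (KeyError, TypeError, ValueError):
        return False


def trips_per_hour(lines):
    """Count valid trips by pickup hour, reading one row at a time."""
    counts = Counter()
    for row in csv.DictReader(lines):
        if not is_valid(row):
            continue
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        counts[pickup.hour] += 1
    return counts


hourly = trips_per_hour(io.StringIO(SAMPLE_CSV))
print(sorted(hourly.items()))
```

For a real file, the io.StringIO object could be replaced by an open file handle; the row-at-a-time loop keeps memory use flat regardless of file size.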

Features

  • Large-scale NYC taxi trip dataset with structured schemas
  • Data-cleaning and preprocessing scripts for handling raw trip data
  • Aggregation and summarization pipelines (hourly, daily, spatial bins)
  • Example notebooks/analyses for visualization, heatmaps, and demand patterns
  • Support for efficient I/O (Parquet/CSV handling, chunked reading)
  • Benchmark and teaching dataset for urban analytics, modeling, and demonstrations
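
The chunked reading and spatial binning listed above might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's implementation: it uses Python's standard library, and the column names (pickup_longitude, pickup_latitude), the 0.01-degree cell size, the helper names, and the sample coordinates are all hypothetical.

```python
import csv
import io
import math
from collections import Counter
from itertools import islice

# Hypothetical sample; real files are far larger and column names may differ.
SAMPLE_CSV = """pickup_longitude,pickup_latitude
-73.9857,40.7484
-73.9851,40.7489
-73.9772,40.7527
-73.7781,40.6413
"""


def read_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size rows so the full file is never held in memory."""
    reader = csv.DictReader(lines)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def grid_cell(lon, lat, cell_deg=0.01):
    """Map a coordinate to an integer grid cell cell_deg degrees on each side."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def pickups_per_cell(lines, chunk_size=3):
    """Aggregate pickup counts per grid cell, merging partial counts chunk by chunk."""
    totals = Counter()
    for chunk in read_chunks(lines, chunk_size):
        partial = Counter(
            grid_cell(float(r["pickup_longitude"]), float(r["pickup_latitude"]))
            for r in chunk
        )
        totals.update(partial)
    return totals


pickup_counts = pickups_per_cell(io.StringIO(SAMPLE_CSV))
for cell, n in pickup_counts.most_common():
    print(cell, n)
```

Per-cell counts of this kind are the usual input for a pickup heatmap; the cell size trades spatial detail against the number of bins.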

Categories

Libraries

License

MIT License

Additional Project Details

Programming Language

R

Related Categories

R Libraries

Registered

2025-10-01